EP3948656A1 - Attribute-based pedestrian prediction - Google Patents
Attribute-based pedestrian prediction
- Publication number
- EP3948656A1 (application EP20719542.1A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- location
- time
- determining
- environment
- attribute
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/588—Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
Definitions
- Prediction techniques can be used to determine future states of entities in an environment. That is, prediction techniques can be used to determine how a particular entity is likely to behave in the future. Current prediction techniques often involve physics-based modeling or rules-of-the-road simulations to predict future states of entities in an environment.
- FIG. 1 is a pictorial flow diagram of an example process for capturing sensor data, determining attributes associated with an object, determining a predicted location based on the attributes, and controlling a vehicle based on the predicted location.
- FIG. 2 illustrates examples of attributes of an object.
- FIG. 3A illustrates an example of determining a destination associated with an object in an environment.
- FIG. 3B illustrates another example of determining a destination associated with an object in an environment.
- FIG. 4 illustrates an example of determining predicted location(s) for an object based on attributes of the object over time.
- FIG. 5 illustrates an example of updating a frame of reference for use in determining predicted location(s).
- FIG. 6 is a pictorial flow diagram of an example process for capturing sensor data, determining that a first object and second object are in an environment, determining attributes associated with the second object, determining a predicted location based on the attributes and a reference line, and controlling a vehicle based on the predicted location.
- FIG. 7 illustrates examples of attributes of an object.
- FIG. 8 illustrates an example of determining predicted location(s) for a first object based on attributes of a second object over time.
- FIG. 9 depicts a block diagram of an example system for implementing the techniques described herein.
- FIG. 10 depicts an example process for capturing sensor data, determining attributes associated with an object, determining a predicted location based on the attributes, and controlling a vehicle based on the predicted location.
- FIG. 11 depicts an example process for capturing sensor data, determining that a first object and second object are in an environment, determining attributes associated with the second object, determining a predicted location based on the attributes and a reference line, and controlling a vehicle based on the predicted location.
- This disclosure is directed to techniques for predicting locations of an object based on attributes of the object and/or based on attributes of other object(s) proximate to the object.
- the techniques discussed herein can be implemented to predict locations of a pedestrian proximate to a crosswalk region in an environment as they traverse or prepare to traverse through the crosswalk region.
- the techniques discussed herein can be implemented to predict locations of objects (e.g., a vehicle) as the vehicle traverses an environment. For example, predicted locations of the vehicle can be based on attributes of the vehicle as well as attributes of other vehicles proximate to the vehicle in the environment.
- Attributes can comprise information about an object, including but not limited to a position, velocity, acceleration, bounding box, etc. Attributes can be determined for an object over time (e.g., times T-M, ..., T-2, T-1, T0) such that, when input to a prediction component (e.g., a machine learned model such as a neural network), the prediction component can output predictions (e.g., predicted locations of the object) at times in the future (e.g., times T1, T2, T3, ..., TN).
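- As an illustrative sketch only (not the claimed implementation), such a prediction component could be a small fully connected network that maps a flattened history of per-timestep attribute vectors to (s, e_y) offsets for N future timesteps; the class name, layer sizes, and attribute dimensionality below are assumptions made for the example.
```python
import torch
import torch.nn as nn

class LocationPredictionMLP(nn.Module):
    """Illustrative prediction component: attribute history in, future (s, e_y) out."""

    def __init__(self, num_history: int = 3, attr_dim: int = 8, num_future: int = 5):
        super().__init__()
        self.num_future = num_future
        self.net = nn.Sequential(
            nn.Linear(num_history * attr_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            # Two values per future timestep: distance s along the reference
            # axis and lateral offset e_y from it.
            nn.Linear(128, num_future * 2),
        )

    def forward(self, attribute_history: torch.Tensor) -> torch.Tensor:
        # attribute_history: (batch, num_history, attr_dim), ordered T-M ... T0.
        batch = attribute_history.shape[0]
        out = self.net(attribute_history.flatten(start_dim=1))
        return out.view(batch, self.num_future, 2)  # (batch, N, [s, e_y])

# Example: 3 past timesteps (T-2, T-1, T0), 8 attributes each, 5 predicted steps.
model = LocationPredictionMLP()
history = torch.randn(1, 3, 8)
predicted = model(history)  # shape: (1, 5, 2)
```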
- a vehicle such as an autonomous vehicle, can be controlled to traverse an environment based at least in part on the predicted locations of the object(s).
- the techniques discussed herein can be implemented to predict locations of a pedestrian proximate to a crosswalk region in an environment as the pedestrian traverses through or prepares to traverse through the crosswalk region.
- sensor data can be captured in an environment, and an object can be identified and classified as a pedestrian.
- a crosswalk region can be identified in the environment based on map data and/or based on sensor data (e.g., identifying a crosswalk region from sensor data, whether directly by observing visual indicators of a crosswalk region (stripes, crosswalk signs, etc.) or indirectly by historical detections of pedestrians crossing a road at such a location).
- At least one destination can be associated with a crosswalk region.
- a destination can represent an opposite side of the street in the crosswalk region.
- a destination can be selected or otherwise determined based on attributes of the pedestrian (e.g., position, velocity, acceleration, heading, etc.).
- a score associated with a likelihood that the pedestrian will cross a particular crosswalk can be based on attributes of the pedestrian (e.g., position, velocity, acceleration, heading, etc.).
- a crosswalk region associated with a highest score can be selected or otherwise determined to be a target crosswalk associated with the pedestrian.
- a destination associated with a pedestrian can be determined based on a number of factors. For example, a destination can be determined based at least in part on one or more of: a straight line extrapolation of a velocity of a pedestrian, a nearest location of a sidewalk region associated with a pedestrian, a gap between parked vehicles, an open door associated with a vehicle, and the like.
- sensor data can be captured of an environment to determine a likelihood of these example candidate destinations being present in an environment.
- a score can be associated with each candidate destination and a likely destination can be used in accordance with the techniques discussed herein.
- the techniques can include predicting location(s) of the pedestrian over time to traverse the crosswalk region.
- attributes for the object can be determined over time (e.g., times T-M, ..., T-2, T-1, T0), whereby the attributes can be represented in a frame of reference associated with the object at time T0. That is, a position of the object at T0 can be considered to be an origin (e.g., coordinates (0, 0) in an x-y coordinate system), whereby a first axis can be defined by the origin and a destination associated with the crosswalk region.
- other points can be considered as an origin for another frame of reference.
- the destination associated with the crosswalk region can be selected as a point on a second side of the street opposite the first side of the street, although any destination can be selected.
- a second axis of the frame of reference can be perpendicular to the first axis and, in at least some examples, lie along the plane containing the crosswalk region.
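- The frame of reference described above can be sketched in simple 2D geometry as follows, assuming the origin is the object's position at time T0 and the first axis is the unit vector toward the crosswalk destination; the helper names and coordinates are illustrative.
```python
import numpy as np

def crosswalk_frame(origin_xy, destination_xy):
    """Frame of reference: origin at the object's position at time T0,
    x-axis pointing toward the crosswalk destination, y-axis perpendicular."""
    origin = np.asarray(origin_xy, dtype=float)
    x_axis = np.asarray(destination_xy, dtype=float) - origin
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.array([-x_axis[1], x_axis[0]])  # 90-degree rotation, stays in the plane
    return origin, x_axis, y_axis

def to_frame(point_xy, origin, x_axis, y_axis):
    """Express a world-frame point in the object/destination frame."""
    delta = np.asarray(point_xy, dtype=float) - origin
    return np.array([delta @ x_axis, delta @ y_axis])

# Example: pedestrian at (10, 4) at time T0, destination across the street at (10, 16).
origin, x_axis, y_axis = crosswalk_frame((10.0, 4.0), (10.0, 16.0))
# A past position (e.g., at time T-1) becomes a negative x-coordinate in this frame.
print(to_frame((9.5, 2.0), origin, x_axis, y_axis))  # approximately [-2.0, 0.5]
```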
- attributes of the pedestrian can be determined based on sensor data captured over time, and can include, but are not limited to, one or more of a position of the pedestrian at a time (e.g., wherein the position can be represented in the frame of reference discussed above), a velocity of the pedestrian at the time (e.g., a magnitude and/or angle with respect to the first axis (or other reference line)), an acceleration of the pedestrian at the time, an indication of whether the pedestrian is in a drivable area (e.g., whether the pedestrian is on a sidewalk or a road), an indication of whether the pedestrian is in a crosswalk region, a region control indicator state (e.g., whether the intersection is controlled by a traffic signal and/or whether the crosswalk is controlled by a traffic signal (e.g., walk/don't walk) and/or a state of the traffic signal), a vehicle context (e.g., a presence of a vehicle in the environment and attribute(s) associated with the vehicle), and the like.
- Attributes can be determined over time (e.g., at times T-M, ..., T-2, T-1, T0 (where M is an integer), which may represent any time(s) prior to, and/or including, a current time, such as, but not limited to, 0.01 seconds, 0.1 seconds, 1 second, 2 seconds, etc.) and input to a prediction component to determine predicted locations of the pedestrian.
- the prediction component is a machine learned model such as a neural network, a fully connected neural network, a convolutional neural network, a recurrent neural network, and the like.
- the prediction component can output information associated with the pedestrian in the future.
- the prediction component can output predicted information associated with times in the future (e.g., times T1, T2, T3, ..., TN (where N is an integer), which represent any time(s) after a current time).
- the predicted information can comprise predicted location(s) of the pedestrian at future times.
- a predicted location can be represented in the frame of reference as a distance between the origin (e.g., the location of the pedestrian at T0) and the pedestrian at T1 (e.g., a distance s) and/or as a lateral offset (e_y) relative to the first axis (e.g., relative to the reference line).
- the distance s and/or the lateral offset e_y can be represented as rational numbers (e.g., 0.1 meter, 1 meter, 1.5 meters, etc.).
- the distance s and/or the lateral offset can be binned (e.g., input to a binning algorithm) to discretize the original data values into one or more discrete intervals.
- bins for the distance s can be 0-1 meters, 1-2 meters, 3-4 meters, and the like, although any regular or irregular interval can be used for such bins.
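- A minimal sketch of such binning, assuming 1 meter bin edges (the actual intervals are a design choice and may be regular or irregular), might look like:
```python
import numpy as np

# Assumed bin edges for the along-axis distance s, in meters.
S_BIN_EDGES = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

def bin_distance(s: float) -> int:
    """Return the index of the discrete interval containing s (clamped to the edges)."""
    index = int(np.digitize(s, S_BIN_EDGES)) - 1
    return int(np.clip(index, 0, len(S_BIN_EDGES) - 2))

print(bin_distance(1.5))  # 1, i.e. the 1-2 meter bin
```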
- a vehicle such as an autonomous vehicle, can be controlled to traverse an environment based at least in part on the predicted locations of the pedestrian(s).
- the techniques discussed herein can be implemented to predict locations of objects (e.g., a vehicle) as the vehicle traverses an environment.
- sensor data can be captured in an environment, and an object can be identified and classified as a vehicle.
- a reference line can be identified and associated with the vehicle based on map data (e.g., identifying a drivable area such as a lane) and/or based on sensor data (e.g., identifying a drivable area or lane from sensor data).
- an environment may include any number of objects.
- a target object or target vehicle may be traversing an environment where there are other vehicles that are proximate the target vehicle.
- the techniques may include identifying the nearest K objects to the target object (where K is an integer).
- the techniques may include identifying the nearest 5 vehicles or other objects to the target vehicle, although any number of vehicles or other objects can be identified or otherwise determined.
- the techniques may include identifying objects that are within a threshold distance to the target object.
- the vehicle capturing sensor data may be identified as one of the objects that is proximate the target vehicle.
- additional characteristics may be used to determine which objects to consider. As non-limiting examples, objects travelling in an opposing direction, on an opposite side of a divided road, objects having a particular classification (e.g., other than vehicle), etc. may be disregarded when considering the K nearest objects.
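- As a hedged illustration of selecting the nearest K objects while disregarding objects with certain characteristics, the following sketch assumes a simple tracked-object record with a position, classification, and heading; all names and thresholds are hypothetical.
```python
import math
from dataclasses import dataclass

@dataclass
class TrackedObject:
    object_id: int
    x: float
    y: float
    classification: str   # e.g., "vehicle" or "pedestrian"
    heading: float        # radians, world frame

def nearest_objects(target, candidates, k=5, max_range=50.0, same_direction_only=True):
    """Select up to K objects nearest the target, optionally discarding objects
    outside a threshold range, with a different classification, or moving in an
    opposing direction (heading differing by more than 90 degrees)."""
    kept = []
    for obj in candidates:
        if obj.object_id == target.object_id:
            continue
        if obj.classification != target.classification:
            continue
        dist = math.hypot(obj.x - target.x, obj.y - target.y)
        if dist > max_range:
            continue
        if same_direction_only:
            dh = (obj.heading - target.heading + math.pi) % (2.0 * math.pi) - math.pi
            if abs(dh) > math.pi / 2.0:
                continue
        kept.append((dist, obj))
    kept.sort(key=lambda pair: pair[0])
    return [obj for _, obj in kept[:k]]
```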
- attributes can be determined for the target object and/or other object(s) that are proximate the target object.
- attributes can include, but are not limited to, one or more of a velocity of the object at a time, an acceleration of the object at the time, a position of the object at the time (e.g., in global or local coordinates), a bounding box associated with the object at the time (e.g., representing extent(s) of the object, roll, pitch, and/or yaw), a lighting state associated with the object at the first time (e.g., headlight(s), braking light(s), hazard light(s), turn indicator light(s), reversing light(s), etc.), a wheel orientation of a vehicle, a distance between the object and a map element at the time (e.g., a distance to a stop line, traffic line, speed bump, yield line, intersection, driveway, etc.), a classification of the object (e.g., vehicle, car, truck, bicycle, motorcycle, pedestrian, animal, etc.), and the like.
- attribute information associated with the target object and/or other objects that are proximate to the target object can be captured over time and can be input to a prediction component to determine predicted information associated with the target object.
- the predicted information can represent a predicted location of the target at various time intervals (e.g., a predicted location at times T1, T2, T3, ..., TN).
- the predicted location(s) can be compared to candidate reference lines in the environment to determine a reference line associated with the target object.
- an environment may include two lanes which may be eligible (e.g., legal) drivable areas for the target vehicle to traverse. Further, such drivable areas may be associated with a representative reference line (e.g., a center of a lane or drivable area).
- the predicted location(s) can be compared to the reference line(s) to determine a similarity score between the predicted location(s) and the candidate reference line(s). In some examples, a similarity score can be based at least in part on a distance between a predicted location and a reference line, and the like.
- attributes associated with an object can be input to a reference line prediction component which can output a likely reference line associated with the object.
- the techniques can include receiving, selecting, or otherwise determining a reference line and representing the predicted location(s) with respect to the reference line in the environment. That is, the predicted location(s) can be represented as a distance s along the reference line representing a distance between a location of the target object at time T0 and a predicted location of the target object at a future time (e.g., time T1).
- a lateral offset e_y can represent a distance between the reference line and a point intersecting with a line that is perpendicular to a tangent line associated with the reference line.
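- One possible way to express a predicted location as a distance s along a reference line and a lateral offset e_y from it, assuming the reference line is available as a 2D polyline, is sketched below; the function name and example coordinates are illustrative.
```python
import numpy as np

def project_to_reference_line(point_xy, line_xy):
    """Express a location as (s, e_y): arc length s along a polyline reference
    line and signed lateral offset e_y from it (positive to the left of travel)."""
    p = np.asarray(point_xy, dtype=float)
    line = np.asarray(line_xy, dtype=float)
    best = (np.inf, 0.0, 0.0)  # (squared distance, s, e_y)
    s_start = 0.0
    for a, b in zip(line[:-1], line[1:]):
        seg = b - a
        seg_len = float(np.linalg.norm(seg))
        t = float(np.clip(np.dot(p - a, seg) / (seg_len ** 2), 0.0, 1.0))
        closest = a + t * seg
        d2 = float(np.sum((p - closest) ** 2))
        if d2 < best[0]:
            tangent = seg / seg_len
            offset = p - closest
            e_y = tangent[0] * offset[1] - tangent[1] * offset[0]  # signed 2D cross product
            best = (d2, s_start + t * seg_len, float(e_y))
        s_start += seg_len
    return best[1], best[2]

# Example: a straight reference line along x; a point 1.0 m to its left at x = 3.5.
s, e_y = project_to_reference_line((3.5, 1.0), [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)])
print(round(s, 2), round(e_y, 2))  # 3.5 1.0
```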
- the prediction techniques can be repeated iteratively or in parallel to determine predicted location(s) associated with objects in the environment. That is, a first target object may be associated with a first subset of objects in an environment, and a second target object may be associated with a second subset of objects in the environment. In some instances, the first target object may be included in the second subset of objects, while the second target object may be included in the first subset of objects.
- predicted locations can be determined for a plurality of objects in an environment. In some cases, the predicted locations can be determined substantially simultaneously, within technical tolerances.
- a vehicle such as an autonomous vehicle, can be controlled to traverse an environment based at least in part on the predicted locations of the object(s). For example, such predicted location(s) can be input to a planning component of the vehicle to traverse an environment with an understanding of the predicted location(s) of the objects in the environment.
- determining attributes and inputting the attributes into a prediction component can obviate hard-coded rules that may otherwise inflexibly represent an environment.
- determining predicted location(s) associated with objects in an environment (e.g., pedestrians or vehicles) can improve safety of operation of an autonomous vehicle.
- predicted location(s) suggesting a likelihood of a collision or a near-collision may allow an autonomous vehicle to alter a trajectory (e.g., change lanes, stop, etc.) in order to safely traverse the environment.
- the techniques described herein can be implemented in a number of ways. Example implementations are provided below with reference to the following figures. Although discussed in the context of an autonomous vehicle, the methods, apparatuses, and systems described herein can be applied to a variety of systems (e.g., a sensor system or a robotic platform), and are not limited to autonomous vehicles. In one example, similar techniques may be utilized in driver controlled vehicles in which such a system may provide an indication of whether it is safe to perform various maneuvers. In another example, the techniques can be utilized in a manufacturing assembly line context, or in an aerial surveying context. Additionally, the techniques described herein can be used with real data (e.g., captured using sensor(s)), simulated data (e.g., generated by a simulator), or any combination of the two.
- FIG. 1 is a pictorial flow diagram of an example process 100 for capturing sensor data, determining attributes associated with an object, determining a predicted location based on the attributes, and controlling a vehicle based on the predicted location.
- the process can include capturing sensor data of an environment.
- the sensor data can be captured by one or more sensors on a vehicle (autonomous or otherwise).
- the sensor data can include data captured by a lidar sensor, an image sensor, a radar sensor, a time of flight sensor, a sonar sensor, and the like.
- the operation 102 can include determining a classification of an object (e.g., to determine that an object is a pedestrian in an environment).
- the process can include determining a destination associated with an object (e.g., a pedestrian).
- An example 106 illustrates a vehicle 108 and an object 110 (e.g., a pedestrian) in the environment.
- the vehicle 108 can perform the operations discussed in the process 100.
- the operation 104 can include determining attributes of the object 110 to determine a location, velocity, heading, etc. of the object 110. Further, the operation 104 can include accessing map data to determine whether a crosswalk region (e.g., crosswalk region 112) is present in the environment. In some examples, the crosswalk region 112 can represent a perimeter of a crosswalk in an environment. In some examples, the operation 104 can include determining that the object is within a threshold distance (e.g., 5 meters) of a portion of the crosswalk region 112. In some examples, the threshold distance may be considered to be a minimum distance from the object to any portion of the crosswalk region.
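- As an illustrative check of the threshold-distance condition above, assuming the crosswalk region is available as a polygon and using the shapely library, the minimum distance from the object to any portion of the region could be computed as follows; the coordinates are assumed values.
```python
from shapely.geometry import Point, Polygon

# Crosswalk region as a polygon and a pedestrian position (assumed coordinates).
crosswalk_region = Polygon([(0.0, 0.0), (4.0, 0.0), (4.0, 12.0), (0.0, 12.0)])
pedestrian = Point(7.0, 6.0)

threshold_m = 5.0
distance_m = pedestrian.distance(crosswalk_region)  # minimum distance to any portion of the region
is_candidate_region = distance_m <= threshold_m
print(distance_m, is_candidate_region)  # 3.0 True
```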
- the operation 104 can include determining a probability or score associated with the pedestrian (e.g., the object 110) crossing a respective crosswalk region and selecting a most likely crosswalk region.
- a destination 114 can be associated with the crosswalk region 112.
- the destination 114 can represent a center or a midpoint of a side of the crosswalk region 112 that is opposite a location of the object 110, although the destination 114 can represent any point in the environment associated with the crosswalk region 112. Additional details of determining a destination are discussed in connection with FIGS. 3A and 3B, as well as throughout this disclosure.
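- A simple, assumed way to pick such a destination point, treating the crosswalk region as an ordered quadrilateral and choosing the midpoint of the edge farthest from (i.e., opposite) the pedestrian, is sketched below; the corner coordinates are illustrative.
```python
import numpy as np

def far_side_midpoint(crosswalk_corners, pedestrian_xy):
    """Return the midpoint of the crosswalk edge farthest from the pedestrian
    as an illustrative destination on the opposite side of the region."""
    corners = np.asarray(crosswalk_corners, dtype=float)  # ordered quad: 4 x 2
    p = np.asarray(pedestrian_xy, dtype=float)
    edges = [(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    midpoints = [(a + b) / 2.0 for a, b in edges]
    return max(midpoints, key=lambda m: np.linalg.norm(m - p))

# Example: pedestrian near the y = 0 side; destination is the midpoint of the y = 12 side.
print(far_side_midpoint([(0, 0), (4, 0), (4, 12), (0, 12)], (2.0, -1.0)))  # [ 2. 12.]
```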
- the process can include determining attribute(s) associated with the object.
- attributes can be determined for the object 110 at various instances in time up to and including a most recent time associated with the attributes (e.g., at times T-M, ..., T-2, T-1, T0).
- the object 110 can be referred to as an object 120 (e.g., at time T-2), as object 122 (e.g., at time T-1), and as object 124 (e.g., at time T0).
- time T0 may represent a time at which data is input to a prediction component (discussed below), time T-1 may represent 1 second before time T0, and time T-2 may represent 2 seconds before time T0.
- times T0, T-1, and T-2 can represent any time instances and/or periods of time.
- time T-1 may represent 0.1 seconds before time T0.
- time T-2 may represent 0.2 seconds before time T0.
- attributes determined in the operation 116 can include, but are not limited to, information about the objects 120, 122, and/or 124.
- a velocity attribute associated with the object 120 may represent a velocity of the object 120 at time T-2.
- a velocity attribute associated with the object 122 may represent a velocity of the object at time T-1.
- a velocity attribute associated with the object 124 may represent a velocity of the object at time T0.
- some or all of the attributes may be represented in a frame of reference relative to the object 124 (e.g., the object 110 at time T0) and the destination 114.
- the process can include determining predicted location(s) associated with the object based on the attribute(s).
- An example 128 illustrates a predicted location 130 (e.g., a predicted location of the object 110 at time T1, which is a time after T0).
- the predicted location 130 at time T1 can represent a location of the object 110 in the future.
- the operation 126 can include determining predicted locations for a plurality of times associated with the object 124 in the future. For example, the operation 126 can include determining predicted locations of the object at times T1, T2, T3, and so on.
- the predicted location(s) can be represented as a distance s along a reference line and a lateral offset e_y from the reference line.
- the distance, s, and offset, e_y, may be relative to a relative coordinate system defined at each time step and/or relative to the last determined reference frame. Additional details of determining the predicted location(s) are discussed in connection with FIGS. 4 and 5, as well as throughout this disclosure.
- the operations 102, 104, 116, and/or 126 can be performed iteratively or repeatedly (e.g., at each time step, at a frequency of 10 Hz, etc.), although the process 100 can be performed at any interval or at any time.
- the process can include controlling a vehicle based at least in part on the predicted location(s).
- the operation 132 can include generating a trajectory for the vehicle 108 to follow (e.g., to stop before the intersection and/or before the crosswalk region 112 to allow the pedestrian 110 to traverse through the crosswalk region 112 to the destination 114).
- FIG. 2 illustrates examples 200 of attributes of an object.
- attributes 202 can represent a variety of information about or associated with an object in an environment (e.g., the object 110 of FIG. 1).
- the attributes 202 can be determined for one or more time instances associated with the object.
- the object 120 represents the object 110 at time T-2
- the object 122 represents the object 110 at time T-1
- the object 124 represents the object 110 at time T0. Attributes can be determined for the objects at each of the time instances T-2, T-1, and T0, for example.
- Examples of the attributes 202 include, but are not limited to, a distance between the object and a road, an x- (or first-) distance to a region, a y- (or second-) distance to a region, a distance to a destination, a velocity (magnitude), a velocity (angle), an x-position, a y-position, a region flux, a region control indicator state, a vehicle context (or an object context, generally), an object association, and the like.
- the attributes discussed herein may be relative to a relative coordinate system defined at each time step (e.g., associated with the objects 120, 122, 124, respectively), relative to the last determined reference frame, relative to a frame of reference defined with respect to the vehicle 108 (e.g., at various time step(s)), with respect to a global coordinate reference frame, and the like.
- An example 204 illustrates various attributes associated with the object 124.
- the example 204 illustrates attributes with respect to the crosswalk region 112 and the destination 114.
- an x-distance to a region can correspond to a distance 206. That is, the distance 206 can represent a distance in a first direction (which may be in a global or local reference frame) between the object 124 and an edge of the crosswalk region 112 nearest to the object 124.
- a y-distance to a region can correspond to a distance 208. That is, the distance 208 can represent a distance in a second direction between the object 124 and an edge of the crosswalk region 112.
- a minimum distance between the object 124 and the crosswalk region may be determined and subsequently decomposed into respective x- and y-components as the x- and y-distances, respectively.
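- A hedged sketch of computing the x- and y-distance attributes by decomposing the minimum distance between the object and the crosswalk region, assuming the shapely library and illustrative coordinates in a local frame:
```python
from shapely.geometry import Point, Polygon
from shapely.ops import nearest_points

# Crosswalk region and pedestrian position (assumed coordinates in a local frame).
crosswalk_region = Polygon([(0.0, 0.0), (4.0, 0.0), (4.0, 12.0), (0.0, 12.0)])
pedestrian = Point(7.0, -2.0)

# Closest pair of points between the region and the pedestrian.
on_region, on_pedestrian = nearest_points(crosswalk_region, pedestrian)
x_distance = on_pedestrian.x - on_region.x   # x-component of the minimum-distance vector
y_distance = on_pedestrian.y - on_region.y   # y-component of the minimum-distance vector
print(x_distance, y_distance)  # 3.0 -2.0
```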
- the object 124 is located on a sidewalk region 210 (or generally, a non-drivable region 210).
- the crosswalk region 112 may provide a path across a road 212 (or generally, a drivable region 212).
- a distance to a road can correspond to a distance 214, which can correspond to a shortest or smallest distance between the object 124 and a portion of the road 212 within the crosswalk region 112.
- a distance to a destination can correspond to a distance 216.
- the distance 216 represents a distance between the object 124 and the destination 114.
- the attribute(s) 202 can be represented in a frame of reference.
- the frame of reference may be defined with respect to a location of an object at each time step, with respect to a last reference frame, a global coordinate system, and the like.
- an origin corresponding to the frame of reference can correspond to a location of the object 124.
- An example 218 illustrates a frame of reference 220 (also referred to as a reference frame 220).
- a first axis of the frame of reference 220 is defined by a unit vector extending from a location of the object 124 in a direction of the destination 114. The first axis is labeled as an x-axis in the example 218.
- a second axis can be perpendicular to the first axis and can lie in a plane comprising the crosswalk.
- the second axis is labeled as a y-axis in the example 218.
- the first axis can represent a reference line against which distances s can be determined, whereas lateral offsets e_y can be determined relative to the second direction (e.g., the y-axis).
- An example 222 illustrates a velocity vector 224 associated with the object 124 and an angle 226 which represents an angle between the velocity vector 224 and a reference line.
- the reference line can correspond to the first axis of the frame of reference 220, although any reference line can be selected or otherwise determined.
- attributes associated with the objects 124, 122, and 120 can be represented with respect to the frame of reference 220. That is, at time T0, the x-position and the y-position of the object 124 can be represented as (0, 0) (e.g., the object 124 represents an origin of the frame of reference 220). Further, the x-position and the y-position of the object 122 (at time T0) can be represented as (-x1, -y1), and the x-position and the y-position of the object 120 (at time T0) can be represented as (-x2, -y2), with respect to the frame of reference 220. In at least some examples, a single coordinate frame may be used, whereas in other examples, a relative coordinate frame may be associated with every point and attributes may be defined relative to each relative coordinate frame.
- the attributes 202 can include a region flux.
- the region flux can represent a number of objects that have passed through the crosswalk region 112 within a period of time.
- the region flux can correspond to J number of cars (and/or other objects, such as other pedestrians) that have passed through the crosswalk region 112 (or any region) within K number of seconds (e.g., 5 vehicles within the time between T and T0).
- the region flux can represent any time period(s).
- the region flux can include information about a speed, acceleration, velocity, etc. about such vehicles that have traversed through the crosswalk region 112 within the period of time.
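- As a minimal sketch of the region flux attribute, assuming a log of (timestamp, object id) crossing events is available (a hypothetical data source), the flux over a period of time could be counted as follows:
```python
def region_flux(crossing_events, t_start, t_end):
    """Count how many distinct objects passed through the crosswalk region within
    a period of time, given (timestamp, object_id) crossing events."""
    ids = {obj_id for t, obj_id in crossing_events if t_start <= t <= t_end}
    return len(ids)

# Example: 5 distinct vehicles crossed between t = 10.0 s and t = 20.0 s.
events = [(11.0, "car_1"), (12.5, "car_2"), (13.0, "car_2"), (14.0, "car_3"),
          (16.2, "car_4"), (19.9, "car_5"), (25.0, "car_6")]
print(region_flux(events, 10.0, 20.0))  # 5
```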
- the attributes 202 can include a region control indicator.
- the region control indicator can correspond to a state of a traffic signal or indicator controlling pedestrian traffic within the crosswalk region 112.
- the region control indicator can indicate whether a traffic light is present, a state of a traffic light (e.g., green, yellow, red, etc.), and/or a state of a crosswalk indicator (e.g., walk, don’t walk, unknown, etc.).
- the attributes 202 can include a vehicle context, which may indicate whether vehicles or other objects are proximate to the object (e.g., 124) and attributes associated with any such vehicle or object.
- a vehicle context may include, but is not limited to, a velocity, direction, acceleration, bounding box, position (e.g., in the frame of reference 220), distance between the object and the object 124, and the like.
- the attributes 202 can include an object association.
- the object association can indicate whether the object 124 is associated with other objects (e.g., whether the object 124 is in a group of pedestrians).
- the object association attribute 202 can include attributes associated with the associated objects.
- the attributes 202 may further include, but are not limited to, information associated with an acceleration, yaw, pitch, roll, relative velocity, relative acceleration, whether the object is in the road 212, whether the object is on the sidewalk 210, whether the object is within the crosswalk region 112, whether a destination has changed (e.g., whether the object has turned around in the intersection), an object height, whether the object is on a bicycle, and the like.
- the attributes 202 may further include, but are not limited to, pedestrian hand gestures, pedestrian gaze detection, an indication of whether the pedestrian is standing, walking, running, etc., whether other pedestrians are in the crosswalk, a pedestrian crosswalk flux (e.g., a number of pedestrians travelling through the crosswalk (e.g., across the drivable area) over a period of time), a ratio of a first number of pedestrians on a sidewalk (or a non-drivable area) and a second number of pedestrians in the crosswalk region (or a drivable area), variances, confidences, and/or probabilities associated with each attribute, and the like.
- FIGS. 3A and 3B illustrate examples of determining a destination associated with an object in an environment.
- FIG. 3A illustrates selecting between two crosswalk regions.
- FIG. 3B illustrates selecting between two destinations associated with a single crosswalk region.
- FIG. 3A illustrates an example 300 of determining a destination associated with an object in an environment. As mentioned above, and in general, FIG. 3A illustrates selecting between two crosswalk regions.
- An example 302 illustrates an object 304, which may correspond to a pedestrian at time T-1, and an object 306, which may correspond to the pedestrian at time T0.
- a vehicle such as the vehicle 108 can capture sensor data of the environment and can determine that a pedestrian is in the environment.
- a computing system can determine that the objects 304 and/or 306 are proximate to one or more crosswalk regions in the environment. For example, a computing device can access map data which may include map element(s) indicating location(s) and extent(s) (e.g., length and width) of such crosswalk regions.
- the example 302 illustrates the environment as including a first crosswalk region 308 (also referred to as a region 308) and a second crosswalk region 310 (also referred to as a region 310).
- the region 308 can be associated with a threshold region 312 (also referred to as a threshold 312) and the region 310 can be associated with a threshold region 314 (also referred to as a threshold 314).
- the objects 304 and 306 are within the thresholds 312 and 314.
- a computing device can determine that the objects 304 and/or 306 are associated with the regions 308 and 310, respectively.
- the threshold 312 can represent any region or area associated with the region 308.
- the threshold 312 can represent a threshold of 5 meters surrounding the region 308, although any distance or shape of the threshold 312 can be associated with the region 308.
- the threshold 314 can include any distance or shape associated with the region 310.
- the region 308 can be associated with a destination 316. Further, and in some instances, the region 310 can be associated with a destination 318. In some examples, a location of the destinations 316 and/or 318 are situated across a street from the object 304 and/or 306. That is, a destination associated with a crosswalk region can be selected based at least in part on a location of a pedestrian with respect to the crosswalk region.
- the object 304 and/or 306 can be associated with attribute(s) as discussed herein. That is, the techniques can include determining a position, velocity, heading, acceleration, etc., of the objects 304 and 306, respectively.
- information represented in the example 302 can be input to a destination prediction component 320.
- the destination prediction component 320 can output a score or probability that the object 306 may traverse through the region 308 and/or the region 310.
- although object information associated with two time steps (e.g., T-1 and T0) is illustrated, object information over any time period can be used in determining a destination.
- attributes associated with the objects 304 and 306 can be input to the destination prediction component 320 in one or more frames of reference. For example, for evaluating the destination 316, attributes associated with the objects 304 and 306 can be input to the destination prediction component 320 using a frame of reference based at least in part on the destination 316. Further, for evaluating the destination 318, attributes associated with the objects 304 and 306 can be input to the destination prediction component 320 using a frame of reference based at least in part on the destination 318.
- a destination associated with a pedestrian can be determined based on a number of factors. For example, a destination can be determined based at least in part on one or more of: a straight line extrapolation of a velocity of a pedestrian, a nearest location of a sidewalk region associated with a pedestrian, a gap between parked vehicles, an open door associated with a vehicle, and the like.
- sensor data can be captured of an environment to identify possible destinations in the environment. Further, attributes associated with an object can be represented in a frame of reference based at least in part on the determined destination, and the attributes can be input to the destination prediction component 320 for evaluation, as discussed herein.
- An example 322 illustrates an output of the destination prediction component 320. For example, based at least in part on the attributes of the objects 304 and/or 306, the destination prediction component 320 may predict that the object 304 and/or 306 is heading towards the destination 318.
- FIG. 3B illustrates another example 324 of determining a destination associated with an object in an environment. As noted above, FIG. 3B illustrates selecting between two destinations associated with a single crosswalk region.
- the example 324 illustrates an object 326, which may correspond to a pedestrian at time T-1, and an object 328, which may correspond to the pedestrian at time T0.
- a computing device may identify two destinations 334 and 336 associated with a region 338.
- attributes associated with the objects 326 and 328 can be input to the destination prediction component 320 (along with information about the destinations 334 and 336, and the region 338, as well as other information) to determine which of the destinations 334 and 336 is more likely.
- such a destination prediction component 320 may generically determine a pedestrian is intending to jaywalk, or otherwise cross in a non-crosswalk area, and output a corresponding destination.
- attributes relative to a region may not be determined (as no region may exist).
- a fixed region perpendicular to a road segment and having a fixed width may be used as a region for determining such parameters.
- the region 338 may be associated with the objects 326 and/or 328 at a time in which the objects 326 and/or 328 are within a threshold distance of the region 338.
- FIG. 4 illustrates an example 400 of determining predicted location(s) for an object based on attributes of the object over time.
- An example 402 illustrates the object 120 (e.g., a pedestrian at time T-2), the object 122 (e.g., the pedestrian at time T-1), and the object 124 (e.g., the pedestrian at time T0).
- the objects 120, 122, and 124 can be represented in a frame of reference with the object 124 as the origin (and/or one or more frames of reference associated with any one or more times).
- the example 402 illustrates the objects 120, 122, and 124 associated with the crosswalk region 112 and the destination 114.
- Data associated with the example 402 can be input to a location prediction component 404 that can output predicted location(s) associated with the objects 120, 122, and/or 124.
- An example 406 illustrates predicted location(s) based on the objects 120, 122, and/or 124.
- the location prediction component 404 can output a predicted location 408, which may represent a location of the object at time T1.
- the predicted location 408 may be represented as a distance 410 (e.g., s) and a lateral offset 412 (e.g., e_y) based at least in part on a frame of reference defined by the object 124 (e.g., an origin) and the destination 114.
- the location prediction component 404 can output five predicted locations corresponding to times T1, T2, T3, T4, and T5, respectively, although it can be understood that the location prediction component 404 can output any number of predicted locations that are associated with any future time(s). In some examples, such additional predicted locations may be defined by a global coordinate frame, local coordinate frame, relative to a relative reference frame associated with a previous predicted point, and the like.
- an additional value may be associated with the output bin indicating an offset from a central portion of the bin.
- an output may indicate that the next predicted location falls into a first bin (e.g., between 0 and 1 m) and an associated offset of 0.2 m may be used to indicate that a likely value of the predicted position may be 0.7 m (e.g., 0.5 m + 0.2 m).
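- The bin-plus-offset decoding described above can be sketched as follows, assuming 1 meter bins measured from their centers; the names are illustrative.
```python
# Assumed 1 meter bins; an offset is added to the bin center to recover a distance.
BIN_EDGES = [0.0, 1.0, 2.0, 3.0, 4.0]

def decode_binned_distance(bin_index: int, offset_from_center: float) -> float:
    lo, hi = BIN_EDGES[bin_index], BIN_EDGES[bin_index + 1]
    return 0.5 * (lo + hi) + offset_from_center

print(decode_binned_distance(0, 0.2))  # 0.7 (0.5 m bin center + 0.2 m offset)
```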
- predicted location(s) illustrated in the example 406 can be referred to as predicted location(s) 414.
- the location prediction component 404 can output a variance, covariance, probability, or a certainty associated with the respective predicted location(s) 414 indicative of a certainty that the object 124 will be located at a respective predicted location at a respective time.
- FIG. 5 illustrates an example 500 of updating a frame of reference for use in determining predicted location(s).
- the example 406 is reproduced in FIG. 5 to represent a time TA, which may correspond to the time T0 represented in the example 406.
- the objects 120, 122, and 124 are represented in the frame of reference 220, which is defined in part by a location of the object 124 and a location of the destination 114.
- the example 406 can be updated for a next time step and updated predicted locations can be determined (e.g., in the operation 502).
- Such an updated example is illustrated as an example 504, which illustrates an environment corresponding to the example 406 but at a time TB that occurs after time TA.
- An object 506 in the example 504 represents the object at a time T0 with respect to a frame of reference 508.
- the example 504 includes an object 510, which represents the object at time T-1.
- An object 512 further represents the object at time T-2.
- the object 510 (e.g., the object at time T-1 in the frame of reference 508) can correspond to the object 124 (e.g., the object at time T0 in the frame of reference 220).
- the object 512 (e.g., the object at time T-2 in the frame of reference 508) can correspond to the object 122 (e.g., the object at time T-1 in the frame of reference 220).
- the example 504 illustrates the object 120, whereby the object 120 (and/or attributes associated with the object 120) may or may not be used when determining updated predicted locations in the example 504.
- the frame of reference 508 can be defined by or based at least in part on a location of the object 506 and the destination 114.
- a relative reference frame can be defined with respect to the destination 114 and a most recently determined location of the object 124 (e.g., such a coordinate reference frame may change according to changes of the object in the environment).
- information associated with the example 504 (which may or may not include information associated with the object 120) can be input to the location prediction component 404 to determine updated predicted location(s) 514.
- the updated predicted location(s) 514 may be based at least in part on the frame of reference 508.
- updated predicted location(s) can be determined at a frequency of 10 Hz, although predicted locations can be determined at any frequency or between any regular or irregular intervals of time.
- FIG. 6 is a pictorial flow diagram of an example process 600 for capturing sensor data, determining that a first object and second object are in an environment, determining attributes associated with the second object, determining a predicted location based on the attributes and a reference line, and controlling a vehicle based on the predicted location.
- attributes may not be determined for one or more second objects, and predicted location(s) of a first object can be determined based on the attributes associated with the first object.
- the process can include capturing sensor data of an environment.
- the sensor data can be captured by one or more sensors on a vehicle (autonomous or otherwise).
- the sensor data can include data captured by a lidar sensor, an image sensor, a radar sensor, a time of flight sensor, a sonar sensor, and the like.
- the operation 602 can include determining a classification of an object (e.g., to determine that an object is a vehicle in an environment).
- An example 604 illustrates a vehicle 606, which may capture the sensor data in the operation 602.
- the environment may further include objects 608, 610, 612, 614, 616, and 618.
- the object 618 can be referred to as a target object 618, as the target object 618 may be the subject of (e.g., the target of) such prediction operations, as discussed herein.
- the vehicle 606 may traverse through the environment via a trajectory 620.
- the object 608 can be travelling in a same direction as the vehicle 606 (e.g., in the same lane as the vehicle 606), while in some examples, the objects 610-616 and the target object 618 can be travelling in an opposite direction (e.g., the target object 618 can represent oncoming traffic with respect to the vehicle 606).
- the process 600 can be used in any environment and is not limited to the particular objects and/or geometry illustrated in FIG. 6.
- the process can include determining attribute(s) associated with the target object and object(s) proximate the target object.
- An example 624 illustrates the vehicle 606, the objects 608-616, and the target object 618.
- the operation 622 may include determining attribute(s) associated with the target object without determining attributes of other objects. For example, such other objects may not be present in an environment or such attributes of other objects may not be needed, desired, or required for determining predicted location(s) of the target object 618, according to implementations of the techniques discussed herein.
- the outline of the object 612 is illustrated with a dotted line, while elements 626, 628, and 630 corresponding to the object 612 are represented as points.
- the element 626 represents a location associated with the object 612 at a time T-2.
- the element 628 represents a location associated with the object 612 at a time T-1.
- the element 630 represents a location associated with the object 612 at time T0.
- the vehicle 606, the objects 608-616, and the target object 618 are associated with elements, although such elements are not labeled in FIG. 6. It can be understood in the context of this disclosure that such elements represent locations associated with the vehicle and/or objects at respective times (e.g., times T-2, T-1, and T0) and/or can represent attributes associated with the objects at the respective times.
- attributes determined in the operation 622 can represent information about each respective object.
- attributes can include, but are not limited to, a location of an object (e.g., a global location and/or a relative location with respect to any frame of reference), a velocity, an acceleration, a bounding box, a lighting state, lane attribute(s), an offset from a reference line or predicted path, and the like. Additional details of such attributes are discussed in connection with FIG. 7, as well as throughout this disclosure.
- the operation 622 can include determining or identifying objects based at least in part on a proximity of the object to the target object. For example, the operation 622 can include determining the nearest N number of objects proximate the target object 618, where N is an integer. Additionally or in the alternative, the operation 622 may include identifying or selecting objects based on the object being within a threshold distance of the target object 618.
- such selection may exclude certain objects based on one or more characteristics, for example, but not limited to, object classification (e.g., only consider vehicles), direction of motion (e.g., only consider objects moving in the same direction), location relative to a map (e.g., only consider vehicles in one or more lane(s) of a road), and the like.
- the process can include determining predicted location(s) associated with the target object based at least in part on the attribute(s), the predicted location(s) with respect to a reference line (which, in some examples, may comprise a center line of a lane associated with the object) in the environment.
- An example 634 illustrates predicted location(s) 636 associated with the target object 618 in the environment.
- the predicted location(s) 636 can be defined by and/or based at least in part on a reference line 638. That is, the predicted location(s) 636 can be expressed by a distance s along the reference line 638 and by a lateral offset e_y from the reference line 638.
- the reference line 638 can be based at least in part on map data of the environment. Further, in some examples, the reference line 638 can correspond to a centerline of a lane of a road or other drivable area.
- the operation 632 can include receiving a reference line associated with the target object 618, such as from a reference line prediction component.
- the reference line prediction component can comprise a machine learned model trained to output a most likely reference line based at least in part on map data, attributes of object(s) in the environment, and the like.
- the reference line prediction component can be integrated into the other machine learned models discussed herein, and in some instances, the reference line prediction component can be a separate component.
- the operation 632 can include selecting the reference line 638 from a plurality of candidate reference lines.
- predicted location(s) 636 can be selected based at least in part on a similarity score representing a similarity of the predicted location(s) 636 with respect to the reference line 638.
- predicted location(s) 636 may be relative to a predicted path and/or trajectory, previously predicted waypoints, and the like. Additional examples of the predicted location(s), the reference line(s), and similarity score(s) are discussed in connection with FIG. 8, as well as throughout this disclosure.
- the process can include controlling a vehicle based at least in part on the predicted location(s).
- the operation 640 can include generating a trajectory or an updated trajectory 642 for the vehicle 606 to follow (e.g., to bias the vehicle 606 away from the predicted location(s) 636 associated with the target object 618, in the event the target object 618 may traverse closely to an expected path of the vehicle 606).
- FIG. 7 illustrates examples 700 of attributes of an object.
- attributes 702 can represent a variety of information about or associated with an object in an environment (e.g., the object 612 and the target object 618 of FIG. 6, as represented in the example 604 reproduced in FIG. 7).
- the attributes 702 can be determined for one or more time instances of the object.
- An example 704 illustrates the object 612 at time instances T-2, T-1, and T0.
- the element 626 represents the object 612 at time T-2
- the element 628 represents the object 612 at time T-1
- the element 630 represents the object 612 at time T0.
- attributes can be determined for any type and/or number of objects in the example 704, and are not limited to the object 612.
- attributes can be determined for an element 706 (e.g., representing the target object 618 at time T-2), an element 708 (e.g., representing the target object 618 at time T-1), and an element 710 (e.g., representing the target object 618 at time T0).
- attributes can be determined for any number of time instances, and are not limited to T-2, T-1, and T0.
- Examples of the attributes 702 include, but are not limited to, a velocity of an object, an acceleration of the object, an x-position of the object (e.g., a global position, local position, and/or a position with respect to any other frame of reference), a y-position of the object (e.g., a local position, a global position and/or a position with respect to any other frame of reference), a bounding box associated with the object (e.g., extents (length, width, and/or height), yaw, pitch, roll, etc.), lighting states (e.g., brake light(s), blinker light(s), hazard light(s), headlight(s), reverse light(s), etc.), a wheel orientation of the object, map elements (e.g., a distance between the object and a stop light, stop sign, speed bump, intersection, yield sign, and the like), a classification of the object (e.g., vehicle, car, truck, bicycle, motorcycle, pedestrian, animal, etc.).
- attributes of objects can be determined with respect to a local frame of reference, global coordinates, and the like.
- a frame of reference can be determined with an origin corresponding to a location of the target object 618 at time T0 (e.g., the object 710).
- FIG. 8 illustrates an example 800 of determining predicted location(s) for a first object based on attributes of a second object over time.
- information associated with the example 704 of FIG. 7 can be input to a location prediction component 802, which in turn can output predicted location(s) associated with a target object.
- attribute information associated with the vehicle 606, the objects 608-616, and/or the target object 618 at various times (e.g., T-2, T-1, and T0) can be input to the location prediction component 802.
- An example 804 illustrates predicted location(s) 806 associated with the target object 618. That is, the location prediction component 802 can receive attribute information associated with objects that are proximate the target object 618, as well as attribute information associated with the target object 618, and can output predicted location(s) 806 representing the target object 618 in the future.
- An object 808 illustrates the target object 618 at time T-2.
- An object 810 represents the target object 618 at time T-1.
- an object 812 represents the target object at time T0.
- the location prediction component 802 can determine predicted location(s) 806 based on the attribute information discussed herein.
- the predicted location(s) can initially be represented in a global coordinate system, in a frame of reference with the target object as an origin, and the like. Further, the predicted locations can be represented with respect to a reference line in the environment.
- the environment may represent a plurality of reference lines such as the reference line 814 and the reference line 816.
- the reference line 816 may, for example, correspond to a lane change of the target object.
- the reference line 814 may represent a centerline of a first road segment and the reference line 816 may represent a centerline of a second road segment (and/or a transition therebetween).
- the environment may represent a single reference line.
- an environment may represent a plurality of reference lines.
- the location prediction component 802 can receive an indication of a most likely reference line (e.g., 814) as an input. In some examples, the location prediction component 802 can determine a likely reference line based at least in part on one or more attributes of the target object 618, of other objects, and/or the environment, as described herein.
- the location prediction component 802 can determine a similarity score 818 that represents a similarity between the predicted location(s) 806 and the reference line 814. Further, the location prediction component 802 can determine a similarity score 820 that represents a similarity between the predicted location(s) 806 and the reference line 816. In some examples, a similarity score can be based at least in part on an individual or cumulative lateral offset between the predicted location(s) and a respective reference line, although other metrics can be used to determine a similarity score.
- the location prediction component 802 can determine that the similarity score 818 is lower than the similarity score 820, and accordingly, can select the reference line 814 as the basis for defining, in part, the predicted location(s) 806. In other examples, however, each potential reference line may be input into the location prediction component 802 along with the previously computed attributes such that the location prediction component 802 may select the appropriate reference line and/or trajectory to use as a basis based on machine learned parameters.
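- One possible similarity score, assuming predicted locations and candidate reference lines are available as 2D points and polylines, is the mean distance from the predicted locations to each candidate line, with the lowest-scoring (most similar) line selected; the coordinates and names below are illustrative assumptions.
```python
import numpy as np

def mean_offset_score(points, polyline):
    """Similarity score: mean distance from each predicted location to the nearest
    point on a candidate reference line (lower means more similar)."""
    pts = np.asarray(points, dtype=float)
    line = np.asarray(polyline, dtype=float)
    total = 0.0
    for p in pts:
        best = np.inf
        for a, b in zip(line[:-1], line[1:]):
            seg = b - a
            t = np.clip(np.dot(p - a, seg) / np.dot(seg, seg), 0.0, 1.0)
            best = min(best, float(np.linalg.norm(p - (a + t * seg))))
        total += best
    return total / len(pts)

predicted = [(1.0, 0.1), (2.0, 0.2), (3.0, 0.4)]
lane_keep = [(0.0, 0.0), (5.0, 0.0)]     # candidate akin to reference line 814 (assumed)
lane_change = [(0.0, 0.0), (5.0, 3.0)]   # candidate akin to reference line 816 (assumed)
scores = {"keep": mean_offset_score(predicted, lane_keep),
          "change": mean_offset_score(predicted, lane_change)}
selected = min(scores, key=scores.get)   # the lower (more similar) score wins
print(scores, selected)  # the "keep" line is selected
```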
- the predicted location(s) 806 can include predicted locations 822, 824, 826, 828, and/or 830.
- the predicted location 822 can represent a first distance and a first lateral offset (e.g., (s1, ey1)) with respect to the reference line 814.
- the predicted location 824 can represent a second distance and a second lateral offset (e.g., (s2, ey2)) with respect to the reference line 814.
- the predicted location 826 can represent a third distance and a third lateral offset (e.g., (s3, ey3)) with respect to the reference line 814.
- the predicted location 828 can represent a fourth distance and a fourth lateral offset (e.g., (s4, ey4)) with respect to the reference line 814.
- the predicted location 830 can represent a fifth distance and a fifth lateral offset (e.g., (s5, ey5)) with respect to the reference line 814.
- the location prediction component 802 can determine fewer or more predicted location(s), as discussed herein.
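- As a hedged sketch of the distance-along/lateral-offset representation described above, the following assumes the reference line is a densely sampled polyline and converts an (s, ey) pair back into a world-frame point; the helper name and the interpolation scheme are illustrative assumptions.

```python
import numpy as np

def station_offset_to_world(reference_line, s, e_y):
    """Convert a (distance along, lateral offset) pair, expressed with respect
    to a reference polyline, into a world-frame (x, y) point."""
    line = np.asarray(reference_line, dtype=float)
    segments = np.diff(line, axis=0)
    lengths = np.linalg.norm(segments, axis=1)
    stations = np.concatenate(([0.0], np.cumsum(lengths)))
    # Locate the segment containing station s and interpolate along it.
    i = int(np.searchsorted(stations, s, side="right")) - 1
    i = max(0, min(i, len(segments) - 1))
    t = (s - stations[i]) / max(lengths[i], 1e-9)
    base = line[i] + t * segments[i]
    # Left-hand unit normal of the segment, scaled by the lateral offset.
    tangent = segments[i] / max(lengths[i], 1e-9)
    normal = np.array([-tangent[1], tangent[0]])
    return base + e_y * normal
```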
- FIG. 9 depicts a block diagram of an example system 900 for implementing the techniques described herein.
- the system 900 can include a vehicle 902, which can correspond to the vehicle 108 of FIG. 1 and the vehicle 606 of FIG. 6.
- the example vehicle 902 can be a driverless vehicle, such as an autonomous vehicle configured to operate according to a Level 5 classification issued by the U.S. National Highway Traffic Safety Administration, which describes a vehicle capable of performing all safety-critical functions for the entire trip, with the driver (or occupant) not being expected to control the vehicle at any time.
- Since the vehicle 902 can be configured to control all functions from start to completion of the trip, including all parking functions, it may not include a driver and/or controls for driving the vehicle 902, such as a steering wheel, an acceleration pedal, and/or a brake pedal.
- the vehicle 902 can include vehicle computing device(s) 904, one or more sensor systems 906, one or more emitters 908, one or more communication connections 910, at least one direct connection 912, and one or more drive systems 914.
- the vehicle computing device(s) 904 can include one or more processors 916 and memory 918 communicatively coupled with the one or more processors 916.
- the vehicle 902 is an autonomous vehicle; however, the vehicle 902 could be any other type of vehicle or robotic platform.
- the memory 918 of the vehicle computing device(s) 904 stores a localization component 920, a perception component 922, one or more maps 924, one or more system controllers 926, a prediction component 928 comprising an attribute component 930, a destination prediction component 932, and a location prediction component 934, and a planning component 936. Though depicted in FIG. 9 as residing in the memory 918 for illustrative purposes, it is contemplated that the localization component 920, the perception component 922, the one or more maps 924, the one or more system controllers 926, the prediction component 928, the attribute component 930, the destination prediction component 932, the location prediction component 934, and the planning component 936 can additionally, or alternatively, be accessible to the vehicle 902 (e.g., stored on, or otherwise accessible by, memory remote from the vehicle 902).
- the localization component 920 can include functionality to receive data from the sensor system(s) 906 to determine a position and/or orientation of the vehicle 902 (e.g., one or more of an x-, y-, z-position, roll, pitch, or yaw).
- the localization component 920 can include and/or request / receive a map of an environment and can continuously determine a location and/or orientation of the autonomous vehicle within the map.
- the localization component 920 can utilize SLAM (simultaneous localization and mapping), CLAMS (calibration, localization and mapping, simultaneously), relative SLAM, bundle adjustment, non-linear least squares optimization, or the like to receive image data, lidar data, radar data, time of flight data, IMU data, GPS data, wheel encoder data, and the like to accurately determine a location of the autonomous vehicle.
- the localization component 920 can provide data to various components of the vehicle 902 to determine an initial position of an autonomous vehicle for generating a trajectory and/or for determining that an object is proximate to one or more crosswalk regions and/or for identifying candidate reference lines, as discussed herein.
- the perception component 922 can include functionality to perform object detection, segmentation, and/or classification.
- the perception component 922 can provide processed sensor data that indicates a presence of an entity that is proximate to the vehicle 902 and/or a classification of the entity as an entity type (e.g., car, pedestrian, cyclist, animal, building, tree, road surface, curb, sidewalk, stoplight, stop sign, unknown, etc.).
- the perception component 922 can provide processed sensor data that indicates one or more characteristics associated with a detected entity (e.g., a tracked object) and/or the environment in which the entity is positioned.
- characteristics associated with an entity can include, but are not limited to, an x-position (global and/or local position), a y-position (global and/or local position), a z-position (global and/or local position), an orientation (e.g., a roll, pitch, yaw), an entity type (e.g., a classification), a velocity of the entity, an acceleration of the entity, an extent of the entity (size), etc.
- Characteristics associated with the environment can include, but are not limited to, a presence of another entity in the environment, a state of another entity in the environment, a time of day, a day of a week, a season, a weather condition, an indication of darkness/light, etc.
- the memory 918 can further include one or more maps 924 that can be used by the vehicle 902 to navigate within the environment.
- a map can be any number of data structures modeled in two dimensions, three dimensions, or N-dimensions that are capable of providing information about an environment, such as, but not limited to, topologies (such as intersections), streets, mountain ranges, roads, terrain, and the environment in general.
- a map can include, but is not limited to: texture information (e.g., color information (e.g., RGB color information, Lab color information, HSV/HSL color information), and the like), intensity information (e.g., lidar information, radar information, and the like); spatial information (e.g., image data projected onto a mesh, individual “surfels” (e.g., polygons associated with individual color and/or intensity)), reflectivity information (e.g., specularity information, retroreflectivity information, BRDF information, BSSRDF information, and the like).
- the map can be stored in a tiled format, such that individual tiles of the map represent a discrete portion of an environment, and can be loaded into working memory as needed.
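- A minimal sketch of one possible tiled-map lookup, assuming square tiles keyed by integer indices; the tile size, key scheme, and loader callback are illustrative assumptions rather than the map format used herein.

```python
def tile_key(x, y, tile_size=100.0):
    """Map a world position to the (row, column) key of the tile containing it."""
    return (int(x // tile_size), int(y // tile_size))

def get_tile(tile_cache, key, load_fn):
    """Load a tile into working memory on first access; reuse it afterwards."""
    if key not in tile_cache:
        tile_cache[key] = load_fn(key)
    return tile_cache[key]
```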
- the one or more maps 924 can include at least one map (e.g., images and/or a mesh).
- the vehicle 902 can be controlled based at least in part on the maps 924. That is, the maps 924 can be used in connection with the localization component 920, the perception component 922, the prediction component 928, and/or the planning component 936 to determine a location of the vehicle 902, identify objects in an environment, and/or generate routes and/or trajectories to navigate within an environment.
- the one or more maps 924 can be stored on a remote computing device(s) (such as the computing device(s) 940) accessible via network(s) 938.
- multiple maps 924 can be stored based on, for example, a characteristic (e.g., type of entity, time of day, day of week, season of the year, etc.). Storing multiple maps 924 can have similar memory requirements, but can increase the speed at which data in a map can be accessed.
- the vehicle computing device(s) 904 can include one or more system controllers 926, which can be configured to control steering, propulsion, braking, safety, emitters, communication, and other systems of the vehicle 902. These system controller(s) 926 can communicate with and/or control corresponding systems of the drive system(s) 914 and/or other components of the vehicle 902.
- the prediction component 928 can include functionality to generate predicted information associated with objects in an environment.
- the prediction component 928 can be implemented to predict locations of a pedestrian proximate to a crosswalk region (or otherwise a region or location associated with a pedestrian crossing a road) in an environment as they traverse or prepare to traverse through the crosswalk region.
- the techniques discussed herein can be implemented to predict locations of objects (e.g., a vehicle, a pedestrian, and the like) as the vehicle traverses an environment.
- the prediction component 928 can generate one or more predicted trajectories for such target objects based on attributes of the target object and/or other objects proximate the target object.
- the attribute component 930 can include functionality to determine attribute information associated with objects in an environment.
- the attribute component 930 can receive data from the perception component 922 to determine attribute information of objects over time.
- attributes of an object can be determined based on sensor data captured over time, and can include, but are not limited to, one or more of a position of the pedestrian at a time (e.g., wherein the position can be represented in the frame of reference discussed above), a velocity of the pedestrian at the time (e.g., a magnitude and/or angle with respect to the first axis (or other reference line)), an acceleration of the pedestrian at the time, an indication of whether the pedestrian is in a drivable area (e.g., whether the pedestrian is on a sidewalk or a road), an indication of whether the pedestrian is in a crosswalk region, an indication of whether the pedestrian is jaywalking, a region control indicator state (e.g., whether the crosswalk is controlled by a traffic signal and/or a state of the traffic signal), a vehicle context (e.g., a presence of a vehicle in the environment and attribute(s) associated with the vehicle), a flux through the crosswalk region, and the like.
- attributes can be determined for a target object (e.g., a vehicle) and/or other object(s) (e.g., other vehicles) that are proximate the target object.
- attributes can include, but are not limited to, one or more of a velocity of the object at a time, an acceleration of the object at the time, a position of the object at the time (e.g., in global or local coordinates), a bounding box associated with the object at the time (e.g., representing extent(s) of the object, roll, pitch, and/or yaw), a lighting state associated with the object at the first time (e.g., headlight(s), braking light(s), hazard light(s), turn indicator light(s), reverse light(s), etc.), a distance between the object and a map element at the time (e.g., a distance to a stop line, traffic line, speed bump, yield line, intersection, driveway, etc.), a distance between the object and other object(s) in the environment, and the like.
- any combination of attributes for an object can be determined, as discussed herein.
- Attributes can be determined over time (e.g., at times T-M, ..., T-2, T-1, T0 (where M is an integer), where the various times represent any times up to a most recent time) and input to the destination prediction component 932 and/or the location prediction component 934 to determine predicted information associated with such objects.
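- For illustration only, the following sketch shows one way per-time-step attribute vectors could be stacked into a fixed-length history before being provided to a prediction component; the history length and zero-padding scheme are assumptions.

```python
import numpy as np

def build_attribute_history(attribute_frames, history_len=4):
    """Stack per-time-step attribute vectors (oldest to newest, e.g. T-3 ... T0)
    into a fixed-length array, zero-padding when fewer frames are available.
    Assumes at least one frame is provided."""
    frames = [np.asarray(f, dtype=float) for f in attribute_frames[-history_len:]]
    dim = frames[-1].shape[0]
    padding = [np.zeros(dim)] * (history_len - len(frames))
    return np.stack(padding + frames)  # shape: (history_len, attribute_dim)
```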
- the destination prediction component 932 can include functionality to determine a destination for an object in an environment, as discussed herein. In the context of a pedestrian, the destination prediction component 932 can determine which crosswalk region(s) may be applicable to a pedestrian based on the pedestrian being within a threshold distance of the crosswalk region(s), as discussed herein. In at least some examples, such a destination prediction component 932 may determine a point on an opposing sidewalk, regardless of an existence of a crosswalk. Further, attributes for an object associated with any period of time can be input to the destination prediction component 932 to determine a score, probability, and/or likelihood that a pedestrian is heading towards or may be associated with a crosswalk region. In some examples, the destination prediction component 932 is a machine learned model such as a neural network, a fully connected neural network, a convolutional neural network, a recurrent neural network, and the like.
- the destination prediction component 932 can be trained by reviewing data logs to determine events where a pedestrian has crossed a crosswalk. Such events can be identified and attributes can be determined for the object (e.g., the pedestrian) and the environment, and data representing the events can be identified as training data.
- the training data can be input to a machine learning model where a known result (e.g., a ground truth, such as the known “future” attributes) can be used to adjust weights and/or parameters of the machine learning model to minimize an error.
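- A minimal supervised-training sketch along these lines, assuming attribute vectors and ground-truth crossing labels have already been mined from data logs; the small fully connected network, the binary cross-entropy loss, and all names are illustrative assumptions, not the destination prediction component's actual architecture.

```python
import torch
import torch.nn as nn

class DestinationPredictor(nn.Module):
    """Small fully connected network scoring whether a pedestrian is associated
    with a candidate crossing destination (illustrative architecture)."""
    def __init__(self, attribute_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(attribute_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x)

def train(model, loader, epochs=10, lr=1e-3):
    """Adjust weights/parameters to minimize error against the known outcomes."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for attributes, crossed in loader:  # crossed: 1.0 if the pedestrian crossed
            optimizer.zero_grad()
            loss = loss_fn(model(attributes).squeeze(-1), crossed)
            loss.backward()
            optimizer.step()
```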
- the location prediction component 934 can include functionality to generate or otherwise determine predicted location(s) associated with objects in an environment. For example, as discussed herein, attribute information can be determined for one or more objects in an environment, which may include a target object and/or other object proximate to the target object. In some examples, attributes associated with the vehicle 902 can be used to determine predicted location(s) associated with object(s) in an environment.
- the location prediction component 934 can further include functionality to represent attribute information in various frame(s) of reference, as discussed herein.
- the location prediction component 934 can use a location of an object at time T0 as an origin for a frame of reference, which can be updated for each time instance.
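- The following sketch illustrates one way points could be expressed in such a frame of reference, assuming (as described elsewhere herein) a first axis pointing from the origin toward a destination and a perpendicular second axis; the function name is an illustrative assumption.

```python
import numpy as np

def to_object_frame(points, origin, destination):
    """Express world-frame points in a frame whose origin is the object's
    location at time T0 and whose first axis points from the origin toward the
    destination (the second axis is perpendicular to the first)."""
    origin = np.asarray(origin, dtype=float)
    x_axis = np.asarray(destination, dtype=float) - origin
    x_axis /= max(np.linalg.norm(x_axis), 1e-9)
    y_axis = np.array([-x_axis[1], x_axis[0]])
    rotation = np.stack([x_axis, y_axis])  # rows are the new basis vectors
    return (np.asarray(points, dtype=float) - origin) @ rotation.T
```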
- the location prediction component 934 can include functionality to identify candidate reference lines in an environment (e.g., based on map data) and can select a reference line (e.g., based on a similarity score) to determine the predicted location(s) with respect to the reference line.
- the location prediction component 934 is a machine learned model such as a neural network, a fully connected neural network, a convolutional neural network, a recurrent neural network, and the like, or any combination thereof.
- the location prediction component 934 can be trained by reviewing data logs and determining attribute information. Training data representing relevant events (e.g., vehicles driving a threshold distance away from a reference line, pedestrians traversing crosswalks, pedestrians jaywalking, and the like) can be input to a machine learning model where a known result (e.g., a ground truth, such as the known “future” attributes/locations) can be used to adjust weights and/or parameters of the machine learning model to minimize an error.
- the planning component 936 can determine a path for the vehicle 902 to follow to traverse the environment. For example, the planning component 936 can determine various routes and trajectories at various levels of detail. For example, the planning component 936 can determine a route to travel from a first location (e.g., a current location) to a second location (e.g., a target location). For the purpose of this discussion, a route can be a sequence of waypoints for travelling between two locations. As non-limiting examples, waypoints include streets, intersections, global positioning system (GPS) coordinates, etc. Further, the planning component 936 can generate an instruction for guiding the autonomous vehicle along at least a portion of the route from the first location to the second location.
- the planning component 936 can determine how to guide the autonomous vehicle from a first waypoint in the sequence of waypoints to a second waypoint in the sequence of waypoints.
- the instruction can be a trajectory, or a portion of a trajectory.
- multiple trajectories can be substantially simultaneously generated (e.g., within technical tolerances) in accordance with a receding horizon technique, wherein one of the multiple trajectories is selected for the vehicle 902 to navigate.
- the planning component 936 can generate one or more trajectories for the vehicle 902 based at least in part on predicted location(s) associated with object(s) in an environment.
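- As a non-limiting sketch of how predicted location(s) might inform trajectory selection in a receding horizon scheme, the following scores candidate trajectories with an illustrative proximity cost and keeps the lowest-cost one; the cost function and clearance threshold are assumptions, not the planning component's actual logic.

```python
import numpy as np

def proximity_cost(trajectory, predicted_locations, min_clearance=2.0):
    """Illustrative cost that penalizes trajectory points passing within a
    clearance distance of any predicted object location."""
    predicted = np.asarray(predicted_locations, dtype=float)
    cost = 0.0
    for point in np.asarray(trajectory, dtype=float):
        nearest = float(np.min(np.linalg.norm(predicted - point, axis=1)))
        cost += max(0.0, min_clearance - nearest)
    return cost

def select_trajectory(candidate_trajectories, predicted_locations):
    """Receding-horizon style selection: score each candidate for this planning
    cycle and keep the lowest-cost one; re-plan on the next cycle."""
    return min(candidate_trajectories,
               key=lambda t: proximity_cost(t, predicted_locations))
```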
- the planning component 936 can use temporal logic, such as linear temporal logic and/or signal temporal logic, to evaluate one or more trajectories of the vehicle 902.
- the components discussed herein (e.g., the localization component 920, the perception component 922, the one or more maps 924, the one or more system controllers 926, the prediction component 928, the attribute component 930, the destination prediction component 932, the location prediction component 934, and the planning component 936) are described as divided for illustrative purposes. However, the operations performed by the various components can be combined or performed in any other component. Further, any of the components discussed as being implemented in software can be implemented in hardware, and vice versa. Further, any functionality implemented in the vehicle 902 can be implemented in the computing device(s) 940, or another component (and vice versa).
- the sensor system(s) 906 can include time of flight sensors, lidar sensors, radar sensors, ultrasonic transducers, sonar sensors, location sensors (e.g., GPS, compass, etc.), inertial sensors (e.g., inertial measurement units (IMUs), accelerometers, magnetometers, gyroscopes, etc.), cameras (e.g., RGB, IR, intensity, depth, etc.), microphones, wheel encoders, environment sensors (e.g., temperature sensors, humidity sensors, light sensors, pressure sensors, etc.), etc.
- the sensor system(s) 906 can include multiple instances of each of these or other types of sensors.
- the time of flight sensors can include individual time of flight sensors located at the corners, front, back, sides, and/or top of the vehicle 902.
- the camera sensors can include multiple cameras disposed at various locations about the exterior and/or interior of the vehicle 902.
- the sensor system(s) 906 can provide input to the vehicle computing device(s) 904. Additionally or alternatively, the sensor system(s) 906 can send sensor data, via the one or more networks 938, to the one or more computing device(s) 940 at a particular frequency, after a lapse of a predetermined period of time, in near real-time, etc.
- the vehicle 902 can also include one or more emitters 908 for emitting light and/or sound, as described above.
- the emitters 908 in this example include interior audio and visual emitters to communicate with passengers of the vehicle 902.
- interior emitters can include speakers, lights, signs, display screens, touch screens, haptic emitters (e.g., vibration and/or force feedback), mechanical actuators (e.g., seatbelt tensioners, seat positioners, headrest positioners, etc.), and the like.
- the emitters 908 in this example also include exterior emitters.
- the exterior emitters in this example include lights to signal a direction of travel or other indicator of vehicle action (e.g., indicator lights, signs, light arrays, etc.), and one or more audio emitters (e.g., speakers, speaker arrays, horns, etc.) to audibly communicate with pedestrians or other nearby vehicles, one or more of which may comprise acoustic beam steering technology.
- the vehicle 902 can also include one or more communication connection(s) 910 that enable communication between the vehicle 902 and one or more other local or remote computing device(s).
- the communication connection(s) 910 can facilitate communication with other local computing device(s) on the vehicle 902 and/or the drive system(s) 914.
- the communication connection(s) 910 can allow the vehicle to communicate with other nearby computing device(s) (e.g., other nearby vehicles, traffic signals, etc.).
- the communications connection(s) 910 also enable the vehicle 902 to communicate with a remote teleoperations computing device or other remote services.
- the communications connection(s) 910 can include physical and/or logical interfaces for connecting the vehicle computing device(s) 904 to another computing device or a network, such as network(s) 938.
- the communications connection(s) 910 can enable Wi-Fi-based communication such as via frequencies defined by the IEEE 802.11 standards, short range wireless frequencies such as Bluetooth®, cellular communication (e.g., 2G, 3G, 4G, 4G LTE, 5G, etc.) or any suitable wired or wireless communications protocol that enables the respective computing device to interface with the other computing device(s).
- the vehicle 902 can include one or more drive systems 914.
- the vehicle 902 can have a single drive system 914.
- individual drive systems 914 can be positioned on opposite ends of the vehicle 902 (e.g., the front and the rear, etc.).
- the drive system(s) 914 can include one or more sensor systems to detect conditions of the drive system(s) 914 and/or the surroundings of the vehicle 902.
- the sensor system(s) can include one or more wheel encoders (e.g., rotary encoders) to sense rotation of the wheels of the drive modules, inertial sensors (e.g., inertial measurement units, accelerometers, gyroscopes, magnetometers, etc.) to measure orientation and acceleration of the drive module, cameras or other image sensors, ultrasonic sensors to acoustically detect objects in the surroundings of the drive system, lidar sensors, radar sensors, etc.
- Some sensors, such as the wheel encoders can be unique to the drive system(s) 914.
- the sensor system(s) on the drive system(s) 914 can overlap or supplement corresponding systems of the vehicle 902 (e.g., sensor system(s) 906).
- the drive system(s) 914 can include many of the vehicle systems, including a high voltage battery, a motor to propel the vehicle, an inverter to convert direct current from the battery into alternating current for use by other vehicle systems, a steering system including a steering motor and steering rack (which can be electric), a braking system including hydraulic or electric actuators, a suspension system including hydraulic and/or pneumatic components, a stability control system for distributing brake forces to mitigate loss of traction and maintain control, an HVAC system, lighting (e.g., lighting such as head/tail lights to illuminate an exterior surrounding of the vehicle), and one or more other systems (e.g., cooling system, safety systems, onboard charging system, other electrical components such as a DC/DC converter, a high voltage junction, a high voltage cable, charging system, charge port, etc.).
- the drive system(s) 914 can include a drive system controller which can receive and preprocess data from the sensor system(s) and to control operation of the various vehicle systems.
- the drive system controller can include one or more processors and memory communicatively coupled with the one or more processors.
- the memory can store one or more components to perform various functionalities of the drive system(s) 914.
- the drive system(s) 914 also include one or more communication connection(s) that enable communication by the respective drive system with one or more other local or remote computing device(s).
- the direct connection 912 can provide a physical interface to couple the one or more drive system(s) 914 with the body of the vehicle 902.
- the direct connection 912 can allow the transfer of energy, fluids, air, data, etc. between the drive system(s) 914 and the vehicle.
- the direct connection 912 can further releasably secure the drive system(s) 914 to the body of the vehicle 902.
- the localization component 920, the perception component 922, the one or more maps 924, the one or more system controllers 926, the prediction component 928, the attribute component 930, the destination prediction component 932, the location prediction component 934, and the planning component 936 can process sensor data, as described above, and can send their respective outputs, over the one or more network(s) 938, to one or more computing device(s) 940.
- the localization component 920, the one or more maps 924, the one or more system controllers 926, the prediction component 928, the attribute component 930, the destination prediction component 932, the location prediction component 934, and the planning component 936 can send their respective outputs to the one or more computing device(s) 940 at a particular frequency, after a lapse of a predetermined period of time, in near real-time, etc.
- the vehicle 902 can send sensor data to one or more computing device(s) 940 via the network(s) 938.
- the vehicle 902 can send raw sensor data to the computing device(s) 940.
- the vehicle 902 can send processed sensor data and/or representations of sensor data to the computing device(s) 940.
- the vehicle 902 can send sensor data to the computing device(s) 940 at a particular frequency, after a lapse of a predetermined period of time, in near real-time, etc.
- the vehicle 902 can send sensor data (raw or processed) to the computing device(s) 940 as one or more log files.
- the computing device(s) 940 can include processor(s) 942 and a memory 944 storing a training component 946.
- the training component 946 can include functionality to train one or more models to determine prediction information, as discussed herein. In some instances, the training component 946 can communicate information generated by the one or more models to the vehicle computing device(s) 904 to revise how to control the vehicle 902 in response to different situations.
- the training component 946 can train one or more machine learning models to generate the prediction component(s) discussed herein.
- the training component 946 can include functionality to search data logs and determine attribute and/or location (e.g., in any one or more reference frames) information associated with object(s).
- Log data that corresponds to particular scenarios (e.g., a pedestrian approaching and crossing a crosswalk region, a pedestrian jaywalking, a target object rounding a bend with an offset from a centerline, and the like) can be identified as training data, and the training data can be input to a machine learning model where a known result (e.g., a ground truth, such as the known “future” attributes) can be used to adjust weights and/or parameters of the machine learning model to minimize an error.
- aspects of some or all of the components discussed herein can include any models, algorithms, and/or machine learned algorithms.
- the components in the memory 944 can be implemented as a neural network.
- the training component 946 can utilize a neural network to generate and/or execute one or more models to determine segmentation information from sensor data, as discussed herein.
- an exemplary neural network is a biologically inspired algorithm which passes input data through a series of connected layers to produce an output.
- Each layer in a neural network can also comprise another neural network, or can comprise any number of layers (whether convolutional or not).
- a neural network can utilize machine learning, which can refer to a broad class of such algorithms in which an output is generated based on learned parameters.
- machine learning or machine learned algorithms can include, but are not limited to, regression algorithms (e.g., ordinary least squares regression (OLSR), linear regression, logistic regression, stepwise regression, multivariate adaptive regression splines (MARS), locally estimated scatterplot smoothing (LOESS)), instance-based algorithms (e.g., ridge regression, least absolute shrinkage and selection operator (LASSO), elastic net, least-angle regression (LARS)), decision tree algorithms (e.g., classification and regression tree (CART), iterative dichotomiser 3 (ID3), Chi-squared automatic interaction detection (CHAID), decision stump, conditional decision trees), Bayesian algorithms (e.g., naive Bayes, Gaussian naive Bayes, multinomial naive Bayes, average one-dependence estimators (AODE), Bayesian belief network (BNN), Bayesian networks), clustering algorithms, and the like.
- Additional examples of architectures include neural networks such as ResNet50, ResNet101, VGG, DenseNet, PointNet, and the like.
- the processor(s) 916 of the vehicle 902 and the processor(s) 942 of the computing device(s) 940 can be any suitable processor capable of executing instructions to process data and perform operations as described herein.
- the processor(s) 916 and 942 can comprise one or more Central Processing Units (CPUs), Graphics Processing Units (GPUs), or any other device or portion of a device that processes electronic data to transform that electronic data into other electronic data that can be stored in registers and/or memory.
- Additionally, integrated circuits (e.g., ASICs, etc.), gate arrays (e.g., FPGAs, etc.), and other hardware devices can also be considered processors in so far as they are configured to implement encoded instructions.
- Memory 918 and 944 are examples of non-transitory computer-readable media.
- the memory 918 and 944 can store an operating system and one or more software applications, instructions, programs, and/or data to implement the methods described herein and the functions attributed to the various systems.
- the memory can be implemented using any suitable memory technology, such as static random access memory (SRAM), synchronous dynamic RAM (SDRAM), nonvolatile/Flash-type memory, or any other type of memory capable of storing information.
- the architectures, systems, and individual elements described herein can include many other logical, programmatic, and physical components, of which those shown in the accompanying figures are merely examples that are related to the discussion herein.
- While FIG. 9 is illustrated as a distributed system, in alternative examples, components of the vehicle 902 can be associated with the computing device(s) 940 and/or components of the computing device(s) 940 can be associated with the vehicle 902. That is, the vehicle 902 can perform one or more of the functions associated with the computing device(s) 940, and vice versa. Further, aspects of the prediction component 928 (and subcomponents) can be performed on any of the devices discussed herein.
- FIG. 10 depicts an example process 1000 for capturing sensor data, determining attributes associated with an object, determining a predicted location based on the attributes, and controlling a vehicle based on the predicted location.
- some or all of the process 1000 can be performed by one or more components in FIG. 9, as described herein.
- some or all of the process 1000 can be performed by the vehicle computing device(s) 904.
- any of the operations described in the example process 1000 may be executed in parallel or in a different order than depicted, may be omitted, and/or may be combined with any of the operations discussed herein.
- the process can include receiving sensor data of an environment.
- the operation 1002 can include receiving and/or capturing time of flight data, lidar data, image data, radar data, and the like, of the environment.
- the operation 1002 can be performed by a vehicle (e.g., an autonomous vehicle) as the vehicle traverses the environment.
- the process can include determining, based at least in part on the sensor data, that an object is in the environment.
- the operation 1004 can include classifying an object as a pedestrian in the environment.
- the operation 1004 can include determining whether the object (e.g., the pedestrian) is on a sidewalk, in a road, jaywalking, etc.
- the process can include determining whether the object is associated with a destination in the environment.
- the operation 1006 can include accessing map data of the environment to determine whether crosswalk region(s) are within a threshold distance of the object. If there is one crosswalk region and the object is on a sidewalk, the operation 1006 can include identifying a location across a drivable area as a destination. If the object is in a street and is proximate to a single crosswalk, the operation 1006 can include disambiguating between two destinations.
- the operation 1006 can include determining, based at least in part on attributes associated with the object, a likelihood that the object will approach and/or cross a particular crosswalk region. In some examples, the operation 1006 may provide such a destination regardless of the presence of a crosswalk region in proximity to the pedestrian.
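- A minimal sketch of the threshold-distance candidate gathering described above, assuming crosswalk regions are available from map data as polygon vertex arrays; the threshold value and helper name are illustrative assumptions.

```python
import numpy as np

def candidate_crosswalk_regions(pedestrian_position, crosswalk_regions, threshold=20.0):
    """Return the crosswalk regions whose nearest polygon vertex lies within a
    threshold distance of the pedestrian, as candidate destinations."""
    position = np.asarray(pedestrian_position, dtype=float)
    candidates = []
    for region in crosswalk_regions:  # region: (N, 2) array of polygon vertices
        distance = float(np.min(np.linalg.norm(np.asarray(region, dtype=float) - position, axis=1)))
        if distance <= threshold:
            candidates.append(region)
    return candidates
```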
- the operation 1006 can include inputting attribute(s) to a destination prediction component (e.g., the destination prediction component 320) to determine a destination associated with an object in the environment.
- the attribute(s) input to the destination prediction component 320 can be the same as or similar to the attributes determined below in operations 1008 and 1010.
- attribute(s) can be determined for an object before determining a destination in an environment.
- the attribute(s) can be determined in parallel using reference frames based on different destinations in the environment to determine a likely destination in the environment.
- the operation 1006 can continue to the operation 1002 to capture additional data in the environment.
- the process can include determining a first attribute associated with the object, the first attribute associated with a first time.
- attributes can include, but are not limited to, one or more of a position of the object (e.g., a pedestrian) at a time (e.g., wherein the position can be represented in the frame of reference discussed herein), a size of the object or a bounding box associated with the object (e.g., length, width, and/or height), a velocity of the pedestrian at the time (e.g., a magnitude and/or angle with respect to the first axis (or other reference line)), an acceleration of the pedestrian at the time, an indication of whether the pedestrian is in a drivable area (e.g., whether the pedestrian is on a sidewalk or a road), an indication of whether the pedestrian is in a crosswalk region, an indication of whether the pedestrian is jaywalking, a region control indicator state (e.g., whether the crosswalk is controlled by a traffic signal and/or a state of the traffic signal), and the like.
- the process can include determining a second attribute associated with the object, the second attribute associated with a second time after the first time.
- the operation 1010 can be omitted (such that only attributes associated with the first time can be determined and/or used), while in some instances, attributes associated with additional or different time instances can be determined as well.
- the process can include determining, based at least in part on the first attribute, the second attribute, and the destination, predicted location(s) of the object at a third time after the second time.
- the operation 1012 can include inputting attribute information into a location prediction component (e.g., the location prediction component 404) and receiving as output predicted location(s) associated with the object in the environment.
- the attribute(s) and/or the predicted location(s) can be expressed in one or more frames of reference based at least in part on a location of the object at the first time and/or the second time and a location of the destination in the environment.
- the process can include controlling a vehicle based at least in part on the predicted location(s).
- the operation 1014 can include generating a trajectory to stop the vehicle or to otherwise control the vehicle to safely traverse the environment.
- FIG. 11 depicts an example process for capturing sensor data, determining that a first object and second object are in an environment, determining attributes associated with the second object, determining a predicted location based on the attributes and a reference line, and controlling a vehicle based on the predicted location.
- some or all of the process 1100 can be performed by one or more components in FIG. 9, as described herein.
- some or all of the process 1100 can be performed by the vehicle computing device(s) 904.
- any of the operations described in the example process 1100 may be executed in parallel or in a different order than depicted, may be omitted, and/or may be combined with any of the operations discussed herein.
- the process can include receiving sensor data of an environment.
- the operation 1102 can include receiving and/or capturing time of flight data, lidar data, image data, radar data, and the like, of the environment.
- the operation 1102 can be performed by a vehicle (e.g., an autonomous vehicle) as the vehicle traverses the environment.
- the process can include determining, based at least in part on the sensor data, that a first object is in the environment.
- the operation 1104 can include determining a target object to be a subject of prediction operations, as discussed herein.
- determining the target object can include selecting an object from a plurality of objects in an environment as a target object.
- a target object can be selected based on a likelihood of an intersection between paths of the target object and a vehicle (e.g., the vehicle 902) capturing sensor data, a distance between the target object and the vehicle (e.g., the vehicle 902) capturing sensor data, and the like.
- the process can include determining whether a second object is proximate the first object in the environment.
- the operation 1106 can include determining whether the second object is within a threshold distance of the first object.
- the operation 1106 can include determining the closest N objects to the first object (where N is an integer). In at least some examples, such determination may exclude objects having certain characteristics, such as, but not limited to, objects of differing classes, of opposing directions of motion, and the like.
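- For illustration, the following sketch selects the N closest objects while excluding objects with certain characteristics, assuming each object is represented as a dictionary with "position" and "class" entries; that representation and the helper name are assumptions for the example only.

```python
import numpy as np

def closest_objects(target, objects, n=5, excluded_classes=()):
    """Select the N objects nearest the target, optionally excluding objects with
    certain characteristics (e.g. a differing class or opposing direction of motion)."""
    target_position = np.asarray(target["position"], dtype=float)
    eligible = [o for o in objects if o.get("class") not in excluded_classes]
    eligible.sort(key=lambda o: float(np.linalg.norm(
        np.asarray(o["position"], dtype=float) - target_position)))
    return eligible[:n]
```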
- If a second object is not proximate the first object (e.g., “no” in the operation 1106), the process can return to the operation 1102. However, in some examples, the process can continue to operation 1112 where predicted location(s) of the first object are determined without attribute(s) associated with the second object (e.g., predicted location(s) of the first object can be determined based at least in part on attribute(s) associated with the first object). That is, predicted location(s) of the first object can be determined irrespective of whether a second object is proximate the first object and/or irrespective of whether attribute(s) are determined for any second object, in some examples.
- the process can include determining a first attribute associated with the second object, the first attribute associated with a first time.
- attributes can be determined for the first object, the second object, and/or other object(s) in the environment.
- attributes can include, but are not limited to, one or more of a velocity of the object at a time, an acceleration of the object at the time, a position of the object at the time (e.g., in global or local coordinates), a bounding box associated with the object at the time (e.g., representing extent(s) of the object, roll, pitch, and/or yaw), a lighting state associated with the object at the first time (e.g., headlight(s), braking light(s), hazard light(s), turn indicator light(s), reverse light(s), etc.), object wheel orientation indication(s), a distance between the object and a map element at the time (e.g., a distance to a stop line, traffic line, speed bump, yield line, intersection, driveway, etc.), and the like.
- the process can include determining a second attribute associated with the second object, the second attribute associated with a second time after the first time.
- the operation 1110 can be omitted (such that only attributes associated with the first time can be used), while in some instances, attributes associated with additional or different time instances can be determined as well.
- the process can include determining, based at least in part on the first attribute and the second attribute, predicted location(s) of the first object at a third time after the second time, the predicted location(s) with respect to a reference line in the environment.
- the operation 1112 can include inputting attribute information associated with the first object and/or the second object into a location prediction component (e.g., the location prediction component 802) to determine predicted location(s) associated with the first object.
- the operation 1112 can include receiving or otherwise determining a reference line most closely associated with the predicted location(s) and representing the predicted locations with respect to the reference line. For example, the operation 1112 can include determining a similarity score between predicted location(s) and candidate reference line(s) and selecting a reference line based on a similarity score, or any other mechanism.
- the process can include controlling a vehicle based at least in part on the predicted location(s).
- the operation 1114 can include generating a trajectory to stop the vehicle or to otherwise control the vehicle to safely traverse the environment.
- a system comprising: one or more processors; and one or more computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to perform operations comprising: capturing sensor data of an environment using a sensor of an autonomous vehicle; determining, based at least in part on the sensor data, that an object is in the environment; determining, based at least in part on map data and the sensor data, that the object is associated with a destination in the environment; determining a first attribute associated with the object, the first attribute associated with a first time; determining a second attribute associated with the object, the second attribute associated with a second time after the first time; inputting the first attribute, the second attribute, and the destination to a machine learned model, wherein the first attribute and the second attribute are represented in a frame of reference based at least in part on the destination; receiving, from the machine learned model, a predicted location of the object at a third time after the second time; and controlling the autonomous vehicle based at least in part on the predicted location of the object in the environment at the third time.
- C The system of paragraph A or B, the operations further comprising: determining that the object is associated with the destination based at least in part on inputting the first attribute and the second attribute into a destination prediction component; and receiving, from the destination prediction component, the destination, the destination prediction component comprising another machine learned model.
- E The system of any of paragraphs A-D, the operations further comprising: establishing the frame of reference, wherein: a first location of the object at the second time is associated with an origin of the frame of reference; a first axis is based at least in part on the origin and the destination; and a second axis is perpendicular to the first axis; and wherein the predicted location is based at least in part on the frame of reference.
- a method comprising: receiving sensor data representing an environment; determining, based at least in part on the sensor data, that an object is in the environment; determining a location in the environment, the location associated with a crosswalk region; determining a first attribute associated with the object, the first attribute associated with a first time; determining a second attribute associated with the object, the second attribute associated with a second time after the first time; inputting the first attribute, the second attribute, and the location to a machine learned model; and receiving, from the machine learned model, a predicted location associated with the object at a third time after the second time.
- H The method of paragraph F or G, wherein the location is a first location, the method further comprising: determining the first location based at least in part on at least one of map data or the sensor data representing the environment; determining a threshold region associated with the first location; determining a second location of the object in the environment; determining that the second location of the object is within the threshold region; and selecting, based at least in part on the second location being within the threshold region and at least one of the first attribute or the second attribute, the location as a destination associated with the object.
- K The method of paragraph I or J, wherein: the location is a first location; and the predicted location associated with the object at the third time comprises a lateral offset with respect to the second axis and a distance along the first axis representing a difference between a second location of the object at the second time and the predicted location.
- L The method of any of paragraphs F-K, further comprising: determining a number of objects entering the crosswalk region within a period of time, wherein the second attribute comprises the number of objects.
- M The method of any of paragraphs F-L, wherein the obj ect is a first obj ect, the method further comprising: determining, based at least in part on the sensor data, that a second object is in the environment; determining, as an object context, at least one of a position, a velocity, or an acceleration associated with the second object; and determining the predicted location associated with the object further based at least in part on the object context.
- N The method of any of paragraphs F-M, further comprising: binning at least a portion of the predicted location to determine a binned predicted location.
- the first attribute comprises at least one of: a position of the object at the first time; a velocity of the object at the first time; a heading of the object at the first time; a first distance between the object at the first time and a first portion of the crosswalk region; a second distance between the object at the first time and a second portion of the crosswalk region; an acceleration of the object at the first time; an indication of whether the object is in a drivable area; a region control indicator state; a vehicle context; or an object association.
- a non-transitory computer-readable medium storing instructions that, when executed, cause one or more processors to perform operations comprising: receiving sensor data representing an environment; determining, based at least in part on the sensor data, that an object is in the environment; determining a location in the environment, the location associated with at least one of a crosswalk region or a non-drivable region of the environment; determining a first attribute associated with the object, the first attribute associated with a first time; determining a second attribute associated with the object, the second attribute associated with a second time after the first time; inputting the first attribute, the second attribute, and the location to a machine learned model; and receiving, from the machine learned model, a predicted location associated with the object at a third time after the second time.
- R The non-transitory computer-readable medium of paragraph P or Q, wherein the location is a first location, the operations further comprising: establishing a frame of reference, wherein: a second location of the object at the second time is associated with an origin of the frame of reference; a first axis is based at least in part on the origin and the first location; and a second axis is perpendicular to the first axis; and wherein the first attribute is based at least in part on the frame of reference.
- T The non-transitory computer-readable medium of any of paragraphs P-S, further comprising: determining that the object is not associated with the crosswalk region; and determining that the location is associated with the non-drivable region of the environment.
- a system comprising: one or more processors; and one or more computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to perform operations comprising: capturing sensor data of an environment using a sensor of an autonomous vehicle; determining, based at least in part on the sensor data, that an object is in the environment; receiving a reference line associated with the object in the environment; determining a first attribute associated with the object, the first attribute associated with a first time; determining a second attribute associated with the object, the second attribute associated with a second time after the first time; inputting the first attribute, the second attribute, and the reference line into a machine learned model; receiving, from the machine learned model, a predicted location of the object at a third time after the second time, the predicted location with respect to the reference line in the environment; and controlling the autonomous vehicle based at least in part on the predicted location of the object in the environment at the third time.
- the at least one of the first attribute, the second attribute, the third attribute, or the fourth attribute comprises at least one of: a velocity of the second object at the first time; an acceleration of the second object at the first time; a position of the second object at the first time; a bounding box associated with the second object at the first time; a lighting state associated with the second object at the first time; a first distance between the second object and a map element at the first time; a second distance between the first object and the second object; a classification of the second object; or a characteristic associated with the second object.
- Z A method comprising: receiving sensor data representing an environment; determining that an object is in the environment; receiving a reference line associated with the object; determining a first attribute associated with the object, the first attribute associated with a first time; determining a second attribute associated with the object, the second attribute associated with a second time after the first time; inputting the first attribute, the second attribute, and the reference line to a machine learned model; and receiving, from the machine learned model, a predicted location of the object at a third time after the second time, the predicted location with respect to the reference line in the environment.
- AA The method of paragraph Z, further comprising: capturing the sensor data using a sensor of a vehicle; and controlling the vehicle based at least in part on the predicted location of the object in the environment at the third time.
- AC The method of any of paragraphs Z-AB, wherein the object is one of a plurality of objects in the environment, and wherein the object is a target object, the method further comprising: selecting, based at least in part on a proximity of the plurality of objects to the target object, a number of objects of the plurality of objects; determining attributes associated with the objects; and inputting the attributes to the machine learned model to determine the predicted location.
- AD The method of paragraph AC, further comprising selecting the objects based at least in part on a classification associated with the objects.
- AE The method of any of paragraphs Z-AD, wherein the reference line corresponds to a centerline of a drivable area, and wherein the predicted location comprises a distance along the reference line and a lateral offset from the reference line.
- AF The method of any of paragraphs Z-AE, wherein the first attribute and the second attribute are represented with respect to a frame of reference, wherein an origin of the frame of reference is based at least in part on a location of the object at the second time.
- the first attribute comprises at least one of: a velocity of the object at the first time; an acceleration of the object at the first time; a position of the object at the first time; a bounding box associated with the object at the first time; a lighting state associated with the object at the first time; a distance between the object and a map element at the first time; a classification of the object; or a characteristic associated with the object.
- AH The method of paragraph AG, wherein the object is a first object and the distance is a first distance, the method further comprising: determining that a second object is proximate the first object in the environment; wherein the first attribute further comprises a second distance between the first object and the second object at the first time.
- AI A non-transitory computer-readable medium storing instructions that, when executed, cause one or more processors to perform operations comprising: receiving sensor data representing an environment; determining, based at least in part on the sensor data, that an object is in the environment; receiving a reference line associated with the object; determining a first attribute associated with the object, the first attribute associated with a first time; determining a second attribute associated with the object, the second attribute associated with a second time after the first time; inputting the first attribute, the second attribute, and the reference line to a machine learned model; and receiving, from the machine learned model, a predicted location of the object at a third time after the second time, the predicted location with respect to the reference line in the environment.
- AJ The non-transitory computer-readable medium of paragraph AI, wherein the object is a first object, the operations further comprising: determining that a second object is proximate the first object in the environment; determining a third attribute associated with the second object, the third attribute associated with the first time; determining a fourth attribute associated with the second object, the fourth attribute associated with the second time; and inputting the third attribute and the fourth attribute to the machine learned model to determine the predicted location associated with the first object.
- AK The non-transitory computer-readable medium of paragraph AI or AJ, wherein the first attribute and the second attribute are represented with respect to a frame of reference, and wherein an origin of the frame of reference is based at least in part on a location of the object at the second time.
- AL The non-transitory computer-readable medium of paragraph AK, wherein the predicted location is represented as a distance along the reference line and a lateral offset from the reference line.
- AM The non-transitory computer-readable medium of any of paragraphs AI-AL, wherein the first attribute comprises at least one of: a velocity of the object at the first time; an acceleration of the object at the first time; a position of the object at the first time; a bounding box associated with the object at the first time; a lighting state associated with the object at the first time; a distance between the object and a map element at the first time; a classification of the object; or a characteristic associated with the object.
- AN The non-transitory computer-readable medium of paragraph AM, wherein the object is a first object, the distance is a first distance, and the first attribute further comprises a second distance between the first object and a second object at the first time.
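The following is a minimal Python sketch of the proximity-based object selection described in paragraphs AC and AD above. The names used here (Detection, select_nearby_objects, the classification strings) are hypothetical and are not taken from the document; this is only one possible way such a selection could be realized, not the claimed method.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class Detection:
    """Hypothetical detected object with a 2D position and a classification."""
    x: float
    y: float
    classification: str  # e.g., "vehicle", "pedestrian", "bicyclist" (assumed labels)

def select_nearby_objects(
    target: Detection,
    detections: List[Detection],
    max_objects: int = 8,
    allowed_classes: Optional[Set[str]] = None,
) -> List[Detection]:
    """Select up to max_objects detections closest to the target object,
    optionally filtered by classification (cf. paragraphs AC and AD)."""
    candidates = [
        d for d in detections
        if d is not target
        and (allowed_classes is None or d.classification in allowed_classes)
    ]
    # Sort by Euclidean distance to the target and keep the nearest N.
    candidates.sort(key=lambda d: math.hypot(d.x - target.x, d.y - target.y))
    return candidates[:max_objects]
```

Attributes of the selected objects could then be determined and provided to the machine learned model together with the target object's own attributes, as recited in paragraph AC.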
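As a hedged illustration of the attribute lists in paragraphs AG and AM and the model input described in paragraph AI, the sketch below flattens per-timestep attributes into fixed-length vectors and concatenates two timesteps with sampled reference-line points. The field names (vx, brake_light_on, dist_to_crosswalk, and so on) and the flattening itself are assumptions made for the example, not an encoding taken from the document.

```python
import numpy as np

def encode_attributes(state: dict) -> np.ndarray:
    """Flatten one timestep of hypothetical attributes (velocity, acceleration,
    position, bounding-box extents, lighting state, distance to a map element,
    classification) into a fixed-length feature vector."""
    return np.array([
        state["vx"], state["vy"],                # velocity
        state["ax"], state["ay"],                # acceleration
        state["x"], state["y"],                  # position
        state["length"], state["width"],         # bounding box extents
        float(state["brake_light_on"]),          # lighting state as a flag
        state["dist_to_crosswalk"],              # distance to a map element
        float(state["class_id"]),                # classification id
    ], dtype=np.float32)

def build_model_input(state_t1: dict, state_t2: dict, reference_line: np.ndarray) -> np.ndarray:
    """Concatenate attributes from a first time and a second time with sampled
    reference-line points to form one input vector for a learned model (cf. AI)."""
    return np.concatenate([
        encode_attributes(state_t1),
        encode_attributes(state_t2),
        reference_line.astype(np.float32).reshape(-1),  # e.g., N sampled (x, y) points
    ])
```

A model trained on such vectors could then emit a predicted location expressed relative to the reference line, as described in paragraphs AE and AL.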
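Finally, a sketch of the geometric conventions in paragraphs AE, AF, AK, and AL: expressing points in a frame of reference whose origin is the object's location at the most recent observed time, and converting a prediction given as a distance along a reference line plus a lateral offset back into a world-frame point. The function names, and the use of the object's heading to orient the frame, are assumptions for illustration only.

```python
import numpy as np

def to_object_frame(points: np.ndarray, origin: np.ndarray, yaw: float) -> np.ndarray:
    """Express world-frame (x, y) points in a frame whose origin is the object's
    location at the second (most recent) time, x-axis aligned with its heading."""
    c, s = np.cos(-yaw), np.sin(-yaw)
    rotation = np.array([[c, -s], [s, c]])
    return (points - origin) @ rotation.T

def from_reference_line(reference_line: np.ndarray, s: float, lateral_offset: float) -> np.ndarray:
    """Convert (distance s along the reference line, lateral offset) into an
    (x, y) point by walking the polyline and offsetting along the local normal."""
    segments = np.diff(reference_line, axis=0)
    lengths = np.linalg.norm(segments, axis=1)
    cumulative = np.concatenate([[0.0], np.cumsum(lengths)])
    s = float(np.clip(s, 0.0, cumulative[-1]))
    i = int(np.searchsorted(cumulative, s, side="right")) - 1
    i = min(i, len(segments) - 1)          # clamp when s equals the total length
    t = (s - cumulative[i]) / lengths[i]   # fraction along the i-th segment
    point = reference_line[i] + t * segments[i]
    tangent = segments[i] / lengths[i]
    normal = np.array([-tangent[1], tangent[0]])  # left-hand normal
    return point + lateral_offset * normal
```

For example, from_reference_line(line, s=12.0, lateral_offset=-0.5) would place a predicted location 12 m along a lane centerline and 0.5 m to its right under the sign convention assumed here.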
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Traffic Control Systems (AREA)
- Control Of Driving Devices And Active Controlling Of Vehicle (AREA)
- Image Analysis (AREA)
Abstract
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/363,627 US11351991B2 (en) | 2019-03-25 | 2019-03-25 | Prediction based on attributes |
US16/363,541 US11021148B2 (en) | 2019-03-25 | 2019-03-25 | Pedestrian prediction based on attributes |
PCT/US2020/024386 WO2020198189A1 (fr) | 2019-03-25 | 2020-03-24 | Prédiction de piétons à base d'attributs |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3948656A1 (fr) | 2022-02-09 |
Family
ID=70289862
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20719542.1A Pending EP3948656A1 (fr) | 2019-03-25 | 2020-03-24 | Prédiction de piétons à base d'attributs |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP3948656A1 (fr) |
JP (1) | JP2022527072A (fr) |
CN (1) | CN113632096A (fr) |
WO (1) | WO2020198189A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2601202B (en) * | 2020-09-25 | 2024-03-20 | Motional Ad Llc | Trajectory generation using road network model |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018176000A1 (fr) | 2017-03-23 | 2018-09-27 | DeepScale, Inc. | Data synthesis for autonomous control systems |
US11893393B2 (en) | 2017-07-24 | 2024-02-06 | Tesla, Inc. | Computational array microprocessor system with hardware arbiter managing memory requests |
US10671349B2 (en) | 2017-07-24 | 2020-06-02 | Tesla, Inc. | Accelerated mathematical engine |
US11157441B2 (en) | 2017-07-24 | 2021-10-26 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting |
US11409692B2 (en) | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit |
US11561791B2 (en) | 2018-02-01 | 2023-01-24 | Tesla, Inc. | Vector computational unit receiving data elements in parallel from a last row of a computational array |
US11215999B2 (en) | 2018-06-20 | 2022-01-04 | Tesla, Inc. | Data pipeline and deep learning system for autonomous driving |
US11361457B2 (en) | 2018-07-20 | 2022-06-14 | Tesla, Inc. | Annotation cross-labeling for autonomous control systems |
US11636333B2 (en) | 2018-07-26 | 2023-04-25 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
US11562231B2 (en) | 2018-09-03 | 2023-01-24 | Tesla, Inc. | Neural networks for embedded devices |
US11205093B2 (en) | 2018-10-11 | 2021-12-21 | Tesla, Inc. | Systems and methods for training machine models with augmented data |
US11196678B2 (en) | 2018-10-25 | 2021-12-07 | Tesla, Inc. | QOS manager for system on a chip communications |
US11816585B2 (en) | 2018-12-03 | 2023-11-14 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles |
US11537811B2 (en) | 2018-12-04 | 2022-12-27 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
US11610117B2 (en) | 2018-12-27 | 2023-03-21 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
US11150664B2 (en) | 2019-02-01 | 2021-10-19 | Tesla, Inc. | Predicting three-dimensional features for autonomous driving |
US10997461B2 (en) | 2019-02-01 | 2021-05-04 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
US11567514B2 (en) | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
US10956755B2 (en) | 2019-02-19 | 2021-03-23 | Tesla, Inc. | Estimating object properties using visual image data |
CN112406905B (zh) * | 2020-09-10 | 2022-01-28 | 腾讯科技(深圳)有限公司 | Vehicle-based data processing method and apparatus, computer, and storage medium |
CN112785845B (zh) * | 2020-12-30 | 2022-11-01 | 桂林电子科技大学 | Vehicle speed prediction method based on K-means clustering and an RBF neural network |
EP4131180A1 (fr) * | 2021-08-05 | 2023-02-08 | Argo AI, LLC | Method and system for predicting actor trajectories with respect to a drivable area |
CN114743157B (zh) * | 2022-03-30 | 2023-03-03 | 中科融信科技有限公司 | Video-based pedestrian monitoring method, apparatus, device, and medium |
FR3140701A1 (fr) * | 2022-10-11 | 2024-04-12 | Psa Automobiles Sa | Methods and systems for driving when approaching a pedestrian crossing |
US20240174266A1 (en) * | 2022-11-30 | 2024-05-30 | Zoox, Inc. | Prediction model with variable time steps |
US20240253620A1 (en) * | 2023-01-31 | 2024-08-01 | Zoox, Inc. | Image synthesis for discrete track prediction |
WO2024176651A1 (fr) * | 2023-02-24 | 2024-08-29 | 住友電気工業株式会社 | Walking trajectory analysis device, movement trajectory analysis device, analysis method, and computer program |
- 2020
- 2020-03-24 JP JP2021557327A patent/JP2022527072A/ja active Pending
- 2020-03-24 WO PCT/US2020/024386 patent/WO2020198189A1/fr unknown
- 2020-03-24 CN CN202080023879.5A patent/CN113632096A/zh active Pending
- 2020-03-24 EP EP20719542.1A patent/EP3948656A1/fr active Pending
Also Published As
Publication number | Publication date |
---|---|
CN113632096A (zh) | 2021-11-09 |
WO2020198189A1 (fr) | 2020-10-01 |
JP2022527072A (ja) | 2022-05-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11351991B2 (en) | | Prediction based on attributes |
US11021148B2 (en) | | Pedestrian prediction based on attributes |
US11631200B2 (en) | | Prediction on top-down scenes based on action data |
EP3948656A1 (fr) | | Prédiction de piétons à base d'attributs |
US11734832B1 (en) | | Prediction on top-down scenes based on object motion |
EP3908493B1 (fr) | | Occlusion prediction and trajectory evaluation |
US12115990B2 (en) | | Trajectory modifications based on a collision zone |
US11169531B2 (en) | | Trajectory prediction on top-down scenes |
US11643073B2 (en) | | Trajectory modifications based on a collision zone |
US11708093B2 (en) | | Trajectories with intent |
US11554790B2 (en) | | Trajectory classification |
US20220274625A1 (en) | | Graph neural networks with vectorized object representations in autonomous vehicle systems |
WO2022093891A1 (fr) | | Collision avoidance planning system |
US12012108B1 (en) | | Prediction models in autonomous vehicles using modified map data |
JP2024526614A (ja) | | Techniques for identifying curbs |
US12060082B1 (en) | | Machine learned interaction prediction from top-down representation |
WO2021225822A1 (fr) | | Trajectory classification |
US12039784B1 (en) | | Articulated object determination |
US11772643B1 (en) | | Object relevance determination |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| 17P | Request for examination filed | Effective date: 20210824 |
| AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| DAV | Request for validation of the european patent (deleted) | |
| DAX | Request for extension of the european patent (deleted) | |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
| 17Q | First examination report despatched | Effective date: 20230531 |