CN113632096A - Attribute-based pedestrian prediction

Attribute-based pedestrian prediction

Info

Publication number
CN113632096A
Authority
CN
China
Prior art keywords
location
time
determining
environment
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080023879.5A
Other languages
Chinese (zh)
Inventor
M·加法里安扎德
L·M·汉森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zoox Inc
Original Assignee
Zoox Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/363,627 external-priority patent/US11351991B2/en
Priority claimed from US16/363,541 external-priority patent/US11021148B2/en
Application filed by Zoox Inc filed Critical Zoox Inc
Publication of CN113632096A


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588 - Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Traffic Control Systems (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)
  • Image Analysis (AREA)

Abstract

Techniques are discussed for predicting the location of an object based on properties of the object and/or based on properties of other object(s) proximate to the object. The techniques may predict the position of pedestrians near a crosswalk as they cross or are preparing to cross the crosswalk. The techniques may predict a location of an object as the object traverses an environment. Attributes may include information about the object such as location, velocity, acceleration, classification, heading, relative distance from a region or other object, bounding box, and the like. Attributes of an object may be determined over time such that when a series of attributes are input to a prediction component (e.g., a machine learning model), the prediction component may output, for example, a predicted location of the object at a future time. A vehicle (e.g., an autonomous vehicle) may be controlled to traverse the environment based on the predicted location.

Description

Attribute-based pedestrian prediction
Cross Reference to Related Applications
This patent application claims priority to U.S. Utility Patent Application Serial No. 16/363,541 and U.S. Utility Patent Application Serial No. 16/363,627, both filed on March 25, 2019. Application Serial Nos. 16/363,541 and 16/363,627 are incorporated herein by reference in their entirety.
Background
Predictive techniques may be used to determine the future state of an entity in an environment. That is, predictive techniques may be used to determine how a particular entity is likely to behave in the future. Current prediction techniques typically involve physics-based modeling or traffic regulation simulations to predict the future state of an entity in the environment.
Drawings
The detailed description is described with reference to the accompanying drawings. In the drawings, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears. The same reference numbers in different drawings identify similar or identical components or features.
FIG. 1 is an image flow diagram of an example process for capturing sensor data, determining an attribute associated with an object, determining a predicted location based on the attribute, and controlling a vehicle based on the predicted location.
FIG. 2 illustrates an example of attributes of an object.
FIG. 3A illustrates an example of determining a destination associated with an object in an environment.
FIG. 3B illustrates another example of determining a destination associated with an object in an environment.
FIG. 4 illustrates an example of determining predicted location(s) of an object based on attributes of the object over time.
FIG. 5 illustrates an example of updating a frame of reference used to determine predicted location(s).
FIG. 6 is an image flow diagram of an example process for capturing sensor data, determining that a first object and a second object are in an environment, determining an attribute associated with the second object, determining a predicted position based on the attribute and a reference line, and controlling a vehicle based on the predicted position.
Fig. 7 shows an example of attributes of an object.
Fig. 8 illustrates an example of determining predicted location(s) of a first object based on attributes of a second object over time.
FIG. 9 depicts a block diagram of an example system for implementing the techniques described herein.
FIG. 10 depicts an example process for capturing sensor data, determining attributes associated with an object, determining a predicted location based on the attributes, and controlling a vehicle based on the predicted location.
FIG. 11 depicts an example process for capturing sensor data, determining that a first object and a second object are in an environment, determining an attribute associated with the second object, determining a predicted position based on the attribute and a reference line, and controlling a vehicle based on the predicted position.
Detailed Description
The present disclosure relates to methods and systems for predicting the location of an object based on attributes of the object and/or based on attributes of other object(s) proximate to the object. In a first example, the techniques discussed herein may be implemented to predict the location of a pedestrian proximate to a crosswalk area in an environment as the pedestrian traverses or prepares to traverse the crosswalk area. In a second example, the techniques discussed herein may be implemented to predict the location of an object (e.g., a vehicle) as the vehicle traverses an environment. For example, the predicted location of the vehicle may be based on attributes of the vehicle and attributes of other vehicles in the environment that are proximate to the vehicle. Attributes may include information about the object including, but not limited to, position, velocity, acceleration, bounding box, and the like. Attributes may be determined for an object over time (e.g., at times T-M, ..., T-2, T-1, T0), such that when the attributes are input to a prediction component (e.g., a machine learning model such as a neural network), the prediction component can output a prediction for a future time (e.g., times T1, T2, T3, ..., TN), such as a predicted location of the object. A vehicle (e.g., an autonomous vehicle) may be controlled to traverse the environment based at least in part on the predicted location of the object(s).
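As a minimal illustrative sketch of the data flow described above (the attribute layout, the stand-in model, and the function names are assumptions chosen for readability, not the claimed implementation), per-time-step attribute vectors for times T-M through T0 can be stacked and handed to a prediction component that returns one predicted position per future time T1 through TN:

    import numpy as np

    def predict_future_positions(attribute_history, prediction_model, num_future_steps):
        # Stack per-time-step attribute vectors (oldest first, T-M .. T0) and
        # query a prediction component for positions at T1 .. TN.
        # attribute_history: list of 1-D numpy arrays, one per past time step.
        # prediction_model: callable mapping a (num_past, num_attrs) array to a
        #                   (num_future_steps, 2) array of (x, y) positions.
        features = np.stack(attribute_history, axis=0)
        predicted = prediction_model(features)
        assert predicted.shape == (num_future_steps, 2)
        return predicted

    # Illustrative stand-in for a trained model (e.g., a neural network):
    # extrapolate the last observed position using the last observed velocity.
    def constant_velocity_model(features):
        x, y, vx, vy = features[-1, :4]          # attributes at T0: x, y, vx, vy
        dt = np.arange(1, 4)                     # T1, T2, T3 (seconds ahead)
        return np.stack([x + vx * dt, y + vy * dt], axis=-1)

    history = [np.array([0.0, -2.0, 0.0, 1.2]),  # attributes at T-2
               np.array([0.0, -1.1, 0.0, 1.1]),  # attributes at T-1
               np.array([0.0,  0.0, 0.0, 1.0])]  # attributes at T0
    print(predict_future_positions(history, constant_velocity_model, 3))

In practice the constant-velocity stand-in would be replaced by a trained model such as the neural networks described below; the interface of one input sequence producing a set of future positions is the point of the sketch.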
As described above and in a first example, the techniques discussed herein may be implemented to predict the location of a pedestrian in an environment proximate to a crosswalk area as the pedestrian crosses or prepares to cross the crosswalk area. For example, sensor data may be captured in the environment, and an object may be identified and classified as a pedestrian. Further, the crosswalk area may be identified in the environment based on map data and/or based on sensor data (e.g., identifying the crosswalk area from the sensor data, whether directly by observing a visual indicator of the crosswalk area (stripes, a crosswalk symbol, etc.) or indirectly via historical detections of pedestrians crossing the road at such a location). At least one destination may be associated with a crosswalk area. For example, where a pedestrian is located on a sidewalk near a crosswalk, the destination may represent the opposite side of the street within the crosswalk area. Where the pedestrian is located in the street (within or outside a crosswalk area), the destination may be selected or otherwise determined based on attributes of the pedestrian (e.g., location, speed, acceleration, heading, etc.). In the case of multiple crosswalk areas that are close to each other, a score associated with the likelihood that the pedestrian will cross a particular crosswalk may be based on attributes of the pedestrian (e.g., position, speed, acceleration, heading, etc.). The crosswalk area associated with the highest score may be selected or otherwise determined as the target crosswalk associated with the pedestrian.
In some examples, the destination associated with the pedestrian may be determined based on several factors, such as in the case of a jaywalking pedestrian or a pedestrian crossing a road where a crosswalk area is not readily identifiable. For example, the destination may be determined based at least in part on one or more of the following: a linear extrapolation of the pedestrian's velocity, the closest location of a sidewalk area associated with the pedestrian, a gap between parked vehicles, an open door associated with a vehicle, and the like. In some examples, sensor data of the environment may be captured to determine a likelihood that such candidate destinations exist in the environment. In some examples, a score may be associated with each candidate destination, and the likely destination(s) may be used in accordance with the techniques discussed herein.
When a crosswalk area (or other location) has been determined to be the destination of a pedestrian, the techniques may include predicting the location(s) of the pedestrian as the pedestrian traverses the crosswalk area over time. In some examples, the attributes of the object may be determined over time (e.g., at times T-M, ..., T-2, T-1, T0) and may be represented in a frame of reference associated with the object at time T0. That is, the location of the object at time T0 may be treated as an origin (e.g., coordinates (0, 0) in an x-y coordinate system), and a first axis may be defined by the origin and a destination associated with the crosswalk area. In some examples, other points may be used as the origin of another frame of reference. As described above, where a pedestrian is located on a first side of a street, the destination associated with the crosswalk area may be selected as a point on a second side of the street opposite the first side, although any destination may be selected. A second axis of the frame of reference may be perpendicular to the first axis and, at least in some examples, may lie in a plane containing the crosswalk area.
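One possible construction of such a destination-aligned frame of reference is sketched below, assuming 2-D positions; the helper names are illustrative and not taken from the application. The origin is the object's location at T0, the x-axis points toward the destination, and the y-axis is perpendicular to it:

    import numpy as np

    def destination_frame(origin_xy, destination_xy):
        # Rows of the returned 2x2 matrix are the frame's axes: the x-axis is the
        # unit vector from the object at T0 toward the destination, and the
        # y-axis is perpendicular to it.
        x_axis = np.asarray(destination_xy, dtype=float) - np.asarray(origin_xy, dtype=float)
        x_axis /= np.linalg.norm(x_axis)
        y_axis = np.array([-x_axis[1], x_axis[0]])   # 90-degree rotation
        return np.stack([x_axis, y_axis])

    def to_frame(point_xy, origin_xy, destination_xy):
        # Express a world-frame point in the destination-aligned frame of reference.
        rotation = destination_frame(origin_xy, destination_xy)
        return rotation @ (np.asarray(point_xy, dtype=float) - np.asarray(origin_xy, dtype=float))

    # The pedestrian at T0 maps to (0, 0); the destination lies on the +x axis.
    origin, destination = (3.0, 4.0), (9.0, 12.0)
    print(to_frame(origin, origin, destination))        # [0. 0.]
    print(to_frame(destination, origin, destination))   # [10. 0.]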
In some examples, the attributes of the pedestrian may be determined based on sensor data captured over time and may include, but are not limited to, one or more of the following: a location of the pedestrian at a time (e.g., where the location may be represented in the frame of reference discussed above), a speed of the pedestrian at the time (e.g., a magnitude and/or an angle relative to the first axis (or other reference line)), an acceleration of the pedestrian at the time, an indication of whether the pedestrian is in a drivable area (e.g., whether the pedestrian is on a sidewalk or in the road), an indication of whether the pedestrian is in the crosswalk area, an area control indicator state (e.g., whether an intersection is controlled by a traffic signal and/or whether the crosswalk is controlled by a traffic signal (e.g., permit/prohibit crossing), and/or a state of the traffic signal), a vehicle context (e.g., a presence of a vehicle in the environment and attribute(s) associated with the vehicle), a flux through the crosswalk area over a period of time (e.g., a number of vehicles or other objects that pass through the crosswalk area over the period of time), an object association (e.g., whether the pedestrian is traveling in a group of pedestrians), a distance to the crosswalk in a first direction (e.g., a global x-direction distance or an x-direction distance based on the frame of reference), a distance to the crosswalk in a second direction (e.g., a global y-direction distance or a y-direction distance based on the frame of reference), a distance to the road in the crosswalk area (e.g., a shortest distance to the road within the crosswalk area), a pedestrian gesture, pedestrian gaze detection, an indication of whether the pedestrian is standing, walking, running, etc., whether there are other pedestrians in the crosswalk, a crosswalk flux (e.g., the number of pedestrians that pass through the crosswalk (e.g., traverse the drivable area) over a period of time), a ratio of a first number of pedestrians on the sidewalk (or non-drivable area) to a second number of pedestrians in the crosswalk area (or drivable area), variances, confidences, and/or probabilities associated with each attribute, and the like.
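A hypothetical per-time-step container for a subset of the attributes listed above might look like the following sketch (the field names and the chosen subset are assumptions; any of the enumerated attributes could be added in the same way):

    from dataclasses import dataclass, astuple

    @dataclass
    class PedestrianAttributes:
        # A subset of the per-time-step attributes described above, expressed in
        # the destination-aligned frame of reference.
        x: float                      # position along the axis toward the destination
        y: float                      # lateral position
        speed: float                  # magnitude of the velocity
        velocity_angle: float         # angle between velocity and the reference line (radians)
        in_drivable_area: bool        # pedestrian in the road rather than on the sidewalk
        in_crosswalk: bool            # pedestrian inside the crosswalk area
        distance_to_road: float       # shortest distance to the road within the crosswalk
        distance_to_destination: float
        crosswalk_flux: int           # vehicles through the crosswalk over a recent window

        def as_feature_vector(self):
            return [float(value) for value in astuple(self)]

    attrs_t0 = PedestrianAttributes(0.0, 0.0, 1.4, 0.1, False, False, 2.5, 12.0, 3)
    print(attrs_t0.as_feature_vector())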
The attributes may be determined over time (e.g., at times T-M, ..., T-2, T-1, T0, where M is an integer; these times may represent any time(s) prior to the current time and/or may include the current time, such as, but not limited to, 0.01 seconds, 0.1 seconds, 1 second, 2 seconds, etc., before the current time) and may be input to a prediction component to determine the predicted location of the pedestrian. In some examples, the prediction component is a machine learning model, such as a neural network, a fully-connected neural network, a convolutional neural network, a recurrent neural network, and so forth.
In some examples, the prediction component may output future information associated with the pedestrian. For example, the prediction component may output prediction information for a future time (e.g., times T1, T2, T3, ..., TN, where N is an integer representing any time(s) after the current time). In some examples, the prediction information may include predicted location(s) of the pedestrian at the future time(s). For example, a predicted location may be represented in the frame of reference as a distance (e.g., a distance s) between the origin (e.g., the location of the pedestrian at T0) and the location of the pedestrian at T1, and/or as a lateral offset (ey) relative to the first axis (e.g., relative to a reference line). In some examples, the distance s and/or the lateral offset ey may be expressed as rational numbers (e.g., 0.1 meters, 1 meter, 1.5 meters, etc.). In some examples, the distance s and/or the lateral offset may be binned (e.g., input to a binning algorithm) to discretize the raw values into one or more discrete intervals. In some examples, the bins for the distance s may span ranges of 0-1 meter, 1-2 meters, 3-4 meters, etc., although any regular or irregular intervals may be used for such ranges.
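Because a prediction expressed as a distance s and a lateral offset ey is relative to the first axis, recovering a world-frame point is a single change of basis; the sketch below assumes the same 2-D frame construction as in the earlier example and is illustrative only:

    import numpy as np

    def from_frame(s, e_y, origin_xy, destination_xy):
        # Map a prediction (distance s along the axis toward the destination,
        # lateral offset e_y) back to world coordinates.
        direction = np.asarray(destination_xy, dtype=float) - np.asarray(origin_xy, dtype=float)
        direction /= np.linalg.norm(direction)
        normal = np.array([-direction[1], direction[0]])
        return np.asarray(origin_xy, dtype=float) + s * direction + e_y * normal

    # A pedestrian predicted 1.5 m along the crossing with a 0.2 m lateral drift:
    print(from_frame(1.5, 0.2, origin_xy=(3.0, 4.0), destination_xy=(9.0, 12.0)))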
In some examples, a vehicle (e.g., an autonomous vehicle) may be controlled to traverse the environment based at least in part on the predicted location of the pedestrian(s).
As introduced above and in a second example, the techniques discussed herein may be implemented to predict a location of an object (e.g., a vehicle) as the vehicle traverses an environment. For example, sensor data may be captured in the environment, and objects may be identified and classified as vehicles. Further, the reference line may be identified and associated with the vehicle based on map data (e.g., identifying a drivable area, such as a lane) and/or based on sensor data (e.g., identifying a drivable area or lane from the sensor data). As should be appreciated, an environment may include any number of objects. For example, a target object or target vehicle (e.g., a vehicle, the subject of such predictive techniques) may traverse an environment in which other vehicles are in proximity to the target vehicle. In some examples, the techniques may include identifying K objects closest to the target object (where K is an integer). For example, the techniques may include identifying the 5 vehicles or other objects closest to the target vehicle, although any number of vehicles or other objects may be identified or otherwise determined. In some examples, the techniques may include identifying objects within a threshold distance from a target object. In some examples, the vehicle that captured the sensor data may be identified as one of the objects that is near the target vehicle. In at least some examples, additional features may be used to determine which objects to consider. As a non-limiting example, objects with a particular classification (e.g., not a vehicle) or the like may be ignored when considering the K nearest objects.
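Selecting which nearby objects provide context for the target object could, for example, combine a classification filter, a distance threshold, and a top-K cut, as in the following assumed sketch (the dictionary keys and default values are illustrative, not part of the application):

    import numpy as np

    def select_context_objects(target_xy, objects, k=5, max_distance=50.0,
                               allowed_classes=("vehicle",)):
        # Return up to k objects nearest the target, keeping only allowed
        # classifications that lie within max_distance (meters).
        # objects: iterable of dicts with at least "position" (x, y) and "class".
        candidates = []
        for obj in objects:
            if obj["class"] not in allowed_classes:
                continue
            dist = float(np.linalg.norm(np.asarray(obj["position"]) - np.asarray(target_xy)))
            if dist <= max_distance:
                candidates.append((dist, obj))
        candidates.sort(key=lambda pair: pair[0])
        return [obj for _, obj in candidates[:k]]

    nearby = select_context_objects(
        target_xy=(0.0, 0.0),
        objects=[{"position": (3.0, 0.0), "class": "vehicle"},
                 {"position": (1.0, 1.0), "class": "pedestrian"},
                 {"position": (60.0, 0.0), "class": "vehicle"}])
    print(len(nearby))   # 1: the pedestrian is filtered out, the far vehicle exceeds the threshold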
In some examples, attributes may be determined for the target object and/or other object(s) proximate to the target object. For example, the attributes may include, but are not limited to, one or more of the following: a speed of an object at a time, an acceleration of the object at the time, a location of the object at the time (e.g., in global or local coordinates), a bounding box associated with the object at the time (e.g., representing the extent(s), roll, pitch, and/or yaw of the object), a lighting state associated with the object at a first time (e.g., headlamp(s), brake lamp(s), hazard warning lamp(s), turn indicator(s), back-up lamp(s), etc.), a wheel orientation of a vehicle, a distance between the object and a map element at the time (e.g., a distance from a stop line, a traffic line, a speed bump, a yield line, an intersection, a lane, etc.), a classification of the object (e.g., car, vehicle, animal, truck, bicycle, etc.), a characteristic associated with the object (e.g., whether the object is changing lanes, whether the object is a double-parked vehicle, etc.), a lane type (e.g., a lane direction, a parking lane), a road marking (e.g., an indication of whether overtaking or lane changing is allowed, etc.), and the like.
In some examples, attribute information associated with the target object and/or the other objects proximate to the target object may be captured over time and may be input to a prediction component to determine prediction information associated with the target object. In some cases, the prediction information may represent predicted locations of the target object at different time intervals (e.g., predicted locations at times T1, T2, T3, ..., TN).
In some examples, the predicted location(s) may be compared to candidate reference lines in the environment to determine a reference line associated with the target object. For example, the environment may include two lanes, which may be eligible (e.g., legal) drivable areas for the target vehicle to traverse. Further, a drivable area may be associated with a representative reference line (e.g., a center of the lane or drivable area). In some examples, the predicted location(s) may be compared to the reference line(s) to determine a similarity score between the predicted location(s) and the candidate reference line(s). In some examples, the similarity score may be based at least in part on a distance between a predicted location and the reference line, and the like. In some examples, attributes associated with the object (e.g., at times T-M, ..., T-1, T0) may be input to a reference line prediction component, which may output a likely reference line associated with the object. The techniques may include receiving, selecting, or otherwise determining a reference line and representing the predicted location(s) relative to the reference line in the environment. That is, a predicted location may be represented as a distance s along the reference line, which represents the distance between the position of the target object at time T0 and the predicted position of the target object at a future time (e.g., time T1). A lateral offset ey may represent the distance, measured along a line perpendicular to a tangent of the reference line, between the reference line and the predicted location.
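One way to associate predicted locations with a candidate reference line, sketched under the assumption that a reference line is a 2-D polyline, is to project each prediction onto the polyline to obtain (s, ey) and to score candidates by their mean absolute lateral offset; the scoring rule below is an illustrative choice, not the claimed similarity metric:

    import numpy as np

    def project_to_polyline(point, polyline):
        # Return (s, e_y): arc length along the polyline to the closest point,
        # and the signed lateral offset of `point` from that closest point.
        point = np.asarray(point, dtype=float)
        best_s, best_offset, best_dist = 0.0, 0.0, np.inf
        arc = 0.0
        for a, b in zip(polyline[:-1], polyline[1:]):
            a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
            segment = b - a
            length = float(np.linalg.norm(segment))
            direction = segment / length
            t = float(np.clip(np.dot(point - a, segment) / length ** 2, 0.0, 1.0))
            offset = point - (a + t * segment)
            lateral = direction[0] * offset[1] - direction[1] * offset[0]   # signed, left is positive
            dist = float(np.linalg.norm(offset))
            if dist < best_dist:
                best_s, best_offset, best_dist = arc + t * length, lateral, dist
            arc += length
        return best_s, best_offset

    def similarity_score(predicted_points, polyline):
        # Smaller mean absolute lateral offset -> higher similarity.
        offsets = [abs(project_to_polyline(p, polyline)[1]) for p in predicted_points]
        return 1.0 / (1.0 + float(np.mean(offsets)))

    lane_center = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
    predictions = [(2.0, 0.1), (4.0, -0.2), (6.0, 0.0)]
    print(similarity_score(predictions, lane_center))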
The prediction technique may be iteratively or concurrently repeated to determine the predicted location(s) associated with the object in the environment. That is, the first target object may be associated with a first subset of objects in the environment, and the second target object may be associated with a second subset of objects in the environment. In some cases, the first target object may be included in the second subset of objects, and the second target object may be included in the first subset of objects. The predicted location may then be determined for a plurality of objects in the environment. In some cases, the predicted positions may be determined substantially simultaneously, within a technical tolerance.
In some examples, a vehicle (e.g., an autonomous vehicle) may be controlled to traverse the environment based at least in part on the predicted location of the object(s). For example, with knowledge of the predicted location(s) of objects in the environment, such predicted location(s) may be input to a planning component of the vehicle to traverse the environment.
The techniques discussed herein may improve the functionality of a computing device (e.g., a computing device of an autonomous vehicle) in a number of additional ways. In some examples, determining attributes and inputting the attributes to a prediction component (e.g., a machine learning component) may avoid hard-coded rules that may otherwise be inflexible in representing the environment. In some cases, determining the predicted location(s) associated with an object in the environment (e.g., a pedestrian or a vehicle) may allow other vehicles or objects to better plan trajectories, which can ensure safe and comfortable movement through the environment. For example, predicted location(s) suggesting a likelihood of collision(s) or a near-collision may allow the autonomous vehicle to alter its trajectory (e.g., change lanes, stop, etc.) in order to safely traverse the environment. These and other improvements to computing device functionality are discussed herein.
The techniques described herein may be implemented in a variety of ways. Exemplary embodiments are provided below with reference to the following drawings. Although discussed in the context of autonomous vehicles, the methods, apparatus, and systems described herein may be applied to various systems (e.g., sensor systems or robotic platforms) and are not limited to autonomous vehicles. In one example, similar techniques may be used in a vehicle controlled by a driver, where such a system may provide an indication of whether it is safe to perform various actions. In another example, the techniques may be used in the context of a manufacturing assembly line, or in the context of aeronautical measurements. Further, the techniques described herein may be used for real data (e.g., data captured using sensor (s)), simulated data (e.g., data generated by a simulator), or any combination of the two.
FIG. 1 is an image flow diagram of an example process 100 for capturing sensor data, determining an attribute associated with an object, determining a predicted location based on the attribute, and controlling a vehicle based on the predicted location.
At operation 102, the process may include capturing sensor data of an environment. In some examples, the sensor data may be captured by one or more sensors on the vehicle (autonomous or otherwise). For example, the sensor data may include data captured by a lidar sensor, an image sensor, a radar sensor, a time-of-flight sensor, a sonar sensor, and so forth. In some examples, operation 102 may include determining a category of the object (e.g., to determine that the object is a pedestrian in the environment).
At operation 104, the process may include determining a destination associated with an object (e.g., a pedestrian). Example 106 shows a vehicle 108 and an object 110 (e.g., a pedestrian) in an environment. In some examples, the vehicle 108 may perform the operations discussed in the flow 100.
Operation 104 may include determining attributes of object 110 to determine a position, a speed, a heading, etc. of object 110. Further, operation 104 may include accessing map data to determine whether a crosswalk area (e.g., crosswalk area 112) exists in the environment. In some examples, the crosswalk area 112 may represent a perimeter of a crosswalk in the environment. In some examples, operation 104 may include determining that the object is within a threshold distance (e.g., 5 meters) of a portion of the crosswalk area 112. In some examples, the threshold distance may be considered a minimum distance from the object to any portion of the crosswalk region. If the object 110 is within a threshold distance of multiple pedestrian crossing regions in the environment, operation 104 may include determining a probability or score associated with a pedestrian (e.g., the object 110) crossing the respective pedestrian crossing region and selecting a most likely pedestrian crossing region. In some cases, the destination 114 may be associated with the crosswalk area 112. In some examples, destination 114 may represent the center of crosswalk area 112 or the midpoint of the side opposite the location of object 110, although destination 114 may represent any point in the environment associated with crosswalk area 112. Additional details regarding determining a destination will be discussed in fig. 3A and 3B and in this disclosure.
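The threshold-based association of a pedestrian with a crosswalk area can be sketched as a minimum-distance test against each candidate area; the axis-aligned-rectangle approximation below is an assumption made to keep the example short:

    import numpy as np

    def distance_to_rectangle(point, rect_min, rect_max):
        # Distance from a point to an axis-aligned rectangular crosswalk area
        # (0.0 if the point is inside the rectangle).
        p = np.asarray(point, dtype=float)
        clamped = np.minimum(np.maximum(p, rect_min), rect_max)
        return float(np.linalg.norm(p - clamped))

    def associate_crosswalk(pedestrian_xy, crosswalk_rects, threshold=5.0):
        # Return the index of the closest crosswalk within `threshold` meters, or None.
        distances = [distance_to_rectangle(pedestrian_xy, lo, hi) for lo, hi in crosswalk_rects]
        best = int(np.argmin(distances))
        return best if distances[best] <= threshold else None

    crosswalks = [((0.0, 0.0), (3.0, 10.0)), ((20.0, 0.0), (23.0, 10.0))]
    print(associate_crosswalk((5.0, 4.0), crosswalks))   # 0 (2.0 m from the first area)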
At operation 116, the flow may include determining attribute(s) associated with the object. As shown in example 118, attributes of the object 110 may be determined at different time instances up to and including the most recent time associated with the attributes (e.g., at times T-M, ..., T-2, T-1, T0). The object 110 may be referred to as object 120 (e.g., at time T-2), object 122 (e.g., at time T-1), and object 124 (e.g., at time T0). In some examples, time T0 may represent the time at which data is input to the prediction component (discussed below), time T-1 may represent 1 second before time T0, and time T-2 may represent 2 seconds before time T0. However, it will be appreciated that times T0, T-1, and T-2 may represent any time instances and/or time periods. For example, time T-1 may represent 0.1 seconds before time T0, and time T-2 may represent 0.2 seconds before time T0. In some examples, the attributes determined in operation 116 may include, but are not limited to, information about the objects 120, 122, and/or 124. For example, a velocity attribute associated with the object 120 may represent the velocity of the object at time T-2, a velocity attribute associated with the object 122 may represent the velocity of the object at time T-1, and a velocity attribute associated with the object 124 may represent the velocity of the object at time T0. In some examples, some or all of the attributes may be represented in a frame of reference based on the object 124 (e.g., the object 110 at time T0) and the destination 114. In such an example, there may be three unique frames of reference, one associated with each previous time step (T-M to T0), and each attribute may be associated with the frame of reference at that particular time. Additional details regarding attributes are discussed in conjunction with FIG. 2 and throughout the disclosure.
At operation 126, the flow may include determining predicted location(s) associated with the object based on the attribute(s). Example 128 shows a predicted location 130 (e.g., the predicted location of the object 110 at a time T1 after time T0). In some examples, because operation 126 may be executed at or near time T0, time T1 may represent a future location of the object 110. It is to be appreciated that, in some examples, operation 126 may include determining predicted locations of the object 124 at a plurality of future times. For example, operation 126 may include determining predicted locations of the object at times T1, T2, T3, ..., TN, where N is an integer and the times may represent, for example, 1 second, 2 seconds, 3 seconds, etc. into the future. In some examples, the predicted position(s) may be expressed as a distance s along a reference line and a lateral offset ey from the reference line. In at least some examples, the distance s and the offset ey may be relative to a relative coordinate system defined at each time step and/or may be relative to a last determined frame of reference. Additional details of determining the predicted location(s) are discussed in conjunction with FIGS. 4 and 5, and throughout the disclosure.
In some examples, the operations 102, 104, 116, and/or 126 may be performed iteratively or repeatedly (e.g., at each time step, at a frequency of 10 Hz (hertz), etc.), although the process 100 may be performed at any interval or at any time.
At operation 132, the process may include controlling the vehicle based at least in part on the predicted location(s). In some examples, operation 132 may include generating a trajectory for the vehicle 108 to follow (e.g., stopping before the intersection and/or before the crosswalk area 112 to allow the pedestrian 110 to traverse the crosswalk area 112 to the destination 114).
FIG. 2 illustrates an example 200 of attributes of an object. In some cases, attributes 202 may represent various information about or associated with objects in the environment (e.g., the object 110 of FIG. 1). In some cases, the attributes 202 may be determined for one or more time instances associated with the object. For example, the object 120 represents the object 110 at time T-2, the object 122 represents the object 110 at time T-1, and the object 124 represents the object 110 at time T0. For example, attributes may be determined for the object at each of the time instances T-2, T-1, and T0.
Examples of attributes 202 include, but are not limited to, distance between an object and a road, x-distance (or first distance) to a region, y-distance (or second distance) to a region, distance to a destination, speed (magnitude), speed (angle), x-position, y-position, regional flux, regional control indicator status, vehicle context (or, generally, object environment), object association, and so forth. In at least some examples, the attributes discussed herein may be relative to a relative coordinate system defined at each time step (e.g., associated with objects 120, 122, 124, respectively), relative to a last determined reference frame, relative to a reference frame defined relative to vehicle 108 (e.g., at the respective time step (s)), relative to a global coordinate reference frame, and/or the like.
Example 204 illustrates various attributes associated with the object 124. For example, example 204 shows attributes relative to the crosswalk area 112 and the destination 114. In some examples, the x-distance to a region may correspond to distance 206. That is, the distance 206 may represent the distance between the object 124 and the edge of the crosswalk region 112 closest to the object 124 in a first direction (which may be within a global or local frame of reference). In some examples, the y-distance to a region may correspond to distance 208. That is, the distance 208 may represent the distance between the object 124 and the edge of the crosswalk area 112 in a second direction. In at least some examples, a minimum distance between the object 124 and the crosswalk region may be determined and then decomposed into corresponding x- and y-components as the x- and y-distances, respectively.
As shown in example 204, the object 124 is located on a sidewalk area 210 (or, in general, a non-drivable area 210). In some cases, the crosswalk area 112 may provide a path across the road 212 (or, in general, the drivable area 212). In some examples, the distance from the road may correspond to a distance 214, which may correspond to the shortest or minimum distance between the object 124 and a portion of the road 212 within the crosswalk region 112.
In some cases, the distance to the destination may correspond to distance 216. As shown, distance 216 represents the distance between object 124 and destination 114.
As described above, in some examples, the attribute(s) 202 may be represented in a frame of reference. As discussed herein, the frame of reference may be defined relative to the position of the object at each time step, relative to the last frame of reference, relative to a global coordinate system, and so on. In some examples, the origin of the frame of reference may correspond to the location of the object 124. Example 218 illustrates a frame of reference 220 (also referred to as the reference frame 220). In some examples, the first axis of the frame of reference 220 is defined by a unit vector originating at the location of the object 124 and pointing in the direction of the destination 114. The first axis is labeled as the x-axis in example 218. In some examples, the second axis may be perpendicular to the first axis and may lie within a plane that includes the crosswalk. The second axis is labeled as the y-axis in example 218. In some examples, the first axis may represent a reference line, the distance s may be determined relative to the reference line, and the lateral offset ey may be determined relative to the second axis (e.g., the y-axis).
Example 222 shows a velocity vector 224 associated with object 124 and an angle 226 representing an angle between velocity vector 224 and a reference line. In some examples, the reference line may correspond to a first axis of the frame of reference 220, although any reference line may be selected or otherwise determined.
As discussed herein, attributes associated with the objects 124, 122, and 120 may be represented relative to the frame of reference 220. That is, the x-position and y-position of the object 124 at time T0 may be represented as (0, 0) (e.g., the object 124 represents the origin of the frame of reference 220). Further, the x-position and y-position of the object 122 (at time T-1) relative to the frame of reference 220 can be expressed as (-x1, -y1), and the x-position and y-position of the object 120 (at time T-2) can be expressed as (-x2, -y2). In at least some examples, a single coordinate system may be used, while in other examples, a relative coordinate system may be associated with each point, and attributes may be defined with respect to each relative coordinate system.
As described above, the attributes 202 may include an area flux. In some examples, the area flux may represent the number of objects that pass through the crosswalk area 112 over a period of time. For example, the area flux may correspond to the number J of cars (and/or other objects, such as other pedestrians) that pass through the crosswalk area 112 (or any area) in K seconds (e.g., 5 vehicles passing between T-2 and T0). In some examples, the area flux may represent any time period(s). Further, the area flux may include information regarding the velocity, acceleration, speed, etc. of the vehicles traversing the crosswalk area 112 over the period of time.
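A sliding-window area flux can be computed by counting distinct tracks observed inside the area during the window; the observation format and the rectangle approximation below are assumptions for illustration:

    import numpy as np

    def area_flux(observations, rect_min, rect_max, t_start, t_end):
        # Count distinct object tracks observed inside an axis-aligned crosswalk
        # area between t_start and t_end (seconds).
        # observations: iterable of (track_id, timestamp, (x, y)).
        lo, hi = np.asarray(rect_min, dtype=float), np.asarray(rect_max, dtype=float)
        inside_ids = set()
        for track_id, stamp, position in observations:
            pos = np.asarray(position, dtype=float)
            if t_start <= stamp <= t_end and np.all((lo <= pos) & (pos <= hi)):
                inside_ids.add(track_id)
        return len(inside_ids)

    observations = [("car_1", 0.5, (1.0, 2.0)), ("car_1", 0.6, (1.5, 2.0)),
                    ("car_2", 1.2, (2.0, 5.0)), ("car_3", 3.5, (2.0, 5.0))]
    print(area_flux(observations, (0.0, 0.0), (3.0, 10.0), t_start=0.0, t_end=2.0))   # 2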
Further, attributes 202 may include a zone control indicator. In some examples, the zone control indicators may correspond to the status of traffic signals or indicators that control pedestrian traffic within the pedestrian crossing zone 112. In some examples, the zone control indicator may indicate whether a traffic light is present, a status of the traffic light (e.g., green, yellow, red, etc.), and/or a status of the pedestrian crossing indicator (e.g., permitted to pass, prohibited from passing, unknown, etc.).
In some examples, attributes 202 may include a vehicle context, which may indicate whether a vehicle or other object is near the object (e.g., 124) and attributes associated with any such vehicle or object. In some examples, the vehicle context may include, but is not limited to, speed, direction, acceleration, bounding box, location (e.g., in the frame of reference 220), distance between the object and the object 124, and the like.
In some examples, attributes 202 may include object associations. For example, object association may indicate whether the object 124 is associated with other objects (e.g., whether the object 124 is in a group of people). In some cases, object association attributes 202 may include attributes associated with the associated object.
The attributes 202 may further include, but are not limited to, information associated with acceleration, yaw, pitch, roll, relative speed, relative acceleration, whether the object is in the road 212, whether the object is on the sidewalk 210, whether the object is within the crosswalk area 112, whether the destination is changing (e.g., whether the object is turning at an intersection), the height of the object, whether the object is riding a bicycle, and the like.
Attributes 202 may further include, but are not limited to, gestures by pedestrians, gaze detection by pedestrians, indications of whether pedestrians are standing, walking, running, etc., whether other pedestrians are on the crosswalk, pedestrian crosswalk flux (e.g., the number of pedestrians passing through the crosswalk (e.g., across the drivable region) over a period of time), the ratio of a first number of pedestrians on the sidewalk (or non-drivable region) to a second number of pedestrians on the crosswalk region (or drivable region), variances, confidences, and/or probabilities associated with each attribute, and so forth.
Fig. 3A and 3B illustrate an example of determining a destination associated with an object in an environment. In general, fig. 3A shows a selection between two pedestrian crossing areas, while fig. 3B shows a selection between two destinations associated with a single pedestrian crossing area.
FIG. 3A illustrates an example 300 of determining a destination associated with an object in an environment. As described above, and in general, FIG. 3A illustrates selection between two crosswalk regions. Example 302 illustrates an object 304 and an object 306, where the object 304 may correspond to a pedestrian at time T-1 and the object 306 may correspond to the pedestrian at time T0. For example, a vehicle, such as the vehicle 108, may capture sensor data of the environment and may determine that a pedestrian is in the environment.
Further, based at least in part on objects 304 and 306, the computing system may determine that objects 304 and/or 306 are proximate to one or more pedestrian crossing regions in the environment. For example, a computing device may access map data that may include map element(s) indicating location(s) and extent(s) (e.g., length and width) of such crosswalk areas. Example 302 shows the environment including a first crosswalk region 308 (also referred to as region 308) and a second crosswalk region 310 (also referred to as region 310).
In some cases, region 308 may be associated with a threshold region 312 (also referred to as threshold 312), and region 310 may be associated with a threshold region 314 (also referred to as threshold 314). As shown, objects 304 and 306 are both within thresholds 312 and 314. Based at least in part on objects 304 and/or 306 being within thresholds 312 and 314, the computing device may determine that objects 304 and/or 306 are associated with regions 308 and 310, respectively.
In some cases, the threshold 312 may represent any area or region associated with the region 308. As shown, the threshold 312 may represent a threshold of 5 meters around the area 308, although any distance or shape of the threshold 312 may be associated with the area 308. Likewise, threshold 314 may include any distance or shape associated with region 310.
In some cases, the region 308 may be associated with a destination 316. Further, and in some cases, the area 310 may be associated with a destination 318. In some examples, the location of destination 316 and/or 318 is located opposite the street relative to objects 304 and/or 306. That is, the destination associated with the crosswalk area may be selected based at least in part on the location of the pedestrian relative to the crosswalk area.
Objects 304 and/or 306 may be associated with attributes discussed herein. That is, the techniques may include determining the position, velocity, heading, acceleration, etc. of objects 304 and 306, respectively.
Further, the information represented in example 302 (e.g., attributes associated with objects 304 and/or 306, location(s) of regions 308 and/or 310, locations of thresholds 312 and/or 314, locations of destinations 316 and/or 318, etc.) can be input to a destination prediction component 320. In some cases, the destination prediction component 320 may output a score or probability that the object 306 will cross the region 308 and/or the region 310. Although example 302 illustrates object information associated with two time steps (e.g., T-1 and T0), object information for any time period may be used to determine the destination.
In some examples, attributes associated with the objects 304 and 306 may be input to the destination prediction component 320 in one or more frames of reference. For example, to evaluate the destination 316, attributes associated with the objects 304 and 306 can be input to the destination prediction component 320 using a frame of reference based at least in part on the destination 316. Further, to evaluate the destination 318, attributes associated with the objects 304 and 306 can be input to the destination prediction component 320 using a frame of reference based at least in part on the destination 318.
In some examples, the destination associated with a pedestrian may be determined based on several factors, such as where the pedestrian is jaywalking or traversing a portion of the road that is not readily identifiable as a crosswalk area. For example, the destination may be determined based at least in part on one or more of the following: a linear extrapolation of the pedestrian's velocity, the closest location of a sidewalk area associated with the pedestrian, a gap between parked vehicles, an open door associated with a vehicle, and the like. In some examples, sensor data of the environment may be captured to identify possible destinations in the environment. Further, attributes associated with the object may be represented in a frame of reference based at least in part on the determined destination, and these attributes may be input to the destination prediction component 320 for evaluation, as discussed herein.
Example 322 shows the output of destination prediction component 320. For example, based at least in part on the properties of objects 304 and/or 306, destination prediction component 320 can predict that objects 304 and/or 306 are proceeding toward destination 318.
FIG. 3B illustrates another example 324 of determining a destination associated with an object in an environment. As described above, fig. 3B illustrates selecting between two destinations associated with a single crosswalk area.
Example 324 illustrates an object 326 and an object 328, where the object 326 may correspond to a pedestrian at time T-1 and the object 328 may correspond to the pedestrian at time T0. In some examples, because the objects 326 and 328 are in the road 330 (or drivable area 330), as opposed to being located on the sidewalk 332 (or non-drivable area 332), the computing device may identify two destinations 334 and 336 associated with the area 338. In some examples, attributes associated with the objects 326 and 328 (along with information about the destinations 334 and 336 and the area 338, as well as other information) may be input to the destination prediction component 320 to determine which of the destinations 334 and 336 is most likely. Although depicted in FIG. 3B in the context of a crosswalk for illustrative purposes, such a crosswalk area is not required. As a non-limiting example, the destination prediction component 320 can generally determine that a pedestrian intends to jaywalk or otherwise cross a road outside a crosswalk area and output a corresponding destination. In such an example, attributes relative to an area may not be determined (as no such area may exist). However, in some such examples, a fixed area perpendicular to the road segment and having a fixed width may be used as the area for determining such parameters.
As described above, in some examples, region 338 may be associated with object 326 and/or 328 at a time that object 326 and/or 328 is within a threshold distance of region 338.
Fig. 4 illustrates an example 400 of determining a predicted location(s) of an object over time based on attributes of the object.
Example 402 illustrates the object 120 (e.g., the pedestrian at time T-2), the object 122 (e.g., the pedestrian at time T-1), and the object 124 (e.g., the pedestrian at time T0). As discussed herein, the objects 120, 122, and 124 can be represented in a frame of reference with its origin at the object 124 (and/or in one or more frames of reference associated with any one or more times). Further, example 402 shows the objects 120, 122, and 124 associated with the crosswalk area 112 and the destination 114.
Data associated with example 402 may be input to location prediction component 404, which may output the predicted location(s) associated with objects 120, 122, and/or 124.
Example 406 illustrates predicted location(s) based on the objects 120, 122, and/or 124. For example, the location prediction component 404 can output a predicted location 408, which can represent the position of the object at a time T1. In some cases, the predicted location 408 may be represented as a distance 410 (e.g., s) and a lateral offset 412 (e.g., ey) based at least in part on a frame of reference defined by the object 124 (e.g., the origin) and the destination 114.
As shown, the location prediction component 404 can output predicted locations respectively corresponding to times T1, T2, T3, T4, and T5, although it is understood that the location prediction component 404 can output any number of predicted locations associated with any future time(s). In some examples, such additional predicted locations may be defined with respect to a global coordinate system, a local coordinate system, a relative frame of reference associated with a previously predicted point, and so on.
In some examples, the location prediction component 404 can include a function that bins output values such as the distance s or the lateral offset ey. That is, the location prediction component 404 can include a binning function that replaces a value falling within a bin with a value representing that bin. For example, a distance s that falls into a bin may be replaced with a value representing the bin. For example, if the distance s is 0.9 meters and the first bin spans 0.0 meters to 1.0 meter with a corresponding bin value of 0.5 meters, then the binned output for a distance s of 0.9 meters corresponds to 0.5 meters. Any number of bins spanning any ranges may be used. Of course, in some cases, the raw value may be output without binning. In some such examples, an additional value may be associated with the output bin, indicating an offset relative to the center of the bin. As a non-limiting example, the output may indicate that the next predicted location falls within the first bin (e.g., between 0 and 1 meter), and an associated offset of 0.2 meters may be used to indicate that the likely predicted location is 0.7 meters (e.g., 0.5 meters + 0.2 meters).
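The binning behavior described above can be sketched in a few lines; the 1-meter bin width and the residual-offset convention follow the example values in the text, and the function name is illustrative:

    def bin_with_offset(value, bin_width=1.0):
        # Return (bin_index, bin_center, offset) so that bin_center + offset
        # reconstructs the raw value.
        index = int(value // bin_width)
        center = (index + 0.5) * bin_width
        return index, center, value - center

    # The example from the text: s = 0.9 m falls in the 0-1 m bin, which is
    # reported as its center value of 0.5 m; the residual offset (0.4 m here)
    # can optionally be carried alongside the bin.
    print(bin_with_offset(0.9))   # (0, 0.5, 0.4)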
In general, the predicted location(s) shown in example 406 may be referred to as predicted location(s) 414.
In some examples, the location prediction component 404 may output a variance, covariance, probability, or certainty associated with each predicted location(s) 414 that indicates a certainty that the object 124 will be located at the respective predicted location at the respective time.
Fig. 5 illustrates an example 500 of updating a frame of reference used to determine predicted location(s).
Example 406 is reproduced in FIG. 5 to represent a time TA, which may correspond to the time T0 indicated in example 406. As shown, the objects 120, 122, and 124 are represented in the frame of reference 220, which is defined at least in part by the location of the object 124 and the location of the destination 114.
In some cases, the example 406 may be updated for the next time step, and an updated predicted location may be determined (e.g., in operation 502).
An example of such an update is shown as example 504, which corresponds to example 406 but shows the environment at a time TB after time TA. The object 506 in example 504 represents the object at time T0 relative to a frame of reference 508. Likewise, example 504 includes the object 510, which represents the object at time T-1. The object 512 further represents the object at time T-2.
In some examples, the object 510 (e.g., the object at time T-1 in the frame of reference 508) may correspond to the object 124 (e.g., the object at time T0 in the frame of reference 220). Similarly, the object 512 (e.g., the object at time T-2 in the frame of reference 508) may correspond to the object 122 (e.g., the object at time T-1 in the frame of reference 220). For comparison, example 504 shows the object 120, whereby the object 120 (and/or attributes associated with the object 120) may or may not be used in determining the updated predicted location in example 504.
It will be appreciated that the frame of reference 508 may be defined by, or based at least in part on, the location of the object 506 and the destination 114. Thus, a relative frame of reference may be defined with respect to the destination 114 and the most recently determined current location of the object (e.g., such a coordinate frame may change as the object moves through the environment).
Accordingly, information associated with the example 504 (which may or may not include information associated with the object 120) may be input to the location prediction component 404 to determine the updated predicted location(s) 514. As discussed herein, the updated predicted location(s) 514 may be based at least in part on the frame of reference 508.
In some examples, the updated predicted location(s) may be determined at a frequency of 10 Hz, although the predicted locations may be determined at any frequency or at any regular or irregular interval of time.
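The fixed-rate re-prediction described above (rebuild the frame of reference at the newest observation, then query the prediction component again) can be sketched as a rolling-window loop; the component interfaces and the 10 Hz observation stream are assumptions:

    import collections

    def run_prediction_loop(observation_stream, predict_fn, destination, history_len=3):
        # Maintain a rolling window of recent observations and re-run the
        # predictor at every step, re-anchoring the frame of reference at the
        # newest observed location.
        # observation_stream: iterable of (timestamp, (x, y)) pairs, e.g. at 10 Hz.
        # predict_fn: callable(history, origin, destination) -> predicted locations.
        history = collections.deque(maxlen=history_len)
        for stamp, position in observation_stream:
            history.append((stamp, position))
            origin = history[-1][1]          # most recent location anchors the frame
            yield stamp, predict_fn(list(history), origin, destination)

    stream = [(0.0, (0.0, 0.0)), (0.1, (0.0, 0.12)), (0.2, (0.0, 0.25))]   # 10 Hz samples
    dummy_predict = lambda hist, origin, dest: [origin]   # stand-in for the learned component
    for stamp, prediction in run_prediction_loop(stream, dummy_predict, destination=(0.0, 10.0)):
        print(stamp, prediction)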
FIG. 6 is an image flow diagram of an example process 600 for capturing sensor data, determining that a first object and a second object are in an environment, determining an attribute associated with the second object, determining a predicted position based on the attribute and a reference line, and controlling a vehicle based on the predicted position.
Although discussed in the context of determining attributes of the first and second objects to determine the predicted location(s) associated with the first object, in some examples, attributes of one or more of the second objects may not be determined and the predicted location(s) of the first object may be determined based on the attributes associated with the first object.
At operation 602, the process may include capturing sensor data of an environment. In some examples, the sensor data may be captured by one or more sensors on the vehicle (autonomous or otherwise). For example, the sensor data may include data captured by a lidar sensor, an image sensor, a radar sensor, a time-of-flight sensor, a sonar sensor, and so forth. In some examples, operation 602 may include determining a category of the object (e.g., to determine that the object is a vehicle in the environment).
Example 604 shows a vehicle 606 that may capture sensor data in operation 602. The environment may further include objects 608, 610, 612, 614, 616, and 618. In some examples, the object 618 may be referred to as a target object 618, as the target object 618 may be the subject of such a prediction operation (e.g., the target of the prediction operation) as discussed herein.
In some examples, the vehicle 606 may travel through the environment along a trajectory 620. As can be appreciated in the context of FIG. 6, the object 608 may travel in the same direction as the vehicle 606 (e.g., in the same lane as the vehicle 606), while in some examples the object 610 and the target object 618 may travel in the opposite direction (e.g., the target object 618 may represent a vehicle oncoming relative to the vehicle 606). Of course, the process 600 may be used in any environment and is not limited to the particular objects and/or geometry shown in FIG. 6.
At operation 622, the flow may include determining attribute(s) associated with the target object and object(s) proximate to the target object. Example 624 shows the vehicle 606, the objects 608-616, and the target object 618. In some examples, operation 622 may include determining the attribute(s) associated with the target object without determining attributes of other objects. For example, such other objects may not be present in the environment, or such attributes of other objects may not be needed, desired, or required to determine the predicted location(s) of the target object 618, in accordance with embodiments of the techniques discussed herein.
For purposes of illustration, the outline of the object 612 is illustrated with dashed lines, and elements 626, 628, and 630 corresponding to the object 612 are represented as points. In some examples, the element 626 represents the location associated with the object 612 at time T-2. In some examples, the element 628 represents the location associated with the object 612 at time T-1. And in some examples, the element 630 represents the location associated with the object 612 at time T0.
As further illustrated, the vehicle 606, the objects 608-616, and the target object 618 are associated with elements, although such elements are not labeled in FIG. 6. In the context of the present disclosure, it is understood that such elements represent the locations associated with the vehicle and/or the objects at various times (e.g., times T-2, T-1, and T0) and/or may represent attributes associated with the objects at the various times.
In some examples, the attributes determined in operation 622 may represent corresponding information about each object. For example, such attributes may include, but are not limited to, a location of the object (e.g., a global location and/or a relative location with respect to any frame of reference), a velocity, an acceleration, a bounding box, a lighting state, lane attribute(s), an offset from a reference line or predicted path, and so forth. Additional details of these attributes are discussed in connection with fig. 7, and in the present disclosure.
In some examples, operation 622 may include determining or identifying an object based at least in part on the proximity of the object to the target object. For example, operation 622 may include determining the nearest N objects near the target object 618, where N is an integer. Additionally or alternatively, operation 622 may include identifying or selecting an object based on the object being within a threshold distance of the target object 618. In at least some examples, such selection may exclude certain objects based on one or more features such as, but not limited to, object classification (e.g., considering only vehicles), direction of motion (e.g., considering only objects moving in the same direction), location relative to a map (e.g., considering only vehicles on one or more lanes of a road), and so forth.
At operation 632, the process may include determining predicted location(s) associated with the target object based at least in part on the attribute(s), the predicted location(s) being relative to a reference line in the environment (which may include, in some examples, a centerline of a lane associated with the object). Example 634 shows predicted location(s) 636 associated with target object 618 in the environment. In some examples, the predicted location(s) 636 can be defined by reference line 638 and/or based at least in part on reference line 638. That is, the predicted location(s) 636 can be represented by a distance s along the reference line 638 and a lateral offset ey relative to the reference line 638.
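By way of illustration only, the following Python sketch shows one way a predicted position might be expressed as a distance s along a polyline reference line together with a signed lateral offset ey; the polyline representation and the helper name are assumptions rather than the disclosed implementation:

    # Minimal sketch (not from the disclosure): projecting a predicted position
    # onto a polyline reference line to obtain a distance s along the line and a
    # signed lateral offset ey. Assumes the reference line has at least two
    # (x, y) vertices.
    import math

    def to_reference_frame(point, reference_line):
        px, py = point
        best = None        # (perpendicular distance, s, ey) for the best segment
        s_start = 0.0
        for (x0, y0), (x1, y1) in zip(reference_line[:-1], reference_line[1:]):
            dx, dy = x1 - x0, y1 - y0
            seg_len = math.hypot(dx, dy)
            if seg_len == 0.0:
                continue
            # Clamped projection of the point onto this segment.
            t = max(0.0, min(1.0, ((px - x0) * dx + (py - y0) * dy) / seg_len ** 2))
            cx, cy = x0 + t * dx, y0 + t * dy
            d = math.hypot(px - cx, py - cy)
            # Signed offset: positive to the left of the line direction.
            ey = (dx * (py - y0) - dy * (px - x0)) / seg_len
            if best is None or d < best[0]:
                best = (d, s_start + t * seg_len, ey)
            s_start += seg_len
        _, s, ey = best
        return s, ey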
In some examples, reference line 638 may be based at least in part on map data of the environment. Further, in some examples, reference line 638 may correspond to a centerline of a lane of a road or other drivable area.
In some examples, operation 632 may include receiving a reference line associated with the target object 618, e.g., from a reference line prediction component. In some examples, the reference line prediction component may include a trained machine learning model to output a most likely reference line based at least in part on map data, properties of object(s) in the environment, and so on. In some cases, the reference line prediction component may be integrated into other machine learning models discussed herein, while in some cases, the reference line prediction component may be a stand-alone component.
In some examples, operation 632 may include selecting reference line 638 from a plurality of candidate reference lines. In some examples, reference line 638 may be selected based at least in part on a similarity score representing the similarity of predicted location(s) 636 relative to reference line 638. In some examples, the predicted location(s) 636 can be relative to a predicted path and/or trajectory, a previously predicted path point, or the like. Other examples of predicted location(s), reference line(s), and similarity score(s) are discussed in connection with fig. 8 and the entire disclosure.
At operation 640, the process may include controlling the vehicle based at least in part on the predicted location(s). In some examples, operation 640 may include generating a trajectory to be followed by the vehicle 606 or updating a trajectory 642 (e.g., to deviate the vehicle 606 away from the predicted location(s) 636 associated with the target object 618, where the target object 618 may be traveling in close proximity to the desired path of the vehicle 606).
FIG. 7 illustrates an example 700 of attributes of an object. In some cases, attributes 702 may represent various information about or associated with objects in the environment (e.g., object 612 and target object 618 of fig. 6, as represented by example 604 rendered in fig. 7).
In some cases, the attributes 702 may be determined for one or more time instances associated with the object. Example 704 shows the object 612 at time instances T-2, T-1, and T0. For example, element 626 represents the object 612 at time T-2, element 628 represents the object 612 at time T-1, and element 630 represents the object 612 at time T0.
Further, attributes may be determined for any type and/or number of objects in example 704, and are not limited to object 612. For example, attributes may be determined for element 706 (e.g., representing the target object 618 at time T-2), element 708 (e.g., representing the target object 618 at time T-1), and element 710 (e.g., representing the target object 618 at time T0). Further, attributes may be determined for any number of time instances, not limited to T-2, T-1, and T0.
Examples of attributes 702 include, but are not limited to, a velocity of the object, an acceleration of the object, an x-position of the object (e.g., a global position, a local position, and/or a position relative to any other frame of reference), a y-position of the object (e.g., a global position, a local position, and/or a position relative to any other frame of reference), a bounding box associated with the object (e.g., extents (length, width, and/or height), yaw, pitch, roll, etc.), a lighting state (e.g., brake light(s), turn indicator(s), hazard warning light(s), headlight(s), reverse light(s), etc.), a wheel orientation of the object, a map element (e.g., a distance between the object and a stop light, stop sign, speed bump, intersection, yield sign, etc.), a classification of the object (e.g., vehicle, car, truck, bicycle, motorcycle, pedestrian, animal, etc.), an object characteristic (e.g., whether the object is changing lanes, whether the object is a double-parked vehicle, etc.), proximity to one or more objects (in any coordinate system), lane type (e.g., direction of the lane, parking lane), road markings (e.g., indicating whether passing or lane changes are permitted, etc.), and the like.
In some examples, attributes of the object may be determined relative to a local frame of reference, global coordinates, and the like. For example, a frame of reference may be determined whose origin corresponds to the position of the target object 618 (e.g., element 710) at time T0.
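By way of illustration only, the following Python sketch expresses per-time attributes in such a frame of reference; the state fields and the particular attributes shown are hypothetical choices, not the disclosed set:

    # Minimal sketch (not from the disclosure): expressing per-time attributes of
    # an object in a frame of reference whose origin is the target object's
    # position at time T0.

    def attributes_in_target_frame(states, target_position_t0):
        ox, oy = target_position_t0
        attributes = []
        for state in states:                    # e.g., states at T-2, T-1, T0
            attributes.append({
                "x": state.x - ox,              # position relative to the target at T0
                "y": state.y - oy,
                "velocity": state.velocity,
                "acceleration": state.acceleration,
                "yaw": state.yaw,
                "extent": state.extent,         # bounding-box length/width/height
            })
        return attributes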
Fig. 8 illustrates an example 800 of determining predicted location(s) of a first object based on attributes of a second object over time.
As shown, information associated with the example 704 of fig. 7 may be input to a location prediction component 802, which in turn may output predicted location(s) associated with the target object. For example, attributes associated with the vehicle 606, the objects 608-616, and the target object 618 at different times (e.g., T-2, T-1, and T0) can be input to the location prediction component 802.
Example 804 illustrates predicted location(s) 806 associated with target object 618. That is, the location prediction component 802 can receive attribute information associated with objects proximate to the target object 618, as well as attribute information associated with the target object 618, and can output predicted location(s) 806 that represent future location(s) of the target object 618.
Object 808 represents the target object 618 at time T-2. Object 810 represents the target object 618 at time T-1. And object 812 represents the target object 618 at time T0.
The location prediction component 802 can determine the predicted location(s) 806 based on the attribute information discussed herein. In some examples, the predicted location(s) may be initially represented in a global coordinate system, in a reference system with the target object as an origin, and so on. Further, the predicted position may be represented relative to a reference line in the environment.
In some examples, the environment may represent multiple reference lines, such as reference line 814 and reference line 816. As depicted in fig. 8 for illustrative purposes, reference line 816 may, for example, correspond to a lane change of a target object. In some examples, reference line 814 may represent a centerline of a first road segment, while reference line 816 may represent a centerline of (and/or a transition between) a second road segment. In some examples, such as a single lane road, the environment may represent a single reference line. However, in some examples, the environment may represent multiple reference lines.
In some examples, the location prediction component 802 can receive as input an indication of a most likely reference line (e.g., 814). In some examples, the location prediction component 802 may determine a likely reference line based at least in part on one or more attributes of the target object 618, other objects, and/or the environment, as described herein.
In some examples, the location prediction component 802 may determine a similarity score 818 representing a similarity between the predicted location(s) 806 and the reference line 814. Further, the location prediction component 802 can determine a similarity score 820 that represents a similarity between the predicted location(s) 806 and the reference line 816. In some examples, the similarity score may be based at least in part on a single or cumulative lateral offset between the predicted location(s) and respective reference lines, although other metrics may also be used to determine the similarity score.
In some examples, the location prediction component 802 may determine that the similarity score 818 is lower than the similarity score 820 and, accordingly, may select the reference line 814 as a basis for at least partially defining the predicted location(s) 806. However, in other examples, each possible reference line may be input into the location prediction component 802 along with the previously calculated attributes, such that the location prediction component 802 may base the selection of an appropriate reference line and/or trajectory on machine learned parameters.
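By way of illustration only, the following Python sketch scores candidate reference lines by cumulative absolute lateral offset (so that a lower score indicates greater similarity) and selects the lowest-scoring candidate; it reuses the to_reference_frame helper sketched above, and both helpers are assumptions rather than the disclosed implementation:

    # Minimal sketch (not from the disclosure): selecting the reference line whose
    # cumulative lateral offset to the predicted locations is smallest.

    def select_reference_line(predicted_locations, candidate_lines):
        def cumulative_offset(line):
            # Sum of |ey| across all predicted locations for this candidate line.
            return sum(abs(to_reference_frame(p, line)[1]) for p in predicted_locations)

        scores = [cumulative_offset(line) for line in candidate_lines]
        best_index = min(range(len(scores)), key=scores.__getitem__)
        return candidate_lines[best_index], scores[best_index]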
Predicted location(s) 806 may include predicted locations 822, 824, 826, 828, and/or 830. In some examples, the predicted location 822 may represent a first distance s and a first lateral offset (e.g., (s1, ey1)) relative to the reference line 814. The predicted location 824 may represent a second distance s and a second lateral offset (e.g., (s2, ey2)) relative to the reference line 814. The predicted location 826 may represent a third distance s and a third lateral offset (e.g., (s3, ey3)) relative to the reference line 814. The predicted location 828 may represent a fourth distance s and a fourth lateral offset (e.g., (s4, ey4)) relative to the reference line 814. The predicted location 830 may represent a fifth distance s and a fifth lateral offset (e.g., (s5, ey5)) relative to the reference line 814. Of course, the location prediction component 802 can determine fewer or more predicted location(s), as described herein.
FIG. 9 depicts a block diagram of an example system 900 for implementing techniques described herein. In at least one example, system 900 may include a vehicle 902, which may correspond to vehicle 108 of fig. 1 and vehicle 606 of fig. 6.
The example vehicle 902 may be an unmanned vehicle, such as an autonomous vehicle configured to operate according to a Level 5 classification issued by the U.S. National Highway Traffic Safety Administration, which describes a vehicle capable of performing all safety-critical functions for the entire trip, with the driver (or occupant) not expected to control the vehicle at any time. In such an example, because the vehicle 902 may be configured to control all functions from the beginning to the end of a trip, including all parking functions, the vehicle may not include a driver and/or controls for driving the vehicle 902, such as a steering wheel, an accelerator pedal, and/or a brake pedal. This is merely an example, and the systems and methods described herein may be incorporated into any land, air, or waterborne vehicle, ranging from vehicles that are always manually controlled by a driver to vehicles that are partially or fully autonomously controlled.
Vehicle 902 may include vehicle computing device(s) 904, one or more sensor systems 906, one or more emitters 908, one or more communication connections 910, at least one direct connection 912, and one or more drive systems 914.
Computing device(s) 904 may include one or more processors 916 and memory 918 in communicative association with the one or more processors 916. In the illustrated example, the vehicle 902 is an autonomous vehicle; however, vehicle 902 may be any other type of vehicle or robotic platform. In the illustrated example, the memory 918 of the vehicle computing device(s) 904 stores a positioning component 920, a perception component 922, one or more maps 924, one or more system controllers 926, a prediction component 928, and a planning component 936, the prediction component 928 including an attribute component 930, a destination prediction component 932, and a location prediction component 934. While depicted in fig. 9 as residing in memory 918 for purposes of illustration, it is contemplated that positioning component 920, perception component 922, one or more maps 924, one or more system controllers 926, prediction component 928, attribute component 930, destination prediction component 932, location prediction component 934, and planning component 936 can additionally, or alternatively, be accessible to vehicle 902 (e.g., stored on a memory remote from vehicle 902, or otherwise accessed by vehicle 902).
In at least one example, the positioning component 920 can include functionality to receive data from the sensor system(s) 906 to determine a position and/or orientation (e.g., one or more of an x-position, a y-position, a z-position, a roll, a pitch, or a yaw) of the vehicle 902. For example, the positioning component 920 can include and/or request/receive a map of the environment, and can continuously determine a position and/or orientation of the autonomous vehicle within the map. In some cases, the positioning component 920 may utilize SLAM (simultaneous localization and mapping), CLAMS (calibration, localization and mapping, simultaneously), relative SLAM, bundle adjustment, non-linear least squares optimization, etc., to receive image data, lidar data, radar data, time-of-flight data, IMU data, GPS data, wheel encoder data, etc., and to accurately determine the position of the autonomous vehicle. In some cases, the positioning component 920 may provide data to various components of the vehicle 902 to determine an initial position of the autonomous vehicle, for generating a trajectory, for determining that an object is near one or more crosswalk regions, and/or for identifying candidate reference lines, as discussed herein.
In some cases, and in general, the perception component 922 may include functionality to perform object detection, segmentation, and/or classification. In some examples, perception component 922 may provide processed sensor data that indicates the presence of and/or classifies an entity proximate to vehicle 902 as an entity type (e.g., car, pedestrian, cyclist, animal, building, tree, road surface, curb, sidewalk, traffic light, stop sign, unknown, etc.). In additional or alternative examples, the perception component 922 may provide processed sensor data indicative of one or more characteristics related to the detected entity (e.g., tracked object) and/or the environment in which the entity is located. In some examples, the characteristics related to the entity may include, but are not limited to, an x-position (global and/or local position), a y-position (global and/or local position), a z-position (global and/or local position), an orientation (e.g., roll, pitch, yaw), a type of the entity (e.g., classification), a velocity of the entity, an acceleration of the entity, a range (size) of the entity, and/or the like. The characteristics associated with the environment may include, but are not limited to, the presence of another entity in the environment, the status of another entity in the environment, a time of day, a day of the week, a season, weather conditions, an indication of darkness/light, and the like.
The memory 918 may further include one or more maps 924 that may be used by the vehicle 902 to navigate through the environment. For purposes of this discussion, a map may be any number of data structures modeled in two, three, or N dimensions that are capable of providing information about an environment, such as, but not limited to, a topology (e.g., intersections), streets, mountains, roads, terrain, and general environment. In some cases, the map may include, but is not limited to: texture information (e.g., color information (e.g., RGB color information, Lab color information, HSV/HSL color information), etc.), intensity information (e.g., lidar information, radar information, etc.); spatial information (e.g., image data projected onto a grid, individual "bins" (e.g., polygons associated with individual colors and/or intensities)), reflectivity information (e.g., specularity information, retroreflectivity information, BRDF (bidirectional reflectance distribution function) information, BSSRDF (bidirectional scattering surface reflectance distribution function) information, etc.). In one example, the map may include a three-dimensional grid of the environment. In some cases, the map may be stored in a tile format, such that a single tile of the map represents a discrete portion of the environment and may be loaded into working memory as needed. In at least one example, the one or more maps 924 can include at least one map (e.g., an image and/or a grid).
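By way of illustration only, the following Python sketch shows one way tiles of such a map might be loaded into working memory on demand; the tile size and the storage loader are hypothetical placeholders, not part of this disclosure:

    # Minimal sketch (not from the disclosure): loading map tiles into working
    # memory on demand, keyed by the discrete tile containing a queried position.

    class TiledMap:
        def __init__(self, load_tile_from_storage, tile_size_m=100.0):
            self._load = load_tile_from_storage   # hypothetical storage loader
            self._tile_size_m = tile_size_m
            self._cache = {}

        def tile_at(self, x, y):
            key = (int(x // self._tile_size_m), int(y // self._tile_size_m))
            if key not in self._cache:
                # Only the tile covering the queried position is brought into memory.
                self._cache[key] = self._load(key)
            return self._cache[key]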
In some examples, the vehicle 902 may be controlled based at least in part on the map 924. That is, the map 924 can be used in conjunction with the positioning component 920, the perception component 922, the prediction component 928, and/or the planning component 936 to determine a location of the vehicle 902, identify objects in the environment, and/or generate routes and/or trajectories to navigate through the environment.
In some examples, one or more maps 924 may be stored on remote computing device(s) (e.g., computing device(s) 940) that are accessible over network(s) 938. In some examples, the plurality of maps 924 may be stored based on, for example, a characteristic (e.g., a type of entity, a time of day, a day of the week, a season of the year, etc.). Storing multiple maps 924 may have similar memory requirements, but may increase the speed of accessing data in the maps.
In at least one example, the vehicle computing device(s) 904 may include one or more system controllers 926 that may be configured to control steering, propulsion, braking, safety, transmitters, communications, and other systems of the vehicle 902. These system controller(s) 926 may communicate with and/or control respective systems of the drive system(s) 914 and/or other components of the vehicle 902.
In general, the prediction component 928 may include functionality to generate prediction information associated with objects in the environment. In some examples, the prediction component 928 may be implemented to predict the location of pedestrians near a crosswalk area (or other area or location associated with pedestrian crossing roads) in the environment in the event that they cross or are ready to cross the crosswalk area. In some examples, the techniques discussed herein may be implemented to predict the location of an object (e.g., a vehicle, a pedestrian, etc.) as the vehicle traverses an environment. In some examples, the prediction component 928 may generate one or more predicted trajectories for a target object based on properties of such target object and/or other objects proximate to the target object.
The properties component 930 may include functionality to determine property information associated with objects in the environment. In some examples, the attribute component 930 may receive data from the perception component 922 to determine attribute information of the object over time.
In some examples, attributes of an object (e.g., a pedestrian) may be determined based on sensor data captured over time and may include, but are not limited to, one or more of the following: a location of the pedestrian at a time (e.g., where the location may be represented in the frame of reference discussed above), a velocity of the pedestrian at the time (e.g., a magnitude and/or an angle relative to a first axis (or other reference line)), an acceleration of the pedestrian at the time, an indication of whether the pedestrian is in a drivable area (e.g., whether the pedestrian is on a sidewalk or in a road), an indication of whether the pedestrian is in a crosswalk area, an indication of whether the pedestrian is crossing a road, an area control indicator state (e.g., whether the crosswalk is controlled by a traffic signal and/or a state of the traffic signal), a vehicle context (e.g., the presence of a vehicle in the environment and attribute(s) associated with the vehicle), a flux through the crosswalk area over a period of time (e.g., a number of objects (e.g., vehicles and/or pedestrians) that pass through the crosswalk area over the period of time), an object association (e.g., whether the pedestrian is traveling in a group of pedestrians), a distance to a crosswalk in a first direction (e.g., a global x-direction), a distance to the crosswalk in a second direction (e.g., a global y-direction), a distance to a road in the crosswalk area (e.g., a shortest distance to the road in the crosswalk area), and the like.
In some examples, attributes may be determined for a target object (e.g., a vehicle) and/or other object(s) proximate to the target object (e.g., other vehicles). For example, attributes may include, but are not limited to, one or more of the following: a velocity of an object at a time, an acceleration of the object at the time, a location of the object at the time (e.g., in global or local coordinates), a bounding box associated with the object at the time (e.g., representing extent(s), roll, pitch, and/or yaw of the object), a lighting state associated with the object at the time (e.g., headlight(s), brake light(s), hazard warning light(s), turn indicator(s), reverse light(s), etc.), a distance between the object and a map element at the time (e.g., a distance from a stop line, traffic line, speed bump, yield line, intersection, lane, etc.), a distance between the object and other objects, a classification of the object (e.g., car, vehicle, animal, truck, bicycle, etc.), a characteristic associated with the object (e.g., whether the object is changing lanes, whether the object is a double-parked vehicle, etc.), and the like.
In some examples, any combination of properties of the object may be determined, as discussed herein.
Attributes may be determined over time (e.g., at times T-M, ..., T-1, T0, where M is an integer and each time represents any time up to the most recent time) and input to the destination prediction component 932 and/or the location prediction component 934 to determine prediction information associated with such objects.
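By way of illustration only, the following Python sketch stacks per-time attribute vectors into a single array of the kind that might be provided to a learned prediction component; the layout (target object first, then nearby objects) and the shapes are assumptions:

    # Minimal sketch (not from the disclosure): stacking per-time attribute
    # vectors into one array for input to a learned prediction component.
    import numpy as np

    def build_prediction_input(attribute_history):
        # attribute_history: list over times T-M ... T0; each entry is a list of
        # per-object attribute vectors (target object first, then nearby objects).
        per_time = [np.concatenate(object_vectors) for object_vectors in attribute_history]
        return np.stack(per_time)   # shape: (M + 1, num_objects * attributes_per_object)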
The destination prediction component 932 may include functionality to determine a destination of an object in an environment, as discussed herein. In the context of a pedestrian, the destination prediction component 932 may determine which crosswalk region(s) may be appropriate for the pedestrian based on the pedestrian being within a threshold distance of the crosswalk region(s), as described herein. In at least some examples, such a destination prediction component 932 can determine a point on an opposing sidewalk regardless of the presence of a crosswalk. Additionally, attributes of objects associated with any time period may be input to the destination prediction component 932 to determine a score, probability, and/or likelihood that a pedestrian is heading toward, or may be associated with, a crosswalk region.
In some examples, the destination prediction component 932 is a machine learning model, such as a neural network, a fully-connected neural network, a convolutional neural network, a recurrent neural network, and so forth.
In some examples, the destination prediction component 932 may be trained by reviewing data logs to determine events in which pedestrians have crossed pedestrian crossings. Such events may be identified, and attributes of objects (e.g., pedestrians) and the environment may be determined, and data representative of the events may be identified as training data. The training data may be input into a machine learning model where known results (e.g., ground truth, such as known "future" attributes) may be used to adjust the weights and/or parameters of the machine learning model to minimize errors.
The location prediction component 934 may include functionality to generate or otherwise determine a predicted location(s) associated with an object in the environment. For example, as discussed herein, attribute information may be determined for one or more objects in the environment, which may include the target object and/or other objects proximate to the target object. In some examples, attributes associated with vehicle 902 may be used to determine predicted location(s) associated with object(s) in the environment.
The location prediction component 934 can further include functionality to represent attribute information in various frame(s) of reference, as discussed herein. In some examples, the location prediction component 934 may use the location of the object at time T0 as the origin of a frame of reference, which may be updated for each time instance.
In some examples, the location prediction component 934 may include functionality to identify candidate reference lines in the environment (e.g., based on map data), and may select the reference lines (e.g., based on similarity scores) to determine the predicted location(s) relative to the reference lines.
In some examples, the location prediction component 934 is a machine learning model, such as a neural network, a fully-connected neural network, a convolutional neural network, a recurrent neural network, and the like, or any combination thereof.
For example, the location prediction component 934 can be trained by reviewing the data logs and determining attribute information. Training data representing relevant events (e.g., a threshold distance of a vehicle from a reference line, a pedestrian crossing, a pedestrian crossing a road, etc.) may be input to the machine learning model, where known results (e.g., ground truth, such as known "future" attributes/locations) may be used to adjust the weights and/or parameters of the machine learning model to minimize errors.
In general, the planning component 936 may determine a path to be followed by the vehicle 902 to traverse the environment. For example, the planning component 936 may determine various routes and trajectories at various levels of detail. For example, the planning component 936 may determine a route for traveling from a first location (e.g., a current location) to a second location (e.g., a target location). For purposes of this discussion, a route may be a sequence of waypoints for traveling between the two locations. By way of non-limiting example, waypoints include streets, intersections, Global Positioning System (GPS) coordinates, and the like. Further, the planning component 936 may generate instructions for guiding the autonomous vehicle along at least a portion of the route from the first location to the second location. In at least one example, the planning component 936 may determine how to guide the autonomous vehicle from a first waypoint in the sequence of waypoints to a second waypoint in the sequence of waypoints. In some examples, the instruction may be a trajectory, or a portion of a trajectory. In some examples, multiple trajectories may be generated substantially simultaneously (e.g., within technical tolerances) in accordance with a receding horizon technique, where one of the multiple trajectories is selected for navigating the vehicle 902.
In some cases, planning component 936 may generate one or more trajectories for vehicle 902 based at least in part on the predicted location(s) associated with the object(s) in the environment. In some examples, planning component 936 may use temporal logic, such as linear temporal logic and/or signal temporal logic, to evaluate one or more trajectories of vehicle 902.
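By way of illustration only, the following Python sketch is a greatly simplified stand-in for such trajectory evaluation, screening candidate vehicle trajectories against the predicted object location(s); the safety margin and fallback behavior are hypothetical and do not represent the temporal-logic evaluation referenced above:

    # Minimal sketch (not from the disclosure): reject candidate trajectories that
    # pass within a safety margin of a predicted object location at the same timestep.
    import math

    def select_trajectory(candidate_trajectories, predicted_locations, margin_m=2.0):
        def is_clear(trajectory):
            for (vx, vy), (ox, oy) in zip(trajectory, predicted_locations):
                if math.hypot(vx - ox, vy - oy) < margin_m:
                    return False
            return True

        clear = [t for t in candidate_trajectories if is_clear(t)]
        # Fall back to the first candidate (e.g., a stopping trajectory) if none are clear.
        return clear[0] if clear else candidate_trajectories[0]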
It is to be appreciated that the components discussed herein (e.g., the positioning component 920, the perception component 922, the one or more maps 924, the one or more system controllers 926, the prediction component 928, the attributes component 930, the destination prediction component 932, the location prediction component 934, and the planning component 936) are separately described for illustration purposes. However, the operations performed by the various components may be combined or performed in any other component. Furthermore, any components discussed as being implemented in software may be implemented in hardware, and vice versa. Further, any functionality implemented in vehicle 902 may be implemented in computing device(s) 940, or in another component (or vice versa).
In at least one example, sensor system(s) 906 can include time-of-flight sensors, lidar sensors, radar sensors, ultrasonic sensors, sonar sensors, position sensors (e.g., GPS, compass, etc.), inertial sensors (e.g., Inertial Measurement Unit (IMU), accelerometers, magnetometers, gyroscopes, etc.), cameras (e.g., RGB, IR, intensity, depth, etc.), microphones, wheel encoders, environmental sensors (e.g., temperature sensors, humidity sensors, light sensors, pressure sensors, etc.), and so forth. Sensor system(s) 906 may include multiple instances of each of these or other types of sensors. For example, the time-of-flight sensors may include various time-of-flight sensors located at corners, front, rear, sides, and/or top of the vehicle 902. As another example, the camera sensor may include multiple cameras disposed at different locations outside and/or inside of the vehicle 902. Sensor system(s) 906 may provide input to vehicle computing device(s) 904. Additionally or alternatively, sensor system(s) 906 can transmit sensor data to one or more computing devices 940 over one or more networks 938 at a particular frequency, after a predetermined period of time has elapsed, in near real-time, and/or the like.
Vehicle 902 may also include one or more emitters 908 for emitting light and/or sound, as described above. The emitters 908 in this example include interior audio and visual emitters for communicating with occupants of the vehicle 902. By way of example and not limitation, the interior emitters may include speakers, lights, signs, display screens, touch screens, haptic emitters (e.g., vibration and/or force feedback), mechanical actuators (e.g., seatbelt tensioners, seat positioners, headrest positioners, etc.), and the like. The emitters 908 in this example also include exterior emitters. By way of example and not limitation, the exterior emitters in this example include lights (e.g., indicator lights, signs, light arrays, etc.) to signal a direction of travel or other indication of a vehicle action, and one or more audio emitters (e.g., speakers, speaker arrays, horns, etc.) to audibly communicate with pedestrians or other nearby vehicles, where the one or more audio emitters may include beam steering technology.
Vehicle 902 may also include one or more communication connections 910 that enable vehicle 902 to communicate with one or more other computing devices, local or remote. For example, communication connection(s) 910 may facilitate communication with other local computing device(s) and/or drive system(s) 914 on vehicle 902. Also, communication connection(s) 910 may allow the vehicle to communicate with other computing device(s) in the vicinity (e.g., other vehicles in the vicinity, traffic signals, etc.). Communication connection(s) 910 also enable vehicle 902 to communicate with a remotely operated computing device or other remote service.
Communication connection(s) 910 may include a physical and/or logical interface for connecting vehicle computing device(s) 904 to another computing device or a network, such as network(s) 938. For example, communication connection(s) 910 may enable Wi-Fi-based communication, e.g., via frequencies defined by the IEEE 802.11 standards, short-range radio frequencies such as Bluetooth, cellular communication (e.g., 2G, 3G, 4G LTE, 5G, etc.), or any suitable wired or wireless communication protocol that enables the respective computing device to interface with other computing device(s).
In at least one example, vehicle 902 may include one or more drive systems 914. In some examples, the vehicle 902 may have a single drive system 914. In at least one example, if vehicle 902 has multiple drive systems 914, then each drive system 914 may be disposed at opposite ends (e.g., front and rear, etc.) of vehicle 902. In at least one example, drive system(s) 914 can include one or more sensor systems to detect conditions of drive system(s) 914 and/or the environment surrounding vehicle 902. By way of example and not limitation, the sensor system(s) may include: one or more wheel encoders (e.g., rotary encoders) to sense rotation of the wheels of the drive module, inertial sensors (e.g., inertial measurement units, accelerometers, gyroscopes, magnetometers, etc.) to measure orientation and acceleration of the drive module, cameras or other image sensors, ultrasonic sensors to acoustically detect objects around the drive system, lidar sensors, radar sensors, etc. Some sensors (e.g., wheel encoders) may be specific to the drive system(s) 914. In some cases, the sensor system(s) on the drive system(s) 914 may overlap or supplement a corresponding system of the vehicle 902 (e.g., sensor system(s) 906).
The drive system(s) 914 may include a number of vehicle systems, including a high voltage battery, a motor to propel the vehicle, an inverter to convert direct current from the battery into alternating current for use by other vehicle systems, a steering system including a steering motor and steering rack (which may be electric), a braking system including hydraulic or electric actuators, a suspension system including hydraulic and/or pneumatic components, a stability control system for distributing brake forces to mitigate loss of traction and maintain control, an HVAC (heating, ventilation, and air conditioning) system, lighting (e.g., headlights/taillights to illuminate the surroundings outside the vehicle), and one or more other systems (e.g., a cooling system, safety systems, an onboard charging system, other electrical components such as DC/DC converters, high voltage junctions, high voltage cables, a charging system, charging ports, etc.). Further, the drive system(s) 914 may include a drive system controller that may receive and preprocess data from the sensor system(s) to control the operation of the various vehicle systems. In some examples, the drive system controller may include one or more processors and memory communicatively coupled with the one or more processors. The memory may store one or more components to perform various functions of the drive system(s) 914. Furthermore, drive system(s) 914 may also include one or more communication connections that enable each drive system to communicate with one or more other local or remote computing devices.
In at least one example, the direct connection 912 may provide a physical interface to couple one or more drive systems 914 with the body of the vehicle 902. For example, the direct connection 912 may allow for the transfer of energy, fluid, air, data, etc., between the drive system(s) 914 and the vehicle. In some cases, the direct connection 912 may further releasably secure the drive system(s) 914 to the body of the vehicle 902.
In at least one example, the positioning component 920, the perception component 922, the one or more maps 924, the one or more system controllers 926, the prediction component 928, the attributes component 930, the destination prediction component 932, the location prediction component 934, and the planning component 936 can process the sensor data, as described above, and can transmit their respective outputs to the one or more computing devices 940 via the one or more networks 938. In at least one example, the positioning component 920, the one or more maps 924, the one or more system controllers 926, the prediction component 928, the attributes component 930, the destination prediction component 932, the location prediction component 934, and the planning component 936 can send their respective outputs to the one or more computing devices 940 at a particular frequency, after a predetermined period of time has elapsed, in a near real-time manner, and/or the like.
In some examples, vehicle 902 may transmit sensor data to one or more computing devices 940 over network(s) 938. In some examples, the vehicle 902 may send the raw sensor data to the computing device(s) 940. In other examples, vehicle 902 may send processed sensor data and/or representations of sensor data to computing device(s) 940. In some examples, the vehicle 902 may transmit the sensor data to the computing device(s) 940 at a particular frequency, after a predetermined period of time has elapsed, in a near real-time manner, and so forth. In some cases, the vehicle 902 may send the sensor data (raw or processed) to the computing device(s) 940 as one or more log files.
Computing device(s) 940 may include processor(s) 942 and memory 944 that stores training component 946.
In some cases, the training component 946 can include functionality to train one or more models to determine predictive information, as discussed herein. In some cases, the training component 946 can communicate information generated by one or more models to the vehicle computing device(s) 904 to revise how the vehicle 902 is controlled in response to different circumstances.
For example, the training component 946 can train one or more machine learning models to produce the predictive component discussed herein. In some examples, training component 946 may include functionality to search data logs and determine attributes and/or location information (e.g., in any one or more frames of reference) associated with the object(s). Log data corresponding to a particular scene (e.g., a pedestrian approaching and crossing a pedestrian crossing area, a pedestrian crossing a road, a target object bypassing a curve at an offset relative to a centerline, etc.) may represent training data. The training data may be input into a machine learning model where known results (e.g., ground truth, such as known "future" attributes) may be used to adjust the weights and/or parameters of the machine learning model to minimize errors.
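By way of illustration only, the following Python (PyTorch) sketch shows a generic supervised training loop of the kind described above, in which logged "future" locations serve as ground truth; the model architecture, feature sizes, and hyperparameters are hypothetical and do not represent the actual pipeline:

    # Minimal sketch (not from the disclosure): regressing attribute histories
    # mined from log data against logged future locations (ground truth).
    import torch
    import torch.nn as nn

    def train(examples, feature_dim, num_future_points, epochs=10):
        # examples: iterable of (features, future_locations) tensors mined from logs.
        model = nn.Sequential(nn.Linear(feature_dim, 128), nn.ReLU(),
                              nn.Linear(128, num_future_points * 2))
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            for features, future_locations in examples:
                optimizer.zero_grad()
                loss = loss_fn(model(features), future_locations)  # error vs. ground truth
                loss.backward()
                optimizer.step()    # adjust weights/parameters to minimize the error
        return model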
For example, aspects of some or all of the components discussed herein may include any model, algorithm, and/or machine learning algorithm. For example, in some cases, components in memory 944 (as well as memory 918 discussed above) may be implemented as a neural network. In some examples, training component 946 can utilize neural networks to generate and/or execute one or more models to determine segmentation information from sensor data, as discussed herein.
As described herein, an exemplary neural network is a biologically inspired algorithm that passes input data through a series of connected layers to produce an output. Each layer of the neural network may also include another neural network, or may include any number of layers (whether convolutional or not). As can be appreciated in the context of the present disclosure, neural networks may utilize machine learning, which may refer to a wide range of such algorithms in which outputs are generated based on learned parameters.
Although discussed in the context of a neural network, any type of machine learning may be used consistent with the present disclosure. For example, machine learning algorithms or machine learned algorithms may include, but are not limited to: regression algorithms (e.g., ordinary least squares regression (OLSR), linear regression, logistic regression, stepwise regression, multivariate adaptive regression splines (MARS), locally estimated scatterplot smoothing (LOESS)), instance-based algorithms (e.g., ridge regression, least absolute shrinkage and selection operator (LASSO), elastic net, least-angle regression (LARS)), decision tree algorithms (e.g., classification and regression tree (CART), iterative dichotomiser 3 (ID3), Chi-squared automatic interaction detection (CHAID), decision stump, conditional decision trees), Bayesian algorithms (e.g., naive Bayes, Gaussian naive Bayes, multinomial naive Bayes, averaged one-dependence estimators (AODE), Bayesian belief networks (BNN), Bayesian networks), clustering algorithms (e.g., k-means, expectation maximization (EM), hierarchical clustering), association rule learning algorithms (e.g., perceptron, back-propagation, Hopfield network, radial basis function network (RBFN)), deep learning algorithms (e.g., deep Boltzmann machine (DBM), deep belief networks (DBN), convolutional neural network (CNN), stacked auto-encoders), dimensionality reduction algorithms (e.g., principal component analysis (PCA), principal component regression (PCR), partial least squares regression (PLSR), Sammon mapping, multidimensional scaling (MDS), projection pursuit, linear discriminant analysis (LDA), mixture discriminant analysis (MDA), quadratic discriminant analysis (QDA), flexible discriminant analysis (FDA)), ensemble algorithms (e.g., boosting, bootstrapped aggregation (bagging), AdaBoost, stacked generalization (blending), gradient boosting machines (GBM), gradient boosted regression trees (GBRT), random forest), SVM (support vector machine), supervised learning, unsupervised learning, semi-supervised learning, and the like.
Other examples of architectures include neural networks such as ResNet50, ResNet101, VGG, DenseNet, PointNet, and the like.
Processor(s) 916 of vehicle 902 and processor(s) 942 of computing device(s) 940 may be any suitable processor capable of executing instructions to process data and perform the operations described herein. By way of example, and not limitation, processor(s) 916 and 942 may include one or more Central Processing Units (CPUs), Graphics Processing Units (GPUs), or any other device or portion of a device to process electronic data to convert that electronic data into other electronic data that may be stored in registers and/or memory. In some examples, integrated circuits (e.g., ASICs, etc.), gate arrays (e.g., FPGAs, etc.), and other hardware devices may also be considered processors, provided they are configured to implement the coded instructions.
Memory 918 and 944 are examples of non-transitory computer readable media. Memories 918 and 944 may store an operating system and one or more software applications, instructions, programs, and/or data to implement the methods described herein and the functions attributed to the various systems. In various embodiments, the memory may be implemented using any suitable memory technology, such as static random access memory (SRAM), synchronous dynamic RAM (SDRAM), non-volatile/flash-type memory, or any other type of memory capable of storing information. The architectures, systems, and elements described herein may include many other logical, program, and physical components, where those components shown in the figures are merely examples related to the discussion herein.
It should be noted that while fig. 9 is illustrated as a distributed system, in alternative examples, components of vehicle 902 may be associated with computing device(s) 940, and/or components of computing device(s) 940 may be associated with vehicle 902. That is, vehicle 902 may perform one or more functions associated with computing device(s) 940, and vice versa. Further, various aspects of the prediction component 928 (and subcomponents) may be executed on any of the devices discussed herein.
FIG. 10 depicts an example process 1000 for capturing sensor data, determining attributes associated with an object, determining a predicted location based on the attributes, and controlling a vehicle based on the predicted location. For example, as described herein, some or all of flow 1000 may be performed by one or more components in fig. 9. For example, some or all of flow 1000 may be performed by vehicle computing device(s) 904. Further, any of the operations described in example flow 1000 may be performed in parallel, in a different order than described in flow 1000, omitted, and/or combined with any of the operations discussed herein.
At operation 1002, the process may include receiving sensor data of an environment. In some examples, operation 1002 may include receiving and/or capturing time-of-flight data, lidar data, image data, radar data, and/or the like of an environment. In some examples, operation 1002 may be performed by a vehicle (e.g., an autonomous vehicle) as the vehicle traverses an environment.
At operation 1004, the flow may include determining that an object is in the environment based at least in part on the sensor data. For example, operation 1004 may include classifying the object as a pedestrian in the environment. In some examples, operation 1004 may include determining whether the object (e.g., a pedestrian) is on a sidewalk, in a road, crossing a road, or the like.
At operation 1006, the flow may include determining whether the object is associated with a destination in the environment. For example, operation 1006 may include accessing map data of the environment to determine whether crosswalk region(s) are within a threshold distance of the object. If there is a crosswalk area and the object is on a sidewalk, operation 1006 may include identifying a location across the drivable area as the destination. If the object is in the road and near a single crosswalk, operation 1006 may include disambiguating between the two candidate destinations. In some examples, operation 1006 may include determining a likelihood that the object will approach and/or traverse a particular crosswalk region based at least in part on an attribute associated with the object. In some examples, operation 1006 may provide such a destination regardless of whether there is a crosswalk area near the pedestrian.
In some examples, operation 1006 may include inputting the attribute(s) to a destination prediction component (e.g., destination prediction component 320) to determine a destination associated with the object in the environment. In some examples, the attribute(s) input to the destination prediction component 320 may be the same as or similar to the attributes determined in operations 1008 and 1010 below. In some examples, the attribute(s) may be determined for the object prior to determining a destination in the environment. In some cases, the attribute(s) may be determined in parallel using frames of reference based on different destinations in the environment to determine possible destinations in the environment.
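By way of illustration only, the following Python sketch gathers crosswalk regions within a threshold distance of a pedestrian and scores each candidate destination with a learned destination prediction component; the region helpers, the score call, and the thresholds are hypothetical assumptions, not the disclosed interfaces:

    # Minimal sketch (not from the disclosure): choosing the highest-likelihood
    # candidate destination, if any, for a pedestrian near crosswalk regions.

    def most_likely_destination(pedestrian, crosswalk_regions, attributes,
                                destination_model, max_distance=30.0, min_score=0.5):
        scored = []
        for region in crosswalk_regions:
            if region.distance_to(pedestrian.position) > max_distance:
                continue
            destination = region.far_side_point(pedestrian.position)  # across the drivable area
            score = destination_model.score(attributes, destination)  # likelihood of heading there
            if score >= min_score:
                scored.append((score, destination))
        if not scored:
            return None                       # no destination associated with the object
        return max(scored, key=lambda sd: sd[0])[1]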
If the object is not associated with a destination (e.g., "no" in operation 1006), operation 1006 may proceed to operation 1002 to capture additional data in the environment.
If the object is associated with a destination (e.g., "yes" at operation 1006), the operation may continue to operation 1008.
At operation 1008, the flow may include determining a first attribute associated with the object, the first attribute associated with a first time. In some examples, attributes may include, but are not limited to, one or more of the following: a location of an object (e.g., a pedestrian) at a time (e.g., where the location may be represented in a frame of reference discussed herein), a size of the object or a bounding box associated with the object (e.g., a length, a width, and/or a height), a velocity of the pedestrian at the time (e.g., a magnitude and/or an angle relative to a first axis (or other reference line)), an acceleration of the pedestrian at the time, an indication of whether the pedestrian is in a drivable area (e.g., whether the pedestrian is on a sidewalk or in a road), an indication of whether the pedestrian is in a crosswalk area, an indication of whether the pedestrian is crossing a road, an area control indicator state (e.g., whether the crosswalk is controlled by a traffic signal and/or a state of the traffic signal), a vehicle context (e.g., a presence of a vehicle in the environment and attribute(s) associated with the vehicle), a flux through the crosswalk area over a period of time (e.g., a number of objects (e.g., vehicles and/or additional pedestrians) that pass through the crosswalk area over the period of time), an object association (e.g., whether the pedestrian is traveling in a group of pedestrians), a distance to the crosswalk in a first direction (e.g., a global x-direction), a distance to the crosswalk in a second direction (e.g., a global y-direction), a distance to a road in the crosswalk area (e.g., a shortest distance to the road in the crosswalk area), a distance to other objects, and the like.
At operation 1010, the flow may include determining a second attribute associated with the object, the second attribute associated with a second time after the first time. In some examples, operation 1010 may be omitted (such that only attributes associated with a first time may be determined and/or used), while in some examples, attributes associated with additional or different time instances may also be determined.
At operation 1012, the flow may include determining the predicted location(s) of the object at a third time after the second time based at least in part on the first attribute, the second attribute, and the destination. In some examples, operation 1012 may include inputting the attribute information to a location prediction component (e.g., location prediction component 404) and receiving as output a predicted location(s) associated with the object in the environment. As discussed herein, in some examples, the attribute(s) and/or the predicted location(s) may be represented in one or more frames of reference based at least in part on the location of the object at the first time and/or the second time and the location of the destination in the environment.
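By way of illustration only, the following Python sketch shows one possible construction of a destination-based frame of reference (origin at the object's position, first axis pointing toward the destination), consistent with the description above and with paragraph E of the example clauses; the helper is an assumption, not the disclosed implementation:

    # Minimal sketch (not from the disclosure): building a frame of reference whose
    # origin is the object's position and whose first axis points toward the destination.
    import math

    def destination_frame(object_position, destination):
        ox, oy = object_position
        heading = math.atan2(destination[1] - oy, destination[0] - ox)
        cos_h, sin_h = math.cos(heading), math.sin(heading)

        def to_frame(point):
            px, py = point[0] - ox, point[1] - oy
            # Rotate so x lies along the object-to-destination axis, y perpendicular to it.
            return (px * cos_h + py * sin_h, -px * sin_h + py * cos_h)

        return to_frame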
At operation 1014, the process may include controlling the vehicle based at least in part on the predicted location(s). In some cases, operation 1014 may include generating a trajectory to stop the vehicle or otherwise control the vehicle to safely traverse the environment.
FIG. 11 depicts an example process 1100 for capturing sensor data, determining that a first object and a second object are in an environment, determining an attribute associated with the second object, determining a predicted location based on the attribute and a reference line, and controlling a vehicle based on the predicted location. For example, as described herein, some or all of flow 1100 may be performed by one or more components in fig. 9. For example, some or all of flow 1100 may be performed by vehicle computing device(s) 904. Further, any of the operations described in the example flow 1100 may be performed in parallel, in a different order than described in the flow 1100, omitted, and/or combined with any of the operations discussed herein.
At operation 1102, the flow may include receiving sensor data of an environment. In some examples, operation 1102 may include receiving and/or capturing time-of-flight data, lidar data, image data, radar data, and/or the like of an environment. In some examples, operation 1102 may be performed by a vehicle (e.g., an autonomous vehicle) as the vehicle traverses an environment.
At operation 1104, the flow may include determining that the first object is in the environment based at least in part on the sensor data. For example, operation 1104 may include determining a target object as the subject of the prediction operation, as discussed herein. For example, determining the target object may include selecting an object from a plurality of objects in the environment as the target object. In some examples, the target object may be selected based on a likelihood of intersection between the path of the target object and the path of the vehicle (e.g., vehicle 902) capturing the sensor data, a distance between the target object and the vehicle (e.g., vehicle 902) capturing the sensor data, and so on.
At operation 1106, the flow may include determining whether a second object is proximate to the first object in the environment. In some examples, operation 1106 may include determining whether the second object is within a threshold distance of the first object. In some examples (e.g., in a crowded environment), operation 1106 may include determining N objects (where N is an integer) that are closest to the first object. In at least some examples, such a determination may exclude objects having certain characteristics, such as, but not limited to, objects of different classes, objects with opposite directions of motion, and so forth.
If the second object is not close to the first object (e.g., "No" in operation 1106), the flow may return to operation 1102. However, in some examples, the flow may continue to operation 1112, where the predicted location(s) of the first object are determined without the attribute(s) associated with the second object (e.g., the predicted location(s) of the first object may be determined based at least in part on the attribute(s) associated with the first object). That is, in some examples, the predicted location(s) of the first object may be determined without regard to whether the second object is proximate to the first object, and/or without regard to whether attribute(s) are determined for any of the second objects.
If the second object is near the first object (e.g., "yes" in operation 1106), the flow continues to operation 1108.
At operation 1108, the flow may include determining a first attribute associated with the second object, the first attribute associated with a first time. In some examples, attributes may be determined for the first object, the second object, and/or other object(s) in the environment. For example, attributes may include, but are not limited to, one or more of the following: a velocity of an object at a time, an acceleration of the object at the time, a position of the object at the time (e.g., in global or local coordinates), a bounding box associated with the object at the time (e.g., representing extent(s), roll, pitch, and/or yaw of the object), a lighting state associated with the object at the first time (e.g., headlight(s), brake light(s), hazard light(s), turn indicator(s), reverse light(s), etc.), a wheel orientation indicator of the object, a distance between the object and a map element at the time (e.g., a distance from a stop line, traffic line, speed bump, yield line, intersection, lane, etc.), a relative distance to other objects in one or more frames of reference, a classification of the object (e.g., a car, a vehicle, an animal, a truck, a bicycle, etc.), a characteristic associated with the object (e.g., whether the object is changing lanes, whether the object is a double-parked vehicle, etc.), a lane characteristic, and the like.
At operation 1110, the flow may include determining a second attribute associated with a second object, the second attribute associated with a second time after the first time. In some examples, operation 1110 may be omitted (so that only attributes associated with the first time may be used), while in some examples, attributes associated with additional or different time instances may also be determined.
At operation 1112, the flow may include determining a predicted location(s) of the first object at a third time after the second time based at least in part on the first and second attributes, the predicted location(s) being relative to a reference line in the environment. In some examples, operation 1112 may include inputting attribute information associated with the first object and/or the second object to a location prediction component (e.g., location prediction component 802) to determine a predicted location(s) associated with the first object.
In some examples, operation 1112 may include receiving or otherwise determining a reference line most closely associated with the predicted location(s) and representing the predicted location(s) relative to the reference line. For example, operation 1112 may include determining a similarity score between the predicted location(s) and the candidate reference line(s) and selecting the reference line based on the similarity score or any other mechanism.
At operation 1114, the process may include controlling the vehicle based at least in part on the predicted location(s). In some cases, operation 1114 may include generating a trajectory to stop the vehicle or otherwise control the vehicle to safely traverse the environment.
Example clauses
A: a system, comprising: one or more processors; and one or more computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to perform operations comprising: capturing sensor data of an environment using sensors of an autonomous vehicle; determining that an object is in an environment based at least in part on the sensor data; determining that the object is relevant to a destination in the environment based at least in part on the map data and the sensor data; determining a first attribute associated with the object, the first attribute associated with a first time; determining a second attribute associated with the object, the second attribute associated with a second time after the first time; inputting the first attribute, the second attribute, and the destination into a machine learning model, wherein the first attribute and the second attribute are represented in a frame of reference based at least in part on the destination; receiving, from the machine learning model, a predicted location of the object at a third time after the second time; and controlling the autonomous vehicle based at least in part on the predicted location of the object in the environment at the third time.
B: the system of paragraph a, wherein the object is a pedestrian and the destination is associated with a perimeter of a crosswalk area in the environment and opposite a drivable surface associated with the pedestrian.
C: the system of paragraphs a or B, the operations further comprising: determining that the object is associated with the destination based at least in part on inputting the first attribute and the second attribute into a destination prediction component; and receiving the destination from the destination prediction component, the destination prediction component comprising another machine learning model.
D: the system of any of paragraphs a-C, the operations further comprising: wherein the predicted location associated with the object at the third time comprises: a lateral offset based at least in part on the frame of reference; and a distance along an axis of the frame of reference representing a difference between the position of the object at the second time and the predicted position.
E: the system of any of paragraphs a-D, the operations further comprising: establishing the frame of reference, wherein: a first position of the object at a second time is associated with an origin of the frame of reference; a first axis is based at least in part on the origin and the destination; and a second axis perpendicular to the first axis; and wherein the predicted position is based at least in part on the frame of reference.
F: a method, comprising: receiving sensor data representative of an environment; determining that an object is in the environment based at least in part on the sensor data; determining a location in the environment, the location associated with a crosswalk area; determining a first attribute associated with the object, the first attribute associated with a first time; determining a second attribute associated with the object, the second attribute associated with a second time after the first time; inputting the first attribute, the second attribute and the location into a machine learning model; and receiving, from the machine learning model, a predicted location associated with the object at a third time after the second time.
G: the method of paragraph F, further comprising: capturing the sensor data using sensors on a vehicle; and controlling the vehicle based at least in part on the predicted location of the object in the environment at the third time.
H: the method of paragraph F or G, wherein the location is a first location, the method further comprising: determining the first location based at least in part on at least one of the sensor data or map data representative of the environment; determining a threshold region associated with the first location; determining a second location of the object in the environment; determining that the second location of the object is within the threshold region; and selecting the location as the destination associated with the object based at least in part on the second location being within the threshold region and at least one of the first attribute or the second attribute.
I: the method of any of paragraphs F-H, wherein the location is a first location, the method further comprising: establishing a frame of reference, wherein: a second position of the object at the second time is associated with the origin of the frame of reference; a first axis is based at least in part on the origin and the first position; and a second axis perpendicular to the first axis; and wherein the first attribute is based at least in part on the frame of reference.
J: the method of paragraph I, further comprising: determining a velocity of the object at the second time; and determining an angle between a velocity vector representing the velocity and the first axis; wherein the second attribute comprises the angle.
K: the method of paragraph I or J, wherein: the position is a first position; and the predicted position associated with the object at the third time comprises a lateral offset relative to the second axis and a distance along the first axis that represents a difference between a second position of the object at the second time and the predicted position.
L: the method of any of paragraphs F through K, further comprising: determining a number of objects that entered the crosswalk area over a period of time, wherein the second attribute includes the number of objects.
M: the method of any of paragraphs F-L, wherein the object is a first object, the method further comprising: determining that a second object is in an environment based at least in part on the sensor data; determining at least one of a position, a velocity, or an acceleration associated with the second object as an object context; and determining a predicted location associated with the object further based at least in part on the object context.
N: the method of any of paragraphs F-M, further comprising: at least a portion of the predicted locations are ranked to determine a ranked predicted location.
O: the method of any of paragraphs F-N, wherein the first attribute comprises at least one of: a location of the object at the first time; a velocity of the object at the first time; a direction of progress of the object at the first time; a first distance between the object at the first time and a first portion of the sidewalk area; a second distance between the object at the first time and a second portion of the sidewalk area; acceleration of the object at the first time; an indication of whether the object is in a drivable region; zone control indicator status; a vehicle background; or object association.
P: a non-transitory computer-readable medium storing instructions that, when executed, cause one or more processors to perform operations comprising: receiving sensor data representative of an environment; determining that an object is in the environment based at least in part on the sensor data; determining a location in the environment, the location associated with at least one of a non-drivable area or a crosswalk area of the environment; determining a first attribute associated with the object, the first attribute associated with a first time; determining a second attribute associated with the object, the second attribute associated with a second time after the first time; inputting the first attribute, the second attribute and the location into a machine learning model; and receiving, from the machine learning model, a predicted location associated with the object at a third time after the second time.
Q: the non-transitory computer-readable medium of paragraph P, wherein the location is a first location, the operations further comprising: determining the first location based at least in part on at least one of map data representative of the environment or the sensor data representative of the environment; determining a threshold region associated with the first location; determining a second location of the object in the environment; determining that the second location of the object is within the threshold region; and selecting the first location as the destination associated with the object based at least in part on the second location of the object being within the threshold region and at least one of the first attribute or the second attribute.
R: the non-transitory computer-readable medium of paragraph P or Q, wherein the location is a first location, the operations further comprising: establishing a frame of reference, wherein: a second position of the object at the second time is associated with the origin of the frame of reference; a first axis is based at least in part on the origin and the first position; and a second axis perpendicular to the first axis; and wherein the first attribute is based at least in part on the frame of reference.
S: the non-transitory computer-readable medium of paragraph R, wherein: the position is a first position; and the predicted position associated with the object at the third time includes a lateral offset along the second axis and a distance along the first axis that represents a difference between a second position of the object at the second time and the predicted position.
T: the non-transitory computer-readable medium of any of paragraphs P through S, further comprising: determining that the object is not associated with the crosswalk area; and determining that the location is associated with a non-drivable area of the environment.
U: a system, comprising: one or more processors; and one or more computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to perform operations comprising: capturing sensor data of an environment using sensors of an autonomous vehicle; determining that an object is in an environment based at least in part on the sensor data; receiving a reference line associated with the object in the environment; determining a first attribute associated with the object, the first attribute associated with a first time; determining a second attribute associated with the object, the second attribute associated with a second time after the first time; inputting the first attribute, the second attribute and the reference line into a machine learning model; receiving, from the machine learning model, a predicted position of the object at a third time after the second time, the predicted position being relative to the reference line in the environment; and controlling the autonomous vehicle based at least in part on the predicted location of the object in the environment at the third time.
V: the system of paragraph U, wherein the object is a first object, the operations further comprising: determining a third attribute associated with a second object proximate to the first object, the third attribute associated with the first time; determining a fourth attribute associated with the second object, the fourth attribute associated with the second time; and inputting the third and fourth attributes into the machine learning model to determine a predicted location of the first object at the third time.
W: the system of paragraph V, wherein at least one of the first, second, third, or fourth attributes comprises at least one of: a velocity of the second object at the first time; acceleration of the second object at the first time; a location of the second object at the first time; a bounding box associated with the second object at the first time; an illumination state associated with the second object at the first time; a first distance between the second object and the map element at the first time; a second distance between the first object and the second object; a classification of the second object; or a feature associated with the second object.
X: the system of any of paragraphs U-W, wherein the predicted position comprises a distance along the reference line and a lateral offset from the reference line.
Y: the system of any of paragraphs U-X, wherein the machine learning model is a first machine learning model, and wherein the reference line is received from a second machine learning model trained to output the reference line.
Z: a method, comprising: receiving sensor data representative of an environment; determining that an object is in the environment; receiving a reference line associated with the object; determining a first attribute associated with the object, the first attribute associated with a first time; determining a second attribute associated with the object, the second attribute associated with a second time after the first time; inputting the first attribute, the second attribute and the reference line into a machine learning model; and receiving a predicted position of the object at a third time after the second time from the machine learning model, the predicted position being relative to the reference line in the environment.
AA: the method of paragraph Z, further comprising: capturing the sensor data using a sensor of a vehicle; and controlling the vehicle based at least in part on the predicted location of the object in the environment at the third time.
AB: a method according to paragraph AA, wherein the object is one of a plurality of objects in the environment, the method further comprising: the object is selected as a target object based at least in part on a distance between the object and a vehicle in the environment.
AC: a method according to any of paragraphs Z to AB, wherein the object is one of a plurality of objects in the environment, and wherein the object is a target object, said method further comprising: selecting a number of objects of a plurality of objects based at least in part on a proximity of the plurality of objects to the target object; determining attributes associated with the number of objects; and inputting the attributes into the machine learning model to determine the predicted location.
AD: the method of paragraph AC, further comprising: selecting the number of objects based at least in part on the classifications associated with the number of objects.
AE: the method of any of paragraphs Z-AD, wherein the reference line corresponds to a centerline of the drivable region, and wherein the predicted position comprises a distance along the reference line and a lateral offset from the reference line.
AF: the method of any of paragraphs Z-AE, wherein the first and second attributes are represented relative to a frame of reference, wherein an origin of the frame of reference is based at least in part on the location of the object at the second time.
AG: the method of any of paragraphs Z-AF, wherein the first attribute comprises at least one of: a velocity of the object at the first time; acceleration of the object at the first time; a location of the object at the first time; a bounding box associated with the object at the first time; an illumination state associated with the object at the first time; a first distance between the object and a map element at the first time; a classification of the object; or a characteristic associated with the object.
AH: the method of paragraph AG, wherein the object is a first object and the distance is a first distance, the method further comprising: determining that a second object is proximate to the first object in the environment; wherein the first attribute further comprises a second distance between the first object and the second object at the first time.
AI: a non-transitory computer-readable medium storing instructions that, when executed, cause one or more processors to perform operations comprising: receiving sensor data representative of an environment; determining that an object is in the environment based at least in part on the sensor data; receiving a reference line associated with the object; determining a first attribute associated with the object, the first attribute associated with a first time; determining a second attribute associated with the object, the second attribute associated with a second time after the first time; inputting the first attribute, the second attribute and the reference line into a machine learning model; and receiving a predicted position of the object at a third time after the second time from the machine learning model, the predicted position being relative to the reference line in the environment.
AJ: the non-transitory computer-readable medium of paragraph AI, wherein the object is a first object, the operations further comprising: determining that a second object is proximate to the first object in the environment; determining a third attribute associated with the second object, the third attribute associated with the first time; determining a fourth attribute associated with the second object, the fourth attribute associated with the second time; and inputting the third and fourth attributes into the machine learning model to determine a predicted location associated with the first object.
AK: the non-transitory computer-readable medium of any of paragraphs AI-AJ, the first and second attributes being represented relative to a frame of reference, wherein an origin of the frame of reference is based at least in part on a location of the object at the second time.
AL: a non-transitory computer readable medium as paragraph AK recites, wherein the predicted position is expressed as a distance along the reference line and a lateral offset from the reference line.
AM: the non-transitory computer-readable medium of any of paragraphs AI-AL, wherein the first attribute comprises at least one of: a velocity of the object at the first time; acceleration of the object at the first time; a location of the object at the first time; a bounding box associated with the object at the first time; an illumination state associated with the object at the first time; a first distance between the object and a map element at the first time; a classification of the object; or a characteristic associated with the object.
AN: a non-transitory computer readable medium as paragraph AM recites, wherein the object is a first object, the distance is a first distance, and the first attribute further includes a second distance between the first object and the second object at the first time.
While the example clauses described above are described with respect to a particular embodiment, it should be understood that the contents of the example clauses, in the context of this document, may also be implemented in a method, apparatus, system, computer-readable medium, and/or another implementation.
Conclusion
While one or more examples of the technology described herein have been described, various modifications, additions, permutations, and equivalents thereof are included within the scope of the technology described herein.
In describing examples, reference is made to the accompanying drawings which form a part hereof, and in which are shown by way of illustration specific examples of the claimed subject matter. It is to be understood that other examples may be used and that changes or alterations, such as structural changes, may be made. Such examples, changes, or alterations do not necessarily depart from the scope of the intended claimed subject matter. Although the steps herein may be presented in a certain order, in some cases the order may be changed so that certain inputs are provided at different times or in a different order without changing the functionality of the systems and methods described. The disclosed procedures may also be performed in a different order. Moreover, the various computations herein need not be performed in the order disclosed, and other examples using different orderings of the computations may be readily implemented. In addition to reordering, a computation may also be decomposed into sub-computations with the same result.

Claims (15)

1. A method, comprising:
Receiving sensor data representative of an environment;
determining that an object is in the environment based at least in part on the sensor data;
determining a location in the environment, the location associated with at least one of a non-drivable area or a crosswalk area in the environment;
determining a first attribute associated with the object, the first attribute associated with a first time;
determining a second attribute associated with the object, the second attribute associated with a second time after the first time;
inputting the first attribute, the second attribute, and the location to a machine learning model; and
receiving, from the machine learning model, a predicted location associated with the object at a third time after the second time.
2. The method of claim 1, further comprising:
capturing the sensor data with a sensor on a vehicle; and
controlling the vehicle based at least in part on the predicted location of the object in the environment at the third time.
3. The method of claim 1 or 2, wherein the location is a first location, the method further comprising:
determining the first location based at least in part on at least one of the sensor data or map data representative of the environment;
determining a threshold region associated with the first location;
determining a second location of the object in the environment;
determining that the second location of the object is within the threshold region; and
selecting the location as a destination associated with the object based at least in part on the second location being within the threshold region and at least one of the first attribute or the second attribute.
4. The method of any of claims 1-3, wherein the location is a first location, the method further comprising:
establishing a frame of reference, wherein:
a second position of the object at the second time is associated with the origin of the frame of reference;
a first axis is based at least in part on the origin and the first position; and
a second axis is perpendicular to the first axis; and
wherein the first attribute is based at least in part on the frame of reference.
5. The method of claim 4, further comprising:
determining a velocity of the object at the second time; and
determining an angle between a velocity vector representing the velocity and the first axis;
wherein the second attribute comprises the angle.
6. The method of claim 4, wherein:
the position is a first position; and
the predicted location associated with the object at the third time comprises: a lateral offset relative to the second axis and a distance along the first axis representing a difference between a second position of the object at the second time and the predicted position.
7. The method of any of claims 1 to 6, further comprising:
determining a number of objects that entered the crosswalk area over a period of time, wherein the second attribute includes the number of objects.
8. The method of any of claims 1 to 7, wherein the object is a first object, the method further comprising:
determining that a second object is in the environment based at least in part on the sensor data;
determining at least one of a position, a velocity, or an acceleration associated with the second object as an object context; and
determining the predicted location associated with the object further based at least in part on the object context.
9. The method of any of claims 1 to 8, further comprising:
ranking at least a portion of the predicted locations to determine a ranked predicted location.
10. The method of any of claims 1-9, wherein the first attribute comprises at least one of:
a location of the object at the first time;
a velocity of the object at the first time;
a direction of progress of the object at the first time;
a first distance between the object at the first time and a first portion of the crosswalk area;
a second distance between the object at the first time and a second portion of the crosswalk area;
an acceleration of the object at the first time;
an indication of whether the object is in a drivable region;
a zone control indicator state;
a vehicle context; or
an object association.
11. A computer program product comprising coded instructions which, when run on a computer, implement the method according to any one of claims 1 to 10.
12. A system, comprising:
one or more processors; and
one or more non-transitory computer-readable media storing instructions that, when executed, cause the system to perform operations comprising:
receiving sensor data representative of an environment;
determining that an object is in the environment based at least in part on the sensor data;
determining a location in the environment, the location associated with at least one of a non-drivable area or a crosswalk area in the environment;
determining a first attribute associated with the object, the first attribute associated with a first time;
determining a second attribute associated with the object, the second attribute associated with a second time after the first time;
inputting the first attribute, the second attribute, and the location to a machine learning model; and
receiving, from the machine learning model, a predicted location associated with the object at a third time after the second time.
13. The system of claim 12, wherein the location is a first location, the operations further comprising:
determining the first location based at least in part on at least one of map data representative of the environment or the sensor data representative of the environment;
determining a threshold region associated with the first location;
determining a second location of the object in the environment;
determining that the second location of the object is within the threshold region; and
selecting the first location as a destination associated with the object based at least in part on the second location of the object being within the threshold region and at least one of the first attribute or the second attribute.
14. The system of one of claims 12 or 13, wherein the location is a first location, the operations further comprising:
establishing a frame of reference, wherein:
a second position of the object at the second time is associated with the origin of the frame of reference;
a first axis based at least in part on the origin and the first position; and
a second axis is perpendicular to the first axis; and
wherein the first attribute is based at least in part on the frame of reference.
15. The system of any of claims 12 to 14, the operations further comprising:
determining that the object is not associated with the crosswalk area; and
determining that the location is associated with the non-drivable area of the environment.
CN202080023879.5A 2019-03-25 2020-03-24 Attribute-based pedestrian prediction Pending CN113632096A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US16/363,627 2019-03-25
US16/363,541 2019-03-25
US16/363,627 US11351991B2 (en) 2019-03-25 2019-03-25 Prediction based on attributes
US16/363,541 US11021148B2 (en) 2019-03-25 2019-03-25 Pedestrian prediction based on attributes
PCT/US2020/024386 WO2020198189A1 (en) 2019-03-25 2020-03-24 Pedestrian prediction based on attributes

Publications (1)

Publication Number Publication Date
CN113632096A true CN113632096A (en) 2021-11-09

Family

ID=70289862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080023879.5A Pending CN113632096A (en) 2019-03-25 2020-03-24 Attribute-based pedestrian prediction

Country Status (4)

Country Link
EP (1) EP3948656A1 (en)
JP (1) JP2022527072A (en)
CN (1) CN113632096A (en)
WO (1) WO2020198189A1 (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018176000A1 (en) 2017-03-23 2018-09-27 DeepScale, Inc. Data synthesis for autonomous control systems
US11409692B2 (en) 2017-07-24 2022-08-09 Tesla, Inc. Vector computational unit
US10671349B2 (en) 2017-07-24 2020-06-02 Tesla, Inc. Accelerated mathematical engine
US11893393B2 (en) 2017-07-24 2024-02-06 Tesla, Inc. Computational array microprocessor system with hardware arbiter managing memory requests
US11157441B2 (en) 2017-07-24 2021-10-26 Tesla, Inc. Computational array microprocessor system using non-consecutive data formatting
US11561791B2 (en) 2018-02-01 2023-01-24 Tesla, Inc. Vector computational unit receiving data elements in parallel from a last row of a computational array
US11215999B2 (en) 2018-06-20 2022-01-04 Tesla, Inc. Data pipeline and deep learning system for autonomous driving
US11361457B2 (en) 2018-07-20 2022-06-14 Tesla, Inc. Annotation cross-labeling for autonomous control systems
US11636333B2 (en) 2018-07-26 2023-04-25 Tesla, Inc. Optimizing neural network structures for embedded systems
US11562231B2 (en) 2018-09-03 2023-01-24 Tesla, Inc. Neural networks for embedded devices
AU2019357615B2 (en) 2018-10-11 2023-09-14 Tesla, Inc. Systems and methods for training machine models with augmented data
US11196678B2 (en) 2018-10-25 2021-12-07 Tesla, Inc. QOS manager for system on a chip communications
US11816585B2 (en) 2018-12-03 2023-11-14 Tesla, Inc. Machine learning models operating at different frequencies for autonomous vehicles
US11537811B2 (en) 2018-12-04 2022-12-27 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11610117B2 (en) 2018-12-27 2023-03-21 Tesla, Inc. System and method for adapting a neural network model on a hardware platform
US10997461B2 (en) 2019-02-01 2021-05-04 Tesla, Inc. Generating ground truth for machine learning from time series elements
US11150664B2 (en) 2019-02-01 2021-10-19 Tesla, Inc. Predicting three-dimensional features for autonomous driving
US11567514B2 (en) 2019-02-11 2023-01-31 Tesla, Inc. Autonomous and user controlled vehicle summon to a target
US10956755B2 (en) 2019-02-19 2021-03-23 Tesla, Inc. Estimating object properties using visual image data
US20220101155A1 (en) * 2020-09-25 2022-03-31 Motional Ad Llc Trajectory Generation Using Road Network Model
CN112785845B (en) * 2020-12-30 2022-11-01 桂林电子科技大学 Vehicle speed prediction method based on K-means clustering and RBF neural network
EP4131180A1 (en) * 2021-08-05 2023-02-08 Argo AI, LLC Methods and system for predicting trajectories of actors with respect to a drivable area
FR3140701A1 (en) * 2022-10-11 2024-04-12 Psa Automobiles Sa Driving methods and systems when approaching a pedestrian crossing
US20240174266A1 (en) * 2022-11-30 2024-05-30 Zoox, Inc. Prediction model with variable time steps

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114743157A (en) * 2022-03-30 2022-07-12 中科融信科技有限公司 Pedestrian monitoring method, device, equipment and medium based on video
CN114743157B (en) * 2022-03-30 2023-03-03 中科融信科技有限公司 Pedestrian monitoring method, device, equipment and medium based on video

Also Published As

Publication number Publication date
JP2022527072A (en) 2022-05-30
EP3948656A1 (en) 2022-02-09
WO2020198189A1 (en) 2020-10-01

Similar Documents

Publication Publication Date Title
US11351991B2 (en) Prediction based on attributes
US11021148B2 (en) Pedestrian prediction based on attributes
US11631200B2 (en) Prediction on top-down scenes based on action data
US11734832B1 (en) Prediction on top-down scenes based on object motion
CN113632096A (en) Attribute-based pedestrian prediction
EP3908493B1 (en) Occlusion prediction and trajectory evaluation
US11169531B2 (en) Trajectory prediction on top-down scenes
US11420630B2 (en) Trajectory modifications based on a collision zone
US11215997B2 (en) Probabilistic risk assessment for trajectory evaluation
US11643073B2 (en) Trajectory modifications based on a collision zone
US11708093B2 (en) Trajectories with intent
US11554790B2 (en) Trajectory classification
US20220274625A1 (en) Graph neural networks with vectorized object representations in autonomous vehicle systems
JP2022538535A (en) Depth refinement from images
US11858529B1 (en) Predicting articulated object states
EP4146510A1 (en) Trajectory classification
US12012108B1 (en) Prediction models in autonomous vehicles using modified map data
US11772643B1 (en) Object relevance determination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination