CN115515835A - Trajectory classification - Google Patents

Trajectory classification

Info

Publication number
CN115515835A
Authority
CN
China
Prior art keywords
trajectory
vehicle
data
model
environment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180033877.9A
Other languages
Chinese (zh)
Inventor
K·M·西伯特
G·加利梅拉
S·帕里克
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zoox Inc
Original Assignee
Zoox Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 16/870,083 (US11554790B2)
Priority claimed from US 16/870,355 (US11708093B2)
Application filed by Zoox Inc
Publication of CN115515835A

Classifications

    • G08G1/166 Anti-collision systems for active traffic, e.g. moving vehicles, pedestrians, bikes
    • B60W30/09 Taking automatic action to avoid collision, e.g. braking and steering
    • B60W30/0956 Predicting travel path or likelihood of collision, the prediction being responsive to traffic or environmental parameters
    • B60W30/18154 Approaching an intersection
    • B60W60/0017 Planning or execution of driving tasks specially adapted for safety of other traffic participants
    • B60W60/0027 Planning or execution of driving tasks using trajectory prediction for other traffic participants
    • G08G1/167 Driving aids for lane monitoring, lane changing, e.g. blind spot detection
    • B60W2050/0025 Transfer function weighting factor
    • B60W2554/4029 Pedestrians

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Traffic Control Systems (AREA)

Abstract

Techniques for predicting behavior of an object in an environment are discussed herein. For example, such techniques can include inputting data into a model and receiving output from the model representative of a discretized representation. The discretized representation can be associated with a probability that the object will arrive at a location in the environment at a future time. The vehicle computing system may use the discretized representation and the probabilities to determine trajectories and weights associated with the trajectories. Vehicles (such as autonomous vehicles) may be controlled to traverse the environment based on trajectories and weights output by the vehicle computing system.

Description

Trajectory classification
RELATED APPLICATIONS
This application claims priority to U.S. patent application Ser. No. 16/870,083, entitled "Trajectory Classification" and filed on May 8, 2020, and U.S. patent application Ser. No. 16/870,355, entitled "Trajectory With Intent" and filed on May 8, 2020, the entire contents of which are incorporated herein by reference.
Background
Planning systems in autonomous and semi-autonomous vehicles determine actions taken by the vehicle in an operating environment. Actions of the vehicle may be determined based in part on avoiding objects present in the environment. For example, an action may be generated to yield to a pedestrian, to change lanes to avoid another vehicle on the road, or the like. Accurate prediction of future behavior (e.g., intent) may be necessary for safe operation in the vicinity of an object, particularly where that behavior may change based on a selected behavior of the vehicle.
Drawings
The detailed description is described with reference to the accompanying drawings. In the drawings, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears. The same reference numbers in different drawings identify similar or identical elements or features.
FIG. 1 is an illustration of an autonomous vehicle in an environment, where an example machine learning model can process an overhead representation of the environment to determine a discretized representation of the environment.
FIG. 2 is an exemplary top-view representation of an environment in which a machine learning model may be used to implement the techniques described herein.
FIG. 3 is an illustration of an autonomous vehicle in an environment, where an exemplary machine learning model can determine a discretized representation of the environment.
FIG. 4 is an exemplary discretized representation of an environment that is output by an exemplary machine learning model.
FIG. 5 is an illustration of an autonomous vehicle in an environment in which an exemplary machine learning model may process data to determine a trajectory or intent of an object.
FIG. 6 is a block diagram of an example system for implementing the techniques described herein.
FIG. 7 is a flow diagram depicting an example process for determining predicted trajectories and weights using different models.
FIG. 8 is a flow diagram depicting an example process for determining an intent associated with a trajectory or trajectory type using different models.
Detailed Description
The predicted behavior or intent of objects in the environment may affect the actions of an autonomous vehicle. In at least some examples, some such intent may change in response to an action performed by the vehicle. The motion of objects in the environment may change rapidly.
Techniques for applying and/or training a model to predict behavior of objects in an environment are described. In some examples, such behavior may include an intent, which may indicate a motion that the object may take at some point in the near future. For example, one or more machine learning models may process data related to an object represented in an image format and determine possible actions that the object may take at some time in the future. In some examples, the object may be a pedestrian, and the model(s) may predict trajectory(s) of the pedestrian and weight(s) associated with the predicted trajectory(s). A weight may indicate a likelihood that the pedestrian will reach a destination (e.g., a crosswalk). In at least some examples, such an intent may be based at least in part on an action to be performed by the vehicle. The pedestrian trajectory or weight determined by the model(s) may be taken into account during vehicle planning to improve the safety of the vehicle by accounting for the likelihood that pedestrians may use different trajectories to reach several possible destinations while the vehicle navigates the environment.
In some examples, a computing device may implement a machine learning model to predict behavior of an object (e.g., a bicycle, a pedestrian, another vehicle, an animal, etc.) that may impact operation of an autonomous vehicle. For example, the machine learning model may determine a trajectory (e.g., direction, speed, and/or acceleration) that an object follows in the environment at a future time and a weight based on a predicted probability that the object is at a destination (e.g., a pedestrian crossing, a road segment, outside of a road segment, etc.) at the future time. In such examples, the vehicle computing system of the autonomous vehicle may consider outputs (e.g., trajectories and weights) from the machine learning model when determining candidate trajectories for the vehicle (using the same or different models), improving the safety of the vehicle by providing candidate trajectories that safely account for potential behavior of objects that may affect vehicle operation (e.g., intersecting the trajectory of the autonomous vehicle, causing the autonomous vehicle to swerve or brake heavily, etc.).
In some examples, the machine learning model may predict several different trajectories of an object associated with different possible destinations. By way of example and not limitation, where a pedestrian approaches an intersection having several crosswalks, an autonomous vehicle may implement a machine learning model to output a first trajectory of the pedestrian and a first predicted probability that the pedestrian will reach a first crosswalk, and to output a second trajectory of the pedestrian and a second predicted probability that the pedestrian will reach a second crosswalk. In such an example, the machine learning model may determine a first weight based at least in part on the first predicted probability and a second weight based at least in part on the second predicted probability. For example, the model may determine a first weight indicating that, at a future time, the pedestrian has an 80% probability of entering the first crosswalk and a second weight indicating that the pedestrian has a 20% probability of entering the second crosswalk. In various examples, the machine learning model may send information including the weights, trajectories, and/or predicted probabilities associated with the destinations to a planning component of the autonomous vehicle, which may use the information in planning considerations (e.g., trajectory determination, calculation, etc.).
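For illustration only, the following Python sketch shows one way such per-destination outputs (a trajectory plus a normalized weight per candidate destination) might be structured; the class and function names are hypothetical and are not taken from the described system.

```python
# Hypothetical sketch of per-destination outputs: each candidate destination
# gets a predicted trajectory and a weight derived from a predicted probability.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WeightedTrajectory:
    destination: str                      # e.g. "crosswalk_A"
    waypoints: List[Tuple[float, float]]  # (x, y) positions at future timesteps
    weight: float                         # normalized probability of this destination


def normalize_weights(raw_probs: List[float]) -> List[float]:
    """Turn raw destination probabilities into weights that sum to 1."""
    total = sum(raw_probs) or 1.0
    return [p / total for p in raw_probs]


# Example: an 0.8 / 0.2 split between two crosswalks, as in the text above.
weights = normalize_weights([0.8, 0.2])
outputs = [
    WeightedTrajectory("crosswalk_A", [(0.0, 0.0), (1.0, 1.5)], weights[0]),
    WeightedTrajectory("crosswalk_B", [(0.0, 0.0), (1.0, -1.5)], weights[1]),
]
```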
In some examples, data encoded in an image format representing an overhead view of an environment may be input to a machine learning model. The data may include sensor data and/or map data captured from or associated with sensors of vehicles in the environment, as well as any other data source that may be encoded as an overhead representation. The machine learning model may use data to represent one or more of: attributes of the object (e.g., position, velocity, acceleration, yaw, etc.), a history of the object (e.g., position history, velocity history, etc.), attributes of the vehicle (e.g., velocity, position, etc.), crosswalk permissions, traffic light permissions, and so forth. This data may be represented in an overhead view of the environment to capture the environment of the vehicle (e.g., to identify other vehicles and pedestrian actions relative to the vehicle). The overhead view of the environment represented by the data may also improve the prediction (e.g., facing and/or heading) of the direction and/or destination with which a pedestrian or other object may be associated by providing more information about the pedestrian's surroundings, such as whether another pedestrian blocks the pedestrian's path.
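As a rough illustration of this kind of encoding, the sketch below rasterizes a few pieces of state into a multi-channel top-down array. The channel layout, grid size, and resolution are assumptions made for the example, not the encoding used by the described system.

```python
import numpy as np

# Illustrative multi-channel top-down encoding (assumed layout and resolution).
H = W = 200                      # 100 m x 100 m area at 0.5 m per pixel
RESOLUTION = 0.5                 # meters per pixel
CHANNELS = {"occupancy": 0, "velocity": 1, "crosswalk_permission": 2}

top_down = np.zeros((len(CHANNELS), H, W), dtype=np.float32)


def world_to_pixel(x: float, y: float, ego_x: float, ego_y: float):
    """Convert a world coordinate to a pixel index in the ego-centered grid."""
    col = int((x - ego_x) / RESOLUTION + W / 2)
    row = int((y - ego_y) / RESOLUTION + H / 2)
    return row, col


# Rasterize a pedestrian's footprint into the occupancy channel and write its
# speed into the velocity channel at the same location.
ego_x, ego_y = 0.0, 0.0
ped_x, ped_y, ped_speed = 12.0, -3.5, 1.4
r, c = world_to_pixel(ped_x, ped_y, ego_x, ego_y)
top_down[CHANNELS["occupancy"], r - 1:r + 2, c - 1:c + 2] = 1.0
top_down[CHANNELS["velocity"], r - 1:r + 2, c - 1:c + 2] = ped_speed
```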
In some instances, output from a machine learning model (e.g., a first model) that includes a predicted trajectory (or trajectories) associated with an object and a weight (or weights) associated with the predicted trajectory may be sent to another machine learning model (e.g., a second model) that is configured to determine an intent of the trajectory associated with the object. For example, the additional model may receive the trajectory and weights as inputs and determine an intent (e.g., a possible destination) of the object and the trajectory, and in some instances, may associate the intent with the trajectory of the object. In various examples, such intent may represent a category of future (or expected) behavior of the object, such as, but not limited to, continuing straight, turning right, turning left, crossing a crosswalk, and the like.
In some examples, a machine learning model may receive data as input and provide output that includes a discretized representation of a portion of an environment. In some cases, a portion of the discretized representation (e.g., a grid) can be referred to as a cell of the discretized representation. Each cell can include a predicted probability that the object will be at the respective location in the environment at a time corresponding to the discretized representation. In some examples, the location of the cell may be associated with a destination in the environment. In some instances, the machine learning model may output a plurality of prediction probabilities, which may represent probabilistic predictions associated with the object and one or more destinations at a particular time in the future (e.g., 0.5 seconds, 1 second, 3 seconds, 5 seconds, 10 seconds, etc.). In some examples, the discretized representation output by the machine learning model can be used by another model or computing device to determine the weight and/or trajectory of the object.
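The sketch below illustrates, under assumptions, what such a discretized output could look like in code: one grid of per-cell probabilities per future timestep (the 21 x 21 size mirrors an example given later in this description). Normalizing each grid to sum to one is an assumption made for the example.

```python
import numpy as np

# Assumed output shape: (num_future_times, J, K) grids of cell probabilities.
J = K = 21
future_times = [0.5, 1.0, 3.0, 5.0]          # seconds into the future
logits = np.random.randn(len(future_times), J, K)

# Normalize each timestep's grid so its cells form a probability distribution
# over the object's possible future location.
probs = np.exp(logits) / np.exp(logits).sum(axis=(1, 2), keepdims=True)

# The most likely cell for the last timestep (e.g., 5 seconds out).
flat_idx = probs[-1].argmax()
best_cell = np.unravel_index(flat_idx, (J, K))
print(f"most likely cell at t+{future_times[-1]}s:", best_cell,
      "p =", float(probs[-1][best_cell]))
```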
In various instances, the cells of the discretized representation can be associated with classification probabilities of the location of the object at future times. For example, a cell can indicate a probability (e.g., yes/no) that the object is at a location in the discretized representation at a future time. In some examples, the location may be represented based at least in part on an offset from a previous location of the object at a previous time prior to the future time. That is, in some examples, the offset may represent a direction and distance that the object may travel from a starting cell to an ending cell of the discretized representation in the future. Additional details regarding the discretized representation are discussed in connection with FIGS. 3 and 4, as well as elsewhere.
In some examples, the machine learning model may determine a predicted trajectory associated with the object and a weight associated with the predicted trajectory based at least in part on the discretized representation and the classification probability. For example, the predicted trajectory may be based on a path through one or more cells over a period of time. The predicted trajectory may, for example, indicate the distance, direction, speed, and/or acceleration that the object is most likely to take in the future. In some examples, the predicted trajectory may be based at least in part on interpolating a location of the object at the first time and a location associated with the classification probability at the second time. The weights determined by the model may indicate the likelihood that the predicted trajectory is used by an object to reach a destination (e.g., a road, a crosswalk, a sidewalk, etc.). For example, the weight may be determined by the model based at least in part on a classification probability (e.g., that a pedestrian will enter a pedestrian crossing or road). In at least some such examples, the plurality of trajectories may represent paths through the grid cells having the highest likelihood.
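One simplistic way to realize this idea, purely for illustration, is to follow the highest-likelihood cell at each timestep and take the final cell's probability as the path's weight. This is an assumption-laden sketch, not the algorithm described here.

```python
import numpy as np

def most_likely_path(prob_grids: np.ndarray):
    """prob_grids: (T, J, K) array of per-cell probabilities per timestep."""
    path = []
    for grid in prob_grids:
        j, k = np.unravel_index(grid.argmax(), grid.shape)
        path.append((int(j), int(k)))
    # Use the probability of the final cell as a simple weight for the path
    # (one possible proxy for "likelihood of reaching that destination").
    final_prob = float(prob_grids[-1][path[-1]])
    return path, final_prob


grids = np.random.rand(4, 21, 21)
grids /= grids.sum(axis=(1, 2), keepdims=True)
path, weight = most_likely_path(grids)
```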
In some examples, the machine learning model may receive map data and determine, based on the map data, that the locations of the cells in the discretized representation are associated with semantic destinations in the environment (e.g., classifiable regions or destinations in the environment, such as crosswalks, sidewalks, road segments, and so forth). For example, the location of a first cell may be associated with a crosswalk (e.g., a first semantic destination), while the location of a second cell may be associated with a road (e.g., a second semantic destination). In some examples, the machine learning model may determine a weight for the predicted trajectory based at least in part on a predicted probability of the object being at the location, and the location being associated with the semantic destination. Thus, the weight of a predicted trajectory may represent how likely the predicted trajectory is to be used to get an object to a particular destination.
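A toy sketch of the cell-to-destination association might look like the following, where map features are reduced to labeled axis-aligned rectangles; the data structures and labels are hypothetical.

```python
from typing import Dict, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

# Hypothetical map features keyed by semantic label.
MAP_FEATURES: Dict[str, Rect] = {
    "crosswalk_A": (10.0, 0.0, 14.0, 3.0),
    "road": (0.0, -20.0, 50.0, 0.0),
}


def overlaps(a: Rect, b: Rect) -> bool:
    """Axis-aligned rectangle overlap test."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def semantic_destination(cell_bounds: Rect) -> Optional[str]:
    """Return the label of a map feature that overlaps the cell, if any."""
    for label, feature in MAP_FEATURES.items():
        if overlaps(cell_bounds, feature):
            return label
    return None


print(semantic_destination((11.0, 1.0, 12.25, 2.25)))  # -> "crosswalk_A"
```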
In some examples, the model may determine a predicted trajectory of the object based on an intersection between a future location of the object in the discretized representation and one or more cells associated with the future location. In some examples, the location of the object may intersect and/or overlap with multiple cells over a period of time and end at a cell representing a future time (e.g., 2-4 seconds in the future). In some examples, the machine learning model may determine that the location of the cell at a future time is associated with a destination (e.g., a crosswalk, a road, etc.). For example, a location in the discretized representation can be associated with a destination in the environment by identifying a destination in the map data that overlaps at least a portion of the respective cell.
In some examples, when the area of a cell includes more than one destination, the machine learning model may determine a score for the first destination and a score for the second destination, and compare the scores (e.g., select the highest score) to designate one of the first destination or the second destination as the destination associated with the cell. In other examples, the machine learning model may determine a weight for each of the first destination and the second destination and send weighted destination information associated with the cell to the autonomous vehicle for consideration by the planner. This provides the autonomous vehicle with improved detail about multiple possible destinations (and associated likelihoods) of the object when the autonomous vehicle determines candidate trajectories to safely navigate relative to the object.
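Both options can be sketched in a few lines; the scores below are assumed to come from, for instance, overlap fractions, and the function names are illustrative.

```python
def pick_destination(scores: dict) -> str:
    """Option 1: designate the single highest-scoring destination."""
    return max(scores, key=scores.get)


def weighted_destinations(scores: dict) -> dict:
    """Option 2: keep a normalized weight per destination for the planner."""
    total = sum(scores.values()) or 1.0
    return {label: s / total for label, s in scores.items()}


cell_scores = {"crosswalk_A": 0.35, "road": 0.65}
print(pick_destination(cell_scores))        # -> "road"
print(weighted_destinations(cell_scores))   # -> {"crosswalk_A": 0.35, "road": 0.65}
```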
In various examples, a vehicle computing system may receive one or more instructions representative of output(s) from one or more models. The vehicle computing system may, for example, send instructions from the one or more instructions to a planning component of the vehicle that plans a trajectory for the vehicle and/or to a perception component of the vehicle that processes sensor data. Additionally or alternatively, the output(s) from the one or more models may be used by one or more computing devices remote from the vehicle computing system to train the machine learning model.
The vehicle computing system may be configured to determine an initial position of each detected object. In various examples, a prediction component of the vehicle computing system (e.g., a model that predicts behavior of objects) may determine one or more predicted trajectories associated with each detected object, e.g., as a function of the initial position associated therewith. In some examples, the one or more predicted trajectories may be determined based on sensor data and/or output(s) of a model. Each predicted trajectory may represent a potential path along which a detected object may traverse the environment. The one or more predicted trajectories may be based on passive prediction (e.g., independent of an action taken by the vehicle and/or another object in the environment, substantially unresponsive to an action of the vehicle and/or other objects, etc.), active prediction (e.g., based on a reaction to an action of the vehicle and/or another object in the environment), or a combination thereof. In such examples, the one or more predicted trajectories may be based on an initial speed and/or direction of travel determined from the sensor data. In some examples, the one or more predicted trajectories may be determined using machine learning techniques. Additional details of generating trajectories to control vehicles are described in U.S. patent application No. 15/632,608, entitled "Trajectory Generation and Execution Architecture" and filed on June 23, 2017, which is incorporated herein by reference. Additional details of assessing risk associated with various trajectories are described in U.S. patent application No. 16/606,877, entitled "Probabilistic Risk Assessment for Trajectory Evaluation" and filed on November 30, 2018, which is incorporated herein by reference. Additional details of training a machine learning model based on stored sensor data by minimizing differences between actual and predicted positions and/or predicted trajectories are described in U.S. patent application Ser. No. 16/282,201, entitled "Motion Prediction Based on Appearance" and filed on March 12, 2019, which is incorporated herein by reference.
In various examples, the vehicle computing system may be configured to determine an action to take (e.g., to control a trajectory of the vehicle) while operating, based on the predicted trajectory, intent, trajectory type, and/or weights determined by the one or more models. The actions may include a reference action (e.g., one of a group of maneuvers the vehicle is configured to perform in reaction to a dynamic operating environment), such as changing lanes to the right, changing lanes to the left, staying in a lane, going around an obstacle (e.g., a double-parked vehicle, a group of pedestrians, etc.), or the like. The actions may also include sub-actions, such as a speed variation (e.g., maintaining speed, accelerating, decelerating, etc.), a positional variation (e.g., changing a position in a lane), or the like. For example, an action may include staying in the lane (action) and adjusting the position of the vehicle in the lane from a centered position to operating on the left side of the lane (sub-action).
In various examples, a vehicle computing system may be configured to determine a reference action and/or a sub-action applicable to a vehicle in an environment. For example, a pedestrian traveling toward a crosswalk will be predicted to behave differently than a pedestrian away from the road, or than a pedestrian crossing the road outside the crosswalk. As another example, a pedestrian on a road may behave differently than a pedestrian outside the road, or a pedestrian crossing the road outside a crosswalk. In another non-limiting example, a cyclist traveling along a road is predicted differently than a cyclist traveling to or within a crosswalk.
For each applicable action and sub-action, the vehicle computing system may implement different models and/or components to simulate future states (e.g., estimated states) by projecting the vehicle and relevant object(s) in the environment forward over a period of time (e.g., 5 seconds, 8 seconds, 12 seconds, etc.). The model may project the object(s) (e.g., estimate future locations of the object(s)) based on the predicted trajectories associated therewith. For example, the model may predict a trajectory of a pedestrian and predict a weight indicating whether the trajectory will be used by the object to reach a destination. The vehicle computing system may project the vehicle (e.g., estimate a future location of the vehicle) based on a vehicle trajectory associated with the action. The estimated state(s) may represent an estimated position (e.g., an estimated location) of the vehicle and an estimated position of the object(s) of interest at a time in the future. In some examples, the vehicle computing system may determine relative data between the vehicle and the object(s) in the estimated state(s). In such an example, the relative data may include a distance between the vehicle and the object, a location, a speed, a direction of travel, and/or other factors. In various examples, the vehicle computing system may determine the estimated states at a predetermined rate (e.g., 10 hertz, 20 hertz, 50 hertz, etc.). In at least one example, the estimated states may be determined at a rate of 10 hertz (e.g., 80 estimated states over an 8 second period).
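A minimal sketch of this kind of forward projection is shown below; constant-velocity motion and the simple state layout are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class State:
    t: float
    x: float
    y: float


def project(x0: float, y0: float, vx: float, vy: float,
            horizon_s: float = 8.0, rate_hz: float = 10.0) -> List[State]:
    """Project a position forward at a fixed rate over a time horizon."""
    dt = 1.0 / rate_hz
    n = int(horizon_s * rate_hz)  # e.g., 80 estimated states over 8 seconds
    return [State(t=i * dt, x=x0 + vx * i * dt, y=y0 + vy * i * dt)
            for i in range(1, n + 1)]


vehicle_states = project(0.0, 0.0, 8.0, 0.0)       # vehicle trajectory for an action
pedestrian_states = project(30.0, -5.0, 0.0, 1.4)  # object's predicted trajectory

# Relative data (here, distance) between vehicle and object at each estimated state.
distances = [((v.x - p.x) ** 2 + (v.y - p.y) ** 2) ** 0.5
             for v, p in zip(vehicle_states, pedestrian_states)]
```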
In various examples, the vehicle computing system may store sensor data associated with the actual position of an object at the end of the set of estimated states (e.g., at the end of the time period) and use that data as training data to train one or more models. For example, the stored sensor data may be retrieved by the model and used as input data to identify cues of the object (e.g., to identify features, attributes, or poses of the object). Such training data may be determined based on manual annotation and/or by determining a change in the semantic information associated with the location of the object. As a non-limiting example, if an object is located on a portion of the map marked as a sidewalk at one point in time and on a portion of a drivable surface at some later point in time, the data between these time periods that is associated with the object may be labeled as an instance of crossing the road without the need for manual annotation. Further, the detected positions associated with the object over such a period of time may be used to determine a ground truth trajectory associated with the object. In some examples, the vehicle computing system may provide the data to a remote computing device (i.e., a computing device separate from the vehicle computing system) for data analysis. In such an example, the remote computing system may analyze the sensor data to determine one or more labels for the image, and the actual location, speed, direction of travel, or the like of the object at the end of the set of estimated states. In some such examples (e.g., examples of determining the intent of a pedestrian), a ground truth position over the course of the log may be determined (either manually labeled or determined by another machine learning model), and such ground truth positions may be used to determine the actual intent of the pedestrian (e.g., whether the pedestrian remained standing, crossed the road, started/continued to run, started/continued to walk, etc.). In some examples, corresponding data may be input to the model to determine outputs (e.g., intents, trajectories, weights, etc.), and differences between the determined outputs and actual actions taken by the object may be used to train the model.
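As an illustration of the auto-labeling idea (a sidewalk-to-drivable-surface transition implying a road-crossing instance), here is a toy sketch; the log format and the semantic lookup are assumptions.

```python
from typing import Callable, List, Optional, Tuple

Position = Tuple[float, float]


def label_crossing(log: List[Tuple[float, Position]],
                   semantics_at: Callable[[Position], str]) -> Optional[Tuple[float, float]]:
    """Return (t_start, t_end) of a sidewalk -> drivable-surface transition, if any."""
    prev_t, prev_label = None, None
    for t, pos in log:
        label = semantics_at(pos)
        if prev_label == "sidewalk" and label == "drivable_surface":
            return (prev_t, t)
        prev_t, prev_label = t, label
    return None


def semantics(p: Position) -> str:
    # Toy semantic map: y >= 0 is sidewalk, y < 0 is drivable surface.
    return "sidewalk" if p[1] >= 0 else "drivable_surface"


log = [(0.0, (0.0, 2.0)), (1.0, (0.5, 0.5)), (2.0, (1.0, -0.5)), (3.0, (1.5, -2.0))]
print(label_crossing(log, semantics))  # -> (1.0, 2.0)
```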
The techniques discussed herein may improve the functionality of a vehicle computing system in a variety of ways. The vehicle computing system may determine an action to be taken by the autonomous vehicle based on the determined intent, trajectory, and/or trajectory type of the object represented by the data. In some examples, using the behavior prediction techniques described herein, the model may output object trajectories and associated weights that improve safe operation of the vehicle by accurately describing the motion of the objects with greater granularity and detail than previous models.
The techniques discussed herein may also improve the functioning of a computing device in a variety of other ways. In some cases, representing the environment and the object(s) in the environment as an overhead view may provide a simplified representation of the environment for the purposes of generating the prediction probability(s) and/or selecting between candidate actions. In some cases, the overhead representation may represent the environment without extracting particular features of the environment, which may simplify the generation of the prediction system and the subsequent generation of at least one predicted trajectory, intent, or weight. In some cases, evaluating the output via the model(s) may allow the autonomous vehicle to generate more accurate and/or safer trajectories for traversing the environment. For example, a prediction probability associated with a first candidate action may be evaluated to determine a likelihood of a collision or a near-collision, which may allow the autonomous vehicle to select or determine another candidate action (e.g., change lanes, stop, etc.) in order to safely traverse the environment. In at least some examples described herein, in addition to determining intent, prediction based on a top-down encoding of the environment may minimize (i.e., improve) the spread of probability distribution functions associated with objects, resulting in safer decision-making. These and other improvements to the functioning of a computer are discussed herein.
The techniques described herein may be implemented in a variety of ways. Exemplary embodiments are provided below with reference to the following drawings. Although discussed in the context of an autonomous vehicle, the methods, apparatus, and systems described herein may be applied to a variety of systems (e.g., manually driven vehicles, sensor systems, or robotic platforms) and are not limited to autonomous vehicles. In another example, these techniques may be used in an aeronautical or nautical context, or in any system that uses machine vision (e.g., in a system that uses data represented in an image format). While examples are given in terms of determining the intent of pedestrians and bicycles, the techniques described herein are also applicable to determining attributes of other objects in an environment (e.g., vehicles, skateboarders, animals, etc.).
Fig. 1 is an illustration of an autonomous vehicle (vehicle 102) in an environment 100, where an example machine learning model can process an overhead representation of the environment to determine a discretized representation of the environment. Although fig. 1 depicts an autonomous vehicle, in some examples, the behavior prediction techniques described herein may be implemented by other vehicle systems, components, and/or remote computing devices. For example, and as will be described in further detail with respect to fig. 6, the behavior prediction techniques described herein can be implemented at least in part by or associated with the model component 630 and/or the planning component 624.
In various examples, a vehicle computing system of the vehicle 102 may be configured to detect the object 104 in the environment 100, such as by a perception component (e.g., perception component 622). In some examples, the vehicle computing system may detect the object 104 based on sensor data received from one or more sensors. In some examples, the sensor(s) may include sensors mounted on the vehicle 102, and include, but are not limited to, ultrasonic sensors, radar sensors, light detection and ranging (LIDAR) sensors, cameras, microphones, inertial sensors (e.g., inertial measurement units, accelerometers, gyroscopes, etc.), global Positioning Satellite (GPS) sensors, and so forth. In some examples, the sensor(s) may include one or more remote sensors, such as sensors installed on another autonomous vehicle, and/or sensors installed in environment 100.
In various examples, the vehicle 102 may be configured to transmit and/or receive data from other autonomous vehicles and/or remote sensors. The data may include sensor data, such as data about objects 104 detected in the environment 100. In various examples, environment 100 may include remote sensors for traffic monitoring, collision avoidance, or the like. In some examples, remote sensors may be installed in an environment to provide additional visibility in areas of reduced visibility (e.g., at blind or semi-blind intersections). For example, intersections in the environment 100 may be determined to have blind intersections, where approaching vehicles may not be able to perceive the object 104 and/or other vehicles approaching from the left or right side on the intersecting road. Accordingly, intersections in the environment may include sensors to provide sensor data about the object 104 (e.g., pedestrians approaching the intersection) to the approaching vehicle 102.
In various examples, the vehicle computing system may receive the sensor data and may determine a type of the object 104 (e.g., classify the type of the object), such as whether the object 104 is a car, a truck, a motorcycle, a moped, a bicyclist, a pedestrian (as is the object 104 illustrated), or the like. In some examples, the object type may be input into a model to provide object behavior predictions.
Fig. 1 also depicts the environment 100 as including crosswalks 106A and 106B. In some examples, a machine learning model 108 (e.g., model 108) may be applied to predict whether the pedestrian 104 will walk to and/or within the crosswalk 106A or 106B at a future time. In some examples, the machine learning model 108 may determine different behavioral predictions for the pedestrian, such as determining the predicted trajectory 110A and the predicted trajectory 110B. The model 108, for example, can determine the predicted trajectories 110A and 110B based at least in part on receiving input data representing the top-view representation 112 and outputting the discretized representation 114 of the environment 100. The vehicle computing system of the vehicle 102 may use the predicted trajectories 110A and 110B to infer the intent of the pedestrian (e.g., whether the pedestrian may be approaching a destination in the environment 100). The pedestrian trajectory and/or intent determined by the vehicle 102 can be considered during vehicle planning (e.g., by planning component 624) to improve the safety of the vehicle as it navigates the environment.
In general, the overhead representation 112 may represent an area surrounding the vehicle 102. In some examples, the area may be based at least in part on an area visible to the sensors (e.g., sensor range), a receding horizon, an area associated with an action (e.g., crossing an intersection), and/or the like. In some examples, the overhead representation 112 may represent an area of 100 meters by 100 meters around the vehicle 102, although any area is contemplated. The machine learning model 108 can receive data about objects in the environment from the perception component 622, and can receive data about the environment itself from the positioning component 620, the perception component 622, and one or more maps 628. The model 108 may generate an overhead view of the environment, including objects in the environment (e.g., represented by bounding boxes, as discussed herein), semantic information about the objects (e.g., classification types), motion information (e.g., velocity information, acceleration information, etc.), and so forth.
In various examples, the top-view representation 112 of the environment 100 may represent a top-view perspective of the environment and may include one or more multi-channel images, such as a first channel 116, a second channel 118, and a third channel 120. The vehicle computing system may generate or determine the multi-channel image(s) to represent different attributes of the environment with different channel images. That is, an image has multiple channels, where each channel represents some information (semantic or otherwise). In general, individual ones of the channel images 116, 118, and 120 may represent object position, object velocity, object acceleration, object yaw, attributes of the object, crosswalk permission (e.g., crosswalk light or audio status), traffic light permission (e.g., traffic light status), and the like, to name a few. Examples of generating or determining the multi-channel image(s) are discussed in U.S. patent application No. 16/151,607, entitled "Trajectory Prediction on Top-Down Scenes" and filed on October 4, 2018. The entire contents of application Ser. No. 16/151,607 are incorporated herein by reference. Details of the top-view representation 112 are discussed in connection with FIG. 2 and elsewhere.
In some examples, the first channel 116 may represent a bounding box, a location, an extent (e.g., a length and a width), etc., of the autonomous vehicle 102 and/or the object 104 in the environment. In some examples, the second channel 118 may represent crosswalk permission information (e.g., permission to occupy a crosswalk based on available space and/or signals). For example, the second channel 118 may illustrate an area available for a pedestrian to walk on a crosswalk and whether the area is associated with a current crosswalk signal indicating that the pedestrian is allowed to enter the crosswalk. In some examples, the third channel 120 may represent additional object data or vehicle data, in this case, corresponding to speed information 122 (e.g., V1) and direction information 124 (e.g., D1) associated with the object 104. In some examples, the speed information 122 may include instantaneous speed, average speed, and the like. In some examples, the direction information 124 may include an instantaneous direction, an average direction, and the like. Although discussed in the context of speed, the speed information 122 may represent information associated with acceleration (e.g., an average acceleration of a maneuver, a maximum acceleration associated with a maneuver, etc.), distance(s) from another object or the vehicle, etc.
In some examples, the discretized representation 114 of the environment 100 can represent a grid associated with time. For example, the discretized representation 114 can represent a 21 × 21 grid (or a J × K size grid), representing an area 25 meters by 25 meters (or other size area) around the pedestrian 104. In some examples, the discretized representation 114 can have a center that includes the pedestrian 104 at the first time, and can progress in time as the pedestrian 104 moves from the initial position. Details of the discretized representation 114 are discussed elsewhere in connection with fig. 3 and 4.
In some examples, discretized representation 114 includes a plurality of cells, such as cell 126 and cell 128. Each cell may include a probability that the pedestrian 104 will be at the location of the cell in the future (e.g., a second time after the first time). For example, and as explained in more detail below, model 108 may determine that cell 126 is associated with pedestrian crossing 106A and cell 128 is associated with pedestrian crossing 106B, and output predicted trajectories 110A and 110B based at least in part on the probabilities associated with the respective cell locations. In some examples, the cells 126 and 128 are associated with respective locations that represent an offset (e.g., shown as shaded cells in fig. 1) from a first location of the object 104 at a first time based on a location of the object at a previous time prior to a future time. For example, in fig. 1, the shading of a cell may represent a possible path for a pedestrian from the current location to a location in the discretized representation 114, such as cell 126, which is associated with the destination, crosswalk 106A.
In some examples, the predicted trajectories 110A and 110B may be determined by the model 108 based at least in part on interpolating the location of the object 104 at the first time and the location associated with the probability at the second time. For example, the model 108 can interpolate the position of the object 104 at different times over different regions of the discretized representation 114.
In some cases, the machine learning model 108 can output a plurality of discretized representations, where a discretized representation of the plurality of discretized representations can represent a probabilistic prediction associated with the object at a particular time in the future (e.g., 0.5 seconds, 1 second, 3 seconds, 5 seconds, 10 seconds, etc.).
In some examples, model 108 may determine a weight (e.g., a probability) to indicate whether pedestrian 104 will use predicted trajectory 110A or predicted trajectory 110B. Details of determining the weights are discussed in fig. 3 and elsewhere. The model 108 may send the predicted trajectory and associated weights to additional models that will determine the intent of the pedestrian 104. However, in some examples, the functionality provided by the additional models may be performed by the models 108. Details of the model used to determine the intent of the pedestrian 104 will be discussed in conjunction with FIG. 5 and elsewhere.
In various examples, a planning component and/or a perception component of the vehicle computing system may determine one or more candidate trajectories for the autonomous vehicle based on output (e.g., intent, predicted trajectory, weights, etc.) from one or more of the models 108 and/or additional models. In some examples, the candidate trajectory may include any number of potential paths that the vehicle 102 may travel from a current location (e.g., upon perception) and/or based on a direction of travel. In some examples, the potential path of one of the pedestrians 104 may include remaining stationary. In such an example, the corresponding trajectory may represent little or no motion. In some examples, the number of trajectories may vary depending on various factors, such as the classification of the object (e.g., type of object), other static and/or dynamic objects, a surface that may be driven, and so forth. In some examples, one or more candidate trajectories may be determined using machine learning techniques.
FIG. 2 is an example top-view representation of an environment in which a machine learning model may be used to implement the techniques described herein. In at least one example, a vehicle computing system of the vehicle 102 may implement the machine learning model 108 to process data representing the overhead representation 112 of the environment. In this way, the model 108 may use the data to better capture the context of the environment around the vehicle 102 than approaches that do not use overhead views.
As described above, the top-view representation 112 includes a multi-channel image including a first channel 116, a second channel 118, and a third channel 120. As shown in FIG. 2, the top-view representation 112 further includes a fourth channel 204, a fifth channel 206, and a sixth channel 208.
In some examples, the fourth channel 204 may represent traffic light permissibility information (e.g., a traffic light state indicating whether entry into an intersection is permitted for vehicles and pedestrians), corresponding to traffic information 210 (e.g., T1). In some examples, multiple traffic lights or traffic signals may be associated with the fourth channel, such that the fourth channel 204 may include additional traffic information for each traffic light or signal in the environment. In some examples, the traffic information 210 may be used by the model 108 in conjunction with the crosswalk permission of the second channel 118 to determine when the crosswalk can be entered based not only on a crosswalk signal or light, but also on a traffic light (e.g., to determine whether a car has the right of way relative to the crosswalk).
In some examples, the fifth channel 206 may represent an orientation (e.g., roll, pitch, yaw) of the object 104, corresponding to the orientation information 212 (e.g., O1). In some examples, the sixth channel 208 may represent attributes of the object (e.g., object actions such as running, walking, or squatting, object location history, object speed history, object direction history, etc.), corresponding to the attribute information 214 (e.g., A1). In some examples, the attributes of the object may include historical behavior with respect to a particular region of the environment. The object attributes may be determined by a vehicle computing system implementing one or more models, and may include one or more of: an action, a location, or a subcategory of the object. For example, the attributes of the object 104 may include a pedestrian looking at a device, looking at the vehicle 102, sitting, walking, running, entering a vehicle, exiting a vehicle, and so forth. In some examples, the attribute information 214 may include object types, such as pedestrians, vehicles, mopeds, bicycles, and so forth.
In some examples, additional channels of the overhead representation 112 may represent drivable surfaces, weather features, and/or features of the environment of the vehicle 102.
The overhead representation 112 of the environment represented by the data may also improve predictions about the direction (e.g., facing and/or heading) and/or destination that may be associated with a pedestrian or other object by providing more information about the pedestrian's surroundings (e.g., whether another pedestrian blocks the pedestrian's path). For example, by including the third channel 120, the speed information 122 and the direction information 124 may be processed by the model 108.
In some examples, the input to the model 108 may include data associated with a single image or cropped image frame of an object represented in the sensor data of the vehicle. As the vehicle navigates through the environment, additional images are captured for different times and provided as input to the machine learning model 108. In some examples, the image frames may be cropped to the same scale so that each image includes the same dimensions (same aspect ratio, etc.) when included in the input to the model 108.
FIG. 3 is an illustration of an exemplary autonomous vehicle (vehicle 102) in an environment 300, where an exemplary machine learning model can determine a discretized representation of the environment. In at least one example, a vehicle computing system of the vehicle 102 can implement the machine learning model 108 to output the discretized representation 114 of the environment 300.
As described above, in some examples, the discretized representation 114 includes a plurality of cells, such as cells 126 and 128, that include respective probabilities that the pedestrian 104 will be at respective locations of the cells at a future time. As shown in FIG. 3, the model 108 may determine a predicted trajectory 110A and a weight 302A (e.g., a first weight) to indicate whether the pedestrian 104 will traverse the location corresponding to the cell 128, and determine a predicted trajectory 110B and a weight 302B (e.g., a second weight) to indicate whether the pedestrian 104 will traverse the location corresponding to the cell 126. In some examples, the vehicle computing system of the vehicle 102 may determine the trajectory and/or weight of the pedestrian 104 based on receiving the discretized representation 114 from the model 108.
In some examples, the locations of the cells in the discretized representation 114 can represent an offset (e.g., a prediction of the location of the object at a future time). For example, the discretized representation 114 can enable an offset technique to determine the location of the pedestrian 104 at, for example, 4 seconds into the future, and to determine an offset from the current location at the current time to the location associated with the cell at the future time. In such an example, the model 108 (or another component of the vehicle computing system) may determine intermediate points or waypoints based at least in part on the offset. By knowing the destination of the object using an offset technique, the model 108 can provide a predicted trajectory for vehicle planning. In some examples, the predicted trajectory (e.g., 110A or 110B) may not only identify a direction to the location of a cell, but may also identify a distance to the cell based on the discretized representation 114.
In some examples, the predicted trajectories 110A and 110B may be determined by the model 108 based at least in part on interpolating the location of the pedestrian 104 at a first time and a location associated with a probability at a second time (e.g., the location of the cell 126 or the cell 128). For example, the model 108 can interpolate the positions of the pedestrian 104 at different times in different regions of the discretized representation 114 and determine one or more predicted trajectories based on the interpolation. In such an example, the interpolation may include estimating a set of data points from the change in position of the pedestrian 104 over a period of time. In some examples, the model 108 may implement a linear interpolation algorithm to determine the predicted trajectory.
In some examples, the first weight or the second weight may be determined based at least in part on an aggregation of probabilities associated with one or more cells. For example, the probabilities (e.g., classification probabilities) of the cells that overlap or intersect the predicted trajectory 110A may be combined to determine the weight 302A. In some examples, the model 108 may determine the first weight or the second weight based at least in part on an average of the probabilities of the cells that overlap or intersect the predicted trajectory of the object. In various examples, a cell may be considered to overlap or intersect a respective trajectory based on pixels associated with the object being within a threshold range of a lateral boundary of the cell.
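Purely as a sketch of this aggregation (the grid geometry, the sampling of the path, and the use of a simple mean are all assumptions), a weight could be computed as follows.

```python
import numpy as np

def trajectory_weight(prob_grid: np.ndarray,
                      start_cell: tuple, end_cell: tuple,
                      num_samples: int = 20) -> float:
    """prob_grid: (J, K) cell probabilities; cells given as (row, col).

    Linearly interpolate between the start and end cells, collect the cells the
    interpolated path passes through, and average their probabilities.
    """
    rows = np.linspace(start_cell[0], end_cell[0], num_samples)
    cols = np.linspace(start_cell[1], end_cell[1], num_samples)
    cells = {(int(round(r)), int(round(c))) for r, c in zip(rows, cols)}
    return float(np.mean([prob_grid[c] for c in cells]))


grid = np.random.rand(21, 21)
grid /= grid.sum()
w_a = trajectory_weight(grid, start_cell=(10, 10), end_cell=(2, 18))   # toward one crosswalk
w_b = trajectory_weight(grid, start_cell=(10, 10), end_cell=(18, 18))  # toward another
```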
By way of example and not limitation, weight 302A may comprise a value of 60% and weight 302B may comprise a value of 40%. Thus, the pedestrian 104 has a 60% likelihood of reaching the cell 126 using the predicted trajectory 110A, and a 40% likelihood of reaching the cell 128 using the predicted trajectory 110B. The trajectories and weights output by the model 108 may be sent to a planning component of the vehicle 102 for consideration by the planner (e.g., to determine actions taken by the vehicle 102).
In some examples, model 108 may determine that the location of cell 128 is associated with pedestrian crossing 106B and cell 126 is associated with pedestrian crossing 106A. For example, model 108 may receive map data and/or sensor data and determine semantic destinations associated with the locations of cells 128 and 126.
In various examples, the vehicle computing system may store sensor data associated with the actual location of an object and train the model 108 using the data as training data. For example, stored sensor data may be retrieved by the model 108 and used as input data to identify cues of the object (e.g., to identify features, attributes, or poses of the object). In some examples, the vehicle computing system may provide the data to a remote computing device (e.g., a computing device separate from the vehicle computing system) for data analysis. In such examples, the remote computing system may analyze the sensor data to determine one or more labels for the image, the actual location of the object, speed, direction of travel, and/or the like. In some such examples (e.g., examples of determining the intent of a pedestrian), a ground truth position over the course of the log may be determined (either manually labeled or determined by another machine learning model), and such ground truth positions may be used to determine the actual intent of the pedestrian (e.g., whether the pedestrian remained standing, crossed the road, started/continued to run, started/continued to walk, etc.). In some examples, corresponding data may be input to the model to determine outputs (e.g., intents, trajectories, weights, etc.), and differences between the determined outputs and actual actions taken by the object may be used to train the model.
FIG. 4 depicts an exemplary discretized representation of an environment output by an exemplary machine learning model. In at least one example, a vehicle computing system of the vehicle 102 can implement the machine learning model 108 to output a discretized representation 412 of the environment 300 at a first time T1, and to output a discretized representation 414 of the environment 300 at a second time T2 after the first time T1.
In the example of FIG. 4, the discretized representation 412 includes the location of the vehicle 102, the location of the pedestrian 104, and weighted trajectories 416A and 416B of the pedestrian arriving at the crosswalks 106A and 106B, respectively, at time T1. The weighted trajectory 416A represents the predicted trajectory 110A and the weight 302A. Thus, the weighted trajectory 416A represents the direction of the pedestrian 104, the acceleration of the pedestrian 104, a first likelihood that the pedestrian 104 will reach the first destination (e.g., the crosswalk 106A) at the second time, and a second likelihood that the pedestrian 104 will reach the second destination (e.g., the crosswalk 106B) at the second time. As an example, the weighted trajectory 416A and the weighted trajectory 416B are each associated with a value of 0.5 (other values, symbols, and expressions of probability are contemplated in addition to numerical values) to indicate that the pedestrian has an equal probability of using either the weighted trajectory 416A or the weighted trajectory 416B. For example, the weighted trajectory 416A and the weighted trajectory 416B may be associated with relative priorities (e.g., low, medium, high) and/or other priorities (e.g., first, second, third, etc.). Information about the weighted trajectories 416A and 416B may be sent by the model 108 to the vehicle computing system of the vehicle 102 for consideration by the planner.
As shown in FIG. 4, the discretized representation 414 includes the location of the vehicle 102 at time T2, which differs from the location at time T1 to indicate that the vehicle has changed location. For example, the vehicle computing system may receive additional input data corresponding to time T2 and determine the discretized representation 414 to represent changes in the location of the pedestrian 104 and other objects in the environment. In some examples, the model 108 determines weighted trajectories 418A and 418B of the pedestrian arriving at the crosswalks 106A and 106B, respectively, at time T2. The weighted trajectories 418A and 418B may represent the new predicted trajectories and associated weights for the pedestrian 104 at time T2. For example, the weighted trajectory 418A represents the pedestrian 104 arriving at the crosswalk 106A with a weight of 0.7, reflecting that the vehicle has moved closer to the pedestrian 104 (the pedestrian need not have moved) and that new input data has been processed, yielding a more accurate trajectory prediction than at the previous time. As shown in FIG. 4, the weighted trajectory 418B is associated with a value of 0.3 to indicate that the pedestrian has a lower probability of arriving at the crosswalk 106B than at the crosswalk 106A.
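One way to picture the weighted trajectories described for FIG. 4 is as a small record pairing a predicted trajectory with a destination and a weight; the Python sketch below is illustrative only, and the field names and coordinate values are assumptions rather than structures defined in this disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class WeightedTrajectory:
    """A predicted trajectory paired with a probability-like weight."""
    waypoints: List[Tuple[float, float]]  # (x, y) positions over time
    destination: str                      # e.g., "crosswalk_106A"
    weight: float                         # e.g., 0.5 at T1, 0.7 at T2

# At time T1 the two destinations are equally likely ...
t1 = [WeightedTrajectory([(0.0, 0.0), (1.0, 1.0)], "crosswalk_106A", 0.5),
      WeightedTrajectory([(0.0, 0.0), (1.0, -1.0)], "crosswalk_106B", 0.5)]

# ... and at time T2 new observations shift the weights.
t2 = [WeightedTrajectory([(1.0, 1.0), (2.0, 2.0)], "crosswalk_106A", 0.7),
      WeightedTrajectory([(1.0, -1.0), (2.0, 0.0)], "crosswalk_106B", 0.3)]
```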
In some cases, the machine learning model 108 may output a plurality of weighted trajectories, which may represent probabilistic predictions associated with the object and one or more destinations at particular times in the future (e.g., 0.5 seconds, 1 second, 3 seconds, 5 seconds, 10 seconds, etc.). In this case, the discretized representation 414 can represent the weighted trajectories 418A and 418B at some future time (e.g., 2 seconds).
In some examples, the time period between time T1 and time T2 may vary and may represent, for example, a 1-second interval at 5 Hz (five frames of input).
FIG. 5 is an illustration of an autonomous vehicle in an environment 500 in which an example machine learning model may process data to determine a trajectory, a trajectory type, or an intent of an object. The vehicle 102 may include a machine learning model 502 configured to determine the intent of objects in the environment 500. Although described as a separate machine learning model, in some examples, the behavior prediction techniques described herein may be implemented by other vehicle systems, components, and/or computing devices. For example, the behavior prediction techniques described herein may be implemented at least in part by or associated with the model component 630 and/or the planning component 624 of the vehicle computing system 604.
In some instances, the machine learning model 502 (e.g., model 502) may receive the trajectory and the weights as inputs, such as from the model 108, and further receive input data including map data representative of one or more features of the environment 500 (e.g., destinations, roads, objects, etc.). In some examples, a plurality of trajectories and weights from the first model may be received by the model 502 for processing.
In some examples, the model 502 may receive the trajectory and the weights from a planning component of the vehicle computing system. For example, the planning component may send a candidate trajectory along with a weight indicating the likelihood that the candidate trajectory is used by the vehicle 102. In some examples, the trajectory from the planning component may be based at least in part on a regression technique (e.g., a technique that estimates or measures a relationship between two or more variables). In some examples, the model 502 may output an intent of the candidate trajectory based at least in part on the weights and send an indication of the output to the vehicle 102. Additional details of generating trajectories using regression techniques are described in U.S. patent application No. 16/363,541, entitled "Pedestrian Prediction Based On Attributes," filed March 25, 2019, which is incorporated herein by reference.
In some examples, the model 502 may receive trajectories and weights from the planning component and also receive trajectories and weights from the model 108, and may determine one or more intents to associate with the one or more trajectories received from the planning component and/or the model 108. In various examples, a trajectory from the planning component (e.g., a first trajectory) may be associated with a semantic destination different from the semantic destination associated with a second trajectory. In some examples, the first semantic destination may include a first region in the environment of the vehicle 102, and the second semantic destination may include a second region in the environment of the vehicle 102. In some examples, the trajectory from the model 108 may be determined based at least in part on a classification technique (e.g., a technique that maps an input to a class or category). By determining trajectories with classification techniques, the model 108 can reduce inaccurate trajectories (e.g., trajectories that collapse onto a common trajectory) relative to some non-classification approaches. In some examples, the model comprises a machine learning model, which further comprises a UNet framework and a softmax activation output. For example, the UNet framework can improve the resolution of the outputs of the model 502, particularly when two or more inputs from two or more sources (e.g., a predicted trajectory from a first model and a candidate trajectory from a second model) are received and outputs of similar resolution are desired.
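The reference to a UNet framework with a softmax activation output can be illustrated with a deliberately small encoder-decoder that emits per-cell class probabilities over a top-down grid; this is a minimal Python/PyTorch sketch under assumed channel counts and a single skip connection, not the architecture actually used.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal UNet-style encoder-decoder producing per-cell class probabilities."""

    def __init__(self, in_channels: int = 8, num_classes: int = 4):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_classes, 1))          # per-cell logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e = self.enc(x)                             # full-resolution skip features
        m = self.mid(self.down(e))                  # coarser features
        u = self.up(m)                              # back to full resolution
        d = self.dec(torch.cat([u, e], dim=1))      # skip connection
        return torch.softmax(d, dim=1)              # softmax over per-cell classes

# Example: an 8-channel top-down input on a 128 x 128 grid.
probs = TinyUNet()(torch.randn(1, 8, 128, 128))     # shape (1, 4, 128, 128)
```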
In some examples, the model 502 may receive an indication of an object and/or type of object detected by the vehicle computing system. For example, the vehicle computing system may provide data to the model 502 indicating that the object is a pedestrian (e.g., pedestrians 510, 520, 524, 526, and 528), a bicycle (e.g., cyclists 508 and 512), a vehicle, an animal, etc., and in some cases, additionally or alternatively, a weighted object type (e.g., 80% probability of the object being a pedestrian and 20% probability of being a bicycle).
In some examples, model 502 may process map data to determine one or more destinations associated with the received trajectory in environment 500, such as crosswalks 504 and 514. In some examples, the destination may include any one of: roads, sidewalks, bicycle lanes, road segments, pedestrian crossings, buildings, bus lanes, and the like. For example, the model may be used to determine whether an object (e.g., a bus traveling along a road) will stop on a bus lane at some time in the future, or remain on the road but not stop on a bus lane (e.g., a lane adjacent to the road to reach a passenger).
In some examples, the model 502 may determine the intent associated with the trajectory based at least in part on the destination associated with the trajectory. For example, based on the determined destination, the model 502 may output data indicating the intent of the trajectory. For example, one or more intents determined by the model 502 may be associated with a trajectory and/or a trajectory type output by the model 502. For example, the model may determine a location of the object relative to the road based on map data indicative of a road segment, and use the location to determine the type of trajectory as at least one of: a road trajectory type or a freeform trajectory type. For example, a road trajectory type may be associated with an object based on the location of the object within a road segment (determined by map data), within a threshold distance of the road (e.g., a boundary of a lane), and the like. In some examples, the freeform trajectory type may be associated with an object that is capable of moving independent of road geometry (e.g., within a road segment, outside of a road segment, or a threshold distance from a road segment). The object intent may vary depending on the location of the object relative to the road boundary. In some examples, a pedestrian on a road may be given a freeform trajectory rather than a road trajectory (e.g., due to not reaching a speed threshold) in order to give the model 502 more flexibility to predict movement of the pedestrian away from the road (e.g., to predict more of the directions in which the pedestrian may move, such as walking away from the road, than a road trajectory, which may constrain the prediction, would allow).
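A rule-of-thumb version of the road-versus-freeform decision described above might look like the following Python sketch; the distance and speed thresholds, the helper names, and the pedestrian speed criterion are assumptions introduced for illustration.

```python
from enum import Enum

class TrajectoryType(Enum):
    ROAD = "road"
    FREEFORM = "freeform"

# Hypothetical thresholds; not values from this disclosure.
ROAD_DISTANCE_THRESHOLD_M = 1.5
SPEED_THRESHOLD_MPS = 3.0

def classify_trajectory_type(object_type: str,
                             distance_to_road_m: float,
                             speed_mps: float) -> TrajectoryType:
    """Associate an object with a road or freeform trajectory type.

    Objects inside (or within a threshold distance of) a road segment are treated
    as following road geometry, while pedestrians below a speed threshold keep a
    freeform trajectory even when on the road, giving a downstream model more
    freedom to predict movement away from the road.
    """
    on_or_near_road = distance_to_road_m <= ROAD_DISTANCE_THRESHOLD_M
    if object_type == "pedestrian" and speed_mps < SPEED_THRESHOLD_MPS:
        return TrajectoryType.FREEFORM
    if on_or_near_road:
        return TrajectoryType.ROAD
    return TrajectoryType.FREEFORM

# Example: a cyclist in a lane vs. a slow pedestrian on the road.
assert classify_trajectory_type("bicycle", 0.0, 5.0) is TrajectoryType.ROAD
assert classify_trajectory_type("pedestrian", 0.0, 1.2) is TrajectoryType.FREEFORM
```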
In some examples, the model 502 may determine the intent of an object in the environment based at least in part on the proximity of the object to a region in the environment. For example, the cross-road intent may be determined based on the object not being a vehicle and being within an area, such as a road. In another illustrative example, the region may correspond to a pedestrian crossing, a sidewalk, a bike lane, or the like. In some examples, an area in an environment may include road segments associated with map data representative of the environment.
In general, the model 502 may determine whether an object is intended to enter a crosswalk (e.g., crosswalk intent), travel outside of a crosswalk and in a road (e.g., cross-road intent), and/or travel outside of a crosswalk and off-road (e.g., off-road intent). For example, the model may determine that the intent includes at least one of: an intent of an object in the environment of the autonomous vehicle to travel along the road segment, an intent of an object to travel outside of a vicinity of the road segment, an intent of an object to travel within a crosswalk, or an intent of an object to travel outside of a boundary of a crosswalk.
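The proximity-based intent assignment summarized in the preceding paragraphs can be pictured as a simple lookup over the map regions an object occupies; the region names and the precedence order in this Python sketch are assumptions, not logic taken from this disclosure.

```python
from typing import Set

def infer_intent(object_type: str, regions: Set[str]) -> str:
    """Map an object's containing map regions to a coarse intent label.

    `regions` is assumed to hold the map regions the object currently occupies,
    e.g. {"crosswalk"}, {"road"}, or {"sidewalk"}.
    """
    if "crosswalk" in regions:
        return "crosswalk_intent"        # inside crosswalk boundaries
    if "road" in regions and object_type != "vehicle":
        return "cross_road_intent"       # in the road but outside a crosswalk
    return "off_road_intent"             # outside the crosswalk and off the road

print(infer_intent("pedestrian", {"road"}))      # cross_road_intent
print(infer_intent("bicycle", {"crosswalk"}))    # crosswalk_intent
print(infer_intent("pedestrian", {"sidewalk"}))  # off_road_intent
```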
In various examples, the machine learning model 502 can associate the intent of the object with a trajectory type. By way of example and not limitation, the cyclist 508 may be associated with a road track 516 and the cyclist 512 may be associated with a road track having a crosswalk intent 518. Fig. 5 also depicts that the machine learning model 502 may associate a pedestrian 510 (or a trajectory of a pedestrian) with a freeform trajectory having a crosswalk intent 506, a pedestrian 520 with a freeform trajectory having an off-road intent, and pedestrians 524, 526, and 528 with freeform trajectories having a cross-road intent 530.
In some examples, the model 502 may associate an object (or a trajectory of an object) with multiple intents and output a weight associated with each intent of the object or trajectory. For example, a pedestrian 520 may have a freeform trajectory with an off-road intent 522 and, for purposes of illustration, a weight of 0.9 to represent a 90% probability of the pedestrian having an off-road intent. Here, the model 502 may also output an indication that the weight of the freeform trajectory of the pedestrian 520 is 0.1 to represent a 10% probability that the pedestrian 520 has the intent to cross the road (e.g., the pedestrian 520 changes direction and enters the road). Accordingly, the weighted intent output by the model 502 may be associated with an object or trajectory.
The model 502 may also or alternatively be configured to determine an exit trajectory 532 (e.g., a transition between a crosswalk and another location, such as the road). For example, the model 502 (or another model) may be configured to receive the trajectory as an input and output the exit trajectory 532 (e.g., toward an end point of a destination associated with the intent). As shown in FIG. 5, the model 502 (or another model) may be configured to receive the trajectory as an input and output an exit trajectory 532 for the cyclist 512 to return to the road at a future time after leaving the crosswalk 514.
In some examples, the vehicle computing system may determine that one of a first weight associated with the first trajectory or a second weight associated with the second trajectory is higher than the other of the first weight and the second weight. For example, a first trajectory may be associated with a candidate trajectory from the planning component, while a second trajectory may be associated with a predicted trajectory from the model 108. In some examples, the vehicle computing system may perform at least one of: controlling the autonomous vehicle in the environment based at least in part on the first trajectory in response to determining that the first weight is higher than the second weight, or controlling the autonomous vehicle in the environment based at least in part on the second trajectory in response to determining that the second weight is higher than the first weight.
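A schematic version of the weight comparison above can be as simple as selecting whichever trajectory carries the higher weight; the function and type names in this Python sketch are placeholders.

```python
from typing import List, Tuple

Trajectory = List[Tuple[float, float]]  # (x, y) waypoints; placeholder type

def select_trajectory(candidate: Trajectory, candidate_weight: float,
                      predicted: Trajectory, predicted_weight: float) -> Trajectory:
    """Return whichever trajectory carries the higher weight.

    `candidate` stands in for the planning component's trajectory and
    `predicted` for the trajectory from the prediction model.
    """
    return candidate if candidate_weight >= predicted_weight else predicted

# Example: the candidate trajectory (weight 0.6) is preferred over the
# predicted trajectory (weight 0.4) for controlling the vehicle.
chosen = select_trajectory([(0.0, 0.0), (5.0, 0.0)], 0.6,
                           [(0.0, 0.0), (4.0, 1.0)], 0.4)
```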
In general, the output(s) of the model 502 and/or the model 108 (e.g., weights, trajectories, trajectory types, and/or intents) may be communicated to a planning component of the vehicle, which in turn may determine candidate trajectories of the vehicle based at least in part on the output(s). The planning component may, for example, determine the candidate trajectory differently depending on whether the object is associated with a road trajectory type or a freeform trajectory type (each type may be associated with different algorithms, parameters, and/or settings used by the vehicle computing system to generate actions of the vehicle 102).
Fig. 6 is a block diagram of an example system 600 for implementing the techniques described herein. The vehicle 602 may include a vehicle computing system 604, one or more sensor systems 606, one or more transmitters 608, one or more communication connections 610, at least one direct connection 612, and one or more drive systems 614.
The vehicle computing system 604 may include one or more processors 616 and a memory 618 communicatively coupled to the one or more processors 616. In the illustrated example, the vehicle 602 is an autonomous vehicle; however, the vehicle 602 may be any other type of vehicle, such as a semi-autonomous vehicle, or any other system having at least an image capture device (e.g., a smartphone with a camera). In the illustrated example, the memory 618 of the vehicle computing system 604 stores a positioning component 620, a perception component 622, a planning component 624, one or more system controllers 626, one or more maps 628, and a model component 630, which includes one or more models, such as a first model 632A, a second model 632B, through an nth model 632N (collectively "models 632"), where N can be any integer greater than 1. While depicted in fig. 6 as residing in memory 618 for purposes of illustration, it is contemplated that the positioning component 620, the perception component 622, the planning component 624, the one or more system controllers 626, the one or more maps 628, and/or the model component 630 including the model 632 can additionally or alternatively be accessed by the vehicle 602 (e.g., stored on a memory remote from the vehicle 602, or otherwise accessed by the vehicle, such as on a memory 634 of a remote computing device 636).
In at least one example, the positioning component 620 can include functionality to receive data from the sensor system(s) 606 to determine a position and/or orientation of the vehicle 602 (e.g., one or more of x-position, y-position, z-position, roll, pitch, or yaw). For example, the positioning component 620 may include and/or request/receive a map of the environment, such as from the map(s) 628 and/or the map component 638, and may continuously determine the position and/or orientation of the autonomous vehicle within the map. In some cases, the positioning component 620 can utilize SLAM (simultaneous localization and mapping), CLAMS (calibration, localization and mapping, simultaneously), relative SLAM, bundle adjustment, non-linear least squares optimization, and the like to receive image data, lidar data, radar data, IMU data, GPS data, wheel encoder data, and the like to accurately determine the location of the autonomous vehicle. In some cases, the positioning component 620 may provide data to various components of the vehicle 602 to determine an initial position of the autonomous vehicle for determining a relevance of the object to the vehicle 602, as described herein.
In some cases, the perception component 622 may include functionality for performing object detection, segmentation, and/or classification. In some examples, the perception component 622 may provide processed sensor data that indicates the presence of an object (e.g., an entity) proximate to the vehicle 602 and/or classifies the object as an object type (e.g., car, pedestrian, bicyclist, animal, building, tree, road, curb, sidewalk, unknown, etc.). In some examples, the perception component 622 can provide processed sensor data that indicates the presence of and/or classifies static entities proximate to the vehicle 602 as types (e.g., building, tree, road surface, curb, sidewalk, unknown, etc.). In additional or alternative examples, the perception component 622 may provide processed sensor data that is indicative of one or more characteristics associated with a detected object (e.g., a tracked object) and/or an environment in which the object is located. In some examples, the features associated with the object may include, but are not limited to, x-position (global and/or local position), y-position (global and/or local position), z-position (global and/or local position), orientation (e.g., roll, pitch, yaw), object type (e.g., classification), object velocity, object acceleration, object range (size), and the like. The characteristics associated with the environment may include, but are not limited to, the presence of another object in the environment, the status of another object in the environment, a time of day, a day of the week, a season, weather conditions, indications of darkness/light, and the like.
Generally, the planning component 624 may determine a path to be followed by the vehicle 602 to traverse the environment. For example, the planning component 624 may determine various routes and trajectories at various levels of detail. For example, the planning component 624 may determine a route for traveling from a first location (e.g., a current location) to a second location (e.g., a target location). For purposes of this discussion, a route may include a sequence of waypoints for traveling between two locations. By way of non-limiting example, waypoints include streets, intersections, Global Positioning System (GPS) coordinates, and the like. Further, the planning component 624 may generate instructions for guiding the autonomous vehicle along at least a portion of the route from the first location to the second location. In at least one example, the planning component 624 may determine how to guide the autonomous vehicle from a first waypoint in the sequence of waypoints to a second waypoint in the sequence of waypoints. In some examples, the instruction may be a trajectory, or a portion of a trajectory. In some examples, multiple trajectories may be generated substantially simultaneously (e.g., within technical tolerances) in accordance with a receding horizon technique, wherein one of the multiple trajectories is selected for the vehicle 602 to navigate.
In some examples, planning component 624 may include a prediction component to generate predicted trajectories of objects (e.g., objects) in the environment. For example, the prediction component may generate one or more predicted trajectories for objects within a threshold distance from the vehicle 602. In some examples, the prediction component may measure a trace of an object and generate a trajectory for the object based on observed and predicted behavior.
In at least one example, the vehicle computing system 604 can include one or more system controllers 626, which can be configured to control steering, propulsion, braking, safety, transmitters, communications, and other systems of the vehicle 602. These system controller(s) 626 may communicate with and/or control the respective systems of the drive system(s) 614 and/or other components of the vehicle 602.
The memory 618 may further include one or more maps 628 that may be used by the vehicle 602 to navigate through the environment. For purposes of this discussion, a map may be any number of data structures modeled in two, three, or N dimensions that are capable of providing information about an environment, such as, but not limited to, a topology (e.g., intersections), streets, mountains, roads, terrain, and the general environment. In some cases, the map may include, but is not limited to: texture information (e.g., color information (e.g., RGB color information, Lab color information, HSV/HSL color information), etc.), intensity information (e.g., lidar information, radar information, etc.); spatial information (e.g., image data projected onto a grid, individual "bins" (e.g., polygons associated with individual colors and/or intensities)), reflectivity information (e.g., specularity information, retroreflectivity information, BRDF (bidirectional reflectance distribution function) information, BSSRDF (bidirectional scattering surface reflectance distribution function) information, etc.). In one example, the map may include a three-dimensional grid of the environment. In some examples, the vehicle 602 may be controlled based at least in part on the map(s) 628. That is, map(s) 628 can be used in conjunction with the positioning component 620, the perception component 622, and/or the planning component 624 to determine a location of the vehicle 602, detect objects in the environment, generate a route, determine an action, and/or determine a trajectory to navigate through the environment.
In some examples, one or more maps 628 may be stored on remote computing device(s) (e.g., computing device(s) 636) accessible over network(s) 640. In some examples, the plurality of maps 628 may be stored based on, for example, a characteristic (e.g., a type of entity, a time of day, a day of the week, a season of the year, etc.). Storing multiple maps 628 may have similar memory requirements, but increases the speed at which data in the maps can be accessed.
As shown in FIG. 6, the vehicle computing system 604 may include a model component 630. The model component 630 may be configured to determine a predicted trajectory of an object, weights associated with the predicted trajectory, an intent of the object, an intent of the trajectory, and/or an intent of the type of trajectory, such as the model 108 of fig. 1 and the model 502 of fig. 5. In various examples, model component 630 can receive data representing an overhead view of an environment. In some examples, model component 630 can receive environmental characteristics (e.g., environmental factors, etc.) and/or weather characteristics (e.g., weather factors, such as snow, rain, ice, etc.) from perception component 622 and/or sensor system(s) 606. Although shown separately in fig. 6, model component 630 can be part of perception component 622, planning component 624, or other component(s) of vehicle 602.
In various examples, the model component 630 can send output from the first model 632A, the second model 632B, and/or the nth model 632N that is used by the planning component 624 to generate one or more candidate trajectories (e.g., direction of travel, speed, etc.) for the vehicle 602. In some examples, planning component 624 may determine one or more actions (e.g., reference actions and/or sub-actions) for vehicle 602. In some examples, the model component 630 can be configured to output a discretized representation that can be used by the vehicle computing system 604 to determine trajectories and weights of objects at future times. In some examples, the trajectory may be based at least in part on the cells of the discretized representation. In some examples, planning component 624 may be configured to determine actions applicable to the environment, such as based on environmental characteristics, weather characteristics, or the like.
In some examples, the first model 632A, the second model 632B, and/or the nth model 632N may be configured for different objects. For example, a first model 632A may be implemented by the vehicle computing system 604 to determine the intent of a pedestrian, while a second model 632B may be implemented to determine the intent of a cyclist.
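Dedicating different models to different object classes, as in the example above, can be sketched as a dispatch table keyed by object type; the model functions, their outputs, and the fallback behavior below are hypothetical.

```python
from typing import Callable, Dict

ModelFn = Callable[[dict], dict]  # maps encoded features to a prediction

def pedestrian_intent_model(features: dict) -> dict:
    return {"intent": "crosswalk_intent", "weight": 0.8}   # stand-in output

def cyclist_intent_model(features: dict) -> dict:
    return {"intent": "road_trajectory", "weight": 0.9}    # stand-in output

MODELS: Dict[str, ModelFn] = {
    "pedestrian": pedestrian_intent_model,   # e.g., a first model
    "bicycle": cyclist_intent_model,         # e.g., a second model
}

def predict(object_type: str, features: dict) -> dict:
    """Route the object to the model configured for its type."""
    model = MODELS.get(object_type, pedestrian_intent_model)  # fallback is an assumption
    return model(features)
```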
In various examples, the model component 630 can utilize machine learning techniques to determine a behavioral prediction of an object in an image depicting the vehicle surroundings and/or determine a behavioral prediction of an object in the environment, as described with respect to fig. 1-5 and elsewhere. In such examples, the machine learning algorithm may be trained to determine one or more trajectories, weights, and/or intents of the object relative to the vehicle in the environment.
In some examples, the model component 630 can determine a predicted trajectory or intent of the object based on the discretized representation of the environment (e.g., infer the intent of the object). In some examples, the model component 630 may be trained to learn object behaviors based at least in part on a pose or previous behaviors of the object and, in some cases, to learn how the pose or behavior changes over time. Thus, once trained, the model component 630 can determine the intent of an object from fewer images or a single image, just as a driver can determine whether an object will change direction or speed based on subtle features of the object.
In various examples, the model component 630 can determine the weights based at least in part on probabilities associated with one or more cells in the discretized representation. For example, the model component 630 can identify which of a number of possible classifications (e.g., 400) applies to each cell and aggregate, sum, or otherwise combine the probabilities of the cells associated with the predicted trajectory of the object. In such an example, the model 108 may map cells of the discretized representation to intent categories.
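The aggregation of per-cell probabilities into trajectory weights might be sketched as a masked sum followed by normalization, as below; the array layout, the masks, and the choice to normalize the weights to sum to one are assumptions.

```python
import numpy as np
from typing import List

def trajectory_weights(cell_probs: np.ndarray, masks: List[np.ndarray]) -> np.ndarray:
    """Combine per-cell probabilities into one weight per candidate trajectory.

    `cell_probs` is an (H, W) grid of probabilities that the object occupies each
    cell at a future time; each mask in `masks` marks the cells associated with
    one predicted trajectory (e.g., cells leading toward a particular crosswalk).
    """
    raw = np.array([cell_probs[m].sum() for m in masks])  # aggregate over each trajectory's cells
    return raw / raw.sum()                                # normalize so weights sum to 1

# Toy 4 x 4 grid with two trajectories covering different halves of the grid.
probs = np.full((4, 4), 1.0 / 16)
mask_a = np.zeros((4, 4), dtype=bool)
mask_a[:, :2] = True
mask_b = ~mask_a
print(trajectory_weights(probs, [mask_a, mask_b]))  # [0.5 0.5]
```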
It can be appreciated that the components discussed herein (e.g., the positioning component 620, the perception component 622, the planning component 624, the one or more system controllers 626, the one or more maps 628, and the model component 630 including one or more models, such as the first model 632A and the second model 632B through the Nth model 632N) are described as divided for purposes of illustration.
In some cases, aspects of some or all of the components discussed herein may include any model, technique, and/or machine learning technique. For example, in some cases, components in memory 618 (and memory 634 discussed below) may be implemented as a neural network.
As described herein, an exemplary neural network is a biologically inspired technique that passes input data through a series of connected layers to produce an output. Each layer in the neural network may also include another neural network, or may include any number of layers (whether convolutional or not). As can be appreciated in the context of the present disclosure, neural networks may utilize machine learning, which may refer to a wide range of such techniques in which an output is generated based on learned parameters.
Although discussed in the context of a neural network, any type of machine learning may be used consistent with the present disclosure. For example, machine learning techniques may include, but are not limited to, regression techniques (e.g., ordinary least squares regression (OLSR), linear regression, logistic regression, stepwise regression, multivariate adaptive regression splines (MARS), locally estimated scatterplot smoothing (LOESS)), instance-based techniques (e.g., ridge regression, least absolute shrinkage and selection operator (LASSO), elastic net, least-angle regression (LARS)), decision tree techniques (e.g., classification and regression tree (CART), iterative dichotomiser 3 (ID3), chi-squared automatic interaction detection (CHAID), decision stump, conditional decision trees), Bayesian techniques (e.g., naive Bayes, Gaussian naive Bayes, multinomial naive Bayes, average one-dependence estimators (AODE), Bayesian belief network (BNN), Bayesian networks), clustering techniques (e.g., k-means, k-medians, expectation maximization (EM), hierarchical clustering), association rule learning techniques (e.g., perceptron, back-propagation, Hopfield network, radial basis function network (RBFN)), deep learning techniques (e.g., deep Boltzmann machine (DBM), deep belief networks (DBN), convolutional neural network (CNN), stacked auto-encoders), dimensionality reduction techniques (e.g., principal component analysis (PCA), principal component regression (PCR), partial least squares regression (PLSR), Sammon mapping, multidimensional scaling (MDS), projection pursuit, linear discriminant analysis (LDA), mixture discriminant analysis (MDA), quadratic discriminant analysis (QDA), flexible discriminant analysis (FDA)), ensemble techniques (e.g., boosting, bootstrapped aggregation (bagging), AdaBoost, stacked generalization (blending), gradient boosting machines (GBM), gradient boosted regression trees (GBRT), random forest), SVM (support vector machine), supervised learning, unsupervised learning, semi-supervised learning, and the like. Additional examples of architectures include neural networks such as ResNet50, ResNet101, VGG, DenseNet, PointNet, and the like.
In at least one example, sensor system(s) 606 can include lidar sensors, radar sensors, ultrasonic sensors, sonar sensors, position sensors (e.g., GPS, compass, etc.), inertial sensors (e.g., inertial Measurement Unit (IMU), accelerometer, magnetometer, gyroscope, etc.), cameras (e.g., RGB, IR, intensity, depth, time of flight, etc.), microphones, wheel encoders, environmental sensors (e.g., temperature sensors, humidity sensors, light sensors, pressure sensors, etc.), and the like. Sensor system(s) 606 may include multiple instances of each of these or other types of sensors. For example, the lidar sensors may include a single lidar sensor located at a corner, front, rear, side, and/or top of the vehicle 602. As another example, the camera sensor may include multiple cameras disposed at different locations outside and/or inside of the vehicle 602. The sensor system(s) 606 may provide input to the vehicle computing device 604. Additionally or alternatively, the sensor system(s) 606 can transmit sensor data to the one or more computing devices 636 via the one or more networks 640 at a particular frequency, after a predetermined period of time, in near real-time, and/or the like. In some examples, model component 630 may receive sensor data from one or more of sensor systems 606.
The vehicle 602 may also include one or more emitters 608 for emitting light and/or sound. The emitter(s) 608 may include interior audio and visual emitters for communicating with occupants of the vehicle 602. By way of example and not limitation, the interior emitters may include speakers, lights, signs, display screens, touch screens, haptic emitters (e.g., vibration and/or force feedback), mechanical actuators (e.g., seat belt tensioners, seat positioners, headrest positioners, etc.), and the like. The emitter(s) 608 may also include exterior emitters. By way of example and not limitation, the exterior emitters may include lights (e.g., indicator lights, signs, light arrays, etc.) to signal a direction of travel or another indicator of vehicle action, and one or more audio emitters (e.g., speakers, speaker arrays, horns, etc.) to audibly communicate with pedestrians or other nearby vehicles, one or more of which may include acoustic beam steering technology.
Vehicle 602 may also include one or more communication connections 610 that enable communication between vehicle 602 and one or more other computing devices, local or remote. For example, the communication connection(s) 610 may facilitate communication with other local computing device(s) and/or drive system(s) 614 on the vehicle 602. Also, the communication connection(s) 610 may allow the vehicle to communicate with other nearby computing device(s) (e.g., remote computing device 636, other nearby vehicles, etc.) and/or one or more remote sensor systems 642 to receive sensor data. Communication connection(s) 610 also enable vehicle 602 to communicate with a remotely operated computing device or other remote service.
The communication connection(s) 610 may include a physical and/or logical interface for connecting the vehicle computing system 604 to another computing device or network, such as the network(s) 640. For example, communication connection(s) 610 may enable Wi-Fi based communication, e.g., over frequencies specified by the IEEE 802.11 standard, short range wireless frequencies such as bluetooth, cellular communication (e.g., 2G, 3G, 4G LTE, 5G, etc.), or any suitable wired or wireless communication protocol that enables a respective computing device to interface with other computing device(s).
In at least one example, the vehicle 602 may include one or more drive systems 614. In some examples, the vehicle 602 may have a single drive system 614. In at least one example, if the vehicle 602 has multiple drive systems 614, then separate drive systems 614 may be provided at opposite ends (e.g., front and rear, etc.) of the vehicle 602. In at least one example, drive system(s) 614 may include one or more sensor systems to detect conditions of drive system(s) 614 and/or the environment surrounding vehicle 602. By way of example and not limitation, the sensor system(s) may include one or more wheel encoders (e.g., rotary encoders) to sense rotation of the wheels of the drive system, inertial sensors (e.g., inertial measurement units, accelerometers, gyroscopes, magnetometers, etc.) to measure orientation and acceleration of the drive system, cameras or other image sensors, ultrasonic sensors to acoustically detect objects around the drive system, lidar sensors, radar sensors, and the like. Some sensors, such as wheel encoders, may be specific to the drive system(s) 614. In some cases, the sensor system(s) on drive system(s) 614 may overlap or complement the corresponding systems of vehicle 602 (e.g., sensor system(s) 606).
The drive system(s) 614 may include a number of vehicle systems, including a high voltage battery, a motor to propel the vehicle, an inverter to convert direct current from the battery into alternating current for use by other vehicle systems, a steering system including a steering motor and steering rack (which may be electric), a braking system including hydraulic or electric brakes, a suspension system including hydraulic and/or pneumatic components, a stability control system for distributing braking force to mitigate loss of traction and maintain control, an HVAC (heating, ventilation, and air conditioning) system, lighting (e.g., head lights, tail lights, and other lighting to illuminate the exterior surroundings of the vehicle), and one or more other systems (e.g., a cooling system, safety systems, an onboard charging system, other electrical components such as DC/DC converters, high voltage junctions, high voltage cables, a charging system, charging ports, etc.). Further, the drive system(s) 614 may include a drive system controller that may receive and preprocess data from the sensor system(s) to control operation of the various vehicle systems. In some examples, the drive system controller may include one or more processors and a memory communicatively coupled with the one or more processors. The memory may store one or more modules to perform various functions of the drive system(s) 614. In addition, the drive system(s) 614 may also include one or more communication connections that enable the respective drive system to communicate with one or more other local or remote computing devices.
In at least one example, the direct connection 612 may provide a physical interface to couple one or more drive systems 614 with the body of the vehicle 602. For example, the direct connection 612 may allow energy, fluid, air, data, etc. to be transferred between the drive system(s) 614 and the vehicle. In some cases, direct connection 612 may further releasably secure drive system(s) 614 to the body of vehicle 602.
In at least one example, the positioning component 620, the perception component 622, the planning component 624, the one or more system controllers 626, the one or more maps 628, and the model component 630 can process sensor data, as described above, and can transmit their respective outputs to the computing device(s) 636 via the one or more networks 640. In at least one example, the positioning component 620, the perception component 622, the planning component 624, the one or more system controllers 626, the one or more maps 628, and the model component 630 can transmit their respective outputs to the remote computing device(s) 636 at a particular frequency, after a predetermined period of time has elapsed, in near real-time, and/or the like.
In some examples, the vehicle 602 can send the sensor data to the computing device(s) 636 over the network(s) 640. In some examples, the vehicle 602 can receive sensor data from the computing device(s) 636 and/or the remote sensor system(s) 642 over the network(s) 640. The sensor data may include raw sensor data and/or processed sensor data and/or a representation of the sensor data. In some examples, sensor data (raw or processed) may be sent and/or received as one or more log files.
The computing device(s) 636 may include processor(s) 644 and memory 634 that stores a map component 638, a model component 646, and a training component 648. In some examples, the map component 638 may include functionality to generate maps of various resolutions. In such an example, the map component 638 may send one or more maps to the vehicle computing system 604 for navigation purposes. In some examples, model component 646 can be configured to perform similar functions as model component 630. In various examples, model component 646 can be configured to receive data from one or more remote sensors (e.g., sensor system(s) 606 and/or remote sensor system(s) 642). In some examples, the model component 646 may be configured to process data and send the processed sensor data to the vehicle computing system 604, e.g., for use by the model component 630 (e.g., the first model 632A, the second model 632B, and/or the nth model 632N). In some examples, the model component 646 may be configured to send raw sensor data to the vehicle computing system 604.
In some cases, the training component 648 may include functionality for training machine learning models to output features of objects and/or attributes of objects. For example, the training component 648 may receive a set of images (e.g., one or more images) that represent the traversal of the environment by the object over a period of time (e.g., 0.1 milliseconds, 1 second, 3 seconds, 5 seconds, 7 seconds, etc.). At least a portion of the set of images can be used as input to train a machine learning model. As a non-limiting example, a first set (e.g., 3, 4, 5, or more) of the sequence of images may be input to the machine learning model. A second set of images (or attribute information associated therewith-e.g., by extracting attributes from the images) in the sequence of images immediately preceding the first set may be used as a ground truth for training the model. Accordingly, by providing images of the object traversing the environment, the training component 648 may be trained to output features of the object and/or attributes of the object, as discussed herein.
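The split of an image sequence into a model-input portion and a ground-truth portion can be illustrated as follows; the split size and which side of the split serves as ground truth are assumptions of this sketch.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")  # an image frame or its extracted attribute information

def split_sequence(frames: List[T], input_len: int = 4) -> Tuple[List[T], List[T]]:
    """Split an image sequence into model-input and ground-truth portions.

    The first `input_len` frames are fed to the model; the adjacent frames
    (here, the remainder of the sequence) supply ground-truth attributes for
    comparing against the model's output during training.
    """
    model_input = frames[:input_len]
    ground_truth = frames[input_len:]
    return model_input, ground_truth

# Example: an 8-frame log yields 4 input frames and 4 ground-truth frames.
inputs, truth = split_sequence(list(range(8)))
```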
In some examples, the training component 648 may include training data that has been generated by a simulator. For example, the simulated training data may represent examples of a vehicle colliding or nearly colliding with an object in the environment to provide additional training examples.
Additional details of the training component 648 and examples of data used for training are discussed below in connection with fig. 3 and in the present disclosure.
Processor(s) 616 of vehicle 602 and processor(s) 644 of computing device(s) 636 may be any suitable processor capable of executing instructions to process data and perform the operations described herein. By way of example, and not limitation, processor(s) 616 and 644 may include one or more Central Processing Units (CPUs), graphics Processing Units (GPUs), or any other device or portion of a device that processes electronic data to transform that electronic data into other electronic data that may be stored in registers and/or memory. In some examples, integrated circuits (e.g., ASICs, etc.), gate arrays (e.g., FPGAs, etc.), and other hardware devices may also be considered processors, so long as they are configured to implement the encoded instructions.
Memory 618 and memory 634 are examples of non-transitory computer-readable media. Memory 618 and memory 634 may store an operating system and one or more software applications, instructions, programs, and/or data to implement the methods described herein and the functions attributed to the various systems. In various embodiments, the memory may be implemented using any suitable memory technology, such as Static Random Access Memory (SRAM), synchronous Dynamic RAM (SDRAM), non-volatile/flash type memory, or any other memory type capable of storing information. The architectures, systems, and individual elements described herein may include many other logical, procedural, and physical components, with the components shown in the figures being merely examples relating to the discussion herein.
In some cases, memory 618 and memory 634 may include at least a working memory and a storage memory. For example, the working memory may be a limited-capacity high-speed memory (e.g., a cache memory) used to store data to be operated on by the processor(s) 616 and 644. In some cases, memory 618 and memory 634 may comprise storage memory, which may be relatively large capacity low speed memory for long term storage of data. In some cases, processor(s) 616 and 644 may not be able to directly operate on data stored in the storage memory, and the data may need to be loaded into the working memory in order to perform operations based on the data, as discussed herein.
It should be noted that while fig. 6 is illustrated as a distributed system, in alternative examples, components of the vehicle 602 can be associated with the computing device(s) 636 and/or components of the computing device(s) 636 can be associated with the vehicle 602. That is, the vehicle 602 can perform one or more functions associated with the computing device(s) 636, and vice versa. For example, one of the vehicle 602 and the computing device(s) 636 may perform training operations related to one or more of the models described herein.
Fig. 7 and 8 illustrate example processes according to embodiments of the disclosure. Some or all of processes 700 and 800 may be performed by one or more components of fig. 6, as described herein. For example, some or all of the processes 700 and 800 may be performed by the vehicle computing system 604 and/or the computing device(s) 636. The processes are illustrated as logical flow diagrams, wherein each operation represents a sequence of operations that can be implemented in hardware, software, or a combination thereof. In software, these operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and so forth that perform particular functions or implement particular abstract data types. The order of the operations is not intended to be construed as a limitation, and any number of the operations can be omitted or combined in any order and/or in parallel to implement the flows.
FIG. 7 is a flow diagram depicting an example process 700 for determining predicted trajectories and weights using different models.
At operation 702, the process may include receiving, by a vehicle computing system, sensor data. For example, the vehicle computing system 604 may receive sensor data from the perception component 622. The sensor data may represent an object (e.g., the object 104 of FIG. 1) detected in an environment surrounding a vehicle, such as the vehicle 102. In some examples, the sensor data may be received from one or more sensors on the vehicle and/or from one or more remote sensors. In some examples, operation 702 may include capturing sensor data using multiple sensors and fusing or combining the sensor data into a detailed and informational representation of the environment.
At operation 704, the process may include determining, by the vehicle computing system, data. For example, the vehicle computing system 604 may determine data representative of an overhead view of the environment (e.g., overhead view 112) and objects in the environment (e.g., objects 104). The data may include sensor data associated with sensors of vehicles in the environment, map data, and/or data from another data source, which may be encoded as an overhead representation. Examples of such data will be discussed in this disclosure.
At operation 706, the process may include inputting data into a model of the vehicle computing system. For example, the vehicle computing system 604 may input data into the model 108. In some examples, the model may be a machine learning model, as discussed in this disclosure.
At operation 708, the process can include receiving output from a model representing a discretized representation of the environment. For example, the vehicle computing system can receive the discretized representation 114 from the model 108. Additional details of the discretized representation 114 are discussed throughout the disclosure.
At operation 710, the process can include determining a predicted trajectory associated with the object and a weight associated with the predicted trajectory based at least in part on the discretized representation. For example, the vehicle computing system implements one or more components to determine the predicted trajectories 110A and 110B and the weights 302A and 302B based on the classification probabilities associated with the cells of the discretized representation. In some examples, the classification probability may indicate whether the object will arrive at the destination at a future time. Additional details of determining predicted trajectories and/or associated weights are discussed throughout the disclosure.
At operation 712, the process may include determining whether the model is currently being trained or whether the model has been previously trained. In some examples, the vehicle computing system may process data (sensor data, map data, image data, etc.) as part of a training operation, an inference operation, or both. If the model is not being trained (e.g., "no" at operation 712), the process may continue to operation 714 to cause operation of the vehicle to be controlled based at least in part on the output of the model. If the model is being trained (e.g., "yes" at operation 712), the process may continue to operation 716 to update the parameter(s) of the model based at least in part on the output of the model. Of course, in some examples, the operations may be performed simultaneously, depending on the implementation.
At operation 714, the vehicle may be controlled based at least in part on the output from the model 108. For example, the output from the model 108 can be processed by the planning component 624 of the vehicle to determine actions that the vehicle can take to avoid impact with the object. Additional details of using one or more outputs from one or more models to control a vehicle are discussed throughout this disclosure.
At operation 716, one or more parameters of the model may be updated, changed, and/or enhanced to train the model. In some cases, the output from the model 108 may be compared to training data (e.g., ground truth representing labeled data) for use in training. Based at least in part on the comparison, the parameter(s) associated with the model 108 may be updated.
FIG. 8 is a flow diagram depicting an example process for determining an intent associated with a trajectory or trajectory type using different models.
At operation 802, the process may include determining, by a vehicle computing system, a vehicle trajectory (e.g., a first trajectory). For example, the vehicle computing system 604 can determine the candidate trajectories through the planning component 624. In some examples, the candidate trajectories are trajectories that may be used to navigate the vehicle in the environment. In some examples, operation 802 may include capturing sensor data using multiple sensors and fusing or combining the sensor data into a detailed and informational representation of the environment.
At operation 804, the process may include determining an object trajectory (e.g., a second trajectory) using the model. For example, the vehicle computing system 604 may implement the model 108 to determine the predicted trajectory. In some examples, the vehicle computing system 604 may also determine weights associated with the predicted trajectory. Examples of such predicted trajectories and weights are discussed in this disclosure.
At operation 806, the process may include receiving, by the vehicle computing system, map data. For example, the vehicle computing system 604 may receive map data from the map(s) 628. The map data may indicate characteristics of the environment including crosswalks, roads, sidewalks, and the like.
At operation 808, the process may include determining, by the same or a different model, an output including a first intent for the first trajectory and a second intent for the second trajectory. For example, the vehicle computing system 604 may determine the intent(s) using the model 502, which may map locations of the discretized representation 114 to destinations in the map data. In some examples, the model 502 may additionally or alternatively output one or more intents of a trajectory type (e.g., a road trajectory or a freeform trajectory). Additional details of intents are discussed throughout the disclosure.
In some examples, at operation 808, the process may include sending data representative of the output by the model to a planning component of the vehicle computing system to cause the vehicle to plan a trajectory of the vehicle based at least in part on the output of the model. Additional details of using the output from the model to control the vehicle are discussed throughout this disclosure.
At operation 810, the process may include determining whether the model is currently being trained or whether the model has been previously trained. In some examples, the vehicle computing system may process the data as part of a training operation, an inference operation, or both. If the model is not being trained (e.g., "no" at operation 810), the process may continue to operation 812 to cause operation of the vehicle to be controlled based at least in part on the output of the model. If the model is being trained (e.g., "yes" at operation 810), the process may continue to operation 814 to update the parameter(s) of the model based at least in part on the output of the model. Of course, in some examples, the operations may be performed simultaneously, depending on the implementation.
At operation 812, the vehicle may be controlled based at least in part on the output from the model 502. For example, the output from the model 502 can be processed by the planning component 624 of the vehicle to determine actions that the vehicle can take to avoid impact with the object. Additional details of using one or more outputs from one or more models to control a vehicle are discussed throughout the disclosure. In some examples, the planning component 624 may control the vehicle based at least in part on the output from the model 108 and the output from the model 502.
At operation 814, one or more parameters of the model may be updated, changed, and/or enhanced to train the model. In some cases, the output from the model 502 may be compared to training data (e.g., ground truth representing labeled data) for use in training. Based at least in part on the comparison, the parameter(s) associated with the model 502 and/or the model 108 may be updated.
The methodologies described herein represent sequences of operations that can be implemented in hardware, software, or a combination thereof. In software terms, the blocks represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and so forth that perform particular functions or implement particular abstract data types. The order of the operations is not intended to be construed as a limitation, and any number of the operations can be combined in any order and/or in parallel to implement the processes. In some embodiments, one or more operations of the method may be omitted entirely.
The various techniques described herein may be implemented in the context of computer-executable instructions or software, such as program modules, stored in computer-readable memory and executed by processor(s) of one or more computing devices, such as those shown in the figures. Generally, program modules include routines, programs, objects, components, data structures, etc. that define the operating logic for performing particular tasks or implement particular abstract data types.
Other architectures can be used to implement the described functionality and are intended to be within the scope of the present disclosure. Further, although a particular allocation of responsibilities is defined above for purposes of discussion, the various functions and responsibilities may be allocated and divided in different ways depending on the circumstances.
Similarly, software may be stored and distributed in a variety of ways and using different means, and the particular software storage and execution configurations described above may be varied in many different ways. Thus, software implementing the techniques described above may be distributed on various types of computer-readable media and is not limited to the form of memory specifically described.
Example clauses
A: a system, comprising: one or more processors; and one or more non-transitory computer-readable storage media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to perform operations comprising: receiving sensor data associated with an autonomous vehicle in an environment; determining data based at least in part on the sensor data, the data comprising an overhead representation of an environment and objects in the environment; inputting the data into a machine learning model; receiving an output from the machine learning model that includes a discretized representation of a portion of the environment, wherein cells of the discretized representation are associated with classification probabilities of locations of the object at future times; determining a predicted trajectory associated with the object and a weight associated with the predicted trajectory based at least in part on the discretized representation and the classification probability; and causing operation of the autonomous vehicle to be controlled based at least in part on the predicted trajectory associated with the object and the weight associated with the predicted trajectory.
B: the system of clause a, wherein the classification probability associated with the cell indicates a probability that the object is at the location at the future time.
C: the system of clauses a or B, wherein: the position is a first position; the cell is a first cell; the classification probability is a first classification probability; the predicted trajectory is a first predicted trajectory; the weight is a first weight; the discretized representation includes a second cell associated with a second classification probability that the object is at a second location at the future time; and the operations further comprise: determining, based at least in part on the map data, that the first location is associated with a first destination; determining, based at least in part on the map data, that the second location is associated with a second destination; determining a second predicted trajectory associated with the object at the future time based at least in part on the second classification probability and the second location; and causing operation of the autonomous vehicle to be controlled further based at least in part on the second predicted trajectory and a second weight associated with the second predicted trajectory.
D: the system of any of clauses A-C, the operations further comprising: determining the weight based at least in part on the classification probability and another classification probability.
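Clause D requires only that the weight depend on the classification probability and another classification probability. One plausible reading, sketched below purely as an assumption, is to normalize each candidate's probability over all competing candidates so the resulting weights sum to one.

```python
def trajectory_weight(probability, other_probabilities):
    """Weight one predicted trajectory by normalizing its classification
    probability against the probabilities of the competing predictions."""
    total = probability + sum(other_probabilities)
    return probability / total if total > 0.0 else 0.0

# Example: a cell probability of 0.6 against competitors of 0.3 and 0.1 keeps weight 0.6.
print(trajectory_weight(0.6, [0.3, 0.1]))
```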
E: the system of any of clauses A-D, wherein: the location is represented based at least in part on an offset from a position of the object at a previous time prior to the future time.
F: one or more non-transitory computer-readable storage media storing instructions that, when executed, cause one or more processors to perform operations comprising: inputting data into the model, the data comprising an overhead representation of the environment at a first time; receiving an output from the model that includes a discretized representation of a portion of the environment, wherein elements of the discretized representation are associated with probabilities associated with the object at a second time subsequent to the first time; determining a trajectory associated with the object and a weight associated with the trajectory based at least in part on the discretized representation and the probability; and causing operation of the vehicle to be controlled based at least in part on the trajectory and the weight.
G: the one or more non-transitory computer-readable storage media of clause F, wherein: the data includes at least one of: sensor data, map data, or data based on the sensor data, the data representing one or more channel images to form the overhead representation, and the probability associated with the cell is indicative of a probability that the object is at a location at the second time.
H: the one or more non-transitory computer-readable storage media of clauses F or G, wherein: the position is a first position; the cell is a first cell; the probability is a first probability; the track is a first track; the weight is a first weight; the discretized representation includes a second cell associated with a second probability that the object is at a second location at the second time; and the operations further comprising: determining, based at least in part on the map data, that the first location is associated with a first destination; determining, based at least in part on the map data, that the second location is associated with a second destination; determining a second trajectory associated with the object at the second time based at least in part on the second probability and the second location; and causing operation of the vehicle to be controlled further based at least in part on the second trajectory and a second weight associated with the second trajectory.
I: the one or more non-transitory computer-readable storage media of any of clauses F-H, the operations further comprising: sending data including the trajectory and the weight to a planning component of the vehicle; and causing the planning component to determine a candidate trajectory for the vehicle to follow in the environment based at least in part on the data.
J: the one or more non-transitory computer-readable storage media of any of clauses F-I, the operations further comprising: receiving map data associated with the environment; determining, based at least in part on the map data and a location associated with the cell, that the location is associated with a semantic destination; and determining the weight based at least in part on the probability and the location being associated with the semantic destination at the second time.
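To make clause J concrete, the sketch below looks up whether a cell's location falls inside a mapped semantic destination and ties the weight to that association. The bounding-box map representation, the area names, and the down-weighting of unmapped locations are illustrative assumptions, not the claimed technique.

```python
def weight_with_semantic_destination(probability, location, semantic_areas):
    """Associate a cell's probability with a semantic destination from map data.

    semantic_areas: mapping of destination name to an axis-aligned box
    (xmin, ymin, xmax, ymax) in map coordinates (illustrative structure).
    """
    x, y = location
    for name, (xmin, ymin, xmax, ymax) in semantic_areas.items():
        if xmin <= x <= xmax and ymin <= y <= ymax:
            return probability, name       # weight tied to a known destination
    return probability * 0.5, None         # illustrative discount when unmapped

# Example: a predicted location inside a mapped crosswalk keeps its probability.
areas = {"crosswalk": (10.0, 0.0, 14.0, 3.0), "sidewalk": (0.0, 3.0, 20.0, 5.0)}
print(weight_with_semantic_destination(0.7, (12.0, 1.5), areas))  # (0.7, 'crosswalk')
```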
K: the one or more non-transitory computer-readable storage media of any of clauses F-J, the operations further comprising: determining the weight based at least in part on the probability and another probability.
L: the one or more non-transitory computer-readable storage media of any of clauses F-K, wherein the model is a machine learning model trained based at least in part on a comparison between data associated with previous outputs of the model and ground truth data.
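Clause L states only that the model is trained by comparing earlier outputs with ground truth data. A per-cell binary cross-entropy between the predicted grid and a grid marking where the object actually ended up, shown below, is one conventional loss that would fit that description; the specific loss choice is an assumption.

```python
import numpy as np

def occupancy_loss(predicted_grid, ground_truth_grid, eps=1e-7):
    """Per-cell binary cross-entropy between a model's discretized output and
    a ground-truth occupancy grid derived from logged data (illustrative)."""
    p = np.clip(np.asarray(predicted_grid, dtype=float), eps, 1.0 - eps)
    g = np.asarray(ground_truth_grid, dtype=float)
    return float(np.mean(-(g * np.log(p) + (1.0 - g) * np.log(1.0 - p))))
```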
M: the one or more non-transitory computer-readable storage media of any of clauses F-L, the operations further comprising: interpolating between a position of the object at the first time and a position associated with the probability at the second time, and wherein the trajectory is based at least in part on the interpolation.
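Clause M leaves the interpolation scheme open; the sketch below uses plain linear interpolation between the object's position at the first time and the most probable predicted position at the second time to produce intermediate waypoints. Linear interpolation is an assumption made for brevity.

```python
import numpy as np

def interpolate_trajectory(current_xy, future_xy, num_points=5):
    """Linearly interpolate between the current position and a predicted
    future position to form a sequence of trajectory waypoints."""
    current = np.asarray(current_xy, dtype=float)
    future = np.asarray(future_xy, dtype=float)
    return [tuple(current + f * (future - current))
            for f in np.linspace(0.0, 1.0, num_points)]

# Example: five waypoints from (0, 0) to (4, 2).
print(interpolate_trajectory((0.0, 0.0), (4.0, 2.0)))
```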
N: the one or more non-transitory computer-readable storage media of any one of clauses F-M, the operations further comprising: receiving map data associated with the environment; determining, based at least in part on the map data and a location associated with the cell, that the location is associated with a semantic destination; and determining an intent associated with the object based at least in part on the semantic destination and the probability; and wherein the operation of the vehicle is caused to be controlled further based at least in part on the intent.
O: a method, comprising: inputting image data into the model, the image data comprising an overhead representation of the environment at a first time; receiving an output from the model that includes a discretized representation of a portion of the environment, wherein elements of the discretized representation are associated with probabilities associated with objects at a second time subsequent to the first time; determining a trajectory associated with the object and a weight associated with the trajectory based at least in part on the discretized representation and the probability; and causing operation of the vehicle to be controlled based at least in part on the trajectory and the weight.
P: the method of clause O, wherein: the probability associated with the cell indicates a probability that the object is at a location at the second time.
Q: the method of clause O or P, wherein: the location is represented based at least in part on an offset from a position of the object at a previous time prior to the second time.
R: the method according to any of clauses O-Q, wherein the overhead representation of the environment represents one or more of: object position, object velocity, object acceleration, object yaw, object properties, crosswalk permission, or traffic light permission.
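The channel images implied by clause R can be pictured as a stacked raster, one channel per quantity. The sketch below rasterizes position, speed, and yaw only; acceleration, object attributes, crosswalk permission, and traffic-light permission channels could be added the same way. Grid size, cell size, and the dictionary-based object format are assumptions.

```python
import numpy as np

def build_overhead_channels(objects, grid_hw=(128, 128), cell_size_m=0.5):
    """Rasterize per-object state into channel images of an overhead view.
    objects: iterable of dicts with keys "x", "y" (meters from the grid
    origin), "speed", and "yaw" (illustrative format)."""
    h, w = grid_hw
    channels = {name: np.zeros((h, w), dtype=np.float32)
                for name in ("occupancy", "speed", "yaw")}
    for obj in objects:
        row, col = int(obj["y"] / cell_size_m), int(obj["x"] / cell_size_m)
        if 0 <= row < h and 0 <= col < w:
            channels["occupancy"][row, col] = 1.0
            channels["speed"][row, col] = obj["speed"]
            channels["yaw"][row, col] = obj["yaw"]
    return np.stack(list(channels.values()), axis=-1)  # (H, W, 3) image
```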
S: the method of any of clauses O-R, further comprising: receiving sensor data of the environment associated with sensors of the vehicle; determining, based at least in part on the sensor data, a first object type and a second object type associated with the object in the environment, the second object type being different from the first object type; determining a first probability that the object is of the first object type; and determining a second probability that the object is of the second object type, wherein inputting the image data into the model comprises inputting an indication of the first probability associated with the first object type and the second probability associated with the second object type.
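One way to read the indication of type probabilities in clause S is as an extra normalized feature handed to the model alongside the image channels; the two generic type names and the dictionary encoding below are assumptions for illustration.

```python
def encode_type_probabilities(p_first_type, p_second_type):
    """Normalize two object-type probabilities so the model sees both
    hypotheses (e.g., pedestrian vs. cyclist) rather than a hard label."""
    total = p_first_type + p_second_type
    if total <= 0.0:
        return {"first_type": 0.0, "second_type": 0.0}
    return {"first_type": p_first_type / total, "second_type": p_second_type / total}
```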
T: the method of clauses O-S, wherein the vehicle is an autonomous vehicle, and further comprising: transmitting data comprising the trajectory and the weight to a planning component of the autonomous vehicle; and causing the planning component to determine a candidate trajectory for the autonomous vehicle to follow in the environment based at least in part on the data.
U: a system, comprising: one or more processors; and one or more non-transitory computer-readable storage media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to perform operations comprising: receiving sensor data; determining an object represented in the sensor data; determining a first predicted trajectory of the object, the first predicted trajectory associated with a first weight; determining a second predicted trajectory of the object, the second predicted trajectory associated with a second weight; receiving map data; determining, based at least in part on the map data, a first intent of the first trajectory based on a first semantic destination; determining, based at least in part on the map data, a second intent of the second trajectory based on a second semantic destination of the second trajectory; and controlling an autonomous vehicle based at least in part on the first trajectory, the first weight, the first intent, the second trajectory, the second weight, and the second intent.
V: the system of clause U, wherein determining the first predicted trajectory comprises performing a regression.
W: the system of clause U or V, wherein the second trajectory is based at least in part on a classification.
X: the system of any of clauses U-W, wherein: the first trajectory is associated with a first destination; and the second trajectory is associated with a second destination different from the first destination.
Y: the system of any of clauses U-X, the operations further comprising: determining that one of the first weight or the second weight is higher than the other of the first weight and the second weight; and at least one of: in response to determining that the first weight is higher than the second weight, controlling the autonomous vehicle in the environment based at least in part on the first trajectory; or in response to determining that the second weight is higher than the first weight, controlling the autonomous vehicle in the environment based at least in part on the second trajectory.
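Clause Y amounts to preferring the higher-weighted prediction when controlling the vehicle; a trivial selection helper is sketched below, with ties broken toward the first trajectory purely by assumption.

```python
def select_trajectory(first_trajectory, first_weight, second_trajectory, second_weight):
    """Return the predicted trajectory with the higher weight for use by the planner."""
    if first_weight >= second_weight:
        return first_trajectory, first_weight
    return second_trajectory, second_weight
```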
Z: a method, comprising: receiving sensor data; determining an object represented in the sensor data; determining a first trajectory associated with the object; determining a second trajectory associated with the object; determining a first intent of the first trajectory based on a first semantic destination; determining a second intent of the second trajectory based on a second semantic destination of the second trajectory; and sending the first trajectory, the first intent, the second trajectory, and the second intent to a planning component to control a vehicle.
AA: the method of clause Z, wherein: the first trajectory is associated with a first trajectory type; and the second trajectory is associated with a second trajectory type different from the first trajectory type.
AB: the method of clause Z or AA, wherein the first trajectory type or the second trajectory type comprises a trajectory type associated with a road segment in the environment of the vehicle.
AC: the method of clause Z or AB, further comprising: determining, by a first machine learning model, a first weight associated with the first trajectory; determining, by a second machine learning model, a second weight associated with the second trajectory; and controlling the vehicle based at least in part on the first trajectory, the first weight, the first intent, the second trajectory, the second weight, and the second intent.
AD: the method of any of clauses Z-AC, wherein controlling the vehicle comprises determining a candidate trajectory for the vehicle to follow in the environment.
AE: the method of any of clauses Z-AD, further comprising determining at least one of the first intent or the second intent based at least in part on a proximity of the object to an area in the environment of the vehicle.
AF: the method of any one of clauses Z-AE, wherein: the area in the environment includes a road segment associated with map data representing the environment, the object includes a pedestrian or a bicycle, the first semantic destination includes a first area in the environment of the vehicle, and the second semantic destination includes a second area in the environment of the vehicle different from the first semantic destination.
AG: the method of any of clauses Z-AF, wherein the first trajectory is based at least in part on regression and the second trajectory is based at least in part on classification.
AH: the method according to any one of clauses Z to AG, wherein: the first track is associated with a first destination; and the second track is associated with a second destination different from the first destination.
AI: the method of any of clauses Z-AH, wherein: the first intent or the second intent comprises at least one of: an intent of the object to travel along a road segment in the environment of the vehicle, an intent of the object to travel outside a vicinity of the road segment, an intent of the object to travel within a crosswalk, or an intent of the object to travel outside a boundary of the crosswalk.
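The intents enumerated in clause AI, together with the proximity test of clause AE, can be illustrated with a simple region check. The axis-aligned boxes for the crosswalk and road segment and the fixed proximity margin below are simplifying assumptions, not the claimed method.

```python
def classify_intent(predicted_xy, crosswalk_box, road_box, near_m=1.5):
    """Map a predicted object location to one of the enumerated intents using
    illustrative axis-aligned regions (xmin, ymin, xmax, ymax)."""
    def inside(box, margin=0.0):
        xmin, ymin, xmax, ymax = box
        x, y = predicted_xy
        return (xmin - margin) <= x <= (xmax + margin) and (ymin - margin) <= y <= (ymax + margin)

    if inside(crosswalk_box):
        return "travel within the crosswalk"
    if inside(crosswalk_box, margin=near_m):
        return "travel outside the boundary of the crosswalk"
    if inside(road_box):
        return "travel along the road segment"
    return "travel outside the vicinity of the road segment"
```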
AJ: one or more non-transitory computer-readable storage media having instructions stored thereon that, when executed, cause one or more processors to perform operations comprising: receiving sensor data; determining an object represented in the sensor data; determining a first trajectory associated with the object; determining a second trajectory associated with the object; determining a first intent of the first trajectory based on a first semantic destination; determining a second intent of the second trajectory based on a second semantic destination of the second trajectory; and sending the first trajectory, the first intent, the second trajectory, and the second intent to a planning component to control a vehicle.
AK: the one or more non-transitory computer-readable media of clause AJ, wherein: the first trajectory is associated with a first trajectory type; and the second trajectory is associated with a second trajectory type different from the first trajectory type.
AL: the one or more non-transitory computer-readable media of clause AJ or AK, wherein: the first trajectory is associated with a first destination; and the second trajectory is associated with a second destination different from the first destination.
AM: the one or more non-transitory computer-readable media of any of clauses AJ-AL, the operations further comprising: receiving a weight associated with the first trajectory from a machine learning model, wherein the sending further comprises sending the weight to the planning component to control the vehicle.
AN: the method according to any of clauses AJ-AM, the operations further comprising determining at least one of the first intent or the second intent based at least in part on a proximity of the object to a region in an environment.
While the example clauses described above are described with respect to one particular implementation, it should be understood that, in the context of this document, the contents of the example clauses may also be implemented via a method, a device, a system, a computer-readable medium, and/or another implementation. Moreover, any of examples A to AN may be implemented alone or in combination with any other one or more of examples A to AN.
Conclusion
While one or more examples of the technology described herein have been described, various modifications, additions, permutations, and equivalents thereof are included within the scope of the technology described herein.
In the description of the examples, reference is made to the accompanying drawings, which form a part hereof and in which are shown, by way of illustration, specific examples of the claimed subject matter. It is to be understood that other examples may be used and that changes or alterations, such as structural changes, may be made. Such examples, changes, or variations do not necessarily depart from the intended scope of the claimed subject matter. Although the steps herein may be presented in a certain order, in some cases the order may be changed to provide certain inputs at different times or in a different order without changing the functionality of the systems and methods. The disclosed procedures may also be performed in a different order. Moreover, the various computations herein need not be performed in the order disclosed, and other examples using altered orders of computations may also be readily implemented. In addition to reordering, a computation may also be decomposed into sub-computations with the same result.

Claims (15)

1. A system, comprising:
one or more processors; and
one or more non-transitory computer-readable storage media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to perform operations comprising:
receiving sensor data associated with an autonomous vehicle in an environment;
determining data based at least in part on the sensor data, the data comprising an overhead representation of the environment and objects in the environment;
inputting the data into a machine learning model;
receiving an output from the machine learning model that includes a discretized representation of a portion of the environment, wherein cells of the discretized representation are associated with classification probabilities of locations of the object at future times;
determining a predicted trajectory associated with the object and a weight associated with the predicted trajectory based at least in part on the discretized representation and the classification probability; and
causing operation of the autonomous vehicle to be controlled based at least in part on the predicted trajectory associated with the object and the weight associated with the predicted trajectory.
2. The system of claim 1, wherein the classification probability associated with the cell is indicative of a probability that the object is at the location at the future time.
3. The system of any one of claims 1 or 2, wherein:
the position is a first position;
the cell is a first cell;
the classification probability is a first classification probability;
the predicted trajectory is a first predicted trajectory;
the weight is a first weight;
the discretized representation includes a second cell associated with a second classification probability that the object is at a second location at the future time; and
the operations further include:
determining, based at least in part on map data, that the first location is associated with a first destination;
determining, based at least in part on the map data, that the second location is associated with a second destination;
determining a second predicted trajectory associated with the object at the future time based at least in part on the second classification probability and the second location; and
causing operation of the autonomous vehicle to be controlled further based at least in part on the second predicted trajectory and a second weight associated with the second predicted trajectory.
4. The system of any of claims 1 to 3, the operations further comprising:
determining the weight based at least in part on the classification probability and another classification probability.
5. The system of any one of claims 1 to 4, wherein:
the location is represented based at least in part on an offset from a position of the object at a previous time prior to the future time.
6. A method, comprising:
inputting data into a model, the data comprising an overhead representation of an environment at a first time;
receiving an output from the model that includes a discretized representation of a portion of the environment, wherein cells of the discretized representation are associated with probabilities associated with an object at a second time after the first time;
determining a trajectory associated with the object and a weight associated with the trajectory based at least in part on the discretized representation and the probability; and
controlling operation of the vehicle based at least in part on the trajectory and the weight.
7. The method of claim 6, wherein:
the data includes at least one of: sensor data, map data, or data based on the sensor data, the data representing one or more channel images to form the overhead representation, and
the probability associated with the cell indicates a probability that the object is at a location at the second time.
8. The method of claim 7, wherein:
the position is a first position;
the cell is a first cell;
the probability is a first probability;
the track is a first track;
the weight is a first weight;
the discretized representation comprises a second cell associated with a second probability that the object is at a second location at the second time; and
the operations further include:
determining, based at least in part on map data, that the first location is associated with a first destination;
determining, based at least in part on the map data, that the second location is associated with a second destination;
determining a second trajectory associated with the object at the second time based at least in part on the second probability and the second location; and
controlling the operation of the vehicle further based at least in part on the second trajectory and a second weight associated with the second trajectory.
9. The method of any of claims 6 to 8, the operations further comprising:
transmitting data including the trajectory and the weight to a planning component of the vehicle; and
causing the planning component to determine a candidate trajectory for the vehicle to follow in the environment based at least in part on the data.
10. The method of any of claims 6 to 9, the operations further comprising:
receiving map data associated with the environment;
determining, based at least in part on the map data and a location associated with the cell, that the location is associated with a semantic destination; and
determining the weight based at least in part on the probability and the location being associated with the semantic destination at the second time.
11. The method of any of claims 6 to 10, the operations further comprising:
determining the weight based at least in part on the probability and another probability.
12. The method of any of claims 6 to 11, wherein the model is a machine learning model trained based at least in part on a comparison between data associated with previous outputs of the model and ground truth data.
13. The method of any of claims 6 to 12, the operations further comprising:
interpolating between a position of the object at the first time and a position associated with the probability at the second time, and
wherein the trajectory is based at least in part on the interpolation.
14. The method of any of claims 6 to 13, the operations further comprising:
receiving map data associated with the environment;
determining, based at least in part on the map data and a location associated with the cell, that the location is associated with a semantic destination; and
determining an intent associated with the object based at least in part on the semantic destination and the probability; and
wherein causing the operation of the vehicle to be controlled is further based at least in part on the intent.
15. A computer program product comprising encoded instructions which, when run on a computer, implement the method according to any one of claims 6 to 14.
CN202180033877.9A 2020-05-08 2021-04-26 Trajectory classification Pending CN115515835A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US16/870,083 2020-05-08
US16/870,083 US11554790B2 (en) 2020-05-08 2020-05-08 Trajectory classification
US16/870,355 2020-05-08
US16/870,355 US11708093B2 (en) 2020-05-08 2020-05-08 Trajectories with intent
PCT/US2021/029232 WO2021225822A1 (en) 2020-05-08 2021-04-26 Trajectory classification

Publications (1)

Publication Number Publication Date
CN115515835A true CN115515835A (en) 2022-12-23

Family

ID=78468315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180033877.9A Pending CN115515835A (en) 2020-05-08 2021-04-26 Trajectory classification

Country Status (4)

Country Link
EP (1) EP4146510A4 (en)
JP (1) JP2023525054A (en)
CN (1) CN115515835A (en)
WO (1) WO2021225822A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230303124A1 (en) * 2022-03-25 2023-09-28 Motional Ad Llc Predicting and controlling object crossings on vehicle routes

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9327693B2 (en) * 2013-04-10 2016-05-03 Magna Electronics Inc. Rear collision avoidance system for vehicle
US9321461B1 (en) * 2014-08-29 2016-04-26 Google Inc. Change detection using curve alignment
EP3828657A1 (en) * 2016-12-23 2021-06-02 Mobileye Vision Technologies Ltd. Navigational system
US20190113920A1 (en) * 2017-10-18 2019-04-18 Luminar Technologies, Inc. Controlling an autonomous vehicle using model predictive control
US10562538B2 (en) * 2017-11-22 2020-02-18 Uatc, Llc Object interaction prediction systems and methods for autonomous vehicles
US10627818B2 (en) * 2018-03-28 2020-04-21 Zoox, Inc. Temporal prediction model for semantic intent understanding
US11169531B2 (en) * 2018-10-04 2021-11-09 Zoox, Inc. Trajectory prediction on top-down scenes

Also Published As

Publication number Publication date
WO2021225822A1 (en) 2021-11-11
EP4146510A1 (en) 2023-03-15
EP4146510A4 (en) 2024-05-01
JP2023525054A (en) 2023-06-14

Similar Documents

Publication Publication Date Title
US11708093B2 (en) Trajectories with intent
US11554790B2 (en) Trajectory classification
US11970168B2 (en) Vehicle trajectory modification for following
EP3908493B1 (en) Occlusion prediction and trajectory evaluation
US11351991B2 (en) Prediction based on attributes
US11021148B2 (en) Pedestrian prediction based on attributes
CN112204634B (en) Drive envelope determination
JP7425052B2 (en) Responsive vehicle control
JP2022539245A (en) Top-down scene prediction based on action data
CN113632096A (en) Attribute-based pedestrian prediction
EP4077084A1 (en) Prediction on top-down scenes based on object motion
CN114600053A (en) Collision zone based travel route modification
CN112789481A (en) Trajectory prediction for top-down scenarios
CN112955358A (en) Trajectory generation
US20210325880A1 (en) Collaborative vehicle guidance
US11353877B2 (en) Blocked region guidance
US11584389B2 (en) Teleoperations for collaborative vehicle guidance
CN117813230A (en) Active prediction based on object trajectories
CN117980212A (en) Planning system based on optimization
CN115515835A (en) Trajectory classification
CN115443233A (en) Teleoperation for cooperative vehicle guidance
US11772643B1 (en) Object relevance determination
US11966230B1 (en) Disengagement prediction for vehicles
US20240208536A1 (en) Cascaded trajectory refinement
US20240211731A1 (en) Generating object representations using a variable autoencoder

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination