WO2024137504A1 - Determination of latent variable by diffusion model - Google Patents
Determination of latent variable by diffusion model
- Publication number
- WO2024137504A1 WO2024137504A1 PCT/US2023/084627 US2023084627W WO2024137504A1 WO 2024137504 A1 WO2024137504 A1 WO 2024137504A1 US 2023084627 W US2023084627 W US 2023084627W WO 2024137504 A1 WO2024137504 A1 WO 2024137504A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- vehicle
- environment
- model
- machine learned
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/094—Adversarial learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/776—Validation; Performance evaluation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
Definitions
- Machine learned models can be employed to predict an action for a variety of robotic devices.
- planning systems in autonomous and semi-autonomous vehicles determine actions for a vehicle to take in an operating environment.
- Actions for a vehicle may be determined based in part on avoiding objects present in the environment. For example, an action may be generated to yield to a pedestrian, to change a lane to avoid another vehicle in the road, or the like. Accurately predicting future object trajectories may be necessary to safely operate the vehicle in the vicinity of the object.
- FIG. 1 is an illustration of an autonomous vehicle in an example environment, in which an example machine learned model may process input data to predict an object trajectory or a scene.
- FIG. 2 illustrates an example block diagram of an example computer architecture for implementing techniques to generate example output data, as described herein.
- FIG. 3 illustrates another block diagram of an example computer architecture for implementing techniques to generate example output data, as described herein.
- FIG. 4 illustrates an example block diagram of an example generative adversarial network implemented by a computing device to generate an object trajectory of an environment.
- FIG. 5 depicts an example block diagram of an example training component implemented by a computing device to train an example codebook.
- FIG. 6 depicts an example block diagram of an example training component implemented by a computing device to train an example machine learned model.
- FIG. 7 depicts an example block diagram of an example training component implemented by a computing device to train one or more example machine learned models.
- FIG. 8 is a block diagram of an example system for implementing the techniques described herein.
- FIG. 9 is a flowchart depicting an example process for determining an object trajectory using one or more example models.
- FIG. 10 is a flowchart depicting an example process for training a codebook using an example training component.
- FIG. 11 illustrates an example block diagram of an example computer architecture for implementing techniques to generate example latent variable data, as described herein.
- FIG. 12 illustrates another example block diagram of an example computer architecture for implementing techniques to generate example output data, as described herein.
- FIG. 13 illustrates an example block diagram of an example diffusion model implemented by a computing device to generate latent variable data.
- FIG. 14 is a flowchart depicting an example process for determining discrete latent variable data using an example diffusion model.
- This application describes techniques for applying and/or training one or more models to predict motion of an object in an environment.
- one or more machine learned models may process data representations associated with an object and an environment, and determine a trajectory of the object at a future time.
- a first machine learned model can access tokens from a codebook and arrange the tokens in a sequence for processing by a second machine learned model that determines the object trajectory.
- the tokens in the codebook can represent potential actions of the object in the environment.
- the object trajectory determined by the second machine learned model may be considered during vehicle planning, thereby improving vehicle safety as a vehicle navigates in the environment by planning for the possibility that an object may intersect with the vehicle.
- a computing device may implement a prediction component comprising one or more machine learned models and a codebook to predict a future characteristic (e.g., a state, an action, etc.) for an object (e.g., a bicycle, a pedestrian, another vehicle, an animal, etc.) that may result in an impact to operation of an autonomous vehicle.
- a machine learned model may determine a possible trajectory (e.g., direction, speed, and/or acceleration) for an object to follow in an environment at a future time.
- a vehicle computing device of the autonomous vehicle may predict a candidate trajectory for the vehicle (using a same or different model) with consideration to an output (e.g., the trajectory) from the machine learned model, thereby improving vehicle safety by providing the autonomous vehicle with a candidate trajectory that is capable of safely avoiding the potential future positions of the object that may impact operation of the vehicle (e.g., intersect a trajectory of the autonomous vehicle, cause the autonomous vehicle to swerve or brake hard, etc.).
- a first machine learned model comprising one or more self-attention layers (e.g., a Transformer model) can receive tokens representing an object action, an object state, etc. from the codebook and arrange the tokens in a sequence to represent object behaviors (e.g., relative to the environment, relative to the autonomous vehicle, relative to another object, and so on).
- the first machine learned model can, for example, determine an output by using the one or more self-attention layers to arrange tokens in order or cluster the tokens.
- the first machine learned model can employ an autoregressive algorithm or other techniques to sample tokens from the codebook.
- the arrangement, set, or cluster of tokens output by the first machine learned model represents discrete latent variables, or interactions between tokens.
- the codebook can map, identify, or determine feature vectors that correspond to the discrete latent variables to represent the output of the first machine learned model as a set of feature vectors that represent continuous variables.
- the codebook can convert the sequence of tokens that represents discrete latent variables into a new sequence of tokens that represents feature vectors.
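As an illustration of this mapping, the following is a minimal sketch (with a hypothetical array-backed codebook; the names, sizes, and random values are assumptions, not the disclosed implementation) of converting a sequence of discrete latent tokens into corresponding continuous feature vectors:

```python
import numpy as np

# Hypothetical codebook: K discrete tokens, each mapped to a D-dimensional feature vector.
K, D = 128, 16
rng = np.random.default_rng(0)
codebook = rng.normal(size=(K, D)).astype(np.float32)  # learned embeddings in practice

def tokens_to_features(token_sequence: list[int]) -> np.ndarray:
    """Map a sequence of discrete latent tokens to continuous feature vectors."""
    return codebook[np.asarray(token_sequence)]

# Example: a token sequence produced by the first machine learned model.
token_sequence = [3, 41, 7, 7, 99]
features = tokens_to_features(token_sequence)
print(features.shape)  # (5, 16) -- one continuous feature vector per token
```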
- the feature vectors representing continuous variables can, for example, be fed into a second machine learned model (e.g., a decoder, a generator of a Generative Adversarial Network (GAN), a Graph Neural Network (GNN), a Recurrent Neural Network (RNN), another Transformer model, etc.) as input data to determine a potential action or behavior of one or more objects in an environment.
- the second machine learned model can output data representing one or more of an object trajectory, a heatmap showing a likelihood of occupancy by an object(s), object state data, or scene data usable in simulation, just to name a few.
- An output of the second machine learned model can be sent to the vehicle computing device for use in planning operations of a vehicle (e.g., to determine a candidate trajectory for the vehicle).
- the second machine learned model can generate a simulated environment that includes one or more object trajectories based at least in part on receiving a sequence of tokens representing feature vectors, though in some examples the second machine learned model can also or instead receive a sequence of tokens representing discrete latent variables.
- the second machine learned model can determine a response by the vehicle to the object trajectory in the simulated environment and control the vehicle in a real-world environment based at least in part on the response.
- a computing device can implement a training component to train the first machine learned model, the second machine learned model, the codebook, or other machine learned model described herein.
- the first machine learned model (e.g., a Transformer model) can be trained based at least in part on a set of conditions representing a previous action, a previous position, or a previous acceleration of the object.
- the first machine learned model can be trained, or conditioned, to output a token sequence, a set of tokens, or a cluster of tokens based at least in part on specific characteristics of the object, the environment, or a vehicle having a vehicle computing device to implement these techniques.
- the second machine learned model can be trained to improve predictions that represent potential actions by objects in an environment of a vehicle based at least in part on the output by the first machine learned model (e.g., the token sequence, the set of tokens, or the cluster of tokens).
- the codebook can represent a statistical model and/or a machine learned model that is trainable to determine and store data associated with individual tokens, a sequence of tokens, feature vectors, or a combination thereof.
- training the codebook can include determining a certain number of types of tokens available in the codebook usable to represent the vehicle, the object(s), and/or the environment and limiting the number of tokens for storage to the determined number.
- the number of tokens in the codebook can affect sampling of the tokens by another entity such as the first machine learned model (e.g., a first token in the codebook can represent a state of an autonomous vehicle and a subsequent token can represent a state of an object so that the first token is considered differently when sampled for inclusion in a sequence).
- the training component can determine the tokens to include in the codebook based at least in part on receiving feature vectors from a machine learned model (e.g., a third machine learned model).
- a computing device can receive sensor data, log data, map data, and so on, as input and determine feature vectors representing an object, a vehicle, and/or an environment.
- a GNN can determine the feature vectors based at least in part on input data representing an object trajectory, an object state, a simulated scene, a real-world scene, etc.
- the codebook can receive the feature vectors from the machine learned model and map the feature vectors to corresponding tokens by comparing descriptions (e.g., metadata indicating an identifier, an index number, etc.) associated with each feature vector and token.
- the feature vectors may be generated to represent a current state of the object (e.g., a heading, a speed, etc.) and/or a behavior of the object over time (e.g., a change in yaw, speed, or acceleration of the object).
- the machine learned model determines additional feature vectors to represent other objects and/or features of the environment.
- Feature vectors output from a machine learned model can be input into a quantizer to generate discretized feature vectors for sending to the codebook (or other model or component of the computing device).
- the codebook can receive the discretized feature vectors output by the quantizer and determine which token(s) of available tokens in the codebook are similar (e.g., information or characteristics associated with a feature vector (or object associated therewith) is similar to information or characteristics associated with a token (or object associated therewith), etc.).
- Quantization of the feature vectors enables the codebook to receive discrete feature vectors rather than continuous feature vectors which can enable the codebook to make better mappings between discrete feature vectors and tokens representing discrete latent variables.
- the prediction component can implement the quantizer to train the codebook by providing discrete feature vectors as input to a training component that determines a token to correspond to the feature vectors and/or the discrete feature vectors.
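For illustration, a minimal nearest-neighbor quantization sketch (loosely in the style of a VQ-VAE quantizer; the dimensions and the `quantize` helper are hypothetical) showing how continuous feature vectors could be snapped to discrete codebook entries and token indices:

```python
import numpy as np

rng = np.random.default_rng(1)
K, D = 128, 16
codebook = rng.normal(size=(K, D)).astype(np.float32)  # token embeddings

def quantize(feature_vectors: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Snap continuous feature vectors to their nearest codebook entries.

    Returns the discrete token indices and the discretized feature vectors.
    """
    # Squared Euclidean distance from every feature vector to every codebook entry.
    dists = ((feature_vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    token_indices = dists.argmin(axis=1)   # discrete latent variables
    discretized = codebook[token_indices]  # continuous values snapped to the codebook
    return token_indices, discretized

features = rng.normal(size=(4, D)).astype(np.float32)  # e.g., output of a GNN encoder
tokens, discretized = quantize(features)
print(tokens)             # e.g., [17 92  5 63]
print(discretized.shape)  # (4, 16)
```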
- a model of the prediction component may define an algorithm for use in predicting a trajectory associated with an object.
- the model to predict object trajectories may include one or more machine learned algorithm(s) to identify, detect, and/or classify attributes of a detected object.
- the model may include one or more classifiers.
- a classifier can be representative of a decision tree, a Neural Network, a Naive Bayes classifier, and/or other algorithm(s) that perform classification.
- a classifier may represent a function that assigns a class or label to an object, targets an object in an environment, and/or categorizes an object.
- a classifier can be used with a machine learning technique to train a model to perform a function (e.g., determine trajectories of an object, etc.).
- a machine learned model may receive a vector representation of data compiled into an image format representing a top-down view of an environment.
- the top-down view may be determined based at least in part on map data and/or sensor data captured from or associated with a sensor of an autonomous vehicle in the environment.
- the vector representation of the top-down view can represent one or more of: an attribute (e.g., position, class, velocity, acceleration, yaw, turn signal status, etc.) of an object, history of the object (e.g., location history, velocity history, etc.), an attribute of the vehicle (e.g., velocity, position, etc.), crosswalk permission, traffic light permission, and the like.
- the data can be represented in a top-down view of the environment to capture context of the autonomous vehicle (e.g., identify actions of other vehicles and pedestrians relative to the vehicle).
- a machine learned model may receive a vector representation of data associated with one or more objects in the environment. For instance, the machine learned model can receive (or in some examples determine) one or more vectors representing one or more of: position data, orientation data, heading data, velocity data, speed data, acceleration data, yaw rate data, or turning rate data associated with the object.
- the vehicle computing device may be configured to determine actions to take while operating (e.g., trajectories to use to control the vehicle) based on predicted object trajectories, heatmap data, scene data, etc. determined by one or more models.
- the actions may include a reference action (e.g., one of a group of maneuvers the vehicle is configured to perform in reaction to a dynamic operating environment) such as a right lane change, a left lane change, staying in a lane, going around an obstacle (e.g., double-parked vehicle, a group of pedestrians, etc.), or the like.
- the actions may additionally include sub-actions, such as speed variations (e.g., maintain velocity, accelerate, decelerate, etc.), positional variations (e.g., changing a position in a lane), or the like.
- an action may include staying in a lane (action) and adjusting a position of the vehicle in the lane from a centered position to operating on a left side of the lane (sub-action).
- the vehicle computing system may implement different model(s) and/or component(s) to simulate future states (e.g., estimated states) by projecting an autonomous vehicle and relevant object(s) forward in the environment for the period of time (e.g., 5 seconds, 8 seconds, 12 seconds, etc.).
- the model(s) may project the object(s) (e.g., estimate future positions of the object(s)) forward based on a predicted trajectory associated therewith.
- the model(s) may predict a trajectory of a vehicle and predict attributes about the vehicle including whether the trajectory will be used by the vehicle to arrive at a predicted location in the future.
- the vehicle computing device may project the vehicle (e.g., estimate future positions of the vehicle) forward based on the vehicle trajectories output by the model.
- the estimated state(s) may represent an estimated position (e.g., estimated location) of the autonomous vehicle and an estimated position of the relevant object(s) at a time in the future.
- the vehicle computing device may determine relative data between the autonomous vehicle and the object(s) in the estimated state(s). In such examples, the relative data may include distances, locations, speeds, directions of travel, and/or other factors between the autonomous vehicle and the object.
- the vehicle computing device may determine estimated states at a pre-determined rate (e.g., 10 Hertz, 20 Hertz, 50 Hertz, etc.). In at least one example, the estimated states may be determined at a rate of 10 Hertz (e.g., 80 estimated states over an 8 second period of time).
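A small worked example of the rate arithmetic and forward projection described above (the constant-velocity projection and the numeric values are simplifying assumptions, not the disclosed model):

```python
# Illustrative only: project a vehicle and an object forward at a fixed rate,
# assuming (for simplicity) constant velocity along each predicted trajectory.
rate_hz = 10.0
horizon_s = 8.0
num_states = int(rate_hz * horizon_s)  # 80 estimated states
dt = 1.0 / rate_hz

vehicle_pos, vehicle_vel = (0.0, 0.0), (5.0, 0.0)   # meters, meters/second (hypothetical)
object_pos, object_vel = (30.0, 2.0), (-4.0, 0.0)

estimated_states = []
for step in range(1, num_states + 1):
    t = step * dt
    v = (vehicle_pos[0] + vehicle_vel[0] * t, vehicle_pos[1] + vehicle_vel[1] * t)
    o = (object_pos[0] + object_vel[0] * t, object_pos[1] + object_vel[1] * t)
    # Relative data between the vehicle and the object at this estimated state.
    distance = ((v[0] - o[0]) ** 2 + (v[1] - o[1]) ** 2) ** 0.5
    estimated_states.append({"t": t, "vehicle": v, "object": o, "distance": distance})

print(len(estimated_states), estimated_states[-1]["distance"])
```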
- the vehicle computing system may store sensor data associated with an actual location of an object at the end of the set of estimated states (e.g., end of the period of time) and use this data (stored sensor data or perception data derived therefrom) as training data to train one or more models.
- Such training data may be determined based on manual annotation and/or by determining a change associated with semantic information of the position of the object.
- the vehicle computing device may provide the data to a remote computing device (i.e., computing device separate from vehicle computing device) for data analysis.
- the remote computing device may analyze the sensor data to determine one or more labels for images, an actual location, yaw, speed, acceleration, direction of travel, or the like of the object at the end of the set of estimated states.
- ground truth data associated with one or more of: positions, trajectories, accelerations, directions, and so on may be determined (either hand labelled or determined by another machine learned model) and such ground truth data may be used to determine a trajectory of an object.
- corresponding data may be input into the model to determine an output (e.g., a trajectory, scene data, and so on) and a difference between the determined output and the actual action by the object (or actual scene data) may be used to train the model.
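For illustration, a minimal sketch of the training signal described above, computing a difference (here, mean squared error; the loss choice is an assumption) between a predicted trajectory and the logged, actual trajectory:

```python
import numpy as np

def trajectory_loss(predicted: np.ndarray, ground_truth: np.ndarray) -> float:
    """Mean squared error between a predicted trajectory and the actual (logged) trajectory.

    Both arrays are (num_timesteps, 2) sequences of x/y positions.
    """
    return float(((predicted - ground_truth) ** 2).mean())

predicted = np.array([[0.0, 0.0], [1.0, 0.1], [2.1, 0.3]])
actual = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.2]])
loss = trajectory_loss(predicted, actual)  # this difference drives parameter updates during training
print(loss)
```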
- the machine learned model may be configured to determine an initial position of each object in an environment (e.g., a physical area in which a vehicle operates and/or a simulated environment) indicated by the sensor data.
- Each determined trajectory may represent a potential direction, speed, and acceleration that the object may travel through the environment.
- the object trajectories predicted by the models described herein may be based on passive prediction (e.g., independent of an action the vehicle and/or another object takes in the environment, substantially no reaction to the action of the vehicle and/or other objects, etc.), active prediction (e.g., based on a reaction to an action of the vehicle and/or another object in the environment), or a combination thereof.
- models may be representative of machine learned models, statistical models, heuristic models, or a combination thereof. That is, a model may refer to a machine learning model that learns from a training data set to improve accuracy of an output (e.g., a prediction). Additionally or alternatively, a model may refer to a statistical model that is representative of logic and/or mathematical functions that generate approximations which are usable to make predictions.
- the techniques discussed herein may improve a functioning of a vehicle computing system in a number of ways.
- the vehicle computing system may determine an action for the autonomous vehicle to take based on a determined trajectory of the object represented by data.
- a model may output object trajectories and associated probabilities that improve safe operation of the vehicle by accurately characterizing motion of the object with greater detail as compared to previous models.
- representing the environment and the object(s) as tokens provides a simplified representation of the environment for the purposes of generating prediction probability(ies).
- the tokens can represent the environment without extracting particular features of the environment, which may simplify the generation of the prediction system and subsequent generation of at least one predicted trajectory (e.g., reduce an amount of memory and/or processing resources).
- evaluating an output by a model(s) may allow an autonomous vehicle to generate more accurate and/or safer trajectories for the autonomous vehicle to traverse an environment.
- predictions based on token sequences may account for object to object dependencies, yielding safer decision-making of the system. These and other improvements to the functioning of the computing device are discussed herein.
- the object trajectory determination techniques discussed herein may reduce training time by training in parallel and/or improve accuracy by reducing an amount of data to be stored. Further, such techniques provide for training networks based on larger datasets than would otherwise be possible due to, for example, limitations of memory, processing power, etc. (thereby creating more robust learned networks in shorter amounts of time).
- the methods, apparatuses, and systems described herein can be implemented in a number of ways. Example implementations are provided below with reference to the following figures. Although discussed in the context of an autonomous vehicle in some examples below, the methods, apparatuses, and systems described herein can be applied to a variety of systems. In one example, machine learned models may be utilized in driver-controlled vehicles in which such a system may provide an indication of whether it is safe to perform various maneuvers. In another example, the methods, apparatuses, and systems can be utilized in an aviation or nautical context. Additionally, or alternatively, the techniques described herein can be used with real data (e.g., captured using sensor(s)), simulated data (e.g., generated by a simulator), or any combination thereof.
- FIG. 1 illustrates an autonomous vehicle (vehicle 102) in an example environment 100, in which an example machine learned model (prediction component 104) may process input data to predict an object trajectory or a scene.
- the vehicle 102 includes the prediction component 104 that represents one or more machine learned models and a codebook for processing various types of input data (e.g., feature vectors, sensor data, map data, etc.) associated with the one or more objects in the environment 100, and determines output data 106 representing potential object trajectories and/or scene data usable for simulation.
- the prediction techniques described herein may be implemented at least partially by or in association with a vehicle computing device (e.g., vehicle computing device 804) and/or a remote computing device (e.g., computing device(s) 834).
- a vehicle computing device associated with the vehicle 102 may be configured to detect one or more objects (e.g., objects 108 and 110) in the environment 100, such as via a perception component.
- the vehicle computing device may detect the objects, based on sensor data received from one or more sensors.
- the sensors may include sensors mounted on the vehicle 102, and include, without limitation, ultrasonic sensors, radar sensors, light detection and ranging (lidar) sensors, cameras, microphones, inertial sensors (e.g., inertial measurement units, accelerometers, gyros, etc.), global positioning satellite (GPS) sensors, and the like.
- the vehicle 102 may be configured to transmit and/or receive data from other autonomous vehicles and/or the sensors.
- the data may include sensor data, such as data regarding the objects detected in the environment 100.
- the vehicle computing device can receive the sensor data and can semantically classify the detected objects (e.g., determine an object type), such as, for example, whether the object is a pedestrian, such as object 108, a vehicle such as object 110, a building, a truck, a motorcycle, a moped, or the like.
- the objects may include static objects (e.g., buildings, bridges, signs, etc.) and dynamic objects such as other vehicles, pedestrians, bicyclists, or the like.
- a classification may include another vehicle (e.g., a car, a pick-up truck, a semi-trailer truck, a tractor, a bus, a train, etc.), a pedestrian, a child, a bicyclist, a skateboarder, an equestrian, an animal, or the like.
- the classification of the object may be used by a model to determine object characteristics (e.g., maximum speed, acceleration, maneuverability, etc.). In this way, potential trajectories by an object may be considered based on characteristics of the object (e.g., how the object may potentially move in the environment).
- the example environment 100 includes a crosswalk 112.
- the prediction component 104 provides functionality to determine an object trajectory 114 associated with the pedestrian 108, and determine an object trajectory 116 associated with the vehicle 110.
- the prediction component 104 can also or instead predict scene data that describes a simulated environment.
- the prediction component 104 can output one or more scenes usable in a simulation (also referred to as a scenario or estimated states) to determine a response by the vehicle 102 to a simulated object.
- the prediction component 104 can generate the output data 106 to represent one or more heat maps.
- the one or more predicted trajectories may be determined or represented using a probabilistic heat map to predict object behavior, such as that described in U.S. Patent Application Number 15/807,521, filed November 8, 2017, entitled "Probabilistic Heat Maps for Behavior Prediction,” which is incorporated herein by reference in its entirety and for all purposes.
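As a rough illustration of the heat-map idea (not the specific method of the referenced application), sampled future positions of an object can be rasterized into a grid of occupancy likelihoods; the grid size, resolution, and sampling distribution below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setup: 50 sampled future positions of an object (x, y) in meters,
# rasterized onto a coarse grid centered on the vehicle to approximate a
# likelihood-of-occupancy heat map.
samples = rng.normal(loc=(12.0, 1.0), scale=(2.0, 0.8), size=(50, 2))

GRID_SIZE, RESOLUTION = 40, 1.0  # 40 x 40 cells, 1 m per cell, origin at grid center
heatmap = np.zeros((GRID_SIZE, GRID_SIZE))
for x, y in samples:
    col = int(x / RESOLUTION) + GRID_SIZE // 2
    row = int(y / RESOLUTION) + GRID_SIZE // 2
    if 0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE:
        heatmap[row, col] += 1.0

heatmap /= heatmap.sum()  # normalize counts into occupancy probabilities
print(heatmap.max(), heatmap.sum())
```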
- the prediction component 104 may be configured to receive and/or determine vector representations of one or more of: data associated with the environment 100 (e.g., top-down view data), object state(s), and vehicle state(s).
- the prediction component 104 can include a machine learned model (e.g., a Graph Neural Network) to generate one or more vectors to represent features of the environment (e.g., a roadway, a crosswalk, a building, etc.), a current state of an object (e.g., the pedestrian 108 and/or the vehicle 110), and/or a current state of the vehicle 102.
- the feature vector(s) can represent a rasterized image based on top-down view data. Additional details about inputs to the prediction component 104 are provided throughout this disclosure.
- the prediction component 104 can represent one or more machine learned models that are configured to exchange data with the codebook that stores tokens that correspond to the feature vectors.
- the feature vectors can represent continuous variables while the tokens can represent discrete latent variables.
- the codebook can receive feature vectors generated by a machine learned model (e.g., a first machine learned model) and map the feature vectors to a corresponding token.
- the tokens can represent a behavior or feature of an object, the vehicle 102, or the environment 100.
- a token can, for instance, represent how the object can move in the environment 100 at a future time.
- Another machine learned model (e.g., a second machine learned model) can access tokens from the codebook and arrange them in a set, cluster, or sequence that represents how object(s) and the vehicle 102 can potentially interact.
- Another machine learned model (e.g., a third machine learned model such as a Generative Adversarial Network) can receive the set, cluster, or sequence of tokens (or the corresponding feature vectors) and generate the output data 106.
- the prediction component 104 can include a quantizer to receive feature vectors as input and output discretized feature vectors. For instance, feature vectors output from a machine learned model (e.g., the GNN or other model) can be input into a quantizer to generate discretized feature vectors for sending to the codebook (or other model or component of a computing device associated with the vehicle 102).
- the mapping of the tokens and feature vectors can include mapping a token to discrete feature vectors output by the quantizer. Additional details of predicting object behavior using a GNN are described in U.S. Patent Application Serial No.
- the output data 106 from the prediction component 104 can be used by a vehicle computing device in a variety of ways. For instance, information about the object trajectories and/or the scene data can be used by a planning component of the vehicle computing device to control the vehicle 102 in the environment 100 (e.g., determine a vehicle traj ectory 118 and/or control a propulsion system, a braking system, or a steering system).
- the output data 106 may also or instead be used to perform a simulation by setting up conditions (e.g., an intersection, a number of objects, a likelihood for the object to exhibit abnormal behavior, etc.) for use during the simulation.
- a training component of a remote computing device such as the computing device(s) 834 (not shown) and/or the vehicle computing device 804 (not shown) may be implemented to train the prediction component 104.
- Training data may include a wide variety of data, such as image data, video data, lidar data, radar data, audio data, other sensor data, etc., that is associated with a value (e.g., a desired classification, inference, prediction, etc.).
- training data can comprise determinations based on sensor data, such as bounding boxes (e.g., two-dimensional and/or three-dimensional bounding boxes associated with an object), segmentation information, classification information, an object trajectory, and the like.
- Such training data may generally be referred to as a "ground truth.”
- the training data may be used for image classification and, as such, may include an image of an environment that is captured by an autonomous vehicle and that is associated with one or more classifications.
- a classification may be based on user input (e.g., user input indicating that the image depicts a specific type of object) or may be based on the output of another machine learned model.
- labeled classifications (or more generally, the labeled output associated with training data) may be referred to as ground truth.
- FIG. 2 illustrates an example block diagram 200 of an example architecture for implementing techniques to generate example output data as described herein.
- the example 200 includes a computing device (e.g., the vehicle computing device(s) 804 and/or the computing device(s) 834) that includes the prediction component in FIG. 1.
- the techniques described in relation to FIG. 2 can be performed as the vehicle 102 navigates in the environment 100 (e.g., a real-world environment or a simulated environment).
- a codebook 202 can store tokens that represent object behavior, vehicle behavior, and/or environment features.
- For example, a first token can represent an object state, a second token can represent a vehicle state, and a third token can represent environment features (e.g., traffic signals, a crosswalk, weather, etc.).
- the object state can indicate a position, orientation, velocity, acceleration, yaw, etc. of an object.
- a token may also or instead indicate an action by the object and/or the vehicle (e.g., go straight, turn right, turn left, etc.).
- the codebook 202 can be configured to store a certain number of tokens.
- the tokens of the codebook 202 can represent discrete latent variables that enable a Transformer model 204 to sample tokens that represent potential interactions between a first object relative to a second object without relying on continuous distribution techniques (e.g., a Gaussian distribution) thereby saving computational resources.
- a first token in the codebook 202 can represent a characteristic (e.g., a state or an action) such as one of: a yield action, a drive straight action, a left turn action, a right turn action, a brake action, an acceleration action, a steering action, or a lane change action, and a second token can represent a position of the object.
- An additional token can represent an action or state associated with the vehicle 102.
- the codebook 202 can exchange data with a Transformer model 204 that is configured to output a token sequence 206.
- the Transformer model 204 can sample tokens from the codebook 202 using an autoregressive technique and arrange the tokens in a sequence or set that represents potential interactions between objects and the vehicle.
- the Transformer model 204 can include one or more self-attention layers that arrange a subsequent token with consideration to a first token.
- a machine learned model with one or more self-attention layers can be configured to determine scores for at least some of the tokens from the codebook such as a first score indicating a dependency between a first token and a second token and a second score indicating a dependency between a third token and one of the first token, the second token, or a fourth token.
- determining the sequence or set of tokens is based at least in part on the scores associated with various tokens.
- the Transformer model 204 can sample tokens from the codebook 202 using a combination of the autoregressive technique for some tokens and another technique different from the autoregressive technique for some other tokens. For example, the Transformer model 204 can determine the token sequence 206 by determining two or more tokens in the token sequence 206 using an autoregressive algorithm, and determining another token in the token sequence 206 randomly or without consideration to a previously determined token. In various examples, the Transformer model 204 can be trained using training data to condition the Transformer model 204 to determine the token sequence 206 with consideration to historical object state data, scene data, environmental data, and so on.
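For illustration, a minimal autoregressive sampling sketch (the `next_token_logits` stand-in, codebook size, and sequence length are hypothetical) in which most tokens are sampled conditioned on the prefix while selected positions are drawn without regard to earlier tokens:

```python
import numpy as np

rng = np.random.default_rng(2)
K = 128       # number of tokens in the codebook
SEQ_LEN = 8   # hyperparameter: tokens per sequence

def next_token_logits(prefix: list[int]) -> np.ndarray:
    """Stand-in for a Transformer forward pass: returns unnormalized scores over the codebook.

    A real model would attend over the prefix with self-attention layers.
    """
    return rng.normal(size=K)

def sample_token_sequence(random_positions=frozenset()) -> list[int]:
    sequence: list[int] = []
    for position in range(SEQ_LEN):
        if position in random_positions:
            token = int(rng.integers(K))          # drawn without regard to earlier tokens
        else:
            logits = next_token_logits(sequence)  # conditioned on the tokens sampled so far
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            token = int(rng.choice(K, p=probs))   # autoregressive sample
        sequence.append(token)
    return sequence

print(sample_token_sequence(random_positions={3}))
```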
- the Transformer model can be trained based at least in part on a set of conditions, at least one condition of the set of conditions comprising a previous action, a previous position, or a previous acceleration of the object.
- the Transformer model 204 can output the token sequence 206 having tokens that represent potential interactions between an object and the vehicle. Additional detail of training the codebook 202 is discussed in FIG. 5 and elsewhere.
- Determining the token sequence 206 can be based at least in part on the fixed number of tokens in the codebook. For example, a first token can represent an action by the vehicle, and subsequent tokens can represent an action or state of an object, and yet another token can represent an action or state of an additional object. Thus, the token sequence 206 can represent potential interactions between two or more objects relative to the vehicle.
- the token sequence 206 can also or instead be based at least in part on a number of tokens in the codebook 202. For example, a greater number of tokens can result in more detailed interactions, but can take more processing resources to generate than using fewer tokens.
- the codebook 202 can be trained to include a particular number of tokens. Further, training the codebook 202 can include assigning a token to represent a previous state of an object. Additional detail of training the codebook 202 is discussed in FIG. 5 and elsewhere.
- the token sequence 206 can be input into a machine learned model 208 (e.g., a CNN, a GNN, etc.) configured to generate output data 210.
- the output data 210 can include one or more: an object trajectory, a heatmap, or scene data.
- the output data 210 can include a scene 212 which can further include the object trajectory 116.
- the machine learned model 208 can, for example, generate output data 210 for different times in the future. For instance, at a given time, the machine learned model 208 can generate the output data 210 for different times in the future (e.g., every 0.1 seconds for four seconds, or some other time period or frequency). In various examples, the machine learned model 208 can iteratively determine the output data 210 for each future time based at least in part on the output data 210 associated with a previous time. In other words, the machine learned model 208 can predict object trajectories, scenes, heat maps, or the like for different times in the future with later times considering potential actions by an object at a previous time.
- vectorized data can be determined by a graph neural network which is a type of neural network which operates on a graph structure.
- the graph neural network may be partially connected or fully connected with separate edge features associated with distinct pairs of nodes in the graph neural network.
- Machine-learning based inference operations may be performed to update the state of the graph neural network, including updating nodes and/or edge features, based on internal inputs determined from the graph neural network itself and/or based on updated observations perceived by the autonomous vehicle in the environment.
- Updates to the graph neural network may represent predicted future states of the environment, and the autonomous vehicle may decode portions of the graph neural network to determine predictions for entity positions, velocities, trajectories, and/or other updated predicted states for the entities in the environment. Additional details of graph neural networks are described in U.S. Patent Application Serial No. 17/187,170, filed on February 26, 2021, entitled "Graph Neural Network With Vectorized Object Representations in Autonomous Vehicle Systems," which is incorporated herein by reference in its entirety and for all purposes.
- the computing device can generate feature vectors based at least in part on state data associated with a vehicle and/or object(s).
- the state data can include data describing an object (e.g., the pedestrian 108, the vehicle 110 in FIG. 1) and/or a vehicle (e.g., vehicle 102) in an environment, such as in example environment 100.
- the state data can include, in various examples, one or more of position data, orientation data, heading data, velocity data, speed data, acceleration data, yaw rate data, or turning rate data associated with the object and/or the vehicle.
- sensor data or processed sensor data may be input into a machine learned model (e.g., a convolutional neural network (CNN), a Recurrent Neural Network (RNN), a graph neural network (GNN), etc.), which can determine a feature vector for processing by a machine learned model.
- functionality associated with the Transformer model 204, the codebook 202, and/or the machine learned model 208 can be included in the prediction component 104 to process, for example, feature vectors and determine the output data 210 indicative of the object trajectory 116 for the object 110.
- FIG. 3 illustrates an example block diagram 300 of an example computer architecture for implementing techniques to generate example output data as described herein.
- the example 300 includes a computing device (e.g., the vehicle computing device(s) 804 and/or the computing device(s) 834) that includes the prediction component in FIG. 1.
- the techniques described in relation to FIG. 3 can be performed as the vehicle 102 navigates in the environment 100 (e.g., a real-world environment or a simulated environment).
- input data 302 representing object trajectories associated with one or more objects, object state data, and scene data can be input into an encoder 304.
- the encoder 304 can represent a machine learned model such as a GNN, RNN, CNN, and the like, and output one or more feature vectors 306 which can be sent to a codebook 308 and a quantizer 310.
- the quantizer 310 can receive the feature vectors 306 output by the encoder 304, and discretize the feature vectors 306 to output the discretized feature vectors 312.
- the codebook 308 can receive the discretized feature vectors 312 while in other examples the codebook 308 can receive the feature vectors 306.
- a machine learned model 314 (e.g., a Transformer model) can exchange data with the codebook 308 to determine a token sequence 316.
- the token sequence 316 can represent tokens arranged or clustered in a particular order.
- the machine learned model 314 can include one or more self-attention layers to cause the tokens to be arranged with attention to features of another token.
- the codebook 308 can receive the discretized feature vectors 312 and determine the token sequence 316 by mapping index values of the discretized feature vectors 312 to corresponding index values of tokens.
- an arrow is shown from the codebook 308 to the token sequence 316 to indicate examples when the quantizer 310 "uses" the codebook 308 (or otherwise receives data from the codebook 308) to convert the discretized feature vectors 312 to tokens before being input into a decoder 318 (e.g., such as when the encoder 304, the quantizer 310, and the decoder 318 are being trained).
- the quantizer 310 is not used, such as when the machine learned model 314 exchanges data with the codebook 308 to determine the token sequence 316.
- FIG. 3 represents different possible implementations which do not require all of the illustrated features.
- the machine learned model 314 can iteratively assign a token to a first object, followed by assigning a token to a second object, and so on for different objects in the environment.
- the machine learned model 314 can predict a probability that each token (or characteristics describing the token, or an object of the token) in the codebook corresponds to characteristics of the first object, and assign, or associate, the token having the highest probability as a first token of a token sequence.
- the machine learned model 314 can predict new probabilities for the tokens in the codebook (including the token assigned as the first token), and determine the second token after the first token based at least in part on the new probabilities.
- a same token of the codebook can be assigned in another position of the sequence.
- the token sequence 316 can be determined to include a number of tokens based at least in part on a hyperparameter indicating a length, a size, or a total number of tokens to include in the token sequence.
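A minimal sketch of this iterative, per-object assignment (the `token_probabilities` stand-in and the feature dimensions are hypothetical): each object receives the most probable token given the partial sequence, a hyperparameter bounds the sequence length, and the same codebook token may appear more than once:

```python
import numpy as np

rng = np.random.default_rng(4)
K = 128              # tokens available in the codebook
SEQUENCE_LENGTH = 4  # hyperparameter: total tokens in the sequence

def token_probabilities(sequence_so_far: list[int], object_features: np.ndarray) -> np.ndarray:
    """Stand-in for the model's per-token probabilities given the partial sequence."""
    logits = rng.normal(size=K)
    return np.exp(logits) / np.exp(logits).sum()

objects = [rng.normal(size=8) for _ in range(SEQUENCE_LENGTH)]  # hypothetical object features

token_sequence: list[int] = []
for features in objects:
    probs = token_probabilities(token_sequence, features)
    token_sequence.append(int(probs.argmax()))  # assign the most probable token; repeats allowed

print(token_sequence)
```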
- the order of the tokens in the token sequence 316 is determined based at least in part on a heuristic, or hyperparameter.
- a heuristic can assign an order to a set of objects, and the machine learned model 314 can iteratively assign a token to each object in the set of objects.
- a first token (e.g., representing a state or an action) can be assigned to a first object, and a second token can be assigned to the first object or a second object.
- the second token can be assigned based at least in part on characteristics of the first object or first token.
- a decoder 318 can receive the token sequence 316 and determine the output data 320.
- the decoder 318 can represent a machine learned model such as a GNN, a GAN, an RNN. another Transformer model, etc.
- the output data 320 can, for example, be similar to the output data 106 or the output data 210 and represent an object trajectory, scene data, simulation data, and so on.
- input data for the decoder 318 can be received from the codebook 308 and/or the machine learned model 314.
- the token sequence 316 can be sent from the machine learned model 314 to the decoder 318 and/or feature vectors associated with the token sequence 316 can be sent from the codebook 308 to the decoder 318.
- the feature vectors 306 can represent a vector representation 322 of an environment (e.g., the environment 100).
- the computing device may receive sensor data associated with an autonomous vehicle 324 (e.g., the vehicle 102) and an object 326 (e.g., the object 110).
- the vector representation 322 can be determined by a graph neural network which is a type of neural network which operates on a graph structure.
- Machine-learning based inference operations may be performed to update the state of the graph neural network, including updating nodes and/or edge features, based on internal inputs determined from the graph neural network itself and/or based on updated observations perceived by the autonomous vehicle in the environment.
- Updates to the graph neural network may represent predicted future states of the environment, and the autonomous vehicle may decode portions of the graph neural network to determine predictions for object positions, velocities, trajectories, and/or other updated predicted states for the objects in the environment.
- the vector representation 322 may, in some examples, be determined based on a polyline (e.g., a set of line segments) representing one or more map elements.
- the graph neural network can encode and aggregate the polyline into a node data structure representing the map element(s).
- an object or feature of the environment can be represented by polylines (e.g., a lane can be segmented into a number of smaller line segments whose length, location, orientation angle (e.g., yaw), and directionality, when aggregated, define the lane).
- a crosswalk may be defined by four connected line segments, and a roadway edge or roadway centerline may be multiple connected line segments.
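For illustration, a minimal sketch of the polyline idea (the aggregation by averaging is an assumption; a graph neural network would learn the encoding) in which a crosswalk is represented by four connected line segments and aggregated into a single node feature:

```python
import numpy as np

# A crosswalk represented as four connected line segments (a closed polyline).
# Each segment is encoded by its start point, end point, and yaw; the segment
# encodings are then aggregated (here, averaged) into a single node feature,
# loosely mirroring how a graph neural network might encode a map element.
corners = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0]])

segments = []
for i in range(len(corners)):
    start = corners[i]
    end = corners[(i + 1) % len(corners)]
    yaw = np.arctan2(end[1] - start[1], end[0] - start[0])
    segments.append(np.concatenate([start, end, [yaw]]))

segments = np.stack(segments)         # (4, 5): one row per line segment
node_feature = segments.mean(axis=0)  # aggregated representation of the crosswalk
print(segments.shape, node_feature)
```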
- the environment may be represented by the vector representation 322 comprising vectors to represent objects and/or features of the environment including one or more of: an attribute (e.g., position, velocity, acceleration, yaw, etc.) of the object 326, history of the object 326 (e.g., location history, velocity history, etc.), an attribute of the autonomous vehicle 324 (e.g., velocity, position, etc.), and/or features of the environment (e.g., roadway boundary, roadway centerline, crosswalk permission, traffic light permission, and the like).
- the vector representation 322 can comprise vectors to represent features of the environment including roadway boundary vectors 328 and roadway centerline vectors 330, among others.
- the computing device can implement the encoder 304 (or other machine learned model) to generate the vector representation 322 based at least in part on state data associated with the autonomous vehicle 324 and/or the object 326.
- the state data can include data describing an object (e.g., the pedestrian 108, the vehicle 110 in FIG. 1) and/or a vehicle (e.g., vehicle 102) in an environment, such as in example environment 100.
- the state data can include, in various examples, one or more of position data, orientation data, heading data, velocity data, speed data, acceleration data, yaw rate data, or turning rate data associated with the object and/or the vehicle.
- vectors associated with an environment, a vehicle state, and/or an object state may be combined as the vector representation 322.
- the vector representation 322 may be input into a machine learned model (e.g., the decoder 318) which can determine the output data 320 indicative of predicted trajectories for the object(s) in the environment.
- FIG. 4 illustrates an example block diagram 400 of an example generative adversarial network (GAN) 402 implemented by a computing device to generate an object trajectory in an environment.
- the techniques described in the example 400 may be performed by a computing device such as the vehicle computing device(s) 804 and/or the computing device(s) 834.
- the GAN 402 can include at least the functionality described in relation to the machine learned model 208 of FIG. 2 and/or the decoder 318 of FIG. 3.
- the GAN 402 comprises a generator component 404 that is configured to receive an input 406 of one or more feature vectors and/or one or more token sequences usable to determine an output 408 representing one or more object trajectories and/or simulation data.
- the output 408 of the generator component 404 can be input into a discriminator component 410 that is configured to also receive ground truth 412 (e.g., a real world scene and/or a real-world object trajectory), and determine an output 414 indicative of a classification of the output 408 from the generator component 404 (e.g., determine whether the object trajectory or scene is real or fake).
- the output 414 can be used to train the discriminator component 416 and/or train the generator component 418.
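- A minimal, hypothetical sketch of one such adversarial update is shown below, assuming a PyTorch-style implementation; the layer sizes, learning rates, and tensors are placeholders and do not reflect the actual models described in this disclosure. The discriminator is trained to separate real from generated trajectories, and the generator is trained from the discriminator's classification output:

```python
# Hypothetical sketch of one GAN update: a generator maps token/feature
# inputs to a trajectory, a discriminator scores real vs. generated samples,
# and both are trained from the discriminator's classification output.
import torch
import torch.nn as nn

feat_dim, traj_dim = 32, 20          # illustrative sizes
generator = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, traj_dim))
discriminator = nn.Sequential(nn.Linear(traj_dim, 64), nn.ReLU(), nn.Linear(64, 1))
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

features = torch.randn(8, feat_dim)       # stand-in for encoded token sequences
real_traj = torch.randn(8, traj_dim)      # stand-in for ground-truth trajectories

# Discriminator step: label real trajectories 1, generated trajectories 0.
fake_traj = generator(features).detach()
d_loss = bce(discriminator(real_traj), torch.ones(8, 1)) + \
         bce(discriminator(fake_traj), torch.zeros(8, 1))
d_opt.zero_grad(); d_loss.backward(); d_opt.step()

# Generator step: try to make generated trajectories classified as real.
g_loss = bce(discriminator(generator(features)), torch.ones(8, 1))
g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```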
- the GAN 402 comprises the generator component 404 that is configured to receive one or more token sequences (e.g., the token sequence 206, the token sequence 316) as input and determine an object trajectory for one or more objects and/or data representing a scene usable in a simulation.
- the output 408 by the generator component 404 is usable for performing a simulation over time by the vehicle computing device.
- the computing device can perform a simulation with an autonomous vehicle in the scene.
- the GAN 402 can be implemented for training the codebook 308, the machine learned model 314, and/or the decoder 318.
- the discriminator component 410 of the GAN 402 can determine whether the output 408 is from the generator component 404 or is ground truth.
- the GAN 402 can be implemented to improve determinations by the generator component 404 (e.g., performing functionality of the decoder 318).
- the input 406 into the generator component 404 can comprise vector representations of an environment (e.g., the environment 100) and objects in the environment.
- the GAN 402 can receive a first vector representation comprising one or more vectors representing static features in an environment and receive a second vector representation of an object in the environment.
- the output 414 can be used to train the generator component 404 to improve realism of the output 408.
- the generator component 404 can receive an indication of the classification (e.g., a state of the output) and update one or more parameters used during processing. Additionally or alternatively, the generator component 404 can receive, as training data, a token sequence(s) usable to determine an object trajectory or other object state at a future time such as a position, an orientation, a velocity, an acceleration, etc.
- the GAN 402 can receive training data representing a top-down view of an environment, map data, and object data.
- the training data can be represented as one or more vectors.
- the generator component 404 can determine the output 408 based at least in part on the training data.
- the output can represent a scene in an environment.
- the scene can be used in a simulation with an autonomous vehicle to test a response of a vehicle controller that controls actions performed by the autonomous vehicle. Additional details of generating simulation scenarios are described in U.S. Patent Application Serial No. 16/457,679, filed on June 28, 2019, entitled “Synthetic Scenario Generator Based on Attributes,” and in U.S. Patent Application Serial No.
- FIG. 5 depicts an example block diagram 500 of an example training component implemented by a computing device to train an example codebook.
- the computing device (e.g., the vehicle computing device(s) 804 and/or the computing device(s) 834) can implement the training component 502 to process training data 504 and output a codebook 506 that includes tokens that represent object behavior, vehicle behavior, and/or features of an environment.
- a machine learned model (e.g., the Transformer model 204, the Transformer model 314) can determine a token sequence from tokens in the codebook 506; the token sequence can represent an arrangement of potential actions between a vehicle (e.g., the vehicle 102) and one or more objects in an environment.
- by training the codebook 506, determinations by a machine learned model that is based on the codebook can be improved (e.g., the machine learned model that receives a set of tokens from the codebook can determine the token sequence more efficiently and accurately).
- Training the codebook 506 can include the training component 502 processing the training data 504 (e.g., object state data, continuous feature vectors, discrete feature vectors, environment data, etc.) to determine a token to represent an object state, a feature vector, a static object in the environment, a dynamic object in the environment, and/or traffic signals, etc.
- the training component 502 can determine a size of the codebook (e.g., a number of tokens, a size of each token, etc.) based at least in part on the training data 504.
- the training component 502 can determine a mapping between token values representing discrete variables and feature vectors representing continuous variables or feature vectors representing discrete variables.
- the codebook 506 can include or utilize a list, table, or index to store associations between feature vectors and tokens.
- a token and a feature vector can include a same index value to indicate an association.
- the codebook 506 can receive a token having an index value (Token 0) and “look up” or find the feature vector having a corresponding index value (Feature Vector 0). In this way, the codebook 506 can convert between the token and the feature vector depending on whether the codebook receives the token or the feature vector as input.
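- A minimal, hypothetical sketch of this index-based lookup is shown below; the class, entries, and nearest-entry rule are illustrative assumptions rather than the codebook implementation described here:

```python
# Hypothetical sketch: a codebook keyed by index so the same index value
# recovers either the token or the associated feature vector.
class Codebook:
    def __init__(self, feature_vectors):
        # Token i is simply the index i; entry i stores the paired vector.
        self.feature_vectors = list(feature_vectors)

    def token_to_vector(self, token):
        return self.feature_vectors[token]

    def vector_to_token(self, vector):
        # Nearest entry by squared distance acts as the "lookup".
        dists = [sum((a - b) ** 2 for a, b in zip(vector, entry))
                 for entry in self.feature_vectors]
        return min(range(len(dists)), key=dists.__getitem__)

codebook = Codebook([[0.0, 0.0], [1.0, 0.5], [2.0, 2.0]])
print(codebook.token_to_vector(0))           # Token 0 -> Feature Vector 0
print(codebook.vector_to_token([1.1, 0.4]))  # nearest entry -> token 1
```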
- functionality associated with indexing or storing associations can be performed by another component, machine learned model, etc. other than the codebook 506.
- the codebook 506 can, for example, convert a set of discretized feature vectors from a quantizer to tokens by identifying an index value associated with each respective discretized feature vector and mapping the index value to a corresponding token.
- the set of discretized feature vectors (e.g., the discretized feature vectors 312) from the quantizer (e.g., the quantizer 310) can include feature vectors representing an object, an autonomous vehicle, and/or environmental features.
- the codebook 506 can, for example, arrange the tokens in an order similar to an order of the index values of the discretized feature vectors.
- the codebook 506 can output the tokens as the token sequence 316 (e.g., independent of the machine learned model 314).
- the codebook 506 can also or instead convert tokens to a set of discretized feature vectors or a set of continuous feature vectors (e.g., for processing by a component or machine learned model). For instance, the codebook 506 can identify an index value for each respective token in a token sequence, and map an index value for a feature vector (continuous or discrete) to the index value of each token. Thus, a token sequence can be converted to different types of feature vectors for use by a machine learned model (e.g., the machine learned model 208 or the decoder 318).
- information associated with a token can be compared to information associated with a feature vector, and if a probability of similarity is above a threshold, the token can be associated with the feature vector as an entry in a list, table, or other memory location.
- mapping a token to a feature vector can include identifying a characteristic of a feature vector and another characteristic of a token (e.g., an action or a state of the vehicle associated with the feature vector and an action or a state of the vehicle associated with a token), and comparing the characteristic(s) of the vehicle associated with the feature vector to the characteristic(s) associated with the token.
- the training component 502 can, for example, map the feature vector to the token when a similarity of the corresponding characteristics meets or exceeds a similarity threshold.
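- As a hypothetical sketch of this similarity-threshold mapping (cosine similarity and the token characteristics below are illustrative assumptions), a feature vector could be mapped to the best-matching token only when the similarity meets or exceeds a threshold:

```python
# Hypothetical sketch: map a feature vector to an existing token entry only
# when the characteristic similarity meets or exceeds a threshold.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def map_vector_to_token(vector, token_characteristics, threshold=0.9):
    """Return the best-matching token index, or None if no entry is similar
    enough (in which case a new token could be created)."""
    best_token, best_sim = None, threshold
    for token, characteristics in token_characteristics.items():
        sim = cosine_similarity(vector, characteristics)
        if sim >= best_sim:
            best_token, best_sim = token, sim
    return best_token

tokens = {0: [1.0, 0.0, 0.2], 1: [0.0, 1.0, 0.0]}        # e.g., "accelerate", "stop"
print(map_vector_to_token([0.95, 0.05, 0.25], tokens))   # -> 0
```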
- Data associated with the entries of various mappings between tokens and feature vectors can be used after training in a variety of ways including when a machine learned model outputs a token sequence of tokens representing discrete variables, and the codebook accesses the data associated with the entries to map a discrete variable of a token to a continuous variable of a feature vector.
- the codebook 506 can receive data associated with a discrete representation or a continuous representation and output the other of the discrete representation or the continuous representation.
- the codebook 506 can map each token in a token sequence to a respective feature vector represented by continuous variables thereby enabling a machine learned model to receive feature vectors as input rather than receiving the token sequence as discrete variables.
- the training component 502 can train the codebook 506 to convert a discrete token into a continuous feature vector (e.g., a Gaussian distribution), and vice versa.
- tokens from the codebook can be arranged into a sequence of discrete tokens which can more easily express multimodality (e.g., provide for more diverse object interactions) compared to continuous approaches.
- a token can represent a high-level behavior of an object or the vehicle, such as a direction of travel, or an indication to turn, stop, or accelerate, just to name a few.
- a first token can represent a vehicle traveling in a first direction at a particular velocity and a second token can represent an object facing a second direction and not moving.
- a token may also or instead represent a stop sign, crosswalk, a roadway, or other environmental feature.
- a machine learned model can arrange or cluster the tokens to represent an environment having a vehicle and different types of objects, roadways, etc.
- initializing or seeding the codebook entries can be based at least in part on a clustering algorithm. For example, since a number of tokens in the codebook is limited, the training component 502 can use a clustering algorithm to ensure that tokens selected for entry into the codebook 506 are diverse relative to each other (e.g., have different characteristics, etc.). By clustering tokens, the training component 502 can select tokens from different areas of the cluster to maintain diversity among a type of token included in the codebook 506.
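- One possible, hypothetical way to seed entries with a clustering algorithm is sketched below, assuming scikit-learn's k-means and placeholder feature vectors; cluster centers become the initial entries so the limited set of tokens covers diverse regions of the feature space:

```python
# Hypothetical sketch: seeding codebook entries with k-means so the limited
# number of tokens covers diverse regions of the feature space.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
training_vectors = rng.normal(size=(1000, 8))   # stand-in feature vectors

codebook_size = 64
kmeans = KMeans(n_clusters=codebook_size, n_init=10, random_state=0)
kmeans.fit(training_vectors)

# Each cluster center becomes an initial codebook entry; tokens drawn from
# different clusters stay diverse relative to one another.
initial_codebook = kmeans.cluster_centers_
print(initial_codebook.shape)   # (64, 8)
```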
- the training data 504 may be determined based on manual annotation and/or by determining a change associated with semantic information of the position of the object. Further, detected positions over such a period of time associated with the object may be used to determine a ground truth trajectory to associate with the object.
- the vehicle computing device may provide the data to a remote computing device (i.e., computing device separate from vehicle computing device) for data analysis. In such examples, the remote computing device may analyze the sensor data to determine one or more labels for images, an actual location, yaw, speed, acceleration, direction of travel, or the like of the object at the end of the set of estimated states.
- ground truth data associated with one or more of: positions, trajectories, accelerations, directions, and so on may be determined (either hand labelled or determined by another machine learned model), and such ground truth data may be used to determine a trajectory of another object such as a pedestrian.
- corresponding data may be input into the model to determine an output (e.g., a trajectory, and so on) and a difference between the determined output and the actual action by the object may be used to train the model.
- FIG. 6 depicts an example block diagram 600 of an example training component implemented by a computing device to train an example machine learned model.
- the computing device (e.g., the vehicle computing device(s) 804 and/or the computing device(s) 834) can implement the training component 602 to process training data 604 and output a Transformer model 606 that arranges or clusters tokens from a codebook (e.g., the codebook 506).
- the Transformer model 606 can receive multiple tokens from the codebook as input data and determine a token sequence for the multiple tokens.
- the token sequence can represent an arrangement of potential actions between a vehicle (e.g., the vehicle 102) and one or more objects (e.g., the object 108, the object 110) in an environment.
- By training the Transformer model 606 as described herein, determinations by the Transformer model 606 provide more accurate depictions of potential interactions between the vehicle and the object(s) in an environment.
- functionality associated with the Transformer model 204 in FIG. 2 and/or the Transformer model 314 in FIG. 3 can be included in the Transformer model 606.
- Training the Transformer model 606 can include the training component 602 processing the training data 604 (e.g., token data, object state data, vehicle state data, etc.) to determine a sequence of tokens to represent potential interactions between a vehicle and object(s) in an environment.
- the training component 602 can arrange or cluster tokens associated with the codebook by applying an autoregressive algorithm, an attention algorithm, a self-attention algorithm, or other algorithm, to the training data 604.
- the training data 604 can represent ground truth data, and the training component 602 can compare the ground truth data to an output by the Transformer model (e.g., a token sequence, a cluster of tokens, a set of tokens, etc.) as part of backpropagation.
- the Transformer model 606 can be trained to minimize loss associated with the output and maximize accuracy of the output to represent different scenarios with different objects.
- the training component 602 can train the Transformer model 606 to improve determinations of which tokens are related or otherwise depend on one another (e.g., a first token indicates a pedestrian approaching a crosswalk, a second token indicates the pedestrian is changing a heading to face the crosswalk, a third token indicates that a vehicle in the roadway is approaching the crosswalk, a fourth token indicates crosswalk signal permission, and so on).
- the Transformer model 606 can learn to arrange the tokens in a sequence based at least in part on determining which tokens collectively represent object actions and states.
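- A minimal, hypothetical sketch of this kind of training loop is shown below, assuming a PyTorch Transformer encoder trained with next-token cross-entropy against ground-truth token sequences; the vocabulary size, model size, and data are placeholders:

```python
# Hypothetical sketch: training a small Transformer to predict the next token
# in a sequence, so backpropagating cross-entropy loss against ground-truth
# token sequences improves how tokens are arranged.
import torch
import torch.nn as nn

vocab_size, d_model, seq_len = 256, 64, 16    # illustrative sizes
embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(d_model, vocab_size)
optimizer = torch.optim.Adam(
    list(embed.parameters()) + list(encoder.parameters()) + list(head.parameters()),
    lr=1e-4)

tokens = torch.randint(0, vocab_size, (8, seq_len))     # ground-truth sequences
inputs, targets = tokens[:, :-1], tokens[:, 1:]

# Causal mask so each position only attends to earlier tokens (autoregressive).
causal_mask = nn.Transformer.generate_square_subsequent_mask(inputs.size(1))
logits = head(encoder(embed(inputs), mask=causal_mask))
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
optimizer.zero_grad(); loss.backward(); optimizer.step()
```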
- the training component 602 can, in some examples, compare an output of a model that receives input data that is based at least in part on the token sequence.
- the codebook can receive the token sequence and determine feature vectors that correspond to the tokens in the token sequence.
- the feature vectors can be input into a machine learned model (e.g., a GAN, etc.) that generates output data representing an object trajectory, scene data, or other data.
- the output data from the machine learned model can be compared to the output by the Transformer model (e.g., a token sequence, a cluster of tokens, a set of tokens, etc.) to improve subsequent outputs by the Transformer model.
- the training component 602 can provide functionality to determine a fewest number of tokens that collectively represent a vehicle, one or more objects, and features of an environment as well as potential actions by the vehicle and the object(s) over time.
- the training data 604 may be determined based on manual annotation and/or by determining a change associated with semantic information of a token or a position of a token relative to another token. Further, detected positions of tokens in a set, cluster, or sequence can be considered over a period of time and may be used to determine a ground truth token sequence or token cluster that represents a real-world environment or a simulated environment.
- the vehicle computing device may provide data associated with a sensor, the codebook, the Transformer, or another machine learned model to a remote computing device (i.e., computing device separate from vehicle computing device) for data analysis.
- the remote computing device may analyze the data to determine one or more labels for images, an actual location, yaw, speed, acceleration, direction of travel, or the like of the object at the end of the set of estimated states.
- ground truth data associated with one or more of: positions, trajectories, accelerations, directions, and so on may be determined (either hand labelled or determined by another machine learned model), and such ground truth data may be used to determine a trajectory of another object such as a vehicle.
- corresponding data may be input into the model to determine an output (e.g., a trajectory, and so on) and a difference between the determined output and the actual action by the object may be used to train the model.
- FIG. 7 depicts an example block diagram 700 of an example training component implemented by a computing device to train one or more example machine learned models.
- the computing device (e.g., the vehicle computing device(s) 804 and/or the computing device(s) 834) can implement the training component 702 to determine an order for training different entities (e.g., the encoder 304, the quantizer 310, the codebook 308, the machine learned model 314, or the decoder 318) to collectively improve determinations by each of the different entities.
- the output data 320 from the decoder 318 can be used to train the codebook 308, the quantizer 310, the encoder 304, as well as the decoder 318.
- the output data 320 can be backpropagated to the codebook 308 to improve an association between a token and a feature vector and/or to the encoder 304 to improve future feature vector predictions (e.g., by comparing the output data 320 to the input data 302 or otherwise minimizing an error).
- the output data 320 can also or instead be used to train the decoder 318 to improve scene generation or object trajectory prediction based on receiving the token sequence 316 (or feature vectors associated therewith).
- the encoder 304, the quantizer 310, and the decoder 318 can be trained based at least in part on the output data 320.
- the quantizer 310 can implement the codebook 308 to determine tokens that correspond to the discretized feature vectors 312, and cause the codebook 308 to forward the tokens to the decoder 318 as the token sequence 316.
- the codebook 308 can use various indexing techniques to determine tokens for sending to the decoder 318, which generates the output data 320 usable to train the encoder 304, the quantizer 310, and/or the decoder 318.
- the training component 702 can train the quantizer 310 by comparing the discretized feature vectors 312 with ground truth data and adjust parameters associated with the quantizer 310 to minimize a rounding error between the feature vectors of the ground truth and the discretized feature vectors 312.
- the training component 702 can determine a setting for a lambda hyperparameter, or other parameter, to balance the errors associated with training different entities (e.g., determine a rate to improve a first loss of a first component relative to also improving a second loss of a second component that is interdependent on the first loss).
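- A hypothetical sketch of such a lambda-weighted balance of interdependent losses is shown below, in the style of vector-quantization training where a reconstruction loss, a codebook loss, and a commitment loss are combined; the tensors, loss terms, and the 0.25 weight are illustrative assumptions:

```python
# Hypothetical sketch: balancing interdependent training losses with a lambda
# hyperparameter, combining reconstruction, codebook, and commitment terms.
import torch

def vq_losses(encoded, quantized, reconstructed, target, lam=0.25):
    recon_loss = torch.nn.functional.mse_loss(reconstructed, target)
    codebook_loss = torch.nn.functional.mse_loss(quantized, encoded.detach())
    commitment_loss = torch.nn.functional.mse_loss(encoded, quantized.detach())
    total = recon_loss + codebook_loss + lam * commitment_loss
    return total, recon_loss, codebook_loss, commitment_loss

encoded = torch.randn(8, 32, requires_grad=True)        # encoder output
quantized = torch.randn(8, 32, requires_grad=True)      # nearest codebook entries
reconstructed = torch.randn(8, 32, requires_grad=True)  # decoder output
target = torch.randn(8, 32)                             # ground truth

total, *_ = vq_losses(encoded, quantized, reconstructed, target, lam=0.25)
total.backward()
```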
- the training component 702 can determine to train the encoder 304 to improve determinations of feature vectors to represent future intent from past history.
- the training component 702 can train the machine learned model 314 after training the encoder 304, though in other examples training can include training multiple entities substantially in parallel.
- the training component 702 can determine that the encoder 304 be trained prior to the machine learned model 314 since improvements by training the encoder 304 cause improvements to the machine learned model 314.
- Training the machine learned model 314 can include adjusting determinations by one or more self-attention layers to improve how tokens are arranged into a sequence by the one or more self-attention layers.
- While FIG. 7 shows the token sequence 316 being received by the decoder 318 from the machine learned model 314, in other examples the token sequence 316 can be sent to the decoder 318 from the codebook 308 after the codebook 308 determines feature vectors for tokens in the token sequence.
- the token sequence 316 (or associated feature vectors) usable as input data for the decoder 318 can be received from either the codebook 308 or the machine learned model 314.
- FIG. 8 is a block diagram of an example system 800 for implementing the techniques described herein.
- the system 800 may include a vehicle, such as vehicle 802.
- the vehicle 802 may include a vehicle computing device 804, one or more sensor systems 806, one or more emitters 808, one or more communication connections 810, at least one direct connection 812, and one or more drive system(s) 814.
- the vehicle computing device 804 may include one or more processors 816 and memory 818 communicatively coupled with the one or more processors 816.
- the vehicle 802 is an autonomous vehicle; however, the vehicle 802 could be any other type of vehicle, such as a semi-autonomous vehicle, or any other system having at least an image capture device (e.g., a camera enabled smartphone).
- the autonomous vehicle 802 may be an autonomous vehicle configured to operate according to a Level 5 classification issued by the U.S. National Highway Traffic Safety Administration, which describes a vehicle capable of performing all safety-critical functions for the entire trip, with the driver (or occupant) not being expected to control the vehicle at any time.
- the autonomous vehicle 802 may be a fully or partially autonomous vehicle having any other level or classification.
- the vehicle computing device 804 may store sensor data associated with actual location of an object at the end of the set of estimated states (e.g., end of the period of time) and may use this data as training data to train one or more models.
- the vehicle computing device 804 may provide the data to a remote computing device (i.e., computing device separate from vehicle computing device such as the computing device(s) 834) for data analysis.
- the remote computing device(s) may analyze the sensor data to determine an actual location, velocity, direction of travel, or the like of the object at the end of the set of estimated states.
- the memory 818 of the vehicle computing device 804 stores a localization component 820, a perception component 822, a planning component 824, one or more system controllers 826, one or more maps 828, and a model component 830 including one or more model(s), such as a first model 832A, a second model 832B, up to an Nth model 832N (collectively “models 832”), where N is an integer.
- while these components are depicted in FIG. 8 as residing in the memory 818 for illustrative purposes, it is contemplated that the localization component 820, the perception component 822, the planning component 824, the one or more system controllers 826, the one or more maps 828, and/or the model component 830 including the model(s) 832 may additionally, or alternatively, be accessible to the vehicle 802 (e.g., stored on, or otherwise accessible by, memory remote from the vehicle 802, such as, for example, on memory 838 of a remote computing device 834).
- the model(s) 832 can provide functionality associated with the prediction component 104.
- the model(s) 832 can include one or more of an encoder, a quantizer, a codebook, a decoder, a Transformer model, a machine learned model, and so on.
- the localization component 820 may include functionality to receive data from the sensor system(s) 806 to determine a position and/or orientation of the vehicle 802 (e.g., one or more of an x-, y-, z-position, roll, pitch, or yaw).
- the localization component 820 may include and/or request / receive a map of an environment, such as from map(s) 828 and/or map component 844, and may continuously determine a location and/or orientation of the autonomous vehicle within the map.
- the localization component 820 may utilize SLAM (simultaneous localization and mapping), CLAMS (calibration, localization and mapping, simultaneously), relative SLAM, bundle adjustment, non-linear least squares optimization, or the like to receive image data, lidar data, radar data, IMU data, GPS data, wheel encoder data, and the like to accurately determine a location of the autonomous vehicle.
- the localization component 820 may provide data to various components of the vehicle 802 to determine an initial position of an autonomous vehicle for determining the relevance of an object to the vehicle 802, as discussed herein.
- the perception component 822 may include functionality to perform object detection, segmentation, and/or classification.
- the perception component 822 may provide processed sensor data that indicates a presence of an object (e.g., entity) that is proximate to the vehicle 802 and/or a classification of the object as an object type (e.g., car, pedestrian, cyclist, animal, building, tree, road surface, curb, sidewalk, unknown, etc.).
- the perception component 822 may provide processed sensor data that indicates a presence of a stationary entity that is proximate to the vehicle 802 and/or a classification of the stationary entity as a type (e.g., building, tree, road surface, curb, sidewalk, unknown, etc.).
- the perception component 822 may provide processed sensor data that indicates one or more features associated with a detected object (e.g., a tracked object) and/or the environment in which the object is positioned.
- features associated with an object may include, but are not limited to, an x-position (global and/or local position), a y-position (global and/or local position), a z-position (global and/or local position), an orientation (e.g., a roll, pitch, yaw), an object type (e.g., a classification), a velocity of the object, an acceleration of the object, an extent of the object (size), etc.
- Features associated with the environment may include, but are not limited to, a presence of another object in the environment, a state of another object in the environment, a time of day, a day of a week, a season, a weather condition, an indication of darkness/light, etc.
- the planning component 824 may determine a path for the vehicle 802 to follow to traverse through an environment. For example, the planning component 824 may determine various routes and trajectories and various levels of detail. For example, the planning component 824 may determine a route to travel from a first location (e.g., a current location) to a second location (e.g., a target location). For the purpose of this discussion, a route may include a sequence of waypoints for travelling between two locations. As non-limiting examples, waypoints include streets, intersections, global positioning system (GPS) coordinates, etc. Further, the planning component 824 may generate an instruction for guiding the autonomous vehicle along at least a portion of the route from the first location to the second location.
- the planning component 824 may determine how to guide the autonomous vehicle from a first waypoint in the sequence of waypoints to a second waypoint in the sequence of waypoints.
- the instruction may be a trajectory, or a portion of a trajectory.
- multiple trajectories may be substantially simultaneously generated (e.g., within technical tolerances) in accordance with a receding horizon technique, wherein one of the multiple trajectories is selected for the vehicle 802 to navigate.
- the planning component 824 may include a prediction component to generate predicted trajectories of objects in an environment and/or to generate predicted candidate trajectories for the vehicle 802. For example, a prediction component may generate one or more predicted trajectories for objects within a threshold distance from the vehicle 802. In some examples, a prediction component may measure a trace of an object and generate a trajectory for the object based on observed and predicted behavior.
- the vehicle computing device 804 may include one or more system controllers 826, which may be configured to control steering, propulsion, braking, safety, emitters, communication, and other systems of the vehicle 802.
- the system controller(s) 826 may communicate with and/or control corresponding systems of the drive system(s) 814 and/or other components of the vehicle 802.
- the memory 818 may further include one or more maps 828 that may be used by the vehicle 802 to navigate within the environment.
- a map may be any number of data structures modeled in two dimensions, three dimensions, or N-dimensions that are capable of providing information about an environment, such as, but not limited to, topologies (such as intersections), streets, mountain ranges, roads, terrain, and the environment in general.
- a map may include, but is not limited to: texture information (e.g., color information such as RGB color information), among other information.
- a map may include a three-dimensional mesh of the environment.
- the vehicle 802 may be controlled based at least in part on the map(s) 828. That is, the map(s) 828 may be used in connection with the localization component 820, the perception component 822, and/or the planning component 824 to determine a location of the vehicle 802, detect objects in an environment, generate routes, and determine actions and/or trajectories to navigate within an environment.
- the one or more maps 828 may be stored on a remote computing device(s) (such as the computing device(s) 834) accessible via network(s) 840.
- multiple maps 828 may be stored based on, for example, a characteristic (e.g., type of entity, time of day, day of week, season of the year, etc.). Storing multiple maps 828 may have similar memory requirements, but increase the speed at which data in a map may be accessed.
- the vehicle computing device 804 may include a model component 830.
- the model component 830 may be configured to perform the functionality of the prediction component 104, including predicting object trajectories, scene data, and/or heat maps based at least in part on tokens associated with a codebook.
- the model component 830 may receive one or more features associated with the detected object(s) from the perception component 822 and/or from the sensor system(s) 806.
- the model component 830 may receive environment characteristics (e.g., environmental factors, etc.) and/or weather characteristics (e.g., weather factors such as snow, rain, ice, etc.) from the perception component 822 and/or the sensor system(s) 806. While shown separately in FIG. 8, the model component 830 could be part of the planning component 824 or other component(s) of the vehicle 802.
- the model component 830 may send predictions from the one or more models 832 that may be used by the planning component 824 to generate one or more predicted trajectories of the object (e.g., direction of travel, speed, etc.), such as from the prediction component thereof.
- the planning component 824 may determine one or more actions (e.g., reference actions and/or sub-actions) for the vehicle 802, such as vehicle candidate trajectories.
- the model component 830 may be configured to determine whether an object occupies a future position based at least in part on the one or more actions for the vehicle 802.
- the model component 830 may be configured to determine the actions that are applicable to the environment, such as based on environment characteristics, weather characteristics, another object, or the like.
- the model component 830 may generate sets of estimated states of the vehicle and one or more detected objects forward in the environment over a time period.
- the model component 830 may generate a set of estimated states for each action (e.g., reference action and/or sub-action) determined to be applicable to the environment.
- the sets of estimated states may include one or more estimated states, each estimated state including an estimated position of the vehicle and an estimated position of a detected object(s).
- the estimated positions may be determined based on a detected trajectory and/or predicted trajectories associated with the object.
- the estimated positions may be determined based on an assumption of substantially constant velocity and/or substantially constant trajectory (e.g., little to no lateral movement of the object).
- the estimated positions (and/or potential trajectories) may be based on passive and/or active prediction.
- the model component 830 may utilize physics and/or geometry based techniques, machine learning, linear temporal logic, tree search methods, heat maps, and/or other techniques for determining predicted trajectories and/or estimated positions of objects.
- the estimated states may be generated periodically throughout the time period.
- the model component 830 may generate estimated states at 0.1 second intervals throughout the time period.
- the model component 830 may generate estimated states at 0.05 second intervals.
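- A minimal, hypothetical sketch of generating estimated states at fixed intervals under a constant-velocity assumption is shown below; the function name, period, and values are illustrative assumptions:

```python
# Hypothetical sketch: rolling out estimated states at fixed intervals under
# a constant-velocity assumption over a time period.
def rollout_constant_velocity(x, y, vx, vy, period=8.0, dt=0.1):
    """Return (t, x, y) tuples at dt-second intervals across the period."""
    states = []
    steps = int(period / dt)
    for i in range(1, steps + 1):
        t = i * dt
        states.append((round(t, 2), x + vx * t, y + vy * t))
    return states

# Estimated positions every 0.1 s for 8 seconds for an object moving at 2 m/s.
print(rollout_constant_velocity(0.0, 0.0, 2.0, 0.0)[:3])
```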
- the estimated states may be used by the planning component 824 in determining an action for the vehicle 802 to take in an environment.
- the model component 830 may utilize machine learned techniques to predict object trajectories and scene data.
- the machine learned algorithms may be trained to determine, based on sensor data and/or previous predictions by the model, that an object is likely to behave in a particular way relative to the vehicle 802 at a particular time during a set of estimated states (e.g., time period).
- one or more of the vehicle 802 state (position, velocity, acceleration, trajectory, etc.) and/or the object state, classification, etc. may be input into such a machine learned model and, in turn, a trajectory prediction may be output by the model.
- characteristics associated with each object type may be used by the model component 830 to determine a trajectory, a velocity, or an acceleration associated with the object.
- characteristics of an object type may include, but not be limited to: a maximum longitudinal acceleration, a maximum lateral acceleration, a maximum vertical acceleration, a maximum speed, maximum change in direction for a given speed, and the like.
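- As a hypothetical sketch (the object types and numeric limits below are illustrative assumptions, not values from the disclosure), per-object-type limits of this kind could be used to clamp a predicted acceleration or speed to physically plausible values:

```python
# Hypothetical sketch: per-object-type kinematic limits used to clamp a
# predicted acceleration or speed to physically plausible values.
OBJECT_TYPE_LIMITS = {
    # type: (max_long_accel m/s^2, max_lat_accel m/s^2, max_speed m/s)
    "pedestrian": (1.5, 1.0, 3.0),
    "cyclist": (2.5, 2.0, 12.0),
    "vehicle": (4.0, 3.0, 40.0),
}

def clamp_prediction(object_type, long_accel, lat_accel, speed):
    max_long, max_lat, max_speed = OBJECT_TYPE_LIMITS[object_type]

    def clamp(value, limit):
        return max(-limit, min(limit, value))

    return (clamp(long_accel, max_long),
            clamp(lat_accel, max_lat),
            min(max(speed, 0.0), max_speed))

print(clamp_prediction("pedestrian", 3.0, 0.2, 5.0))   # -> (1.5, 0.2, 3.0)
```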
- the components discussed herein (e.g., the localization component 820, the perception component 822, the planning component 824, the one or more system controllers 826, the one or more maps 828, the model component 830 including the model(s) 832) are described as divided for illustrative purposes. However, the operations performed by the various components may be combined or performed in any other component.
- such an architecture can include a first computing device to control the vehicle 802 and a secondary safety system that operates on the vehicle 802 to validate operation of the primary system and to control the vehicle 802 to avoid collisions.
- aspects of some or all of the components discussed herein may include any models, techniques, and/or machine learned techniques.
- the components in the memory 818 (and the memory 838, discussed below) may be implemented as a neural network.
- an exemplary neural network is a technique which passes input data through a series of connected layers to produce an output.
- Each layer in a neural network may also comprise another neural network, or may comprise any number of layers (whether convolutional or not).
- a neural network may utilize machine learning, which may refer to a broad class of such techniques in which an output is generated based on learned parameters.
- machine learning techniques may include, but are not limited to, regression techniques (e.g., ordinary least squares regression (OLSR), linear regression, logistic regression, stepwise regression, multivariate adaptive regression splines (MARS), locally estimated scatterplot smoothing (LOESS)), instance-based techniques (e.g., ridge regression, least absolute shrinkage and selection operator (LASSO), elastic net, least-angle regression (LARS)), decisions tree techniques (e.g., classification and regression tree (CART), iterative dichotomiser 3 (ID3), Chi-squared automatic interaction detection (CHAID). decision stump, conditional decision trees).
- regression techniques e.g., ordinary least squares regression (OLSR), linear regression, logistic regression, stepwise regression, multivariate adaptive regression splines (MARS), locally estimated scatterplot smoothing (LOESS)
- instance-based techniques e.g., ridge regression, least absolute shrinkage and selection operator (LASSO), elastic net, least-angle regression (LARS)
- decisions tree techniques e.g., classification and
- Bayesian techniques e.g., naive Bayes, Gaussian naive Bayes, multinomial naive Bayes, average one- dependence estimators (AODE), Bayesian belief network (BNN), Bayesian networks
- clustering techniques e.g., k-means, k-medians, expectation maximization (EM), hierarchical clustering
- association rule learning techniques e.g., perceptron, back-propagation, hopfield network.
- RBFN Radial Basis Function Network
- DBM Deep Boltzmann Machine
- DBN Deep Belief Networks
- CNN Convolutional Neural Network
- PCA Principal Component Analysis
- PCR Principal Component Regression
- PLSR Partial Least Squares Regression
- MDS Multidimensional Scaling
- LDA Linear Discriminant Analysis
- MDA Mixture Discriminant Analysis
- QDA Quadratic Discriminant Analysis
- FDA Flexible Discriminant Analysis
- Ensemble Techniques e.g., Boosting, Bootstrapped Aggregation (Bagging), AdaBoost, Stacked Generalization (blending), Gradient Boosting Machines (GBM).
- Gradient Boosted Regression Trees GBRT
- Random Forest Random Forest
- SVM support vector machine
- supervised learning unsupervised learning
- semi-supervised learning etc.
- Additional examples of architectures include neural networks such as ResNet50, ResNet101, VGG, DenseNet, PointNet, and the like.
- the sensor system(s) 806 may include lidar sensors, radar sensors, ultrasonic transducers, sonar sensors, location sensors (e.g., GPS, compass, etc.), inertial sensors (e.g., inertial measurement units (IMUs), accelerometers, magnetometers, gyroscopes, etc.), cameras (e.g., RGB, IR, intensity, depth, time of flight, etc.), microphones, wheel encoders, environment sensors (e.g., temperature sensors, humidity sensors, light sensors, pressure sensors, etc.), etc.
- the sensor system(s) 806 may include multiple instances of each of these or other types of sensors.
- the lidar sensors may include individual lidar sensors located at the corners, front, back, sides, and/or top of the vehicle 802.
- the camera sensors may include multiple cameras disposed at various locations about the exterior and/or interior of the vehicle 802.
- the sensor system(s) 806 may provide input to the vehicle computing device 804. Additionally, or in the alternative, the sensor system(s) 806 may send sensor data, via the one or more networks 840, to the one or more computing device(s) 834 at a particular frequency, after a lapse of a predetermined period of time, in near real-time, etc.
- the vehicle 802 may also include one or more emitters 808 for emitting light and/or sound.
- the emitters 808 may include interior audio and visual emitters to communicate with passengers of the vehicle 802.
- interior emitters may include speakers, lights, signs, display screens, touch screens, haptic emitters (e.g., vibration and/or force feedback), mechanical actuators (e.g., seatbelt tensioners, seat positioners, headrest positioners, etc.), and the like.
- the emitter(s) 808 may also include exterior emitters.
- the exterior emitters may include lights to signal a direction of travel or other indicator of vehicle action (e.g., indicator lights, signs, light arrays, etc.), and one or more audio emitters (e.g., speakers, speaker arrays, horns, etc.) to audibly communicate with pedestrians or other nearby vehicles, one or more of which may comprise acoustic beam steering technology.
- the vehicle 802 may also include one or more communication connections 810 that enable communication between the vehicle 802 and one or more other local or remote computing device(s).
- the communication connection(s) 810 may facilitate communication with other local computing device(s) on the vehicle 802 and/or the drive system(s) 814.
- the communication connection(s) 810 may allow the vehicle to communicate with other nearby computing device(s) (e.g., remote computing device 834, other nearby vehicles, etc.) and/or one or more remote sensor system(s) 842 for receiving sensor data.
- the communications connection(s) 810 also enable the vehicle 802 to communicate with a remote teleoperations computing device or other remote services.
- the communications connection(s) 810 may include physical and/or logical interfaces for connecting the vehicle computing device 804 to another computing device or a network, such as network(s) 840.
- the communications connection(s) 810 can enable Wi-Fi-based communication such as via frequencies defined by the IEEE 802.11 standards, short range wireless frequencies such as Bluetooth, cellular communication (e.g., 2G, 3G, 4G, 4G LTE, 5G, etc.) or any suitable wired or wireless communications protocol that enables the respective computing device to interface with the other computing device(s).
- the vehicle 802 may include one or more drive systems 814.
- the vehicle 802 may have a single drive system 814.
- individual drive systems 814 may be positioned on opposite ends of the vehicle 802 (e.g., the front and the rear, etc.).
- the drive system(s) 814 may include one or more sensor systems to detect conditions of the drive system(s) 814 and/or the surroundings of the vehicle 802.
- the sensor system(s) may include one or more wheel encoders (e.g., rotary encoders) to sense rotation of the wheels of the drive modules, inertial sensors (e.g., inertial measurement units, accelerometers, gyroscopes, magnetometers, etc.) to measure orientation and acceleration of the drive module, cameras or other image sensors, ultrasonic sensors to acoustically detect objects in the surroundings of the drive module, lidar sensors, radar sensors, etc.
- Some sensors, such as the wheel encoders, may be unique to the drive system(s) 814. In some cases, the sensor system(s) on the drive system(s) 814 may overlap or supplement corresponding systems of the vehicle 802 (e.g., sensor system(s) 806).
- the drive system(s) 814 may include many of the vehicle systems, including a high voltage battery, a motor to propel the vehicle, an inverter to convert direct current from the battery into alternating current for use by other vehicle systems, a steering system including a steering motor and steering rack (which can be electric), a braking system including hydraulic or electric actuators, a suspension system including hydraulic and/or pneumatic components, a stability control system for distributing brake forces to mitigate loss of traction and maintain control, an HVAC system, lighting (e.g., lighting such as head/tail lights to illuminate an exterior surrounding of the vehicle), and one or more other systems (e.g., cooling system, safety systems, onboard charging system, other electrical components such as a DC/DC converter, a high voltage junction, a high voltage cable, charging system, charge port, etc.).
- the drive system(s) 814 may include a drive module controller which may receive and preprocess data from the sensor system(s) and control operation of the various vehicle systems.
- the drive module controller may include one or more processors and memory communicatively coupled with the one or more processors.
- the memory may store one or more modules to perform various functionalities of the drive system(s) 814.
- the drive system(s) 814 may also include one or more communication connection(s) that enable communication by the respective drive module with one or more other local or remote computing device(s).
- the direct connection 812 may provide a physical interface to couple the one or more drive system(s) 814 with the body of the vehicle 802.
- the direct connection 812 may allow the transfer of energy, fluids, air, data, etc. between the drive system(s) 814 and the vehicle.
- the direct connection 812 may further releasably secure the drive system(s) 814 to the body of the vehicle 802.
- the localization component 820, the perception component 822, the planning component 824, the one or more system controllers 826, the one or more maps 828, and the model component 830 may process sensor data, as described above, and may send their respective outputs, over the one or more network(s) 840, to the computing device(s) 834.
- the localization component 820, the perception component 822, the planning component 824, the one or more system controllers 826, the one or more maps 828, and the model component 830 may send their respective outputs to the remote computing device(s) 834 at a particular frequency, after a lapse of a predetermined period of time, in near real-time, etc.
- the vehicle 802 may send sensor data to the computing device(s) 834 via the network(s) 840. In some examples, the vehicle 802 may receive sensor data from the computing device(s) 834 and/or remote sensor system(s) 842 via the network(s) 840.
- the sensor data may include raw sensor data and/or processed sensor data and/or representations of sensor data. In some examples, the sensor data (raw or processed) may be sent and/or received as one or more log files.
- the computing device(s) 834 may include processor(s) 836 and a memory 838 storing the map component 844, a sensor data processing component 846, and a training component 848.
- the map component 844 may include functionality to generate maps of various resolutions. In such examples, the map component 844 may send one or more maps to the vehicle computing device 804 for navigational purposes.
- the sensor data processing component 846 may be configured to receive data from one or more remote sensors, such as sensor system(s) 806 and/or remote sensor system(s) 842.
- the sensor data processing component 846 may be configured to process the data and send processed sensor data to the vehicle computing device 804, such as for use by the model component 830 (e.g., the model(s) 832).
- the sensor data processing component 846 may be configured to send raw sensor data to the vehicle computing device 804.
- the training component 848 (e.g., trained in accordance with the techniques discussed in FIG. 4) can include functionality to train a machine learning model to output probabilities for whether an occluded region is free of any objects or whether the occluded region is occupied by a static obstacle or a dynamic object.
- the training component 848 can receive sensor data that represents an object traversing through an environment for a period of time, such as 0.1 milliseconds, 1 second, 3 seconds, 5 seconds, 7 seconds, and the like. At least a portion of the sensor data can be used as an input to train the machine learning model.
- the training component 848 may be executed by the processor(s) 836 to train a machine learning model based on training data.
- the training data may include a wide variety of data, such as sensor data, audio data, image data, map data, inertia data, vehicle state data, historical data (log data), or a combination thereof, that is associated with a value (e.g., a desired classification, inference, prediction, etc.). Such values may generally be referred to as a “ground truth.”
- the training data may be used for determining risk associated with occluded regions and, as such, may include data representing an environment that is captured by an autonomous vehicle and that is associated with one or more classifications or determinations.
- such a classification may be based on user input (e.g., user input indicating that the data depicts a specific risk) or may be based on the output of another machine learned model.
- labeled classifications (or more generally, the labeled output associated with training data) may be referred to as ground truth.
- the training component 848 can include functionality to train a machine learning model to output classification values.
- the training component 848 can receive data that represents labelled collision data (e.g., publicly available data, sensor data, and/or a combination thereof). At least a portion of the data can be used as an input to train the machine learning model.
- the training component 848 can be trained to output occluded value(s) associated with objects and/or occluded region(s), as discussed herein.
- the training component 848 can include training data that has been generated by a simulator.
- simulated training data can represent examples where a vehicle collides with an object in an environment or nearly collides with an object in an environment, to provide additional training examples.
- the processor(s) 816 of the vehicle 802 and the processor(s) 836 of the computing device(s) 834 may be any suitable processor capable of executing instructions to process data and perform operations as described herein.
- the processor(s) 816 and 836 may comprise one or more Central Processing Units (CPUs), Graphics Processing Units (GPUs), or any other device or portion of a device that processes electronic data to transform that electronic data into other electronic data that may be stored in registers and/or memory.
- integrated circuits (e.g., ASICs, etc.), gate arrays (e.g., FPGAs, etc.), and other hardware devices may also be considered processors in so far as they are configured to implement encoded instructions.
- Memory 818 and memory 838 are examples of non-transitory computer-readable media.
- the memory 818 and memory 838 may store an operating system and one or more software applications, instructions, programs, and/or data to implement the methods described herein and the functions attributed to the various systems.
- the memory may be implemented using any suitable memory technology, such as static random access memory (SRAM), synchronous dynamic RAM (SDRAM), nonvolatile/Flash-type memory, or any other type of memory capable of storing information.
- the architectures, systems, and individual elements described herein may include many other logical, programmatic, and physical components, of which those shown in the accompanying figures are merely examples that are related to the discussion herein.
- While FIG. 8 is illustrated as a distributed system, in alternative examples, components of the vehicle 802 may be associated with the computing device(s) 834 and/or components of the computing device(s) 834 may be associated with the vehicle 802. That is, the vehicle 802 may perform one or more of the functions associated with the computing device(s) 834, and vice versa.
- FIG. 9 is a flowchart depicting an example process 900 for determining an object trajectory using one or more example models. Some or all of the process 900 may be performed by one or more components in FIG. 8, as described herein. For example, some or all of process 900 may be performed by the vehicle computing device 804.
- the process may include receiving, by a Transformer model, a request to generate a simulated environment that includes a vehicle and an object.
- the vehicle computing device 804 can initiate a simulation or otherwise send an instruction requesting that the model component 830 generate object trajectories for one or more objects in an environment of a vehicle.
- the Transformer model (e.g., the Transformer model 204) can access tokens that are based at least in part on sensor data from the perception component 822 and map data from the map(s) 828 and/or the map component 844.
- the vehicle computing device may be configured to receive sensor data representing one or more objects in an environment (e.g., vehicle 110).
- the vehicle computing device may be configured to detect dynamic objects and/or static objects and combine the associated sensor data with map data.
- map data may represent fixed features of an environment including but not limited to crosswalks, traffic signals, school zones, and the like.
- the model component 830 may also or instead receive object state data such as position data, orientation data, heading data, velocity data, speed data, acceleration data, yaw rate data, or turning rate data associated with the object.
- the process may include accessing, by the Transformer model and based at least in part on the request, tokens from a codebook, at least one token in the codebook representing a behavior of the object.
- the Transformer model 204 can retrieve or otherwise receive tokens from the codebook 202 based at least in part on receiving the request.
- the process may include arranging, by the Transformer model, the tokens into a sequence of tokens.
- the Transformer model 204 can output the token sequence 206 (or feature vector associated with the token sequence).
- the Transformer model 204 can apply an autoregressive technique and/or implement a self-attention layer to sample tokens from the codebook or to arrange the tokens in a particular order relative to one another.
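- A minimal, hypothetical sketch of such autoregressive sampling is shown below; the stand-in model, temperature, and sequence length are illustrative assumptions, and each step conditions on the tokens chosen so far:

```python
# Hypothetical sketch: autoregressively sampling a token sequence, one token
# at a time, where each step conditions on the tokens chosen so far.
import torch

def sample_token_sequence(next_token_logits_fn, start_tokens, length, temperature=1.0):
    """next_token_logits_fn maps a (1, t) tensor of tokens to (1, vocab) logits."""
    tokens = start_tokens.clone()
    for _ in range(length):
        logits = next_token_logits_fn(tokens) / temperature
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        tokens = torch.cat([tokens, next_token], dim=1)
    return tokens

# Stand-in model: uniform logits over a 32-token codebook.
dummy_model = lambda toks: torch.zeros(1, 32)
print(sample_token_sequence(dummy_model, torch.tensor([[0]]), length=5))
```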
- the process may include inputting the sequence of tokens into a machine learned model.
- the Transformer model 204 can output the token sequence 206 which can be input into the machine learned model 208 (e.g., a GAN, a GNN, another Transformer model, etc.).
- the codebook 202 can map feature vectors to the tokens in the token sequence 206 and send the feature vectors (instead of or in addition to the token sequence 206) to the machine learned model 208.
- the process may include generating, by the machine learned model, the simulated environment that includes an object trajectory for the object.
- the operation can include the machine learned model 208 using the input from the codebook 202 (e.g., feature vectors) and/or the token sequence 206 from the Transformer model 204 to determine one or more object trajectories for one or more objects in a simulated environment (e.g., the object trajectory 116).
- the machine learned model 208 can output the predicted trajectory associated with the object based at least in part on determining a relationship between one feature vector relative to another feature vector or interactions between tokens in the token sequence.
- the machine learned model 208 can validate or modify associations between feature vectors determined by the Transformer model 204 and determine scene data that includes one or more object trajectories for one or more objects in the scene.
- the process may include causing the vehicle to be controlled in a real-world environment based at least in part on the object trajectory.
- the operation may include sending the output of the machine learned model to a vehicle computing device of the vehicle.
- the vehicle computing device is configured to determine a trajectory for the vehicle (e.g., trajectory 118 for the vehicle 102) based at least in part on the output.
- an output from the model component 830 can be sent to the perception component 822 or the planning component 824, just to name a few.
- a component of the vehicle computing device, such as the planning component 824, may control operation of the vehicle.
- the vehicle computing device may determine a vehicle trajectory based at least in part on the object trajectory(ies), thereby improving vehicle safety by planning for the possibility that the object may intersect with the vehicle in the future. Additional details of controlling a vehicle using one or more outputs from one or more models are discussed throughout the disclosure.
- FIG. 10 is a flowchart depicting an example process 1000 for training a codebook using an example training component. Some or all of the process 1000 may be performed by one or more components in FIG. 8, as described herein. For example, some or all of process 1000 may be performed by the training component 502, the training component 702, and/or the training component 848.
- the process may include receiving, by a training component, state data representing a previous state of an object in an environment.
- the training component 502 can receive vectorized data representing an object, a building, an intersection, a crosswalk, and the like (e.g., the vector representation 322).
- the vehicle computing device may be configured to receive object state data representing position data, orientation data, heading data, velocity data, speed data, acceleration data, yaw rate data, or turning rate data associated with the object at one or more previous times.
- map data can be received that represents fixed features of the environment including but not limited to crosswalks, traffic signals, school zones, and the like.
- the process may include receiving, by the training component, feature vectors representing a vehicle and the object in the environment.
- the training component 502 can receive vectorized data representing vehicle state data such as position data, orientation data, heading data, velocity data, speed data, acceleration data, yaw rate data, or turning rate data associated with the vehicle 102.
- the process may include training a codebook as a trained codebook, the training comprising, at operation 1008, assigning, based at least in part on the state data, a first token to represent the previous state of the object.
- the training component 502 can be configured to assign, generate, or otherwise determine one or more tokens to represent different previous states of the object (e.g., a heading, a speed, a position, etc.).
- the training component 502 can also or instead determine tokens to represent states of the vehicle 102 based at least in part on the vehicle state data associated with a previous, current, or future time.
- the process may include mapping the feature vectors to respective tokens.
- the training component 502 can determine a list or table that associates a feature vector (discrete or continuous) with a token (discrete).
- the training component 502 can cause the codebook to store mapping data associated with a mapping between a feature vector and a token in a memory.
- the mapping data can be accessed by the codebook to identify feature vector(s) to represent each token in a token sequence or to identify a token to represent a feature vector.
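- One way the mapping between continuous feature vectors and discrete tokens could be computed is a nearest-neighbor (vector-quantization-style) assignment, shown in the minimal sketch below. The disclosure does not specify the assignment rule, and the function name and dimensions below are assumptions for illustration only.

```python
import numpy as np

def map_features_to_tokens(feature_vectors: np.ndarray, codebook: np.ndarray) -> dict[int, int]:
    """Assign each continuous feature vector to its nearest codebook entry (discrete token).

    feature_vectors: (N, D) vectors for object/vehicle states.
    codebook:        (K, D) token embeddings, K being a hyperparameter.
    Returns mapping data: feature-vector index -> token id.
    """
    # Pairwise squared distances between feature vectors and codebook entries.
    distances = ((feature_vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    nearest = distances.argmin(axis=1)
    return {i: int(token) for i, token in enumerate(nearest)}

rng = np.random.default_rng(0)
features = rng.normal(size=(10, 8)).astype(np.float32)   # e.g., object state features
codebook = rng.normal(size=(32, 8)).astype(np.float32)   # 32 tokens (hyperparameter)
mapping_data = map_features_to_tokens(features, codebook)
```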
- the process may include outputting the trained codebook for use by a machine learned model configured to access tokens from the trained codebook and to arrange the tokens to represent potential interactions between the vehicle and the object.
- the machine learned model 208 can access tokens from the codebook 506 and arrange the tokens in a sequence based at least in part on applying one or more algorithms (e.g., a self-attention algorithm, etc.).
- An output of the machine learned model 208 can be used by a vehicle computing device that is configured to control the autonomous vehicle (e.g., determine a trajectory, control a drive system, a brake system, and so on).
- the trained codebook can include a number of tokens determined based at least in part on a hyperparameter specifying a total number of tokens to include in the trained codebook.
- the training component 502 can determine a number of tokens to include in the codebook. For example, the training component 502 can determine a minimum number of tokens and/or a maximum number of tokens for including in the codebook. In some examples, the training component 502 can receive training data indicating accuracy of determinations using different numbers of tokens in a token sequence to accurately represent a scene or predict an object trajectory. In some examples, the training component 502 can output the trained codebook with consideration to a number of tokens, a length of a token sequence, and/or a size of a token to make efficient use of computational resources.
- FIG. 11 illustrates an example block diagram 1100 of an example computer architecture for implementing techniques to generate example output data, as described herein.
- the computing device (e.g., the vehicle computing device(s) 804 and/or the computing device(s) 834) can implement the diffusion model 1102, which can represent a machine learned model that implements a diffusion process to add and/or remove noise from an input.
- the diffusion model 1102 can incrementally denoise data to generate an output based on a conditional input.
- the diffusion model 1102 can denoise the input data 1104 representing random latent variables associated with an object, map data associated with an environment, just to name a few.
- the diffusion model 1102 can output the output data 1108 representing discrete latent variable data describing a state or intent of an object.
- the diffusion model 1102 can determine the output data 1108 based at least in part on the input data 1104 and the condition data 1106.
- the diffusion model 1102 can condition the input data 1104 based at least in part on one or more of: token information from a transformer model, node information from a GNN, scene information or other historical data.
- Token information can represent one or more tokens associated with objects in an environment including, in some examples, a token for an autonomous vehicle, a token to represent scene conditions, etc.
- Node information can include a node of a graph network associated with an object. Nodes or tokens of different objects can be used to condition the diffusion model 1102 so that the output data 1108 represents different object states (e.g., a position, a trajectory, an orientation, and the like).
- the diffusion model 1102 can represent functionality performed by (or otherwise included as part of) the machine learned model 208 of FIG. 2.
- the variable autoencoder 1110 comprises an encoder 1112 and a decoder 1114, which can be trained to perform a variety of functionality as described herein.
- the encoder 1112 can include the functionality performed by the encoder 304 of FIG. 3, and the decoder 1114 can include the functionality of the decoder 318.
- the encoder 1112 can be trained with ground truth object trajectories to output an embedding per object in latent space.
- the diffusion model 1102 can be trained using the output from the encoder 1112 (e.g., an embedding per object) to generate discrete object states (e.g., a position, a trajectory, an orientation, etc.) usable as initial states for a simulation with an autonomous vehicle.
- the encoder 1112 and/or the decoder 1114 can, for example, represent a machine learned model such as a GNN, RNN, CNN, multilayer perceptron (MLP), and the like.
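- The per-object encoding described above can be illustrated with a small sketch, assuming an MLP encoder that maps each object's trajectory history to a latent embedding; the layer sizes and state layout are assumptions for illustration only, not details from the disclosure.

```python
import torch
import torch.nn as nn

class ObjectEncoder(nn.Module):
    """Hypothetical MLP encoder: one latent embedding per object from its trajectory history."""

    def __init__(self, history_len: int = 10, state_dim: int = 4, latent_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history_len * state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, trajectories: torch.Tensor) -> torch.Tensor:
        # trajectories: (num_objects, history_len, state_dim) -> (num_objects, latent_dim)
        return self.net(trajectories.flatten(start_dim=1))

encoder = ObjectEncoder()
ground_truth = torch.randn(5, 10, 4)     # 5 objects, 10 past steps, (x, y, vx, vy) assumed
embeddings = encoder(ground_truth)       # (5, 32): one embedding per object in latent space
```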
- the variable autoencoder 1110 can perform the functionality associated with the prediction component 104 of FIG. 1.
- the decoder 1114 can be configured to predict object behavior (e.g., states, intents, etc.) for multiple objects in an environment with consideration to how the objects can potentially interact with one another.
- the decoder 1114 can be configured to determine one or more object trajectories for different objects in an environment at a future time.
- the decoder 1114 can output a scene 1116 as part of a scenario (also referred to herein as a simulation).
- the variable autoencoder 1110 can be included in a computing device configured to generate scene data for use in a simulation.
- the computing device can be included in a testing environment that generates diverse scenes (e.g., scenes having different object interactions) such that simulations over time capture a variety of different object trajectories and likelihoods for different object interactions (e.g., an object occupying a same area of an environment in the future).
- the computing device can be included in a robotic device such as an autonomous vehicle operating in a real-world environment, and an output by the decoder 1114 can be used to control operation of the autonomous vehicle.
- the variable autoencoder 1110 can be included in the vehicle computing device(s) 804 to predict potential object interactions between one or more objects and the vehicle 802.
- the decoder 1114 can detect an object 1118, an object 1120, and an object 1122, and classify the objects 1118 and 1120 as vehicles and the object 1122 as a pedestrian.
- the decoder 1114 can determine an object trajectory 1124 associated with the vehicle 1118, an object trajectory 1126 associated with the vehicle 1120, and an object trajectory 1128 associated with the pedestrian 1122.
- the object trajectories 1124, 1126, and 1128 can be used to determine potential interactions between the vehicle 1118, the vehicle 1120, and/or the object 1122.
- an output from the decoder 1114 can be sent to a component of the vehicle computing device(s) 804 to control the autonomous vehicle 802 in an environment (e.g., a real-world environment or a simulated environment).
- data associated with the object trajectories 1124, 1126, and 1128 can be sent to the planning component 824 for consideration in planning operations (e.g., determining a vehicle trajectory).
- the output from the decoder 1114 can be used for training or generating scene data for use in different scenarios.
- FIG. 12 illustrates another example block diagram 1200 of an example computer architecture for implementing techniques to generate example output data, as described herein.
- the diffusion model 1102 of FIG. 11 can provide the output data 1108 to the vehicle 102 for use by the prediction component 104 of FIG. 1.
- the prediction component 104 can use the output data 1108 to determine the vehicle trajectory 118 for the vehicle 102 to safely navigate relative to the object 110.
- the diffusion model 1102 can receive the condition data 1106 from a machine learned model 1202.
- the machine learned model 1202 can include one or more self-attention layers for determining "attention" or a relation between a first object and a second object.
- the machine learned model 1202 can be a transformer (e.g., the machine learned model 314, the transformer model 606) or a GNN, but other machine learned model types are also contemplated.
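- As a sketch of how a self-attention layer relates one object to another, the following simplified example computes attention weights directly from object features. A single shared projection is assumed rather than separate query/key/value projections, which is a simplification of a typical transformer layer; the feature sizes are hypothetical.

```python
import torch
import torch.nn.functional as F

def self_attention(x: torch.Tensor) -> torch.Tensor:
    """Simplified self-attention over per-object features.

    x: (num_objects, dim). The softmax weights indicate how strongly each
    object attends to (relates to) every other object; the weighted sum
    yields relation-aware features usable as condition data.
    """
    d = x.shape[-1]
    scores = x @ x.transpose(0, 1) / d ** 0.5      # (num_objects, num_objects)
    weights = F.softmax(scores, dim=-1)            # attention between object pairs
    return weights @ x                             # relation-aware object features

objects = torch.randn(3, 16)        # hypothetical features: ego vehicle + two objects
condition_features = self_attention(objects)
```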
- FIG. 13 illustrates an example block diagram 1300 of an example diffusion model implemented by a computing device to generate latent variable data.
- a computing device (e.g., the vehicle computing device(s) 804) can implement the diffusion model 1102 to generate the condition data 1106 for use by a machine learned model such as the variable autoencoder 1110 of FIG. 11.
- the output data 1106 can be transmitted to the prediction model 104, the machine learned model 208, or another machine learned model.
- the diffusion model 1302 comprises latent space 1304 for performing various steps (also referred to as operations) including adding noise to input data during training (shown as part of the “diffusion process” in FIG. 13) and/or removing noise from input data during non-training operations.
- the diffusion model 1302 can receive condition data 1306 for use during different diffusion steps to condition the input data, as discussed herein.
- the condition data 1306 can represent one or more of: a semantic label, text, an image, an object representation, an object behavior, a vehicle representation, historical information associated with an object and/or the vehicle, a scene label indicating a level of difficulty to associate with a simulation, an environment attribute, a control policy, or object interactions, just to name a few.
- the condition data 1306 can include a semantic label such as token information, node information, and the like.
- the condition data 1306 can include, for example, text or an image describing an object, a scene, and/or a vehicle.
- the condition data 1306 can be a representation and/or a behavior associated with one or more objects in an environment.
- the condition data 1306 may also or instead represent environmental attributes such as weather conditions, traffic laws, time of day, or data describing an object such as whether another vehicle is using a blinker or a pedestrian is looking towards the autonomous vehicle.
- the condition data 1306 represents one or more control policies that control a simulation (or object interactions thereof).
- the condition data 1306 can include specifying an object behavior, such as a level of aggression for a simulation that includes an autonomous vehicle.
- FIG. 13 depicts the variable autoencoder 1110 associated with pixel space 1308 that includes an encoder 1310 and a decoder 1312.
- the encoder 1310 can be configured similar to the encoder 1112 and the decoder 1312 can be configured similar to the decoder 1114.
- the encoder 1310 and the decoder 1312 can represent an RNN or a multilayer perceptron (MLP).
- the encoder 1310 can receive an input (x) 1314 (e.g., an object trajectory, map data, object state data, or other input data), and output embedded information Z in the latent space 1304.
- the embedded information Z can include a feature vector for each object to represent a trajectory, a pose, an attribute, a past trajectory, etc.
- the input (x) 1314 can represent a top-down representation of an environment including a number of objects (e.g., a number which can be determined by the condition data 1306).
- the input (x) 1314 can represent the input data 302 of FIG. 3.
- the "diffusion process" can include applying an algorithm that adds noise to the embedded information Z to output a noisy latent embedding Z(T).
- the noisy latent embedding Z(T) (e.g., a representation of the input (x) 1314) can be input into a de-noising neural network 1316.
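- The disclosure does not specify the noise schedule; one standard (DDPM-style) way to realize the forward noising of the embedded information Z into the noisy latent embedding Z(T) would be:

```latex
% Hedged sketch: a standard DDPM-style forward process is one way to realize the
% noising of Z into Z(T); the schedule \bar{\alpha}_t is an assumption, not given in the text.
q\!\left(Z_t \mid Z_0\right) = \mathcal{N}\!\left(Z_t;\ \sqrt{\bar{\alpha}_t}\, Z_0,\ \left(1 - \bar{\alpha}_t\right)\mathbf{I}\right),
\qquad Z_T \sim \mathcal{N}\!\left(\mathbf{0}, \mathbf{I}\right) \text{ (approximately) for large } T.
```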
- the diffusion model 1302 can initialize the noisy latent embedding Z(T) with random noise, and the de-noising neural network 1316 (e.g., a CNN, a GNN, etc.) can apply one or more algorithms to determine an object intent based on applying different noise for different passes, or steps, to generate latent variable data that represents an object intent in the future. In some examples, multiple objects and object intents can be considered during denoising operations.
- input to the de-noising neural network 1316 can include a graph of nodes in which at least some nodes represent respective objects.
- the input data can be generated with random features for each object, and the de-noising neural network 1316 can perform graph message passing operations for one or more diffusion steps.
- the de-noising neural network 1316 can determine an object intent (e.g., a position, a trajectory, an orientation, etc.) for an object with consideration to the intent of other objects.
- the condition data 1306 can be used by the diffusion model 1302 in a variety of ways including being concatenated with the noisy latent embedding Z(T) as input into the de-noising neural network 1316.
- the condition data 1306 can be input during a de-noising step 1318 applied to an output of the de-noising neural network 1316.
- the de-noising step 1318 represents steps to apply the condition data 1306 over time to generate the embedded information Z which can be output to the decoder 1312 for use as initial states in a simulation that determines an output 1320 representative of an object trajectory, or another predicted object state(s).
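- A hedged sketch of the reverse (de-noising) pass described above: starting from a randomly initialized Z(T), the condition data is concatenated with the latent at each step and the de-noising network predicts a progressively cleaner latent. The simple iterative update, the MLP stand-in for the de-noising network, and all dimensions are assumptions rather than details taken from the disclosure.

```python
import torch
import torch.nn as nn

def sample_latents(denoiser: nn.Module, cond: torch.Tensor,
                   num_objects: int, latent_dim: int, num_steps: int = 50) -> torch.Tensor:
    """Sketch of the reverse pass: start from random Z(T), concatenate the condition
    data at every step, and let the network predict a cleaner latent each pass."""
    z = torch.randn(num_objects, latent_dim)          # Z(T): random initialization
    for _ in range(num_steps):
        step_input = torch.cat([z, cond], dim=-1)     # conditioning via concatenation
        z = denoiser(step_input)                      # predict a less-noisy latent
    return z                                          # Z: embeddings passed to the decoder

# Hypothetical MLP stand-in for the de-noising neural network (a CNN/GNN in the text).
latent_dim, cond_dim, num_objects = 32, 16, 4
denoiser = nn.Sequential(nn.Linear(latent_dim + cond_dim, 64), nn.ReLU(),
                         nn.Linear(64, latent_dim))
condition = torch.randn(num_objects, cond_dim)        # e.g., token or node embeddings
latents = sample_latents(denoiser, condition, num_objects, latent_dim)
```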
- a training component (not shown) can train the diffusion model 1302 based at least in part on a computed loss for the decoder 1312 (e.g., the ability for the decoder to produce an output that is similar to the input to the encoder). That is, the diffusion model can improve predictions over time based on being trained at least in part on a loss associated with the decoder 1312. Similarly, the decoder 1312 can be trained based at least in part on a loss associated with the diffusion model 1302.
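- A minimal sketch of the joint training signal described above, assuming a mean-squared de-noising loss for the diffusion model and a mean-squared reconstruction loss for the decoder; the equal weighting of the two terms is an assumption, as the disclosure does not give an explicit loss formulation.

```python
import torch
import torch.nn.functional as F

def combined_loss(predicted_noise: torch.Tensor, true_noise: torch.Tensor,
                  decoded_trajectory: torch.Tensor, ground_truth: torch.Tensor) -> torch.Tensor:
    diffusion_loss = F.mse_loss(predicted_noise, true_noise)            # trains the de-noising network
    reconstruction_loss = F.mse_loss(decoded_trajectory, ground_truth)  # trains the decoder
    return diffusion_loss + reconstruction_loss                         # equal weighting assumed

loss = combined_loss(torch.randn(4, 32), torch.randn(4, 32),
                     torch.randn(4, 10, 2), torch.randn(4, 10, 2))
```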
- FIG. 14 is a flowchart depicting an example process 1400 for determining discrete latent variable data using an example diffusion model.
- Some or all of the process 1400 may be performed by one or more components in FIG. 8, as described herein.
- some or all of process 1400 may be performed by the diffusion model 1102.
- the process may include receiving, by a diffusion model, map data representing an environment.
- a computing device can implement the diffusion model 1102 to receive map data associated with a real-world environment or a simulated environment.
- the vehicle computing device(s) 804 can receive map data from the map(s) 828 and/or from a remote computing device (the computing device(s) 834).
- the process may include receiving, by the diffusion model, condition data representing a behavior of an object in the environment.
- the diffusion model 1102 can receive the condition data 1106 or the condition data 1306 from a machine learned model (e.g., a transformer, GNN).
- the condition data 1306 can represent a level of aggression for one or more objects relative to an autonomous vehicle during a simulation.
- the condition data 1306 can indicate a behavior for an object to exhibit during a simulation (e.g., how reactive the object should be relative to an action by the autonomous vehicle).
- the process may include determining, by the diffusion model and based at least in part on the map data and the condition data, discrete latent variable data associated with the object.
- the diffusion model 1102 can generate the output data 1108 which includes discrete latent variable data for the object.
- the discrete latent variable data can represent an attribute and/or a state of the object, scene information, and the like.
- the process may include inputting the discrete latent variable data into a machine learned model.
- the output data 1108 can be transmitted to the variable autoencoder 1110, the machine learned model 208, or another machine learned model.
- the discrete latent variable data can be used as initial state data for operations performed by the machine learned model.
- the process may include generating, by the machine learned model and based at least in part on the discrete latent variable data, a simulated environment that includes an object trajectory for the object.
- the variable autoencoder 1110 can perform a simulation to determine an object trajectory for the object to follow in the future.
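- As an illustrative sketch of the decoding step, a small decoder could roll each denoised per-object latent out into a short future trajectory usable as an object trajectory in the simulation; the decoder architecture, horizon, and dimensions below are hypothetical.

```python
import torch
import torch.nn as nn

class TrajectoryDecoder(nn.Module):
    """Hypothetical decoder: rolls a per-object latent out into a short future trajectory."""

    def __init__(self, latent_dim: int = 32, horizon: int = 8):
        super().__init__()
        self.horizon = horizon
        self.net = nn.Linear(latent_dim, horizon * 2)   # (x, y) per future step

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        # latents: (num_objects, latent_dim) -> (num_objects, horizon, 2)
        return self.net(latents).view(-1, self.horizon, 2)

decoder = TrajectoryDecoder()
object_latents = torch.randn(4, 32)       # e.g., denoised per-object latent variable data
trajectories = decoder(object_latents)    # candidate object trajectories for the simulation
```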
- the machine learned model can be included in a vehicle computing device associated with a vehicle such as the vehicle computing device(s) 804 associated with the vehicle 802.
- the process may include causing a vehicle to be controlled in a real-world environment based at least in part on the object trajectory generated in the simulated environment.
- the vehicle computing device(s) 804 can receive data output by the machine learned model and determine a vehicle trajectory for the vehicle 802 to follow in a real-world environment.
- the machine learned model can represent the prediction model 104 of FIG. 1 which is configured to determine the vehicle trajectory 118 for the vehicle 102.
- FIGS. 9, 10, and 14 illustrate example processes in accordance with examples of the disclosure. These processes are illustrated as logical flow graphs, each operation of which represents a sequence of operations that can be implemented in hardware, software, or a combination thereof.
- the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations.
- computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types.
- the order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be omitted or combined in any order and/or in parallel to implement the processes.
- one or more operations of the method may be omitted entirely.
- operations 904, 906, 908, and 910 may be performed without operations 902 and 912 and only one of operations 1008 or 1010 may be performed in relation to operation 1006.
- the methods described herein can be combined in whole or in part with each other or with other methods.
- program modules include routines, programs, objects, components, data structures, etc., and define operating logic for performing particular tasks or implement particular abstract data types.
- software may be stored and distributed in various ways and using different means, and the particular software storage and execution configurations described above may be varied in many different ways.
- software implementing the techniques described above may be distributed on various types of computer-readable media, not limited to the forms of memory that are specifically described.
- a system comprising: one or more processors; and one or more non-transitory computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the one or more processors to perform actions comprising: receiving, by a transformer model, a request to generate a simulated environment that includes a vehicle and an object; accessing, by the transformer model and based at least in part on the request, tokens from a codebook, at least one token in the codebook representing a behavior of the object; arranging, by the transformer model, the tokens into a sequence of tokens; inputting the sequence of tokens into a machine learned model; generating, by the machine learned model, the simulated environment that includes an object trajectory for the object; and causing the vehicle to be controlled in a real-world environment based at least in part on the object trajectory.
- the first token represents one of: a yield action, a drive straight action, a left turn action, a right turn action, a brake action, an acceleration action, a steering action, or a lane change action
- the second token represents a position, a heading, or an acceleration of the object.
- a method comprising: receiving sensor data from a sensor associated with a vehicle; receiving, by a first machine learned model, a set of tokens from a codebook, at least one token in the codebook representing a potential characteristic of an object in an environment, wherein the potential characteristic of the object represents a state or an action; inputting the set of tokens into a second machine learned model; determining, by the second machine learned model and based at least in part on the set of tokens, an object trajectory for the object to follow in the environment; and causing the vehicle to be controlled in the environment based at least in part on the object trajectory.
- G The method of paragraph F, wherein: an order of the set of tokens is based at least in part on an autoregressive algorithm; the first machine learned model is a transformer model, and the second machine learned model comprises at least one of a Generative Adversarial Network (GAN), a Graph Neural Network (GNN), a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), or another transformer model.
- the first machine learned model comprises a transformer model
- the transformer model is trained based at least in part on a set of conditions, at least one condition of the set of conditions comprising a previous action, a previous position, or a previous acceleration of the object.
- the first token represents one of: a yield action, a drive straight action, a left turn action, a right turn action, a brake action, an acceleration action, a steering action, or a lane change action
- the second token represents a position, a heading, or an acceleration of the object.
- M The method of any of paragraphs F-L, wherein the set of tokens represent discrete latent variables, and further comprising: mapping the discrete latent variables associated with the set of tokens to continuous variables associated with feature vectors, wherein inputting the set of tokens into the second machine learned model comprises inputting the continuous variables associated with the feature vectors, and determining, by the second machine learned model, the object trajectory is based at least in part on the continuous variables associated with the feature vectors.
- N The method of any of paragraphs F-M, wherein: the first machine learned model implements an autoregressive algorithm to sample the tokens from the codebook, the first machine learned model comprises one or more self-attention layers configured to determine scores for at least some of the tokens from the codebook, a first score indicating a dependency between a first token and a second token and a second score indicating a dependency between a third token and one of: the first token, the second token, or a fourth token, and determining the set of tokens is further based at least in part on the one or more self-attention layers determining the scores.
- P The method of any of paragraphs F-O, wherein: the environment is a simulated environment, another token in the codebook represents a feature of the simulated environment, and further comprising: determining, by the second machine learned model and based at least in part on the set of tokens, scene data for testing or verifying a scenario in the simulated environment.
- Q One or more non-transitory computer-readable media storing instructions that, when executed, cause one or more processors to perform actions comprising: receiving sensor data from a sensor associated with a vehicle; receiving, by a first machine learned model, a set of tokens from a codebook, at least one token in the codebook representing a potential characteristic of an object in an environment, wherein the potential characteristic of the object represents a state or an action; inputting the set of tokens into a second machine learned model; determining, by the second machine learned model and based at least in part on the set of tokens, an object trajectory for the object to follow in the environment; and causing the vehicle to be controlled in the environment based at least in part on the object trajectory.
- R The one or more non-transitory computer-readable media of paragraph Q, wherein: an order of the set of tokens is based at least in part on an autoregressive algorithm; the first machine learned model is a transformer model, and the second machine learned model comprises at least one of a Generative Adversarial Network (GAN), a Graph Neural Network (GNN), a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), or another transformer model.
- S The one or more non-transitory computer-readable media of paragraph Q or R, wherein: the first machine learned model comprises a transformer model, and the transformer model is trained based at least in part on a set of conditions, at least one condition of the set of conditions comprising a previous action, a previous position, or a previous acceleration of the object.
- T The one or more non-transitory computer-readable media of any of paragraphs Q-S, wherein: the tokens in the codebook represent discrete latent variables.
- a system comprising: one or more processors; and one or more non-transitory computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the one or more processors to perform actions comprising: receiving, by a training component, state data representing a previous state of an object in an environment; receiving, by the training component, feature vectors representing a vehicle and the object in the environment; training a codebook as a trained codebook, the training comprising: assigning, based at least in part on the state data, a first token to represent the previous state of the object; and mapping the feature vectors to respective tokens; and outputting the trained codebook for use by a machine learned model configured to access tokens from the trained codebook and to arrange the tokens to represent potential interactions between the vehicle and the object, wherein the trained codebook includes a determined number of tokens.
- V The system of paragraph U, wherein the machine learned model is a transformer model, and the actions further comprising: causing the transformer model to arrange the tokens based at least in part on the number of tokens and an order of the tokens.
- W The system of paragraph U or V, wherein: the feature vectors represent continuous variables, the tokens represent discrete latent variables, and mapping the feature vectors to respective tokens comprises mapping the continuous variables associated with the feature vectors to discrete latent variables associated with the tokens.
- X The system of any of paragraphs U-W, wherein training the codebook further comprises: assigning a second token to represent a characteristic of the vehicle; and assigning a third token to represent a feature of the environment; wherein the machine learned model represents the potential interactions between the vehicle and the object based at least in part on the first token, the second token, and the third token.
- Y The system of any of paragraphs U-X, the actions further comprising: receiving, by the training component, environmental data representing a feature of the environment, wherein training the codebook further comprises assigning a second token to represent the feature of the environment, and the codebook is usable by the machine learned model to cause generation of a scene with the feature in the environment.
- a method comprising: receiving, by a training component, state data representing a previous state of an object or a vehicle in an environment; receiving, by the training component, feature vectors representing the vehicle and the object in the environment; training a codebook to output a trained codebook, the training comprising: assigning, based at least in part on the state data, a first token to represent the previous state of the object; and mapping the feature vectors to respective tokens; and outputting the trained codebook for use by a machine learned model configured to access tokens from the trained codebook and to arrange the tokens to represent potential interactions between the vehicle and the object.
- AA The method of paragraph Z, further comprising: determining a number of tokens to include in the codebook; and determining an order of the tokens.
- AB The method of paragraph Z or AA, further comprising: causing the machine learned model to arrange the tokens based at least in part on a number of tokens and an order of the tokens.
- AC The method of any of paragraphs Z-AB, wherein: the feature vectors represent continuous variables, the tokens represent discrete latent variables, and mapping the feature vectors to respective tokens comprises mapping the continuous variables associated with the feature vectors to discrete latent variables associated with the tokens.
- AD The method of any of paragraphs Z-AC, wherein training the codebook further comprises: assigning a second token to represent a characteristic of the vehicle; and assigning a third token to represent a feature of the environment; wherein the machine learned model represents the potential interactions between the vehicle and the object based at least in part on the first token, the second token, and the third token.
- AE The method of any of paragraphs Z-AD, further comprising: receiving, by the training component, environmental data representing a feature of the environment, wherein training the codebook further comprises assigning a second token to represent the feature of the environment, and the codebook is usable by the machine learned model to cause generation of a scene with the feature in the environment.
- AF The method of any of paragraphs Z-AE, wherein the state data representing the previous state of the object or the vehicle comprises one or more of: position data, orientation data, heading data, velocity data, speed data, acceleration data, yaw rate data, or turning rate data.
- AH The method of any of paragraphs Z-AG, wherein: the feature vectors represent a characteristic of the vehicle and a characteristic of the object in the environment.
- AI The method of any of paragraphs Z-AH, wherein mapping the feature vectors to respective tokens comprises: identifying an action or a state of the vehicle associated with a first feature vector of the feature vectors; and comparing a characteristic of the vehicle associated with the first feature vector to a characteristic associated with one or more of the tokens.
- AJ One or more non-transitory computer-readable media storing instructions that, when executed, cause one or more processors to perform actions comprising: receiving, by a training component, state data representing a previous state of an object or a vehicle in an environment; receiving, by the training component, feature vectors representing the vehicle and the object in the environment; training a codebook to output a trained codebook, the training comprising: assigning, based at least in part on the state data, a first token to represent the previous state of the object; mapping the feature vectors to respective tokens; and outputting the trained codebook for use by a machine learned model configured to access tokens from the trained codebook and to arrange the tokens to represent potential interactions between the vehicle and the object.
- AK The one or more non-transitory computer-readable media of paragraph AJ, wherein the machine learned model is a first machine learned model, and the actions further comprising: causing the machine learned model to arrange the tokens based at least in part on a number of tokens and an order of the tokens.
- AL The one or more non-transitory computer-readable media of paragraph AJ or AK, wherein: the feature vectors represent continuous variables, the tokens represent discrete latent variables, and mapping the feature vectors to respective tokens comprises mapping the continuous variables associated with the feature vectors to discrete latent variables associated with the tokens.
- AM The one or more non-transitory computer-readable media of any of paragraphs AJ-AL, wherein training the codebook further comprises: assigning a second token to represent a characteristic of the vehicle; and assigning a third token to represent a feature of the environment; wherein the machine learned model represents the potential interactions between the vehicle and the object based at least in part on the first token, the second token, and the third token.
- AN The one or more non-transitory computer-readable media of any of paragraphs AJ-AM, the actions further comprising: receiving, by the training component, environmental data representing a feature of the environment, wherein training the codebook further comprises assigning a second token to represent the feature of the environment, and the codebook is usable by the machine learned model to cause generation of a scene with the feature in the environment.
- a system comprising: one or more processors; and one or more non-transitory computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the one or more processors to perform actions comprising: receiving, by a diffusion model, map data representing an environment; receiving, by the diffusion model, condition data representing a behavior of an object in the environment; determining, by the diffusion model and based at least in part on the map data and the condition data, discrete latent variable data associated with the object; inputting the discrete latent variable data into a machine learned model; generating, by the machine learned model and based at least in part on the discrete latent variable data, a simulated environment that includes an object trajectory for the object; and causing a vehicle to be controlled in a real-world environment based at least in part on the object trajectory generated in the simulated environment.
- C The system of paragraph A or B, wherein: the condition data comprises a token from a codebook associated with a transformer model, and the token representing the behavior of the object.
- D The system of any of paragraphs A-C, wherein: the condition data comprises a node from a Graph Neural Network, and the node representing the behavior of the object.
- condition data represents one of: a yield action, a drive straight action, a left turn action, a right turn action, a brake action, an acceleration action, a steering action, a lane change action, a position, a heading, or an acceleration of the object.
- a method comprising: receiving, by a diffusion model, map data representing an environment; receiving, by the diffusion model, condition data representing a state or an action of an object in the environment; determining, by the diffusion model and based at least in part on the map data and the condition data, latent variable data associated with the object; inputting the latent variable data into a machine learned model; determining, by the machine learned model and based at least in part on the latent variable data, an object trajectory for the object to follow in the environment; and causing a vehicle to be controlled in the environment based at least in part on the object trajectory.
- the machine learned model comprises at least one of a Generative Adversarial Network (GAN), a Graph Neural Network (GNN), a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), or a transformer model.
- H The method of paragraph F or G, wherein: the diffusion model is trained based at least in part on a set of conditions, at least one condition of the set of conditions comprising a previous action, a previous position, a previous trajectory, or a previous acceleration of the object.
- condition data comprises a token from a codebook associated with a transformer model, and the token representing a behavior of the object.
- condition data comprises a node from a Graph Neural Network, and the node representing a behavior of the object.
- K The method of any of paragraphs F-J, wherein: the condition data represents one of: a yield action, a drive straight action, a left turn action, a right turn action, a brake action, an acceleration action, a steering action, a lane change action, a position, a heading, or an acceleration of the object.
- L The method of any of paragraphs F-K, wherein: the diffusion model is trained by incrementally denoising data to generate an output based on a conditional input.
- M The method of any of paragraphs F-L, further comprising: mapping the latent variable data to continuous variables associated with feature vectors, wherein inputting the latent variable data into the second machine learned model comprises inputting the continuous variables associated with the feature vectors, and determining, by the machine learned model, the object trajectory is based at least in part on the continuous variables associated with the feature vectors.
- N The method of any of paragraphs F-M, wherein: the diffusion model comprises one or more self-attention layers, and determining the latent variable data is further based at least in part on output data associated with the one or more self-attention layers.
- P The method of any of paragraphs F-O, wherein: the environment is a simulated environment, the condition data represents a feature of the simulated environment, and further comprising: determining, by the machine learned model and based at least in part on the feature of the simulated environment, scene data for testing or verifying a scenario in the simulated environment.
- Q One or more non-transitory computer-readable media storing instructions that, when executed, cause one or more processors to perform actions comprising: receiving, by a first machine learned model, map data representing an environment; receiving, by the first machine learned model, condition data representing a state or an action of an object in the environment; determining, by the first machine learned model and based at least in part on the map data and the condition data, latent variable data associated with the object; inputting the latent variable data into a second machine learned model; determining, by the second machine learned model and based at least in part on the latent variable data, an object trajectory for the object to follow in the environment; and causing a vehicle to be controlled in the environment based at least in part on the object trajectory.
- R The one or more non-transitory computer-readable media of paragraph Q, wherein: the first machine learned model is a diffusion model, and the second machine learned model comprises at least one of a Generative Adversarial Network (GAN), a Graph Neural Network (GNN), a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), or a transformer model.
- S The one or more non-transitory computer-readable media of paragraph Q or R, wherein: the first machine learned model comprises a diffusion model, and the diffusion model is trained based at least in part on a set of conditions, at least one condition of the set of conditions comprising a previous action, a previous position, a previous trajectory, or a previous acceleration of the object.
- T The one or more non-transitory computer-readable media of any of paragraphs Q-S, wherein: the object is a first object, the condition data represents a potential action of a second object in the environment, and the latent variable data represents a potential interaction between the second object and the first object.
Abstract
Techniques for predicting an object trajectory or scene information are described. For example, the techniques may include inputting latent variable data into a machine learned model. The machine learned model may output an object trajectory (e.g., position data, velocity data, acceleration data, etc.) for one or more objects in the environment based on the latent variable data. The object trajectory may be sent to a vehicle computing device for consideration during vehicle planning, which may include a simulation.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US 18/087,540 | 2022-12-22 | | |
| US 18/087,540 (US20240101157A1) | 2022-06-30 | 2022-12-22 | Latent variable determination by a diffusion model |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024137504A1 (fr) | 2024-06-27 |
Family
ID=91589948
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2023/084627 (WO2024137504A1, fr) | Détermination de variable latente par modèle de diffusion | 2022-12-22 | 2023-12-18 |

Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024137504A1 (fr) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190147610A1 (en) * | 2017-11-15 | 2019-05-16 | Uber Technologies, Inc. | End-to-End Tracking of Objects |
US20200283016A1 (en) * | 2019-03-06 | 2020-09-10 | Robert Bosch Gmbh | Movement prediction of pedestrians useful for autonomous driving |
US20210114617A1 (en) * | 2019-10-18 | 2021-04-22 | Uatc, Llc | Method for Using Lateral Motion to Optimize Trajectories for Autonomous Vehicles |
US20210347382A1 (en) * | 2020-05-11 | 2021-11-11 | Zoox, Inc. | Unstructured vehicle path planner |
CN113936243A (zh) * | 2021-12-16 | 2022-01-14 | 之江实验室 | 一种离散表征的视频行为识别系统及方法 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23908262; Country of ref document: EP; Kind code of ref document: A1 |