CN116888578A - Performance testing for mobile robot trajectory planner - Google Patents

Performance testing for mobile robot trajectory planner

Info

Publication number
CN116888578A
Authority
CN
China
Prior art keywords
scene
rule
performance evaluation
activation condition
rules
Prior art date
Legal status
Pending
Application number
CN202280014919.9A
Other languages
Chinese (zh)
Inventor
伊恩·怀特赛德
约翰·雷德福德
戴维·海曼
康斯坦丁·韦列捷尼科夫
Current Assignee
Faber Artificial Intelligence Co ltd
Original Assignee
Faber Artificial Intelligence Co ltd
Priority date
Filing date
Publication date
Application filed by Faber Artificial Intelligence Co ltd
Priority claimed from PCT/EP2022/053413 (published as WO2022171819A1)
Publication of CN116888578A


Landscapes

  • Traffic Control Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A computer-implemented method of evaluating the performance of a trajectory planner of a mobile robot in a real or simulated scene comprises receiving a scene ground truth of the scene, the scene ground truth generated using the trajectory planner to control an autonomous agent of the scene in response to at least one scene element in the scene. One or more performance evaluation rules for the scene, and at least one activation condition for each performance evaluation rule, are received. A test oracle processes the scene ground truth to determine whether the activation condition of each performance evaluation rule is satisfied over multiple time steps of the scene. Each performance evaluation rule is evaluated by the test oracle, to provide at least one test result, only when its activation condition is satisfied.

Description

Performance testing for mobile robot trajectory planner
Technical Field
The present disclosure relates to methods for evaluating the performance of a trajectory planner in a real or simulated scenario, and to computer programs and systems for implementing such methods. Such a planner is capable of autonomously planning trajectories for a fully or semi-autonomous vehicle or other form of mobile robot. Example applications include ADS (Autonomous Driving System) and ADAS (Advanced Driver Assistance System) performance testing.
Background
The field of autonomous vehicles has evolved significantly and rapidly. An autonomous vehicle (AV) is a vehicle equipped with sensors and control systems that enable it to operate without a human controlling its behavior. An autonomous vehicle is equipped with sensors that enable it to perceive its physical environment, such sensors including, for example, cameras, radar and lidar. Autonomous vehicles are equipped with suitably programmed computers capable of processing the data received from the sensors and making safe and predictable decisions based on the environment perceived by the sensors. An autonomous vehicle may be fully autonomous (designed to operate without human supervision or intervention, at least in some circumstances) or semi-autonomous. Semi-autonomous systems require varying levels of human supervision and intervention; such systems include advanced driver assistance systems and Level 3 autonomous driving systems. There are different facets to testing the behavior of the sensors and control systems aboard a particular autonomous vehicle, or a type of autonomous vehicle.
A "Level 5" vehicle is one that can operate entirely autonomously in any circumstances, because it is always guaranteed to meet some minimum level of safety. Such a vehicle would not require manual controls (steering wheel, pedals, etc.) at all.
By contrast, Level 3 and Level 4 vehicles can operate fully autonomously, but only in certain defined circumstances (e.g., within geofenced areas). A Level 3 vehicle must be equipped to autonomously handle any situation that requires an immediate response (such as emergency braking); however, a change in circumstances may trigger a "transition demand", requiring the driver to take control of the vehicle within some limited timeframe. A Level 4 vehicle has similar limitations; however, in the event that the driver does not respond within the required timeframe, a Level 4 vehicle must also be capable of autonomously implementing a "minimum risk maneuver" (MRM), i.e., some appropriate action that brings the vehicle to a safe condition (e.g., slowing down and stopping). A Level 2 vehicle requires the driver to be ready to intervene at any time, and it is the responsibility of the driver to intervene if the autonomous driving system fails to respond properly at any time. With Level 2 automation, it is the responsibility of the driver to determine when their intervention is needed; for Level 3 and Level 4, this responsibility shifts to the vehicle's autonomous driving system, which must alert the driver when intervention is required.
As the level of autonomy increases and more responsibility is transferred from human to machine, safety becomes an increasingly pressing challenge. In autonomous driving, the importance of guaranteed safety has been recognized. Guaranteeing safety does not necessarily mean zero accidents, but rather means guaranteeing that some minimum level of safety is met in defined circumstances. It is generally assumed that this minimum safety level must significantly exceed that of human drivers for autonomous driving to be viable.
According to Shalev-Shwartz et al., "On a Formal Model of Safe and Scalable Self-driving Cars" (2017), arXiv:1708.06374 (the RSS paper), the entire contents of which are incorporated herein by reference, human driving is estimated to cause serious accidents at a rate of around 10⁻⁶ per hour. On the assumption that an autonomous driving system will need to reduce this by at least three orders of magnitude, the RSS paper concludes that a minimum safety level of the order of 10⁻⁹ serious accidents per hour needs to be guaranteed, noting that a purely data-driven approach would therefore require vast quantities of driving data to be collected every time a change is made to the software or hardware of the AV system.
The RSS paper provides a model-based approach to guaranteed safety. The rule-based Responsibility-Sensitive Safety (RSS) model is constructed by formalizing a small number of "common sense" driving rules:
"1. Do not hit someone from behind.
2. Do not cut in recklessly.
3. Right of way is given, not taken.
4. Be careful of areas with limited visibility.
5. If you can avoid an accident without causing another one, you must do it."
The RSS model is presented as provably safe, in the sense that, if all agents were to adhere to the rules of the RSS model at all times, no accidents would occur. The aim is to reduce, by several orders of magnitude, the amount of driving data that needs to be collected in order to demonstrate the required level of safety.
A safety model (such as RSS) can be used as a basis for evaluating the quality of trajectories realized by an autonomous agent in a real or simulated scenario under the control of an autonomous driving system (stack). The stack is tested by exposing it to different scenarios, and evaluating whether the resulting autonomous trajectories comply with the rules of the safety model (rule-based testing). A rule-based testing approach can also be applied to other facets of performance, such as comfort or progress towards a given goal.
Disclosure of Invention
According to a first aspect herein, a computer-implemented method of evaluating the performance of a trajectory planner of a mobile robot in a real or simulated scene comprises: receiving a scene ground truth of the scene, the scene ground truth generated using the trajectory planner to control an autonomous agent of the scene in response to at least one scene element of the scene; receiving one or more performance evaluation rules for the scene and at least one activation condition for each performance evaluation rule; and processing, by a test oracle, the scene ground truth to determine whether the activation condition of each performance evaluation rule is satisfied over multiple time steps of the scene. Each performance evaluation rule is evaluated by the test oracle, to provide at least one test result, only when its activation condition is satisfied.
In the context of pass/fail rules, this provides a third "not applicable" result that a rule can take in a given time step. Evaluating potentially complex rules over many time steps and many scenes can require significant computational resources, particularly when evaluating large volumes of scene data (typically generated in simulation, or in a combination of simulation and real-world testing). By "deactivating" rules based on simpler activation conditions (which are cheaper to evaluate than the rules themselves), significant resources can be saved without compromising the final results. In fact, the quality of the results may be improved, because a "not applicable" (inactive) result is generally more informative: it distinguishes between circumstances in which a rule applies and passes or fails, and circumstances in which the rule does not naturally apply. For example, in a junction scenario, a rule may be defined in terms of various distance thresholds with respect to multiple other agents on a road that the autonomous agent wishes to join, the rule being activated only once the autonomous agent crosses the road boundary. If the rule were active at all times (i.e., with no distinction between "pass" and "not applicable"), this would not only incur higher evaluation costs while the autonomous agent waits at the junction, but the results for that period would also convey little information.
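Purely by way of illustration, and not as part of the described implementation, the gating of a pass/fail rule by a cheaper activation condition can be sketched in Python roughly as follows (the names scenario_ground_truth, activation_condition and rule are hypothetical placeholders):

    from enum import Enum

    class Result(Enum):
        NOT_APPLICABLE = 0   # activation condition not met; rule not evaluated
        PASS = 1
        FAIL = 2

    def evaluate_rule(scenario_ground_truth, activation_condition, rule):
        # One result per time step; the (potentially expensive) rule is only
        # evaluated at time steps where the cheaper activation condition holds.
        results = []
        for step in scenario_ground_truth:
            if not activation_condition(step):
                results.append(Result.NOT_APPLICABLE)
            elif rule(step):
                results.append(Result.PASS)
            else:
                results.append(Result.FAIL)
        return results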
In an embodiment, the scene ground truth may be processed to determine, for each scene element of a set of multiple scene elements, whether the activation condition of each performance evaluation rule is satisfied over multiple time steps of the scene. Each performance evaluation rule may be evaluated only when at least one of the scene elements satisfies its activation condition, and only between the autonomous agent and the scene element(s) satisfying the activation condition.
In an embodiment, each performance evaluation rule may be encoded in a piece of rule creation code as a second logical predicate, and the activation condition of each performance evaluation rule may be encoded in the piece of rule creation code as a first logical predicate, wherein, at each time step, the test oracle evaluates the first logical predicate for each scene element, and evaluates the second logical predicate only between the autonomous agent and any scene element that satisfies the first logical predicate.
The test oracle may receive multiple performance evaluation rules having different respective activation conditions, and selectively evaluate the multiple performance evaluation rules based on their different respective activation conditions.
Each performance evaluation rule may be related to drivability.
The method may include presenting respective results for a plurality of time steps in the time series on a graphical user interface (graphical user interface, GUI), the results for each time step visually indicating one of at least three categories including: a first category when the activation condition is not satisfied, a second category when the activation condition is satisfied and the rule is passed, and a third category when the activation condition is satisfied and the rule is failed.
For example, the result may be presented as one of at least three different colors corresponding to at least three categories.
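As a purely illustrative sketch (the colour choices below are assumptions made for the example, not prescribed by the described GUI), one possible mapping of the three result categories to display colours would be:

    # Hypothetical colour scheme for the three result categories on the GUI.
    RESULT_COLOURS = {
        "not applicable": "grey",   # activation condition not satisfied
        "pass": "green",            # activation condition satisfied, rule passed
        "fail": "red",              # activation condition satisfied, rule failed
    }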
The activation condition of a first one of the performance evaluation rules may depend on the activation condition of at least a second one of the performance evaluation rules.
For example, a first performance evaluation rule (e.g., related to comfort) may be disabled when a second performance evaluation rule (e.g., related to safety) is active.
The scene element may include one or more other agents.
At least one of the performance evaluation rules may be evaluated selectively and pairwise between the autonomous agent and each of a set of scene elements in the scene, and its activation condition may be evaluated independently for each of the scene elements, to determine whether to evaluate the performance evaluation rule between the autonomous agent and that scene element at each time step.
The set of scene elements may be a set of other agents.
The activation condition may be evaluated for each scene element to compute an iterable containing an identifier of any scene element that satisfies the activation condition at each time step, and the performance evaluation rule may be evaluated, at each time step, by iterating over the iterable.
The performance evaluation rule may be defined as a computational graph applied to one or more signals extracted from the scene ground truth, wherein the iterable may be passed through the computational graph in order to evaluate the rule between the autonomous agent and any scene element that satisfies the activation condition.
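As an illustrative sketch only (the agent_id attribute and helper names below are assumptions rather than features of the described system), the iterable-based, pairwise evaluation described above might be expressed in Python as:

    def active_elements(step, scene_elements, activation_condition):
        # Identifiers of the scene elements (e.g. other agents) for which
        # the activation condition holds at this time step.
        return [e.agent_id for e in scene_elements if activation_condition(step, e)]

    def evaluate_pairwise_rule(scenario_ground_truth, scene_elements,
                               activation_condition, rule):
        results = {}
        for t, step in enumerate(scenario_ground_truth):
            ids = active_elements(step, scene_elements, activation_condition)
            # The rule itself is evaluated only between the autonomous agent
            # and the scene elements that are currently activated.
            results[t] = {agent_id: rule(step, agent_id) for agent_id in ids}
        return results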
Another aspect herein provides a computer-implemented method of evaluating the performance of a trajectory planner of a mobile robot in a real or simulated scene, the method comprising: receiving a scene ground truth of the scene, the scene ground truth generated using the trajectory planner to control an autonomous agent of the scene in response to one or more scene elements of the scene; receiving one or more performance evaluation rules for the scene and at least one activation condition for each performance evaluation rule; and processing, by a test oracle, the scene ground truth to determine, for each scene element, whether the activation condition of each performance evaluation rule is satisfied over multiple time steps of the scene; wherein each performance evaluation rule is evaluated by the test oracle, to provide at least one test result, only when its activation condition is satisfied for at least one of the scene elements, and only between the autonomous agent and the scene element(s) satisfying the activation condition.
A further aspect provides a computer system comprising one or more computers configured to implement the method of the first aspect or any embodiment thereof, and executable program instructions for programming the computer system to implement the method.
Drawings
For a better understanding of the present disclosure, and to show how embodiments of the disclosure may be carried into effect, reference is made, by way of example only, to the following drawings, in which:
FIG. 1A shows a schematic functional block diagram of an automatically driven vehicle stack;
FIG. 1B shows a schematic overview of an automated driving vehicle test paradigm;
FIG. 1C shows a schematic block diagram of a scene extraction pipeline;
FIG. 2 shows a schematic block diagram of a test pipeline;
FIG. 2A shows further details of a possible implementation of a test pipeline;
FIG. 3A illustrates an example of a rule tree evaluated within a test oracle;
FIG. 3B illustrates an example output of a node of a rule tree;
FIG. 4A illustrates an example of a rule tree to be evaluated within a test oracle;
FIG. 4B illustrates a second example of a rule tree evaluated on a scene ground truth dataset;
FIG. 4C shows how rules may be selectively applied within a test oracle;
FIG. 5 shows a schematic block diagram of a visualization component for presenting a graphical user interface;
FIGS. 5A, 5B, and 5C illustrate different views available within a graphical user interface;
FIG. 6A shows a first example of a cut-in scenario;
FIG. 6B illustrates an example oracle output for a first scenario instance;
FIG. 6C shows a second example of a cut-in scenario;
FIG. 6D illustrates an example oracle output for a second scenario instance;
FIG. 7 illustrates an example of rule creation code, in a domain-specific language, for defining rules to be applied by a test oracle; and
FIG. 8 illustrates another example of a GUI view for presenting the output of a custom rule tree.
Detailed Description
The described embodiments provide a test pipeline to facilitate rule-based testing of a mobile robot stack in a real or simulated scenario. Agent (actor) behavior in a real or simulated scenario is evaluated by a test oracle based on defined performance evaluation rules. Such rules may evaluate different facets of safety. For example, a safety rule set may be defined to assess the performance of the stack against a particular safety standard, regulation or safety model (such as RSS), or a bespoke rule set may be defined for testing any aspect of performance. The test pipeline is not limited to safety applications, and can be used to test any facet of performance, such as comfort or progress towards some given goal. A rule editor allows performance evaluation rules to be defined or modified and passed to the test oracle.
A "full" stack typically encompasses everything from the processing and interpretation of low-level sensor data (perception), feeding into prediction and planning, as well as the control logic that generates suitable control signals to implement planning-level decisions (e.g., to control braking, steering, acceleration, etc.). For an autonomous vehicle, a Level 3 stack includes some logic for implementing transition demands, and a Level 4 stack additionally includes some logic for implementing minimum risk maneuvers. The stack may also implement secondary control functions, such as signaling, headlights, windshield wipers, etc.
The term "stack" may also refer to an individual subsystem (sub-stack) of the full stack, such as a perception stack, a prediction stack, a planning stack or a control stack, which may be tested individually or in any desired combination. A stack may refer purely to software, i.e., one or more computer programs that execute on one or more general-purpose computer processors.
Whether real or simulated, a scenario requires an autonomous agent to navigate a real or modeled physical environment. An autonomous agent is a real or simulated mobile robot that moves under control of the stack under test. The physical environment includes static and/or dynamic elements to which the stack under test needs to respond effectively. For example, the mobile robot may be a fully or semi-autonomous vehicle under stack (autonomous vehicle) control. The physical environment may include a static road layout and a set of given environmental conditions (e.g., weather, time of day, lighting conditions, humidity, pollution/particulate matter levels, etc.), which may be maintained or varied as the scene progresses. The interactive scenario additionally includes one or more other agents ("external" agents, e.g., other vehicles, pedestrians, cyclists, animals, etc.).
The following examples consider the application of an autonomous vehicle test. However, the principles apply equally to other forms of mobile robots.
Scenes may be characterized or defined at different levels of abstraction. More abstract scenes accommodate a greater degree of variation. For example, a "cut-in scenario" or "lane change scenario" is an example of a highly abstract scenario featuring interesting strategies or behaviors that can accommodate many variations (e.g., different agent start positions and speeds, road layout, environmental conditions, etc.). "scenario run" refers to the specific case where an agent (optionally in the presence of one or more other agents) navigates a physical environment. For example, multiple runs of a cut-in or lane-change scene (in the real world and/or in the simulator) may be performed using different proxy parameters (e.g., starting position, speed, etc.), different road layouts, different environmental conditions, and/or different stack configurations, etc. The terms "run" and "instance" are used interchangeably in this context.
In the following examples, the performance of a stack is assessed, at least in part, by evaluating the behavior of the autonomous agent in the test oracle against a given set of performance evaluation rules, over the course of one or more runs. The rules are applied to the "ground truth" of the (or each) scene run, which in general simply means an appropriate representation of the scene run (including the behavior of the autonomous agent) that is taken as authoritative for the purposes of testing. Ground truth is inherent to simulation; the simulator computes a sequence of scene states which is, by definition, a perfect, authoritative representation of the simulated scene run. In a real-world scene run, a "perfect" representation of the scene run does not exist in the same sense; nevertheless, suitably informative ground truth can be obtained in numerous ways, e.g., based on manual annotation of on-board sensor data, automated/semi-automated annotation of such data (e.g., using offline/non-real-time processing), and/or using external information sources (such as external sensors, maps, etc.).
The scene ground truth typically includes the "traces" of the autonomous agent and any other (salient) agents, as applicable. A trace is a history of an agent's location and motion over the course of a scene. There are many ways a trace can be represented. Trace data will typically include spatial and motion data of an agent within the environment. The term is used in relation to both real scenes (with real-world traces) and simulated scenes (with simulated traces). A trace typically records the actual trajectory realized by an agent in the scene. With regard to terminology, a "trace" and a "trajectory" may contain the same or similar types of information (such as a series of spatial and motion states over time). The term "trajectory" is generally favored in the context of planning (and can refer to future/predicted trajectories), whereas the term "trace" generally relates to past behavior in the context of testing/evaluation.
In a simulation environment, a "scene description" is provided as input to a simulator. For example, the scene description may be encoded using a scene description language (scenario description language, SDL) or in any other form that may be used by a simulator. A scene description is typically a more abstract representation of a scene, which may yield multiple simulated runs. According to an embodiment, the scene description may have one or more configurable parameters that may be varied to increase the extent of possible variation. The degree of abstraction and parameterization is a design choice. For example, a scene description may encode a fixed layout with parameterized environmental conditions (e.g., weather, lighting, etc.). However, further abstractions are possible, for example with configurable road parameters (e.g. road curvature, lane configuration, etc.). The simulator inputs include a scene description and a selected set of parameter values (as applicable). The latter may be referred to as parameterization of the scene. The configurable parameters define a parameter space (also referred to as a scene space), the parameterization corresponding to points in the parameter space. In this case, a "scene instance" may refer to an instantiation of a scene in the simulator based on the scene description and, if applicable, the selected parameterization.
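As a hypothetical illustration only (the parameter names and values are invented for this example), a scene description with two configurable parameters, together with one parameterization, i.e. one point in the resulting two-dimensional parameter space, might be represented as:

    # Scene description: a fixed ("hard-coded") road layout plus two
    # configurable parameters, each with an allowed range.
    scene_description = {
        "road_layout": "two_lane_straight",
        "parameters": {
            "other_agent_initial_speed": {"min": 10.0, "max": 30.0},  # m/s
            "other_agent_initial_gap": {"min": 5.0, "max": 50.0},     # m
        },
    }

    # One parameterization = one point in the 2D parameter space.
    parameterization = {
        "other_agent_initial_speed": 22.5,
        "other_agent_initial_gap": 12.0,
    }

    # The simulator input would then be the pair (scene_description, parameterization).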
For brevity, the term "scene" may also be used to refer to a scene run, as well as a scene in the more abstract sense. The intended meaning of the term "scene" will be clear from the context in which it is used.
Trajectory planning is an important function in the present context, and the terms "trajectory planner", "trajectory planning system" and "trajectory planning stack" are used interchangeably herein to refer to a component or components that can plan trajectories for a mobile robot into the future. Trajectory planning decisions ultimately determine the actual trajectory realized by the autonomous agent (although, in some testing contexts, this may be influenced by other factors, such as the implementation of those decisions in the control stack, and the real or modeled dynamic response of the autonomous agent to the resulting control signals).
The trajectory planner may be tested alone or in combination with one or more other systems (e.g., sensing, predicting, and/or controlling). In a full stack, planning generally refers to a higher level of autopilot decision capability (e.g., trajectory planning), while control generally refers to the generation of lower level control signals for performing those autopilot decisions. However, in a performance testing environment, the term "control" is also used in a broader sense. For the avoidance of doubt, when the trajectory planner is considered to control an autonomous agent in a simulation, this does not necessarily mean that the control system is tested in combination with the trajectory planner (in a narrow sense).
Example AV stack
In order to provide a relevant environment for the described embodiments, further details of an example form of the AV stack will now be described.
Fig. 1A shows a high-level schematic block diagram of an AV runtime stack 100. The runtime stack 100 is shown to include a sense (subsystem) 102, a predict (subsystem) 104, a plan (subsystem) (planner) 106, and a control (subsystem) (controller) 108. As previously mentioned, the term (sub) stack may also be used to describe the above-described components 102-108.
In a real world environment, the perception system 102 receives sensor outputs from the AV's onboard sensor system 110 and uses those sensor outputs to detect external agents and measure their physical states, such as their position, velocity, acceleration, etc. The in-vehicle sensor system 110 may take different forms, but typically includes various sensors such as image capture devices (cameras/optical sensors), lidar and/or radar units, satellite positioning sensors (global positioning system (Global Positioning System, GPS), etc.), motion/inertial sensors (accelerometers, gyroscopes, etc.), and the like. The in-vehicle sensor system 110 thus provides rich sensor data from which detailed information about the surrounding environment can be extracted, as well as the status of the AV and any external actors (vehicles, pedestrians, cyclists, etc.) within the environment. The sensor output typically includes sensor data for a plurality of sensor modalities, such as stereoscopic images from one or more stereoscopic optical sensors, lidar, radar, and the like. The sensor data of the plurality of sensor modalities may be combined using filters, fusion components, or the like.
The sensing system 102 generally includes a plurality of sensing components that cooperate to interpret the sensor output to provide a sensing output to the prediction system 104.
In a simulation context, it may or may not be necessary to model the on-board sensor system 110, depending on the nature of the test, and in particular where the stack 100 is "sliced" for the purpose of testing (see below). For higher-level slicing, no simulated sensor data is required, and therefore no complex sensor modeling is required.
The prediction system 104 uses the perceived output of the perception system 102 to predict future behavior of external actors (agents) such as other vehicles in the vicinity of the AV.
The predictions calculated by the prediction system 104 are provided to the planner 106, which planner 106 uses the predictions to make automatic driving decisions to be performed by the AV in a given driving scenario. The input received by the planner 106 typically indicates a drivable region and will also capture the predicted movement of any external agents (obstacles from the AV perspective) within the drivable region. The drivable region may be determined using perceived output from the sensing system 102 in combination with map information such as HD (high definition) maps or the like.
The core function of the planner 106 is to plan trajectories for the AV (autonomous trajectories), taking into account the predicted agent motion. This may be referred to as trajectory planning. A trajectory is planned in order to carry out a desired goal within the scene. The goal could, for example, be to enter a roundabout and leave it at a desired exit; to overtake a vehicle in front; or to stay in the current lane at a target speed (lane following). The goal may, for example, be determined by an autonomous route planner (not shown).
The controller 108 executes the decisions taken by the planner 106 by providing suitable control signals to the AV's on-board actor system 112. In particular, the planner 106 plans trajectories for the AV, and the controller 108 generates control signals to implement the planned trajectories. Typically, the planner 106 plans into the future, such that a planned trajectory may only be partially implemented at the control level before a new trajectory is planned by the planner 106. The actor system 112 includes "primary" vehicle systems, such as the braking, acceleration and steering systems, as well as secondary systems (e.g., signaling, windshield wipers, headlights, etc.).
Note that there may be a distinction between a trajectory as planned at a given instant and the actual trajectory followed by the autonomous agent. Planning systems typically operate over a sequence of planning steps, updating the planned trajectory at each planning step to account for any changes in the scene since the previous planning step (or, more precisely, any changes that deviate from the predicted changes). The planning system 106 may reason into the future, such that the planned trajectory at each planning step extends beyond the next planning step. Any individual planned trajectory may therefore not be fully realized (if the planning system 106 is tested in isolation in simulation, the autonomous agent may simply be assumed to follow the planned trajectory exactly up to the next planning step; however, as noted, in other real and simulation contexts, the planned trajectory may not be followed exactly up to the next planning step, as the behavior of the autonomous agent may be influenced by other factors, such as the operation of the control system 108 and the real or modeled dynamics of the autonomous vehicle). In many testing contexts, it is ultimately the actual trajectory of the autonomous agent that matters; in particular, whether that trajectory is safe, as well as other factors such as comfort and progress. However, the rule-based testing approach herein can also be applied to planned trajectories (even if those planned trajectories are not fully or exactly realized by the autonomous agent). For example, even if the actual trajectory of an agent is deemed safe according to a given set of safety rules, it might be the case that an instantaneous planned trajectory was unsafe; the fact that the planner 106 was considering an unsafe course of action may be revealing, even if it did not lead to unsafe agent behavior in the scenario. In addition to the actual agent behavior in the simulation, instantaneous planned trajectories constitute a form of internal state that can usefully be evaluated. Other forms of internal stack state can be similarly evaluated.
The example of FIG. 1A contemplates a relatively "modular" architecture, with separable perception, prediction, planning and control systems 102-108. The sub-stacks themselves may also be modular, for example with separable planning modules within the planning system 106. For example, the planning system 106 may comprise multiple trajectory planning modules that can be applied in different physical contexts (e.g., simple lane driving versus complex junctions or roundabouts). This is relevant to simulation-based testing for the reasons noted above, as it allows components (such as the planning system 106 or individual planning modules thereof) to be tested individually or in different combinations. For the avoidance of doubt, with a modular stack architecture, the term stack can refer not only to the full stack, but also to any individual subsystem or module thereof.
The extent to which the various stack functions are integrated or separable can vary significantly between different stack implementations; in some stacks, certain aspects may be so tightly coupled as to be indistinguishable. For example, in some stacks, planning and control may be integrated (e.g., such stacks may plan directly in terms of control signals), whereas other architectures (such as that depicted in FIG. 1A) may draw a clear distinction between the two (e.g., planning in terms of trajectories, with separate control optimizations to determine how best to execute a planned trajectory at the control signal level). Similarly, in some stacks, prediction and planning may be more tightly coupled. At the extreme, in so-called "end-to-end" driving, perception, prediction, planning and control may be essentially inseparable. Unless otherwise indicated, the terms perception, prediction, planning and control as used herein do not imply any particular coupling or modularity of those aspects.
It should be understood that the term "stack" encompasses software, but can also encompass hardware. In simulation, the software of the stack may be tested on a "generic" off-board computer system, before it is eventually uploaded to an on-board computer system of a physical vehicle. However, in "hardware-in-the-loop" testing, the testing may extend to the underlying hardware of the vehicle itself. For example, for testing purposes, the stack software may be run on the on-board computer system (or a replica thereof) coupled to the simulator. In this case, the stack under test extends to the underlying computer hardware of the vehicle. As another example, certain functions of the stack 100 (e.g., perception functions) may be implemented in dedicated hardware. In a simulation context, hardware-in-the-loop testing could involve feeding synthetic sensor data to dedicated hardware perception components.
Fig. 1B shows a highly schematic overview of a testing paradigm for autonomous vehicles. An ADS/ADAS stack 100, e.g., of the kind depicted in FIG. 1A, is subjected to repeated testing and evaluation in simulation, by running multiple scenario instances in a simulator 202 and evaluating the performance of the stack 100 (and/or individual sub-stacks thereof) in a test oracle 252. The output of the test oracle 252 is informative to an expert 122 (team or individual), allowing them to identify issues in the stack 100 and modify the stack 100 to mitigate those issues (S124). The results also assist the expert 122 in selecting further scenarios for testing (S126), and the process continues, repeatedly modifying, testing and evaluating the performance of the stack 100 in simulation. The improved stack 100 is eventually incorporated (S125) in a real-world AV 101, equipped with a sensor system 110 and an actor system 112. The improved stack 100 typically includes program instructions (software) executed in one or more computer processors (not shown) of an on-board computer system of the vehicle 101. In step S125, the software of the improved stack is uploaded to the AV 101. Step S125 may also involve modifications to the underlying vehicle hardware. On board the AV 101, the improved stack 100 receives sensor data from the sensor system 110 and outputs control signals to the actor system 112. Real-world testing (S128) can be used in combination with simulation-based testing. For example, having reached an acceptable level of performance through the process of simulation testing and stack refinement, suitable real-world scenarios may be selected (S130), and the performance of the AV 101 in those real scenarios may be captured and similarly evaluated in the test oracle 252.
Scenes for simulation purposes may be obtained in various ways, including manual coding. The system is also capable of extracting scenes from real-world operations for simulation purposes, allowing the real-world situations and their changes to be recreated in simulator 202.
FIG. 1C shows a high-level schematic block diagram of a scene extraction pipeline. Data 140 of a real-world run is passed to a "ground truthing" pipeline 142 for the purpose of generating scene ground truth. The run data 140 could comprise, for example, sensor data and/or perception outputs captured/generated on board one or more vehicles (which could be autonomous, human-driven or a combination thereof), and/or data captured from other sources, such as external sensors (CCTV, etc.). The run data is processed within the ground truthing pipeline 142 in order to generate appropriate ground truth 144 (trace(s) and environmental data) for the real-world run. As discussed, the ground truthing process could be based on manual annotation of the "raw" run data 140, or the process could be entirely automated (e.g., using offline perception methods), or a combination of manual and automated ground truthing could be used. For example, 3D bounding boxes may be placed around vehicles and/or other agents captured in the run data 140, in order to determine the spatial and motion states of their traces. A scene extraction component 146 receives the scene ground truth 144, and processes the scene ground truth 144 to extract a more abstracted scene description 148 that can be used for the purpose of simulation. The scene description 148 is consumed by the simulator 202, allowing multiple simulated runs to be performed. The simulated runs are variations of the original real-world run, with the degree of possible variation determined by the extent of abstraction. Ground truth 150 is provided for each simulated run.
Test pipeline
Further details of the test pipeline and the test oracle 252 will now be described. The examples that follow focus on simulation-based testing. However, as noted, the test oracle 252 can equally be applied to evaluate stack performance on real scenes, and the following description applies equally to real scenes. The following description refers to the stack 100 of FIG. 1A by way of example. However, as noted, the test pipeline 200 is highly flexible and can be applied to any stack or sub-stack operating at any level of autonomy.
Fig. 2 shows a schematic block diagram of the test pipeline, denoted by reference numeral 200. The test pipeline 200 is shown to comprise the simulator 202 and the test oracle 252. The simulator 202 runs simulated scenarios for the purpose of testing all or part of the AV runtime stack 100, and the test oracle 252 evaluates the performance of the stack (or sub-stack) on the simulated scenarios. As discussed, it may be that only a sub-stack of the runtime stack is tested, but, for simplicity, the following description refers to the (full) AV stack 100 throughout. The description nevertheless applies equally to a sub-stack in place of the full stack 100. The term "slicing" is used herein to refer to the selection of a set or subset of stack components for testing.
As previously mentioned, the idea of simulation-based testing is to run a simulated driving scenario that an autonomous agent must navigate under the control of the stack 100 being tested. Typically, the scenario includes a static drivable area (e.g., a particular static road layout) that the autonomous agent is required to navigate, typically in the presence of one or more other dynamic agents (such as other vehicles, bicycles, pedestrians, etc.). To this end, simulated inputs 203 are provided from the simulator 202 to the stack 100 under test.
The slicing of the stack determines the form of the simulated inputs 203. For example, FIG. 2 shows the prediction system 104, planning system 106 and control system 108 within the AV stack 100 being tested. To test the full AV stack of FIG. 1A, the perception system 102 would also be applied during testing. In this case, the simulated inputs 203 would comprise synthetic sensor data generated using appropriate sensor models and processed within the perception system 102 in the same way as real sensor data. This requires the generation of sufficiently realistic synthetic sensor inputs (e.g., photorealistic image data and/or equally realistic simulated lidar/radar data, etc.). The resulting outputs of the perception system 102 would, in turn, feed the higher-level prediction system 104 and planning system 106.
In contrast, so-called "planning level" simulations will substantially bypass the perception system 102. The simulator 202 will provide simpler, higher-level inputs 203 directly to the prediction system 104. In some cases, it may even bypass the prediction system 104 in order to test the planner 106 on predictions obtained directly from the simulation scenario (i.e., a "perfect" prediction).
Between those two extremes, there is a range of different levels of input slicing, e.g., testing only a subset of the perception system 102, such as the "later" (higher-level) perception components, e.g., components such as filters or fusion components that operate on the outputs from lower-level perception components (e.g., object detectors, bounding box detectors, motion detectors, etc.).
Whatever form they take, the simulated inputs 203 are used (directly or indirectly) as a basis for decision-making by the planner 106. The controller 108, in turn, implements the planner's decisions by outputting control signals 109. In a real-world context, those control signals would drive the physical actor system 112 of the AV. In simulation, an autonomous vehicle dynamics model 204 is used to translate the resulting control signals 109 into realistic motion of the autonomous agent within the simulation, thereby simulating the physical response of the autonomous vehicle to the control signals 109.
Alternatively, a simpler form of simulation assumes that the autonomous agent follows each planned trajectory precisely between the planning steps. The method bypasses (to the extent that it is separable from planning) the control system 108 and eliminates the need for the autonomous vehicle dynamic model 204. This may be sufficient to test certain aspects of the plan.
To the extent that the external agent exhibits automatic driving behavior/decisions within simulator 202, some form of agent decision logic 210 is implemented to execute these decisions and determine agent behavior within the scenario. The proxy decision logic 210 may be comparable in complexity to the autonomous stack 100 itself, or it may have more limited decision-making capabilities. The purpose is to provide sufficiently realistic foreign agent behavior within simulator 202 to be able to effectively test the decision making capabilities of autonomous stack 100. In some cases, this does not require any proxy decision logic 210 at all (open loop simulation), and in other cases, relatively limited proxy logic 210 (e.g., basic adaptive cruise control (adaptive cruise control, ACC)) may be used to provide useful testing. One or more proxy dynamic models 206 may be used to provide more realistic proxy behavior, if appropriate.
The scene is run according to the scene description 201a and (if applicable) the selected parameterization 201b of the scene. A scene typically has both static and dynamic elements, which may be "hard coded" or configurable in the scene description 201a, and thus determined by the scene description 201a in combination with the selected parameterization 201 b. In a driving scenario, the static elements typically include a static road layout.
Dynamic elements typically include one or more external agents within a scene, such as other vehicles, pedestrians, bicycles, etc.
The degree of dynamic information provided to simulator 202 for each external agent may vary. For example, a scene may be described by separable static and dynamic layers. A given static layer (e.g., defining a road layout) may be used in combination with different dynamic layers to provide different instances of a scene. For each external agent, the dynamic layer may include a spatial path that the agent is to follow and one or both of motion data and behavior data associated with the path. In a simple open loop simulation, the external actor simply follows the spatial path and motion data defined in the dynamic layer, which is non-reactive, i.e., does not react to autonomous agents within the simulation. Such open loop simulation may be implemented without any proxy decision logic 210. However, in closed loop simulation, the dynamic layer defines at least one behavior (e.g., ACC behavior) to follow along the static path. In this case, the proxy decision logic 210 implements this behavior within the simulation in a reactive manner, i.e., reacting to autonomous agents and/or other external agents. The motion data may still be associated with a static path, but in this case is less canonical and may be used, for example, as a target along the path. For example, for ACC behavior, a target speed may be set along a path that the agent will seek to match, but the agent decision logic 210 may be allowed to reduce the speed of the external agent below the target at any point along the path in order to maintain a target following distance from the preceding vehicle.
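As a greatly simplified, purely illustrative sketch of the kind of ACC-style behavior described above (the function, gain constant and variable names are assumptions, not the agent decision logic 210 itself):

    def acc_speed(path_target_speed, gap_to_lead, target_gap, gain=0.5):
        # Follow the target speed defined along the static path, but slow down
        # as needed to maintain a target following distance behind the vehicle
        # in front; the agent never exceeds the path's target speed.
        if gap_to_lead < target_gap:
            return max(0.0, path_target_speed - gain * (target_gap - gap_to_lead))
        return path_target_speed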
It will be appreciated that for simulation purposes, a scene may be described in many ways and with any degree of configurability. For example, the number and type of agents and their motion information may be configured as part of scene parameterization 201 b.
The output of a given simulated simulator 202 includes an autonomous trace 212a of an autonomous agent and one or more agent traces 212b of one or more external agents (traces 212). Each trace 212a, 212b is a complete history of agent behavior within the simulation with both spatial and motion components. For example, each trace 212a, 212b may take the form of a spatial path having motion data associated with points along the path, such as velocity, acceleration, jerk (rate of change of acceleration), snap (rate of change of jerk), and the like.
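The following is an illustrative sketch only (the field names are assumptions) of how a single point of a trace, combining spatial and motion data, might be represented:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class TracePoint:
        t: float             # time (s)
        x: float             # position (m)
        y: float
        heading: float       # orientation (rad)
        speed: float         # m/s
        acceleration: float  # m/s^2
        jerk: float          # rate of change of acceleration
        snap: float          # rate of change of jerk

    # A trace is then simply the time-ordered history of such states for one agent.
    Trace = List[TracePoint]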
Additional information is also provided to supplement trace 212 and provide an environment for trace 212. Such additional information is referred to as "environment" data 214. The environmental data 214 belongs to the physical environment of the scene and may have both static components (e.g., road layout) and dynamic components (e.g., the degree to which weather conditions change during simulation). The environment data 214 may be "pass through" in part because it is directly defined by the choice of scene description 201a or parameterization 201b and is therefore not affected by the simulation results. For example, the environmental data 214 may include a static road layout directly from the scene description 201a or the parameterization 201 b. However, typically the environmental data 214 will include at least some elements derived within the simulator 202. For example, this may include modeling environmental data, such as weather data, where simulator 202 may freely change weather conditions as the modeling progresses. In this case, the weather data may depend on time, which will be reflected in the environmental data 214.
The test oracle 252 receives the traces 212 and the environmental data 214, and scores those outputs against a set of performance evaluation rules 254. The performance evaluation rules 254 are shown to be provided as an input to the test oracle 252.
The rules 254 are categorical in nature (e.g., pass/fail type rules). Certain performance evaluation rules are also associated with numerical performance metrics used to "score" trajectories (e.g., indicating a degree of success or failure, or some other quantity that helps interpret or is otherwise relevant to the categorical results). The evaluation of the rules 254 is time-based: a given rule may have a different result at different points in the scene. The scoring is also time-based: for each performance evaluation metric, the test oracle 252 tracks how the value (score) of that metric changes over time as the simulation progresses. The test oracle 252 provides an output 256 comprising a time sequence 256a of categorical (e.g., pass/fail) results for each rule, and a score-time plot 256b for each performance metric, as described in further detail below. The results and scores 256a, 256b are informative to the expert 122 and can be used to identify and mitigate performance issues within the tested stack 100. The test oracle 252 also provides an overall (aggregate) result (e.g., overall pass/fail) for the scenario. The output 256 of the test oracle 252 is stored in a test database 258, in association with information about the scenario to which the output 256 pertains. For example, the output 256 may be stored in association with the scene description 201a (or an identifier thereof) and the selected parameterization 201b. As well as the time-dependent results and scores, an overall score may also be assigned to the scenario and stored as part of the output 256, e.g., an aggregate score (e.g., overall pass/fail) for each rule and/or an aggregate result (e.g., pass/fail) across all of the rules 254.
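Purely as an illustrative sketch of the structure of the output 256 described above (the class, field names and aggregation policy below are assumptions made for the example):

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class OracleOutput:
        # Per rule: one categorical result per time step ("pass"/"fail"/"n/a") - 256a.
        results: Dict[str, List[str]] = field(default_factory=dict)
        # Per metric: one numerical score per time step (the score-time plot) - 256b.
        scores: Dict[str, List[float]] = field(default_factory=dict)

        def aggregate(self, rule: str) -> str:
            # One possible aggregation: a rule fails overall if it fails at any time step.
            return "fail" if "fail" in self.results[rule] else "pass"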
Fig. 2A shows another option for slicing and indicates full and sub-stacks using reference numerals 100 and 100S, respectively. The sub-stack 100S will be tested within the test pipeline 200 of fig. 2.
A number of "late" perception components 102B form part of the sub-stack 100S to be tested, and are applied to the simulated perception inputs 203 during testing. The late perception components 102B could, for example, include filtering or other fusion components that fuse perception inputs from multiple earlier perception components.
In the full stack 100, the late perception components 102B would receive actual perception inputs 213 from the earlier perception components 102A. For example, the earlier perception components 102A might comprise one or more 2D or 3D bounding box detectors, in which case the simulated perception inputs provided to the late perception components could include simulated 2D or 3D bounding box detections, derived in the simulation via ray tracing. The earlier perception components 102A would typically include components that operate directly on sensor data. With the slicing of FIG. 2A, the simulated perception inputs 203 would correspond in form to the actual perception inputs 213 that would normally be provided by the earlier perception components 102A. However, the earlier perception components 102A are not applied as part of the testing; instead, they are used to train one or more perception error models 208 that can be used to introduce realistic errors, in a statistically rigorous manner, into the simulated perception inputs 203 that are fed to the late perception components 102B of the sub-stack 100S under test.
Such perception error models may be referred to as Perception Statistical Performance Models (PSPMs) or, synonymously, "PRISMs". Further details of the principles of PSPMs, and suitable techniques for building and training them, may be found in International Patent Publication Nos. WO2021037763, WO2021037760, WO2021037765, WO2021037761 and WO2021037766, each of which is incorporated herein by reference in its entirety. The idea behind a PSPM is to efficiently introduce realistic errors into the simulated perception inputs provided to the sub-stack 100S (i.e., errors that reflect the kind of errors that would be expected were the earlier perception components 102A to be applied in the real world). In a simulation context, "perfect" ground truth perception inputs 203G are provided by the simulator, but these are used to derive more realistic perception inputs 203, with realistic errors introduced by the perception error model(s) 208.
As described in the above references, a PSPM can be dependent on one or more variables representing physical condition(s) ("confounders"), allowing different levels of error to be introduced that reflect different possible real-world conditions. Hence, the simulator 202 can simulate different physical conditions (e.g., different weather conditions) by simply changing the value of a weather confounder, which will, in turn, change how perception errors are introduced.
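As a highly simplified, purely illustrative sketch of the confounder idea (the weather categories and noise magnitudes are invented for this example and are not taken from the cited publications):

    import random

    def sample_detected_position(ground_truth_position, weather):
        # Sample a "realistic" detected position from an error distribution
        # whose spread depends on a confounder (here, the weather).
        sigma = {"clear": 0.1, "rain": 0.3, "fog": 0.8}[weather]  # metres
        x, y = ground_truth_position
        return (random.gauss(x, sigma), random.gauss(y, sigma))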
The late perception components 102B within the sub-stack 100S process the simulated perception inputs 203 in exactly the same way as they would process the real-world perception inputs 213 within the full stack 100, and their outputs, in turn, drive prediction, planning and control.
Alternatively, PRISMs can be used to model the entire perception system 102, including the late perception components 102B, in which case a PSPM is used to generate realistic perception outputs that are passed as inputs directly to the prediction system 104.
Depending on the implementation, there may or may not be a deterministic relationship between a given scene parameterization 201b and a given configuration of the stack 100, on the one hand, and the simulation outcome on the other (i.e., the same parameterization may or may not always lead to the same outcome for the same stack 100). Non-determinism can arise in various ways. For example, when the simulation is based on PRISMs, a PRISM might model a distribution over possible perception outputs at each given time step of the scene, from which realistic perception outputs are sampled probabilistically. This leads to non-deterministic behavior within the simulator 202, whereby different outcomes may be obtained for the same stack 100 and scene parameterization because different perception outputs are sampled. Alternatively, or additionally, the simulator 202 may be inherently non-deterministic, e.g., weather, lighting or other environmental conditions may be randomized/probabilistic within the simulator 202 to some extent. As will be appreciated, this is a design choice: in other embodiments, varying environmental conditions could instead be fully specified in the parameterization 201b of the scene. With non-deterministic simulations, multiple scene instances could be run for each parameterization. An aggregate pass/fail result could be assigned to a particular choice of parameterization 201b, for example as a count or percentage of pass or fail outcomes.
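As an illustrative sketch of aggregating non-deterministic runs for a single parameterization (the run_simulation callable and overall_result attribute are hypothetical):

    def aggregate_pass_rate(run_simulation, stack, parameterization, n_runs=100):
        # Run the same parameterization several times (outcomes may differ when
        # the simulation is non-deterministic) and report the percentage of
        # runs that pass overall.
        passes = sum(
            1 for _ in range(n_runs)
            if run_simulation(stack, parameterization).overall_result == "pass"
        )
        return 100.0 * passes / n_runs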
A test orchestration component 260 is responsible for selecting scenarios for the purpose of simulation. For example, the test orchestration component 260 may automatically select scene descriptions 201a and suitable parameterizations 201b based on the test oracle outputs 256 from previous scenes.
Test oracle rules
The performance evaluation rules 254 are constructed as computational graphs (rule trees) to be applied within the test oracle. Unless otherwise specified, the term "rule tree" herein refers to a computational graph that is configured to implement a given rule. Each rule is constructed as a rule tree, and a set of multiple rules may be referred to as a "forest" of multiple rule trees.
Fig. 3A shows an example of a rule tree 300 constructed from a combination of extractor nodes (leaf objects) 302 and evaluator nodes (non-leaf objects) 304. Each extractor node 302 extracts a time-varying numerical (e.g., floating point) signal (score) from a set of scene data 310. The scene data 310 is a form of scene ground truth, in the sense described above, and may be referred to as such. The scene data 310 is obtained by deploying a trajectory planner (e.g., the planner 106 of FIG. 1A) in a real or simulated scene, and is shown to comprise the autonomous and agent traces 212 as well as the environmental data 214. In the simulation contexts of FIG. 2 or FIG. 2A, the scene ground truth 310 is provided as an output of the simulator 202.
Each evaluator node 304 is shown as having at least one child object (node), where each child object is one of the extractor nodes 302 or another of the evaluator nodes 304. Each evaluator node receives the output(s) of its child node(s) and applies an evaluator function to those output(s). The output of the evaluator function is a time series of classification results. The following examples consider simple binary pass/fail results, but these techniques can readily be extended to non-binary results. Each evaluator function evaluates the output(s) of its child node(s) against a preset atomic rule. Such rules can be flexibly combined in accordance with a desired safety model.
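By way of illustration only, the following Python sketch mirrors the node structure just described: extractor nodes are leaf objects mapping each time step of the scene data to a number, and evaluator nodes are non-leaf objects applying an atomic predicate to their children's outputs. The class names, the dictionary-based scene representation and the comfort-style example are assumptions made for this sketch.

from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class ExtractorNode:
    """Leaf node: extracts a time-varying numeric signal from the scene ground truth."""
    extract: Callable[[dict], float]   # applied to a single time step of scene data

    def evaluate(self, scene_data: Sequence[dict]) -> List[float]:
        return [self.extract(step) for step in scene_data]

@dataclass
class EvaluatorNode:
    """Non-leaf node: applies an evaluator function to the outputs of its children."""
    children: List[object]             # ExtractorNode and/or EvaluatorNode instances
    evaluate_step: Callable[..., bool] # atomic predicate applied per time step

    def evaluate(self, scene_data: Sequence[dict]) -> List[bool]:
        child_outputs = [child.evaluate(scene_data) for child in self.children]
        # One classification result per time step of the scene.
        return [self.evaluate_step(*values) for values in zip(*child_outputs)]

# Tiny usage example: an acceleration signal checked against a comfort threshold.
accel_node = ExtractorNode(extract=lambda step: step["ego_acceleration"])
limit_node = ExtractorNode(extract=lambda step: step["max_comfortable_acceleration"])
is_comfortable = EvaluatorNode(children=[accel_node, limit_node],
                               evaluate_step=lambda a, limit: a <= limit)
scene = [{"ego_acceleration": 1.2, "max_comfortable_acceleration": 3.0},
         {"ego_acceleration": 4.5, "max_comfortable_acceleration": 3.0}]
print(is_comfortable.evaluate(scene))  # [True, False]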
Further, each evaluator node 304 derives from the output of its child node a time-varying digital signal that is related to the classification result by a threshold condition (see below).
The top-level root node 304a is an evaluator node that is not a child of any other node. The top-level root node 304a outputs the final sequence of results, and its descendant nodes (i.e., nodes that are direct or indirect children of the top-level root node 304a) provide the underlying signals and intermediate results.
Fig. 3B visually depicts an example of a derived signal 312 and a corresponding time series of results 314 computed by an evaluator node 304. The results 314 are related to the derived signal 312 in that a pass result is returned when (and only when) the derived signal exceeds the failure threshold 316. As will be appreciated, this is merely one example of a threshold condition relating a time series of results to a corresponding signal.
The signals directly extracted from the scene ground truth 310 by the extractor nodes 302 may be referred to as "raw" signals, to distinguish them from the "derived" signals computed by the evaluator nodes 304. The results and the raw/derived signals may be discretized in time.
Fig. 4A shows an example of a rule tree implemented within test platform 200.
A rule editor 400 is provided for constructing rules to be implemented within the test predictors 252. The rule editor 400 receives rule creation input from a user (who may or may not be an end user of the system). In this example, the rule creation input is encoded in a domain specific language (DSL) and defines at least one rule diagram 408 to be implemented within the test predictors 252. The rules in the following examples are logical rules, with TRUE and FALSE representing pass and fail, respectively (as will be appreciated, this is purely a design choice).
The following examples consider rules formulated using combinations of atomic logic predicates. Examples of basic atomic predicates include elementary logic gates (OR, AND, etc.) and logical functions such as "greater than" (Gt(a, b)), which returns TRUE when a is greater than b and FALSE otherwise.
The Gt function is used to implement a safe lateral distance rule between the autonomous agent and another agent in the scene (identified by an agent identifier "other_agent_id"). Two extractor nodes (latd, latsd) apply the LateralDistance and LateralSafeDistance extractor functions, respectively. Those functions operate directly on the scene ground truth 310 to extract, respectively, a time-varying lateral distance signal (measuring the lateral distance between the autonomous agent and the identified other agent) and a time-varying safe lateral distance signal for the autonomous agent and the identified other agent. The safe lateral distance signal may depend on various factors, such as the speed of the autonomous agent and the speed of the other agent (captured in the traces 212), and environmental conditions (e.g., weather, lighting, road type, etc.) captured in the environment data 214.
The evaluator node (is_latd_safe) is the parent of the latd and latsd extractor nodes and is mapped to the Gt atomic predicate. Thus, when the rule tree 408 is implemented, the is_latd_safe evaluator node applies the Gt function to the outputs of the latd and latsd extractor nodes to compute a TRUE/FALSE result for each time step of the scene, returning TRUE for each time step at which the latd signal exceeds the latsd signal, and FALSE otherwise. In this way, a "safe lateral distance" rule has been constructed from atomic extractor functions and predicates; the autonomous agent fails the safe lateral distance rule when the lateral distance reaches or falls below the safe lateral distance threshold. As will be appreciated, this is a very simple example of a rule tree. Rules of arbitrary complexity can be constructed according to the same principles.
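A minimal sketch of the per-time-step evaluation performed by such a rule, assuming the latd and latsd signals have already been extracted from the scene ground truth (the numeric values below are invented for illustration):

def gt(a: float, b: float) -> bool:
    """Atomic Gt predicate: TRUE when a is strictly greater than b."""
    return a > b

latd = [3.2, 2.9, 2.1, 1.4, 1.0]    # lateral distance to the other agent (m)
latsd = [1.5, 1.5, 1.6, 1.6, 1.7]   # safe lateral distance threshold (m)

# is_latd_safe: one TRUE/FALSE result per time step of the scene.
is_latd_safe = [gt(d, sd) for d, sd in zip(latd, latsd)]
print(is_latd_safe)  # [True, True, True, False, False]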
The test predictors 252 apply the rule tree 408 to the scene ground truth 310 and provide results via a User Interface (UI) 418.
Fig. 4B illustrates an example of a rule tree that includes a lateral distance branch corresponding to that of fig. 4A. In addition, the rule tree includes a longitudinal distance branch and a top-level OR predicate (safe distance node is_d_safe) for implementing a safe distance metric. Similar to the lateral distance branch, the longitudinal distance branch extracts a longitudinal distance signal and a safe longitudinal distance threshold signal from the scene data (extractor nodes lond and lonsd, respectively), and the longitudinal safety evaluator node (is_lond_safe) returns TRUE when the longitudinal distance is above the safe longitudinal distance threshold. The top-level OR node returns TRUE when one or both of the lateral and longitudinal distances are safe (above the applicable threshold), and FALSE if both are unsafe. In this case it is sufficient for only one of the distances to exceed its safety threshold (for example, if two vehicles are travelling in adjacent lanes, their longitudinal separation is zero or close to zero when they are side by side; but this is not unsafe if the vehicles have sufficient lateral separation).
For example, the digital output of the top level node may be a time-varying robustness score.
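One common convention, assumed here for illustration only (it mirrors typical signal temporal logic robustness semantics rather than anything stated above), is to take each branch's score as a signed margin and to combine the branches of an OR node with a max:

# Each branch's score is its signed margin (distance minus threshold); the OR node
# combines branches with max, and the pass/fail result is the sign of the score.
lat_margin = [1.7, 1.4, 0.5, -0.2, -0.7]     # latd - latsd per time step
lon_margin = [-0.5, -0.8, -1.0, -1.2, -1.5]  # lond - lonsd per time step

robustness = [max(la, lo) for la, lo in zip(lat_margin, lon_margin)]
is_d_safe = [score > 0 for score in robustness]
print(robustness)  # [1.7, 1.4, 0.5, -0.2, -0.7]
print(is_d_safe)   # [True, True, True, False, False]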
Different rule trees can be constructed, for example, to implement different rules of a given safety model, to implement different safety models, or to apply rules selectively to different scenarios (within a given safety model, not every rule will necessarily be applicable to every scenario; with this approach, different rules or combinations of rules can be applied to different scenarios). Within this framework, rules can also be constructed to evaluate comfort (e.g., based on instantaneous acceleration and/or jerk along the trajectory), progress (e.g., based on the time taken to reach a defined goal), and so on.
The above examples consider simple logical predicates, such as OR, AND, gt, etc., that evaluate a result or signal in a single instance of time. In practice, however, certain rules may need to be formulated based on temporal logic.
Hekmatnejad et al., "Encoding and Monitoring Responsibility Sensitive Safety Rules for Automated Vehicles in Signal Temporal Logic" (2019), MEMOCODE '19: Proceedings of the 17th ACM-IEEE International Conference on Formal Methods and Models for System Design, incorporated herein by reference in its entirety, discloses a signal temporal logic (STL) encoding of the RSS safety rules. Temporal logic provides a formal framework for constructing predicates that are qualified in terms of time. This means that the result computed by an evaluator at a given moment can depend on results and/or signal values at another moment.
For example, a requirement of the safety model may be that the autonomous agent reacts to a certain event within a set time frame. Such rules can be encoded in a similar manner using temporal logic predicates within the rule tree.
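A sketch of how such a time-bounded requirement might be evaluated over discrete time steps is given below; the operator name, the window semantics and the event/reaction signals are assumptions made purely for illustration.

from typing import List

def eventually_within(signal: List[bool], window: int) -> List[bool]:
    """result[t] is TRUE if the signal is TRUE at some step in [t, t + window]."""
    n = len(signal)
    return [any(signal[t:min(n, t + window + 1)]) for t in range(n)]

def reacts_in_time(event: List[bool], reaction: List[bool], window: int) -> List[bool]:
    """Pass at step t unless an event at t is not followed by a reaction within the window."""
    reaction_soon = eventually_within(reaction, window)
    return [(not e) or ok for e, ok in zip(event, reaction_soon)]

event = [False, True, False, False, False, False]
reaction = [False, False, False, True, False, False]
print(reacts_in_time(event, reaction, window=3))  # [True, True, True, True, True, True]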
In the above examples, the performance of the stack 100 is evaluated at each time step of the scene. An overall test result (e.g., pass/fail) can be derived from this; for example, certain rules (e.g., safety-critical rules) may result in an overall failure if they fail at any time step within the scene (that is, the rule must be passed at every time step to obtain an overall pass for the scene). For other types of rules, the overall pass/fail criteria may be "softer" (e.g., a rule may only trigger a failure if it fails over some number of consecutive time steps), and such criteria may be context-dependent.
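The two aggregation styles might be implemented along the following lines; this is a sketch, and the policy names and consecutive-failure criterion are illustrative assumptions rather than part of the described platform.

from typing import List

def overall_strict(results: List[bool]) -> bool:
    """Safety-critical style: the rule must pass at every time step."""
    return all(results)

def overall_tolerant(results: List[bool], max_consecutive_failures: int) -> bool:
    """Softer style: fail only if the rule fails for more than N consecutive steps."""
    run = 0
    for passed in results:
        run = 0 if passed else run + 1
        if run > max_consecutive_failures:
            return False
    return True

per_step = [True, True, False, False, True, True]
print(overall_strict(per_step))       # False
print(overall_tolerant(per_step, 3))  # True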
Fig. 4C schematically depicts a hierarchy of rule evaluation implemented within the test predictors 252. A set of rules 254 is received for implementation in the test predictors 252.
Some rules apply only to autonomous agents (an example is a comfort rule that evaluates whether an autonomous trajectory exceeds some maximum acceleration or jerk threshold at any given moment).
Other rules relate to interactions of the autonomous agent with other agents (e.g., the "no collision" rule or the safe distance rule considered above). Each such rule is evaluated in a pairwise manner between the autonomous agent and each other agent. Another example is a "pedestrian emergency braking" rule, which can only be activated when a pedestrian walks out in front of the autonomous vehicle, and can only be evaluated for that pedestrian agent.
Not every rule will necessarily be applicable to every scene, and some rules may be applicable to only part of a scene. Rule activation logic 422 within the test predictors 252 determines whether and when each of the rules 254 applies to the scene in question, and selectively activates rules as and when they apply. A rule may thus remain active throughout a scene, may never be activated in a given scene, or may be activated for only part of a scene. Furthermore, rules may be evaluated for different numbers of agents at different points in the scene. Selectively activating rules in this manner can significantly improve the efficiency of the test predictors 252.
Activation or deactivation of a given rule may depend on the activation/deactivation of one or more other rules. For example, when the pedestrian emergency braking rule is activated, a "best comfort" rule may be deemed inapplicable (because pedestrian safety is the primary concern), and the comfort rule may be deactivated whenever the emergency braking rule is active.
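The following sketch illustrates per-time-step rule activation, with the pedestrian emergency braking rule suppressing the comfort rule as in the example above; the scene representation and rule names are assumptions made for the sketch.

def pedestrian_ahead(step: dict) -> bool:
    return any(agent["type"] == "pedestrian" and agent["in_front_of_ego"]
               for agent in step["agents"])

def active_rules(step: dict) -> set:
    active = {"no_collision", "safe_distance"}
    if pedestrian_ahead(step):
        active.add("pedestrian_emergency_braking")
    else:
        # The comfort rule is only considered while emergency braking is not in play.
        active.add("comfort")
    return active

step = {"agents": [{"type": "pedestrian", "in_front_of_ego": True}]}
print(sorted(active_rules(step)))
# ['no_collision', 'pedestrian_emergency_braking', 'safe_distance']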
Rule evaluation logic 424 evaluates each active rule over any period(s) of time for which it remains active. Each interaction rule is evaluated in a pairwise manner between the autonomous agent and any other agent to which it applies.
There may also be a degree of interdependence in the application of rules. For example, another way to address the relationship between the comfort rule and the emergency braking rule is to increase the jerk/acceleration threshold of the comfort rule each time the emergency braking rule is activated for at least one other agent.
Although pass/fail results have been considered, rules may be non-binary. For example, two categories of failure may be introduced: acceptable and unacceptable. Considering again the relationship between the comfort rule and the emergency braking rule, an acceptable failure of the comfort rule may occur when the comfort rule fails while the emergency braking rule is active. Interdependencies between rules can thus be handled in various ways.
The activation criteria for rules 254 may be specified in the rule creation code provided to rule editor 400, as may the nature of any rule interdependencies and the mechanisms used to implement these interdependencies.
Graphical user interface:
Fig. 5 shows a schematic block diagram of a visualization component 520. The visualization component is shown as having an input connected to a test database 258 for presenting an output 256 of the test predictors 252 on a Graphical User Interface (GUI) 500. The GUI is presented on a display system 522.
Fig. 5A shows an example view of the GUI 500. The view pertains to a particular scene containing multiple agents. In this example, the test predictor output 256 pertains to a number of external agents, and the results are organized by agent. For each agent, a time series of results is shown for each rule that is applicable to that agent at some point in the scene. In the example depicted, a summary view has been selected for "agent 01", displaying the "top-level" results computed for each applicable rule; the top-level results are computed at the root node of each rule tree. Color coding is used to distinguish periods in which a rule is inactive ("not applicable") for the agent, active and passed, and active and failed.
A first selectable element 534a is provided for each time series of results. This allows access to lower level results of the rule tree (i.e., results of lower level computations in the rule tree).
Fig. 5B shows a first expanded view of the "rule 02" results, in which the results of lower-level nodes are also visible. For example, for the "safe distance" rule of fig. 4B, the results of the "is_latd_safe" and "is_lond_safe" nodes (labeled "C1" and "C2" in fig. 5B) may be visualized. In the first expanded view of rule 02, it can be seen that success/failure of rule 02 is defined by a logical OR relationship between the results C1 and C2; rule 02 only fails if both C1 and C2 fail (in keeping with the "safe distance" rule described above).
A second selectable element 534b is provided for each time series of results, the second selectable element 534b allowing access to the associated digital performance score.
Fig. 5C shows a second expanded view in which the results of rule 02 and the "C1" results have been expanded to reveal the associated scores for the period of time over which those rules were active for agent 01. The scores are displayed as a visual score-time plot, color coded in a similar manner to denote pass/fail.
Example scenario:
Fig. 6A depicts a first instance of a cut-in scenario in the simulator 202, ending in a collision event between an autonomous vehicle 602 and another vehicle 604. The cut-in scenario is characterized as a multi-lane driving scenario in which the autonomous vehicle 602 moves along a first lane 612 (the autonomous lane) while the other vehicle 604 initially moves along a second, adjacent lane 614. At some point in the scene, the other vehicle 604 moves from the adjacent lane 614 into the autonomous lane 612, ahead of the autonomous vehicle 602 (the cut-in event). In this scenario, the autonomous vehicle 602 is unable to avoid colliding with the other vehicle 604. The first scene instance ends in response to the collision event.
Fig. 6B depicts an example of a first test predictor output 256a obtained from the ground truth 310a of the first scene instance. The "no collision" rule is evaluated over the duration of the scene between the autonomous vehicle 602 and the other vehicle 604. The collision event causes this rule to fail at the end of the scene. In addition, the "safe distance" rule of fig. 4B is also evaluated. As the other vehicle 604 moves laterally closer to the autonomous vehicle 602, there comes a point in time (t1) at which both the safe lateral distance threshold and the safe longitudinal distance threshold are breached, resulting in a failure of the safe distance rule that persists until the collision event at time t2.
Fig. 6C depicts a second example of a cut-in scenario. In the second case, the cut-in event does not result in a collision, and autonomous vehicle 602 is able to reach a safe distance behind another vehicle 604 after the cut-in event.
Fig. 6D depicts an example of a second test predictor output 256b obtained from the ground truth 310b of the second scene instance. In this case, the "no collision" rule is passed throughout the scene. The safe distance rule is violated at time t3, when the lateral distance between the autonomous vehicle 602 and the other vehicle 604 becomes unsafe. However, at time t4, the autonomous vehicle 602 manages to reach a safe distance behind the other vehicle 604. Thus, the safe distance rule fails only between time t3 and time t4.
Rule editor-Domain Specific Language (DSL)
Fig. 7 shows an example of rule creation input to the rule editor 400, encoded in a particular choice of DSL.
In the example of fig. 7, a custom rule diagram can be built within the test platform 200. The test predictors 252 are configured to provide a modular set of "building blocks" in the form of preset extractor functions 702 and preset evaluator functions 704.
The rule editor 400 receives rule creation input from a user. The rule creation input is encoded in the DSL, and an example portion of rule creation code 706 is depicted. The rule creation code 706 defines a custom rule diagram 408, corresponding to the rule diagram of fig. 4A. The choice of rule diagram is purely illustrative; a benefit of the DSL is that users can construct whatever rule diagrams they require in a customized manner. The rule editor 400 interprets the rule creation code 706 and causes the custom rule diagram 408 to be implemented within the test predictors 252.
Within the code 706, an extractor node creation input is depicted and labeled 711. The extractor node creation input 711 is shown as comprising an identifier 712 of one of the preset extractor functions 702.
An evaluator node creation input 713 is also depicted and is shown to include an identifier 714 of one of the preset evaluator functions 704. Here, input 713 indicates that an evaluator node is created having two child nodes with node identifiers 715a, 715b (which in this example are exactly the extractor nodes, but may generally be the evaluator node, the extractor node, or a combination of both).
The nodes of the custom rule graph are objects in the object-oriented programming (OOP) sense. A node factory class (Nodes()) is provided within the test predictors 252. To implement the custom rule graph 408, the node factory class is instantiated and a node creation function (add_node) of the resulting factory object 710 (node factory) is invoked, with details of the node to be created.
In accordance with the code 706, the Gt function is used to implement a safe lateral distance rule between the autonomous agent and another agent in the scene (identified by an agent identifier "other_agent_id"). Two extractor nodes (latd, latsd) are defined in the code 706, mapped to the preset LateralDistance and LateralSafeDistance extractor functions, respectively. Those functions operate directly on the scene ground truth 310 to extract, respectively, a time-varying lateral distance signal (measuring the lateral distance between the autonomous agent and the identified other agent) and a time-varying safe lateral distance signal for the autonomous agent and the identified other agent. The safe lateral distance signal may depend on various factors, such as the speed of the autonomous agent and the speed of the other agent (captured in the traces 212), and environmental conditions (e.g., weather, lighting, road type, etc.) captured in the environment data 214. This is essentially invisible to the end user, who need only select the desired extractor function (although, in some embodiments, one or more configurable parameters of the function may be exposed to the end user).
The evaluator node (is_latd_safe) is defined in the code 706 as the parent of the latd and latsd extractor nodes, mapped to the Gt atomic predicate. Thus, when the rule tree 408 is implemented, the is_latd_safe evaluator node applies the Gt function to the outputs of the latd and latsd extractor nodes to compute a TRUE/FALSE result for each time step of the scene, returning TRUE for each time step at which the latd signal exceeds the latsd signal, and FALSE otherwise. In this way, a "safe lateral distance" rule has been constructed from atomic extractor functions and predicates; the autonomous agent fails the safe lateral distance rule when the lateral distance reaches or falls below the safe lateral distance threshold. As will be appreciated, this is a very simple example of a custom rule. Rules of arbitrary complexity can be constructed according to the same principles. The test predictors 252 apply the custom rule tree 408 to the scene ground truth 310 and provide the results in the form of an output graph 717; that is, the test predictors 252 do not simply provide top-level outputs, but rather the outputs computed at each node of the custom rule tree 408. In the "safe lateral distance" example, the time series of results computed by the is_latd_safe node is provided, but the lower-level signals latd and latsd are also provided in the output graph 717, allowing the end user to easily investigate the cause of failure of a particular rule at any level of the graph. In this example, the output graph 717 is a visual representation of the custom rule graph 408 displayed via the User Interface (UI) 418; each node of the custom rule graph is augmented with a visualization of its output, in the manner depicted in figs. 5A to 5C.
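The actual listing of fig. 7 is not reproduced here. Purely as a speculative sketch consistent with the description above, node-factory-based rule creation code might look something like the following, where the toy Nodes class stands in for the platform's real factory and all signatures are assumptions:

class Nodes:
    """Toy stand-in for the node factory class described above (not the real API)."""
    def __init__(self):
        self.nodes = {}

    def add_node(self, name: str, function: str, children=(), **params):
        self.nodes[name] = {"function": function,
                            "children": list(children),
                            "params": params}
        return name

node_factory = Nodes()
# Extractor nodes: each identifies a preset extractor function and its parameters.
latd = node_factory.add_node("latd", "LateralDistance", other_agent_id="other_agent_id")
latsd = node_factory.add_node("latsd", "LateralSafeDistance", other_agent_id="other_agent_id")
# Evaluator node: identifies a preset evaluator function and its two child nodes.
is_latd_safe = node_factory.add_node("is_latd_safe", "Gt", children=[latd, latsd])

print(node_factory.nodes["is_latd_safe"])
# {'function': 'Gt', 'children': ['latd', 'latsd'], 'params': {}}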
Fig. 8 illustrates another example view of the GUI 500 for rendering custom rule trees. Multiple output graphs can be accessed via the GUI and are displayed in association with a visualization 501 of the scene ground truth to which they relate. Each output graph is a visual representation of a particular rule graph that has been augmented with a visualization of the output of each node of that rule graph. Each output graph is initially displayed in a collapsed form, in which only the root node of each computational graph is represented. A first visual element 802 and a second visual element 804 represent the root nodes of a first computational graph and a second computational graph, respectively. The first output graph is depicted in collapsed form, with only the time series of binary pass/fail results of its root node visualized (as a simple color-coded horizontal bar within the first visual element 802). However, the first visual element 802 is selectable to expand the visualization to lower-level nodes and their outputs. The second output graph is depicted in expanded form, accessed by selecting the second visual element 804. The visual elements 806, 808 represent lower-level evaluator nodes within the applicable rule graph, and their results are visualized in the same manner. The visual elements 810, 812 represent extractor nodes within the graph. The visualization of each node is also selectable to render an expanded view of that node. The expanded view provides a visualization of the time-varying digital signal computed or extracted at that node. The second visual element 804 is shown in an expanded state, with a visualization of its derived signal displayed in place of the binary sequence of its results. The derived signal is color coded based on a failure threshold (in this example, a signal falling to zero or below denotes failure under the applicable rule). The visualizations 810, 812 of the extractor nodes are expandable in the same manner, to present visualizations of their raw signals. The view of fig. 8 renders the outputs of a rule graph once the rule graph has been evaluated on a given scene ground truth. In addition, prior to evaluating the rule graph, an initial visualization may be rendered for the benefit of the user creating the rule graph. The initial visualization may be updated in response to changes in the rule creation code 706.
Although not depicted in fig. 7, the node creation inputs 711, 713 may additionally set values of one or more configurable parameters (e.g., thresholds, time intervals, etc.) in the relevant evaluator or extractor function.
In some embodiments, computational efficiency can be improved through selective evaluation of a rule graph. For example, in the graph of fig. 7, if is_latd_safe returns TRUE at a certain time step or interval, the output of the top-level is_d_safe node can be computed without evaluating the longitudinal distance branch for that time step/interval. Such efficiency gains are based on a "top-down" evaluation of the graph: starting from the top level of the tree, branches are computed down towards the extractor nodes only as far as is needed to obtain the top-level output.
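A sketch of such top-down, short-circuit evaluation of an OR node is shown below; the branch functions stand in for evaluating the corresponding subtrees, and their toy outputs are invented for illustration.

from typing import Callable, List

def evaluate_or_top_down(n_steps: int,
                         lateral_branch: Callable[[int], bool],
                         longitudinal_branch: Callable[[int], bool]) -> List[bool]:
    results = []
    for t in range(n_steps):
        if lateral_branch(t):
            results.append(True)   # the longitudinal branch is never evaluated here
        else:
            results.append(longitudinal_branch(t))
    return results

is_latd_safe = lambda t: t < 3       # toy lateral-branch results
is_lond_safe = lambda t: t % 2 == 0  # toy longitudinal-branch results
print(evaluate_or_top_down(6, is_latd_safe, is_lond_safe))
# [True, True, True, False, True, False]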
The evaluator or extractor function may have one or more configurable parameters. For example, the latsd and lonsd nodes may have configurable parameters specifying how to extract a threshold distance from the scene ground truth 310, e.g., as a configurable function of autonomous speed.
Further efficiency gains may be obtained by caching and reusing results as much as possible.
For example, when a user modifies the graph or some parameter, only the outputs of the affected nodes need be recalculated (and, in some cases, only to the extent required to compute the top-level result; see above).
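A simple memoisation scheme along these lines might key cached results by node and time interval; the cache layout below is an assumption made purely for illustration.

cache: dict = {}

def evaluate_node(node_name: str, interval: tuple, compute) -> list:
    """Return the cached output for (node, interval), computing it only on a miss."""
    key = (node_name, interval)
    if key not in cache:
        cache[key] = compute(interval)
    return cache[key]

# The first call computes and stores the result; the second call for the same node
# and interval is served from the cache without recomputation.
acc = evaluate_node("acceleration_other_vehicle", (0, 100),
                    lambda iv: [0.1] * (iv[1] - iv[0]))
acc_again = evaluate_node("acceleration_other_vehicle", (0, 100),
                          lambda iv: [0.1] * (iv[1] - iv[0]))
print(acc is acc_again)  # True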
While the above examples consider outputs in the form of time-varying signals and/or time series of classification results (e.g., pass/fail or TRUE/FALSE results), other types of output may alternatively or additionally be passed between nodes. For example, time-varying iteratable terms (i.e., objects that can be iterated over in a for loop) may be passed between nodes.
Variables may be assigned and/or passed through the tree and bound at runtime. The combination of runtime variables and iteratable terms provides control over loops and runtime (context-dependent) parameterization, while the tree itself remains "static".
A for loop may define the specific scene conditions for which a rule applies, such as "for the agent in front" or "for each traffic light at the intersection", and so on. Implementing such a loop requires a variable (e.g., a "for each nearby agent" loop is implemented with reference to an "other agent" variable), but variables may also be used to define (store) a value in the current context that can then be accessed (loaded) by other blocks (nodes) further down the tree.
Results need only be computed for the time periods that are actually required (again in a top-down fashion), and results already computed can be cached and merged with those for newly required time periods.
For example, a rule (rule diagram) may require the acceleration of the preceding vehicle to be calculated in order to check an adaptive cruise control headway. In addition, another rule (rule tree) may require the acceleration of all vehicles around the autonomous agent ("nearby" agents).
Where the applicable time periods overlap, one tree may be able to reuse the acceleration data of another tree (e.g., where the period over which "other_vehicle" is considered to be "in front" is a subset of the period over which it is considered "nearby").
Referring to fig. 4C, the rule activation logic 422 may be implemented, as the scene runs, based on looping over iteratable terms in the manner described above. The DSL can be extended to implement a loop over any predicate at any given time step. In this case, a first logical predicate defines an activation condition applicable to each agent. For example, the first predicate might define the concept of a "nearby" agent in terms of a distance threshold condition (e.g., satisfied by any agent within a certain threshold distance of the autonomous agent), or the concept of a "forward" agent as a suitable set of conditions on agent location (e.g., satisfied by a single agent that (i) is in front of the autonomous agent, (ii) is in the same lane as the autonomous agent, and (iii) is closer to the autonomous agent than any other agent satisfying conditions (i) and (ii)). The first logical predicate defining the activation condition can be coded in the DSL in the same way as the rules themselves. The rule tree may, in turn, be defined by a second logical predicate in the manner described above. This extends the DSL framework to incorporate loops over any predicate: rules and activation conditions can be coded in the DSL using loops of the form "evaluate [predicate 2] for [any agent satisfying predicate 1]"; at each step of the scene run, the set of agents (if any) satisfying predicate 1 is constructed, and predicate 2 is evaluated only for members of that set. Here, "predicate 1" defines the rule activation condition for each agent, and "predicate 2" defines the rule tree itself. A time-varying iteratable term can be constructed to track which agents satisfy predicate 1 at any time over the duration of the scene run, and passed down the rule tree as needed to facilitate efficient rule evaluation.
For example, each rule and its activation conditions may be defined in a first order logic.
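A sketch of the "[predicate 2] for [any agent satisfying predicate 1]" pattern is given below; the scene representation, threshold value and rule stand-in are assumptions made for illustration only.

NEARBY_THRESHOLD = 30.0  # metres; assumed value

def predicate_1(ego: dict, agent: dict) -> bool:
    """Activation condition: the agent counts as 'nearby'."""
    return abs(agent["position"] - ego["position"]) < NEARBY_THRESHOLD

def predicate_2(ego: dict, agent: dict) -> bool:
    """Rule-tree stand-in: e.g. a safe-distance check against this agent."""
    return abs(agent["position"] - ego["position"]) > agent["safe_distance"]

def evaluate_step(ego: dict, agents: list) -> dict:
    nearby = [a for a in agents if predicate_1(ego, a)]    # iterable of activated agents
    return {a["id"]: predicate_2(ego, a) for a in nearby}  # rule evaluated pairwise

ego = {"position": 0.0}
agents = [{"id": "a1", "position": 12.0, "safe_distance": 15.0},
          {"id": "a2", "position": 80.0, "safe_distance": 15.0}]
print(evaluate_step(ego, agents))  # {'a1': False}  (a2 is not nearby, so not evaluated)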
The following provides a piece of code that defines a custom rule diagram (ALKS_01) as a temporal logic predicate using an alternative syntax.
[Code listing for the ALKS_01 rule graph shown in the original figure; not reproduced here.]
In the above example, LongitudinalDistance() and VelocityAlongRoadLateralAxis() are preset extractor functions, while functions such as And(), Eventually(), Next() and Always() are atomic evaluator functions. The function AgentIsOnSameLane() is an evaluator function applied directly to the scene, which determines whether a given agent is in the same lane as the autonomous agent.
Here, NearbyAgents() is a time-varying iteratable that identifies any other agents within some threshold distance of the autonomous agent. This is one example of a rule activation condition that applies between the autonomous agent and each other agent, based on distance from the autonomous agent.
Although the above examples consider AV stack testing, the techniques can be applied to test components of other forms of mobile robot. Other mobile robots are being developed, for example, for carrying freight in internal and external industrial zones. Such mobile robots would have no people on board and belong to a class of mobile robots termed UAVs (unmanned autonomous vehicles). Autonomous airborne mobile robots (drones) are also being developed.
A computer system comprises execution hardware which may be configured to execute the method/algorithm steps disclosed herein and/or to implement models trained using the present techniques. The term execution hardware encompasses any form/combination of hardware configured to execute the relevant method/algorithm steps. The execution hardware may take the form of one or more processors, which may be programmable or non-programmable, or a combination of programmable and non-programmable hardware may be used. Examples of suitable programmable processors include general-purpose processors based on an instruction set architecture, such as central processing units (CPUs), graphics processing units (GPUs)/accelerator processors, and the like. Such general-purpose processors typically execute computer-readable instructions held in memory coupled to or internal to the processor, and carry out the relevant steps in accordance with those instructions. Other forms of programmable processor include field programmable gate arrays (FPGAs) having a circuit configuration programmable through circuit description code. Examples of non-programmable processors include application specific integrated circuits (ASICs). Code, instructions, etc. may be stored as appropriate on transitory or non-transitory media (examples of which include solid state, magnetic and optical storage devices, and the like). The subsystems 102-108 of the runtime stack of fig. 1 may be implemented in programmable or dedicated processors, or a combination of both, on-board a vehicle or in an off-board computer system in a testing or similar context. The various components of fig. 2, such as the simulator 202 and the test predictors 252, may similarly be implemented in programmable and/or dedicated hardware.

Claims (16)

1. A computer-implemented method of evaluating performance of a trajectory planner of a mobile robot in a real or simulated scene, the method comprising:
receiving a scene ground truth of the scene, the scene ground truth generated using the trajectory planner to control an autonomous agent of the scene in response to at least one scene element in the scene;
receiving one or more performance evaluation rules of the scene and at least one activation condition for each performance evaluation rule; and
processing, by a test predictor, the ground truth of the scene to determine whether an activation condition of each performance evaluation rule is satisfied over a plurality of time steps of the scene, wherein each performance evaluation rule is evaluated by the test predictor only when the activation condition of each performance evaluation rule is satisfied to provide at least one test result.
2. The method of claim 1, wherein for each of a set of a plurality of scenario elements, the scenario ground truth is processed to determine whether an activation condition for each performance evaluation rule is satisfied over a plurality of time steps of the scenario, wherein each performance evaluation rule is evaluated only when an activation condition for each performance evaluation rule is satisfied for at least one of the scenario elements, and only between the autonomous agent and a scenario element that satisfies the activation condition.
3. The method of claim 1 or 2, wherein each performance evaluation rule is encoded in a length of rule creation code as a second logical predicate, and an activation condition of each performance evaluation rule is encoded in the length of rule creation code as a first logical predicate, wherein at each time step the test predictor evaluates the first logical predicate for each scene element and evaluates the second logical predicate only between the autonomous agent and any scene element that satisfies the first logical predicate.
4. A method according to claim 1, 2 or 3, wherein a plurality of performance evaluation rules having different respective activation conditions are received by the test predictor and selectively evaluated according to the different respective activation conditions of the plurality of performance evaluation rules.
5. A method according to any preceding claim, wherein each performance assessment rule relates to drivability.
6. A method according to any preceding claim, comprising:
presenting respective results for a plurality of time steps in the time series on a graphical user interface GUI, the results for each time step visually indicating one of at least three categories including:
A first category when the activation condition is not satisfied,
a second category when the activation condition is satisfied and the rule is passed, and
a third category when the activation condition is satisfied and the rule fails.
7. The method of claim 6, wherein the result is presented as one of at least three different colors corresponding to the at least three categories.
8. A method according to any preceding claim, wherein the activation condition of a first one of the performance evaluation rules is dependent on the activation condition of at least a second one of the performance evaluation rules.
9. The method of claim 8, wherein the first performance evaluation rule is disabled when the second performance evaluation rule is valid.
10. The method of claim 9, wherein the second performance evaluation rule relates to security and the first performance evaluation rule relates to comfort.
11. A method according to any preceding claim, wherein the scene element comprises one or more other agents.
12. The method of claim 11, wherein the set of scene elements is a set of other agents.
13. A method according to claim 11 or 12 when dependent on claim 2, wherein the activation condition is evaluated for each scene element to calculate an iteratable term containing an identifier of any scene element satisfying the activation condition at each time step, the performance evaluation rule being evaluated by looping through the iteratable term at each time step.
14. The method of claim 13, wherein the performance evaluation rule is defined as a computational graph applied to one or more signals extracted from the scene ground truth through which the iteratable term passes in order to evaluate rules between the autonomous agent and any scene elements that satisfy the activation condition.
15. A computer system comprising one or more computers configured to implement the method of any preceding claim.
16. Executable program instructions for programming a computer system to implement the method of any preceding claim.
CN202280014919.9A 2021-02-12 2022-02-11 Performance testing for mobile robot trajectory planner Pending CN116888578A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB2102006.0 2021-02-12
GB2105838.3 2021-04-23
GBGB2105838.3A GB202105838D0 (en) 2021-04-23 2021-04-23 Performance testing for mobile robot trajectory planners
PCT/EP2022/053413 WO2022171819A1 (en) 2021-02-12 2022-02-11 Performance testing for mobile robot trajectory planners

Publications (1)

Publication Number Publication Date
CN116888578A true CN116888578A (en) 2023-10-13

Family

ID=76193427

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202280014947.0A Pending CN116964563A (en) 2021-02-12 2022-02-11 Performance testing of a trajectory planner
CN202280014919.9A Pending CN116888578A (en) 2021-02-12 2022-02-11 Performance testing for mobile robot trajectory planner

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202280014947.0A Pending CN116964563A (en) 2021-02-12 2022-02-11 Performance testing of a trajectory planner

Country Status (2)

Country Link
CN (2) CN116964563A (en)
GB (1) GB202105838D0 (en)

Also Published As

Publication number Publication date
GB202105838D0 (en) 2021-06-09
CN116964563A (en) 2023-10-27

Similar Documents

Publication Publication Date Title
US20210339772A1 (en) Driving scenarios for autonomous vehicles
US20230234613A1 (en) Testing and simulation in autonomous driving
US20230289281A1 (en) Simulation in autonomous driving
US20240123615A1 (en) Performance testing for mobile robot trajectory planners
US20240143491A1 (en) Simulation based testing for trajectory planners
WO2022223816A1 (en) Performance testing for mobile robot trajectory planners
WO2023187121A1 (en) Simulation-based testing for robotic systems
US20240144745A1 (en) Performance testing for autonomous vehicles
KR20240019231A (en) Support tools for autonomous vehicle testing
WO2023088679A1 (en) Generating simulation environments for testing autonomous vehicle behaviour
CN116888578A (en) Performance testing for mobile robot trajectory planner
EP3920070A1 (en) Testing and simulation in autonomous driving
US20240194004A1 (en) Performance testing for mobile robot trajectory planners
WO2023227776A1 (en) Identifying salient test runs involving mobile robot trajectory planners
EP4373726A1 (en) Performance testing for mobile robot trajectory planners
CN117242449A (en) Performance test of mobile robot trajectory planner
EP4374277A1 (en) Perception testing
WO2023021208A1 (en) Support tools for av testing
CN117529711A (en) Autonomous vehicle test support tool
Tolba et al. SDC-Net++: End-to-End Crash Detection and Action Control for Self-Driving Car Deep-IoT-Based System
WO2024115772A1 (en) Support tools for autonomous vehicle testing
WO2024115764A1 (en) Support tools for autonomous vehicle testing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination