DE102021202335A1

DE102021202335A1 - Method and device for testing a technical system

Info

Publication number: DE102021202335A1
Application number: DE102021202335.5A
Authority: DE
Inventors: Thomas Heinz; Joachim Sohns; Christoph Gladisch; Ji Su Yoon; Philipp Glaser
Original assignee: Robert Bosch GmbH
Current assignee: Robert Bosch GmbH
Priority date: 2021-03-10
Filing date: 2021-03-10
Publication date: 2022-09-15

Abstract

Method (10) for testing a technical system, in particular an at least partially autonomous robot or vehicle, characterized by the following features: - combinations of training data and validation data are formed (1), - for each of the combinations, samples of the training data (17) and validation data (23) are taken (2), a binary classifier is trained (26) with the samples, and probability distributions of confusion matrices are calculated (3), - a number of training data (17) for a simulation of the system is determined (4), - tests (12) are carried out by means of the simulation (11), - the tests (12) are evaluated with regard to a fulfillment measure (13) of a quantitative requirement on the system and an error measure (14) of the simulation (11), and - depending on the fulfillment measure (13) and the error measure (14), the tests (12) are classified as either reliable or unreliable.

Description

The present invention relates to a method for testing a technical system. The present invention also relates to a corresponding device, a corresponding computer program and a corresponding storage medium.

State of the art

In software engineering, the use of models to automate test activities and to generate test artifacts in the test process is summarized under the generic term "model-based testing" (MBT). A well-known example is the generation of test cases from models that describe the intended behavior of the system under test.

Embedded systems in particular depend on coherent input signals from sensors and in turn stimulate their environment through output signals to a wide variety of actuators. In the course of the verification and upstream development phases of such a system, its model (model in the loop, MiL), software (software in the loop, SiL), processor (processor in the loop, PiL) or entire hardware (hardware in the loop, HiL) is therefore simulated in a control loop together with a model of the environment. In automotive engineering, simulators that follow this principle for testing electronic control units are sometimes referred to as component, module or integration test benches, depending on the test phase and test object.

DE10303489A1 discloses such a method for testing software of a control unit of a vehicle, a power tool or a robotic system, in which a test system at least partially simulates a controlled system that can be controlled by the control unit. Output signals are generated by the control unit and transmitted to first hardware components via a first connection, and signals from second hardware components are transmitted as input signals to the control unit via a second connection. The output signals are provided as first control values in the software and are additionally transmitted to the test system via a communication interface in real time with respect to the controlled system.

Such simulations are widespread in various fields of technology and are used, for example, to check embedded systems in power tools, engine control units for drive, steering and braking systems, camera systems, systems with artificial intelligence and machine learning components, robotic systems or autonomous vehicles for suitability in early phases of their development. Nevertheless, the results of state-of-the-art simulation models are included in release decisions only to a limited extent because of a lack of confidence in their reliability.

Against this background, the patent application filed with the German Patent and Trade Mark Office under file number 10 2020 205 539.4 proposes a "virtual test classifier".

Disclosure of the invention

The invention provides a method for testing a technical system, a corresponding device, a corresponding computer program and a corresponding storage medium according to the independent claims.

The approach according to the invention is based on the insight that the quality of simulation models is decisive for how correctly the test results obtained with them can be predicted. In the field of MBT, the sub-discipline of validation deals with the task of comparing real measurements with simulation results. Various metrics, measures or other comparators that relate signals to one another are used for this purpose; they are collectively referred to below as signal metrics (SM). Examples of such signal metrics are metrics that compare magnitude, phase shift and correlations. Some signal metrics are defined by relevant standards, e.g. according to ISO 18571.

More generally, uncertainty quantification techniques support the estimation of simulation and model quality. The result of an assessment of model quality using a signal metric, or more generally using an uncertainty quantification method, for a given input X, which can be a parameter or a scenario, is referred to below as the simulation model error metric - error metric for short - SMerrorX. To generalize (interpolate and extrapolate) SMerrorX to previously unconsidered inputs, parameters or scenarios X, machine learning models can be used, for instance based on so-called Gaussian processes.

During verification, the system under test (SUT) is typically examined against a requirement, specification or performance indicator. It should be noted that Boolean requirements or specifications can often be converted into quantitative measurements by using formalisms such as signal temporal logic (STL). Such formalisms can serve as the basis of a quantitative semantics, which generalizes verification in that a positive value indicates that a requirement is fulfilled and a negative value indicates that it is violated. In the following, such requirements, specifications or performance measures are collectively referred to as "quantitative requirements" (QSpec).
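The quantitative semantics described above can be illustrated with a minimal sketch. It assumes a single hypothetical requirement of the form "the signal always stays below a limit"; the function name `robustness_always_below` and the sample values are illustrative assumptions and do not appear in the application.

```python
from typing import Sequence


def robustness_always_below(signal: Sequence[float], limit: float) -> float:
    """Quantitative value of the requirement 'always (signal < limit)'.

    Returns the smallest margin (limit - s) over all samples. The value is
    positive if every sample is below the limit and negative if at least one
    sample exceeds it.
    """
    if not signal:
        raise ValueError("signal must contain at least one sample")
    return min(limit - s for s in signal)


# Hypothetical traces checked against a hypothetical limit of 50.0
satisfied = robustness_always_below([42.0, 47.5, 49.0], limit=50.0)
violated = robustness_always_below([42.0, 51.5, 49.0], limit=50.0)
```

Here `satisfied` is positive and `violated` is negative, which matches the sign convention used for QSpec.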

Such quantitative requirements can be checked either on the real SUT or on a model of it, a "virtual SUT" so to speak. For the purpose of this verification, catalogs of test cases are compiled that an SUT must pass in order to decide whether it has the desired performance and safety properties. Such a test case can be parameterized and thus cover any number of individual tests.

Against this background, the proposed approach addresses the need for dependable test results in order to guarantee the performance and safety properties of an SUT. Especially when tests are carried out on a simulation of the system or of a subcomponent instead of the real system, it must be ensured that the simulation results are trustworthy.

To this end, validation techniques are used to assess how well the simulation model and real measurements agree. The comparison between the simulation results and the real measurements is made using the validation error metric SMerrorX, which is generalized by interpolation and extrapolation so that the error measure can be predicted for new inputs X without taking corresponding measurements. The prediction of SMerrorX is, however, subject to an uncertainty that can be modeled as an interval or as a probability distribution.

One goal of this approach is therefore to split the entirety of the available data optimally and completely into training data and validation data for training and validating the classifier. It is examined how the probabilities for the entries of the so-called truth or confusion matrix change when the ratio M:K of the number M of training data sets to the number K of validation data sets is varied. In this case, the total number of data sets is N = K + M.
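A minimal sketch of such a complete split is given here, assuming that the data sets can be shuffled and cut at position M. The helper names `split_records` and `splits_for_ratios`, the seed and the ten placeholder records are assumptions for illustration only; the application does not prescribe how the combinations are drawn.

```python
import random
from typing import Dict, Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_records(records: Sequence[T], m: int,
                  rng: random.Random) -> Tuple[List[T], List[T]]:
    """Randomly split all N records into M training and K = N - M validation records."""
    if not 0 < m < len(records):
        raise ValueError("m must satisfy 0 < m < N")
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[:m], shuffled[m:]


def splits_for_ratios(records: Sequence[T], training_sizes: Iterable[int],
                      seed: int = 0) -> Dict[int, Tuple[List[T], List[T]]]:
    """Produce one complete split (N = M + K) for every requested training size M."""
    rng = random.Random(seed)
    return {m: split_records(records, m, rng) for m in training_sizes}


records = list(range(10))  # N = 10 hypothetical data sets
splits = splits_for_ratios(records, training_sizes=[2, 5, 8])
```

Each entry of `splits` uses every record exactly once, so varying M changes only the ratio M:K while N stays fixed.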

A second use case is to determine whether the classifier has been trained with enough data points (convergence). To determine whether enough data points have been collected overall, some of the points are omitted, and it is examined how the probabilities (and quantities derived from them) change when the sum of validation data and training data is varied. In this case, N > M + K.
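The convergence check can be sketched as follows, under the assumption that a scalar statistic is recomputed on random subsets of increasing size n < N. The function `statistic_vs_subset_size`, the `failure_fraction` statistic and the placeholder flags are hypothetical; the application itself refers to probabilities of confusion matrix entries rather than to this simplified statistic.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence


def statistic_vs_subset_size(records: Sequence, sizes: Sequence[int],
                             statistic: Callable[[List], float],
                             repeats: int = 20, seed: int = 0) -> Dict[int, float]:
    """Average `statistic` over `repeats` random subsets for each subset size n < N."""
    rng = random.Random(seed)
    pool = list(records)
    result = {}
    for n in sizes:
        if not 0 < n < len(pool):
            raise ValueError("each subset size must satisfy 0 < n < N")
        # Omit N - n points at random and evaluate the statistic on the rest
        result[n] = mean(statistic(rng.sample(pool, n)) for _ in range(repeats))
    return result


def failure_fraction(subset: List[bool]) -> float:
    """Fraction of records flagged as failures in the subset."""
    return sum(subset) / len(subset)


flags = [True] * 5 + [False] * 45  # N = 50 hypothetical outcome flags
trend = statistic_vs_subset_size(flags, sizes=[10, 20, 40], statistic=failure_fraction)
```

If the values in `trend` stop changing noticeably as n grows, this can be read as an indication that additional data points would add little.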

One advantage of this solution is that, in contrast to concepts based exclusively on validation or exclusively on verification, it combines both approaches. For this purpose, a "virtual test classifier" is introduced, which combines the requirements of model validation and product testing. This is achieved by linking information from the validation of simulation and model quality (SMerrorX) on the one hand with test requirements (QSpec) on the other. In order to assess the quality of the statements derived from the classifier, a robustness measure is output. This robustness measure can be determined using established methods such as cross-validation or extreme value analysis. Two possible methods for determining this robustness measure are presented below.

Such tests can be applied in a wide variety of fields. One example is the functional safety of automated systems, such as those used to automate driving functions (automated driving).

The measures listed in the dependent claims enable advantageous developments and improvements of the basic idea specified in the independent claim. For example, an automated, computer-implemented test environment can be provided in order to improve the quality of the tested hardware or software products largely automatically.

List of figures

Embodiments of the invention are shown in the drawings and explained in more detail in the following description. The figures show:

Fig. 1 a virtual test classifier.
Fig. 2 a first approach to generating the decision boundary of the classifier on the basis of data.
Fig. 3 an overview of the "statistics of statistics" approach.
Fig. 4 an overview of how the population is split into training data and validation data, and an overview of how the splits are combined.
Fig. 5 a procedure for evaluating a classifier within one combination of the data set.
Fig. 6 an overview of the process from combining the population to summarizing the entries of confusion matrices in the form of distributions.
Fig. 7 an overview of the probability that the false positive rate is exactly I.
Fig. 8 an exemplary split into training and validation samples.
Fig. 9 a categorization of the validation samples after training.
Fig. 10 a discrete probability distribution of a validation sample, containing a discrete probability distribution of the false positive rate in a random validation sample.
Fig. 11 a discrete probability distribution of the respective category TP, TN, FP or FN as a function of the number of training samples for s = 1 (one subset per number of training samples was selected at random from the population).
Fig. 12 a discrete probability distribution of the respective category TP, TN, FP or FN as a function of the number of training samples for r = 1 and s = 5, showing the means of the means and the means of the 98% quantiles.
Fig. 13 discrete probability distributions of the respective category TP, TN, FP or FN as a function of the number of training samples for different ratios of training to validation samples and for different methods of calculating the quantiles.
Figs. 14 to 17 three convergence estimators.
Fig. 18 schematically a workstation according to a second embodiment.

Embodiments of the invention

According to the invention, as part of a test X, which can be taken from a test catalog as a test case or obtained as an instance of a parametric test, the simulation model error SMerrorX is evaluated and the quantitative specification QSpec is assessed on the basis of a simulation of the SUT. The virtual test classifier takes SMerrorX and QSpec as input and makes a binary decision as to whether or not the simulation-based test result is trustworthy.

In accordance with the usage common in computer science and in particular in pattern recognition, a classifier is understood here to be any algorithm or mathematical function that maps a feature space onto a set of classes that were formed and delimited from one another in the course of a classification. To decide into which class an object is to be placed, the classifier draws on so-called class or decision boundaries. Where the distinction between procedure and instance does not matter, the term "classifier" is sometimes used in technical language, and also below, synonymously with "classification" (German: "Einstufung" or "Klassierung").

Fig. 1 illustrates such a classification in the present application example. Each point corresponds to a test that was carried out by way of simulation and for which the fulfillment measure (13) of the requirement QSpec and the error measure (14) SMerrorX were calculated. QSpec is defined here such that it takes a positive value if the test suggests that the system satisfies the respective requirement (reference sign 24), and a negative value if the system fails the requirement (reference sign 25).

As the figure shows, the decision boundary (19) of the classifier (18) divides the space into four classes A, B, C and D. Tests of class A would be passed by the system with high reliability. For tests of classes B and C, the simulation provides only unreliable results; such tests must therefore be carried out on the real system. Tests of class D would fail on the system with high reliability.

This virtual test classifier (18) is based on the consideration that a requirement that is only narrowly met in the simulation can replace testing of the real system only if at most a marginal model error (14) is to be expected. Conversely, if the fulfillment measure (13) of the quantitative requirement QSpec is large in absolute value, i.e. the specification is far exceeded or clearly missed, a certain deviation of the simulation results from corresponding experimental measurements can be tolerated.
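The four-class partition can be sketched as follows. The sketch assumes a straight-line boundary through the origin of the form |QSpec| = slope * SMerrorX, and it assumes that class B lies on the passing side and class C on the failing side; neither the formula nor this assignment of B and C is stated in the text. The function name `classify_test` and the parameter `slope` are hypothetical.

```python
def classify_test(qspec: float, sm_error: float, slope: float) -> str:
    """Assign a simulated test to class A, B, C or D.

    Assumption: a simulated result counts as reliable only if the magnitude
    of the fulfillment measure exceeds slope * sm_error.
    """
    reliable = abs(qspec) > slope * sm_error
    if qspec > 0:
        # Requirement met in simulation: A if reliable, otherwise B (assumed)
        return "A" if reliable else "B"
    # Requirement missed in simulation: D if reliable, otherwise C (assumed)
    return "D" if reliable else "C"


# Hypothetical tests as (QSpec, SMerrorX) pairs with an assumed slope of 2.0
labels = [classify_test(q, e, slope=2.0)
          for q, e in [(5.0, 1.0), (1.0, 1.0), (-1.0, 1.0), (-5.0, 1.0)]]
```

A narrowly met requirement (small positive QSpec) is trusted only when the model error is small, while a large margin tolerates a larger model error, which mirrors the reasoning above.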

Since this view presupposes knowledge of the model error SMerrorX of the simulation model, it is assumed that the latter has undergone verification and validation before the virtual test classifier (18) is used. As part of this validation, a generalized model that yields SMerrorX for a given X should be formed, e.g. on the basis of a Gaussian process or otherwise by machine learning. It should be noted that the trustworthiness of the simulation depends decisively on the correctness of this generalized model.

Fig. 2 illustrates a possible approach to drawing the decision boundary (19 - Fig. 1) of the classifier (18) on the basis of data. In the simplest case, the boundary (19) runs along a straight line through the origin. The slope of the line should preferably be chosen such that all points at which the fulfillment measure (13) of the quantitative requirement QSpec differs in sign between simulation (11) and real measurement (21), i.e. all tests (12) in which the simulation model fails, lie in regions C and B, and such that these regions are as small as possible.
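One way to read this slope criterion is sketched here. It assumes that the unreliable regions B and C are described by |QSpec_sim| <= k * SMerrorX, so that the smallest admissible k is the largest ratio |QSpec_sim| / SMerrorX among the tests whose sign differs between simulation and measurement. The function name `fit_slope`, the tuple layout and the sample values are assumptions, not part of the application.

```python
from typing import Iterable, Tuple


def fit_slope(tests: Iterable[Tuple[float, float, float]]) -> float:
    """Smallest slope k such that every sign-flip test lies in the unreliable region.

    Each test is a tuple (qspec_sim, qspec_real, sm_error) with sm_error > 0.
    Returns 0.0 if simulation and measurement never disagree in sign.
    """
    k = 0.0
    for q_sim, q_real, sm_error in tests:
        if sm_error <= 0:
            raise ValueError("sm_error must be positive")
        if (q_sim > 0) != (q_real > 0):  # simulation and measurement disagree
            k = max(k, abs(q_sim) / sm_error)
    return k


# Hypothetical tests: the second and third disagree in sign
tests = [(2.0, 1.5, 0.5), (0.5, -0.2, 1.0), (-1.0, 0.3, 0.5), (-3.0, -2.5, 0.2)]
slope = fit_slope(tests)
```

Choosing the smallest such k keeps regions B and C as small as possible while still containing every test in which the simulation model fails.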

A more general, e.g. polynomial, decision boundary (19) may also be considered, whose function curve is fitted by means of linear programming such that it satisfies the criterion of a classifier (18) VTC. In this case, too, all points at which the fulfillment measure (13) of the quantitative requirement QSpec differs in sign between simulation (11) and real measurement (21), i.e. all tests (12) in which the simulation model fails, lie in regions C and B.

The virtual test classifier (hereinafter: "VTC") mentioned in the introduction, which uses the variables QSpec and SMerrorX to partition test cases, is learning-based and therefore requires a certain amount of data. On the one hand, data are needed to generate the boundary lines for the classification in the training phase. On the other hand, a data set that differs from the training data is used to validate the classifier. In general, it is advantageous to provide as much data as possible for training the classifier. This makes it highly probable that the boundary lines of the classifier remain unchanged as further training data are added. A measure of precisely this probability is referred to as "robustness" in the present context.

A VTC of high robustness is thus likely to still provide a correct classification when new test data are added. This holds under the assumption that the test data and the training data follow the same or a similar distribution.

In practice, however, real tests cannot be carried out in unlimited numbers, e.g. for economic reasons. On the one hand, the number of real tests is to be minimized and real tests are to be replaced by simulation; on the other hand, confidence in the simulation and in the correctness of the VTC rests on a sufficiently large amount of real data for training. The invention aims to reconcile these opposing goals. The following aspects are considered to address this challenge:

• Determination of the minimum number of data sets serving as ground truth for training and validation that is required for a robust statement. The truth values should include the criteria for the trustworthiness of a test.
• Determination of the optimal split into training data and validation data.
• Determination of the key figure for quantifying the robustness, taking the convergence behavior into account.

The virtual test classifier has a special property that distinguishes it from traditional learning-based classification methods, e.g. in machine learning. Based on a worst-case consideration, the decision boundaries of the virtual test classifier are generated from those observed support points that are classified as untrustworthy according to the defined criterion. The question of how much training data is required to build a robust classifier is linked to this peculiarity, since the existence and, where applicable, the position of such support points vary depending on how the training data are partitioned from the population.

According to the invention, the classifier is evaluated to answer this question by means of, as it were, a statistic of statistics of the data sets. In other words, it is evaluated with what relative frequency the elements of the confusion matrix assume certain values; these relative frequencies are interpreted as probabilities. Ultimately, the training and validation data sets in their entirety are likewise to be regarded as a sample from an unknown (and possibly infinitely large) population of all theoretically possible test cases.

The said statistic of statistics only comes into play when the spread of the relative frequencies is examined. On this basis, robustness indicators can be defined for how many points are required in total. In addition, several variants for calculating the key figure for robustness are presented.

FIG. 3 illustrates the steps (1-4) of a corresponding method (10) together with their respective parameters. First, the population is divided into training and validation data and the two data sets are combined (step 1).

Once the data sets for training and validation have been fixed by the combination, samples are taken within the training and validation data sets. During validation of the trained test classifier, a confusion matrix is created, by means of which the error of the binary classifier is assessed quantitatively (step 2).

After steps 1 and 2 have been iterated, a distribution of the entries of the confusion matrices is generated (step 3).

Finally, a suitable size of the training data set is determined using a key performance indicator (KPI) for the robustness of the test classifier (step 4).

These steps (1-4) are explained in detail in the following sections.

In the first step (1), the available population is divided into training data M and validation data K on the basis of the defined number of training data M and the ratio between training data M and validation data K. As shown in FIG. 4, the union of the training and validation data may be smaller than the population. The divided data are then combined. One such combination of the split is shown by way of example on the right-hand side of FIG. 4.
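The split described in this step can be sketched as follows. This is a minimal illustration, not the method of the application: the function name `split_population`, the use of a seeded shuffle and the returned remainder are assumptions made for the example.

```python
import random


def split_population(population, m, k, seed=None):
    """Draw m training and k validation points without overlap.

    Points that are not drawn form the unused remainder, so m + k may
    be smaller than the population size (hypothetical helper).
    """
    if m + k > len(population):
        raise ValueError("m + k exceeds the population size")
    rng = random.Random(seed)
    shuffled = list(population)
    rng.shuffle(shuffled)
    training = shuffled[:m]
    validation = shuffled[m:m + k]
    remainder = shuffled[m + k:]
    return training, validation, remainder


# Ten data points, six for training, three for validation, one unused.
train, val, rest = split_population(range(10), m=6, k=3, seed=0)
```

Calling the function repeatedly with different seeds yields different combinations of training and validation data from the same population.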

From this combination (1), various partitions of the population arise in the second step (2, FIG. 3). For each partition, data points are drawn from a training data set according to the defined sample size (15, FIG. 3), and the decision boundary lines of the test classifier are determined from them. After the learning process, the classifier is validated using a validation data set. In the validation, the truth values of the data set are determined and used to evaluate the binary classification. The results of the evaluation are presented in the form of a confusion matrix. This special contingency table shows both the frequency of correct classifications by the classifier, i.e. true positive (TP) and true negative (TN), and the frequency of incorrect classifications by the classifier, i.e. false positive (FP) and false negative (FN).
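As an illustration of the confusion matrix just described, the following sketch counts the four entries for a binary classification. It assumes that `True` stands for "classified as reliable" (positive) and `False` for "classified as unreliable"; the function name and data layout are hypothetical.

```python
def confusion_matrix(predicted, actual):
    """Count TP, TN, FP and FN for two equally long boolean sequences.

    predicted: classifier output, actual: ground truth (assumed encoding:
    True = reliable/positive, False = unreliable/negative).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have equal length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for p, a in zip(predicted, actual):
        if p and a:
            counts["TP"] += 1
        elif p and not a:
            counts["FP"] += 1
        elif not p and a:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
matrix = confusion_matrix(predicted, actual)
```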

FIG. 5 shows the procedure for evaluating a classifier within one combination (1) of the data set. The left-hand side shows one combination case, from which the training data set (17) and the validation data set (23) are provided, while a remainder (28) of the population is not used as ground truth. A test classifier is trained (26) with the training data, validated (27) and then evaluated (29) using the validation data set. As shown on the right-hand side of the figure, the evaluation result is recorded in the form of a confusion matrix (31).

Then, in a third step (3, FIG. 3), the frequency of the respective confusion-matrix entries is recorded. Once the combinations of the population have been run through, a distribution of the entries of the confusion matrix results.

FIG. 6 first illustrates, in the left-hand diagram, the process of combining the population. For each combination (1), a confusion matrix is created after the evaluation of the test classifier (29). Combining the population leads to a multitude of confusion matrices (31, 32), which are shown in the middle diagram. The entries of these confusion matrices are summarized in the form of distributions.
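Summarizing many confusion matrices into distributions of their entries can be sketched as below. The example assumes that each matrix is given as a dictionary with the keys `TP`, `TN`, `FP` and `FN`; the function name and the example values are made up for illustration.

```python
from collections import Counter


def entry_distributions(matrices):
    """Return, per confusion-matrix entry, the relative frequency of each
    value observed across all given matrices (hypothetical helper)."""
    n = len(matrices)
    distributions = {}
    for entry in ("TP", "TN", "FP", "FN"):
        counts = Counter(matrix[entry] for matrix in matrices)
        distributions[entry] = {
            value: count / n for value, count in sorted(counts.items())
        }
    return distributions


# Four illustrative matrices, e.g. from four different combinations.
matrices = [
    {"TP": 3, "TN": 1, "FP": 0, "FN": 1},
    {"TP": 2, "TN": 1, "FP": 1, "FN": 1},
    {"TP": 3, "TN": 1, "FP": 0, "FN": 1},
    {"TP": 3, "TN": 0, "FP": 1, "FN": 1},
]
dist = entry_distributions(matrices)
```

The relative frequencies in `dist` are the quantities that the text interprets as probabilities.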

A method is described below by which the probabilities can be calculated for a given size of the training data set M and of the validation data set K. Convergence criteria are evaluated on the basis of these probabilities and of mean values and moments derived from them. N is the number of points for which both a real and a simulation-based test were carried out. Two types of use case can be distinguished:

In a first use case, the available data are divided completely into validation and training data. It is examined how the probabilities for the entries of the confusion matrix change when the ratio M/K is varied. In this case, N = K + M.

In a second use case, in order to determine whether enough data points have been collected overall, some of the points are omitted and it is examined how the probabilities (and quantities derived from them) change when the sum of validation data and training data is varied. In this case, N > M + K.

The procedure for the first use case is described below. For the second case, the probabilities can either be calculated analytically, or the selection of points is varied numerically and the probabilities are calculated for each selection using the following algorithm:

In a first step, the data points are divided into four categories with regard to the quantitative requirement QSpec:
1. reliably fulfilled (simulation: QSpec ≥ 0, measurement: QSpec ≥ 0),
2. presumably fulfilled (simulation: QSpec ≥ 0, measurement: QSpec < 0),
3. reliably missed (simulation: QSpec < 0, measurement: QSpec < 0) and
4. presumably missed (simulation: QSpec < 0, measurement: QSpec ≥ 0).
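The four categories follow directly from the signs of QSpec in simulation and measurement. A minimal sketch, in which the function name and the category labels are chosen for illustration only:

```python
def categorize(qspec_simulated, qspec_measured):
    """Assign one of the four categories from the sign of QSpec in
    simulation and measurement (labels are illustrative)."""
    if qspec_simulated >= 0:
        if qspec_measured >= 0:
            return "reliably fulfilled"
        return "presumably fulfilled"
    if qspec_measured < 0:
        return "reliably missed"
    return "presumably missed"


examples = [
    categorize(1.0, 2.0),    # both non-negative
    categorize(1.0, -1.0),   # simulation passes, measurement fails
    categorize(-1.0, -2.0),  # both negative
    categorize(-1.0, 0.0),   # simulation fails, measurement passes
]
```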

In a second step, the ratio SMerrorX/QSpec is calculated for each data point.

In a third step, the points of each of the four sets mentioned are sorted in ascending order of their calculated ratio SMerrorX/QSpec.
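The second and third steps can be sketched together for one of the four sets. The sketch assumes that each point is given as a pair of SMerrorX and the simulated QSpec value and that QSpec is non-zero; which QSpec value enters the ratio is an assumption, as is the function name.

```python
def sort_by_ratio(points):
    """Sort (sm_error, qspec) pairs in ascending order of sm_error / qspec.

    Assumes qspec != 0 for every point (hypothetical helper).
    """
    return sorted(points, key=lambda point: point[0] / point[1])


# Ratios of the three points: 0.2, 0.1 and 0.3.
points = [(0.4, 2.0), (0.1, 1.0), (0.9, 3.0)]
sorted_points = sort_by_ratio(points)
```

The position of a point in the sorted list is the index used in the probabilities that follow.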

In a fourth step, the false positive rate and the false negative rate are calculated separately for the right-hand and the left-hand side of the VTC. The true positive and true negative rates can be calculated analogously.

Finally, in a fifth step, the false positive rate and the false negative rate for the right-hand side of the VTC are calculated as follows. Here, TP_sorted is a list of all points that reliably fulfill the requirement, and UP_sorted is a list of all points that presumably fulfill the requirement.

First, the probability is calculated that the point with index I is contained in the training data set and is at the same time the support point:

$P_{1}(I) = M \frac{(N - I)! \, (N - M)!}{N! \, (N - I - M)!}$

If I + M ≥ N, the probability that the point in question is chosen as the support point is zero. In this case, there can be no sample that contains the point with index I and at the same time contains no point with a lower index, and thus with a lower ratio SMerrorX/QSpec.
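For small N, the event described here (the point with a given index is drawn, and no point with a lower index is drawn) can be cross-checked by brute-force enumeration. The sketch below does not evaluate the closed-form expression; it counts training samples of size M directly. The function name and the 1-based rank convention are assumptions made for illustration, so boundary cases may be shifted relative to the index convention of the text.

```python
from fractions import Fraction
from itertools import combinations


def prob_lowest_in_sample(n, m, i):
    """Probability that rank i (1-based, assumed convention) is drawn and
    no lower rank is drawn, over all m-element samples of n ranks."""
    hits = 0
    total = 0
    for sample in combinations(range(1, n + 1), m):
        total += 1
        if min(sample) == i:
            hits += 1
    return Fraction(hits, total)


# N = 5 points, M = 2 training points: distribution over the lowest rank.
distribution = [prob_lowest_in_sample(5, 2, i) for i in range(1, 6)]
```

Exact fractions are used so that the enumerated probabilities can be compared with an analytical expression without rounding error.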

It is possible that the training data set contains no point that is classified as unreliable in the ground truth. In this case, the point classified as reliable that has the highest slope is used as the support point.
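The choice of the support point on one side of the VTC, including this fallback, might be sketched as follows. The sketch assumes that each training point is given as a pair of its ratio SMerrorX/QSpec and its ground-truth label, and that the worst-case support point is the unreliable point with the lowest ratio; the function name and data layout are hypothetical.

```python
def support_ratio(training_points):
    """Return the ratio SMerrorX/QSpec of the support point.

    training_points: list of (ratio, reliable) pairs for one side of
    the VTC, where reliable is the ground-truth label (assumed layout).
    Returns None if the side contains no training point at all.
    """
    unreliable = [ratio for ratio, reliable in training_points if not reliable]
    if unreliable:
        # Worst case: the unreliable point with the lowest ratio.
        return min(unreliable)
    reliable = [ratio for ratio, is_reliable in training_points if is_reliable]
    if reliable:
        # Fallback: the reliable point with the highest slope.
        return max(reliable)
    return None


with_unreliable = support_ratio([(0.1, True), (0.5, False), (0.3, False)])
only_reliable = support_ratio([(0.1, True), (0.4, True)])
empty_side = support_ratio([])
```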

For the purposes of the following, let Number_UP be the total number of points that presumably fulfill the requirement, and Number_TP the total number of points that reliably fulfill the requirement. The probability that the latter group contains the point with index I and that this point also serves as the support point is, for J = Number_TP + 1 - I,

$P_{2}(J) = M \frac{(N - \mathrm{Number}_{UP} - I)! \, (N - M)!}{N! \, (N - I - M - \mathrm{Number}_{UP})!}.$

If I + M + Number_UP > N, the training data set M is so large that it is impossible for no points that presumably fulfill the requirement and, at the same time, no points that reliably fulfill the requirement with a higher SMerrorX/QSpec to have been drawn. In this case, the probability is zero.

The probability that none of the points in the training data set presumably or reliably fulfills the requirement is

$P_{3} = M \frac{(N - \mathrm{Number}_{UP} - \mathrm{Number}_{TP})! \, (N - M)!}{N! \, (N - M - \mathrm{Number}_{TP} - \mathrm{Number}_{UP})!}.$

If Number_TP + Number_UP + M > N, this event is impossible and the probability is zero.

The probability that the number of false positive points contained in the validation data set is exactly fp can be determined as follows:

First, it is determined for which points that reliably fulfill the requirement SMerrorX/QSpec is greater than the corresponding value of the point with index fp that presumably fulfills the requirement, and at the same time smaller than the corresponding value of the point with index fp + 1 that presumably fulfills the requirement. The associated probabilities P₂ are added up. In addition, the value P₁(fp + 1) is added. This summand indicates the probability with which the point with index fp + 1 forms the support point for the VTC. For the special case fp = 0, the value of P₃ is added as well. If the training data set contained no point on the right-hand side, all points are classified as unreliable. In this case, there can be no false positive points.

As FIG. 7 illustrates, the probability that the false positive rate is exactly I equals the sum of the probabilities of all eligible reliable points, provided that a point (33) that reliably fulfills the requirement according to the ground truth was selected as the support point.

The probability that the number of false negative points contained in the validation data set is exactly fn is composed as follows: First, it is determined for which points that presumably fulfill the requirement (34, FIG. 7) SMerrorX/QSpec is greater than the corresponding value of the point with index fn that reliably fulfills the requirement, and at the same time smaller than the corresponding value of the point with index fn + 1 that reliably fulfills the requirement. The associated probabilities P₁ are added up. In addition, the value P₂(fn) is added. This summand indicates the probability with which the point with index fn forms the support point for the VTC. For the special case fn = 0, the value of P₃ is added as well. If the training data set contained no point on the right-hand side, all points are classified as unreliable here, too. In this case, there can be no false negative points.

The probabilities of false positive and false negative cases for the left-hand side of the VTC, i.e. for those points that presumably or reliably miss the requirement, can be calculated analogously. In each case, the absolute value of SMerrorX/QSpec is to be considered.

As an alternative to this purely analytical procedure, a hybrid approach using sampling and exact calculation of probability distributions may be considered. The following interpretation is useful for this purpose: The VTC is a binary classifier that assesses whether a test carried out in the context of a simulation model leads to the same result (test passed or test failed) when carried out on the real system. The result of the VTC can be interpreted as confidence in a simulation-based test. An established basis for evaluating binary classifiers is the truth or confusion matrix with the following entries:

• true positive (TP), true negative (TN): confidence or non-confidence is correct
• false positive (FP): confidence is incorrect
• false negative (FN): non-confidence is incorrect

A number of likewise established indicators can be derived from these quantities, for example

• the true negative rate $\frac{TN}{TN + FP}$,
• the false negative rate $\frac{FN}{FN + TP}$,
• the accuracy $\frac{TP + TN}{TP + TN + FP + FN}$,
• the precision $P = \frac{TP}{TP + FP}$,
• the recall (hit rate) $R = \frac{TP}{TP + FN}$,
• the F1 score as the harmonic mean of precision and recall $\frac{2}{\frac{1}{P} + \frac{1}{R}}$,
• the Fowlkes-Mallows index as the geometric mean of precision and recall $FMI = \sqrt{P \cdot R}$, or
• the Matthews correlation coefficient $MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$.
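The indicators listed above can be computed from the four confusion-matrix entries as in the following sketch. The function name and the dictionary keys are chosen for illustration; the sketch assumes that no denominator is zero.

```python
from math import sqrt


def indicators(tp, tn, fp, fn):
    """Derive the listed indicators from TP, TN, FP and FN.

    Assumes all denominators are non-zero (hypothetical helper).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "true_negative_rate": tn / (tn + fp),
        "false_negative_rate": fn / (fn + tp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        # Harmonic mean of precision and recall.
        "f1": 2 / (1 / precision + 1 / recall),
        # Geometric mean of precision and recall.
        "fmi": sqrt(precision * recall),
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


result = indicators(tp=6, tn=3, fp=1, fn=2)
```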

For the purposes of the following, it is assumed that each data point of the ground truth is formed from the following components:

1. the underlying test parameters,
2. the value of QSpec obtained by measurement,
3. the value of QSpec obtained by simulation and
4. the simulation model error SMerrorX for the given parameters.
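A data point with these four components could be represented as in the sketch below. The class and field names are hypothetical, and the derived property uses the sign convention of the four categories (QSpec ≥ 0 versus QSpec < 0) to state whether simulation and measurement agree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundTruthPoint:
    """One ground-truth data point (illustrative field names)."""

    test_parameters: tuple
    qspec_measured: float
    qspec_simulated: float
    sm_error: float

    @property
    def simulation_agrees(self) -> bool:
        """True if simulation and measurement agree on QSpec >= 0."""
        return (self.qspec_simulated >= 0) == (self.qspec_measured >= 0)


agreeing = GroundTruthPoint((1.0,), qspec_measured=0.5,
                            qspec_simulated=0.2, sm_error=0.1)
disagreeing = GroundTruthPoint((2.0,), qspec_measured=-0.5,
                               qspec_simulated=0.2, sm_error=0.3)
```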

Also given are a ratio r of training samples to validation samples, bounds min and max for the number of training samples, the number t of different training-sample counts, and the number s of subsets per training-sample count that are selected at random from the population.

The bounds min and max are to be chosen such that, for a given size of a subset of the population, the ratio r of training to validation samples can be approximated well. On this basis, t natural numbers, preferably approximately equidistant, are chosen between the bounds min and max. These give the number of training samples, denoted #T below, that are used in the evaluation. The number of validation samples is therefore $\operatorname{round}(\frac{\#T}{r}).$
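As an illustration of how the pairs (#T, #V) follow from r, min, max and t, the sketch below picks t approximately equidistant natural numbers between the bounds and derives the validation count by rounding. The function name `sample_count_tuples` and its parameter names are assumptions made for this example. Python's built-in `round` rounds exact halves to the nearest even integer, which the application does not specify.

```python
def sample_count_tuples(r, n_min, n_max, t):
    """Return t tuples (#T, #V) with #V = round(#T / r).

    Illustrative sketch: #T values are spread approximately evenly
    over [n_min, n_max]; duplicates are possible if t exceeds the
    number of integers in that range.
    """
    if t < 1:
        raise ValueError("t must be at least 1")
    if t == 1:
        counts = [n_min]
    else:
        step = (n_max - n_min) / (t - 1)
        counts = [round(n_min + i * step) for i in range(t)]
    return [(n_train, round(n_train / r)) for n_train in counts]


if __name__ == "__main__":
    # r = 4: four training samples per validation sample
    print(sample_count_tuples(r=4, n_min=20, n_max=100, t=5))
```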

For the purposes of the hybrid approach to evaluating the VTC, let G be the set of tuples (#T, #V) of the number of training samples and the number of validation samples that result from the parameters r, min, max and t according to the procedure above. For each tuple (#T, #V) ∈ G, #T + #V data points are first randomly selected from the population. For all possible splits of this selection into #T training samples and #V validation samples, the exact discrete probability distribution of TP, TN, FP and FN is determined. Finally, the desired statistics and convergence estimators are computed from the resulting set of discrete probability distributions.

FIGS. 8 and 9 illustrate the approach with an example. Here, let #T + #V = 10 and r = 1, so that #T = #V = 5. For a selection of 10 data points from the population, FIG. 8 shows one possible split into training data (17) and validation data (23).

After the VTC has been trained, each validation sample is assigned to one of the categories TP, TN, FP or FN. FIG. 9 shows an example result. If a validation sample is chosen at random, it belongs to one of the categories TP or TN with a probability of $\frac{3}{5}$, to the category FP with a probability of $\frac{1}{5}$, and to the category FN likewise with a probability of $\frac{1}{5}$.

For the case #T = #V = 5 considered here, there are $\binom{10}{5} = 252$ possible splits into training and validation samples. For each of the categories TP, TN, FP and FN, this yields a discrete probability distribution under the assumption that the splits follow a particular distribution, e.g. that every split is equally probable. FIG. 10 shows such a discrete probability distribution, here that of the false positive rate in a random validation sample.
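The count of 252 splits can be checked by brute-force enumeration. The sketch below is illustrative only: the function name `enumerate_splits` and the integer labels for the data points are assumptions. It is feasible only for small selections, which is why an exact closed-form calculation matters for realistic sample sizes.

```python
from itertools import combinations
from math import comb


def enumerate_splits(points, n_train):
    """Yield every (training, validation) split with n_train training samples."""
    all_points = set(points)
    for train in combinations(points, n_train):
        yield set(train), all_points - set(train)


if __name__ == "__main__":
    points = list(range(10))  # hypothetical labels for 10 data points
    splits = list(enumerate_splits(points, n_train=5))
    print(len(splits), comb(10, 5))
    # With equally probable splits, each point is a validation sample
    # in exactly half of them.
    print(sum(1 for _, validation in splits if 0 in validation))
```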

In addition, distributions of derived quantities such as FMI or MCC (see Formula 4) can be determined. The distinctive feature of the proposed method is that the discrete probability distributions can be calculated exactly and need not be approximated by sampling.

For this purpose, let the numbers #T of training samples and #V of validation samples, as well as a set of #T + #V data points from the population, be fixed. In a first step, all samples are sorted by the angle according to FIG. 1, i.e. in descending order of $\operatorname{atan2}(SMerrorX(sample), QSpec_{sim}(sample)).$

The resulting sorted list of samples is split into two lists, trust_fail (QSpec_sim < 0) and trust_pass (QSpec_sim > 0), such that the concatenation of trust_fail and trust_pass yields the original sorted list.
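The sorting and splitting step can be sketched as follows. This is an illustration under stated assumptions, not the applicant's implementation: the `Sample` type and its field names are hypothetical, SMerrorX is assumed to be non-negative (so that descending angle order places all samples with negative QSpec_sim first), and samples with QSpec_sim exactly 0, a case the text does not cover, are placed in trust_pass.

```python
from math import atan2
from typing import NamedTuple


class Sample(NamedTuple):
    name: str
    qspec_sim: float  # QSpec value obtained by simulation
    sm_error: float   # SMerrorX, assumed >= 0 in this sketch


def sort_and_split(samples):
    """Sort by atan2(SMerrorX, QSpec_sim) descending, then split by sign."""
    ordered = sorted(
        samples, key=lambda s: atan2(s.sm_error, s.qspec_sim), reverse=True
    )
    trust_fail = [s for s in ordered if s.qspec_sim < 0]
    # Assumption: QSpec_sim == 0 is treated as a pass here.
    trust_pass = [s for s in ordered if s.qspec_sim >= 0]
    return ordered, trust_fail, trust_pass


if __name__ == "__main__":
    data = [
        Sample("a", 1.0, 0.5),
        Sample("b", -1.0, 0.5),
        Sample("c", 2.0, 0.1),
        Sample("d", -0.5, 2.0),
    ]
    ordered, trust_fail, trust_pass = sort_and_split(data)
    print([s.name for s in trust_fail], [s.name for s in trust_pass])
```

With a non-negative error, negative QSpec_sim values give angles above π/2 and positive values give angles below π/2, which is why the concatenation of the two lists reproduces the sorted order.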

In the following, all splits of the samples in trust_fail and trust_pass into #T training samples and #V validation samples are considered. Furthermore, all possible VTCs are considered that are described by at most one support point fail_sep in trust_fail and at most one support point pass_sep in trust_pass. Combinations that are not possible with the given values of #T and #V are excluded.

In the following, let fail_sep and pass_sep be fixed, i.e. one possible VTC is fixed. This VTC arises under a certain subset of all splits into training and validation samples. The samples that can occur as validation samples are grouped according to the category TP, TN, FP or FN assigned to each of them and according to whether the sample serves as a validation sample in all cases (a must-sample) or only in some cases (an optional sample). In addition, in the following n₁ denotes the number of must-samples of a particular category X, n₂ the number of must-samples of all remaining categories, m₁ the number of optional samples of category X, m₂ the number of optional samples of all remaining categories, k = #V, c_min = max(0, k - n₁ - n₂ - m₂) and c_max = min(m₁, k - n₁ - n₂).

The probability P_v of randomly choosing a validation sample of category X (see the previous example) and the associated probability of occurrence P_a over all splits of the #V validation samples that are consistent with the fixed VTC are represented below as a tuple (P_v, P_a). The following set thus represents the exact discrete probability distribution for category X: $\left\{ \left( \frac{n_{1} + j}{k}, \frac{\binom{m_{1}}{j} \binom{m_{2}}{k - n_{1} - n_{2} - j}}{\binom{m_{1} + m_{2}}{k - n_{1} - n_{2}}} \right) \,\middle|\, c_{min} \leq j \leq c_{max} \right\}$
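The set above is a hypergeometric distribution over the number j of optional samples of category X that end up among the validation samples. A minimal sketch using exact rational arithmetic follows; the function name `category_distribution` and the helper variable `free` (the k - n₁ - n₂ validation slots not taken by must-samples) are assumptions introduced for illustration.

```python
from fractions import Fraction
from math import comb


def category_distribution(n1, n2, m1, m2, k):
    """Return the exact list of tuples (P_v, P_a) for category X."""
    free = k - n1 - n2  # validation slots left for optional samples
    if free < 0 or free > m1 + m2:
        raise ValueError("combination not possible for the given counts")
    c_min = max(0, free - m2)
    c_max = min(m1, free)
    total = comb(m1 + m2, free)
    return [
        (
            Fraction(n1 + j, k),
            Fraction(comb(m1, j) * comb(m2, free - j), total),
        )
        for j in range(c_min, c_max + 1)
    ]


if __name__ == "__main__":
    dist = category_distribution(n1=1, n2=1, m1=2, m2=3, k=5)
    for p_v, p_a in dist:
        print(p_v, p_a)
    print(sum(p_a for _, p_a in dist))  # probabilities of occurrence sum to 1
```

Using `Fraction` keeps the result exact, in line with the statement that the distributions are calculated rather than approximated by sampling.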

FIG. 11 shows the discrete probability distributions of the categories TP and TN (reference sign 35), FP (reference sign 36) and FN (37) as a function of the number of training samples for s = 1 (one subset per number of training samples was randomly selected from the population).

In this example, r = 1 and #T is varied. For each value of #T, the discrete probability distribution is shown in the form of a 98% quantile (45, 46 and 47, respectively) around the mean (35, 36, 37).
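A summary of this kind can be derived from a discrete distribution given as tuples (value, probability). The sketch below is an assumption-laden illustration: the text does not define how the 98% quantile is constructed, so the band is assumed here to run from the 1% quantile to the 99% quantile, and the function name `mean_and_band` is hypothetical.

```python
from fractions import Fraction


def mean_and_band(dist, coverage=Fraction(98, 100)):
    """Return (mean, lower, upper) of a discrete distribution.

    Assumption: the band is the central interval between the
    (1 - coverage) / 2 and (1 + coverage) / 2 quantiles.
    """
    ordered = sorted(dist)
    mean = sum(value * prob for value, prob in ordered)
    tail = (1 - coverage) / 2

    def quantile(q):
        cumulative = 0
        for value, prob in ordered:
            cumulative += prob
            if cumulative >= q:
                return value
        return ordered[-1][0]

    return mean, quantile(tail), quantile(1 - tail)


if __name__ == "__main__":
    example = [
        (Fraction(1, 5), Fraction(1, 10)),
        (Fraction(2, 5), Fraction(3, 5)),
        (Fraction(3, 5), Fraction(3, 10)),
    ]
    print(mean_and_band(example))
```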

If s > 1 is chosen, several discrete probability distributions are available for each value of #T. From the discrete probability distribution of the means, for example, a statistic can be formed that captures the mean and variance of the means. Correspondingly, from the discrete probability distribution of the lower and upper q-quantiles, a statistic can be formed that captures, for example, the mean, the variance, and the upper and lower bounds of the q-quantiles.

FIG. 12 shows a representation corresponding to FIG. 11 for r = 1 and s = 5, in which the means of the means and the means of the 98% quantiles are plotted.

FIG. 13 shows a comparison between r = 0.5 (left column) and r = 4 (right column), and between the mean of the 98% quantiles (top row) and the upper and lower bounds of the 98% quantiles (bottom row).

On the basis of these sets of exactly calculated discrete probability distributions, convergence estimators can be defined that, for a given choice of r and s, assess the deviations among the s discrete probability distributions for each choice of #T. Examples of such estimators are the mean or the upper and lower bounds of the FN means, and the maximum distance over all pairs of discrete FN probability distributions. For this purpose, the distance between two distributions ƒ₁ and ƒ₂ can be defined, for example, as follows: $\max_{σ \in Σ} | f_{1}(σ) - f_{2}(σ) |$
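The maximum-distance estimator can be sketched as follows. The representation of a distribution as a dictionary from value σ to probability, the treatment of Σ as the union of both supports (with missing values counted as probability 0), and the function names are all assumptions made for this illustration.

```python
from itertools import combinations


def max_distance(f1, f2):
    """Largest absolute pointwise difference between two distributions."""
    support = set(f1) | set(f2)  # assumed to play the role of Σ
    return max(abs(f1.get(s, 0) - f2.get(s, 0)) for s in support)


def max_pairwise_distance(distributions):
    """Maximum distance over all pairs of the given distributions."""
    return max(max_distance(a, b) for a, b in combinations(distributions, 2))


if __name__ == "__main__":
    f1 = {0.0: 0.5, 0.2: 0.5}
    f2 = {0.0: 0.25, 0.2: 0.5, 0.4: 0.25}
    f3 = {0.0: 0.5, 0.2: 0.25, 0.4: 0.25}
    print(max_distance(f1, f2))
    print(max_pairwise_distance([f1, f2, f3]))
```

Unlike a logarithm-based divergence, this distance remains defined when one of the distributions assigns probability 0 to a value.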

In this context, it should be noted that the Kullback-Leibler divergence (information gain) is not defined if ∃σ ∈ Σ: ƒ(σ) = 0.

The lower halves of FIGS. 14 to 17 show all three convergence estimators: the upper and lower bounds (38) and the mean (39) of the FN means, and the maximum distance (40) over all pairs of discrete FN probability distributions. For FIGS. 14 and 15, r = 0.5 and s = 20, with a different range of #T values shown in each. For FIGS. 16 and 17, r = 4 and s = 20 are chosen.

For a given r, the distance between the probability distributions converges as #T, and with it #V, increases. The convergence estimator can therefore be used to decide whether the available amount of simulation and measurement data is sufficient for a robust classification of simulation-based tests. Once the collection of measurement data is complete, the confidence in the classification result for as yet unseen simulation data can be quantified using the distributions of TP, TN, FP and FN and the indicators derived from them.

This method (10) can be implemented, for example, in software or hardware or in a mixed form of software and hardware, for example in a workstation (30), as the schematic representation of FIG. 18 illustrates.

CITATIONS INCLUDED IN THE DESCRIPTION

This list of documents cited by the applicant was generated automatically and is included solely for the better information of the reader. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Patent literature cited

DE 10303489 A1 [0004]
DE 102020205539 [0006]

Claims

Method (10) for testing a technical system, in particular an at least partially autonomous robot or vehicle and mechatronic vehicle components such as a steering system, a braking system and safety-critical driver assistance systems such as ABS and ESP, characterized by the following features: - combinations of training data and validation data are formed (1), - for each of the combinations, samples of the training data (17) and validation data (23) are taken (2), a two-valued classifier is trained with the samples (26) and probability distributions of confusion matrices (31, 32) are calculated (3), - a number of the training data (17) for a simulation of the system is determined (4), - tests (12) are carried out by means of the simulation (11), - a robustness measure that evaluates the trustworthiness of the classifier is calculated on the basis of the volume of training data, - the tests (12) are evaluated with regard to a measure of fulfillment (13) of a quantitative requirement on the system and an error measure (14) of the simulation (11) and - depending on the measure of fulfillment (13) and the error measure (14), the tests (12) are classified as either reliable or unreliable.

Method (10) according to claim 1, characterized by the following feature: - the robustness measure for the classifier is determined on the basis of the distribution of the training and test data.

Method (10) according to claim 1 or 2, characterized by the following features: - on the basis of the robustness measure, a decision is made as to whether the underlying totality of the data needs to be enlarged by further simulations and measurements and - if the totality of the data including the measurements is fixed, then for a simulation without measurements a confidence measure is output, based on the robustness measure, that evaluates how trustworthy the relevant test result is.

Method (10) according to any one of claims 1 to 3, characterized by the following features: - the classification (15) is carried out by a multi-value classifier (18) on the basis of a feature vector (13, 14) and - the measure of fulfillment (13) and the error measure (14) form components of the feature vector (13, 14).

Method (10) according to any one of claims 1 to 4, characterized by the following features: - the multi-value classifier (18) maps the feature vector (13, 14), case by case, to a first class (A) with a positive measure of fulfillment (13) in both the simulation (11) and the tests (12) on the system, a second class (B) with a positive measure of fulfillment (13) in the simulation (11) and a negative measure of fulfillment (13) in the tests (12) on the system, a third class (C) with a negative measure of fulfillment (13) in the simulation (11) and a positive measure of fulfillment (13) in the tests (12) on the system, or a fourth class (D) with a negative measure of fulfillment (13) in both the simulation (11) and the tests (12) on the system and - the classification is carried out within specified decision boundaries (19) between the classes (A, B, C, D).

Method (10) according to claim 5, characterized by the following features: - in a preparatory phase (20), the simulation (11) is confirmed by experimental measurement (21) on the system, - the decision boundaries (19) are drawn such that the measure of fulfillment (13) obtained in the simulation (11) on the one hand and in the measurement (21) on the other deviate as little as possible and - preferably, further tests (12) to be carried out in the preparatory phase (20) are selected automatically (22).

Method (10) according to claim 5, characterized by the following features: - the multi-value classifier (18) is defined by solving a system of equations and - the system of equations comprises the defining equations of the measure of fulfillment (13) and the error measure (14).

Method (10) according to any one of claims 1 to 7, characterized by the following feature: - the evaluation is carried out such that the measure of fulfillment (13) is positive if the system meets the requirement (24) and negative if the system fails to meet the requirement (25).

Method (10) according to any one of claims 1 to 8, characterized by the following features: - for given parameters of the tests (12), the measure of fulfillment (13) and the error measure (14) are each represented in a feature space spanned by the parameters and - after the evaluation, the classification (15) is visualized in the feature space.

Method (10) according to any one of claims 1 to 9, characterized in that errors in the system detected by the testing are automatically corrected.

Computer program which is set up to execute the method (10) according to any one of claims 1 to 10.

Machine-readable storage medium on which the computer program according to claim 11 is stored.

Device (30) which is set up to execute the method (10) according to any one of claims 1 to 10.