DE102022107010A1 - System and method for identifying objects

Info

Publication number: DE102022107010A1
Application number: DE102022107010.7A
Authority: DE
Inventors: Sebastian Niehaus; Alberto Merola; Janis Reinelt
Original assignee: Aicura Medical GmbH
Current assignee: Aicura Medical GmbH
Priority date: 2021-11-30
Filing date: 2022-03-24
Publication date: 2023-06-01

Abstract

The invention relates to a system for identifying objects. The system has at least one first neural network, in particular an autoencoder or transformer, preferably a sparse autoencoder or sparse transformer, for converting a query into at least one typed representation, preferably a plurality of typed representations, of the query. The first neural network is trained to convert a query into more abstract, typed representations of the query. In addition, the system has a second neural network for converting data sets with object properties into at least one typed representation, preferably a plurality of typed representations, of those data sets. The second neural network is trained to convert a respective data set with object properties into a more abstract, typed representation of the respective data set representing the properties of the objects.

Description

The invention relates to a system and a method for identifying objects.

Objects, for example items but also people, are typically characterized by a large number of properties, at least some of which are physical in nature and measurable. Such properties may be size, color, age, weight, composition or the like. Other properties that are not readily measurable can also be attributes of an object, for example the history of the object. One example is the manufacturing history of a complex component that is built up successively, in several stages, from basic elements and basic components. Such a production history is also an attribute of an object and can be relevant, for example, in connection with quality control. In the chemical industry, for instance, batch numbers are regularly recorded for a product so that, in the event of problems, it is possible to trace how the batch to which the product belongs was manufactured. One problem is that the many properties and attributes of different objects have typically not been recorded completely or uniformly. For example, used automotive spare parts can be described either by the type of spare part, e.g. "tailgate", or by the manufacturer's part number. If only "tailgate" is specified, the vehicle to which the tailgate is to belong typically has to be described in more detail. In this example, the color also often plays a role. It may be given roughly as light green, or by the manufacturer's designation "spring green", or by a manufacturer color code. A light green tailgate may be a tailgate in the color "spring green". This is not certain, however, if the same manufacturer offers several light green shades for vehicles. Further properties of such an object are, for example, its equipment, such as a closing sensor, a rear window cleaning system, or a drive for opening and closing the tailgate.

If many objects of a similar or identical type are recorded in a database together with their properties, the problem arises that the data recorded for an individual object may be incomplete or inaccurate. It is therefore difficult to use an automated query to identify the objects that best match the query. A precisely formulated query does not necessarily identify the best-matching objects, because the descriptions of the objects in the database may differ from the query in the format of the information or in the required accuracy. A problem therefore usually arises if, for example, the target objects that potentially best match a query are to be identified from among a thousand objects without recording the object properties again afterwards.

A query is typically composed of a plurality of attributes or object properties that the objects to be selected should have if possible.

Ideally, the objects are described by exactly the attributes or object property values that are named in the query. Often, however, the values of the recorded object properties are incomplete (i.e. the object properties are incompletely recorded with respect to the query) or have the wrong format, e.g. because they are recorded in different units, on different scales, or in different languages or terminology.

Accordingly, it is desirable to record the properties and attributes of objects as accurately and completely as possible. This requires precise specifications and, for example, precise measurement methods. The latter is difficult to achieve whenever the information on the properties of the objects comes from different sources, for example from different institutions or people, or is recorded using different devices. Once the properties and attributes of objects have been recorded, they must be represented in some form. It is usual to describe them with words, numbers or other codes and to combine these representations of the properties (words, numbers, codes, etc.) of a respective object into a data set. The data sets, as a kind of container for these representations of the properties of a respective object, can themselves have different formats, just like the representations they contain. For properties that can be measured and expressed in numbers and associated units, such as size, weight or volume, the measurement itself can vary in accuracy, the representation in numbers and units can vary (number of decimal places, unit used), and the representation and formatting in a file can differ widely for two identical objects, even if the measurements and the representations are free of errors. When properties are described with words in natural language, the variety of ways of naming a specific property and representing it in data sets becomes even greater.

Recorded properties and attributes of objects can thus be converted into machine-readable descriptions. For a deterministic query, these descriptions usually have to be unambiguous. In practice, however, this is often not the case.

The object of the invention is to offer a solution to the latter problem.

According to the invention, a system is proposed to solve this problem. The system has at least one first neural network, in particular an autoencoder or transformer, preferably a sparse autoencoder or sparse transformer, for converting a query into at least one typed representation, preferably a plurality of typed representations, of the query. The first neural network is trained to convert a query into more abstract, typed representations of the query. In addition, the system has a second neural network for converting the data sets with object properties into at least one typed representation, preferably a plurality of typed representations, of the data sets with object properties. The second neural network is trained to convert a respective data set with object properties into a more abstract, typed representation of the respective data set representing the properties of the objects.

It is particularly preferred if the respective neural network, i.e. in particular the (sparse) autoencoder or (sparse) transformer, is configured to generate a plurality of typed representations of a query or of a respective data set with object properties. In the case of an autoencoder, this can be done, for example, by means of varying dropout in the individual hidden layers; in the case of a transformer, by tapping the outputs of the individual decoders (see below).
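The idea of obtaining several typed representations from one input by varying dropout can be illustrated with a minimal sketch. It is not the trained (sparse) autoencoder of the invention: the single hidden layer, the hand-picked `weights`, the dropout rates and all function names are assumptions made purely for illustration.

```python
import random

def encode(x, weights, drop_rate, rng):
    """One hidden layer with ReLU; each unit is dropped with probability drop_rate."""
    hidden = []
    for row in weights:
        activation = max(0.0, sum(w * xi for w, xi in zip(row, x)))
        keep = rng.random() >= drop_rate
        hidden.append(activation if keep else 0.0)
    return hidden

def typed_representations(x, weights, drop_rates, seed=0):
    """Return one representation of x per dropout rate (hypothetical helper)."""
    rng = random.Random(seed)
    return [encode(x, weights, rate, rng) for rate in drop_rates]

# Made-up weights standing in for a trained encoder (4 hidden units, 3 inputs).
weights = [
    [0.5, -0.2, 0.1],
    [0.3, 0.8, -0.5],
    [-0.4, 0.1, 0.9],
    [0.2, 0.2, 0.2],
]
query = [1.0, 0.5, 2.0]
reps = typed_representations(query, weights, drop_rates=[0.0, 0.25, 0.5])
for rep in reps:
    print(rep)
```

All representations have the same length, so they remain comparable with one another; only the set of active units differs between them.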

In addition, the system has a comparison unit that is configured to compare the typed representation or typed representations of a query with the typed representations of the properties of the objects and to generate corresponding similarity values. A selection and output unit connected to the comparison unit is designed to identify those objects whose typed property representations are most similar to the typed representation of the query. The comparison unit is preferably designed to determine the similarity of typed representations by determining the Euclidean distance. The respective Euclidean distance between one of the typed representations of the query and a typed representation of a respective data set with object properties is then a similarity value formed by the comparison unit. The typed representations of the data sets with object properties and the typed representations of the query can take the form of tensors generated by the second and the first neural network, respectively. To this end, the first and the second neural network preferably have one or more output layers with an identical number of nodes.
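A minimal sketch of such a comparison and selection step, assuming the typed representations are already available as plain lists of floats of equal length. The object identifiers, the vectors and the function name `rank_objects` are hypothetical; in the described system the vectors would come from the two neural networks.

```python
import math

def rank_objects(query_rep, object_reps, top_k=3):
    """Return the top_k (object_id, distance) pairs, smallest distance first."""
    scored = [
        (obj_id, math.dist(query_rep, rep))  # Euclidean distance as similarity value
        for obj_id, rep in object_reps.items()
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]

# Made-up typed representations of three data sets with object properties.
object_reps = {
    "tailgate-001": [0.9, 0.1, 0.4],
    "tailgate-002": [0.2, 0.8, 0.5],
    "tailgate-003": [0.7, 0.3, 0.6],
}
query_rep = [0.85, 0.15, 0.45]

best = rank_objects(query_rep, object_reps, top_k=2)
for obj_id, distance in best:
    print(obj_id, round(distance, 4))
```

`math.dist` raises a `ValueError` if the two vectors differ in length, which mirrors the requirement that both networks have output layers with the same number of nodes.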

Alternatively, the typed representations of the query can also be converted into queries in a query format that can be applied directly to the data sets with object properties. The comparison unit then determines the similarity between the queries transformed in this way and the original data sets with object properties, rather than typed representations of those data sets.

For such a transformation, the queries are converted in a first step into typed representations of the query, as described here. In a second step, the typed representations obtained in this way are mapped, by means of a further decoder or by deterministic mapping, onto an interpretable data representation of the typed query and are thus converted into a query format that can be applied directly to the data sets with object properties. Such an interpretable data representation can be implemented as a JSON file, for example. In this JSON file, the key terms (e.g. the names of parameters such as "year of construction") are named in natural language.
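The second step can be pictured with a small sketch of a deterministic mapping. Everything specific here is an assumption for illustration only: the meaning assigned to each vector position, the thresholds, the value ranges and the key names are invented, and a learned decoder could take the place of the table.

```python
import json

# Hypothetical deterministic mapping: one (key, decoder) pair per vector position.
FIELD_DECODERS = [
    ("part_type", lambda v: "tailgate" if v >= 0.5 else "other"),
    ("year_of_construction", lambda v: round(1990 + v * 40)),
    ("color", lambda v: "light green" if v >= 0.5 else "unspecified"),
]

def to_interpretable(typed_rep):
    """Map an abstract typed representation onto named, human-readable parameters."""
    return {
        name: decode(value)
        for (name, decode), value in zip(FIELD_DECODERS, typed_rep)
    }

typed_query = [0.9, 0.55, 0.7]  # made-up typed representation of a query
query_json = json.dumps(to_interpretable(typed_query), indent=2)
print(query_json)
```

The resulting JSON uses natural-language keys, so it can be read by a person and matched against the fields of the stored data sets.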

The interpretable data representation of the typed query can then be applied directly to the data sets with object properties. "Interpretable" here means that the data representation is understandable to humans and can be read in terms of the underlying query or the data to be selected, rather than having an abstract form.

A first approach described here thus consists in converting both the queries and the data sets with object properties into typed representations, each of which has an abstract format. The typed representations of the queries can then be compared with the typed representations of the data sets with object properties in order to determine the data sets that best match the respective query.

The second approach described here consists in first converting only the queries into typed representations and then converting those typed representations into interpretable data representations, which are both understandable to humans and directly applicable to the data sets with object properties. To find the data sets with object properties that best match the respective query, the interpretable data representations of the queries can then be compared with the data sets with object properties.

Preferably, the quality of the mapping of the typed representations onto interpretable data representations (the representation mapping) is given a ranking in order to determine, from among all candidates, the interpretable data representations that best match the respective typed representation. Since both the typed representations and the interpretable data representations can be represented as vectors composed of values for a large number of parameters, it is possible to determine a similarity measure between the vectors, e.g. their Euclidean distance, and to base the ranking on this similarity measure.

A comparison unit similar to the one described above can be provided for this ranking. Accordingly, a typed representation of a query can be compared with different interpretable data representations of queries, and the most suitable interpretable data representation can be determined and selected in each case.

If the parameters (e.g. age, size, year of construction, etc.) in the queries are each defined not by a single value but by a range of values, each query spans a multidimensional vector space to which, for example, a center point can be assigned. In this case, the positions of the center points in the multidimensional vector space can be compared with one another. The ranking is then higher the closer together the center points (mean values) derived from the typed representation of the query and from the interpretable data representation lie in the multidimensional vector space.
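The comparison of center points can be sketched as follows, assuming each representation is given as a list of (low, high) ranges on a common scale. The parameter choice (age, size), the numbers and the helper names are hypothetical.

```python
import math

def center(ranges):
    """Midpoint of each (low, high) range, i.e. the center of the spanned region."""
    return [(low + high) / 2 for low, high in ranges]

def center_distance(ranges_a, ranges_b):
    """Euclidean distance between the centers of two range-based representations."""
    return math.dist(center(ranges_a), center(ranges_b))

# Made-up ranges for two parameters: (age, size).
typed_query = [(20, 30), (150, 170)]
candidate_a = [(22, 28), (155, 165)]  # narrower ranges, same center
candidate_b = [(40, 50), (150, 170)]  # shifted in the first parameter

print(center_distance(typed_query, candidate_a))
print(center_distance(typed_query, candidate_b))
```

A smaller distance corresponds to a higher ranking, so `candidate_a` would be preferred over `candidate_b` in this example.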

The comparisons with the data sets with object properties, or with their typed representations, that are carried out on the basis of the typed representations of the queries or of the corresponding interpretable data representations can likewise be evaluated by means of a ranking. The same method as for ranking the representation mappings can be used for this. For this purpose, data sets with object properties are assigned to the respective query on the basis of vector spaces that represent the respective query. The more centrally a data set with object properties lies in the vector space defined by the interpretable representation of the query, the higher its ranking.

In a vector space representing a query, each part of the query (e.g. each parameter defined in the query) represents one vector dimension. For example, material density, date of manufacture and condition each represent one vector dimension. This results in a three-dimensional parameter or vector space (the terms parameter space and vector space are used synonymously here). The values of the individual dimensions must be converted to a uniform scale. If every dimension is to carry the same weight in the ranking, the highest ranking value is assigned to the center of the vector space. However, weightings can also be assigned to the individual dimensions. In that case, the point with the highest ranking value is no longer at the center of the vector space but is shifted within it.
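A minimal sketch of the equally weighted case, assuming each query parameter is given as a (low, high) range and min-max scaling is used as the uniform scale. The parameter names, ranges, records and function names are invented for illustration; per-dimension weightings are deliberately not modeled here.

```python
import math

def scale_to_unit(value, low, high):
    """Map a value onto a uniform scale where the query range becomes [0, 1]."""
    return (value - low) / (high - low)

def center_offset(record, query_ranges):
    """Distance of a record from the center (0.5 in every dimension) of the query space."""
    offsets = [
        scale_to_unit(record[name], low, high) - 0.5
        for name, (low, high) in query_ranges.items()
    ]
    return math.sqrt(sum(o * o for o in offsets))

# Hypothetical three-dimensional query: material density, year of manufacture, condition.
query_ranges = {"density": (7.0, 8.0), "year": (2010, 2020), "condition": (1, 5)}
records = {
    "A": {"density": 7.5, "year": 2015, "condition": 3},  # exactly central
    "B": {"density": 7.9, "year": 2011, "condition": 5},  # near the edges
}

# Smaller offset from the center means higher ranking.
ranked = sorted(records, key=lambda rid: center_offset(records[rid], query_ranges))
print(ranked)
```

Without the scaling step, the year dimension would dominate the distance simply because its numeric values are larger, which is why a uniform scale is needed before ranking.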

The representations, whether typed representations or interpretable data representations, are stored in a structured data standard. This can be a JSON format, for example. For medical data, a format according to the FHIR standard is also suitable. In addition to the representations of the parameters that define the query or the object properties, the representations can also contain values that represent the quality of the mapping of the resource.

If a query, or a respective data set with object properties, is converted into a plurality of typed representations, each typed representation of a data set with object properties can be compared with the various typed representations of the query. With, for example, n different representations of a query and m representations of the data sets with object properties, this yields a matrix of relative similarity values with dimension n × m.
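The n × m matrix can be sketched directly, assuming Euclidean distance as the similarity value and plain lists of floats as representations. The vectors and the function name `similarity_matrix` are made up for illustration.

```python
import math

def similarity_matrix(query_reps, record_reps):
    """Row i, column j: distance between query representation i and record representation j."""
    return [[math.dist(q, r) for r in record_reps] for q in query_reps]

query_reps = [[0.0, 0.0], [1.0, 0.0]]                # n = 2 representations of the query
record_reps = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]   # m = 3 representations of data sets

matrix = similarity_matrix(query_reps, record_reps)
for row in matrix:
    print([round(value, 3) for value in row])
```

Each row belongs to one representation of the query and each column to one data set representation, so the smallest entry in the matrix points to the closest pairing.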

A system according to the invention thus has the following components:

- an input interface for entering or receiving a query, e.g. in the form of a query data set,
- a first neural network connected to the input interface, which is configured and trained to convert a query into at least one typed representation, preferably several typed representations, of the query,
- an access to a database containing data sets with object properties,
- a second neural network connected to the database access, which is configured and trained to convert a respective data set with object properties into at least one typed representation, preferably several typed representations, of that data set, wherein the number of nodes of the output layer or output layers of the second neural network corresponds to the number of nodes of the output layer or output layers of the first neural network, so that the typed representations of the query and of the respective data set with object properties have the same dimension,
- a comparison unit for determining the similarities between typed representations of the query generated by the first neural network and typed representations of the data sets with object properties generated by the second neural network, and
- a selection and output unit for identifying objects matching a query on the basis of the similarities determined by the comparison unit and for displaying the identified matching objects.

A corresponding method according to the invention comprises the following steps:

- acquiring a plurality of physically measurable properties of a number m of objects or individuals, each object or individual being individually characterized by the values of the plurality of physically measurable properties acquired for it,
- forming a number of data sets with object properties corresponding to the number m, each of which contains acquired values for at least part of the plurality of physically measurable properties of the respective object or individual,
- forming a query for selecting a subset of the objects or individuals, the query containing values for at least part of the plurality of physically measurable properties,
- transforming the query by means of an autoencoder or a transformer into a number n of typed representations of the query,
- transforming each of the m data sets with object properties by means of an autoencoder or a transformer into a number m of typed representations of the data sets with object properties, and
- determining a similarity value for each of the m typed representations of the data sets with object properties by determining the similarity between a typed representation of the query and a respective one of the m typed representations of the data sets with object properties.

This solution includes the idea that the stated problem can be solved not only by acquiring the properties and attributes of the objects more precisely. Good results can also be achieved on the basis of imprecisely or incompletely acquired properties and attributes: the data sets representing the properties and attributes are converted into a more abstract, typed representation by means of an autoencoder or transformer, and the query is likewise first converted into a more abstract, typed representation. One or more objects matching the query can then be identified by comparing the typed representation of the query with the typed representations of the objects.

The system preferably has, as the first neural network, an autoencoder or a transformer whose input layer is supplied with a representation of a query.

The system preferably has, as the second neural network, an autoencoder or a transformer whose input layer is supplied with data sets representing properties of the objects.

The autoencoder or transformer forming the first neural network is trained with training data to convert a specific query into several typed representations of the query. The autoencoder or transformer forming the second neural network is trained with training data to convert a data set representing properties of the objects into a typed representation of that data set. The typed representations of the query can then be compared with each of the typed representations of the various data sets with object properties in order to identify, among the plurality of objects, those objects for which the typed representations of their properties are most similar to the typed representation of the query.

Both the typed representation of the query and the typed representations of the properties of the objects can be represented as tensors that are generated by the autoencoder or transformer and provided at its respective output layer. The similarity between the tensors representing the typed representations of the data sets with object properties and a tensor representing a typed representation of the query can be determined, for example, by computing the Euclidean distance between the tensor representing the typed query and each of the tensors representing the typed properties of the objects. A desired number of objects can then be identified for which the Euclidean distance between the tensor describing the respective object and the tensor describing the typed query is smallest.
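The selection by smallest Euclidean distance can be sketched as follows. The tensors are flattened to plain lists, and the object identifiers, values and the function name `closest_objects` are made up for illustration.

```python
import heapq
import math

def closest_objects(query_tensor, object_tensors, k):
    """Return the k object ids with the smallest Euclidean distance to the query."""
    distances = {
        object_id: math.dist(query_tensor, tensor)
        for object_id, tensor in object_tensors.items()
    }
    return heapq.nsmallest(k, distances, key=distances.get)

query_tensor = [0.2, 0.5, 0.9]
object_tensors = {
    "battery-A": [0.2, 0.5, 0.8],
    "battery-B": [0.9, 0.1, 0.1],
    "battery-C": [0.4, 0.5, 0.9],
    "battery-D": [0.0, 0.0, 0.0],
}
print(closest_objects(query_tensor, object_tensors, k=2))  # ['battery-A', 'battery-C']
```

Here `k` plays the role of the desired number of objects; a brute-force scan like this is only meant to show the principle, not how a large database would be searched.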

Autoencoders typically have an encoder-decoder architecture with an input layer, an output layer and a plurality of hidden layers in between. As is known, each layer is composed of a plurality of nodes, and the number of nodes of the input layer must match the representation of the query data set. A characteristic of the autoencoder is that the number of nodes, starting from the input layer, first decreases layer by layer and then increases again layer by layer, so that a bottleneck forms between the input layer and the output layer.

The presented method and the presented system solve the problem of identifying objects/persons that "match" a query with respect to a query criterion, i.e. of finding objects/persons that fit a requirement profile, even if the query criterion or the data on the queried attributes are not uniform or not complete, for example because the data sets for the individual objects/persons from which the selection is to be made do not contain the queried attributes completely or in a uniform form.

Areas of application include, for example, finding suitable recycling material or finding suitable used spare parts, e.g. car spare parts, batteries or circuit boards with electronic components.

In these use cases, incomplete or inaccurate data sets are a frequent problem. For example, the color of a used spare part may be given generically as "black", in manufacturer-specific form as "night black", or as a specific color code "AB201". Inaccurate or missing information on the time of manufacture (model year) and inaccurate or incomplete information on the condition (e.g. current battery capacity) are also common.

The invention will now be explained in more detail using exemplary embodiments with reference to the figures. The figures show:

Fig. 1: a system according to the invention for identifying objects that have the object properties specified in a query;
Fig. 2: a part of the system from Fig. 1 in which a query with object properties is converted into a typed representation of the query by means of an autoencoder;
Fig. 3: a part of the system from Fig. 1 in which data sets with object properties can be converted into typed representations of these object properties by means of an autoencoder;
Fig. 4: an example in which the autoencoder for converting a query or data sets with object properties into corresponding typed representations is a sparse autoencoder;
Fig. 5: an example in which the autoencoder for converting queries or data sets with object properties into corresponding typed representations is only the encoder part of an autoencoder;
Fig. 6: an example in which the autoencoder for converting a query or data sets with object properties into corresponding typed representations is a multi-head autoencoder in which one encoder of an autoencoder is linked to several different decoders;
Fig. 7: an example of an autoencoder for converting data sets with object properties into corresponding typed representations of the object properties by means of the encoder part of an autoencoder;
Fig. 8: a schematic sketch of a comparison unit and a selection and output unit of the system from Fig. 1;
Fig. 9: a sketch of a three-dimensional parameter space, spanned by the parameter values of a query, and objects located in it on the basis of their object properties;
Fig. 10: an example of a transformer for converting queries or data sets with object properties into several typed representations of the query or of a respective data set with object properties; and
Fig. 11: another representation of a transformer.

A system for identifying objects that match a query comprises an input interface 12 for a query, a first autoencoder or transformer 14 for converting the query into a typed representation 16 of the query and, in parallel, a database 18 with data sets representing object properties, a second autoencoder or transformer 20 for converting the data sets with object properties into typed representations 22 of the data sets with object properties, a comparison unit 24 for determining the similarity of typed representations 16 of the query to typed representations 22 of the object properties, and a selection and output unit 26 for displaying the identified matching objects; see Fig. 1.

An autoencoder suitable for converting a query into a typed representation of the query typically has an encoder-decoder structure formed by nodes 30 organized in layers. These layers comprise an input layer 32, several hidden layers 34 and an output layer 36; see Fig. 2. The nodes 30 of successive layers are connected to one another, so that a node 30 of a subsequent layer is supplied with the output values of some or all of the nodes 30 of the preceding layer. The output values of the nodes of a preceding layer thus form the input values for a node in the next layer. A node is designed to process the input values in a weighted manner, for example by summing them, in order to form an output value that is then passed on to the nodes of the subsequent layer.
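The weighted processing in a node can be sketched as follows. The description only mentions a weighted sum; the bias term, the sigmoid activation, the function names and the numeric values are assumptions added for illustration.

```python
import math

def node_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by a sigmoid activation (activation is assumed)."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def layer_output(inputs, weight_rows, biases):
    """One output per node; each node sees all outputs of the previous layer."""
    return [node_output(inputs, w, b) for w, b in zip(weight_rows, biases)]

previous_layer = [0.5, -1.0, 2.0]   # output values of the preceding layer
weight_rows = [[0.1, 0.2, 0.3],     # weights of node 1
               [-0.4, 0.0, 0.4]]    # weights of node 2
biases = [0.0, 0.0]

outputs = layer_output(previous_layer, weight_rows, biases)
print(outputs)                      # passed on to the nodes of the next layer
```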

An autoencoder has the property that the number of nodes in the layers following the input layer initially decreases. This part of the autoencoder forms an encoder 38. At least one middle layer of the autoencoder 14 is a layer with the smallest number of nodes and forms a bottleneck 40. The output values of the nodes of the bottleneck 40 are abstracted representations of the data applied to the nodes of the input layer 32 of the autoencoder 14. In a decoder part 42, the number of nodes per layer increases again successively up to the output layer 36.
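The shrinking and growing layer widths can be made concrete with a small forward pass. The layer sizes, the random (untrained) weights and the ReLU activation are assumptions for illustration; only the shape of the architecture is taken from the description.

```python
import random

LAYER_SIZES = [8, 4, 2, 4, 8]   # input, hidden, bottleneck, hidden, output (assumed sizes)

def build_weights(sizes, seed=0):
    """Random weight matrices; a trained network would use learned values instead."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]

def forward(x, weights):
    """Propagate x through all layers and return the activations of every layer."""
    activations = [x]
    for matrix in weights:
        x = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in matrix]  # ReLU
        activations.append(x)
    return activations

weights = build_weights(LAYER_SIZES)
activations = forward([1.0] * LAYER_SIZES[0], weights)
bottleneck = activations[LAYER_SIZES.index(min(LAYER_SIZES))]

print([len(a) for a in activations])   # [8, 4, 2, 4, 8]
print(len(bottleneck))                 # 2
```

The activations of the narrowest layer are what the text calls the abstracted representation of the input.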

An autoencoder is a neural network whose properties are determined by training. During training, the autoencoder is supplied with a large number of training data. The training data are first converted into more abstract representations by the encoder of the autoencoder and are then reconstructed as accurately as possible by the decoder part of the autoencoder. The reconstruction loss between the data applied to the input layer of the autoencoder and the data that the autoencoder outputs at the output layer should be as small as possible. Through training with a very large number of data sets, which represent different queries, for example, the requirement to keep the reconstruction loss as low as possible causes weights to be established in the nodes such that strongly scattering components of the training data sets applied to the input layer are weighted less than components of the training data sets that scatter less. This gives the autoencoder the ability to convert any query into a typed representation of the query during later operation. This typed representation of the query can be completely abstract and hardly comprehensible to a user. Typically, however, the decoder generates a typed representation of a query that is comprehensible to a user and from which, for example, random inaccuracies have been removed.
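Training against a reconstruction loss can be illustrated with a deliberately tiny linear autoencoder (two inputs, one bottleneck node, two outputs) trained by plain gradient descent. The training data, the initial weights, the learning rate, the squared-error loss and all names are assumptions for illustration and not the training procedure of the described system.

```python
# Training data: 2-D points that all lie on one line, so one bottleneck node suffices.
DATA = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

def reconstruct(x, enc, dec):
    z = enc[0] * x[0] + enc[1] * x[1]      # encoder: 2 inputs -> 1 bottleneck value
    return z, (dec[0] * z, dec[1] * z)     # decoder: 1 bottleneck value -> 2 outputs

def mean_loss(enc, dec):
    """Mean squared reconstruction error over the training data."""
    total = 0.0
    for x in DATA:
        _, x_hat = reconstruct(x, enc, dec)
        total += (x[0] - x_hat[0]) ** 2 + (x[1] - x_hat[1]) ** 2
    return total / len(DATA)

def train(steps=200, lr=0.05):
    enc, dec = [0.1, 0.1], [0.1, 0.1]
    for _ in range(steps):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in DATA:
            z, x_hat = reconstruct(x, enc, dec)
            r = (x[0] - x_hat[0], x[1] - x_hat[1])   # reconstruction residual
            back = r[0] * dec[0] + r[1] * dec[1]     # residual propagated to the bottleneck
            for i in range(2):
                g_dec[i] += -2.0 * r[i] * z / len(DATA)
                g_enc[i] += -2.0 * back * x[i] / len(DATA)
        enc = [w - lr * g for w, g in zip(enc, g_enc)]
        dec = [w - lr * g for w, g in zip(dec, g_dec)]
    return enc, dec

initial_loss = mean_loss([0.1, 0.1], [0.1, 0.1])
enc, dec = train()
final_loss = mean_loss(enc, dec)
print(initial_loss, final_loss)   # the loss shrinks as the weights adapt
```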

In order to avoid overfitting of the trained autoencoder 14, the autoencoder 14 is varied during training, e.g. by randomly deactivating some nodes of the hidden layers 34 during training (dropout).
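Random deactivation of nodes during training can be sketched as follows. The rescaling of the surviving activations ("inverted dropout") is a common convention assumed here, not something stated in the description; the function name and values are illustrative.

```python
import random

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale survivors (inverted dropout)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(42)                     # fixed seed so the sketch is repeatable
hidden = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
print(dropout(hidden, rate=0.5, rng=rng))   # applied during training only
print(dropout(hidden, rate=0.0, rng=rng))   # rate 0 leaves the activations unchanged
```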

Instead of a conventional autoencoder 14 or 20, a sparse autoencoder 50 can be used to generate the typed representations of either the query or the data sets with object properties, as shown by way of example in Fig. 3. Unlike the autoencoder 14 or 20, the sparse autoencoder 50 lacks the characteristic bottleneck between encoder 38 and decoder 42. Rather, in a sparse autoencoder 50 all layers can contain the same number of nodes, i.e. the input layer 32, the output layer 36 and the hidden layers 34 in between all have the same number of nodes 30. To nevertheless achieve the desired effect, for example generating, for a specific query applied to the input layer 32 of the sparse autoencoder 50, a typed representation in which arbitrary features are weighted less than central features, some nodes 30' of the hidden layers 34' are deactivated during training of the sparse autoencoder. For this purpose, the sparse autoencoder 50 can have additional bias nodes for each layer. It should be noted that the structure (topology) of the sparse autoencoder 50 can be similar to that of the autoencoder 14. That is, in the sparse autoencoder 50, too, hidden layers 34' can have fewer nodes 30 than the input layer 32 and the output layer 36.

It should be noted that, by deactivating individual nodes of a sparse autoencoder accordingly, a query can be converted into different typed representations, as shown in Fig. 3.
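How different sets of deactivated nodes lead to different representations of the same query can be sketched with a single hidden layer. The weights, the masks and the function names are made up, and the activation function is omitted for brevity; this is an illustration of the principle, not the network of the described system.

```python
def hidden_layer(query, weight_rows):
    """Plain weighted sums, one per hidden node (activation function omitted)."""
    return [sum(w * x for w, x in zip(row, query)) for row in weight_rows]

def masked_representation(query, weight_rows, active):
    """Deactivate every node whose flag in `active` is False by forcing its output to 0."""
    return [h if on else 0.0 for h, on in zip(hidden_layer(query, weight_rows), active)]

query = [1.0, 2.0, 3.0]
weight_rows = [[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [1.0, 1.0, 1.0]]

rep_a = masked_representation(query, weight_rows, [True, True, False, False])
rep_b = masked_representation(query, weight_rows, [False, False, True, True])
print(rep_a)   # [1.0, 2.0, 0.0, 0.0]
print(rep_b)   # [0.0, 0.0, 3.0, 6.0]
```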

Instead of providing a complete autoencoder 14 or 20 or a sparse autoencoder 50 for generating the typed representations of the query and/or of the data sets with object properties, it is also possible to use only the encoder 38 of an autoencoder, including the bottleneck 40. The nodes of the bottleneck 40 then already constitute the output layer. Such an example is shown in Fig. 4. With such a neural network, a query or data sets with object properties can be converted into feature vectors 56, which likewise constitute typed representations, for example of a query or of a respective data set with object properties. The example in Fig. 4 shows the conversion of a query into a plurality of feature vectors by means of the encoder 38. Data sets with object properties can be converted into corresponding feature vectors in the same way. This is not shown.

Fig. 5 illustrates that the autoencoder 18' can also have a multi-head architecture in which one encoder 38 is connected to a plurality of decoders 42, each of which is trained differently. In this way, too, it is possible to convert, for example, a query or data sets with object properties into a plurality of typed representations each.
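The multi-head arrangement, one shared encoder feeding several decoders, can be sketched as follows. All weights and names are made up for illustration; in practice each decoder head would be the result of a different training run.

```python
def linear(x, weight_rows):
    """Matrix-vector product used for both the encoder and the decoder heads."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight_rows]

# One shared encoder (3 inputs -> 2 code values); weights are made up.
ENCODER = [[0.5, 0.5, 0.0],
           [0.0, 0.5, 0.5]]

# Several decoder heads (2 code values -> 3 outputs), standing in for differently trained decoders.
DECODERS = {
    "head_a": [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
    "head_b": [[2.0, 0.0], [0.0, 0.0], [0.0, 2.0]],
}

def multi_head(x):
    """Encode once, then decode the same code with every head."""
    code = linear(x, ENCODER)
    return {name: linear(code, head) for name, head in DECODERS.items()}

representations = multi_head([1.0, 2.0, 3.0])
for name, rep in representations.items():
    print(name, rep)
```

Each head yields its own typed representation of the same input, so the number of heads determines how many representations one query or data set produces.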

Figs. 6 and 7 illustrate how each data set with object properties from a plurality of such data sets can be converted by means of an autoencoder 20 (Fig. 6) into typed representations of these data sets. Analogously to Fig. 4, Fig. 7 illustrates that each data set with object properties from a plurality of such data sets can be converted by means of the encoder part 38 of an autoencoder into feature vectors that constitute a typed representation of the respective data set with object properties.

Once, on the one hand, a query has been converted into one or more typed representations and, on the other hand, the data sets with object properties have each been converted into one or more typed representations, the similarities between the typed representation(s) of the query and the typed representation(s) of the respective data sets with object properties can be determined. This is done by means of the comparison unit 24 shown in FIG. 1. FIG. 8 shows the comparison unit 24 in more detail.

After the similarities between, for example, a representation of the query and the representations of the data sets with object properties have been determined, those associated objects can be identified whose object properties are most similar to the query. These objects can then be displayed via an output 26.

What is decisive is that this identification of objects with the system 10 described here also works if the data sets with object properties for the various objects are not uniform, in particular not complete, or if the recorded object properties, in particular the recorded physical parameters, are represented imprecisely or in ways that deviate from one another. This allows the data sets with object properties to come from different sources that are not strictly coordinated with one another. A meaningful query for identifying suitable objects is nevertheless possible. Suitable objects can thus be identified automatically without prior, time-consuming manual maintenance of the data sets and without a prior detailed definition of the contents of a query.

The comparison unit 24 is designed to determine, for each typed representation of a query, a similarity value, in particular the Euclidean distance, to the typed representations of each of the data sets with object properties. From the similarity values thus determined, the comparison unit 24 generates a similarity matrix 60 of dimension m × n, where m is the number of typed representations of the query and n is the number of typed representations of the data sets with object properties. The similarity matrix 60 can then be used to identify those objects whose associated typed representations of the data sets with object properties have the greatest similarity, in particular the smallest Euclidean distance, to the typed representations of the query.
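The construction of an m × n matrix of Euclidean distances can be sketched as follows. The example vectors and the names `euclidean`, `distance_matrix` and `closest_column` are hypothetical, and selecting the column with the smallest entry is only one possible reading of "smallest Euclidean distance".

```python
import math

def euclidean(a, b):
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(query_reps, record_reps):
    """Rows: typed representations of the query (m); columns: those of the data sets (n)."""
    return [[euclidean(q, r) for r in record_reps] for q in query_reps]

def closest_column(matrix):
    """Index of the data-set representation with the smallest distance to any query representation."""
    n = len(matrix[0])
    return min(range(n), key=lambda j: min(row[j] for row in matrix))

query_reps = [[0.0, 0.0], [1.0, 1.0]]                 # m = 2
record_reps = [[3.0, 4.0], [1.0, 1.0], [0.0, 1.0]]    # n = 3

matrix = distance_matrix(query_reps, record_reps)
best = closest_column(matrix)
print(matrix, best)
```

Both sets of representations must have the same dimension for the distance to be defined, which is why the output layers of the two networks are matched in size.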

Finally, the selection and output unit 26 is provided for identifying objects that match the query. It is designed to form a representation value from the similarity matrix 60. This can be done, for example, in one of the following four ways:

1. Only the highest value of the similarity matrix 60 is used
2. The mean or median of a row or column of the similarity matrix 60 is taken
3. The mean or median of the top-N similarity values (e.g. the 5 highest values) is used
4. The mean or median of the entire matrix is used.
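The four options listed above can be sketched as small aggregation functions. This sketch assumes that the matrix holds similarity scores for which a higher value means a better match; the function names and the example values are hypothetical.

```python
from statistics import mean, median

def highest_value(matrix):
    """Option 1: only the highest value of the matrix."""
    return max(v for row in matrix for v in row)

def row_aggregate(matrix, row_index, agg=mean):
    """Option 2 (row): mean or median of one row."""
    return agg(matrix[row_index])

def column_aggregate(matrix, column_index, agg=mean):
    """Option 2 (column): mean or median of one column."""
    return agg([row[column_index] for row in matrix])

def top_n_aggregate(matrix, n=5, agg=mean):
    """Option 3: mean or median of the n highest values."""
    values = sorted((v for row in matrix for v in row), reverse=True)
    return agg(values[:n])

def whole_matrix_aggregate(matrix, agg=mean):
    """Option 4: mean or median of all values."""
    return agg([v for row in matrix for v in row])

# Hypothetical similarity scores (2 query representations x 3 data-set representations).
similarity = [[0.9, 0.4, 0.1], [0.6, 0.8, 0.2]]

print(highest_value(similarity))
print(row_aggregate(similarity, 0, agg=median))
print(column_aggregate(similarity, 1))
print(top_n_aggregate(similarity, n=2, agg=median))
print(whole_matrix_aggregate(similarity))
```

Passing `agg=mean` or `agg=median` switches between the two statistics named in each option.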

This representation value is compared with a threshold value predefined in the system. If the representation value is greater than or equal to the threshold value, the data set in question and the object (or person) associated with it are identified as matching the query, and a list of matching objects is provided at the output.
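The threshold comparison might look as follows. Since a Euclidean distance becomes smaller as similarity grows, this sketch first converts distances into scores with 1 / (1 + d), so that "greater than or equal to the threshold" selects the closest objects. That conversion, the object names and the threshold of 0.5 are assumptions and are not specified in the description.

```python
def distance_to_similarity(distance):
    """Assumed conversion: distance 0 gives 1.0, larger distances approach 0."""
    return 1.0 / (1.0 + distance)

def matching_objects(representation_values, threshold):
    """Keep objects whose representation value reaches the threshold, best first."""
    matches = [
        (obj, value)
        for obj, value in representation_values.items()
        if value >= threshold
    ]
    return sorted(matches, key=lambda item: item[1], reverse=True)

# Hypothetical representation distances per object.
distances = {"object_a": 0.0, "object_b": 1.0, "object_c": 4.0}
values = {obj: distance_to_similarity(d) for obj, d in distances.items()}

result = matching_objects(values, threshold=0.5)
print(result)
```

With these values, `object_a` and `object_b` reach the threshold and `object_c` does not.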

The comparisons with the data sets with object properties, or with their typed representations, that are carried out on the basis of the typed representations of the queries or of the corresponding interpretable data representations can also be evaluated by means of a corresponding ranking. For this purpose, data sets with object properties are assigned to the respective query on the basis of vector spaces that represent a respective query. The more centrally a data set with object properties lies in the vector space defined by the interpretable representation of the query, the higher its ranking. This is sketched by way of example in FIG. 9 for a three-dimensional parameter or vector space. Such a three-dimensional parameter or vector space results when the individual objects are each characterized by three parameters, such as age, size and condition. Each object is then represented by a three-dimensional feature vector, which can be depicted as a point in the three-dimensional parameter or vector space. The three-dimensional parameter or vector space shown as a cuboid in FIG. 10 can, for example, represent a query. Some objects fall within the three-dimensional parameter or vector space defined by the query, while other objects do not match the query and lie outside it.

In a respective vector space representing a query, each part of the query (for example, each parameter defined in the query) represents one vector dimension. For example, material density, date of manufacture and condition each constitute one vector dimension, which results in a three-dimensional parameter space. The values of the individual dimensions must be converted to a uniform scale. If every dimension is to carry the same weight in the ranking, the highest ranking value is assigned to the center of the vector space. However, weightings can also be assigned to the individual dimensions. In that case, the point with the highest ranking value is no longer at the center of the vector space but is shifted within it.
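A ranking of this kind can be sketched by scaling every dimension of the query cuboid to the interval [0, 1] and scoring an object by its closeness to the point of the highest ranking value. The description does not state how weightings move that point, so the sketch simply accepts the shifted point as an optional `peak` argument. The scoring formula, the parameter ranges and all names are assumptions.

```python
import math

def scale(value, low, high):
    """Map a value from the query range [low, high] onto the uniform scale [0, 1]."""
    return (value - low) / (high - low)

def ranking_score(obj, query_ranges, peak=None):
    """Return a score in [0, 1], or None if the object lies outside the query cuboid."""
    coords = [scale(v, low, high) for v, (low, high) in zip(obj, query_ranges)]
    if any(c < 0.0 or c > 1.0 for c in coords):
        return None
    if peak is None:
        peak = [0.5] * len(coords)  # equal weighting: the center ranks highest
    dist = math.sqrt(sum((c - p) ** 2 for c, p in zip(coords, peak)))
    # Largest possible distance from the peak to a corner of the unit cuboid.
    farthest = math.sqrt(sum(max(p, 1.0 - p) ** 2 for p in peak))
    return 1.0 - dist / farthest

# Hypothetical query: age 20 to 40, size 150 to 190, condition 0 to 10.
query_ranges = [(20.0, 40.0), (150.0, 190.0), (0.0, 10.0)]

center_score = ranking_score([30.0, 170.0, 5.0], query_ranges)
corner_score = ranking_score([20.0, 150.0, 0.0], query_ranges)
outside_score = ranking_score([50.0, 170.0, 5.0], query_ranges)
shifted_score = ranking_score([30.0, 170.0, 5.0], query_ranges, peak=[0.8, 0.5, 0.5])
print(center_score, corner_score, outside_score, shifted_score)
```

An object at the center of the cuboid receives the highest score under equal weighting, and the same object scores lower once the peak is shifted away from the center.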

As explained at the outset, both the first neural network for generating typed representations of a query and the second neural network for generating typed representations of the respective data sets with object properties can be transformers, in particular sparse transformers. FIG. 10 shows such a transformer, which can be provided in the system according to FIG. 1 instead of the autoencoders 14 and 20 shown there by way of example.

The transformer 90 also has an encoder-decoder architecture. In fact, a plurality of encoders 92 and a plurality of decoders 94 are provided. As is known in principle to a person skilled in the art, each encoder 92 of a transformer has a self-attention layer 96 and a feedforward layer 98. Each decoder 94 likewise has a self-attention layer 100 and a feedforward layer 102. An encoder-decoder attention layer 104 is provided between the self-attention layer 100 of a decoder 94 and its feedforward layer 102.

The input layer of the first encoder 92 is formed by an embedding layer 106. For a respective query, the query components are thus first embedded (input embedding) and their positions are encoded (position encoding). An input data set formed in this way is then passed through the various encoders 92 and processed. The output value of the last encoder 92 is then fed to the encoder-decoder attention layers 104 of all decoders 94.

In known transformers, the outputs of all decoders are processed into a single output tensor, for example by means of a softmax function. In contrast, in order to obtain different typed representations of a query or of a respective data set with object properties, provision is made here for the output tensors generated by each individual decoder 94 to be led out of the neural network 14 or 20 as different typed representations and fed to the comparison unit 26. For this reason, the layers following the last decoder 94 are not shown in the schematic representation of a transformer in FIG. 9.
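Leading out the output of every decoder, rather than only a combined final output, can be sketched with placeholder decoders. The sketch assumes that the decoders are chained one after another and that each also receives the output of the last encoder (called `memory` here). The toy decoders stand in for real attention and feedforward layers, and all names are hypothetical.

```python
def make_toy_decoder(factor):
    """Placeholder for a decoder 94: combines its input with the encoder output."""
    def decoder(x, memory):
        return [factor * xi + mi for xi, mi in zip(x, memory)]
    return decoder

def run_decoder_stack(decoders, target, memory):
    """Run the decoders in sequence and keep every intermediate output tensor."""
    outputs = []
    x = target
    for decoder in decoders:
        x = decoder(x, memory)
        outputs.append(x)  # each decoder output is kept as its own representation
    return outputs

decoders = [make_toy_decoder(f) for f in (1.0, 2.0, 3.0)]
typed_representations = run_decoder_stack(decoders, [1.0, 1.0], [0.5, -0.5])
print(typed_representations)
```

The result contains one vector per decoder, which corresponds to one typed representation per decoder in the arrangement described above.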

With regard to the modeling, that is, the training of the neural networks 14 and 20, it should be noted that the neural network 14 used to generate typed representations of the query and the neural network 20 used to generate the typed representations of the data sets with object properties can initially be trained with identical training data sets. The neural network 20 for forming the typed representations of the data sets with object properties can subsequently be retrained.

Preferably, the first neural network 14 and the second neural network 20 have a similar or identical topology. If the topology of the two neural networks 14 and 20 is identical, the two networks generally still differ in the model they embody, which is represented by the weights produced in the individual nodes as a result of training. As is known, each node 30 of a preceding layer passes its output value on to all nodes of the following layer (at least in the case of a fully connected neural network), so that a node of a following layer receives the output values of all nodes of the preceding layer as input values. These input values are weighted differently in the respective node. The different weights are the result of training the neural network and, together with the topology of the neural network (that is, its structure of nodes, layers and connections), embody a model.
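The weighting in a fully connected layer, and the effect of different weights on an otherwise identical topology, can be sketched as follows. The weight values and names are hypothetical, and activation functions and bias terms are omitted for brevity.

```python
def layer_output(inputs, weights):
    """Each weight row belongs to one node of the following layer.

    Every node receives all outputs of the preceding layer and forms
    their weighted sum.
    """
    return [
        sum(w * x for w, x in zip(node_weights, inputs))
        for node_weights in weights
    ]

# Output values of the preceding layer (identical for both networks).
inputs = [1.0, 2.0]

# Same topology (2 nodes -> 2 nodes), different weights after training.
weights_network_14 = [[0.5, 0.5], [1.0, -1.0]]
weights_network_20 = [[0.25, 0.75], [2.0, 0.0]]

out_14 = layer_output(inputs, weights_network_14)
out_20 = layer_output(inputs, weights_network_20)
print(out_14, out_20)
```

Although both layers have the same structure, the same inputs lead to different outputs, which reflects the different models embodied by the two networks.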

FIG. 11 shows another representation of a transformer.

Reference List

10: System
12: Input interface
14: Autoencoder/transformer
16: Typed representation of the query
18, 18': Database
20: Autoencoder/transformer
22: Typed representations of the object properties
24: Comparison unit
26: Selection and output unit
30, 30': Node
32: Input layer
34, 34': Hidden layers
36: Output layer
38: Encoder
40: Bottleneck
42: Decoder
50: Sparse autoencoder
56: Feature vectors
60: Similarity matrix
92: Encoder
94: Decoder
96, 100: Self-attention layer
98, 102: Feedforward layer
104: Encoder-decoder attention layer
106: Embedding layer

Claims

1. Method comprising the steps:
- detecting a plurality of physically measurable properties of a number M of objects or individuals, each object or individual being individually characterized by the values of the plurality of physically measurable properties detected for this object or individual,
- forming a corresponding number m of data sets with object properties, each of which contains detected values for at least some of the plurality of physically measurable properties of the respective object or individual,
- forming a query for selecting a subset of the objects or individuals, the query containing values for at least some of the plurality of physically measurable properties,
- transforming the query by means of an autoencoder or a transformer into a number n of typed representations of the query,
- transforming each of the m data sets with object properties by means of an autoencoder or a transformer into a number m of typed representations of the data sets with object properties, and
- determining a similarity value for each of the m typed representations of the data sets with object properties by determining the similarity between a typed representation of the query and a respective one of the m typed representations of the data sets with object properties.

2. Method according to claim 1, additionally comprising: setting up a similarity matrix containing the determined similarity values.

3. Method according to claim 1 or 2, wherein determining a similarity value for each of the m typed representations of the data sets with object properties, by determining the similarity between a typed representation of the query and a respective one of the m typed representations of the data sets with object properties, comprises forming a value of the Euclidean distance between the selection feature vectors and the object feature vectors representing a respective data set.

4. Method according to claim 3, wherein the Euclidean distance is the respective similarity value.

5. Method according to claims 2 and 4, wherein the similarity matrix contains values for the Euclidean distance between the selection feature vectors and the object feature vectors representing a respective data set.

6. Method according to claim 1 or 2, wherein a similarity value is determined for each of the m typed representations of the data sets with object properties by forming a vector space representing the respective query and determining the position, within that vector space, given by the object properties defined by a respective data set with object properties.

7. Method according to claim 6, wherein the proximity of the position given by the object properties to a center of the vector space representing the respective query is the respective similarity value.

8. Method comprising the steps:
- detecting a plurality of physically measurable properties of a number M of objects or individuals, each object or individual being individually characterized by the values of the plurality of physically measurable properties detected for this object or individual,
- forming a corresponding number m of data sets with object properties, each of which contains detected values for at least some of the plurality of physically measurable properties of the respective object or individual,
- forming a query for selecting a subset of the objects or individuals, the query containing values for at least some of the plurality of physically measurable properties,
- transforming the query by means of an autoencoder or a transformer into a number n of typed representations of the query,
- transforming the typed representations of the query into interpretable data representations of the query, and
- determining a similarity value for each of the data sets with object properties by determining the similarity between an interpretable data representation of the query and a respective one of the data sets with object properties.

9. Method according to claim 8, characterized in that the typed representations of the query are mapped to an interpretable data representation of the typed query by means of a further decoder or by deterministic mapping and are thus converted into a query format that can be applied directly to the data sets with object properties.

10. System for identifying objects, comprising:
- an input interface for entering or receiving a query,
- a first neural network connected to the input interface, which is configured and trained to convert a query into at least one typed representation, preferably a plurality of typed representations, of the query,
- an access to a database containing data sets with object properties,
- a second neural network connected to the access to the database, which is configured and trained to convert a respective data set with object properties into at least one typed representation, preferably a plurality of typed representations, of the data set with object properties, the number of nodes of the output layer or output layers of the second neural network being equal to the number of nodes of the output layer or output layers of the first neural network, so that the typed representations of the query and of the respective data set with object properties have the same dimension,
- a comparison unit for determining the similarities between typed representations of the query generated by means of the first neural network and typed representations of the data sets with object properties generated by means of the second neural network, and
- a selection and output unit for identifying objects matching a query on the basis of the similarities determined by the comparison unit and for displaying the identified matching objects.

11. System according to claim 10, in which the first neural network (14) and the second neural network (20) are each designed as a sparse autoencoder.

12. System according to claim 10, in which the first neural network (14) and the second neural network (20) are each designed as a transformer (90), in particular as a sparse transformer.