DE102012005325A1

DE102012005325A1 - Machine image recognition method based on an AI system

Info

Publication number: DE102012005325A1
Application number: DE102012005325A
Authority: DE
Inventors: same as applicant
Original assignee: Individual
Current assignee: Individual
Priority date: 2012-03-19
Filing date: 2012-03-19
Publication date: 2013-09-19
Also published as: WO2013139754A1

Abstract

In summary, the invention describes a method for the machine recognition of image data of an overall image or a sequence of images, characterized by the following steps:
- Capturing the image and dividing and classifying it into sub-elements, i.e. detailed image elements, on the basis of an image-element and/or image-object feature analysis, in particular with regard to basic geometric shapes, textures, colors, material and perspective, the feature analysis being realized by analytical, deterministic software techniques of image processing and image analysis, in particular Fourier analysis, edge detection, color analysis and the like.
- Recognizing and identifying the classified image-element and/or image-object features using artificial intelligence, in particular a neural network, such that one or more descriptive text labels are assigned to each image element and/or image object.
- Feeding the text labels assigned to the image elements and/or image objects into a textual knowledge base, in which the relationships of the image elements and/or image objects to one another, to the image and/or to parts of the image are analyzed further by means of a text-based search engine, in particular one based on a neural network, such that the content and context of the image or sequence of images is determined.

Description

The invention relates to an automated machine image recognition method which first divides an overall image to be analyzed, whose content is to be recognized and identified, into characterizing individual image elements. Existing methods of software-based image analysis are used for this first step. In further steps, systems with artificial intelligence, such as neural networks, are used which, after the transition to a text-based associative knowledge base, automatically carry out the recognition of the entire image content.

The present invention relates both to individual images and to a sequence of images or videos, from which the context or content is determined automatically with the aid of software-supported analysis methods.

A general problem of any kind of automated machine recognition of images, image contents and image objects is, first of all, the extremely high demand on computing capacity, memory capacity and data transmission rates. These demands rise exponentially because many individual image-element features, image-element groups combined by image context or meaning (meta-elements), geometrically connected image objects (meta-objects) and parts of the overall image are processed and analyzed in a matrix-like or network-like fashion across several hierarchy classes, with respect to their correlations with one another and with the context of the overall image. Furthermore, on the one hand, an extensive knowledge base must already be available in a memory accessible to the system for training a suitable system with artificial intelligence, in particular a neural network. On the other hand, an even more extensive knowledge base must be available via extremely fast memory access during actual operation of the recognition system, so that image content to be analyzed can be recognized automatically.

According to the invention, an advantageous image recognition system is provided, as achieved by the method of claim 1.

The following process steps of the invention relate to the analysis of an image (also called the "overall image" to distinguish it from its elements) or video by combining elements of image analysis with elements of text analysis. In principle, individual elements of an image are classified according to their basic shapes, textures, colors, natural or artificial nature, material and other features. A knowledge base (basic knowledge) is available within the image processing for this purpose. From these results, groups of such elements (meta-elements) are formed wherever this yields a more complex object.

Furthermore, the context between the individual elements, in relation to the meta-elements and to the overall image, is analyzed. In addition, the image as a whole is analyzed with regard to perspective, horizon, type of lighting, colors, color gradients and spectra, contrasts, etc.

The features of the elements found are checked against a separate textual knowledge base (which can be any textual knowledge, such as encyclopedias, specialist literature or the like) to determine which objects they may belong to or whether they are themselves independent objects. A taxonomy can be used for this. Meanings are determined associatively. The process relies heavily on feedback: in feedback loops the result is checked for contradictions and matches, and the solution with the fewest contradictions is chosen.

The relationships of the individual elements both to one another and to the overall image are processed. This context is also interpreted by means of the textual knowledge base, using the same procedure as described above.

This makes it possible to use for image analysis the knowledge that is already comprehensively available in the form of text documents, and which does not exist in comparable form in the image domain. This does not make building a dedicated image knowledge base superfluous, but that base can be extended considerably by the textual knowledge base, in particular with regard to determining the context of elements.

This enables a comprehensive image analysis and considerably increases the precision of recognition. The additional element "motion" in a video yields further information about what is happening in the image and thus improves precision.

The contexts found can be stored in a context library. Such a library can be set up for a broad range of scenes or areas of life, and can also be modular and highly specialized.

A detailed description and further explanation of the image recognition method according to the invention follows, in particular on the basis of a practical case example.

Process phases, illustrated by the example "Archway"

The digital image in Fig. 1 is to be analyzed. Fig. 1: The overall image to be analyzed, from which the essential elements are first extracted by selecting them on the basis of geometric shape, edges, color areas and color gradients, contrasts and image parameters (such as resolution, color, contrast, brightness, black-and-white distribution, etc.) and changes in these, using in particular existing methods of image processing and image analysis.

2.1 First phase:

Identifying the essential elements. First, the individual essential elements of the image are extracted. The elements are selected on the basis of shape, edges, real color areas and gradients, contrasts, and changes in image parameters (such as resolution, color, contrast, brightness, black and white, etc.). Existing methods are used here.
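The text only states that existing methods are used for this step. As a rough illustration of one such method, color-area selection can be sketched as connected-component labeling on a grid of color labels that have already been quantized. The function name `color_regions`, the toy grid and its labels are assumptions for illustration and do not come from the patent.

```python
from collections import deque

def color_regions(grid):
    """Group 4-connected cells with the same color label into regions.

    `grid` is a list of rows; each cell holds a (hypothetical) quantized
    color label such as "white" or "ochre".
    Returns a list of (label, set_of_(row, col)) tuples in scan order.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            label = grid[r][c]
            cells = set()
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one color area
                y, x = queue.popleft()
                cells.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and grid[ny][nx] == label:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append((label, cells))
    return regions

# Toy "image" (assumption): ochre facade, one dark gate, white ground.
toy = [
    ["ochre", "ochre", "ochre", "ochre"],
    ["ochre", "dark",  "dark",  "ochre"],
    ["white", "white", "white", "white"],
]
regions = color_regions(toy)
```

Each returned region corresponds to one candidate element. A real system would work on pixel data and would combine this with edge and contrast information.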

Examples: the white area on the ground with its irregular structure results from the color-area analysis, as does the facade with its two-tone paintwork (which might also be composed of the modules "light" and "ochre"). The two gates can be selected as elements with distinct contours.

This leads (in simplified form) to elements such as those shown by way of example in Fig. 2. Fig. 2: Selecting the image elements from Fig. 1 yields the separated individual image components shown here.

2.2 Second phase: feature determination.

The overall image and all elements are now processed in parallel to determine certain features (Fig. 3). Fig. 3: The individual image components are all processed further in parallel, and image elements that occur several times are taken into account in parallel together with their exact number of occurrences in the overall image.
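A minimal sketch of this parallel step, assuming that identical elements are analyzed once and carry their occurrence count along. The `analyze` function is a placeholder, and the element names and counts in the toy list are illustrative assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def analyze(element_name):
    """Placeholder for the per-element feature analysis (assumption)."""
    return {"element": element_name}

def process_in_parallel(element_names):
    # Count how often each element occurs, then analyze each distinct
    # element once, in parallel.
    counts = Counter(element_names)
    unique = sorted(counts)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(analyze, unique))
    for result in results:
        result["occurrences"] = counts[result["element"]]
    return results

# Toy element list (assumption).
elements = ["arched window"] * 4 + ["archway", "white area"] + ["person"] * 3
features = process_in_parallel(elements)
```

`pool.map` preserves input order, so the results line up with the sorted list of distinct elements.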

In the example chosen, a possible sequence of analysis steps (for both the elements and the overall image) could look as follows:

1. Natural/artificial?
2. Perspective yes/no?
3. Shape/structure
4. Color, color gradient
5. Surface, texture
6. ... further features
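One way to hold the outcome of these steps is a small record per element. The sketch below assumes that steps 1 and 2 yield yes/no values and steps 3 to 5 yield text terms. The class name, field names and example values are illustrative and are not defined in the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ElementFeatures:
    name: str                                          # working label
    natural: Optional[bool] = None                     # step 1 (None = undecided)
    perspective: Optional[bool] = None                 # step 2
    shape: List[str] = field(default_factory=list)     # step 3, verbal result
    color: List[str] = field(default_factory=list)     # step 4, verbal result
    texture: List[str] = field(default_factory=list)   # step 5, verbal result

    def search_terms(self) -> List[str]:
        """Collect the verbal results of steps 3 to 5 as text terms."""
        return self.shape + self.color + self.texture

# Example values (assumption) for one arched-window element.
window = ElementFeatures(
    name="element 3",
    natural=False,
    perspective=False,
    shape=["symmetrical", "half-arch at the top"],
    color=["dark"],
    texture=["gridded"],
)
```

Keeping the verbal results as plain strings makes it straightforward to pass them on to a text-based knowledge base later.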

This results in the following first phase according to Figs. 4 and 4A. Figs. 4 and 4A: All individual elements of the overall image, now considered in parallel, and at the same time the overall image containing all of them, are subjected to a further image-object feature analysis and given more precisely descriptive textual attributes. Alongside the existing deterministic-analytical image analysis methods, an AI system trained on (known) training examples, such as the neural network of the Apollo system, is already used at least in part for this. It permits distinctions such as whether an image object is natural or artificial, whether the image contains a perspective view, and where in the image a (mean) horizon can be determined, from which a ground region can be established. Distinctions are also made regarding the shapes and structures of the image elements, based on basic knowledge of general geometric shapes and structures such as symmetries, curves, arcs, circular lines, polygons, grid patterns, shading, brightness, color gradients, textures, etc. For extended areas in particular, the analysis considers, besides color and color gradient, whether the area lies above or below the horizon line, whether it is regularly or irregularly shaped (FFT analysis), whether it has regular or irregular textures, and if so in which regions of the area these are located.
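The text names FFT analysis for the regular/irregular decision but gives no details. The following sketch shows one plausible reading: a one-dimensional brightness profile counts as "regular" if a single frequency carries most of the spectral energy. It uses a naive discrete Fourier transform instead of a fast one to stay dependency-free. The `threshold` value and both sample profiles are assumptions.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

def is_regular(signal, threshold=0.5):
    """Return True if one frequency dominates the spectrum.

    `threshold` is an illustrative assumption, not a value from the source.
    """
    mean = sum(signal) / len(signal)
    centered = [v - mean for v in signal]            # remove the constant part
    mags = dft_magnitudes(centered)[1 : len(signal) // 2 + 1]
    energy = sum(m * m for m in mags)
    if energy == 0:
        return False                                 # flat profile, no structure
    return max(m * m for m in mags) / energy >= threshold

grid_profile = [0, 0, 1, 1] * 8      # periodic, like the bars of a window grid
single_blob = [1] + [0] * 31         # one isolated feature, no repetition
```

With these inputs the periodic profile concentrates its energy in one frequency bin, while the isolated feature spreads it evenly over all bins.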

In steps 1 and 2 a yes/no decision is made in each case. Steps 3, 4 and 5 each yield a result that can be described verbally.

Examples:

a) Natural or artificial? This analysis is carried out on the overall image and on each of the elements. It yields four elements with a positive result: the three people and the snow. For the overall image, no uniform statement is possible. This distinction becomes possible after training the system (Apollo). The software uses training examples to learn independently which features are relevant for the distinction.
b) Perspective yes/no? For the overall image this is determined, e.g., by means of vanishing lines (see Fig. 5). This leads to the following result:

There is a perspective. A horizon can be defined, which divides the image. This result is evaluated in a later phase (e.g. defining the lower part as ground). No perspective is discernible in the individual elements (with the exception of the second archway, although there it can hardly be determined reliably because of the small size and image quality).
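As a hedged illustration of the perspective test, the sketch below intersects two receding edge lines to obtain a vanishing point and takes its height as the horizon. The coordinates are invented, the y axis is assumed to grow downward as in image coordinates, and the function names are not from the patent.

```python
def line_intersection(p1, p2, p3, p4):
    """Intersection of the line through p1, p2 with the line through p3, p4.

    Returns None for (near-)parallel lines, i.e. no usable vanishing point.
    """
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-12:
        return None
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    x = (a * (x3 - x4) - (x1 - x2) * b) / denom
    y = (a * (y3 - y4) - (y1 - y2) * b) / denom
    return (x, y)

def split_at_horizon(vanishing_point, image_height):
    """Treat the rows below the horizon as candidate ground."""
    horizon_y = vanishing_point[1]
    return {"horizon_y": horizon_y, "ground_rows": (horizon_y, image_height)}

# Two edges of a hypothetical pavement converging toward the image center.
vp = line_intersection((0, 100), (40, 80), (200, 100), (160, 80))
layout = split_at_horizon(vp, image_height=100)
```

In practice several edge pairs would be intersected and the results averaged, in line with the averaged vanishing and horizon lines mentioned in the text.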

Fig. 4 above thus showed the image contents of the overall image separated into individual image elements, which are processed and analyzed further in parallel and individually as described above. Fig. 5 now illustrates the evaluation and analysis of the overall image already indicated above, which likewise runs in parallel and takes into account the individual image elements that have by now been textually characterized and classified in more detail, using the example of deciding whether the overall image contains a perspective. Such a perspective is of course not discernible in the individual image elements, but in the overall image vanishing lines (possibly averaged) can be identified, for example by edge detection, as can a horizon line (possibly averaged) that divides the overall image into an upper and a lower region. Here in particular, deterministic-analytical image analysis methods and/or AI systems such as Apollo, trained on hierarchically classified taxonomies consisting of example knowledge bases, are applied simultaneously where appropriate.

c) Shape/structure. Consider one of the arched windows. Result of the analysis (based on basic knowledge of structures ...): symmetrical, half-arch at the top, gridded, dark (it had already been determined that this is an artificial object).
d) Shape/structure. Consider another example, the white area. From the horizon determination alone it is clear that this is an area "below", i.e. possibly the ground. Result of the shape and structure analysis (see Fig. 6): a. irregularly shaped; b. texture without regular structure, irregular shape in the middle.

Fig. 6 picks out one of the individual image elements or image objects from Fig. 4 again, namely the largely white area of the ground region of the overall image. It illustrates the individual-element analysis with regard to color and color gradient, the geometric (outline) shape recognized as irregular, the texture recognized as irregularly structured, and the irregular, differently shaped forms recognized in the middle of this image element (these are the only partially visible people, which the system has not yet determined at this point).

e) Color, color gradient. The analysis yields a planar arrangement of ochre and a light tone.
f) Texture (see Fig. 7): Fig. 7 picks out another of the individual image elements from Fig. 4, namely the large archway that dominates the overall image. It again illustrates the individual-element analysis, here with regard to recognizing a complex texture within a strongly symmetrical geometric shape, namely the wrought-iron archway. The system does not yet know this at this point, however. So far it recognizes only the archway geometry, i.e. a rectangle plus a semicircle, and a filigree, complex texture inside this geometric shape.
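The color result in item e) (ochre plus a light tone) can be pictured as naming the dominant colors of an element. The sketch below maps each pixel to the nearest entry of a small named palette and reports the names that cover a minimum share of the element. The palette values, the `min_share` threshold and the sample pixels are assumptions for illustration.

```python
from collections import Counter

# Hypothetical reference palette; the RGB values are illustrative only.
PALETTE = {
    "ochre": (204, 119, 34),
    "light": (240, 235, 220),
    "dark": (30, 30, 30),
}

def nearest_name(rgb):
    """Name of the palette color with the smallest squared RGB distance."""
    return min(
        PALETTE,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name])),
    )

def dominant_colors(pixels, min_share=0.2):
    """Palette names covering at least `min_share` of the pixels,
    most frequent first."""
    counts = Counter(nearest_name(p) for p in pixels)
    total = len(pixels)
    return [name for name, n in counts.most_common() if n / total >= min_share]

# Ten sample pixels of a facade element (assumption).
facade = [(200, 120, 40)] * 6 + [(235, 230, 215)] * 3 + [(25, 25, 25)]
```

The returned names are plain text, so they can serve directly as verbal attributes of the element.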

2.3 Third phase: recognizing meta-elements

In this phase the recognized features are processed, in particular by determining the relationships between the elements and the arrangement of the elements within the context of the overall image. The textual results of steps 3, 4 and 5 for the elements are fed into an associative text system with an "inverse" taxonomy and processed in a textual knowledge base, thereby giving the elements meaning.

a) Consider the arched windows again (see Fig. 8). Fig. 8 picks out the next individual image element from Fig. 4, a less complex, "smaller" one, namely the arched window, which the system still has to recognize as such. So far, from the particular geometric shape and size (as in Fig. 7 a rectangle with a half-arch at the top, but smaller this time), color and brightness (dark), characteristic texture (gridded) and symmetry (axially symmetric), it knows only that this is an individual image element that occurs several times in the overall image. Result of the analysis so far (based on basic knowledge of structures ...): symmetrical, half-arch at the top, gridded, dark. Contextual analyses now follow, both of the elements among one another and of their relation to and position in the overall image:
a. Comparison with the other elements, determining the frequency of similar elements. Result: at least four very similar objects occur in the image.
b. Checking the regularity of their arrangement. Result: structured, regular, arranged in two rows one above the other, each forming three axes, separated by vertical regular elements.
c. Checking further objects within the regular structure leads to two further, very similar figures in the middle of the image (middle window and small archway) and two less similar ones (lower windows left and right, see Fig. 9 below). Fig. 9 lists all individual image elements recognized as similar. They were recognized as similar because they all have the same (axial) symmetry and approximately the same shape and texture. The size of the individual image elements is initially not used to assess their degree of similarity.
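The similarity comparison described here can be sketched as grouping elements by symmetry, shape and texture while deliberately leaving size out of the key. The element records, field names and sizes below are assumptions for illustration.

```python
from collections import defaultdict

def group_similar(elements):
    """Group elements by symmetry, shape and texture; size is ignored."""
    groups = defaultdict(list)
    for element in elements:
        key = (element["symmetry"], element["shape"], element["texture"])
        groups[key].append(element["name"])
    return dict(groups)

# Hypothetical element records; only the grouping key matters here.
ARCHED = {"symmetry": "axial", "shape": "rectangle+half-arch", "texture": "gridded"}
elements = [
    {"name": "window 1", "size": 40, **ARCHED},
    {"name": "window 2", "size": 40, **ARCHED},
    {"name": "window 3", "size": 40, **ARCHED},
    {"name": "window 4", "size": 40, **ARCHED},
    {"name": "small archway", "size": 60, **ARCHED},
    {"name": "white area", "size": 900, "symmetry": "none",
     "shape": "irregular", "texture": "irregular"},
]
groups = group_similar(elements)
```

The size of each group gives the frequency of similar elements. An exact-match key is a simplification, since the text speaks of approximately equal shape and texture, which would call for a tolerance-based comparison.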

Together, these window elements form a meta-element, whose meaning is then examined. This is done using the terms and adjectives obtained so far. They are fed as "search terms" into an associative search engine: "square, semicircular arch, grid, dark, two rows, three axes", etc. The textual knowledge base is searched for the objects that are most similar to the sum of these features. This similarity is determined by two methods:

- the most matches and
- the fewest contradictions.

(Which of the two methods tips the balance is to be set manually or automatically depending on the situation and is not decisive here.)
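The two criteria can be made concrete with a small sketch. It assumes, purely for illustration, that both the observed attributes and the knowledge-base entries are attribute/value dictionaries; the entries in `KNOWLEDGE_BASE` and the `prefer` switch are hypothetical and not taken from the application.

```python
# Hypothetical knowledge-base entries: object -> known attribute values.
KNOWLEDGE_BASE = {
    "castle":    {"arch": "semicircular", "texture": "grid", "rows": 2, "axes": 3},
    "cathedral": {"arch": "pointed", "texture": "grid", "axes": 3},
    "facade":    {"texture": "grid"},
    "carpet":    {"texture": "woven"},
}

# Attributes obtained from the image so far (the "search terms").
observed = {"arch": "semicircular", "texture": "grid",
            "brightness": "dark", "rows": 2, "axes": 3}

def score(observed, known):
    """Return (matches, contradictions) over attributes both sides define."""
    shared = observed.keys() & known.keys()
    matches = sum(1 for key in shared if observed[key] == known[key])
    return matches, len(shared) - matches

def rank(observed, knowledge_base, prefer="matches"):
    """Rank objects by most matches or by fewest contradictions."""
    def sort_key(name):
        matches, contradictions = score(observed, knowledge_base[name])
        if prefer == "matches":
            return (-matches, contradictions)
        return (contradictions, -matches)
    return sorted(knowledge_base, key=sort_key)

print(rank(observed, KNOWLEDGE_BASE, prefer="matches"))
print(rank(observed, KNOWLEDGE_BASE, prefer="contradictions"))
```

With this toy data the two criteria agree on the top candidate but order "cathedral" and "facade" differently, which is why the text leaves the choice of the decisive criterion open.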

The results obtained are: church, cathedral, castle, window, facade, college, palazzo, building and the like. A taxonomy is formed from these. It systematizes the terms according to the different domains, levels of abstraction (window > facade > building > building types castle, college, church » cathedral and the like), etc. to which they belong.
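As a rough illustration of such a taxonomy, the chain of abstraction levels can be stored as a mapping from each term to the terms on the next level. This is only a sketch: the `NEXT_LEVEL` table is an assumption (in particular, placing "palazzo" among the building types), and the application does not say how the taxonomy is stored.

```python
from collections import deque

# Hypothetical encoding of: window > facade > building > building types,
# with cathedral as a further refinement of church.
NEXT_LEVEL = {
    "window": ["facade"],
    "facade": ["building"],
    "building": ["castle", "college", "palazzo", "church"],
    "church": ["cathedral"],
}

def abstraction_levels(root, next_level):
    """Breadth-first walk that assigns each term its level below the root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in next_level.get(term, []):
            if child not in level:
                level[child] = level[term] + 1
                queue.append(child)
    return level

levels = abstraction_levels("window", NEXT_LEVEL)
for term, depth in sorted(levels.items(), key=lambda item: item[1]):
    print(depth, term)
```

Grouping the search results by level separates element terms (window) from structural terms (facade, building) and from the competing building-type hypotheses.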

These objects are hypotheses for the image content. They are now compared one after another with the overall image and with the meta-element, with the aim of evaluating and classifying the objects by the lowest degree of contradiction. (Cathedral, for example, would imply the feature "pointed-arch window", which contradicts the semicircular arch and is therefore excluded.) Comparing the elements contained in the description text gives the following ranking:

- castle
- college
- palazzo
- church.

Furthermore, all individual image elements that the machine system has recognized as similar up to this point are now to be combined, as a so-called meta-element, into a superordinate common class of identified image objects and given one or more characterizing, textually descriptive generic terms. To determine this generic term (here, of course, "window" should be among the results), all attributes determined so far for all image elements identified as similar are fed as "search terms" or "keywords" into an associative text-based search engine. In this case the keywords for the (text) search engine would be roughly: square, semicircular arch, grid, dark, two rows, three (symmetry) axes, etc. The associative textual knowledge base is then searched automatically for objects that are most similar to the "sum" of these image-element attributes, or that can on average be associated most significantly with them. This degree of similarity, or the significance of the correlation between the attributes and the objects or object proposals that the text-based search engine initially found as working hypotheses, is then checked further, so that a ranking of the initially hypothetical object proposals can be determined. For this purpose two methods, in particular statistically averaging ones, are used: first, a check for the largest possible number and highest possible quality of matches, and second, a check for as few and as minor contradictions as possible.

As a result, the text-based search engine could then return, for example, the following (generic) terms: church, cathedral, castle, window, facade, college, palazzo, building and/or the like. Furthermore, this search engine, which rests on a textual associative knowledge base classified into a hierarchical taxonomy, will also systematize the terms found, in particular with the aid of an AI system such as a neural network, for example into hierarchically structured levels of abstraction, such as by actual size and/or by structured integration into a larger superordinate structure. For example: window > building > building types (castle, college, church » cathedral). All of these provisionally recognized meta-elements are now hypotheses for the image content or for components of the overall image content. These working hypotheses are then compared one after another with the overall image and with (all) separated meta-elements, with the aim of ranking the objects by significance according to the lowest degree of contradiction. Such comparisons could (and should) yield a ranking such as castle » college » palazzo » church.

b) An analysis of color areas follows (Fig. 11): In the next step the color areas, for example, are analyzed again and likewise included in these contradiction-minimizing comparison operations in order to further refine the ranking of the working hypotheses. This area is compared with the hypotheses castle, college, palazzo and church with respect to the number of contradictions in each case. The colors do not contradict a building. No significant contradictions arise; church receives the lowest probability. Next step: analysis of the ground (identified as such by means of the horizon) (see Fig. 11). The attributes are: white, partly smooth, partly irregularly structured, irregularly bounded, ... These attributes are now likewise fed into taxonomies. Objects with these properties in connection with "ground" are: marble, snow, carpet, ....

These objects are hypotheses for the image content. They are now compared one after another with the overall image and with the meta-element, with the aim of ranking the objects by the lowest degree of contradiction.

- White carpet: The naturalness already established in Phase 1 contradicts carpet, as does the surface structure. The snowfall (see below) also contradicts carpet, which is not used outdoors.
- White marble: The missing structure is a contradiction, as is the surface, which is very rugged in places.
- Snow: The irregularity of the surface is an important feature in favor of snow. Snow would in any case be confirmed if it were snowing. The system therefore checks: are there indications of snowfall? To this end the part of the image above the horizon is examined, especially the areas in front of dark surfaces, since snowflakes are easiest to detect there. In the left image section, white flakes are indeed found, distributed over the whole image. In the right one they are not. Since snowflakes can occur in one partial area and not in another, the snow hypothesis is confirmed: the absence of flakes in one part of the image does not rule out their occurrence in another (see Fig. 12).
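The snowfall check can be sketched as a search for isolated bright pixels on a dark background, applied separately to each image section. This is an illustrative stand-in only: the application does not specify the detector, and the thresholds `bright`, `dark` and `min_flakes` as well as the toy 5x5 regions are assumptions.

```python
def count_flakes(region, bright=200, dark=80):
    """Count bright pixels whose four direct neighbours are all dark.

    `region` is a grayscale image section given as a list of pixel rows.
    """
    flakes = 0
    for y in range(1, len(region) - 1):
        for x in range(1, len(region[0]) - 1):
            if region[y][x] < bright:
                continue
            neighbours = (region[y - 1][x], region[y + 1][x],
                          region[y][x - 1], region[y][x + 1])
            if all(value <= dark for value in neighbours):
                flakes += 1
    return flakes

def snowfall_indicated(regions, min_flakes=2):
    """Flakes in any one section suffice; empty sections do not refute."""
    return any(count_flakes(region) >= min_flakes for region in regions)

# Toy sections above the horizon: dark background, two specks on the left.
left = [[30] * 5 for _ in range(5)]
left[1][1] = 255
left[3][3] = 255
right = [[30] * 5 for _ in range(5)]

print(count_flakes(left), count_flakes(right))
print(snowfall_indicated([left, right]))
```

Using `any` over the sections mirrors the argument in the text: the absence of flakes in the right section does not rule out snowfall found in the left one.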

Taking the contradictions into account yields the following ranking by degree of probability:

- snow (very high)
- marble (low)
- carpet (very low)

In this step (Fig. 12), the analysis of the ground (which was recognized as such by previously finding a horizon line in the overall image) is, for example, also included in these contradiction-minimizing and match-maximizing comparison operations among the meta-elements and with the partial or total content of the whole image, in order to further refine the significance of the ranking of the corresponding hypotheses about the recognized image content. For instance, the attributes found for the image region identified as the ground area (white, partly smooth, partly irregularly structured, irregularly bounded) are likewise fed into taxonomies, and these associative, hierarchically classified textual knowledge bases (search engines) could then very probably establish significant correlations with objects such as marble, snow or carpet, which in turn are hypotheses, this time for the ground area of the image.

These hypotheses for the ground are again compared with the overall image and all meta-elements, with the aim of ranking the objects by the lowest degree of contradiction or the highest degree of agreement. The previously recognized naturalness of the ground area, the irregular surface structure, and the snowfall in the (overall) image, from which an outdoor setting can be inferred, contradict the hypothesis "carpet". The missing structure and the rugged surface or boundary contradict the hypothesis "marble". By contrast, the irregularity of the ground area speaks for a snow surface, and the check for snowfall in the (overall) image is also positive (small white spots over the whole image, or at least over large parts of it, indicate snowflakes, which stand out particularly against the dark areas in the half of the image above the horizon). This gives the ranking snow » marble » carpet. Fig. 12 highlights the section of the overall image that is best suited for the automatic check for possible snowfall in the image. Manual intervention in selecting such specifically chosen image sections may also be provided, in particular during the training phase of the AI system.

c) Group of persons (remains to be elaborated). Keywords: silhouettes and head shape lead to persons. No face recognizable > seen from behind.
d) Foreground: archway (remains to be elaborated). Keywords: from the perspective established, it follows that this is a kind of tunnel with recognizable structures on the ceiling (which become clearer in brighter images), with a very bright portion at the opening ...

The sequence of analysis steps is shown in simplified form. It is subject to feedback, is therefore variable, and is controlled by the system itself. In general, each hypothesis can be compared with intermediate results of other steps, resulting in a matrix-like or network-like procedure.
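One way to picture this feedback is a loop that keeps pruning the hypotheses of every image region against all facts known so far, and feeds the consequences of each settled hypothesis back in as new facts. The sketch below is an assumption-laden illustration, not the claimed control logic: the `contradicts` and `implies` tables, the region names and the "ceiling"/"sky" hypotheses are invented for the example.

```python
def resolve(regions, facts, contradicts, implies):
    """Prune hypotheses per region until nothing changes any more."""
    regions = {name: list(hyps) for name, hyps in regions.items()}
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for name, hyps in regions.items():
            kept = [h for h in hyps
                    if not any((h, fact) in contradicts for fact in facts)]
            if kept != hyps:
                regions[name] = kept
                changed = True
            if len(kept) == 1:
                # A settled hypothesis contributes its consequences as facts.
                new_facts = set(implies.get(kept[0], ())) - facts
                if new_facts:
                    facts |= new_facts
                    changed = True
    return regions, facts

# Hypothetical inputs for illustration.
regions = {"upper region": ["ceiling", "sky"],
           "ground": ["snow", "marble", "carpet"]}
facts = {"natural surface", "rugged surface"}
contradicts = {("carpet", "natural surface"),
               ("marble", "rugged surface"),
               ("ceiling", "outdoor")}
implies = {"snow": ["outdoor"]}

result, known = resolve(regions, facts, contradicts, implies)
print(result)
```

In this toy run the ground analysis settles on snow, which adds the fact "outdoor"; only on the next pass does that intermediate result remove "ceiling" from the other region, which is the cross-step comparison the paragraph describes.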

The procedure is in principle analogous for a video. In that case, the analysis of changes over time is added.

2.4. Fourth phase: determine context and results

Fig. 1 shows the overall image and lists the result of the image recognition in the form of a probability ranking of the recognized image objects:

1. Castle » college » palazzo or similar.
2. People (seen from behind) on the way to 1.
3. Coming out of an archway.

The attached figures illustrate the present invention purely by way of example.

Fig. 1: The overall image to be analyzed.

Fig. 2: Selecting the image elements from Fig. 1 yields the separated individual image components shown here.

Fig. 3: The individual image components are now all processed further in parallel; image elements that occur several times are likewise taken into account in parallel, with their exact number of occurrences in the overall image.

Fig. 4: All individual elements of the overall image, now considered in parallel, as well as the overall image containing them, are subjected to a further image-object feature analysis and given more precisely descriptive textual attributes.

Fig. 5: This illustrates the evaluation and analysis of the overall image, which likewise runs in parallel and takes into account the individual image elements that have by now been textually characterized and classified in more detail, using the example of deciding whether the overall image exhibits perspective.

Fig. 6: One of the individual image elements from Fig. 4 is singled out again, namely the largely white area of the ground region of the overall image, to highlight the individual image element analysis by way of example.

Fig. 7: Another of the individual image elements from Fig. 4 is singled out, namely the large archway that dominates the overall image, again to highlight the individual image element analysis by way of example.

Fig. 8: The next individual image element from Fig. 4 is singled out, a less complex, "smaller" one, namely the arched window, which occurs several times in the overall image.

Fig. 9: All individual image elements recognized as similar are listed in this figure.

According to the invention, all individual image elements that the machine system has recognized as similar up to this point are combined, as a so-called meta-element, into a superordinate common class of identified image objects and given one or more characterizing, textually descriptive generic terms, which the machine initially treats as working hypotheses. These working hypotheses are then compared one after another with the overall image and with (all) separated meta-elements, with the aim of ranking the recognized objects by significance according to the lowest degree of contradiction. Such comparisons could (and should) yield a ranking such as castle » college » palazzo » church.

Fig. 10: In the next step the color areas, for example, are analyzed again and likewise included in these contradiction-minimizing comparison operations in order to further refine the ranking of the working hypotheses.

Fig. 11: In the following step, the analysis of the ground (which was recognized as such by previously finding a horizon line in the overall image) is, for example, also included in these contradiction-minimizing and match-maximizing comparison operations among the meta-elements and with the partial or total content of the whole image, in order to further refine the significance of the ranking of the corresponding hypotheses about the recognized image content. This gives the ranking snow » marble » carpet.

Fig. 12: This highlights the section of the overall image that is best suited for the automatic check for possible snowfall in the image. Manual intervention in selecting such specifically chosen image sections may also be provided, in particular during the training phase of the AI system.

According to Fig. 1, the result of the image recognition is listed in the form of a probability ranking of the recognized image objects:

1. Castle » college » palazzo or similar.
2. People (seen from behind) on the way to 1.
3. Coming out of an archway.

In particular, the present invention is designed not only for individual images but also for a sequence of images or a video, in order to determine the content of a video automatically. According to the invention, objects can be determined in a manner similar to a virtual brain, without the use of tagging. According to the invention, neurobiological processes are applied, so that simple training suffices for adaptation to new tasks. According to the invention, similar objects and similar scenes can be determined in videos, and any kind of acoustic signal can likewise be analyzed and determined. This applies both to conventional 2D videos and to 3D videos.

Simple and fast training suffices for recognizing objects or scenes; the features of the objects are extracted fully automatically. No fixed presets are required for this, although they can be made. Training can be carried out by a user without any programming, in particular within the same object category. The results can be traced back, so the precision can be optimized in a simple manner.

The software for automatic image recognition can run on conventional computers under Windows or Unix, which can also process several videos in parallel. According to the invention, MPP computers (for example Exergy) can also be used to obtain results in an extremely short time, to enable innovative applications, and to reduce costs and resources.

For the video recognition according to the invention, a preprocessing toolkit is used in conjunction with a neural network. In particular, the individual images of a video are divided into segments; individual features are then extracted and processed in a normalizer; and, using vectors and a corresponding neural network, individual results are obtained and classified in order to recognize the objects of the source image or the overall image.
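The preprocessing chain (segments, features, normalizer, vectors) can be sketched as follows. The sketch stops before the neural network, and every concrete choice in it is an assumption for illustration: square tiles as segments, mean brightness and contrast as features, and min-max scaling as the normalizer. None of these is specified in the application.

```python
def split_into_segments(frame, size):
    """Cut a grayscale frame (list of pixel rows) into size x size tiles."""
    tiles = []
    for top in range(0, len(frame) - size + 1, size):
        for left in range(0, len(frame[0]) - size + 1, size):
            tiles.append([row[left:left + size]
                          for row in frame[top:top + size]])
    return tiles

def extract_features(tile):
    """Two simple stand-in features: mean brightness and contrast."""
    pixels = [value for row in tile for value in row]
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]

def normalize(vectors):
    """Min-max scale every feature column to the range [0, 1]."""
    columns = list(zip(*vectors))
    lows = [min(column) for column in columns]
    spans = [(max(column) - min(column)) or 1 for column in columns]
    return [[(value - low) / span
             for value, low, span in zip(vector, lows, spans)]
            for vector in vectors]

# Toy 4x4 frame: dark, bright, checkered and mid-gray quadrants.
frame = [
    [0,   0,   255, 255],
    [0,   0,   255, 255],
    [0,   255, 100, 100],
    [255, 0,   100, 100],
]

tiles = split_into_segments(frame, 2)
vectors = normalize([extract_features(tile) for tile in tiles])
for vector in vectors:
    print(vector)
```

Each normalized vector would then be passed to the classifier; scaling all features to a common range keeps any single feature from dominating that step.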

Object recognition also takes into account changes in position, changes in scale, and rotation, so that it is irrelevant, for example, whether a person in a video moves away from the camera. According to the invention, the object to be determined can also be detected when, for example, it is only partially visible owing to a rotation, or when other losses in quality are present.
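What independence from position, scale and rotation means can be shown with a simple shape signature. This is a generic textbook-style descriptor chosen for illustration, not the method of the application: the distances of outline points from their centroid are divided by the mean distance and sorted. It assumes complete outlines with matching point counts, so it does not cover the partially visible case.

```python
import math

def invariant_signature(points):
    """Sorted centroid distances, divided by their mean.

    Subtracting the centroid removes position, dividing by the mean
    distance removes scale, and distances do not change under rotation.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    distances = [math.hypot(x - cx, y - cy) for x, y in points]
    mean = sum(distances) / len(distances)
    return sorted(d / mean for d in distances)

def same_shape(a, b, tolerance=1e-9):
    """Compare two point sets by their invariant signatures."""
    sig_a, sig_b = invariant_signature(a), invariant_signature(b)
    return len(sig_a) == len(sig_b) and all(
        abs(p - q) <= tolerance for p, q in zip(sig_a, sig_b))

def transform(points, angle, scale, dx, dy):
    """Rotate by `angle` (radians), scale, then shift the points."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [(scale * (x * cos_a - y * sin_a) + dx,
             scale * (x * sin_a + y * cos_a) + dy) for x, y in points]

outline = [(0, 0), (2, 0), (2, 1), (0, 3)]
moved = transform(outline, angle=0.7, scale=2.5, dx=10, dy=-3)
square = [(0, 0), (1, 0), (1, 1), (0, 1)]

print(same_shape(outline, moved), same_shape(outline, square))
```

A person walking away from the camera corresponds to the `scale` and `dx`/`dy` parameters changing from frame to frame while the signature stays the same.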

According to the invention, a knowledge base is generated as a first step, based on automatic extraction of the features of training objects. As a second step, the objects to be detected can be identified using the knowledge base and classified according to their content, or output as text.
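The two steps above (build a knowledge base from training objects, then classify new objects against it) can be illustrated with a nearest-centroid classifier. This is a deliberately simple stand-in, not the neural network of the invention; the feature vectors, labels, and function names are hypothetical.

```python
import math
from collections import defaultdict

def build_knowledge_base(training_objects):
    """Step 1: store one mean feature vector per text label."""
    grouped = defaultdict(list)
    for label, features in training_objects:
        grouped[label].append(features)
    return {label: [sum(col) / len(col) for col in zip(*vectors)]
            for label, vectors in grouped.items()}

def classify(knowledge_base, features):
    """Step 2: return the text label whose stored vector is closest."""
    return min(knowledge_base,
               key=lambda label: math.dist(knowledge_base[label], features))

# Hypothetical training data: (label, feature vector).
training = [
    ("tree", [0.2, 0.8]), ("tree", [0.3, 0.9]),
    ("car", [0.9, 0.1]), ("car", [0.8, 0.2]),
]
kb = build_knowledge_base(training)
label = classify(kb, [0.25, 0.7])
```

The classifier returns a text label, which matches the idea that recognized objects are output as text.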

According to a preferred embodiment, keywords can be used for the search, and according to the invention the results are determined and ranked according to similarity.
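A keyword search with similarity-based ranking could look like the following sketch. The patent does not specify a similarity measure; Jaccard similarity is assumed here purely for illustration, and the scene names and keywords are invented.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def rank(query_keywords, indexed_scenes):
    """Return scene names ordered by descending similarity to the query.
    Scenes with no keyword in common are dropped; ties are broken by name."""
    scored = [(jaccard(query_keywords, keywords), name)
              for name, keywords in indexed_scenes.items()]
    ordered = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [name for score, name in ordered if score > 0]

# Hypothetical index: text labels previously assigned to each scene.
scenes = {
    "scene_01": ["beach", "sea", "sunset"],
    "scene_02": ["street", "car", "night"],
    "scene_03": ["sea", "boat"],
}
result = rank(["sea", "sunset"], scenes)
```

With this index, `scene_01` (two shared keywords) ranks ahead of `scene_03` (one shared keyword), and `scene_02` is omitted.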

MPEG-2, AVI, and the H.264 codec can be used as video formats. A quad-core machine with 8 GB RAM running Windows 7/2008 is sufficient as hardware.

About one to two minutes per scene are sufficient for training. Classification takes about one minute per scene (25 to 100 MB). Preprocessing, that is, converting the video into individual frames, takes about one to two minutes for 50 to 100 MB. With the aid of MPP computers, real-time analysis and streaming are possible.

Individual image processing modules can be structured as follows:
The feature extraction module extracts several features from the images and video files. Preprocessing comprises additional algorithms for preprocessing images and video files. Neurobiological network modules can be used for the classification process and for developing high-performance algorithms. The classification can be implemented as a multilayer classification process.

A feature tool may comprise an Internet search or an Internet search engine together with classification. Video processing can use appropriate tools and analyses for different types of video, and very large data structures can be searched. In particular, an extended training tool can be used.

In particular, the invention is based on capturing the content of a scene, with a text analysis and a soundtrack analysis being performed. In particular, conventional computers, an iPhone, or a pad running the Apollo video software can also be used for the invention.

According to the invention, the content of television stations or television broadcasts can also be analyzed and identified, using speech detection, object detection, face recognition, logo recognition, scene recognition, and the like. Furthermore, the start and end points of a video or a television broadcast can be used.

According to a further embodiment of the invention, trailers can be generated automatically for individual feature films. According to the invention, the video content can be searched with regard to music, speech, and any kind of sound.

For the search, databases and apps can be used to identify a video.

As a further example of the invention, an electronic user manual can be generated, for which a photo of a smartphone can be used, for example. The information is selected by means of a dialogue according to the needs of the user, which requires not only image recognition but also a semantic understanding of the text or speech.

According to the invention, a solution is provided that makes it possible to switch or convert from an image analysis to text and in turn to speech, as required.

According to the invention, the loss of objects or items, or process errors, can also be detected for logistics tasks, with video data being analyzed in real time.

According to a further example, the invention can also draw on satellite data to determine the current traffic density for real-time determination of air pollution.

According to a further embodiment of the invention, a cloud-based method can also be used, in which films are uploaded to the cloud and, after processing according to the invention, the modified video with the corresponding image recognition results can be downloaded from the cloud again.

According to a further embodiment of the invention, text data can be searched by means of keywords. Furthermore, unstructured video data can be searched; as a solution according to the invention, a large number of categories can be linked to form a library.

Overall, the invention describes a method for machine recognition of image data of an overall image or a sequence of images, characterized by the following steps:

- Capturing the image and dividing and classifying it into sub-elements, i.e., detailed image elements, on the basis of an image element and/or image object feature analysis, in particular with regard to basic geometric shapes, textures, colors, material, and perspective, wherein the image element and/or image object feature analysis is realized by analytical deterministic software techniques of image processing and image analysis, in particular Fourier analysis, edge detection, color analysis, and the like;
- Recognizing and identifying the classified image element and/or image object features using artificial intelligence, in particular a neural network, such that one or more descriptive text labels are assigned to each of the image elements and/or image objects;
- Feeding the text labels assigned to the image elements and/or image objects into a textual knowledge base, in which a further analysis of the relationships of the image elements and/or image objects to one another and to the image and/or parts of the image is carried out by means of a text-based search engine, in particular one based on a neural network, such that the content and context of the image or sequence of images is determined.
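The three steps can be sketched end to end as follows. This is a toy illustration under heavy assumptions and not the claimed implementation: step 1 uses only a mean-color analysis, step 2 replaces the neural network with a hypothetical lookup table, and step 3 replaces the text-based search engine with a small dictionary scored by term overlap. All names, tables, and values are invented for the example.

```python
def analyse_element(pixels):
    """Step 1 (deterministic analysis): name the dominant color channel
    of an image element given as a list of (r, g, b) pixels."""
    n = len(pixels)
    r, g, b = (sum(p[i] for p in pixels) / n for i in range(3))
    return max((("red", r), ("green", g), ("blue", b)), key=lambda t: t[1])[0]

# Step 2 stand-in (assumption): a lookup table in place of a trained
# neural network, mapping (color, position) to one or more text labels.
LABELS = {
    ("blue", "top"): ["sky"],
    ("blue", "bottom"): ["water", "sea"],
    ("green", "bottom"): ["grass", "meadow"],
}

def label_element(color, position):
    return LABELS.get((color, position), ["unknown"])

# Step 3 stand-in (assumption): a tiny textual knowledge base that maps
# a context to the terms associated with it.
KNOWLEDGE_BASE = {
    "seaside": {"sky", "water", "sea", "sand"},
    "countryside": {"sky", "grass", "meadow", "tree"},
}

def infer_context(labels):
    """Pick the context whose terms overlap most with the assigned labels."""
    scores = {ctx: len(terms & set(labels)) for ctx, terms in KNOWLEDGE_BASE.items()}
    return max(scores, key=scores.get)

# Two image elements: (position in the image, pixels).
elements = [
    ("top", [(90, 140, 230), (100, 150, 240)]),
    ("bottom", [(40, 160, 60), (50, 170, 70)]),
]
labels = [term
          for position, pixels in elements
          for term in label_element(analyse_element(pixels), position)]
context = infer_context(labels)
```

For these two elements the labels are "sky", "grass", and "meadow", and the knowledge base selects "countryside" as the context of the image.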

Claims

Method for machine recognition of the image content of an image or a sequence of images, characterized by the following steps: capturing the image and dividing and classifying it into sub-elements, i.e., detailed image elements, on the basis of an image element and/or image object feature analysis, in particular with regard to basic geometric shapes, textures, colors, material, and perspective, wherein the image element and/or image object feature analysis is realized by analytical deterministic software techniques of image processing and image analysis, in particular Fourier analysis, edge detection, color analysis, and the like; recognizing and identifying the classified image element and/or image object features using artificial intelligence, in particular a neural network, such that one or more descriptive text labels are assigned to each of the image elements and/or image objects; feeding the text labels assigned to the image elements and/or image objects into a textual knowledge base, in which a further analysis of the relationships of the image elements and/or image objects to one another and to the image and/or parts of the image is carried out by means of a text-based search engine, in particular one based on a neural network, such that the content and context of the image or sequence of images is determined.

Method according to claim 1, wherein the recognition accuracy is increased by repeated iteration and feedback through different parts of the image recognition method, or through the entire image recognition method, by maximizing matches and/or minimizing conflicts among the textual words and/or generic terms that characterize or describe the image elements or image objects, or the partial or entire image content, and that the image recognition method initially regards as working hypotheses, in particular within the same level of the image element and image hierarchies as well as across these hierarchies, so that a matrix-like and/or net-like procedure results.

Method according to claims 1-2, wherein the individual significant detailed image elements are extracted on the basis of an image element feature analysis, in particular with regard to their geometric shape, edges, color areas and color gradients, contrasts, textures, degree of resolution, brightness, black and white, perspective, and the like, drawing on existing analytical deterministic software methods of image processing and image analysis, in particular mathematical and numerical methods such as thresholding, gradient and extremum determination using the Hessian matrix to determine structural features, in particular of the image elements to be extracted, edge detection, blob analysis, Fourier methods for determining the regularity, roughness, and mean grain size of textures, cross-correlation methods with respect to rotation and translation of image elements, image objects, image parts, or the overall image for determining symmetries and/or periodicities, color distribution histograms, and the like.

Method according to claim 3, wherein all determined image elements are further processed in parallel to determine further characteristic image element and/or image object features, in particular by querying natural/artificial, perspective yes/no, shape, structure, material, color, color gradient, surface, texture, or the like, wherein the result of this second step of the image feature analysis of the image elements is already output in text form, for which in particular an already trained AI system is used.

Method according to claims 1-4, wherein the image elements described in more detail in text form are fed into an associative text system with inverse taxonomy, i.e., a text-based search engine, in order to determine the relationships of the image elements to one another and the arrangement and classification of these image elements within the context of the image, wherein this processing of the text-based image features of the image elements in a textual knowledge base gives these image elements textual meaning content, i.e., the image elements are divided and classified into higher-level groups/classes of meta-elements and provided with a characterizing textual generic term, which the machine image recognition method according to the invention initially uses as a working hypothesis for the image content in question, to be refined in further steps.

Method according to claims 4 and 5, wherein the text-based image features of the image elements are fed as "keywords" or "search terms" into an associative (text) search engine in order to determine the textual meaning contents of the meta-elements, i.e., the classifying generic terms of the individual groups of similar/equivalent image elements that are most similar to the "sum" of said textual image element features, i.e., to the keywords that jointly describe the groups of similar image elements.

Method according to claim 6, wherein the generic terms found to characterize the meta-elements are initially regarded as working hypotheses and are checked, by means of iterative feedback loops and further processing in the associative textual knowledge base, for the most matches or the fewest contradictions between the individual image elements and the hitherto hypothetical generic term of the meta-element, thereby determining the most significant hypothesis for a generic term for a given meta-element from a ranking of possible solutions, wherein each hypothesis for the characterizing generic term of such a meta-element can be compared with intermediate results of other steps relating to other meta-elements already recognized or yet to be recognized, so that a matrix-like or net-like approach results.

Method according to claim 7, wherein the determined characterizing generic terms of the meta-elements are then examined for matches and contradictions by means of a renewed analysis of color areas, in particular within the associative textual knowledge base.

Method according to claims 1-8, wherein, after the horizon in the image has been determined, a further object is analyzed, in particular with regard to the features and/or attributes color, roughness, regularity of structures, boundary, and/or the like, and wherein these features and/or attributes are likewise fed into associative textual knowledge bases and/or hierarchically classified taxonomies in order to determine the properties, whereby in particular a geometrically contiguous meta-object is determined, partly in analogy to and partly in distinction from the above meta-elements, which are in part not geometrically connected.

Method according to claims 1-9, wherein all detected meta-objects and meta-elements are checked once more for the fewest contradictions, both with regard to the visual image element features determined and with regard to their logical relations to one another, also taking into account the overall image context, wherein each hypothesis for the characterizing generic term of a meta-element and/or meta-object is compared with intermediate results of other steps relating to other meta-elements and/or meta-objects already recognized or yet to be recognized, so that a net-like approach results.

Method according to claims 1-10, wherein all determined most probable characterizing generic terms of the recognized/identified meta-elements and/or meta-objects are in turn fed into the associative textual knowledge base, in particular a text-based search engine based on a neural network, in particular as "search terms" and/or "keywords", in order to determine a textual characterization of the overall image content, wherein each hypothesis for the characterizing generic term of a meta-element and/or meta-object is compared with intermediate results of other steps relating to other meta-elements and/or meta-objects already recognized or yet to be recognized, so that a matrix-like structure is generated.