DE112021006196T5

DE112021006196T5 - METHOD AND APPARATUS FOR VISUAL INFERENCE

Info

Publication number: DE112021006196T5
Application number: DE112021006196.8T
Authority: DE
Inventors: Ke Su; Chongxuan Li; Hang Su; Jun Zhu; Bo Zhang; Ze Cheng; Siliang Lu
Original assignee: Tsinghua University; Robert Bosch GmbH
Current assignee: Tsinghua University; Robert Bosch GmbH
Priority date: 2021-03-03
Filing date: 2021-03-03
Publication date: 2023-09-28
Also published as: CN117223033A; WO2022183403A1; US20240185023A1

Abstract

The present disclosure provides a method for visual reasoning. The method comprises: providing a network having sets of inputs and sets of outputs, wherein each set of inputs is mapped, based on visual information about that set of inputs, to one of a set of outputs corresponding to that set of inputs, and wherein the network comprises a probabilistic generative model (PGM) and a set of modules; determining, by the PGM, a posterior distribution over combinations of one or more modules of the set of modules based on the sets of inputs and the sets of outputs; and applying domain knowledge as one or more posterior regularization constraints on the determined posterior distribution.

Description

FIELD

Aspects of the present disclosure relate generally to artificial intelligence and, more particularly, to a method and a network for visual reasoning.

BACKGROUND

Artificial intelligence (AI) is used in a wide range of areas, such as image classification, object recognition, scene understanding, machine translation, and the like. Interest in visual reasoning is increasing with the growth of applications such as visual question answering (VQA), embodied question answering, visual navigation, autopilot, and the like, in which AI models may generally be required to perform high-level cognitive processes on top of low-level perceptual results, for example, to perform high-level abstract reasoning about simple visual concepts such as lines and shapes.

Deep neural networks have been widely applied in the field of visual reasoning, where they can be trained to model the correlation between task input and output and, thanks to deep and rich representation learning, have succeeded in various visual reasoning tasks, particularly perceptual tasks. In addition, modularized networks have attracted increasing attention for visual reasoning in recent years because they can unify deep learning and symbolic reasoning; the focus is on building neural-symbolic models that aim to combine the best of representation learning and symbolic reasoning. The basic idea is to manually design neural modules, each representing a primitive step in the reasoning process, and to solve reasoning problems by assembling these modules into symbolic networks corresponding to the reasoning problems being solved.

With such modularized networks built on a neural-symbolic methodology, a conventional visual question answering (VQA) problem, in which the questions are generally given as text, can generally be solved correctly. Beyond VQA, abstract visual reasoning has recently been proposed, in which abstract concepts or questions are extracted directly from a visual input, such as an image, without a natural-language question, and reasoning is then performed accordingly. Since reasoning about abstract concepts has long been a challenge in machine learning, current methods or AI models such as those described above may perform unsatisfactorily on such abstract visual reasoning.

It may be desirable to provide improved methods or AI models for handling abstract visual reasoning tasks.

SUMMARY

The following presents a simplified summary of one or more aspects of the present disclosure in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects and is intended neither to identify key or critical elements of all aspects nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.

In one aspect of the disclosure, a method for visual reasoning comprises: providing a network having sets of inputs and sets of outputs, wherein each set of inputs is mapped, based on visual information about that set of inputs, to one of a set of outputs corresponding to that set of inputs, and wherein the network comprises a probabilistic generative model (PGM) and a set of modules; determining, by the PGM, a posterior distribution over combinations of one or more modules of the set of modules based on the sets of inputs and the sets of outputs; and applying domain knowledge as one or more posterior regularization constraints on the determined posterior distribution.

In another aspect of the disclosure, a method for visual reasoning with a network comprising a probabilistic generative model (PGM) and a set of modules is provided, the method comprising: providing the network with a set of input images and a set of candidate images; generating a combination of one or more modules of the set of modules based on the set of input images and on a posterior distribution over combinations of one or more modules of the set of modules, wherein the posterior distribution is formulated by the PGM trained under domain knowledge expressed as one or more posterior regularization constraints; processing the set of input images and the set of candidate images with the generated combination of one or more modules; and selecting a candidate image from the set of candidate images based on a score of each candidate image in the set of candidate images estimated by the processing.
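
The inference steps of this aspect can be illustrated with a toy sketch. It assumes, purely for illustration, that each panel has already been reduced to a single integer (its shape count), that the eight context panels plus one candidate form a 3×3 grid, that the posterior is supplied as a dictionary over module combinations, and that the maximum-a-posteriori combination is used instead of a sample. The module names `arith_rows` and `constant_rows` are hypothetical and do not come from the disclosure.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Grid = List[List[int]]  # 3x3 grid of per-panel shape counts (toy stand-in for image features)


def arith_rows(grid: Grid) -> float:
    """Hypothetical module: fraction of rows that form an increasing arithmetic progression."""
    return sum(r[1] - r[0] == r[2] - r[1] > 0 for r in grid) / len(grid)


def constant_rows(grid: Grid) -> float:
    """Hypothetical module: fraction of rows whose three panels have equal counts."""
    return sum(r[0] == r[1] == r[2] for r in grid) / len(grid)


MODULES: Dict[str, Callable[[Grid], float]] = {
    "arith_rows": arith_rows,
    "constant_rows": constant_rows,
}


def select_candidate(context: Sequence[int], candidates: Sequence[int],
                     posterior: Dict[Tuple[str, ...], float]) -> int:
    """Return the index of the best-scoring candidate under the MAP module combination."""
    combo = max(posterior, key=posterior.get)  # most probable module combination

    def score(candidate: int) -> float:
        panels = list(context) + [candidate]  # fill the missing ninth panel
        grid = [panels[i:i + 3] for i in range(0, 9, 3)]
        return sum(MODULES[name](grid) for name in combo)

    scores = [score(c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)
```

For example, with context counts `[1, 2, 3, 1, 2, 3, 1, 2]`, candidates `[1, 2, 3, 4, 5, 6]`, and most of the posterior mass on `("arith_rows",)`, the sketch selects index 2 (a panel with three shapes), because only that candidate completes the third row consistently.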

In another aspect of the disclosure, a network for visual reasoning comprises: a set of modules, each module of the set being implemented as a neural network and having at least one trainable parameter for focusing that module on one or more variable image properties; and a probabilistic generative model (PGM) coupled to the set of modules, the PGM being configured to output a posterior distribution over combinations of one or more modules of the set of modules.

In another aspect of the disclosure, an apparatus for visual reasoning comprises a memory and at least one processor coupled to the memory. The at least one processor is configured to: provide a network having sets of inputs and sets of outputs, wherein each set of inputs is mapped, based on visual information about that set of inputs, to one of a set of outputs corresponding to that set of inputs, and wherein the network comprises a probabilistic generative model (PGM) and a set of modules; determine, by the PGM, a posterior distribution over combinations of one or more modules of the set of modules based on the sets of inputs and the sets of outputs; and apply domain knowledge as one or more posterior regularization constraints on the determined posterior distribution.

In another aspect of the disclosure, a computer program product for visual reasoning comprises processor-executable computer code for: providing a network having sets of inputs and sets of outputs, wherein each set of inputs is mapped, based on visual information about that set of inputs, to one of a set of outputs corresponding to that set of inputs, and wherein the network comprises a probabilistic generative model (PGM) and a set of modules; determining, by the PGM, a posterior distribution over combinations of one or more modules of the set of modules based on the sets of inputs and the sets of outputs; and applying domain knowledge as one or more posterior regularization constraints on the determined posterior distribution.

In another aspect of the disclosure, a computer-readable medium stores computer code for visual reasoning. The computer code, when executed by a processor, causes the processor to: provide a network having sets of inputs and sets of outputs, wherein each set of inputs is mapped, based on visual information about that set of inputs, to one of a set of outputs corresponding to that set of inputs, and wherein the network comprises a probabilistic generative model (PGM) and a set of modules; determine, by the PGM, a posterior distribution over combinations of one or more modules of the set of modules based on the sets of inputs and the sets of outputs; and apply domain knowledge as one or more posterior regularization constraints on the determined posterior distribution.

With the support of domain knowledge, the generated modularized networks can have structures that precisely represent a human-interpretable reasoning process, which can lead to improved performance.

Other aspects or variations of the disclosure, as well as other advantages, will become apparent from consideration of the following detailed description and the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosed aspects are described below in conjunction with the accompanying drawings, which are provided to illustrate, and not to limit, the disclosed aspects.

FIG. 1 shows an example of abstract visual reasoning.
FIG. 2 illustrates an example network in which aspects of the present disclosure may be performed.
FIGS. 3A and 3B illustrate example modularized networks with different structures.
FIG. 4 shows an example flowchart illustrating a method for performing an abstract visual reasoning task with a probabilistic neural-symbolic model regularized with domain knowledge, in accordance with one or more aspects of the present disclosure.
FIG. 5 depicts an example flowchart illustrating an optimization process for an abstract visual reasoning task in accordance with one or more aspects of the present disclosure.
FIG. 6 shows an example flowchart illustrating a method for performing an abstract visual reasoning task with a probabilistic neural-symbolic model regularized with domain knowledge, in accordance with one or more aspects of the present disclosure.
FIG. 7 illustrates another example network in which aspects of the present disclosure may be performed.
FIG. 8 depicts an example flowchart illustrating an optimization process for an abstract visual reasoning task in accordance with one or more aspects of the present disclosure.
FIG. 9 illustrates an example of a hardware implementation for an apparatus according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

The present disclosure will now be discussed with reference to several example implementations. It is to be understood that these implementations are discussed only to enable those skilled in the art to better understand and thereby implement the embodiments of the present disclosure, and not to suggest any limitation on the scope of the present disclosure.

Compared with traditional computer vision tasks such as image classification and object detection, visual reasoning goes a step further: it requires not only a comprehensive understanding of the visual content but also the ability to reason about the extracted concepts in order to draw conclusions. FIG. 1 shows an example of abstract visual reasoning, in which the eight panels in the left dashed box represent a set of inputs and the six panels in the right dashed box represent a set of candidate outputs. One or more rules may be shared by the set of inputs and the correct output. To select the correct panel from among the candidate output panels, these common rules are extracted and then used to map the inputs to the correct output panel. In the example of FIG. 1, the common rule for the eight input panels may be an increasing number of shapes per row, and the correct output panel D may be selected based on that rule. Extracting the rule of an increasing number of shapes per row can be a high-level abstract reasoning task that builds on one or more low-level visual concepts, such as the different shapes in each input panel.

The present disclosure proposes a method for performing an abstract visual reasoning task with a probabilistic neural-symbolic model regularized with domain knowledge. A neural-symbolic model can provide a powerful tool that combines symbolic program execution for logical reasoning with deep representation learning for visual recognition. For example, a neural-symbolic model may form, for each input, a particular modularized network comprising one or more modules, each selected from a set of modules such as an inventory of reusable modules. A probabilistic formulation for training models with stochastic latent variables can yield an interpretable and readable reasoning system with less supervision.

Domain knowledge can provide guidance for generating an appropriate modularized network, since doing so is generally an optimization problem over a mix of continuous and discrete variables. With the support of domain knowledge, the generated modularized networks can have structures that precisely represent a human-interpretable reasoning process, which can lead to improved performance.
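
The disclosure does not fix the form of the constraints at this point. One common way to apply domain knowledge as a posterior regularization constraint is the posterior-regularization projection: find the distribution $q$ closest to the model posterior $p$ in KL divergence, $\min_q \mathrm{KL}(q \,\|\, p)$ subject to $\mathbb{E}_q[\phi(z)] \le b$, whose solution has the form $q(z) \propto p(z)\, e^{-\lambda \phi(z)}$ with $\lambda \ge 0$. The sketch below assumes a small, enumerable set of candidate module combinations $z$ and a single hypothetical feature $\phi(z)$, for example 1 when a combination violates a domain rule and 0 otherwise. It finds $\lambda$ by bisection, which works because $\mathbb{E}_{q_\lambda}[\phi]$ is non-increasing in $\lambda$ (its derivative is $-\mathrm{Var}_{q_\lambda}[\phi]$).

```python
import math
from typing import List, Sequence


def _tilt(p: Sequence[float], phi: Sequence[float], lam: float) -> List[float]:
    """q_lam(z) proportional to p(z) * exp(-lam * phi(z)), normalized in log space for stability."""
    logs = [math.log(pz) - lam * fz if pz > 0 else -math.inf for pz, fz in zip(p, phi)]
    top = max(logs)
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]


def _expect(q: Sequence[float], phi: Sequence[float]) -> float:
    return sum(qz * fz for qz, fz in zip(q, phi))


def regularize_posterior(p: Sequence[float], phi: Sequence[float], b: float,
                         tol: float = 1e-12) -> List[float]:
    """Return argmin_q KL(q || p) subject to E_q[phi] <= b (a single inequality constraint)."""
    if _expect(p, phi) <= b:
        return list(p)  # constraint already satisfied, so lambda = 0
    if b <= min(fz for pz, fz in zip(p, phi) if pz > 0):
        raise ValueError("b must exceed the smallest phi on the support of p")
    lo, hi = 0.0, 1.0
    while _expect(_tilt(p, phi, hi), phi) > b:  # grow hi until [lo, hi] brackets lambda*
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:  # bisection on the monotone map lambda -> E_q[phi]
        mid = 0.5 * (lo + hi)
        if _expect(_tilt(p, phi, mid), phi) > b:
            lo = mid
        else:
            hi = mid
    return _tilt(p, phi, hi)  # the hi side keeps E_q[phi] <= b
```

For example, with `p = [0.5, 0.3, 0.2]`, `phi = [1, 0, 0]` and `b = 0.1`, the result is approximately `[0.1, 0.54, 0.36]`: probability mass is moved off the rule-violating combination, while the 3:2 ratio between the two compliant combinations is preserved.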

FIG. 2 illustrates an example network 200 in which aspects of the present disclosure may be performed. For example, the network 200 may include a probabilistic generative model (PGM) 210 and a set of modules 220, such as an inventory of reusable modules.

In one aspect of the present disclosure, a plurality of combinations of one or more modules from the set of modules 220 may be selected to solve respective sets of inputs, and the plurality of combinations of the set of modules 220 may be treated as a latent variable for which the PGM 210 can formulate a posterior distribution by learning from a data set. For example, one or more modules may be selected from the inventory of reusable modules to assemble a modularized network whose structure specifies the assembled modules and the connections between them. For example, the structure of the assembled modularized network can be represented as a directed acyclic graph (DAG). The PGM 210 can be used to formulate a distribution over structures of modularized networks, where the set of modules 220 can be an inventory of reusable modules for assembling modularized networks. For example, the PGM 210 may formulate a posterior distribution over structures of modularized networks by learning from a data set. The formulated posterior distribution over structures of modularized networks can be regularized with domain knowledge.
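
Executing an assembled modularized network amounts to evaluating its DAG in topological order. In the minimal sketch below, the module inventory is a hypothetical dictionary of toy scalar functions standing in for neural modules; source nodes receive the network input, every other node receives its parents' outputs, and the last node is taken as the output. These conventions are assumptions for illustration, not details from the disclosure. Python's standard `graphlib.TopologicalSorter` (Python 3.9+) supplies the ordering and raises `graphlib.CycleError` if the graph is not acyclic.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Sequence, Tuple

Module = Callable[[List[float]], float]

# Hypothetical toy inventory; real modules would be trainable neural networks.
INVENTORY: Dict[str, Module] = {
    "identity": lambda xs: xs[0],
    "double": lambda xs: 2.0 * xs[0],
    "add": lambda xs: sum(xs),
}


def run_structure(nodes: Sequence[str], edges: Sequence[Tuple[int, int]], x: float) -> float:
    """Evaluate the network with module names `nodes` and DAG `edges` (src, dst) on input x."""
    preds: Dict[int, List[int]] = {i: [] for i in range(len(nodes))}
    for src, dst in edges:
        preds[dst].append(src)
    out: Dict[int, float] = {}
    for i in TopologicalSorter(preds).static_order():  # parents are always evaluated first
        inputs = [out[p] for p in preds[i]] or [x]  # source nodes read the network input
        out[i] = INVENTORY[nodes[i]](inputs)
    return out[len(nodes) - 1]  # assumption: the last node is the output node
```

For instance, the structure `["identity", "double", "add"]` with edges `(0, 1)`, `(0, 2)`, `(1, 2)` maps the input 3.0 to 3.0 + 6.0 = 9.0; swapping module names or edges yields a different structure from the same inventory.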

For example, the PGM 210 may include a variational autoencoder (VAE), in which the encoder of the VAE formulates a variational posterior distribution over structures of modularized networks and the decoder of the VAE formulates a generative distribution. The variational posterior distribution formulated by the encoder may be an estimate of the posterior distribution over structures of modularized networks given the observed data set. The generative distribution formulated by the decoder may be used for reconstruction (as illustrated by route 4 in FIG. 8). In some aspects of the present disclosure, the decoder may be omitted from the PGM 210. In other aspects of the present disclosure, the PGM 210 may include both an encoder and a decoder.
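
For a discrete latent variable such as a network structure $z$, the objective a VAE maximizes is the evidence lower bound (ELBO), $\mathbb{E}_{q(z)}[\log p(x \mid z) + \log p(z) - \log q(z)] \le \log p(x)$, with equality exactly when $q$ equals the true posterior $p(z \mid x)$. The toy sketch below assumes a tiny enumerable set of three hypothetical structures, so the expectation is computed exactly by summation rather than by sampling through a neural encoder and decoder; it only shows what the encoder's variational posterior is being fitted to.

```python
import math
from typing import List, Sequence


def elbo(q: Sequence[float], prior: Sequence[float], lik: Sequence[float]) -> float:
    """ELBO = sum_z q(z) * [log p(x|z) + log p(z) - log q(z)] for an enumerable latent z."""
    return sum(qz * (math.log(lz) + math.log(pz) - math.log(qz))
               for qz, pz, lz in zip(q, prior, lik) if qz > 0)


def log_evidence(prior: Sequence[float], lik: Sequence[float]) -> float:
    """log p(x) = log sum_z p(z) p(x|z)."""
    return math.log(sum(pz * lz for pz, lz in zip(prior, lik)))


def exact_posterior(prior: Sequence[float], lik: Sequence[float]) -> List[float]:
    """p(z|x) by Bayes' rule; the ELBO is tight when q equals this."""
    joint = [pz * lz for pz, lz in zip(prior, lik)]
    total = sum(joint)
    return [j / total for j in joint]
```

With a uniform prior over three structures and likelihoods `[0.6, 0.3, 0.1]`, using the prior itself as $q$ gives an ELBO strictly below $\log p(x)$, whereas using the exact posterior closes the gap; the gap equals $\mathrm{KL}(q \,\|\, p(z \mid x))$.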

For example, the set of modules 220 may include one or more prebuilt neural modules, each representing a primitive step in a reasoning process. Each module of the set of modules 220 can be implemented, for example, as a multilayer neural network with one or more trainable parameters. In one aspect of the present disclosure, modules of the set of modules 220 may be dynamically interconnected to form a particular modularized network that maps a given set of inputs to the correct output. In one aspect of the present disclosure, the PGM 210 can be used to create modularized networks whose structures correspond to the individual sets of inputs, in order to predict the underlying rules within each set of inputs.
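The sketch below illustrates how prebuilt modules could be dynamically interconnected. Each hypothetical module is a two-layer ReLU network with its own trainable weights, and a structure $G = (v, A)$ is executed vertex by vertex. Feeding the input to vertices without parents, summing the outputs of a vertex's parents, and summing the outputs of sink vertices are assumptions made for illustration; the disclosure does not specify how modules exchange data.

```python
import numpy as np

def make_module(rng, dim, hidden=16):
    """One hypothetical neural module: a two-layer MLP with trainable weights."""
    return {
        "W1": rng.normal(scale=0.1, size=(dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.1, size=(hidden, dim)),
        "b2": np.zeros(dim),
    }

def apply_module(params, h):
    z = np.maximum(0.0, h @ params["W1"] + params["b1"])  # ReLU hidden layer
    return z @ params["W2"] + params["b2"]

def run_modular_network(modules, v, A, x):
    """Execute G = (v, A) on input features x.

    Assumes vertex indices are already a topological order (A strictly upper
    triangular). Parentless vertices read x; other vertices read the sum of
    their parents' outputs. The outputs of sink vertices are summed.
    """
    A = np.asarray(A)
    d = len(v)
    outputs = [None] * d
    for j in range(d):
        parents = [i for i in range(d) if A[i, j]]
        h = x if not parents else sum(outputs[i] for i in parents)
        outputs[j] = apply_module(modules[v[j]], h)
    sinks = [j for j in range(d) if not A[j].any()]
    return sum(outputs[j] for j in sinks)
```

Only the module weights are trainable here; the wiring comes entirely from the sampled structure, which is what lets the same inventory serve different sets of inputs.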

FIGS. 3A and 3B illustrate exemplary modularized networks with different structures. For example, the structure of a modularized network can be represented as a DAG denoted by $G = (v, A)$, where $v \in M^d$ represents the nodes (i.e., the modules) of the structure, $M$ is the set of modules 220, $d$ is the size of the structure, and $A \in \{0,1\}^{d \times d}$ is the adjacency matrix, which represents the connections between the modules of the structure. For example, the number of vertices of the graph may be specified to be less than or equal to a threshold (e.g., $d \le 4$ or $d \le 6$), and each vertex may be filled with a particular module from the set of modules 220. For example, the set of modules $M$ 220 may include ten modules numbered 0 to 9, which can be represented as $v_0, v_1, v_2, v_3, v_4, v_5, v_6, v_7, v_8, v_9$.
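To give a sense of why a learned posterior over structures is useful, the short computation below counts candidate structures with exactly $d$ vertices: the number of labeled DAGs on $d$ vertices (Robinson's recurrence) multiplied by $10^d$ module assignments. Allowing the same module at several vertices is an assumption, and isomorphic graphs are counted separately, so the figures are an over-count rather than a quantity stated in the disclosure.

```python
from math import comb

def count_labeled_dags(n):
    """Number of labeled DAGs on n vertices, via Robinson's recurrence."""
    a = [1]  # a[0] = 1 (the empty graph)
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

def count_structures(d, num_modules=10):
    # Each of the d vertices may hold any module (repetition allowed: an assumption)
    return count_labeled_dags(d) * num_modules ** d
```

For $d = 4$ this gives 543 DAGs and 5,430,000 module-labeled structures; summing over $d = 1, \ldots, 4$ gives 5,455,310, far too many to enumerate per set of inputs.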

As an example, the structure shown in FIG. 3A may have the modules $v_1, v_2, v_3, v_4$ filled into the vertices 310-1, 310-2, 310-4 and 310-3, respectively, and the adjacency matrix

$$A = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

As another example, the structure shown in FIG. 3B may have the modules $v_1, v_2, v_3, v_4$ filled into the vertices 310-1, 310-4, 310-3 and 310-2, respectively, and the adjacency matrix

$$A = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
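The two example adjacency matrices can be checked mechanically. The sketch below applies Kahn's algorithm to confirm that each matrix is acyclic and to produce an order in which the modules could be executed. It assumes that row and column $k$ of $A$ refer to the $k$-th listed module ($v_1, \ldots, v_4$).

```python
import numpy as np

def topological_order(A):
    """Kahn's algorithm: return a vertex order, or None if A contains a cycle."""
    A = np.asarray(A)
    indeg = A.sum(axis=0).astype(int)  # number of incoming edges per vertex
    ready = [j for j in range(len(A)) if indeg[j] == 0]
    order = []
    while ready:
        i = ready.pop(0)
        order.append(i)
        for j in np.flatnonzero(A[i]):  # successors of vertex i
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.append(int(j))
    return order if len(order) == len(A) else None

A_3a = [[0, 1, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0]]
A_3b = [[0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

For FIG. 3A this yields [0, 1, 2, 3] (a diamond: $v_1$ feeds $v_2$ and $v_3$, which both feed $v_4$); for FIG. 3B it yields [0, 1, 3, 2], with $v_3$ and $v_4$ as the two sinks.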

In some aspects of the present disclosure, the modularized networks with the structures shown in FIGS. 3A and 3B may be suitable for extracting different rules contained in different sets of inputs. In one aspect of the present disclosure, by training on a data set that includes sets of inputs and the sets of outputs associated with them, the network 200 or 700 may learn associations between the sets of inputs and corresponding structures that can be used to map them to the correct outputs. For example, a posterior distribution over modularized network structures may be learned by the PGM 210 and used to infer a modularized network structure for any set of inputs. In another aspect of the present disclosure, domain knowledge may be applied when creating structures. For example, domain knowledge may be applied, as one or more posterior regularization constraints, to the posterior distribution over structures of modularized networks that the PGM 210 has learned from the data set. With the help of domain knowledge, the regularized distribution over structures can be used to produce a precise and interpretable structure for a set of inputs, which may represent hidden rules within the set of inputs.

One skilled in the art will understand that other structures and other representations for at least a portion of the set of modules 220 are also possible.

FIG. 4 shows an example flowchart illustrating a method 400 for performing an abstract visual reasoning task with a probabilistic neural-symbolic model that is regularized with domain knowledge in accordance with one or more aspects of the present disclosure. For example, method 400 may be performed by network 200 or network 700, which are described in detail below. Method 400 may also be performed by other networks, systems, or models.

In block 410, sets of inputs and sets of outputs may be provided to a network 200 or 700, where each set of inputs may be mapped, based on visual information about that set of inputs, to one of the set of outputs corresponding to it. The sets of inputs and the sets of outputs may include, for example, a training data set such as the Procedurally Generated Matrices (PGM) data set, the Relational and Analogical Visual rEasoNing (RAVEN) data set, or the like. The network 200, 700 may include a probabilistic generative model (PGM) 210, 710 and a set of modules 220, 720.

At block 420, a posterior distribution with respect to the set of modules 220, 720 may be determined by the PGM 210, 710 based on the provided sets of inputs and sets of outputs. In one aspect of the present disclosure, the PGM 210, 710 may determine a posterior distribution over combinations of one or more modules of the set of modules 220, 720 based on the provided sets of inputs and sets of outputs. In one example, the combinations may include modularized networks composed of one or more modules of the set of modules 220, 720, where the modularized networks may have structures that can be represented as $G = (v, A)$. In another example, the combinations may include any permutations of one or more modules of the set of modules 220. For example, the PGM 210 may include a VAE, whose encoder can formulate an estimated posterior distribution over structures of modularized networks based on the observed data set.

In block 430, domain knowledge may be applied to the determined posterior distribution over the set of modules 220 as one or more posterior regularization constraints. For example, a regularized Bayesian framework (RegBayes) can be used to integrate human domain knowledge into Bayesian methods by imposing constraints directly on the posterior distribution. The flexibility of RegBayes allows domain knowledge to be considered explicitly, by incorporating it into arbitrary Bayesian models as soft constraints.
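One common way to realize a soft constraint on a posterior, shown here only as an illustration, is to project an estimated posterior $\tilde p$ (in the KL sense) onto distributions whose expected constraint value is bounded; with a single constraint and Lagrange multiplier $\lambda \ge 0$ the solution has the form $q(G) \propto \tilde p(G)\exp(-\lambda\,\Phi(G))$. The disclosure does not state that it uses this closed form, and the finite candidate set of structures and the numbers below are hypothetical.

```python
import numpy as np

def regularize_posterior(p_tilde, phi, lam):
    """Reweight a discrete posterior over candidate structures:
    q(G) proportional to p_tilde(G) * exp(-lam * phi(G)).
    Larger lam pushes mass toward structures with small constraint values."""
    logits = np.log(np.asarray(p_tilde, dtype=float)) - lam * np.asarray(phi, dtype=float)
    q = np.exp(logits - logits.max())  # subtract max for numerical stability
    return q / q.sum()
```

With $\tilde p = (0.5, 0.3, 0.2)$ and $\Phi = (1, 0, 0.5)$, $\lambda = 0$ returns $\tilde p$ unchanged, while $\lambda = 2$ shifts most of the mass to the second structure, the only one with $\Phi = 0$.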

With the help of domain knowledge, method 400 can be used to generate precise and interpretable structures for different sets of inputs, because the generated structures can capture hidden rules within the sets of inputs.

One skilled in the art will understand that other probabilistic generative models, and other distributions with respect to the set of modules 220, are also possible.

In one aspect of the present disclosure, the one or more posterior regularization constraints may include one or more first-order logic (FOL) constraints that encode domain knowledge. For example, a constraint function may consist of first-order logic computations over each structure and each set of inputs. Specifically, each constraint function takes a structure and a set of inputs as input and evaluates the designed first-order logic expression as output. The output of the constraint function may take a value in the range [0, 1] indicating the degree to which the structure and the set of inputs satisfy a specific requirement, where a lower value indicates a stronger match. Therefore, by minimizing the values of such constraint functions while optimizing the posterior distribution over structures, the network 200 can learn to generate structures that conform to the applied domain knowledge.

In another aspect of the present disclosure, it may be advantageous to consider the internal correlations between constraints. Constraints that address different aspects of domain knowledge can be independent of each other. On the other hand, constraints that are applied to different nodes of a structure but share the same aspect of domain knowledge can be correlated with each other. Accordingly, constraints that share the same aspect of domain knowledge can be collected into a group of constraints. For example, a total of $L$ groups of constraints may be proposed, each group corresponding to a particular type of reasoning, such as Boolean logical reasoning, temporal reasoning, spatial reasoning, arithmetic reasoning, and the like.

In another aspect of the present disclosure, the one or more FOL constraints may be generated based on one or more characteristics of each set of inputs. For example, in the Procedurally Generated Matrices (PGM) data set, each pair of a set of inputs and the corresponding set of outputs may have one or more rules, where each rule may be represented as a triple, collected into

$$\mathcal{T} = \{[r, o, a] : r \in \mathcal{R},\ o \in \mathcal{O},\ a \in \mathcal{A}\},$$

which is drawn from the following primitive sets:

• Relation types ($\mathcal{R}$, with elements $r$): progression, XOR, OR, AND, consistent union
• Object types ($\mathcal{O}$, with elements $o$): shape, line
• Attribute types ($\mathcal{A}$, with elements $a$): size, type, color, position, number

These triples determine the abstract reasoning rules embodied by a particular set of inputs and the corresponding correct output. For example, if $\mathcal{T}$ contains the triple [progression, shape, color], the set of inputs and the corresponding correct output may exhibit a progression relation in the color (e.g., the grayscale intensity) of the shapes. Each attribute type $a \in \mathcal{A}$ (e.g., color) may take one of a finite number of discrete values $z \in \mathcal{Z}$ (e.g., 10 integers in [0, 255] for grayscale intensity). Therefore, a given rule can have a variety of realizations depending on the values of the attribute types, but all of these realizations are governed by the same underlying abstract rule. The choice of $r$ constrains which values of $z$ can be realized. For example, if $r$ is progression, the values of $z$ may increase along the rows or columns of the matrix of input image panels, and different realizations of this rule may use different values.
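The sketch below enumerates the rule triples that the three primitive sets can form and checks one possible reading of a progression rule on a 3×3 grid of discrete attribute values. Treating progression as a constant positive step along each row is an assumption; the actual data set may define it differently and may use only a subset of the syntactically possible triples.

```python
from itertools import product

RELATIONS = ["progression", "XOR", "OR", "AND", "consistent union"]
OBJECTS = ["shape", "line"]
ATTRIBUTES = ["size", "type", "color", "position", "number"]

# Every syntactically possible [r, o, a] triple (5 * 2 * 5 = 50)
ALL_TRIPLES = list(product(RELATIONS, OBJECTS, ATTRIBUTES))

def rows_follow_progression(grid):
    """Hypothetical progression check: each row of attribute values z
    increases by a single constant, positive step."""
    for row in grid:
        steps = {b - a for a, b in zip(row, row[1:])}
        # exactly one distinct step, and it must be positive
        if len(steps) != 1 or min(steps) <= 0:
            return False
    return True
```

For example, grayscale values `[[0, 28, 56], [28, 56, 84], [56, 84, 112]]` pass the check, whereas a row such as `[0, 28, 28]` does not. Different grids that pass are different realizations of the same abstract rule.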

In one aspect of the present disclosure, the one or more FOL constraints may be generated based on at least one of the relation types, object types, or attribute types of the sets of inputs. For example, an FOL constraint may be formulated as:

$$\Phi_j(G, x) := 1 - \mathbb{1}[v_j \in S(x)]$$

where $\mathbb{1}[\cdot]$ is the indicator function, and $v_j \in S(x)$ is true if the semantic representation of $v_j$ is found in $S(x)$. Here, $S(x)$ denotes the semantic attributes of a set of inputs $x$, which can be extracted from one or more triples $\mathcal{T} = \{[r, o, a]\}$ of the set of inputs $x$, and $v_j$ denotes the $j$-th node in the structure $G$.
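A minimal sketch of the constraint $\Phi_j(G, x) = 1 - \mathbb{1}[v_j \in S(x)]$ follows. It assumes that $S(x)$ is simply the set of symbols appearing in the rule triples of $x$, and that each module carries a single semantic label looked up in a hypothetical `module_semantics` table; both are illustrative choices rather than details from the disclosure.

```python
def semantic_attributes(triples):
    """Hypothetical S(x): every symbol named by the [r, o, a] triples of x."""
    return {symbol for triple in triples for symbol in triple}

def fol_constraints(v, module_semantics, triples):
    """Return Phi_j(G, x) for each node j of a structure with modules v.
    0.0 means the node's semantic label appears in S(x) (a match);
    1.0 means it does not (a violation)."""
    s_x = semantic_attributes(triples)
    return [0.0 if module_semantics[m] in s_x else 1.0 for m in v]
```

For an input governed by [progression, shape, color], a structure whose nodes carry the labels "progression", "color" and "AND" receives the values [0.0, 0.0, 1.0]: only the AND node is unsupported by the input's rules.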

In one or more aspects of the present disclosure, a group of FOL constraints may be generated based on one or more triples $\mathcal{T} = \{[r, o, a]\}$ of the set of inputs $x$, according to a particular aspect of domain knowledge, such as logical reasoning, temporal reasoning, spatial reasoning, arithmetic reasoning, and the like. For example, logical reasoning may include logical AND, OR, XOR, or the like. Arithmetic reasoning may include arithmetic ADD, SUB, MUL, and the like. Spatial reasoning may include STRUC (structure), e.g., for changing the computation rules of input modules, and the like. Temporal reasoning may include PROG (progression), ID (identical), and the like.

In one or more aspects of the present disclosure, a group of FOL constraints generated according to a particular aspect of domain knowledge may be applied to each node of a structure. For example, the constraints in the group may evaluate an FOL rule on all nodes of the structure, thereby checking one specific aspect of domain knowledge.
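To illustrate how a group of constraints can be evaluated on every node of a structure, the sketch below sums per-node violations separately for each reasoning type, i.e., a per-group total of the form $\sum_j \Phi_{ij}(G, x)$. The operator-to-group table follows the examples listed above; the rule that a node violates its group's constraint whenever its operator is not among those required by the input is a hypothetical instantiation.

```python
REASONING_GROUPS = {
    "logical": {"AND", "OR", "XOR"},
    "arithmetic": {"ADD", "SUB", "MUL"},
    "spatial": {"STRUC"},
    "temporal": {"PROG", "ID"},
}

def group_violations(node_ops, required_ops):
    """Per-group sum of node-level violations for one structure.

    node_ops: operator label of each node of the structure.
    required_ops: operators implied by the input's rule triples (hypothetical).
    A node adds 1.0 to its group if its operator is not required by the input.
    """
    totals = {name: 0.0 for name in REASONING_GROUPS}
    for op in node_ops:
        for name, ops in REASONING_GROUPS.items():
            if op in ops and op not in required_ops:
                totals[name] += 1.0
    return totals
```

Because every node is checked against the same group rule, the nodes of one group are naturally coupled, which matches the idea of correlated constraints sharing one slack variable.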

One skilled in the art will understand that one or more of the aspects described above may be performed by the network 200, 700 or by other networks, systems, or models.

In one example, in method 400, reasoning tasks may be performed by optimizing the trainable parameters of the PGM 210, 710 and of the modules of the set of modules 220, 720 to minimize the prediction loss over observed samples, as formulated by the following objective:

$$\min_{\phi}\,\min_{\theta}\; \ell_{err}(\phi, \theta) := \sum_{(x_n, y_n) \in D} \mathbb{E}_{G \sim q_{\phi}(G \mid x_n)}\left[-\log p_{net}(y_n \mid x_n, G, \theta)\right] \qquad (2)$$

where $\phi$ denotes the trainable parameters of the PGM 210, 710, $\theta$ denotes the trainable parameters of the modules of the set of modules 220, 720, and $D = \{(x_n, y_n)\}_{n=1}^{N}$ is a data set in which the $n$-th input $x_n$ is associated with the output $y_n$.
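Because the expectation over $G$ in formulation (2) ranges over discrete structures, it is typically estimated by sampling. The sketch below computes a Monte Carlo estimate of $\ell_{err}$; `sample_structures` and `log_p_net` are hypothetical callables standing in for the encoder $q_\phi$ and for the modular network with parameters $\theta$.

```python
import numpy as np

def estimate_task_loss(dataset, sample_structures, log_p_net, num_samples=4):
    """Monte Carlo estimate of formulation (2):
    sum over (x_n, y_n) in D of E_{G ~ q_phi(G | x_n)}[-log p_net(y_n | x_n, G, theta)].

    sample_structures(x, k) -> list of k structures drawn from q_phi(G | x)
    log_p_net(y, x, G)      -> log-probability of the correct output y
    """
    total = 0.0
    for x, y in dataset:
        graphs = sample_structures(x, num_samples)
        # Average negative log-likelihood over the sampled structures
        total += -np.mean([log_p_net(y, x, G) for G in graphs])
    return total
```

This only evaluates the loss. Obtaining gradients with respect to $\phi$ through discrete samples needs a score-function or continuous-relaxation estimator, which the disclosure does not specify.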

In one aspect of the present disclosure, the network 200, 700 may use the PGM 210, 710 to represent a generative distribution $p_\phi(x \mid G)$ and a variational distribution $q_\phi(G \mid x)$. For example, an encoder of a VAE can represent the variational distribution $q_\phi(G \mid x)$, and a decoder of the VAE can represent the generative distribution $p_\phi(x \mid G)$. Optimizing formulation (2) yields an estimated posterior distribution over structures $\tilde{p}_{\phi_0}(G \mid x)$ and the corresponding module parameters $\theta_0$.

In another aspect of the present disclosure, one or more FOL constraints may be applied for regularization to obtain a new posterior distribution over the structures. Formally, the overall objective can be formulated as:

$$\begin{aligned}
\min_{\phi, \xi, \eta}\,\min_{\theta}\;& \ell_{err}(\phi, \theta) + C_1 \sum_{i=1}^{L} \xi_i + C_2\, \eta, \\
\text{s.t. } \forall i,\;& \mathbb{E}_{x_n \in D} \left| \mathbb{E}_{G \sim q_{\phi}} \Big[ \sum_{j=1}^{T_i} \Phi_{ij}(G, x_n) \Big] \right| \le \xi_i + \epsilon, \\
& \mathrm{KL}\left[ q_{\phi}(G \mid x) \,\|\, \tilde{p}_{\phi_0}(G \mid x) \right] \le \eta + \epsilon, \\
\text{where } & \phi_0 = \arg\min_{\phi} \ell_{err}(\phi; \theta)
\end{aligned} \qquad (3)$$

where $q_\phi(G \mid x)$ is the regularized posterior distribution over structures, $\tilde{p}_{\phi_0}(G \mid x)$ is the estimated posterior distribution over structures obtained by optimizing formulation (2), $\xi_{i=1:L} \ge 0$ and $\eta \ge 0$ are slack variables with corresponding regularization parameters $C_1$ and $C_2$, and $\epsilon$ is a small positive precision parameter.
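For fixed $\phi$ and $\theta$, the smallest feasible slacks in formulation (3) are $\xi_i = \max(0, c_i - \epsilon)$ and $\eta = \max(0, \mathrm{KL} - \epsilon)$, where $c_i$ is the left-hand side of the $i$-th group constraint, so the constrained objective can be evaluated as a hinge-penalized value. The sketch below computes it, assuming both structure posteriors are given as probability vectors over a finite set of candidate structures for a single input.

```python
import numpy as np

def kl_discrete(q, p):
    """KL[q || p] for probability vectors over the same candidate structures."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0  # terms with q = 0 contribute nothing
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def penalized_objective(task_err, constraint_lhs, q, p_tilde, C1, C2, eps):
    """Value of formulation (3) after substituting the optimal slacks.

    task_err:       value of l_err(phi, theta)
    constraint_lhs: c_i, the left-hand side of each group constraint
    """
    xi = np.maximum(0.0, np.asarray(constraint_lhs, dtype=float) - eps)
    eta = max(0.0, kl_discrete(q, p_tilde) - eps)
    return float(task_err + C1 * xi.sum() + C2 * eta)
```

With $c = (0.05, 0.3)$, $\epsilon = 0.1$ and $C_1 = 2$, only the second group is penalized (by $2 \times 0.2$); the KL term contributes only once $q$ drifts more than $\epsilon$ from $\tilde p$.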

The functions $\Phi_{ij}(G, x_n)$ in formulation (3), whose values are bounded by the slack variables, are FOL constraints. In one example, each constraint function may take a value in the range [0, 1], where a smaller value denotes a better match between the structure $G$ and the input $x_n$ according to the domain knowledge. Note that the constraint functions can form $L$ groups instead of being independent of each other. The $i$-th group may include $T_i$ correlated constraints, which share a common slack variable $\xi_i$.

While the main goal of formulation (3) may be to minimize the task loss $\ell_{err}$, the slack variables $\xi_{i=1:L}$ in the formulation account for the FOL constraints, so that the process of structure generation is regularized with the applied domain knowledge. To reach the minimum of the overall objective, the network 200, 700 can learn to produce structures that satisfy the applied FOL constraints. Furthermore, the KL divergence between $q_\phi(G \mid x)$ and $\tilde{p}_{\phi_0}(G \mid x)$ can be regarded as an additional constraint that prevents the network 200 or 700 from overreacting to the domain knowledge, i.e., from drifting too far from the estimated posterior.

Additionally, one or more further constraints may be added, and one or more of the example constraints described above may be omitted.

FIG. 5 illustrates an example flowchart depicting an optimization process 500 for formulation (3) according to one or more aspects of the present disclosure. For example, process 500 may be performed by network 200, by network 700 (described in detail below), or by other networks, systems, models, or the like.

In block 510, parameters of the PGM 210, 710 and parameters of the modules of the set of modules 220, 720 may be updated alternately by maximizing the evidence of the sets of inputs and the sets of outputs, to obtain an estimated posterior distribution over the combinations of one or more modules of the set of modules 220, 720 and optimized parameters of the modules of the set of modules 220, 720.

In block 520, one or more weights of one or more posterior regularization constraints applied to the estimated posterior distribution over the combinations of one or more modules of the set of modules 220, 720 may be updated to obtain one or more optimal solutions of the one or more weights.

In block 530, the estimated posterior distribution over the combinations of one or more modules of the set of modules 220, 720 may be adjusted by applying the one or more optimal solutions of the one or more weights and the one or more values of the one or more constraints to the estimated posterior distribution.

In block 540, the optimized parameters of the modules of the set of modules 220, 720 may be updated based on the adjusted estimated posterior distribution over the combinations of one or more modules of the set of modules 220, 720, so that they fit the updated structure distribution.

In an example, assuming θ is fixed, the objective of the probabilistic generative model may be given by maximizing the evidence of the observed data samples, which can be written as:

$$
\begin{aligned}
\min_{\phi}\ \ell_{\mathrm{prob}}(\phi,\theta) &:= \sum_{n} -\log p(x_n, y_n)\\
&= \sum_{n} -\big[\log p(x_n) + \log p(y_n \mid x_n)\big]\\
&\approx \sum_{n} -\mathbb{E}_{G\sim q_{\phi}}\big[\log p_{\phi}(x_n \mid G) - \beta \log p_{\phi}(G \mid x_n) + \beta \log p(G) + \gamma \log p_{\mathrm{net}}(y_n \mid x_n, G, \theta)\big]
\end{aligned}
\tag{4}
$$

where γ is a scaling hyperparameter for the prediction likelihood and β is a constant parameter satisfying β > 1. Since ℓ_prob(φ, θ) may not be differentiable through the expectation E_{G∼q_φ}, the REINFORCE algorithm can be applied to obtain an estimated gradient for the updates. Updates of can be calculated directly using gradients.
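As a minimal sketch of the score-function (REINFORCE) estimator mentioned above, assume the PGM is reduced to a categorical distribution q over K enumerated candidate structures, with logits as its parameters. This is a simplification: in the disclosure, q_φ is conditioned on x and produced by an encoder. The estimator uses ∇ E_q[f(G)] = E_q[f(G) ∇ log q(G)] with a sample-mean baseline. The names `reinforce_grad`, `exact_grad` and `loss_per_structure` are hypothetical; the last one stands in for the negated bracketed term of formulation (4).

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def reinforce_grad(logits, loss_per_structure, num_samples=1000, rng=None):
    """Score-function estimate of d/d(logits) E_{G~q}[loss(G)], q = softmax(logits).

    Uses grad log q(k) = onehot(k) - q and a sample-mean baseline; the baseline
    lowers variance at the cost of a small O(1/num_samples) bias.
    """
    rng = np.random.default_rng() if rng is None else rng
    q = softmax(logits)
    k = len(q)
    samples = rng.choice(k, size=num_samples, p=q)   # G_s ~ q
    adv = loss_per_structure[samples]
    adv = adv - adv.mean()                            # subtract the baseline
    # Average of adv_s * (onehot(G_s) - q) over all samples.
    grad = np.bincount(samples, weights=adv, minlength=k) - q * adv.sum()
    return grad / num_samples

def exact_grad(logits, loss_per_structure):
    """Closed form for comparison: q * (loss - E_q[loss])."""
    q = softmax(logits)
    return q * (loss_per_structure - q @ loss_per_structure)
```

Comparing the two functions on a small example shows the estimator concentrating around the exact gradient as `num_samples` grows. This comparison is only possible because K is tiny here. When the structure space is too large to enumerate, only the sampled estimate is available.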

Assuming that the parameters of the PGM 210, 710 have reached the optimum, the optimization over θ reduces to optimizing the execution performance of the network, which can be written as:

$$
\min_{\theta}\ \ell_{\mathrm{err}}(\phi,\theta) = \sum_{D}\sum_{G\sim q_{\phi}}\big[-\log p_{\mathrm{net}}(y_n \mid x_n, G, \theta)\big]
\tag{5}
$$

The gradient ∇_θ ℓ_err(φ, θ) can be estimated stochastically and used for stochastic gradient descent (SGD), with the structure G sampled during training.

Suppose that the results of the above optimization procedure with respect to formulation (2) are denoted by φ₀ and θ₀, and that the estimated posterior distribution of the structures is denoted by p̃_φ₀(G|x). To obtain an approximate solution to formulation (3), φ₀ can be considered fixed, and the objective can be transformed into a RegBayes formulation, which can be written as:

$$
\begin{aligned}
\min_{\phi,\xi,\eta}\ & \mathrm{KL}\big[q_{\phi}(G \mid x)\,\|\,\tilde{p}_{\phi_0}(G \mid x)\big] + C\sum_{i=1}^{L}\xi_i,\\
\text{s.t.}\ & \mathbb{E}_{x_n\in D}\Big|\,\mathbb{E}_{G\sim q_{\phi}}\Big[\sum_{j=1}^{T_i}\Phi_{ij}(G, x_n)\Big]\Big| \le \xi_i + \epsilon,
\end{aligned}
\tag{6}
$$

In one aspect of the present disclosure, a dual problem introduced by convex analysis may be applied to solve formulation (6). By introducing the dual variables µ, an optimal distribution of the RegBayes objective can be obtained as follows:

$$
q_{\phi}(G \mid x; \mu^{*}) = \frac{\tilde{p}_{\phi_0}(G \mid x)}{Z(\mu^{*})}\exp\Big(\sum_{i=1}^{L}\mu_i^{*}\,\Phi_{[i]}^{(D)}(G, x)\Big)
\tag{7}
$$

where Φ_[i]^(D)(G, x) is the grouped summation of the FOL constraints in the i-th group,

$$
\Phi_{[i]}^{(D)}(G, x) := \sum_{j=1}^{T_i}\Phi_{ij}^{(D)}(G, x)
\tag{8}
$$

each Φ_ij^(D)(G, x) is an expected value over the observed samples for the corresponding constraint function,

$$
\Phi_{ij}^{(D)}(G, x) := \mathbb{E}_{x_n\in D}\big[\Phi_{ij}(G, x_n)\big]
\tag{9}
$$

and Z(µ*) is the normalization factor for q_φ, where µ* is the optimal solution to the dual problem:

$$
\max_{\mu}\ L(\mu) = -\log Z(\mu) - \epsilon\sum_{i=1}^{L}\mu_i, \qquad \text{s.t. } |\mu_i| \le C,\ \forall i = 1, 2, \dots, L
\tag{10}
$$

where C and ε are the hyperparameters in formulation (3).
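For a small, enumerable set of candidate structures, formulations (7) to (10) can be evaluated directly. The sketch below rests on several assumptions:

- there are K candidate structures with a fixed prior vector `p_tilde`, which stands in for p̃_φ₀(G|x) at a single input x;
- constraint values are stored in an array of shape (N, K, J);
- groups are given as lists of constraint indices.

All of these names and shapes are illustrative. The normalizer follows from (7) as Z(µ) = Σ_G p̃_φ₀(G|x) exp(Σ_i µ_i Φ_[i]^(D)(G, x)) and is computed in log space for numerical stability.

```python
import numpy as np

def grouped_constraints(phi_values, groups):
    """Eqs. (8)-(9) sketch.

    phi_values: (N, K, J) array of Phi_ij(G_k, x_n) for N observed samples,
                K candidate structures and J constraint functions.
    groups:     list of L lists of constraint indices (group i has T_i members).
    Returns Phi_[i]^(D)(G_k) as an array of shape (L, K).
    """
    phi_d = phi_values.mean(axis=0)  # (K, J): expectation over observed samples, eq. (9)
    return np.stack([phi_d[:, idx].sum(axis=1) for idx in groups])  # (L, K), eq. (8)

def regularized_posterior(p_tilde, phi_grouped, mu):
    """Eq. (7): q(G) = p~(G) exp(sum_i mu_i Phi_[i](G)) / Z(mu). Returns (q, log Z)."""
    log_w = np.log(p_tilde) + mu @ phi_grouped  # (K,) unnormalized log-weights
    m = log_w.max()
    log_z = m + np.log(np.exp(log_w - m).sum())  # log-sum-exp
    return np.exp(log_w - log_z), log_z

def dual_objective(p_tilde, phi_grouped, mu, eps):
    """Eq. (10) as written: L(mu) = -log Z(mu) - eps * sum_i mu_i."""
    _, log_z = regularized_posterior(p_tilde, phi_grouped, mu)
    return -log_z - eps * mu.sum()
```

With µ = 0, the regularized posterior reduces to p̃. A negative µ_i shifts probability mass toward structures with smaller constraint values, which under this document's convention means better agreement with the domain knowledge.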

The optimization of the dual problem (10) can be carried out using an approximate stochastic gradient descent (SGD) method. In particular, the gradient can be approximated as:

$$
\partial_{\mu_i}\log Z(\mu) = \sum_{G} q_{\phi}(G \mid x)\,\Phi_{[i]}^{(D)}(G, x) \approx \hat{\Phi}_{[i]}(G, x), \qquad \forall i = 1, 2, \dots, L
\tag{11}
$$

Here, the first equality follows from duality. The approximation consists of estimating the expected value by Φ̂_[i](G, x), which can be obtained by uniformly sampling the observed samples and computing the constraint function values. In particular, the updates of µ_i can be given by the SGD rule:

$$
\mu_i^{(t+1)} = \mathrm{Proj}_{[-C,\,C]}\Big(\mu_i^{(t)} + r_t\big(-\partial_{\mu_i}\log Z(\mu) + \epsilon\big)\Big)
\tag{12}
$$

where Proj_[-C,C] denotes the Euclidean projection of its input onto [-C, C] and r_t is the step size. After solving for µ*, the regularized posterior distribution of the structures q_φ(G|x) can be given by formulation (7). The module parameters θ can then be further optimized to fit the updated structure distribution.
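The projected update (12), combined with the sampling-based approximation (11), can be sketched as follows. The sketch makes these assumptions:

- the candidate structures are enumerable, so q can be sampled exactly;
- one structure and a small batch of observed inputs are sampled per step;
- the step size follows the hypothetical schedule r_t = lr / sqrt(t + 1);
- the update is implemented exactly as formula (12) is written.

Function and parameter names are illustrative.

```python
import numpy as np

def solve_dual(p_tilde, phi_values, groups, c, eps, steps=2000, lr=0.05, batch=8, seed=0):
    """Projected stochastic updates of the dual variables mu, eqs. (11)-(12).

    p_tilde:    (K,) prior over K enumerated structures (stand-in for p~_{phi_0}(G|x))
    phi_values: (N, K, J) constraint values Phi_ij(G_k, x_n) on N observed samples
    groups:     list of L index lists; group i sums its T_i constraints
    """
    rng = np.random.default_rng(seed)
    n_obs, k, _ = phi_values.shape
    # Per-sample grouped sums, shape (N, L, K).
    grouped = np.stack([phi_values[:, :, idx].sum(axis=2) for idx in groups], axis=1)
    grouped_d = grouped.mean(axis=0)  # Phi_[i]^(D)(G_k), shape (L, K)
    mu = np.zeros(len(groups))
    for t in range(steps):
        # Current regularized posterior q(G) ∝ p~(G) exp(sum_i mu_i Phi_[i](G)), eq. (7).
        log_w = np.log(p_tilde) + mu @ grouped_d
        q = np.exp(log_w - log_w.max())
        q /= q.sum()
        g = rng.choice(k, p=q)                       # sample a structure G ~ q
        xs = rng.integers(0, n_obs, size=batch)      # uniformly sample observed inputs
        grad_log_z = grouped[xs, :, g].mean(axis=0)  # Phi_hat_[i], approximation (11)
        r_t = lr / np.sqrt(t + 1)                    # assumed step-size schedule
        # Update (12); clipping is the Euclidean projection onto [-C, C].
        mu = np.clip(mu + r_t * (-grad_log_z + eps), -c, c)
    return mu
```

If every sampled constraint value exceeds ε, each step decreases µ_i, so it eventually settles at -C. The box constraint |µ_i| ≤ C is what keeps the dual variables bounded.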

In one example, the overall pipeline of the example optimization process 500 may be illustrated as Algorithm 1.

Algorithm 1:

♦ Randomly initialize θ, φ and µ
♦ Repeat until convergence:
- 1) Fix θ and compute the gradient ∇_φ ℓ_prob(φ, θ) to update φ according to formulation (4);
- 2) Fix φ (and hence q_φ) and compute the gradient ∇_θ ℓ_err(φ, θ) to update θ according to formulation (5);
♦ End
♦ Let φ₀ denote the result of the above procedure;
♦ Repeat until convergence:
- 3) Update µ according to the dual problem (10), with the updates given by formulation (12);
♦ End
♦ 4) Compute q_φ(G|x) in formulation (7) with φ₀ and µ*;
♦ Repeat until convergence:
- 5) Compute the gradient ∇_θ ℓ_err(φ, θ) to update θ according to formulation (5);
♦ End

Here, µ can be viewed as the weights of the FOL constraints. In one aspect of the present disclosure, one or more FOL constraints may be grouped into one or more groups of FOL constraints, and the constraints in a group may together correspond to a single weight. As illustrated in step 3) of Algorithm 1, the optimization process 500 may need to perform multiple iterations to update each of the weights until convergence. Grouping the FOL constraints reduces the number of weights, which can accordingly save computing resources.
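The three stages of Algorithm 1 can be illustrated on a deliberately tiny, hypothetical model. It has K enumerated structures, a PGM reduced to logits φ with q_φ = softmax(φ), and one scalar module parameter θ_k per structure whose error is (θ_k − target_k)². The sketch also makes several simplifications for brevity:

- stage 1 keeps only the prediction term of formulation (4);
- exact expectations replace REINFORCE;
- a single constraint group is used;
- zeros replace the random initialization.

These are not the disclosed training procedure.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def algorithm1_toy(targets, phi_con, c=1.0, eps=0.05, lr=0.1, iters=500):
    """Toy three-stage sketch of Algorithm 1 with K enumerated structures.

    Hypothetical model: q_phi(G = k) = softmax(phi)[k]; the network built from
    structure k predicts theta[k] and incurs err_k = (theta[k] - targets[k])**2.
    phi_con[k] is one grouped constraint value Phi_[1](G_k) (smaller = better).
    """
    k = len(targets)
    phi = np.zeros(k)    # PGM parameters (zeros instead of random init)
    theta = np.zeros(k)  # module parameters
    # Steps 1-2: alternate gradient updates of phi and theta.
    for _ in range(iters):
        q = softmax(phi)
        err = (theta - targets) ** 2
        phi -= lr * q * (err - q @ err)          # exact grad of E_q[err] w.r.t. logits
        theta -= lr * q * 2 * (theta - targets)  # grad of E_q[err] w.r.t. theta
    p_tilde = softmax(phi)                       # estimated posterior p~_{phi_0}
    # Step 3: projected updates of mu per (12), with the exact E_q[Phi] in (11).
    mu = 0.0
    for t in range(iters):
        q = softmax(np.log(p_tilde) + mu * phi_con)
        mu = np.clip(mu + lr / np.sqrt(t + 1) * (-(q @ phi_con) + eps), -c, c)
    # Step 4: regularized posterior, formulation (7).
    q = softmax(np.log(p_tilde) + mu * phi_con)
    # Step 5: refit the module parameters under the regularized posterior.
    for _ in range(iters):
        theta -= lr * q * 2 * (theta - targets)
    return p_tilde, mu, q, theta
```

Here the constraint group prefers structure 1 (phi_con = 0.1), while stage 1 alone prefers structure 0 (lowest initial error). After stage 2, the regularized posterior up-weights structure 1 relative to p̃, which is the intended effect of the domain-knowledge constraint.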

In another aspect of the present disclosure, a value of a FOL constraint may be determined based on a correlation between a set of inputs and a module in a combination of one or more modules of the set of modules, where the combination is generated according to the estimated posterior distribution given the set of inputs. For example, the correlation may refer to whether the semantic representation of a module in a structure generated according to the estimated posterior distribution (e.g., given x_n and φ₀) can be found in S(x_n), as illustrated by formulation (1).
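One hypothetical way to turn this correlation into a constraint value in [0, 1] is to count the fraction of modules in a sampled structure whose semantic representation is absent from S(x_n). This matches the convention stated for formulation (3) that a smaller value means a better match. Formulation (1) is not reproduced in this section, so both S(x_n) and the module semantics are modeled here simply as attribute labels; this is an assumption for illustration.

```python
def fol_constraint_value(module_semantics, s_x):
    """Hypothetical constraint Phi(G, x_n) in [0, 1].

    module_semantics: semantic labels of the modules in a sampled structure G
    s_x:              set S(x_n) of semantics supported by the input x_n
    Returns the fraction of modules whose semantics are not found in S(x_n);
    0.0 means every module agrees with the domain knowledge.
    """
    if not module_semantics:
        return 0.0
    misses = sum(1 for s in module_semantics if s not in s_x)
    return misses / len(module_semantics)
```

For instance, a structure whose modules operate on "color" and "position", evaluated against an input supporting only {"color", "size"}, would receive the value 0.5.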

FIG. 6 shows an example flowchart illustrating a method 600 for performing an abstract visual reasoning task with a probabilistic neural-symbolic model regularized with domain knowledge, in accordance with one or more aspects of the present disclosure. For example, method 600 may be performed by network 200 or network 700, which are described in detail below. Method 600 may also be carried out by other networks, systems, or models.

In block 610, the network 200, 700 may be provided with a set of input images and a set of candidate images.

In block 620, a combination of one or more modules of the set of modules 220, 720 may be generated based on the set of input images and a posterior distribution over combinations of one or more modules of the set of modules 220, 720. The posterior distribution is formulated by the PGM 210, 710, which has been trained with domain knowledge applied as one or more posterior regularization constraints. In an example, the training may be performed according to method 400 described above with reference to FIG. 4.

In block 630, the set of input images and the set of candidate images may be processed by the generated combination of one or more modules of the set of modules 220, 720.

In block 640, a candidate image may be selected from the set of candidate images based on a score of each candidate image in the set, the score being estimated by the processing.
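Blocks 620 to 640 can be sketched as follows, under these assumptions:

- the posterior is given as probabilities over a small list of enumerated structures, and the most probable structure is taken (sampling would also be possible);
- `execute` is a hypothetical callable that runs the composed modules on the input panel features;
- the score is a dot-product energy between the predicted representation and each candidate, where higher is better, in line with the energy-based scoring of subnetwork 860 described for FIG. 8.

```python
import numpy as np

def infer(posterior, structures, inputs, candidates, execute):
    """Sketch of method 600, blocks 620-640.

    posterior:  (S,) probabilities over S enumerated structures
    structures: list of S structure descriptions
    inputs:     (P, D) features of the P context panels
    candidates: (M, D) features of the M candidate panels
    execute:    hypothetical callable (structure, inputs) -> (D,) prediction
    """
    g = structures[int(np.argmax(posterior))]  # block 620: pick a structure
    predicted = execute(g, inputs)             # block 630: run composed modules
    scores = candidates @ predicted            # block 640: energy-style scores
    return int(np.argmax(scores)), scores

def toy_execute(structure, inputs):
    """Toy stand-in for composed modules: 'add' sums panels, 'and' takes the minimum."""
    if structure == "add":
        return inputs.sum(axis=0)
    return inputs.min(axis=0)
```

In this design the scoring is kept separate from structure selection, so a different energy function can be swapped in without touching blocks 620 and 630.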

FIG. 7 illustrates another example network 700 in which aspects of the present disclosure may be performed. Network 700 may be an example of network 200 as illustrated in FIG. 2. For example, the network 700 may include a probabilistic generative model (PGM) 710 and a set of modules 720, such as an inventory of reusable modules. The PGM 710 and the set of modules 720 may be examples of the PGM 210 and the set of modules 220, respectively.

Each module of the set of modules 720 may have a processing type, which may be predefined, for evaluating whether the panels satisfy a specific relationship. The processing types may include the operators logical AND, logical OR, logical XOR, arithmetic ADD, arithmetic SUB, arithmetic MUL, and the like. Additionally, each module of the set of modules 720 may include one or more trainable parameters for focusing that module on one or more variable image attributes. For example, a module may have a logical-AND type and, via trainable parameters trained on a data set, focus on different image attributes: depending on the trained values of those parameters, it may perform a logical AND between line colors or a logical AND between shape positions.

In one aspect of the present disclosure, each module of the set of modules 720 may be configured to perform a predesigned operation on one or more variable image attributes, and the one or more variable image attributes may result from processing an input image feature map with at least one trainable parameter. For example, a module with a logical-AND type can be represented as follows:

$$
f_{\mathrm{AND}}(d, e) = (W_d \cdot d) \wedge (W_e \cdot e)
$$

where d and e are input panel features, and W_d and W_e are trainable parameters for focusing on a specific panel attribute.
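The formula leaves open how ∧ is evaluated on continuous features. A minimal sketch, assuming a soft AND realized as the product of sigmoid-squashed projections (a product t-norm), shows how the same AND-type module can attend to different attributes purely through W_d and W_e. The two-attribute feature layout [color, position] is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f_and(d, e, w_d, w_e):
    """Soft f_AND(d, e) = (W_d·d) ∧ (W_e·e), with ∧ as a product t-norm (assumption)."""
    return sigmoid(w_d @ d) * sigmoid(w_e @ e)

# Hypothetical panel features laid out as [color, position].
d = np.array([5.0, -5.0])
e = np.array([4.0, -6.0])
w_color = np.array([[1.0, 0.0]])  # projects onto the color slot
w_pos = np.array([[0.0, 1.0]])    # projects onto the position slot

print(f_and(d, e, w_color, w_color))  # both colors "on": close to 1
print(f_and(d, e, w_pos, w_pos))      # both positions "off": close to 0
```

Other operator types (OR, XOR, ADD, and so on) could be sketched the same way by replacing the combining function while keeping the trainable projections.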

In one aspect of the present disclosure, an image panel attribute may include any attribute that may be present in an image. In another aspect of the present disclosure, with the aid of domain knowledge, the one or more variable image attributes may include shape, line, size, type, color, position, number, or the like, based at least in part on triples T[r, o, a], on which the constraints may depend.

In one aspect of the present disclosure, PGM 710 may be configured to output a posterior distribution over structures 730 of modularized networks composed from the set of modules 720, where the structures 730 may identify the types of the composed modules and the connections between them. The one or more variable image attributes 740 of each module may be determined by training the at least one trainable parameter. Generating the structures 730 (e.g., by the PGM 710) separately from the variable image attributes 740 (e.g., based on the trainable parameters) may give the network 700 more flexibility in abstracting high-level concepts and in representation learning.

FIG. 8 shows an example diagram illustrating how method 400, optimization process 500, or method 600 may be performed by a network 800 in accordance with one or more aspects of the present disclosure. For example, network 800 may be an example of network 200 or network 700. For example, a VAE comprising an encoder 810-1 and a decoder 810-2 may be an example of the PGM 210 or 710. The set of modules 820 may be an example of the set of modules 220, 720 and may form structures G = (v, A).

The subnetwork 860 can be used to compute a score for each candidate image panel. The score indicates a degree of correlation between that candidate panel and the result of processing a set of inputs with a generated modularized network having structure G = (v, A). For example, the score can be computed based on various metrics, such as an energy function, where a higher energy may indicate a better correlation. The posterior distribution unit 850 may store parameters of a posterior distribution that is output by the encoder 810-1 and from which a structure can be generated, e.g., by sampling according to those parameters.

In one example, method 400 may begin by providing the network 800 with sets of inputs and sets of outputs (e.g., via route 1). Each set of inputs (e.g., X₁, the 3 × 3 panels in FIG. 8) is mapped to one of a set of outputs (e.g., the first panel in the first row of Y₁ in FIG. 8) that corresponds to the set of inputs, based on visual information about the set of inputs. The network 800 includes a probabilistic generative model (PGM) (e.g., an encoder 810-1 and a decoder 810-2) and a set of modules 820.

The encoder 810-1 can map, or encode, the set of inputs X₁ into distribution parameters (e.g., λ₁ and σ₁, assuming p(G|x) ~ N(λ, σ)) for one or more variables, based on which a structure G = (v, A) can be generated. In the examples of FIGS. 3A and 3B, there are 20 variables in total: the 16 entries of the 4 × 4 adjacency matrix plus the 4 vertices.

The sets of inputs X₁ and/or outputs Y₁ can be provided via route 2 to, and processed by, the generated modularized network with structure G = (v, A). The subnetwork 860 may use the processed inputs X₁ and outputs Y₁ to compute the score of the correct output (e.g., the first panel in the first row of Y₁ in FIG. 8) via routes 3 and 5.
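The encoding step can be sketched with the reparameterization trick. The 20 latent variables (16 adjacency entries plus 4 vertices) follow the example above. The following parts are illustrative assumptions, not the disclosed decoding:

- the decoding rule, which thresholds the adjacency entries at zero and rounds the vertex entries to module-type indices;
- the module-type list, drawn from the operators named for the set of modules 720.

```python
import numpy as np

N_NODES = 4  # 4 vertices + a 4x4 adjacency matrix -> 20 latent variables
MODULE_TYPES = ["AND", "OR", "XOR", "ADD", "SUB", "MUL"]  # assumed type inventory

def sample_structure(lam, sigma, rng):
    """Draw z ~ N(lam, sigma^2) via z = lam + sigma * noise, then decode G = (v, A).

    Hypothetical decoding: the first 16 entries, thresholded at 0, give the
    adjacency matrix A; the last 4 entries are rounded and clipped to
    module-type indices for the vertices v.
    """
    z = lam + sigma * rng.standard_normal(lam.shape)
    adj = (z[: N_NODES * N_NODES] > 0).astype(int).reshape(N_NODES, N_NODES)
    raw = z[N_NODES * N_NODES :]
    v = np.clip(np.round(raw), 0, len(MODULE_TYPES) - 1).astype(int)
    return v, adj
```

In a trained model, `lam` and `sigma` would be the encoder outputs stored by the posterior distribution unit 850. A differentiable relaxation of the thresholding would be needed if gradients had to flow through the decoding, which is one reason the score-function estimator is used for φ.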

Method 400 may repeat the procedure described with reference to the inputs X₁ and outputs Y₁, e.g., with X₂, Y₂, X₃, Y₃, ..., X_n, Y_n. The parameters φ, θ of the encoder 810-1, the decoder 810-2, and the modules of the set of modules 820 can be updated according to the optimization process 500 described above with reference to FIG. 5. This yields the estimated posterior distribution of structures, denoted p̃_φ₀(G|x). Furthermore, the optimal weights µ* can be obtained according to the optimization process 500 and used to compute the regularized posterior distribution of structures, e.g., via route 6.

Preferably, the parameters ϑ of the modules of the set of modules 820 may further be updated to fit the updated regularized posterior distribution of structures.

In one aspect of the present disclosure, the decoder 810-2 may be used for backpropagation, e.g., via Route 4. In another aspect of the present disclosure, the decoder 810-2 may be omitted.

In one example, the method 600 may be performed as an inference process after the network 800 has been trained according to the method 400 and/or the optimization process 500.
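The inference flow (see also the method for selecting a candidate image in the claims) can be sketched as follows: pick a structure from the trained, regularized posterior, run the composed modules on the context and on each candidate, and return the highest-scoring candidate. Using the most probable (MAP) structure rather than sampling is an assumption. `score_fn` is a hypothetical stand-in for processing through the modules and the subnetwork 860.

```python
from typing import Any, Callable, Dict, Hashable, Sequence


def select_candidate(
    posterior: Dict[Hashable, float],
    inputs: Any,
    candidates: Sequence[Any],
    score_fn: Callable[[Hashable, Any, Any], float],
) -> int:
    """Return the index of the highest-scoring candidate image."""
    if not candidates:
        raise ValueError("need at least one candidate")
    # Most probable structure under the (regularized) posterior; sampling is an alternative.
    structure = max(posterior, key=posterior.get)
    # Score each candidate by running the composed modules on context + candidate.
    scores = [score_fn(structure, inputs, c) for c in candidates]
    # Ties resolve to the earliest candidate.
    return max(range(len(scores)), key=scores.__getitem__)
```

In a real network, `score_fn` would be the learned scoring path. Here, any function returning a float per candidate is enough to exercise the selection logic.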

A person skilled in the art will understand that the posterior distribution 850 and/or the subnetwork 860 may be integrated into one or more parts of the network 800, rather than being illustrated as separate parts in FIG. 8, depending on design preference and/or a specific implementation, without departing from the present disclosure.

FIG. 9 illustrates an example of a hardware implementation for a device 900 according to an embodiment of the present disclosure. The device 900 for visual reasoning may include a memory 910 and at least one processor 920.

The processor 920 may be coupled to the memory 910 and configured to perform the method 400, the optimization process 500, and the method 600 described above with reference to FIGS. 4, 5, and 6. The processor 920 may be a general-purpose computer or may be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. The memory 910 may store the input data, output data, data generated by the processor 920, and/or instructions executed by the processor 920.

The various operations, models, and networks described herein in connection with the disclosure may be implemented in hardware, software executed by a processor, firmware, or any combination thereof. According to one embodiment of the disclosure, a computer program product for visual reasoning may include computer code executable by a processor for performing the method 400, the optimization process 500, and the method 600 described above with reference to FIGS. 4, 5, and 6. According to another embodiment of the disclosure, a computer-readable medium may store computer code for visual reasoning which, when executed by a processor, may cause the processor to perform the method 400, the optimization process 500, and the method 600 described above with reference to FIGS. 4, 5, and 6. Computer-readable media include both non-transitory computer-readable storage media and communication media, including any medium that facilitates the transfer of a computer program from one place to another. Any connection may be termed a computer-readable medium. Other embodiments and implementations are within the scope of the disclosure.

The foregoing description of the disclosed embodiments is provided to enable a person skilled in the art to make or use the various embodiments. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the scope of the various embodiments. Thus, the claims are not intended to be limited to the embodiments shown herein, but are to be accorded the broadest scope consistent with the following claims and the principles and novel features disclosed herein.

Claims

A method for visual reasoning, comprising: providing a network having sets of inputs and sets of outputs, wherein each set of inputs of the sets of inputs is mapped to one of a set of outputs corresponding to the set of inputs based on visual information about the set of inputs, and wherein the network comprises a probabilistic generative model (PGM) and a set of modules; determining, by the PGM, a posterior distribution over combinations of one or more modules of the set of modules based on the sets of inputs and the sets of outputs; and applying domain knowledge as one or more posterior regularization constraints to the determined posterior distribution.

The method according to claim 1, wherein the one or more posterior regularization constraints are grouped into one or more groups of constraints according to one or more aspects of the domain knowledge.

The method according to claim 2, wherein the one or more aspects of the domain knowledge include one or more of logical reasoning, temporal reasoning, spatial reasoning, or arithmetic reasoning.

The method according to claim 1, wherein the one or more posterior regularization constraints are one or more first-order logic (FOL) constraints.

The method according to claim 4, wherein the one or more FOL constraints are generated based on at least one of relation types, object types, or attribute types of the sets of inputs.

The method according to claim 1, wherein each of the combinations of one or more modules of the set of modules comprises a modularized network, the modularized network being composed of one or more modules of the set of modules with a structure indicating the composed one or more modules and the connections therebetween.

The method according to claim 6, further comprising: determining, by the PGM, a posterior distribution over structures of modularized networks based on the provided sets of inputs and sets of outputs.

The method according to claim 6, wherein each module of the set of modules includes at least one trainable parameter for focusing that module on one or more variable image properties and is configured to apply a predefined type of processing to the one or more variable image properties; and wherein the method further comprises: determining, by the PGM, a posterior distribution over structures of modularized networks, the structures indicating the types of the composed one or more modules and the connections therebetween, based on the provided sets of inputs and sets of outputs.

The method according to claim 1, further comprising optimizing the network by: updating parameters of the PGM and parameters of the modules of the set of modules alternately by maximizing the evidence of the sets of inputs and the sets of outputs, to obtain an estimated posterior distribution over the combinations of one or more modules of the set of modules and optimized parameters of the modules of the set of modules; updating one or more weights of the one or more posterior regularization constraints applied to the estimated posterior distribution over the combinations of one or more modules of the set of modules, to obtain one or more optimal solutions for the one or more weights; adjusting the estimated posterior distribution over the combinations of one or more modules of the set of modules by applying the one or more optimal solutions of the one or more weights and one or more values of the one or more constraints to the estimated posterior distribution; and updating the optimized parameters of the modules based on the adjusted estimated posterior distribution over the combinations of one or more modules of the set of modules.

The method according to claim 9, wherein the one or more posterior regularization constraints are grouped into one or more groups of constraints, and a group of constraints corresponds to one weight.

The method according to claim 9, wherein a value of a constraint is determined based on a correlation between a set of inputs and a module in a combination of one or more modules of the set of modules, the combination being generated according to the estimated posterior distribution given the set of inputs.

A method for visual reasoning with a network, the network comprising a probabilistic generative model (PGM) and a set of modules, the method comprising: providing the network with a set of input images and a set of candidate images; generating a combination of one or more modules of the set of modules based on the set of input images and a posterior distribution over combinations of one or more modules of the set of modules, wherein the posterior distribution is that of the PGM trained under domain knowledge formulated as one or more posterior regularization constraints; processing the set of input images and the set of candidate images through the generated combination of one or more modules; and selecting a candidate image from the set of candidate images based on a score of each candidate image in the set of candidate images estimated by the processing.

A visual reasoning device, comprising: a memory; and at least one processor coupled to the memory and configured to carry out the method according to any one of claims 1 to 12.

A computer program product for visual reasoning, comprising computer code executable by a processor for performing the method according to any one of claims 1 to 12.

A computer-readable medium storing computer code for visual reasoning, the computer code, when executed by a processor, causing the processor to carry out the method according to any one of claims 1 to 12.

A visual reasoning network, comprising: a set of modules, each module of the set of modules being implemented as a neural network and having at least one trainable parameter for focusing that module on one or more variable image properties; and a probabilistic generative model (PGM) coupled to the set of modules, the PGM being configured to output a posterior distribution over combinations of one or more modules of the set of modules.

The network according to claim 16, wherein each module of the set of modules is configured to perform a predefined type of processing on the one or more variable image properties, and the one or more variable image properties result from processing an image feature map with the at least one trainable parameter.

The network according to claim 17, wherein the one or more variable image properties include one or more of shape, line, size, type, color, position, or number, and the predefined type of processing is a logical AND, logical OR, logical XOR, arithmetic ADD, arithmetic SUB, arithmetic MUL, spatial STRUC, temporal PROG, or temporal ID.
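The predefined processing types listed in the last claim can be illustrated as simple rules checked along a row of three panels. This is a hedged sketch, not the disclosed modules. Each panel's attribute is assumed to be already extracted (a set of elements for the logical types, a number for the arithmetic and temporal types). The helper names are hypothetical. STRUC is left out because its semantics are not defined in this section.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

Panel = FrozenSet[str]

# Logical types: the third panel's element set is derived from the first two.
LOGICAL_OPS: Dict[str, Callable[[Panel, Panel], Panel]] = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

# Arithmetic types: the third panel's value (e.g., a count) is derived from the first two.
ARITH_OPS: Dict[str, Callable[[float, float], float]] = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "MUL": lambda a, b: a * b,
}


def holds_logical(op: str, row: Tuple[Panel, Panel, Panel]) -> bool:
    a, b, c = row
    return LOGICAL_OPS[op](a, b) == c


def holds_arith(op: str, row: Tuple[float, float, float]) -> bool:
    a, b, c = row
    return ARITH_OPS[op](a, b) == c


def holds_temporal(op: str, row: Sequence[float]) -> bool:
    if op == "ID":    # attribute stays the same along the row
        return len(set(row)) == 1
    if op == "PROG":  # attribute changes by a constant, nonzero step
        steps = {row[i + 1] - row[i] for i in range(len(row) - 1)}
        return len(steps) == 1 and 0 not in steps
    raise ValueError(f"unknown temporal type: {op}")
```

In the network described above, these rules would be applied by learned modules to attention-selected image properties rather than to hand-extracted values. The sketch only shows what each processing type computes.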