DE102017219402A1 - DEVICE AND METHOD FOR DETECTING AND TRACKING OBJECTS IN A VIDEO SEQUENCE

Info

Publication number: DE102017219402A1
Application number: DE102017219402.2A
Authority: DE
Inventors: Sebastian Bullinger; Christoph Bodensteiner; Michael Arens
Original assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Current assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date: 2017-10-27
Filing date: 2017-10-27
Publication date: 2019-05-02
Also published as: WO2019081713A1

Abstract

Apparatus for detecting and tracking objects in a video sequence of temporally successive images consisting of pixels, the apparatus comprising:
a segmentation stage for segmenting at least a first image and a temporally subsequent second image;
a correspondence determination stage for determining a plurality of correspondence pairs, each comprising one pixel of the first image and a corresponding pixel of the second image;
an assignment stage for assigning, on the basis of displacement vectors, the segmentation instance information and the set of probability information of at least one pixel of the first image to a respective pixel of the second image;
an object identifier assignment stage for assigning the pixels of one of the segmentation instances of the first image and the pixels of one of the segmentation instances of the second image to a common object identifier, provided that the respective segmentation instance of the first image and the segmentation instance of the second image depict the same object;
a fusion stage for fusing the set of probability information from the first image that is assigned to a respective pixel of the second image with the set of probability information of that pixel of the second image; and
an object category determination stage for determining the respective object category of the segmentation instances.

Description

The present invention relates to an apparatus and a method for detecting and tracking objects in a video sequence of temporally successive images consisting of pixels.

References [1], [2] and [3] disclose devices for detecting and tracking objects on the basis of so-called bounding-box approaches. The methods used consist of several stages. In the first stage, object depictions, each representing one object, are determined in the individual images. The region that an object depiction occupies in the image is defined by a bounding rectangle (bounding box). In the second stage, the object depictions from successive images are linked to one another. The linking may be based on a spatial relationship and/or on visual similarity. The spatial relationship can be established, for example, by propagating an object depiction from the current image into the subsequent image using a Kalman filter or optical flow.

A further device for detecting and tracking objects, based on so-called segmentation approaches, is known from reference [4]. There, in a first step, a first frame and a second frame are segmented such that a segmentation instance is formed for each object depicted in the respective frame, the pixels belonging to each segmentation instance being determined and assigned to a semantic object category. In a second step, the segmentation instances of the first frame are mapped into the second frame by means of optical flow, and in a third step the mapped segmentation instances of the first frame are assigned to corresponding segmentation instances of the second frame by means of an affinity matrix.

The object of the present invention is to provide an improved apparatus and an improved method for detecting and tracking objects.

In a first aspect, the object is achieved by an apparatus for detecting and tracking objects in a video sequence of temporally successive images consisting of pixels, the apparatus comprising:

a segmentation stage for segmenting at least a first image of the images and a temporally subsequent second image of the images, wherein the segmentation stage has a plurality of predefined semantic object categories, and wherein the segmentation stage is designed such that, when one of the images is segmented,
for each object depicted in the respective image, a segmentation instance valid for the respective image is formed, and
for a plurality of the pixels of the respective image, a segmentation instance information item and a set of probability information are determined in each case, wherein the respective segmentation instance information indicates to which of the segmentation instances of the respective image the respective pixel belongs, and wherein each item of probability information in the respective set indicates a probability that the respective pixel belongs to one of the object categories;
a correspondence determination stage for determining a plurality of correspondence pairs, each comprising one pixel of one of the segmentation instances of the first image and a corresponding pixel of the second image, wherein a displacement vector between the respective pixel of the first image (BI1) and the respective corresponding pixel of the second image is determined from each of the correspondence pairs;
an assignment stage for assigning, on the basis of the displacement vectors, the segmentation instance information and the set of probability information of at least one pixel of the first image to a respective pixel of the second image, so that a segmentation instance information item from the first image and one of the sets of probability information from the first image are assigned to the respective pixel of the second image;
an object identifier assignment stage for assigning the pixels of one of the segmentation instances of the first image and the pixels of one of the segmentation instances of the second image to a common object identifier, provided that the respective segmentation instance of the first image and the segmentation instance of the second image depict the same object, wherein the object identifier uniquely identifies the respective object across the video sequence, wherein the pixels of the respective segmentation instance of the first image are assigned on the basis of the segmentation instance information of the respective pixel, and wherein the pixels of the respective segmentation instance of the second image are assigned on the basis of the segmentation instance information of the respective pixel and the segmentation instance information from the first image assigned to the respective pixel;
a fusion stage for fusing the set of probability information from the first image that is assigned to a respective pixel of the second image with the set of probability information of that pixel of the second image, so as to determine a fused set of probability information for each of one or more pixels of the second image; and
an object category determination stage for determining the respective object category of those segmentation instances that are assigned to one of the object identifiers, on the basis of the fused sets of probability information of the pixels of the second image assigned to the respective object identifier.

An object is understood here to be an element of a depicted scene. An object generally stands out from the background. It may be stationary or movable relative to the background.

Detecting an object here means determining the type of the respective object. Tracking an object means determining changes in the position of the respective object within the respective scene over time.

To enable the detection and tracking of objects, the at least two images of the video sequence are fed to the segmentation stage and segmented separately from one another. Semantic object categories, which meaningfully describe the type of an object, are stored in the segmentation stage. They are thus descriptive names for the type of an object. Examples of object categories are "pedestrian", "car", "tree", and so on.

When one of the images is segmented, a segmentation instance defined for that image is formed for each object visible in the image. A segmentation instance is thus a representative of an object that has not yet been recognized. Each pixel of the image that can be assigned to one of the segmentation instances formed is given a segmentation instance information item indicating to which of those segmentation instances the pixel belongs. In addition, each assignable pixel is given a set of probability information, each item in the set indicating a probability that the pixel belongs to one of the object categories. The set of probability information may be a vector whose length equals the number of stored object categories. For example, the set of probability information may express that the respective pixel belongs to the object category "pedestrian" with a probability of 50 %, to the object category "tree" with a probability of 20 %, and to the category "car" with a probability of 30 %. It is not necessary, however, for the probabilities to be given on a normalized scale. Nor is it necessary for the probabilities to be determined numerically. For example, the probabilities may be given on a nominal scale with values such as "very low", "low", "medium", "high" and "very high".
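The per-pixel data described above can be pictured with a small Python sketch. It is only an illustration under assumptions: the category tuple, the record layout and the helper name `most_likely_category` are hypothetical and do not come from the patent.

```python
# Hypothetical category list; the text names these only as examples.
CATEGORIES = ("pedestrian", "tree", "car")


def most_likely_category(prob_set):
    """Return the category with the highest score in one pixel's probability set.

    The scores need not sum to 1, since the text does not require a
    normalized scale; only their relative order matters here.
    """
    if len(prob_set) != len(CATEGORIES):
        raise ValueError("expected one entry per stored object category")
    best_index = max(range(len(prob_set)), key=lambda i: prob_set[i])
    return CATEGORIES[best_index]


# Assumed per-pixel record: instance index within the image plus probability set.
pixel = {"instance": 1, "probs": [0.5, 0.2, 0.3]}
print(most_likely_category(pixel["probs"]))  # prints "pedestrian"
```

The vector has one entry per stored category, which matches the statement that its length equals the number of object categories.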

The segmentation stage may be designed as described in references [4], [5] and [6], and in particular may be designed to carry out the segmentation methods described in more detail there, since these segmentation methods provide both the required segmentation instance information and the sets of probability information for the assignable pixels.

The correspondence determination stage, which determines correspondence pairs containing corresponding pixels from successive images, serves the later merging of segmentation instance information from different successive images and the merging of sets of probability information from different successive images. The determination of corresponding pixels from different successive images may be based, for example, on corresponding colors or on corresponding spatial arrangements. The displacement vectors, which shift the pixels of the preceding image onto the corresponding pixels of the current image, account both for movement in the scene of the object underlying the pixel and for movement of the camera with which the scene is recorded. A predefined motion model is not required for this.
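As a minimal sketch of how a displacement vector follows from a correspondence pair, assuming pixels are given as integer `(x, y)` tuples (a representation chosen here for illustration, not taken from the patent):

```python
def displacement_vectors(pairs):
    """Map each first-image pixel to the vector that shifts it onto its partner.

    `pairs` is an iterable of ((x1, y1), (x2, y2)) correspondence pairs.
    How the pairs are found (color, spatial arrangement, optical flow)
    is outside the scope of this sketch.
    """
    return {p1: (p2[0] - p1[0], p2[1] - p1[1]) for p1, p2 in pairs}


# One pixel moved 3 columns to the right and 1 row down between the images.
vectors = displacement_vectors([((2, 3), (5, 4))])
print(vectors)  # prints {(2, 3): (3, 1)}
```

Because the vector is simply the difference of the two pixel positions, it contains object motion and camera motion together, without any motion model.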

The assignment stage serves to assign to pixels of one of the segmentation instances of the second image, on the basis of the displacement vectors, a segmentation instance information item and a set of probability information of a pixel of the first image in each case. This yields pixels of the second image to which not only a segmentation instance information item and a set of probability information from the second image are assigned, but also a segmentation instance information item and a set of probability information from the first image.
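The assignment step can be sketched as carrying each first-image record along its displacement vector. The dictionary layout and the function name `propagate` are assumptions made for this illustration; in particular, the patent does not say what happens when two pixels land on the same target, so here the later one simply overwrites the earlier one.

```python
def propagate(first_pixels, vectors):
    """Carry first-image records to second-image pixel positions.

    first_pixels: {(x, y): {"instance": int, "probs": list}} for the first image
    vectors:      {(x, y): (dx, dy)} displacement per first-image pixel
    Returns {(x + dx, y + dy): record}, keyed by second-image position.
    """
    carried = {}
    for (x, y), (dx, dy) in vectors.items():
        record = first_pixels.get((x, y))
        if record is not None:
            # Simplification: a later pixel overwrites an earlier one on collision.
            carried[(x + dx, y + dy)] = record
    return carried


first = {(2, 3): {"instance": 1, "probs": [0.5, 0.2, 0.3]}}
print(propagate(first, {(2, 3): (3, 1)}))
```

After this step a second-image pixel can hold two records: its own from the segmentation of the second image, and the carried one from the first image.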

The object identifier assignment stage serves to link segmentation instances of the first image and segmentation instances of the second image that depict the same object, so as to track the respective object. For this purpose, segmentation instances of the first image and of the second image that belong together are assigned to a common object identifier, which uniquely identifies the respective object across the video sequence.

The following example illustrates the principle of the assignment. Assume that the segmentation stage has formed two segmentation instances in the first image and has assigned them the identifiers "1, 1" and "1, 2", where the first number expresses membership of the first image and the second number is a segmentation instance index that is unique within the first image. Assume further that the segmentation stage has formed three segmentation instances in the second image and has assigned them the identifiers "2, 1", "2, 2" and "2, 3", where the first number expresses membership of the second image and the second number is a segmentation instance index that is unique within the second image.

The pixels of the segmentation instances with the identifiers "1, 1" and "1, 2" of the first image can now be assigned, depending on their segmentation instance information from the first image, to a first object identifier "1" or to a second object identifier "2". In the example, the segmentation instance with the identifier "1, 1" is assigned to the object identifier "1" and the segmentation instance with the identifier "1, 2" is assigned to the object identifier "2".

The assignment of the segmentation instance of the second image with the identifier "2, 1" is based on an examination of the pixels assigned to that segmentation instance. If the segmentation instance with the identifier "2, 1" depicts one of the objects depicted in the first image, then segmentation instance information of the first image is assigned to at least some of the pixels of the segmentation instance with the identifier "2, 1". From this segmentation instance information of the first image, the pixels of the segmentation instance with the identifier "2, 1" can then be assigned to the object identifier "1" or to the object identifier "2". If, for example, the assigned segmentation instance information of the first image states that the pixels belong to the segmentation instance with the identifier "1, 2", then the segmentation instance with the identifier "2, 1" can likewise be assigned to the object identifier "2". In this way it can be established that the segmentation instance with the identifier "2, 1" depicts the same object as the segmentation instance with the identifier "1, 2".

Ideally, all segmentation instance information of the first image that is assigned to the pixels of one of the segmentation instances of the second image belongs to the same segmentation instance of the first image. In this case, the assignment of the respective segmentation instance of the second image to one of the object identifiers is particularly reliable. If, however, the segmentation instance information of the first image that is assigned to the pixels of one of the segmentation instances of the second image belongs to different segmentation instances of the first image, the assignment to one of the object identifiers can be made on the basis of the majority of the segmentation instance information of the first image. The assignment to one of the object identifiers is then, however, subject to a certain uncertainty.

Assume further that no segmentation instance information of the first image is assigned to the pixels of the segmentation instance with the identifier "2, 2". In this case it can be assumed that the segmentation instance with the identifier "2, 2" depicts none of the objects depicted in the first image, and a new object identifier "3" can then be issued.

The pixels of the segmentation instance with the identifier "2, 3" can now be examined for the presence of assigned segmentation instance information from the first image. If such information is present and belongs to the segmentation instance with the identifier "1, 1", then the segmentation instance with the identifier "2, 3" can be assigned to the object identifier "1".
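The worked example above, including the majority rule and the issuing of a new identifier, can be reproduced with the following sketch. The function name, the dictionary layout and the tie-breaking behavior (the first-counted instance wins a tie) are assumptions of this illustration, not details given in the patent.

```python
from collections import Counter


def assign_object_ids(second_instances, carried_instances, first_ids, next_id):
    """Assign an object identifier to every segmentation instance of the second image.

    second_instances:  {pixel: instance index in the second image}
    carried_instances: {pixel: first-image instance index carried to that pixel}
    first_ids:         {first-image instance index: object identifier}
    next_id:           first object identifier that is still unused
    """
    votes = {}
    for pixel, inst2 in second_instances.items():
        counter = votes.setdefault(inst2, Counter())
        if pixel in carried_instances:
            counter[carried_instances[pixel]] += 1

    result = {}
    for inst2 in sorted(votes):
        if votes[inst2]:
            # Majority of the carried first-image instance information decides.
            inst1, _ = votes[inst2].most_common(1)[0]
            result[inst2] = first_ids[inst1]
        else:
            # Nothing carried over: treat as a newly appeared object.
            result[inst2] = next_id
            next_id += 1
    return result


# Instances "2, 1", "2, 2", "2, 3" of the example, with a few pixels each.
second = {(0, 0): 1, (1, 0): 1, (5, 5): 2, (8, 8): 3, (9, 8): 3}
carried = {(0, 0): 2, (1, 0): 2, (8, 8): 1, (9, 8): 1}
print(assign_object_ids(second, carried, {1: 1, 2: 2}, next_id=3))
# prints {1: 2, 2: 3, 3: 1}
```

The output mirrors the example: instance "2, 1" inherits object identifier "2", instance "2, 2" receives the new identifier "3", and instance "2, 3" inherits object identifier "1".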

A fused set of probability information is understood to be a set of probability information that is computed from at least two original sets of probability information and thus contains information from at least two original sets. The fusion stage is designed such that a set of probability information from the first image assigned to a pixel and a set of probability information from the second image assigned to the same pixel are fused, so that information from the first image and information from the second image are combined in one fused set of probability information. The fusion may be carried out, for example, by averaging corresponding probability values of the two sets of probability information.
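Fusion by averaging, which the text gives as one possible option, could look like the sketch below. The function name `fuse` and the list representation are assumptions; other fusion rules would fit the description equally well.

```python
def fuse(probs_first, probs_second):
    """Fuse two probability sets of the same pixel by element-wise averaging.

    probs_first:  set carried over from the first image
    probs_second: set determined for the same pixel in the second image
    """
    if len(probs_first) != len(probs_second):
        raise ValueError("both sets must cover the same object categories")
    return [(a + b) / 2 for a, b in zip(probs_first, probs_second)]


# Category order assumed: pedestrian, tree, car.
print(fuse([0.5, 0.25, 0.25], [0.75, 0.25, 0.0]))  # prints [0.625, 0.25, 0.125]
```

Each entry of the fused set thus depends on both images, so a single unreliable segmentation has less influence on the later category decision.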

The object category determination stage is designed such that the respective object category of the segmentation instances assigned to one of the object identifiers is determined on the basis of the fused sets of probability information. As a result, information from the first image and information from the second image is taken into account when determining the respective object category. All pixels that are assigned to one of the object identifiers via the segmentation instances can be considered. Ideally, all considered fused sets of probability information of one of the object identifiers suggest that the respective object identifier belongs to one and the same object category. If this is not the case, the respective object category can be determined, for example, by averaging the probability values of the considered sets of probability information. It should be noted, however, that the determination of the respective object category is then subject to a certain uncertainty.
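The averaging mentioned above can be sketched as follows. The function averages the fused probability sets of all pixels assigned to one object identifier and selects the category with the highest mean value. Choosing the maximum of the mean, returning the mean as a rough confidence score, and the names `determine_category`, `fused_sets` and `categories` are assumptions made for illustration.

```python
def determine_category(fused_sets, categories):
    """Determine one object category from per-pixel fused probability sets.

    fused_sets: list of probability lists, one list per pixel, one value per category
    categories: list of category names in the same order as the probability values
    Returns (category, mean_probability_of_that_category).
    """
    if not fused_sets:
        raise ValueError("at least one fused probability set is required")
    n = len(fused_sets)
    # Average each category's probability over all considered pixels.
    mean = [sum(s[i] for s in fused_sets) / n for i in range(len(categories))]
    # Assumption: the category with the highest mean probability is selected.
    best = max(range(len(categories)), key=lambda i: mean[i])
    return categories[best], mean[best]
```

A low returned mean value indicates the uncertainty noted above, for example when the pixels of one object identifier disagree about the category.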

Compared with known devices, the device according to the invention offers increased detection reliability and increased tracking reliability. This means that both the probability of assigning the correct object category and the probability of correctly determining the changes in position of the respective object over time in the respective scene are increased. The latter applies in particular when new objects enter or leave the scene over time and/or a large number of objects appears in the scene.

The increase in detection reliability and tracking reliability compared with devices using bounding-box approaches results from the fact that the shape of each segmentation instance, as the result of the segmentation, is specific to the individual case and, unlike a bounding box, is not restricted to a bounding rectangle, so that the segmentation instances generally contain no pixels belonging to the background or to another segmentation instance.

The improvement in tracking reliability compared with known devices using segmentation approaches arises because the assignment to an object identifier of the pixels of a segmentation instance of an image that is preceded by one or more images is based not only on the segmentation instance information of this image but also on the segmentation instance information of the preceding image or images.

Furthermore, the improvement in detection reliability compared with known devices using segmentation approaches results from the fact that the determination of the object category of the segmentation instances of an image that is preceded by one or more images is based not only on the sets of probability information of this image but also on the segmentation instance information of the preceding image or images, which is implicitly contained in the fused sets of probability information.

The invention can be applied in particular in the field of autonomous vehicles, where it can be used to detect and track other road users and traffic obstacles.

According to an advantageous development of the invention, the segmentation stage is designed such that, in the segmentation of one of the images, the plurality of pixels of the respective image comprises all pixels of the respective image that do not belong to a background instance. A background instance is an instance formed during the segmentation of one of the images whose pixels represent a background of the respective image. If a segmentation instance information item and a set of probability information are determined for every pixel that does not belong to the background instance, the tracking reliability and the detection reliability of the device can be further improved.

According to a preferred development of the invention, the correspondence determination stage is designed to determine the correspondence pairs and the displacement vectors on the basis of an optical flow method or a semi-dense matching method. The correspondence determination stage can be designed to carry out an optical flow method described in reference [7] or a semi-dense matching method described in reference [8].

According to an expedient development of the invention, the correspondence determination stage is designed such that, at least for some of the pixels of the first image that do not belong to any of the correspondence pairs, an interpolated displacement vector is determined in each case on the basis of interpolations of the displacement vectors of those correspondence pairs that belong to the same segmentation instance as the respective pixel of the first image. In this way, the number of pixels in the second image to which a segmentation instance information item and a set of probability information are assigned can be increased, so that the tracking reliability and the detection reliability of the device can be further improved.
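The text does not specify the interpolation scheme. The following sketch uses inverse-distance weighting as one possible choice and restricts it to correspondence pairs of the same segmentation instance. The weighting scheme, the `eps` term and the names `interpolate_displacement` and `correspondences` are assumptions.

```python
def interpolate_displacement(pixel, instance_label, correspondences, eps=1e-9):
    """Interpolate a displacement vector for a pixel without a correspondence pair.

    pixel:           (x, y) position in the first image
    instance_label:  segmentation instance the pixel belongs to
    correspondences: list of ((x, y), (dx, dy), label) for pixels that do have
                     a correspondence pair and a known displacement vector
    Returns (dx, dy), or None if the instance has no correspondence pairs.
    """
    num_x = num_y = den = 0.0
    for (x, y), (dx, dy), label in correspondences:
        if label != instance_label:
            continue  # only vectors of the same segmentation instance are used
        d2 = (x - pixel[0]) ** 2 + (y - pixel[1]) ** 2
        w = 1.0 / (d2 + eps)  # assumption: inverse squared distance as weight
        num_x += w * dx
        num_y += w * dy
        den += w
    if den == 0.0:
        return None
    return (num_x / den, num_y / den)
```

Restricting the interpolation to one instance keeps displacement vectors of a neighbouring object or of the background from distorting the result.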

According to an advantageous development of the invention, the fusion stage is designed such that the respective sets of probability information are fused by forming weighted sums.

The fusing of the sets of probabilities assigned to one of the points of an image is explained by the following example: if $k_{1}$ is a vector representing a set of probability information of a first image, $k_{2}$ is a vector representing a set of probability information of a subsequent second image, and $k_{3}$ is a vector representing a set of probability information of a subsequent third image, then a fused set of probabilities $z_{3}$ for the respective point can be calculated using the following formula:

$z_{3} = \frac{\alpha_{3} k_{3} + \alpha_{2} k_{2} + \alpha_{1} k_{1}}{\alpha_{3} + \alpha_{2} + \alpha_{1}},$

where $\alpha_{1}$ is the weight of the vector $k_{1}$, $\alpha_{2}$ is the weight of the vector $k_{2}$ and $\alpha_{3}$ is the weight of the vector $k_{3}$, and where the weights satisfy $0 < \alpha_{n} < 1$. The denominator of the equation serves for normalization. To ensure that an earlier set of probabilities is given less weight than a current set of probabilities, it can further be provided that $\alpha_{n-1} < \alpha_{n}$. In this way, the detection reliability of the device can be further improved.
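The weighted sum can be written as a short Python function. It is a sketch that applies the formula for z₃ component-wise to any number of probability vectors; extending it beyond three images, the input checks and the name `fuse` are assumptions.

```python
def fuse(prob_sets, weights):
    """Fuse probability vectors by a normalized weighted sum.

    prob_sets: probability vectors ordered oldest first, e.g. [k1, k2, k3]
    weights:   matching weights [a1, a2, a3], each strictly between 0 and 1
    Returns the fused probability vector.
    """
    if not prob_sets or len(prob_sets) != len(weights):
        raise ValueError("need one weight per probability set")
    if any(not 0 < a < 1 for a in weights):
        raise ValueError("each weight must satisfy 0 < a < 1")
    total = sum(weights)  # denominator of the formula, used for normalization
    dim = len(prob_sets[0])
    return [
        sum(a * k[i] for a, k in zip(weights, prob_sets)) / total
        for i in range(dim)
    ]
```

Choosing increasing weights such as `[0.2, 0.3, 0.5]` gives the most recent image the largest influence, in line with the optional condition on the weights.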

According to an advantageous development of the invention, the object identifier assignment stage is designed such that the pixels of one of the segmentation instances of the second image are assigned additionally on the basis of the set of probability information from the first image assigned to the respective pixel of the second image and of the set of probability information of the respective pixel of the second image.

This is particularly useful when the assignment of the pixels of one of the segmentation instances to one of the object identifiers on the basis of the segmentation instance information of the first image is uncertain. In this case it is possible, for example, to compare the sets of probabilities of the segmentation instance to be assigned with the sets of probabilities of the object identifiers introduced so far. If the sets of probabilities of the segmentation instance to be assigned correspond to the sets of probabilities of one of the object identifiers introduced so far, the segmentation instance in question can be assigned to this object identifier. If, however, there is no correspondence between the sets of probabilities of the segmentation instance to be assigned and the sets of probabilities of the object identifiers introduced so far, a new object identifier can be introduced. In this way, the detection reliability of the device can be further improved.
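The text leaves open how the correspondence between sets of probabilities is measured. As one possible reading, the sketch below compares mean probability vectors by their summed absolute difference and accepts the closest object identifier only below a threshold. The distance measure, the threshold `max_distance` and the name `match_object` are assumptions.

```python
def match_object(instance_probs, object_probs, next_object_id, max_distance=0.5):
    """Assign an instance to an existing object identifier or introduce a new one.

    instance_probs: mean probability vector of the instance to be assigned
    object_probs:   dict object_id -> mean probability vector of that object
    next_object_id: first unused object identifier
    Returns (object_id, updated_next_object_id).
    """
    best_id, best_dist = None, None
    for oid, probs in object_probs.items():
        # Assumption: summed absolute difference as the measure of correspondence.
        dist = sum(abs(a - b) for a, b in zip(instance_probs, probs))
        if best_dist is None or dist < best_dist:
            best_id, best_dist = oid, dist
    if best_id is not None and best_dist <= max_distance:
        return best_id, next_object_id
    # No sufficiently similar object identifier: introduce a new one.
    return next_object_id, next_object_id + 1
```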

According to a preferred development of the invention, the segmentation stage is designed to segment a third image of the images, which follows the second image of the images in time;
wherein the correspondence determination stage is designed to determine a plurality of further correspondence pairs, each of which comprises a pixel of the pixels of one of the segmentation instances of the second image and a corresponding pixel of the pixels of the third image, wherein a further displacement vector between the respective pixel of the second image and the respective corresponding pixel of the third image is determined from each of the further correspondence pairs;
wherein the assignment stage is designed to assign the segmentation instance information from the first image and from the second image that is assigned to a respective pixel of the pixels of the second image, on the basis of the further displacement vectors, to a respective corresponding pixel of the pixels of the third image, so that a segmentation instance information item from the first image and a segmentation instance information item from the second image are assigned to the respective pixel of the third image;
wherein the assignment stage is designed to assign the sets of probability information from the first image and from the second image that are assigned to a pixel of the pixels of the second image, on the basis of the further displacement vectors, to a respective corresponding pixel of the pixels of the third image, so that a set of probability information from the first image and a set of probability information from the second image are assigned to the respective pixel of the third image;
wherein the object identifier assignment stage is designed to assign the pixels of one of the segmentation instances of the third image to one of the object identifiers if the respective segmentation instance of the third image depicts the same object as the segmentation instance of the first image assigned to the respective object identifier and/or the segmentation instance of the second image assigned to the respective object identifier, wherein the pixels of the respective segmentation instance of the third image are assigned on the basis of the segmentation instance information of the respective pixel and of the segmentation instance information from the first image and/or from the second image assigned to the respective pixel;
wherein the fusion stage is designed to fuse the set of probability information from the first image assigned to a pixel of the pixels of the third image, the set of probability information from the second image assigned to the respective pixel of the third image, and the set of probability information of the respective pixel of the third image, so as to determine a fused set of probability information for the respective pixel of the third image; and
wherein the object category determination stage is designed to determine the respective object category of the segmentation instances assigned to one of the object identifiers on the basis of the fused sets of probability information of the pixels of the third image assigned to the respective object identifier.

According to an expedient development of the invention, the object identifier assignment stage is designed such that the pixels of one of the segmentation instances of the third image are assigned additionally on the basis of the sets of probability information from the first and the second image assigned to the respective pixel of the third image and of the set of probability information of the respective pixel of the third image.

Device according to either of claims 7 or 8, wherein the correspondence determination stage is designed such that, at least for some of the pixels of the second image that do not belong to any of the further correspondence pairs, a further interpolated displacement vector is determined in each case on the basis of interpolations of the further displacement vectors of those correspondence pairs that belong to the same segmentation instance as the respective pixel of the second image.

In these embodiments, the tracking and detection of an object is based not only on information from the first and the second image but also on information from the third image. It goes without saying that information from a fourth image, a fifth image and so on could be processed in an analogous manner. In this way, the tracking reliability and the detection reliability of the device can be further improved.

In a further aspect, the invention relates to a system for detecting and tracking objects, the system comprising:

a video camera for generating a video sequence of temporally successive images consisting of pixels; and
a device for detecting and tracking objects in the video sequence according to any one of the preceding claims.

In a further aspect, the invention relates to a method for detecting and tracking objects in a video sequence of temporally successive images consisting of pixels, the method comprising the following steps:

segmenting at least a first image of the images and a temporally subsequent second image of the images, wherein a plurality of predefined semantic object categories is provided and wherein, in the segmentation of an image of the images,
for objects depicted in the respective image, a segmentation instance valid for the respective image is formed in each case, which depicts one of the objects, and
for a plurality of pixels of the pixels of the respective image, a segmentation instance information item and a set of probability information are determined in each case, wherein the respective segmentation instance information item indicates to which of the segmentation instances the respective pixel belongs, and wherein each item of probability information of the respective set of probability information indicates a probability that the respective pixel belongs to one of the object categories;

determining correspondence pairs, each of which comprises a pixel of the pixels of one of the segmentation instances of the first image and a corresponding pixel of the pixels of the second image, wherein a displacement vector between the respective pixel of the first image and the respective pixel of the second image is determined from each of the correspondence pairs;

assigning the segmentation instance information item and the set of probability information of at least one pixel of the pixels of the first image, on the basis of the displacement vectors, to a respective pixel of the pixels of the second image, so that a segmentation instance information item from the first image and a set of probability information from the first image are assigned to the respective pixel of the second image; and

assigning the pixels of one of the segmentation instances of the first image and the pixels of one of the segmentation instances of the second image to a common object identifier if the respective segmentation instance of the first image and the segmentation instance of the second image depict the same object, wherein the object identifier uniquely identifies the respective object throughout the video sequence, wherein the pixels of the segmentation instance of the first image are assigned on the basis of the segmentation instance information of the respective pixel, and wherein the pixels of the segmentation instance of the second image are assigned on the basis of the segmentation instance information of the respective pixel and of the segmentation instance information from the first image assigned to the respective pixel;

fusing the set of probability information from the first image assigned to a respective pixel of the pixels of the second image and the set of probability information of the respective pixel of the second image, so as to determine a fused set of probability information for each of one or more pixels of the second image; and

Determining the respective object category of the segmentation instances assigned to one of the object identifiers, on the basis of the fused sets of probability information of the pixels of the second image assigned to the respective object identifier.

In a further aspect, the invention relates to a computer program for carrying out a method according to the invention when it is executed on a computer or processor.

In the following, the present invention and its advantages are described in more detail with reference to figures.

Fig. 1 shows a first embodiment of a device according to the invention in a schematic representation;
Fig. 2 shows a second embodiment of a device according to the invention in a schematic representation;
Fig. 3 shows a third embodiment of a device according to the invention in a schematic representation;
Fig. 4 shows a fourth embodiment of a device according to the invention in a schematic representation; and
Fig. 5 illustrates the operation of the assignment stage of the device according to the invention.

Identical or similar elements, or elements with the same or an equivalent function, are given the same or similar reference signs below.

In the following description, embodiments having a large number of features of the present invention are described in more detail in order to give a better understanding of the invention. It should be noted, however, that the present invention can also be implemented with individual described features omitted. It should also be noted that the features shown in different embodiments can also be combined in other ways, unless this is expressly excluded or would lead to contradictions.

Fig. 1 shows a first embodiment of a device 1 according to the invention in a schematic representation.

The device 1 for detecting and tracking objects in a video sequence of temporally successive images BI consisting of pixels PI comprises:

a segmentation stage 2 for segmenting at least a first image BI1 of the images BI and a temporally subsequent second image BI2 of the images BI, wherein the segmentation stage 2 has a plurality of predefined semantic object categories OKT and wherein the segmentation stage 2 is designed such that, when an image BI1, BI2 of the images BI is segmented,
a segmentation instance SI1, SI2 valid for the respective image BI1, BI2 is formed for each of the objects depicted in the respective image BI1, BI2, and
an item of segmentation instance information SII1, SII2 and a set of probability information SVW1, SVW2 are each determined for a plurality of pixels PI1, PI2 of the pixels PI1, PI2 of the respective image BI1, BI2, wherein the respective segmentation instance information SII1, SII2 indicates to which of the segmentation instances SI1, SI2 of the respective image BI1, BI2 the respective pixel PI1, PI2 belongs, and wherein each item of probability information in the respective set of probability information SVW1, SVW2 indicates a probability that the respective pixel PI1, PI2 belongs to one of the object categories OKT;
a correspondence determination stage 3 for determining a plurality of correspondence pairs, each comprising a pixel PI1 of the pixels PI1 of one of the segmentation instances SI1 of the first image BI1 and a corresponding pixel PI2 of the pixels PI2 of the second image BI2, wherein a displacement vector VV between the respective pixel PI1 of the first image BI1 and the respective corresponding pixel PI2 of the second image BI2 is determined from each of the correspondence pairs;
an assignment stage 4 for assigning the segmentation instance information SII1 and the set of probability information SVW1 of at least one pixel PI1 of the pixels PI1 of the first image BI1, on the basis of the displacement vectors VV, to a respective pixel PI2 of the pixels PI2 of the second image BI2, so that an item of segmentation instance information SII1 from the first image BI1 and a set SVW1 of the sets of probability information SVW1 from the first image BI1 are assigned to the respective pixel PI2 of the second image BI2;
an object identifier assignment stage 5 for assigning the pixels PI1 of one of the segmentation instances SI1 of the first image BI1 and the pixels PI2 of one of the segmentation instances SI2 of the second image BI2 to a common object identifier OBK, provided that the respective segmentation instance SI1 of the first image BI1 and the segmentation instance SI2 of the second image BI2 depict the same object, wherein the object identifier OBK uniquely identifies the respective object across the video sequence, wherein the pixels PI1 of the respective segmentation instance SI1 of the first image BI1 are assigned on the basis of the segmentation instance information SII1 of the respective pixel PI1, and wherein the pixels PI2 of the respective segmentation instance SI2 of the second image BI2 are assigned on the basis of the segmentation instance information SII2 of the respective pixel PI2 and the segmentation instance information SII1 from the first image BI1 assigned to the respective pixel PI2;
a fusion stage 6 for fusing the set of probability information SVW1 from the first image BI1 that is assigned to a respective pixel PI2 of the pixels PI2 of the second image BI2 with the set of probability information SVW2 of the respective pixel PI2 of the second image BI2, so as to determine a fused set of probability information FSVW2 for each of one or more pixels PI2 of the second image BI2; and
an object category determination stage 7 for determining the respective object category OKT of the segmentation instances SI1, SI2 assigned to one of the object identifiers OBK, on the basis of the fused sets of probability information FSVW2 of the pixels PI2 of the second image BI2 assigned to the respective object identifier OBK.
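The description leaves open how the object identifier assignment stage 5 decides that two segmentation instances depict the same object. A minimal sketch, assuming a simple majority vote: for every pixel PI2, the pixel's own instance label SII2 is compared with the label SII1 carried over from the first image, and each instance SI2 inherits the object identifier OBK of the instance SI1 that overlaps it most. The function name, the dictionary layout, the background label 0 and the rule for opening a new identifier are all assumptions made for illustration.

```python
from collections import Counter

BACKGROUND = 0  # assumed label for the background instance


def assign_object_ids(sii2, transferred_sii1, ids_frame1, next_id):
    """Give each instance of the second image an object identifier.

    sii2:             {pixel: instance label in the second image}
    transferred_sii1: {pixel: instance label carried over from the first image}
    ids_frame1:       {instance label in the first image: object identifier}
    next_id:          first unused object identifier
    """
    # Count, per second-image instance, which first-image instances land on it.
    votes = {}
    for pixel, inst2 in sii2.items():
        if inst2 == BACKGROUND:
            continue
        counter = votes.setdefault(inst2, Counter())
        inst1 = transferred_sii1.get(pixel, BACKGROUND)
        if inst1 != BACKGROUND:
            counter[inst1] += 1

    ids_frame2 = {}
    for inst2 in sorted(votes):
        counter = votes[inst2]
        if counter:
            # Reuse the identifier of the best-overlapping first-image instance.
            best_inst1, _ = counter.most_common(1)[0]
            ids_frame2[inst2] = ids_frame1[best_inst1]
        else:
            # No overlap at all: treat the instance as a newly appeared object.
            ids_frame2[inst2] = next_id
            next_id += 1
    return ids_frame2


sii2 = {(1, 1): 7, (2, 1): 7, (3, 3): 9}
transferred = {(1, 1): 4, (2, 1): 4}
print(assign_object_ids(sii2, transferred, {4: 100}, next_id=101))
# prints {7: 100, 9: 101}
```

The sketch does not prevent two instances of the second image from inheriting the same identifier; a real implementation would need an additional one-to-one matching rule.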

According to a preferred development of the invention, the segmentation stage 2 is designed such that, when one of the images BI is segmented, the plurality of pixels PI of the respective image BI comprises all pixels PI of the respective image BI that do not belong to a background instance.

According to an expedient development of the invention, the correspondence determination stage 3 is designed to determine the correspondence pairs and the displacement vectors VV on the basis of an optical flow method or a semi-dense matching method.

According to an expedient development of the invention, the fusion stage 6 is designed such that the respective sets of probability information SVW are fused by forming weighted sums.
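The weighted-sum fusion can be sketched as follows for a single pixel. The text does not specify the weights; normalising them so that they sum to one, which keeps the fused values a probability distribution, is an assumption of this sketch, as are the function name and the example numbers.

```python
def fuse_probability_sets(prob_sets, weights):
    """Fuse several probability sets for one pixel by a weighted sum.

    prob_sets: list of lists, one probability per object category
    weights:   one non-negative weight per probability set
    """
    if len(prob_sets) != len(weights):
        raise ValueError("need one weight per probability set")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_categories = len(prob_sets[0])
    fused = [0.0] * n_categories
    for probs, w in zip(prob_sets, weights):
        for k in range(n_categories):
            # Assumed normalisation: each weight is divided by the weight total.
            fused[k] += (w / total) * probs[k]
    return fused


svw1 = [0.6, 0.4]  # set carried over from the first image (hypothetical values)
svw2 = [0.2, 0.8]  # set of the pixel in the second image (hypothetical values)
fsvw2 = fuse_probability_sets([svw1, svw2], weights=[1, 3])
print(fsvw2)
```

With weights 1 and 3 the second image counts three times as much as the first, so the fused set is close to [0.3, 0.7].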

In a further aspect, the invention relates to a system for detecting and tracking objects, the system comprising:

a video camera (not shown) for generating a video sequence of temporally successive images BI consisting of pixels PI; and
a device 1 for detecting and tracking objects in the video sequence according to any one of the preceding claims.

In a further aspect, the invention relates to a method for detecting and tracking objects in a video sequence of temporally successive images BI consisting of pixels PI, the method comprising the following steps:

segmenting at least a first image BI1 of the images BI and a temporally subsequent second image BI2 of the images BI, wherein a plurality of predefined semantic object categories OKT are provided and wherein, when an image BI1, BI2 of the images BI is segmented,
a segmentation instance SI1, SI2 valid for the respective image BI1, BI2 is formed for each of the objects depicted in the respective image BI1, BI2, each segmentation instance depicting one of the objects, and
an item of segmentation instance information SII1, SII2 and a set of probability information SVW1, SVW2 are each determined for a plurality of pixels PI1, PI2 of the pixels PI1, PI2 of the respective image BI1, BI2, wherein the respective segmentation instance information SII1, SII2 indicates to which of the segmentation instances SI1, SI2 the respective pixel PI1, PI2 belongs, and wherein each item of probability information in the respective set of probability information SVW1, SVW2 indicates a probability that the respective pixel PI1, PI2 belongs to one of the object categories OKT;

determining correspondence pairs, each comprising a pixel PI1 of the pixels PI1 of one of the segmentation instances SI1 of the first image BI1 and a corresponding pixel PI2 of the pixels PI2 of the second image BI2, wherein a displacement vector VV between the respective pixel PI1 of the first image BI1 and the respective pixel PI2 of the second image BI2 is determined from each of the correspondence pairs;
assigning the segmentation instance information SII1 and the set of probability information SVW1 of at least one pixel PI1 of the pixels PI1 of the first image BI1, on the basis of the displacement vectors VV, to a respective pixel PI2 of the pixels PI2 of the second image BI2, so that an item of segmentation instance information SII1 from the first image BI1 and a set SVW1 of the sets of probability information SVW1 from the first image BI1 are assigned to the respective pixel PI2 of the second image BI2;
assigning the pixels PI1 of one of the segmentation instances SI1 of the first image BI1 and the pixels PI2 of one of the segmentation instances SI2 of the second image BI2 to a common object identifier OBK, provided that the respective segmentation instance SI1 of the first image BI1 and the segmentation instance SI2 of the second image BI2 depict the same object, wherein the object identifier OBK uniquely identifies the respective object across the video sequence, wherein the pixels PI1 of the segmentation instance SI1 of the first image BI1 are assigned on the basis of the segmentation instance information SII1 of the respective pixel PI1, and wherein the pixels PI2 of the segmentation instance SI2 of the second image BI2 are assigned on the basis of the segmentation instance information SII2 of the respective pixel PI2 and the segmentation instance information SII1 from the first image BI1 assigned to the respective pixel PI2;
fusing the set of probability information SVW1 from the first image BI1 that is assigned to a respective pixel PI2 of the pixels PI2 of the second image BI2 with the set of probability information SVW2 of the respective pixel PI2 of the second image BI2, so as to determine a fused set of probability information FSVW2 for each of one or more pixels PI2 of the second image BI2; and
determining the respective object category OKT of the segmentation instances SI assigned to one of the object identifiers OBK, on the basis of the fused sets of probability information FSVW2 of the pixels PI2 of the second image BI2 assigned to the respective object identifier OBK.
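The final step derives one object category OKT per object identifier OBK from the fused sets FSVW2 of all pixels assigned to that identifier. The text does not say how the per-pixel sets are combined; the sketch below assumes the simplest option, averaging the fused sets over the pixels and taking the category with the highest mean. The function name and the category names "car" and "person" are hypothetical.

```python
def determine_category(fused_sets, categories):
    """Pick the category with the highest mean fused probability.

    fused_sets: list of fused probability sets, one per pixel of the object
    categories: category names in the same order as the probabilities
    """
    if not fused_sets:
        raise ValueError("object has no pixels")
    sums = [0.0] * len(categories)
    for probs in fused_sets:
        for k, p in enumerate(probs):
            sums[k] += p
    # Assumed aggregation rule: arithmetic mean over all pixels of the object.
    means = [s / len(fused_sets) for s in sums]
    best = max(range(len(categories)), key=lambda k: means[k])
    return categories[best], means


# Three pixels of one object identifier, two hypothetical categories.
fused = [[0.3, 0.7], [0.6, 0.4], [0.2, 0.8]]
category, means = determine_category(fused, ["car", "person"])
print(category)  # prints person
```

Averaging lets pixels with confident probabilities outvote single outliers such as the second pixel above, which on its own would favour the other category.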

Fig. 2 shows a second embodiment of a device 1 according to the invention in a schematic representation.

According to an expedient development of the invention, the correspondence determination stage 3 is designed such that, at least for some of the pixels PI1 of the first image BI1 that do not belong to any of the correspondence pairs, an interpolated displacement vector IVV is determined in each case on the basis of interpolations of the displacement vectors VV of those correspondence pairs that belong to the same segmentation instance SI1 as the respective pixel PI1 of the first image BI1.
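The interpolation scheme is not specified. As one possible reading, the sketch below uses inverse-distance weighting over the displacement vectors VV that are known inside the same segmentation instance; the weighting scheme, the function name and the coordinate convention are assumptions for illustration only.

```python
def interpolate_displacement(pixel, known_vectors):
    """Interpolate a displacement vector for a pixel without a correspondence.

    pixel:         (x, y) of the pixel in the first image
    known_vectors: {(x, y): (dx, dy)} for pixels of the SAME instance
    """
    if not known_vectors:
        raise ValueError("no displacement vectors in this instance")
    weight_sum = 0.0
    dx_sum = 0.0
    dy_sum = 0.0
    for (x, y), (dx, dy) in known_vectors.items():
        dist_sq = (x - pixel[0]) ** 2 + (y - pixel[1]) ** 2
        if dist_sq == 0:
            # The pixel already has a measured vector; use it unchanged.
            return (float(dx), float(dy))
        w = 1.0 / dist_sq  # assumed weighting: inverse squared distance
        weight_sum += w
        dx_sum += w * dx
        dy_sum += w * dy
    return (dx_sum / weight_sum, dy_sum / weight_sum)


# Two neighbours in the same instance moved by 1 and 3 pixels to the right.
known = {(0, 0): (1.0, 0.0), (2, 0): (3.0, 0.0)}
print(interpolate_displacement((1, 0), known))  # prints (2.0, 0.0)
```

Restricting the input to vectors of the same instance, as the text requires, keeps background motion from leaking into the object's interpolated vectors.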

According to an expedient development of the invention, the object identifier assignment stage 5 is designed such that the pixels PI2 of one of the segmentation instances SI2 of the second image BI2 are assigned additionally on the basis of the set of probability information SVW1 from the first image BI1 assigned to the respective pixel PI2 of the second image BI2 and of the set of probability information SVW2 of the respective pixel PI2 of the second image BI2.

Fig. 3 shows a third embodiment of a device 1 according to the invention in a schematic representation.

According to an advantageous development of the invention, the segmentation stage 2 is designed to segment a third image BI3 of the images BI, which temporally follows the second image BI2 of the images BI;
wherein the correspondence determination stage 3 is designed to determine a plurality of further correspondence pairs, each comprising a pixel PI2 of the pixels PI2 of one of the segmentation instances SI2 of the second image BI2 and a corresponding pixel PI3 of the pixels PI3 of the third image BI3, wherein a further displacement vector WVV between the respective pixel PI2 of the second image BI2 and the respective corresponding pixel PI3 of the third image BI3 is determined from each of the further correspondence pairs;
wherein the assignment stage 4 is designed to assign the segmentation instance information SII1, SII2 from the first image BI1 and from the second image BI2 that is assigned to a respective pixel PI2 of the pixels PI2 of the second image BI2, on the basis of the further displacement vectors WVV, to a respective corresponding pixel PI3 of the pixels PI3 of the third image BI3, so that an item of segmentation instance information SII1 from the first image BI1 and an item of segmentation instance information SII2 from the second image BI2 are assigned to the respective pixel PI3 of the third image BI3;
wherein the assignment stage 4 is designed to assign the sets of probability information SVW1, SVW2 from the first image BI1 and from the second image BI2 that are assigned to a pixel PI2 of the pixels PI2 of the second image BI2, on the basis of the further displacement vectors WVV, to a respective corresponding pixel PI3 of the pixels PI3 of the third image BI3, so that a set SVW1 of the sets of probability information SVW1 from the first image BI1 and a set SVW2 of the sets of probability information SVW2 from the second image BI2 are assigned to the respective pixel PI3 of the third image BI3;
wherein the object identifier assignment stage 5 is designed to assign the pixels PI3 of one of the segmentation instances SI3 of the third image BI3 to one of the object identifiers OBK, provided that the respective segmentation instance SI3 of the third image BI3 depicts the same object as the segmentation instance SI1 of the first image BI1 assigned to the respective object identifier OBK and/or the segmentation instance SI2 of the second image BI2 assigned to the respective object identifier OBK, wherein the pixels PI3 of the respective segmentation instance SI3 of the third image BI3 are assigned on the basis of the segmentation instance information SII3 of the respective pixel PI3 and the segmentation instance information SII1, SII2 from the first image BI1 and/or from the second image BI2 assigned to the respective pixel PI3;
wherein the fusion stage 6 is designed to fuse the set of probability information SVW1 from the first image BI1 assigned to a pixel PI3 of the pixels PI3 of the third image BI3, the set of probability information SVW2 from the second image BI2 assigned to the respective pixel PI3 of the third image BI3, and the set of probability information SVW3 of the respective pixel PI3 of the third image BI3, so as to determine a fused set of probability information FSVW3 for the respective pixel PI3 of the third image BI3; and
wherein the object category determination stage 7 is designed to determine the respective object category OKT of the segmentation instances SI1, SI2, SI3 assigned to one of the object identifiers OBK, on the basis of the fused sets of probability information FSVW3 of the pixels PI3 of the third image BI3 assigned to the respective object identifier OBK.
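The chaining across three images can be pictured as each pixel carrying a growing list of records. A minimal sketch, assuming integer displacement vectors and a dictionary keyed by pixel coordinates: the records attached to a pixel PI2 (the transferred information from the first image plus the pixel's own information) are moved along the further displacement vector WVV to the pixel PI3. The record layout `(frame index, instance label, probability set)` and the function name are hypothetical.

```python
def propagate(records, vectors):
    """Carry per-pixel records along displacement vectors to the next image.

    records: {(x, y): list of (frame index, instance label, probability set)}
    vectors: {(x, y): (dx, dy)} displacement vectors starting at those pixels
    """
    carried = {}
    for (x, y), (dx, dy) in vectors.items():
        target = (x + dx, y + dy)
        # Records of several source pixels landing on one target are merged.
        carried.setdefault(target, []).extend(records.get((x, y), []))
    return carried


# First image: one pixel of instance 5 with its probability set.
at_image1 = {(0, 0): [(1, 5, [0.9, 0.1])]}
vv = {(0, 0): (1, 0)}                      # displacement vector VV
at_image2 = propagate(at_image1, vv)       # SII1 and SVW1 now sit on a pixel PI2

# The second image adds its own instance label and probability set.
at_image2[(1, 0)].append((2, 8, [0.7, 0.3]))
wvv = {(1, 0): (0, 1)}                     # further displacement vector WVV
at_image3 = propagate(at_image2, wvv)
print(at_image3)
```

After the second call, the pixel at (1, 1) of the third image holds the records of both earlier images, which is the input the fusion stage needs to compute FSVW3.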

Fig. 4 shows a fourth embodiment of a device according to the invention in a schematic representation.

According to an expedient development of the invention, the correspondence determination stage 3 is designed such that, at least for some of the pixels PI2 of the second image BI2 that do not belong to any of the further correspondence pairs, a further interpolated displacement vector WIVV is determined in each case on the basis of interpolations of the further displacement vectors WVV of those correspondence pairs that belong to the same segmentation instance SI2 as the respective pixel PI2 of the second image BI2.

According to an advantageous development of the invention, the object identifier assignment stage 5 is designed such that the assignment of the pixels PI3 of one of the segmentation instances SI3 of the third image BI3 is additionally carried out on the basis of the sets of probability information SVW1, SVW2 from the first and the second image BI1, BI2 assigned to the respective pixel PI3 of the third image BI3 and of the set of probability information SVW3 of the respective pixel PI3 of the third image BI3.

Figure 5 illustrates the operation of the assignment stage 4 of the device 1 according to the invention.

Figure 5 shows three images BI1, BI2 and BI3, which are contained in this order in a video sequence. The first image BI1 comprises sixteen pixels PI1a to PI1p. Furthermore, the second image BI2 comprises sixteen pixels PI2a to PI2p, and the third image BI3 comprises sixteen pixels PI3a to PI3p. It goes without saying that, in practice, the images BI may contain considerably more pixels. By way of example, the image BI1 contains a depiction of a single object, so that exactly one segmentation instance SI1 is formed when the first image BI1 is segmented; again by way of example, this instance comprises the grayed-out pixels PI1e, PI1f, PI1i and PI1j. The second image BI2 likewise contains a depiction of a single object, so that exactly one segmentation instance SI2 is formed when the second image BI2 is segmented; again by way of example, this instance comprises the grayed-out pixels PI2f, PI2g, PI2j and PI2k. The pixels PI1 and PI2 that are not grayed out each belong to a background instance.

Now assume that the correspondence determination stage 3 has determined a correspondence pair comprising the pixel PI1e from the first image BI1 and the pixel PI2f from the second image BI2. This leads to the hypothesis that the segmentation instance SI1 and the segmentation instance SI2 depict the same object, the different positions of the segmentation instances SI1 and SI2 evidently indicating a movement of the depicted object relative to the camera. A displacement vector VV can now be determined from the difference in position between the pixels PI1e and PI2f. By means of this displacement vector VV, the segmentation instance information SII assigned to the pixel PI1e and the set of probability information SVW assigned to the pixel PI1e can be shifted directly to the position of the pixel PI2f and thus assigned to the pixel PI2f.
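
The determination of the displacement vector VV from one correspondence pair and the resulting transfer of information can be sketched as follows. The sketch assumes that the sixteen pixels of each 4 x 4 example image are labelled "a" to "p" in row-major order, which the text does not state; the helper names and the probability values are likewise hypothetical.

```python
import string

WIDTH = 4  # the example images of Figure 5 are assumed to be 4 x 4 pixels


def position(letter):
    """Map a pixel suffix 'a'..'p' to (row, column), assuming row-major labelling."""
    return divmod(string.ascii_lowercase.index(letter), WIDTH)


def letter_at(row, col):
    """Inverse of position()."""
    return string.ascii_lowercase[row * WIDTH + col]


def displacement(src, dst):
    """Displacement vector (d_row, d_col) between two corresponding pixels."""
    (r1, c1), (r2, c2) = position(src), position(dst)
    return (r2 - r1, c2 - c1)


def shift(letter, vector):
    """Pixel reached when the given pixel is moved by the displacement vector."""
    row, col = position(letter)
    return letter_at(row + vector[0], col + vector[1])


# Information attached to pixel PI1e in the first image (values are made up).
info_frame1 = {"e": {"SII": "SI1", "SVW": [0.9, 0.1]}}

vv = displacement("e", "f")  # correspondence pair (PI1e, PI2f)
target = shift("e", vv)      # pixel of the second image that receives the data
propagated_frame2 = {target: info_frame1["e"]}
print(vv, target, propagated_frame2)
```

Under the assumed labelling, the pair (PI1e, PI2f) corresponds to a shift by one column, and the information of PI1e ends up at PI2f.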

For the other pixels PI1f, PI1i and PI1j of the segmentation instance SI1, no displacement vector VV is available, since the pixels PI1f, PI1i and PI1j do not belong to any correspondence pair. However, interpolated displacement vectors IVV can be determined for these pixels PI1f, PI1i and PI1j on the basis of the displacement vector VV. These allow the segmentation instance information SII1 and the sets of probability information SVW1 assigned to the pixels PI1f, PI1i and PI1j to be shifted, so that they can be assigned to the pixels PI2g, PI2j and PI2k.
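
One possible way of obtaining interpolated displacement vectors IVV within a segmentation instance is sketched below. The text does not prescribe an interpolation scheme; inverse-distance weighting with rounding to whole pixels is an assumption chosen for illustration, as are the (row, column) coordinates, which again presuppose row-major labelling of the pixels "a" to "p".

```python
WIDTH = 4  # assumed 4 x 4 example image


def interpolate_vector(pixel, known):
    """Inverse-distance-weighted mean of the known displacement vectors.

    pixel: (row, col) of a pixel without a correspondence pair.
    known: dict mapping (row, col) -> (d_row, d_col) for pixels of the
           same segmentation instance that belong to a correspondence pair.
    """
    if pixel in known:
        return known[pixel]
    weights = {
        p: 1.0 / ((p[0] - pixel[0]) ** 2 + (p[1] - pixel[1]) ** 2)
        for p in known
    }
    total = sum(weights.values())
    d_row = sum(w * known[p][0] for p, w in weights.items()) / total
    d_col = sum(w * known[p][1] for p, w in weights.items()) / total
    return (round(d_row), round(d_col))


def label(pixel):
    """Letter suffix of a pixel, assuming row-major labelling 'a'..'p'."""
    return chr(ord("a") + pixel[0] * WIDTH + pixel[1])


# Segmentation instance SI1: PI1e, PI1f, PI1i, PI1j as (row, col).
si1 = [(1, 0), (1, 1), (2, 0), (2, 1)]
# Only PI1e belongs to a correspondence pair; its vector VV is (0, 1).
known = {(1, 0): (0, 1)}

targets = {}
for pixel in si1:
    if pixel not in known:
        ivv = interpolate_vector(pixel, known)
        targets[pixel] = (pixel[0] + ivv[0], pixel[1] + ivv[1])

print({label(src): label(dst) for src, dst in targets.items()})
```

With a single known displacement vector, every interpolated vector equals that vector, so PI1f, PI1i and PI1j map to PI2g, PI2j and PI2k as in the example above. With several correspondence pairs per instance, the interpolated vectors would vary from pixel to pixel.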

The third image BI3 also contains a depiction of a single object, so that exactly one segmentation instance SI3 is likewise formed when the third image BI3 is segmented; again by way of example, this instance comprises the grayed-out pixels PI3k, PI3l, PI3o and PI3p. The pixels PI3 that are not grayed out belong to a background instance.

If it is now assumed that the correspondence determination stage 3 has determined a correspondence pair comprising the pixels PI2f and PI3k, then the segmentation instance information SII1 and SII2 assigned to the pixels PI2f, PI2g, PI2j and PI2k and the sets of probability information SVW1 and SVW2 assigned to the pixels PI2f, PI2g, PI2j and PI2k can be shifted by means of the further displacement vector WVV and the further interpolated displacement vectors WIVV and assigned to the pixels PI3k, PI3l, PI3o and PI3p.
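
The step from the second to the third image, together with a simple way of deriving an object identifier OBK from the carried-over segmentation instance information, can be sketched as follows. The majority vote used to pick the identifier is an assumption made for this illustration; the text only states that the assignment is based on the instance information of the pixel itself and on the instance information carried over from earlier images. Coordinates are (row, column) under an assumed row-major labelling, and all further interpolated vectors WIVV are taken to equal WVV, as happens when only one correspondence pair is known.

```python
from collections import Counter


def shift_all(info_by_pixel, vector):
    """Move the accumulated information of every pixel by one displacement vector."""
    return {
        (row + vector[0], col + vector[1]): list(info)
        for (row, col), info in info_by_pixel.items()
    }


def identifier_for(pixels, carried, identifier_of, next_free):
    """Pick the object identifier supported by most carried-over instance labels.

    Falls back to a new identifier if no information reached the pixels.
    """
    votes = Counter(
        identifier_of[instance]
        for pixel in pixels
        for instance in carried.get(pixel, [])
    )
    if not votes:
        return next_free
    return votes.most_common(1)[0][0]


# Second image: the pixels of SI2 (PI2f, PI2g, PI2j, PI2k) already carry
# SII1 from the first image in addition to their own SII2.
frame2 = {
    (1, 1): ["SI1", "SI2"],
    (1, 2): ["SI1", "SI2"],
    (2, 1): ["SI1", "SI2"],
    (2, 2): ["SI1", "SI2"],
}

wvv = (1, 1)  # from the correspondence pair (PI2f, PI3k): (1, 1) -> (2, 2)
carried = shift_all(frame2, wvv)

si3_pixels = [(2, 2), (2, 3), (3, 2), (3, 3)]  # PI3k, PI3l, PI3o, PI3p
identifier_of = {"SI1": 1, "SI2": 1}           # SI1 and SI2 share identifier 1

obk_si3 = identifier_for(si3_pixels, carried, identifier_of, next_free=2)
print(sorted(carried), obk_si3)
```

In this example the shifted information lands exactly on the pixels of SI3, so SI3 receives the object identifier already shared by SI1 and SI2.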

Depending on specific implementation requirements, embodiments of the device according to the invention may be implemented at least partially in hardware or at least partially in software. The implementation may be carried out using a digital storage medium, for example a floppy disk, a DVD, a Blu-ray Disc, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, a hard disk or another magnetic or optical memory, on which electronically readable control signals are stored that can interact with a programmable computer system in such a way that one, several or all of the functional elements of the device according to the invention are realized. For example, the device according to the invention may comprise a programmable logic controller.

In some embodiments, a programmable logic device (for example a field-programmable gate array, FPGA) may be used to perform some or all of the functionalities of the device described herein. In some embodiments, a field-programmable gate array may cooperate with a microprocessor in order to realize one of the devices described herein.

Depending on specific implementation requirements, embodiments of the method according to the invention may be carried out by means of a device that is implemented at least partially in hardware or at least partially in software. The implementation may be carried out using a digital storage medium, for example a floppy disk, a DVD, a Blu-ray Disc, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, a hard disk or another magnetic or optical memory, on which electronically readable control signals are stored that can interact with a programmable computer system in such a way that the method according to the invention is carried out.

Aspects of the invention that are described herein in the context of the device according to the invention also represent aspects of the method according to the invention. Conversely, aspects of the invention that are described herein in the context of the method according to the invention also represent aspects of the device according to the invention.

In general, in some embodiments the methods are carried out by any hardware device. This may be universally usable hardware such as a computer processor (CPU) or a graphics processor (GPU), or hardware specific to the method, such as an ASIC.

A further embodiment comprises a computer on which the computer program for carrying out one of the methods described herein is installed.

In general, embodiments of the present invention may be implemented as a computer program having program code, the program code being operative to carry out one of the methods when the computer program runs on a computer. The program code may, for example, also be stored on a machine-readable carrier.

Some embodiments of the invention comprise a preferably non-volatile data carrier or data memory having a computer program with electronically readable control signals that is capable of interacting with a programmable computer system in such a way that one of the methods described herein is carried out.

Embodiments of the present invention may be implemented as a computer program product having a computer program, the computer program being operative to carry out one of the methods when the computer program runs on a computer.

List of reference signs

1: Device for detecting and tracking objects in a video sequence
2: Segmentation stage
3: Correspondence determination stage
4: Assignment stage
5: Object identifier assignment stage
6: Fusion stage
7: Object category determination stage
PI: Pixel
BI: Image
OKT: Object category
SI: Segmentation instance
SII: Segmentation instance information
SVW: Set of probability information
VV: Displacement vector
OBK: Object identifier
FSVW: Fused set of probability information
IVV: Interpolated displacement vector
WVV: Further displacement vector
WIVV: Further interpolated displacement vector

Sources:

[1] Yu Xiang, Alexandre Alahi and Silvio Savarese. "Learning to Track: Online Multi-Object Tracking by Decision Making". In: 2015 IEEE International Conference on Computer Vision (ICCV), pages 4705-4713, 2015.
[2] Wongun Choi. "Near-Online Multi-Target Tracking with Aggregated Local Flow Descriptor". In: 2015 IEEE International Conference on Computer Vision (ICCV), pages 3029-3037, 2015.
[3] Anton Milan, Laura Leal-Taixe, Konrad Schindler, Ian Reid. "Joint Tracking and Segmentation of Multiple Targets". In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[4] Sebastian Bullinger, Christoph Bodensteiner, Michael Arens. "Instance Flow Based Online Multiple Object Tracking". In: Computing Research Repository (CoRR), 2017.
[5] Jifeng Dai, Kaiming He and Jian Sun. "Instance-aware Semantic Segmentation via Multi-task Network Cascades". In: Computing Research Repository (CoRR), 2015.
[6] Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji and Yichen Wei. "Fully Convolutional Instance Aware Semantic Segmentation". In: Computing Research Repository (CoRR), 2016.
[7] Yinlin Hu, Yunsong Li. "Efficient Coarse-to-Fine Patchmatch for Large Displacement Optical Flow". In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[8] Jerome Revaud, Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid. "Deepmatching: Hierarchical Deformable Dense Matching". In: International Journal of Computer Vision, Volume 120, No. 3, pages 300-323, 2016.

CITATIONS INCLUDED IN THE DESCRIPTION

This list of the documents cited by the applicant was generated automatically and is included solely for the better information of the reader. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Cited non-patent literature

Yu Xiang, Alexandre Alahi and Silvio Savarese. "Learning to Track: Online Multi-Object Tracking by Decision Making". In: 2015 IEEE International Conference on Computer Vision (ICCV), Pages: 4705-4713, 2015 [0080]
Wongun Choi. "Near-Online Multi-Target Tracking with Aggregated Local Flow Descriptor". In: 2015 IEEE International Conference on Computer Vision (ICCV), Pages: 3029-3037, 2015 [0080]
Anton Milan, Laura Leal-Taixe, Konrad Schindler, Ian Reid. "Joint Tracking and Segmentation of Multiple Targets". In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015 [0080]
Sebastian Bullinger, Christoph Bodensteiner, Michael Arens. "Instance Flow Based Online Multiple Object Tracking". In: Computing Research Repository (CoRR), 2017 [0080]
Jifeng Dai, Kaiming He and Jian Sun, "Instance-aware Semantic Segmentation via Multi-task Network Cascades". In: Computing Research Repository (CoRR), 2015 [0080]
Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji and Yichen Wei, "Fully Convolutional Instance Aware Semantic Segmentation". In: Computing Research Repository (CoRR), 2016 [0080]
Yinlin Hu, Yunsong Li, "Efficient Coarse-to-Fine Patchmatch for Large Displacement Optical Flow". IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016 [0080]
Jerome Revaud, Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid, "Deepmatching: Hierarchical Deformable Dense Matching". In: International Journal of Computer Vision, Vol. 120, No. 3, pp. 300-323, 2016 [0080]

Claims

Device for detecting and tracking objects in a video sequence of temporally successive images (BI) consisting of pixels (PI), the device (1) comprising:
a segmentation stage (2) for segmenting at least a first image (BI1) of the images (BI) and a temporally subsequent second image (BI2) of the images (BI), the segmentation stage (2) having a plurality of predefined semantic object categories (OKT), wherein the segmentation stage (2) is designed such that, when an image (BI1, BI2) of the images (BI) is segmented, a respective segmentation instance (SI1, SI2) valid for the respective image (BI1, BI2) is formed for objects depicted in the respective image (BI1, BI2), and, for a plurality of pixels (PI1, PI2) of the pixels (PI1, PI2) of the respective image (BI1, BI2), a segmentation instance information (SII1, SII2) and a set of probability information (SVW1, SVW2) are determined in each case, wherein the respective segmentation instance information (SII1, SII2) indicates to which of the segmentation instances (SI1, SI2) of the respective image (BI1, BI2) the respective pixel (PI1, PI2) belongs, and wherein each item of probability information of the respective set of probability information (SVW1, SVW2) indicates a probability that the respective pixel (PI1, PI2) belongs to one of the object categories (OKT);
a correspondence determination stage (3) for determining a plurality of correspondence pairs, each comprising a pixel (PI1) of the pixels (PI1) of one of the segmentation instances (SI1) of the first image (BI1) and a corresponding pixel (PI2) of the pixels (PI2) of the second image (BI2), wherein a displacement vector (VV) between the respective pixel (PI1) of the first image (BI1) and the respective corresponding pixel (PI2) of the second image (BI2) is determined from each of the correspondence pairs;
an assignment stage (4) for assigning the segmentation instance information (SII1) and the set of probability information (SVW1) of at least one pixel (PI1) of the pixels (PI1) of the first image (BI1), on the basis of the displacement vectors (VV), to a respective pixel (PI2) of the pixels (PI2) of the second image (BI2), such that a segmentation instance information (SII1) of the segmentation instance information (SII1) from the first image (BI1) and a set (SVW1) of the sets of probability information (SVW1) from the first image (BI1) are assigned to the respective pixel (PI2) of the second image (BI2);
an object identifier assignment stage (5) for assigning the pixels (PI1) of one of the segmentation instances (SI1) of the first image (BI1) and the pixels (PI2) of one of the segmentation instances (SI2) of the second image (BI2) to a common object identifier (OBK) if the respective segmentation instance (SI1) of the first image (BI1) and the segmentation instance (SI2) of the second image (BI2) depict the same object, wherein the object identifier (OBK) uniquely identifies the respective object across the video sequence, wherein the assignment of the pixels (PI1) of the respective segmentation instance (SI1) of the first image (BI1) is carried out on the basis of the segmentation instance information (SII1) of the respective pixel (PI1), and wherein the assignment of the pixels (PI2) of the respective segmentation instance (SI2) of the second image (BI2) is carried out on the basis of the segmentation instance information (SII2) of the respective pixel (PI2) and of the segmentation instance information (SII1) from the first image (BI1) assigned to the respective pixel (PI2);
a fusion stage (6) for fusing the set of probability information (SVW1) from the first image (BI1) assigned to a respective pixel (PI2) of the pixels (PI2) of the second image (BI2) with the set of probability information (SVW2) of the respective pixel (PI2) of the second image (BI2), so as to determine a fused set of probability information (FSVW2) for each of one or more pixels (PI2) of the second image (BI2); and
an object category determination stage (7) for determining the respective object category (OKT) of the segmentation instances (SI1, SI2) assigned to one of the object identifiers (OBK) on the basis of the fused sets of probability information (FSVW2) of the pixels (PI2) of the second image (BI2) assigned to the respective object identifier (OBK).

Device according to the preceding claim, wherein the segmentation stage (2) is designed such that, when one of the images (BI) is segmented, the plurality of pixels (PI) of the pixels (PI) of the respective image (BI) comprises all pixels (PI) of the respective image (BI) that are not assigned to a background instance.

Device according to any one of the preceding claims, wherein the correspondence determination stage (3) is designed to determine the correspondence pairs and the displacement vectors (VV) on the basis of an optical flow method or a semi-dense matching method.

Device according to any one of the preceding claims, wherein the correspondence determination stage (3) is designed such that, for at least some of the pixels (PI1) of the first image (BI1) that do not belong to any of the correspondence pairs, an interpolated displacement vector (IVV) is determined in each case on the basis of interpolations of the displacement vectors (VV) of those correspondence pairs that belong to the same segmentation instance (SI1) as the respective pixel (PI1) of the first image (BI1).

Device according to any one of the preceding claims, wherein the fusion stage (6) is designed to fuse the respective sets of probability information (SVW) by means of weighted sums.

Device according to any one of the preceding claims, wherein the object identifier assignment stage (5) is designed such that the assignment of the pixels (PI2) of one of the segmentation instances (SI2) of the second image (BI2) is additionally carried out on the basis of the set of probability information (SVW1) from the first image (BI1) assigned to the respective pixel (PI2) of the second image (BI2) and of the set of probability information (SVW2) of the respective pixel (PI2) of the second image (BI2).

Device according to any one of the preceding claims, wherein the segmentation stage (2) is designed to segment a third image (BI3) of the images (BI) which temporally follows the second image (BI2) of the images (BI);
wherein the correspondence determination stage (3) is designed to determine a plurality of further correspondence pairs, each comprising a pixel (PI2) of the pixels (PI2) of the second image (BI2) and a corresponding pixel (PI3) of the pixels (PI3) of the third image (BI3), wherein a further displacement vector (WVV) between the respective pixel (PI2) of the second image (BI2) and the respective corresponding pixel (PI3) of the third image (BI3) is determined from each of the further correspondence pairs;
wherein the assignment stage (4) is designed to assign the segmentation instance information (SII1, SII2) from the first image (BI1) and from the second image (BI2) assigned to a respective pixel (PI2) of the pixels (PI2) of the second image (BI2), on the basis of the further displacement vectors (WVV), to a respective corresponding pixel (PI3) of the pixels (PI3) of the third image (BI3), such that a segmentation instance information (SII1) of the segmentation instance information (SII1) from the first image (BI1) and a segmentation instance information (SII2) of the segmentation instance information (SII2) from the second image (BI2) are assigned to the respective pixel (PI3) of the third image (BI3);
wherein the assignment stage (4) is designed to assign the sets of probability information (SVW1, SVW2) from the first image (BI1) and from the second image (BI2) assigned to a respective pixel (PI2) of the pixels (PI2) of the second image (BI2), on the basis of the further displacement vectors (WVV), to a respective corresponding pixel (PI3) of the pixels (PI3) of the third image (BI3), such that a set (SVW1) of the sets of probability information (SVW1) from the first image (BI1) and a set (SVW2) of the sets of probability information (SVW2) from the second image (BI2) are assigned to the respective pixel (PI3) of the third image (BI3);
wherein the object identifier assignment stage (5) is designed to assign the pixels (PI3) of one of the segmentation instances (SI3) of the third image (BI3) to one of the object identifiers (OBK) if the respective segmentation instance (SI3) of the third image (BI3) depicts the same object as the segmentation instance (SI1) of the first image (BI1) assigned to the respective object identifier (OBK) and/or the segmentation instance (SI2) of the second image (BI2) assigned to the respective object identifier (OBK), wherein the assignment of the pixels (PI3) of the respective segmentation instance (SI3) of the third image (BI3) is carried out on the basis of the segmentation instance information (SII3) of the respective pixel (PI3) and of the segmentation instance information (SII1, SII2) from the first image (BI1) and/or from the second image (BI2) assigned to the respective pixel (PI3);
wherein the fusion stage (6) is designed to fuse the set of probability information (SVW1) from the first image (BI1) assigned to a respective pixel (PI3) of the pixels (PI3) of the third image (BI3), the set of probability information (SVW2) from the second image (BI2) assigned to the respective pixel (PI3) of the third image (BI3), and the set of probability information (SVW3) of the respective pixel (PI3) of the third image (BI3), so as to determine a fused set of probability information (FSVW3) for the respective pixel (PI3) of the third image (BI3); and
wherein the object category determination stage (7) is designed to determine the respective object category (OKT) of the segmentation instances (SI1, SI2, SI3) assigned to one of the object identifiers (OBK) on the basis of the fused sets of probability information (FSVW3) of the pixels (PI3) of the third image (BI3) assigned to the respective object identifier (OBK).

Device according to the preceding claim, wherein the object identifier assignment stage (5) is designed such that the assignment of the pixels (PI3) of one of the segmentation instances (SI3) of the third image (BI3) is additionally carried out on the basis of the sets of probability information (SVW1, SVW2) from the first and the second image (BI1, BI2) assigned to the respective pixel (PI3) of the third image (BI3) and of the set of probability information (SVW3) of the respective pixel (PI3) of the third image (BI3).

Device according to either of claims 7 or 8, wherein the correspondence determination stage (3) is designed such that, for at least some of the pixels (PI2) of the second image (BI2) that do not belong to any of the further correspondence pairs, a further interpolated displacement vector (WIVV) is determined in each case on the basis of interpolations of the further displacement vectors (WVV) of those correspondence pairs that belong to the same segmentation instance (SI2) as the respective pixel (PI2) of the second image (BI2).

System for detecting and tracking objects, the system comprising: a video camera for generating a video sequence of temporally successive images (BI) consisting of pixels (PI); and a device (1) for detecting and tracking objects in the video sequence according to any one of the preceding claims.

A method for detecting and tracking objects in a video sequence of temporally successive images (BI) consisting of pixels (PI), the method comprising the steps of:
segmenting at least a first image (BI1) of the images (BI) and a temporally subsequent second image (BI2) of the images (BI), wherein a plurality of predefined semantic object categories (OKT) is provided, wherein, in the segmentation of one of the images (BI1, BI2), a respective segmentation instance (SI1, SI2), which is valid for the respective image (BI1, BI2) and images one of the objects, is formed for objects imaged in the respective image (BI1, BI2), wherein segmentation instance information (SII1, SII2) and a set of probability information (SVW1, SVW2) are each determined for a plurality of the pixels (PI1, PI2) of the respective image (BI1, BI2), wherein the respective segmentation instance information (SII1, SII2) indicates to which of the segmentation instances (SI1, SI2) the respective pixel (PI1, PI2) belongs, and wherein each item of probability information of the respective set of probability information (SVW1, SVW2) indicates a probability that the respective pixel (PI1, PI2) belongs to one of the object categories (OKT);
determining a plurality of correspondence pairs, each comprising a pixel (PI1) of the pixels (PI1) of one of the segmentation instances (SI1) of the first image (BI1) and a corresponding pixel (PI2) of the pixels (PI2) of the second image (BI2), wherein a displacement vector (VV) between the respective pixel (PI1) of the first image (BI1) and the respective pixel (PI2) of the second image (BI2) is determined from each of the correspondence pairs;
assigning the segmentation instance information (SII1) and the set of probability information (SVW1) of at least one pixel (PI1) of the pixels (PI1) of the first image (BI1), on the basis of the displacement vectors (VV), to a respective pixel (PI2) of the pixels (PI2) of the second image (BI2), such that the respective pixel (PI2) of the second image (BI2) is assigned one item of the segmentation instance information (SII1) from the first image (BI1) and one set (SVW1) of the sets of probability information (SVW1) from the first image (BI1);
assigning the pixels (PI1) of one of the segmentation instances (SI1) of the first image (BI1) and the pixels (PI2) of one of the segmentation instances (SI2) of the second image (BI2) to a common object identifier (OBK) if the respective segmentation instance (SI1) of the first image (BI1) and the segmentation instance (SI2) of the second image (BI2) image the same object, wherein the object identifier (OBK) uniquely identifies the respective object across the video sequence, wherein the assignment of the pixels (PI1) of the segmentation instance (SI1) of the first image (BI1) is performed on the basis of the segmentation instance information (SII1) of the respective pixel (PI1), and the assignment of the pixels (PI2) of the segmentation instance (SI2) of the second image (BI2) is performed on the basis of the segmentation instance information (SII2) of the respective pixel (PI2) and the segmentation instance information (SII1) from the first image (BI1) assigned to the respective pixel (PI2);
fusing the set of probability information (SVW1) from the first image (BI1) that is assigned to a respective pixel (PI2) of the pixels (PI2) of the second image (BI2) with the set of probability information (SVW2) of the respective pixel (PI2) of the second image (BI2), so as to determine a respective fused set of probability information (FSVW2) for one or more of the pixels (PI2) of the second image (BI2); and
determining the respective object category (OKT) of the segmentation instances (SI) which are assigned to one of the object identifiers (OBK), on the basis of the fused sets of probability information (FSVW2) of the pixels (PI2) of the second image (BI2) assigned to the respective object identifier (OBK).
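
The steps of the method claim above can be illustrated with a minimal sketch. It is not the claimed implementation: the claim does not specify how instances are matched, how the probability sets are fused, or how the category is derived from the fused sets. The sketch assumes a majority vote over propagated labels for the matching, an element-wise average for the fusion, and the largest summed probability for the category. All function and variable names (propagate, match_instances, fuse, categories) are hypothetical, and the segmentation results and displacement vectors are supplied as toy dictionaries rather than computed.

```python
from collections import Counter, defaultdict

def propagate(inst1, probs1, flow):
    """Carry instance labels (SII1) and probability sets (SVW1) from the
    first image to the second along the displacement vectors (VV)."""
    warped_inst, warped_probs = {}, {}
    for (x, y), (dx, dy) in flow.items():
        target = (x + dx, y + dy)          # corresponding pixel in image 2
        warped_inst[target] = inst1[(x, y)]
        warped_probs[target] = probs1[(x, y)]
    return warped_inst, warped_probs

def match_instances(inst2, warped_inst):
    """Assumption: an image-2 instance depicts the same object as the
    image-1 instance whose propagated label covers most of its pixels."""
    votes = defaultdict(Counter)
    for px, label2 in inst2.items():
        if px in warped_inst:
            votes[label2][warped_inst[px]] += 1
    return {label2: c.most_common(1)[0][0] for label2, c in votes.items()}

def fuse(probs2, warped_probs):
    """Assumption: fusion is an element-wise average of SVW1 and SVW2;
    pixels without a propagated set keep their own probabilities."""
    fused = {}
    for px, p2 in probs2.items():
        p1 = warped_probs.get(px)
        if p1 is not None:
            fused[px] = [(a + b) / 2 for a, b in zip(p1, p2)]
        else:
            fused[px] = list(p2)
    return fused

def categories(inst2, fused, matches, names):
    """Pick, per object identifier, the category with the largest summed
    fused probability over the object's pixels in image 2."""
    totals = defaultdict(lambda: [0.0] * len(names))
    for px, label2 in inst2.items():
        # The image-1 label doubles as object identifier in this sketch.
        obj = matches.get(label2, ("new", label2))
        for k, p in enumerate(fused[px]):
            totals[obj][k] += p
    return {obj: names[max(range(len(names)), key=t.__getitem__)]
            for obj, t in totals.items()}

# Toy data: one instance "a" in image 1 that moves one pixel to the right.
names = ["car", "truck"]
inst1 = {(0, 0): "a", (1, 0): "a"}
probs1 = {(0, 0): [0.6, 0.4], (1, 0): [0.6, 0.4]}
flow = {(0, 0): (1, 0), (1, 0): (1, 0)}
inst2 = {(1, 0): "x", (2, 0): "x", (3, 0): "x"}
probs2 = {(1, 0): [0.3, 0.7], (2, 0): [0.8, 0.2], (3, 0): [0.5, 0.5]}

warped_inst, warped_probs = propagate(inst1, probs1, flow)
matches = match_instances(inst2, warped_inst)
fused = fuse(probs2, warped_probs)
result = categories(inst2, fused, matches, names)
print(matches, result)
```

In this toy run, pixel (1, 0) of the second image on its own favours "truck", but the probability set carried over from the first image shifts the fused evidence for the whole object towards "car". This is the kind of temporal smoothing the fusion step is aimed at.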

A computer program for carrying out the method according to the preceding claim when executed on a computer or processor.