DE102018126664A1

DE102018126664A1 - DOMAIN ADAPTATION THROUGH CLASS-BALANCED SELF-TRAINING WITH SPATIAL PRIOR

Info

Publication number: DE102018126664A1
Application number: DE102018126664.2A
Authority: DE
Inventors: Yang Zou; Zhiding Yu; Vijayakumar Bhagavatula; Jinsong Wang
Original assignee: GM Global Technology Operations LLC
Current assignee: GM Global Technology Operations LLC
Priority date: 2017-10-27
Filing date: 2018-10-25
Publication date: 2019-05-02

Abstract

A vehicle, system and method for navigating a vehicle. The vehicle and the system include a digital camera for capturing a target image of a target domain of the vehicle, and a processor. The processor is configured to: determine a target segmentation loss for training a neural network to perform semantic segmentation of a target image in a target domain; determine a value of a pseudo-label of the target image by reducing the target segmentation loss while providing supervision of the training over the target domain; perform semantic segmentation of the target image using the trained neural network to segment the target image and classify an object in the target image; and navigate the vehicle based on the classified object in the target image.

Description

INTRODUCTION

The present disclosure relates to a system and method for adapting neural networks that perform semantic segmentation of images captured in a plurality of domains, for autonomous driving and advanced driver assistance systems (ADAS).

In autonomous vehicles and ADAS, one goal is to understand the surrounding environment so that information can be provided to either the driver or the vehicle itself for making appropriate decisions. One way to achieve this goal is to capture digital images of the environment using an onboard digital camera and then identify objects and drivable areas in the digital image using computer vision algorithms. Such identification tasks can be accomplished by semantic segmentation, in which pixels in the digital image are grouped and densely labeled with labels corresponding to a predefined set of semantic classes (such as car, pedestrian, street, building, etc.). A neural network can be trained for semantic segmentation using training images with human-annotated labels. Because of constraints on annotation resources, the training images may cover only a small fraction of locations around the world, may have been taken only under certain weather conditions and at certain times of day, and may have been collected only by certain types of camera. These constraints on the source of the training images make them specific to one domain, the domain of the training images. However, a vehicle commonly operates in a different domain. Since different domains may differ in lighting, street styles, previously unseen objects, etc., a neural network trained in one domain does not always perform well in another domain. Accordingly, it is desirable to provide a method of adapting a neural network trained for semantic segmentation in one domain so that it operates effectively in another domain.
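To make the notion of dense labeling concrete, the following minimal sketch (Python with NumPy, an assumption since the disclosure specifies no implementation) converts a network's per-class scores for each pixel into a label map by picking the most probable class. The class list and the score values are hypothetical stand-ins for a real network output.

```python
import numpy as np

# Hypothetical class list for illustration; the text names car, pedestrian,
# street and building as examples of semantic classes.
CLASSES = ["street", "car", "pedestrian", "building"]


def softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def segment(logits: np.ndarray) -> np.ndarray:
    """Map a (C, H, W) array of per-class scores to an (H, W) label map."""
    probs = softmax(logits, axis=0)  # class probability for every pixel
    return probs.argmax(axis=0)      # each pixel gets its most probable class


# A 2 x 2 "image" with hand-picked scores for the four classes.
scores = np.zeros((len(CLASSES), 2, 2))
scores[0, 0, :] = 5.0  # top row looks like street
scores[1, 1, 0] = 5.0  # bottom-left pixel looks like a car
scores[2, 1, 1] = 5.0  # bottom-right pixel looks like a pedestrian
labels = segment(scores)
print([[CLASSES[i] for i in row] for row in labels])
# prints [['street', 'street'], ['car', 'pedestrian']]
```

Every pixel receives exactly one class, which is what distinguishes dense semantic segmentation from detecting a few objects with bounding boxes.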

SUMMARY

In an exemplary embodiment, a method for navigating a vehicle is disclosed. The method includes determining a target segmentation loss for training a neural network to perform semantic segmentation on a target domain image, determining a value of a pseudo-label of the target image by reducing the target segmentation loss while providing supervision of the training over the target domain, performing semantic segmentation of the target image using the trained neural network to segment the target image and classify an object in the target image, and navigating the vehicle based on the classified object in the target image.

The method further includes determining a source segmentation loss for training the neural network to perform semantic segmentation on a source domain image, and reducing a sum of the source segmentation loss and the target segmentation loss while providing supervision of the training over the target domain. The method may further include reducing the sum by adjusting the parameters of the neural network and the value of the pseudo-label.
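The summary does not spell out the form of the source and target segmentation losses. As a hedged illustration only, the sketch below assumes pixel-wise cross-entropy for both terms, uses the human labels for the source image and the pseudo-labels for the target image, and skips target pixels that carry no pseudo-label (the `IGNORE` marker is hypothetical). Reducing the sum by adjusting both the network parameters and the pseudo-labels is often done by alternating between the two; that alternation is an assumption here, not something the text states.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for target pixels without a pseudo-label


def pixel_cross_entropy(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean negative log-likelihood over the labeled pixels.

    probs:  (C, H, W) softmax output of the network
    labels: (H, W) integer class indices, IGNORE where no label is available
    """
    rows, cols = np.nonzero(labels != IGNORE)
    if rows.size == 0:
        return 0.0
    picked = probs[labels[rows, cols], rows, cols]  # probability of the labeled class
    return float(-np.log(np.clip(picked, 1e-12, 1.0)).mean())


def total_loss(src_probs, src_labels, tgt_probs, tgt_pseudo_labels) -> float:
    """Sum of the source loss (human labels) and the target loss (pseudo-labels)."""
    source = pixel_cross_entropy(src_probs, src_labels)
    target = pixel_cross_entropy(tgt_probs, tgt_pseudo_labels)
    return source + target


# Toy two-class, 1 x 2 images.
src_probs = np.full((2, 1, 2), 0.5)
src_labels = np.array([[0, 1]])
tgt_probs = np.array([[[0.9, 0.5]], [[0.1, 0.5]]])
tgt_pseudo = np.array([[0, IGNORE]])  # second target pixel left unlabeled
loss = total_loss(src_probs, src_labels, tgt_probs, tgt_pseudo)
print(round(loss, 4))  # prints 0.7985
```

Because the unlabeled target pixel is excluded, only the confidently labeled target pixel contributes to the target term in this toy case.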

In various embodiments, determining the value of the pseudo-label of the target image includes reducing the target segmentation loss over a plurality of segmentation classes while providing supervision for each of the plurality of segmentation classes. Determining the target segmentation loss further includes multiplying a distribution of spatial priors for a segmentation class by a class probability that a pixel belongs to the segmentation class. The neural network may be trained by adversarial domain adaptation training and/or self-training domain adaptation training. Providing supervision of the training may include performing class balancing for the target segmentation loss. A smoothing algorithm may be applied when semantically segmenting the target image.
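The paragraph above names the ingredients for choosing pseudo-labels: class probabilities multiplied by a spatial prior, and class balancing of the target loss. The sketch below combines them in one plausible way, loosely following the class-balanced self-training idea named in the title; it is not the disclosure's exact procedure. The `portion` parameter, the quantile-based per-class threshold, and the `IGNORE` marker are assumptions. Normalizing each class by its own threshold is what keeps frequent, easy classes such as street from crowding out rare classes such as pedestrian.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for pixels left without a pseudo-label


def class_thresholds(scores: np.ndarray, portion: float) -> np.ndarray:
    """Per-class threshold chosen so that roughly `portion` of the pixels
    currently predicted as each class are confident enough to keep it."""
    num_classes = scores.shape[0]
    predicted = scores.argmax(axis=0)
    thresholds = np.ones(num_classes)
    for c in range(num_classes):
        conf = scores[c][predicted == c]
        if conf.size:
            thresholds[c] = max(np.quantile(conf, 1.0 - portion), 1e-12)
    return thresholds


def pseudo_labels(probs: np.ndarray, spatial_prior: np.ndarray,
                  portion: float = 0.5) -> np.ndarray:
    """probs, spatial_prior: (C, H, W). Returns an (H, W) pseudo-label map."""
    scores = probs * spatial_prior                   # class probability times spatial prior
    thresholds = class_thresholds(scores, portion)
    normalized = scores / thresholds[:, None, None]  # per-class normalization (class balancing)
    labels = normalized.argmax(axis=0)
    keep = normalized.max(axis=0) >= 1.0             # only pixels at or above their class threshold
    return np.where(keep, labels, IGNORE)


# Two classes on a 1 x 4 image.
probs = np.array([[[0.9, 0.6, 0.2, 0.4]],
                  [[0.1, 0.4, 0.8, 0.6]]])
flat_prior = np.ones_like(probs)
print(pseudo_labels(probs, flat_prior).tolist())   # [[0, -1, 1, -1]]

no_class1 = flat_prior.copy()
no_class1[1] = 0.0                                 # prior rules out class 1 everywhere
print(pseudo_labels(probs, no_class1).tolist())    # [[0, 0, -1, -1]]
```

With a flat prior, each class keeps only its most confident predicted pixel. In the second call the prior rules out class 1, so no pixel receives that pseudo-label.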

In another exemplary embodiment, a navigation system for a vehicle is disclosed. The system includes a digital camera for capturing a target image of a target domain of the vehicle, and a processor. The processor is configured to: determine a target segmentation loss for training a neural network to perform semantic segmentation of the target image in the target domain; determine a value of a pseudo-label of the target image by reducing the target segmentation loss while providing supervision of the training over the target domain; perform semantic segmentation of the target image using the trained neural network to segment the target image and classify objects in the target image; and navigate the vehicle based on the classified object in the target image.

The processor is further configured to determine a source segmentation loss for training the neural network to perform semantic segmentation on a source domain image, and to reduce a sum of the source segmentation loss and the target segmentation loss while providing supervision of the training over the target domain. In one embodiment, the processor is further configured to reduce the sum by adjusting a parameter of the neural network and the value of the pseudo-label. The processor is further configured to determine the value of the pseudo-label of the target image by reducing the target segmentation loss over a plurality of segmentation classes while providing supervision for each of the plurality of segmentation classes. The processor is further configured to multiply a distribution of spatial priors for the segmentation class by a class probability for a pixel in the segmentation class.

In yet another exemplary embodiment, a vehicle is disclosed. The vehicle includes a digital camera for capturing a target image of a target domain of the vehicle, and a processor. The processor is configured to: determine a target segmentation loss for training a neural network to perform semantic segmentation of the target image in the target domain; determine a value of a pseudo-label of the target image by reducing the target segmentation loss while providing supervision of the training over the target domain; perform semantic segmentation of the target image using the trained neural network and the pseudo-label to segment the target image and classify an object in the target image; and control the vehicle based on the classified object in the target image.

The processor is further configured to determine a source segmentation loss for training the neural network to perform semantic segmentation on a source domain image, and to reduce a sum of the source segmentation loss and the target segmentation loss while providing supervision of the training over the target domain.

In one embodiment, the processor is further configured to reduce the sum by adjusting a parameter of the neural network and the value of the pseudo-label. The processor is further configured to determine the value of the pseudo-label of the target image by reducing the target segmentation loss over a plurality of segmentation classes while providing supervision for each of the plurality of segmentation classes. The processor is further configured to multiply a distribution of spatial priors for a segmentation class by a class probability of a pixel in the segmentation class to determine the target segmentation loss. The processor is further configured to apply a smoothing algorithm to the semantic segmentation of the target image.
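The smoothing algorithm is not specified. One simple candidate, swapped in here purely for illustration, is a majority (mode) filter that relabels each pixel with the most common class in its neighborhood, which removes isolated misclassified pixels. The function name and the `radius` parameter are hypothetical.

```python
import numpy as np


def majority_smooth(labels: np.ndarray, num_classes: int, radius: int = 1) -> np.ndarray:
    """Replace each pixel's class by the most frequent class in its
    (2*radius+1) x (2*radius+1) neighborhood; borders are edge-padded.
    Ties resolve to the lowest class index (np.argmax behavior)."""
    h, w = labels.shape
    padded = np.pad(labels, radius, mode="edge")
    counts = np.zeros((num_classes, h, w), dtype=int)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            window = padded[dy:dy + h, dx:dx + w]  # neighbor at offset (dy - radius, dx - radius)
            for c in range(num_classes):
                counts[c] += window == c            # vote for class c
    return counts.argmax(axis=0)


# An isolated "pedestrian" pixel (class 1) inside a street region (class 0).
noisy = np.zeros((3, 3), dtype=int)
noisy[1, 1] = 1
print(majority_smooth(noisy, num_classes=2).tolist())  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

A majority filter keeps straight class boundaries intact while erasing speckle. A real system might instead use a conditional random field or a larger kernel; the disclosure does not say which.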

The above features and advantages, as well as other features and functions of the present disclosure, will become readily apparent from the following detailed description when taken in conjunction with the accompanying drawings.

LIST OF FIGURES

Other features, advantages and details appear, by way of example only, in the following detailed description of the embodiments, which refers to the drawings, in which:

FIG. 1 illustrates a trajectory planning system associated with a vehicle according to various embodiments;
FIG. 2 shows an illustrative digital image obtained from an onboard digital camera of the vehicle and a semantically segmented image corresponding to the digital image;
FIG. 3 schematically illustrates methods for training and operating a neural network;
FIGS. 4A and 4B show different spatial priors obtained during training of the neural network in the source domain;
FIG. 5 shows an illustrative digital image obtained in a target domain for semantic segmentation;
FIG. 6 shows a semantic segmentation image of the digital image without adaptation of the neural network; and
FIG. 7 shows a semantic segmentation image after adaptation of the neural network.

DETAILED DESCRIPTION

The following description is merely exemplary in nature and is not intended to limit the present disclosure, its application, or uses. It should be understood that throughout the drawings, corresponding reference numerals indicate like or corresponding parts and features.

According to an exemplary embodiment, FIG. 1 shows an illustrative trajectory planning system, shown generally at 100, that is associated with a vehicle 10 according to various embodiments. In general, the system 100 determines a trajectory plan for automated driving. As shown in FIG. 1, the vehicle 10 generally includes a chassis 12, a body 14, front wheels 16, and rear wheels 18. The body 14 is arranged on the chassis 12 and substantially encloses the other components of the vehicle 10. The body 14 and the chassis 12 may jointly form a frame. The wheels 16-18 are each rotatably coupled to the chassis 12 near a respective corner of the body 14.

In various embodiments, the vehicle 10 is an autonomous vehicle, and the trajectory planning system 100 is incorporated into the autonomous vehicle 10 (hereinafter referred to as the autonomous vehicle 10). The autonomous vehicle 10 is, for example, a vehicle that is automatically controlled to carry passengers from one location to another. The vehicle 10 is depicted in the illustrated embodiment as a passenger car, but it should be appreciated that any other vehicle, including motorcycles, trucks, sport utility vehicles (SUVs), recreational vehicles (RVs), marine vessels, aircraft, etc., can also be used. In an exemplary embodiment, the autonomous vehicle 10 is a so-called Level Four or Level Five automation system. A Level Four system indicates "high automation", referring to the driving-mode-specific performance by an automated driving system of all aspects of the dynamic driving task, even if a human driver does not respond appropriately to a request to intervene. A Level Five system indicates "full automation", referring to the full-time performance by an automated driving system of all aspects of the dynamic driving task under all roadway and environmental conditions that can be managed by a human driver.

As shown, the autonomous vehicle 10 generally includes a propulsion system 20, a transmission system 22, a steering system 24, a brake system 26, a sensor system 28, an actuator system 30, at least one data storage device 32, at least one controller 34, and a communication system 36. The propulsion system 20 may, in various embodiments, include an internal combustion engine, an electric machine such as a traction motor, and/or a fuel cell propulsion system. The transmission system 22 is configured to transmit power from the propulsion system 20 to the vehicle wheels 16-18 according to selectable speed ratios. According to various embodiments, the transmission system 22 may include a step-ratio automatic transmission, a continuously variable transmission, or another suitable transmission. The brake system 26 is configured to provide braking torque to the vehicle wheels 16-18. The brake system 26 may, in various embodiments, include friction brakes, brake-by-wire, a regenerative braking system such as an electric machine, and/or other suitable braking systems. The steering system 24 influences the position of the vehicle wheels 16-18. While depicted as including a steering wheel for illustrative purposes, in some embodiments contemplated within the scope of the present disclosure the steering system 24 may not include a steering wheel.

The sensor system 28 includes one or more sensing devices 40a-40n that sense observable conditions of the exterior environment and/or the interior environment of the autonomous vehicle 10. The sensing devices 40a-40n can include, but are not limited to, radars, lidars, global positioning systems, optical cameras, digital cameras, thermal cameras, ultrasonic sensors, and/or other sensors. The actuator system 30 includes one or more actuator devices 42a-42n that control one or more vehicle features such as, but not limited to, the propulsion system 20, the transmission system 22, the steering system 24, and the brake system 26. In various embodiments, the vehicle features can further include interior and/or exterior vehicle features such as, but not limited to, doors, a trunk, and cabin features such as air, music, lighting, etc. (not numbered).

The data storage device 32 stores data for use in automatically controlling the autonomous vehicle 10. In various embodiments, the data storage device 32 stores defined maps of the navigable environment. In various embodiments, the defined maps may be predefined by and obtained from a remote system. For example, the defined maps may be assembled by the remote system and communicated to the autonomous vehicle 10 (wirelessly and/or in a wired manner) and stored in the data storage device 32. The data storage device 32 further stores data and parameters for operating a neural network for semantic segmentation of digital images. As discussed herein, these data may include adaptation methods, distributions of spatial priors for features, and other data. As can be appreciated, the data storage device 32 may be part of the controller 34, separate from the controller 34, or part of the controller 34 and part of a separate system.
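The spatial priors held in the data storage device 32 are described only as distributions obtained during training in the source domain (FIGS. 4A and 4B). A minimal sketch, assuming that the prior for each class is the frequency with which that class appears at each pixel position in the source-domain ground-truth label maps, normalized over all positions, could look as follows. The normalization choice and the additive smoothing term `eps` are assumptions; the function name is hypothetical.

```python
import numpy as np


def estimate_spatial_prior(label_maps, num_classes: int, eps: float = 1.0) -> np.ndarray:
    """Estimate a (C, H, W) spatial prior from source-domain label maps.

    counts[c, y, x] is how often class c occurs at pixel (y, x); each class map
    is then normalized to sum to 1 over all pixel positions. `eps` is additive
    smoothing so that classes never seen at a position keep a small, nonzero prior.
    """
    shape = label_maps[0].shape
    counts = np.full((num_classes,) + shape, eps, dtype=float)
    for labels in label_maps:
        for c in range(num_classes):
            counts[c] += labels == c  # one count per pixel where class c occurs
    return counts / counts.sum(axis=(1, 2), keepdims=True)


# Two tiny source label maps: class 0 ("sky") on top, class 1 ("road") below.
source_maps = [np.array([[0, 0], [1, 1]]), np.array([[0, 0], [1, 1]])]
prior = estimate_spatial_prior(source_maps, num_classes=2, eps=0.0)
print(prior[0].tolist())  # [[0.5, 0.5], [0.0, 0.0]]
print(prior[1].tolist())  # [[0.0, 0.0], [0.5, 0.5]]
```

The result captures the kind of regularity such priors are meant to encode: road tends to appear in the lower part of a forward-facing camera image, sky in the upper part.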

The controller 34 includes at least one processor 44 and a computer-readable storage device or media 46. The processor 44 may be any custom-made or commercially available processor, a central processing unit (CPU), a graphics processing unit (GPU) among several processors associated with the controller 34, a semiconductor-based microprocessor (in the form of a microchip or chip set), a macroprocessor, any combination thereof, or generally any device for executing instructions. The computer-readable storage device or media 46 may include volatile and non-volatile storage in read-only memory (ROM), random-access memory (RAM), and keep-alive memory (KAM). KAM is a persistent or non-volatile memory that may be used to store various operating variables while the processor 44 is powered down. The computer-readable storage device or media 46 may be implemented using any of a number of known memory devices such as PROMs (programmable read-only memory), EPROMs (erasable PROM), EEPROMs (electrically erasable PROM), flash memory, or any other electric, magnetic, optical, or combination memory devices capable of storing data, some of which represent executable instructions used by the controller 34 in controlling the autonomous vehicle 10.

The instructions may include one or more separate programs, each of which comprises an ordered listing of executable instructions for implementing logical functions. When executed by the processor 44, the instructions receive and process signals from the sensor system 28. They perform logic, calculations, methods, and/or algorithms for automatically controlling the components of the autonomous vehicle 10. They also generate control signals for the actuator system 30 to automatically control those components based on the logic, calculations, methods, and/or algorithms.

Although only one controller 34 is shown in FIG. 1, embodiments of the autonomous vehicle 10 may include any number of controllers 34. These controllers communicate over any suitable communication medium or combination of communication media and cooperate to:

- process the sensor signals;
- perform logic, calculations, methods, and/or algorithms; and
- generate control signals to automatically control functions of the autonomous vehicle 10.

In various embodiments, one or more instructions of the controller 34 are embodied in the trajectory planning system 100. When executed by the processor 44, these instructions generate a trajectory output that accounts for the kinematic and dynamic constraints of the environment. For example, the instructions receive as input a digital image of the environment from an onboard digital camera. They then operate a neural network on the processor 44 to perform semantic segmentation of the digital image, in order to classify and identify objects in the field of view of the digital camera. The instructions may further perform a method of adapting the neural network to images captured in different domains or at different locations. Adaptation methods may include smoothing operations and the use of spatial prior distributions determined during a training sequence for the neural network. The controller 34 further controls the actuator system 30 and/or the actuators 42a-42c to navigate the vehicle with respect to the identified objects.

The communication system 36 is configured to communicate information wirelessly to and from other entities 48 (described in more detail with regard to FIG. 2), such as, but not limited to:

- other vehicles ("V2V" communication);
- infrastructure ("V2I" communication);
- remote systems; and/or
- personal devices.

In an exemplary embodiment, the wireless communication system 36 is configured to communicate via a wireless local area network (WLAN) using the IEEE 802.11 standard, via Bluetooth, or via mobile data communication. However, additional or alternative communication methods, such as a dedicated short-range communications (DSRC) channel, are also considered within the scope of the present disclosure. DSRC channels are one-way or two-way short-range to medium-range wireless communication channels designed specifically for automotive use, together with a corresponding set of protocols and standards.

FIG. 2 shows an illustrative digital image 200 obtained from an onboard digital camera of the vehicle 10, as well as a segmented image 220 corresponding to the digital image 200. In various embodiments, the image 220 may be segmented by an operator or generated by a processor performing semantic segmentation. Semantic segmentation separates, delineates, or classifies the pixels of the digital image into classes that represent different objects. This lets the processor recognize these objects and their positions in the field of view of the digital camera. Semantic segmentation maps each pixel, based on its color and its relationship to other pixels, to a class such as cars 204, road 206, sidewalk 208, or sky 210. Additional pixel classes may include, but are not limited to, void, fence, terrain, truck, road, pole, sky, bus, sidewalk, traffic light, person, train, building, traffic sign, rider, motorcycle, wall, vegetation, car, and bicycle.
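The Python sketch below illustrates only the final labeling step. Each pixel is assigned the class with the highest network score, which is the same as the argmax of the softmax probabilities. The four-entry `CLASS_NAMES` list and the nested-list score layout are hypothetical. The sketch does not model how the network derives its scores from color and neighboring pixels.

```python
# Hypothetical subset of the class list above; the index order is an assumption.
CLASS_NAMES = ["road", "sidewalk", "car", "sky"]


def segment(score_map):
    """Map a H x W grid of per-class score vectors to a H x W grid of class names.

    The argmax of raw scores equals the argmax of their softmax, so no
    normalization is needed just to pick the label.
    """
    return [
        [CLASS_NAMES[max(range(len(s)), key=s.__getitem__)] for s in row]
        for row in score_map
    ]


# A 2 x 2 toy image: top row looks like sky, bottom row like road and a car.
scores = [
    [[0.1, 0.0, 0.2, 3.0], [0.0, 0.1, 0.3, 2.5]],
    [[2.0, 0.5, 0.1, 0.0], [0.2, 0.1, 1.8, 0.0]],
]
print(segment(scores))  # [['sky', 'sky'], ['road', 'car']]
```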

FIG. 3 schematically illustrates methods for training and operating a neural network. Mathematically, a neural network can be viewed as a complex nonlinear function with three parts:

- an image to be segmented serves as the input;
- the label maps predicted by the network serve as the output; and
- the network parameters serve as the coefficients that characterize the function.

With the network parameters initialized to selected values, the neural network 306 is trained in a first domain, also referred to herein as a source domain 302. The neural network 306 is presented with one or more images (also referred to herein as "source images" 304) together with manually annotated ground-truth labeled images 320 for the source domain 302. The ground-truth labeled images 320 represent direct observations of the environment against which the neural network 306 can be trained. The neural network 306 performs a prediction, or semantic segmentation, of the one or more source images 304 to obtain segmented, network-predicted labeled images 308.

To train the neural network, the network-predicted labeled images 308 are compared with the ground-truth labeled images 320. A loss function quantitatively measures how much the network-predicted labeled images 308 differ from the ground-truth labeled images 320. The training process iteratively updates the parameters of the neural network 306 so that the loss decreases and the network-predicted labeled images 308 increasingly agree with the ground-truth labeled images 320.

The trained neural network 306 is then provided to a second domain, also referred to herein as a target domain 312. The neural network 306 performs semantic segmentation of target images 314 from the target domain 312 to obtain segmented labeled images 318. The source domain 302 and the target domain 312 may differ, for example in lighting, in geography, or in urban versus rural settings. Because of such differences, the neural network 306 does not necessarily perform as well in the target domain 312 as in the source domain 302 in which it was trained. The neural network 306 therefore uses various adaptation methods 316 in the target domain 312 to improve the quality of the segmented labeled images 318 in the target domain 312.

The neural network 306 is first trained by feeding source images 304, together with the ground truth 320 from the source domain 302, into the neural network 306. Training adjusts one or more parameters w of the neural network to minimize a loss function that represents the source-domain segmentation loss. This is the loss incurred in the segmentation process, measured between the network-predicted labeled image 308 and the ground-truth labeled image 320.

A segmentation loss is defined as the product of a ground-truth pixel label with the logarithm of a predicted class probability. The domain segmentation loss is the sum of these products over every class, every pixel, and every image of the source domain. An exemplary segmentation loss function is shown in Eq. (1):

$\min_{w} \Big\{ -\sum_{s=1}^{S} \sum_{n=1}^{N} y_{s,n}^{T} \log\big(p_{n}(w, I_{s})\big) \Big\}$ (1)

where:

- w denotes the parameters of the neural network;
- I_s is the source image 304;
- p_n(w, I_s) is the class probability predicted by the neural network for the n-th pixel of the source image 304, i.e., the probability that the n-th pixel belongs to a selected class; and
- y_{s,n} is the pixel label, a column vector for the n-th pixel.

The pixel label y_{s,n} is generally a one-hot vector that identifies the class of the n-th pixel. Because probabilities lie between 0 and 1, the logarithm of a predicted class probability is negative (or zero). The summations are therefore multiplied by -1 before minimization.
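Eq. (1) and the iterative parameter update can be sketched in plain Python under simplifying assumptions:

- each pixel is a small hypothetical feature vector;
- the "network" is a single linear layer `W` followed by a softmax; and
- the one-hot labels y_{s,n} are stored as class indices, so y_{s,n}^T log p_n reduces to log p_n[c].

`segmentation_loss` evaluates Eq. (1). `train_step` takes one full-batch gradient-descent step using the standard softmax cross-entropy gradient (p - y) x^T. This is an illustration of loss minimization, not the network of the disclosure.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def predict(W, x):
    """Per-pixel class probabilities p_n from a linear layer plus softmax."""
    return softmax([sum(w_cd * x_d for w_cd, x_d in zip(row, x)) for row in W])


def segmentation_loss(W, images, labels):
    """Eq. (1): -sum_s sum_n y_{s,n}^T log p_n(w, I_s), one-hot y as class index."""
    total = 0.0
    for pixels, pixel_labels in zip(images, labels):
        for x, c in zip(pixels, pixel_labels):
            total -= math.log(predict(W, x)[c])
    return total


def train_step(W, images, labels, lr=0.1):
    """One gradient-descent step on Eq. (1); dLoss/dW[k] = sum (p_k - y_k) * x."""
    C, D = len(W), len(W[0])
    grad = [[0.0] * D for _ in range(C)]
    for pixels, pixel_labels in zip(images, labels):
        for x, c in zip(pixels, pixel_labels):
            p = predict(W, x)
            for k in range(C):
                g = p[k] - (1.0 if k == c else 0.0)
                for d in range(D):
                    grad[k][d] += g * x[d]
    return [[W[k][d] - lr * grad[k][d] for d in range(D)] for k in range(C)]


# Two tiny "source images" of two pixels each, with two classes.
images = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.2], [0.1, 1.0]]]
labels = [[0, 1], [0, 1]]
W = [[0.0, 0.0], [0.0, 0.0]]
print(segmentation_loss(W, images, labels))  # 4 * log(2) with all-zero weights
for _ in range(50):
    W = train_step(W, images, labels)
print(segmentation_loss(W, images, labels))  # smaller after training
```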

The network is then trained adversarially on both the source images 304 and the target images 314 to improve how well the neural network 306 predicts on the target images 314. Adversarial domain training is formulated as the following optimization problem, where λ_A weights the adversarial term:

$L_{total} = L_{seg} - \lambda_{A} L_{A}$ (2)

where

$L_{A} = \max_{w_{F}} \min_{\theta} \Big\{ \sum_{s=1}^{S} \sum_{n=1}^{N_{S}} \log\big(p_{n}(w_{F}, \theta, I_{s})\big) - \sum_{t=1}^{T} \sum_{n=1}^{N_{S}} \log\big(1 - p_{n}(w_{F}, \theta, I_{t})\big) \Big\}$ (3)

$L_{seg} = \max_{w_{F}, w_{S}} \Big\{ \sum_{s=1}^{S} \sum_{n=1}^{N_{S}} y_{s,n}^{T} \log\big(p_{n}(w_{F}, w_{S}, I_{s})\big) \Big\}$ (4)

The symbols are defined as follows:

- p_n(w_F, θ, I_s/t) is the probability that the n-th pixel of an image I_s/t is predicted to come from the source domain. The subscript s/t indicates whether the image comes from the source or the target domain.
- The indices range over t ∈ {1, 2, ..., T} and n ∈ {1, 2, ..., N}.
- θ denotes the parameters of the domain-discriminating network. This network is built on the neural network parameter w_F, which corresponds to the feature-generation network.
- w_S is the parameter of the neural network corresponding to the segmentation network.
- Together, w_F and w_S form the segmentation network.

Equations (2)-(4) above can be solved by the following iterative procedure:

1. Train a domain discriminator to distinguish source-domain features from target-domain features, by solving the inner minimization problem of Eq. (3) with stochastic gradient descent.
2. Train the feature extraction network w_F and the segmentation network w_S by solving the outer maximization of Eq. (3) combined with Eq. (4).
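The two-step procedure can be illustrated on a hypothetical one-dimensional toy problem in plain Python. The toy makes these simplifications:

- The discriminator is a logistic model D(z) = sigmoid(a*z + c) with θ = (a, c). It is trained with the conventional binary cross-entropy, which is assumed here to correspond to the inner problem of Eq. (3).
- The "feature extractor" is reduced to a single learnable offset b added to target inputs. This is closer to a separate target encoder than to the shared w_F of the text.
- The segmentation loss of Eq. (4) is omitted.

Each round takes one gradient step for the discriminator. It then takes one gradient step that moves the target features toward being classified as source.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def mean(values):
    return sum(values) / len(values)


def adversarial_round(a, c, b, xs, xt, lr=0.1):
    """One round of the alternating procedure on a hypothetical 1-D toy.

    a, c : discriminator parameters (theta); D(z) = sigmoid(a*z + c) is the
           predicted probability that feature z comes from the source domain.
    b    : hypothetical feature-extractor parameter; target features are x + b.
    """
    zs = xs
    zt = [x + b for x in xt]

    # Step 1: discriminator update, one gradient-descent step on the binary
    # cross-entropy  -mean(log D(z_s)) - mean(log(1 - D(z_t))).
    ga = mean([-(1 - sigmoid(a * z + c)) * z for z in zs]) + mean([sigmoid(a * z + c) * z for z in zt])
    gc = mean([-(1 - sigmoid(a * z + c)) for z in zs]) + mean([sigmoid(a * z + c) for z in zt])
    a, c = a - lr * ga, c - lr * gc

    # Step 2: feature-extractor update against the refreshed discriminator,
    # one gradient-descent step on -mean(log D(z_t)) so target features look source-like.
    gb = mean([-(1 - sigmoid(a * (x + b) + c)) * a for x in xt])
    b = b - lr * gb
    return a, c, b


xs = [1.5, 2.0, 2.5]   # source features
xt = [-0.5, 0.0, 0.5]  # target inputs before the learned offset
a = c = b = 0.0
for _ in range(5):
    a, c, b = adversarial_round(a, c, b, xs, xt)
print(a, b)  # both positive: D learned "larger z = source", and b moved targets toward the source
```

Starting from an untrained discriminator, the offset b turns positive after the first round, which shifts the target features toward the source features. A full implementation would instead train deep networks on image batches and would include the segmentation term of Eq. (4).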

Once the neural network parameter w has been determined by adversarial domain training, self-training-based domain adaptation is used to adapt the network further to the target domain. This method performs semantic segmentation of target images from the target domain. Domain adaptation methods tailor the neural network to the target domain and thereby improve its effectiveness there.

Like adversarial domain training, self-training-based domain adaptation improves the network's effectiveness in the target domain by incorporating target images into multiple rounds or iterations of network training, without requiring human-annotated ground truths. Unlike adversarial domain training, however, it adopts a loss-minimization framework similar to traditional network training in Eq. (1), with no adversarial step.

Because ground truths for the target domain are not available, self-training-based domain adaptation generates network predictions on the target images. It then incorporates the most confident predictions into network training as approximate target ground truths, referred to herein as pseudo-labels. Once the network parameters have been updated, the updated network regenerates the pseudo-labels on the target images and incorporates them into another round of network training. This process is repeated iteratively for several rounds. Mathematically, each round of pseudo-label generation and network training can be formulated as minimizing the loss function shown in Eq. (2).
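The self-training loop described above can be sketched on a hypothetical one-dimensional toy problem:

- a nearest-centroid classifier stands in for the segmentation network;
- "training" means recomputing the class means; and
- a fixed global confidence threshold (0.95, an assumed value) decides which target predictions become pseudo-labels.

Each round regenerates the pseudo-labels with the current model and then retrains on the source data plus the confident target predictions. The class-balanced thresholds k_c discussed in this document would replace the single global threshold.

```python
import math


def fit_centroids(xs, ys, num_classes=2):
    """'Training' for the toy model: one mean per class over labeled samples."""
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for x, y in zip(xs, ys):
        sums[y] += x
        counts[y] += 1
    return [s / n for s, n in zip(sums, counts)]


def predict_proba(centroids, x):
    """Softmax over negative squared distances to each class centroid."""
    scores = [-(x - c) ** 2 for c in centroids]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(centroids, x):
    probs = predict_proba(centroids, x)
    return max(range(len(probs)), key=probs.__getitem__)


def accuracy(centroids, xs, ys):
    return sum(predict(centroids, x) == y for x, y in zip(xs, ys)) / len(xs)


def self_train(src_x, src_y, tgt_x, threshold=0.95, rounds=5):
    centroids = fit_centroids(src_x, src_y)  # round 0: source data only
    for _ in range(rounds):
        pseudo_x, pseudo_y = [], []
        for x in tgt_x:  # regenerate pseudo-labels with the current model
            probs = predict_proba(centroids, x)
            c = max(range(len(probs)), key=probs.__getitem__)
            if probs[c] >= threshold:  # keep only confident predictions
                pseudo_x.append(x)
                pseudo_y.append(c)
        # retrain on source labels plus target pseudo-labels
        centroids = fit_centroids(src_x + pseudo_x, src_y + pseudo_y)
    return centroids


src_x = [-0.5, 0.0, 0.5, 3.5, 4.0, 4.5]
src_y = [0, 0, 0, 1, 1, 1]
tgt_x = [1.0, 1.5, 2.1, 2.3, 5.0, 5.5, 6.0]  # shifted target domain
tgt_y = [0, 0, 0, 0, 1, 1, 1]                # used only for evaluation
print(accuracy(fit_centroids(src_x, src_y), tgt_x, tgt_y))
print(accuracy(self_train(src_x, src_y, tgt_x), tgt_x, tgt_y))
```

On this toy data, the source-only model misclassifies the two target samples near the shifted boundary (x = 2.1 and x = 2.3). Over a few rounds, the confidently pseudo-labeled target samples pull the class-0 centroid toward the target domain, and all seven target samples end up classified correctly.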

Once the neural network parameter w has been determined, it is used for semantic segmentation of target images from the target domain. Domain adaptation methods adapt the neural network to the target domain, thereby improving its effectiveness there. To perform domain adaptation in the target domain, a second loss function is minimized. This loss describes the sum of a segmentation loss in the source domain and a segmentation loss in the target domain. A representative loss function for the domain adaptation method is shown in Eq. (5):

$\min_{\hat{y}, w} \; -\Big[ \sum_{s=1}^{S} \sum_{n=1}^{N} y_{s,n}^{T} \log\big(p_{n}(w, I_{s})\big) + \sum_{t=1}^{T} \sum_{n=1}^{N} \Big( \sum_{c=1}^{C} \hat{y}_{t,n}^{(c)} \log\big(p_{n}(c \mid w, I_{t})\big) + \sum_{c=1}^{C} k_{c} \hat{y}_{t,n}^{(c)} \Big) \Big]$ (5)

subject to:

$\hat{y}_{t,n} \in \{ e \mid e \in \mathbb{R}^{C},\ e \text{ one-hot} \} \cup \{ \mathbf{0} \}$

$k_{c} > 0, \; \forall c$

In other words, each pseudo-label vector is either a one-hot vector over the C classes or the zero vector, in which case the pixel is left unlabeled. Here I_t is the target image in the target domain and p_n is the predicted class probability. The term p_n(c | w, I_t) is the probability, as determined by the neural network with parameters w, that the n-th pixel of the target image I_t belongs to class c.

The segmentation loss in the source domain is represented by the first term, with summations over S and N. The segmentation loss in the target domain is represented by the second term, with summations over T, N, and C. The class index c first appears in the second term, i.e., the target-domain term. In that term, the logarithm of the predicted class probability is multiplied by the pseudo-label $\hat{y}_{t,n}^{(c)}$, a single value for the n-th pixel in class c.

The pseudo-label $\hat{y}_{t,n}^{(c)}$ is a variable of the loss function that is adjusted to minimize Eq. (5). Once the pseudo-labels have been determined, the target images can be incorporated into network training. This is done by minimizing Eq. (5) with respect to the network parameters w while holding the pseudo-labels fixed; the pseudo-labels can then be used in computations for the semantic segmentation of the target image.
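For fixed w, Eq. (5) separates over target pixels. Assume that the k_c term is applied per pixel and that each pseudo-label vector is either one-hot or all-zero. Then labeling pixel n of image t as class c contributes -(log p_n(c | w, I_t) + k_c) to the loss, while leaving the pixel unlabeled contributes 0. The minimizer therefore:

1. picks the class with the largest log p_n(c | w, I_t) + k_c; and
2. keeps that class only if the value is positive, i.e., only if p_n(c | w, I_t) > exp(-k_c).

A minimal sketch of this per-pixel rule, using a hypothetical `pseudo_label` helper:

```python
import math


def pseudo_label(probs, k, eps=1e-12):
    """Solve Eq. (5) over one pixel's pseudo-label with w fixed (hypothetical helper).

    probs : predicted class probabilities p_n(c | w, I_t) for one target pixel
    k     : per-class parameters k_c > 0
    Returns the selected class index, or None when the zero vector (unlabeled) is optimal.
    """
    # Selecting class c changes the loss by -(log p_c + k_c); selecting nothing changes it by 0.
    scores = [math.log(max(p, eps)) + kc for p, kc in zip(probs, k)]
    c = max(range(len(scores)), key=scores.__getitem__)
    return c if scores[c] > 0 else None


print(pseudo_label([0.7, 0.2, 0.1], [0.5, 0.5, 0.5]))     # 0: 0.7 > exp(-0.5)
print(pseudo_label([0.4, 0.35, 0.25], [0.5, 0.5, 0.5]))   # None: no class clears its threshold
print(pseudo_label([0.4, 0.35, 0.25], [0.5, 1.2, 0.5]))   # 1: larger k_1 lowers class 1's threshold
```

The last call shows the class-dependent effect. Raising k_c for class 1 lowers its probability threshold exp(-k_c) to about 0.30, so that pixel is pseudo-labeled as class 1 even though class 0 has the higher raw probability.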

The third term, $\sum_{c=1}^{C} k_{c} \hat{y}_{t,n}^{(c)}$, is a constraint term that prevents the loss function from reaching a trivial minimum. Without it, the target term would be minimized trivially by setting every pseudo-label to zero, because each product $-\hat{y}_{t,n}^{(c)} \log p_{n}(c \mid w, I_{t})$ is non-negative. Minimizing the loss function of Eq. (5) therefore involves finding a local minimum of the loss function rather than an absolute minimum.

The parameter k_c is a threshold that supervises training in the target domain by controlling how strict the pseudo-label generation process is for class c. In particular, supervision refers to setting a value of k_c for each class so as to impose a constraint on that class. This yields a class-balanced framework for training the neural network, for example in self-training-based domain adaptation.

The values can be chosen to keep large classes (classes that contain a large proportion of the pixels) from overwhelming small classes (classes that contain few pixels), and thus to keep the small classes from being subsumed into larger ones. For example, large classes may include sky, road, and buildings, while small classes may include stop signs and telephone poles.

In a framework in which the neural network trains itself, k_c can be set as follows:

1. Count how often each class occurs in images from the source domain.
2. For each class, find the threshold at which the proportion of pixels whose predicted probability for that class exceeds the threshold equals that class's frequency in the source domain.
3. Use this threshold to set the parameter k_c.

Choosing a different value of k_c for each class supervises the training of the neural network in the target domain by discouraging the classes from changing in size when the target images are segmented.
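One way to read the frequency-matching rule above is sketched as a hypothetical helper. For each class c, it sorts the predicted probabilities p_n(c) in descending order. It then takes as the threshold the value that leaves, strictly above it, a fraction of pixels equal to the source-domain frequency of c. Two points are assumptions not stated in the text:

- the probabilities come from the network's predictions on target images; and
- the threshold is converted to k_c = -log(threshold), so that the per-pixel selection rule p_n(c) > exp(-k_c) reproduces that threshold.

```python
import math


def class_balanced_k(pixel_probs, class_freq, eps=1e-12):
    """Per-class k_c from a frequency-matching rule (hypothetical helper).

    pixel_probs : list of per-pixel class-probability vectors predicted by the network
    class_freq  : class_freq[c] = fraction of source-domain pixels labeled c
    Returns k_c = -log(threshold_c), so a pixel qualifies for class c when p_c > exp(-k_c).
    """
    n = len(pixel_probs)
    ks = []
    for c, freq in enumerate(class_freq):
        vals = sorted((p[c] for p in pixel_probs), reverse=True)
        # number of pixels that should lie strictly above the threshold (rounded)
        m = min(round(freq * n), n - 1)
        # the (m+1)-th largest value: with no ties, exactly m values exceed it
        threshold = max(vals[m], eps)
        ks.append(-math.log(threshold))
    return ks


probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]
k = class_balanced_k(probs, [0.5, 0.25])
print([math.exp(-kc) for kc in k])  # thresholds of roughly 0.6 and 0.4
```

With these four pixels, two of them (50%) have p_0 above 0.6 and one (25%) has p_1 above 0.4, which matches the assumed source frequencies of the two classes.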

In another aspect, the methods disclosed herein use distributions of spatial priors to reduce the sum of the segmentation loss in the target domain and the segmentation loss in the source domain. Despite the differences between source and target domains, a given feature or object typically occurs at the same or similar locations in digital images, regardless of the domain. For example, the sky often occupies the upper part of the image, while the road and sidewalk usually remain in the lower part. The probability distribution of these features over an image may be provided as a scalar field, referred to herein as a distribution of spatial priors. Distributions of spatial priors are generally determined from images in the source domain while training the neural network and are then stored on the storage medium for use in the target domain. When the neural network segments the target image, the distribution of spatial priors may be used together with the target image to improve the class probabilities in the target domain.
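A minimal sketch of how such a distribution could be estimated, assuming source ground-truth label maps that all have the same size: count how often each class occurs at each pixel position, then normalize each class map so that it sums to $1/C$ over the pixels, matching the constraint $\sum_n q_n^{(c)} = 1/C$ stated with Eq. (8). The function name `spatial_priors` and the treatment of ignore labels are illustrative assumptions; the disclosure does not specify an estimation procedure.

```python
import numpy as np

def spatial_priors(source_labels, num_classes):
    """Hypothetical helper: estimate q[c, h, w] from source-domain label maps.

    source_labels: (S, H, W) integer ground-truth maps of identical size.
    Label values outside [0, num_classes), e.g. an ignore value of 255, are not counted.
    """
    # counts[c, h, w] = number of source images whose pixel (h, w) is labeled c
    counts = np.stack(
        [(source_labels == c).sum(axis=0) for c in range(num_classes)]
    ).astype(np.float64)
    totals = counts.sum(axis=(1, 2), keepdims=True)
    # Normalize each class map to sum to 1/C; classes never seen stay all-zero.
    return np.divide(counts, totals * num_classes,
                     out=np.zeros_like(counts), where=totals > 0)

labels = np.array([[[0, 0, 1], [1, 1, 255]],
                   [[0, 1, 1], [255, 1, 1]]])
q = spatial_priors(labels, num_classes=3)
```

Here class 0 appears twice at the top-left pixel and three times in total, so that position receives $2/(3 \cdot 3) = 2/9$ of the class-0 mass.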

FIGS. 4A and 4B show various spatial priors obtained while training the neural network in the source domain. The figures illustrate the spatial priors of 19 different classes. In the top row of FIG. 4A, from left to right, the classes are road 401, sidewalk 402, building 403, and wall 404. In the second row, from left to right, the classes are fence 405, pole 406, traffic light 407, and traffic sign 408. In the third row, from left to right, the classes are vegetation 409, terrain 410, sky 411, and person 412. Continuing in the top row of FIG. 4B, from left to right, the classes are rider 413, car 414, truck 415, and bus 416. In the second row of FIG. 4B, from left to right, the classes are train 417, motorcycle 418, and bicycle 419. Each distribution of spatial priors is displayed for a digital image that is 2000 pixels wide and 1000 pixels high, although in various embodiments the digital image may have other dimensions or aspect ratios. Bright areas of a distribution of spatial priors indicate positions where the feature has a high probability of occurring; dark areas indicate positions where it has a low probability of occurring. The gray scale that maps gray levels to probabilities is shown to the right of each spatial prior.

For example, the spatial prior for the sidewalk 402 indicates that sidewalks tend to appear near the bottom or the sides of the image. The spatial prior for the sky 411 indicates that the sky tends to appear near the top and center of the image. The spatial priors for buildings 403 and for vegetation 409 indicate that buildings and vegetation tend to extend along the top of the images.

In one embodiment, the distributions of spatial priors may be incorporated into the cost function to provide a further term that refines the semantic segmentation process in the target domain. In various embodiments, the spatial prior $q_n^{(c)}$ for class $c$ at pixel $n$ is multiplied by the predicted class probability $p_n(c \mid w, I_t)$, and the target segmentation loss is determined from this product. An exemplary loss function that includes distributions of spatial priors is given in Eq. (8):

$$
\min_{\hat{y},\,w}\; -\left[\sum_{s=1}^{S}\sum_{n=1}^{N} y_{s,n}^{T}\log\big(p_n(w, I_s)\big) + \sum_{t=1}^{T}\sum_{n=1}^{N}\left(\sum_{c=1}^{C} \hat{y}_{t,n}^{(c)}\log\big(p_n(c \mid w, I_t)\, q_n^{(c)}\big) + \sum_{c=1}^{C} k_c\, \hat{y}_{t,n}^{(c)}\right)\right]
$$

such that:

$$
\hat{y}_{t,n} \in \{\mathbf{e} \mid \mathbf{e} \in \mathbb{R}^{C}\} \cup \{\mathbf{0}\}, \qquad \sum_{n} q_n^{(c)} = \frac{1}{C}, \qquad k_c > 0 \;\; \forall c
$$

where $\mathbf{e}$ denotes a one-hot vector, so each pseudo-label either selects exactly one class or is the zero vector.
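For a fixed network $w$, the minimization over $\hat{y}$ in Eq. (8) separates per pixel: assigning class $c$ contributes $-\big(\log(p_n(c \mid w, I_t)\, q_n^{(c)}) + k_c\big)$, while the zero vector contributes 0. A pixel therefore receives the class with the largest score $\log(p\,q) + k_c$ if that score is positive, and no pseudo-label otherwise. The sketch below implements this rule with NumPy; the function name `solve_pseudo_labels`, the `(C, H, W)` layout, and the ignore value `-1` are illustrative assumptions rather than part of the disclosure.

```python
import numpy as np

def solve_pseudo_labels(probs, priors, k, ignore=-1):
    """Hypothetical helper: pseudo-label step of Eq. (8) for a fixed network.

    probs:  (C, H, W) predicted probabilities p_n(c | w, I_t) for one target image.
    priors: (C, H, W) spatial priors q_n^(c).
    k:      (C,) class-wise parameters k_c > 0.
    Returns an (H, W) label map; `ignore` marks pixels whose pseudo-label is the zero vector.
    """
    with np.errstate(divide="ignore"):  # log(0) -> -inf where the prior is zero
        score = np.log(probs * priors) + k[:, None, None]
    best = score.argmax(axis=0)
    # Assign the best class only if that lowers the loss below 0 (the zero-vector cost).
    return np.where(score.max(axis=0) > 0, best, ignore)

probs = np.array([[[0.9, 0.4]],
                  [[0.1, 0.6]]])
priors = np.full_like(probs, 0.5)
print(solve_pseudo_labels(probs, priors, np.array([1.0, 1.0])))  # -> [[ 0 -1]]
print(solve_pseudo_labels(probs, priors, np.array([1.0, 2.0])))  # -> [[0 1]]
```

Raising $k_c$ for a class lowers the bar $e^{-k_c}$ that $p\,q$ must clear, so the second pixel is pseudo-labeled as class 1 only after $k_1$ is increased; this is how the class-wise parameters control how many pixels of each class receive pseudo-labels.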

In another aspect, the smoothness found in segmentations in the source domain may be used to achieve smoothness in segmentation images in the target domain. Pixels that have similar features and are grouped into the same class in the source domain should likewise be grouped together in the target domain.

FIG. 5 shows an illustrative digital image 500 obtained in a target domain for semantic segmentation. Image 500 includes several feature classes, such as the sky 502, a vehicle 504, the road 506, and the hood 508.

FIG. 6 shows a semantic segmentation image 600 of digital image 500 produced without domain adaptation. The sky class 602 occupies significantly less of segmentation image 600 than the sky 502 occupies in digital image 500. In addition, the vehicle 504 of digital image 500 is represented in segmentation image 600 by two different feature classes, labeled 604a and 604b. The road class 606 occupies only part of segmentation image 600, whereas the corresponding road 506 extends from the left side to the right side of digital image 500. Furthermore, the hood class 608 in segmentation image 600 appears much larger than the corresponding hood 508 in digital image 500.

FIG. 7 shows a semantic segmentation image 700 obtained after adaptation of the neural network (e.g., using Eq. (5)). The class features of image 700 match the features of original image 500 better than those of image 600 do. In particular, the sky 702 represents the sky 502 of image 500 more closely than the sky 602 of image 600 does. The vehicle 504 is represented in image 700 by a single class 704. The road class 706 occupies much more of image 700, just as the road 506 does in image 500. In addition, the hood class 708 is reduced to better match the size of the hood 508 in image 500.

While the above disclosure has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from its scope. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the disclosure without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiments disclosed, but that it will include all embodiments falling within the scope of the application.

Claims

A method of navigating a vehicle, comprising: determining a target segmentation loss to train a neural network to perform a semantic segmentation on a target image of a target domain; determining a value of a pseudo-label of the target image by reducing the target segmentation loss while monitoring the training over the target domain; performing a semantic segmentation of the target image using the trained neural network to segment the target image and classify an object in the target image; and navigating the vehicle based on the classified object in the target image.

The method according to Claim 1, further comprising determining a source segmentation loss to train the neural network to perform a semantic segmentation on a source domain image, and reducing a summation of the source segmentation loss and the target segmentation loss while providing monitoring of the training over the target domain, wherein reducing the summation comprises adjusting parameters of the neural network and the value of the pseudo-label.

The method according to Claim 1, further comprising determining the value of the pseudo-label of the target image by reducing the target segmentation loss over a plurality of segmentation classes while providing the monitoring for each of the plurality of segmentation classes.

The method according to Claim 1, wherein determining the target segmentation loss further comprises multiplying a distribution of spatial priors for a segmentation class by a class probability of a pixel in the segmentation class.

The method according to Claim 1, further comprising training the neural network using one of the following: (i) adversarial domain adaptation training; and (ii) self-training domain adaptation training.

The method according to Claim 1, wherein monitoring the training further comprises performing class balancing for the target segmentation loss.

A navigation system for a vehicle, comprising: a digital camera for acquiring a target image of a target domain of the vehicle; and a processor configured to: determine a target segmentation loss to train a neural network to perform a semantic segmentation of the target image in the target domain; determine a value of a pseudo-label of the target image by reducing the target segmentation loss while monitoring the training over the target domain; perform a semantic segmentation of the target image using the trained neural network to segment the target image and classify an object in the target image; and navigate the vehicle based on the classified object in the target image.

The navigation system according to Claim 7, wherein the processor is further configured to determine a source segmentation loss to train the neural network to perform a semantic segmentation on a source domain image, and to reduce a summation of the source segmentation loss and the target segmentation loss while providing monitoring of the training over the target domain, wherein reducing the summation comprises adjusting a parameter of the neural network and the value of the pseudo-label.

The navigation system according to Claim 7, wherein the processor is further configured to determine the value of the pseudo-label of the target image by reducing the target segmentation loss over a plurality of segmentation classes while providing the monitoring for each of the plurality of segmentation classes.

The navigation system according to Claim 7, wherein the processor is further configured to multiply a distribution of spatial priors for a segmentation class by a class probability of a pixel in the segmentation class.