DE102017201603A1

DE102017201603A1 - Method and control unit for determining the position of a facial point

Info

Publication number: DE102017201603A1
Application number: DE102017201603.5A
Authority: DE
Inventors: Cornelia Denk; Felix Klanner; Ralph Helmar Rasshofer; Chamara Liyanaarachchi Lekamalage; Yan Yang; Guangbin Huang
Original assignee: Bayerische Motoren Werke AG; Nanyang Technological University
Current assignee: Bayerische Motoren Werke AG
Priority date: 2017-02-01
Filing date: 2017-02-01
Publication date: 2018-08-02

Abstract

A method (300) for determining the position of a facial point (122) on the basis of image data (112) comprising an image (203) of a head (120) is described. The method (300) comprises determining (301) a plurality of blocks (201) of the image (203), wherein a block (201) comprises a subregion of the image (203). The method (300) further comprises mapping (302) the plurality of blocks (201) to a corresponding plurality of feature values (211) of K different feature maps (210) by means of K different convolution weight vectors $\beta_{\mathrm{conv},k}$ (202), with k = 1, ..., K and K > 1. The K different convolution weight vectors $\beta_{\mathrm{conv},k}$ (202) were determined by means of a machine-learning method on the basis of training data. The method further comprises determining (303) the position of at least one facial point (122) on the basis of the K different feature maps (210).

Description

The invention relates to a method and a corresponding control unit for detecting facial points and for determining the position of facial points, in particular in order to determine the state of a driver and/or the head pose of a driver of a vehicle.

In a vehicle, functions can be provided as a function of the state of a driver of the vehicle and/or as a function of the head pose or gaze direction of the driver of the vehicle. For example, an at least partially autonomous driving function can be activated automatically when it is detected that the driver of a vehicle has turned his gaze away from traffic.

The present document addresses the technical problem of providing a method and a control unit by means of which facial points of a person can be detected and localized in an efficient, reliable and robust manner, in particular in order to determine the state and/or the head pose of a driver of a vehicle.

The object is achieved by the features of the independent claims. Advantageous embodiments are described, inter alia, in the dependent claims. It is pointed out that additional features of a claim dependent on an independent claim may, without the features of the independent claim or in combination with only a subset of the features of the independent claim, form a separate invention that is independent of the combination of all features of the independent claim and that can be made the subject of an independent claim, a divisional application or a subsequent application. The same applies to technical teachings described in the description, which may form an invention independent of the features of the independent claims.

According to one aspect, a method for determining the position of at least one facial point on the basis of image data is described, the image data comprising an image of a head. Typically, the positions of several facial points (e.g. 20, 30 or more facial points) are determined. The position can indicate the (Cartesian) coordinates of a facial point within the image. On the basis of the positions of several facial points, the head pose of a person, in particular of a driver of a vehicle, can be determined.

The method comprises determining a plurality of blocks of the image, a block comprising a subregion of the image. An image may comprise, e.g., 100×100 or more pixels, whereas a block may comprise, e.g., 3×3 or 5×5 pixels. The blocks may partially overlap.

The method further comprises mapping the plurality of blocks to a corresponding plurality of feature values of K different feature maps by means of K different convolution weight vectors $\beta_{\mathrm{conv},k}$, with k = 1, ..., K and K > 1. K different convolution weight vectors $\beta_{\mathrm{conv},k}$ can thus be provided in order to analyze an image in different ways and thereby provide K different feature maps that describe different aspects of the image.

In particular, the K different convolution weight vectors $\beta_{\mathrm{conv},k}$ can be designed to assign a block to K different image structure types, or to determine whether a block exhibits one of K different image structure types. The different image structure types may comprise, e.g., one or more edges and one or more curves (e.g. each with different orientations). The different feature maps (for the different image structure types) can then indicate the positions of different image structures within an image.

The K different convolution weight vectors $\beta_{\mathrm{conv},k}$ may have been determined by means of a machine-learning method on the basis of training data. In particular, the K convolution weight vectors $\beta_{\mathrm{conv},k}$ may be weight vectors of K artificial neural networks, where the image may represent an input layer and the K feature maps may represent the output layers of the K neural networks. Furthermore, the artificial neural networks may have one or more hidden intermediate layers. In particular, the convolution weight vectors $\beta_{\mathrm{conv},k}$ may comprise output weights of an Extreme Learning Machine (ELM). Furthermore, the machine-learning method for determining the convolution weight vectors $\beta_{\mathrm{conv},k}$ may comprise an Extreme Learning Machine auto-encoder (ELM AE).

The Extreme Learning Machine (ELM) is described, e.g., in Z. Bai, G.-B. Huang, L. L. C. Kasun and C. M. Vong, "Local Receptive Fields Based Extreme Learning Machine", IEEE Computational Intelligence Magazine, vol. 10, no. 2, 2015. The Extreme Learning Machine auto-encoder (ELM AE) is described, e.g., in L. L. C. Kasun, H. Zhou, G.-B. Huang and C. M. Vong, "Representational Learning with Extreme Learning Machine for Big Data", IEEE Intelligent Systems, vol. 28, no. 6, pp. 31-34, 2013. These documents are incorporated herein by reference.

The method further comprises determining the position of at least one facial point on the basis of the K different feature maps. By using different convolution weight vectors $\beta_{\mathrm{conv},k}$ to analyze different subregions of an image, the position of a facial point can be determined in an efficient and precise manner.

The convolution weight vectors may in particular have been determined by the following formula: $\beta_{\mathrm{conv}} = \left(\frac{I}{C} + H^{T} H\right)^{-1} H^{T} T$

Here, the matrix $\beta_{\mathrm{conv}} = [\beta_{\mathrm{conv},1}, \dots, \beta_{\mathrm{conv},K}]$ is the collection of the K different convolution weight vectors $\beta_{\mathrm{conv},k}$; I is an L × L identity matrix; C is a regularization term; and T is a matrix with training blocks $x_n$, with n = 1, ..., N, of a plurality of training images. N training blocks can thus be provided in order to train the convolution weight vectors.

Furthermore, $H = [h(x_1), \dots, h(x_N)]^{T}$, where $h(x_n)$ are the L output values of L neurons of a hidden layer of neurons for the training block $x_n$. L may be, e.g., the number of pixels in a training block $x_n$.

Furthermore, it may hold that $h(x_{n}) = [h_{1}(x_{n}), \dots, h_{L}(x_{n})]^{T}$

where $h_l(x_n) = g(a_l \cdot x_n + b_l)$, with l = 1, ..., L, is an activation function of the l-th neuron of the hidden layer, and where $a_l$ are random weights and $b_l$ is a random threshold. This enables efficient and precise training of the convolution weight vectors. The convolution weight vectors may have been determined using the above formulas in advance of determining the position of at least one facial point.
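
By way of illustration only, the training formula above can be sketched in plain Python. The sketch assumes a sigmoid for the activation function g, uniformly random $a_l$ and $b_l$ in [-1, 1], and training blocks that have already been flattened into vectors; none of these choices is specified in this document. The names `solve` and `train_conv_weights` are hypothetical helpers introduced for the example.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is an n x n list of lists, b is an n x m list of lists.
    """
    n = len(a)
    # Work on an augmented copy so the inputs stay untouched.
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def train_conv_weights(blocks, num_hidden, c, seed=0):
    """Return beta = (I/C + H^T H)^-1 H^T T for flattened training blocks.

    The target matrix T is the matrix of training blocks itself, as in the
    description above. Sigmoid activation and uniform weights are assumptions.
    """
    rng = random.Random(seed)
    dim = len(blocks[0])
    # Random input weights a_l and thresholds b_l; these are never trained.
    a = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(num_hidden)]
    # H has one row h(x_n) per training block.
    h = [[1 / (1 + math.exp(-(sum(w * v for w, v in zip(a[l], x)) + b[l])))
          for l in range(num_hidden)] for x in blocks]
    # Left-hand side I/C + H^T H (L x L) and right-hand side H^T T (L x dim).
    lhs = [[sum(row[i] * row[j] for row in h) + (1 / c if i == j else 0.0)
            for j in range(num_hidden)] for i in range(num_hidden)]
    rhs = [[sum(h[n][i] * blocks[n][j] for n in range(len(blocks)))
            for j in range(dim)] for i in range(num_hidden)]
    return solve(lhs, rhs)
```

The linear system is solved directly rather than by forming the inverse explicitly, which gives the same result for this formula. The returned matrix has one row per hidden neuron and one column per block pixel; how individual convolution weight vectors are read out of it is not fixed by the sketch.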

Mapping a block x by means of a convolution weight vector $\beta_{\mathrm{conv},k}$ may comprise $c_{k,i}(x) = \sum_{l=1}^{L} x_{l} \, \beta_{\mathrm{conv},k,l}$

where $c_{k,i}(x)$ is the i-th feature value of the k-th feature map; where $x_l$ is an l-th pixel of the block x; and where $\beta_{\mathrm{conv},k,l}$ is an l-th convolution weight of the convolution weight vector $\beta_{\mathrm{conv},k}$.

Determining the position of at least one facial point may comprise, for each of the K different feature maps, determining a corresponding summary map, each having a plurality of summary values. A summary value of the k-th summary map can be determined on the basis of a norm, in particular an L2 norm, of a feature block of feature values (e.g. with 2×2 feature values) of the k-th feature map. In this way, the method can be made translation-invariant.
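
The summary step described above can be sketched as follows. This is a minimal example that assumes non-overlapping feature blocks (the window is moved by its own size); the document does not state whether feature blocks overlap. The name `l2_pool` is hypothetical.

```python
import math


def l2_pool(feature_map, size=2):
    """Summarize each size x size feature block of a 2-D map by its L2 norm."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    # Assumption: windows do not overlap, so the step equals the window size.
    for i in range(0, rows - size + 1, size):
        pooled.append([
            math.sqrt(sum(feature_map[i + m][j + n] ** 2
                          for m in range(size) for n in range(size)))
            for j in range(0, cols - size + 1, size)
        ])
    return pooled
```

Because the L2 norm of a feature block does not depend on where inside the block a response occurs, a small shift of an image structure within the block leaves the summary value unchanged.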

Determining the position of at least one facial point may comprise, for each of the K different feature maps, determining a connection vector with a number of connection values that is reduced compared with the number of feature values or the number of summary values. The connection values of a k-th connection vector can be determined using connection weight vectors $\beta_{\mathrm{full},k}$, with k = 1, ..., K. The K different connection weight vectors $\beta_{\mathrm{full},k}$ may have been determined by means of a machine-learning method, in particular by means of an ELM AE, on the basis of the training data. In this way, relationships between the different image structures can be determined.

Furthermore, determining the position of at least one facial point may comprise determining a position vector on the basis of the K connection vectors. The K connection vectors can be converted into the position vector by a common output weight vector $\beta_{\mathrm{output}}$. The output weight vector $\beta_{\mathrm{output}}$ may have been determined by means of a machine-learning method on the basis of the training data and on the basis of labels for the training data, the labels indicating actual positions of facial points in the training data. By means of the output weight vector $\beta_{\mathrm{output}}$, the image structures of the individual feature maps or connection vectors can be combined in order to detect facial points and determine their position in a robust manner.
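
As a rough sketch of this last step, the K connection vectors can be concatenated and passed through a single linear output layer. The linear form, the matrix layout of the output weights and the ordering of the result as consecutive (x, y) pairs are all assumptions made for the example; the name `position_vector` is hypothetical.

```python
def position_vector(connection_vectors, beta_output):
    """Map K connection vectors to facial-point coordinates with one linear layer."""
    # Concatenate the K connection vectors into one feature vector.
    features = [v for vec in connection_vectors for v in vec]
    # Assumed layout: one row per feature, one column per output coordinate.
    num_outputs = len(beta_output[0])
    flat = [sum(f * row[o] for f, row in zip(features, beta_output))
            for o in range(num_outputs)]
    # Assumed ordering: consecutive (x, y) pairs, one pair per facial point.
    return [(flat[p], flat[p + 1]) for p in range(0, num_outputs, 2)]
```

Using one shared set of output weights for all K connection vectors is what allows structures found in different feature maps to contribute jointly to each coordinate.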

According to a further aspect, a control unit for determining the position of a facial point on the basis of image data is described, the image data comprising an image of a head. The control unit is configured to determine a plurality of blocks of the image, a block comprising a subregion of the image. The control unit is further configured to map the plurality of blocks to a corresponding plurality of feature values of K different feature maps by means of K different convolution weight vectors $\beta_{\mathrm{conv},k}$, with k = 1, ..., K and K > 1 (where K is an integer). The K different convolution weight vectors $\beta_{\mathrm{conv},k}$ were determined by means of a machine-learning method on the basis of training data. The control unit is further configured to determine the position of at least one facial point on the basis of the K different feature maps.

According to a further aspect, a vehicle (in particular a road motor vehicle, e.g. a passenger car, a truck or a motorcycle) is described which comprises the control unit described in this document (e.g. in order to determine the head pose of a driver of the vehicle).

According to a further aspect, a software (SW) program is described. The SW program can be configured to be executed on a processor and thereby to carry out the method described in this document.

According to a further aspect, a storage medium is described. The storage medium may comprise a SW program which is configured to be executed on a processor and thereby to carry out the method described in this document.

It should be noted that the methods, devices and systems described in this document can be used both alone and in combination with other methods, devices and systems described in this document. Furthermore, any aspects of the methods, devices and systems described in this document can be combined with one another in a variety of ways. In particular, the features of the claims can be combined with one another in a variety of ways.

The invention is described in more detail below with reference to exemplary embodiments, in which:

Fig. 1a shows exemplary components of a vehicle;
Fig. 1b shows exemplary facial points of a face of a driver of a vehicle;
Fig. 2 shows an exemplary architecture of an algorithm for determining the position of facial points; and
Fig. 3 shows a flowchart of an exemplary method for determining the position of a facial point.

As set out at the outset, the present document is concerned with the efficient and reliable determination of the position of one or more facial points of a person, in particular of a driver of a vehicle, on the basis of image data. In this context, Fig. 1a shows exemplary components of a vehicle 100. The vehicle 100 comprises an image sensor 101, e.g. a camera, which is configured to capture image data 112 relating to the head of an occupant, in particular of the driver, of the vehicle 100.

The image data 112 may represent an image of the head 120 of the occupant of the vehicle 100 (see Fig. 1b).

The vehicle 100 comprises a control unit 101 which is configured to analyze the image data 112. In particular, the control unit 101 can be configured to determine, on the basis of the image data 112, the position of one or more facial points 122 on the head 120, in particular on the face, of the occupant. The facial points 122 can be part of a distinctive region 121 of the face 120, e.g. part of an eye, a nose, a mouth, an eyebrow, etc. In particular, the positions of several facial points 122 can be determined (as shown, e.g., in Fig. 1b). In this way, a model of the head 120 of the occupant can be created, e.g. in order to determine the state of the occupant and/or the pose of the head 120. The control unit 101 can then control one or more actuators 103 of the vehicle 100, e.g. as a function of the pose of the head 120, in order to provide a driver assistance system.

Various image analysis algorithms can be used to determine the position of a facial point 122. This document describes an algorithm that uses trained artificial neural networks to map the image data 112 relating to the head 120 of a person to an output vector that indicates the position of one or more facial points 122. Fig. 2 illustrates the algorithm 200 described in this document.

The image data 112 represent an image 203 with a plurality of pixels 204, in particular with a matrix of pixels 204, by which the head 120 of a person is depicted. The image 203 can be divided into a plurality of blocks $x_j$ 201. The blocks 201 have, for example, a size of r × r, with r = 10, 5 or 3. The blocks 201 can overlap at least partially. For example, the blocks 201 can be formed by generating a new block 201, starting from one corner of the image 203 (e.g. top left), by shifting to the right by one pixel 204 at a time and/or by shifting downward by one pixel 204 at a time. For an image 203 with dimensions d × d, this yields (d - r + 1) × (d - r + 1) blocks 201.
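
The block formation described above, with a shift of one pixel in each direction, can be illustrated by a short sketch. The image is assumed to be a square list of rows; the name `extract_blocks` is hypothetical.

```python
def extract_blocks(image, r):
    """Return all overlapping r x r blocks of a square d x d image (shift of 1)."""
    d = len(image)
    blocks = []
    # Slide the top-left corner of the block over every valid position.
    for i in range(d - r + 1):
        for j in range(d - r + 1):
            blocks.append([row[j:j + r] for row in image[i:i + r]])
    return blocks
```

For d = 4 and r = 3 this produces (4 - 3 + 1) × (4 - 3 + 1) = 4 blocks, in agreement with the count given above.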

A block 201 can be mapped onto a feature map 210 by a convolution weight vector β_conv_k 202. A feature map 210 comprises a feature value c_i,j(x) 211 for each of the plurality of blocks 201, so that a feature map 210 with a plurality of feature values c_i,j(x) 211 can be provided. The feature values 211 can be determined from the corresponding blocks 201 by convolution with the filter formed by the convolution weight vector β_conv_k 202, as

$$c_{i,j}(x) = \sum_{m=1}^{r} \sum_{n=1}^{r} x_{i+m-1,\,j+n-1} \cdot \beta_{\mathrm{conv}_k,\,m,n}, \qquad i, j = 1, \dots, (d - r + 1)$$

where x_i+m-1,j+n-1 are the pixels 204 of a block 201 and β_conv_k,m,n are the convolution weights of the k-th convolution weight vector. The above formula considers the rows i and the columns j of a block. Alternatively, the blocks 201 can be represented as one-dimensional vectors (by concatenating the individual columns).
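The double sum above can be evaluated directly, as in the following sketch. It uses zero-based indices, a random toy image and random weights (all assumptions for illustration), and it also checks the alternative one-dimensional representation in which a block and the weights are flattened column by column. Note that the formula multiplies the weights with the block without flipping them.

```python
import numpy as np

def feature_map(image: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """c[i, j] = sum over m, n of image[i + m, j + n] * weights[m, n] (0-based)."""
    d = image.shape[0]
    r = weights.shape[0]
    n = d - r + 1
    c = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            c[i, j] = np.sum(image[i:i + r, j:j + r] * weights)
    return c

rng = np.random.default_rng(0)
image = rng.random((8, 8))              # hypothetical 8 x 8 image
weights = rng.standard_normal((3, 3))   # hypothetical 3 x 3 convolution weights
c = feature_map(image, weights)

# Alternative view: block and weights as one-dimensional vectors,
# built by concatenating the columns (Fortran order).
block_vec = image[2:5, 4:7].flatten(order="F")
weight_vec = weights.flatten(order="F")
print(np.isclose(c[2, 4], block_vec @ weight_vec))
```

Both representations give the same feature value because they sum the same pixel-weight products in a different order.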

K different convolution weight vectors 202, β_conv_1 to β_conv_K, can be used to determine K different feature maps 210 from the plurality of blocks 201 of an image 203.

Determining different feature maps 210 makes it possible to detect different distinctive structures, such as edges or curves. The feature values 211 of the different feature maps 210 can thus indicate different structures within the original image 203. For example, a first feature map 210 can indicate structures of a first structure type (e.g. edges in a first direction) and a second feature map 210 can indicate structures of a second structure type (e.g. edges in a second direction).
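A small worked example may make this concrete. It assumes K = 2 hand-picked gradient filters (not the learned weight vectors of the application) and a synthetic image with a single vertical edge; the helper `feature_maps` is hypothetical.

```python
import numpy as np

def feature_maps(image: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Apply K filters of shape (K, r, r); return K feature maps of shape (K, n, n)."""
    k, r, _ = filters.shape
    n = image.shape[0] - r + 1
    maps = np.empty((k, n, n))
    for i in range(n):
        for j in range(n):
            block = image[i:i + r, j:j + r]
            # Broadcasting applies every filter to the same block.
            maps[:, i, j] = np.sum(filters * block, axis=(1, 2))
    return maps

image = np.zeros((6, 6))
image[:, 3:] = 1.0  # dark left half, bright right half: one vertical edge

horizontal_gradient = np.array([[-1.0, 0.0, 1.0]] * 3)  # responds to vertical edges
vertical_gradient = horizontal_gradient.T                # responds to horizontal edges
maps = feature_maps(image, np.stack([horizontal_gradient, vertical_gradient]))

print(maps[0])  # non-zero only for blocks that straddle the vertical edge
print(maps[1])  # all zeros: the image contains no horizontal edge
```

Here the first feature map marks where the vertical edge lies, while the second stays empty, which is the sense in which each map indicates one structure type.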

Applying different convolution weight vectors 202 to subregions, i.e. to blocks 201, of an image 203 makes it possible to take into account efficiently the fact that the correlation between neighboring pixels 204 of an image 203 is typically relatively high, while the correlation between pixels 204 that are further apart is typically relatively low.

The convolution weight vectors 202 can be selected at random, e.g. by means of a particular probability distribution. However, the quality of the detection of facial points 122 can be increased substantially if the convolution weight vectors 202 are learned on the basis of training data. For this purpose, the so-called Extreme Learning Machine (ELM) Auto Encoder (AE) can advantageously be used, which is described, for example, in L. L. C. Kasun, H. Zhou, G.-B. Huang, and C. M. Vong, "Representational Learning with Extreme Learning Machine for Big Data," IEEE Intelligent Systems, vol. 28, no. 6, pp. 31-34, 2013. In particular, equation (6) of that document can be used to determine convolution weight vectors 202 on the basis of training data.
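The following sketch shows one way such weights could be learned, using the regularized least-squares expression that also appears in the claims, β = (I/C + HᵀH)⁻¹ HᵀT. Several choices here are assumptions and not taken from the application: tanh as the activation function g, Gaussian random input weights and thresholds, the random training blocks, and the function name `elm_ae_weights`. Further details of the cited ELM-AE paper are not reproduced.

```python
import numpy as np

def elm_ae_weights(T: np.ndarray, n_hidden: int, C: float = 1.0, seed: int = 0) -> np.ndarray:
    """Solve beta = (I/C + H^T H)^(-1) H^T T for flattened training blocks T (N x pixels)."""
    rng = np.random.default_rng(seed)
    n_inputs = T.shape[1]
    A = rng.standard_normal((n_inputs, n_hidden))  # random weights a_l (assumed Gaussian)
    b = rng.standard_normal(n_hidden)              # random thresholds b_l (assumed Gaussian)
    H = np.tanh(T @ A + b)                         # hidden-layer outputs; g = tanh is an assumption
    gram = np.eye(n_hidden) / C + H.T @ H
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(gram, H.T @ T)          # shape: n_hidden x n_inputs

rng = np.random.default_rng(1)
r = 3
T = rng.random((200, r * r))                       # 200 hypothetical flattened training blocks
beta = elm_ae_weights(T, n_hidden=r * r, C=10.0)   # hidden size = pixels per block, as in the claims
print(beta.shape)  # (9, 9)
```

How the rows or columns of the resulting matrix are assigned to the K convolution weight vectors is not spelled out in this passage, so the sketch stops at the matrix itself.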

The algorithm shown in Fig. 2 comprises a summarizing (pooling) step in which several feature values 211 of a feature map 210 are combined into one summary value 221. In particular, feature blocks 214 of e × e feature values 211 can be formed (e.g. e = 3 or 2), with one summary value 221 being determined from the feature values 211 of each feature block 214. The feature blocks 214 may overlap at least partially. A summary value 221 can be determined from the feature values 211 of a feature block 214 by means of a norm, e.g. an L1 or L2 norm, for example as

$$h_{p,q,k} = \sqrt{\sum_{i=p-e/2}^{p+e/2} \sum_{j=q-e/2}^{q+e/2} c_{i,j,k}^{2}}$$

where h_p,q,k is the summary value 221 at position p, q of the k-th summary map 220. This yields K summary maps 220 with summary values 221. Determining the summary maps 220 allows the algorithm 200 to be made scale- and translation-invariant.
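The L2-norm summarizing step can be sketched for a single feature map as follows. The sketch assumes overlapping e × e feature blocks that are indexed by their top-left corner and lie completely inside the map; the formula above centers the window on (p, q) instead, and the handling of the map border is not specified in the text. The function name `l2_pool` is hypothetical.

```python
import numpy as np

def l2_pool(c: np.ndarray, e: int) -> np.ndarray:
    """Summarize each e x e feature block of feature map c by its L2 norm."""
    n = c.shape[0] - e + 1
    h = np.empty((n, n))
    for p in range(n):
        for q in range(n):
            h[p, q] = np.sqrt(np.sum(c[p:p + e, q:q + e] ** 2))
    return h

c = np.array([[3.0, 4.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 5.0]])
h = l2_pool(c, e=2)
print(h)  # [[5. 4.]
          #  [0. 5.]]
```

For example, the top-left summary value is sqrt(3² + 4² + 0² + 0²) = 5.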

In a connection step, values 221, 211 of the individual summary maps 220 or feature maps 210 can be connected with one another. For this purpose, connection weight vectors P_full_k 222 can be determined for each map 220, 210. The connection weight vectors P_full_k can advantageously be determined by means of the above-mentioned ELM-AE method on the basis of maps 220, 210 that were determined from training data with a plurality of images 203. The connection weight vectors 222 can map the values of a map 220, 210 onto a reduced number of connection values 231 of a connection vector 230. This yields K connection vectors 230 for the K maps 220, 210.

Finally, the K connection vectors 230 can be converted by the output weights of an output weight vector β_output 232 into a position vector 240 that indicates the positions of one or more facial points 122. The output weights can be learned by means of linear regression or by means of the above-mentioned ELM-AE method on the basis of training data, using labels that indicate the actual positions of the one or more facial points 122 in the training data.
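The linear-regression variant of this last step can be sketched with ordinary least squares. Everything concrete in the example is an assumption for illustration: the K connection vectors of each training image are taken to be concatenated into one feature row, the sizes (50 images, 6 connection values, 2 facial points) are arbitrary, and the labels are synthetic and noise-free.

```python
import numpy as np

def fit_output_weights(features: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Least-squares output weights that map connection values to labelled positions."""
    beta, *_ = np.linalg.lstsq(features, labels, rcond=None)
    return beta

rng = np.random.default_rng(0)
features = rng.standard_normal((50, 6))  # one row of concatenated connection values per image
true_beta = rng.standard_normal((6, 4))  # 2 facial points -> 4 coordinates (x1, y1, x2, y2)
labels = features @ true_beta            # synthetic labels for the training images

beta_output = fit_output_weights(features, labels)
position_vector = features[:1] @ beta_output  # predicted coordinates for the first image
print(position_vector.shape)  # (1, 4)
```

With noise-free synthetic labels the fitted weights reproduce `true_beta`; with real labelled images the fit would only be approximate.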

The algorithm shown in Fig. 2 makes it possible to determine the position (e.g. the x/y coordinates) of one or more facial points 122 of a person efficiently and precisely on the basis of image data 112. The head pose of a person can thus be determined reliably from the position of facial points 122 (even when the lighting conditions change).

Fig. 3 shows a flowchart of an exemplary method 300 for determining the position (in particular the coordinates) of a facial point 122 on the basis of image data 112, the image data 112 comprising an image 203 of a head 120 of a person. In particular, the image data 112 can comprise the image 203 of a head 120 of a driver of a vehicle 100. From the position of one or more facial points 122 of the driver's head 120, a pose of the head 120, e.g. a viewing direction of the driver, can be inferred. This information can be used to control a driver assistance system. The method 300 is a computer-implemented method that can be executed, for example, on a control unit 101 of a vehicle 100.

The method 300 comprises determining 301 a plurality of blocks 201 of the image 203. For example, different blocks 201 can be formed from an image 203 with an offset of one pixel 204 each in the horizontal and/or vertical direction. A block 201 thus comprises a subregion of the image 203 (e.g. a subregion with r × r pixels 204).

In addition, the method 300 comprises mapping 302 the plurality of blocks 201 onto a corresponding plurality of feature values 211 of K different feature maps 210 by means of K different convolution weight vectors β_conv_k 202, with k = 1, ..., K and K > 1. In particular, each block 201 of the plurality of blocks 201 can be filtered with K different convolution weight vectors β_conv_k 202 to determine the feature values 211 of K different feature maps 210. The K different convolution weight vectors β_conv_k 202 can be designed to filter K different image structure types out of a block 201. Using K different convolution weight vectors β_conv_k 202 thus makes it possible to provide K different feature maps 210 for an image 203, each of which indicates the position of a different image structure type in the image 203.

The K different convolution weight vectors β_conv_k 202 may have been determined by means of a machine learning method on the basis of training data. In particular, an ELM-AE method may have been used to determine the convolution weight vectors β_conv_k 202. An image 203 can thus be analyzed efficiently and reliably with respect to different image structure types.

In addition, the method 300 comprises determining 303 the position of at least one facial point 122 on the basis of the K different feature maps 210. For this purpose, an artificial neural network can be used that is configured to map the K different feature maps 210 onto a position vector 240 that indicates the position, in particular the coordinates, of at least one facial point 122. The artificial neural network can be trained on the basis of the training data and of labels for the training data, the labels indicating the actual position of the at least one facial point 122 for the training data.

This enables an efficient and reliable determination of the position of facial points 122.

The present invention is not limited to the exemplary embodiments shown. In particular, it should be noted that the description and the figures are only intended to illustrate the principle of the proposed methods, devices and systems.

CITATIONS INCLUDED IN THE DESCRIPTION

This list of the documents cited by the applicant was generated automatically and is included solely for the reader's information. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Cited non-patent literature

Z. Bai, G.-B. Huang, L. L. C. Kasun, and C. M. Vong, "Local Receptive Fields Based Extreme Learning Machine," IEEE Computational Intelligence Magazine, vol. 10, no. 2, 2015 [0010]
L. L. C. Kasun, H. Zhou, G.-B. Huang, and C. M. Vong, "Representational Learning with Extreme Learning Machine for Big Data," IEEE Intelligent Systems, vol. 28, no. 6, pp. 31-34, 2013 [0010, 0035]

Claims

A method (300) for determining the position of a facial point (122) on the basis of image data (112) comprising an image (203) of a head (120), the method (300) comprising: - determining (301) a plurality of blocks (201) of the image (203), wherein a block (201) comprises a subregion of the image (203); - mapping (302) the plurality of blocks (201) onto a corresponding plurality of feature values (211) of K different feature maps (210) by means of K different convolution weight vectors β_conv_k (202), with k = 1, ..., K and K > 1, wherein the K different convolution weight vectors β_conv_k (202) were determined by means of a machine learning method on the basis of training data; and - determining (303) the position of at least one facial point (122) on the basis of the K different feature maps (210).

The method (300) according to claim 1, wherein - the machine learning method comprises an Extreme Learning Machine (ELM) Auto Encoder (AE); and/or - a convolution weight vector β_conv_k (202) comprises output weights of an Extreme Learning Machine (ELM).

The method (300) according to one of the preceding claims, wherein

$$\beta_{\mathrm{conv}} = \left(\frac{I}{C} + H^{T} H\right)^{-1} H^{T} T$$

where - β_conv = [β_conv_1, ..., β_conv_K]; - I is an L × L identity matrix; - C is a regularization term; - T is a matrix with training blocks (201) x_n, with n = 1, ..., N, of a plurality of training images (203); - H = [h(x_1), ..., h(x_N)]^T; - h(x_n) are output values of L neurons of a hidden layer of neurons for the training block (201) x_n; and - L is a number of pixels (204) in a training block (201) x_n.

The method (300) according to claim 3, wherein

$$h(x_n) = [h_1(x_n), \dots, h_L(x_n)]^{T}$$

where - h_l(x_n) = g(a_l · x_n + b_l), with l = 1, ..., L, is an activation function of the l-th neuron of the hidden layer; and - a_l and b_l are random weights and a random threshold, respectively.

The method (300) according to one of the preceding claims, wherein - the K different convolution weight vectors β_conv_k (202) are designed to assign a block (201) to K different image structure types; and - the image structure types comprise in particular one or more edges and one or more curves.

The method (300) according to one of the preceding claims, wherein mapping (302) a block x (201) by means of a convolution weight vector β_conv_k (202) comprises

$$c_{k,i}(x) = \sum_{l=1}^{L} x_{l}\, \beta_{\mathrm{conv}_k,\,l}$$

where - c_k,i(x) is the i-th feature value (211) of the k-th feature map (210); - x_l is the l-th pixel (204) of the block x (201); and - β_conv_k,l is the l-th convolution weight of the convolution weight vector β_conv_k (202).

The method (300) according to one of the preceding claims, wherein - determining (303) the position of at least one facial point (122) comprises, for each of the K different feature maps (210), determining a respective summary map (220), each having a plurality of summary values (221); and - a summary value (221) of the k-th summary map (220) is determined on the basis of a norm, in particular an L2 norm, of a feature block (214) of feature values (211) of the k-th feature map (210).

The method (300) according to one of the preceding claims, wherein - determining (303) the position of at least one facial point (122) comprises, for each of the K different feature maps (210), determining a connection vector (230) that comprises a number of connection values (231) that is reduced compared with a number of feature values (211); - the connection values (231) of a k-th connection vector (230) are determined using connection weight vectors P_full_k (222), with k = 1, ..., K; and - the K different connection weight vectors P_full_k (222) were determined by means of a machine learning method on the basis of training data.

The method (300) according to claim 8, wherein - determining (303) the position of at least one facial point (122) comprises determining a position vector (240) on the basis of the K connection vectors (230); - the K connection vectors (230) are converted into the position vector (240) by an output weight vector β_output (232); - the output weight vector β_output (232) was determined by means of a machine learning method on the basis of the training data and of labels for the training data; and - the labels indicate actual positions of facial points (122) in the training data.

A control unit (101) for determining the position of a facial point (122) on the basis of image data (112) comprising an image (203) of a head (120), the control unit (101) being configured to - determine a plurality of blocks (201) of the image (203), wherein a block (201) comprises a subregion of the image (203); - map the plurality of blocks (201) onto a corresponding plurality of feature values (211) of K different feature maps (210) by means of K different convolution weight vectors β_conv_k (202), with k = 1, ..., K and K > 1, wherein the K different convolution weight vectors β_conv_k (202) were determined by means of a machine learning method on the basis of training data; and - determine the position of at least one facial point (122) on the basis of the K different feature maps (210).