DE102007041719A1

DE102007041719A1 - Method for producing augmented reality, e.g. in a TV studio, involving synchronizing a camera with video projectors, modulating images pixelwise in luminance and/or chrominance, and integrating a coded structure into the projected images

Info

Publication number: DE102007041719A1
Application number: DE102007041719A
Authority: DE
Inventors: Oliver Bimber; Anselm Grundhöfer; Stephanie Zollmann; Daniel Kolster
Original assignee: Bauhaus Universitaet Weimar
Current assignee: Bauhaus Universitaet Weimar
Priority date: 2006-11-06
Filing date: 2007-09-04
Publication date: 2008-05-15
Anticipated expiration: 2027-09-05
Also published as: DE102007041719B4

Abstract

The method involves projecting images of an image sequence by means of digital video projectors (3) onto at least part of the surfaces located in an area (1) or bounding that area. The images are modulated pixelwise, spatially and/or temporally, in luminance and/or chrominance. A camera (2) that captures part of the area is synchronized with the video projectors. A coded structure is integrated into the projected images, together with a complement of the coded structure, so that the coded structure is visible only to a viewer or only to the camera. An independent claim is also included for a studio comprising coded structures integrated in lighting and/or image contents.

Description

The invention relates to a method for generating augmented reality in a room.

Many modern television productions use virtual studio technology. Chroma keying is the most important method for overlaying live or recorded video signals from a real blue-screen studio with virtual content. The video signal is analyzed, and pixels of a predefined color (e.g. blue or green) are replaced by computer-generated graphics. This makes it possible to use image processing techniques to separate the foreground effectively from the background and then to integrate real objects (such as an actor or a presenter) seamlessly into a purely virtual environment. Blue-screen techniques, however, restrict virtual studio technology to special recording environments. Recent research therefore investigates the possibilities of augmented reality (AR) for television productions. In contrast to blue-screen studios, fully equipped real television studios are augmented with virtual content by overlaying the recorded video material with computer graphics. By analogy with the term virtual studio, we use the term augmented studio for this in the following.

Several groups have already demonstrated the benefits of augmented reality in studio productions. In (Yuko Yamanuchi et al., Real space-based virtual studio seamless synthesis of a real set image with a virtual set image, Proceedings of the ACM symposium on Virtual reality software and technology, Hybrid VR, 2002, ISBN 1-58113-530-0, pp. 194-200), 360° ultra-high-resolution omnidirectional (spherical) images of artificial backgrounds are augmented; they are warped in real time according to the rotation of a pan-tilt camera (pan: horizontal rotation; tilt: vertical rotation) and occluded by a real actor. An Axi-Vision camera is used to capture color and depth information for every pixel simultaneously. Recent examples are also shown in connection with the EU-funded project MATRIS (IST, MATRIS - Markerless real-time Tracking for Augmented Reality Image Synthesis, http://www.ist-matris.org/, 2004, last visited August 31st, 2006). In (Frahm et al., Markerless Augmented Reality with Light Source Estimation for Direct Illumination, Conference on Visual Media Production CVMP, London, December 2005), a fish-eye camera is used in addition to a studio camera. While the studio camera records the video content to be augmented, the fish-eye camera observes the upper hemisphere in order to track the installed studio lights. Applying a structure-from-motion algorithm to both images makes it possible to estimate the pose of the studio camera. Conventional stereo algorithms allow the depth of the studio scene to be reconstructed and thus enable correct occlusion effects between real and virtual objects. In addition, knowledge of the real studio light sources allows a light map to be computed that ensures consistent illumination and shading.

Virtual and augmented studio productions have to meet various technical challenges. One of them is the stable and fast tracking of the studio cameras. Some approaches use special tracking hardware, while others try to estimate the camera pose by observing natural features (e.g. ceiling-mounted studio lamps or the studio content itself) or artificial markers with additional cameras. Optical tracking is becoming increasingly popular because of its robustness against most environmental disturbances and because of its speed and precision. Approaches to optical tracking can be divided into markerless and marker-based methods.

Markerless techniques depend strongly on the robust detection of natural scene features. They usually fail on uniformly structured surfaces under unfavorable lighting conditions. This restricts the use of such techniques in television studios to optimized situations. Marker-based tracking, on the other hand, provides artificial visual features by introducing detectable markers. However, these markers should be visible neither directly in the studio environment nor in the recorded video. Consequently, marker-based tracking is usually restricted to observing areas outside the camera image, such as the ceiling or the floor, which are normally covered with studio equipment such as lighting installations, cables and fixtures. Occlusions and dynamic rearrangement of the installations therefore cause additional problems for marker-based tracking.

Another problem is acquiring the scene depth. This is necessary for integrating artificial 3D objects into the video recording while producing occlusion and illumination effects that are consistent with the recorded real content. Some approaches reconstruct the scene geometry offline (during a special calibration step) using multi-viewpoint stereo from uncalibrated video sequences. The quality of such techniques depends on the quality of the feature matching in the stereo pairs. However, finding matchable features that ensure a high-quality depth reconstruction can be difficult not only in real studio environments but also in virtual studios or embedded blue screens, which mainly use uniformly colored surfaces. Beyond the offline reconstruction of the static studio scenery, online depth estimation (e.g. of persons moving in the scene) is even more difficult. In virtual studios, the uniformly colored background required for chroma keying allows fast depth-from-silhouette algorithms or similar techniques to be applied. This is not possible, however, with real studio scenery.

Yet another challenge for virtual and augmented studios is the question of how direction information is displayed to presenters, actors or participants during a live broadcast or a recording. Teleprompters or permanently installed screens offer limited possibilities because they do not allow the displayed information to be placed in a spatial context. Step sequences, for example, are usually marked statically on the floor.

It is an object of the invention to specify a novel method for generating augmented reality in a room.

According to the invention, the object is achieved by a method having the features of claim 1.

Advantageous developments are the subject matter of the dependent claims.

In a method according to the invention for generating augmented reality in a room, images are projected by means of at least one digital video projector onto at least part of the surfaces bounding the room and/or located in the room, the images being modulated spatially and/or temporally in luminance and/or chrominance, in particular pixel by pixel. At least one camera is provided that is synchronized with at least one of the video projectors and captures at least part of the room.

Preferably, at least one coded structure is integrated into at least one of the projected images. A complement of the coded structure is integrated into an immediately following projected image such that the structure is visible only to a viewer or only to the camera. Projecting two mutually complementary images in succession at a sufficiently high frame rate, for example 120 Hz, causes the viewer to perceive an averaged sum of both images owing to the limitations of human vision. A camera synchronized with the video projector, however, can capture both images separately.
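
The complementary embedding described above can be illustrated with a minimal sketch. It assumes a single row of 8-bit pixel values, a binary code and a fixed amplitude `delta`; the function name `embed_code` and the shrinking of the amplitude near the value limits are illustrative assumptions, not taken from the patent text.

```python
def embed_code(original, code, delta):
    """Return (image, complement) for one row of pixel values in 0..255.

    Hypothetical helper: each code bit raises the pixel in one frame and
    lowers it by the same amount in the following frame.
    """
    image, complement = [], []
    for value, bit in zip(original, code):
        sign = 1 if bit else -1
        # Shrink the amplitude near 0 or 255 so that both frames stay in
        # range and still cancel exactly (an assumption of this sketch).
        d = min(delta, value, 255 - value)
        image.append(value + sign * d)
        complement.append(value - sign * d)
    return image, complement


original = [100, 100, 200, 254]
code = [1, 0, 1, 0]
image, complement = embed_code(original, code, delta=4)

# A viewer integrating both frames perceives their mean, i.e. the original.
perceived = [(a + b) / 2 for a, b in zip(image, complement)]
```

In this sketch the mean of the two frames equals the original row, while each individual frame still carries the code for a synchronized camera.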

In this way, structures that are not visible to the viewer enable, for example, optical tracking, i.e. tracking a camera position on the basis of the projected structure, without requiring additional markers outside the camera's field of view.

Likewise, structures that are visible to the viewer but invisible to the camera can be displayed, for example stage directions for a presenter, so that no teleprompter is needed. This is done, for example, by synchronizing the camera so that it captures only every second projected image, while the structure is displayed whenever the camera is not capturing.
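
The interleaving of captured frames and viewer-only frames might be sketched as follows. The frame labels, the function names and the convention that even slots are the captured ones are assumptions made for illustration only.

```python
def projection_schedule(content_frames, cue_frames):
    """Interleave frames: even slots are captured, odd slots carry the cue."""
    schedule = []
    for content, cue in zip(content_frames, cue_frames):
        schedule.append(("content", content))  # camera is capturing
        schedule.append(("cue", cue))          # camera is not capturing
    return schedule


def camera_capture(schedule):
    """A camera synchronized to every second projected frame (even slots)."""
    return [frame for index, (_kind, frame) in enumerate(schedule)
            if index % 2 == 0]


content = ["c0", "c1", "c2"]
cues = ["turn left", "turn left", "step forward"]
schedule = projection_schedule(content, cues)
captured = camera_capture(schedule)
```

Here the camera records only the content frames, whereas a person in the room sees the cue frames as well.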

The images projected onto the surfaces can also serve to illuminate the room at least partially. For example, light of a uniform hue is projected onto at least part of the surfaces in the room by means of radiometric compensation. Such a projection can be performed at any frequency and can also contain integrated structures. The projection can be captured with a camera and used as the basis for a chroma keying method, analogous to virtual blue-screen studios but in front of arbitrary backgrounds. Such illumination is possible both statically and dynamically.

Preferably, to embed the structure in the image, a just noticeable difference for the human eye is calculated for at least one pixel of at least one original image from an original image sequence. This means, for example, determining how much the original image can be modified to embed the structure before the difference just becomes perceptible to a viewer. To embed a structure that is imperceptible to the viewer or to one of the cameras, a pixel at the same position in an image derived from the original image is changed, at least in a red and/or blue and/or green color channel, by at most the just noticeable difference, and in a complementary image by at most the negative of the just noticeable difference. The image and the complementary image are projected in succession. The just noticeable difference depends on the image content and is accordingly recalculated for changing original images and/or changing structures.

The alternating projection is preferably performed at a frame rate higher than a flicker frequency. The flicker frequency is understood to be the maximum frequency at which the human eye perceives flicker when images are projected in succession.

Flicker-free display is possible from about 45 light-dark changes per second, at which most people perceive the flicker only subconsciously. However, very bright and high-contrast images can still cause flicker even at this rate. From about 60 light-dark changes per second, flicker is largely eliminated. In particular, the alternating projection is performed at a frame rate of 120 Hz. In this way, the structures projected in every second image and their complements are not perceptible as flicker.

Preferably, the just noticeable difference is determined pixel by pixel on the basis of a regional image brightness and/or a spatial resolution and/or a spatial frequency of the original image and/or of the structure and/or a temporal frequency of the image and/or of the complementary image and/or a maximum speed of an eye movement. Knowledge of the relationships between these parameters enables a dynamic, content-dependent regional adaptation of the just noticeable difference. The maximum speed of the eye movement can in particular be determined empirically.

Preferably, at least one of the cameras is synchronized to the frame rate, the original image being reconstructed from the mean of the captured image and the captured complementary image, and the structure from a difference or a quotient of the captured image and the captured complementary image. Depending on the embedded bit, the quotient of the two images is greater or less than one, while the difference of the two images is greater or less than zero. To avoid any influence of the transfer or response function of the camera or projector, these can be linearized. After linearization, a gamma correction can be applied to ensure color consistency.
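
A minimal sketch of this reconstruction, assuming already linearized pixel values for one image row; the function name `decode_pair` and the small `epsilon` guard against division by zero are illustrative assumptions.

```python
def decode_pair(image, complement, epsilon=1e-9):
    """Recover the original row and the embedded bits from a captured pair.

    Assumes linearized pixel values. A pixel without any embedded code
    (difference 0, quotient 1) is reported as bit 0 in this sketch.
    """
    original = [(a + b) / 2 for a, b in zip(image, complement)]
    bits_by_difference = [1 if a - b > 0 else 0
                          for a, b in zip(image, complement)]
    bits_by_quotient = [1 if a / max(b, epsilon) > 1 else 0
                        for a, b in zip(image, complement)]
    return original, bits_by_difference, bits_by_quotient


captured_image = [104, 96, 204]
captured_complement = [96, 104, 196]
original, bits_diff, bits_quot = decode_pair(captured_image, captured_complement)
```

Both decision rules yield the same bits here; which one is more robust in practice depends on the camera response and is not specified in the text.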

Both binary structures and structures of variable intensity can be integrated.

Preferably, an edge detection algorithm is applied to the captured image and to the captured complementary image, an optical flow is calculated, and a homography matrix is determined in order to align the captured image and the captured complementary image with each other. In this way, fast camera movements, for example, are compensated. A Canny algorithm, for example, can be used as the edge detection algorithm.

In particular for the regional adaptation of the just noticeable difference, it is advantageous to determine the spatial frequency of the image to be displayed in real time. A Laplacian pyramid algorithm is preferably applied for this purpose. The Laplacian pyramid used preferably has six levels. For example, the absolute differences of each level of the Gaussian pyramid are used, and each level of the resulting Laplacian pyramid is normalized. The result is the ratios of the spatial frequencies within each of the generated frequency bands. Depending on the viewer's distance from an image plane and the size of the projection, the results are converted into units of cycles per degree, which describe the resolving power of the human eye particularly well. The input image for the Laplacian pyramid algorithm is transformed into its physical luminance equivalent in units of cd/m² (the response function of the projector was measured with a photometer). With these parameters, the just noticeable difference can be determined for any region within the original image with the coded structure.
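
The conversion into cycles per degree is not spelled out in the text. The following sketch shows one plausible way to do it, assuming a flat projection viewed head-on, square pixels, and a pyramid whose level 0 is the finest band with an upper limit of 0.5 cycles per pixel that halves at each further level. All names and numeric values are illustrative.

```python
import math


def pixels_per_degree(projection_width_m, horizontal_pixels, viewer_distance_m):
    """Number of projected pixels that span one degree of visual angle."""
    pixel_size_m = projection_width_m / horizontal_pixels
    degrees_per_pixel = math.degrees(
        2 * math.atan(pixel_size_m / (2 * viewer_distance_m)))
    return 1.0 / degrees_per_pixel


def band_limits_cpd(levels, ppd):
    """Upper band limit of each pyramid level in cycles per degree.

    Assumption: level 0 is the finest band (0.5 cycles per pixel) and
    each further level halves the resolution.
    """
    return [0.5 / (2 ** level) * ppd for level in range(levels)]


ppd = pixels_per_degree(projection_width_m=2.0,
                        horizontal_pixels=1024,
                        viewer_distance_m=3.0)
limits = band_limits_cpd(levels=6, ppd=ppd)
```

A larger viewing distance or a smaller projection raises the pixels-per-degree value and therefore shifts every band towards higher frequencies in cycles per degree.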

Preferably, the green channel of the original image is changed by a smaller difference than the red and blue channels. This is a further way to reduce the visibility of the structures in the projected images, since the human eye is particularly sensitive to light in the wavelength range of green light. In particular, the difference in the green channel is at most a quarter of the difference in the red and/or blue channel. The structure can thus still be projected with high quality while being considerably less perceptible to an observer. Despite the different differences in the color channels, the viewer perceives no color shift, because the image is compensated in the same way by the complementary image. Preferably, the maximum difference in the three color channels, rather than a mean difference in the gray channel, is used for thresholding.
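
The channel-dependent amplitude can be sketched for a single RGB pixel as follows. The function name `embed_rgb`, the parameter `green_ratio` and the sample values are assumptions; the ratio of 0.25 mirrors the "at most a quarter" bound stated above.

```python
def embed_rgb(pixel, bit, jnd, green_ratio=0.25):
    """Return (image_pixel, complement_pixel) for one RGB pixel.

    jnd is assumed to be given and small enough to avoid clipping;
    green receives only green_ratio times the red/blue amplitude.
    """
    sign = 1 if bit else -1
    deltas = (jnd, jnd * green_ratio, jnd)  # red, green, blue
    image = tuple(value + sign * d for value, d in zip(pixel, deltas))
    complement = tuple(value - sign * d for value, d in zip(pixel, deltas))
    return image, complement


image_px, complement_px = embed_rgb((120, 130, 140), bit=1, jnd=8)

# The temporal mean of both frames restores every channel, so no color
# shift remains for the viewer.
mean_px = tuple((a + b) / 2 for a, b in zip(image_px, complement_px))
```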

If the resolution of the camera is lower than that of the projector, individual pixels at marker boundaries can be misclassified. During fast camera movements, the embedded structure can no longer be reconstructed correctly for every pixel because of the geometric misregistration. This defect can be removed efficiently by applying a median filter.
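
As an illustration of this clean-up step, the sketch below applies a 3x3 median filter to a decoded binary map. The window size and the replication of border pixels are assumptions; the text does not specify either.

```python
from statistics import median


def median_filter_3x3(bits):
    """3x3 median filter on a 2D list of 0/1 values, edges replicated."""
    height, width = len(bits), len(bits[0])

    def at(y, x):
        # Clamp coordinates so that border pixels reuse their nearest neighbor.
        return bits[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    return [[int(median(at(y + dy, x + dx)
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
             for x in range(width)]
            for y in range(height)]


# A decoded marker region with one misclassified pixel at row 1, column 1.
noisy = [[1, 1, 1, 1],
         [1, 0, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]]
cleaned = median_filter_3x3(noisy)

# A straight marker boundary passes through the filter unchanged.
edge = [[0, 0, 1, 1] for _ in range(4)]
edge_filtered = median_filter_3x3(edge)
```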

In a preferred embodiment, in particular for temporally varying structures, the difference from the original image is raised step by step over a sequence of images and complementary images from zero up to at most the just noticeable difference, and/or lowered from at most the just noticeable difference down to zero. This means that a structure is not projected into the image abruptly but is faded in gradually, which avoids the perception of flicker at the transitions. In particular, for fading out, the difference is reduced to zero by a constant value in a number of steps. When the difference is zero, it is possible to switch to a new structure and fade it in again step by step.
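
The gradual fade might look like the following sketch, which assumes a constant step size for both directions; the function name and the sample values are illustrative.

```python
def fade_amplitudes(max_delta, steps):
    """Amplitudes for fading a structure in and out with a constant step.

    Each amplitude applies to one image/complement pair; max_delta should
    not exceed the just noticeable difference of the region.
    """
    step = max_delta / steps
    fade_in = [step * i for i in range(steps + 1)]
    fade_out = fade_in[::-1]
    return fade_in, fade_out


fade_in, fade_out = fade_amplitudes(max_delta=4.0, steps=4)
```

Once the fade-out reaches zero, a new structure can be selected and the fade-in sequence applied to it.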

The number of steps over which the difference is raised or lowered is preferably determined as a function of a just noticeable brightness and/or contrast difference, which is determined from the local spatial frequency and the local brightness. These parameters can be derived from a so-called threshold-versus-intensity (TVI) function and a contrast sensitivity function, both of which are functions of the local spatial frequency and the luminance level. In this way, an optimal number of steps for fading a coded structure in and out can be determined for each region of the image. In particular, the mean spatial frequencies and luminance levels are used of those image regions for which the just noticeable difference has already been computed. The TVI function and the contrast sensitivity function are applied and their results multiplied to determine the largest imperceptible brightness difference per step. In this way, the number of steps required for fading in and out is determined for the corresponding image region.
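As a rough sketch of this step-count computation: the description states only that the TVI and contrast-sensitivity results are multiplied to give the largest imperceptible brightness difference per step. Dividing the total difference by that limit and rounding up is an assumption made here for illustration, as are the function and parameter names.

```python
import math

def fade_step_count(delta, tvi_value, csf_value):
    """Number of equal fade steps so that no single step exceeds the imperceptible limit.

    delta     -- total difference to fade in or out for this image region
    tvi_value -- result of the TVI function for the region (assumed precomputed)
    csf_value -- result of the contrast sensitivity function (assumed precomputed)
    """
    max_step = tvi_value * csf_value  # largest imperceptible change per step
    if max_step <= 0:
        raise ValueError("per-step limit must be positive")
    # Assumption: split delta into the fewest equal steps that respect the limit.
    return max(1, math.ceil(delta / max_step))

steps_dark_region = fade_step_count(delta=8.0, tvi_value=2.0, csf_value=1.0)
steps_bright_region = fade_step_count(delta=8.0, tvi_value=4.0, csf_value=2.0)
```

A region with a larger per-step limit needs fewer steps, which is why the count is determined per image region.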

If the original image in the original image sequence changes relative to a preceding original image, for example at a scene change, in videos, or in interactive content, while the currently projected structure is nevertheless to be retained, for example because it is a marker for camera tracking and readjustment, the just noticeable difference is preferably redetermined and the fade-in or fade-out process adapted accordingly, so that the structure does not suddenly become visible or cause some other perceptual disturbance, for example as a result of a changed spatial frequency.

For example, two-dimensional markers used for camera tracking and readjustment are embedded as structures in the image and, correspondingly, in the complementary image.

Preferably, a position and/or a size of the marker is dynamically recomputed during the image sequence, and changed in the image and in the complementary image, at least when one of the cameras is moved or the feed is switched to another of the cameras and the previous marker is invisible or poorly visible because it is occluded by an object or a person in the foreground, objects and/or persons in the foreground being extracted by means of a keying method.

The size and position of the markers are preferably determined by means of a quadtree, the marker size being maximized according to visibility. In computer science, a quadtree is a special tree structure in which each inner node can have up to four children. The term derives from the number of children of an inner node (quad (four) + tree). This structure is used mainly to organize two-dimensional data in computer graphics. The root of the tree represents a square area. This area is recursively subdivided into four equally sized quadrants until the desired resolution is reached and the recursion ends in a leaf. By applying this subdivision recursively, the area represented by the root node can be resolved to any desired fineness. For example, a quadtree is computed that comprises a plurality of markers of different sizes at different positions in each level of the quadtree. From one level to the next lower one, the number of markers quadruples while their size (edge length) decreases by a factor of 2. We refer to this as the marker tree. Adaptive marker placement is implemented in several steps.
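The level structure of such a marker tree can be illustrated with a short sketch. The function name and the example sizes (a 128-pixel root area, 16-pixel smallest markers) are assumptions chosen for illustration; only the quadrupling of the marker count and the halving of the edge length per level are taken from the description.

```python
def marker_tree_levels(root_size, min_marker_size):
    """List (level, markers_per_side, marker_count, marker_size) from coarse to fine."""
    levels = []
    level, per_side, size = 0, 1, root_size
    while size >= min_marker_size:
        levels.append((level, per_side, per_side * per_side, size))
        level += 1
        per_side *= 2   # each quadrant splits into 2 x 2 children
        size //= 2      # so the marker edge length halves
    return levels

levels = marker_tree_levels(root_size=128, min_marker_size=16)
for level, per_side, count, size in levels:
    print(f"level {level}: {count} markers of {size}x{size} pixels")
```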

First, a full-screen quad is generated at projector resolution, and a projective transformation is computed that maps the generated foreground projection from the camera perspective onto it. This is achieved using the model-view matrix that results from the camera-tracking structure reconstructed in the previously displayed images.

The result is a binary image containing the visibility of each projector pixel from the camera's point of view, which we call the visibility map. This technique is analogous to conventional shadow mapping. The initial visibility map is then used to compute the smallest possible marker size to be used, by determining the number of projector pixels that are visible in the camera image from the previous perspective. We subsample the visibility map into an image pyramid that ranges from the largest possible marker size in the top level (e.g., by definition 2×2 markers) down to the specified smallest possible marker size in the bottom level (e.g., 16×16 pixels per marker). This yields a multi-resolution visibility map, which we call the visibility tree. At run time, the marker tree and the visibility tree are combined at corresponding levels: in a top-down pass, only entries that are neither occluded (i.e., that are marked as visible in the same level of the visibility tree) nor already occupied by markers of higher levels are processed. The remaining entries are then marked as occupied in the current level of the marker tree. Regions that are invisible across all levels are marked at the bottom level of the marker pyramid. When the bottom level is reached, the labeled marker tree is collapsed and the non-overlapping entries occupied by different levels are combined. This results in a coded image containing the set of optimally scaled and placed markers with respect to foreground occlusions and camera perspective. Local marker regions can be blended temporally when the markers change in a particular area of the image.
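The combination of visibility tree and marker tree can be sketched roughly as follows. This is a simplified illustration under several assumptions: the visibility map is a square grid whose side is a power of two, one grid entry stands for one smallest-marker cell, a coarse cell counts as visible only if all four of its children are visible (the description says only that the map is subsampled), and regions invisible on all levels are simply left without a marker. All names are hypothetical.

```python
def build_visibility_pyramid(vis):
    """Return levels from finest to coarsest (2x2); a coarse cell is 1 only if all 4 children are."""
    levels = [vis]
    while len(levels[-1]) > 2:
        fine = levels[-1]
        n = len(fine) // 2
        coarse = [
            [fine[2 * y][2 * x] & fine[2 * y][2 * x + 1]
             & fine[2 * y + 1][2 * x] & fine[2 * y + 1][2 * x + 1]
             for x in range(n)]
            for y in range(n)
        ]
        levels.append(coarse)
    return levels

def place_markers(vis):
    """Top-down pass: place the largest fully visible marker not covered by a coarser one."""
    levels = build_visibility_pyramid(vis)
    n = len(vis)
    covered = [[False] * n for _ in range(n)]
    markers = []  # (x, y, size) in units of the smallest marker cell
    for level in reversed(levels):  # coarsest level first
        size = n // len(level)
        for cy, row in enumerate(level):
            for cx, visible in enumerate(row):
                x0, y0 = cx * size, cy * size
                # Cells are nested, so checking one corner detects coarser coverage.
                if visible and not covered[y0][x0]:
                    markers.append((x0, y0, size))
                    for y in range(y0, y0 + size):
                        for x in range(x0, x0 + size):
                            covered[y][x] = True
    return markers

# Hypothetical visibility map: 1 = visible from the camera, 0 = occluded by the foreground.
visibility_map = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
markers = place_markers(visibility_map)
```

Here three large markers cover the fully visible quadrants, and one small marker fills the single visible cell of the partly occluded quadrant.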

In the keying method, the foreground is preferably illuminated by a flash only during the image or only during the complementary image, objects and/or persons in the foreground being distinguished from the uniformly bright background by their differing brightness in the image and in the complementary image. This technique is also referred to as real-time flash keying.

To extract the foreground, preferably only brightness differences exceeding a certain threshold are taken into account. In particular, this threshold is greater than twice the currently largest difference, i.e., the intensity difference used to integrate the structure.
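The thresholding can be sketched as follows. Because the embedded structure itself changes a background pixel by up to twice the embedding difference between the image and the complementary image, the threshold must lie above that value. The function name, the grayscale list representation, and the margin of 1 above 2Δ are assumptions for illustration.

```python
def flash_key_mask(lit_frame, unlit_frame, delta):
    """Return 1 where the brightness change exceeds a threshold above 2 * delta, else 0."""
    threshold = 2 * delta + 1  # hypothetical margin; the text only requires more than 2 * delta
    return [
        [1 if abs(a - b) > threshold else 0 for a, b in zip(row_lit, row_unlit)]
        for row_lit, row_unlit in zip(lit_frame, unlit_frame)
    ]

delta = 4
# Left pixel: background carrying the code (104 vs 96). Right pixel: flashed foreground.
lit_frame = [[104, 180]]
unlit_frame = [[96, 60]]
mask = flash_key_mask(lit_frame, unlit_frame, delta)
```

The coded background pixel differs by exactly 2Δ and is rejected, while the flashed foreground pixel is kept.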

The flash illumination is preferably provided by at least one LED flash or by at least one of the video projectors. For example, the foreground is illuminated 60 times per second with flashes of eight milliseconds each, so that at a frame rate of 120 Hz only the foreground of the image or of the complementary image is illuminated. For example, white LEDs with a color temperature of 5600 K can be used. Color filters can be used in addition.

Particularly during fast camera movements, the captured image and the captured complementary image can likewise be corrected using a homography matrix before the foreground is extracted.

In a further embodiment, the foreground is illuminated constantly in the keying method, objects and/or persons in the foreground being distinguished by their constant brightness from the background, which is identified by the imperceptible structures that differ in brightness between the image and the complementary image.

Preferably, two of the cameras are arranged coaxially with one another, one of the cameras being focused on the foreground and the other on the background, and the captured images and/or the captured complementary images of both cameras are used to extract the foreground.

Exemplary embodiments of the invention are explained in more detail below with reference to the drawings.

In the drawings:

Fig. 1a shows a view of a room serving as an augmented studio, with two cameras and five digital video projectors that illuminate the room and project both spatial virtual objects and film sequences,

Fig. 1b shows the room from Fig. 1a with projected, imperceptible structures for camera tracking and readjustment,

Fig. 2 shows a schematic plan view of a room serving as an augmented studio, with three digital video projectors, two cameras, and analog light sources,

Fig. 3a shows a studio setup with a person, a foreground object, and a projection in the background,

Fig. 3b shows an image of the studio setup captured by a camera, with multi-resolution markers as embedded structures,

Fig. 3c shows a complementary image captured by the camera following the captured image shown in Fig. 3b, with multi-resolution markers as embedded structures,

Fig. 3d shows a foreground extracted from the image and the complementary image by means of real-time flash keying,

Fig. 3e shows a view with the extracted markers for camera tracking and readjustment,

Fig. 3f shows a scene in augmented reality with the person and the foreground object from Fig. 3a, a virtual background, and a three-dimensional augmentation,

Fig. 4a shows a plot of the perceivable relative intensity of a repeatedly projected image with unchanged image content,

Fig. 4b shows a plot of the perceivable relative intensity of a repeatedly projected image with abruptly changed image content,

Fig. 4c shows a plot of the perceivable relative intensity of a repeatedly projected image with gradually faded-in and faded-out image content,

Fig. 5a shows an extracted foreground of a captured image from a camera perspective,

Fig. 5b shows a visibility map of the image shown in Fig. 5a,

Fig. 5c shows the construction of a visibility tree for a background of the captured image by means of a quadtree method,

Fig. 5d shows an arrangement of a plurality of multi-resolution markers in a marker tree on the background, and

Fig. 5e shows a projected image with the markers from Fig. 5d embedded as structures.

Corresponding parts are given the same reference numerals in all figures.

Fig. 1a shows a view of a room 1 serving as an augmented studio, with two cameras 2 and five digital video projectors 3 that illuminate the room 1 and project both spatial augmentations 4 and film sequences 5. Three analog light sources 6 are also provided. The scene is shown as it would be perceived by an observer and, largely, in a processed recording. Four persons 7 are visible in the scene in the room 1, who can cause occlusions by moving. To track and readjust the cameras 2, for example when the cameras 2 or the persons 7 move, markers are needed with which the position of the cameras 2 can be determined unambiguously at any time. Fig. 1b shows how such markers 8 are embedded as structures C in a projected image. At the same time, the persons 7 identified as belonging to the foreground by means of a keying method are extracted in order to determine whether, for a given camera position, a marker 8 is visible to the corresponding camera 2 or occluded, so that the marker 8 can be repositioned if necessary.

Fig. 2 shows a schematic plan view of a room 1 serving as an augmented studio, with three digital video projectors 3, two cameras 2, and two analog light sources 6. The video projectors 3 are supplied by one or more computers 9 with individual image sequences B_P, which they project onto surfaces located in the room 1. The cameras 2, which are synchronized with the video projectors 3, each capture parts of the room 1 together with the corresponding projections. The resulting captured image sequences B_A are in turn fed to the computers 9. There, the extraction of foreground and background, the dynamic positioning of imperceptible markers, camera tracking and readjustment, and the image sequence ultimately to be displayed can be computed and passed on for further processing in a production unit 10.

Fig. 3a shows a studio setup with a person 7 and a foreground object 11 in front of a background 12. The background can be produced by rear projection. A camera 2 records the scene at a high frame rate of, for example, 120 Hz. A first image I of the resulting image sequence B_A is shown in Fig. 3b. Structures C projected onto the background 12 are visible in it in the form of markers 8 of different scales. The markers 8 are embedded in the projected image I such that there is a just noticeable difference Δ from an original image O without the markers 8. Fig. 3c shows a subsequently projected complementary image I', in which the markers 8 differ from the original image O by the negative of the just noticeable difference Δ. With sufficiently fast projection, as is the case at a frame rate of 120 Hz, an observer perceives the mean (I + I')/2 of the two images I and I', which corresponds approximately to the original image O, so that the observer does not see the markers 8. During projection of the image I shown in Fig. 3b the foreground is illuminated, for example by an LED flash, whereas during projection of the complementary image I' shown in Fig. 3c it is not. This allows simple extraction of the foreground object 11 and the person 7 by means of a keying method, as shown in Fig. 3d. The markers 8 can also be extracted from the differences between the image I and the complementary image I', as shown in Fig. 3e. In Fig. 3f, a virtual background 12 is then projected together with a spatial augmentation 4.
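The embedding of a structure into an image and its complement, and its recovery from the two captured frames, can be sketched as follows. This is a simplified grayscale illustration that ignores clipping at the limits of the intensity range and camera noise; the function names, the 0/1 code representation, and the decision threshold of Δ are assumptions.

```python
def embed(original, code, delta):
    """Return (image, complement): marker pixels (code 1) get +delta and -delta respectively."""
    image = [[o + delta * c for o, c in zip(o_row, c_row)]
             for o_row, c_row in zip(original, code)]
    complement = [[o - delta * c for o, c in zip(o_row, c_row)]
                  for o_row, c_row in zip(original, code)]
    return image, complement

def decode(image, complement, delta):
    """Recover marker pixels: they differ by 2 * delta between the frames, others by 0."""
    return [[1 if a - b > delta else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image, complement)]

original = [[100, 120], [90, 200]]
code = [[1, 0], [0, 1]]          # hypothetical 2x2 structure
image, complement = embed(original, code, delta=4)

# What an observer integrates at a sufficiently high frame rate: (I + I') / 2.
perceived = [[(a + b) / 2 for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(image, complement)]
recovered = decode(image, complement, delta=4)
```

The perceived mean equals the original image, while the difference between the two frames still carries the structure for the synchronized camera.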

If a structure C is embedded in the projected image I and/or the complementary image I' while observing the just noticeable difference Δ, the perceivable relative intensity behaves as shown in Fig. 4a. The structure C is unchanged over the entire image sequence B_P and is not perceived by the observer. If, however, the structure C is faded out abruptly or switched abruptly to a different structure C, an observer perceives this as flicker, as shown in Fig. 4b, because the relative integrated amount of light 13 corresponds to the just noticeable difference Δ. To avoid this, a structure C is faded in and/or out gradually over several successive images I and complementary images I', as shown in Fig. 4c. For the three blend steps shown here, the relative integrated amount of light 13 is reduced to Δ/2. For a number s + 1 of blend steps, the relative integrated amount of light 13 is reduced to Δ/s if the difference is lowered or raised by Δ/s in every step. The embedded structure C is switched whenever the difference is zero.
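The fade schedule can be written out as a short sketch: s + 1 blend steps in which the embedded difference changes by Δ/s per step. The function name is hypothetical, and the example values are chosen only for illustration.

```python
def fade_out_levels(delta, s):
    """Return the s + 1 embedded differences from delta down to 0, lowered by delta / s each."""
    return [delta * (s - k) / s for k in range(s + 1)]

# Three blend steps (s = 2): the change per step is delta / 2.
fade_out = fade_out_levels(delta=4.0, s=2)
# Fading in uses the same levels in reverse order.
fade_in = list(reversed(fade_out))
# Largest change between consecutive steps, which bounds the visible jump.
largest_jump = max(a - b for a, b in zip(fade_out, fade_out[1:]))
```

The structure would be switched at the point where the schedule reaches zero, before fading in again.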

Figs. 5a to 5e illustrate how a plurality of multi-resolution markers 8 for camera tracking and readjustment is embedded in an image I (and correspondingly in the complementary image I'). Fig. 5a shows an extracted foreground of a captured image I from a camera perspective, with a person 7 and a foreground object 11. Markers 8 must be placed so that they do not become invisible through occlusion by the person 7 or the foreground object 11. Fig. 5b shows the image I from Fig. 5a transformed onto a projection screen. A quadtree is computed for this image I, which yields a visibility tree for the background 12. First, a full-screen quad is generated at projector resolution, and the projective transformation is computed that maps the generated foreground projection from the camera perspective onto it. This is achieved using a model-view matrix that results from tracking the previously displayed image.

The result is a binary image that contains the visibility of each projector pixel from the camera's point of view, which we call the visibility map, as shown in Fig. 5b. This technique is analogous to conventional shadow mapping. The initial visibility map is then used to compute the smallest marker size that will be used, by determining the number of projector pixels that are visible in the camera image from the previous perspective. We subsample the visibility map into an image pyramid that ranges from the largest possible marker size at the top level (e.g. by definition 2×2 markers) down to the determined smallest possible marker size at the bottom level (e.g. 16×16 pixels per marker). This leads to a multi-resolution visibility map, which we call the visibility tree, as shown in Fig. 5c. At runtime, the marker tree and the visibility tree are combined at the corresponding levels, as shown in Fig. 5d: in a top-down direction, only entries that are neither occluded (i.e. that are marked as visible at the same level of the visibility tree) nor already occupied by markers 8 of higher levels are processed. The remaining entries are then marked as occupied at the current level of the marker tree. Regions that are invisible across all levels are marked at the bottom level of the marker pyramid. When the bottom level is reached, the labeled marker tree is collapsed and the non-overlapping entries occupied by the different levels are combined. This results in a coded image that contains the set of optimally scaled and placed markers 8 with respect to the foreground occlusions and the camera perspective, as shown in Fig. 5e. Local marker regions can be temporally blended when the markers 8 change in a particular region of the image I.
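The top-down combination of visibility tree and marker tree described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: the visibility map is assumed to be a square boolean grid whose side is a power of two and whose cells each correspond to one marker of the smallest size; a coarser cell is assumed to count as visible only if all four of its children are visible; the function names build_visibility_pyramid and place_markers are hypothetical. The special handling of regions that are invisible at all levels and the temporal blending are omitted.

```python
def build_visibility_pyramid(vis):
    """Subsample a square boolean visibility map into a pyramid.

    vis: 2D list of bools (True = visible from the camera), side = power of two.
    Returns the levels ordered from finest (input) to coarsest (1x1).
    Assumption: a coarse cell is visible only if all 4 children are visible.
    """
    levels = [vis]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        n = len(prev) // 2
        levels.append([
            [all(prev[2 * r + dr][2 * c + dc] for dr in (0, 1) for dc in (0, 1))
             for c in range(n)]
            for r in range(n)
        ])
    return levels


def place_markers(vis, top_cells=2):
    """Place the largest possible markers top-down.

    Returns a list of (row, col, size) tuples in units of finest-level cells.
    top_cells: grid side of the coarsest level used (2 -> 2x2 markers).
    """
    side = len(vis)
    occupied = [[False] * side for _ in range(side)]
    markers = []
    for level in reversed(build_visibility_pyramid(vis)):  # coarse to fine
        n = len(level)
        if n < top_cells:
            continue  # coarser than the largest allowed marker size
        size = side // n
        for r in range(n):
            for c in range(n):
                r0, c0 = r * size, c * size
                # Skip occluded entries and entries already covered by a
                # marker from a higher level. Quad-tree cells nest, so
                # checking the top-left cell is sufficient.
                if not level[r][c] or occupied[r0][c0]:
                    continue
                markers.append((r0, c0, size))
                for rr in range(r0, r0 + size):
                    for cc in range(c0, c0 + size):
                        occupied[rr][cc] = True
    return markers
```

For a 4×4 map in which only the top-left cell is occluded, this sketch places three markers of size 2 in the unoccluded quadrants and fills the remaining visible cells of the occluded quadrant with three markers of size 1.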

The refresh rate of 120 Hz was chosen as an example. A different refresh rate can be used.

To avoid clipping when the just noticeable difference Δ is added to and/or subtracted from the image I or the complementary image I', the original image O can be scaled. A contrast reduction of 2Δ suffices for this. In addition, the brightness can be raised, for example by 10%, in order to reduce camera noise in dark image regions. In practice this leads to a contrast reduction of about 10% to 20%. The original image O is in particular scaled linearly.
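The linear scaling and the complementary embedding can be illustrated for a single pixel as follows. This is a sketch under assumptions, not the claimed method itself: pixel values are assumed to be normalized to [0, 1], the optional brightness offset is left out, the sign convention (code bit 1 raises I and lowers I') is chosen arbitrarily, and the function names are hypothetical.

```python
def scale_for_headroom(o, delta_max):
    """Linearly compress o from [0, 1] to [delta_max, 1 - delta_max].

    This reduces the contrast by 2 * delta_max so that adding or
    subtracting up to delta_max can never leave the range [0, 1].
    """
    return delta_max + o * (1.0 - 2.0 * delta_max)


def embed_code(o, code_bit, delta, delta_max):
    """Return the pixel pair (I, I') for one original pixel value o.

    Assumed convention: bit 1 -> I = O + delta, I' = O - delta;
                        bit 0 -> I = O - delta, I' = O + delta.
    """
    base = scale_for_headroom(o, delta_max)
    sign = 1.0 if code_bit else -1.0
    return base + sign * delta, base - sign * delta


def reconstruct(i, i_comp):
    """Recover the scaled original (mean) and the signed code (half difference)."""
    return (i + i_comp) / 2.0, (i - i_comp) / 2.0
```

Because I and I' deviate from the scaled original by equal and opposite amounts, their mean yields the scaled original image and their difference yields the embedded structure, which corresponds to the reconstruction from an average and a difference named in the claims.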

1: room
2: camera
3: digital video projector
4: spatial augmentation
5: projected film sequence
6: analog light source
7: person
8: marker
9: computer
10: production unit
11: foreground object
12: background
13: relative integrated amount of light
B_A: recorded image sequence
B_P: projected image sequence
Δ: just noticeable difference
I: image
I': complementary image
O: original image
s: number of steps

Claims

Method for creating augmented reality in a room (1), in which images (I, I') of an image sequence (B_P) are projected by means of at least one digital video projector (3) onto at least some of the surfaces bounding the room (1) and/or located in the room (1), wherein the images (I, I') are spatially and/or temporally modulated in luminance and/or chrominance, and at least one camera (2) synchronized with at least one of the video projectors (3) captures at least part of the room (1).

Method according to claim 1, characterized in that at least one coded structure (C) is integrated into at least one of the projected images (I), wherein a complement of the coded structure (C) is integrated into an immediately following one of the projected images (I') such that the structure (C) is visible only to a viewer or only to the camera (2).

Method according to claim 1 or 2, characterized in that at least part of the room (1) is illuminated by means of the images (I, I') projected onto the surfaces.

Method according to one of the preceding claims, characterized in that the images (I, I') are modulated pixel by pixel.

Method according to claim 4, characterized in that a difference (Δ) that is just noticeable to a human eye is calculated for at least one pixel of at least one original image (O) from an original image sequence, wherein, in order to superimpose a structure (C) that is imperceptible to a viewer or to one of the cameras (2), a positionally identical pixel is changed in at least a red and/or blue and/or green color channel by at most the just noticeable difference (Δ) in an image (I) derived from the original image (O), and by at most the negative of the just noticeable difference (Δ) in a complementary image (I'), the image (I) and the complementary image (I') being projected in succession.

Method according to claim 5, characterized in that the alternating projection takes place at a refresh rate that is greater than a flicker frequency.

Method according to claim 6, characterized in that the refresh rate is 120 Hz.

Method according to one of claims 5 to 7, characterized in that the just noticeable difference (Δ) is determined per pixel on the basis of a regional image brightness and/or a spatial resolution and/or a spatial frequency of the original image (O) and/or of the structure (C) and/or a temporal frequency of the image (I) and/or of the complementary image (I') and/or a maximum speed of eye movement.

Method according to one of claims 6 to 8, characterized in that at least one of the cameras (2) is synchronized to the refresh rate, wherein the original image (O) is reconstructed from an average of the captured image (I) and the captured complementary image (I'), and/or the structure (C) is reconstructed from a difference or a quotient of the captured image (I) and the captured complementary image (I').

Method according to one of claims 5 to 9, characterized in that an edge detection algorithm is applied to the captured image (I) and to the captured complementary image (I'), an optical flow is calculated, and a homography matrix is determined in order to align the captured image (I) and the captured complementary image (I') with each other.

Method according to one of claims 8 to 10, characterized in that the spatial frequency is determined by applying a Laplacian pyramid algorithm.

Method according to claim 11, characterized in that a Laplacian pyramid with six levels is used.

Method according to one of claims 5 to 12, characterized in that the green channel of the original image (O) is changed by a smaller difference than the red and the blue channel.

Method according to claim 13, characterized in that the difference in the green channel is at most one quarter of the difference in the red and/or blue channel.

Method according to one of claims 5 to 14, characterized in that the difference from the original image (O) is raised step by step over a sequence of images (I) and complementary images (I') from zero up to at most the just noticeable difference (Δ) and/or is lowered from at most the just noticeable difference (Δ) down to zero.

Method according to claim 15, characterized in that a change of the structure (C) takes place when the difference is zero.

Method according to claim 15 or 16, characterized in that a number (s) of steps over which the difference is raised or lowered is determined as a function of a just noticeable brightness and/or contrast difference, which is determined by means of the local spatial frequency and the local brightness.

Method according to one of claims 15 to 17, characterized in that the just noticeable difference (Δ) is redetermined if, while the difference is not equal to zero, the original image (O) in the original sequence changes relative to a preceding original image (O).

Method according to one of claims 5 to 18, characterized in that at least one two-dimensional marker (8) is used as the structure (C).

Method according to claim 19, characterized in that a position and/or a size of the marker (8) is dynamically recalculated during the image sequence (B_P) and changed in the image (I) and in the complementary image (I') if one of the cameras (2) is moved or a switch is made to another of the cameras (2) and the previous marker (8) is not visible or only poorly visible due to occlusion by a foreground object (11) or a person (7) in a foreground, wherein foreground objects (11) and/or persons (7) in the foreground are extracted by means of a keying method.

Method according to claim 20, characterized in that the size and position of the markers (8) are determined by means of a quad-tree, wherein the marker size is maximized according to visibility.

Method according to claim 20 or 21, characterized in that in the keying method the foreground is illuminated in a flash-like manner only in the image (I) or only in the complementary image (I') in each case, wherein foreground objects (11) and/or persons (7) in the foreground are distinguished from the constantly bright background (12) on the basis of their differing brightness in the image (I) and in the complementary image (I').

Method according to claim 22, characterized in that only brightness differences that exceed a certain threshold value are considered for extraction of the foreground.

Method according to claim 23, characterized in that the threshold value is greater than twice the currently largest difference.

Method according to one of claims 22 to 24, characterized in that the flash-like illumination is provided by at least one LED flash or by at least one of the video projectors (3).

Method according to claim 20 or 21, characterized in that in the keying method the foreground is constantly illuminated, wherein foreground objects (11) and/or persons (7) in the foreground are distinguished, on the basis of their constant brightness, from the background (12), which is identified by means of the imperceptible structures (C) of differing brightness in the image (I) and in the complementary image (I').

Method according to one of claims 20 to 26, characterized in that two of the cameras (2) are arranged coaxially with each other, one of the cameras (2) being focused on the foreground and the other of the cameras (2) on the background (12), and the captured images (I) and/or the captured complementary images (I') of both cameras (2) are used to extract the foreground.

Studio which can be illuminated by means of a method according to one of claims 1 to 27 and/or in which image content can be displayed by means of the method and/or in which coded structures that are invisible to a camera or a viewer can be integrated into the lighting and/or the image content.

Studio according to claim 28, characterized in that at least one analog light source is additionally provided.

Studio according to claim 28 or 29, characterized in that the structures can be used for camera tracking and/or for keying and/or for 3D extraction.

Studio according to one of claims 28 to 30, characterized in that the structures contain information that is recognizable to a viewer present in the studio but cannot be recorded by the camera.