US20100134599A1 - Arrangement and method for the recording and display of images of a scene and/or an object - Google Patents

Arrangement and method for the recording and display of images of a scene and/or an object Download PDF

Info

Publication number
US20100134599A1
US20100134599A1 (application US 11/988,897)
Authority
US
United States
Prior art keywords
images
tuple
depth
image
camera
Prior art date
Legal status
Abandoned
Application number
US11/988,897
Inventor
Ronny Billert
Torma Ferenc
Daniel Fuessel
David Reuss
Alexander Schmidt
Jens Meichsner
Current Assignee
3D International Europe GmbH
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual filed Critical Individual
Assigned to VISUMOTION GMBH. Assignors: BILLERT, RONNY; FERENC, TORMA; FUESSEL, DANIEL; MEICHSNER, JENS; RUESS, DAVID; SCHMIDT, ALEXANDER
Publication of US20100134599A1 publication Critical patent/US20100134599A1/en

Classifications

    • All classifications lie under H04N (H: ELECTRICITY; H04: ELECTRIC COMMUNICATION TECHNIQUE; H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION):
    • H04N 13/133: Equalising the characteristics of different image components, e.g. their average brightness or colour balance
    • H04N 13/25: Image signal generators using stereoscopic image cameras using two or more image sensors with different characteristics other than in their location or field of view, e.g. having different resolutions or colour pickup characteristics; using image signals from one sensor to control the characteristics of another sensor
    • H04N 13/395: Volumetric displays, i.e. systems where the image is built up from picture elements distributed through a volume, with depth sampling, i.e. the volume being constructed from a stack or sequence of 2D image planes
    • H04N 13/111: Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
    • H04N 13/194: Transmission of image signals
    • H04N 13/239: Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
    • H04N 13/243: Image signal generators using stereoscopic image cameras using three or more 2D image sensors
    • H04N 13/282: Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems
    • H04N 13/286: Image signal generators having separate monoscopic and stereoscopic modes
    • H04N 13/302: Image reproducers for viewing without the aid of special glasses, i.e. using autostereoscopic displays
    • H04N 13/332: Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
    • H04N 13/344: Displays for viewing with the aid of special glasses or head-mounted displays [HMD] with head-mounted left-right displays
    • H04N 19/597: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H04N 2013/0081: Stereoscopic image analysis; depth or disparity estimation from stereoscopic image signals

Definitions

  • The transmission channel may be, e.g., a digital TV signal, the Internet or a DVD (HD, SD, Blu-ray etc.).
  • For the coding of the transmitted images and depth information, MPEG-4 can be used to advantage.
  • The reconstruction of different views from the n-tuple of images transmitted together with the respective depth information (with at least two images of the n-tuple having different resolutions) is performed, e.g., in the following way: in a three-dimensional coordinate system, the color information of each image, observed from a suitable direction, is arranged in the depth positions marked by the respective depth information belonging to that image (see the sketch below).
  • The information transmitted is reconstructible in a highly universal way, e.g. as (perspective) views, tomographic slice images or volume pixels (voxels). Such image formats are of great advantage for special 3D presentation methods, such as volumetric 3D displays.
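A minimal sketch of the reconstruction described above (Python/NumPy; the function names, the 256 depth levels and the simple layer-shift view model are assumptions for illustration, not specified by the patent): each transmitted color image is scattered into a common three-dimensional grid at the positions given by its depth map, and a view is read out by displacing the depth layers.

```python
import numpy as np

def scatter_to_volume(color, depth, z_levels=256):
    """Place each pixel's color at the depth position given by its
    gray-scale depth map, yielding a volume of voxels (volume pixels).
    The depth map is assumed already upsampled to the color image's
    resolution."""
    h, w, _ = color.shape
    volume = np.zeros((z_levels, h, w, 3), dtype=color.dtype)
    occupied = np.zeros((z_levels, h, w), dtype=bool)
    ys, xs = np.mgrid[0:h, 0:w]
    z = depth.astype(int)                  # gray level taken as depth index
    volume[z, ys, xs] = color
    occupied[z, ys, xs] = True
    return volume, occupied

def project_view(volume, occupied, shift_per_level=0.05):
    """Read one view out of the volume: every depth layer is displaced
    horizontally in proportion to its Z level (the viewing direction)
    and the layers are composited front to back."""
    z_levels, h, w, _ = volume.shape
    view = np.zeros((h, w, 3), volume.dtype)
    filled = np.zeros((h, w), bool)
    for z in range(z_levels):              # z = 0 assumed nearest
        dx = int(round(z * shift_per_level))
        layer = np.roll(volume[z], dx, axis=1)
        vis = np.roll(occupied[z], dx, axis=1) & ~filled
        view[vis] = layer[vis]
        filled |= vis
    return view
```

From such a voxel volume one could equally extract tomographic slice images (single Z layers), which is what makes the transmitted format so universal.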
  • In addition, meta-information may be transmitted, e.g. in a so-called alpha channel. This may be information supplementing the images, such as geometric conditions of the n>2 images (e.g., relative angles, camera parameters), or transparency or contour information.
  • The problem of the invention can also be solved by a method of transmitting 3D information for the purpose of subsequent display for spatial perception without special viewing aids, on the basis of at least three different views, whereby, starting from at least one n-tuple of images with n>2 that characterize different viewing angles of an object or scene, the depth is determined for at least three images, and at least three images of the n-tuple together with the respective depth information (preferably in the form of depth maps) are subsequently transmitted in a transmission channel.
  • FIG. 1 a sketch illustrating the principle of the arrangement according to the invention, with a main camera and three satellite cameras;
  • FIG. 2 a version with a main camera and two satellite cameras
  • FIG. 3 a schematic illustration of the step-by-step displacement of two lines against one another, and generation of the Z coordinate;
  • FIG. 4 a scheme of optimization by elimination of ambiguities compared to FIG. 3 ;
  • FIG. 5 a scheme of optimization by reduction of the elements to an unambiguous height profile curve, compared to FIG. 4 ;
  • FIG. 6 a schematic illustration of the step-by-step displacement of three lines against one another, and generation of the Z coordinate;
  • FIG. 7 a scheme of optimization by elimination of ambiguities compared to FIG. 6 ;
  • FIG. 8 a scheme of optimization by reduction of the elements to an unambiguous height profile curve, compared to FIG. 7 ;
  • FIG. 9 a schematic illustration of a projection of a view from the scheme of optimization
  • FIG. 10 a schematic illustration of an image combination of four images, suitable for spatial display without special viewing aids (state of the art).
  • FIG. 11 a schematic illustration of the transmission method according to the invention.
  • An arrangement according to the invention essentially consists of a 3D camera system 1 , an image conversion device 2 and a 3D image display device 3 .
  • According to FIG. 1, the 3D camera system 1 contains one main camera 13 and three satellite cameras 14, 15 and 16;
  • the image conversion device 2 contains a rectification unit 21 , a color adjustment unit 22 , a unit for establishing the stack structure 23 , a unit for the optimization of the stack structure 24 , and a unit 25 for the projection of the stack structure onto the desired view, and
  • the 3D image display device 3 contains an image combination unit 31 and a 3D display 32 , with the 3D display 32 displaying at least two views of a scene or object for spatial presentation.
  • the 3D display 32 can also work on the basis of, say, 3, 4, 5, 6, 7, 8, 9 or even more views.
  • For example, a 3D display 32 of the model "Spatial View 19 inch" is suitable, which displays 5 different views at a time.
  • FIG. 2 shows another arrangement according to the invention.
  • the 3D camera system 1 contains a main camera 13 , a first satellite camera 14 , and a second satellite camera 15 .
  • the image conversion device 2 contains a rectification unit 21 , a color adjustment unit 22 , a unit for establishing the stack structure 23 , a unit for the optimization of the stack structure 24 , a unit 25 for projecting the stack structure onto the desired view, and a unit for determining the depth 26 ; and the 3D image display device 3 contains, as shown in FIG. 2 , a unit for the reconstruction of the stack structure 30 , an image combination unit 31 , and a 3D display 32 .
  • the 3D camera system 1 consists of a main camera 13 and two satellite cameras 14 , 15 , with the main camera 13 being a high-quality camera with a high resolving power, whereas the two satellite cameras 14 , 15 are provided with a lower resolving power.
  • the camera positions relative to each other are variable in spacing and alignment within the known limits, so that stereoscopic images can be taken.
  • In the rectification unit 21, the camera images are rectified, i.e. a compensation of lens distortions, camera rotations, zoom differences, etc. is made.
  • the rectification unit 21 is followed by the color adjustment unit 22 .
  • Here, the tonal/brightness values of the recorded images are balanced to a common level.
  • the image data thus corrected are now fed to the unit 23 for establishing the stack structure.
  • Here, a line-by-line comparison is made of the input images, but only of those from the satellite cameras ( 14 , 15 according to FIG. 2 , or 14 , 15 , 16 according to FIG. 1 ).
  • The comparison according to FIG. 3 is based on only two lines at a time. At first, two lines with the same Y coordinate are placed one on top of the other, which, according to FIG. 3, corresponds to plane 0. The comparison is made pixel by pixel and, as shown in FIG. 3, its result is saved as a Z coordinate in accordance with the current comparison plane; pixels lying on top of each other retain their tonal value if it is identical, otherwise no tonal value is saved. Then the lines are displaced relative to each other by increments of ½ pixel each, as shown in FIG. 3, and the next comparison is made in plane 1, the result of which is saved in plane 1 (Z coordinate). The comparisons are generally made up to plane 7 and then, analogously, from plane −1 down to plane −7, each result being saved as a Z coordinate in the respective plane.
  • the number of planes corresponds to the maximum depth information occurring, and may vary depending on the image content.
  • the three-dimensional structure thus established with the XYZ coordinates means that, for each pixel, the degree of relative displacement between the views is saved via the appertaining Z coordinate.
  • In FIG. 6, the same comparison is made on the basis of the embodiment shown in FIG. 1, save that three lines are compared here accordingly. A simple comparison between FIG. 6 and FIG. 3 shows that the comparison of three lines involves substantially fewer misinterpretations; thus, it is of advantage to do the comparison with more than two lines.
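To make the plane-by-plane comparison concrete, here is a minimal sketch for the two-line case (Python/NumPy; the function name, the sub-pixel interpolation and the tonal tolerance `tol` are my assumptions, not prescribed by the patent): the two lines are displaced against each other in half-pixel steps, and wherever the tonal values coincide the value is saved in the plane belonging to that displacement.

```python
import numpy as np

def build_line_stack(line_a, line_b, max_plane=7, step=0.5, tol=2.0):
    """Build one X-Z slice of the stack structure from two corresponding
    lines (float arrays of tonal values): for every comparison plane z,
    the lines are displaced against each other by z*step pixels in
    opposite directions; where the tonal values agree (within tol),
    the value is saved, otherwise nothing is saved (NaN)."""
    n = line_a.size
    x = np.arange(n, dtype=float)
    zs = np.arange(-max_plane, max_plane + 1)
    stack = np.full((zs.size, n), np.nan)      # NaN = nothing saved
    for i, z in enumerate(zs):
        d = 0.5 * z * step                     # opposite half-shifts
        a = np.interp(x + d, x, line_a)        # line A shifted one way
        b = np.interp(x - d, x, line_b)        # line B the other way
        match = np.abs(a - b) <= tol
        stack[i, match] = 0.5 * (a[match] + b[match])
    return zs, stack
```

With three satellite cameras, three lines would be shifted against one another and a value saved only where all three agree, which is exactly why the three-line comparison of FIG. 6 produces fewer misinterpretations.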
  • The stack structure thus established, which is also distinguished by the fact that the input images are no longer present individually, is fed to the subsequent unit 24 for optimization of the stack structure. Here, ambiguous depictions of image elements are identified with the aim of deleting such errors, i.e. improbable combinations, so that a corrected set of data is generated in accordance with FIG. 4 or FIG. 7 .
  • Additionally, a height profile curve that is as shallow or smooth as possible is established from the remaining elements in order to achieve an unambiguous imaging of the tonal values in a discrete depth plane (Z coordinate), as shown in FIG. 5 and FIG. 8 , respectively.
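A minimal sketch of this optimization step, operating on the X-Z slice produced above (assumption: "smooth" is implemented as closeness to the value already chosen for the neighboring column, which is only one of many conceivable criteria):

```python
import numpy as np

def height_profile(stack, zs):
    """Collapse an X-Z slice of the stack to one Z value per X:
    among the saved (non-NaN) entries of each column, keep the
    candidate closest to the previously chosen neighbour, giving
    a shallow, unambiguous height profile curve."""
    n = stack.shape[1]
    profile = np.full(n, np.nan)
    last = 0.0                                  # running reference depth
    for x in range(n):
        candidates = zs[~np.isnan(stack[:, x])]
        if candidates.size:
            profile[x] = candidates[np.argmin(np.abs(candidates - last))]
            last = profile[x]
    return profile
```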
  • The result according to FIG. 5 is now fed to the unit 25 for the projection of the stack structure onto the desired view, as shown in FIG. 1 . For this purpose, the stack structure is projected onto a defined plane in space; each desired view is generated via the angles of that plane, as can be seen in FIG. 9 .
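A minimal sketch of such a projection for one line (assumptions: `tones` holds the tonal value saved at each X of the optimized profile, `view_factor` plays the role of the projection plane's angle, and larger Z is taken to be nearer to the viewer):

```python
import numpy as np

def project_line(tones, profile, view_factor):
    """Project one optimized line onto a desired view: each element is
    displaced in proportion to its Z value; nearer elements (larger Z)
    are painted last and therefore occlude farther ones."""
    out = np.full(tones.size, np.nan)
    valid = np.where(~np.isnan(profile))[0]
    for x in valid[np.argsort(profile[valid])]:     # far to near
        xp = int(round(x + view_factor * profile[x]))
        if 0 <= xp < out.size:
            out[xp] = tones[x]
    return out
```

Setting view_factor to 0 reproduces the central view; varying it yields the further views, which is how more views can be generated than cameras were used.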
  • At least one view is generated that is not exactly equal to any of the images recorded by the camera system 1 . All views generated are present at the output port of the image conversion device 2 and can thus be transferred to the subsequent 3D image display device 3 for stereoscopic presentation; by means of the image combination unit 31 incorporated, at first the different views are combined in accordance with the given parameter assignment of the 3D display 32 .
  • FIG. 2 illustrates another, optional way for transmitting the processed data to the 3D image display device 3 .
  • the unit 24 for the optimization of the stack structure is followed by the unit 26 for determining the depth (broken line). Determining the depth of the images creates a particularly efficient data transfer format. This is because only three images and three depth images are transferred, preferably in the MPEG-4 format.
  • the 3D image display device 3 is provided, on the input side, with a unit 30 for reconstructing the stack structure, a subsequent image combination unit 31 and a 3D display 32 .
  • the images and depths received can be particularly efficiently reconverted into the stack structure by inverse projection, so that the stack structure can be made available to the subsequent unit 25 for projecting the stack structure onto the desired view.
  • the further procedure is then identical to the version illustrated in FIG. 1 , save for the advantage that not all the views need to be transferred, especially if the unit 25 is integrated in the 3D image display device 3 .
  • This last-named, optional way can also be taken in the embodiment according to FIG. 1 , provided that the circumstances are matched accordingly.
  • FIG. 10 shows a schematic illustration of a state-of-the-art method (JP 08-331605) to create an image combination of four images or views, suitable for spatial presentation on a 3D display without special viewing aids, for example on the basis of a suitable lenticular or barrier technology.
  • the four images or views have been combined in the image combination unit 31 in accordance with the image combination structure suitable for the 3D display 32 .
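The combination structure itself is display-specific; purely as an illustration, a simple column-wise interleaving of four views (one common scheme for lenticular or barrier panels; real panels typically interleave at sub-pixel granularity, which is not modeled here):

```python
import numpy as np

def combine_views(views):
    """Interleave N equally sized views column by column into one
    panel image, a simplified stand-in for the display-specific
    image combination structure."""
    combined = np.empty_like(views[0])
    for x in range(combined.shape[1]):
        combined[:, x, :] = views[x % len(views)][:, x, :]
    return combined
```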
  • FIG. 11 is a schematic illustration of the transmission method according to the invention.
  • In this example, a total of 3 color images and 3 depth images (or, correspondingly, streams of moving images) are transmitted.
  • one of the color image streams has a resolution of 1920 ⁇ 1080 pixels, whereas the other two have a resolution of 1280 ⁇ 720 (or 1024 ⁇ 768) pixels.
  • Each of the appertaining depth images (or depth image streams) is transmitted with half the horizontal and half the vertical resolution, i.e. 960 ⁇ 540 pixels and 640 ⁇ 360 (or 512 ⁇ 384) pixels, respectively.
  • the depth images consist of gray-scale images, e.g. with 256 or 1024 possible gray levels per pixel, with each gray level representing one depth value.
  • the highest-resolution color image would have, for example, 4096 ⁇ 4096 pixels, and the other color images would have 2048 ⁇ 2048 or 1024 ⁇ 1024 pixels.
  • the appertaining depth images are transmitted with half the horizontal and half the vertical resolution. This version would be of advantage if the same data record is to be used for stereoscopic presentations of particularly high resolution (e.g. in the 3D movie theater with right and left images) as well as for less well-resolved 3D presentation on 3D displays, but then with at least two views presented.
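For a feel for the data-rate advantage, a back-of-the-envelope count of the raw (uncompressed) pixel data in the example above; the 3-bytes-per-color-pixel and 1-byte-per-depth-value figures, and the eight-view comparison, are my assumptions rather than numbers from the patent:

```python
# Raw pixel data per frame for the transmitted 3 color + 3 depth images
color_px = 1920 * 1080 + 2 * (1280 * 720)      # 3,916,800 color pixels
depth_px = 960 * 540 + 2 * (640 * 360)         #   979,200 depth values
payload = color_px * 3 + depth_px              # ~12.7 million bytes

# For comparison: transmitting eight full-HD views outright
eight_views = 8 * 1920 * 1080 * 3              # ~49.8 million bytes
print(eight_views / payload)                   # roughly 3.9x more data
```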

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Stereoscopic And Panoramic Photography (AREA)
  • Studio Devices (AREA)

Abstract

The invention relates to an arrangement and a method for capturing and displaying images of a scene and/or an object. Said arrangement and method are particularly suited to display the captured images in a three-dimensionally perceptible manner. The aim of the invention is to create a new possibility to take images of real scenes and/or objects with as little effort as possible and then autostereoscopically display the same in a three-dimensional fashion from two or more perspectives. Said aim is achieved by providing at least one main camera of a first camera type for recording images, at least one satellite camera of a second camera type for recording images, an image converting device which is mounted behind the cameras, and a 3D image display device, the two camera types differing from each other in at least one parameter. A total of at least three cameras is provided. Also disclosed is a method for transmitting 3D data.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This is a national phase application of International Application No. PCT/DE2007/001965, filed Oct. 29, 2007, which claims priority of German Application No. 10 2006 055 641.0, filed Nov. 22, 2006, the complete disclosures of which are hereby incorporated by reference.
  • BACKGROUND OF THE INVENTION
  • a) Field of the Invention
  • The invention relates to an arrangement and a method for the recording and display of images of a scene and/or an object, suitable especially for the display of the recorded images for spatial perception. The invention further relates to a method of transmitting images for spatial perception.
  • b) Description of the Related Art
  • At present there exist essentially three basically different methods, and the appertaining arrangements, for recording 3D image information:
  • (1) The classical stereo camera, consisting of two like cameras, one each for a left and right image. For a highly resolved display, high-resolving camera systems are required, though. With multichannel systems, interpolation of the intermediate views is required. This causes artefacts to be visible especially in the middle views.
  • (2) The use of a multiview camera system. Its advantage over the stereo camera is the correct image reproduction with multichannel systems. In particular, no interpolations are required. A disadvantage is the great effort needed to implement exact alignment of the—e.g., eight—cameras with each other. Another disadvantage is the increased cost involved in the use of several cameras, which in addition entail further problems such as differing white levels, tonal values and/or geometry characteristics, which have to be balanced accordingly. The high data rate to be handled with this method is also a drawback.
  • (3) The use of a depth camera. Here, use is made of a color camera jointly with a depth sensor, which registers the (as a rule, cyclopean) depth information of the scene to be recorded. Apart from the fact that depth sensors are relatively expensive, their drawback is that they often do not work very exactly, and/or that no acceptable compromise between accuracy and speed is achieved. The method requires a general extrapolation, in which artefacts, especially in the outer views, cannot be excluded and, generally, occlusion artefacts cannot be concealed.
  • OBJECT AND SUMMARY OF THE INVENTION
  • The invention is based on the problem of setting forth a new way of recording real scenes and/or objects with the least possible effort and subsequently displaying them three-dimensionally in two or more views for spatial perception. Another problem of the invention is to find a suitable method for transmitting images for spatial perception.
  • According to the invention, the problem is solved with an arrangement for recording images of a scene and/or an object and displaying them for spatial perception, this arrangement to comprise the following components:
  • at least one main camera of a first camera type for the recording of images;
  • at least two satellite cameras of a second camera type for the recording of images, with the first and second camera types differing by at least one parameter;
  • an image conversion device arranged downstream of the cameras, for receiving and processing the initial image data, this image conversion device performing, among other processes, a depth or disparity recognition, for which only those images are employed that were recorded by cameras of the same camera type (preferably those that were recorded by the at least two satellite cameras), but not the residual images; and
  • a 3D image display device connected to the image conversion device, which displays the image data for spatial perception without special viewing aids, the 3D image display device displaying at least two views of the scene and/or object.
  • However, the 3D image display device may also display 3, 4, 5, 6, 7, 8, 9 or even more views simultaneously or averaged over time. Especially in image display devices of the last-named, so-called "multi-view" 3D type with 4 or more views displayed, the special advantages of the invention take effect, viz. that it is possible, with relatively few (e.g. three or four) cameras, to provide more views than the number of cameras employed.
  • The main and the satellite camera generally, but not imperatively, differ by their quality. Mostly, the main camera is a high-quality camera, whereas the satellite cameras employed may be of lesser quality (e.g., industrial cameras) and thus mostly, but not imperatively, have a lower resolution, among other parameters. In this case, then, the second camera type has a lower resolution than the first one. The two camera types may also differ (at least) by the built-in imaging chip.
  • Essentially, the advantage of the invention is that, besides the classical stereo camera system, here consisting essentially of two identical high-resolution cameras, a three-camera system is used, preferably consisting of a central high-quality camera and two additional cameras of lower resolution, arranged to the left and right, respectively, of the main camera. Thus, the main camera is arranged between the satellite cameras, for example.
  • Preferably then, the main camera is arranged between the satellite cameras. The distances between the cameras and their alignment (either in parallel or pointed at a common focus) are variable within customary limits. The use of further satellite cameras may be of advantage, as this enables a further reduction of misinterpretations especially during the subsequent processing of the image data.
  • According to the embodiment of the invention, it may thus be of advantage that
  • exactly one main camera and two satellite cameras are provided (“version 1+2”);
  • exactly one main camera and three satellite cameras are provided (“version 1+3”); or
  • exactly one main camera and five satellite cameras are provided (“version 1+5”).
  • The general idea of the invention obviously also includes other embodiments, e.g., the use of several main cameras or of still more satellite cameras.
  • All cameras may be arranged in parallel or pointed at a common focus. It is also possible that not all of them are pointed at a common focus (convergence angle). The optical axes of the cameras may lie in one plane or in different planes, with the center points of the objectives preferably arranged in line or on a (preferably isosceles or equilateral) triangle. For special cases of application, the center points of the cameras' objectives may also be spaced at unequal distances relative to each other (with the objective center points forming a scalene triangle). It is further possible that all (at least three) cameras (i.e. all existing main and satellite cameras) differ by at least one parameter, e.g. by their resolution. The cameras can be synchronized with regard to zoom, f-stop, focus etc. as well as with regard to the individual frames (i.e. best possible true-to-frame synchronization in recording). The cameras may be fixed at permanent locations or movable relative to each other; the setting of both the base distance between the cameras and the convergence angles may be automatic.
  • It may be of advantage to provide adapter systems that facilitate fixing especially the satellite cameras to the main camera. In this way, ordinary cameras can be subsequently converted into a 3D camera. It is also feasible, though, to convert an existing stereo camera system into a 3D camera conforming to the invention by retrofitting an added main camera.
  • Furthermore, the beam path—preferably in front of the objectives of the various cameras—can be provided with additional optical elements, e.g. one or several semitransparent mirrors. This makes it possible, e.g., to arrange each of two satellite cameras rotated 90 degrees relative to the main camera, so that the camera bodies of all three cameras are arranged in such a way that their objective center points are closer together horizontally than they would be if all three cameras were arranged immediately side by side, in which case the dimension of the camera bodies would necessitate a certain, greater spacing of the objective center points. In the constellation with the two satellite cameras rotated 90 degrees, a semitransparent mirror in reflection position, arranged at an angle of about 45 degrees relative to the principal rays emerging from the objectives of the satellite cameras, would follow, whereas the same mirror follows in transmission position, arranged at an angle of also 45 degrees relative to the principal ray emerging from the objective of the main camera.
  • Preferably, the objective center points of the main camera and of at least two satellite cameras form an isosceles triangle, e.g., in version “1+2”.
  • For version “1+3” it may be of advantage that the objective center points of the three satellite cameras form a triangle, preferably an isosceles one. In this case, the objective center point of the main camera should be arranged within the said triangle, the triangle being assumed to include its sides.
  • Moreover, in version “1+3” it is possible that one satellite camera and the main camera are optically arranged relative to each other in such a way that both record an image on essentially the same optical axis, with preferably at least one semitransparent mirror being arranged between the two cameras.
  • In this case, the two other satellite cameras are preferably arranged so as to form a straight line or a triangle with the satellite camera that is associated with the main camera.
  • Other embodiments can also be implemented, such as, e.g., a version “1+4” with a quadrangle (e.g. square) of 4 satellite cameras with a main camera inside the quadrangle (e.g., at the center of its area), or even a version “1+n” with a circle of n=5 or more satellite cameras.
  • Advantageously, the image conversion device generates at least three views of the recorded scene or object, employing for such generation of at least three views, in addition to the depth or disparity data registered, the image recorded by the at least one main camera and at least two more images recorded by the satellite cameras, though not necessarily by all cameras provided. It is quite possible that one of the at least three views generated still is equal to one of the input images. In the simplest case, the image conversion device may even use only the image recorded by the at least one main camera and the associated depth information for generating the views.
  • The main camera (or all main cameras) and all satellite cameras preferably record with frame-accurate synchronization, at a tolerance of maximally 100 frames per 24 hours.
  • For special embodiments it may also be useful to employ black-and-white cameras as satellite cameras and then, preferably automatically, assign tonal values to the images they produce.
  • The problem is also solved by a method for the recording and display of images of a scene and/or an object, comprising the following steps:
  • Creation of at least an n-tuple of images, with n>2, with at least two images having different resolutions;
  • Transfer of the image data to an image conversion device, in which subsequently a rectification, a color adjustment, a depth or disparity recognition and a subsequent generation of further views from n or fewer than n images of said n-tuple and the depth or disparity recognition values are carried out, with at least one view being generated that is not exactly equal to any of the images of the n-tuple created, and with the image conversion device employing, for depth or disparity recognition, only such images of the n-tuple as have the same resolution;
  • Subsequent creation of a combination of at least three different views or images in accordance with the parameter assignment of the 3D display of a 3D image display device for spatial presentation without special viewing aids; and finally
  • Presentation of the combined 3D image on the 3D display.
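Read as a processing pipeline, the steps above might be skeletonized as follows (Python; every stage is a placeholder stub showing only the data flow and the rule that depth recognition sees only images of equal, lowest resolution; none of these names come from the patent):

```python
import numpy as np

# Placeholder stages: each stands for a step named in the method above.
def rectify(images):            # compensate lens distortion, rotation, zoom
    return images
def adjust_colors(images):      # balance tonal/brightness values
    return images
def recognize_depth(images):    # e.g. stack structure and its optimization
    return [np.zeros(im.shape[:2]) for im in images]
def synthesize_views(images, depths, n_views):  # projection onto views
    return [images[0]] * n_views
def combine_for_display(views): # display-specific image combination
    return views[0]

def process_n_tuple(images, n_views=5):
    """Schematic flow for one n-tuple (n > 2, mixed resolutions).
    Depth recognition only sees the images sharing the resolution
    with the lowest pixel count; view synthesis may additionally
    use the remaining (e.g. higher-resolution) images."""
    images = rectify(images)
    images = adjust_colors(images)
    sizes = [im.shape[:2] for im in images]
    lowest = min(set(sizes), key=lambda s: s[0] * s[1])
    same_res = [im for im, s in zip(images, sizes) if s == lowest]
    depths = recognize_depth(same_res)
    views = synthesize_views(images, depths, n_views)
    return combine_for_display(views)
```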
  • For the depth or disparity recognition, those images are employed whose shared resolution has the lowest total number of pixels of all resolutions provided.
  • The depth recognition and subsequent generation of further views from the n-tuple of images and the depth or disparity recognition data can be carried out, for example, by creating a stack structure and projecting it onto a desired view.
  • The creation of a stack structure may be replaced by other applicable depth or disparity recognition algorithms, with the depth or disparity values recognized being used for the creation of desired views.
  • A stack structure may, in general, correspond to a layer structure of graphical elements in different (virtual) planes.
  • If a 3D camera system consisting of cameras of different types with different image resolutions is used, it is possible first to carry out a size adaptation after transfer of the image data to the image conversion device. The result of this is images that all have the same resolution. This may correspond to the highest resolution of the cameras, but preferably it is equal to that of the lowest-resolution camera(s). Subsequently, the camera images are rectified, i.e. their geometric distortions are corrected (compensation of lens distortions, misalignment of cameras, zoom differences, etc., if any). The size adaptation may also be performed within the rectifying process. Immediately after, a color adjustment is carried out, e.g. as taught by the publications "Joshi, N.: Color Calibration for Arrays of Inexpensive Image Sensors. Technical Report CSTR 2004-02, Stanford University, 2004" and "A. Ilie and G. Welch: Ensuring color consistency across multiple cameras, ICCV 2005". In particular, the tonal/brightness values of the camera images are matched, so that they are at an equal or at least comparable level.
  • For the image data thus provided, the stack structure for depth recognition is established. In this process, the input images (only the images of the n-tuple having the same resolution), stacked on top of each other in the first step, are compared with each other line by line. The line-by-line comparison can also be made in an oblique direction instead; this is favorable if the cameras are not arranged in a horizontal plane. If pixels lying on top of each other have the same tonal value, this value is saved; if they have different tonal values, none of them is saved. Thereafter, the lines are displaced relative to each other by defined steps (e.g., by ¼ or ½ pixel) in opposite directions; after every step the result of the comparison is saved again. At the end of this process, the three-dimensional stack structure with the coordinates X, Y and Z is obtained, with X and Y corresponding to the pixel coordinates of the input image, whereas Z represents the extent of relative displacement between the views. Thus, if two or three cameras are used, two or three lines, respectively, are always compared and displaced relative to each other. It is also possible to use more than two cameras, e.g. three, and still always combine two lines only, in which case the comparisons have to be matched once more. If three or more lines are compared, there are far fewer ambiguities than with the comparison of only two lines of two input images.
  • In the subsequent optimization of the stack structure, the task essentially consists in deleting the least probable combinations in case of ambiguous representations of image elements in the stack. In addition, this contributes to data reduction. Further reduction is achieved if a height profile curve is derived from the remaining elements to obtain an unambiguous imaging of the tonal values in a discrete depth plane (Z coordinate).
  • What normally follows is the projection of the stack structure onto the desired views. At least two views should be created, one of which might still be equal to one of the input images. As a rule, however, this is done with the particular 3D image display device in mind that is used thereafter. The subsequent combination of the different views provided corresponds to the parameter assignment of the 3D display.
  • Once the stack structure has been created, or following the method steps in accordance with the invention, the depth is determined for at least three original images of the n-tuple, preferably in the form of depth maps. Preferably, at least two depth maps having different resolution are created.
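A minimal sketch of deriving such a depth map from the optimized stack (assuming the stack is stored as a (Z, Y, X) array of tonal values in which, after optimization, each column holds at most one saved entry):

```python
import numpy as np

def depth_map_from_stack(stack, zs):
    """Derive a per-pixel depth map from an optimized stack, stored as
    a (Z, Y, X) array of tonal values with NaN where nothing is saved.
    After optimization each (Y, X) column holds at most one survivor;
    its plane index is taken as the depth value."""
    saved = ~np.isnan(stack)
    depth = zs[saved.argmax(axis=0)].astype(float)  # plane of the survivor
    depth[~saved.any(axis=0)] = np.nan              # empty column: unknown
    return depth
```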
  • Further, after the original images of the n-tuple and the respective associated depths have been taken over, preferably a reconstruction is performed by inverse projection of the images of the n-tuple into the stack space by means of depth maps, so that the stack structure is reconstructed, and so afterwards again different views can be generated therefrom by projection. Other methods of creating the views from the image data provided (n-tuples of images, depth information) are also possible.
  • Moreover, the original images of the n-tuple with the respective associated depths can be transmitted to the 3D image display device and then the reconstruction in accordance with the inventive method can be done first.
  • In general, the images of the n-tuple are created, e.g., by means of a 3D camera system, e.g. a multiple camera system consisting of several separate cameras.
  • Alternatively it is possible, in the method described above for the recording and display of images of a scene and/or an object, to create the images by means of a computer. In this case, preferably a depth map is created for each image, so that the rectification, color adjustment and depth or disparity recognition steps can be dropped. Preferably, at least two of the three depth maps differ in resolution. In a preferred embodiment, n=3 images may be provided, one of which has the (full-color) resolution of 1920×1080 pixels and the other two have the (full-color) resolution of 1280×720 (or 1024×768) pixels, whereas the appertaining depth maps have 960×540 and 640×360 (or 512×384) pixels, respectively. The image having the higher resolution corresponds, in spatial terms, to a perspective view lying between the perspective views of the other two images.
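The preferred n = 3 configuration could be captured in a small container like this (a sketch only; class and field names are mine, and zero-filled arrays stand in for actual image content):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ViewWithDepth:
    color: np.ndarray        # full-color image, (H, W, 3)
    depth: np.ndarray        # gray-scale depth map at half resolution

def preferred_triple():
    """n = 3: one 1920x1080 view spatially between two 1280x720 views,
    each with a depth map of half the horizontal/vertical resolution."""
    left   = ViewWithDepth(np.zeros((720, 1280, 3), np.uint8),
                           np.zeros((360, 640), np.uint8))
    center = ViewWithDepth(np.zeros((1080, 1920, 3), np.uint8),
                           np.zeros((540, 960), np.uint8))
    right  = ViewWithDepth(np.zeros((720, 1280, 3), np.uint8),
                           np.zeros((360, 640), np.uint8))
    return left, center, right
```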
  • The 3D image display device employed can preferably display 2, 3, 4, 5, 6, 7, 8, 9 or even more views simultaneously or averaged over time. It is particularly with such devices, known as "multi-view" 3D image display devices with at least 4 or more views displayed, that the special advantages of the invention take effect, namely, that with relatively few (e.g. three) original images, more views can be provided for spatial display than the number of original images. By the way, the combination, mentioned farther above, of at least two different views or images in accordance with the parameter assignment of the 3D display of a 3D image display device for spatial presentation without special viewing aids may contain a combination of views not only from different points in space but also in time.
  • Another important advantage of the invention is the fact that, after the optimization of the stack structure, the depth is determined per original image. The resulting data have an extremely efficient data transfer format, viz. as n images (e.g. original images, or views) plus n depth images (preferably with n=3), so that a data rate is achieved that is markedly lower than that required if all views were transferred. As a consequence, a unit for the reconstruction of the stack structure and the unit for the projection of the stack structure onto the desired view, or units of other kind that perform the reconstruction of views differently, have to be integrated into the 3D image display device.
  • For the steps mentioned above, it is possible to use disparity instead of depth. By the way, the term “projection” here may, in principle, also mean a mere displacement.
  • Of course, depth or disparity recognition methods other than the one described above can be used to detect depth or disparities from the n-tuple of images (with n>2), and/or to generate further views from this n-tuple of images. Such alternative methods or partial methods are described, for example, in the publications “Tao, H. and Sawhney, H.: Global matching criterion and color segmentation based stereo, in Proc. Workshop on the Application of Computer Vision (WACV2000), pp. 246-253, December 2000”, “M. Lin and C. Tomasi: Surfaces with occlusions from layered stereo. Technical report, Stanford University, 2002. In preparation”, “C. Lawrence Zitnick, Sing Bing Kang, Matthew Uyttendaele, Simon Winder, Richard Szeliski: High-quality video view interpolation using a layered representation, International Conference on Computer Graphics and Interactive Techniques, ACM SIGGRAPH 2004, Los Angeles, Calif., pp. 600-608”, and “S. M. Seitz and C. R. Dyer: View morphing, Proc. SIGGRAPH 96, 1996, pp. 21-30”.
  • By the method according to the invention, it is possible, in principle, that the images created are transferred to the image conversion device. Moreover, all views generated by the image conversion device for each image can be transferred to the 3D image display device.
  • In an advantageous embodiment, the invention comprises a method for the transmission of 3D information for the purpose of later display for spatial perception without special viewing aids, on the basis of at least three different views, in which, starting from at least one n-tuple of images (with n>2) characterizing different angles of view of an object or a scene, with at least two images of the n-tuple having different resolutions, the depth is determined for at least three images, and thereafter at least three images of the n-tuple, together with the respective depth information (preferably in the form of depth maps), are transmitted in a transmission channel.
  • In a preferred embodiment, the n-tuple of images is a quadruple of images (n=4), with preferably three images having the same resolution and the fourth one having a higher resolution, and with the fourth image preferably belonging to the images transmitted in the transmission channel, so that, for example, one high-resolution image and two of the lower-resolution images are transmitted together with the depth information.
  • Herein, at least two of the depth maps determined may differ in resolution. The depth information is determined only from images of the n-tuple having the same resolution.
  • It is also possible that, from the depth information determined, the depth is generated for at least one image of higher resolution as well.
  • In a further development of the method according to the invention as well as of the transmission method, the depth information determined from images of the n-tuple having the lowest existing resolution can be transformed into a higher resolution by way of edge recognition in the at least one image of higher resolution; one possible realization is sketched below. This is helpful especially if, e.g., in the versions “1+2”, “1+3” and “1+5” described above, the high-resolution main camera zooms in on various scene details and/or objects, i.e. records at a magnification. In that case, the zoom settings of the satellite cameras need not necessarily be varied as well. Instead, the resolution of the corresponding depth information for the main camera is increased as described above, so that the desired views can be created with sufficient quality.
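One way such an edge-guided resolution increase could be realized is a guided upsampling in the spirit of joint bilateral upsampling. The sketch below is a simplification under the assumption of a luminance guide image; it is not the patent's prescribed algorithm, and the plain Python loops make it didactic rather than production code.

```python
import numpy as np

def edge_guided_upsample(depth_lr, guide_hr, radius=1, sigma_color=12.0):
    """Upsample a low-resolution depth map to the resolution of a
    high-resolution guide image (H, W, 3).  Each output pixel averages
    nearby low-res depth samples, weighted by how well the guide color
    matches, so depth edges snap to the image edges of the guide."""
    gh, gw = guide_hr.shape[:2]
    dh, dw = depth_lr.shape
    sy, sx = gh / dh, gw / dw            # scale factors, e.g. 2.0
    gray = guide_hr.mean(axis=2)         # luminance of the guide
    out = np.empty((gh, gw), dtype=np.float64)
    for y in range(gh):
        for x in range(gw):
            cy, cx = int(y / sy), int(x / sx)
            w_sum = d_sum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ly = min(max(cy + dy, 0), dh - 1)
                    lx = min(max(cx + dx, 0), dw - 1)
                    # guide pixel at the centre of the low-res sample
                    gy = min(int((ly + 0.5) * sy), gh - 1)
                    gx = min(int((lx + 0.5) * sx), gw - 1)
                    w = np.exp(-((gray[y, x] - gray[gy, gx]) ** 2)
                               / (2 * sigma_color ** 2))
                    w_sum += w
                    d_sum += w * depth_lr[ly, lx]
            out[y, x] = d_sum / w_sum
    return out
```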
  • Further developments provide for a large number of n-tuples of images and associated depth information to be processed in succession, so that a spatial display of moving images is made possible. Finally, in this case it is also possible to perform a spatial and temporal filtering of the large number of n-tuples of images.
  • The transmission channel may be, e.g., a digital TV signal, the Internet or a DVD (HD, SD, Blu-ray, etc.). As a compression standard, MPEG-4 can be used to advantage.
  • It is also of advantage if at least two of the three depth maps have different resolutions. For example, in a preferred embodiment, n=3 images may be provided, one of them having the (full-color) resolution of 1920×1080 pixels, and two having the (full-color) resolution of 1280×720 (or 1024×768) pixels, whereas the pertaining depth maps have 960×540 and 640×360 (or 512×384) pixels, respectively. The image having the higher resolution corresponds, in spatial terms, to a perspective view lying between the perspective views of the other two images.
  • The 3D image display device employed can preferably display 2, 3, 4, 5, 6, 7, 8, 9 or even more views simultaneously or in temporal average (i.e. time-multiplexed). It is especially with the last-named devices, known as “multi-view” 3D image display devices with 4 or more views displayed, that the special advantages of the invention take effect, viz. that with relatively few (e.g. three) original images, more views can be provided than the number of original images. The reconstruction of different views from the n-tuple of images transmitted together with the respective depth information (with at least two images of the n-tuple having different resolutions) is performed, e.g., in the following way: in a three-dimensional coordinate system, the color information of each image, observed from a suitable direction, is arranged at the depth positions marked by the respective depth information belonging to the image. This creates a colored three-dimensional volume of volume pixels (voxels), which can be imaged from different perspectives or directions by a virtual camera or by parallel projections; a receiver-side usage sketch is given below. In this way, more than three views can advantageously be regenerated from the information transmitted. Other reconstruction algorithms for the views or images are possible as well.
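Continuing the earlier sketch, a hypothetical receiver could regenerate, say, five views from one transmitted color image and its depth map as follows (reusing the illustrative `reconstruct_stack` and `project_view` helpers defined above; `color_image` and `depth_map` stand for one received pair):

```python
# Hypothetical receiver side: more views out than images in.
stack, occ = reconstruct_stack(color_image, depth_map, num_planes=16)
views = [project_view(stack, occ, view_offset=o) for o in (-2, -1, 0, 1, 2)]
# `views` now holds five perspectives for the multi-view 3D display.
```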
  • Regardless of this, the information transmitted is reconstructible in a highly universal way, e.g. as (perspective) views, tomographic slice images or voxels. Such image formats are of great advantage for special 3D presentation methods, such as volume 3D displays.
  • Moreover, in all transmission versions proposed by this invention it is additionally possible to transmit meta-information, e.g. in a so-called alpha channel. This may be information supplementing the images, such as geometric conditions of the n>2 images (e.g. relative angles, camera parameters), or transparency or contour information; a packing sketch follows below.
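A minimal sketch of how per-pixel meta-information such as transparency might ride along in an alpha channel; the packing shown is an illustrative assumption, not a format defined by the patent:

```python
import numpy as np

def attach_alpha(color, meta):
    """Pack an 8-bit per-pixel meta-information map (e.g. transparency
    or contour labels) as the alpha channel of an (H, W, 3) RGB image,
    yielding an (H, W, 4) RGBA image."""
    return np.dstack([color, meta.astype(np.uint8)])
```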
  • Finally, the problem addressed by the invention can be solved by a method of transmitting 3D information for the purpose of subsequent display for spatial perception without special viewing aids, on the basis of at least three different views, whereby, starting from at least one n-tuple of images with n>2 that characterize different viewing angles of an object or a scene, the depth is determined for at least three images, and at least three images of the n-tuple, together with the respective depth information (preferably in the form of depth maps), are subsequently transmitted in a transmission channel.
  • Preferably, the n-tuple of images is a triple of images (n=3), with the three images having the same resolution. It is also possible, however, that, e.g., n=5 or n=6 cameras generate 5 or 6 images, so that the depth information is determined from the quintuple or sextuple of images, or at least from three of them, and 3 of the 5 or 6 images together with their depth maps are subsequently transmitted, with the added possibility of reducing the resolution of individual images and/or depth maps.
  • Below, the invention is described in greater detail by example embodiments.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The drawings show
  • FIG. 1: a sketch illustrating the principle of the arrangement according to the invention, with a main camera and three satellite cameras;
  • FIG. 2: a version with a main camera and two satellite cameras;
  • FIG. 3: a schematic illustration of the step-by-step displacement of two lines against one another, and generation of the Z coordinate;
  • FIG. 4: a scheme of optimization by elimination of ambiguities compared to FIG. 3;
  • FIG. 5: a scheme of optimization by reduction of the elements to an unambiguous height profile curve, compared to FIG. 4;
  • FIG. 6: a schematic illustration of the step-by-step displacement of three lines against one another, and generation of the Z coordinate;
  • FIG. 7: a scheme of optimization by elimination of ambiguities compared to FIG. 6;
  • FIG. 8: a scheme of optimization by reduction of the elements to an unambiguous height profile curve, compared to FIG. 7;
  • FIG. 9: a schematic illustration of a projection of a view from the scheme of optimization;
  • FIG. 10: a schematic illustration of an image combination of four images, suitable for spatial display without special viewing aids (state of the art); and
  • FIG. 11: a schematic illustration of the transmission method according to the invention.
  • DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • An arrangement according to the invention essentially consists of a 3D camera system 1, an image conversion device 2 and a 3D image display device 3. As shown in FIG. 1, the 3D camera system 1 contains three satellite cameras 14, 15 and 16 and one main camera 13; the image conversion device 2 contains a rectification unit 21, a color adjustment unit 22, a unit 23 for establishing the stack structure, a unit 24 for the optimization of the stack structure, and a unit 25 for the projection of the stack structure onto the desired view; and the 3D image display device 3 contains an image combination unit 31 and a 3D display 32, with the 3D display 32 displaying at least two views of a scene or object for spatial presentation. The 3D display 32 can also work on the basis of, say, 3, 4, 5, 6, 7, 8, 9 or even more views. For example, a 3D display 32 of the “Spatial View 19 inch” type, which displays 5 different views at a time, may be used.
  • FIG. 2 shows another arrangement according to the invention. Here, the 3D camera system 1 contains a main camera 13, a first satellite camera 14, and a second satellite camera 15. The image conversion device 2 contains a rectification unit 21, a color adjustment unit 22, a unit 23 for establishing the stack structure, a unit 24 for the optimization of the stack structure, a unit 25 for projecting the stack structure onto the desired view, and a unit 26 for determining the depth; and the 3D image display device 3 contains, as shown in FIG. 2, a unit 30 for the reconstruction of the stack structure, an image combination unit 31, and a 3D display 32.
  • According to the embodiment shown in FIG. 2, the 3D camera system 1 consists of a main camera 13 and two satellite cameras 14, 15, with the main camera 13 being a high-quality camera with a high resolving power, whereas the two satellite cameras 14, 15 have a lower resolving power. As usual, the camera positions relative to each other are variable in spacing and alignment within the known limits, so that stereoscopic images can be taken. In the rectification unit 21, the camera images are rectified, i.e. lens distortions, camera rotations, zoom differences, etc. are compensated. The rectification unit 21 is followed by the color adjustment unit 22, in which the tonal/brightness values of the recorded images are balanced to a common level; one simple balancing scheme is sketched below. The image data thus corrected are then fed to the unit 23 for establishing the stack structure.
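The color adjustment can be pictured as matching first-order image statistics. The sketch below shifts each satellite image's per-channel mean and spread onto those of the main camera; this is one simple choice made for illustration, as the patent does not fix a particular balancing method:

```python
import numpy as np

def match_color(source, reference):
    """Balance the tonal values of `source` to `reference` by matching
    per-channel mean and standard deviation; both are (H, W, 3) uint8."""
    src = source.astype(np.float64)
    ref = reference.astype(np.float64)
    for c in range(3):
        s_mu, s_sd = src[..., c].mean(), src[..., c].std()
        r_mu, r_sd = ref[..., c].mean(), ref[..., c].std()
        src[..., c] = (src[..., c] - s_mu) * (r_sd / (s_sd + 1e-9)) + r_mu
    return np.clip(src, 0, 255).astype(np.uint8)

# e.g. satellite = match_color(satellite, main)  # balance to the main camera
```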
  • Now, in principle, a line-by-line comparison is made of the input images, but only of those from the satellite cameras (14, 15 according to FIG. 2, or 14, 15, 16 according to FIG. 1). The comparison according to FIG. 3 is based on comparing only two lines each; a compact sketch of this line comparison is given after this paragraph. In the first step, two lines with the same Y coordinate are placed one on top of the other, which, according to FIG. 3, corresponds to plane 0. The comparison is made pixel by pixel, and, as shown in FIG. 3, the result is saved as a Z coordinate in accordance with the current comparison plane: pixels lying on top of each other retain their tonal value if it is identical; if it is not, no tonal value is saved. In the second step, the lines are displaced against each other by increments of ½ pixel each as shown in FIG. 3, with the pixels being assigned to plane 1, i.e. the next comparison is made in plane 1 and its result saved in plane 1 (Z coordinate). As can be seen from FIG. 3, the comparisons are generally made up to plane 7 and then from plane −1 down to plane −7, each result being saved as a Z coordinate in the respective plane. The number of planes corresponds to the maximum depth information occurring and may vary depending on the image content. The three-dimensional structure thus established with the XYZ coordinates means that, for each pixel, the degree of relative displacement between the views is saved via the appertaining Z coordinate. As shown in FIG. 6, the same comparison is made for the embodiment shown in FIG. 1, save that three lines are compared accordingly. A simple comparison between FIG. 6 and FIG. 3 shows that the comparison of three lines involves substantially fewer misinterpretations; it is thus of advantage to do the comparison with more than two lines. The stack structure established, which is distinguished also by the fact that the input images are no longer present individually, is fed to the subsequent unit 24 for optimization of the stack structure. Here, ambiguous depictions of image elements are identified with the aim of deleting such errors due to improbable combinations, so that a corrected set of data is generated in accordance with FIG. 4 or FIG. 7. In the next step, a height profile curve that is as shallow or smooth as possible is established from the remaining elements in order to achieve an unambiguous mapping of the tonal values to a discrete depth plane (Z coordinate). The results are shown in FIG. 5 and FIG. 8, respectively. The result according to FIG. 5 is then fed to the unit 25 for the projection of the stack structure onto the desired view as shown in FIG. 1, where the stack structure is projected onto a defined plane in space. Each desired view is generated via the angles of the plane, as can be seen in FIG. 9. As a rule, at least one view is generated that is not exactly equal to any of the images recorded by the camera system 1. All views generated are present at the output port of the image conversion device 2 and can thus be transferred to the subsequent 3D image display device 3 for stereoscopic presentation; by means of the incorporated image combination unit 31, the different views are first combined in accordance with the given parameter assignment of the 3D display 32.
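The following sketch condenses the two-line comparison into code. Whole-pixel displacements are used for brevity where the description above works in ½-pixel increments, and the plane count is taken from the FIG. 3 example:

```python
import numpy as np

def compare_lines(line_a, line_b, max_plane=7, tol=0):
    """Shift two corresponding scanlines against each other and record,
    per displacement plane (Z coordinate), the tonal values on which
    the lines agree; disagreements are deleted (stored as NaN)."""
    n = len(line_a)
    planes = {}
    for z in range(-max_plane, max_plane + 1):
        matched = np.full(n, np.nan)
        for x in range(n):
            xa, xb = x + z, x - z   # lines displaced in opposite directions
            if 0 <= xa < n and 0 <= xb < n and \
               abs(int(line_a[xa]) - int(line_b[xb])) <= tol:
                matched[x] = line_a[xa]
        planes[z] = matched
    return planes   # one entry per plane 0, ±1, ..., ±7
```

Repeating this for every Y coordinate yields the XYZ stack structure described above, in which the Z coordinate encodes the relative displacement per pixel.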
  • FIG. 2 illustrates another, optional way of transmitting the processed data to the 3D image display device 3. Here, the unit 24 for the optimization of the stack structure is followed by the unit 26 for determining the depth (broken line). Determining the depth per image creates a particularly efficient data transfer format, because only three images and three depth images are transferred, preferably in the MPEG-4 format. According to FIG. 2, the 3D image display device 3 is provided, on the input side, with a unit 30 for reconstructing the stack structure, followed by an image combination unit 31 and a 3D display 32. In the unit 30 for reconstructing the stack structure, the images and depths received can be reconverted particularly efficiently into the stack structure by inverse projection, so that the stack structure can be made available to the subsequent unit 25 for projecting the stack structure onto the desired view. The further procedure is then identical to the version illustrated in FIG. 1, save for the advantage that not all views need to be transferred, especially if the unit 25 is integrated in the 3D image display device 3. This last-named, optional route can also be taken in the embodiment according to FIG. 1, provided that the circumstances are matched accordingly.
  • For better understanding, FIG. 10 shows a schematic illustration of a state-of-the-art method (JP 08-331605) to create an image combination of four images or views, suitable for spatial presentation on a 3D display without special viewing aids, for example on the basis of a suitable lenticular or barrier technology. For that purpose, the four images or views have been combined in the image combination unit 31 in accordance with the image combination structure suitable for the 3D display 32.
  • FIG. 11, finally, is a schematic illustration of the transmission method according to the invention. In an MPEG-4 data stream, a total of 3 color images and 3 depth images (or, correspondingly, streams of moving images) are transmitted. To particular advantage, one of the color image streams has a resolution of 1920×1080 pixels, whereas the other two have a resolution of 1280×720 (or 1024×768) pixels. Each of the appertaining depth images (or depth image streams) is transmitted with half the horizontal and half the vertical resolution, i.e. 960×540 pixels and 640×360 (or 512×384) pixels, respectively. In the simplest case, the depth images are gray-scale images, e.g. with 256 or 1024 possible gray levels per pixel, with each gray level representing one depth value; the quantization is sketched below.
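A minimal sketch of that depth-to-gray quantization (256 levels shown; 1024 levels work the same way, only requiring the wider integer type already used here):

```python
import numpy as np

def depth_to_gray(depth, levels=256):
    """Quantize a depth map into `levels` gray values so that it can be
    transmitted as an ordinary gray-scale image."""
    d = (depth - depth.min()) / max(float(depth.ptp()), 1e-12)  # -> [0, 1]
    return np.minimum((d * levels).astype(np.uint16), levels - 1)
```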
  • In another embodiment, the highest-resolution color image would have, for example, 4096×4096 pixels, and the other color images would have 2048×2048 or 1024×1024 pixels. The appertaining depth images (or depth image streams) are transmitted with half the horizontal and half the vertical resolution. This version would be of advantage if the same data record is to be used for stereoscopic presentations of particularly high resolution (e.g. in the 3D movie theater with right and left images) as well as for less well-resolved 3D presentation on 3D displays, but then with at least two views presented.
  • While the foregoing description and drawings represent the present invention, it will be obvious to those skilled in the art that various changes may be made therein without departing from the true spirit and scope of the present invention.
  • LIST OF REFERENCE NUMBERS
    • 1 Camera system
    • 13 Main camera
    • 14 First satellite camera
    • 15 Second satellite camera
    • 16 Third satellite camera
    • 2 Image conversion device
    • 21 Rectification unit
    • 22 Color adjustment unit
    • 23 Unit for establishing the stack structure
    • 24 Unit for optimizing the stack structure
    • 25 Unit for projecting the stack structure onto the desired view
    • 26 Unit for determining the depth
    • 3 3D image display device
    • 30 Unit for reconstructing the stack structure
    • 31 Image combination unit
    • 32 3D display

Claims (43)

1-39. (canceled)
40. An arrangement for the recording of images of a scene and/or an object and their display for spatial perception, comprising:
at least one main camera of a first camera type for the recording of images;
at least two satellite cameras of a second camera type for the recording of images, with the camera types differing in at least one parameter;
an image conversion device, arranged downstream of the cameras, that receives and processes the initial image data, said image conversion device performing, among other processes, a depth or disparity recognition employing only those images recorded by cameras of the same camera type (by said at least two satellite cameras), but not the remaining images; and
a 3D image display device, connected to the image conversion device, that displays the provided image data for spatial perception without special aids, with the 3D image display device displaying at least two views.
41. The arrangement as claimed in claim 40, wherein the two camera types differ at least in the resolution of the images to be recorded.
42. The arrangement as claimed in claim 40, wherein the two camera types differ at least in the built-in imaging chip.
43. The arrangement as claimed in claim 40, wherein exactly one main camera and two satellite cameras are provided.
44. The arrangement as claimed in claim 40, wherein exactly one main camera and three satellite cameras are provided.
45. The arrangement as claimed in claim 40, wherein exactly one main camera and five satellite cameras are provided.
46. The arrangement as claimed in claim 40, wherein the second camera type has a lower resolution than the first camera type.
47. The arrangement as claimed in claim 43, wherein the main camera is arranged between the satellite cameras.
48. The arrangement as claimed in claim 40, wherein at least one partially transparent mirror is arranged in front of each of the objectives of the main camera and all satellite cameras.
49. The arrangement as claimed in claim 44, wherein the center points of the objectives of the three satellite cameras form a triangle.
50. The arrangement as claimed in claim 49, wherein the triangle is an isosceles triangle.
51. The arrangement as claimed in claim 49, wherein the center point of the objective of the main camera is arranged inside said triangle, with the triangle to be understood to include its sides.
52. The arrangement as claimed in claim 44, wherein one satellite camera and the main camera are optically arranged relative to each other in such a way that both record an image on essentially the same optical axis, for which purpose preferably at least one partially transparent mirror is arranged between the two cameras.
53. The arrangement as claimed in claim 52, wherein the two other satellite cameras are arranged to form a straight line or a triangle together with the satellite camera associated to the main camera.
54. The arrangement as claimed in claim 40, wherein the image conversion device generates at least two views of the scene or object recorded, and wherein, for generating these at least two views, the image conversion device employs, besides the depth or disparity data recognized, the image recorded by the at least one main camera and at least one more image recorded by the satellite cameras, but not necessarily the images of all cameras provided.
55. The arrangement as claimed in claim 54, wherein one of the at least three views generated is still equal to one of the input images.
56. The arrangement as claimed in claim 40, wherein the main camera or all main cameras, and all satellite cameras record with frame-accurate synchronization at a tolerance of maximally 100 frames per 24 hours.
57. A method for the recording and display of images of a scene and/or an object, comprising the following steps:
generating at least one n-tuple of images, with n>2, with at least two images of the n-tuple having different resolutions;
transferring the image data to an image conversion device, in which then a rectification, a color adjustment, a depth or disparity recognition and subsequent generation of further views from the n or less than n images of the said n-tuple and from the depth or disparity recognition data are carried out, with at least one view being generated that is not exactly equal to any of the n-tuple of images generated, and with the image conversion device employing, for depth or disparity recognition, only such images of the n-tuple that have the same resolution;
subsequently generating a combination of at least two different views or images in accordance with the parameter assignment of the 3D display of a 3D image display device, for spatial presentation without special aids; and
finally presenting the combined 3D image on the 3D display.
58. The method as claimed in claim 57, wherein, for the depth or disparity recognition, those images of equal resolution are employed whose resolution has the lowest total number of pixels compared with all other resolutions provided.
59. The method as claimed in claim 58, wherein, for depth recognition, a stack structure is established by means of a line-by-line comparison of the pre-processed initial image data of an n-tuple, precisely, of those images of the n-tuple only that have the same resolution, in such a way that first those lines of the different images of an n-tuple which have the same Y coordinate are placed in register on top of each other and then a first comparison is made, the result of the comparison being saved in one line in such a way that equal tonal values in register are saved, whereas different tonal values are deleted, which is followed by a displacement of the lines in opposite directions by specified increments of preferably ¼ to 2 pixels, the results after each increment being saved in further lines analogously to the first comparison; so that, as a result after the comparisons made for each pixel, the Z coordinate provides the information about the degree of displacement of the views relative to each other.
60. The method as claimed in claim 59, wherein, after the establishment of the stack structure, an optimization is made in such a way that ambiguities are eliminated, and/or a reduction of the elements to an unambiguous height profile curve is carried out.
61. The method as claimed in claim 59, wherein, after the establishment of the stack structure or after the steps described in claim 60, the depth is determined for at least three original images of the n-tuple, preferably in the form of depth maps.
62. The method as claimed in claim 61, wherein, after transfer of the original images of the n-tuple and the respective depths appertaining to them, a reconstruction is carried out by inverse projection of the views of the n-tuple into the stack space by means of depth maps, so that the stack structure is reconstructed, and so that different views can again be subsequently generated therefrom by projection.
63. The method as claimed in claim 57, wherein the images generated are transmitted to the image conversion device.
64. The method as claimed in claim 57, wherein all views generated of each image by the image conversion device are transmitted to the 3D image display device.
65. The method as claimed in claim 61, wherein the original images of the n-tuple with the respective depths appertaining to them are transmitted to the 3D image display device, after which first the reconstruction according to claim 62 is carried out.
66. The method as claimed in claim 57, wherein the images of the n-tuple are generated by a 3D camera system.
67. The method as claimed in claim 57, wherein the images of the n-tuple are generated by a computer.
68. The method as claimed in claim 61, wherein at least two depth maps differing in resolution are generated.
69. A method for the transmission of 3D information for the purpose of later display for spatial perception without special aids, on the basis of at least two different views, comprising the steps of:
proceeding from at least one n-tuple of images, with n>2, which characterize different angles of view of an object or a scene, with at least two images of the n-tuple having different resolutions;
determining the depth for at least three images; and
thereafter transmitting at least three images of the n-tuple, together with the respective depth information, in a transmission channel.
70. The method as claimed in claim 69, wherein the depth information is in the form of depth maps.
71. The method as claimed in claim 69, wherein the n-tuple of images is a quadruple of images (n=4), with three images preferably having the same resolution, whereas the fourth image has a higher resolution and preferably belongs to the images transmitted in the transmission channel.
72. The method as claimed in claim 69, wherein at least two of the three depth maps have different resolutions.
73. The method as claimed in claim 69, wherein the image data and the depth information are generated in the MPEG-4 format.
74. The method as claimed in claim 69, wherein the depth information is determined only from such images of the n-tuple that have the same resolution.
75. The method as claimed in claim 74, wherein, from the depth information determined, the depth also for at least one image of higher resolution is generated.
76. The method as claimed in claim 57, wherein depth information determined from images of the n-tuple that have the lowest resolution provided is transformed into a higher resolution by way of edge recognition in the at least one image of higher resolution.
77. The method as claimed in claim 57, wherein a great number of n-tuples of images and appertaining depth information are processed in succession, so that a spatial display of moving images is made possible.
78. The method as claimed in claim 77, wherein the great number of n-tuples of images is subjected to spatial and temporal filtering.
79. A method of transmitting 3D information for the purpose of subsequent display for spatial perception without special aids, on the basis of at least two different views, comprising the steps of:
proceeding from at least one n-tuple of images with n>2, which characterize different viewing angles of an object or a scene;
determining the depth for at least three images; and
thereafter transmitting at least three images of the n-tuple, together with the respective depth information, in a transmission channel.
80. The method of claim 79, wherein the depth information is in the form of depth maps.
81. The method of claim 79, wherein the n-tuple of images is a triple of images (n=3), with the three images having the same resolution.