WO2013060135A1 - Video Presentation Method and System - Google Patents


Info

Publication number: WO2013060135A1 (application PCT/CN2012/075544)
Authority: WO (WIPO, PCT)
Prior art keywords: dimensional video, video, information, view, dimensional
Application number: PCT/CN2012/075544
Other languages: English (en), French (fr)
Inventor: 刘源 (Liu Yuan)
Original Assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP12844506.1A (publication EP2739056A4)
Publication of WO2013060135A1
Priority to US 14/200,196 (publication US9392222B2)

Classifications

    • H04N 7/15 Conference systems (under H04N 7/00 Television systems; H04N 7/14 Systems for two-way working)
    • H04N 13/139 Format conversion, e.g. of frame-rate or size (under H04N 13/10 Processing, recording or transmission of stereoscopic or multi-view image signals; H04N 13/106 Processing image signals)
    • H04N 13/178 Metadata, e.g. disparity information (under H04N 13/172 Image signals comprising non-image signal components, e.g. headers or format information)
    • H04N 13/351 Multi-view displays for displaying three or more geometrical viewpoints without viewer tracking, for displaying simultaneously (under H04N 13/30 Image reproducers)
    • H04N 13/398 Synchronisation thereof; control thereof (under H04N 13/30 Image reproducers)
    • H04N 13/305 Image reproducers for viewing without the aid of special glasses (autostereoscopic displays) using lenticular lenses, e.g. arrangements of cylindrical lenses (under H04N 13/302)
    • H04N 13/31 Image reproducers for viewing without the aid of special glasses (autostereoscopic displays) using parallax barriers
    • H04N 13/356 Image reproducers having separate monoscopic and stereoscopic modes
    • H04N 13/368 Image reproducers using viewer tracking for two or more viewers (under H04N 13/366)

Definitions

  • The present invention relates to the field of telepresence technologies, and in particular to a video presentation method and system.

Background
  • Telepresence creates an immersive virtual meeting environment that reflects the human factors of the participants and replicates their real experience as far as possible, which greatly improves end-user acceptance and thereby increases usage, demand, return on investment, and user satisfaction.
  • Telepresence systems have many advantages over traditional video conferencing systems, including: life-size images; eye-contact communication effects; smoother motion and accurate rendering of remote participants' body language; HD, studio-level video, lighting, and audio; a unified conference environment that makes participants feel they are at the same meeting place, ensuring a consistent experience across meeting locations; and hidden conference equipment such as cameras, reducing distraction for users.
  • Current telepresence systems use two-dimensional video technology. One end of a telepresence system may include a plurality of display screens, sound/image capture devices, communication devices, and the like.
  • Two-dimensional video conveys only the planar content of a scene and ignores depth information such as the distance and position of objects, so its representation is incomplete.
  • A technical problem to be solved by embodiments of the present invention is to provide a video presentation method and system, so that a telepresence system with 3D video can be implemented, improving the realism of the telepresence system.
  • an embodiment of the present invention provides a video presentation method, including:
  • The plurality of viewpoint image streams are alternately displayed in order in the viewing area, wherein the distance between two adjacent viewpoints among the plurality of viewpoint image streams displayed in the viewing area is the interpupillary (pupil) distance.
  • a video rendering system comprising:
  • a receiving module, configured to receive a multi-view 3D video signal from a remote end;
  • a determining module, configured to determine a plurality of viewpoint image streams in the multi-view 3D video signal; and
  • a display module, configured to alternately display the plurality of viewpoint image streams in order in a viewing area, wherein the distance between two adjacent viewpoints in the plurality of viewpoint image streams displayed in the viewing area is the pupil distance.
  • In the embodiments, three-dimensional video technology is adopted: a multi-view three-dimensional video signal adapted to the telepresence system is selected according to the characteristics of the three-dimensional video signal, and the signal is displayed according to the viewing area. This ensures that viewers of the system can effectively watch video with three-dimensional effects, realizing a highly practical remote three-dimensional video presentation system.
  • FIG. 1 is a schematic flowchart of a video presentation method in an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a specific process of video capability negotiation in the embodiment of the present invention
  • FIG. 3 is another schematic flowchart of video capability negotiation in the embodiment of the present invention
  • FIG. 4 is a schematic diagram of a specific process of the video presentation system in the embodiment of the present invention
  • FIG. 5 is a schematic diagram of a specific process of the negotiation module in the embodiment of the present invention
  • FIG. 6 is another schematic flowchart of a negotiation module in an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a specific composition of the remote presentation system layout 1 in the embodiment of the present invention
  • FIG. 8 is a schematic diagram of another specific composition of the remote presentation system layout 1 in the embodiment of the present invention
  • FIG. 10 is a schematic diagram of a specific composition of the telepresence system layout 3 (part) in the embodiment of the present invention
  • FIG. 11 is a schematic diagram showing a specific composition of a remote presentation system layout 3 (another part) in an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of a specific composition of the remote presentation system layout 4 in the embodiment of the present invention
  • FIG. 13 is a schematic diagram of a specific composition of the remote presentation system layout 5 in the embodiment of the present invention
  • FIG. 14 is a schematic diagram of a specific composition of the telepresence system layout 6 in an embodiment of the present invention.
  • Multi-view autostereoscopic (naked-eye) 3D display is an ideal 3D presentation mode: users can view 3D images over a wide range of viewing positions, with a different 3D image angle at each position. However, because the viewing effect of a multi-view autostereoscopic display depends strongly on the viewer's viewing distance and the viewing angle range, the display effect must be optimized for the telepresence scene to obtain the best 3D video experience.
  • The multi-view autostereoscopic 3D technologies in the embodiments of the present invention mainly include parallax-barrier-based (Parallax Barriers) technology, lenticular-lens-based (Lenticular Lenslets) technology, and Fresnel lens (Fresnel Lens) plus time-division multiplexing (Time-Multiplexed) technology.
  • Parallax-barrier-based multi-view autostereoscopic 3D technology places a fence-like barrier in front of or behind the liquid crystal screen.
  • Because of the barrier's shielding effect, the observer's left or right eye, looking through a slit in the barrier, can see only the odd or even pixel columns on the display, not all columns at once.
  • The two images formed by the odd and even pixel columns therefore become a 3D image pair with horizontal parallax,
  • and the visual function of the brain finally fuses them into a 3D image with a sense of depth.
  • The principle of lenticular-grating-based multi-view autostereoscopic 3D technology is similar to that of the parallax barrier. It uses the refraction of cylindrical lens units to guide light into specific observation areas and generate stereo image pairs corresponding to the left and right eyes, which the brain finally fuses into stereo vision. Since the lenticular grating is transmissive, the biggest advantages of autostereoscopic displays produced with this technology are that the display screen is not blocked, display brightness is unaffected, and the stereoscopic display effect is better. To solve the 2D/3D display switching problem, a liquid crystal module can be embedded in the cylindrical lens array to adjust the grating parameters.
  • In 2D display mode, applying an appropriate voltage makes the refractive indices of the liquid crystal and the lens equal, so light passing through the lens layer is not refracted; in 3D display mode, no voltage is applied, the liquid crystal and the lens have different refractive indices, and light is refracted as it passes through the lens layer, enabling 3D display.
  • Fresnel lens plus time-division multiplexing technology obtains multi-view 3D images by increasing the time-domain frame rate.
  • The solution consists of a high-speed CRT display, a projection lens, a liquid crystal shutter, and a Fresnel lens.
  • At each moment, the high-speed CRT and the optical system image a 2D picture into one part of the multi-view observation area.
  • The observer therefore appears to see multiple viewpoint images at the same time.
  • If the imaging of each viewpoint requires 60 Hz and there are 8 viewpoints in total, the CRT requires a refresh rate of at least 480 Hz for the observer to see flicker-free images of all 8 viewpoints.
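As a quick check of the refresh-rate requirement above, a minimal sketch (the viewpoint count and per-viewpoint rate are the values stated in the text):

```python
# Required display refresh rate for time-multiplexed multi-view 3D:
# each viewpoint must be refreshed at the flicker-free rate, and all
# viewpoints share the display in the time domain.
def required_refresh_rate(num_viewpoints: int, per_view_hz: int) -> int:
    return num_viewpoints * per_view_hz

rate = required_refresh_rate(num_viewpoints=8, per_view_hz=60)
print(rate)  # 480 (Hz), matching the minimum CRT rate stated above
```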
  • A plurality of viewpoint image streams are obtained according to the characteristics of the viewing area. When these image streams are displayed in the viewing area and the viewer's left eye receives one viewpoint image stream while the right eye receives the adjacent viewpoint image stream (or vice versa), a 3D visual effect is formed.
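The geometric condition can be sketched as follows; the 65 mm interpupillary distance, the viewpoint count, and the eye position are illustrative assumptions, not values from the patent:

```python
# Viewpoint centers across the viewing area are spaced one pupil
# distance apart, so the two eyes always land on adjacent viewpoints.
PUPIL_DISTANCE_MM = 65.0  # assumed average interpupillary distance

def viewpoint_positions(num_viewpoints, start_mm=0.0, spacing_mm=PUPIL_DISTANCE_MM):
    """x-coordinates (mm) of viewpoint centers in the viewing area."""
    return [start_mm + i * spacing_mm for i in range(num_viewpoints)]

def views_seen(left_eye_x, positions, spacing_mm=PUPIL_DISTANCE_MM):
    """Indices of the viewpoints hitting the left and right eye."""
    left = min(range(len(positions)), key=lambda i: abs(positions[i] - left_eye_x))
    right = min(range(len(positions)),
                key=lambda i: abs(positions[i] - (left_eye_x + spacing_mm)))
    return left, right

pos = viewpoint_positions(8)
print(views_seen(130.0, pos))  # the two eyes receive adjacent viewpoints (2, 3)
```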
  • As shown in FIG. 1, which is a specific flowchart of a video presentation method in an embodiment of the present invention, the method includes the following steps.
  • 101: Receive a multi-view 3D video signal from the far end. Since the system to which the method is applied includes a plurality of 3D imaging devices and a plurality of 3D display devices, the signal received here may comprise a plurality of multi-view 3D video signals from the far end, from which different 3D video content can be presented.
  • 102: Determine a plurality of viewpoint image streams in the multi-view 3D video signal. Because, in multi-view 3D technology, the viewing effect is determined by the screen width of the 3D display, the width of the viewing range, the number of viewpoints, and the optimal viewing distance, the viewing area information may be obtained first when the display presents 3D video, and the plurality of viewpoint image streams in the multi-view 3D video signal are then determined according to that viewing area information.
  • 103: The plurality of viewpoint image streams are alternately displayed in order in the viewing area, wherein the distance between two adjacent viewpoints in the plurality of viewpoint image streams displayed in the viewing area is the pupil distance.
  • The foregoing embodiment may further include a negotiation process, that is, step 100: perform 3D video capability negotiation with the remote end to determine 3D video capability information or 3D video stream information, so that the multi-view 3D video signal is received from the far end according to the determined 3D video capability information or 3D video stream information.
  • The specific negotiation may proceed in two different ways: receiving the three-dimensional video capability information sent by the remote end and performing three-dimensional video capability adaptation according to it, obtaining the locally supported three-dimensional video capability information; or constructing the three-dimensional video capability information and sending it to the remote end, so that the remote end sends the three-dimensional video signal according to that capability information. Figures 2 and 3 show the specific negotiation processes of the two modes.
  • the negotiation process in this embodiment includes the following steps.
  • the sender first constructs a 3D video capability information parameter according to its own 3D video capability (or 3D capability).
  • the 3D video capability information parameter may include a 3D video capture end parameter and a 3D video display end parameter.
  • the 3D video display terminal parameters include one or more of a number of 3D video display devices, a 3D video display device type, a number of viewpoints, an ideal viewing distance, and a maximum display parallax.
  • the 3D video collection end parameter includes one or more of a number of 3D video capture devices, a 3D video capture device type, and a spatial position relationship of the 3D video capture device.
  • The 3D camera can be a camera that actually exists in the telepresence room, or a virtual camera, such as a virtual-viewpoint camera for 3D video rendered by a computer.
  • The 3D video capability information can be described in a variety of formats, such as ASN.1 (Abstract Syntax Notation One), the Extensible Markup Language (XML), or a simple text format.
  • the sender sends the 3D video capability information parameter to the receiver by using a signaling message.
  • the receiver receives the 3D video capability information parameter.
  • After receiving the 3D video capability information parameter, the receiver adapts it to its own 3D video capability; for example, according to its own 3D video rendering capability, it decides whether to receive the video streams, which video streams to receive, or how to render the 3D video streams, and then sends a confirmation to the sender.
  • the sender sends the 3D video stream.
  • The receiver receives the 3D video stream sent by the sender according to the foregoing adaptation.
  • 207: The receiver decodes the 3D video stream and renders it for display.
  • As shown in FIG. 3, the second negotiation mode includes the following steps.
  • the receiver first constructs information parameters according to its own 3D capabilities.
  • The 3D video capture end parameters and 3D video display end parameters of the receiver described above are described using the ASN.1 abstract syntax notation, XML, or a simple text format.
  • The 3D capability information of the capture camera may be described by a set of attributes of the device, with the attributes separated by spaces, forming the 3D capability information shown in Tables 1 to 5 below; the camera description format is shown in Table 1.
  • Table 1:
  • IDENTITY: camera identifier.
  • TYPE: camera type, which can be a camera that outputs a video image (video) or a camera that outputs a depth image (depth).
  • POSITION: location information describing the positional relationship of the camera. The location can be described in several ways. One way is to define a position origin in XYZ coordinates and then express the pose by a rotation matrix and a translation vector, where the rotation matrix R is a 3×3 matrix and the translation vector is a 3×1 column vector. Another way is to use a predefined positional relationship, such as left, center, or right, or an identifier that represents a location, such as P0 or P1.
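The rotation-matrix/translation-vector description of POSITION amounts to a rigid transform; a small illustrative sketch (the numeric pose values are invented for illustration):

```python
# A camera pose given as a 3x3 rotation matrix R and a 3x1 translation
# vector t maps a point p in the reference (origin) frame via R @ p + t.
def transform(R, t, p):
    return [sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3)]

R_identity = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]]   # camera aligned with the origin frame
t_left = [-0.5, 0.0, 0.0]        # assumed: half a metre to the left of the origin

print(transform(R_identity, t_left, [1.0, 2.0, 3.0]))  # [0.5, 2.0, 3.0]
```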
  • The 3D capability information of the video display side (or rendering side) can be described in the manner shown in Table 2.
  • IDENTITY: display identifier.
  • TYPE: display device type, which can be one of the following: 2D display (2d), glasses-based 3D display (including red-blue glasses, polarized glasses, time-multiplexed glasses, etc.), autostereoscopic display (autostereoscopic), multi-view display (multiview), or other types.
  • POSITION: display device location information, in the same format as the camera's location.
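Since the patent allows the capability information to be expressed in XML, a hypothetical serialization of the Table 1/Table 2 attributes might look like the following sketch (the element names and the specific camera/display values are invented for illustration; the patent does not fix a schema):

```python
import xml.etree.ElementTree as ET

# Build a hypothetical capability description carrying the
# IDENTITY / TYPE / POSITION attributes from Tables 1 and 2.
cap = ET.Element("capability3d")
for ident, ctype, pos in [("C0", "video", "left"), ("C1", "video", "right")]:
    ET.SubElement(cap, "camera", IDENTITY=ident, TYPE=ctype, POSITION=pos)
ET.SubElement(cap, "display", IDENTITY="D0", TYPE="autostereoscopic", POSITION="center")

xml_text = ET.tostring(cap, encoding="unicode")
print(xml_text)
```

This is only one possible rendering; the same attributes could equally be carried in ASN.1 or a space-separated text line, as the patent notes.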
  • The 3D video capability information in another embodiment of the present invention is further described below.
  • Two 2D cameras are used: C0, the left camera, and C1, the right camera.
  • The 3D video capability information in this example is described in the form of Table 3.
  • In another example, a 2D camera and a depth camera are used for 3D video capture, and the video is played back on a 24-inch single-view autostereoscopic display; the positional relationship between the camera and the display is described by a rotation matrix and a translation vector.
  • the 3D video capability information in this example is described in Table 4.
  • D0 autostereoscopic [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
  • In a further example, there are eight cameras, at positions P0 to P7 respectively, and a 50-inch multi-view display at the left, center, and right positions. The optimal viewing distance is 2.5 m, the viewing angle range is 1 m, and the maximum parallax is 50 pixels.
  • Its 3D video capabilities are shown in Table 5.
  • the receiver sends the 3D capability information parameter to the sender by using a signaling message.
  • the sender receives the 3D capability information parameter.
  • The sender learns the receiver's 3D capability (for example, the number of supported 3D video streams and the coding mode) from the received 3D capability information parameter, performs 3D capability adaptation with its local 3D capability, and then confirms to the receiver.
  • For example, the sender adapts the 3D capability of the receiver's display to the 3D capability of its local capture devices.
  • the sender performs coding and sending of the video according to the adapted situation.
  • the receiver receives the 3D video stream sent by the sending end.
  • the receiver decodes the 3D video stream and renders the display.
  • The negotiation may also include a description of the 3D video stream characteristics, for example: whether the stream is a 3D video stream; the data content of the 3D video stream (for example, 2D video data or depth/disparity data); the encoding method of the 3D video stream (for example, whether spatial-domain packing coding or scalable-resolution enhanced frame-compatible stereo coding is used); other parameters of the 3D video stream (including resolution, frame rate, required bandwidth, etc.); and the correspondence between the video stream and the capture cameras described in the 3D capability information, the correspondence between the video stream and the presenting display device, and the like.
  • A 3D video consists of two video streams, where V0 and V1 are 2D video streams corresponding to cameras C0 and C1 respectively; a stereoscopic video stream is formed by the combination set1 and rendered on display D0.
  • V0 and V1 are encoded with H.264/AVC at a resolution of 1920×1080 and a frame rate of 60 fps, occupying 4 Mbps of bandwidth.
  • the 3D video stream information in this example is described in Table 6.
  • A single 3D video stream V0, composed of the 2D video of the left and right cameras C0 and C1, is presented on display D0. It is encoded in the Side-by-Side spatial-domain packing mode in the H.264/AVC format, with a resolution of 1920×1080, a frame rate of 60 fps, and an occupied bandwidth of 4 Mbps. The description of the 3D video stream information in this example is shown in Table 7.
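Side-by-Side spatial packing simply places the two views in one frame of doubled width, sketched here on tiny nested lists standing in for images (real systems typically halve each view's horizontal resolution first so the packed frame keeps the original width):

```python
# Pack left/right views (row-major pixel lists) into one Side-by-Side frame.
def pack_side_by_side(left, right):
    assert len(left) == len(right), "views must have the same height"
    return [lrow + rrow for lrow, rrow in zip(left, right)]

def unpack_side_by_side(frame):
    half = len(frame[0]) // 2
    return [row[:half] for row in frame], [row[half:] for row in frame]

L = [[1, 2], [3, 4]]          # toy 2x2 "left view"
R = [[5, 6], [7, 8]]          # toy 2x2 "right view"
sbs = pack_side_by_side(L, R)
print(sbs)                    # [[1, 2, 5, 6], [3, 4, 7, 8]]
assert unpack_side_by_side(sbs) == (L, R)  # packing is lossless here
```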
  • A 3D video stream V0, composed of the 2D videos of three cameras C0, C1, and C2, is rendered on display D0. It is encoded using the MVC method, with a resolution of 1920×1080, a frame rate of 30 fps, and an occupied bandwidth of 8 Mbps.
  • the 3D video stream information in this example is described in Table 8.
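The stream descriptions of Tables 6 to 8 can be thought of as structured records; a hypothetical Python rendering of the Table 8 example follows (the field names are illustrative, not part of the patent):

```python
# Hypothetical record mirroring the Table 8 stream description:
# one MVC-coded 3D stream built from three cameras, rendered on D0.
stream_v0 = {
    "id": "V0",
    "is_3d": True,
    "cameras": ["C0", "C1", "C2"],   # source capture cameras
    "display": "D0",                 # presenting display device
    "codec": "MVC",
    "resolution": (1920, 1080),
    "frame_rate_fps": 30,
    "bandwidth_mbps": 8,
}

def summarize(s):
    """One-line summary of a stream record."""
    w, h = s["resolution"]
    return f'{s["id"]}: {s["codec"]} {w}x{h}@{s["frame_rate_fps"]}fps, {s["bandwidth_mbps"]} Mbps'

print(summarize(stream_v0))  # V0: MVC 1920x1080@30fps, 8 Mbps
```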
  • the two parties may perform the subsequent codec process in multiple manners, which are described below.
  • The simplest case is that the two parties are homogeneous telepresence systems, and the remote end can directly decode and present the 3D video stream from the local end.
  • Alternatively, the receiving end is a telepresence system that supports only 2D video, cannot render 3D video, or does not support the local end's 3D video encoding mode, so it can only decode the 2D video data in the 3D video stream and perform 2D rendering.
  • For multi-stream 3D video data, only one of the 2D video streams may be decoded, without decoding the other streams; for 3D video data using the spatial-domain packing method, one of the packed 2D video images may be decoded and rendered; and for scalable-resolution enhanced frame-compatible stereoscopic 3D video data, only the base-layer 2D video data may be decoded, without decoding the enhancement-layer data.
  • the telepresence system at the receiving end can also support 3D video, but the supported rendering mode is different from that of the local end.
  • In this case, the receiving end system needs to render the 3D video according to its local rendering mode; for example, according to the 3D display mode and maximum-parallax rendering capability of the local display device, the images of the left and right viewpoints are regenerated from the decoded 2D video data and the disparity/depth map.
  • an MCU is required for 2D/3D video encoding conversion.
  • 3D to 2D video transcoding can be performed by the MCU.
  • it can also be transcoded by the MCU to adapt to telepresence systems that support different 3D video encoding and decoding formats.
  • For example, the sender may send 3D video data in the spatial-domain packing mode while the receiver can only receive 3D video data in the scalable-resolution enhanced frame-compatible stereo mode; the MCU then converts between the two encoding formats.
  • An embodiment of the present invention further provides a video presentation system, which includes the following modules: a receiving module 1, configured to receive a multi-view 3D video signal from a remote end; a determining module 2, configured to determine a plurality of viewpoint image streams in the multi-view 3D video signal; and a display module 3, configured to alternately display the plurality of viewpoint image streams in the viewing area, wherein the distance between two adjacent viewpoints in the plurality of viewpoint image streams displayed in the viewing area is the pupil distance.
  • The receiving module 1 is further configured to receive a plurality of multi-view three-dimensional video signals from the remote end; the system includes a plurality of display modules 3, which respectively display the pluralities of viewpoint image streams determined from the plurality of multi-view three-dimensional video signals.
  • The system may further include an obtaining module 4, configured to obtain viewing area information; the determining module 2 is further configured to determine, according to the viewing area information, the plurality of viewpoint image streams in the multi-view three-dimensional video signal.
  • The system may further include a negotiation module 5, configured to perform three-dimensional video capability negotiation with the remote end and determine three-dimensional video capability information or three-dimensional video stream information, so that the receiving module receives the multi-view 3D video signal from the far end according to the determined three-dimensional video capability information or three-dimensional video stream information.
  • the three-dimensional video capability information and the three-dimensional video stream information include three-dimensional video collection end parameters and three-dimensional video display end parameters.
  • The three-dimensional video display end parameters include one or more of the number of three-dimensional video display devices, the three-dimensional video display device type, the number of viewpoints, the ideal viewing distance, and the maximum display parallax; the three-dimensional video collection end parameters include one or more of the number of three-dimensional video capture devices, the three-dimensional video capture device type, and the spatial position relationship of the three-dimensional video capture devices.
  • The negotiation module 5 may include: a receiving submodule 50, configured to receive the three-dimensional video capability information or three-dimensional video stream information sent by the remote end; and an adaptation submodule 52, configured to perform three-dimensional video capability adaptation according to the three-dimensional video capability information or three-dimensional video stream information, obtaining the locally supported three-dimensional video capability information or three-dimensional video stream information.
  • Alternatively, the negotiation module 5 may include: a construction submodule 51, configured to construct three-dimensional video capability information or three-dimensional video stream information; and a sending submodule 53, configured to send the three-dimensional video capability information or three-dimensional video stream information to the remote end, so that the remote end sends the three-dimensional video signal according to that information.
  • The display 2 in the middle is a multi-view autostereoscopic 3D display (capable of 2D/3D display switching), and the adjacent displays 1 and 3 are ordinary 2D displays.
  • the viewing angle range A1 (shaded area in the figure) of the display 2 covers exactly the two seats 16 and 17 in the middle, and the optimum viewing distance is the vertical distance from the display screen surface to the seats 16 and 17.
  • The images of the plurality of viewpoints alternately appear in order in the viewing area (the dotted-line portion in the figure), and the distance between two adjacent viewing viewpoints is the pupil distance, so that the left and right eyes of users 1002 and 1003 can each see the images of two adjacent viewpoints simultaneously, forming stereoscopic (i.e., 3D) vision.
  • Since the viewing viewpoints cover the A1 area, a user in this area can see 3D video, but can only see 2D video when watching screens 1 and 3.
  • This layout does not demand a large viewing angle from the multi-view 3D display; the viewing angle range only needs to cover a small area.
  • The display 2 has the function of the display module in the above embodiment, and the functions of the receiving module and the determining module in the above embodiment can be integrated into the display 2 (in which case the display 2 must have the corresponding processing capability) or implemented by a processor with processing capability; similarly, the functions of the negotiation module and the acquisition module can be arranged in the same ways.
  • the systems in the following embodiments are similar, and are not mentioned here.
  • FIG. 8 shows a multi-row implementation based on the layout of FIG. 7, which is similar to the single-row mode (the 3D imaging principle is the same as in FIG. 7; to simplify the figure, the dotted viewpoint portion is not drawn here).
  • FIG. 9 shows system layout 2 in an embodiment of the present invention (the 3D imaging principle is the same as the layout in FIG. 7; to simplify the figure, the dotted viewpoint portion is not drawn here).
  • Displays 1, 2, and 3 are all multi-view autostereoscopic 3D displays (capable of switching between 2D and 3D display).
  • The viewing-angle range A10 of display 2 covers the two middle seats 16 and 17, the viewing-angle range A20 of display 1 covers seats 14 and 15, and the viewing-angle range A30 of display 3 covers seats 18 and 19.
  • The person in each seat can watch the display closest to him to obtain the desired 3D experience, and the two middle seats 16 and 17 can usually also obtain an ideal 3D experience when watching displays 1 and 3.
  • FIGS. 10 and 11 show design layout 3 of multi-view 3D presentation in a telepresence system according to an embodiment of the present invention (the 3D imaging principle is the same as that of layout 1; to simplify the figure, the dotted viewpoint portion is not drawn here).
  • Displays 1, 2, and 3 are all multi-view autostereoscopic 3D displays (capable of switching between 2D and 3D display), and their viewing-angle ranges A100, A200, and A300 each cover the entire user area.
  • The viewing angle of each display is related to the relative position of the display and the users' seats, so the viewing-angle ranges may differ and different manufacturing processes are required.
  • The principle of the multi-row mode is similar to that of the single row and is not described again here.
  • FIG. 12 shows design layout 4 of multi-view 3D presentation in a telepresence system according to an embodiment of the present invention (the 3D imaging principle is the same as that of layout 1; to simplify the figure, the dotted viewpoint portion is not drawn here).
  • A separate multi-view 3D display 101, placed at the side, is employed to provide the 3D presentation. Therefore, when designing the display, the viewing-angle range A400 must cover the entire user seating area.
  • The principle of the multi-row mode is similar to that of the single row and is not described again here.
  • FIG. 13 shows design layout 5 of multi-view 3D presentation in a telepresence system according to an embodiment of the present invention (the 3D imaging principle is the same as that of layout 1; to simplify the figure, the dotted viewpoint portion is not drawn here).
  • A separate multi-view 3D display 102, placed below the middle display, is employed to provide the 3D presentation.
  • The viewing-angle range A500 must also cover the entire user seating area to obtain the best 3D experience.
  • The principle of the multi-row mode is similar to that of the single row and is not described again here.
  • The auxiliary displays 20, 21, and 22 are displays that support 2D/3D display.
  • The 3D viewing-angle range of display 20 covers seats 14 and 15, that of display 21 covers seats 16 and 17, and that of display 22 covers seats 18 and 19, so the users in each seating area can see the 3D video of the auxiliary display for that area and obtain the optimal viewing effect.
  • The principle of the multi-row mode is similar to that of the single row and is not described again here.
  • Three-dimensional video technology is adopted, a multi-view three-dimensional video signal suited to a telepresence system is selected according to the characteristics of three-dimensional video signals, and the three-dimensional video signal is displayed according to the viewing area, ensuring that viewers using the system can effectively watch video with three-dimensional effects and realizing a remote 3D video presentation system.
  • The embodiments of the present invention further provide a specific solution for 3D video information negotiation in a telepresence system, achieving effective coordination when two ends with different 3D video capabilities form a telepresence system.
  • The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).
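The construction and sending of capability information summarized above can be sketched in a few lines. This is a minimal illustration only: the dictionary field names and the `build_capability_info` helper are assumptions made for the sketch, while the space-separated attribute order follows the capability tables given later in the description.

```python
def build_capability_info(cameras, displays):
    """Serialize 3D capability info in the simple space-separated text
    format proposed in the description: one device per line, camera rows
    as IDENTITY TYPE POSITION RESOLUTION FRAMERATE, display rows with the
    extra SIZE/VIEWPOINTS/DISTANCE/WIDTH/PARALLAX attributes."""
    lines = []
    for cam in cameras:
        lines.append(" ".join(str(cam[k]) for k in
                              ("id", "type", "position", "resolution", "framerate")))
    for disp in displays:
        lines.append(" ".join(str(disp[k]) for k in
                              ("id", "type", "position", "size", "resolution",
                               "framerate", "viewpoints", "distance", "width", "parallax")))
    return "\n".join(lines)

# The binocular-camera example of Table 3 in the description:
cams = [
    {"id": "C0", "type": "video", "position": "left",
     "resolution": "1920,1080", "framerate": 60},
    {"id": "C1", "type": "video", "position": "right",
     "resolution": "1920,1080", "framerate": 60},
]
disps = [
    {"id": "D0", "type": "autostereoscopic", "position": "center", "size": 24,
     "resolution": "1920,1080", "framerate": 60, "viewpoints": 1,
     "distance": 1.5, "width": 30, "parallax": 30},
]
print(build_capability_info(cams, disps))
```

The resulting string would then be carried in a signaling message by the sending sub-module, with the receiving end parsing it back into per-device attributes.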

Abstract

Embodiments of the present invention disclose a video presentation method and system. The method includes: receiving a multi-view three-dimensional video signal from a remote end; determining a plurality of viewpoint image streams in the multi-view three-dimensional video signal; and alternately displaying the plurality of viewpoint image streams in order in a viewing area, where the distance between two adjacent viewpoints among the plurality of viewpoint image streams displayed in the viewing area is the pupil distance. The present invention enables remote three-dimensional video presentation.

Description

Video Presentation Method and System

This application claims priority to Chinese Patent Application No. 201110334678.X, filed with the Chinese Patent Office on October 28, 2011 and entitled "Video Presentation Method and System", which is incorporated herein by reference in its entirety.

Technical Field

The present invention relates to the field of telepresence technologies, and in particular, to a video presentation method and system.

Background

A telepresence system can create an immersive virtual conference environment. The virtual conference environment fully reflects the human factors of the participants and replicates the participants' real experience as closely as possible, which can greatly improve end-user acceptance and thereby increase usage, demand, return on investment, and user satisfaction.

Compared with a traditional videoconferencing system, a telepresence system has many advantages, including: life-size images, eye-contact effects, smoother motion, and accurate body language of remote participants; high-definition, studio-grade video, lighting, and audio; a unified conference environment that makes participants feel they are at the same conference location, ensuring a consistent experience across different sites; and hidden cameras and other conference equipment, reducing their impact on users.

Current telepresence systems use two-dimensional video technology. One end of a telepresence system may include multiple display screens, sound/image capture devices, communication devices, and the like. However, two-dimensional video technology uses two-dimensional information as its carrier: it can show only the content of a scene while ignoring depth information such as the distance and position of objects, and is therefore incomplete.
Summary

The technical problem to be solved by the embodiments of the present invention is to provide a video presentation method and system that can realize a three-dimensional-video telepresence system and improve the realism of the telepresence system.

To this end, an embodiment of the present invention provides a video presentation method, including:

receiving a multi-view three-dimensional video signal from a remote end; determining a plurality of viewpoint image streams in the multi-view three-dimensional video signal; and

alternately displaying the plurality of viewpoint image streams in order in a viewing area, where the distance between two adjacent viewpoints among the plurality of viewpoint image streams displayed in the viewing area is the pupil distance.

An embodiment of the present invention further provides a video presentation system, including:

a receiving module, configured to receive a multi-view three-dimensional video signal from a remote end;

a determining module, configured to determine a plurality of viewpoint image streams in the multi-view three-dimensional video signal; and a display module, configured to alternately display the plurality of viewpoint image streams in order in a viewing area, where the distance between two adjacent viewpoints among the plurality of viewpoint image streams displayed in the viewing area is the pupil distance.

The video presentation system in the embodiments of the present invention adopts three-dimensional video technology, selects a multi-view three-dimensional video signal suited to a telepresence system according to the characteristics of three-dimensional video signals, and displays the three-dimensional video signal according to the viewing area, ensuring that viewers using the system can effectively watch video with three-dimensional effects and realizing a highly practical remote three-dimensional video presentation system.
Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present invention or the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a video presentation method according to an embodiment of the present invention; FIG. 2 is a schematic flowchart of video capability negotiation according to an embodiment of the present invention; FIG. 3 is another schematic flowchart of video capability negotiation according to an embodiment of the present invention; FIG. 4 is a schematic diagram of a video presentation system according to an embodiment of the present invention; FIG. 5 is a schematic diagram of a negotiation module according to an embodiment of the present invention;

FIG. 6 is another schematic diagram of a negotiation module according to an embodiment of the present invention;

FIG. 7 is a schematic composition diagram of telepresence system layout 1 according to an embodiment of the present invention; FIG. 8 is another schematic composition diagram of telepresence system layout 1 according to an embodiment of the present invention; FIG. 9 is a schematic composition diagram of telepresence system layout 2 according to an embodiment of the present invention; FIG. 10 is a schematic composition diagram of telepresence system layout 3 (part) according to an embodiment of the present invention;

FIG. 11 is a schematic composition diagram of telepresence system layout 3 (another part) according to an embodiment of the present invention;

FIG. 12 is a schematic composition diagram of telepresence system layout 4 according to an embodiment of the present invention; FIG. 13 is a schematic composition diagram of telepresence system layout 5 according to an embodiment of the present invention; FIG. 14 is a schematic composition diagram of telepresence system layout 6 according to an embodiment of the present invention.
Detailed Description

The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The embodiments of the present invention use multi-view autostereoscopic three-dimensional (3D) technology. Multi-view autostereoscopic 3D display is a relatively ideal 3D presentation approach: a user can see 3D images from multiple viewing positions within a relatively large range, with a different 3D viewing angle at each position. However, because the viewing effect of a multi-view autostereoscopic display depends heavily on the viewer's viewing distance and viewing-angle range, the display effect must be optimized for the telepresence scenario to obtain the best 3D video experience.

The multi-view autostereoscopic 3D technologies in the embodiments of the present invention mainly include technologies based on parallax barriers, technologies based on lenticular lenslets, and technologies based on a Fresnel lens combined with time multiplexing (time-multiplexed).

For example, in parallax-barrier-based multi-view autostereoscopic 3D technology, a fence-like barrier is placed in front of or behind the liquid crystal screen. Because of the barrier's occlusion, the viewer's left or right eye can see, through one slit in the barrier, only one of the odd or even pixel columns on the screen rather than all columns at once. The two images formed by the odd and even pixel columns thus become a 3D image pair with horizontal parallax, which the brain's visual system finally fuses into a 3D image with a sense of depth.

The principle of lenticular-based multi-view autostereoscopic 3D technology is similar to the parallax-barrier technique: it uses the refraction of cylindrical lens elements to guide light into specific viewing zones, producing a stereo image pair for the left and right eyes that the brain finally fuses into stereoscopic vision. Because the lenticular sheet is transmissive, the biggest advantage of autostereoscopic displays produced with this technology is that it does not occlude the displayed picture or reduce display brightness, so the stereoscopic display effect is relatively good. To solve the 2D/3D display-switching problem, a liquid crystal module can be injected into the lenticular array to adjust the parameters of the lens sheet. In 2D mode, an appropriate voltage is applied to make the refractive indices of the liquid crystal and the lenses match, so light passing through the lens layer is not refracted; in 3D display mode, no voltage is applied, so the liquid crystal and the lenses have different refractive indices and light is refracted when passing through the lens layer, enabling 3D display.

The Fresnel lens plus time-multiplexing technique obtains multi-view 3D images by raising the temporal frame rate. The solution consists of a high-speed CRT display, a projection lens, a liquid crystal shutter, and a Fresnel lens. The high-speed CRT and the optical system image one 2D image onto one part of the multi-view viewing area at each instant; when the images of the multiple viewpoints are produced quickly enough, the viewer perceives them as if seeing multiple images simultaneously. For example, if each viewpoint requires 60 Hz and there are 8 viewpoints in total, the CRT needs at least 480 Hz for the viewer to see flicker-free images of all 8 viewpoints.
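The refresh-rate requirement of this time-multiplexed scheme is a one-line formula; the sketch below simply restates the 8-viewpoint arithmetic from the paragraph above.

```python
def min_refresh_rate(per_view_hz, viewpoints):
    """Minimum display refresh rate for flicker-free time-multiplexed
    multi-view output: every viewpoint must itself be refreshed at
    per_view_hz, and the display cycles through all viewpoints in turn."""
    return per_view_hz * viewpoints

# The example from the text: 8 viewpoints at 60 Hz each need a 480 Hz CRT.
assert min_refresh_rate(60, 8) == 480
```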
Having understood the principles of the above multi-view autostereoscopic 3D display technologies, the principle of the video presentation method in the embodiments of the present invention is easier to understand: multiple viewpoint image streams are obtained according to the characteristics of the viewing area; when the image streams are displayed in the viewing area, a 3D visual effect is formed when a person's left eye receives one viewpoint image stream and the right eye receives the adjacent viewpoint image stream (or vice versa).
As shown in FIG. 1, a schematic flowchart of the video presentation method in an embodiment of the present invention, the method includes the following steps.

101. Receive a multi-view three-dimensional video signal from a remote end. If the system applying this method includes multiple 3D cameras and multiple 3D display devices, the signals received here may be multiple multi-view three-dimensional video signals from the remote end, from which different 3D video content can be presented.

102. Determine a plurality of viewpoint image streams in the multi-view three-dimensional video signal. In multi-view 3D technology, the viewing effect is determined by the width of the 3D display screen, the width of the viewing range, the number of viewpoints, and the optimal viewing distance. Therefore, when configuring the display to show 3D video, viewing area information may first be obtained, and the plurality of viewpoint image streams in the multi-view three-dimensional video signal may then be determined according to the viewing area information.

103. Alternately display the plurality of viewpoint image streams in order in the viewing area, where the distance between two adjacent viewpoints among the plurality of viewpoint image streams displayed in the viewing area is the pupil distance.

The foregoing embodiment may further include a negotiation process, that is, step 100: negotiate three-dimensional video capability with the remote end, and determine three-dimensional video capability information and/or three-dimensional video stream information, so that the multi-view three-dimensional video signal from the remote end is received according to the determined three-dimensional video capability information and/or three-dimensional video stream information.

The negotiation can be carried out in two different ways. In one way, three-dimensional video capability information sent by the remote end is received, and three-dimensional video capability adaptation is performed according to that information to obtain locally supported three-dimensional video capability information. In the other way, three-dimensional video capability information is constructed and sent to the remote end, so that after performing three-dimensional video capability adaptation, the remote end sends a three-dimensional video signal according to the three-dimensional video capability information. FIG. 2 and FIG. 3 show the specific negotiation processes in the two ways, respectively.
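Either negotiation direction ends in the same adaptation step: intersecting what the remote end offers with what the local end supports. The sketch below is a deliberately simplified model of that step, assuming capabilities are reduced to plain mode strings; the real signaling would carry the full parameter sets described in this document.

```python
def adapt_capabilities(local_modes, remote_modes):
    """Keep the 3D video modes both ends support, preserving the remote
    end's preference order; fall back to plain 2D when nothing matches."""
    common = [mode for mode in remote_modes if mode in local_modes]
    return common if common else ["2d"]

# Remote offers frame-packed and MVC streams; local only decodes MVC.
assert adapt_capabilities(["mvc", "2d"], ["side-by-side", "mvc"]) == ["mvc"]
# No common 3D mode at all: the ends drop back to 2D presentation.
assert adapt_capabilities(["2d"], ["side-by-side"]) == ["2d"]
```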
As shown in FIG. 2, the negotiation process in this embodiment includes the following steps.

201. The sender first constructs 3D video capability information parameters according to its own 3D video capability (or simply 3D capability). For example, the 3D video capability information parameters may include 3D video capture-end parameters and 3D video display-end parameters. The 3D video display-end parameters include one or more of the number of 3D video display devices, the type of 3D video display device, the number of viewpoints, the ideal viewing distance, and the maximum display parallax. The 3D video capture-end parameters include one or more of the number of 3D video capture devices, the type of 3D video capture device, and the spatial position relationships of the 3D video capture devices. Note that a 3D camera may be a camera that actually exists in the telepresence system, or a virtual camera that does not really exist, for example, a virtual-viewpoint camera for 3D video rendered by a computer.

The 3D video capability information can be described in a variety of formats, for example, using ASN.1 (Abstract Syntax Notation One), using Extensible Markup Language (XML), or using a simple text format.

202. The sender sends the 3D video capability information parameters to the receiver through a signaling message.

203. The receiver receives the 3D video capability information parameters.

204. After receiving the 3D video capability information parameters, the receiver performs adaptation according to its own 3D video capability; for example, it decides, according to its own 3D video rendering capability, whether to receive the video streams and which streams to receive, or how to render the 3D video streams. It then confirms with the sender.

205. After the 3D video capability adaptation, the sender sends the 3D video streams.

206. The receiver receives the 3D video streams sent by the sender according to the foregoing adaptation. 207. The receiver decodes the 3D video streams and renders them for display.
As shown in FIG. 3, this negotiation mode includes the following steps.

301. The receiver first constructs information parameters according to its own 3D capability. For example, the receiver's 3D video capture-end parameters and 3D video display-end parameters described above are expressed using ASN.1 abstract syntax notation, XML, or a simple text format. For example, in an embodiment of the present invention, the 3D capability information of the capture cameras can be described as a simple set of attributes separated by spaces, forming the 3D capability information shown in Tables 1 to 5 below. The camera attributes are shown in Table 1.

Table 1:

IDENTITY TYPE POSITION RESOLUTION FRAMERATE ...

IDENTITY: camera identifier.

TYPE: camera type; it may be a camera that outputs video images (video) or a camera that outputs depth images (depth).

POSITION: position information describing the spatial relationship of the camera. Position information can be described in several ways. One way is to define some position as the origin of XYZ coordinates and then use a rotation matrix and a translation vector, where the rotation matrix R is a 3 x 3 matrix and the translation vector is a 3 x 1 column vector. Another way is to use a predefined position relationship, for example left, center, right, or an identifier denoting the position, for example P0, P1, ...

RESOLUTION: resolution.

FRAMERATE: frame rate.

Other attributes...

The 3D capability information of the video display end (or rendering end) can be described as shown in Table 2.

Table 2:

IDENTITY TYPE POSITION SIZE RESOLUTION FRAMERATE VIEWPOINTS DISTANCE WIDTH PARALLAX ...

IDENTITY: display identifier.

TYPE: display device type; it may be one of the following: a 2D display (2d); a 3D display requiring glasses (including red-blue glasses red-blue-glass, polarized glasses polarize-glass, and time-multiplexed glasses time-multiplexing-glass); an autostereoscopic display (autostereoscopic); a multi-view display (multiview); and so on.

POSITION: display device position information, in the same format as the camera position information.

SIZE: display size.

RESOLUTION: display resolution.

FRAMERATE: frame rate.

VIEWPOINTS: number of viewpoints.

DISTANCE: viewing distance.

WIDTH: viewing range.

PARALLAX: maximum parallax.

Other attributes...

The following further describes the 3D video capability information in another embodiment of the present invention. In this example, two 2D cameras, C0 (left camera) and C1 (right camera), form a binocular stereo camera for 3D video capture, and a 24-inch single-viewpoint autostereoscopic display is used for video playback. The display is positioned in the center, the optimal viewing distance is 1.5 m, the viewing range is 30 cm, and the maximum parallax is 30 pixels. The 3D video capability information in this example is described as in Table 3.

Table 3:

C0 video left 1920,1080 60
C1 video right 1920,1080 60
D0 autostereoscopic center 24 1920,1080 60 1 1.5 30 30

In another embodiment of the present invention, one 2D camera and one depth camera are used for 3D video capture, and video is played back on a 24-inch single-viewpoint autostereoscopic display. The positions of the cameras and the display are all described by rotation matrices and translation vectors. The 3D video capability information in this example is as shown in Table 4.

Table 4:

C0 video [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0] [0.0, 0.0, 0.0] 1920,1080 60
C1 depth [-0.056728, -0.996211, -0.065927, -0.969795, 0.070673, -0.233457, 0.237232, 0.050692, -0.970130] [100.09, 30.42, 475.85] 1280,720 60
D0 autostereoscopic [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0] [0.0, 0.0, 0.0] 24 1920,1080 60 1 1.5 30 30

In another embodiment of the present invention, 8 cameras (at positions P0 to P7) form a camera array for 3D video capture, and three 50-inch 8-viewpoint multi-view displays (positioned left, center, and right) are used for presentation. The optimal viewing distance is 2.5 m, the viewing-angle range is 1 m, and the maximum parallax is 50 pixels. The 3D video capability is as shown in Table 5.

Table 5:

C0 video P0 1920,1080 60
C1 video P1 1920,1080 60
...
C7 video P7 1920,1080 60
D0 multiview left 50 1280,720 60 8 2.5 100 50
D1 multiview center 50 1280,720 60 8 2.5 100 50
D2 multiview right 50 1280,720 60 8 2.5 100 50
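A receiver of such a message has to parse the rows back into attributes. The helper below handles only the five-field camera rows of Table 1 with symbolic positions (left, P0, ...); rotation-matrix positions as in Table 4 would need a richer parser, so this is a sketch under that assumption, not a complete format definition.

```python
def parse_camera_line(line):
    """Parse one camera row of the simple text capability format
    (Table 1: IDENTITY TYPE POSITION RESOLUTION FRAMERATE)."""
    ident, ctype, position, resolution, framerate = line.split()
    return {"id": ident, "type": ctype, "position": position,
            "resolution": resolution, "framerate": int(framerate)}

cam = parse_camera_line("C7 video P7 1920,1080 60")  # last camera row of Table 5
assert cam["position"] == "P7" and cam["framerate"] == 60
```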
302. The receiver sends the 3D capability information parameters to the sender through a signaling message.

303. The sender receives the 3D capability information parameters.

304. From the received 3D capability information parameters, the sender learns the receiver's 3D capability (for example, the number of supported 3D video streams and their coding schemes), performs 3D capability adaptation against its local 3D capability, and then confirms with the receiver. For example, the sender's capture-end 3D capability is adapted to the local display-end 3D capability, and the sender's display-end 3D capability is adapted to the local capture-end 3D capability.

305. The sender encodes and sends the video according to the adaptation result.

306. The receiver receives the 3D video streams sent by the sender.

307. The receiver decodes the 3D video streams and renders them for display.

In the above negotiation process, when the 3D video capability information parameters are exchanged, a description of the 3D video stream characteristics may also be included, for example: whether the stream is a 3D video stream; the data content of the 3D video stream (for example, 2D video data or depth/disparity data); the coding scheme of the 3D video stream (for example, spatial-packing coding or scalable-resolution-enhancement frame-compatible stereo coding); other parameters of the 3D video stream (including resolution, frame rate, required bandwidth, and so on); the correspondence between the video streams and the capture-end cameras described in the 3D capability information; the correspondence between the video streams and the display devices presenting them; and so on.

In an embodiment of the present invention, a 3D video is composed of two presentation video streams, left and right, where V0 and V1 are both 2D video streams corresponding to cameras C0 and C1 respectively; they are combined by a grouping set1 into one stereo video stream and presented on display D0. V0 and V1 are encoded with H.264/AVC at a resolution of 1920 x 1080 and a frame rate of 60 frames, occupying 4 M of bandwidth. The 3D video stream information in this example is described as in Table 6.

Table 6:

V0 presentation stereo C0 D0 AVC 1920,1080 60 4
V1 presentation stereo C1 D0 AVC 1920,1080 60 4
set1 {V0, V1}

In an embodiment of the present invention, one 3D video stream V0, synthesized from the 2D videos of the left and right object cameras C0 and C1, is presented on display D0. It is encoded with side-by-side spatial packing in H.264/AVC format, at a resolution of 1920 x 1080 and a frame rate of 60 frames, occupying 4 M of bandwidth. The 3D video stream information in this example is described as in Table 7.

Table 7:

V0 object stereo (C0, C1) D0 side-by-side AVC 1920,1080 60 4

In an embodiment of the present invention, one 3D video stream V0, composed of the 2D videos of three object cameras C0, C1, and C2, is presented on display D0. It is encoded with MVC at a resolution of 1920 x 1080 and a frame rate of 30 frames, occupying 8 M of bandwidth. The 3D video stream information in this example is described as in Table 8.

Table 8:

V0 object multiview (C0, C1, C2) D0 MVC 1920,1080 30 8

Capability negotiation carried out as described in the above two embodiments can have several outcomes. Depending on the adaptation, the two parties may take various approaches in the subsequent encoding and decoding process, briefly described as follows.

1. The simplest case is that the two parties are homogeneous telepresence systems, so the remote end can decode and present the 3D video streams in the same way as the local end.

2. The receiving end is a telepresence system that supports only 2D video and cannot present 3D video, or does not support the local end's 3D video coding scheme, so it can decode only the 2D video data in the 3D video streams and present it in 2D. For 3D video data carried as multiple separate video streams, it may decode only one of the 2D video streams and skip the others; for 3D video data using spatial packing, it may decode the data and present one of the two packed 2D video images; for 3D video data using scalable-resolution-enhancement frame-compatible stereo coding, it may decode only the base-layer 2D video data without decoding the enhancement-layer data.

3. The receiving end's telepresence system may also support 3D video, but with a presentation mode different from the local end's. In this case, the receiving end's system needs to render the 3D video according to its local presentation mode, for example regenerating the left- and right-viewpoint images from the decoded 2D video data and the disparity/depth map according to the 3D display mode and maximum-parallax capability of the local display devices.

For multipoint conferences, an MCU is needed to convert between 2D and 3D video coding. For terminals that do not support 3D video, the MCU can transcode 3D video to 2D. For different 3D video coding formats, the MCU can also transcode to adapt telepresence systems supporting different 3D video codec formats. For example, if the sending end sends 3D video data in spatial-packing format while the receiving end can only receive 3D video data in scalable-resolution-enhancement frame-compatible stereo format, the MCU converts between the two coding formats.
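The MCU's per-receiver decision described above reduces to a small routing rule. The sketch below is only an illustration: the mode strings and the `mcu_route` function name are invented for the example, and only the decision is shown, not the transcoding itself.

```python
def mcu_route(sender_fmt, receiver_fmts):
    """Decide what the MCU must do for one receiver: forward the stream
    when the sender's format is supported, transcode between 3D formats
    when the receiver supports some other 3D coding, and otherwise
    convert the 3D video down to 2D."""
    if sender_fmt in receiver_fmts:
        return "forward"
    other_3d = [f for f in receiver_fmts if f != "2d"]
    if other_3d:
        return "transcode:%s->%s" % (sender_fmt, other_3d[0])
    return "transcode-3d-to-2d"

# The example from the text: spatial packing in, frame-compatible stereo out.
assert mcu_route("side-by-side", ["scalable-frame-compatible"]) == \
    "transcode:side-by-side->scalable-frame-compatible"
# A 2D-only terminal always gets a 3D-to-2D transcode.
assert mcu_route("side-by-side", ["2d"]) == "transcode-3d-to-2d"
```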
Correspondingly, as shown in FIG. 4, an embodiment of the present invention further provides a video presentation system, including the following modules: a receiving module 1, configured to receive a multi-view three-dimensional video signal from a remote end; a determining module 2, configured to determine a plurality of viewpoint image streams in the multi-view three-dimensional video signal; and a display module 3, configured to alternately display the plurality of viewpoint image streams in order in a viewing area, where the distance between two adjacent viewpoints among the plurality of viewpoint image streams displayed in the viewing area is the pupil distance.

The receiving module 1 is further configured to receive multiple multi-view three-dimensional video signals from the remote end; the system includes multiple display modules 3, which respectively display the multiple viewpoint image streams determined from the multiple multi-view three-dimensional video signals.

The system may also include: an acquisition module 4, configured to obtain viewing area information; the determining module 2 is then further configured to determine the plurality of viewpoint image streams in the multi-view three-dimensional video signal according to the viewing area information.

The system may further include: a negotiation module 5, configured to negotiate three-dimensional video capability with the remote end and determine three-dimensional video capability information and/or three-dimensional video stream information, so that the receiving module receives the multi-view three-dimensional video signal from the remote end according to the determined three-dimensional video capability information and/or three-dimensional video stream information.

The three-dimensional video capability information and/or three-dimensional video stream information includes three-dimensional video capture-end parameters and three-dimensional video display-end parameters.

The three-dimensional video display-end parameters include one or more of the number of three-dimensional video display devices, the type of three-dimensional video display device, the number of viewpoints, the ideal viewing distance, and the maximum display parallax; the three-dimensional video capture-end parameters include one or more of the number of three-dimensional video capture devices, the type of three-dimensional video capture device, and the spatial position relationships of the three-dimensional video capture devices.

As shown in FIG. 5, the negotiation module 5 may include: a receiving sub-module 50, configured to receive three-dimensional video capability information and/or three-dimensional video stream information sent by the remote end; and an adaptation sub-module 52, configured to perform three-dimensional video capability adaptation according to the three-dimensional video capability information and/or three-dimensional video stream information, to obtain locally supported three-dimensional video capability information and/or three-dimensional video stream information.

Alternatively, as shown in FIG. 6, the negotiation module 5 may include: a construction sub-module 51, configured to construct three-dimensional video capability information and/or three-dimensional video stream information; and a sending sub-module 53, configured to send the three-dimensional video capability information and/or three-dimensional video stream information to the remote end, so that after performing three-dimensional video capability adaptation, the remote end sends a three-dimensional video signal according to the three-dimensional video capability information and/or three-dimensional video stream information.

The terms in the foregoing system embodiments are interpreted consistently with the foregoing method embodiments and are not repeated here. The following further illustrates, with several different system layouts, the above multi-view 3D display according to viewing position.
As shown in FIG. 7, in the layout of this embodiment, the middle display 2 is a multi-view autostereoscopic 3D display (capable of switching between 2D and 3D display), and the adjacent displays 1 and 3 are ordinary 2D displays. The viewing-angle range A1 of display 2 (the shaded area in the figure) exactly covers the two middle seats 16 and 17, and the optimal viewing distance is the perpendicular distance from the display surface to seats 16 and 17. According to the display principle of the multi-view autostereoscopic 3D display described above, the images of the multiple viewpoints alternately appear in order in the viewing area (the dotted portion in the figure), and the distance between two adjacent viewing viewpoints is the pupil distance, so that the left and right eyes of users 1002 and 1003 can simultaneously see the images of two viewpoints, thereby forming stereoscopic (that is, 3D) vision. Because the viewing viewpoints cover the A1 area, users can see 3D video anywhere within this area, but see only 2D video when watching screens 1 and 3. This layout does not place high demands on the viewing angle of the multi-view 3D display: the viewing-angle range only needs to cover a small area. During capability negotiation with the remote system, these local characteristics are communicated, so that suitable multi-view 3D video streams are obtained and rendered. In the system of this embodiment, display 2 has the function of the display module in the above embodiment, and the functions of the receiving module and the determining module in the above embodiment may be integrated into display 2 (in which case display 2 is required to have the corresponding processing capability) or implemented by a separate processor with processing capability; similarly, the functions of the negotiation module and the acquisition module may be arranged in the same way. The systems in the following embodiments are similar, and this is not repeated there.

As shown in FIG. 8, a multi-row implementation based on the layout of FIG. 7 is similar to the single-row mode (its 3D imaging principle is the same as in FIG. 7; to simplify the figure, the dotted viewpoint portion is not drawn here).

FIG. 9 shows system layout 2 in an embodiment of the present invention (its 3D imaging principle is the same as the layout in FIG. 7; to simplify the figure, the dotted viewpoint portion is not drawn here). In this layout, displays 1, 2, and 3 are all multi-view autostereoscopic 3D displays (capable of switching between 2D and 3D display). The viewing-angle range A10 of display 2 covers the two middle seats 16 and 17, the viewing-angle range A20 of display 1 covers seats 14 and 15, and the viewing-angle range A30 of display 3 covers seats 18 and 19. In this layout, the person in each seat can obtain an ideal 3D experience by watching the display closest to him, and the two middle seats 16 and 17 can usually also obtain an ideal 3D experience when watching displays 1 and 3, but the other seats may not obtain a 3D experience when watching every display. As can be seen, the viewing-angle ranges of the three displays are different: the middle range is small and the side ranges are large, so the middle display and the side displays need different manufacturing processes to guarantee the different viewing-angle ranges. The principle of the multi-row mode is similar to that of the single row and is not described again here.

FIGS. 10 and 11 show design layout 3 of multi-view 3D presentation in a telepresence system according to an embodiment of the present invention (its 3D imaging principle is the same as that of layout 1; to simplify the figure, the dotted viewpoint portion is not drawn here). In this layout, displays 1, 2, and 3 are all multi-view autostereoscopic 3D displays (capable of switching between 2D and 3D display), and their viewing-angle ranges A100, A200, and A300 each cover the entire user area. In this layout, every user obtains the best 3D experience when watching all three screens. The viewing angle of each display is related to the relative position of the display and the users' seats, so the viewing-angle ranges may differ and different manufacturing processes are required. The principle of the multi-row mode is similar to that of the single row and is not described again here.

FIG. 12 shows design layout 4 of multi-view 3D presentation in a telepresence system according to an embodiment of the present invention (its 3D imaging principle is the same as that of layout 1; to simplify the figure, the dotted viewpoint portion is not drawn here). In this layout, a separate multi-view 3D display 101, placed at the side, provides the 3D presentation. Therefore, when designing the display, the viewing-angle range A400 must cover the entire user seating area. The principle of the multi-row mode is similar to that of the single row and is not described again here.

FIG. 13 shows design layout 5 of multi-view 3D presentation in a telepresence system according to an embodiment of the present invention (its 3D imaging principle is the same as that of layout 1; to simplify the figure, the dotted viewpoint portion is not drawn here). In this layout, a separate multi-view 3D display 102, placed below the middle display, provides the 3D presentation. The viewing-angle range A500 must also cover the entire user seating area to obtain the best 3D experience. The principle of the multi-row mode is similar to that of the single row and is not described again here.

FIG. 14 shows design layout 6 of multi-view 3D presentation in a telepresence system according to an embodiment of the present invention (its 3D imaging principle is the same as that of layout 1; to simplify the figure, the dotted viewpoint portion is not drawn here). In this layout, the auxiliary displays 20, 21, and 22 are displays supporting 2D/3D display. The 3D viewing-angle range of display 20 covers seats 14 and 15, that of display 21 covers seats 16 and 17, and that of display 22 covers seats 18 and 19, so that the users in each seating area can see the 3D video of the auxiliary display for that area and obtain the optimal viewing effect. The principle of the multi-row mode is similar to that of the single row and is not described again here.
The video presentation system in the embodiments of the present invention adopts three-dimensional video technology, selects a multi-view three-dimensional video signal suited to a telepresence system according to the characteristics of three-dimensional video signals, and displays the three-dimensional video signal according to the viewing area, ensuring that viewers using the system can effectively watch video with three-dimensional effects and realizing a remote three-dimensional video presentation system.

In addition, the embodiments of the present invention provide a specific solution for 3D video information negotiation in a telepresence system, achieving effective coordination when two ends with different 3D video capabilities form a telepresence system.

A person of ordinary skill in the art may understand that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and when executed may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

What is disclosed above is merely a preferred embodiment of the present invention and certainly cannot limit the scope of the claims of the present invention. Therefore, equivalent variations made according to the claims of the present invention still fall within the scope of the present invention.

Claims

Claims

1. A video presentation method, comprising:

receiving a multi-view three-dimensional video signal from a remote end;

determining a plurality of viewpoint image streams in the multi-view three-dimensional video signal; and

alternately displaying the plurality of viewpoint image streams in order in a viewing area, wherein the distance between two adjacent viewpoints among the plurality of viewpoint image streams displayed in the viewing area is the pupil distance.

2. The method according to claim 1, further comprising, before the determining a plurality of viewpoint image streams in the multi-view three-dimensional video signal:

obtaining viewing area information;

wherein the determining a plurality of viewpoint image streams in the multi-view three-dimensional video signal comprises:

determining the plurality of viewpoint image streams in the multi-view three-dimensional video signal according to the viewing area information.

3. The method according to claim 1, wherein the multi-view three-dimensional video signal from the remote end comprises multiple multi-view three-dimensional video signals from the remote end.

4. The method according to any one of claims 1 to 3, further comprising, before the receiving a multi-view three-dimensional video signal from a remote end:

negotiating three-dimensional video capability with the remote end, and determining three-dimensional video capability information and/or three-dimensional video stream information, so that the multi-view three-dimensional video signal from the remote end is received according to the determined three-dimensional video capability information and/or three-dimensional video stream information.

5. The method according to claim 4, wherein the negotiating three-dimensional video capability with the remote end and determining three-dimensional video capability information and/or three-dimensional video stream information comprises:

receiving three-dimensional video capability information and/or three-dimensional video stream information sent by the remote end; and

performing three-dimensional video capability adaptation on the remote and local three-dimensional video capabilities according to the three-dimensional video capability information and/or three-dimensional video stream information, to obtain locally supported three-dimensional video capability information and/or three-dimensional video stream information.

6. The method according to claim 4, wherein the negotiating three-dimensional video capability with the remote end and determining three-dimensional video capability information and/or three-dimensional video stream information comprises:

constructing three-dimensional video capability information and/or three-dimensional video stream information; and

sending the three-dimensional video capability information and/or three-dimensional video stream information to the remote end, so that after performing three-dimensional video capability adaptation, the remote end sends a three-dimensional video signal according to the three-dimensional video capability information and/or three-dimensional video stream information.

7. The method according to claim 4, wherein the three-dimensional video capability information comprises three-dimensional video capture-end parameters and three-dimensional video display-end parameters;

wherein the three-dimensional video display-end parameters comprise one or more of the number of three-dimensional video display devices, the type of three-dimensional video display device, the number of viewpoints, the ideal viewing distance, and the maximum display parallax; and the three-dimensional video capture-end parameters comprise one or more of the number of three-dimensional video capture devices, the type of three-dimensional video capture device, and the spatial position relationships of the three-dimensional video capture devices.

8. The method according to claim 4, wherein the three-dimensional video capability information is described in abstract syntax notation ASN.1, in Extensible Markup Language XML, or in a text format.

9. The method according to claim 4, wherein the three-dimensional video stream information comprises one or more of: whether a stream is a three-dimensional video stream, the data content of the three-dimensional video stream, the coding scheme of the three-dimensional video stream, the resolution, the frame rate, and the bandwidth.

10. A video presentation system, comprising:

a receiving module, configured to receive a multi-view three-dimensional video signal from a remote end;

a determining module, configured to determine a plurality of viewpoint image streams in the multi-view three-dimensional video signal; and a display module, configured to alternately display the plurality of viewpoint image streams in order in a viewing area, wherein the distance between two adjacent viewpoints among the plurality of viewpoint image streams displayed in the viewing area is the pupil distance.

11. The system according to claim 10, further comprising: an acquisition module, configured to obtain viewing area information;

wherein the determining module is further configured to determine the plurality of viewpoint image streams in the multi-view three-dimensional video signal according to the viewing area information.

12. The system according to claim 10, wherein:

the receiving module is further configured to receive multiple multi-view three-dimensional video signals from the remote end; and

the system comprises multiple display modules that respectively display the multiple viewpoint image streams determined from the multiple multi-view three-dimensional video signals.

13. The system according to any one of claims 10 to 12, further comprising: a negotiation module, configured to negotiate three-dimensional video capability with the remote end and determine three-dimensional video capability information and/or three-dimensional video stream information, so that the receiving module receives the multi-view three-dimensional video signal from the remote end according to the determined three-dimensional video capability information and/or three-dimensional video stream information.

14. The system according to claim 13, wherein the three-dimensional video capability information and/or three-dimensional video stream information comprises three-dimensional video capture-end parameters and three-dimensional video display-end parameters;

wherein the three-dimensional video display-end parameters comprise one or more of the number of three-dimensional video display devices, the type of three-dimensional video display device, the number of viewpoints, the ideal viewing distance, and the maximum display parallax; and the three-dimensional video capture-end parameters comprise one or more of the number of three-dimensional video capture devices, the type of three-dimensional video capture device, and the spatial position relationships of the three-dimensional video capture devices.

15. The system according to claim 13, wherein the negotiation module comprises: a receiving sub-module, configured to receive three-dimensional video capability information and/or three-dimensional video stream information sent by the remote end; and an adaptation sub-module, configured to perform three-dimensional video capability adaptation according to the three-dimensional video capability information and/or three-dimensional video stream information, to obtain locally supported three-dimensional video capability information and/or three-dimensional video stream information.

16. The system according to claim 13, wherein the negotiation module comprises: a construction sub-module, configured to construct three-dimensional video capability information and/or three-dimensional video stream information; and a sending sub-module, configured to send the three-dimensional video capability information and/or three-dimensional video stream information to the remote end, so that after performing three-dimensional video capability adaptation, the remote end sends a three-dimensional video signal according to the three-dimensional video capability information and/or three-dimensional video stream information.
PCT/CN2012/075544 2011-10-28 2012-05-16 一种视频呈现方法和系统 WO2013060135A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP12844506.1A EP2739056A4 (en) 2011-10-28 2012-05-16 METHOD AND SYSTEM FOR VIDEO PRESENTATION
US14/200,196 US9392222B2 (en) 2011-10-28 2014-03-07 Video presence method and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110334678.X 2011-10-28
CN201110334678.XA CN103096014B (zh) 2011-10-28 2011-10-28 一种视频呈现方法和系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/200,196 Continuation US9392222B2 (en) 2011-10-28 2014-03-07 Video presence method and system

Publications (1)

Publication Number Publication Date
WO2013060135A1 true WO2013060135A1 (zh) 2013-05-02

Family

ID=48167094

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/075544 WO2013060135A1 (zh) 2011-10-28 2012-05-16 一种视频呈现方法和系统

Country Status (4)

Country Link
US (1) US9392222B2 (zh)
EP (1) EP2739056A4 (zh)
CN (1) CN103096014B (zh)
WO (1) WO2013060135A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9210381B2 (en) * 2013-06-24 2015-12-08 Dialogic Corporation Resource-adaptive video encoder sharing in multipoint control unit
KR101826025B1 (ko) * 2016-07-22 2018-02-07 한국과학기술연구원 사용자 인터렉션이 가능한 3d 영상 콘텐츠 생성 시스템 및 방법
CN106454204A (zh) * 2016-10-18 2017-02-22 四川大学 一种基于网络深度相机的裸眼立体视频会议系统
US10652553B2 (en) * 2016-12-07 2020-05-12 Qualcomm Incorporated Systems and methods of signaling of regions of interest

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6603504B1 (en) * 1998-05-25 2003-08-05 Korea Institute Of Science And Technology Multiview three-dimensional image display device
CN101291415A (zh) * 2008-05-30 2008-10-22 深圳华为通信技术有限公司 一种三维视频通信的方法、装置及系统
CN101394572A (zh) * 2007-09-21 2009-03-25 株式会社东芝 三维图像处理装置以及方法
CN101518095A (zh) * 2006-09-15 2009-08-26 三星电子株式会社 具有改善的分辨率的多视角自动立体显示器
CN101651841A (zh) * 2008-08-13 2010-02-17 华为技术有限公司 一种立体视频通讯的实现方法、系统和设备

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5703961A (en) * 1994-12-29 1997-12-30 Worldscape L.L.C. Image transformation and synthesis methods
JP2002095018A (ja) * 2000-09-12 2002-03-29 Canon Inc 画像表示制御装置及び画像表示システム、並びに画像データの表示方法
KR100813961B1 (ko) * 2005-06-14 2008-03-14 삼성전자주식회사 영상 수신장치
WO2009013438A2 (fr) * 2007-07-17 2009-01-29 France Telecom Appropriation dynamique d'au moins un equipement multimedia lors d'une mise en communication
CN101453662B (zh) * 2007-12-03 2012-04-04 华为技术有限公司 立体视频通信终端、系统及方法
JP5469911B2 (ja) * 2009-04-22 2014-04-16 ソニー株式会社 送信装置および立体画像データの送信方法
KR101615111B1 (ko) * 2009-06-16 2016-04-25 삼성전자주식회사 다시점 영상 표시 장치 및 방법
EP2786344B1 (en) * 2011-11-30 2016-02-24 Fraunhofer Gesellschaft zur Förderung der angewandten Forschung e.V. Hybrid recursive analysis of spatio-temporal objects
US9729847B2 (en) * 2012-08-08 2017-08-08 Telefonaktiebolaget Lm Ericsson (Publ) 3D video communications

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6603504B1 (en) * 1998-05-25 2003-08-05 Korea Institute Of Science And Technology Multiview three-dimensional image display device
CN101518095A (zh) * 2006-09-15 2009-08-26 三星电子株式会社 具有改善的分辨率的多视角自动立体显示器
CN101394572A (zh) * 2007-09-21 2009-03-25 株式会社东芝 三维图像处理装置以及方法
CN101291415A (zh) * 2008-05-30 2008-10-22 深圳华为通信技术有限公司 一种三维视频通信的方法、装置及系统
CN101651841A (zh) * 2008-08-13 2010-02-17 华为技术有限公司 一种立体视频通讯的实现方法、系统和设备

Also Published As

Publication number Publication date
EP2739056A1 (en) 2014-06-04
EP2739056A4 (en) 2014-08-13
CN103096014B (zh) 2016-03-30
US20140184730A1 (en) 2014-07-03
US9392222B2 (en) 2016-07-12
CN103096014A (zh) 2013-05-08

Similar Documents

Publication Publication Date Title
RU2528080C2 (ru) Кодирующее устройство для сигналов трехмерного видеоизображения
US8456505B2 (en) Method, apparatus, and system for 3D video communication
US20070171275A1 (en) Three Dimensional Videoconferencing
US20110134227A1 (en) Methods and apparatuses for encoding, decoding, and displaying a stereoscopic 3d image
TW201223247A (en) 2D to 3D user interface content data conversion
WO2006008613A1 (en) System and method for transferring video information
WO2015168969A1 (zh) 基于振动光栅的裸眼立体显示方法与装置
US20140085435A1 (en) Automatic conversion of a stereoscopic image in order to allow a simultaneous stereoscopic and monoscopic display of said image
Ito Future television—super hi-vision and beyond
IJsselsteijn et al. Human factors of 3D displays
Gotchev Computer technologies for 3d video delivery for home entertainment
US9392222B2 (en) Video presence method and system
KR20100112940A (ko) 데이터 처리방법 및 수신 시스템
May A survey of 3-D display technologies
CN112243121A (zh) 裸眼3d显示器的多模式显示方法
Date et al. Full parallax table top 3d display using visually equivalent light field
Cellatoglu et al. Autostereoscopic imaging techniques for 3D TV: proposals for improvements
de Beeck et al. Three dimensional video for the home
CN203164525U (zh) 一种柱面镜立体影像助视屏及立体影像重现系统
KR101674688B1 (ko) 입체영상 재생 장치 및 입체영상 재생 방법
Onural et al. Three-dimensional television: From science-fiction to reality
KR101433082B1 (ko) 2차원 영상과 3차원 영상의 중간 정도 느낌을 주는 영상 변환 및 재생 방법
Zuo et al. Enabling eye-contact for home video communication with a multi-view autostereoscopic display
Longhi State of the art 3d technologies and mvv end to end system design
Robitza 3d vision: Technologies and applications

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12844506

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE