WO2009152769A1 - Video communication method, apparatus and system

Video communication method, apparatus and system

Info

Publication number
WO2009152769A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
content
color
video
local
Prior art date
Application number
PCT/CN2009/072320
Other languages
English (en)
French (fr)
Inventor
方平
刘琛
刘源
Original Assignee
深圳华为通信技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN2008101270074A external-priority patent/CN101610421B/zh
Priority claimed from CN 200810210178 external-priority patent/CN101662694B/zh
Application filed by 深圳华为通信技术有限公司
Priority to PL09765408T priority Critical patent/PL2299726T3/pl
Priority to EP09765408A priority patent/EP2299726B1/en
Priority to ES09765408T priority patent/ES2389401T3/es
Publication of WO2009152769A1 publication Critical patent/WO2009152769A1/zh
Priority to US12/971,392 priority patent/US8446459B2/en

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/14 - Systems for two-way working
    • H04N7/15 - Conference systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/194 - Segmentation; Edge detection involving foreground-background segmentation
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 - Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 - Processing image signals
    • H04N13/156 - Mixing image signals
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 - Image signal generators
    • H04N13/271 - Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 - Image signal generators
    • H04N13/286 - Image signal generators having separate monoscopic and stereoscopic modes
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds

Definitions

  • the present invention relates to the field of video communications, and in particular, to a video communication method, and an apparatus and system using the video communication method.
  • video communication technology has been widely used, for example, video telephony, video conferencing, etc. all use video communication technology.
  • Current video communication applications primarily use traditional two-dimensional images or video.
  • the Chroma key method is mainly used to extract the foreground target in the video by color segmentation.
  • the extracted foreground objects are combined with other far-end videos to increase the sense of reality. For example: Combine the person in the video (foreground target) with the far-end slide presentation.
  • the Chroma key method has the following drawbacks:
  • The chroma key method requires the background of the video to be segmented to be blue, green, or another single color, so that the foreground target can be separated from the background, and the background color must not appear on the foreground target. Because the method places strict color requirements on both the background and the foreground, it is inconvenient to use.
  • The chroma key method can only distinguish the foreground target from the background; it cannot divide the scene content into more layers and cannot replace individual targets within the foreground. For example, in a conference scene there may be a table in front of the person; realism could be increased if the remote party's table were replaced with the local table, but the chroma key method cannot do this, so it cannot further improve the sense of reality.
  • This technology only realizes the replacement of 2D video content, which does not enable the user to experience the depth of the scene and lacks realism.
  • the above two-dimensional image or video can only express the content of the scene, and cannot reflect the depth information such as the distance and position of the scene.
  • Stereoscopic viewing, by contrast, gives a good perception of the distance and position of the scenery and can convey the three-dimensional sense of the scene.
  • the stereoscopic video technology is based on the binocular parallax principle, and by displaying a slightly different scene content to the left and right eyes, the person can obtain the depth and layering of the scene.
  • In the prior art, stereo video technology is combined with special decoration of the scenes at both ends of the communication, so that users feel the two communicating parties are in the same scene, which increases the sense of reality.
  • Specifically, the indoor environments of the two communicating parties are arranged identically, so that during communication a user sees the other party in the video as if the other party were in the same scene as himself; however, the applicability of this method is limited by the requirement that both parties arrange their environments in this way.
  • the prior art also provides an interactive presentation system, the system mainly comprising: an active infrared camera, and a command recognition unit and a synthesizing unit connected to the infrared camera.
  • The presentation system in the prior art first captures an image of the video object with the active infrared camera to acquire information about the video object; the command recognition unit then converts this information into an output command signal and sends it to the synthesizing unit.
  • The information about the video object may include an infrared image captured by the infrared camera, from which a gesture of the video object is recognized, or a received voice of the video object. Finally, the synthesizing unit synthesizes the image of the video object captured by the infrared camera with the image of the presentation material, controlling the position of the video object and the on-screen display of the presentation material in accordance with the output command signal of the command recognition unit.
  • the presentation system provided by the prior art can only support the display mode of the two-dimensional video, and does not support the display by the three-dimensional video.
  • Embodiments of the present invention provide a video communication method, device, and system, which are not limited by the environment arrangement of the communication parties, increase the realism of the communication parties in the communication process, and display the scene content in a three-dimensional video manner.
  • a video preprocessing method includes:
  • the local target content is segmented from the local scene content according to the depth value of the local scene content.
  • a video preprocessing apparatus includes:
  • An information acquiring module configured to acquire local scene content and a depth value thereof
  • a segmentation module configured to segment local target content from the local scene content according to the local scene content depth value.
  • a video receiving method includes: acquiring local background content and depth values of the local background content;
  • the far-end target content and the local background content are synthesized into a scene content according to a depth value of the far-end target content and a depth value of the local background content.
  • a video receiving device includes:
  • a transmission interface module configured to receive a remote target content sent by the remote end and the remote target content depth value
  • An extraction module configured to acquire a local background content and a depth value of the local background content
  • a synthesis module configured to synthesize the remote target content and the local background content into scene content according to the depth values of the remote target content and the depth values of the local background content.
  • a video communication system including a transmitting end and a receiving end:
  • the sending end is configured to obtain the sending end scene content and its depth values, segment the sending end target content from the sending end scene content according to those depth values, and send the sending end target content and its depth values to the receiving end; the receiving end is configured to receive the sending end target content and its depth values sent by the sending end, acquire the receiving end background content and its depth values, and synthesize the sending end target content and the receiving end background content into scene content according to the depth values of the sending end target content and of the receiving end background content.
  • a video processing method including: Obtaining a color/grayscale image of the at least two viewpoints of the video object to be rendered and a color/grayscale image of at least one viewpoint of the at least one presentation material;
  • a video sending method includes:
  • the rendered image is encoded and transmitted.
  • a video sending method includes:
  • a color/grayscale image of at least two viewpoints of the video object to be presented and a color/grayscale image of at least one viewpoint of the at least one presentation material are encoded and transmitted.
  • a video receiving method includes:
  • the rendered image is displayed in a three-dimensional manner.
  • a video processing device includes:
  • An image acquisition and processing unit configured to acquire a color/grayscale image of at least two viewpoints of the video object to be presented and a color/grayscale image of at least one viewpoint of the at least one presentation material; Combining a color/grayscale image of at least two viewpoints of the video object with a color/grayscale image of at least one viewpoint of the at least one rendering material to obtain a rendered image;
  • a display unit configured to display the rendered image in a three-dimensional manner.
  • a video transmitting device includes:
  • An image acquisition unit configured to acquire a color/grayscale image of at least two viewpoints of the video object to be presented and a color/grayscale image of at least one viewpoint of the at least one presentation material;
  • a coding unit configured to encode a color/gray image of at least two viewpoints of the video object to be presented and a color/gray image of at least one viewpoint of the at least one presentation material to obtain a coded image
  • a sending unit configured to send the encoded image.
  • a video receiving device includes:
  • a receiving unit configured to receive a coded image
  • a decoding unit configured to decode the acquired encoded image, obtain a color/grayscale image of at least two viewpoints of the video object to be presented, and a color/grayscale image of at least one viewpoint of the at least one presentation material;
  • a synthesizing unit configured to synthesize a color/grayscale image of at least two viewpoints of the video object and a color/grayscale image of at least one viewpoint of the at least one presentation material to obtain a rendered image
  • a display unit configured to display the rendered image in a three-dimensional manner.
  • a video transmitting device includes:
  • An image acquisition and processing unit configured to acquire a color/grayscale image of at least two viewpoints of a video object to be presented, and to synthesize the color/grayscale image of the at least two viewpoints of the video object with a color/grayscale image of at least one viewpoint of at least one presentation material to obtain a rendered image;
  • a coding unit configured to encode the presented image to obtain a coded image
  • a sending unit configured to send the encoded image.
  • a video communication system includes: a video transmitting device and a video receiving device,
  • the video sending device includes:
  • An image acquisition and processing unit configured to acquire a color/grayscale image of at least two viewpoints of the video object to be presented and a color/grayscale image of at least one viewpoint of the at least one presentation material, and to combine the color/grayscale image of the at least two viewpoints of the video object with the color/grayscale image of the at least one viewpoint of the at least one presentation material to obtain a rendered image;
  • a coding unit configured to encode the presentation image to obtain a coded image
  • a sending unit configured to send the coded image
  • the video receiving device includes:
  • a receiving unit configured to receive the coded image
  • a decoding unit configured to decode the acquired encoded image to obtain the presented image
  • a display unit configured to display the rendered image in a three-dimensional manner.
  • In the embodiments of the present invention, the locally displayed picture is synthesized from the local background content and the remote target content, so that the background the user sees in the picture is exactly the same as the scene where the user is currently located, as if both parties were in the same environment; this increases the sense of reality during communication.
  • Moreover, the environments of the two communicating parties are allowed to differ, and the background does not need to be changed to a single color. Therefore, when the embodiments of the present invention are implemented, communication is not restricted by the environments of the two parties, and the realism of the communication process can be increased.
  • In addition, a presentation image is generated from the acquired multi-viewpoint color/grayscale images of the video object and the color/grayscale image of the presentation material; this presentation image supports a three-dimensional display mode and is then displayed in a three-dimensional manner, which solves the problem that the prior art can only support presentation of two-dimensional video, thereby realizing presentation of three-dimensional video.
  • FIG. 1 is a flowchart of a video preprocessing method according to a first embodiment of the present invention
  • FIG. 2 is a block diagram of a video pre-processing apparatus according to a first embodiment of the present invention
  • FIG. 3 is a flowchart of a video receiving method according to a first embodiment of the present invention
  • FIG. 4 is a block diagram of a video receiving apparatus according to a first embodiment of the present invention
  • FIG. 5 is a schematic diagram of a video communication device according to a first embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a video communication device according to a second embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a stereo camera used in a second embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a process of synthesizing scene content in a second embodiment of the present invention.
  • FIG. 9 is a flow chart of video communication according to a third embodiment of the present invention.
  • FIG. 10 is a structural diagram of a video communication system according to a third embodiment of the present invention.
  • FIG. 11 is a flowchart of a video pre-processing method according to Embodiment 4 of the present invention
  • FIG. 13 is a schematic diagram of a camera for obtaining depth information according to Embodiment 5 of the present invention;
  • FIG. 14 is a schematic diagram of the construction of a camera for acquiring depth information and color/grayscale images according to Embodiment 5 of the present invention;
  • FIG. 15 is a schematic diagram of occlusion in the method according to Embodiment 5 of the present invention;
  • FIG. 16 is a schematic diagram of hole generation in the method according to Embodiment 5 of the present invention;
  • FIG. 17 is a flowchart of a method for transmitting a video presentation according to Embodiment 6 of the present invention
  • FIG. 18 is a flowchart of a method for receiving a video presentation according to Embodiment 6 of the present invention
  • FIG. 19 is a schematic diagram of the rendering device;
  • Figure 20 is a schematic diagram of a device for transmitting a video presentation according to a fifth embodiment of the present invention
  • Figure 21 is a schematic diagram of a device for receiving a video presentation according to a fifth embodiment of the present invention
  • FIG. 23 is a schematic diagram of a communication system provided by an embodiment of the system of the present invention.
  • According to the embodiments of the present invention, the local background content and the remote target content are combined into one picture for display, so that the communicating parties do not need to specially arrange their scenes; the background in the picture is the same as the scene where the viewer is located, which increases the realism of the communication process, and the scene content can be displayed as three-dimensional video.
  • This embodiment provides a video preprocessing method. As shown in FIG. 1, the video preprocessing method includes the following steps:
  • the local scene content may be divided into two or more levels by the depth value of the local scene content, so that the level of the local target content may be segmented, that is, the local target content is segmented from the local scene content.
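The depth-based layering in this step can be illustrated with a minimal sketch (the depth range and array shapes below are assumptions for illustration, not values from the patent):

```python
import numpy as np

def segment_target_by_depth(color, depth, near, far):
    """Split the local scene content into levels by depth and keep the target level.

    color: HxWx3 array with the local scene content.
    depth: HxW array of per-pixel depth values aligned with the color image.
    near, far: hypothetical depth range in which the local target appears.
    """
    mask = (depth >= near) & (depth <= far)            # level containing the target
    target = np.where(mask[..., None], color, 0)       # local target content
    background = np.where(mask[..., None], 0, color)   # remaining local background content
    return target, background, mask

# toy example: a 4x4 scene whose centre is closer to the camera than the rest
depth = np.full((4, 4), 5.0)
depth[1:3, 1:3] = 1.5
color = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
target, background, mask = segment_target_by_depth(color, depth, near=1.0, far=2.0)
```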
  • Step 103 is a step of transmitting the preprocessed content, which may be omitted.
  • the embodiment further provides a video pre-processing apparatus.
  • the video pre-processing apparatus includes: an information acquisition module 21, a segmentation module 22, and a transmission module 23.
  • The information acquiring module 21 is configured to obtain the local scene content and its depth values. The module can be implemented with a depth camera or a stereo camera: the depth camera uses infrared technology to acquire the depth of the image, while the stereo camera uses two cameras to obtain the depth of the image.
  • The local scene content can be divided into two or more levels, and the segmentation module 22 is configured to segment the local target content from the local scene content according to the depth values of the local scene content.
  • the sending module 23 is configured to send the local target content and its depth value to the remote end.
  • the video pre-processing apparatus performs video pre-processing mainly through the information acquisition module 21 and the segmentation module 22, wherein the transmission module 23 can be omitted.
  • the embodiment further provides a video receiving method corresponding to the video preprocessing method.
  • the video receiving method includes the following steps:
  • receiving the remote target content and its depth values sent by the remote end; acquiring the local background content and its depth values; and synthesizing the remote target content and the local background content into scene content according to the depth values.
  • the embodiment further provides a video receiving apparatus.
  • the video receiving apparatus includes: a transmission interface module 41, an extracting module 42 and a synthesizing module 43.
  • The transmission interface module 41 is configured to receive the target content and its depth values sent by the remote end; the extraction module 42 is configured to obtain the local background content and its depth values; and the synthesis module 43 is configured to synthesize the remote target content and the local background content into scene content according to the relationship between their depth values. In general, a pixel with a smaller depth value occludes a pixel with a larger depth value; finally, the synthesized scene content is displayed on a device such as a display.
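As a rough sketch of the synthesis rule just described (per pixel, the smaller depth value wins), assuming the remote target content has already been placed on a canvas of the same size as the local background content:

```python
import numpy as np

def compose_by_depth(fg_color, fg_depth, bg_color, bg_depth, invalid=0):
    """Per-pixel synthesis: the pixel with the smaller depth value occludes the other.

    Canvas positions of the remote target that carry no content (filled with
    `invalid`, an assumption of this sketch) are treated as infinitely far so
    they never occlude the local background.
    """
    fg_depth = np.where(fg_depth == invalid, np.inf, fg_depth)
    fg_wins = fg_depth < bg_depth
    out_color = np.where(fg_wins[..., None], fg_color, bg_color)
    out_depth = np.where(fg_wins, fg_depth, bg_depth)
    return out_color, out_depth
```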
  • an embodiment of the present invention further provides a video communication device, which specifically includes: an information acquiring module 51, a dividing module 52, a transmission interface module 53, an extracting module 54, and a synthesizing module 55.
  • The information acquiring module 51 is configured to acquire the local scene content and its depth values; it can be implemented with a depth camera or a stereo camera, where the depth camera uses infrared technology to acquire the depth of the image and the stereo camera uses two cameras to obtain the depth of the image.
  • the segmentation module 52 is configured to segment the local target content from the local scene content according to the local scene content depth value.
  • the transmission interface module 53 is configured to send the local target content and its depth value to the remote end.
  • The transmission interface module 53 is further configured to receive the target content and depth values sent by the remote end; the extraction module 54 is configured to obtain the local background content and its depth values; and the synthesis module 55 is configured to combine the remote target content and the local background content into scene content according to the relationship between their depth values.
  • the pixels with small depth values block pixels with large depth values; finally, the synthesized scene content is displayed by the display module.
  • The local background content may be the content remaining after the segmentation module 52 splits off the local target content, or the background content opposite the local target and its depth values may be acquired by another camera.
  • When the video pre-processing device and the video receiving device of this embodiment can communicate, for example when both are connected to the same network, they form a video communication system: the transmitting end of the system includes the video pre-processing apparatus of FIG. 2, and the receiving end includes the video receiving apparatus of FIG. 4. Embodiment 2:
  • This embodiment provides a video communication device. The device sends the local target content within the local scene content, together with the corresponding depth values, to the peer device; the peer device receives the local target content, combines it with the background content at its own end into one scene, and displays the result to the peer user. This ensures that the background seen by the peer user is exactly the same as the scene where the peer user is located, making the communication more immersive and realistic.
  • Likewise, after receiving the remote target content, the local video communication device combines the remote target content and the local background content into a scene and displays it to the local user, improving the local user's sense of presence and realism during communication.
  • the video communication device mainly includes: an information acquisition module 61, a segmentation module 62, an encoding module 63, a transmission interface module 64, a decoding module 65, a synthesizing module 66, and a display module 67.
  • The information acquiring module 61 is configured to capture the local scene content and calculate the depth values corresponding to the local scene content, or to obtain those depth values directly; the segmentation module 62 is configured to segment the local target content from the local scene content according to the depth values;
  • the encoding module 63 is configured to encode the segmented local target content and its corresponding depth value;
  • the transmission interface module 64 is configured to send the local target content and its depth value, or receive the target content sent by the remote end.
  • the decoding module 65 is configured to implement decoding of the received remote target content and its depth value;
  • The synthesizing module 66 is configured to fuse the decoded remote target content with the local background content and to generate a stereo view according to the corresponding depth values; the local background content may be the content remaining after the local target content is segmented out of the local scene content, or the scene content opposite the local target captured by another group of cameras. The display module 67 is configured to display the synthesized image; it may be an ordinary two-dimensional display device or a stereoscopic display device, and if it is a stereoscopic display device, a two-dimensional image of another viewpoint needs to be reconstructed.
  • the information acquisition module 61 can be implemented in the following two ways: 1. Using the depth camera to obtain the local scene content and its depth value at the same time; 2. Using two or more cameras to capture the local scene content. The corresponding depth value is obtained by the stereo image matching method.
  • the Depth Camera is a new type of camera that captures the RGB (red green blue) color image while capturing the depth value of each pixel in the color image.
  • Current depth cameras mainly use infrared to obtain the depth value of the target in the scene.
  • Obtaining depth values by stereo image matching requires that two or more cameras capture the scene during image acquisition, yielding two or more images of the scene from different angles. By matching these images, the parallax of scene points across the different images is obtained, and from the parallax together with the intrinsic and extrinsic camera parameters the depth value corresponding to each pixel in the image can be calculated. The following takes two cameras as an example to describe how image matching yields depth values.
  • Figure 7 shows the imaging geometry of two parallel cameras placed horizontally, where O1 and O2 are the optical centers of the two cameras and the distance between them is B. The perpendicular distance from a scene point A to the camera baseline is Z (that is, the depth of point A), and A1 and A2 are the image points of point A in the two cameras.
  • Obtaining the depth information by the stereo camera includes: finding an image point corresponding to a point in the scene in two or more images, and then determining the depth value according to the coordinates of the point in the two or more images.
  • The process of finding, in the different images, the image points that correspond to the same point in the scene is done by image matching.
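Once the image points A1 and A2 have been matched, the depth follows from the standard triangulation relation for this parallel-camera geometry, Z = f · B / d with d the horizontal disparity; a small sketch with illustrative values:

```python
def depth_from_disparity(x1, x2, focal_length, baseline):
    """Depth of a scene point from its matched image coordinates in two parallel cameras.

    x1, x2: horizontal image coordinates (in pixels) of the matched points A1 and A2.
    focal_length: camera focal length expressed in the same pixel units.
    baseline: distance B between the two optical centers O1 and O2.
    """
    disparity = x1 - x2                      # parallax of the point between the two views
    if disparity == 0:
        return float("inf")                  # no parallax: the point is effectively at infinity
    return focal_length * baseline / disparity

# example: f = 800 px, B = 0.1 m and a disparity of 20 px give a depth of 4 m
z = depth_from_disparity(520, 500, focal_length=800, baseline=0.1)
```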
  • Current image matching technologies mainly include: window-based matching, feature-based matching, and dynamic programming. Among them, window-based matching and dynamic programming methods use gray-based matching algorithms.
  • Gray-level-based algorithms divide one image into two or more small sub-regions and use their gray values as templates to search the other images for the sub-regions with the most similar gray-value distribution. If two sub-regions satisfy the similarity requirement on the gray-value distribution, the points in them are considered matched, that is, the two sub-regions are images of the same part of the scene.
  • correlation functions are usually used to measure the similarity of two regions.
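A sketch of window-based matching along one scan line of a rectified image pair, using a normalized correlation score as the similarity measure; the window size and search range are arbitrary illustrative choices:

```python
import numpy as np

def match_window(left, right, row, col, half=3, max_disp=32):
    """Return the disparity at (row, col) by comparing gray-value windows.

    left, right: rectified grayscale images as 2-D float arrays.
    The window around (row, col) in the left image is compared with candidate
    windows in the right image, and the disparity with the highest correlation wins.
    """
    ref = left[row - half:row + half + 1, col - half:col + half + 1]
    ref = (ref - ref.mean()) / (ref.std() + 1e-6)
    best_disp, best_score = 0, -np.inf
    for d in range(max_disp + 1):
        c = col - d
        if c - half < 0:
            break
        cand = right[row - half:row + half + 1, c - half:c + half + 1]
        cand = (cand - cand.mean()) / (cand.std() + 1e-6)
        score = float((ref * cand).mean())   # correlation between the two windows
        if score > best_score:
            best_score, best_disp = score, d
    return best_disp
```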
  • Feature-based matching does not directly utilize the grayscale of the image, but uses features derived from image grayscale information to match, which is more stable than using simple luminance and grayscale variation information. Matching features can be thought of as potentially important features that describe the 3D structure of the scene, such as the intersection of edges and edges (corner points). Feature-based matching generally first obtains a sparse depth information map, and then uses an interpolation value to obtain a dense depth information map of the image.
  • the segmentation module 62 segments the image according to the local scene content and its corresponding depth value to obtain local target content in the local scene.
  • the segmentation module 62 is implemented by the search unit 621 and the segmentation unit 622.
  • the search unit 621 is configured to search for an area where the local target content appears in the local scene content
  • The segmentation unit 622 is configured to perform accurate edge contour extraction on the region of the local target content within the local scene content and to segment it, obtaining the local target content and the remaining local background content.
  • In one approach, the local user estimates the position of the local target relative to the camera and sets the depth value range in which the local target content appears; the search unit then finds, within this depth value range, the area in which the target content appears.
  • Alternatively, existing face recognition technology can be used: the face recognition unit 623 automatically recognizes the position where a face appears in the local scene content; the search unit 621 then looks up the depth value corresponding to the face position in the depth values of the local scene content, determines the range of depth values of the local target content from the found depth value, and from that range determines the area occupied by the local target content within the scene content. In this way the depth range in which the person target appears in the scene is determined, as in the sketch below.
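A sketch of how the detected face position could seed the person's depth range; the face box and the margin value are hypothetical, since the patent does not fix them:

```python
import numpy as np

def person_region_from_face(depth, face_box, margin=0.5):
    """Estimate the region of the person from the depth at the detected face.

    depth: HxW depth map aligned with the color image.
    face_box: (top, left, bottom, right) rectangle from any face detector.
    margin: how far in front of / behind the face depth the person may extend
            (hypothetical value, in the same units as the depth map).
    """
    top, left, bottom, right = face_box
    face_depth = np.median(depth[top:bottom, left:right])   # typical depth at the face
    near, far = face_depth - margin, face_depth + margin    # depth range of the person
    return (depth >= near) & (depth <= far)                 # area where the person appears
```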
  • Because the depth values are registered with the color image, the person region segmented according to the depth values corresponds to the person region in the color image.
  • the local target content of the resulting color image and its depth is then sent to the encoding module 63, which encodes it and transmits it to the remote via the transport interface module 64.
  • Because the size of the segmented local target content differs from frame to frame, it needs to be adjusted to a uniform size, generally the same size as the local scene content, so that every frame to be encoded has the same size and is easy to encode. This adjustment does not scale the local target content itself; it only changes the size of the canvas on which the local target content is placed. Blank areas that appear after the adjustment can be filled with the value 0, as in the sketch below.
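The canvas adjustment can be sketched as follows (the placement offsets are illustrative; they are whatever position the target occupied in the original frame):

```python
import numpy as np

def place_on_canvas(target, target_depth, frame_shape, top, left):
    """Put the segmented target onto a full-size canvas and fill the blank area with 0."""
    canvas = np.zeros(frame_shape, dtype=target.dtype)                   # color canvas
    depth_canvas = np.zeros(frame_shape[:2], dtype=target_depth.dtype)   # depth canvas
    h, w = target.shape[:2]
    canvas[top:top + h, left:left + w] = target
    depth_canvas[top:top + h, left:left + w] = target_depth
    return canvas, depth_canvas
```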
  • the encoding module 63 in this embodiment encodes the segmented local target content and its depth value.
  • Stereo video has a much larger amount of data than a single-channel 2D video: Binocular stereo video has two data channels. The increase in video data has created difficulties for both storage and transmission.
  • stereo video coding can be mainly divided into two categories: block-based coding and object-based coding.
  • In block-based stereo image coding, in addition to intra prediction and inter prediction to eliminate data redundancy in the spatial and temporal domains, the spatial data redundancy between the multi-channel images must also be eliminated.
  • Parallax (disparity) estimation and compensation is a key technique in stereo video coding for eliminating spatial redundancy between multi-channel images; its core is to find the correlations between two or more images.
  • The stereo video content to be coded here includes the color image and its corresponding depth values, and layered coding can be adopted, that is, the color image is hybrid-coded as the base layer and the depth values are hybrid-coded as an enhancement layer.
  • the transmission interface module 64 in this embodiment is configured to send the encoded local target content and its depth value, and receive the encoded remote target content and its depth value transmitted by the remote end, and send it to the decoding module for decoding processing.
  • the transmission interface module 64 in this embodiment may be various wired or wireless interfaces capable of transmitting, such as: a broadband interface, a Bluetooth interface, an infrared interface, or an access technology of a mobile communication network using a mobile phone.
  • the transmission interface module only needs to transmit the local target and the depth value in the local scene content, and the amount of data is reduced compared with the original local scene content, which can reduce the bandwidth occupation rate during data transmission.
  • After the transmission interface module 64 of the video communication device of this embodiment receives the remote data, the data still needs to be processed before it can be displayed.
  • the decoding module 65 is configured to decode the received remote data to obtain the target content of the remote end and its corresponding depth value.
  • The synthesizing module 66 is configured to fuse the decoded far-end target content with the local background content according to the depth values, obtaining a color image of the synthesized far-end target content and local background together with the corresponding depth values; the acquisition of the local background content is completed by the extraction module 69.
  • the occlusion relationship is determined according to the depth value of the remote target content and the depth value of the local background content, and then the corresponding color image content is synthesized according to the occlusion relationship.
  • If the display module 67 is a three-dimensional display device, a virtual image of another viewpoint is further reconstructed from the synthesized color image content and the corresponding depth values; therefore, a view reconstruction module 68 may also be included in this embodiment.
  • The view reconstruction module 68 reconstructs the synthesized image content to generate a virtual view image; the virtual view image and the synthesized color image form a stereo pair and are sent to the three-dimensional display device for stereoscopic display.
  • FIG. 8 shows the received remote target content (a person) and its depth, as well as the local background content (a tree and a table) acquired locally in depth-camera mode and its depth; these are synthesized according to their depth relationship to obtain the synthesized scene, in which the remote person can be inserted between the local table and the tree.
  • Regarding scaling of the remote target content: in order for the remote person to blend seamlessly with the local background content, it may be necessary to use the zoom unit 661 to adjust the position of the remote target content relative to the camera, in which case the size of the remote target content must be scaled at the same time. When the remote target content needs to be pulled closer, that is, when its depth value is reduced, it needs to be enlarged; when it is placed at a greater distance, that is, when its depth value is increased, it needs to be reduced. Since the remote target content is a single target whose depth varies over a limited range, the perspective scaling can be simplified to a linear scaling consistent with its depth, as in the sketch below.
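The simplified linear scaling mentioned above can be sketched as: moving the target from depth Z_old to depth Z_new rescales its image size by roughly Z_old / Z_new (a pinhole-camera approximation used for illustration, not a formula quoted from the patent):

```python
import numpy as np

def rescale_for_new_depth(target, old_depth, new_depth):
    """Scale the remote target content when its placement depth is changed.

    Pulling the target closer (new_depth < old_depth) enlarges it; pushing it
    farther away (new_depth > old_depth) shrinks it. Nearest-neighbour
    resampling keeps the sketch dependency-free.
    """
    scale = old_depth / new_depth
    h, w = target.shape[:2]
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    rows = np.clip((np.arange(new_h) / scale).astype(int), 0, h - 1)
    cols = np.clip((np.arange(new_w) / scale).astype(int), 0, w - 1)
    return target[rows[:, None], cols]
```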
  • When synthesizing, the synthesizing unit 662 must consider mutual occlusion. The occlusion relationship between the scaled far-end target content and the local background content is determined from their depth values: when the horizontal and vertical positions of two pixels coincide, the pixel with the smaller depth value occludes the one with the larger depth value (near content occludes distant content). The scaled far-end target content and the local background content are then combined into the scene content according to this occlusion relationship.
  • There are two options for the local background content. The first is to use another set of cameras to capture the scene content opposite the local target, that is, the scene the local person actually sees, and to combine that scene content directly with the remote target content, so that the background the user sees is merged with the far-end person. Because the opposite scene is used directly there is no hole-filling problem, but a set of cameras has to be added at each end of the video communication.
  • Another solution is to use the local background content left after culling the local target content, and fill it with the method of edge pixel filling for possible holes.
  • If the video communication device of this embodiment uses a three-dimensional stereoscopic display device that supports only left/right image input, another image needs to be reconstructed in order to realize stereoscopic display.
  • Some autostereoscopic displays accept a two-dimensional color image and its corresponding depth values for three-dimensional display; in that case there is no need to reconstruct another image here, because the autostereoscopic display itself reconstructs the other image and completes the corresponding hole filling during reconstruction, for example a Philips autostereoscopic display.
  • View reconstruction also known as virtual view image synthesis, generally refers to reconstructing images from other perspectives from models or images at different angles. This embodiment is implemented by the view reconstruction module 68.
  • The parallax between the virtual view and the known view can be calculated according to the formula d = f · B / Z, where d is the parallax between the virtual view and the known view, f is the focal length of the camera, B is the distance between the virtual viewpoint and the original camera position, and Z is the depth of the image.
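A sketch of the view reconstruction step implied by this formula: every pixel of the known (composite) view is shifted horizontally by its parallax d = f · B / Z, a z-buffer keeps the nearer pixel when two pixels land on the same position, and positions that receive no pixel are reported as holes for the filling step described later. The shift direction depends on which side the virtual viewpoint lies; a shift to the left is assumed here.

```python
import numpy as np

def reconstruct_virtual_view(color, depth, focal_length, baseline):
    """Generate a horizontally shifted virtual view from one color image and its depth map."""
    h, w = depth.shape
    virtual = np.zeros_like(color)
    zbuf = np.full((h, w), np.inf)
    disparity = focal_length * baseline / np.maximum(depth, 1e-6)   # d = f * B / Z
    for y in range(h):
        for x in range(w):
            xv = x - int(round(disparity[y, x]))     # shifted position in the virtual view
            if 0 <= xv < w and depth[y, x] < zbuf[y, xv]:
                zbuf[y, xv] = depth[y, x]            # nearer pixel wins (occlusion handling)
                virtual[y, xv] = color[y, x]
    holes = np.isinf(zbuf)                           # positions that received no pixel
    return virtual, holes
```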
  • the display module in this embodiment is used to display the synthesized image.
  • The display module 67 can be a stereoscopic display device, including autostereoscopic displays, stereoscopic-glasses displays, holographic displays and other three-dimensional displays, realizing stereoscopic display of the stereo image so that the user can experience the depth of the scene and feel the stereoscopic effect.
  • For stereoscopic display it is generally necessary to perform the view reconstruction and hole filling described above.
  • the display module can also be a common two-dimensional display device, and only the two-dimensional composite image is displayed. When only the two-dimensional image needs to be displayed, the view reconstruction is not required, and the synthesized two-dimensional image is directly displayed.
  • This embodiment is an example of a communication process in a video communication system. Specifically, two users (A and B) communicate through the video communication device in Embodiment 2. In the communication process, user A sends video data to user B, and The user B receives the video data of the user A.
  • the structure of the video communication system is as shown in FIG. 10, including a transmitting end and a receiving end, and the transmitting end and the receiving end are connected through a network.
  • The sending end is configured to obtain the sending end scene content and its depth values, segment the sending end target content from the sending end scene content according to those depth values, and send the sending end target content and its depth values to the receiving end.
  • The sending end includes: the information acquiring module 1001, configured to capture the local scene content and calculate the depth values corresponding to the local scene content, or to obtain those depth values directly; the segmentation module 1002, configured to segment the local target content from the local scene content according to the depth values; the encoding module 1003, configured to encode the segmented local target content and its corresponding depth values; and the transmission interface module 1004, configured to send the local target content and its depth values to the receiving end.
  • the receiving end is configured to receive the target content and the depth value sent by the sending end, and obtain the background content of the receiving end and the depth value thereof, and synthesize the target content of the sending end and the background content of the receiving end to synthesize the scene content according to the depth value.
  • The receiving end includes: a transmission interface module 1005 for receiving the target content and its depth values sent by the remote end; a decoding module 1006 for decoding the received remote target content and its depth values; a synthesizing module 1007 configured to fuse the decoded remote target content with the local background content and generate a stereo view according to the corresponding depth values, where the local background content may be the content remaining after the local target content is segmented out of the local scene content, extracted by the extraction module 1010, or the scene content opposite the local target captured by another group of cameras; and a display module 1009 used to display the synthesized image, which may be a stereoscopic display device or an ordinary two-dimensional display device. If a stereoscopic display device is used, a two-dimensional image of another viewpoint needs to be reconstructed, which can be done by the view reconstruction module 1008.
  • the information acquiring module of the video communication device of the user A obtains the local scene content and the depth value thereof.
  • the depth of the local scene content and the scene content may be obtained by using the depth camera or the stereo camera.
  • When a stereo camera is used, the depth is calculated as Z = f · B / Δx, where f is the focal length of the cameras, B is the distance between the two cameras, and Δx is the disparity, that is, the position difference of the same pixel between the two camera images.
  • The segmentation module of user A's video communication device then separates the local target content from the local scene content. Specifically, the face recognition unit in the segmentation module performs face recognition on the captured local scene content to obtain the position of the face image; the search unit in the segmentation module then looks up the depth value corresponding to the face position in the depth values of the local scene content and, from the found depth value, determines the range of depth values occupied by the person in the captured picture and hence the region in which the person appears.
  • the segmentation unit in the segmentation module segments the person object from the local scene content according to the determined region.
  • The local background content may be the content remaining after the person target is segmented out of the local scene content, saved together with its depth values; alternatively, the background content opposite the person target and its depth values may be acquired simultaneously by another camera and saved.
  • In order to unify the size of the local person target, it needs to be expanded onto a canvas the size of the originally captured image, or cropped to a picture of another size; the blank area produced by this adjustment can be filled with the value 0.
  • step 905. Encode the local character target and the depth value obtained in step 904 respectively, preferably using layered coding, and using layered coding requires less data to be transmitted.
  • the above steps complete the sending operation of User A.
  • the following steps are for User B to receive data and process the data.
  • the video communication device of user B receives the human object and the depth value sent by user A through the transmission interface module.
  • the video communication device of user B decodes the received data through a decoding module, and obtains a character target of the user A and a depth value thereof. At the same time, the video communication device of the user B needs to obtain the depth value of the background content and the background content. In general, the remaining content after the local target is removed from the local scene content may be used as the background content. If the background content of the opposite side of the user B and its depth value are obtained by another camera, the picture seen by the user B is made more realistic, and there is no hole problem when the image is synthesized.
  • The person target and depth values sent by user A are scaled by the scaling unit in the synthesizing module to obtain a target of suitable size: when the remote target content needs to be pulled closer, that is, when its depth value is reduced, it needs to be enlarged; when it is placed at a greater distance, that is, when its depth value is increased, it needs to be reduced.
  • The occlusion relationship between the far-end person target and the local background content is determined as follows: when the horizontal and vertical positions of two pixels coincide, the pixel with the smaller depth value occludes the one with the larger depth value (near content occludes distant content).
  • The composition unit in the synthesis module then combines the person target and the background content into scene content according to the occlusion relationship determined above.
  • the holes in the synthesized scene content need to be pixel-filled; if the background content is directly obtained from the opposite scene of the user B, the pixel filling is not performed.
  • The view reconstruction module performs virtual view image synthesis on the synthesized scene content, specifically calculating the parallax between the virtual view and the known view as d = f · B / Z, where d is the parallax between the virtual view and the known view, f is the focal length of the camera, B is the distance between the virtual viewpoint and the original camera position, and Z is the depth of the image.
  • The color of the pixel at a given position x on a scan line of the right image is determined by the color of the pixel at the corresponding position x' on the same scan line of the left image (the composite image), where the coordinate x' is offset from x by the parallax d calculated above.
  • the device of the user A may further include a video receiving device
  • the device of the user B may further include a video pre-processing device to ensure that the user B can send the video data to the user A. If User B needs to send video data to User A, the process is the same as that of Figure 9, except that the sender and receiver change.
  • the embodiments of the present invention are mainly used in video communication, for example: Video chat, office video calls, video conferencing, etc.
  • An embodiment of the present invention provides a video processing method. As shown in FIG. 11, it includes the following steps: Step 111: acquire a color/grayscale image of at least two viewpoints of the video object to be presented and a color/grayscale image of at least one viewpoint of at least one presentation material;
  • Step 112: synthesize the color/grayscale image of the at least two viewpoints of the video object with the color/grayscale image of the at least one viewpoint of the at least one presentation material to obtain a rendered image; Step 113: display the rendered image in a three-dimensional manner.
  • In this embodiment, a rendered image is generated from the acquired multi-viewpoint color/grayscale images of the video object and the color/grayscale image of the presentation material; the rendered image supports a three-dimensional display mode and is then displayed in a three-dimensional manner, which solves the problem that the prior art can only support presentation of two-dimensional video, thereby realizing presentation of three-dimensional video.
  • A brief description of stereo video / 3D video technology follows.
  • the conventional video technology can only provide two-dimensional information.
  • The embodiment of the present invention enables the user, as a viewer, to obtain not only information about the content of the scene but also depth information such as the distance and position of the scenery.
  • 3D video technology can provide images with depth information conforming to the principle of stereo vision. It can truly reproduce the objective world scene and express the depth, layering and authenticity of the scene. It is an important direction for the development of video technology.
  • the basic principle of 3D video technology is to simulate the principle of human eye imaging.
  • the left eye image and the right eye image are obtained by using two cameras.
  • The left-eye and right-eye images are seen by the left and right eyes respectively, and finally a stereoscopic image is synthesized.
  • the observer can feel the depth of the scene. Therefore, binocular stereo video can be viewed as an extension of depth information added to existing 2D video.
  • multi-view video technology is that two or more cameras simultaneously shoot scenes, for example, sports or drama scenes, different cameras have different shooting angles, and generate more than two video streams; these different viewpoint video streams are sent to the user terminal.
  • the user can select any viewpoint and direction to view the scene.
  • the above viewpoint may be a pre-defined fixed camera's shooting viewpoint, or may be a virtual viewpoint, and an image thereof is synthesized by an image taken by a real camera around.
  • Embodiment 5:
  • An embodiment of the present invention provides a video sending method. As shown in FIG. 12, the method includes the following steps: Step 121: acquire a color/grayscale image of at least two viewpoints of the video object to be presented and a color/grayscale image of at least one viewpoint of at least one presentation material;
  • the video object to be presented above includes an object such as a person and an object as a foreground after the natural scene segmentation, and may also be a background object; the presentation material may be a document to be presented, a picture, a video, or a computer-generated graphic or the like.
  • the method for acquiring a color/grayscale image of at least two viewpoints of a video object includes the following steps:
  • Step 1210 Acquire at least one depth information and at least one color/gray image of the same viewpoint of the scene in which the video object is located.
  • The above depth information can be obtained by a method based on stereo image matching, or by a camera that can directly collect the depth of the scene; in the embodiment of the present invention, a camera capable of collecting scene depth information acquires the depth information, and a camera capable of acquiring scene color/grayscale images acquires the color/grayscale image. The following briefly describes how these two kinds of cameras work.
  • FIG. 13 is a schematic diagram of obtaining a depth image using a CCD (Charge Coupled Device) camera equipped with an ultra-fast shutter and an intensity-modulated illuminator.
  • object a is a square object
  • object b is a triangular object.
  • The quantities shown in the figure are the illumination intensities of objects a and b acquired by the camera during the shutter opening time (shown as c in the figure), under modulated light whose intensity increases and modulated light whose intensity decreases, respectively.
  • The instantaneous intensity of the light reflected from the near object a toward the camera is detected through the ultra-fast shutter of the camera's detecting device, giving the square distribution in image A; the light reflected from object b gives the triangular distribution in image A. Since object a is closer to the camera, the instantaneous light intensity detected by the camera is stronger, and the square is brighter than the triangle. The difference in brightness in the captured image A can therefore be used to detect the depth of an object. However, the brightness of the light reflected from an object is also affected by parameters such as the object's reflectivity, the distance from the object to the camera, the modulation index of the light source, and the spatial non-uniformity of the illumination.
  • the image B can be obtained by linearly decreasing the spatial distribution of the illumination intensity described above, and the image A and the image B can be combined, and the adverse effect can be eliminated by the signal processing algorithm, and an accurate depth map is obtained.
  • the depth of the object b (the triangular object in the figure) is larger than the depth of the object a (the square object in the figure), that is, the object a is visually closer to the camera, and the object b is farther from the camera.
  • the camera may be used to obtain depth information, and then a camera that obtains a color/gray image may be configured, or a camera capable of simultaneously acquiring depth information and color/gray images may be directly used.
  • FIG. 14 shows the basic configuration of a high-definition (Axi-Vision) camera that can simultaneously acquire depth information and color/grayscale images.
  • a depth image processing unit and a color image processing unit are included.
  • A near-infrared LED array, which can be modulated rapidly and directly, is used as the modulated illuminator; its wavelength of 850 nm is outside the visible range and does not interfere with visible light.
  • Four LED arrays can be used around the camera lens to evenly illuminate the scene.
  • The color image processing unit can be a color HD camera; the near-infrared light is processed by the depth image processing unit to obtain the depth image of the object.
  • In the depth image processing unit, the near-infrared light separated by a dichroic prism is focused on the photocathode, and a short pulse bias is applied between the photocathode and the microchannel plate (MCP), achieving a shutter of one billionth of a second; the shutter is opened at the same frequency as the light modulation to obtain a better signal-to-noise ratio (SNR).
  • An optical image of the object is obtained on the phosphor by opening the shutter, and the optical image is then focused by a relay lens onto a high-resolution progressive CCD camera, converted into a photoelectron image, and finally a depth map of the object is formed by the signal processor.
  • Step 1211 Perform video segmentation on the color/gray image of the same view according to the acquired depth information of one view to acquire the video object.
  • Video segmentation can be performed in any one of a variety of ways, dividing a video image into foreground and background. For example, a chroma-key segmentation technique, a Depth-key segmentation technique, or a technique of detecting a difference between a current image and a pre-photographed background image is performed.
• The first and third techniques place more restrictions on the scene.
• The embodiment of the present invention therefore adopts depth-key segmentation for video segmentation, whose main technical points are:
• A binarized depth mask is generated from the depth map by thresholding, and the video object is extracted according to the mask. For example, a pixel with mask value 1 is a foreground object pixel and a pixel with mask value 0 is a background object pixel, so the video object can be extracted or removed according to the mask value.
  • a chain code description is constructed for the boundary of the depth mask, and the contour of the foreground object is restored according to the chain code description.
• A processing area is defined from the contour of the object and used as the region for alpha blending (a sketch of these segmentation steps follows below).
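• As a minimal sketch of the depth-key idea just described (threshold the depth map into a binary mask, extract the object, and recover its boundary as chains of contour points), assuming 8-bit inputs and using OpenCV; the threshold value and function names are illustrative, not the patent's exact procedure.

```python
import cv2
import numpy as np

def depth_key_segment(color_img, depth_map, depth_threshold):
    """Sketch of depth-key segmentation: pixels nearer than the threshold
    are treated as foreground (mask value 1), the rest as background (0)."""
    # Binarized depth mask: 1 = foreground object pixel, 0 = background pixel.
    mask = (depth_map < depth_threshold).astype(np.uint8)

    # Extract the video object by masking the color image.
    foreground = cv2.bitwise_and(color_img, color_img, mask=mask)

    # Boundary of the mask as point chains (a stand-in for the chain-code
    # description of the foreground contour mentioned above).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    return foreground, mask, contours
```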
• Step 1212: Generate a color/grayscale image of at least one other viewpoint of the video object according to the depth information of the one viewpoint and the color/grayscale image of the same viewpoint.
• For a parallel camera pair, the pixel position in the original view is known from the color/grayscale image and the disparity d can be obtained from the depth map, so the color/grayscale image of the video object at another viewpoint can be generated by shifting each pixel by its disparity (the underlying relations are restated below).
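• The projection and disparity relations behind this step are not reproduced in this passage; they are restated here for readability, assuming the parallel-camera model used elsewhere in the description (focal length $F$, baseline $B$ between the left and right cameras, depth $Z$):

```latex
x_L = F\,\frac{X}{Z}, \qquad
x_R = F\,\frac{X - B}{Z}, \qquad
d = x_L - x_R = \frac{F\,B}{Z}, \qquad
x_R = x_L - d .
```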
• Because the calculated pixel position is generally not an integer, the embodiment of the present invention handles it at sub-pixel level.
• A weighted-averaging method can then be used to determine the luminance and chrominance values of the new pixel from the luminance and chrominance values of the neighbouring pixels of the corresponding position in the original image, as sketched below.
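• A bilinear weighted average is one common way to realise this sub-pixel sampling; the sketch below is illustrative only and the function name is an assumption.

```python
import numpy as np

def sample_subpixel(image, y, x):
    """Bilinear weighted average of the four pixels neighbouring the
    non-integer position (y, x) in the original image."""
    h, w = image.shape[:2]
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x1]
    bottom = (1 - fx) * image[y1, x0] + fx * image[y1, x1]
    return (1 - fy) * top + fy * bottom
```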
  • a color/grayscale image of two or more different viewpoints of the video object can be generated.
• FIG. 15 illustrates the "occlusion" problem that arises during multi-view image generation. FIG. 15(1) on the left shows the scene observed from the original image viewpoint, and FIG. 15(2) on the right shows the scene observed from the new viewpoint at which the image needs to be reconstructed. At the original viewpoint, the front object (shown as A in the figure) occludes a small part of the rear object (shown as B), with C denoting the occluded area. At the new viewpoint, the occluded area of the rear object becomes larger, so part of the image obtained at the original viewpoint cannot be displayed in the image at the new viewpoint. Therefore, during pixel mapping it is necessary to determine whether a pixel is occluded: if it is not occluded, the pixel-mapping process described above is performed; if it is occluded, the pixel is skipped (a warping sketch including this occlusion test follows below).
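• As an illustration only of the pixel mapping and occlusion test just described (not the patent's exact algorithm), the sketch below forward-warps a rectified view to a horizontally shifted viewpoint using d = F·B/Z; a z-buffer keeps the nearest contributor whenever several source pixels land on the same target pixel. Nearest-pixel rounding replaces the sub-pixel weighting described earlier, and all names and the sentinel value are assumptions.

```python
import numpy as np

def warp_to_new_view(color, depth, focal, baseline, hole_value=-1):
    """Forward-warp a color image to a horizontally shifted viewpoint.

    color: (H, W, 3) image at the original viewpoint.
    depth: (H, W) per-pixel depth Z at the same viewpoint.
    Pixels of the new view that are never written keep `hole_value`
    and are treated as holes afterwards.
    """
    h, w = depth.shape
    out = np.full(color.shape, hole_value, dtype=np.int16)
    zbuf = np.full((h, w), np.inf)                # nearest depth wins (occlusion test)

    disparity = focal * baseline / np.maximum(depth, 1e-6)
    for y in range(h):
        for x in range(w):
            xr = int(round(x - disparity[y, x]))  # x_R = x_L - d
            if 0 <= xr < w and depth[y, x] < zbuf[y, xr]:
                zbuf[y, xr] = depth[y, x]         # a closer pixel occludes a farther one
                out[y, xr] = color[y, x]
    return out
```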
• FIG. 16 illustrates the "hole" problem that arises during multi-view image generation. FIG. 16(1) on the left shows the scene observed from the original image viewpoint, and FIG. 16(2) on the right shows the scene observed from the new viewpoint at which the image needs to be reconstructed. At the original viewpoint, the front object (shown as A in the figure) occludes the rear object (shown as B), so the left part of the rear object is invisible and the image generated at the original viewpoint contains no pixels for it. At the new viewpoint that part of the rear object is no longer occluded, but since the original image has no pixels for it, the reconstructed image contains a hole area lacking corresponding pixels (shown as C in the figure).
• To handle holes, all pixels of the new image can be set to a special color value before the new image is generated.
• After the mapping is completed, any area of the image that still holds the special color value is a hole area.
• For small holes, the information of the pixels in the hole area may be determined from the depth, luminance, and chrominance information of the pixels around the hole and repaired, for example by linear or non-linear interpolation;
• for larger holes, motion compensation can be used to search the video frame sequence preceding the currently reconstructed frame, find the pixel information corresponding to the hole area, and repair it. A sketch of hole detection and simple filling follows below.
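• Continuing the illustration, a simple way to locate and repair small holes: pixels still holding the sentinel value after warping are holes, and each is filled from the valid pixels in a small neighbourhood (a plain average here; depth/chroma-weighted interpolation or the motion-compensated search described above would be used for larger holes). The sentinel, window radius, and function name are assumptions.

```python
import numpy as np

def fill_small_holes(warped, hole_value=-1, radius=2):
    """Fill hole pixels (still equal to the sentinel) with the mean of
    valid pixels in a (2*radius+1)^2 window."""
    h, w = warped.shape[:2]
    out = warped.copy()
    holes = np.argwhere((warped == hole_value).all(axis=-1))
    for y, x in holes:
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        window = warped[y0:y1, x0:x1].reshape(-1, warped.shape[-1])
        valid = window[(window != hole_value).any(axis=-1)]
        if len(valid):
            out[y, x] = valid.mean(axis=0).astype(warped.dtype)
    return out
```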
  • the above method of obtaining a color/grayscale image of at least two viewpoints of a video object to be presented is also applicable to acquiring a color/grayscale image of at least one viewpoint of at least one of the presented materials.
• Step 122: Synthesize the color/grayscale images of at least two viewpoints of the video object and the color/grayscale image of at least one viewpoint of the at least one presentation material to obtain the presentation image. Image synthesis seamlessly integrates the segmented foreground objects into the background; optionally, the background here may be a color/grayscale image of the presentation material obtained through multi-viewpoint image generation, or the color/grayscale image of a single viewpoint of the presentation material.
• The method adopted in the embodiment of the present invention performs alpha-value blending of the edge of the foreground object with the background based on the depth image; its main technical points are:
• First, an area to be processed is defined; it lies mainly near the boundary of the foreground object and consists of rectangular regions centered on the chain-code points mentioned above.
• The size of the processing area is related to the sharpness of the edge of the foreground object.
• The sharpness of the edge is obtained by computing the derivative in the direction perpendicular to the object edge.
• A sharp edge has a very small foreground-to-background transition region, so the required processing area is small; a blurred edge transitions gradually from foreground to background, so a larger processing area is required.
• On the boundary of the defined processing area, pixels are assumed to lie either entirely within the foreground object (opaque alpha value, pure foreground color) or entirely outside it (transparent alpha value, pure background color).
• Inside the processing area, a pixel is a mixture of foreground and background with a translucent alpha value a; its color value I is a mix of the foreground color F and the background color B:
• I(i, j) = a * F(i, j) + (1 - a) * B(i, j)
• By estimating the alpha value a, the color value I of a given pixel in the processing area can be calculated.
• The above are the key image-synthesis techniques used in the embodiment of the present invention, but the invention is not limited to them; other image-synthesis methods may also be used (a blending sketch follows below).
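• A minimal sketch of the boundary blending just described: inside the processing band around the foreground contour, each pixel is written as I = a·F + (1 − a)·B. How the alpha value is estimated (matting) is outside this sketch, so a precomputed alpha map is taken as input; all names are illustrative.

```python
import numpy as np

def composite_boundary(foreground, background, alpha, band_mask):
    """I(i,j) = alpha*F(i,j) + (1-alpha)*B(i,j) inside the processing band;
    outside it pixels are treated as pure foreground (alpha ~ 1) or pure
    background (alpha ~ 0).

    foreground, background: (H, W, 3) float arrays.
    alpha:     (H, W) estimated alpha matte in [0, 1].
    band_mask: (H, W) bool, True inside the rectangular processing regions.
    """
    a = np.where(band_mask, alpha, np.round(alpha))[..., None]  # hard 0/1 outside the band
    return a * foreground + (1.0 - a) * background
```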
• Optionally, in the embodiment of the present invention, the depth information may be used to obtain the location information of the video object and control command information from the video object; the color/grayscale images of the at least two viewpoints of the video object and the color/grayscale image of at least one viewpoint of the at least one presentation material are then synthesized accordingly to obtain the presentation image.
  • the location information is obtained from the depth image described above for controlling the position of the video object on the presentation material; the control command information is used to control the content of the presentation material and the location of the video object on the presentation material.
  • three-dimensional coordinate information of the gesture feature points of the video object in the scene can be obtained, and the gesture of the video object is recognized by the three-dimensional coordinate information of the feature points. Since the depth map can extend the gesture recognition of the video object to the three-dimensional space, the front and back motions of the video object gesture can also be accurately recognized.
  • the identification information is converted into control command information.
• Further, the embodiment of the present invention can also detect the same feature points at different moments in time to obtain the change in depth of the video object, and convert this change information into control command information, as sketched below.
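• As an illustration of turning the depth change of a tracked feature point into a control command (for example, a forward "push" of the hand advancing the presentation material), with the feature tracking itself assumed to be done elsewhere; the command names and the threshold are invented for this sketch and are not defined in the patent.

```python
def command_from_depth_change(prev_depth_mm, curr_depth_mm,
                              push_threshold_mm=80):
    """Map the depth change of the same hand feature point between two
    moments into a control command string (illustrative names only)."""
    delta = prev_depth_mm - curr_depth_mm   # positive: hand moved toward the camera
    if delta > push_threshold_mm:
        return "NEXT_PAGE"       # forward push -> advance the presentation material
    if delta < -push_threshold_mm:
        return "PREVIOUS_PAGE"   # pull back -> go back
    return None                  # no command recognized
```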
• To support remote presentation of the video, the embodiment of the invention further includes:
• Step 123: Encode and send the presentation image.
• The obtained presentation image is compressed, for example encoded according to the H.264 or MPEG-4 protocol, so as to fit the limited network bandwidth and be transmitted over the network to the far end for presentation (a minimal encoding sketch follows below).
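• The synthesized frames could be handed to any standard encoder; the sketch below uses OpenCV's VideoWriter. Whether an H.264 or MPEG-4 encoder is actually available depends on the OpenCV build and platform, so the FourCC chosen here is an assumption, and the file name and frame rate are illustrative.

```python
import cv2

def encode_presentation(frames, path="presentation.mp4", fps=25):
    """Write synthesized presentation frames (equally sized BGR images)
    to an MPEG-4 container for transmission or storage."""
    frames = iter(frames)
    first = next(frames)
    h, w = first.shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    writer.write(first)
    for frame in frames:
        writer.write(frame)
    writer.release()
```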
• At the receiving end of the network, the compressed image is decoded correspondingly; for example, the encoded image is decoded according to the H.264 or MPEG-4 protocol to obtain the presentation image, which is then displayed in a three-dimensional manner.
  • the rendered image may be displayed in a three-dimensional manner using a device such as stereo glasses, an autostereoscopic display, or a projector.
  • Embodiments of the present invention include displaying the rendered image in a two-dimensional (2D) manner, so that the two-dimensional and three-dimensional image presentation modes are compatible.
• In that case, multi-view image generation for the video object is not required; the directly acquired video object and the color/grayscale image of the presentation material are combined and displayed in two dimensions.
• To ensure remote presentation of three-dimensional video, the embodiment of the present invention further provides a video sending method, as shown in FIG. 17, including:
• Step 171: Acquire color/grayscale images of at least two viewpoints of the video object to be presented and a color/grayscale image of at least one viewpoint of at least one presentation material;
• Step 172: Encode and send the color/grayscale images of at least two viewpoints of the video object to be presented and the color/grayscale image of at least one viewpoint of the at least one presentation material.
• Correspondingly, the embodiment of the present invention further provides a video receiving method that implements remote presentation of three-dimensional video, as shown in FIG. 18, including:
• Step 181: Acquire the encoded image and decode it to obtain the color/grayscale images of at least two viewpoints of the video object to be presented and the color/grayscale image of at least one viewpoint of the at least one presentation material;
• Step 182: Synthesize the color/grayscale images of at least two viewpoints of the video object and the color/grayscale image of at least one viewpoint of the at least one presentation material to obtain the presentation image;
• Step 183: Display the presentation image in a three-dimensional manner.
  • Embodiments of the present invention include displaying the rendered image in a two-dimensional (2D) manner, so that the two-dimensional and three-dimensional image presentation modes are compatible.
• In that case, multi-view image generation for the video object is not required; the directly acquired video object and the color/grayscale image of the presentation material are combined and displayed in two dimensions.
• For the specific procedures of steps 171 and 172 and of steps 181 to 183 in this embodiment, refer to the fifth embodiment.
  • the main differences are as follows:
• In the fifth embodiment of the present invention, the transmitting end mainly processes the captured video image (for example, video segmentation and multi-view generation) and encodes the synthesized presentation image for network transmission.
• That presentation image already contains the video object and the presentation material, and the decoded image (i.e., the presentation image) is presented three-dimensionally.
• In the present embodiment, the captured video image is likewise processed by the transmitting end, but the transmitting end encodes only the depth information and the color/grayscale images of the video object and the presentation material; the receiving end decodes the encoded images, synthesizes them with the presentation material to generate the presentation image, and displays the presentation image in three dimensions.
• In the technical solution provided by the embodiment of the present invention, a presentation image is generated from the acquired multi-view color/grayscale images of the video object and the color/grayscale image of the presentation material; the presentation image supports a three-dimensional display mode and is displayed in three dimensions, which solves the problem that the prior art can only support the presentation of two-dimensional video, thereby realizing the presentation of three-dimensional video.
• The embodiment of the present invention further provides a video processing apparatus, as shown in FIG. 19, comprising: an image acquisition and processing unit 191, configured to acquire color/grayscale images of at least two viewpoints of the video object to be presented and a color/grayscale image of at least one viewpoint of at least one presentation material, and to synthesize the color/grayscale images of at least two viewpoints of the video object with the color/grayscale image of at least one viewpoint of the at least one presentation material to obtain the presentation image;
  • the display unit 192 is configured to display the rendered image in a three-dimensional manner.
  • the image acquisition and processing unit 191 described above includes:
  • An image acquisition module configured to acquire at least one depth information and at least one color/gray image of the same viewpoint of the scene in which the video object is located;
  • a video segmentation module configured to perform video segmentation on a color/gray image of the same view according to the obtained depth information of one view, to acquire the video object;
  • a multi-viewpoint image generating module configured to generate a color/grayscale image of at least one other viewpoint of the video object according to the acquired depth information of one viewpoint and a color/grayscale image of the same viewpoint;
  • a synthesizing module configured to synthesize a color/grayscale image of at least two viewpoints of the video object and a color/grayscale image of at least one viewpoint of the at least one rendering material to obtain a rendered image.
• With this apparatus, a presentation image is generated from the acquired multi-view color/grayscale images of the video object and the color/grayscale image of the presentation material; the presentation image supports a three-dimensional display mode and is displayed in three dimensions, which solves the problem that the prior art can only support the presentation of two-dimensional video, thereby realizing the presentation of three-dimensional video.
• As shown in FIG. 20, an embodiment of the present invention provides a video transmitting apparatus, including:
• an image obtaining unit 201, configured to acquire color/grayscale images of at least two viewpoints of the video object to be presented and a color/grayscale image of at least one viewpoint of at least one presentation material; an encoding unit 202, configured to encode the color/grayscale images of at least two viewpoints of the video object to be presented and the color/grayscale image of at least one viewpoint of the at least one presentation material to obtain an encoded image;
  • the sending unit 203 is configured to send the encoded image.
  • the image obtaining unit 201 described above includes:
  • An image acquisition module configured to acquire at least one depth information and at least one color/gray image of the same viewpoint of the scene in which the video object is located;
  • a video segmentation module configured to perform video segmentation on a color/gray image of the same view according to the obtained depth information of one view, to acquire the video object;
  • a multi-viewpoint image generating module configured to generate a color/grayscale image of at least one other viewpoint of the video object according to the acquired depth information of one view and the color/grayscale image of the same view.
  • the embodiment of the present invention further provides a video receiving apparatus, including:
  • the receiving unit 210 is configured to receive a coded image.
  • a decoding unit 211 configured to decode the acquired encoded image, obtain a color/grayscale image of at least two viewpoints of the video object to be presented, and a color/grayscale image of at least one viewpoint of the at least one presentation material;
  • a synthesizing unit 212 configured to synthesize a color/grayscale image of at least two viewpoints of the video object and a color/grayscale image of at least one viewpoint of the at least one presentation material to obtain a rendered image;
• the display unit 213 is configured to display the presented image in a three-dimensional manner.
• With these apparatuses, a presentation image is generated from the acquired multi-view color/grayscale images of the video object and the color/grayscale image of the presentation material; the presentation image supports three-dimensional display and is displayed in a three-dimensional manner, which solves the problem that the prior art can only support the presentation of two-dimensional video, thereby realizing the presentation of three-dimensional video.
• An embodiment of the present invention provides a video transmitting apparatus, as shown in FIG. 22, including: an image acquisition and processing unit 221, configured to acquire color/grayscale images of at least two viewpoints of the video object to be presented, and to synthesize the color/grayscale images of at least two viewpoints of the video object with a color/grayscale image of at least one viewpoint of at least one presentation material to obtain the presentation image; an encoding unit 222, configured to encode the presentation image to obtain an encoded image; and a sending unit 223, configured to send the encoded image.
• With this apparatus, a presentation image is generated from the acquired multi-view color/grayscale images of the video object and the color/grayscale image of the presentation material; the presentation image supports a three-dimensional display mode and is displayed in three dimensions, which solves the problem that the prior art can only support the presentation of two-dimensional video, thereby realizing the presentation of three-dimensional video.
  • the embodiment of the present invention provides a video communication system, as shown in FIG. 23, including: a video transmitting device 231 and a video receiving device 232.
  • the video transmitting device 231 includes:
  • An image acquisition and processing unit 2311 configured to acquire a color/grayscale image of at least two viewpoints of the video object to be presented and a color/grayscale image of at least one viewpoint of the at least one presentation material; at least two of the video objects Combining a color/grayscale image of the viewpoints with a color/grayscale image of the at least one viewpoint of the at least one rendering material to obtain a rendered image;
• an encoding unit 2312, configured to encode the presentation image to obtain an encoded image; and a sending unit 2313, configured to send the encoded image;
  • the video receiving device 232 includes:
  • a receiving unit 2321 configured to receive the encoded image
  • the decoding unit 2322 is configured to decode the acquired encoded image to obtain the presented image
  • the display unit 2323 is configured to display the rendered image in a three-dimensional manner.
• Further, when implementing telepresence, the video communication system can also display the video image from the network receiving end three-dimensionally at the transmitting end; for this purpose, the transmitting end and the receiving end may have the same video image processing and display functions.
• In this case, the video transmitting device 231 further includes:
  • a second decoding unit configured to decode the acquired encoded image to obtain a decoded rendered image
  • a second display unit configured to perform three-dimensional display on the presented image.
• and the video receiving device 232 further includes:
  • a second image acquisition and processing unit configured to acquire a color/grayscale image of at least two viewpoints of the video object to be presented and a color/grayscale image of at least one viewpoint of the at least one presentation material; Combining a color/grayscale image of two viewpoints with a color/grayscale image of at least one viewpoint of the at least one rendering material to obtain a rendered image;
  • a second coding unit configured to encode the presented image, obtain a coded image, and send the image.
• The technical solution provided by the embodiment of the present invention generates a presentation image from the acquired multi-view color/grayscale images of the video object and the color/grayscale image of the presentation material; the presentation image supports a three-dimensional display mode and is displayed in three dimensions, which solves the problem that the prior art can only support the presentation of two-dimensional video, thereby realizing the presentation of three-dimensional video.
  • a presentation image including a video object and a presentation material may be encoded at a transmitting end of the network for transmission over a network.
• At the receiving end of the network, the decoding unit decodes the received data and the display unit then presents the decoded image directly. The system is not limited to this, however: optionally, the transmitting end encodes only the depth information and the color/grayscale images of the video object and the presentation material for transmission over the network.
• In that case, at the receiving end the decoded images are combined with the stereoscopic image of the presentation material to obtain the presentation image, which is then presented by the display unit.
  • the video communication system provided by the system embodiment of the present invention can be compatible with two-dimensional rendering and three-dimensional rendering, and can display the decoded image in two-dimensional or three-dimensional manner.
• The technical solution provided by the system embodiment of the present invention generates a presentation image from the acquired multi-view color/grayscale images of the video object and the color/grayscale image of the presentation material; the presentation image supports a three-dimensional display mode and is displayed in three dimensions, which solves the problem that the prior art can only support the presentation of two-dimensional video, thereby realizing the presentation of three-dimensional video.

Description

视频通讯方法、 装置及系统 技术领域
本发明涉及视频通讯领域, 特别涉及一种视频通讯方法, 以及采用该视 频通讯方法的装置和系统。
背景技术
随着通讯技术的不断发展, 视频通讯技术已经得到较为广泛的运用, 例 如: 视频电话、 视频会议等均运用了视频通讯技术。 目前的各种视频通讯应 用主要采用传统的二维图像或视频。
目前对于图像内容的目标提取, 主要采用 Chroma key (色度编码)方法 通过颜色分割提取出视频中的前景目标。 在视频通讯中, 将提取出的前景目 标与其它远端视频合成从而增加真实感。 例如: 将视频中的人 (前景目标 ) 和远端的幻灯片讲稿合成。 但是 Chroma key方法存在如下缺陷:
1、 Chroma key方法要求被分割视频的背景采用蓝色、 绿色或其它单一颜 色, 这样才能实现前景目标和背景的分割, 并且要求前景中不能出现背景中 的颜色, 由于该方法对背景和前景的颜色要求严格, 所以导致使用不便。
2、 Chroma key方法仅能区分出前景目标和背景, 而不能将场景内容分成 更多的层次, 无法实现前景中部分目标的替换, 如在会议场景中, 人的前方 可能存在桌子, 如果将对方的桌子替换成本地的桌子, 则可以增加真实感, 但是 Chroma key方法不能将对方的桌子替换成本地的桌子, 不能进一步提高 真实感。
3、该技术仅实现了二维视频内容的替换,无法使用户体验到场景的深度, 缺乏真实感。
上述二维图像或视频只能表现出景物的内容, 不能反映出景物的远近、 位置等深度信息。
人类习惯使用两只眼睛来观察世界, 由于双目视差的存在, 使得观察到 的景物具有较好的远近、 位置感知, 能够体现出景物的立体感。 立体视频技 术基于双目视差原理, 通过给人的左右眼显示略有差异的场景内容, 从而使 人获得场景的纵深感和层次感。
为了增加视频通讯的真实感, 现有技术运用立体视频技术, 并对通讯双 方场景的特别装饰, 使用户感觉通讯双方处于同一场景中, 以增加真实感。 例如: 将通讯双方的室内环境布置成一样的, 这样在通讯过程中, 用户看到 视频中的对方, 就像对方就处在和自己一样的场景中, 但该方法应用范围受 到双方环境布置的限制。
现有技术还提供了一种交互式呈现系统, 该系统主要包括: 一个活动的 红外摄像机、 和与红外摄像机相连的命令识别单元和合成单元。 现有技术中 的呈现系统, 首先利用活动的红外摄像机拍摄视频对象的图像, 获取视频对 象的信息; 然后, 命令识别单元, 将该信息转化为输出命令信号, 并发送给 合成单元; 其中, 来自视频对象的信息可以包括红外摄像机拍摄得到识别视 频对象手势的红外图像, 或者接收到的视频对象的语音; 最后, 合成单元, 将红外摄像机拍摄得到的视频对象的图像和呈现材料的图像进行合成, 控制 视频对象的位置, 并且根据命令识别系统的输出命令信号控制呈现材料的屏 幕显示。
但是, 现有技术提供的呈现系统只能支持二维视频的显示方式, 不支持 以三维视频的方式进行显示。
发明内容
本发明的实施例提供一种视频通讯方法、 设备及系统, 不受通讯双方的 环境布置的限制, 增加通讯双方在通讯过程中的真实感, 并将场景内容以三 维视频的方式进行显示。
本发明的实施例采用如下技术方案:
一种视频预处理方法, 包括:
获取本地场景内容及其深度值; 根据所述本地场景内容的深度值从所述本地场景内容中分割出本地目标 内容。
一种视频预处理装置, 包括:
信息获取模块, 用于获取本地场景内容及其深度值;
分割模块, 用于根据所述本地场景内容深度值, 从所述本地场景内容中 分割出本地目标内容。
一种视频接收方法, 包括: 获取本地背景内容和所述本地背景内容的深度值;
根据所述远端目标内容的深度值和所述本地背景内容的深度值将所述远 端目标内容和所述本地背景内容合成场景内容。
一种视频接收装置, 包括:
传输接口模块, 用于接收远端发送的远端目标内容及所述远端目标内容 深度值;
提取模块, 用于获取本地背景内容及所述本地背景内容的深度值; 合成模块, 用于根据所述远端目标内容的深度值和所述本地背景内容的 深度值将所述远端目标内容和所述本地背景内容合成场景内容。
一种视频通讯系统, 包括发送端和接收端:
发送端, 用于获取发送端场景内容及所述发送端场景内容的深度值, 根 据所述发送端场景内容深度值, 从所述发送端场景内容中分割出发送端目标 接收端, 用于接收所述发送端发送的所述发送端目标内容及所述发送端 目标内容的深度值, 并获取接收端背景内容及所述接收端背景内容的深度值, 根据所述发送端目标内容的深度值和所述接收端背景内容的深度值将所述发 送端目标内容和所述接收端背景内容合成场景内容。
一种视频处理方法, 包括: 获取要呈现的视频对象的所述至少两个视点的彩色 /灰度图像和至少一 个呈现材料的至少一个视点的彩色 /灰度图像;
将所述视频对象的所述至少两个视点的彩色 /灰度图像和所述至少一个 呈现材料的所述至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像; 将所述呈现图像以三维的方式进行显示。
一种视频发送方法, 包括:
获取要呈现的视频对象的至少两个视点的彩色 /灰度图像和至少一个呈 现材料的至少一个视点的彩色 /灰度图像;
将所述视频对象的至少两个视点的彩色 /灰度图像和所述至少一个呈现 材料的至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像;
将所述呈现图像进行编码并发送。
一种视频发送方法, 包括:
获取要呈现的视频对象的至少两个视点的彩色 /灰度图像和至少一个呈 现材料的至少一个视点的彩色 /灰度图像;
将所述要呈现的视频对象的至少两个视点的彩色 /灰度图像和所述至少 一个呈现材料的至少一个视点的彩色 /灰度图像进行编码并发送。
一种视频接收方法, 包括:
获取编码图像;
对所述编码图像进行解码, 得到要呈现的视频对象的至少两个视点的彩 色 /灰度图像和至少一个呈现材料的至少一个视点的彩色 /灰度图像;
将所述视频对象的至少两个视点的彩色 /灰度图像和所述至少一个呈现 材料的至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像;
将所述呈现图像以三维的方式进行显示。
一种视频处理装置, 包括:
图像获取和处理单元, 用于获取要呈现的视频对象的至少两个视点的彩 色 /灰度图像和至少一个呈现材料的至少一个视点的彩色 /灰度图像; 将所述 视频对象的至少两个视点的彩色 /灰度图像和所述至少一个呈现材料的至少 一个视点的彩色 /灰度图像进行合成, 获取呈现图像;
显示单元, 用于将所述呈现图像, 以三维的方式进行显示。
一种视频发送装置, 包括:
图像获取单元, 用于获取要呈现的视频对象的至少两个视点的彩色 /灰度 图像和至少一个呈现材料的至少一个视点的彩色 /灰度图像;
编码单元, 用于将所述要呈现的视频对象的至少两个视点的彩色 /灰度图 像和所述至少一个呈现材料的至少一个视点的彩色 /灰度图像进行编码, 得到 编码图像;
发送单元, 用于发送所述编码图像。
一种视频接收装置, 包括:
接收单元, 用于接收编码图像;
解码单元, 用于对获取到的所述编码图像进行解码, 获取要呈现的视频 对象的至少两个视点的彩色 /灰度图像和至少一个呈现材料的至少一个视点 的彩色 /灰度图像;
合成单元, 用于将所述视频对象的至少两个视点的彩色 /灰度图像和所述 至少一个呈现材料的至少一个视点的彩色 /灰度图像进行合成, 获取呈现图 像;
显示单元, 用于将所述呈现图像, 以三维的方式进行显示。
一种视频发送装置, 包括:
图像获取和处理单元, 用于获取要呈现的视频对象的至少两个视点的彩 色 /灰度图像; 将所述视频对象的至少两个视点的彩色 /灰度图像和至少一个 呈现材料的至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像;
编码单元, 用于将所述呈现图像进行编码, 得到编码图像;
发送单元, 用于发送所述编码图像。
一种视频通信系统, 包括: 视频发送装置和视频接收装置, 所述视频发送装置包括:
图像获取和处理单元, 用于获取要呈现的视频对象的至少两个视点的彩 色 /灰度图像和至少一个呈现材料的至少一个视点的彩色 /灰度图像; 将所述 视频对象的至少两个视点的彩色 /灰度图像和所述至少一个呈现材料的至少 一个视点的彩色 /灰度图像进行合成, 获取呈现图像;
编码单元, 用于将要所述呈现图像进行编码, 得到编码图像;
发送单元, 用于发送所述编码图像;
所述视频接收装置包括:
接收单元, 用于接收所述编码图像;
解码单元, 用于对获取到的编码图像进行解码, 获取所述呈现图像; 显示单元, 用于将所述呈现图像, 以三维的方式进行显示。
由上述技术方案所描述的本发明实施例, 在视频通讯过程中, 本地显示 的画面需要通过本地的背景内容和远端的目标内容合成, 使得用户看到画面 中的背景和自己当前所处的场景完全相同, 就像是通讯双方所处的环境是一 样的, 能够增加用户通讯过程中的真实感。 并且由于本实施例中并不需要对 通讯双方的场景进行特殊布置, 允许通讯双方所处的环境不同, 也不需要将 背景换成单一颜色, 故而在实施本发明实施例时, 不会受到通讯双方环境的 限制, 可增加通讯过程中的真实感。 并且本发明实施例通过获取的视频对象 的多视点彩色 /灰度图像及呈现材料的彩色 /灰度图像, 生成呈现图像, 该呈 现图像支持三维的显示方式, 再将该呈现图像以三维的方式显示出来, 解决 了现有技术只能支持二维视频的呈现带来的问题, 从而实现了三维视频的呈 现。
附图说明
图 1为本发明第一实施例视频预处理方法的流程图;
图 2为本发明第一实施例视频预处理装置的框图;
图 3为本发明第一实施例视频接收方法的流程图; 图 4为本发明第一实施例视频接收装置的框图;
图 5为本发明第一实施例视频通讯设备的原理图;
图 6为本发明第二实施例视频通讯设备的原理图
图 7为本发明第二实施例中采用的立体摄像机的原理图;
图 8为本发明第二实施例中合成场景内容过程的示意图;
图 9为本发明第三实施例视频通讯流程图;
图 10为本发明第三实施例视频通讯系统的结构图;
图 11为本发明方法实施例四提供的视频预处理方法流程图; 图 1 3为本发明方法实施例五提供的获取深度信息的摄像机原理示意图; 图 14 为本发明方法实施例五提供的获取深度信息和彩色 /灰度图像的摄 像机构造示意图;
图 15为本发明方法实施例五提供的出现遮挡时的示意图;
图 16为本发明方法实施例五提供的出现空洞时的示意图;
图 17为本发明方法实施例六提供的视频呈现的发送方法流程图; 图 18为本发明方法实施例六提供的视频呈现的接收方法流程图; 图 19为本发明装置实施例四提供的视频的呈现装置示意图;
图 20为本发明装置实施例五提供的视频呈现的发送装置示意图; 图 21为本发明装置实施例五提供的视频呈现的接收装置示意图; 图 22为本发明装置实施例六提供的视频呈现的发送装置示意图; 图 23为本发明系统实施例提供的通信系统示意图。
具体实施方式
本发明实施例将本地的背景内容和远端的目标内容合成一个画面进行显 示, 使得通讯双方不需要对场景进行特殊布置, 即可让画面中的场景和自身 所处场景相同, 增加通讯过程中的真实感, 并能将场景内容以三维视频的方 式进行显示。 下面结合附图对本发明视频通讯方法、 装置及设备的实施例进 行详细描述。
实施例 1 :
本实施例提供一种视频预处理方法, 如图 1 所示, 该视频预处理方法包 括如下步骤:
101、 通过深度摄像机或者立体摄像机来获取本地场景内容及其深度值。
102、 由本地场景内容的深度值可以将本地场景内容分成两个以上层次, 这样就可以将本地目标内容所在的层次分割出来, 即从本地场景内容中分割 出本地目标内容。
103、 将分割出的本地目标内容, 以及本地目标内容对应的深度值发送到 远端, 一般需要发送到通讯的对端。
本实施例中主要通过步骤 101和步骤 102完成对图像的预处理,步骤 103 是一个将预处理内容发送出去的步骤, 可以省略。
对应于上述视频预处理方法, 本实施例还提供一种视频预处理装置, 如 图 2所示, 该视频预处理装置包括: 信息获取模块 21、 分割模块 22和发送模 块 23。
其中, 信息获取模块 21用于获取本地场景内容及其深度值, 所述信息获 取模块可以通过深度摄像机或者立体摄像机来实现, 其中的深度摄像机采用 红外技术获取图像的深度, 而立体摄像机采用双摄像头来获取图像的深度。 得到本地场景内容的深度值后, 就可以将本地场景内容分成两个以上层次, 分割模块 22用于依据本地场景内容深度值, 从本地场景内容中分割出本地目 标内容。 发送模块 23用于将所述本地目标内容及其深度值发送到远端。
该视频预处理装置中主要通过信息获取模块 21和分割模块 22完成视频 预处理, 其中的发送模块 23可以省略。
为了能够完成视频通讯, 本实施例还提供一种与上述视频预处理方法对 应的视频接收方法, 如图 3所示, 该视频接收方法包括如下步骤:
301、 接收远端发送的目标内容及其深度值。 302、 获取本地的背景内容和背景内容的深度值。
303、根据深度值的不同,确定本地背景内容和远端目标内容的遮挡关系, 一般为深度值小的像素挡住深度值大的像素, 这样即可根据深度值的关系将 远端目标内容和本地背景内容合成场景内容。
对应于上述视频接收方法, 本实施例还提供一种视频接收装置, 如图 4 所示, 该视频接收装置包括: 传输接口模块 41、 提取模块 42和合成模块 43。
其中, 传输接口模块 41用于接收远端发送的目标内容及其深度值; 提取 模块 42用于获取本地的背景内容及其深度值; 合成模块 43用于根据深度值 的关系将远端目标内容和本地背景内容合成场景内容, 一般情况下是深度值 小的像素挡住深度值大的像素; 最后通过显示器等设备显示合成后的场景内 容。
如图 5 所示, 本发明实施例还提供一种视频通讯设备, 具体包括: 信息 获取模块 51、 分割模块 52、 传输接口模块 53、 提取模块 54和合成模块 55。
其中, 信息获取模块 51用于获取本地场景内容及其深度值, 所述信息获 取模块 51可以通过深度摄像机或者立体摄像机来实现, 其中的深度摄像机采 用红外技术获取图像的深度, 而立体摄像机采用双摄像头来获取图像的深度。 分割模块 52用于依据本地场景内容深度值, 从本地场景内容中分割出本地目 标内容。 传输接口模块 53用于将所述本地目标内容及其深度值发送到远端。
所述传输接口模块 53还用于接收远端发送的目标内容及其深度值, 提取 模块 54用于获取本地的背景内容及其深度值; 合成模块 55用于根据深度值 的关系将远端目标内容和本地背景内容合成场景内容, 一般情况下是深度值 小的像素挡住深度值大的像素; 最后通过显示模块显示合成后的场景内容。
其中本地的背景内容可以为分割模块 54分割出本地目标内容后的剩余内 容, 也可以通过另一个摄像机获取本地目标对面的背景内容及其深度值。
如果让本实施例中的视频预处理装置和视频接收装置之间进行通讯, 比 方说均接入到同一个网络, 这样就够成一个视频通讯系统, 该系统的发送端 包括图 2的视频预处理装置, 接收端包括图 4中的视频接收装置。 实施例 2 :
本实施例提供一种视频通讯设备, 该设备将本地场景内容中的本地目标 内容, 以及本地目标内容对应的深度值发送到对端设备, 对端设备在接收到 本地目标内容后, 将所述本地的目标内容和对端的背景内容合成一幅场景, 并显示给对端的用户。 这样可以确保对端的用户所看到的场景和对端用户自 身所处的场景完全一样, 比较具有临场感和真实感。 本地的视频通讯设备在 接收到远端的目标内容后, 将远端目标内容和本地背景内容合成一幅场景, 并显示给本地的用户, 以提高本地用户在通讯过程中的临场感和真实感。
如图 6所示,该视频通讯设备主要包括: 信息获取模块 61、分割模块 62、 编码模块 63、 传输接口模块 64、 解码模块 65、 合成模块 66和显示模块 67。
其中, 信息获取模块 61用于实现对本地场景内容的拍摄, 以及本地场景 内容对应深度值的计算, 或者直接获取本地场景内容对应的深度值; 分割模 块 62 用于根据深度值从本地场景内容中分割出本地目标内容; 编码模块 63 用于对分割出的本地目标内容及其对应的深度值进行编码; 传输接口模块 64 用来发送本地目标内容及其深度值, 或者接收远端发送的目标内容及其深度 值; 解码模块 65用于实现对接收到的远端目标内容及其深度值的解码; 合成 模块 66用于将解码得到的远端目标内容与本地背景内容融合, 根据对应的深 度值生成立体视图, 其中的本地背景内容可以是本地场景内容中分割出本地 目标内容后的剩余内容, 也可以是采用另一组摄像机拍摄的本地目标对面的 场景内容; 显示模块 67用于实现对和成图像的显示, 可以是立体显示设备或 普通二维显示设备, 如果是立体显示设备, 则需要重构一幅另一个视点的二 维图像。
下面分别对本实施例视频通讯设备中的各个模块做详细介绍。
信息获取模块 61可以有以下两种实现方式: 一、 采用深度摄像机同时得 到本地场景内容及其深度值; 二、 采用两台以上摄像机拍摄本地场景内容, 通过立体图像匹配方法得到对应的深度值。
深度摄像机 ( Depth Camera )是一种新型摄像机, 深度摄像机可以在拍 摄 RGB ( red green blue, 红绿蓝)彩色图像的同时获取彩色图像中每个像素 对应的深度值。 目前的深度摄像机主要采用红外方式来获取场景中目标的深 度值。
通过立体图像匹配方法得到对应的深度值的方法, 要求在图像采集时采 用两台以上摄像机拍摄场景, 得到场景不同角度的两幅以上图像, 通过对图 像进行匹配, 可以获得场景在不同图像上的视差, 根据视差和摄像机的内外 参数, 即可计算得到图像中每个像素对应的深度值。 以下将以两台摄像机为 例对图像匹配方式获取深度值进行说明。
如图 7所示为水平放置的两台平行摄像机成像示意图, 其中 01和 02分 别为两个摄像机光心, 其距离为 B, 点 A到摄像机的垂直点 0的距离为 Z (即 点 A的深度), A1和 A2分别为点 A在两个摄像机的成像点。
由三角形 A1 O1 O1' 和三角形 A O1 C 相似可得: A1O1'/CO1 = f/Z;
由三角形 A2 O2 O2' 和三角形 A O2 C 相似可得: A2O2'/CO2 = f/Z;
故推出两个成像点的视差为: d = A1O1' − A2O2' = f*(CO1 − CO2)/Z = f*B/Z。 所以, 可以得到点 A 的深度值 Z = f*B/d。
由于 f 为焦距已知, B可以测量出来, d可以通过图像匹配的方法计算得 到, 所以, 采用两台摄像机可以获取到场景中每个点对应的深度值。
通过立体摄像机来获取深度信息包括: 找到场景中某一点在两幅以上图 像中对应的成像点, 然后再根据该点在两幅以上图像中的坐标求出其深度值。 找到场景中某一点在不同图像中对应成像点的过程由图像匹配完成。 目前的 图像匹配技术主要包括: 基于窗口的匹配、 基于特征的匹配和动态规划法等。 其中, 基于窗口的匹配和动态规划法都采用了基于灰度的匹配算法。 基 于灰度的算法是将其中一个图像分割成两个以上小的子区域, 以其灰度值作 为模版在其它图像中找到和其最相似灰度值分布的子区域, 如果两个子区域 满足灰度值分布的相似性要求, 我们可以认为子区域中的点是匹配的, 即这 两个子区域的成像点是场景中同一点的成像。 在匹配过程中, 通常使用相关 函数衡量两个区域的相似性。
基于特征的匹配没有直接利用图像的灰度, 而是利用由图像灰度信息导 出的特征进行匹配, 相比利用简单的亮度和灰度变化信息更加稳定。 匹配特 征可以认为是潜在的能够描述场景 3D结构重要特征,如边缘和边缘的交点(角 点)。 基于特征的匹配一般先得到稀疏的深度信息图, 然后利用内插值等方法 得到图像的密集深度信息图。
分割模块 62根据本地场景内容及其对应的深度值, 对图像进行分割得到 本地场景中的本地目标内容。 分割模块 62可以通过查找单元 621和分割单元 622来实现,查找单元 621用于查找本地目标内容在本地场景内容中出现的区 域, 分割单元 622 用于在本地场景内容中对本地目标内容的区域进行准确的 边缘轮廓提取, 分割得到本地目标内容和其它本地背景内容。
一般来说, 本地目标内容在本地场景中出现的区域, 可以由本地用户估 计本地目标相对摄像机的位置后, 设定本地目标内容出现的深度值范围, 在 后续的视频处理中, 由查找单元在该深度值范围内查找目标内容出现的区域。
如果需要查找的本地目标内容为一个人物形象, 那么可以采用现有的人 脸识别技术, 通过人脸识别单元 623从本地场景内容中自动识别出人脸图像 出现的位置, 然后由查找单元 621 在本地场景内容的深度值中查找所述人脸 图像位置对应的深度值, 然后根据查找到的深度值确定本地目标内容深度值 的范围, 并根据所述深度值的范围确定本地目标内容在场景内容中的区域。 从而确定人物目标在场景中出现的深度范围。
由于深度值适合和彩色图像相对应的, 根据深度值分割出的人物区域和 从彩色图像中的人物区域相对应。 得到的彩色图像的本地目标内容及其深度 值后将被发送到编码模块 63 , 编码模块 63对其编码后通过传输接口模块 64 发送到远端。
由于从提取到的本地目标内容的大小不一样, 需要将这些本地目标内容 调整到同一大小, 一般是将这些本地目标内容调整到与本地场景内容一样的 大小, 从而对每一帧得到相同大小的待编码图像, 便于编码。 这种调整不会 对本地目标内容本身进行缩放, 只是改变了本地目标内容所使用画布的大小。 对于调整大小后出现的空白区域, 可以采用 0值填充。
本实施例中的编码模块 63对分割出来的本地目标内容及其深度值进行编 码。 相比单通道的二维视频, 立体视频具有大得多的数据量: 双目立体视频 具有两个数据通道。 视频数据的增加给其存储和传输都带来了困难。 目前立 体视频编码主要也可以分为两类: 基于块的编码和基于对象的编码。 在立体 图像的编码中, 除了帧内预测和帧间预测消除空域和时域上的数据冗余度外, 还必须消除多通道图像之间的空域数据冗余性。 视差 (Para l lax )估计与补 偿是立体视频编码中的一项关键技术, 用于消除多通道图像间的空域冗余度。 视差估计补偿的核心是找到两幅 (或三幅以上) 图像间的相关性。 此处的立 体视频编码内容包括彩色图像及其对应的深度值, 可以采用分层编码, 即将 彩色图像混合编码放入基本层, 深度值混合编码后放入增强层。
本实施例中的传输接口模块 64 用于发送编码后本地目标内容及其深度 值, 并接收远端传输的编码后的远端目标内容及其深度值, 送到送到解码模 块进行解码处理。 本实施例中的传输接口模块 64可以是能够实现传输的各种 有线或无线接口, 例如: 宽带接口, 蓝牙接口、 红外接口或者采用手机的移 动通信网的接入技术。 本实施例中传输接口模块只需要传输本地场景内容中 的本地目标及其深度值, 相对于原有的本地场景内容而言, 其数据量有所减 少, 可以减小数据传输时的带宽占用率
本实施例视频通讯设备的传输接口模块 64接收到远端目标内容及其深度 值后, 需要进行处理才能显示。 解码模块 65用于对接收的远端数据进行解码, 得到远端的目标内容及其 对应的深度值。
合成模块 66用于根据深度值将解码得到的远端目标内容与本地背景内容 进行融合, 得到合成的远端目标内容与本地背景融合后的彩色图像, 以及对 应的深度值, 其中本地背景内容由提取模块 69完成。 合成过程中先要根据远 端目标内容的深度值与本地背景内容的深度值确定遮挡关系, 然后按照遮挡 关系合成对应彩色图像内容。 当显示模块 67为三维立体显示设备, 需进一步 根据合成彩色图像内容和对应的深度值重构另一视点的虚拟图像, 故而本实 施例中还可以包括视图重构模块 68 ,用于对合成后的图像内容进行视图重构, 生成一个虚拟视点图像, 该虚拟视点图像和合成彩色图像即构成立体视图, 发送到三维立体显示设备实现立体显示。
如图 8所示, 给出了接收到的远端目标内容(人物), 并示意出该远端目 标内容的深度, 以及本地采用深度摄像机方式获取的本地背景内容(树和桌 子), 并示意出该本地背景内容的深度, 然后根据其中的深度关系进行合成, 得到合成的场景。 由于获得了远端目标内容和本地背景内容相对摄像机的距 离, 可以将远端人物插入到本地的桌子和树之间。
为了能够让合成的图像更加逼真, 需要解决如下问题:
( 1 )远端目标内容的缩放问题。 为了使远端人物与本地背景内容完美融 合, 可能需要通过缩放单元 661 调整远端目标内容相对摄像机的位置, 这时 需要同时对远端目标内容大小进行缩放。 当需要把远端目标内容拉到更近的 距离时, 即减小深度值时, 需要对远端目标内容进行放大; 当把远端目标内 容安排在更远的距离时, 即增大深度值时, 需要对其远端目标内容进行缩小。 由于远端目标内容是单个目标, 其深度变化的范围有限, 在进行图像缩放时 可以将透视关系的缩放简化为与其深度一致的线性缩放。
( 2 )远端目标内容和本地背景内容之间的相互遮挡问题。 在对远端目标 内容与本地背景内容融合时, 通过合成单元 662 需要考虑其相互遮挡问题。 根据缩放后的远端目标内容的深度值以及本地背景内容的深度值, 确定缩放 后的远端目标内容和本地背景内容的遮挡关系, 当像素点的水平和垂直位置 重合时, 深度值小的像素点遮挡深度值大的点(近景遮挡远景), 按照缩放后 的远端目标内容和本地背景内容的遮挡关系, 将缩放后的远端目标内容和本 地背景内容合成场景内容。
( 3 )空洞填充问题。 在去除了本地目标内容后得到的本地背景内容可能 存在空洞, 使其与远端目标内容融合后仍可能存在空洞。 在此有两种解决方 式:
第一种为使用另一组摄像机采集拍摄本地目标对面的场景内容, 一般为 人所看见的场景内容, 在合成时, 采用该场景内容直接和远端目标内容合成, 该方式效果较好, 即人所看见的背景与远端人物融合, 由于直接使用对面的 场景, 不存在空洞填补问题, 但是需要在视频通讯的每一端增加一组摄像机。
另一种解决方案为使用剔除本地目标内容后剩下的本地背景内容, 对于 可能出现的空洞, 采用边缘像素填充的方法进行填充。
当本实施例视频通讯设备采用三维立体显示设备时, 并且显示设备仅支 持左右图像输入方式显示时, 需要重构另一幅图像, 从而实现立体显示。 有 些自动立体显示器支持一幅二维彩色图像及其对应的深度值进行三维立体显 示, 这样就不需要重构另一幅图像了, 而是由自动立体显示器自身完成另一 幅图像的重构, 并且在重构过程中完成相应的空洞填充, 如 hi l ips 的立体 显示器。
视图重构也称为虚拟视点图像合成, 一般指从模型或不同角度的图像重 构其它视角的图像。 本实施例通过视图重构模块 68来实现, 当已知图像的深 度时, 可以根据以下公式计算虚拟视点与已知视图之间的视差:
d = A1O1' − A2O2' = f*(CO1 − CO2)/Z = f*B/Z。
其中, d 为虚拟视点视图与已知视图之间的视差, f 为摄像机的焦距, B 为虚拟视点与原摄像点之间的距离, Z 为图像的深度。 当基于合成图像及其深度重构其右边的图像时, 右边图像中某条扫描线上 xr 处像素的颜色由左图像(合成图像)中对应扫描线 X' 处像素的颜色确定, 其中 X' 的坐标由下式确定:
X' = xr + d = xr + f*B/Z。
在根据以上公式确定合成视图内容时, 由于存在遮挡问题而导致右图中的某些点无法在左图中找到对应的点, 即存在空洞问题, 同样采用空洞边缘的像素点对其进行填充, 填充可以采用双线性插值方式进行。
本实施例中的显示模块用来显示对合成后的图像。 该显示模块 67可以是 立体显示设备包括自动立体显示设备, 立体眼镜和全息显示设备三维立体显 示等, 实现立体图像的立体显示, 可以让用户体验到场景的深度, 感受到立 体效果。 当需要进行立体显示时, 一般需要完成上述的视图重构和空洞填充。 本实施例显示模块也可以是普通二维显示设备, 仅显示二维合成图像, 当只 需要显示二维图像, 则不需要进行视图重构, 直接显示合成后的二维图像。
实施例 3:
本实施例为视频通信系统中的一个通讯过程实例, 具体为两个用户( A和 B )通过实施例 2中的视频通讯设备进行通讯, 其通讯过程中用户 A向用户 B 发送视频数据, 以及用户 B接收用户 A的视频数据的全过程, 该视频通讯系 统的结构如图 10所示, 包括发送端和接收端,发送端和接收端通过网络连接。
所述发送端, 用于获取发送端的场景内容及其深度值, 根据发送端的场 景内容深度值, 从发送端的场景内容中分割出发送端的目标内容, 并将所述 发送端的目标内容及其深度值发送到接收端; 所述发送端包括: 信息获取模 块 1001 , 用于实现对本地场景内容的拍摄, 以及本地场景内容对应深度值的 计算, 或者直接获取本地场景内容对应深度值; 分割模块 1002 , 用于根据深 度值从本地场景内容中分割出本地目标内容; 编码模块 1003 , 用于对分割出 的本地目标内容及其对应的深度值进行编码; 传输接口模块 1004用来将本地 目标内容及其深度值发送到接收端。
所述接收端用于接收发送端发送的目标内容及其深度值, 并获取接收端 的背景内容及其深度值, 根据深度值将发送端的目标内容和接收端的背景内 容合成场景内容。 所述接收端包括传输接口模块 1 005 , 用来接收远端发送的 目标内容及其深度值; 解码模块 1 006 , 用于实现对接收到的远端目标内容及 其深度值的解码; 合成模块 1 007 , 用于将解码得到的远端目标内容与本地背 景内容融合, 根据对应的深度值生成立体视图, 其中的本地背景内容可以是 本地场景内容中分割出本地目标内容后的剩余内容, 通过提取模块 1 01 0提取 该剩余内容; 本地背景内容也可以是采用另一组摄像机拍摄的本地目标对面 的场景内容; 显示模块 1 009 , 用于实现对和成图像的显示, 可以是立体显示 设备或普通二维显示设备, 如果是立体显示设备, 则需要重构一幅另一个视 点的二维图像。重构另一个视点二维图像可以通过视图重构模块 1 008来完成。
其通信过程如图 9所示, 具体包括如下步骤:
901、 用户 A的视频通讯设备的信息获取模块获取本地场景内容及其深度 值; 可以通过深度摄像机或者立体摄像机获取本地场景内容以及场景内容的 深度值。 深度摄像机通过红外线可以直接获取深度; 而立体摄像机一般通过 两个平行的摄像机获取场景内容, 然后计算出该场景内容中每个像素的深度 值, 计算公式为: Z=fB/ A x ; 其中 f 为焦距, B为两个摄像机的距离, Δ χ为 每个像素在两个摄像机中的位置差异。
902、 用户 Α的视频通讯设备的分割模块从本地场景内容中分割出本地目 标内容, 具体为: 由分割模块中的人脸识别单元对拍摄到的本地场景内容进 行人脸识别得到人脸图像的位置, 然后由分割模块中的查找单元在本地场景 内容的深度值中查找所述人脸图像位置对应的深度值, 并根据查找到的深度 值确定拍摄到的图片中人物深度值的范围。 这样就可以确定本地目标内容在 场景内容中的区域, 最后由分割模块中的分割单元根据确定的区域从本地场 景内容中分割出人物目标。 903、 在分割得出本地人物目标后, 可以保存所述本地场景内容分割出本 地人物目标后的剩余内容, 及剩余内容的深度值; 也可通过另一个摄像机同 时获取人物目标对面的背景内容及其深度值, 并保存。
904、 将为了统一本地人物目标尺寸, 需要将本地人物目标扩大到原采集 图片的大小, 或者裁剪成其他尺寸的图片; 剪裁后产生的空洞区域可以填充 成 0值。
905、 分别对步骤 904中所得到本地人物目标及其深度值进行编码, 最好 使用分层编码, 采用分层编码需要传输的数据量较少。
906、 将编码后的所述本地人物目标及其深度值, 通过传输接口模块发送 到用户 B的视频通讯设备。
以上步骤完成了用户 A的发送操作, 以下步骤为用户 B接收数据及其对 数据的处理过程。
907、 用户 B的视频通讯设备通过传输接口模块, 接收到用户 A发送的人 物目标及其深度值。
908、 用户 B的视频通讯设备通过解码模块对接收到的数据解码, 得到用 户 A的人物目标及其深度值。 同时用户 B的视频通讯设备还需要获取背景内 容和背景内容的深度值, 一般情况下, 可以将本地场景内容中去除本地目标 后的剩余内容作为其背景内容。 如果通过另一个摄像机获取用户 B对面的背 景内容及其深度值, 会使用户 B看到的画面更真实, 并且在合成图像时不会 产生空洞问题。
909、 通过合成模块中缩放单元的对用户 A发送过来的人物目标及其深度 值进行缩放, 得到较为理想大小的人物目标, 当需要把远端目标内容拉到更 近的距离时, 即减小深度值时, 需要对远端目标内容进行放大; 当把远端目 标内容安排在更远的距离时, 即增大深度值时, 需要对其远端目标内容进行 缩小。
然后根据用户 A 的人物目标缩放后的深度值以及背景内容的深度值, 确 定远端人物目标和本地背景内容的遮挡关系, 遮挡原则为: 当像素点的水平 和垂直位置重合时, 深度值小的像素点遮挡深度值大的点 (近景遮挡远景)。
合成模块中的合成单元再按照上述确定遮挡关系将人物目标和背景内容 合成一幅场景内容。
如果背景内容是去除目标内容后的剩余内容, 则需要将合成场景内容中 的空洞进行像素填充; 如果背景内容是直接获取用户 B对面的场景, 则不用 进行像素填充。
910、 视图重构模块对所述合成的场景内容进行虚拟视点图像合成, 具体为根据以下公式计算虚拟视点与已知视图之间的视差:
d = A1O1' − A2O2' = f*(CO1 − CO2)/Z = f*B/Z。
其中, d 为虚拟视点视图与已知视图之间的视差, f 为摄像机的焦距, B 为虚拟视点与原摄像点之间的距离, Z 为图像的深度。
当基于合成图像及其深度重构其右边的图像时, 右边图像中某条扫描线上 xr 处像素的颜色由左图像(合成图像)中对应扫描线 X' 处像素的颜色确定, 其中 X' 的坐标由下式确定:
X' = xr + d = xr + f*B/Z。
在完成视图重构后, 需要对虚拟视点图像合成后的场景内容中的空洞进行像素填充。
91 1、 通过显示模块显示合成后的场景内容, 例如: 通过自动立体显示设 备、 立体眼镜或全息显示设备三维立体显示等, 实现立体图像的立体显示, 或者通过普通二维显示设备仅显示二维合成图像。
本实施例的视频通讯系统中, 用户 A 的设备还可以包括视频接收装置, 用户 B的设备还可以包括视频预处理装置, 以确保用户 B可以向用户 A发送 视频数据。 如果用户 B需要向用户 A发送视频数据, 其过程和图 9一样, 只 是发送方和接收方改变了。 本发明实施例主要用在视频通讯中, 例如: 一般 的视频聊天, 办公用的视频电话、 视频会议等。
实施例 4 :
本发明实施例提供了一种视频处理方法, 如图 11所示, 包括如下步骤: 步骤 111、 获取要呈现的视频对象的至少两个视点的彩色 /灰度图像和至 少一个呈现材料的至少一个视点的彩色 /灰度图像;
步骤 112、 将所述视频对象的至少两个视点的彩色 /灰度图像和所述至少 一个呈现材料的至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像; 步骤 113、 将所述呈现图像以三维的方式进行显示。
本发明实施例提供的技术方案, 通过获取的视频对象的多视点彩色 /灰度 图像及呈现材料的彩色 /灰度图像, 生成呈现图像, 该呈现图像支持三维的显 示方式, 再将该呈现图像以三维的方式显示出来, 解决了现有技术只能支持 二维视频的呈现带来的问题, 从而实现了三维视频的呈现。 例采用的立体视频(Stereo Video) / 三维(3D )视频技术进行简要说明。
传统的视频技术只能提供二维信息, 然而, 本发明实施例通过采用三维 视频技术, 使作为观看者的用户不仅可以获知关于景物内容的信息, 还可以 获知关于景物的远近、 位置等深度信息。
三维视频技术可以提供符合立体视觉原理的具有深度信息的画面, 能够 真实地重现客观世界景象, 表现出场景的纵深感、 层次感和真实性, 是当前 视频技术发展的重要方向。
三维视频技术的基本原理是模拟人眼成像原理, 采用双摄像机得到左眼 图像和右眼图像, 在呈现时使人的左右眼分别看到左右眼图像, 最后合成得 到具有立体感的图像, 使观察者能够感到场景的深度。 因此, 可以把双目立 体视频看成是在现有 2D视频上增加了对深度信息的扩展。
上述的获取要呈现的视频对象的至少两个视点的彩色 /灰度图像部分, 采 用了多视点视频 ( Mul t i-Viewpoint Video MVV ) /自由视点 ( Free Viewpoint Video , FVV )视频技术。 多视点视频技术的基本思想是由两个以上摄像机同 时拍摄场景, 例如, 体育或戏剧场景, 不同的摄像机的拍摄角度不同, 产生 两个以上视频流; 这些不同视点的视频流送到用户终端, 用户可以选择任意 的视点和方向观看场景。 上述的视点可以是预定义的固定摄像机的拍摄视点, 也可以是一个虚拟视点, 其图像由周围真实的摄像机拍摄的图像合成得到。
实施例五:
本发明实施例提供了一种视频发送方法, 如图 12所示, 包括如下步骤: 步骤 121、 获取要呈现的视频对象的至少两个视点的彩色 /灰度图像和至 少一个呈现材料的至少一个视点的彩色 /灰度图像;
上述要呈现的视频对象包括自然场景分割后的作为前景的人和物体等对 象, 也可以是背景对象; 上述呈现材料可以是要呈现的文档, 图片, 视频或 是由计算机生成的图形等内容。
本发明实施例提供的获取视频对象至少两个视点的彩色 /灰度图像的方 法包括如下步骤:
步骤 1210、 获取所述视频对象所处场景的同一视点的至少一个深度信息 和至少一个彩色 /灰度图像。
可以使用两台以上普通彩色摄像机, 通过基于图像匹配的方法获取上述 深度图像, 考虑到该方法算法复杂度高, 实时性能差等不足之处, 优选的, 本发明实施例通过一个能够采集场景深度信息的摄像机获取上述深度信息, 通过一个能够采集场景彩色 /灰度图像的摄像机获取彩色 /灰度图像。 下面对 这两种摄像机的工作原理进行简要说明。
如图 13 所示, 为利用配备超高速快门的电荷耦合器件(Charge Coup led Device , CCD)的摄像机和调强发光器获取深度图像的的原理示意图。 场景中 有物体 a和物体 b, 物体 a为方形物体, 物体 b为三角形物体。 图中 和 分 别为摄像机在快门打开时间内 (图中 c所示)获取到的物体 a和 b的光照强 度, 4和 分别表示强度递增的调制光和强度递减的调制光。 其中,较近物体 a上的反射光线发射到摄像机的瞬时的光照强度 被摄像 机探测装置的超高速快门检测到, 并得到在图像 A中的方形分布; 物体 b反 射光线得到图像 A中的三角形分布。 由于物体 a距离摄像机较近, 摄像机探 测到的瞬时的光照强度 比 强, 方形图像的亮度比三角形要亮, 因此, 可利 用捕获到的图像 A 的亮度的差异来检测物体的深度。 但是, 物体反射光的亮 度会受物体的反射率、 物体到摄像机的距离、 光源的调制指数和照度的空间 不均匀性等参数的影响。 此时, 可利用与上述光照强度空间分布呈线性递减 的方式获得图像 B, 将图像 A和图像 B相结合, 并通过信号处理算法可消除不 利的影响, 得到精确的深度图, 在该深度图中, 物体 b (图中的三角形物体) 的深度比物体 a (图中的方形物体)的深度大, 即在视觉上, 物体 a距离摄像 机较近, 物体 b距离摄像机较远。
可选的, 本发明实施例可采用上述摄像机获取深度信息, 再配置一台获 取彩色 /灰度图像的摄像机, 或者直接利用能够同时获取深度信息和彩色 /灰 度图像的摄像机。
如图 14所示, 为高清( High Def ini t ion, HD ) Axi -视觉( Axi_Vi s ion ) 摄像机的基本构造示意图, 该摄像机能同时获取深度信息和彩色 /灰度图像。 在 HD Ax i-Vi s ion摄像机系统中, 包括深度图像处理单元和彩色图像处理单 元。 近红外 LED 阵列用于调强发光器, 其具有快速直接调制的能力, 该近红 外 LED阵列的发射光的波长为 850 nm, 在可见光的范围之外, 不会干扰可见 光。 可采用 4个 LED阵列环绕在摄像机镜头的周围, 可均匀地照亮摄像的场 景。 同时还可以有一个可见光源, 如荧光源, 用于照射被摄像物体, 该光源 具有超过近红外光区域的频谱。
当物体的反射光经过摄像机镜头的二向棱镜时, 可见光和近红外光被分 离, 其中, 可见光进入彩色图像处理单元并由彩色图像处理单元进行处理后, 获得物体的彩色图像, 即二维 (2D ) 图像, 该彩色图像处理单元可为一个彩 色高清摄像机; 近红外光则经过深度图像处理单元处理后, 获得物体的深度 图像。 在深度图像处理单元中, 经上述二向棱镜分离出的近红外光被聚焦到 光电阴极上的同时, 在光电阴极和微通道板(Micro Channel Pla te , MCP ) 之间施加短脉冲偏压, 实现十亿分之一秒的快门, 快门的开启和光线调制频 率具有相同的频率, 以获得更好的信噪比 ( S igna l to Noi se Rat io, SNR )。 利用快门的开启在磷光体上获得物体的光学图像, 该光学图像再经过中继镜 头聚焦到高分辨率的逐行 CCD摄像机上, 转换为光电子图像, 最后通过信号 处理器形成物体的深度图。
上述描述给出了两种优选的获取深度信息和彩色 /灰度图像的方法, 但本 发明实施例并不限于此, 其它相似和相关的获取深度信息和彩色 /灰度图像的 方法均属于本发明实施例的保护范围之内。
步骤 1211、 根据所述获取到的一个视点的深度信息对同一视点的彩色 / 灰度图像进行视频分割, 获取所述视频对象。
可采用不限一种方法进行视频分割, 将一副视频图像分成前景和背景。 例如, 色度键 ( chroma-key )分割技术、 深度键 ( Depth-key )分割技术或是 检测当前图像和预先拍摄的背景图像之间的差异进行分割的技术。 其中第一 种和第三种技术对场景的限制条件较多, 优选的, 本发明实施例采用基于深 度键分割技术进行视频分割, 主要包括的技术要点:
根据深度图通过阈值生成一个二值化的深度掩模 ( mask ), 根据该掩模对 视频对象进行提取。 例如, 掩模值为 1 的像素为前景对象像素, 掩模值为 0 的像素为背景对象像素, 因此可以根据掩模值提取或去除视频对象。
对该深度掩模的边界构造出一个链码描述, 根据该链码描述恢复前景对 象的轮廓。 以对象的轮廓定义一个处理区域作为 Alpha合成的区域。
步骤 1212、根据所述一个视点的深度信息和所述同一视点的彩色 /灰度图 像生成所述视频对象的至少一个其它视点的彩色 /灰度图像。
下面对本发明实施例采用的多视点图像的生成技术作简单介绍。
空间中某像素点 P = [X, Y, Z] 投影到摄像机 2D 图像平面上的点 [x, y], 满足下列关系: x = F*X/Z; y = F*Y/Z, 其中, F 为摄像机焦距。
假设空间中的两个像素点 P1 = [X1, Y1, Z1] 和 P2 = [X2, Y2, Z2] 都投影到一个摄像机对上。一个摄像机位于 [0,0,0] (左摄像机), 另一个摄像机位于 [B,0,0] (右摄像机), 两个摄像机的焦距相等, 都为 F, 且处于平行位置。 则满足:
xL1 = F*X1/Z1; xL2 = F*X2/Z2; xR1 = F*(X1−B)/Z1; xR2 = F*(X2−B)/Z2。
其中, xL1、xL2 分别为 P1 和 P2 点由左摄像机得到的位置, xR1、xR2 分别为 P1 和 P2 点由右摄像机得到的位置, 则 P1 和 P2 点的视差 di 满足:
di = xLi − xRi = F*B/Zi, 即 xRi = xLi − di, 其中, i 取 1 或 2。
可以看出, 只要知道 xLi 和 di, 就可以求出 xRi 的值。
在本发明实施例中, 由深度摄像机的彩色 /灰度图像得到, d可以通过 深度图求出, 从而可以生成上述视频对象在另一个视点的彩色 /灰度图像。
由于深度往往不是整数, 本发明实施例对计算得到的像素位置采用亚像 素级别。 此时可以采用加权平均法, 根据原始图像上某一像素邻近像素的亮 度和色度值确定对应的新像素的亮度和色度值。
采用上述方法, 可以生成该视频对象的两个以上不同视点的彩色 /灰度图 像。
为了保证获取到高质量的彩色 /灰度图像, 在多视点图像生成时, 需要解 决图像中因视点改变导致的 "遮挡" 和 "空洞" 问题。 下面对本发明实施例 中对 "遮挡" 和 "空洞" 的问题进行处理的方法给出简要说明。
如图 15所示, 显示了多视点图像生成时产生 "遮挡" 的情况, 其中左边 的图 15 (1 )所示为在原图像视点 处观察场景的情况, 右边的图 15 (2)所 示为在需要重构图像的新视点 Α处观察场景的情况。在左边 处观察场景时, 在前物体(图中 A所示)遮挡住了在后物体(图中 B所示) 的一小部分, 图 中 C所示表示遮挡区域; 当在右边 <¾处观察场景时, 在前物体对在后物体的 遮挡区域变大, 导致 Q处得到的图像中的一部分在 A处的图像中无法显示。 因此在像素映射过程中, 需要判定一个像素是否被遮挡, 如果没有被遮挡, 则进行上述的像素映射处理; 如果该像素被遮挡, 则跳过不做处理。
如图 16所示, 显示了多视点图像生成时产生 "空洞" 的情况, 其中左边 的图 16 ( 1 )所示为在原图像视点 Q处观察场景的情况, 右边的图 16 ( 2 )所 示为在需要重构图像的新视点 Α处观察场景的情况。 在左边的6^立置处, 由 于在前物体(图中 A所示)对在后物体(图中 B所示) 的遮挡, 在后物体的 左边一部分是看不见的, 因此在 Q处生成的图像中没有这部分的像素。 在右 边的新位置 <¾处观察, 在后物体的左边部分已经没有遮挡, 但由于 处生成 的图像中没有这部分的像素, 导致在 处生成的图像中产生缺乏对应于 处 图像像素的空洞区域(如图中 C所示)。
为了处理空洞, 可以在进行新图像生成前, 先将新图像中所有的像素设 置为特殊的颜色值。 当上述映射过程完成后, 图像中仍留有特殊颜色值的区 域即为空洞区域。 其中, 对于小的空洞区域, 可以根据空洞周围像素的深度 和亮度、 色度信息确定该空洞区域内像素相应的信息, 并进行修补, 例如, 可采用线性或非线性插值的方法进行修补; 对于较大的空洞, 可以采用运动 补偿的方式, 在当前重构帧之前的视频帧序列中进行查找, 找到对应于该空 洞区域的像素信息并进行修补。
上述获取要呈现的视频对象的至少两个视点的彩色 /灰度图像的方法, 也 适用于获取至少一个呈现材料的至少一个视点的彩色 /灰度图像。
步骤 122、 将所述视频对象的至少两个视点的彩色 /灰度图像和所述至少 一个呈现材料的至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像; 图像合成是将分割后的前景物体无缝集成到背景中去, 可选的, 这里的 背景可以为呈现材料经多视点图像生成后的彩色 /灰度图像, 或者为该呈现材 料一个视点的彩色 /灰度图像。 本发明实施例采用的方法为基于深度图像对前 景物体的边缘和背景进行 a lpha值合成, 主要包括的技术要点有:
首先, 定义需要处理的区域, 该处理区域主要位于前景物体的边界附近, 为以上述链码点的为中心的矩形区域。
该处理区域的大小和前景物体边缘的锐利程度有关。 通过计算物体边缘 的垂直方向的导数来得到前景物体边缘的锐利程度。 较锐利的边缘前景到背 景的过渡区域很小, 因此需要的处理区域较小; 模糊的区域前景到背景是逐 步过渡的, 因此需要较大的处理区域。
在定义的处理区域边界上, 像素点被假定要么全部处于前景对象中 (不透明的 alpha 值, 纯的前景色), 要么全部位于前景对象之外 (透明 alpha 值, 纯的背景色)。 在处理区域中, 像素点是前后景的混合, 具有半透明的 alpha 值 a, 该像素点的颜色值 I 是前景颜色 F 和背景颜色 B 的混合:
I(i, j) = a * F(i, j) + (1 − a) * B(i, j)
通过估算出 alpha 值 a, 可以算出处理区域中某一像素点的颜色值 I。 上述给出了本发明实施例采用的进行图像合成时的关键技术, 但不限于此, 可采用其它的方法进行图像合成。
可选的, 在本发明实施例中, 可利用深度信息, 获取到视频对象的位置 信息和来自所述视频对象的控制命令信息, 将所述视频对象的至少两个视点 的彩色 /灰度图像和所述至少一个呈现材料的至少一个视点的彩色 /灰度图像 进行合成, 获取呈现图像。
该位置信息由上述深度图像获取, 用于控制视频对象在呈现材料上的位 置; 该控制命令信息用于控制呈现材料的内容和视频对象在呈现材料上的位 置。
本发明实施例提供的获取所述控制命令信息的方法包括:
根据所述深度图像, 对来自视频对象的手势进行识别, 将手势识别信息 转化为控制命令信息。
通过分析深度图像, 可以得到视频对象在场景中手势特征点的三维坐标 信息, 通过这些特征点的三维坐标信息对视频对象的手势进行识别。 由于深 度图可以将视频对象的手势识别扩展到三维空间, 因此对视频对象手势的前 后运动也能准确识别。 将识别信息转化为控制命令信息。
进一步的, 本发明实施例还可通过对不同时域空间上相同特征点的检测 , 得到视频对象深度的变化信息, 将该变化信息转化为控制命令信息。
为了支持视频的远程呈现本发明实施例还包括:
步骤 123、 将所述呈现图像进行编码并发送。
将获取的呈现图像进行压缩, 例如, 采用 H. 264协议或 MPEG-4协议对上 述呈现图像进行编码, 以适用于有限的网络带宽, 通过网络传输到远端的进 行呈现。
在网络的接收端, 相应的, 对压缩后的图像进行解码, 例如, 在接收端 根据 H. 264协议或 MPEG-4协议对所述编码后的图像进行解码, 得到上述呈现 图像, 以三维的方式进行显示。 可以采用立体眼镜、 自动立体显示器或投影 仪等设备, 将所述呈现图像以三维的方式显示出来。
本发明实施例包括对所述呈现图像以二维 (2D ) 的方式进行显示, 使二 维和三维的图像呈现方式兼容, 这种情况下不需再对视频对象进行多视点图 像生成, 将直接获取的视频对象和呈现材料的彩色 /灰度图像进行合成, 以二 维方式显示出来。
实施例六:
本发明实施例还提供了一种视频发送方法, 如图 17所示, 以保证实现三 维视频的远程呈现, 包括:
步骤 171、 获取要呈现的视频对象的至少两个视点的彩色 /灰度图像和至 少一个呈现材料的至少一个视点的彩色 /灰度图像;
步骤 172、 将所述要呈现的视频对象的至少两个视点的彩色 /灰度图像和 所述至少一个呈现材料的至少一个视点的彩色 /灰度图像进行编码并发送。 相应的, 本发明实施例还提供了一种视频接收方法, 能够实现三维视频 的远程呈现, 如图 18所示, 包括:
步骤 181、 获取编码图像, 对获取到的编码图像进行解码, 得到要呈现的 视频对象的至少两个视点的彩色 /灰度图像和至少一个呈现材料的至少一个 视点的彩色 /灰度图像;
步骤 182、 将所述视频对象的至少两个视点的彩色 /灰度图像和所述至少 一个呈现材料的至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像; 步骤 183、 将所述呈现图像以三维的方式进行显示。
本发明实施例包括对所述呈现图像以二维 (2D ) 的方式进行显示, 使二 维和三维的图像呈现方式兼容, 这种情况下不需再对视频对象进行多视点图 像生成, 将直接获取的视频对象和呈现材料的彩色 /灰度图像进行合成, 以二 维方式显示出来。
本发明实施例中步骤 171和步骤 172及步骤 181至步骤 183的具体方法 可参考实施例五, 主要的区别在于:
为了进行远程呈现, 考虑到网络带宽等因素, 本发明实施例五主要由发 送端对采集到的视频图像进行处理, 例如, 视频分割、 多视点生成等, 发送 端对合成后的呈现图像编码, 以进行网络传输, 该呈现图像包括了视频对象 和呈现材料, 对解码后的图像(即呈现图像)进行三维呈现; 而本发明实施 例六, 主要由发送端对采集到的视频图像进行相应处理, 在发送端只对所述 视频对象和呈现材料的深度信息和彩色 /灰度图像进行编码, 在接收端先对所 述编码后的图像进行解码, 再与呈现材料进行合成后, 生成呈现图像, 对该 呈现图像进行三维显示。
本发明实施例提供的技术方案, 通过获取的视频对象的多视点彩色 /灰度 图像及呈现材料的彩色 /灰度图像, 生成呈现图像, 该呈现图像支持三维的显 示方式, 再将该呈现图像以三维的方式显示出来, 解决了现有技术只能支持 二维视频的呈现带来的问题, 从而实现了三维视频的呈现。
实施例七:
本发明实施例还提供了一种视频处理装置, 如图 19所示, 包括: 图像获取和处理单元 191 ,用于获取要呈现的视频对象的至少两个视点的 彩色 /灰度图像和至少一个呈现材料的至少一个视点的彩色 /灰度图像; 将所 述视频对象的至少两个视点的彩色 /灰度图像和所述至少一个呈现材料的至 少一个视点的彩色 /灰度图像进行合成, 获取呈现图像;
显示单元 192 , 用于将所述呈现图像, 以三维的方式进行显示。
上述图像获取和处理单元 191包括:
图像获取模块, 用于获取所述视频对象所处场景的同一视点的至少一个 深度信息和至少一个彩色 /灰度图像;
视频分割模块, 用于根据所述获取到的一个视点的深度信息对同一视点 的彩色 /灰度图像进行视频分割, 获取所述视频对象;
多视点图像生成模块, 用于根据所述获取到的一个视点的深度信息和同 一视点的彩色 /灰度图像生成所述视频对象的至少一个其它视点的彩色 /灰度 图像;
合成模块, 用于将所述视频对象的至少两个视点的彩色 /灰度图像和至少 一个呈现材料的至少一个视点的彩色 /灰度图像进行合成, 得到呈现图像。
本发明实施例提供的技术方案, 通过获取的视频对象的多视点彩色 /灰度 图像及呈现材料的彩色 /灰度图像, 生成呈现图像, 该呈现图像支持三维的显 示方式, 再将该呈现图像以三维的方式显示出来, 解决了现有技术只能支持 二维视频的呈现带来的问题, 从而实现了三维视频的呈现。
实施例八:
为了支持远程视频呈现, 如图 20所示, 本发明实施例提供了一种视频发 送装置, 包括:
图像获取单元 201 , 用于获取要呈现的视频对象的至少两个视点的彩色 / 灰度图像和至少一个呈现材料的至少一个视点的彩色 /灰度图像; 编码单元 202 , 用于将所述要呈现的视频对象的至少两个视点的彩色 /灰 度图像和所述至少一个呈现材料的至少一个视点的彩色 /灰度图像进行编码, 得到编码图像;
发送单元 203 , 用于发送所述编码图像。
上述图像获取单元 201包括:
图像获取模块, 用于获取所述视频对象所处场景的同一视点的至少一个 深度信息和至少一个彩色 /灰度图像;
视频分割模块, 用于根据所述获取到的一个视点的深度信息对同一视点 的彩色 /灰度图像进行视频分割, 获取所述视频对象;
多视点图像生成模块, 用于根据所述获取到的一个视点的深度信息和同 一视点的彩色 /灰度图像生成所述视频对象的至少一个其它视点的彩色 /灰度 图像。
为了进一步实现远程视频的三维呈现, 相应的, 如图 21所示, 本发明实 施例还提供了一种视频接收装置, 包括:
接收单元 210 , 用于接收编码图像;
解码单元 211 , 用于对获取到的所述编码图像进行解码, 获取要呈现的视 频对象的至少两个视点的彩色 /灰度图像和至少一个呈现材料的至少一个视 点的彩色 /灰度图像;
合成单元 212 , 用于将所述视频对象的至少两个视点的彩色 /灰度图像和 所述至少一个呈现材料的至少一个视点的彩色 /灰度图像进行合成, 获取呈现 图像;
显示单元 21 3 , 用于将所述呈现图像, 以三维的方式进行显示。
本发明实施例中各功能模块的具体工作方式参见实施例六。
本发明实施例提供的技术方案, 通过获取的视频对象的多视点彩色 /灰度 图像及呈现材料的彩色 /灰度图像, 生成呈现图像, 该呈现图像支持三维的显 示方式, 再将该呈现图像以三维的方式显示出来, 解决了现有技术只能支持 二维视频的呈现带来的问题, 从而实现了三维视频的呈现。
实施例九:
本发明实施例提供了一种视频发送装置, 如图 22所示, 包括: 图像获取和处理单元 221 ,用于获取要呈现的视频对象的至少两个视点的 彩色 /灰度图像; 将所述视频对象的至少两个视点的彩色 /灰度图像和至少一 个呈现材料的至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像; 编码单元 222 , 用于将所述呈现图像进行编码, 得到编码图像; 发送单元 223 , 用于发送所述编码图像。
本发明实施例中各功能模块的具体工作方式参见实施例五。
本发明实施例提供的技术方案, 通过获取的视频对象的多视点彩色 /灰度 图像及呈现材料的彩色 /灰度图像, 生成呈现图像, 该呈现图像支持三维的显 示方式, 再将该呈现图像以三维的方式显示出来, 解决了现有技术只能支持 二维视频的呈现带来的问题, 从而实现了三维视频的呈现。
实施例十:
本发明实施例提供了一种视频通信系统, 如图 23所示, 包括: 视频发送 装置 231和视频接收装置 232 ,
所述视频发送装置 231包括:
图像获取和处理单元 2311 , 用于获取要呈现的视频对象的至少两个视点 的彩色 /灰度图像和至少一个呈现材料的至少一个视点的彩色 /灰度图像; 将 所述视频对象的至少两个视点的彩色 /灰度图像和所述至少一个呈现材料的 至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像;
编码单元 2312 , 用于将要所述呈现图像进行编码, 得到编码图像; 发送单元 2313 , 用于发送所述编码图像;
所述视频接收装置 232包括:
接收单元 2321 , 用于接收所述编码图像; 解码单元 2322 , 用于对获取到的编码图像进行解码, 获取所述呈现图像; 显示单元 2323 , 用于将所述呈现图像, 以三维的方式进行显示。
进一步的, 在实现远程呈现时, 上述的视频通信系统还能够实现将网络 接收端的视频图像在发送端进行三维显示, 发送端和接收端可具备相同视频 图像处理和显示功能, 这时上述视频呈现的发送装置 231还包括:
第二解码单元, 用于对获取的编码图像进行解码, 得到解码后的呈现图 像;
第二显示单元, 用于对所述呈现图像进行三维显示。
所述视频呈现的接收装置 232还包括:
第二图像获取和处理单元, 用于获取要呈现的视频对象的至少两个视点 的彩色 /灰度图像和至少一个呈现材料的至少一个视点的彩色 /灰度图像; 将 所述视频对象的至少两个视点的彩色 /灰度图像和所述至少一个呈现材料的 至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像;
第二编码单元, 用于将所述呈现图像进行编码, 得到编码图像并发送。 本发明系统实施例提供的技术方案, 通过获取的视频对象的多视点彩色 / 灰度图像及呈现材料的彩色 /灰度图像, 生成呈现图像, 该呈现图像支持三维 的显示方式, 再将该呈现图像以三维的方式显示出来, 解决了现有技术只能 支持二维视频的呈现带来的问题, 从而实现了三维视频的呈现。
在本发明系统实施例中, 在网络的发送端可以对包括视频对象和呈现材 料的呈现图像进行编码, 以通过网络进行传输。 在网络的接收端解码单元对 接收到的数据进行解码, 然后显示单元将解码后的图像直接进行呈现, 但不 限于此, 可选的, 还包括在发送端只对视频对象的视频对象和呈现材料的深 度信息和彩色 /灰度图像进行编码, 以通过网络进行传输。 在网络的接收端, 再将解码后的图像与呈现材料的立体图像进行合成, 得到呈现图像后再由显 示单元进行呈现。
在进行多视点图像的合成, 获取呈现图像时, 还可以根据所述深度信息 计算所述视频对象的位置信息; 对来自所述视频对象的手势生成识别信息, 将该识别信息转化为控制命令信息, 以对所述视频处理单元进行控制。
本发明系统实施例提供的视频通信系统, 可以对二维呈现和三维呈现进 行兼容, 可以将解码后的图像以二维或者三维的方式进行显示。
本发明系统实施例提供的技术方案, 通过获取的视频对象的多视点彩色 / 灰度图像及呈现材料的彩色 /灰度图像, 生成呈现图像, 该呈现图像支持三维 的显示方式, 再将该呈现图像以三维的方式显示出来, 解决了现有技术只能 支持二维视频的呈现带来的问题, 从而实现了三维视频的呈现。
本领域普通技术人员可以理解实现上述实施例中的全部或部分步骤, 可 以通过程序指令相关硬件完成。 所述实施例对应的软件可以存储在一个计算 机可存储读取的介质中。
以上所述, 仅为本发明的具体实施方式, 但本发明的保护范围并不局限 于此, 任何熟悉本技术领域的技术人员在本发明揭露的技术范围内, 可轻易 想到变化或替换, 都应涵盖在本发明的保护范围之内。 因此, 本发明的保护 范围应该以权利要求的保护范围为准。

Claims

权利 要求 书
1、 一种视频预处理方法, 其特征在于, 包括:
获取本地场景内容及其深度值;
根据所述本地场景内容的深度值从所述本地场景内容中分割出本地目标内 容。
2、根据权利要求 1所述的视频预处理方法, 其特征在于, 所述方法还包括: 保存所述本地场景内容分割出本地目标内容后的剩余内容, 及所述剩余内 容的深度值, 或者保存本地目标对面的背景内容及其深度值。
3、 根据权利要求 1所述的视频预处理方法, 其特征在于, 所述获取本地场 景内容及其深度值包括:
获取深度摄像机拍摄的本地场景内容以及所述深度摄像机获取的所述场景 内容的深度值; 或者
获取两个平行的摄像机拍摄的所述场景内容的两幅图像,
对所述两幅图像进行匹配, 计算每个像素在两个平行摄像机中对应的视 差;
然后通过下式计算出场景内容中每个像素的深度值: Z=fB/ A x , 其中, Z为 该像素的深度值, f 为摄像机的焦距, B为两个摄像机的距离, Δ χ为每个像素 在两个平行摄像机中对应的视差。
4、 根据权利要求 1所述的视频预处理方法, 其特征在于, 所述根据所述本 地场景内容的深度值从所述本地场景内容中分割出本地目标内容包括:
确定本地目标内容深度值的范围;
根据所述本地目标内容深度值的范围确定本地目标内容在本地场景内容中 的区域;
根据所述区域从本地场景内容中分割出本地目标内容。
5、 根据权利要求 4所述的视频预处理方法, 其特征在于, 如果所述本地目 标内容为人物, 所述方法还包括: 对本地场景内容进行人脸识别得到人脸图像的位置;
在本地场景内容的深度值查找所述人脸图像位置对应的深度值;
所述确定本地目标内容深度值的范围包括: 根据查找到的所述人脸图像位 置对应的深度值确定本地目标内容深度值的范围。
6、根据权利要求 1所述的视频预处理方法, 其特征在于, 所述方法还包括:
7、 一种视频预处理装置, 其特征在于, 包括:
信息获取模块, 用于获取本地场景内容及其深度值;
分割模块, 用于根据所述本地场景内容深度值, 从所述本地场景内容中分 割出本地目标内容。
8、根据权利要求 7所述的视频预处理装置, 其特征在于, 所述装置还包括: 存储模块, 用于保存所述本地场景内容分割出本地目标内容后的剩余内容, 及所述剩余内容的深度值, 或者保存本地目标对面的背景内容及其深度值。
9、 根据权利要求 7所述的视频预处理装置, 其特征在于, 所述分割模块包 括:
查找单元, 用于确定所述本地目标内容深度值的范围, 并根据所述本地目 标内容深度值的范围确定本地目标内容在所述本地场景内容中的区域;
分割单元, 用于根据所述区域从所述本地场景内容中分割出本地目标内容。
10、 根据权利要求 9 所述的视频预处理装置, 其特征在于, 如果所述本地 目标内容为人物, 所述分割模块还包括:
人脸识别单元, 用于对所述本地场景内容进行人脸识别得到人脸图像的位 置;
所述查找单元具体用于在所述本地场景内容的深度值中查找所述人脸图像 位置对应的深度值, 然后根据查找到的深度值确定所述本地目标内容深度值的 范围, 并根据所述本地目标内容深度值的范围确定本地目标内容在所述本地场 景内容中的区域。
11、 根据权利要求 7 所述的视频预处理装置, 其特征在于, 所述装置还包 编码模块,
码。
12、 一种视频接收方法, 其特征在于包括: 获取本地背景内容和所述本地背景内容的深度值;
根据所述远端目标内容的深度值和所述本地背景内容的深度值将所述远端 目标内容和所述本地背景内容合成场景内容。
1 3、 根据权利要求 12所述的视频接收方法, 其特征在于, 所述本地背景内 容为: 本地场景内容中去除本地目标后的剩余内容, 或者本地目标对面的背景 内容。
14、 根据权利要求 12所述的视频接收方法, 其特征在于, 根据所述远端目 标内容的深度值和所述本地背景内容的深度值将所述远端目标内容和所述本地 背景内容合成场景内容包括:
根据所述远端目标内容的深度值以及所述本地背景内容的深度值, 确定所 述远端目标内容和所述本地背景内容的遮挡关系;
按照上述遮挡关系将所述远端目标内容和所述本地背景内容合成场景内 容。
15、 根据权利要求 14所述的视频接收方法, 其特征在于, 根据所述远端目 标内容的深度值和所述本地背景内容的深度值将所述远端目标内容和所述本地 背景内容合成场景内容之前还包括: 的远端目标内容及缩放后的远端目标内容的深度值; 定所述远端目标内容和所述本地背景内容的遮挡关系具体为: 根据所述缩放后的远端目标内容的深度值以及所述本地背景内容的深度 值, 确定所述缩放后的远端目标内容和所述本地背景内容的遮挡关系;
所述按照上述遮挡关系将所述远端目标内容和所述本地背景内容合成场景 内容具体为:
按照所述缩放后的远端目标内容和所述本地背景内容的遮挡关系, 将所述 缩放后的远端目标内容和所述本地背景内容合成场景内容。
16、 根据权利要求 12所述的视频接收方法, 其特征在于, 该方法还包括: 对合成的所述场景内容进行虚拟视点图像合成。
17、 根据权利要求 12所述的视频接收方法, 其特征在于, 所述接收远端发 对所述远端目标内容及所述远端目标的深度值进行解码。
18、 一种视频接收装置, 其特征在于, 包括:
传输接口模块, 用于接收远端发送的远端目标内容及所述远端目标内容深 度值;
提取模块, 用于获取本地背景内容及所述本地背景内容的深度值; 合成模块, 用于根据所述远端目标内容的深度值和所述本地背景内容的深 度值将所述远端目标内容和所述本地背景内容合成场景内容。
19、 根据权利要求 18所述的视频接收装置, 其特征在于, 所述合成模块具 所述远端目标内容和所述本地背景内容的遮挡关系, 并按照上述遮挡关系将所 述远端目标内容和所述本地背景内容合成场景内容。
20、 根据权利要求 19所述的视频接收装置, 其特征在于, 该装置还包括: 放;
所述合成单元具体用于根据所述远端目标内容缩放后的深度值以及所述本 地背景内容的深度值, 确定所述远端目标内容和所述本地背景内容的遮挡关系。
21、 根据权利要求 18所述的视频接收装置, 其特征在于, 该装置还包括: 视图重构模块, 用于对所述合成模块合成的所述场景内容进行虚拟视点图 像合成。
22、 根据权利要求 18所述的视频接收装置, 其特征在于, 所述本地背景内 容为: 本地场景内容中去除本地目标后的剩余内容, 或者本地目标对面的背景 内容。
23、 根据权利要求 18所述的视频接收装置, 其特征在于, 该装置还包括: 信息获取模块, 用于获取本地的场景内容以及场景内容的深度值; 分割模块, 用于根据所述本地场景内容的深度值从所述场景内容中分割出 本地目标内容; 值发送到远端。
24、 根据权利要 23所述的视频接收装置, 其特征在于, 所述本地背景内容 为: 本地场景内容中去除所述本地目标后的剩余内容, 或者所述本地目标对面 的背景内容。
25、 根据权利要求 23所述的视频接收装置, 其特征在于, 所述本地目标内 容为人物, 所述分割模块包括:
人脸识别单元, 用于对本地场景内容进行人脸识别得到人脸图像的位置; 查找单元, 用于根据所述人脸图像的位置确定所述人物深度值的范围, 并 根据所述深度值的范围确定所述人物在所述本地场景内容中的区域;
分割单元, 用于根据所述区域从所述本地场景内容中分割出所述人物。
26、 根据权利要求 18所述的视频接收装置, 其特征在于, 该装置还包括: 解码模块, 用于对传输接口模块接收到的所述远端目标内容及所述远端目 标内容的深度值进行解码。
27、 一种视频通讯系统, 其特征在于, 包括:
发送端, 用于获取发送端场景内容及所述发送端场景内容的深度值, 根据 所述发送端场景内容深度值, 从所述发送端场景内容中分割出发送端目标内容, 接收端, 用于接收所述发送端发送的所述发送端目标内容及所述发送端目 标内容的深度值, 并获取接收端背景内容及所述接收端背景内容的深度值, 根 据所述发送端目标内容的深度值和所述接收端背景内容的深度值将所述发送端 目标内容和所述接收端背景内容合成场景内容。
28、 一种视频处理方法, 其特征在于, 包括:
获取要呈现的视频对象的至少两个视点的彩色 /灰度图像和至少一个呈现 材料的至少一个视点的彩色 /灰度图像;
将所述视频对象的所述至少两个视点的彩色 /灰度图像和所述至少一个呈 现材料的所述至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像;
将所述呈现图像以三维的方式进行显示。
29、 根据权利要求 28所述的视频处理方法, 其特征在于, 所述获取要呈现 的视频对象的至少两个视点的彩色 /灰度图像的步骤包括:
获取所述视频对象所处场景的同一视点的至少一个深度信息和至少一个彩 色 /灰度图像;
根据所述获取到的一个视点的深度信息对同一视点的彩色 /灰度图像进行 视频分割, 获取所述视频对象;
根据所述一个视点的深度信息和所述同一视点的彩色 /灰度图像生成所述 视频对象的至少一个其它视点的彩色 /灰度图像。
30、 根据权利要求 28所述的视频处理方法, 其特征在于, 还包括: 获取所述视频对象的位置信息和控制命令信息;
所述将所述视频对象的所述至少两个视点的彩色 /灰度图像和所述至少一 个呈现材料的所述至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像具体 为:
根据所述位置信息和控制命令信息, 将所述视频对象的至少两个视点的彩 色 /灰度图像和所述至少一个呈现材料的至少一个视点的彩色 /灰度图像进行合 成, 获取呈现图像。
31、 一种视频发送方法, 其特征在于, 包括:
获取要呈现的视频对象的至少两个视点的彩色 /灰度图像和至少一个呈现 材料的至少一个视点的彩色 /灰度图像;
将所述视频对象的至少两个视点的彩色 /灰度图像和所述至少一个呈现材 料的至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像;
将所述呈现图像进行编码并发送。
32、 根据权利要求 31所述的视频发送方法, 其特征在于, 所述获取要呈现 的视频对象的至少两个视点的彩色 /灰度图像的步骤包括:
获取所述视频对象所处场景的同一视点的至少一个深度信息和至少一个彩 色 /灰度图像;
根据所述获取到的一个视点的深度信息对同一视点的彩色 /灰度图像进行 视频分割, 获取所述视频对象;
根据所述一个视点的深度信息和所述同一视点的彩色 /灰度图像生成所述 视频对象的至少一个其它视点的彩色 /灰度图像。
33、 根据权利要求 31所述的视频发送方法, 其特征在于, 还包括: 获取所述视频对象的位置信息和控制命令信息;
所述将所述视频对象的所述至少两个视点的彩色 /灰度图像和所述至少一 个呈现材料的所述至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像具体 为:
根据所述位置信息和控制命令信息, 将所述视频对象的至少两个视点的彩 色 /灰度图像和所述至少一个呈现材料的至少一个视点的彩色 /灰度图像进行合 成, 获取呈现图像。
34、 一种视频发送方法, 其特征在于, 包括:
获取要呈现的视频对象的至少两个视点的彩色 /灰度图像和至少一个呈现 材料的至少一个视点的彩色 /灰度图像;
将所述要呈现的视频对象的至少两个视点的彩色 /灰度图像和所述至少一 个呈现材料的至少一个视点的彩色 /灰度图像进行编码并发送。
35、 根据权利要求 34所述的视频发送方法, 其特征在于, 所述获取要呈现 的视频对象的至少两个视点的彩色 /灰度图像的步骤包括:
获取所述视频对象所处场景的同一视点的至少一个深度信息和至少一个彩 色 /灰度图像;
根据所述获取到的一个视点的深度信息对同一视点的彩色 /灰度图像进行 视频分割, 获取所述视频对象;
根据所述一个视点的深度信息和所述同一视点的彩色 /灰度图像生成所述 视频对象的至少一个其它视点的彩色 /灰度图像。
36、 一种视频接收方法, 其特征在于, 包括:
获取编码图像;
对所述编码图像进行解码, 得到要呈现的视频对象的至少两个视点的彩色 / 灰度图像和至少一个呈现材料的至少一个视点的彩色 /灰度图像;
将所述视频对象的至少两个视点的彩色 /灰度图像和所述至少一个呈现材 料的至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像;
将所述呈现图像以三维的方式进行显示。
37、 根据权利要求 36所述的视频接收方法, 其特征在于, 还包括: 获取所述视频对象的位置信息和控制命令信息;
所述将所述视频对象的所述至少两个视点的彩色 /灰度图像和所述至少一 个呈现材料的所述至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像具体 为:
根据所述位置信息和控制命令信息, 将所述视频对象的至少两个视点的彩 色 /灰度图像和所述至少一个呈现材料的至少一个视点的彩色 /灰度图像进行合 成, 获取呈现图像。
38、 一种视频处理装置, 其特征在于, 包括:
图像获取和处理单元, 用于获取要呈现的视频对象的至少两个视点的彩色 / 灰度图像和至少一个呈现材料的至少一个视点的彩色 /灰度图像; 将所述视频对 象的至少两个视点的彩色 /灰度图像和所述至少一个呈现材料的至少一个视点 的彩色 /灰度图像进行合成, 获取呈现图像;
显示单元, 用于将所述呈现图像, 以三维的方式进行显示。
39、 根据权利要求 38所述的视频处理装置, 其特征在于, 所述图像获取和 处理单元包括:
图像获取模块, 用于获取所述视频对象所处场景的同一视点的至少一个深 度信息和至少一个彩色 /灰度图像;
视频分割模块, 用于根据所述获取到的一个视点的深度信息对同一视点的 彩色 /灰度图像进行视频分割, 获取所述视频对象;
多视点图像生成模块, 用于根据所述获取到的一个视点的深度信息和同一 视点的彩色 /灰度图像生成所述视频对象的至少一个其它视点的彩色 /灰度图 像;
合成模块, 用于将所述视频对象的至少两个视点的彩色 /灰度图像和至少一 个呈现材料的至少一个视点的彩色 /灰度图像进行合成 , 得到所述呈现图像。
40、 一种视频发送装置, 其特征在于, 包括:
图像获取单元, 用于获取要呈现的视频对象的至少两个视点的彩色 /灰度图 像和至少一个呈现材料的至少一个视点的彩色 /灰度图像;
编码单元, 用于将所述要呈现的视频对象的至少两个视点的彩色 /灰度图像 和所述至少一个呈现材料的至少一个视点的彩色 /灰度图像进行编码, 得到编码 图像;
发送单元, 用于发送所述编码图像。
41、 根据权利要求 40所述的视频发送装置, 其特征在于, 所述图像获取单 元包括: 图像获取模块, 用于获取所述视频对象所处场景的同一视点的至少一个深 度信息和至少一个彩色 /灰度图像;
视频分割模块, 用于根据所述获取到的一个视点的深度信息对同一视点的 彩色 /灰度图像进行视频分割, 获取所述视频对象;
多视点图像生成模块, 用于根据所述获取到的一个视点的深度信息和同一 视点的彩色 /灰度图像生成所述视频对象的至少一个其它视点的彩色 /灰度图 像。
42、 一种视频接收装置, 其特征在于, 包括:
接收单元, 用于接收编码图像;
解码单元, 用于对获取到的所述编码图像进行解码, 获取要呈现的视频对 象的至少两个视点的彩色 /灰度图像和至少一个呈现材料的至少一个视点的彩 色 /灰度图像;
合成单元, 用于将所述视频对象的至少两个视点的彩色 /灰度图像和所述至 少一个呈现材料的至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像; 显示单元, 用于将所述呈现图像, 以三维的方式进行显示。
43、 一种视频发送装置, 其特征在于, 包括:
图像获取和处理单元, 用于获取要呈现的视频对象的至少两个视点的彩色 / 灰度图像; 将所述视频对象的至少两个视点的彩色 /灰度图像和至少一个呈现材 料的至少一个视点的彩色 /灰度图像进行合成, 获取呈现图像;
编码单元, 用于将所述呈现图像进行编码, 得到编码图像;
发送单元, 用于发送所述编码图像。
44、 一种视频通信系统, 其特征在于, 包括: 视频发送装置和视频接收装 置,
所述视频发送装置包括:
图像获取和处理单元, 用于获取要呈现的视频对象的至少两个视点的彩色 / 灰度图像和至少一个呈现材料的至少一个视点的彩色 /灰度图像; 将所述视频对 象的至少两个视点的彩色 /灰度图像和所述至少一个呈现材料的至少一个视点 的彩色 /灰度图像进行合成, 获取呈现图像;
编码单元, 用于将要所述呈现图像进行编码, 得到编码图像;
发送单元, 用于发送所述编码图像;
所述视频接收装置包括:
接收单元, 用于接收所述编码图像;
解码单元, 用于对获取到的编码图像进行解码, 获取所述呈现图像; 显示单元, 用于将所述呈现图像, 以三维的方式进行显示。
45、 根据权利要求 44所述的视频通信系统, 其特征在于, 所述视频发送装 置还包括:
第二解码单元, 用于对获取的编码图像进行解码, 得到解码后的呈现图像; 第二显示单元, 用于对所述呈现图像进行三维显示。
46、 根据权利要求 44所述的视频通信系统, 其特征在于, 所述视频接收装 置还包括:
第二图像获取和处理单元, 用于获取要呈现的视频对象的至少两个视点的 彩色 /灰度图像和至少一个呈现材料的至少一个视点的彩色 /灰度图像; 将所述 视频对象的至少两个视点的彩色 /灰度图像和所述至少一个呈现材料的至少一 个视点的彩色 /灰度图像进行合成, 获取呈现图像;
第二编码单元, 用于将所述呈现图像进行编码, 得到编码图像并发送。
PCT/CN2009/072320 2008-06-17 2009-06-17 视频通讯方法、装置及系统 WO2009152769A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PL09765408T PL2299726T3 (pl) 2008-06-17 2009-06-17 Sposób, urządzenie i system komunikacji wideo
EP09765408A EP2299726B1 (en) 2008-06-17 2009-06-17 Video communication method, apparatus and system
ES09765408T ES2389401T3 (es) 2008-06-17 2009-06-17 Método, aparato y sistema de comunicación a través de vídeo
US12/971,392 US8446459B2 (en) 2008-06-17 2010-12-17 Video communication method, device, and system

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN200810127007.4 2008-06-17
CN2008101270074A CN101610421B (zh) 2008-06-17 2008-06-17 视频通讯方法、装置及系统
CN200810210178.3 2008-08-29
CN 200810210178 CN101662694B (zh) 2008-08-29 2008-08-29 视频的呈现方法、发送、接收方法及装置和通信系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/971,392 Continuation US8446459B2 (en) 2008-06-17 2010-12-17 Video communication method, device, and system

Publications (1)

Publication Number Publication Date
WO2009152769A1 true WO2009152769A1 (zh) 2009-12-23

Family

ID=41433707

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2009/072320 WO2009152769A1 (zh) 2008-06-17 2009-06-17 视频通讯方法、装置及系统

Country Status (5)

Country Link
US (1) US8446459B2 (zh)
EP (1) EP2299726B1 (zh)
ES (1) ES2389401T3 (zh)
PL (1) PL2299726T3 (zh)
WO (1) WO2009152769A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012055892A1 (en) * 2010-10-29 2012-05-03 Thomson Licensing Method for generation of three-dimensional images encrusting a graphic object in the image and an associated display device
WO2012059781A1 (en) * 2010-11-03 2012-05-10 Alcatel Lucent System and method for providing a virtual representation
EP2472878A1 (en) * 2010-12-31 2012-07-04 Advanced Digital Broadcast S.A. Method and apparatus for combining images of a graphic user interface with a stereoscopic video
CN102630026A (zh) * 2011-02-03 2012-08-08 美国博通公司 一种处理视频的方法及系统
US20130321662A1 (en) * 2011-02-08 2013-12-05 Furukawa Electric Co., Ltd. Optical module
EP2381692A3 (en) * 2010-04-19 2014-04-16 LG Electronics Inc. Image display apparatus and method for controlling the same

Families Citing this family (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9035876B2 (en) 2008-01-14 2015-05-19 Apple Inc. Three-dimensional user interface session control
US8933876B2 (en) 2010-12-13 2015-01-13 Apple Inc. Three dimensional user interface session control
JP2011151773A (ja) * 2009-12-21 2011-08-04 Canon Inc Video processing apparatus and control method
US20110164032A1 (en) * 2010-01-07 2011-07-07 Prime Sense Ltd. Three-Dimensional User Interface
US8787663B2 (en) * 2010-03-01 2014-07-22 Primesense Ltd. Tracking body parts by combined color image and depth processing
US9628722B2 (en) 2010-03-30 2017-04-18 Personify, Inc. Systems and methods for embedding a foreground video into a background feed based on a control input
US8649592B2 (en) 2010-08-30 2014-02-11 University Of Illinois At Urbana-Champaign System for background subtraction with 3D camera
US8872762B2 (en) 2010-12-08 2014-10-28 Primesense Ltd. Three dimensional user interface cursor control
JP5699609B2 (ja) 2011-01-06 2015-04-15 Sony Corporation Image processing apparatus and image processing method
WO2012150940A1 (en) * 2011-05-05 2012-11-08 Empire Technology Development Llc Lenticular directional display
US8848029B2 (en) * 2011-05-27 2014-09-30 Microsoft Corporation Optimizing room lighting based on image sensor feedback
US8773499B2 (en) * 2011-06-24 2014-07-08 Microsoft Corporation Automatic video framing
US9459758B2 (en) 2011-07-05 2016-10-04 Apple Inc. Gesture-based interface with enhanced features
US8881051B2 (en) 2011-07-05 2014-11-04 Primesense Ltd Zoom-based gesture user interface
US9377865B2 (en) 2011-07-05 2016-06-28 Apple Inc. Zoom-based gesture user interface
US9030498B2 (en) 2011-08-15 2015-05-12 Apple Inc. Combining explicit select gestures and timeclick in a non-tactile three dimensional user interface
GB201114591D0 (en) * 2011-08-23 2011-10-05 Tomtom Int Bv Methods of and apparatus for displaying map information
US9218063B2 (en) 2011-08-24 2015-12-22 Apple Inc. Sessionless pointing user interface
KR101809954B1 (ko) * 2011-10-12 2017-12-18 LG Electronics Inc. Mobile terminal and control method thereof
KR20130080324A (ko) * 2012-01-04 2013-07-12 Electronics and Telecommunications Research Institute Scalable video coding apparatus and method for immersive broadcasting
KR20130088636A (ko) * 2012-01-31 2013-08-08 Samsung Electronics Co., Ltd. Apparatus and method for transmitting video, and apparatus and method for reproducing video
US9229534B2 (en) 2012-02-28 2016-01-05 Apple Inc. Asymmetric mapping for tactile and non-tactile user interfaces
US9286658B2 (en) 2012-03-22 2016-03-15 Qualcomm Incorporated Image enhancement
US9584806B2 (en) * 2012-04-19 2017-02-28 Futurewei Technologies, Inc. Using depth information to assist motion compensation-based video coding
EP2667354B1 (en) * 2012-05-24 2015-07-08 Thomson Licensing Method and apparatus for analyzing stereoscopic or multi-view images
US9213556B2 (en) 2012-07-30 2015-12-15 Vmware, Inc. Application directed user interface remoting using video encoding techniques
US9277237B2 (en) 2012-07-30 2016-03-01 Vmware, Inc. User interface remoting through video encoding techniques
CN102821323B (zh) * 2012-08-01 2014-12-17 成都理想境界科技有限公司 Augmented-reality-based video playback method, system, and mobile terminal
CN102902710B (zh) * 2012-08-08 2015-08-26 成都理想境界科技有限公司 Barcode-based augmented reality method, system, and mobile terminal
TWI451344B (zh) * 2012-08-27 2014-09-01 Pixart Imaging Inc Gesture recognition system and gesture recognition method
US9292927B2 (en) * 2012-12-27 2016-03-22 Intel Corporation Adaptive support windows for stereoscopic image correlation
US9092657B2 (en) 2013-03-13 2015-07-28 Microsoft Technology Licensing, Llc Depth image processing
US9538081B1 (en) * 2013-03-14 2017-01-03 Amazon Technologies, Inc. Depth-based image stabilization
KR102001636B1 (ko) * 2013-05-13 2019-10-01 Samsung Electronics Co., Ltd. Apparatus and method for processing depth images using a relative angle between an image sensor and a target object
US9251613B2 (en) * 2013-10-28 2016-02-02 Cyberlink Corp. Systems and methods for automatically applying effects based on media content characteristics
US9485433B2 (en) 2013-12-31 2016-11-01 Personify, Inc. Systems and methods for iterative adjustment of video-capture settings based on identified persona
US9414016B2 (en) 2013-12-31 2016-08-09 Personify, Inc. System and methods for persona identification using combined probability maps
US10021366B2 (en) * 2014-05-02 2018-07-10 Eys3D Microelectronics, Co. Image process apparatus
GB201410285D0 (en) * 2014-06-10 2014-07-23 Appeartome Ltd Augmented reality apparatus and method
KR102214934B1 (ko) * 2014-07-18 2021-02-10 Samsung Electronics Co., Ltd. Apparatus and method for stereo matching through learning of unary and pairwise confidences
JP6646361B2 (ja) * 2015-04-27 2020-02-14 Sony Semiconductor Solutions Corporation Image processing device, imaging device, image processing method, and program
US9916668B2 (en) 2015-05-19 2018-03-13 Personify, Inc. Methods and systems for identifying background in video data using geometric primitives
US9563962B2 (en) 2015-05-19 2017-02-07 Personify, Inc. Methods and systems for assigning pixels distance-cost values using a flood fill technique
US9699405B2 (en) * 2015-07-14 2017-07-04 Google Inc. Immersive teleconferencing with translucent video stream
US10609307B2 (en) * 2015-09-28 2020-03-31 Gopro, Inc. Automatic composition of composite images or videos from frames captured with moving camera
US9883155B2 (en) 2016-06-14 2018-01-30 Personify, Inc. Methods and systems for combining foreground video and background video using chromatic matching
US9881207B1 (en) 2016-10-25 2018-01-30 Personify, Inc. Methods and systems for real-time user extraction using deep learning networks
CN107205144A (zh) * 2017-06-30 2017-09-26 Nubia Technology Co., Ltd. 3D image synthesis method, mobile terminal, and storage medium
US10951859B2 (en) 2018-05-30 2021-03-16 Microsoft Technology Licensing, Llc Videoconferencing device and method
US11800056B2 (en) 2021-02-11 2023-10-24 Logitech Europe S.A. Smart webcam system
US11800048B2 (en) 2021-02-24 2023-10-24 Logitech Europe S.A. Image generating system with background replacement or modification capabilities
CN113822817A (zh) * 2021-09-26 2021-12-21 Vivo Mobile Communication Co., Ltd. Document image enhancement method and apparatus, and electronic device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000175171A (ja) * 1998-12-03 2000-06-23 Nec Corp Video generating device for video conference and video generating method thereof
CN1275871A (zh) * 2000-07-21 2000-12-06 Tsinghua University Video image communication system with multi-camera video object extraction and implementation method thereof
JP2001141422A (ja) * 1999-11-10 2001-05-25 Fuji Photo Film Co Ltd Image capturing apparatus and image processing apparatus

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004537082A (ja) * 2001-01-26 2004-12-09 Zaxel Systems Inc. Real-time virtual viewpoint in a virtual reality environment
EP1433335B1 (en) 2001-08-15 2010-09-29 Koninklijke Philips Electronics N.V. 3d video conferencing system
KR100588042B1 (ko) 2004-01-14 2006-06-09 한국과학기술연구원 인터액티브 프레젠테이션 시스템
US7227567B1 (en) 2004-09-14 2007-06-05 Avaya Technology Corp. Customizable background for video communications
US7675520B2 (en) * 2005-12-09 2010-03-09 Digital Steamworks, Llc System, method and computer program for creating two dimensional (2D) or three dimensional (3D) computer animation from video

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000175171A (ja) * 1998-12-03 2000-06-23 Nec Corp Video generating device for video conference and video generating method thereof
JP2001141422A (ja) * 1999-11-10 2001-05-25 Fuji Photo Film Co Ltd Image capturing apparatus and image processing apparatus
CN1275871A (zh) * 2000-07-21 2000-12-06 Tsinghua University Video image communication system with multi-camera video object extraction and implementation method thereof

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2381692A3 (en) * 2010-04-19 2014-04-16 LG Electronics Inc. Image display apparatus and method for controlling the same
WO2012055892A1 (en) * 2010-10-29 2012-05-03 Thomson Licensing Method for generation of three-dimensional images encrusting a graphic object in the image and an associated display device
WO2012059781A1 (en) * 2010-11-03 2012-05-10 Alcatel Lucent System and method for providing a virtual representation
EP2472878A1 (en) * 2010-12-31 2012-07-04 Advanced Digital Broadcast S.A. Method and apparatus for combining images of a graphic user interface with a stereoscopic video
WO2012090059A1 (en) * 2010-12-31 2012-07-05 Advanced Digital Broadcast S.A. Method and apparatus for combining images of a graphic user interface with a stereoscopic video
CN102630026A (zh) * 2011-02-03 2012-08-08 美国博通公司 一种处理视频的方法及系统
US20130321662A1 (en) * 2011-02-08 2013-12-05 Furukawa Electric Co., Ltd. Optical module
US9628698B2 (en) * 2012-09-07 2017-04-18 Pixart Imaging Inc. Gesture recognition system and gesture recognition method based on sharpness values

Also Published As

Publication number Publication date
PL2299726T3 (pl) 2013-01-31
US8446459B2 (en) 2013-05-21
EP2299726A1 (en) 2011-03-23
EP2299726B1 (en) 2012-07-18
EP2299726A4 (en) 2011-05-25
ES2389401T3 (es) 2012-10-25
US20110090311A1 (en) 2011-04-21

Similar Documents

Publication Publication Date Title
US8446459B2 (en) Video communication method, device, and system
CN101610421B (zh) Video communication method, apparatus, and system
CN101662694B (zh) Video presentation method, sending and receiving methods and apparatuses, and communication system
US9986258B2 (en) Efficient encoding of multiple views
Mueller et al. View synthesis for advanced 3D video systems
JP5654138B2 (ja) Hybrid reality for 3D human-machine interface
JP7036599B2 (ja) Method for synthesizing a light field with omnidirectional parallax compressed using depth information
Muller et al. Reliability-based generation and view synthesis in layered depth video
Kauff et al. Depth map creation and image-based rendering for advanced 3DTV services providing interoperability and scalability
US9060165B2 (en) 3D video communication method, sending device and system, image reconstruction method and system
JP5243612B2 (ja) Intermediate view synthesis and multi-view data signal extraction
JP5544361B2 (ja) Method and system for encoding a three-dimensional video signal, encoder for encoding a three-dimensional video signal, method and system for decoding a three-dimensional video signal, decoder for decoding a three-dimensional video signal, and computer program
JP4942106B2 (ja) Depth data output device and depth data receiving device
WO2014037603A1 (en) An apparatus, a method and a computer program for image processing
KR20110093828A (ko) Method and system for encoding a 3D image signal, encoded 3D image signal, method and system for decoding a 3D image signal
JP7344988B2 (ja) Method, apparatus and computer program product for encoding and decoding volumetric video
US20230283759A1 (en) System and method for presenting three-dimensional content
Vetro et al. Depth‐Based 3D Video Formats and Coding Technology
Mora Multiview video plus depth coding for new multimedia services
Vázquez et al. A non-conventional approach to the conversion of 2D video and film content to stereoscopic 3D
Zhang et al. DIBR-based conversion from monoscopic to stereoscopic and multi-view video
Strintzis et al. Review of methods for object-based coding of stereoscopic and 3D image sequences
Pickering Stereoscopic and Multi-View Video Coding
Lee et al. 3D Video System: Survey and Possible Future Research
Xing Towards a three-dimensional immersive teleconferencing system: Design and implementation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09765408

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2009765408

Country of ref document: EP