WO2022022501A1 - Video processing method, apparatus, electronic device, and storage medium - Google Patents

Video processing method, apparatus, electronic device, and storage medium

Info

Publication number
WO2022022501A1
WO2022022501A1 (PCT/CN2021/108640)
Authority
WO
WIPO (PCT)
Prior art keywords: image, texture, depth map, map, frame
Application number: PCT/CN2021/108640
Other languages: English (en), French (fr)
Inventor
王荣刚 (Wang Ronggang)
蔡砚刚 (Cai Yangang)
顾嵩 (Gu Song)
盛骁杰 (Sheng Xiaojie)
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
北京大学深圳研究生院 (Peking University Shenzhen Graduate School)
Application filed by Alibaba Group Holding Limited and Peking University Shenzhen Graduate School
Publication of WO2022022501A1

Classifications

    • H04N 13/282 — Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems
    • H04N 13/161 — Encoding, multiplexing or demultiplexing different image signal components
    • H04N 13/246 — Calibration of cameras
    • H04N 19/597 — Predictive coding specially adapted for multi-view video sequence encoding

Definitions

  • the embodiments of this specification relate to the technical field of video processing, and in particular, to a video processing method, apparatus, electronic device, and storage medium.
  • Video data is data that supports video playback for the user to watch. Generally, the video data only supports the user to watch from one viewing angle, and the user cannot adjust the viewing angle.
  • Free viewpoint video is a technology that can provide a high degree of freedom viewing experience. Users can adjust the viewing angle through interactive operations during the viewing process, and watch from the free viewpoint they want to watch, which can greatly improve the viewing experience.
  • the above coding methods have many limitations.
  • the above encoding method may cause compression loss to the spliced image, thereby affecting the image quality of the reconstructed free-viewpoint video;
  • the decoding capability of the decoding end may be difficult to meet the requirements.
  • the embodiments of this specification provide a video processing method, an apparatus, an electronic device, and a storage medium, which can reduce the requirement on the decoding capability of the decoding end and reduce the compression loss.
  • the embodiments of this specification provide a video processing method, including:
  • splicing texture maps of a plurality of frame-synchronized viewpoints into a texture map spliced image, and splicing the depth maps corresponding to the texture maps of the multiple viewpoints into a depth map spliced image; the texture map spliced image and the depth map spliced image are separately encoded to obtain the corresponding texture map compressed image and depth map compressed image.
  • the method further includes: encapsulating the texture map compressed image and the depth map compressed image in a video compression frame of the same video channel.
  • the depth map compressed image is encapsulated in a frame header area of the video compressed frame.
  • the depth map stitched image is encoded in a region-of-interest-based encoding manner to obtain a corresponding depth map compressed image.
  • encoding is performed based on a preset constant quantization parameter to obtain a corresponding depth map compressed image.
  • the foreground edge pixel area in the depth map sub-region corresponding to each viewpoint included in the depth map mosaic image is encoded using a first constant quantization parameter, and the non-foreground edge pixel area in the depth map sub-region corresponding to each viewpoint is encoded using a second constant quantization parameter, where the parameter value of the first constant quantization parameter is smaller than the parameter value of the second constant quantization parameter.
  • the pixels in the depth map sub-region corresponding to each viewpoint included in the depth map mosaic image are all or part of the set of pixels that correspond one-to-one with the pixels in the texture map sub-region of the corresponding viewpoint in the texture map mosaic image.
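  • As an illustration of the separate-encoding step (not part of the specification itself): the two spliced images can simply be fed to two independent encoder invocations, each configured for its content. The sketch below uses ffmpeg with libx265 as one possible choice; the file names, resolutions, and encoder settings are assumptions.

```python
# Illustrative sketch only: encoder, flags, and file names are assumptions.
import subprocess

def encode_mosaic_stream(raw_yuv, width, height, fps, out_file, x265_params):
    """Compress one spliced-image stream (raw 4:2:0 frames) with libx265 via ffmpeg."""
    subprocess.run([
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "yuv420p",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", raw_yuv,
        "-c:v", "libx265", "-x265-params", x265_params,
        out_file,
    ], check=True)

# Texture map spliced image: ordinary rate/quality trade-off.
encode_mosaic_stream("texture_mosaic.yuv", 3840, 2160, 30,
                     "texture_mosaic.hevc", "crf=20")
# Depth map spliced image: low constant QP, since depth errors distort reconstruction.
encode_mosaic_stream("depth_mosaic.yuv", 3840, 2160, 30,
                     "depth_mosaic.hevc", "qp=12")
```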
  • the embodiments of this specification also provide another video processing method, including:
  • when it is determined that the encoding mode belongs to depth map custom encoding, the encoding identification information corresponding to the encoding mode is stored in the encoding information area in the image set user information; the texture maps of the frame-synchronized multiple viewpoints are spliced into a texture map spliced image, the depth maps corresponding to the texture maps of the multiple viewpoints are spliced into a depth map spliced image, and the texture map spliced image and the depth map spliced image are encoded respectively to obtain a corresponding texture map compressed image and depth map compressed image.
  • the method further includes:
  • the texture map compressed image and the depth map compressed image are encapsulated in the same video channel.
  • encapsulating the texture map compressed image and the depth map compressed image in the same video channel includes:
  • the depth map compressed image is encapsulated into a depth map area in a frame header of a video compressed frame of the same video channel.
  • the method further includes:
  • when the coding mode belongs to multi-view joint coding, the viewpoint identifiers and frame identifiers of the texture maps and depth maps of the frame-synchronized multiple viewpoints are obtained and stored in the image information area in the user information of the image set.
  • the encoding identification information corresponding to the encoding mode is stored in the encoding information area in the user information of the image set, and the texture maps and depth maps of the frame-synchronized multiple viewpoints are encoded to obtain a texture map compressed image and a depth map compressed image.
  • the method further includes:
  • the texture map compressed image and the depth map compressed image are combined as an image and packaged into the same video channel to obtain a video compressed frame including the texture map compressed image and the depth map compressed image.
  • obtaining the viewpoint identifiers and frame identifiers of the texture maps and depth maps of the frame-synchronized multiple viewpoints, storing them in the image information area in the user information of the image set, and storing the encoding identification information corresponding to the encoding mode in the encoding information area in the user information of the image set, includes:
  • the viewpoint identifiers and frame identifiers of the texture maps and depth maps of the frame-synchronized multiple viewpoints are stored in the image information area in the frame header of the video compression frame according to the acquisition sequence.
  • splicing the texture maps of multiple viewpoints of frame synchronization into a texture map mosaic image, and splicing depth maps corresponding to the texture maps of the multiple viewpoints of frame synchronization into a depth map mosaic image comprising:
  • the depth maps corresponding to the texture maps of the frame-synchronized multiple viewpoints are scanned according to the raster scanning method, and the depth maps of the multiple viewpoints are spliced according to the raster scanning order to obtain the depth map spliced image.
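  • For illustration, the raster-scan splicing described above amounts to tiling the per-viewpoint maps into a grid, left to right and then top to bottom. The sketch assumes four equally sized maps on a 2x2 grid; the grid shape, dtypes, and function names are illustrative.

```python
import numpy as np

def splice_raster_scan(maps, cols):
    """Tile same-sized per-viewpoint maps into one mosaic in raster-scan order:
    left to right within a row, then down to the next row."""
    h, w = maps[0].shape[:2]
    rows = (len(maps) + cols - 1) // cols
    mosaic = np.zeros((rows * h, cols * w) + maps[0].shape[2:], dtype=maps[0].dtype)
    for idx, m in enumerate(maps):            # idx follows the viewpoint/scan order
        r, c = divmod(idx, cols)
        mosaic[r * h:(r + 1) * h, c * w:(c + 1) * w] = m
    return mosaic

# Example: texture maps T1..T4 (RGB) and depth maps D1..D4 (single channel)
textures = [np.zeros((1080, 1920, 3), dtype=np.uint8) for _ in range(4)]
depths = [np.zeros((1080, 1920), dtype=np.uint16) for _ in range(4)]
texture_mosaic = splice_raster_scan(textures, cols=2)   # 2160 x 3840 x 3
depth_mosaic = splice_raster_scan(depths, cols=2)       # 2160 x 3840
```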
  • the embodiments of this specification also provide another video processing method, including:
  • the texture map compressed image and the depth map compressed image in the video compressed frame are decoded respectively to obtain a texture map mosaic image and a depth map mosaic image; wherein the texture map mosaic image includes synchronized texture maps of multiple viewpoints, and the depth map mosaic image includes the depth maps corresponding to the texture maps of the multiple viewpoints.
  • the texture map compressed image and the depth map compressed image in the video compression frame are respectively decoded to obtain the texture map mosaic image and the depth map mosaic image, including:
  • the depth map compressed image is decoded by using a decoding method corresponding to the coding identification information of the depth map compressed image to obtain the depth map mosaic image;
  • the texture map compressed image is decoded by using a decoding method corresponding to the encoding identification information of the texture map compressed image to obtain the texture map mosaic image.
  • the texture map compressed image and the depth map compressed image in the video compression frame are respectively decoded to obtain the texture map mosaic image and the depth map mosaic image, including:
  • the texture map compressed image is decoded using a first decoding resource to obtain the texture map mosaic image, and the depth map compressed image is decoded using a second decoding resource to obtain the depth map mosaic image.
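  • For illustration, one way to use two decoding resources as described above is simply to hand the two compressed images to two independent decoder instances running concurrently; in practice one could be a hardware decoder and the other a software decoder. The sketch below uses ffmpeg as a stand-in software decoder for both; file names and the thread-pool dispatch are assumptions.

```python
# Sketch: decode the texture and depth compressed streams with two separate
# decoder invocations running in parallel threads.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def decode_to_raw(bitstream, raw_out):
    """Decode one HEVC bitstream back to raw 4:2:0 frames."""
    subprocess.run(["ffmpeg", "-y", "-i", bitstream,
                    "-f", "rawvideo", "-pix_fmt", "yuv420p", raw_out], check=True)

with ThreadPoolExecutor(max_workers=2) as pool:
    tex_job = pool.submit(decode_to_raw, "texture_mosaic.hevc", "texture_mosaic.yuv")
    dep_job = pool.submit(decode_to_raw, "depth_mosaic.hevc", "depth_mosaic.yuv")
    tex_job.result()
    dep_job.result()
```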
  • the method further includes:
  • when it is determined that the video compressed frame adopts multi-view joint coding, a corresponding multi-view joint decoding method is used to decode the texture map compressed image and the depth map compressed image in the video compressed frame to obtain synchronized texture maps and corresponding depth maps of multiple viewpoints.
  • decoding the joint compressed image in the video compression frame to obtain the synchronized texture maps and corresponding depth maps of multiple viewpoints includes:
  • the frame header of the video compression frame is decoded, and from the encoding information storage area in the frame header, the encoding identification information of the texture map compressed image and the encoding identification of the depth map compressed image are obtained;
  • the texture map compressed image and the depth map compressed image are decoded by using a multi-view joint decoding method corresponding to the encoding identification information of the texture map compressed image and the encoding identification of the depth map compressed image to obtain the synchronized multiple viewpoints The texture map and the corresponding depth map.
  • the method further includes:
  • the embodiments of this specification also provide another video processing method, including:
  • the depth map compressed image is decoded using a decoding method corresponding to the coding identification information of the depth map compressed image to obtain a depth map mosaic image, where the depth map mosaic image includes the depth maps corresponding to the texture map viewpoints of the multiple viewpoints.
  • the embodiments of this specification also provide a video processing apparatus, including:
  • a texture map splicing unit which is suitable for splicing texture maps of multiple viewpoints with frame synchronization into a texture map splicing image
  • a depth map splicing unit adapted to splicing depth maps corresponding to the texture maps of the multiple viewpoints synchronized by the frame into a depth map splicing image
  • a first encoding unit adapted to encode the texture map spliced image to obtain a corresponding texture map compressed image
  • the second encoding unit is adapted to encode the depth map spliced image to obtain a corresponding depth map compressed image.
  • the embodiments of this specification also provide another video processing apparatus, including:
  • an encoding category determination unit adapted to determine the encoding category to which the encoding mode belongs
  • a first encoding processing unit, adapted to store the encoding identification information corresponding to the encoding mode in the encoding information area in the image set user information when it is determined that the encoding mode belongs to depth map custom encoding; to splice the texture maps of the frame-synchronized multiple viewpoints into a texture map spliced image; to splice the depth maps corresponding to the texture maps of the frame-synchronized multiple viewpoints into a depth map spliced image; and to encode the texture map spliced image and the depth map spliced image respectively to obtain the corresponding texture map compressed image and depth map compressed image.
  • the device further includes:
  • An encapsulation unit adapted to encapsulate the texture map compressed image and the depth map compressed image in the same video channel.
  • the embodiments of this specification also provide another video processing apparatus, including:
  • a decapsulating unit adapted to decapsulate the free-view video stream, to obtain the video compression frame and the coding category information to which the coding mode of the video compression frame belongs;
  • the first decoding unit is adapted to, based on the coding category information of the video compressed frame, decode the texture map compressed image and the depth map compressed image in the video compressed frame respectively when it is determined that the video compressed frame adopts depth map custom coding, to obtain a texture map mosaic image and a depth map mosaic image; wherein the texture map mosaic image includes synchronized texture maps of multiple viewpoints, and the depth map mosaic image includes the depth maps corresponding to the texture maps of the multiple viewpoints.
  • the device further includes:
  • the second decoding unit is adapted to, based on the coding category information of the video compression frame, decode the joint compressed image in the video compression frame using the corresponding multi-view joint decoding method when it is determined that the video compression frame adopts multi-view joint coding, to obtain the synchronized texture maps and corresponding depth maps of multiple viewpoints.
  • the embodiments of this specification also provide another video processing apparatus, including:
  • An indication information decoding unit adapted to decode the frame header of the video compression frame, and obtain the encoding identification information of the depth map compressed image and the encoded identification information of the texture map compressed image included in the video compression frame from the frame header;
  • a texture map decoding unit, adapted to decode the texture map compressed image using a decoding method corresponding to the encoding identification information of the texture map compressed image to obtain a texture map mosaic image, the texture map mosaic image including synchronized texture maps of multiple viewpoints;
  • a depth map decoding unit, adapted to decode the depth map compressed image using a decoding method corresponding to the coding identification information of the depth map compressed image to obtain a depth map mosaic image, the depth map mosaic image including the depth maps corresponding to the texture map viewpoints of the multiple viewpoints.
  • the embodiments of this specification also provide an electronic device, including:
  • an image processing device suitable for splicing texture maps of a plurality of frame-synchronized viewpoints into a texture-map spliced image, and splicing depth maps corresponding to the texture maps of the frame-synchronized multiple viewpoints into a depth-map spliced image;
  • a first encoding device adapted to encode the texture map spliced image to obtain a corresponding texture map compressed image
  • the second encoding device is adapted to encode the depth map spliced image to obtain a corresponding depth map compressed image.
  • the embodiments of this specification also provide another electronic device, including: a first decoding device and a second decoding device, wherein:
  • the first decoding device is adapted to decode the frame header of the video compression frame; when the encoding identification information obtained from the image set user information in the frame header includes the encoding identification information of a depth map compressed image, a decoding method corresponding to the encoding identification information of the texture map compressed image is used to decode the texture map compressed image to obtain a texture map mosaic image, the texture map mosaic image including synchronized texture maps of multiple viewpoints, and the second decoding device is triggered to decode the depth map compressed image;
  • the second decoding device is adapted to decode the depth map compressed image by using a decoding method corresponding to the coding identification information of the depth map compressed image to obtain a depth map mosaic image, wherein the depth map mosaic image includes the depth maps corresponding to the texture map viewpoints of the multiple viewpoints.
  • the embodiments of this specification further provide another electronic device, including a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor, when executing the computer instructions, performs the steps of the method described in any of the foregoing embodiments.
  • the embodiments of the present specification further provide a computer-readable storage medium on which computer instructions are stored, wherein, when the computer instructions are executed, the steps of the methods described in any of the foregoing embodiments are performed.
  • by splicing the texture maps of the frame-synchronized multiple viewpoints into a texture map mosaic image, splicing the depth maps corresponding to those texture maps into a depth map mosaic image, and encoding the texture map mosaic image and the depth map mosaic image respectively, the corresponding texture map compressed image and depth map compressed image are obtained.
  • because the two mosaic images are encoded separately, an encoding method suited to the respective characteristics of the texture map and the depth map can be used for compression, so the compression loss can be reduced and the image quality of the free-viewpoint video reconstructed based on the texture map mosaic image and the depth map mosaic image can be improved.
  • the texture map compressed image and the depth map compressed image can be multiplexed into one video channel, so that the hardware interfaces of existing hardware devices can be reused; this conforms to the specifications of existing open-source frameworks and also facilitates synchronization of the depth map and the texture map.
  • when the depth map compressed image is encapsulated in the frame header area of the video compression frame and different decoding resources are used at the decoding end to decode the texture map compressed image and the depth map compressed image, the reading of irrelevant data during decoding of the depth map compressed image is reduced, which improves data processing efficiency and saves processing resources.
  • the region-of-interest-based coding method is used to encode the depth map mosaic image, so that compression encoding can be performed according to the image characteristics of the depth map mosaic image; this reduces the compression loss of the depth map and improves the image quality of the free-viewpoint video reconstructed based on the depth map.
  • encoding is performed based on a preset constant quantization parameter to obtain a corresponding depth map compressed image, which can reduce the compression loss of the ROI pixel area.
  • the first constant quantization parameter is used to encode the foreground edge pixel region in the depth map sub-region corresponding to each viewpoint included in the depth map mosaic image, and the second constant quantization parameter is used to encode the non-foreground edge pixel region in the depth map sub-region corresponding to each viewpoint. Since the foreground edge pixel region in each depth map sub-region is critical to the reconstruction quality of the free-viewpoint video, setting the parameter value of the first constant quantization parameter smaller than that of the second constant quantization parameter used for the non-foreground edge pixel region reduces the compression loss of the depth map mosaic image and thereby improves the reconstruction quality of the free-viewpoint video.
  • when the pixels of the depth map sub-region corresponding to each viewpoint included in the depth map mosaic image are only part of the set of pixels that correspond one-to-one with the pixels in the texture map sub-region of the corresponding viewpoint in the texture map mosaic image, the resolution of the depth map mosaic image can be reduced, thereby further saving transmission resources.
  • the texture maps of the frame-synchronized multiple viewpoints are spliced into a texture map spliced image, the depth maps corresponding to those texture maps are spliced into a depth map spliced image, and the texture map spliced image and the depth map spliced image are encoded respectively to obtain the corresponding texture map compressed image and depth map compressed image; the encoding category identifier and the encoding identification information corresponding to the encoding method are stored in the image set user information. Through this video processing procedure, the texture map and the depth map can be spliced and compressed separately, so that encoding methods matched to the different image characteristics of the texture map and the depth map can be used, which reduces compression loss and lowers the decoding performance requirements on the decoder.
  • by storing the scanning mode identification information corresponding to the scanning mode in the scanning mode information area of the video compression frame header, it is unnecessary to store the splicing rules and viewpoint identifiers of the texture map mosaic image and the depth map mosaic image in the video compression frame header, which saves transmission resources.
  • the video compression frame and the encoding category information of the encoding mode of the video compression frame are obtained, and, based on the encoding category information, when it is determined that the video compression frame adopts depth map custom encoding, the texture map compressed image and the depth map compressed image in the video compression frame are decoded respectively to obtain a texture map mosaic image and a depth map mosaic image; wherein the texture map mosaic image includes synchronized texture maps of multiple viewpoints, and the depth map mosaic image includes the depth maps corresponding to the texture maps of the multiple viewpoints.
  • since the texture map compressed image and the depth map compressed image in the video compression frame can be decoded separately, different decoding resources can decode them respectively, so the resources at the decoding end can be fully utilized and the limitation on decoding performance and decoding capability of a single decoding resource can be avoided.
  • multi-view joint coding can remove redundant information between views and improve coding efficiency.
  • FIG. 1 is a schematic diagram of a specific application system of a free-view video display in an embodiment of this specification
  • FIG. 2 is a schematic diagram of an interactive interface of a terminal device in an embodiment of this specification
  • FIG. 3 is a schematic diagram of a setting mode of a collection device in an embodiment of the present specification
  • FIG. 4 is a schematic diagram of another terminal device interaction interface in the embodiment of this specification.
  • FIG. 5 is a schematic diagram of a free-viewpoint video data generation process in an embodiment of the present specification
  • FIG. 6 is a schematic diagram of the generation and processing of 6DoF video data in an embodiment of this specification;
  • FIG. 7 is a schematic structural diagram of a data header file in an embodiment of the present specification.
  • FIG. 8 is a schematic diagram of a user side processing 6DoF video data in an embodiment of the present specification
  • FIG. 9 is a schematic structural diagram of a spliced image of a video frame in the embodiment of the present specification.
  • FIG. 10 is a flowchart of a video processing method in the embodiment of the present specification.
  • FIG. 11 is a schematic diagram of a texture map splicing process of a specific application scenario in the embodiment of this specification.
  • FIG. 12 is a schematic diagram of a depth map stitching process of a specific application scenario in the embodiment of this specification.
  • FIG. 13 is a schematic diagram of a raster scanning mode in the embodiment of this specification.
  • FIG. 14 is a schematic diagram of a method for setting a viewpoint identifier corresponding to a raster scanning method in an embodiment of the present specification
  • FIG. 16 is a schematic diagram of a process of encoding using depth map custom encoding in an embodiment of the present specification;
  • FIG. 17 is a schematic diagram of a process of encoding using multi-view joint encoding in an embodiment of the present specification
  • FIG. 18 is a schematic structural diagram of a multimedia channel in an embodiment of the present specification.
  • FIG. 19 is a schematic diagram of a specific type and regional distribution of user information of an image set in the embodiment of this specification;
  • FIG. 20 is a schematic diagram of the format of a video compression frame in a video channel in the embodiment of this specification;
  • FIG. 21 is a flowchart of a video processing method in an embodiment of the present specification;
  • FIG. 26 is a flowchart of another video processing method in the embodiment of this specification;
  • FIG. 27 is a schematic structural diagram of a video processing apparatus in an embodiment of the present specification.
  • FIG. 29 is a schematic structural diagram of another video processing apparatus in the embodiment of this specification.
  • FIG. 30 is a schematic structural diagram of another video processing apparatus in the embodiment of this specification.
  • FIG. 31 is a schematic structural diagram of an electronic device in an embodiment of this specification;
  • FIG. 34 is a schematic structural diagram of a video processing system in an embodiment of the present specification.
  • a specific application system for free-view video display in an embodiment of the present invention may include a collection system 11 of multiple collection devices, a server 12 , and a display device 13 , wherein the collection system 11 can collect images of the area to be viewed.
  • the acquisition system 11 or the server 12 can process the acquired synchronized multiple texture maps to generate multi-angle free viewing angle data that can support the display device 13 to perform virtual viewpoint switching.
  • the display device 13 can display reconstructed images generated based on multi-angle free viewing angle data, the reconstructed images correspond to virtual viewpoints, and can display reconstructed images corresponding to different virtual viewpoints according to user instructions, and switch the viewing position and viewing angle.
  • the process of performing image reconstruction to obtain a reconstructed image may be implemented by the display device 13, or may be implemented by a device located in a content delivery network (Content Delivery Network, CDN) by means of edge computing.
  • the user can view the area to be viewed through the display device 13 , and in this embodiment, the area to be viewed is a basketball court. As mentioned earlier, the viewing position and viewing angle can be switched.
  • users can swipe across the screen to switch virtual viewpoints.
  • the virtual viewpoint for viewing can be switched.
  • the position of the virtual viewpoint before sliding may be VP1, and the position of the virtual viewpoint after sliding may be VP2.
  • the reconstructed image displayed on the screen may be as shown in FIG. 4 .
  • the reconstructed image may be obtained by performing image reconstruction based on multi-angle free viewing angle data generated from images collected by multiple collection devices in an actual collection situation.
  • the image viewed before switching may also be a reconstructed image.
  • the reconstructed images may be frame images in the video stream.
  • the manner of switching the virtual viewpoint according to the user's instruction may be various, which is not limited here.
  • the viewpoint can be represented by coordinates of 6 degrees of freedom (DoF), where the spatial position of the viewpoint can be represented as (x, y, z) and the viewing angle can be represented as three rotation directions. Accordingly, based on the coordinates of 6 degrees of freedom, a virtual viewpoint, including position and viewing angle, can be determined.
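  • As an illustration only, a virtual viewpoint under this 6DoF description can be held in a small structure such as the following; the field names and example values are not taken from the specification.

```python
from dataclasses import dataclass

@dataclass
class Viewpoint6DoF:
    # Spatial position of the virtual viewpoint.
    x: float
    y: float
    z: float
    # Viewing angle expressed as three rotation directions (here in degrees).
    rot_x: float
    rot_y: float
    rot_z: float

# e.g. viewpoint before and after a swipe on the screen
vp1 = Viewpoint6DoF(x=0.0, y=1.6, z=5.0, rot_x=0.0, rot_y=0.0, rot_z=0.0)
vp2 = Viewpoint6DoF(x=2.0, y=1.6, z=5.0, rot_x=0.0, rot_y=-15.0, rot_z=0.0)
```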
  • the multi-angle free viewing angle data may include depth map data, which is used to provide third-dimensional information outside the plane image. Compared with other implementations, such as providing three-dimensional information through point cloud data, the data volume of the depth map data is smaller.
  • the switching of the virtual viewpoints may be performed within a certain range, which is a multi-angle free viewing angle range. That is, within the multi-angle free viewing angle range, the position and viewing angle of the virtual viewpoint can be switched arbitrarily.
  • the multi-angle free viewing angle range is related to the arrangement of the acquisition device.
  • the wider the shooting coverage of the acquisition devices, the larger the multi-angle free viewing angle range.
  • the quality of the picture displayed by the terminal device is related to the number of collection devices. Generally, the more collection devices are set, the fewer empty areas in the displayed picture.
  • the range of multi-angle free viewing angles is related to the spatial distribution of the acquisition devices.
  • the range of multi-angle free viewing angles and the interaction mode with the display device on the terminal side can be set based on the spatial distribution relationship of the collection devices.
  • texture map acquisition and depth map calculation are required, including three main steps, namely Multi-camera Video Capturing, camera internal and external parameter calculation (Camera Parameter Estimation), and Depth Map Calculation.
  • for Multi-camera Video Capturing, it is required that the video captured by each camera can be aligned at the frame level.
  • the texture image (Texture Image) can be obtained through the video acquisition of multiple cameras;
  • the camera parameters (Camera Parameter) can be obtained through the calculation of the internal and external parameters of the camera, and the camera parameters can include the internal parameter data of the camera and the external parameter data;
  • through the depth map calculation, the depth map of each viewing angle can be obtained; the multiple synchronized texture maps, the corresponding depth maps, and the camera parameters together form the 6DoF video data.
  • the texture map collected from multiple cameras, the camera parameters of all cameras, and the depth map of each camera are obtained.
  • These three parts of data can be referred to as data files in the multi-angle free-view video data, and can also be referred to as 6DoF video data. Because of these data, the client can generate virtual viewpoints according to the virtual 6 degrees of freedom (DoF) position, thereby providing a 6DoF video experience.
  • 6DoF video data and indicative data can be compressed and transmitted to the user side, and the user side can obtain the user side 6DoF expression according to the received data, that is, the aforementioned 6DoF video data and metadata.
  • the indicative data may also be called metadata, wherein the video data includes texture map and depth map data of each viewpoint corresponding to multiple cameras, and the texture map and depth map can be spliced according to certain splicing rules or splicing modes , forming a stitched image.
  • Metadata can be used to describe the data pattern of the 6DoF video data, and may specifically include: stitching pattern metadata, used to indicate the storage rules of the pixel data of the multiple texture maps and depth maps in the stitched image; padding pattern metadata, used to indicate the way edge protection is applied in the stitched image; and other metadata.
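  • A compact way to picture this metadata is a small record carried alongside the video data. The keys below mirror the three metadata kinds named above; the concrete values are purely illustrative.

```python
# Illustrative metadata record; the field values are made up for the example.
metadata = {
    "stitching_pattern": {           # layout of texture/depth pixel data in the stitched image
        "scan_order": "raster",      # raster or zigzag scanning of the viewpoints
        "grid": [2, 2],              # rows x columns of per-viewpoint sub-regions
        "view_ids": [0, 1, 2, 3],    # viewpoint identifiers in splicing order
    },
    "padding_pattern": {             # edge protection around each sub-region
        "pixels": 4,
        "mode": "replicate",
    },
    "other": {
        "depth_downsample": 2,       # e.g. depth sub-regions at half resolution
    },
}
```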
  • the user side obtains the 6DoF video data, which includes camera parameters, stitched images (texture map and depth map), and description metadata, in addition to interactive behavior data of the user side.
  • the user side can use Depth Image-Based Rendering (DIBR) for 6DoF rendering, so as to generate a virtual viewpoint image at the specific 6DoF position determined according to the user's behavior instructions.
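  • As a toy illustration of the DIBR idea only (not the full rendering pipeline): for rectified cameras, a pixel's disparity is focal_length * baseline / depth, and forward-warping texture pixels by that disparity synthesizes a horizontally shifted virtual view. The sketch assumes rectified views, metric depth, and made-up camera parameters, and ignores occlusion ordering and hole filling.

```python
import numpy as np

def dibr_horizontal_warp(texture, depth, focal_px, baseline_m):
    """Forward-warp a texture map to a virtual view shifted by baseline_m along x.
    Simplified rectified-camera DIBR: disparity = focal_px * baseline_m / depth."""
    h, w = depth.shape
    out = np.zeros_like(texture)
    disparity = focal_px * baseline_m / np.maximum(depth, 1e-3)
    xs = np.arange(w)
    for y in range(h):
        tx = np.clip((xs - disparity[y]).round().astype(int), 0, w - 1)
        out[y, tx] = texture[y, xs]   # note: proper z-ordering and hole filling omitted
    return out

texture = np.random.randint(0, 255, (1080, 1920, 3), dtype=np.uint8)
depth = np.full((1080, 1920), 3.0)    # metres, toy constant depth
virtual = dibr_horizontal_warp(texture, depth, focal_px=1500.0, baseline_m=0.1)
```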
  • the texture maps of multiple synchronized viewpoints and the corresponding depth maps are usually spliced to obtain a spliced image, and the spliced image is uniformly encoded and transmitted.
  • the texture maps from viewpoint 1 to viewpoint 8 and the depth maps from viewpoint 1 to viewpoint 8 are spliced together to form a spliced image 90, and then the spliced image 90 can be encoded using one encoding method to obtain a video compression frame, which can then be transmitted.
  • the embodiments of this specification provide a video processing method in which, based on the different image characteristics of the texture map and the depth map, the texture maps and corresponding depth maps of multiple synchronized viewpoints are spliced separately to obtain a texture map stitched image and a depth map stitched image, and the texture map stitched image and the depth map stitched image are encoded separately, so that the corresponding texture map compressed image and depth map compressed image can be obtained.
  • in this way, an encoding method suited to the characteristics of each of the texture map and the depth map can be used for encoding and compression, which reduces the compression loss and improves the image quality of the free-viewpoint video reconstructed based on the texture map stitched image and the depth map stitched image.
  • the texture maps of the frame-synchronized multiple viewpoints can be scanned according to a preset scanning method and stitched according to a preset stitching rule to obtain the texture map stitched image, as in the texture map stitching process shown in FIG. 11; the texture map stitched image video stream can then be obtained according to the frame timing.
  • the depth map corresponding to the texture maps of the frame-synchronized multiple viewpoints may be scanned according to the preset scanning method, and the depth map stitched image may be obtained by stitching according to the preset stitching rule.
  • as in the schematic diagram of the depth map stitching process shown in FIG. 12, for the depth map video streams corresponding to cameras 1 to 4, the depth map stitched image video stream can be obtained according to the frame timing.
  • the specific scanning mode may be raster scan (RasterScan), zigzag scan (ZigZag-Scan) and the like.
  • raster scanning refers to scanning from left to right and from top to bottom, scanning one line first, and then moving to the starting position of the next line to continue scanning, as shown in the schematic diagram of raster scanning mode as shown in FIG. 13 .
  • zigzag scanning scans the image along a Z-shaped trajectory.
  • the texture map and depth map of each viewpoint in space may be acquired and spliced sequentially according to the scanning trajectory.
  • a fixed scanning manner is used for scanning by default.
  • the encoding end device and the decoding end device can agree to use the raster scanning method, so that during transmission only the identification information corresponding to the texture map and depth map of each viewpoint needs to be transmitted, and the identification information corresponding to the scanning method does not.
  • the terminal device can decode according to the same scanning order.
  • for viewpoints arranged in an arbitrary manner in space, viewpoint identifier setting rules can be preset in order to simplify the storage rules of the viewpoint information for the texture map sub-regions and depth map sub-regions of the texture map mosaic image and the depth map mosaic image.
  • each camera is used as an original viewpoint.
  • a master camera may be selected, the master camera is used to ensure synchronous acquisition, the other cameras are slave cameras, and the master camera is marked as 0.
  • the embodiment of this specification does not limit the selection rule of the master camera.
  • a spatial coordinate system can be established with the main camera as the origin, as shown in Figure 14, and then, the sorting can be performed according to the spatial coordinate system (x, y, z).
  • An example sorting rule is as follows:
  • any camera is selected as the master camera, and its label is set to 0.
  • the coordinates of the 7 slave cameras in the coordinate system are obtained: (-1, 1, 1), (-1, 0, 1), (1, 0, 1), (1, 0, 0), (-1, 0, -1), (0, 0, -1), (1, 0, -1).
  • the sequence of the camera labels is determined through the above process, that is, the obtained raster scan sequence is determined.
  • the above camera labels can also represent the acquisition sequence of the texture map and depth map of each viewpoint. Therefore, the texture maps of the frame-synchronized multiple viewpoints can be scanned according to the raster scanning method and spliced according to the raster scanning order to obtain the texture map mosaic image; and the depth maps corresponding to the texture maps of the frame-synchronized multiple viewpoints can be scanned according to the raster scanning method and spliced according to the raster scanning order to obtain the depth map spliced image.
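  • The label ordering described above can be mimicked by sorting the slave cameras on their coordinates in the master-camera coordinate system. The comparison key used below (y descending, then z descending, then x ascending) reproduces the example order but is only one possible choice; the specification does not fix a particular rule.

```python
# Sketch: assign viewpoint labels from camera positions in the master-camera
# coordinate system; the master camera gets label 0, slaves get 1..N in scan order.
slave_positions = {
    "camA": (-1, 1, 1), "camB": (-1, 0, 1), "camC": (1, 0, 1), "camD": (1, 0, 0),
    "camE": (-1, 0, -1), "camF": (0, 0, -1), "camG": (1, 0, -1),
}

ordered = sorted(slave_positions.items(),
                 key=lambda kv: (-kv[1][1], -kv[1][2], kv[1][0]))

labels = {"master": 0}
labels.update({name: i + 1 for i, (name, _) in enumerate(ordered)})
print(labels)   # master=0, then slave labels 1..7 following the agreed order
```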
  • S103 Encode the texture map mosaic image and the depth map mosaic image respectively to obtain a corresponding texture map compressed image and a depth map compressed image.
  • the texture map mosaic image and the depth map mosaic image may be encoded using different encoding modes respectively.
  • the depth map stitched image may be coded using a region-of-interest-based coding manner to obtain a corresponding depth map compressed image.
  • the depth map mosaic image is encoded by the region-of-interest-based coding method, so that compression encoding can be performed according to the image characteristics of the depth map mosaic image, thereby reducing the compression loss of the depth map and improving the image quality of the free-viewpoint video reconstructed based on the depth map.
  • encoding is performed based on a preset constant quantization parameter to obtain a corresponding depth map compressed image.
  • the inventor found through research that the quality of the foreground edge pixel region in the depth map sub-region is very critical to the image quality of the reconstructed virtual viewpoint. Based on this, in order to improve the image quality of the reconstructed virtual viewpoints, in a specific implementation of this specification, the foreground edge pixel regions in the depth map sub-regions corresponding to each viewpoint included in the depth map mosaic image are encoded with a first constant quantization parameter, the non-foreground edge pixel areas in the depth map sub-regions corresponding to each viewpoint are encoded with a second constant quantization parameter, and the parameter value of the first constant quantization parameter is smaller than the parameter value of the second constant quantization parameter.
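  • A minimal sketch of how such a two-level constant-QP assignment could be derived from the depth mosaic itself (not the method mandated here): foreground edges are approximated by thresholding the local depth gradient, edge blocks get the smaller first quantization parameter, and all other blocks get the larger second one. Block size, threshold, and QP values are assumptions, and feeding the map into a particular encoder is encoder-specific and omitted.

```python
import numpy as np

def depth_qp_map(depth, block=16, grad_thresh=30.0, qp_edge=18, qp_other=34):
    """Per-block QP map for a depth mosaic: blocks containing strong depth
    discontinuities (foreground edges) use the smaller first constant QP."""
    gy, gx = np.gradient(depth.astype(np.float32))
    edge = np.hypot(gx, gy) > grad_thresh
    h, w = depth.shape
    bh, bw = h // block, w // block
    qp = np.full((bh, bw), qp_other, dtype=np.uint8)
    for r in range(bh):
        for c in range(bw):
            if edge[r * block:(r + 1) * block, c * block:(c + 1) * block].any():
                qp[r, c] = qp_edge        # first constant QP < second constant QP
    return qp

depth_mosaic = np.random.randint(0, 65535, (2160, 3840), dtype=np.uint16)
qp_map = depth_qp_map(depth_mosaic)       # one QP value per 16x16 block
```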
  • for the texture map spliced image, an encoding method suitable for texture maps may be selected as needed from VP8, VP9, AV1, H.264/AVC, H.265/HEVC, Versatile Video Coding (VVC), Audio and Video Coding Standard (AVS), AVS+, AVS2, AVS3, and the like, to encode the texture map spliced image and obtain a texture map compressed image.
  • H.264/AVC was jointly developed by ITU-T and ISO/IEC and is positioned to cover the entire video application field.
  • ITU-T named this standard H.264 (formerly H.26L), while ISO/IEC called it MPEG-4 Advanced Video Coding (AVC).
  • H.265 is a new video coding standard developed by ITU-T VCEG after H.264. The full standard is called High Efficiency Video Coding (HEVC).
  • AVS+, AVS2 and AVS3 are optimization and evolution technologies of AVS.
  • step S101 and step S102 may be performed sequentially (step S101 may be performed first, or step S102 may be performed first), or may be performed in parallel.
  • after step S103, the following steps may be performed:
  • the texture map compressed image and the depth map compressed image can be multiplexed into one video channel, so that the hardware interface of existing hardware devices can be reused; this conforms to the specifications of existing open-source frameworks and also facilitates the synchronization of the depth map and the texture map.
  • the depth map compressed image may be encapsulated in a frame header area of the video compressed frame.
  • when the depth map compressed image is encapsulated in the frame header area of the video compression frame and different decoding resources are used at the decoding end to decode the texture map compressed image and the depth map compressed image, the reading of irrelevant data during decoding of the depth map compressed image is reduced, which improves data processing efficiency and saves processing resources.
  • the pixels in the depth map sub-region corresponding to each viewpoint included in the depth map mosaic image are all or part of the set of pixels in the texture map mosaic image that correspond one-to-one with the pixels in the texture map sub-region of the corresponding viewpoint.
  • for example, the depth map mosaic image may be obtained by down-sampling the original depth maps or the pixels in the original depth map mosaic image; in the down-sampled depth map mosaic image, the pixels in the depth map sub-region corresponding to each viewpoint are only part of the pixel set that corresponds one-to-one with the pixels in the texture map sub-region of the corresponding viewpoint, which reduces the resolution of the depth map mosaic image and thus further saves transmission resources.
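  • For example, the half-resolution depth mosaic alluded to above can be produced by simple 2x subsampling of the full-resolution depth mosaic. Nearest-neighbour decimation is used in this sketch; the actual down-sampling filter is left open by the text.

```python
import numpy as np

def downsample_depth(depth_mosaic, factor=2):
    """Keep every `factor`-th depth pixel in each direction, so each viewpoint's
    depth sub-region carries only part of the pixels matching its texture sub-region."""
    return depth_mosaic[::factor, ::factor]

full_res = np.random.randint(0, 65535, (2160, 3840), dtype=np.uint16)
half_res = downsample_depth(full_res)     # 1080 x 1920
```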
  • step S151 Determine the encoding category to which the encoding method belongs. If it is determined that the encoding method belongs to depth map custom encoding, step S152 may be performed; if it is determined that the encoding method belongs to multi-view joint encoding, step S153 may be performed.
  • the encoding system or the electronic device used for encoding can perform actions in two aspects: one aspect is to store the encoding identification information corresponding to the encoding method in the encoding information area of the image set user information, which makes it convenient for the decoding system or the electronic device used for decoding to identify it and use the corresponding decoding method; the other aspect is to perform the encoding using the depth map custom encoding.
  • referring to FIG. 16, which is a schematic diagram of the encoding process using depth map custom encoding: the texture maps T1 to T4 of the four frame-synchronized viewpoints can be spliced according to the preset scanning method to obtain a texture map spliced image Tx, which is then encoded using the first encoding method to obtain the texture map compressed image Te; the depth maps D1 to D4 corresponding to the texture maps T1 to T4 of the four frame-synchronized viewpoints are spliced according to the preset scanning method to obtain a depth map spliced image Dx, which is then encoded using the second encoding method to obtain the depth map compressed image De.
  • different encoding methods are used for the texture map spliced image Tx and the depth map spliced image Dx, so that an encoding method better matched to the image characteristics of each can be selected.
  • the texture maps T1 to T4 of the frame-synchronized multiple viewpoints may be scanned in a raster scanning manner and spliced in raster scanning order to obtain the texture map spliced image Tx; and the depth maps D1 to D4 corresponding to the texture maps of the frame-synchronized multiple viewpoints may be scanned according to the raster scanning method and spliced according to the raster scanning order to obtain the depth map spliced image Dx.
  • the first encoding mode may be selected from encoding modes suitable for texture maps such as VP8, VP9, AV1, H.264/AVC, H.265/HEVC, VVC, AVS, AVS+, AVS2, and AVS3; the second encoding mode may use region-of-interest-based coding, for example, for the region of interest (ROI) pixel area in the depth map spliced image, encoding can be performed based on a preset constant quantization parameter to obtain the corresponding depth map compressed image.
  • an encoder or encoding software that supports a corresponding encoding manner may be used for implementation.
  • S153 Acquire the viewpoint identifiers and frame identifiers of the texture maps and depth maps of the frame-synchronized multiple viewpoints, store them in the image information area in the user information of the image set, and store the encoding identifier information corresponding to the encoding mode to the encoding information area in the user information of the image set; and encoding the texture maps and depth maps of the frame-synchronized multiple viewpoints using a preset multi-view joint encoding method to obtain a joint compressed image.
  • in addition to depth map custom coding, there are also coding categories such as multi-view joint coding; therefore, multi-view joint coding can also be used as needed to encode the texture maps and depth maps of multiple viewpoints.
  • Using multi-view joint coding can remove redundant information between views and improve coding efficiency.
  • the encoding system or the electronic device used for encoding may perform actions in two aspects: one aspect is to acquire the viewpoint identifiers and frame identifiers of the texture maps and depth maps of the frame-synchronized multiple viewpoints, store them in the image information area in the user information of the image set, and store the encoding identification information corresponding to the encoding method in the encoding information area in the user information of the image set, so that the decoding system or the electronic device used for decoding can identify it and use the corresponding decoding method; the other aspect is to perform the encoding using the multi-view joint coding method.
  • referring to FIG. 17, which is a schematic diagram of a coding process using multi-view joint coding in an embodiment of the present specification, for the viewpoints from viewpoint 0 to viewpoint N-1, the texture map of one viewpoint is selected as the independent view.
  • for example, the texture map of viewpoint 0 is selected as the independent view and encoded by an HEVC-compliant video encoding device; the resulting texture map compressed image is output to the encapsulation device.
  • the preset depth map encoding device is used to encode the depth map of the viewpoint 0 with reference to the information of the texture compressed image output by the HEVC-compliant video encoding device.
  • the obtained depth map compressed image on the one hand, is output to the encapsulation device for further encapsulation, and on the other hand, it is output to the video encoding device for dependent views and the depth map encoding device for dependent views of other viewpoints for use.
  • the HEVC-compliant video encoding device, the depth map encoding device, the video encoding device for dependent views, the depth map encoding device for dependent views, and other encoding devices jointly encode the texture map compressed images and depth map compressed images.
  • each encoding apparatus may adopt the same encoding manner, or may select different encoding manners according to different characteristics of the texture map and the depth map.
  • the specific multi-view joint encoding mode may be selected from encoding modes such as VP8, VP9, AV1, H.264/AVC, H.265/HEVC, VVC, AVS, AVS+, AVS2, and AVS3.
  • the involved coding-related indication information such as the coding category identifier corresponding to the coding mode, and the coding identification information of the specific coding mode, may be stored in the image set user information.
  • after step S152 or step S153, the encoding and compression of the video image is completed, and the multimedia content (video, audio, subtitles, chapter information, etc.) generated by the encoder can be packaged for video transmission and synchronized playback of the different multimedia contents.
  • the specific encapsulation format can be understood as the container of the media.
  • the encoded data of the video channel and the encoded data of the audio channel can be packaged into a complete multimedia file.
  • encoding information, encapsulation information, and the like can be regarded as image set user information. Based on different encapsulation formats and encoding methods, the different types of user information in the image set user information can be stored in one area, or set in different storage areas as needed. This is described below through specific application scenarios.
  • the following describes, taking the depth map custom coding method and the multi-view joint coding method as examples, how the texture map compressed image and the depth map compressed image obtained by encoding, together with the image set user information, may be stored.
  • in order to multiplex one video channel, the texture map compressed image and the depth map compressed image may be encapsulated in the video compression frames of the same video channel. As shown in the schematic diagram of the multimedia information channel structure in FIG. 18, the multimedia information channel 180 includes a video channel 181 and an audio channel 182; to transmit the encoded texture map compressed image and depth map compressed image, both can be encapsulated in the video channel 181, and more specifically in the video compression frames of the same video channel.
  • a series of video compression frames are encapsulated according to time series, and each video compression frame 18F includes a frame header 18A and a frame body 18B.
  • the depth map compressed image obtained by using the depth map encoding method can be encapsulated into the depth map area in the frame header 18A of the video compression frame 18F of the video channel 181, and the texture map The compressed image is encapsulated in the frame body 18B of the video compressed frame 18F of the video channel 181 .
  • the texture map compressed image and the depth map compressed image obtained by encoding in step S153 can be similarly encapsulated in the video compression frames of the video channel.
  • the synchronized texture map compressed image of each viewpoint and the depth map compressed image of the same viewpoint may be sequentially stored in the frame body 18B of the video compression frame 18F.
  • the following describes the specific storage method of the user information of the image set after the encapsulation processing with the example of depth map custom coding and multi-view joint coding.
  • the encoding category identifier corresponding to the encoding method may be stored, as a subset of the image set user information, in the image set user information area in the channel header of the video channel, in order to achieve fast decoding and save storage resources. For example, if only the two encoding categories of depth map custom encoding and multi-view joint encoding are available, a dedicated 1-bit character can be set in the channel header information to identify whether the video channel contains video compression frames obtained by the depth map custom coding method or video compression frames obtained by the multi-view joint coding method.
  • the encoding identification information may be stored in the encoding information area in the frame header of the video compression frame.
  • if multi-view joint encoding is used, the specific identification information in the frame header of the video compression frame can identify that the specific encoding method of the depth map is 3D-HEVC, 3D-AVC, or the like; if depth map custom encoding is used, the specific identification information in the frame header of the video compression frame can identify that the encoding method of the texture map mosaic image and the depth map mosaic image is AVC, HEVC, AVS, AVS2, AVS3, or the like.
  • In addition, for the joint compressed image obtained by the multi-view joint coding method, the identifiers of the viewpoints contained in the joint compressed image and the frame identifier corresponding to the joint spliced image may be stored in the image information storage area in the frame header of the video compression frame.
  • Referring to the schematic diagram of the types and area distribution of the image set user information shown in Figure 19, an encoding category area Ca1 can be set in the channel header Ca of the video channel to store the encoding category identifier, which represents the encoding category to which the specific coding method belongs, such as depth map custom coding or multi-view joint coding; and encoding identification information is stored in the encoding information area Ia1 in the frame header Ia of the video compression frame I, to indicate which specific coding method or methods, such as 3D-HEVC, 3D-AVC, AVC, HEVC, AVS, etc., are used within the depth map custom coding or multi-view joint coding.
  • Corresponding to multi-view joint coding, the frame header Ia of the video compression frame I may include an image information storage area Ia2, which can store the viewpoint identifiers and frame identifiers of the texture maps and depth maps targeted by the multi-view joint coding.
  • As described in the foregoing embodiments, corresponding to multi-view joint coding, the texture map compressed images and depth map compressed images of the viewpoints can be interleaved in order in the frame body Ib. As shown in the format schematic diagram of a video compression frame in the video channel in Figure 20, the texture map compressed image and depth map compressed image of each viewpoint in the video compression frame I can be arranged adjacently as one image combination, the images of the viewpoints are arranged in the processing order of the encoding devices, and the viewpoint identifiers and frame identifiers of the texture map compressed images and depth map compressed images are arranged in the same order in the image information storage area Ia2. For example, for the same frame, the viewpoint identifiers corresponding to the texture map compressed images and depth map compressed images can be arranged as an array, which can be represented as V0D0V1D1V2D2..., where V represents a texture map and D represents a depth map.
  • Corresponding to depth map custom coding, only the compressed image of the texture map is stored in the frame body Ib, while the compressed image of the depth map can be stored in the depth map storage area Ia3 of the frame header Ia.
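  • A minimal sketch of the signalling just described, assuming (hypothetically) a 1-bit coding category flag in the channel header Ca, an encoding information area Ia1 and an image information area Ia2 in the frame header Ia, and a depth map storage area Ia3 used only under depth map custom coding; the names and field widths are illustrative, not the patent's normative syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DEPTH_CUSTOM = 0      # coding category: depth map custom coding
MULTIVIEW_JOINT = 1   # coding category: multi-view joint coding


@dataclass
class ChannelHeader:
    coding_category_bit: int  # Ca1: 1 bit suffices when only two categories exist


@dataclass
class FrameHeader:
    codec_ids: List[str]                                       # Ia1: e.g. ["HEVC"] or ["3D-HEVC"]
    view_frame_ids: List[str] = field(default_factory=list)    # Ia2: "V0 D0 V1 D1 ..." ordering
    depth_area: Optional[bytes] = None                         # Ia3: only used by custom coding


def build_joint_image_info(num_views: int, frame_id: int) -> List[str]:
    """Arrange viewpoint/frame identifiers in the same order as the frame body."""
    ids: List[str] = []
    for v in range(num_views):
        ids += [f"V{v}F{frame_id}", f"D{v}F{frame_id}"]
    return ids


# Example: a channel carrying multi-view joint coded frames using 3D-HEVC.
channel = ChannelHeader(coding_category_bit=MULTIVIEW_JOINT)
header = FrameHeader(codec_ids=["3D-HEVC"], view_frame_ids=build_joint_image_info(4, 0))
```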
  • FIG. 19 and FIG. 20 only illustrate the schematic structure of one video compressed frame in a video channel.
  • multiple video compressed frames are sequentially stored in the video channel to form a video stream.
  • the free-view video stream can be transmitted through the video channel, and then the free-view video can be played based on the free-view video stream.
  • the following steps can be used to process the obtained free-view video stream:
  • S211 Decapsulate the free-view video stream to obtain the video compression frame and the coding category information to which the coding mode of the video compression frame belongs.
  • the channel header of the video channel is decapsulated to obtain the encoding category information to which the encoding method of the video compressed frame belongs.
  • S212 Based on the coding category information of the video compression frame, when it is determined that the video compression frame adopts depth map custom coding, decode the texture map compressed image and the depth map compressed image in the video compression frame respectively to obtain a texture map mosaic image and a depth map mosaic image; wherein the texture map mosaic image includes synchronized texture maps of multiple viewpoints, and the depth map mosaic image includes depth maps corresponding to the texture maps of the multiple viewpoints.
  • S213 Based on the coding category information of the video compression frame, when it is determined that the video compression frame adopts multi-view joint coding, decode the texture map compressed image and the depth map compressed image in the video compression frame with the corresponding multi-view joint decoding method, to obtain synchronized texture maps of multiple viewpoints and the corresponding depth maps.
  • Step S213, as an optional step, can implement decoding of the video compressed frame obtained by the multi-view joint encoding method.
  • For step S212 or step S213, referring to the flowchart of the decoding method shown in Figure 22, a video compression frame obtained by the depth map custom coding method can be decoded as follows:
  • S2121 Decode the frame header of the video compression frame, and obtain, from the encoding information storage area in the frame header, the encoding identification information of the texture map compressed image and the encoding identification information of the depth map compressed image.
  • S2122 Decode the depth map compressed image by using a decoding method corresponding to the encoding identification information of the depth map compressed image, to obtain the depth map mosaic image.
  • S2123 Decode the compressed texture image by using a decoding method corresponding to the encoded identification information of the compressed texture image, to obtain the texture image mosaic image.
  • For step S2122 and step S2123, different decoding resources may be used for the two decoding operations.
  • For a video compression frame obtained by the depth map custom coding method, as in the foregoing example, the depth map compressed image is stored in the depth map storage area of the frame header and the texture map compressed image is stored in the frame body Ib, as shown in Figure 19; the texture map compressed image may be decoded with a first decoding resource to obtain the texture map mosaic image, and the depth map compressed image may be decoded with a second decoding resource to obtain the depth map mosaic image.
  • Referring to the flowchart of another decoding method shown in Figure 23, for the multi-view joint coding method, decoding can proceed as follows:
  • S2131 Decode the frame header of the video compression frame, and obtain, from the encoding information storage area in the frame header, the encoding identification information of the texture map compressed image and the encoding identification of the depth map compressed image.
  • S2132 Decode the texture map compressed image and the depth map compressed image with the multi-view joint decoding method corresponding to the encoding identification information of the texture map compressed image and the encoding identification of the depth map compressed image, to obtain the synchronized texture maps of the multiple viewpoints and the corresponding depth maps.
  • the texture map compressed image and the depth map compressed image are interleaved and stored in the frame body Ib, as shown in Figure 20.
  • the texture map compressed image may be decoded using the first decoding resource to obtain a texture map
  • the depth map compressed image may be decoded using the second decoding resource to obtain a depth map.
  • Except for the texture map of the independent viewpoint, the texture maps and depth maps of the other viewpoints may, during decoding, refer to texture maps or depth maps already decoded from other viewpoints.
  • the first decoding resource is a hardware decoding resource
  • the second decoding resource is a software decoding resource
  • the first decoding resource is a graphics processing unit (Graphics Processing Unit, GPU) decoding chip
  • the second decoding resource is a central processing unit (Central Processing Unit, CPU) decoding chip.
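  • A minimal sketch of splitting the two decode jobs across different decoding resources, assuming hypothetical stand-in decoders first_resource_decode and second_resource_decode; the point is only that, because the texture map and depth map were encoded independently, their compressed images can be handed to independent resources (for example a GPU decoding chip and a CPU software decoder) and run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def first_resource_decode(bitstream: bytes):
    """Stand-in for the first decoding resource (e.g. a hardware/GPU decoder)."""
    return ("texture-mosaic", len(bitstream))   # a real decoder would return decoded pixels


def second_resource_decode(bitstream: bytes):
    """Stand-in for the second decoding resource (e.g. a software/CPU decoder)."""
    return ("depth-mosaic", len(bitstream))


def decode_custom_coded_frame(texture_bitstream: bytes, depth_bitstream: bytes):
    # Texture and depth were encoded separately, so the two decodes can run in parallel
    # on different resources.
    with ThreadPoolExecutor(max_workers=2) as pool:
        tex_future = pool.submit(first_resource_decode, texture_bitstream)
        dep_future = pool.submit(second_resource_decode, depth_bitstream)
        texture_mosaic = tex_future.result()
        depth_mosaic = dep_future.result()
    return texture_mosaic, depth_mosaic
```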
  • the image reconstruction of the virtual viewpoint can be performed based on the free viewpoint according to the decoded image data.
  • the specific data referenced for image reconstruction is slightly different, and the following is an example description through two specific embodiments:
  • Referring to the flowchart of the video processing method shown in Figure 24, for video frames obtained with the depth map custom decoding method, image reconstruction can be performed in the following manner:
  • S241 Obtain the parameter data of the viewpoints from the frame header of the video compression frame.
  • S242 Acquire a virtual viewpoint, and based on the parameter data of the viewpoints, select, from the texture map mosaic image and the depth map mosaic image, the texture maps in the texture map sub-regions and the depth maps in the depth map sub-regions that match the position of the virtual viewpoint, as the reference texture maps and reference depth maps.
  • the virtual viewpoint may be determined based on user interaction behavior, or determined based on preset position information of the virtual viewpoint.
  • texture maps and depth maps of some viewpoints may be selected for image reconstruction of virtual viewpoints based on spatial position relationship information.
  • the reference texture map and the reference depth map may be determined based on the distance relationship between the virtual viewpoint position and the texture map and depth map in each texture map sub-region in the texture map mosaic image and the depth map mosaic image.
  • the spatial position relationship information may be determined based on the parameter information of the texture map and the depth map, and the parameter information may be stored in the frame header of the video compression frame or the channel header of the video channel.
  • the parameter information may include internal parameter information and external parameter information of each viewpoint.
  • combined rendering is performed based on the reference texture map and the reference depth map, and the image of the virtual viewpoint can be obtained.
  • Reconstructing the virtual viewpoint image in this manner allows the texture map mosaic image and the depth map mosaic image to be encoded with coding methods matched to their image characteristics, so the compression loss of the decoded texture map mosaic image and depth map mosaic image is smaller and the quality of the reconstructed virtual viewpoint image can be improved.
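  • The following sketch illustrates one way the reference texture maps and reference depth maps could be selected by the distance between the virtual viewpoint and each captured viewpoint, assuming the camera positions are available from the parameter data; the render_dibr callable stands in for the combined (depth-image-based rendering style) rendering step and is not implemented here.

```python
import numpy as np


def select_reference_views(virtual_pos, view_positions, k=2):
    """Pick the k captured viewpoints closest to the virtual viewpoint position."""
    dists = [np.linalg.norm(np.asarray(virtual_pos) - np.asarray(p)) for p in view_positions]
    return list(np.argsort(dists)[:k])


def reconstruct_virtual_view(virtual_pos, view_positions,
                             texture_subregions, depth_subregions, render_dibr):
    # texture_subregions[i] / depth_subregions[i] are the per-viewpoint sub-regions
    # cut out of the texture map mosaic image and the depth map mosaic image.
    ref_ids = select_reference_views(virtual_pos, view_positions)
    ref_textures = [texture_subregions[i] for i in ref_ids]
    ref_depths = [depth_subregions[i] for i in ref_ids]
    # Combined rendering of the reference texture/depth maps into the virtual view.
    return render_dibr(ref_textures, ref_depths, ref_ids, virtual_pos)
```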
  • Referring to the flowchart of the video processing method shown in Figure 25, for video frames obtained with the multi-view joint decoding method, image reconstruction can be performed in the following manner:
  • S251 Obtain the identifier of the viewpoint, the frame identifier of the compressed video frame, and the parameter data of each viewpoint from the frame header of the compressed video frame.
  • S252 Acquire a virtual viewpoint, and based on the identifiers and parameter data of the viewpoints, select, from the synchronized texture maps of the multiple viewpoints and the corresponding depth maps, the texture maps and depth maps matching the virtual viewpoint as the reference texture maps and reference depth maps.
  • the virtual viewpoint may be determined based on user interaction behavior, or determined based on preset position information of the virtual viewpoint.
  • texture maps and depth maps of some viewpoints may be selected for image reconstruction of virtual viewpoints based on spatial position relationship information.
  • the reference texture map and the reference depth map may be determined based on the distance relationship between the virtual viewpoint position and the texture map and depth map in each texture map sub-region in the texture map mosaic image and the depth map mosaic image.
  • the spatial position relationship information may be determined based on the parameter information of the texture map and the depth map, and the parameter information may be stored in the frame header of the video compression frame or the channel header of the video channel.
  • the parameter information may include internal parameter information and external parameter information of each viewpoint.
  • combined rendering is performed based on the reference texture map and the reference depth map, and the image of the virtual viewpoint can be obtained.
  • Reconstructing the virtual viewpoint image with the above method allows the texture map and depth map of each viewpoint to be encoded separately with a coding method matched to their image characteristics, so the compression loss of the decoded texture maps and depth maps of the viewpoints is small and the image quality of the reconstructed virtual viewpoint can be improved.
  • To enrich the user's visual experience, augmented reality (AR) special effects can be implanted into the reconstructed free-viewpoint image. In some embodiments, referring to the flowchart of the video processing method shown in Figure 26, the AR special effects can be implanted in the following manner:
  • S261 Acquire the virtual rendering target object in the image of the virtual viewpoint.
  • certain objects in the image of the free-view video may be determined as virtual rendering target objects based on certain indication information, and the indication information may be generated based on user interaction, or may be based on certain preset trigger conditions or a third party. command is obtained.
  • In an optional embodiment, the virtual rendering target object in the image of the virtual viewpoint may be acquired in response to a special effect generation interactive control instruction.
  • S262 Acquire a virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object.
  • the implanted AR special effects are presented in the form of virtual information images.
  • the virtual information image may be generated based on augmented reality special effect input data of the target object.
  • a virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object may be acquired.
  • the virtual information image corresponding to the virtual rendering target object may be generated in advance, or may be generated immediately in response to the special effect generation instruction.
  • In specific implementation, a virtual information image matching the position of the virtual rendering target object can be obtained based on the position, obtained by three-dimensional calibration, of the virtual rendering target object in the reconstructed image. In this way the obtained virtual information image better matches the position of the virtual rendering target object in three-dimensional space, the displayed virtual information image better conforms to the real state in three-dimensional space, and the displayed composite image is therefore more realistic and vivid, enhancing the user's visual experience.
  • a virtual information image corresponding to the target object can be generated according to a preset special effect generation method based on the augmented reality special effect input data of the virtual rendering target object.
  • For example, the augmented reality special effect input data of the target object may be input into a preset three-dimensional model, and, based on the position of the virtual rendering target object in the image obtained by three-dimensional calibration, a virtual information image matching the virtual rendering target object is output.
  • As another example, the augmented reality special effect input data of the virtual rendering target object may be input into a preset machine learning model, and, based on the position of the virtual rendering target object in the image obtained by three-dimensional calibration, a virtual information image matching the virtual rendering target object is output.
  • S263 Perform synthesis processing on the virtual information image and the image of the virtual viewpoint and display.
  • the virtual information image and the image of the virtual viewpoint can be synthesized and displayed in various ways. Two specific implementation examples are given below:
  • Example 1 Perform fusion processing on the virtual information image and the corresponding image to obtain a fusion image, and display the fusion image;
  • Example 2 The virtual information image is superimposed on the corresponding image to obtain a superimposed composite image, and the superimposed composite image is displayed.
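  • A minimal sketch of the two compositing options above, assuming the virtual information image carries an alpha channel: Example 1 corresponds to a per-pixel fusion, and Example 2 to superimposing the virtual information image onto the virtual viewpoint image at a determined position. The function names and the RGBA assumption are illustrative only.

```python
import numpy as np


def fuse(virtual_info_rgba: np.ndarray, view_rgb: np.ndarray) -> np.ndarray:
    """Example 1: per-pixel fusion using the virtual information image's alpha channel."""
    alpha = virtual_info_rgba[..., 3:4].astype(np.float32) / 255.0
    fused = alpha * virtual_info_rgba[..., :3] + (1.0 - alpha) * view_rgb
    return fused.astype(view_rgb.dtype)


def overlay(virtual_info_rgba: np.ndarray, view_rgb: np.ndarray, top: int, left: int) -> np.ndarray:
    """Example 2: superimpose the virtual information image at the determined position.

    Assumes the virtual information image lies fully inside the virtual viewpoint image.
    """
    out = view_rgb.copy()
    h, w = virtual_info_rgba.shape[:2]
    patch = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = fuse(virtual_info_rgba, patch)
    return out
```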
  • the obtained composite image may be displayed directly, or the obtained composite image may be inserted into a video stream to be played for playback and display.
  • the fused image may be inserted into the video stream to be played for display.
  • the free viewpoint video may include a special effect display identifier.
  • In specific implementation, the superposition position of the virtual information image in the image of the virtual viewpoint may be determined based on the special effect display identifier, and the virtual information image may then be superimposed and displayed at the determined superposition position in the image of the virtual viewpoint.
  • The video processing apparatus 270 may include: a texture map stitching unit 271, a depth map stitching unit 272, a first coding unit 273 and a second coding unit 274, wherein:
  • the texture map splicing unit 271 is suitable for splicing texture maps of multiple viewpoints of frame synchronization into a texture map splicing image
  • the depth map splicing unit 272 is adapted to splicing depth maps corresponding to the texture maps of the multiple viewpoints of the frame synchronization into a depth map splicing image;
  • the first encoding unit 273 is adapted to encode the texture map spliced image to obtain a corresponding texture map compressed image
  • the second encoding unit 274 is adapted to encode the depth map mosaic image to obtain a corresponding depth map compressed image.
  • the texture maps and depth maps of multiple viewpoints used to form synchronization of free-viewpoint videos can be compressed and encoded by means of depth map custom coding.
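  • A minimal sketch of the apparatus 270 pipeline: stitching the frame-synchronized per-viewpoint texture maps (and, likewise, the depth maps) into a single mosaic in raster-scan order, then encoding the two mosaics with separate encoders. The encode_texture and encode_depth callables are placeholders for whatever codecs the first and second coding units actually use.

```python
import numpy as np


def stitch_raster(images, cols):
    """Stitch same-sized per-viewpoint images into one mosaic in raster-scan order."""
    h, w = images[0].shape[:2]
    rows = -(-len(images) // cols)  # ceiling division
    mosaic = np.zeros((rows * h, cols * w) + images[0].shape[2:], dtype=images[0].dtype)
    for idx, img in enumerate(images):
        r, c = divmod(idx, cols)
        mosaic[r * h:(r + 1) * h, c * w:(c + 1) * w] = img
    return mosaic


def process(texture_maps, depth_maps, cols, encode_texture, encode_depth):
    texture_mosaic = stitch_raster(texture_maps, cols)    # texture map stitching unit 271
    depth_mosaic = stitch_raster(depth_maps, cols)        # depth map stitching unit 272
    texture_compressed = encode_texture(texture_mosaic)   # first coding unit 273
    depth_compressed = encode_depth(depth_mosaic)         # second coding unit 274
    return texture_compressed, depth_compressed
```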
  • The video processing apparatus 280 may include: an encoding category determination unit 281 and a first encoding processing unit 282, wherein:
  • The encoding category determination unit 281 is adapted to determine the encoding category to which the encoding method belongs;
  • The first encoding processing unit 282 is adapted to, when it is determined that the encoding method belongs to depth map custom coding, store the encoding category identifier and encoding identification information corresponding to the encoding method in the encoding information area in the image set user information; splice the texture maps of the frame-synchronized multiple viewpoints into a texture map mosaic image; splice the depth maps corresponding to the texture maps of the frame-synchronized multiple viewpoints into a depth map mosaic image; and encode the texture map mosaic image and the depth map mosaic image separately to obtain the corresponding texture map compressed image and depth map compressed image.
  • In specific implementation, the video processing apparatus 280 may further include a second encoding processing unit 283, adapted to, when it is determined that the encoding method belongs to multi-view joint coding, acquire the viewpoint identifiers and frame identifiers of the texture maps and depth maps of the frame-synchronized multiple viewpoints and store them in the image information area in the image set user information, store the encoding identification information corresponding to the encoding method in the encoding information area in the image set user information, and encode the texture maps and depth maps of the frame-synchronized multiple viewpoints with a preset multi-view joint coding method, to obtain a texture map compressed image and a depth map compressed image.
  • the video processing apparatus 280 may further include: an encapsulation unit 284, adapted to encapsulate the texture map compressed image and the depth map compressed image in the same video channel.
  • the encapsulation unit 284 may place the texture map compressed image in the frame body of the video compression frame for the texture map compressed image and the depth map compressed image obtained by using the depth map custom encoding process, and compress the depth map The image is placed in the frame header of the video compressed frame.
  • The encapsulation unit 284 may, for the texture map compressed image and depth map compressed image obtained by the multi-view joint coding process, interleave the texture map compressed image and the depth map compressed image in the frame body of the video compression frame.
  • the embodiments of this specification also provide another video processing apparatus, which can perform decapsulation and decoding processing on video compressed frames.
  • the video processing apparatus 290 may include: a decapsulation unit 291 and a first decoding unit 292, in:
  • the decapsulating unit 291 is adapted to decapsulate the free-view video stream to obtain the video compression frame and the coding category information to which the coding mode of the video compression frame belongs;
  • The first decoding unit 292 is adapted to, based on the coding category information of the video compression frame, when it is determined that the video compression frame adopts depth map custom coding, decode the texture map compressed image and the depth map compressed image in the video compression frame respectively to obtain a texture map mosaic image and a depth map mosaic image; wherein the texture map mosaic image includes synchronized texture maps of multiple viewpoints, and the depth map mosaic image includes depth maps corresponding to the texture maps of the multiple viewpoints.
  • In specific implementation, the video processing apparatus may further include: a second decoding unit 293, adapted to, based on the coding category information of the video compression frame, when it is determined that the video compression frame adopts multi-view joint coding, decode the texture map compressed image and the depth map compressed image in the video compression frame with the corresponding multi-view joint decoding method, to obtain synchronized texture maps of multiple viewpoints and the corresponding depth maps.
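  • A minimal sketch of the dispatch performed after decapsulation, assuming the hypothetical 1-bit coding category flag and frame-header fields sketched earlier; it only shows the control flow between the paths of the first decoding unit 292 and the second decoding unit 293, with the actual decoders passed in as callables.

```python
def decode_video_compression_frame(coding_category_bit, frame_header, frame_body,
                                   decode_custom, decode_joint):
    """Dispatch one decapsulated frame to the decoding path of unit 292 or unit 293.

    coding_category_bit: hypothetical 1-bit flag read from the channel header
    (0 = depth map custom coding, 1 = multi-view joint coding).
    """
    if coding_category_bit == 0:
        # Depth map custom coding: texture mosaic bitstream in the frame body,
        # depth mosaic bitstream in the depth map area of the frame header.
        return decode_custom(texture_bitstream=frame_body[0],
                             depth_bitstream=frame_header["depth_area"])
    # Multi-view joint coding: V0 D0 V1 D1 ... bitstreams interleaved in the frame body.
    return decode_joint(frame_body, frame_header["view_frame_ids"])
```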
  • The video processing apparatus 300 shown in FIG. 30 may include: an indication information decoding unit 301, a texture map decoding unit 302 and a depth map decoding unit 303, wherein:
  • The indication information decoding unit 301 is adapted to decode the frame header of the video compression frame and obtain, from the frame header, the encoding identification information of the depth map compressed image and the encoding identification information of the texture map compressed image included in the video compression frame;
  • The texture map decoding unit 302 is adapted to decode the texture map compressed image with a decoding method corresponding to the encoding identification information of the texture map compressed image, to obtain a texture map mosaic image, where the texture map mosaic image includes synchronized texture maps of multiple viewpoints;
  • The depth map decoding unit 303 is adapted to decode the depth map compressed image with a decoding method corresponding to the encoding identification information of the depth map compressed image, to obtain a depth map mosaic image, where the depth map mosaic image includes depth maps corresponding to the texture maps of the multiple viewpoints.
  • The embodiments of this specification also provide a corresponding electronic device. The electronic device can be used to encode and compress video data, as shown in FIG. 31.
  • the electronic device 310 may include:
  • the image processing device 311 is adapted to splicing the texture maps of the frame-synchronized multiple viewpoints into a texture map spliced image, and splicing the depth maps corresponding to the texture maps of the frame-synchronized multiple viewpoints into a depth map spliced image;
  • the first encoding device 312 is adapted to encode the texture map spliced image to obtain a corresponding texture map compressed image
  • the second encoding device 313 is adapted to encode the depth map spliced image to obtain a corresponding depth map compressed image.
  • the image processing apparatus 311 may be implemented by a preset data processing chip, for example, any one of a single-core or multi-core processor, a GPU chip, and an FPGA chip.
  • the first encoding device 312 and the second encoding device 313 may be implemented by a special encoder, encoding software, or encoding software in cooperation with an encoder.
  • the first encoding device 312 and the second encoding device 313 may be executed independently, or may be executed in cooperation.
  • the first encoding device 312 and the second encoding device 313 may also be used to perform multi-view joint encoding.
  • In that case, the first encoding device encodes the texture maps, and the second encoding device encodes the depth maps.
  • the electronic device can be used to decode the video compressed frames of the video channel.
  • The electronic device 320 includes a first decoding device 321 and a second decoding device 322, wherein:
  • The first decoding device 321 is adapted to decode the frame header of the video compression frame and, when the encoding identification information obtained from the image set user information of the frame header includes the encoding identification information of the depth map compressed image and the encoding identification information of the texture map compressed image, decode the texture map compressed image with a decoding method corresponding to the encoding identification information of the texture map compressed image to obtain a texture map mosaic image, where the texture map mosaic image includes synchronized texture maps of multiple viewpoints, and trigger the second decoding device to decode the depth map compressed image;
  • the second decoding device 322 is adapted to decode the depth map compressed image by using a decoding method corresponding to the encoding identification information of the depth map compressed image to obtain a depth map mosaic image, where the depth map mosaic image includes a depth maps corresponding to the texture map viewpoints of the multiple viewpoints.
  • In specific implementation, the above-mentioned electronic device can also be used to decode video compression frames obtained with the multi-view joint coding method, in which case the first decoding device 321 can perform texture map decoding and the second decoding device 322 can perform depth map decoding.
  • the embodiment of this specification further provides another electronic device, wherein, as shown in FIG. 33 , the electronic device 330 may include a memory 331 and a processor 332 .
  • the memory 331 stores computer instructions that can be executed on the processor 332, wherein, when the processor executes the computer instructions, the steps of the method described in any of the foregoing embodiments can be executed.
  • the electronic device may also include other electronic components or assemblies.
  • In specific implementation, the electronic device 330 may further include a communication component 333, which may communicate with the acquisition system or a cloud server to obtain synchronized texture maps of multiple viewpoints for generating free-viewpoint video frames, or to obtain free-viewpoint video compression frames produced by the encoding and encapsulation of the video processing methods in the foregoing embodiments of this specification; the processor 332 can then decapsulate and decode the video compression frames obtained by the communication component 333 and, according to the virtual viewpoint position, perform free-viewpoint video reconstruction of the virtual viewpoint.
  • In specific implementation, the electronic device 330 may further include a display component 334 (e.g., a display, a touch screen, or a projector) to display the reconstructed image of the virtual viewpoint.
  • the electronic device can be set as a cloud server or a server cluster, or as a local server to perform compression processing on free-view video frames before transmission.
  • the electronic device 330 may specifically be a handheld electronic device such as a mobile phone, a notebook computer, a desktop computer, a set-top box, or other electronic device with video processing and playback functions.
  • the received video frames can be compressed.
  • Decoding processing is performed, and based on the decoded video frame and the acquired virtual viewpoint, the image of the virtual viewpoint is reconstructed.
  • the virtual viewpoint may be determined based on user interaction behavior, or determined based on preset position information of the virtual viewpoint.
  • the memory, the processor, the communication component and the display component may communicate through a bus network.
  • In specific implementation, the communication component 333, the display component 334, and the like may be components arranged inside the electronic device 330, or may be external devices connected through expansion components such as expansion interfaces, docking stations, and expansion cables.
  • In specific implementation, the processor 332 may be implemented by any one or more of a central processing unit (Central Processing Unit, CPU) (for example, a single-core or multi-core processor), a CPU group, a graphics processing unit (Graphics Processing Unit, GPU), an AI chip, an FPGA chip, and the like, working in coordination.
  • an electronic device cluster composed of multiple electronic devices may also be used for collaborative implementation.
  • the video processing system A0 includes a collection array A1 composed of multiple collection devices, a data processing device A2, a server cluster A3 in the cloud, a playback control device A4, a playback terminal A5 and an interactive terminal A6.
  • each acquisition device in the acquisition array A1 can be placed in a fan shape at different positions in the on-site acquisition area, and can synchronously acquire video data streams from corresponding angles in real time.
  • the collection device may also be arranged in the ceiling area of the basketball stadium, on the basketball hoop, and the like.
  • the collection devices can be arranged and distributed along a straight line, a fan shape, an arc line, a circle or an irregular shape.
  • the specific arrangement can be set according to one or more factors such as the specific on-site environment, the number of acquisition devices, the characteristics of the acquisition devices, and imaging effect requirements.
  • the collection device may be any device with a camera function, such as a common camera, a mobile phone, a professional camera, and the like.
  • the data processing device A2 can be placed in a non-acquisition area on site, and can be regarded as an on-site server.
  • the data processing device A2 may send a stream pull instruction to each acquisition device in the acquisition array A1 through a wireless local area network, respectively, and each acquisition device in the acquisition array A1 will obtain a stream based on the stream pull instruction sent by the data processing device A2.
  • the video data stream is transmitted to the data processing device A2 in real time.
  • each acquisition device in the acquisition array A1 can transmit the obtained video data stream to the data processing device A2 in real time through the switch A7.
  • the acquisition array A1 and the switch A7 together form an acquisition system.
  • When the data processing device A2 receives a video frame interception instruction, it intercepts the video frames at the specified frame moment from the received multi-channel video data streams to obtain frame images of multiple synchronized video frames, and uploads the obtained multiple synchronized video frames at the specified frame moment to the server cluster A3 in the cloud.
  • the server cluster A3 in the cloud uses the received original texture maps of multiple synchronized video frames as an image combination, determines the parameter data corresponding to the image combination and the estimated depth map corresponding to each original texture map in the image combination, and Based on the corresponding parameter data of the image combination, the pixel data of the texture map and the depth data of the corresponding depth map in the image combination, frame image reconstruction is performed based on the acquired virtual viewpoint path, and corresponding multi-angle free-view video data is obtained.
  • the server can be placed in the cloud, and in order to process data in parallel more quickly, a server cluster A3 in the cloud can be composed of multiple different servers or server groups according to different data processed.
  • the cloud server cluster A3 may include: a first cloud server A31, a second cloud server A32, a third cloud server A33, and a fourth cloud server A34.
  • the first cloud server A31 can be used to determine the corresponding parameter data of the image combination;
  • the second cloud server A32 can be used to determine the estimated depth map of the original texture map of each viewpoint in the image combination and perform depth map correction processing
  • the third cloud server A33 can be based on the position information of the virtual viewpoint, based on the corresponding parameter data of the image combination, the texture map and the depth map of the image combination, use the virtual viewpoint reconstruction based on the depth map (Depth Image Based Rendering, DIBR ) algorithm to reconstruct frame images to obtain images of virtual viewpoints;
  • the fourth cloud server A34 can be used to generate free viewpoint videos (multi-angle free viewpoint videos).
  • In specific implementation, the first cloud server A31, the second cloud server A32, the third cloud server A33, and the fourth cloud server A34 may also each be a server group composed of a server array or a server sub-cluster, which is not limited in this embodiment of the present invention.
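  • A minimal orchestration sketch of the division of labour described above, with the per-stage processing passed in as callables; it only mirrors the roles of the cloud servers A31 to A34 and makes no assumption about how each stage is actually implemented.

```python
def cloud_pipeline(sync_texture_maps, virtual_view_path,
                   estimate_params, estimate_depth, dibr_reconstruct, assemble_video):
    """Illustrative pipeline over the cloud servers A31-A34."""
    params = estimate_params(sync_texture_maps)                            # A31: parameter data
    depth_maps = [estimate_depth(t, params) for t in sync_texture_maps]    # A32: depth estimation
    frames = [dibr_reconstruct(sync_texture_maps, depth_maps, params, vp)  # A33: DIBR reconstruction
              for vp in virtual_view_path]
    return assemble_video(frames)                                          # A34: free-viewpoint video
```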
  • the playback control device A4 can insert the received free-view video frame into the to-be-played video stream, and the playback terminal A5 receives the to-be-played video stream from the playback control device A4 and plays it in real time.
  • the playback control device A4 may be a manual playback control device or a virtual playback control device.
  • a dedicated server that can automatically switch video streams can be set up as a virtual playback control device to control the data source.
  • a broadcast director control device such as a broadcast director station, can be used as a playback control device in the embodiment of the present invention.
  • In specific implementation, the data processing device A2 can be placed in the on-site non-collection area or in the cloud according to the specific situation, and the server (cluster) and the playback control device can be placed in the on-site non-collection area, in the cloud, or at the terminal access side according to the specific situation.
  • this embodiment is not intended to limit the specific implementation and protection scope of the present invention.
  • In specific implementation, the data processing device A2 or the cloud server cluster A3 can use the video processing methods in the embodiments of this specification to encode and encapsulate the texture map and the corresponding depth map of each viewpoint; the playback terminal A5, the interactive terminal A6, and the like can decapsulate and decode the received video compression frames.
  • the playback terminal A5 and the interactive terminal A6 may be specially provided with at least one or more combinations of corresponding decoding chips, decoding modules or decoding software to perform decoding processing of video compression frames.
  • The embodiments of this specification further provide a computer-readable storage medium on which computer instructions are stored, wherein, when the computer instructions are executed, the steps of the methods described in any of the foregoing embodiments may be performed.
  • the computer-readable storage medium may be various suitable readable storage mediums such as an optical disc, a mechanical hard disk, and a solid-state hard disk.


Abstract

A video processing method, apparatus, electronic device, and storage medium, wherein a video processing method includes: stitching the texture maps of multiple frame-synchronized viewpoints into a texture map mosaic image; stitching the depth maps corresponding to the texture maps of the frame-synchronized multiple viewpoints into a depth map mosaic image; and encoding the texture map mosaic image and the depth map mosaic image separately, to obtain a corresponding texture map compressed image and depth map compressed image. The above scheme can lower the requirements on the decoding capability of the decoding end and reduce compression loss.

Description

视频处理方法、装置、电子设备及存储介质
本申请要求2020年07月31日递交的申请号为202010762409.2、发明名称为“视频处理方法、装置、电子设备及存储介质”中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本说明书实施例涉及视频处理技术领域,尤其涉及一种视频处理方法、装置、电子设备及存储介质。
背景技术
视频数据是支持视频播放,以供用户观看的数据,通常,视频数据仅支持用户从一个视角进行观看,用户无法调整观看的视角。自由视点视频是一种能够提供高自由度观看体验的技术,用户可以在观看过程中通过交互操作,调整观看视角,从想观看的自由视点角度进行观看,从而可以大幅提升观看体验。
为实现自由视点视频数据的存储、传输及播放,需要对视频图像进行编码及封装等处理,目前通常采用的编码方式是:将同步的多个视点的纹理图和对应的深度图进行拼接,得到拼接图像,并将拼接图像采用统一进行编码。
然而,上述编码方式存在诸多局限性。比如,上述编码方式可能会对拼接图像造成压缩损失,进而影响重建得到的自由视点视频的图像质量;又如,由于拼接图像的分辨率较大,解码端的解码能力可能难以满足要求。
发明内容
有鉴于此,本说明书实施例提供视频处理方法、装置、电子设备及存储介质,能够降低对解码端解码能力的要求,并减少压缩损失。
本说明书实施例提供了一种视频处理方法,包括:
将帧同步的多个视点的纹理图拼接为纹理图拼接图像;
将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像;
将所述纹理图拼接图像和所述深度图拼接图像分别编码,得到对应的纹理图压缩图像和深度图压缩图像。
可选地,所述方法还包括:将所述纹理图压缩图像和所述深度图压缩图像封装在同一视频通道的视频压缩帧。
可选地,将所述深度图压缩图像封装在所述视频压缩帧的帧头区域。
可选地,将所述深度图拼接图像采用基于感兴趣区域的编码方式进行编码,得到对应的深度图压缩图像。
可选地,对于所述深度图拼接图像中的ROI像素区域,基于预设的恒定量化参数进行编码,得到对应的深度图压缩图像。
可选地,对于所述深度图拼接图像中包含的各视点对应的深度图子区域中的前景边缘像素区域采用第一恒定量化参数进行编码,对于各视点对应的深度图子区域中非前景边缘像素区域采用第二恒定量化参数进行编码,所述第一恒定量化参数的参数值小于所述第二恒定量化参数的参数值。
可选地,所述深度图拼接图像中包含的各视点对应的深度图子区域的像素点为纹理图拼接图像中与对应视点的纹理图子区域中的像素点一一对应的像素点集合中的全部或部分像素点。
本说明书实施例还提供了另一种视频处理方法,包括:
确定编码方式所属编码类别;
若确定编码方式属于深度图自定义编码,则将所述编码方式对应的编码标识信息存储至图像集用户信息中的编码信息区域;以及将帧同步的多个视点的纹理图拼接为纹理图拼接图像,将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像,以及将所述纹理图拼接图像和所述深度图拼接图像分别编码,得到对应的纹理图压缩图像和深度图压缩图像。
可选地,所述方法还包括:
将所述纹理图压缩图像和所述深度图压缩图像封装在同一视频通道。
可选地,所述将所述纹理图压缩图像和所述深度图压缩图像封装在同一视频通道,包括:
将所述深度图压缩图像封装至所述同一视频通道的视频压缩帧的帧头中的深度图区域。
可选地,所述方法还包括:
若确定编码方式属于多视联合编码,则获取所述帧同步的多个视点的纹理图和深度图的视点标识及帧标识,并存储至所述图像集用户信息中的图像信息区域,将所述编码方式对应的编码标识信息存储至所述图像集用户信息中的编码信息区域;以及将所述帧同步的多个视点的纹理图和深度图分别采用采用预设的多视联合编码方式进行编码,得到纹理图压缩图像和深度图压缩图像。
可选地,所述方法还包括:
将所述纹理图压缩图像和所述深度图压缩图像作为图像组合,封装至同一视频通道,得到包含所述纹理图压缩图像和所述深度图压缩图像的视频压缩帧。
可选地,所述获取所述帧同步的多个视点的纹理图和深度图的视点标识及帧标识,并存储至所述图像集用户信息中的图像信息区域,将所述编码方式对应的编码标识信息存储至所述图像集用户信息中的编码信息区域,包括:
将所述编码方式对应的编码类别标识存储至视频通道的通道头中编码类别区域;
将所述编码标识信息存储至视频压缩帧帧头中的编码信息区域;
将所述帧同步的多个视点的纹理图和深度图的视点标识及帧标识按照获取顺序存储于所述视频压缩帧帧头中的图像信息区域。
可选地,所述将帧同步的多个视点的纹理图拼接为纹理图拼接图像,将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像,包括:
按照光栅扫描方式扫描所述帧同步的多个视点的纹理图,并按照光栅扫描顺序将所述多个视点的纹理图进行拼接,得到所述纹理图拼接图像;
按照所述光栅扫描方式扫描所述帧同步的多个视点的纹理图对应的深度图,并按照光栅扫描顺序将所述多个视点的深度图进行拼接,得到所述深度图拼接图像。
本说明书实施例还提供了另一种视频处理方法,包括:
对自由视点视频流进行解封装,获得视频压缩帧及所述视频压缩帧的编码方式所属编码类别信息;
基于所述视频压缩帧的编码类别信息,在确定所述视频压缩帧采用深度图自定义编码时,对所述视频压缩帧中的纹理图压缩图像和深度图压缩图像分别解码,得到纹理图拼接图像和深度图拼接图像;其中,所述纹理图拼接图像包括同步的多个视点的纹理图;所述深度图拼接图像包括所述多个视点的纹理图对应的深度图。
可选地,所述对所述视频压缩帧中的纹理图压缩图像和深度图压缩图像分别解码,得到纹理图拼接图像和深度图拼接图像,包括:
对所述视频压缩帧的帧头进行解码,从所述帧头中的编码信息存储区域,获取到所述纹理图压缩图像的编码标识信息和所述深度图压缩图像的编码标识信息;
采用与所述深度图压缩图像的编码标识信息对应的解码方式对所述深度图压缩图像进行解码,得到所述深度图拼接图像;
采用与所述纹理图压缩图像的编码标识信息对应的解码方式对所述纹理图压缩图像进行解码,得到所述纹理图拼接图像。
可选地,所述对所述视频压缩帧中的纹理图压缩图像和深度图压缩图像分别解码,得到纹理图拼接图像和深度图拼接图像,包括:
利用第一解码资源对所述纹理图压缩图像进行解码,得到纹理图拼接图像;
利用第二解码资源对所述深度图压缩图像进行解码,得到深度图拼接图像。
可选地,所述方法还包括:
从所述视频压缩帧的帧头中获取视点的参数数据;
获取虚拟视点,基于所述视点的参数数据,选择所述纹理图拼接图像和所述深度图拼接图像中与虚拟视点位置匹配的相应纹理图子区域中的纹理图和深度图子区域中的深度图作为参考纹理图和参考深度图;
基于所述参考纹理图和参考深度图,重建得到所述虚拟视点的图像。
可选地,所述方法还包括:
基于所述视频压缩帧的编码类别信息,在确定所述视频压缩帧采用多视联合编码时,对所述视频压缩帧中的纹理图压缩图像和深度图压缩图像采用对应的多视联合解码方式进行解码,得到同步的多个视点的纹理图以及对应的深度图。
可选地,所述对所述视频压缩帧中的联合压缩图像解码,得到同步的多个视点的纹理图以及对应的深度图,包括:
对所述视频压缩帧的帧头进行解码,从所述帧头中的编码信息存储区域,获取到所述纹理图压缩图像的编码标识信息和深度图压缩图像的编码标识;
采用与所述纹理图压缩图像的编码标识信息和深度图压缩图像的编码标识对应的多视联合解码方式对所述纹理图压缩图像和深度图压缩图像进行解码,得到所述同步的多个视点的纹理图以及对应的深度图。
可选地,所述方法还包括:
从所述视频压缩帧的帧头中获得视点的标识及所述视频压缩帧的帧标识和各视点的参数数据;
获取虚拟视点,基于各视点的标识和参数数据,从所述同步的多个视点的纹理图以及对应的深度图中选择与所述虚拟视点匹配的纹理图和深度图作为参考纹理图和参考深度图;
基于所述参考纹理图和参考深度图,重建得到所述虚拟视点的图像。
本说明书实施例还提供了另一种视频处理方法,包括:
解码视频压缩帧的帧头,从所述帧头中获得所述视频压缩帧中包括的深度图压缩图像的编码标识信息和纹理图压缩图像的编码标识信息;
采用与所述纹理图压缩图像的编码标识信息对应的解码方式对所述纹理图压缩图像进行解码,得到纹理图拼接图像,所述纹理图拼接图像包括同步的多个视点的纹理图;
采用与所述深度图压缩图像的编码标识信息对应的解码方式对所述深度图压缩图像进行解码,得到深度图拼接图像,所述深度图拼接图像包括与所述多个视点的纹理图视点对应的深度图。
本说明书实施例还提供了一种视频处理装置,包括:
纹理图拼接单元,适于将帧同步的多个视点的纹理图拼接为纹理图拼接图像;
深度图拼接单元,适于将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像;
第一编码单元,适于将所述纹理图拼接图像进行编码,得到对应的纹理图压缩图像;
第二编码单元,适于将所述深度图拼接图像进行编码,得到对应的深度图压缩图像。
本说明书实施例还提供了另一种视频处理装置,包括:
编码类别确定单元,适于确定编码方式所属编码类别;
第一编码处理单元,适于在确定编码方式属于深度图自定义编码时,将所述编码方式对应的编码标识信息存储至图像集用户信息中的编码信息区域;以及将帧同步的多个视点的纹理图拼接为纹理图拼接图像,将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像,以及将所述纹理图拼接图像和所述深度图拼接图像分别编码,得到对应的纹理图压缩图像和深度图压缩图像。
可选地,所述装置还包括:
封装单元,适于将所述纹理图压缩图像和所述深度图压缩图像封装在同一视频通道。
本说明书实施例还提供了另一种视频处理装置,包括:
解封装单元,适于对自由视点视频流进行解封装,获得视频压缩帧及所述视频压缩帧的编码方式所属编码类别信息;
第一解码单元,适于基于所述视频压缩帧的编码类别信息,在确定所述视频压缩帧采用深度图自定义编码时,对所述视频压缩帧中的纹理图压缩图像和深度图压缩图像分别解码,得到纹理图拼接图像和深度图拼接图像;其中,所述纹理图拼接图像包括同步的多个视点的纹理图;所述深度图拼接图像包括所述多个视点的纹理图对应的深度图。
可选地,所述装置还包括:
第二解码单元,适于基于所述视频压缩帧的编码类别信息,在确定所述视频压缩帧采用多视联合编码时,对所述视频压缩帧中的联合压缩图像采用对应的多视联合解码方式进行解码,得到同步的多个视点的纹理图以及对应的深度图。
本说明书实施例还提供了另一种视频处理装置,包括:
指示信息解码单元,适于解码所述视频压缩帧的帧头,从所述帧头中获得所述视频压缩帧中包括的深度图压缩图像的编码标识信息和纹理图压缩图像的编码标识信息;
纹理图解码单元,适于采用与所述纹理图压缩图像的编码标识信息对应的解码方式对所述纹理图压缩图像进行解码,得到纹理图拼接图像,所述深度图拼接图像包括同步的多个视点的纹理图;
深度图解码单元,适于采用与所述深度图压缩图像的编码标识信息对应的解码方式对所述深度图压缩图像进行解码,得到深度图拼接图像,所述深度图拼接图像包括与所述多个视点的纹理图视点对应的深度图。
本说明书实施例还提供了一种电子设备,包括:
图像处理装置,适于将帧同步的多个视点的纹理图拼接为纹理图拼接图像,以及将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像;
第一编码装置,适于将所述纹理图拼接图像进行编码,得到对应的纹理图压缩图像;
第二编码装置,适于将所述深度图拼接图像进行编码,得到对应的深度图压缩图像。
本说明书实施例还提供了另一种电子设备,包括:第一解码装置和第二解码装置, 其中:
所述第一解码装置,适于解码视频压缩帧的帧头,并当从所述帧头的图像集用户信息中获得所述视频压缩帧中获得编码标识信息包括深度图压缩图像的编码标识信息和纹理图压缩图像的编码标识信息时,采用与所述纹理图压缩图像的编码标识信息对应的解码方式对所述纹理图压缩图像进行解码,得到纹理图拼接图像,所述纹理图拼接图像包括同步的多个视点的纹理图,以及触发所述第二解码装置对所述深度图压缩图像进行解码;
所述第二解码装置,适于采用与所述深度图压缩图像的编码标识信息对应的解码方式对所述深度图压缩图像进行解码,得到深度图拼接图像,所述深度图拼接图像包括与所述多个视点的纹理图视点对应的深度图。
本说明书实施例还提供了另一种电子设备,包括存储器和处理器,所述存储器上存储有可在所述处理器上运行的计算机指令,其中,所述处理器运行所述计算机指令时执行前述任一实施例所述方法的步骤。
本说明书实施例还提供来一种计算机可读存储介质,其上存储有计算机指令,其中,所述计算机指令运行时执行前述任一实施例所述方法的步骤。
与现有技术相比,本说明书实施例的技术方案具有以下有益效果:
一方面,采用本说明书实施例的视频处理方法,通过将帧同步的多个视点的纹理图拼接为纹理图拼接图像,将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像,并将所述纹理图拼接图像和所述深度图拼接图像分别编码,得到对应的纹理图压缩图像和深度图压缩图像,这一视频处理过程由于对纹理图拼接图像和深度图拼接图像分别进行编码,因而可以基于纹理图和深度图的特征采用相应的编码方法进行编码压缩,从而可以减少压缩损失,进而可以提高基于所述纹理图拼接图像和深度图拼接图像重建得到的自由视点视频的图像质量。
进一步地,通过将所述纹理图压缩图像和所述深度图压缩图像封装在同一视频通道,也即纹理图压缩图像和深度图压缩图像可以复用一个视频通道,从而可以复用已有硬件设备的硬件接口,并符合现有开源框架的规范,也便于深度图与纹理图进行同步。
将所述深度图压缩图像封装在所述视频压缩帧的帧头区域,在解码端采用不同的解码资源分别对所述纹理图压缩图像和深度图压缩图像进行解码时,可以减少用于深度图压缩图像解码过程中无关数据的读取,从而可以提高数据处理效率,节约处理资源。
进一步地,对于深度图拼接图像采用基于感兴趣区域的编码方式进行编码,从而能够按照深度图拼接图像的图像特征进行压缩编码,从而可以减少深度图的压缩损失,进而可以提高基于深度图重建得到的自由视点视频的图像质量。
进一步地,对于所述深度图拼接图像中的ROI像素区域,基于预设的恒定量化参数进行编码,得到对应的深度图压缩图像,可以减少所述ROI像素区域的压缩损失。
进一步地,对于所述深度图拼接图像中包含的各视点对应的深度图子区域中的前景边缘像素区域采用第一恒定量化参数进行编码,对于各视点对应的深度图子区域中非前景边缘像素区域采用第二恒定量化参数进行编码,由于深度图拼接图像中深度图子区域中的前景边缘像素区域对于自由视点视频的重建质量非常关键,故对于深度图子区域中的前景边缘像素区域采用的第一恒定量化参数的参数值小于对于所述深度图子区域中非前景边缘像素区域采用的第二量化参数的参数值,可以减小深度图拼接图像的压缩损失,进而可以提高重建得到的自由视点视频的重建质量。
进一步地,所述深度图拼接图像中包含的各视点对应的深度图子区域的像素点为纹理图拼接图像中与对应视点的纹理图子区域中的像素点一一对应的像素点集合中的部分像素点,可以降低深度图拼接图像的分辨率,从而可以进一步节约传输资源。
在本说明书实施例的另一视频处理方法中,通过确定编码方式所属编码类别,在确定编码方式属于深度图自定义编码时,将帧同步的多个视点的纹理图拼接为纹理图拼接图像,将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像,以及将所述纹理图拼接图像和所述深度图拼接图像分别编码,得到对应的纹理图压缩图像和深度图压缩图像;将所述编码方式对应的编码类别标识及编码标识信息存储至图像集用户信息,通过这一视频处理过程,可以实现纹理图和深度图的分别单独拼接以及分别编码压缩,从而可以基于纹理图和深度图的不同图像特征采用相应的编码方式,减少压缩损失,并降低对解码端解码性能的要求。
进一步地,通过将所述扫描方式对应的扫描方式标识信息存储至所述视频压缩帧帧头中的扫描方式信息区域,使得视频压缩帧帧头中无须存储纹理图拼接图像和深度图拼接图像具体的拼接规则和视点标识,从而可以节约传输资源。
本说明书实施例的另一方面,通过对自由视点视频流进行解封装,获得视频压缩帧及所述视频压缩帧的编码方式所属编码类别信息,基于所述视频压缩帧的编码类别信息,在确定所述视频压缩帧采用深度图自定义编码时,对所述视频压缩帧中的纹理图压缩图像和深度图压缩图像分别解码,得到纹理图拼接图像和深度图拼接图像;其中,所述纹理图拼接图像包括同步的多个视点的纹理图;所述深度图拼接图像包括所述多个视点的纹理图对应的深度图。由这一视频处理过程可知,在确定所述视频压缩帧采用深度图自定义编码时,可以对所述视频压缩帧中的纹理图压缩图像和深度图压缩图像分别解码,从而可以采用不同的解码资源对所述纹理图压缩图像和深度图压缩图像分别解码,从而可以充分地利用解码端的资源,避免受到一种解码资源的解码性能和解码能力的限制。
此外,采用多视联合编码,可以去除视点间的冗余信息,提高编码效率。
附图说明
图1是本说明书实施例中一种自由视点视频展示的具体应用系统示意图;
图2是本说明书实施例中一种终端设备交互界面示意图;
图3是本说明书实施例中一种采集设备设置方式的示意图;
图4是本说明书实施例中另一种终端设备交互界面示意图;
图5是本说明书实施例中一种自由视点视频数据生成过程的示意图;
图6是本说明书实施例中一种6DoF视频数据的生成及处理的示意图;
图7是本说明书实施例中一种数据头文件的结构示意图;
图8是本说明书实施例中一种用户侧对6DoF视频数据处理的示意图;
图9是本说明书实施例中一种视频帧的拼接图像的结构示意图;
图10是本说明书实施例中一种视频处理方法的流程图;
图11是本说明书实施例中一具体应用场景的纹理图拼接过程示意图;
图12是本说明书实施例中一具体应用场景的深度图拼接过程示意图;
图13是本说明书实施例中一种光栅扫描方式示意图;
图14是本说明书实施例中一种光栅扫描方式对应的视点标识设定方法的示意图;
图15是本说明书实施例中另一种视频处理方法的流程图;
图16是本说明书实施例中一种采用深度图自定义编码进行编码的过程示意图;
图17是本说明书实施例中一种采用多视联合编码进行编码的过程示意图;
图18是本说明书实施例中一种多媒体通道的结构示意图;
图19是本说明书实施例中一种图像集用户信息的具体类型及区域分布示意图;
图20是本说明书实施例中一种视频通道中视频压缩帧的格式示意图;
图21是本说明书实施例中一种视频处理方法的流程图;
图22是本说明书实施例中一种解码方法的流程图;
图23是本说明书实施例中另一种解码方法的流程图;
图24是本说明书实施例中另一种视频处理方法的流程图;
图25是本说明书实施例中另一种视频处理方法的流程图;
图26是本说明书实施例中另一种视频处理方法的流程图;
图27是本说明书实施例中一种视频处理装置的结构示意图;
图28是本说明书实施例中另一种视频处理装置的结构示意图;
图29是本说明书实施例中另一种视频处理装置的结构示意图;
图30是本说明书实施例中另一种视频处理装置的结构示意图;
图31是本说明书实施例中一种电子设备的结构示意图;
图32是本说明书实施例中另一种电子设备的结构示意图;
图33是本说明书实施例中另一种电子设备的结构示意图;
图34是本说明书实施例中一种视频处理系统的结构示意图。
具体实施方式
为使本领域技术人员更好地理解和实施本说明书中的实施例,以下首先结合附图及具体应用场景对自由视点视频的实现方式进行示例性介绍。
参考图1,本发明实施例中一种自由视点视频展示的具体应用系统,可以包括多个采集设备的采集系统11、服务器12和显示设备13,其中采集系统11,可以对待观看区域进行图像采集;采集系统11或者服务器12,可以对获取到的同步的多个纹理图进行处理,生成能够支持显示设备13进行虚拟视点切换的多角度自由视角数据。显示设备13可以展示基于多角度自由视角数据生成的重建图像,重建图像对应于虚拟视点,根据用户指示可以展示对应于不同虚拟视点的重建图像,切换观看的位置和观看角度。
在具体实现中,进行图像重建,得到重建图像的过程可以由显示设备13实施,也可以由位于内容分发网络(Content Delivery Network,CDN)的设备以边缘计算的方式实施。可以理解的是,图1仅为示例,并非对采集系统、服务器、终端设备以及具体实现方式的限制。
继续参考图1,用户可以通过显示设备13对待观看区域进行观看,在本实施例中,待观看区域为篮球场。如前所述,观看的位置和观看角度是可以切换的。
举例而言,用户可以在屏幕滑动,以切换虚拟视点。在本发明一实施例中,结合参考图2,用户手指沿D 22方向滑动屏幕时,可以切换进行观看的虚拟视点。继续参考图3,滑动前的虚拟视点的位置可以是VP 1,滑动屏幕切换虚拟视点后,虚拟视点的位置可以是VP 2。结合参考图4,在滑动屏幕后,屏幕展示的重建图像可以如图4所示。重建图像,可以是基于由实际采集情境中的多个采集设备采集到的图像生成的多角度自由视角数据进行图像重建得到的。
可以理解的是,切换前进行观看的图像,也可以是重建图像。重建图像可以是视频流中的帧图像。另外,根据用户指示切换虚拟视点的方式可以是多样的,在此不做限制。
在具体实施中,视点可以用6自由度(Degree of Freedom,DoF)的坐标表示,其中,视点的空间位置可以表示为(x,y,z),视角可以表示为三个旋转方向
Figure PCTCN2021108640-appb-000001
相应地,基于6自由度的坐标,可以确定虚拟视点,包括位置和视角。
虚拟视点是一个三维概念,生成重建图像需要三维信息。在一种具体实现方式中,多角度自由视角数据中可以包括深度图数据,用于提供平面图像外的第三维信息。相比于其它实现方式,例如通过点云数据提供三维信息,深度图数据的数据量较小。
在本发明实施例中,虚拟视点的切换可以在一定范围内进行,该范围即为多角度自由视角范围。也即,在多角度自由视角范围内,可以任意切换虚拟视点的位置以及视角。
多角度自由视角范围与采集设备的布置相关,采集设备的拍摄覆盖范围越广,则多角度自由视角范围越大。终端设备展示的画面质量,与采集设备的数量相关,通常,设置的采集设备的数量越多,展示的画面中空洞区域越少。
此外,多角度自由视角的范围与采集设备的空间分布相关。可以基于采集设备的空间分布关系设置多角度自由视角的范围以及在终端侧与显示设备的交互方式。
本领域技术人员可以理解的是,上述各实施例以及对应的附图仅为举例示意性说明,并非对采集设备的设置以及多角度自由视角范围之间关联关系的限定,也并非对交互方式以及显示设备展示效果的限定。
结合参照图5,为进行自由视点视频重建,需要进行纹理图的采集和深度图计算,包括了三个主要步骤,分别为多摄像机视频采集(Multi-camera Video Capturing),摄像机内外参计算(Camera Parameter Estimation),以及深度图计算(Depth Map Calculation)。对于多摄像机视频采集来说,要求各个摄像机采集的视频可以帧级对齐。其中,通过多摄像机的视频采集可以得到纹理图(Texture Image);通过摄像机内外参计算,可以得到摄像机参数(Camera Parameter),摄像机参数可以包括摄像机内部参数数据和外部参数数据;通过深度图计算,可以得到深度图(Depth Map),多个同步的纹理图及对应视角的深度图和摄像机参数,形成6DoF视频数据。
在本说明书实施例方案中,并不需要特殊的摄像机,比如光场摄像机,来做视频的采集。同样的,也不需要在采集前先进行复杂的摄像机校准的工作。可以布局和安排多摄像机的位置,以更好的拍摄需要拍摄的物体或者场景。
在以上的三个步骤处理完后,就得到了从多摄像机采集来的纹理图,所有摄像机的摄像机参数,以及每个摄像机的深度图。可以把这三部分数据称作为多角度自由视角视频数据中的数据文件,也可以称作6自由度视频数据(6DoF video data)。因为有了这些数据,用户端就可以根据虚拟的6自由度(Degree of Freedom,DoF)位置,来生成虚拟视点,从而提供6DoF的视频体验。
结合参考图6,6DoF视频数据以及指示性数据可以经过压缩和传输到达用户侧,用户侧可以根据接收到的数据,获取用户侧6DoF表达,也即前述的6DoF视频数据和元数据。其中,指示性数据也可以称作元数据(Metadata),其中,视频数据包括多摄像机对应的各视点的纹理图和深度图数据,纹理图和深度图可以按照一定的拼接规则或拼接模式进行拼接,形成拼接图像。
结合参考图7,元数据可以用来描述6DoF视频数据的数据模式,具体可以包括:拼接模式元数据(Stitching Pattern metadata),用来指示拼接图像中多个纹理图的像素数据以及深度图数据的存储规则;边缘保护元数据(Padding pattern metadata),可以用于指示对拼接图像中进行边缘保护的方式,以及其它元数据(Other metadata)。元数据可以存储于数据头文件,具体的存储顺序可以如图7所示,或者以其它顺序存储。
结合参考图8,用户侧得到了6DoF视频数据,其中包括了摄像机参数,拼接图像(纹理图以及深度图),以及描述元数据(元数据),除此之外,还有用户端的交互行为数据。通过这些数据,用户侧可以采用基于深度图的渲染(DIBR,Depth Image-Based  Rendering)方式进行的6DoF渲染,从而在一个特定的根据用户行为产生的6DoF位置产生虚拟视点的图像,也即根据用户指示,确定与该指示对应的6DoF位置的虚拟视点。
目前,通常将同步的多个视点的纹理图和对应的深度图进行拼接,得到拼接图像,并将拼接图像采用统一进行编码并传输,参照图9所示的拼接图像的结构示意图,将视点1至视点8的纹理图和视点1至视点8的深度图拼接在一起,形成拼接图像90,之后可以将拼接图像90可以采用一编码方式进行编码,得到视频压缩帧,之后可以将视频压缩帧进行传输。
发明人经研究发现,上述视频处理方法虽然具有很强的应用普适性,然而,由于纹理图和深度图本身的特性具有较大的差异,对于拼接得到的包含纹理图和深度图的拼接图像,难以采用差异化的编码工具和策略等进行处理,因此会对拼接图像造成压缩损失,进而影响重建得到的自由视点视频的图像质量。又如,拼接后得到拼接图像分辨率较大,解码端的解码能力可能难以满足要求。
综上可知,现有的视频处理方式存在诸多局限性。
针对上述问题,本说明书实施例提供了一种视频处理方式,其中,基于纹理图和深度图不同的图像特性,对同步的多个视点的纹理图和对应的深度图分别进行拼接,得到纹理图拼接图像和深度图拼接图像,并将纹理图拼接图像和深度图拼接图像分别进行编码,从而可以得到对应的纹理图压缩图像和深度图压缩图像。
由于对纹理图拼接图像和深度图拼接图像分别进行编码,因而可以基于纹理图和深度图的特征采用相应的编码方法进行编码压缩,从而可以减少压缩损失,进而可以提高基于所述纹理图拼接图像和深度图拼接图像重建得到的自由视点视频的图像质量。
为使本领域技术人员更好地理解和实现本说明书实施例,以下分别参照附图,通过从编码侧和解码侧两侧的视频处理过程进行详细说明。
首先从编码侧的视频处理方式进行详细阐述。
参照图10所示的视频处理方法的流程图,在本说明书实施例中,具体可以包括如下步骤:
S101,将帧同步的多个视点的纹理图拼接为纹理图拼接图像。
在具体实施中,可以按照预设的扫描方式扫描所述帧同步的多个视点的纹理图,并按照预设的拼接规则拼接得到所述纹理图拼接图像,如图11所示的纹理图拼接过程示意图,对于摄像机1至摄像机4同步采集的4个视频流,可以按照帧时序,得到纹理图拼接图像视频流。
S102,将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像。
与步骤S101类似,可以按照所述预设的扫描方式扫描所述帧同步的多个视点的纹理图对应的深度图,并按照所述预设的拼接规则拼接得到所述深度图拼接图像。如图12所示的深度图拼接过程示意图,对于摄像机1至摄像机4对应的深度图视频流,可以按照 帧时序,得到深度图拼接图像视频流。
在具体实施中,具体的扫描方式可以光栅扫描(RasterScan)、Z字形扫描(ZigZag-Scan)等多种。
其中,光栅扫描,是指从左往右,由上而下,先扫描完一行,再移至下一行起始位置继续扫描,如图13所示的光栅扫描方式示意图。Z字形扫描中Z是形象的表示方式,在具体实施例中,可以按照所述扫描轨迹对空间中各视点的纹理图及深度图按照顺序进行获取及拼接。
可以理解的是,以上扫描方式仅为示例说明,本说明书实施例中并不对具体的扫描方式作任何限定。
在本说明书一些实施例中,预设采用固定的一种扫描方式进行扫描。例如,编码端设备和解码端设备可以约定采用光栅扫描方式进行扫描,这样传输时可以仅传输各视点的纹理图和深度图对应的标识信息即可,而不用传递扫描方式对应的标识信息,解码端设备按照同样的扫描顺序进行解码即可。
在本说明书实施例中,针对空间中任意方式排布的视点,为简化纹理图拼接图像和深度图拼接图像中对应的纹理图子区域和深度图子区域中视点信息存储规则,可以预先设定视点标识的设定规则。
在本说明书一些实施例中,如图14所示的视点标识设定方法的示意图,各个相机作为原始视点。首先,可以选取一个主摄像机,主摄像机用于保证同步采集,其他摄像机为从摄像机,并将主摄像机标号为0,本说明书实施例中并不限定主摄像机的选取规则。之后,可以以主摄像机为原点建立空间坐标系,如图14所示,之后,可以按照空间坐标系(x,y,z)进行排序,其中一示例排序规则如下:
a.按照z坐标的小标号优先顺序进行排序,越小的标号的摄像机标号越小;
b.在z坐标排序基础上按照y坐标的小标号优先顺序进行排序,越小的标号获得的相机标号越小;
c.在y坐标排序基础上按照x坐标的小标号优先顺序进行排序。越小的标号获得的相机标号越小。
按照上述视点排序规则得到的一视点排序样例如下:
按照图14,从中选择了任意一台摄像机作为主摄像机,并将其标号定为0,之后,获得7台从摄像机在坐标系中的坐标(-1,1,1),(-1,0,1),(1,0,1),(1,0,0),(-1,0,-1),(0,0,-1),(1,0,-1)。按照上述排序规则,将其排序获得(-1,0,-1),(0,0,-1),(1,0,-1),(1,0,0),(-1,0,1),(1,0,1),(-1,1,1)。
最终确定的摄像机标号如下:
(0,0,0)摄像机标号0;
(-1,0,-1)摄像机标号1;
(0,0,-1)摄像机标号2;
(1,0,-1)摄像机标号3;
(1,0,0)摄像机标号4;
(-1,0,1)摄像机标号5;
(1,0,1)摄像机标号6;
(-1,1,1)摄像机标号7。
通过上述流程确定摄像机标号的顺序,即是确定得到的光栅扫描顺序,上述摄像机标识也可以表征各视点的纹理图和深度图的获取顺序,因此,可以按照光栅扫描方式扫描所述帧同步的多个视点的纹理图,并按照光栅扫描顺序将所述多个视点的纹理图进行拼接,得到所述纹理图拼接图像;并按照所述光栅扫描方式扫描所述帧同步的多个视点的纹理图对应的深度图,并按照光栅扫描顺序将所述多个视点的纹理图进行拼接,得到所述深度图拼接图像。
S103,将所述纹理图拼接图像和所述深度图拼接图像分别编码,得到对应的纹理图压缩图像和深度图压缩图像。
在具体实施中,根据所述纹理图拼接图像和所述深度图拼接图像的不同特征,可以对所述纹理图拼接图像和所述深度图拼接图像分别采用不同的编码方式进行编码。
例如,可以将所述深度图拼接图像采用基于感兴趣区域的编码方式进行编码,得到对应的深度图压缩图像。对于深度图拼接图像采用基于感兴趣区域的编码方式进行编码,从而能够按照深度图拼接图像的图像特征进行压缩编码,从而可以减少深度图的压缩损失,进而可以提高基于深度图重建得到的自由视点视频的图像质量。
作为具体示例,对于所述深度图拼接图像中的感兴趣区域(Region Of Internet,ROI)像素区域,基于预设的恒定量化参数进行编码,得到对应的深度图压缩图像。
发明人经研究发现,深度图子区域中的前景边缘像素区域的质量对于基于虚拟视点重建得到的图像质量是非常关键的。基于此,为提高重建得到的虚拟视点的图像质量,在本说明书一具体实施中,对于所述深度图拼接图像中包含的各视点对应的深度图子区域中的前景边缘像素区域采用第一恒定量化参数进行编码,对于各视点对应的深度图子区域中非前景边缘像素区域采用第二恒定量化参数进行编码,所述第一恒定量化参数的参数值小于所述第二恒定量化参数的参数值。
在具体实施中,对于纹理图拼接图像,可以根据需要,从VP8、VP9、AV1、H.264/AVC、H.265/HEVC、通用视频编码(Versatile Video Coding,VVC)、音视频编码标准(Audio and Video Coding Standard,AVS)、AVS+、AVS2、AVS3等多种适合纹理图的编码方式中选择一种用于对所述纹理图拼接图像进行编码,得到纹理图压缩图像。
其中,VP8、VP9是开放的视频压缩编码方式。H.264/AVC由ITU-T和ISO/IEC联合开发的,定位于覆盖整个视频应用领域,ITU-T给这个标准命名为H.264(以前叫做 H.26L),而ISO/IEC称它为MPEG-4高级视频编码(Advanced Video Coding,AVC)。H.265是ITU-T VCEG继H.264之后所制定的新的视频编码标准,标准全称为高效视频编码(High Efficiency Video Coding,HEVC)。AVS+、AVS2和AVS3等是AVS的优化和演进技术。
在具体实施中,步骤S101和步骤S102可以按照先后执行(可以先执行步骤S101,也可以先执行步骤S102),也可以并行执行。
作为可选步骤,在步骤S103之后,可以执行如下步骤:
S104,将所述纹理图压缩图像和所述深度图压缩图像封装在同一视频通道。
通过将所述纹理图压缩图像和所述深度图压缩图像封装在同一视频通道,也即纹理图压缩图像和深度图压缩图像可以复用一个视频通道,从而可以复用已有硬件设备的硬件接口,并符合现有开源框架的规范,也便于深度图与纹理图进行同步。
在具体实施中,可以将所述深度图压缩图像封装在所述视频压缩帧的帧头区域。
将所述深度图压缩图像封装在所述视频压缩帧的帧头区域,在解码端采用不同的解码资源分别对所述纹理图压缩图像和深度图压缩图像进行解码时,可以减少用于深度图压缩图像解码过程中无关数据的读取,从而可以提高数据处理效率,节约处理资源。
在具体实施中,所述深度图拼接图像中包含的各视点对应的深度图子区域的像素点为纹理图拼接图像中与对应视点的纹理图子区域中的像素点一一对应的像素点集合中的全部或部分像素点。在本说明书一些实施例中,通过对原始深度图或者原始深度图拼接图像中的像素点进行降采样后得到所述深度图拼接图像,通过降采样得到的深度图拼接图像中包含的各视点对应的深度图子区域的像素点为纹理图拼接图像中与对应视点的纹理图子区域中的像素点一一对应的像素点集合中的部分像素点,可以降低深度图拼接图像的分辨率,从而可以进一步节约传输资源。
在具体实施中,还可以对上述实施例作进一步的扩展。以下通过一些具体实施例进行示例说明。
为扩展本说明书实施例所适用的压缩编码兼容性,可以设置多种编码方式,其中包括深度图自定义编码,在本说明书一些实施例中,参照图15所示的视频处理方法的流程图,具体可以采用如下步骤进行实施:
S151,确定编码方式所属编码类别,若确定编码方式属于深度图自定义编码,可以执行步骤S152;若确定编码方式属于多视联合编码,可以执行步骤S153。
S152,将所述编码方式对应的编码标识信息存储至图像集用户信息中的编码信息区域;以及将帧同步的多个视点的纹理图拼接为纹理图拼接图像,将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像,以及将所述纹理图拼接图像和所述深度图拼接图像分别编码,得到对应的纹理图压缩图像和深度图压缩图像。
在具体实施中,在确定编码方式属于深度图自定义编码时,编码系统或者用于编码 的电子设备可以做两个方面的动作:一个方面是将编码方式对应的编码标识信息存储至图像集用户信息中的编码信息区域,以便于解码系统或者用于解码的电子设备识别及采用对应的解码方式进行解码。
另一方面是,采用所述深度图自定义编码进行编码的执行动作。如图16所示,为采用深度图自定义编码进行编码的过程示意图,其中,对于帧同步的四个视点的纹理图T1~T4,可以按照预设的扫描方式进行拼接,得到纹理图拼接图像Tx,之后对纹理图拼接图像Tx采用第一编码方式进行编码,得到纹理图压缩图像Te;对于帧同步的四个视点的纹理图T1~T4所对应的深度图D1~D4,可以按照预设的扫描方式进行拼接,得到深度图拼接图像Dx,之后对深度图拼接图像Dx采用第二编码方式进行编码,得到深度图压缩图像De。在具体实施中,对于纹理图拼接图像Tx和深度图拼接图像Dx采用不同的编码方式,从而可以基于纹理图和深度图不同的特性,采用更加匹配的编码方式进行编码。
在本说明书一些实施例中,可以按照光栅扫描方式扫描所述帧同步的多个视点的纹理图T1~T4,并按照光栅扫描顺序将所述多个视点的纹理图进行拼接,得到所述纹理图拼接图像Tx;以及按照所述光栅扫描方式扫描所述帧同步的多个视点的纹理图对应的深度图D1~D4,并按照光栅扫描顺序将所述多个视点的纹理图进行拼接,得到所述深度图拼接图像Dx。
在本说明书一些实施例中,所述第一编码方式可以从VP8、VP9、AV1、H.264/AVC、H.265/HEVC、VVC、AVS、AVS+、AVS2、AVS3等多种适合纹理图的编码方式中进行选取;所述第二编码方式可以采用基于感兴趣区域的编码方式进行编码,例如,对于所述深度图拼接图像中的感兴趣区域(Region Of Internet,ROI)像素区域,可以基于预设的恒定量化参数进行编码,得到对应的深度图压缩图像。
在具体实施中,可以采用支持相应编码方式的编码器或者编码软件进行实施。
S153,获取所述帧同步的多个视点的纹理图和深度图的视点标识及帧标识,并存储至所述图像集用户信息中的图像信息区域,将所述编码方式对应的编码标识信息存储至所述图像集用户信息中的编码信息区域;以及将所述帧同步的多个视点的纹理图和深度图分别采用采用预设的多视联合编码方式进行编码,得到联合压缩图像。
在本说明书一些实施例中,编码类别除了有深度图自定义编码外,还有多视联合编码等编码类型,因此根据需要,除了可以采用深度图自定义编码外,还可以采用多视联合编码对多个视点的纹理图和深度图进行编码。
采用多视联合编码,可以去除视点间的冗余信息,提高编码效率。
在具体实施中,在确定编码方式属于多视联合编码时,编码系统或者用于编码的电子设备可以做两个方面的动作:一个方面是获取所述帧同步的多个视点的纹理图和深度图的视点标识及帧标识,并存储至所述图像集用户信息中的图像信息区域,以及将所述 编码方式对应的编码标识信息存储至所述图像集用户信息中的编码信息区域,以便于解码系统或者用于解码的电子设备识别及采用对应的解码方式进行解码。另一方面是,采用所述多视联合编码方式进行编码的执行动作。
如图17所示,为本说明书实施例中一种采用多视联合编码进行编码的过程示意图,对于帧同步的多个视点(视点0至视点N-1)的纹理图和深度图,首先,从中选取一个视点作为独立视点的纹理图作为独立视图,如图17所示,选取视点0的纹理图作为独立视图,采用符合HEVC的视频编码装置进行编码,一方面,将编码得到的纹理图压缩图像输出至封装装置,另一方面,采用预设的深度图编码装置,参考所述符合HEVC的视频编码装置输出的纹理压缩图像的信息,对所述视点0的深度图进行编码,对于编码后得到的深度图压缩图像,一方面输出至所述封装装置继续进行封装,另一方面输出至其他视点的用于非独立视图的视频编码装置和用于非独立视图的深度图编码装置,以供用于非独立视图的视频编码装置对相应视点的纹理图进行编码时作为参考,以及供用于非独立视图的深度图编码装置对相应视点的深度图进行编码时作为参考,从而得到符合HEVC的视频编码装置、深度图编码装置,以及用于非独立视图的视频编码装置和用于非独立视图的深度图编码装置等多个编码装置联合编码得到的纹理图压缩图像和深度图压缩图像。
在具体实施中,各编码装置可以采用相同的编码方式,也可以根据纹理图和深度图的不同特性选择不同的编码方式。
在具体实施中,具体的多视联合编码方式可以从VP8、VP9、AV1、H.264/AVC、H.265/HEVC、VVC、AVS、AVS+、AVS2、AVS3等编码方式中进行选取。
在具体实施中,为了便于解码端进行解码,可以将涉及到的与编码相关的指示信息,例如编码方式对应的编码类别标识、具体编码方式的编码标识信息等可以存储至图像集用户信息。
通过前述步骤S152或步骤S153,可以实现视频图像的编码压缩,为进行视频传输及不同多媒体内容的同步播放,可以对编码器生成的多媒体内容(视频、音频、字幕、章节信息等)进行封装处理,具体的封装格式可以理解为是媒体的容器。通过封装,可以将视频通道的编码和音频通道的编码数据打包成一个完整的多媒体文件。
在具体实施中,基于编码方式的不同,具体的编码视频帧、编码信息、封装信息等均可以有所不同。在本说明书实施例中,编码信息、封装信息等均可以视为图像集用户信息,基于封装格式、编码方式的不同,图像集用户信息中不同类型的用户信息可以集中在一个区域存储,也可以根据需要设置在不同的存储区域。以下将通过具体应用场景进行说明。
以下首先分别通过深度图自定义编码方式和多视联合编码方式编码得到的纹理图压缩图像和深度图压缩图像的封装方式,结合深度图自定义编码方式及多视联合编码方式 介绍图像集用户信息一些示例的存放方式。
在本说明书实施例中,对于步骤S152或步骤S153编码得到的纹理图压缩图像和深度图压缩图像,可以复用一个视频通道,具体而言,可以将所述纹理图压缩图像和所述深度图压缩图像封装在同一视频通道的视频压缩帧中,如图18所示的多媒体信息通道结构示意图,多媒体信息通道180包括视频通道181和音频通道182,为实现对编码后的纹理图压缩图像和深度图压缩图像进行传输,可以将纹理图压缩图像和深度图压缩图像封装在视频通道181中,更具体地,可以将所述纹理图压缩图像和所述深度图压缩图像封装在同一视频通道的视频压缩帧中。继续参照图18,在视频通道181中,按照时序封装有一系列视频压缩帧,每个视频压缩帧18F包括帧头18A和帧体18B。
在本说明书一些实施例中,对于采用深度图编码方式得到的深度图压缩图像,可以封装至所述视频通道181的视频压缩帧18F的帧头18A中的深度图区域,而将所述纹理图压缩图像封装在所述视频通道181的视频压缩帧18F的帧体18B中。在本说明书一些实施例中,对于步骤S153编码得到的纹理图压缩图像和纹理图压缩图像,同样地可以封装在视频通道的视频压缩帧中。具体地,可以将同步的每个视点的纹理图压缩图像和同一视点的深度图压缩图像依序存储于视频压缩帧18F的帧体18B中。
以下结合深度图自定义编码和多视联合编码示例说明在进行封装处理后图像集用户信息的具体存储方式。
在具体实施中,可以将所述编码方式对应的编码类别标识存储至视频通道的通道头中图像集用户信息区域,编码类别标识作为图像集用户信息的一个子集,为实现快速解码,并节约存储资源,可以存储于视频通道的通道头中的图像集用户信息区域,例如,若仅有深度图自定义编码和多视联合编码这两类编码类别可选,可以在通道头信息中专门设置1比特的字符用于标识所述视频通道中包含的是深度图自定义编码方式得到的视频压缩帧,还是采用多视联合编码方式得到的视频压缩帧。而若有更多类别的编码方式,则可以预留更多比特位用于标识所属的编码类别。同时,可以将所述编码标识信息存储至所述视频压缩帧的帧头中的编码信息区域,例如,若是使用多视联合编码,视频压缩帧的帧头中具体的标识信息可以标识对纹理图和深度图具体的编码方式是3D-HEVC,还是3D-AVC等;若是使用深度图自定义编码,视频压缩帧的帧头中具体的标识信息可以标识纹理图拼接图像和深度图拼接图像具体的编码方式是AVC、HEVC、AVS、AVS2、AVS3等。
此外,对应于多视联合编码方式得到的联合压缩图像,在具体实施中,可以将所述联合压缩图像中包含的视点的标识以及所述联合拼接图像对应的帧标识存储至所述视频压缩帧帧头中的图像信息存储区域。
参照图19所示的图像集用户信息的具体类型及区域分布示意图,在具体实施中,可以在视频通道的通道头Ca中设置编码类别区域Ca1,存储编码类别标识,以表征具体编码方式所属编码类别,如深度图自定义编码,或者为多视联合编码,并在视频压缩帧I 的帧头Ia中的编码信息区域Ia1存储编码标识信息,以表征深度图自定义编码或多视联合编码中所采用的具体编码方式,如具体为3D-HEVC、3D-AVC、AVC、HEVC、AVS等中的哪一种编码或多种编码方式。
在具体实施中,作为图像集用户信息中的扩展信息,继续参照图19,对应于多视联合编码方式,视频压缩帧I的帧头Ia中可以包括所述图像信息存储区域Ia2,可以存储多视联合编码所针对的纹理图和深度图的视点的标识和的帧标识。
如前实施例所述,对应于多视联合编码,各视点的纹理图压缩图像和深度图压缩图像可以按照顺序交错排布在帧体Ib中,如图20所示的视频通道中视频压缩帧的格式示意图,其中,视频压缩帧I中得到的各视点的纹理图压缩图像及深度图压缩图像可以作为一个图像组合相邻排布,各个视点的图像按照编码装置处理顺序依序排布,且在图像信息存储区域Ia2中按照与相同顺序排布各纹理图压缩图像和深度图压缩图像的视点标识及帧标识,例如,对于同一帧,纹理图压缩图像和深度图压缩图像对应的视点标识可以作为一个数组排列,比如,可以表示为V0D0V1D1V2D2……,其中V表示纹理图,D表示深度图。
对应于深度图自定义编码,帧体Ib中仅存储纹理图压缩图像,而深度图压缩图像可以存储于帧头Ia的深度图存储区域Ia3中。
可以理解的是,图19和图20中仅示意了视频通道中的一个视频压缩帧的示意结构,在具体实施中,多个视频压缩帧依次存储于所述视频通道中,形成视频流。
经上述处理,通过视频通道可以传输自由视点视频流,进而可以基于所述自由视点视频流进行自由视点视频的播放。
为使本领域技术人员更好地理解本说明书实施例中的通过编码和封装等对视频进行处理的方法的原理及方案的具体实施,以下从传输至终端侧等解码端后,对获取到的自由视点视频流的解封装、解码,以及自由视点视频的重建等进行对应阐述。
参照图21所示的视频处理方法的流程图,在自由视点视频流接收侧,如解码端设备,对于获取到的自由视点视频流,可以采用如下步骤进行处理:
S211,对自由视点视频流进行解封装,获得视频压缩帧及所述视频压缩帧的编码方式所属编码类别信息。
在具体实施中,与前述实施例中封装方式对应,对视频通道的通道头进行解封装,可以获得视频压缩帧的编码方式所属编码类别信息。
S212,基于所述视频压缩帧的编码类别信息,在确定所述视频压缩帧采用深度图自定义编码时,对所述视频压缩帧中的纹理图压缩图像和深度图压缩图像分别解码,得到纹理图拼接图像和深度图拼接图像。
其中,所述纹理图拼接图像包括同步的多个视点的纹理图;所述深度图拼接图像包括所述多个视点的纹理图对应的深度图。
S213,基于所述视频压缩帧的编码类别信息,在确定所述视频压缩帧采用多视联合编码时,对所述视频压缩帧中的纹理图压缩图像和深度图压缩图像分别采用对应的多视联合解码方式进行解码,得到同步的多个视点的纹理图以及对应的深度图。
步骤S213作为可选步骤,可以实现多视联合编码方式得到的视频压缩帧的解码。
其中,对于步骤S212或步骤S213,参照图22所示的解码方法的流程图,对于深度图自定义编码方式的视频压缩帧,可以采用如下解码方式进行解码:
S2121,对所述视频压缩帧的帧头进行解码,从所述帧头中的编码信息存储区域,获取到所述纹理图压缩图像的编码标识信息和所述深度图压缩图像的编码标识信息。
S2122,采用与所述深度图压缩图像的编码标识信息对应的解码方式对所述深度图压缩图像进行解码,得到所述深度图拼接图像。
S2123,采用与所述纹理图压缩图像的编码标识信息对应的解码方式对所述纹理图压缩图像进行解码,得到所述纹理图拼接图像。
在具体实施中,对于步骤S2122和步骤S2123,可以采用不同的解码资源分布进行解码。对于采用深度图编码方式得到的视频压缩帧,如前实施例示例,其中,深度图压缩图像存储于帧头的深度图存储区域,纹理图压缩图像存储于帧体Ib中,如图19所示,可以利用第一解码资源对所述纹理图压缩图像进行解码,得到纹理图拼接图像,而利用第二解码资源对所述深度图压缩图像进行解码,得到深度图拼接图像。
参照图23所示的另一种解码方法的流程图,对于多视联合编码方式,可以采用步骤进行解码:
S2131,对所述视频压缩帧的帧头进行解码,从所述帧头中的编码信息存储区域,获取到所述纹理图压缩图像的编码标识信息和深度图压缩图像的编码标识。
S2132,采用与所述纹理图压缩图像的编码标识信息和深度图压缩图像的编码标识对应的多视联合解码方式对所述纹理图压缩图像和深度图压缩图像进行解码,得到所述同步的多个视点的纹理图以及对应的深度图。
对于采用多视联合编码方式得到的视频压缩帧,纹理图压缩图像和深度图压缩图像交织存储于帧体Ib中,如图20所示。可以利用第一解码资源对所述纹理图压缩图像进行解码,得到纹理图,而利用第二解码资源对所述深度图压缩图像进行解码,得到深度图。其中,除了独立视点的纹理图,其他视点的纹理图和深度图在进行解码过程中可以参考其他视点解码得到的纹理图或深度图。
在本说明书实施例中,作为一具体示例,所述第一解码资源为硬件解码资源,所述第二解码资源为软件解码资源。作为另一具体示例,所述第一解码资源为图形处理器(Graphics Processing Unit,GPU)解码芯片,所述第二解码资源为中央处理器(Central Processing Unit,CPU)解码芯片。
对前述实施例中的视频压缩帧进行解码后,可以根据解码得到的图像数据,基于自 由视点,进行虚拟视点的图像重建。基于前述不同编码方式的视频压缩帧,图像重建所参照的具体数据略有不同,以下分别通过两个具体实施例进行示例说明:
参照图24所示的视频处理方法的流程图,对于采用深度图自定义解码方式得到的视频帧,可以采用如下方式进行图像重建:
S241,从所述视频压缩帧的帧头中获取视点的参数数据。
S242,获取虚拟视点,基于所述视点的参数数据,选择所述纹理图拼接图像和所述深度图拼接图像中与虚拟视点位置匹配的相应纹理图子区域中的纹理图和深度图子区域中的深度图作为参考纹理图和参考深度图。
在具体实施中,所述虚拟视点可以基于用户交互行为确定,或者基于预先设置的虚拟视点位置信息确定。
在本说明书一些实施例中,为在保证重建图像质量的情况下,减少数据运算量,可以基于空间位置关系信息,仅选取部分视点的纹理图和深度图进行虚拟视点的图像重建。具体而言,参考纹理图和参考深度图可以基于虚拟视点位置与所述纹理图拼接图像和所述深度图拼接图像中各纹理图子区域中的纹理图和深度图的距离关系确定。
其中,空间位置关系信息可以基于纹理图和深度图的参数信息确定,所述参数信息可以在视频压缩帧的帧头或者视频通道的通道头中存储。所述参数信息可以包括各视点的内参信息和外参信息。
S243,基于所述参考纹理图和参考深度图,重建得到所述虚拟视点的图像。
在具体实施中,基于所述参考纹理图和参考深度图,进行组合渲染,可以得到所述虚拟视点的图像。
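对于上述“基于虚拟视点位置与各视点的距离关系选取参考纹理图和参考深度图”的处理，可参考如下示意代码（各视点的相机位置取自参数数据中的外参信息、选取数量k等均为示例性假设，组合渲染部分以占位函数表示）：

```python
import math

def select_reference_views(virtual_pos, camera_positions, k=2):
    """按虚拟视点位置与各视点相机位置（由外参信息给出）的欧氏距离，选取最近的k个视点。

    virtual_pos：虚拟视点位置 (x, y, z)；
    camera_positions：{视点标识: 相机位置 (x, y, z)}，可由帧头或通道头中的参数数据解析得到；
    返回按距离升序排列的视点标识列表，随后即可取出这些视点对应的纹理图和深度图作为参考。
    """
    ranked = sorted(camera_positions.items(), key=lambda item: math.dist(virtual_pos, item[1]))
    return [view_id for view_id, _ in ranked[:k]]

def render_virtual_view(reference_textures, reference_depths, virtual_pos):
    """占位：基于参考纹理图和参考深度图进行组合渲染，得到虚拟视点的图像。"""
    return {"virtual_pos": virtual_pos, "num_refs": len(reference_textures)}

cameras = {"V0": (0.0, 0.0, 0.0), "V1": (1.0, 0.0, 0.0), "V2": (2.0, 0.0, 0.0)}
refs = select_reference_views((0.8, 0.2, 0.0), cameras, k=2)
print(refs)   # 例如 ['V1', 'V0']
```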
采用上述方式进行虚拟视点的图像重建,由于可以采用与图像特性匹配的编码方式分别对纹理图拼接图像和深度图拼接图像进行编码,因而解码得到的纹理图拼接图像和深度图拼接图像的压缩损失较小,故可以提高重建的虚拟视点的图像的质量。
参照图25所示的视频处理方法的流程图,对于采用多视联合解码方式得到的视频帧,可以采用如下方式进行图像重建:
S251,从所述视频压缩帧的帧头中获得视点的标识及所述视频压缩帧的帧标识和各视点的参数数据。
S252,获取虚拟视点,基于各视点的标识和参数数据,从所述同步的多个视点的纹理图以及对应的深度图中选择与所述虚拟视点匹配的纹理图和深度图作为参考纹理图和参考深度图。
在具体实施中,所述虚拟视点可以基于用户交互行为确定,或者基于预先设置的虚拟视点位置信息确定。
在本说明书一些实施例中，为在保证重建图像质量的情况下，减少数据运算量，可以基于空间位置关系信息，仅选取部分视点的纹理图和深度图进行虚拟视点的图像重建。具体而言，参考纹理图和参考深度图可以基于虚拟视点位置与所述同步的多个视点的纹理图和深度图所对应视点位置的距离关系确定。
其中,空间位置关系信息可以基于纹理图和深度图的参数信息确定,所述参数信息可以在视频压缩帧的帧头或者视频通道的通道头中存储。所述参数信息可以包括各视点的内参信息和外参信息。
S253,基于所述参考纹理图和参考深度图,重建得到所述虚拟视点的图像。
在具体实施中,基于所述参考纹理图和参考深度图,进行组合渲染,可以得到所述虚拟视点的图像。
采用上述方式进行虚拟视点的图像重建,由于可以采用与图像特性匹配的编码方式分别对各视点的纹理图和深度图进行编码,因而解码得到的各视点的纹理图和深度图的压缩损失较小,故可以提高重建的虚拟视点的图像的质量。
在具体实施中,还可以对所述虚拟视点重建得到的自由视点图像做进一步的处理。以下给出一示例性扩展方式。
为丰富用户视觉体验，可以在重建得到的自由视点图像中植入增强现实(Augmented Reality,AR)特效。在本说明书一些实施例中，参照图26所示的视频处理方法的流程图，可以采用如下方式实现AR特效的植入：
S261,获取所述虚拟视点的图像中的虚拟渲染目标对象。
在具体实施中,可以基于某些指示信息确定自由视点视频的图像中的某些对象作为虚拟渲染目标对象,所述指示信息可以基于用户交互生成,也可以基于某些预设触发条件或第三方指令得到。在本说明书一可选实施例中,响应于特效生成交互控制指令,可以获取所述虚拟视点的图像中的虚拟渲染目标对象。
S262,获取基于所述虚拟渲染目标对象的增强现实特效输入数据所生成的虚拟信息图像。
在本说明书实施例中,所植入的AR特效以虚拟信息图像的形式呈现。所述虚拟信息图像可以基于所述目标对象的增强现实特效输入数据生成。在确定虚拟渲染目标对象后,可以获取基于所述虚拟渲染目标对象的增强现实特效输入数据所生成的虚拟信息图像。
在本说明书实施例中,所述虚拟渲染目标对象对应的虚拟信息图像可以预先生成,也可以响应于特效生成指令即时生成。
在具体实施中,可以基于三维标定得到的所述虚拟渲染目标对象在重建得到的图像中的位置,得到与所述虚拟渲染目标对象位置匹配的虚拟信息图像,从而可以使得到的虚拟信息图像与所述虚拟渲染目标对象在三维空间中的位置更加匹配,进而所展示的虚拟信息图像更加符合三维空间中的真实状态,因而所展示的合成图像更加真实生动,增强用户的视觉体验。
在具体实施中，可以基于虚拟渲染目标对象的增强现实特效输入数据，按照预设的特效生成方式，生成所述目标对象对应的虚拟信息图像。
在具体实施中,可以采用多种特效生成方式。
例如,可以将所述目标对象的增强现实特效输入数据输入至预设的三维模型,基于三维标定得到的所述虚拟渲染目标对象在所述图像中的位置,输出与所述虚拟渲染目标对象匹配的虚拟信息图像;
又如,可以将所述虚拟渲染目标对象的增强现实特效输入数据,输入至预设的机器学习模型,基于三维标定得到的所述虚拟渲染目标对象在所述图像中的位置,输出与所述虚拟渲染目标对象匹配的虚拟信息图像。
S263,将所述虚拟信息图像与所述虚拟视点的图像进行合成处理并展示。
在具体实施中,可以有多种方式将所述虚拟信息图像与所述虚拟视点的图像进行合成处理并展示,以下给出两种具体可实现示例:
示例一:将所述虚拟信息图像与对应的图像进行融合处理,得到融合图像,对所述融合图像进行展示;
示例二:将所述虚拟信息图像叠加在对应的图像之上,得到叠加合成图像,对所述叠加合成图像进行展示。
在具体实施中,可以将得到的合成图像直接展示,也可以将得到的合成图像插入待播放的视频流进行播放展示。例如,可以将所述融合图像插入待播放视频流进行播放展示。
自由视点视频中可以包括特效展示标识,在具体实施中,可以基于特效展示标识,确定所述虚拟信息图像在所述虚拟视点的图像中的叠加位置,之后,可以将所述虚拟信息图像在所确定的叠加位置进行叠加展示。
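对于将虚拟信息图像叠加合成到虚拟视点的图像上的处理，以下用numpy给出一个按叠加位置做透明度混合的极简示意（alpha混合只是可行合成方式之一，图像尺寸、叠加位置与透明度取值均为示例性假设）：

```python
import numpy as np

def overlay(base: np.ndarray, virtual: np.ndarray, top_left, alpha: float = 0.7) -> np.ndarray:
    """在base（虚拟视点的图像）上的top_left位置叠加virtual（虚拟信息图像）。"""
    y, x = top_left
    h, w = virtual.shape[:2]
    out = base.copy()
    region = out[y:y + h, x:x + w].astype(np.float32)
    blended = alpha * virtual.astype(np.float32) + (1.0 - alpha) * region
    out[y:y + h, x:x + w] = blended.astype(base.dtype)
    return out

base_image = np.zeros((1080, 1920, 3), dtype=np.uint8)        # 重建得到的虚拟视点图像（占位）
effect_image = np.full((200, 300, 3), 255, dtype=np.uint8)    # 基于特效输入数据生成的虚拟信息图像（占位）
composited = overlay(base_image, effect_image, top_left=(100, 160))   # 叠加位置可由特效展示标识确定
```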
为使本领域技术人员更好地理解和实现本说明书实施例,以下对可以实现上述视频处理方法的视频处理装置进行对应描述。
参照图27所示的视频处理装置的结构示意图，在本说明书一些实施例中，视频处理装置270可以包括：纹理图拼接单元271、深度图拼接单元272、第一编码单元273和第二编码单元274，其中：
所述纹理图拼接单元271,适于将帧同步的多个视点的纹理图拼接为纹理图拼接图像;
所述深度图拼接单元272,适于将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像;
所述第一编码单元273,适于将所述纹理图拼接图像进行编码,得到对应的纹理图压缩图像;
所述第二编码单元274,适于将所述深度图拼接图像进行编码,得到对应的深度图压缩图像。
采用上述视频处理装置,可以采用深度图自定义编码的方式对用于形成自由视点视频的同步的多个视点的纹理图和深度图进行压缩编码。
参照图28所示的视频处理装置的结构示意图,本说明书实施例还提供了另一种视频处理装置,如图28所示,视频处理装置280可以包括:编码类别确定单元281和第一编码处理单元282,其中:
编码类别确定单元281,适于确定编码方式所属编码类别;
第一编码处理单元282,适于在确定编码方式属于深度图自定义编码时,将所述编码方式对应的编码类别标识及编码标识信息存储至图像集用户信息中的编码信息区域;以及将帧同步的多个视点的纹理图拼接为纹理图拼接图像,将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像,以及将所述纹理图拼接图像和所述深度图拼接图像分别编码,得到对应的纹理图压缩图像和深度图压缩图像。
在具体实施中，视频处理装置280还可以包括：第二编码处理单元283，适于在确定编码方式属于多视联合编码时，获取所述帧同步的多个视点的纹理图和深度图的视点标识及帧标识，并存储至所述图像集用户信息中的图像信息区域，将所述编码方式对应的编码标识信息存储至所述图像集用户信息中的编码信息区域；以及将所述帧同步的多个视点的纹理图和深度图分别采用预设的多视联合编码方式进行编码，得到纹理图压缩图像和深度图压缩图像。
在具体实施中,如图28所示,视频处理装置280还可以包括:封装单元284,适于将所述纹理图压缩图像和所述深度图压缩图像封装在同一视频通道中。
作为可选示例,所述封装单元284对于采用深度图自定义编码处理得到的纹理图压缩图像和深度图压缩图像,可以将纹理图压缩图像放置在视频压缩帧的帧体中,将深度图压缩图像放置在视频压缩帧的帧头中。
作为可选示例,所述封装单元284对于采用多视联合编码处理得到的纹理图压缩图像和深度图压缩图像,可以将纹理图压缩图像和深度图压缩图像交织放置在视频压缩帧的帧体中。
本说明书实施例还提供了另一种视频处理装置,可以对视频压缩帧进行解封装和解码处理,如图29所示,视频处理装置290可以包括:解封装单元291和第一解码单元292,其中:
所述解封装单元291,适于对自由视点视频流进行解封装,获得视频压缩帧及所述视频压缩帧的编码方式所属编码类别信息;
所述第一解码单元292，适于基于所述视频压缩帧的编码类别信息，在确定所述视频压缩帧采用深度图自定义编码时，对所述视频压缩帧中的纹理图压缩图像和深度图压缩图像分别解码，得到纹理图拼接图像和深度图拼接图像；其中，所述纹理图拼接图像包括同步的多个视点的纹理图；所述深度图拼接图像包括所述多个视点的纹理图对应的深度图。
在具体实施中,继续参照图29,所述视频处理装置还可以包括:第二解码单元293,适于基于所述视频压缩帧的编码类别信息,在确定所述视频压缩帧采用多视联合编码时,对所述视频压缩帧中的纹理图压缩图像和深度图压缩图像采用对应的多视联合解码方式进行解码,得到同步的多个视点的纹理图以及对应的深度图。
参照图30所示的另一种视频处理装置的结构示意图，在本说明书实施例中，视频处理装置300可以包括：指示信息解码单元301、纹理图解码单元302和深度图解码单元303，其中：
所述指示信息解码单元301，适于解码所述视频压缩帧的帧头，从所述帧头中获得所述视频压缩帧中包括的深度图压缩图像的编码标识信息和纹理图压缩图像的编码标识信息；
所述纹理图解码单元302，适于采用与所述纹理图压缩图像的编码标识信息对应的解码方式对所述纹理图压缩图像进行解码，得到纹理图拼接图像，所述纹理图拼接图像包括同步的多个视点的纹理图；
所述深度图解码单元303,适于采用与所述深度图压缩图像的编码标识信息对应的解码方式对所述深度图压缩图像进行解码,得到深度图拼接图像,所述深度图拼接图像包括与所述多个视点的纹理图视点对应的深度图。
本说明书实施例还提供了相应的视频处理设备，参照图31所示的电子设备的结构示意图，在本说明书实施例中，可以采用所述电子设备对视频数据进行编码压缩，如图31所示，电子设备310可以包括：
所述图像处理装置311,适于将帧同步的多个视点的纹理图拼接为纹理图拼接图像,以及将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像;
所述第一编码装置312,适于将所述纹理图拼接图像进行编码,得到对应的纹理图压缩图像;
所述第二编码装置313,适于将所述深度图拼接图像进行编码,得到对应的深度图压缩图像。
在具体实施中,所述图像处理装置311可以通过预设的数据处理芯片实现,例如可以是单核或多核处理器、GPU芯片、FPGA芯片等其中任意一种。
在具体实施中,所述第一编码装置312和所述第二编码装置313可以通过专门的编码器、编码软件或者编码软件与编码器协同实施。
在具体实施中,所述第一编码装置312和所述第二编码装置313可以分别独立执行,也可以配合执行。
在具体实施中，还可以采用第一编码装置312和所述第二编码装置313进行多视联合编码，例如，由所述第一编码装置对纹理图进行编码，由所述第二编码装置对深度图进行编码。具体的多视联合编码的处理方式可以参见前述实施例，此处不再展开描述。
参照图32所示的另一种电子设备的结构示意图,可以采用所述电子设备对视频通道的视频压缩帧进行解码,如图32所示,电子设备320包括:第一解码装置321和第二解码装置322,其中:
所述第一解码装置321，适于解码视频压缩帧的帧头，并当从所述帧头的图像集用户信息中获得的所述视频压缩帧的编码标识信息包括深度图压缩图像的编码标识信息和纹理图压缩图像的编码标识信息时，采用与所述纹理图压缩图像的编码标识信息对应的解码方式对所述纹理图压缩图像进行解码，得到纹理图拼接图像，所述纹理图拼接图像包括同步的多个视点的纹理图，以及触发所述第二解码装置322对所述深度图压缩图像进行解码；
所述第二解码装置322,适于采用与所述深度图压缩图像的编码标识信息对应的解码方式对所述深度图压缩图像进行解码,得到深度图拼接图像,所述深度图拼接图像包括与所述多个视点的纹理图视点对应的深度图。
采用上述电子设备，可以实现对采用深度图自定义编码方式得到的视频压缩帧进行解码。
在具体实施中，还可以采用上述电子设备对采用多视联合编码方式得到的视频压缩帧进行解码，其中，可以由所述第一解码装置321进行纹理图解码，由所述第二解码装置322进行深度图解码。具体的多视联合解码的执行过程可以参见前述实施例介绍，此处不再赘述。
参照图33所示的另一种电子设备的结构示意图,本说明书实施例还提供了另一种电子设备,其中,如图33所示,电子设备330可以包括存储器331和处理器332,所述存储器331上存储有可在所述处理器332上运行的计算机指令,其中,所述处理器运行所述计算机指令时可以执行前述任一实施例所述方法的步骤。
基于所述电子设备在整个视频处理系统中所处的位置，所述电子设备还可以包括其他的电子部件或组件。
例如,继续参照图33,电子设备330还可以包括通信组件333,所述通信组件可以与采集系统或云端服务器通信,获得用于生成自由视点视频帧的同步的多个视点的纹理图,或者获取采用本说明书前述实施例的视频处理方法进行编码、封装后得到的自由视点视频压缩帧,进而可以由处理器332基于通信组件333获取到的视频压缩帧,进行解封装及解码处理,以及根据虚拟视点位置,进行所述虚拟视点的自由视点视频重建。
又如,在某些电子设备中,继续参照图33,电子设备330还可以包括显示组件334(如显示器、触摸屏、投影仪),以对重建得到的虚拟视点的图像进行显示。
作为服务端设备，所述电子设备可以设置为云端服务器或服务器集群，或者作为本地服务器对自由视点视频帧在传输前进行压缩处理。作为终端设备，所述电子设备330具体可以是手机等手持电子设备、笔记本电脑、台式电脑、机顶盒等具有视频处理及播放功能的电子设备，采用所述终端设备，可以对接收到的视频压缩帧进行解码处理，并基于解码后的视频帧，以及获取到的虚拟视点，重建得到所述虚拟视点的图像。其中，所述虚拟视点可以基于用户交互行为确定，或者基于预先设置的虚拟视点位置信息确定。
在本说明书一些实施例中,存储器、处理器、通信组件和显示组件之间可以通过总线网络进行通信。
在具体实施中,如图33所示,通信组件333和显示组件334等可以为设置在所述电子设备330内部的组件,也可以为通过扩展接口、扩展坞、扩展线等扩展组件连接的外接设备。
在具体实施中,所述处理器332可以通过中央处理器(Central Processing Unit,CPU)(例如单核处理器、多核处理器)、CPU组、图形处理器(Graphics Processing Unit,GPU)、AI芯片、FPGA芯片等其中任意一种或多种协同实施。
在具体实施中,对于大量的自由视点视频帧,为减小处理时延,也可以采用多个电子设备组成的电子设备集群协同实施。
为使本领域技术人员更好地理解和实施，以下以一个具体的应用场景进行说明。参照图34所示的一种应用场景中视频处理系统的结构示意图，其中示出了一场篮球赛的数据处理系统的布置场景，所述视频处理系统A0包括由多个采集设备组成的采集阵列A1、数据处理设备A2、云端的服务器集群A3、播放控制设备A4、播放终端A5和交互终端A6。
参照图34,以左侧的篮球框作为核心看点,以核心看点为圆心,与核心看点位于同一平面的扇形区域作为预设的多角度自由视角范围。所述采集阵列A1中各采集设备可以根据所述预设的多角度自由视角范围,成扇形置于现场采集区域不同位置,可以分别从相应角度实时同步采集视频数据流。
在具体实施中,采集设备还可以设置在篮球场馆的顶棚区域、篮球架上等。各采集设备可以沿直线、扇形、弧线、圆形或者不规则形状排列分布。具体排列方式可以根据具体的现场环境、采集设备数量、采集设备的特点、成像效果需求等一种或多种因素进行设置。所述采集设备可以是任何具有摄像功能的设备,例如,普通的摄像机、手机、专业摄像机等。
而为了不影响采集设备工作,所述数据处理设备A2可以置于现场非采集区域,可视为现场服务器。所述数据处理设备A2可以通过无线局域网向所述采集阵列A1中各采集设备分别发送拉流指令,所述采集阵列A1中各采集设备基于所述数据处理设备A2发送的拉流指令,将获得的视频数据流实时传输至所述数据处理设备A2。其中,所述采集阵列A1中各采集设备可以通过交换机A7将获得的视频数据流实时传输至所述数据处理设备A2。采集阵列A1和交换机A7一起形成采集系统。
当所述数据处理设备A2接收到视频帧截取指令时,从接收到的多路视频数据流中对指定帧时刻的视频帧截取得到多个同步视频帧的帧图像,并将获得的所述指定帧时刻的多个同步视频帧上传至云端的服务器集群A3。
相应地,云端的服务器集群A3将接收的多个同步视频帧的原始纹理图作为图像组合,确定所述图像组合相应的参数数据及所述图像组合中各原始纹理图对应的估计深度图,并基于所述图像组合相应的参数数据、所述图像组合中纹理图的像素数据和对应深度图的深度数据,基于获取到的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频数据。
服务器可以置于云端,并且为了能够更快速地并行处理数据,可以按照处理数据的不同,由多个不同的服务器或服务器组组成云端的服务器集群A3。
例如,所述云端的服务器集群A3可以包括:第一云端服务器A31,第二云端服务器A32,第三云端服务器A33,第四云端服务器A34。其中,第一云端服务器A31可以用于确定所述图像组合相应的参数数据;第二云端服务器A32可以用于确定所述图像组合中各视点的原始纹理图的估计深度图以及进行深度图校正处理;第三云端服务器A33可以根据虚拟视点的位置信息,基于所述图像组合相应的参数数据、所述图像组合的纹理图和深度图,使用基于深度图的虚拟视点重建(Depth Image Based Rendering,DIBR)算法,进行帧图像重建,得到虚拟视点的图像;所述第四云端服务器A34可以用于生成自由视点视频(多角度自由视角视频)。
可以理解的是,所述第一云端服务器A31、第二云端服务器A32、第三云端服务器A33、第四云端服务器A34也可以为服务器阵列或服务器子集群组成的服务器组,本发明实施例不做限制。
然后,播放控制设备A4可以将接收到的自由视点视频帧插入待播放视频流中,播放终端A5接收来自所述播放控制设备A4的待播放视频流并进行实时播放。其中,播放控制设备A4可以为人工播放控制设备,也可以为虚拟播放控制设备。在具体实施中,可以设置专门的可以自动切换视频流的服务器作为虚拟播放控制设备进行数据源的控制。导播控制设备如导播台可以作为本发明实施例中的一种播放控制设备。
可以理解的是,所述数据处理设备A2可以根据具体情景置于现场非采集区域或云端,所述服务器(集群)和播放控制设备可以根据具体情景置于现场非采集区域,云端或者终端接入侧,本实施例并不用于限制本发明的具体实现和保护范围。
其中,所述数据处理设备A2或云端的服务器集群A3可以采用本说明书实施例中的视频处理方法对各视点的纹理图和对应的深度图进行编码、封装等处理;播放终端A5和交互终端A6等可以对接收到的视频压缩帧进行解封装、解码等处理。所述播放终端A5、交互终端A6中可以专门设置相应的解码芯片、解码模组或解码软件等其中至少一种或多种组合进行视频压缩帧的解码处理,具体实现可以参见前述解码端的视频处理方法示例。
本说明书实施例还提供了一种计算机可读存储介质,其上存储有计算机指令,其中,所述计算机指令运行时执行前述任一实施例所述方法的步骤,具体可以参见前述实施例介绍,此处不再赘述。
在具体实施中,所述计算机可读存储介质可以是光盘、机械硬盘、固态硬盘等各种适当的可读存储介质。
本发明实施例中视频处理装置、电子设备、存储介质等涉及的名词解释、原理、具体实现和有益效果可以参见本发明实施例中的视频处理方法，在此不再赘述。
虽然本说明书实施例披露如上,但本发明并非限定于此。任何本领域技术人员,在不脱离本说明书实施例的精神和范围内,均可作各种更动与修改,因此本发明的保护范围应当以权利要求所限定的范围为准。

Claims (32)

  1. 一种视频处理方法,其中,包括:
    将帧同步的多个视点的纹理图拼接为纹理图拼接图像;
    将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像;
    将所述纹理图拼接图像和所述深度图拼接图像分别编码,得到对应的纹理图压缩图像和深度图压缩图像。
  2. 根据权利要求1所述的方法,其中,还包括:
    将所述纹理图压缩图像和所述深度图压缩图像封装在同一视频通道的视频压缩帧。
  3. 根据权利要求2所述的方法,其中,将所述深度图压缩图像封装在所述视频压缩帧的帧头区域。
  4. 根据权利要求1所述的方法,其中,将所述深度图拼接图像采用基于感兴趣区域的编码方式进行编码,得到对应的深度图压缩图像。
  5. 根据权利要求4所述的方法,其中,对于所述深度图拼接图像中的ROI像素区域,基于预设的恒定量化参数进行编码,得到对应的深度图压缩图像。
  6. 根据权利要求5所述的方法,其中,对于所述深度图拼接图像中包含的各视点对应的深度图子区域中的前景边缘像素区域采用第一恒定量化参数进行编码,对于各视点对应的深度图子区域中非前景边缘像素区域采用第二恒定量化参数进行编码,所述第一恒定量化参数的参数值小于所述第二恒定量化参数的参数值。
  7. 根据权利要求1至6任一项所述的方法,其中,所述深度图拼接图像中包含的各视点对应的深度图子区域的像素点为纹理图拼接图像中与对应视点的纹理图子区域中的像素点一一对应的像素点集合中的全部或部分像素点。
  8. 一种视频处理方法,其中,包括:
    确定编码方式所属编码类别;
    若确定编码方式属于深度图自定义编码,则将所述编码方式对应的编码标识信息存储至图像集用户信息中的编码信息区域;以及将帧同步的多个视点的纹理图拼接为纹理图拼接图像,将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像,以及将所述纹理图拼接图像和所述深度图拼接图像分别编码,得到对应的纹理图压缩图像和深度图压缩图像。
  9. 根据权利要求8所述的方法,其中,还包括:
    将所述纹理图压缩图像和所述深度图压缩图像封装在同一视频通道。
  10. 根据权利要求9所述的方法,其中,所述将所述纹理图压缩图像和所述深度图压缩图像封装在同一视频通道,包括:
    将所述深度图压缩图像封装至所述同一视频通道的视频压缩帧的帧头中的深度图区域。
  11. 根据权利要求8至10任一项所述的方法,其中,还包括:
    若确定编码方式属于多视联合编码，则获取所述帧同步的多个视点的纹理图和深度图的视点标识及帧标识，并存储至所述图像集用户信息中的图像信息区域，将所述编码方式对应的编码标识信息存储至所述图像集用户信息中的编码信息区域；以及将所述帧同步的多个视点的纹理图和深度图分别采用预设的多视联合编码方式进行编码，得到纹理图压缩图像和深度图压缩图像。
  12. 根据权利要求11所述的方法,其中,还包括:
    将所述纹理图压缩图像和所述深度图压缩图像作为图像组合,封装至同一视频通道,得到包含所述纹理图压缩图像和所述深度图压缩图像的视频压缩帧。
  13. 根据权利要求11所述的方法,其中,所述获取所述帧同步的多个视点的纹理图和深度图的视点标识及帧标识,并存储至所述图像集用户信息中的图像信息区域,将所述编码方式对应的编码标识信息存储至所述图像集用户信息中的编码信息区域,包括:
    将所述编码方式对应的编码类别标识存储至视频通道的通道头中编码类别区域;
    将所述编码标识信息存储至视频压缩帧帧头中的编码信息区域;
    将所述帧同步的多个视点的纹理图和深度图的视点标识及帧标识按照获取顺序存储于所述视频压缩帧帧头中的图像信息区域。
  14. 根据权利要求8至10任一项所述的方法,其中,所述将帧同步的多个视点的纹理图拼接为纹理图拼接图像,将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像,包括:
    按照光栅扫描方式扫描所述帧同步的多个视点的纹理图,并按照光栅扫描顺序将所述多个视点的纹理图进行拼接,得到所述纹理图拼接图像;
    按照所述光栅扫描方式扫描所述帧同步的多个视点的纹理图对应的深度图,并按照光栅扫描顺序将所述多个视点的深度图进行拼接,得到所述深度图拼接图像。
  15. 一种视频处理方法,其中,包括:
    对自由视点视频流进行解封装,获得视频压缩帧及所述视频压缩帧的编码方式所属编码类别信息;
    基于所述视频压缩帧的编码类别信息,在确定所述视频压缩帧采用深度图自定义编码时,对所述视频压缩帧中的纹理图压缩图像和深度图压缩图像分别解码,得到纹理图拼接图像和深度图拼接图像;其中,所述纹理图拼接图像包括同步的多个视点的纹理图;所述深度图拼接图像包括所述多个视点的纹理图对应的深度图。
  16. 根据权利要求15所述的方法,其中,所述对所述视频压缩帧中的纹理图压缩图像和深度图压缩图像分别解码,得到纹理图拼接图像和深度图拼接图像,包括:
    对所述视频压缩帧的帧头进行解码,从所述帧头中的编码信息存储区域,获取到所述纹理图压缩图像的编码标识信息和所述深度图压缩图像的编码标识信息;
    采用与所述深度图压缩图像的编码标识信息对应的解码方式对所述深度图压缩图像进行解码,得到所述深度图拼接图像;
    采用与所述纹理图压缩图像的编码标识信息对应的解码方式对所述纹理图压缩图像进行解码,得到所述纹理图拼接图像。
  17. 根据权利要求16所述的方法,其中,所述对所述视频压缩帧中的纹理图压缩图像和深度图压缩图像分别解码,得到纹理图拼接图像和深度图拼接图像,包括:
    利用第一解码资源对所述纹理图压缩图像进行解码,得到纹理图拼接图像;
    利用第二解码资源对所述深度图压缩图像进行解码,得到深度图拼接图像。
  18. 根据权利要求16所述的方法,其中,还包括:
    从所述视频压缩帧的帧头中获取视点的参数数据;
    获取虚拟视点,基于所述视点的参数数据,选择所述纹理图拼接图像和所述深度图拼接图像中与虚拟视点位置匹配的相应纹理图子区域中的纹理图和深度图子区域中的深度图作为参考纹理图和参考深度图;
    基于所述参考纹理图和参考深度图,重建得到所述虚拟视点的图像。
  19. 根据权利要求15至18任一项所述的方法,其中,还包括:
    基于所述视频压缩帧的编码类别信息,在确定所述视频压缩帧采用多视联合编码时,对所述视频压缩帧中的纹理图压缩图像和深度图压缩图像采用对应的多视联合解码方式进行解码,得到同步的多个视点的纹理图以及对应的深度图。
  20. 根据权利要求19所述的方法,其中,所述对所述视频压缩帧中的联合压缩图像解码,得到同步的多个视点的纹理图以及对应的深度图,包括:
    对所述视频压缩帧的帧头进行解码,从所述帧头中的编码信息存储区域,获取到所述纹理图压缩图像的编码标识信息和深度图压缩图像的编码标识;
    采用与所述纹理图压缩图像的编码标识信息和深度图压缩图像的编码标识对应的多视联合解码方式对所述纹理图压缩图像和深度图压缩图像进行解码,得到所述同步的多个视点的纹理图以及对应的深度图。
  21. 根据权利要求20所述的方法,其中,还包括:
    从所述视频压缩帧的帧头中获得视点的标识及所述视频压缩帧的帧标识和各视点的参数数据;
    获取虚拟视点,基于各视点的标识和参数数据,从所述同步的多个视点的纹理图以及对应的深度图中选择与所述虚拟视点匹配的纹理图和深度图作为参考纹理图和参考深度图;
    基于所述参考纹理图和参考深度图,重建得到所述虚拟视点的图像。
  22. 一种视频处理方法,其中,包括:
    解码视频压缩帧的帧头,从所述帧头中获得所述视频压缩帧中包括的深度图压缩图像的编码标识信息和纹理图压缩图像的编码标识信息;
    采用与所述纹理图压缩图像的编码标识信息对应的解码方式对所述纹理图压缩图像进行解码,得到纹理图拼接图像,所述纹理图拼接图像包括同步的多个视点的纹理图;
    采用与所述深度图压缩图像的编码标识信息对应的解码方式对所述深度图压缩图像进行解码,得到深度图拼接图像,所述深度图拼接图像包括与所述多个视点的纹理图视点对应的深度图。
  23. 一种视频处理装置,其中,包括:
    纹理图拼接单元,适于将帧同步的多个视点的纹理图拼接为纹理图拼接图像;
    深度图拼接单元,适于将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像;
    第一编码单元,适于将所述纹理图拼接图像进行编码,得到对应的纹理图压缩图像;
    第二编码单元,适于将所述深度图拼接图像进行编码,得到对应的深度图压缩图像。
  24. 一种视频处理装置,其中,包括:
    编码类别确定单元,适于确定编码方式所属编码类别;
    第一编码处理单元,适于在确定编码方式属于深度图自定义编码时,将所述编码方式对应的编码标识信息存储至图像集用户信息中的编码信息区域;以及将帧同步的多个视点的纹理图拼接为纹理图拼接图像,将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像,以及将所述纹理图拼接图像和所述深度图拼接图像分别编码,得到对应的纹理图压缩图像和深度图压缩图像。
  25. 根据权利要求24所述的装置,其中,还包括:
    封装单元,适于将所述纹理图压缩图像和所述深度图压缩图像封装在同一视频通道。
  26. 一种视频处理装置,其中,包括:
    解封装单元,适于对自由视点视频流进行解封装,获得视频压缩帧及所述视频压缩帧的编码方式所属编码类别信息;
    第一解码单元,适于基于所述视频压缩帧的编码类别信息,在确定所述视频压缩帧采用深度图自定义编码时,对所述视频压缩帧中的纹理图压缩图像和深度图压缩图像分别解码,得到纹理图拼接图像和深度图拼接图像;其中,所述纹理图拼接图像包括同步的多个视点的纹理图;所述深度图拼接图像包括所述多个视点的纹理图对应的深度图。
  27. 根据权利要求26所述的装置,其中,还包括:
    第二解码单元,适于基于所述视频压缩帧的编码类别信息,在确定所述视频压缩帧采用多视联合编码时,对所述视频压缩帧中的联合压缩图像采用对应的多视联合解码方式进行解码,得到同步的多个视点的纹理图以及对应的深度图。
  28. 一种视频处理装置,其中,包括:
    指示信息解码单元,适于解码视频压缩帧的帧头,从所述帧头中获得所述视频压缩帧中包括的深度图压缩图像的编码标识信息和纹理图压缩图像的编码标识信息;
    纹理图解码单元,适于采用与所述纹理图压缩图像的编码标识信息对应的解码方式对所述纹理图压缩图像进行解码,得到纹理图拼接图像,所述纹理图拼接图像包括同步 的多个视点的纹理图;
    深度图解码单元,适于采用与所述深度图压缩图像的编码标识信息对应的解码方式对所述深度图压缩图像进行解码,得到深度图拼接图像,所述深度图拼接图像包括与所述多个视点的纹理图视点对应的深度图。
  29. 一种电子设备,其中,包括:
    图像处理装置,适于将帧同步的多个视点的纹理图拼接为纹理图拼接图像,以及将所述帧同步的多个视点的纹理图对应的深度图拼接为深度图拼接图像;
    第一编码装置,适于将所述纹理图拼接图像进行编码,得到对应的纹理图压缩图像;
    第二编码装置,适于将所述深度图拼接图像进行编码,得到对应的深度图压缩图像。
  30. 一种电子设备,其中,包括:第一解码装置和第二解码装置,其中:
    所述第一解码装置，适于解码视频压缩帧的帧头，并当从所述帧头的图像集用户信息中获得的所述视频压缩帧的编码标识信息包括深度图压缩图像的编码标识信息和纹理图压缩图像的编码标识信息时，采用与所述纹理图压缩图像的编码标识信息对应的解码方式对所述纹理图压缩图像进行解码，得到纹理图拼接图像，所述纹理图拼接图像包括同步的多个视点的纹理图，以及触发所述第二解码装置对所述深度图压缩图像进行解码；
    所述第二解码装置,适于采用与所述深度图压缩图像的编码标识信息对应的解码方式对所述深度图压缩图像进行解码,得到深度图拼接图像,所述深度图拼接图像包括与所述多个视点的纹理图视点对应的深度图。
  31. 一种电子设备,包括存储器和处理器,所述存储器上存储有可在所述处理器上运行的计算机指令,其中,所述处理器运行所述计算机指令时执行权利要求1至7任一项或权利要求8至14任一项或权利要求15至21任一项或权利要求22所述方法的步骤。
  32. 一种计算机可读存储介质,其上存储有计算机指令,其中,所述计算机指令运行时执行权利要求1至7任一项或权利要求8至14任一项或权利要求15至21任一项或权利要求22所述方法的步骤。
PCT/CN2021/108640 2020-07-31 2021-07-27 视频处理方法、装置、电子设备及存储介质 WO2022022501A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010762409.2A CN114071116A (zh) 2020-07-31 2020-07-31 视频处理方法、装置、电子设备及存储介质
CN202010762409.2 2020-07-31

Publications (1)

Publication Number Publication Date
WO2022022501A1 true WO2022022501A1 (zh) 2022-02-03

Family

ID=80037601

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/108640 WO2022022501A1 (zh) 2020-07-31 2021-07-27 视频处理方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN114071116A (zh)
WO (1) WO2022022501A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023221764A1 (zh) * 2022-05-20 2023-11-23 海思技术有限公司 视频编码方法、视频解码方法及相关装置

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11677979B2 (en) * 2020-08-24 2023-06-13 Tencent America LLC Freeview video coding
WO2023201504A1 (zh) * 2022-04-18 2023-10-26 浙江大学 编解码方法、装置、设备及存储介质
WO2024011386A1 (zh) * 2022-07-11 2024-01-18 浙江大学 一种编解码方法、装置、编码器、解码器及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102055982A (zh) * 2011-01-13 2011-05-11 浙江大学 三维视频编解码方法及装置
EP2373046A2 (en) * 2010-03-30 2011-10-05 Vestel Elektronik Sanayi ve Ticaret A.S. Super resolution based n-view + n-depth multiview video coding
CN104469387A (zh) * 2014-12-15 2015-03-25 哈尔滨工业大学 一种多视点视频编码中分量间的运动参数继承方法
US20150341614A1 (en) * 2013-01-07 2015-11-26 National Institute Of Information And Communications Technology Stereoscopic video encoding device, stereoscopic video decoding device, stereoscopic video encoding method, stereoscopic video decoding method, stereoscopic video encoding program, and stereoscopic video decoding program
CN109803135A (zh) * 2017-11-16 2019-05-24 科通环宇(北京)科技有限公司 一种基于sdi系统的视频图像传输方法及数据帧结构
CN110446051A (zh) * 2019-08-30 2019-11-12 郑州航空工业管理学院 基于3d-hevc的立体视频码流自适应系统及方法
CN111385585A (zh) * 2020-03-18 2020-07-07 北京工业大学 一种基于机器学习的3d-hevc深度图编码单元划分快速决策方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9565449B2 (en) * 2011-03-10 2017-02-07 Qualcomm Incorporated Coding multiview video plus depth content
CN105872561B (zh) * 2015-12-29 2019-07-23 上海大学 一种可分级多视点视频加深度宏块编码模式快速选择方法
CN108616748A (zh) * 2017-01-06 2018-10-02 科通环宇(北京)科技有限公司 一种码流及其封装方法、解码方法及装置
CN110012310B (zh) * 2019-03-28 2020-09-25 北京大学深圳研究生院 一种基于自由视点的编解码方法及装置

Also Published As

Publication number Publication date
CN114071116A (zh) 2022-02-18

Similar Documents

Publication Publication Date Title
CN112738010B (zh) 数据交互方法及系统、交互终端、可读存储介质
KR102208129B1 (ko) 360 비디오 시스템에서 오버레이 처리 방법 및 그 장치
KR102241082B1 (ko) 복수의 뷰포인트들에 대한 메타데이터를 송수신하는 방법 및 장치
WO2022022501A1 (zh) 视频处理方法、装置、电子设备及存储介质
US11202086B2 (en) Apparatus, a method and a computer program for volumetric video
CN112738534B (zh) 数据处理方法及系统、服务器和存储介质
CN112738495B (zh) 虚拟视点图像生成方法、系统、电子设备及存储介质
KR102278848B1 (ko) 다중 뷰포인트 기반 360 비디오 처리 방법 및 그 장치
CN111727605B (zh) 用于发送和接收关于多个视点的元数据的方法及设备
WO2019076506A1 (en) APPARATUS, METHOD, AND COMPUTER PROGRAM FOR VOLUMETRIC VIDEO
WO2022022348A1 (zh) 视频压缩方法、解压方法、装置、电子设备及存储介质
JP7320146B2 (ja) ディスオクルージョンアトラスを用いたマルチビュービデオ動作のサポート
EP3434021B1 (en) Method, apparatus and stream of formatting an immersive video for legacy and immersive rendering devices
WO2019229293A1 (en) An apparatus, a method and a computer program for volumetric video
EP3729805B1 (en) Method and apparatus for encoding and decoding volumetric video data
WO2021083175A1 (zh) 数据处理方法、设备、系统、可读存储介质及服务器
CN116325769A (zh) 从多个视点流式传输场景的全景视频
CN112738009B (zh) 数据同步方法、设备、同步系统、介质和服务器
EP3698332A1 (en) An apparatus, a method and a computer program for volumetric video
CN112734821B (zh) 深度图生成方法、计算节点及计算节点集群、存储介质
TWI817273B (zh) 即時多視像視訊轉換方法和系統
US20230008125A1 (en) Augmenting a view of a real-world environment with a view of a volumetric video object
WO2022141636A1 (en) Methods and systems for processing video streams with layer information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21851085
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 21851085
    Country of ref document: EP
    Kind code of ref document: A1