WO2022022348A1 - Video compression method, decompression method, apparatus, electronic device and storage medium - Google Patents

Video compression method, decompression method, apparatus, electronic device and storage medium

Info

Publication number
WO2022022348A1
Authority
WO
WIPO (PCT)
Prior art keywords
depth map
texture
video frame
area
encoding
Prior art date
Application number
PCT/CN2021/107507
Other languages
English (en)
French (fr)
Inventor
盛骁杰
Original Assignee
Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Publication of WO2022022348A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/20: Image signal generators
    • H04N 13/271: Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • H04N 13/282: Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems

Definitions

  • the embodiments of this specification relate to the technical field of video processing, and in particular, to a video compression method, a decompression method, an apparatus, an electronic device, and a storage medium.
  • Video data is data that supports video playback for the user to watch. Generally, the video data only supports the user to watch from one viewing angle, and the user cannot adjust the viewing angle.
  • Free viewpoint video is a technology that can provide a high degree of freedom viewing experience. Users can adjust the viewing angle through interactive operations during the viewing process, and watch from the free viewpoint they want to watch, which can greatly improve the viewing experience.
  • the embodiments of the present specification provide a video compression method, a decompression method, an apparatus, a device, and a storage medium, which can reduce the compression loss of images, thereby improving the image quality of free-viewpoint videos.
  • the embodiments of this specification provide a video compression method, including:
  • acquiring a free-view video frame, where the video frame includes a spliced image formed by synchronized texture maps of multiple viewpoints and depth maps of corresponding viewpoints; identifying the texture map area and the depth map area in the spliced image;
  • for the texture map area, using a preset first encoding mode for encoding, and for the depth map area, using a preset second encoding mode for encoding to obtain a compressed video frame, wherein the second encoding mode is a region-of-interest-based encoding mode.
  • encoding the depth map region using a preset second encoding mode includes: for the ROI pixel region in the depth map region, encoding based on a preset constant quantization parameter.
  • encoding the depth map region using a preset second encoding mode includes: encoding the foreground edge pixel regions in the depth map sub-regions corresponding to each viewpoint with a first constant quantization parameter, and encoding the non-foreground-edge pixel regions in the depth map sub-regions corresponding to each viewpoint with a second constant quantization parameter, the parameter value of the first constant quantization parameter being smaller than the parameter value of the second constant quantization parameter.
  • encoding the depth map region using a preset second encoding mode includes: for the ROI pixel region in the depth map region, encoding with a preset constant quality factor.
  • the texture map area includes a special effect rendering pixel area; the texture map area is encoded using a preset first encoding mode, including:
  • the special effect rendering pixel area in the texture map area is coded using a coding method suitable for augmented reality special effects.
  • the method further includes: using a constant bit rate for the stitched image.
  • the pixels in the depth map sub-region in the spliced image are all or part of the pixels in a set of pixels in a one-to-one correspondence with the pixels in the texture map sub-region of the corresponding viewpoint.
  • the texture map region includes a plurality of texture map sub-regions corresponding to viewpoints, the depth map region includes a plurality of depth map sub-regions corresponding to viewpoints, and the depth map sub-regions are smaller than the texture map sub-regions.
  • the depth map sub-region in the spliced image is obtained by down-sampling an original depth map whose pixels correspond one-to-one with the pixels in the texture map of the corresponding viewpoint.
  • the embodiments of this specification also provide a video decompression method, including:
  • the compressed video frame includes a texture map area coded by the first coding mode and a depth map area coded by the second coding mode, wherein the texture map area includes synchronized texture maps of multiple viewpoints, the depth map area includes depth maps of the viewpoints corresponding to each texture map, and the second encoding mode is a region-of-interest-based encoding mode;
  • the compressed video frame is decompressed by pixel block to obtain a spliced image of the free-view video frame, where the spliced image includes the texture map areas of the synchronized multiple viewpoints and the depth map areas of the corresponding viewpoints.
  • the embodiments of this specification also provide a video compression device, including:
  • a video frame acquisition unit adapted to acquire a free-view video frame, the video frame comprising a stitched image formed by texture maps of multiple viewpoints and depth maps of corresponding viewpoints;
  • an identification unit adapted to identify the texture map area and the depth map area in the stitched image
  • a first encoding unit, adapted to encode the texture map region using a preset first encoding mode;
  • a second encoding unit, adapted to encode the depth map region using a preset second encoding mode, the second encoding mode being a region-of-interest-based encoding mode.
  • the embodiments of this specification also provide a video decompression device, including:
  • an acquisition unit adapted to acquire a compressed video frame and a quantization parameter of a pixel block in the compressed video frame, where the compressed video frame includes a texture map area coded by the first coding mode and a depth map coded by the second coding mode region, wherein the texture map region includes texture maps of multiple simultaneous viewpoints, the depth map region includes depth maps of the viewpoints corresponding to each texture map, and the second encoding method is an encoding method based on a region of interest;
  • a decompression unit adapted to perform decompression processing on the compressed video frame according to the pixel block based on the quantization parameter of the pixel block in the compressed video frame to obtain a spliced image of the free-view video frame, the spliced image including the Synchronized texture map regions of multiple viewpoints and depth map regions of corresponding viewpoints.
  • Embodiments of the present specification further provide an electronic device, including a memory and a processor, wherein the memory stores computer instructions that can be run on the processor, and when the processor executes the computer instructions, the steps of the method described in any of the foregoing embodiments are performed.
  • Embodiments of the present specification further provide a computer-readable storage medium on which computer instructions are stored, wherein, when the computer instructions are executed, the steps of the methods described in any of the foregoing embodiments are performed.
  • the texture map area and the depth map area in the spliced image of the video frame are compressed using different coding modes respectively, and the depth map area is compressed using the region-of-interest-based coding mode, so that compression can be performed according to the image features of the depth map area; thus the compression loss of the depth map can be reduced, and the image quality of the free-view video reconstructed based on the depth map can be improved.
  • encoding the ROI pixel area based on a preset constant quantization parameter can reduce the compression loss of the ROI pixel area while still compressing the depth map area as much as possible, thereby reducing the compression loss in the depth map region of the stitched image.
  • the first constant quantization parameter is used to encode the foreground edge pixel area in the depth map sub-region corresponding to each viewpoint included in the depth map area, and the second constant quantization parameter is used to encode the non-foreground-edge pixel area in the depth map sub-region corresponding to each viewpoint. Since the parameter value of the first constant quantization parameter used for the foreground edge pixel region is smaller than that of the second constant quantization parameter used for the non-foreground-edge pixel region, the compression loss of the depth map region can be reduced, thereby improving the reconstruction quality of the reconstructed free-view video.
  • FIG. 1 is a schematic diagram of a specific application system of a free-view video display in an embodiment of this specification;
  • FIG. 2 is a schematic diagram of an interactive interface of a terminal device in an embodiment of this specification;
  • FIG. 3 is a schematic diagram of a setting mode of a collection device in an embodiment of this specification;
  • FIG. 4 is a schematic diagram of another terminal device interaction interface in an embodiment of this specification;
  • FIG. 5 is a schematic diagram of a free-viewpoint video data generation process in an embodiment of this specification;
  • FIG. 6 is a schematic diagram of the generation and processing of 6DoF video data in an embodiment of this specification;
  • FIG. 7 is a schematic structural diagram of a data header file in an embodiment of this specification;
  • FIG. 8 is a schematic diagram of a user side processing 6DoF video data in an embodiment of this specification;
  • FIG. 9 is a schematic structural diagram of a spliced image of a video frame in an embodiment of this specification;
  • FIG. 10 is a flowchart of a video compression method in an embodiment of this specification;
  • FIG. 11 is a schematic structural diagram of another stitched image in an embodiment of this specification;
  • FIG. 13 is a schematic structural diagram of another stitched image in an embodiment of this specification;
  • FIG. 15 is a schematic structural diagram of another stitched image in an embodiment of this specification;
  • FIG. 16 is a schematic diagram of pixel data distribution of a texture map in an embodiment of this specification;
  • FIG. 17 is a schematic diagram of pixel data distribution of another texture map in an embodiment of this specification;
  • FIG. 18 is a schematic diagram of data storage in a spliced image in an embodiment of this specification;
  • FIG. 19 is a schematic diagram of data storage in another spliced image in an embodiment of this specification;
  • FIG. 20 is a schematic diagram of a video compression method in a specific application scenario in an embodiment of this specification;
  • FIG. 21 is a flowchart of a video decompression method in an embodiment of this specification;
  • FIG. 22 is a schematic structural diagram of a video compression apparatus in an embodiment of this specification;
  • FIG. 23 is a schematic structural diagram of a video decompression apparatus in an embodiment of this specification;
  • FIG. 24 is a schematic structural diagram of an electronic device in an embodiment of this specification;
  • FIG. 25 is a schematic structural diagram of a video processing system in an embodiment of this specification.
  • a specific application system for free-view video display in an embodiment of the present invention may include a collection system 11 of multiple collection devices, a server 12 , and a display device 13 , wherein the collection system 11 can collect images of the area to be viewed.
  • the acquisition system 11 or the server 12 can process the acquired synchronized multiple texture maps to generate multi-angle free viewing angle data that can support the display device 13 to perform virtual viewpoint switching.
  • the display device 13 can display reconstructed images generated based on multi-angle free viewing angle data, the reconstructed images correspond to virtual viewpoints, and can display reconstructed images corresponding to different virtual viewpoints according to user instructions, and switch the viewing position and viewing angle.
  • the process of performing image reconstruction to obtain a reconstructed image may be implemented by the display device 13, or may be implemented by a device located in a content delivery network (Content Delivery Network, CDN) by means of edge computing.
  • the user can view the area to be viewed through the display device 13 , and in this embodiment, the area to be viewed is a basketball court. As mentioned earlier, the viewing position and viewing angle can be switched.
  • the user can swipe across the screen to switch virtual viewpoints.
  • the virtual viewpoint for viewing can be switched.
  • the position of the virtual viewpoint before sliding may be VP 1, and after sliding, the position of the virtual viewpoint may be VP 2.
  • the reconstructed image displayed on the screen may be as shown in FIG. 4 .
  • the reconstructed image may be obtained by performing image reconstruction based on multi-angle free viewing angle data generated from images collected by multiple collection devices in an actual collection situation.
  • the image viewed before switching may also be a reconstructed image.
  • the reconstructed images may be frame images in the video stream.
  • the manner of switching the virtual viewpoint according to the user's instruction may be various, which is not limited here.
  • the viewpoint can be represented by coordinates of 6 degrees of freedom (DoF), wherein the spatial position of the viewpoint can be represented as (x, y, z), and the viewing angle can be represented as three rotation directions. Accordingly, based on the coordinates of 6 degrees of freedom, a virtual viewpoint, including its position and viewing angle, can be determined.
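  • As a purely illustrative sketch (not a structure defined by this application), such a 6DoF viewpoint could be held in a small data record; the field names below are assumptions.

```python
from dataclasses import dataclass


@dataclass
class VirtualViewpoint6DoF:
    """A virtual viewpoint: a spatial position plus three rotation directions."""
    x: float
    y: float
    z: float
    yaw: float    # rotation about the vertical axis
    pitch: float  # rotation about the lateral axis
    roll: float   # rotation about the viewing axis


# Example: a viewpoint VP2 reached after the user swipes the screen.
vp2 = VirtualViewpoint6DoF(x=1.5, y=0.0, z=3.2, yaw=30.0, pitch=-5.0, roll=0.0)
```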
  • the multi-angle free viewing angle data may include depth map data, which is used to provide third-dimensional information outside the plane image. Compared with other implementations, such as providing three-dimensional information through point cloud data, the data volume of the depth map data is smaller.
  • the switching of the virtual viewpoints may be performed within a certain range, which is a multi-angle free viewing angle range. That is, within the multi-angle free viewing angle range, the position and viewing angle of the virtual viewpoint can be switched arbitrarily.
  • the multi-angle free viewing angle range is related to the arrangement of the acquisition device.
  • the wider the shooting coverage of the acquisition devices, the larger the multi-angle free viewing angle range.
  • the quality of the picture displayed by the terminal device is related to the number of collection devices. Generally, the more collection devices are set, the fewer empty areas in the displayed picture.
  • the range of multi-angle free viewing angles is related to the spatial distribution of the acquisition devices.
  • the range of multi-angle free viewing angles and the interaction mode with the display device on the terminal side can be set based on the spatial distribution relationship of the collection devices.
  • texture map acquisition and depth map calculation are required, including three main steps, namely Multi-camera Video Capturing, camera internal and external parameter calculation (Camera Parameter Estimation), and Depth Map Calculation.
  • for multi-camera video capturing, it is required that the videos captured by the cameras can be aligned at the frame level.
  • the texture image (Texture Image) can be obtained through the video acquisition of multiple cameras;
  • the camera parameters (Camera Parameter) can be obtained through the calculation of the internal and external parameters of the camera, and the camera parameters can include the internal parameter data of the camera and the external parameter data;
  • through depth map calculation, the depth maps can be obtained; the synchronized texture maps of multiple viewpoints, the depth maps of the corresponding viewpoints, and the camera parameters together form the 6DoF video data.
  • the texture map collected from multiple cameras, the camera parameters of all cameras, and the depth map of each camera are obtained.
  • These three parts of data can be referred to as data files in the multi-angle free-view video data, and can also be referred to as 6DoF video data. Because of these data, the client can generate virtual viewpoints according to the virtual 6 degrees of freedom (DoF) position, thereby providing a 6DoF video experience.
  • 6DoF video data and indicative data can be compressed and transmitted to the user side, and the user side can obtain the user side 6DoF expression according to the received data, that is, the aforementioned 6DoF video data and metadata.
  • the indicative data may also be called metadata, wherein the video data includes texture map and depth map data of each viewpoint corresponding to multiple cameras, and the texture map and depth map can be spliced according to certain splicing rules or splicing modes , forming a stitched image.
  • Metadata can be used to describe the data pattern of the 6DoF video data, and may specifically include: stitching pattern metadata (Stitching Pattern metadata), used to indicate the storage rules of the pixel data of the multiple texture maps and the depth map data in the stitched image; padding pattern metadata (Padding pattern metadata), used to indicate the way edge protection is performed in the stitched image; and other metadata (Other metadata).
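  • A hedged illustration of the metadata categories described above is sketched below; the field names and values are illustrative assumptions rather than a normative format defined by this specification.

```python
# Illustrative-only sketch of the three metadata categories described above.
stitching_metadata = {
    "stitching_pattern": {          # storage rules of texture/depth data in the stitched image
        "layout": "texture_top_depth_bottom",
        "num_viewpoints": 8,
        "texture_resolution": [1920, 1080],
        "depth_resolution": [960, 540],   # depth maps may be down-sampled
    },
    "padding_pattern": {            # edge-protection information
        "enabled": True,
        "padding_pixels": 4,
    },
    "other_metadata": {
        "depth_bit_depth": 16,
    },
}
```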
  • the user side obtains the 6DoF video data, which includes the camera parameters, the stitched image (texture maps and depth maps), and the description metadata (metadata), as well as interactive behavior data of the user side.
  • based on Depth Image-Based Rendering (DIBR), 6DoF rendering is performed so as to generate a virtual viewpoint image at a specific 6DoF position according to user interaction behavior; that is, the virtual viewpoint at the 6DoF position corresponding to the user's instruction is determined.
  • the texture maps and corresponding depth maps of multiple synchronized viewpoints are spliced, and the spliced image is compressed using a unified encoding method and transmitted to the terminal side (such as a display device), where decoding and DIBR image reconstruction are performed.
  • the inventors have found through research that, due to the limitation of transmission bandwidth, using a unified coding method to compress the stitched images will cause compression loss of the depth map, which in turn will have a great impact on the image quality of the virtual viewpoint reconstructed by DIBR.
  • the texture map area and the depth map area in the spliced image of the video frame are respectively compressed by different coding methods, wherein the depth map area is compressed using the region-of-interest (Region of Interest, RoI) based coding method, which allows compression according to the image features of the depth map area, so that the compression loss of the depth map can be reduced and the image quality of the free-view video reconstructed based on the depth map can be improved.
  • S101 Obtain a free-view video frame, where the video frame includes a spliced image formed by synchronized texture maps of multiple viewpoints and depth maps of corresponding viewpoints.
  • the free-view video frame may be formed by splicing a texture map and a depth map.
  • each video frame may include a spliced image formed by splicing synchronized texture maps of multiple viewpoints and depth maps of corresponding viewpoints.
  • the structure of a spliced image included in a video frame is shown in FIG. 9, which includes synchronized texture maps of 8 viewpoints and the corresponding depth maps of the 8 viewpoints.
  • the texture map area and the depth map area in the stitched image can be identified by means of image feature recognition, or by acquiring stitching mode metadata of the stitched image. As shown in Figure 9, it can be recognized that the upper half of the stitched image is the texture map area, and the lower half is the depth map area.
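  • For illustration only, a minimal sketch of identifying the two areas under the FIG. 9 layout (texture maps in the upper half, depth maps in the lower half) might look like the following; a real implementation would read the layout from the stitching pattern metadata rather than assume it.

```python
import numpy as np


def split_stitched_image(stitched: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a stitched image laid out as in FIG. 9: texture maps in the
    upper half, depth maps in the lower half."""
    h = stitched.shape[0]
    texture_area = stitched[: h // 2]   # synchronized texture maps of the viewpoints
    depth_area = stitched[h // 2:]      # depth maps of the corresponding viewpoints
    return texture_area, depth_area


# Example with a dummy 2160x3840 stitched frame (values are placeholders).
frame = np.zeros((2160, 3840, 3), dtype=np.uint8)
tex_area, dep_area = split_stitched_image(frame)
```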
  • S103: For the texture map area, use a preset first encoding mode for encoding, and for the depth map area, use a preset second encoding mode for encoding to obtain a compressed video frame, wherein the second encoding mode is a region-of-interest-based encoding mode.
  • encoding may be performed based on a preset constant quantization parameter (Constant Quantization Parameter, CQP).
  • the spliced image is formed by splicing the texture maps of multiple simultaneous viewpoints and the depth maps of the corresponding viewpoints.
  • the texture map area corresponding to each viewpoint may be called the texture map sub-region of the corresponding viewpoint.
  • the corresponding depth map regions may be referred to as depth map sub-regions of the corresponding viewpoints.
  • the inventor found through research that the quality of the foreground edge pixel region in the depth map sub-region is critical to the image quality reconstructed for the virtual viewpoint. Based on this, in order to improve the image quality of the reconstructed virtual viewpoint, in some embodiments of this specification, the foreground edge pixel area in the depth map sub-region corresponding to each viewpoint included in the depth map area is encoded with a first constant quantization parameter, and the non-foreground-edge pixel region in the depth map sub-region corresponding to each viewpoint is encoded with a second constant quantization parameter, where the parameter value of the first constant quantization parameter is smaller than the parameter value of the second constant quantization parameter.
  • a QP of 0 indicates that lossless encoding is performed. With CQP, the instantaneous bit rate fluctuates with the complexity of the scene.
  • for example, the value of the first constant quantization parameter is 10 and the value of the second constant quantization parameter is 40. It can be understood that the first constant quantization parameter and the second constant quantization parameter may also take other values, as long as the parameter value of the first constant quantization parameter is smaller than that of the second constant quantization parameter. To ensure the image quality of the reconstructed virtual viewpoint, it is recommended that the value of the first constant quantization parameter be less than or equal to 22.
  • the first constant quantization parameter and the second constant quantization parameter may take values based on a preset pixel block size.
  • the size of the pixel block is 3*3 pixels; for another example, the size of the pixel block is 32*32 pixels.
  • a constant quantization parameter may also be used in the texture map area, and may be the same as or different from the quantization parameter used in the non-foreground edge pixel area in the depth map area.
  • constant quantization parameters may be used for the texture map area of the spliced image, the non-foreground-edge pixel area of the depth map area, and the foreground edge pixel area of the depth map area, wherein the quantization parameter value corresponding to the foreground edge pixel area of the depth map area is the smallest; for example, the three values may decrease in that order.
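  • The following is a hedged sketch of how a per-pixel-block quantization parameter map for the depth map area could be built from a foreground-edge mask, using the example values QP=10 and QP=40 mentioned above; the mask itself and the way such a map is handed to a particular encoder are assumptions outside this sketch.

```python
import numpy as np


def build_depth_qp_map(foreground_edge_mask: np.ndarray,
                       block_size: int = 32,
                       qp_edge: int = 10,
                       qp_other: int = 40) -> np.ndarray:
    """Assign one constant QP per pixel block of the depth map area:
    blocks containing foreground-edge pixels get the smaller QP (less loss),
    all remaining blocks get the larger QP."""
    h, w = foreground_edge_mask.shape
    bh, bw = h // block_size, w // block_size
    qp_map = np.full((bh, bw), qp_other, dtype=np.uint8)
    for by in range(bh):
        for bx in range(bw):
            block = foreground_edge_mask[by * block_size:(by + 1) * block_size,
                                         bx * block_size:(bx + 1) * block_size]
            if block.any():                 # block touches a foreground edge
                qp_map[by, bx] = qp_edge
    return qp_map
```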
  • a preset constant quality factor (Constant Quality Factor, CQF) may be used for encoding. Similar to CQP, CQF pursues constant subjectively perceived quality, and the instantaneous bit rate also fluctuates with the complexity of the scene.
  • a constant bit rate can be used to perform rate-distortion optimization.
  • a rate control module can be set to control the bit rate of the encoded video, so as to meet specific application requirements by selecting a series of encoding parameters while keeping the encoding distortion as small as possible.
  • the bit rate of the entire spliced image can be controlled to be constant, and at the same time the foreground edge pixel region in the depth map region can be encoded with the first constant quantization parameter; alternatively, the corresponding quantization parameter value may be determined based on the bit rate coefficient of the entire spliced image and the proportion of the foreground edge pixel area in the depth map area, and the obtained quantization parameter value may be fixed or vary instantaneously.
  • the pixels in the depth map sub-region in the spliced image may be all or part of the pixel points in the pixel set corresponding to the pixels in the texture map sub-region of the corresponding viewpoint.
  • the pixels of the depth map in the spliced image are all the pixels in the set of pixels that correspond one-to-one with the pixels in the texture map of the corresponding viewpoint.
  • the texture map of each corresponding viewpoint may be the original texture map collected by the camera, or a texture map obtained by reducing the resolution of the original texture map.
  • the depth map of each corresponding viewpoint may be the original depth map estimated based on the original texture map of the corresponding viewpoint, or a reduced-resolution version of the original depth map.
  • the texture map region includes a plurality of texture map sub-regions corresponding to viewpoints, the depth map region includes a plurality of depth map sub-regions corresponding to viewpoints, and the depth map sub-regions are smaller than the texture map sub-regions.
  • as shown in FIG. 11 or FIG. 12, which are schematic structural diagrams of a spliced image, the texture map area includes 8 texture map sub-regions corresponding to viewpoints, the depth map area includes 8 depth map sub-regions corresponding to viewpoints, and the depth map sub-regions are smaller than the texture map sub-regions.
  • the texture map area and the depth map area in the spliced image can each be divided into multiple parts; as shown in FIG. 12, the depth map area is divided into an area above the texture map and an area below the texture map, that is, the texture map area is one continuous area, and the depth map area includes two continuous areas.
  • the depth map sub-region in the spliced image may be obtained by down-sampling the original depth map corresponding to the pixels in the texture map of the corresponding viewpoint.
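  • A minimal sketch of such down-sampling is shown below, assuming a factor of 2 in each direction; nearest-neighbour decimation is chosen here because averaging depth values across object boundaries would produce invalid depths.

```python
import numpy as np


def downsample_depth(original_depth: np.ndarray, factor: int = 2) -> np.ndarray:
    """Down-sample an original depth map (one depth value per texture pixel)
    by keeping every `factor`-th sample in both directions."""
    return original_depth[::factor, ::factor]


# A 1920x1080 original depth map becomes 960x540 with factor=2, so the
# depth map sub-region is smaller than the texture map sub-region.
depth = np.zeros((1080, 1920), dtype=np.uint16)
small = downsample_depth(depth, factor=2)
assert small.shape == (540, 960)
```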
  • FIG. 11, FIG. 12 and the like do not limit the way in which the stitched image is divided; the division does not have to be into equal parts.
  • the number of texture maps and depth maps in the stitched image is determined based on the number of viewpoints collected synchronously,
  • the pixel quantity and aspect ratio of the spliced image can be various, and the division method can also be various.
  • the number of pixels in the depth map area may also be more than the number of pixels occupied by pixel data in the texture map area.
  • the texture map area may include two continuous areas, and the depth map area may also include two continuous areas.
  • the texture map area and the depth map area can be arranged at intervals.
  • the texture map sub-regions included in the texture map region may be arranged at intervals from the depth map sub-regions included in the depth map region.
  • the number of continuous regions included in the texture map region may be equal to the number of texture map sub-regions, and the number of continuous regions included in the depth map region may be equal to the number of depth map sub-regions.
  • the pixel data of each texture map may be stored in the texture map sub-region according to the order in which the pixel points are arranged.
  • the depth data corresponding to each texture map may also be stored in the depth map sub-region in the order of pixel point arrangement.
  • the texture map 1 is illustrated with 9 pixels in FIG. 16, and the texture map 2 is illustrated with 9 pixels in FIG. 17.
  • the texture map 1 and the texture map 2 are synchronized images from different viewpoints (two images from different perspectives). According to texture map 1 and texture map 2, depth data corresponding to texture map 1 can be obtained, including depth value 1 to depth value 9 of texture map 1, and depth data corresponding to texture map 2, including depth value 1 to depth value 9 of texture map 2.
  • when the texture map 1 is stored in the image sub-region, it can be stored in the upper-left image sub-region in the order in which the pixel points are arranged; that is, in the image sub-region, the arrangement of the pixel points can be the same as in texture map 1.
  • the texture map 2 is stored in the image sub-region, and can also be stored in the upper right image sub-region in this way.
  • the depth data of texture map 1 can be stored in the depth map sub-region in a similar manner.
  • storage can be performed in the manner shown in FIG. 18. If the depth values are obtained by down-sampling the original depth map, they can be stored in the depth map sub-region in the order in which the pixels of the down-sampled depth map are arranged.
  • the compression rate achievable when compressing an image is related to the correlation between the pixels in the image: the stronger the correlation, the higher the compression rate. Since the captured image corresponds to the real world and the correlation between its pixels is strong, storing the pixel data and depth data of the image in the order of pixel arrangement allows a higher compression rate when compressing the stitched image; that is, for the same amount of data before compression, the amount of data after compression can be made smaller.
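  • The sketch below illustrates, under assumed sizes and an assumed FIG. 9-like layout, how texture maps and down-sampled depth maps could be written into a stitched image in pixel-arrangement order; it is only one of many possible storage rules.

```python
import numpy as np


def stitch_views(textures, depths, cols: int = 4) -> np.ndarray:
    """Write each texture map and each depth map into its sub-region in
    pixel-arrangement order: textures in the upper half, depths below."""
    th, tw, _ = textures[0].shape
    dh, dw = depths[0].shape
    rows = (len(textures) + cols - 1) // cols
    canvas = np.zeros((rows * th + rows * dh, cols * tw, 3), dtype=np.uint8)
    for i, tex in enumerate(textures):
        r, c = divmod(i, cols)
        canvas[r * th:(r + 1) * th, c * tw:(c + 1) * tw] = tex
    top = rows * th
    for i, dep in enumerate(depths):
        r, c = divmod(i, cols)
        canvas[top + r * dh: top + (r + 1) * dh,
               c * dw:(c + 1) * dw, 0] = dep   # single-channel depth in one channel
    return canvas


# 8 synchronized viewpoints: 1920x1080 textures and 960x540 depth maps.
textures = [np.zeros((1080, 1920, 3), dtype=np.uint8) for _ in range(8)]
depths = [np.zeros((540, 960), dtype=np.uint8) for _ in range(8)]
stitched = stitch_views(textures, depths)
```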
  • each texture map sub-area is obtained from images (texture maps) shot in the area to be viewed from different viewpoints (different angles) or frame images in the video, and all depth maps are stored in the depth map area.
  • edge protection may be performed on all or part of the texture map sub-region and the depth map sub-region.
  • the form of edge protection can be various. For example, taking the depth map of viewpoint 1 in FIG. 14 as an example, redundant pixels can be placed around the original viewpoint-1 depth map; or the number of pixels of the original viewpoint-1 depth map can be kept unchanged, redundant pixels that do not store actual pixel data are left around it, and the original viewpoint-1 depth map is reduced and stored in the remaining pixels; or, in other ways, redundant pixels are ultimately left so that the viewpoint-1 depth map is separated from the surrounding images.
  • since the stitched image includes multiple texture maps and depth maps, the correlation across adjacent image boundaries is poor. With edge protection, the quality loss of the texture maps and depth maps in the stitched image can be reduced when the stitched image is compressed.
  • the pixel field of the texture map sub-region may store three-channel data
  • the pixel field of the depth map sub-region may store single-channel data.
  • the pixel field of the texture map sub-region is used to store pixel data of any texture map in multiple synchronized texture maps, and the pixel data is usually three-channel data, such as RGB data or YUV data.
  • the depth map sub-area is used to store the depth data of the depth map. If the depth value is 8-bit binary data, it can be stored in a single channel of the pixel field; if the depth value is 16-bit binary data, it can be stored in two channels of the pixel field. Alternatively, the depth value can also be stored in a larger pixel area: for example, if the multiple synchronized texture maps are 1920*1080 images and the depth value is 16-bit binary data, the depth value can also be stored, as a single channel, in a depth map area twice the size of 1920*1080.
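  • A hedged sketch of storing a 16-bit depth value in two 8-bit channels of a pixel field (high byte and low byte) is given below; the byte order is an assumption.

```python
import numpy as np


def pack_depth16(depth16: np.ndarray) -> np.ndarray:
    """Store a 16-bit depth map in two 8-bit channels of a three-channel
    pixel field: high byte in channel 0, low byte in channel 1."""
    packed = np.zeros((*depth16.shape, 3), dtype=np.uint8)
    packed[..., 0] = (depth16 >> 8).astype(np.uint8)    # high 8 bits
    packed[..., 1] = (depth16 & 0xFF).astype(np.uint8)  # low 8 bits
    return packed


def unpack_depth16(packed: np.ndarray) -> np.ndarray:
    """Recover the 16-bit depth values from the two channels."""
    return (packed[..., 0].astype(np.uint16) << 8) | packed[..., 1].astype(np.uint16)


depth16 = np.full((540, 960), 1234, dtype=np.uint16)
assert np.array_equal(unpack_depth16(pack_depth16(depth16)), depth16)
```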
  • the stitched image can also be divided in combination with the specific storage method.
  • assuming each channel of each pixel occupies 8 bits, the uncompressed data amount of the spliced image can be calculated according to the following formula: number of synchronized texture maps * (data volume of the pixel data of one texture map + data volume of the pixel data of one depth map).
  • the original depth map can also occupy 1920*1080 pixels, which is a single channel.
  • the pixel data volume of the original texture map is: 1920*1080*8*3bit
  • the data volume of the original depth map is 1920*1080*8bit
  • if the number of cameras is 30, the pixel data volume of the stitched image is 30*(1920*1080*8*3+1920*1080*8) bits, about 237 MB; if it is not compressed, it will occupy a lot of system resources and the delay will be large.
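  • The figure of about 237 MB can be reproduced with the following short calculation, which is simply the formula above written out:

```python
# 8 bits per channel; three channels per texture pixel, one channel per depth pixel.
cameras = 30
texture_bits = 1920 * 1080 * 8 * 3
depth_bits = 1920 * 1080 * 8
total_bits = cameras * (texture_bits + depth_bits)
print(total_bits / 8 / 2**20)   # ~237 MB per uncompressed stitched frame
```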
  • when the bandwidth is small, for example about 1 MB/s, an uncompressed stitched image takes about 237 s to transmit, which gives poor real-time performance and a user experience that needs improvement.
  • the number of pixels of the stitched image after down-sampling is about one sixteenth of that before down-sampling.
  • the amount of data can be reduced.
  • a stitched image with a higher resolution can also be generated to improve the image quality.
  • the pixel data and depth data of the synchronized multiple texture maps may also be stored in other ways, for example, stored in the stitched image in pixel units.
  • the texture map 1 and the texture map 2 shown in FIG. 16 and FIG. 17 can also be stored in the stitched image in the manner of FIG. 19.
  • the pixel data of the texture maps of multiple viewpoints and the depth data of the corresponding viewpoint depth maps can be stored in the spliced image, and the spliced image can be divided into texture map areas and depth map areas in various ways; alternatively, the spliced image may not be divided, with the pixel data and depth data of the texture maps stored in a preset order.
  • the synchronized multiple texture maps may also be synchronized multiple frame images obtained by decoding multiple videos.
  • the video can be acquired by multiple cameras, and its settings can be the same as or similar to the camera used to acquire the texture map.
  • the storage methods for splicing images may have corresponding association field descriptions, so that the data processing device can obtain relevant data according to the association field.
  • the associated texture map and depth map data may have corresponding association field descriptions, so that the data processing device can obtain relevant data according to the association field.
  • the picture format of the spliced image may be any one of image formats such as BMP, PNG, JPEG, and Webp, or may also be other image formats.
  • the storage method of pixel data and depth data in free-view video data (multi-angle free-view image data) is not limited to the method of stitching images. It can be stored in other ways, and can also have a corresponding field description of the association relationship.
  • the method described in the aforementioned step S102 can be used to identify the texture map area Zt and the depth map area Zd in the spliced image I. For the texture map area Zt, a preset first encoding method is used for encoding to obtain a compressed texture map area Zt'; for the depth map area Zd, a preset second encoding method is used for encoding to obtain a compressed depth map area Zd', wherein the second encoding mode is the region-of-interest-based encoding mode. Based on the compressed texture map area Zt' and the compressed depth map area Zd', the compressed video frame I' can be obtained.
  • a corresponding decompression device can be used for decompression processing, and based on the spliced image obtained after decompression, free-view video image reconstruction is performed.
  • the decompression process may be performed correspondingly in the following manner:
  • S211: Acquire a compressed video frame and quantization parameters of pixel blocks in the compressed video frame, where the compressed video frame includes a texture map area coded using the first coding mode and a depth map area coded using the second coding mode, the texture map area includes texture maps of multiple synchronized viewpoints, the depth map area includes depth maps of the viewpoints corresponding to each texture map, and the second encoding method is a region-of-interest-based encoding method.
  • S212: Based on the quantization parameters of the pixel blocks in the compressed video frame, decompress the compressed video frame by pixel block to obtain a spliced image of the free-view video frame, where the spliced image includes the texture map areas of the synchronized multiple viewpoints and the depth map areas of the corresponding viewpoints.
  • because different encoding modes are used for different areas, the quantization parameters of the corresponding pixel blocks will differ; therefore, based on the quantization parameters of the pixel blocks in the compressed video frame, the compressed video frame is decompressed pixel block by pixel block, so that a spliced image of the free-view video frame can be obtained.
  • the compressed video frame is decompressed by the above method. Since the compressed video frame includes a texture map area encoded with the first encoding method and a depth map area encoded with the second encoding method, and the second encoding method is a region-of-interest-based encoding method that compresses according to the image features of the depth map region, the depth map region obtained after decompression has higher image quality, which in turn improves the image quality of the free-view video reconstructed based on the depth map.
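  • As an illustrative sketch only, once the compressed video frame has been decoded back into the stitched image, each viewpoint's texture map and depth map can be cut out as inputs for DIBR reconstruction; the layout constants below are assumptions that mirror the earlier stitching example.

```python
import numpy as np


def extract_views(stitched: np.ndarray, num_views: int = 8, cols: int = 4,
                  tex_h: int = 1080, tex_w: int = 1920,
                  dep_h: int = 540, dep_w: int = 960):
    """Cut each viewpoint's texture map and depth map out of the decoded
    stitched image (textures in the upper half, depths in the lower half)."""
    rows = (num_views + cols - 1) // cols
    depth_offset = rows * tex_h
    views = []
    for i in range(num_views):
        r, c = divmod(i, cols)
        tex = stitched[r * tex_h:(r + 1) * tex_h, c * tex_w:(c + 1) * tex_w]
        dep = stitched[depth_offset + r * dep_h: depth_offset + (r + 1) * dep_h,
                       c * dep_w:(c + 1) * dep_w]
        views.append((tex, dep))   # per-viewpoint inputs for DIBR reconstruction
    return views
```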
  • based on the obtained free-view video frame and the parameter data corresponding to the free-view video frame, virtual viewpoint reconstruction can be performed using the texture maps and depth maps of the corresponding viewpoints in the free-view video frame.
  • texture maps and corresponding depth maps of some of the viewpoints in the free-viewpoint video frame may be selected based on the virtual viewpoint, and the selected texture maps and corresponding depth maps can be used for combined rendering to obtain the reconstructed image of the virtual viewpoint.
  • the determination may be based on the spatial positional relationship between the virtual viewpoint and each viewpoint in the free viewpoint video frame, and specifically may be determined based on parameter data corresponding to the free viewpoint video frame.
  • an augmented reality (Augmented Reality, AR) special effect may be implanted in the reconstructed free-viewpoint image.
  • the special effect rendering pixel area may be generated based on user interaction, or may be determined based on certain preset trigger conditions or third-party instructions.
  • the virtual information image to be implanted can be provided directly or generated in real time based on interaction. After the special effect rendering pixel area is determined, the corresponding augmented reality special effect input data can be obtained, and then a matching virtual information image can be generated.
  • the special effect rendering pixel area can be obtained through three-dimensional calibration, and then a virtual information image matching the special effect rendering pixel area is obtained.
  • the position of the virtual rendering target object corresponding to the special effect rendering pixel area in the reconstructed image can be determined.
  • the virtual information image that matches the position of the virtual rendering object in the special effect rendering pixel area is obtained by calibration, so that the obtained virtual information image can be more matched with the position of the virtual rendering target object in the three-dimensional space, and then the displayed The virtual information image is more in line with the real state in the three-dimensional space, so the displayed composite image is more realistic and vivid, enhancing the user's visual experience.
  • a virtual information image corresponding to the target object may be generated according to a preset special effect generation method based on the augmented reality special effect input data of the virtual rendering target object.
  • a variety of special effects generation methods can be used.
  • for example, the augmented reality special effect input data of the virtual rendering target object may be input into a preset three-dimensional model, which outputs a virtual information image matching the position, obtained by three-dimensional calibration, of the virtual rendering target object in the image; alternatively, the augmented reality special effect input data of the virtual rendering target object may be input into a preset machine learning model, which likewise outputs a virtual information image matching the position, obtained by three-dimensional calibration, of the virtual rendering target object in the image.
  • ROI-based coding compression may be performed on the texture map region.
  • the special effect rendering pixel area may be encoded using a special encoding method suitable for augmented reality special effects.
  • Embodiments of this specification also provide embodiments of a corresponding video compression apparatus, such as a schematic structural diagram of a video compression apparatus shown in FIG. 22 , wherein the video compression apparatus 220 may include: a video frame acquisition unit 221 , an identification unit 222 , a first coding unit 223 and second coding unit 224, wherein:
  • the video frame obtaining unit 221 is adapted to obtain a free-view video frame, the video frame includes a stitched image formed by texture maps of multiple viewpoints and depth maps of corresponding viewpoints;
  • the identifying unit 222 is adapted to identify the texture map area and the depth map area in the spliced image
  • the first encoding unit 223 uses a preset first encoding mode to encode the texture map area;
  • the second encoding unit 224 uses a preset second encoding mode to encode the depth map region, and the second encoding mode is an encoding mode based on a region of interest.
  • the texture map area and the depth map area in the spliced image of the video frame are compressed by different coding methods respectively, wherein the depth map area is compressed using the region-of-interest-based coding method, so that compression can be performed according to the image features of the depth map area; thus the compression loss of the depth map can be reduced, and the image quality of the free-view video reconstructed based on the depth map can be improved.
  • the video compression apparatus may be implemented by a hardware encoder, an encoding chip, or an encoding module composed of multiple modules, and a special hardware logic device and/or a software algorithm may be set in the hardware encoder, encoding chip, or encoding module to compress the obtained free-view video frame to obtain a compressed video frame.
  • the embodiments of this specification also provide a video decompression apparatus corresponding to the video compression apparatus.
  • the video decompression apparatus 230 may include an acquisition unit 231 and a decompression unit 232, wherein:
  • the obtaining unit 231 is adapted to obtain a compressed video frame and quantization parameters of pixel blocks in the compressed video frame, wherein the compressed video frame includes a texture map area coded with the first coding mode and a depth map area coded with the second coding mode, the texture map area includes the texture maps of multiple synchronized viewpoints, the depth map area includes the depth maps of the viewpoints corresponding to each texture map, and the second encoding method is a region-of-interest-based encoding method;
  • the decompression unit 232 is adapted to decompress the compressed video frame pixel block by pixel block based on the quantization parameters of the pixel blocks in the compressed video frame, to obtain a spliced image of the free-view video frame, the spliced image including the texture map areas of the synchronized multiple viewpoints and the depth map areas of the corresponding viewpoints.
  • the video decompression apparatus can be implemented by a hardware decoder, a decoding chip, or a decoding module composed of multiple modules, and a special hardware logic device and/or a software algorithm can be set in the hardware decoder, decoding chip, or decoding module to decompress the obtained free-viewpoint compressed video frame to obtain a free-viewpoint video frame.
  • the specific encoding mode and the decoding mode can also be selected based on user interaction.
  • in response to a coding selection interactive operation, a coding selection instruction can be generated, which can then trigger a corresponding coding unit (such as one or more of a specific encoder, coding module, coding chip, coding software, or coding system) to perform encoding; similarly, in response to a decoding selection interactive operation, a decoding selection instruction can be generated, which can then trigger a corresponding decoding unit (such as one or more of a specific decoder, decoding module, decoding chip, decoding software, or decoding system) to perform decoding.
  • the embodiment of the present specification also provides an electronic device.
  • the electronic device 240 may include a memory 241 and a processor 242, the memory 241 storing computer instructions that can run on the processor 242, wherein, when the processor executes the computer instructions, the steps of the method described in any of the foregoing embodiments can be performed.
  • the electronic device may also include other electronic components or assemblies.
  • the electronic device 240 may further include a communication component 243, which may communicate with the acquisition system or a cloud server to obtain synchronized texture maps of multiple viewpoints for generating free-view video frames, or to obtain free-viewpoint compressed video frames produced by the compression method in the aforementioned embodiments of this specification; the processor 242 can then decompress the compressed video frames obtained by the communication component 243 and perform free-viewpoint video frame reconstruction according to the virtual viewpoint position.
  • the electronic device 240 may further include a display component 244 (eg, display, touch screen, projector) to display the reconstructed image of the virtual viewpoint.
  • the electronic device can be set as a cloud server or a server cluster, or as a local server to perform compression processing on free-view video frames before transmission.
  • the electronic device may specifically be a mobile phone, a notebook computer, a desktop computer, a set-top box, or another electronic device with video processing and playback functions.
  • the received compressed video frames can be decompressed, and based on the decompressed video frames and the acquired virtual viewpoint, the image of the virtual viewpoint is reconstructed.
  • the virtual viewpoint may be determined based on user interaction behavior, or determined based on preset position information of the virtual viewpoint.
  • the memory, the processor, the communication component and the display component may communicate through a bus network.
  • the communication component 243, the display component 244 and the like may be components arranged inside the electronic device 240, or may be external devices connected through expansion components such as expansion interfaces, docking stations, and expansion cables.
  • the processor 242 can be implemented by any one or more of a central processing unit (Central Processing Unit, CPU) (such as a single-core or multi-core processor), a CPU group, a graphics processing unit (Graphics Processing Unit, GPU), an AI chip, an FPGA chip, and the like, operating in coordination.
  • an electronic device cluster composed of multiple electronic devices may also be used for collaborative implementation.
  • the video processing system A0 includes a collection array A1 composed of a plurality of collection devices, a data processing device A2, a server cluster A3 in the cloud, a playback control device A4, a playback terminal A5 and an interactive terminal A6.
  • each acquisition device in the acquisition array A1 can be placed in a fan shape at different positions in the on-site acquisition area, and can synchronously acquire video data streams from corresponding angles in real time.
  • the collection device may also be arranged in the ceiling area of the basketball stadium, on the basketball hoop, and the like.
  • the collection devices can be arranged and distributed along a straight line, a fan shape, an arc line, a circle or an irregular shape.
  • the specific arrangement can be set according to one or more factors such as the specific on-site environment, the number of acquisition devices, the characteristics of the acquisition devices, and imaging effect requirements.
  • the collection device may be any device with a camera function, such as a common camera, a mobile phone, a professional camera, and the like.
  • the data processing device A2 can be placed in a non-acquisition area on site, and can be regarded as an on-site server.
  • the data processing device A2 may send a stream-pulling instruction to each acquisition device in the acquisition array A1 through a wireless local area network, and each acquisition device in the acquisition array A1 transmits the obtained video data stream to the data processing device A2 in real time based on the stream-pulling instruction sent by the data processing device A2.
  • each acquisition device in the acquisition array A1 can transmit the obtained video data stream to the data processing device A2 in real time through the switch A7.
  • the acquisition array A1 and the switch A7 together form an acquisition system.
  • when the data processing device A2 receives a video frame interception instruction, it intercepts the video frames at the specified frame moment from the received multi-channel video data streams to obtain multiple synchronized frame images, and uploads the obtained multiple synchronized video frames at the specified frame moment to the server cluster A3 in the cloud.
  • the server cluster A3 in the cloud uses the received original texture maps of multiple synchronized video frames as an image combination, determines the parameter data corresponding to the image combination and the estimated depth map corresponding to each original texture map in the image combination, and Based on the corresponding parameter data of the image combination, the pixel data of the texture map and the depth data of the corresponding depth map in the image combination, frame image reconstruction is performed based on the acquired virtual viewpoint path, and corresponding multi-angle free-view video data is obtained.
  • the depth map correction method described in the foregoing embodiments of this specification may be used to correct the estimated depth map of the corresponding viewpoint.
  • the server can be placed in the cloud, and in order to process data in parallel more quickly, a server cluster A3 in the cloud can be composed of multiple different servers or server groups according to different data processed.
  • the cloud server cluster A3 may include: a first cloud server A31, a second cloud server A32, a third cloud server A33, and a fourth cloud server A34.
  • the first cloud server A31 can be used to determine the corresponding parameter data of the image combination;
  • the second cloud server A32 can be used to determine the estimated depth map of the original texture map of each viewpoint in the image combination and perform depth map correction processing
  • the third cloud server A33 can, based on the position information of the virtual viewpoint and on the parameter data, texture maps, and depth maps corresponding to the image combination, use the depth-image-based virtual viewpoint reconstruction (Depth Image Based Rendering, DIBR) algorithm to reconstruct frame images and obtain images of the virtual viewpoints;
  • the fourth cloud server A34 can be used to generate free viewpoint videos (multi-angle free viewpoint videos).
  • the first cloud server A31, the second cloud server A32, the third cloud server A33, and the fourth cloud server A34 may also each be a server group composed of a server array or a server sub-cluster, which is not limited in this embodiment of the present invention.
  • the playback control device A4 can insert the received free-view video frame into the to-be-played video stream, and the playback terminal A5 receives the to-be-played video stream from the playback control device A4 and plays it in real time.
  • the playback control device A4 may be a manual playback control device or a virtual playback control device.
  • a dedicated server that can automatically switch video streams can be set up as a virtual playback control device to control the data source.
  • a broadcast directing control device, such as a broadcast directing station, can be used as a playback control device in the embodiments of the present invention.
  • the data processing device A2 can be placed in the on-site non-collection area or in the cloud according to the specific situation, and the server (cluster) and the playback control device can be placed in the on-site non-collection area, in the cloud, or on the terminal access side according to the specific situation.
  • this embodiment is not intended to limit the specific implementation and protection scope of the present invention.
  • the embodiments of the present specification further provide a computer-readable storage medium on which computer instructions are stored, wherein, when the computer instructions are executed, the steps of the methods described in any of the foregoing embodiments are performed.
  • the computer-readable storage medium may be any of various suitable readable storage media, such as an optical disc, a mechanical hard disk, or a solid-state hard disk.
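As an illustrative sketch only (not the implementation of this disclosure), the frame-interception step referenced above can be pictured as follows; the file names, the frame index, and the use of OpenCV are assumptions made for the example:

```python
# Illustrative sketch: intercept the video frame at a specified frame moment
# from multiple frame-aligned video data streams (file names and frame index
# are hypothetical; OpenCV is used only as an example).
import cv2

def grab_synchronized_frames(stream_paths, frame_index):
    """Return one frame per acquisition device, all taken at the same frame moment."""
    frames = []
    for path in stream_paths:
        cap = cv2.VideoCapture(path)
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_index)  # seek to the specified frame moment
        ok, frame = cap.read()
        cap.release()
        if not ok:
            raise RuntimeError(f"failed to read frame {frame_index} from {path}")
        frames.append(frame)
    return frames

# e.g. frames = grab_synchronized_frames([f"cam_{i:02d}.mp4" for i in range(30)], 1200)
```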


Abstract

视频压缩方法、解压方法、装置、电子设备及存储介质,其中,所述视频压缩方法包括:获取自由视点视频帧,所述视频帧包括同步的多个视点的纹理图和对应视点的深度图形成的拼接图像;识别所述拼接图像中的纹理图区域和深度图区域;对于所述纹理图区域,采用预设的第一编码方式进行编码,对于所述深度图区域,采用预设的第二编码方式进行编码,得到压缩视频帧;其中,所述第二编码方式为基于感兴趣区域的编码方式。上述方案能够减少图像的压缩损失,进而可以提高自由视点视频的图像质量。

Description

视频压缩方法、解压方法、装置、电子设备及存储介质
本申请要求2020年07月28日递交的申请号为202010740743.8、发明名称为“视频压缩方法、解压方法、装置、电子设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本说明书实施例涉及视频处理技术领域,尤其涉及一种视频压缩方法、解压方法、装置、电子设备及存储介质。
背景技术
视频数据是支持视频播放,以供用户观看的数据,通常,视频数据仅支持用户从一个视角进行观看,用户无法调整观看的视角。
自由视点视频是一种能够提供高自由度观看体验的技术,用户可以在观看过程中通过交互操作,调整观看视角,从想观看的自由视点角度进行观看,从而可以大幅提升观看体验。
目前,为实现自由视点视频的播放,存在一种基于纹理图和深度图拼接的自由视点视频生成方法,具体而言,将同步的多个视角的纹理图和对应视角的深度图进行拼接,将多个帧时刻的拼接图像统一压缩传输到终端进行解码,以及基于深度图的虚拟视点重建(Depth Image Based Rendering,DIBR)。
然而,采用上述基于纹理图和深度图拼接所得到的自由视点视频,采用目前的编码方法会导致拼接图像存在很大的压缩损失,进而对终端重建的自由视点视频的图像质量产生很大影响。
发明内容
有鉴于此,本说明书实施例提供视频压缩方法、解压方法、装置、设备及存储介质,能够减少图像的压缩损失,进而可以提高自由视点视频的图像质量。
本说明书实施例提供了一种视频压缩方法,包括:
获取自由视点视频帧,所述视频帧包括同步的多个视点的纹理图和对应视点的深度图形成的拼接图像;
识别所述拼接图像中的纹理图区域和深度图区域;
对于所述纹理图区域,采用预设的第一编码方式进行编码,对于所述深度图区域,采用预设的第二编码方式进行编码,得到压缩视频帧;其中,所述第二编码方式为基于感兴趣区域的编码方式。
可选地,所述对于所述深度图区域,采用预设的第二编码方式进行编码,包括:
对于所述深度图区域中的ROI像素区域,基于预设的恒定量化参数进行编码。
可选地,所述对于所述深度图区域,采用预设的第二编码方式进行编码,包括:
对于所述深度图区域中包含的各视点对应的深度图子区域中的前景边缘像素区域采用第一恒定量化参数进行编码,对于各视点对应的深度图子区域中非前景边缘像素区域采用第二恒定量化参数进行编码,所述第一恒定量化参数的参数值小于所述第二恒定量化参数的参数值。
可选地,所述对于所述深度图区域,采用预设的第二编码方式进行编码,包括:
对于所述深度图区域中的ROI像素区域,采用预设的恒定质量因子进行编码。
可选地,所述纹理图区域包括特效渲染像素区域;所述对于所述纹理图区域,采用预设的第一编码方式进行编码,包括:
对于所述纹理图区域中的特效渲染像素区域,采用适于增强现实特效的编码方式进行编码。
可选地,所述方法还包括:对于所述拼接图像,采用恒定码率。
可选地,所述拼接图像中深度图子区域的像素点为与对应视点的纹理图子区域中的像素点一一对应的像素点集合中的全部或部分像素点。
可选地,所述纹理图区域包括多个与视点对应的纹理图子区域,所述深度图区域包括多个与视点对应的深度图子区域,所述深度图子区域小于所述纹理图子区域。
可选地,所述拼接图像中的深度图子区域为对与对应视点的纹理图中的像素点一一对应的原始深度图进行降采样后得到。
本说明书实施例还提供了一种视频解压方法,包括:
获取压缩视频帧和所述压缩视频帧中像素块的量化参数,所述压缩视频帧中包括采用第一编码方式进行编码的纹理图区域和采用第二编码方式编码的深度图区域,其中,所述纹理图区域包括同步的多个视点的纹理图,所述深度图区域包括各纹理图对应视点的深度图,所述第二编码方式为基于感兴趣区域的编码方式;
基于所述压缩视频帧中的像素块的量化参数,对所述压缩视频帧按照像素块进行解压缩处理,得到自由视点视频帧的拼接图像,所述拼接图像包括所述同步的多个视点的纹理图区域和对应视点的深度图区域。
本说明书实施例还提供了一种视频压缩装置,包括:
视频帧获取单元,适于获取自由视点视频帧,所述视频帧包括多个视点的纹理图和对应视点的深度图形成的拼接图像;
识别单元,适于识别所述拼接图像中的纹理图区域和深度图区域;
第一编码单元,对于所述纹理图区域,采用预设的第一编码方式进行编码;
第二编码单元,对于所述深度图区域,采用预设的第二编码方式进行编码,所述第二编码方式为基于感兴趣区域的编码方式。
本说明书实施例还提供了一种视频解压装置,包括:
获取单元,适于获取压缩视频帧和所述压缩视频帧中像素块的量化参数,所述压缩视频帧中包括采用第一编码方式进行编码的纹理图区域和采用第二编码方式编码的深度图区域,其中,所述纹理图区域包括同步的多个视点的纹理图,所述深度图区域包括各纹理图对应视点的深度图,所述第二编码方式为基于感兴趣区域的编码方式;
解压缩单元,适于基于所述压缩视频帧中的像素块的量化参数,对所述压缩视频帧按照像素块进行解压缩处理,得到自由视点视频帧的拼接图像,所述拼接图像包括所述同步的多个视点的纹理图区域和对应视点的深度图区域。
本说明书实施例还提供了一种电子设备,包括存储器和处理器,所述存储器上存储有可在所述处理器上运行的计算机指令,其中,所述处理器运行所述计算机指令时执行前述任一实施例所述方法的步骤。
本说明书实施例还提供了一种计算机可读存储介质,其上存储有计算机指令,其中,所述计算机指令运行时执行前述任一实施例所述方法的步骤。
与现有技术相比,本说明书实施例的技术方案具有以下有益效果:
采用本说明书实施例的视频压缩方法,对于视频帧的拼接图像中的纹理图区域和深度图区域分别采用不同的编码方式进行压缩,其中,对于深度图区域采用基于感兴趣区域的编码方式进行压缩,能够按照深度图区域的图像特征进行压缩,从而可以减少深度图的压缩损失,进而可以提高基于深度图重建得到的自由视点视频的图像质量。
进一步地,对于所述深度图区域中的ROI像素区域,基于预设的恒定量化参数进行编码,可以减少所述ROI像素区域的压缩损失,在实现深度图区域尽可能压缩的基础上,可以减少拼接图像的深度图区域的压缩损失。
进一步地,对于所述深度图区域中包含的各视点对应的深度图子区域中的前景边缘像素区域采用第一恒定量化参数进行编码,对于各视点对应的深度图子区域中非前景边缘像素区域采用第二恒定量化参数进行编码,由于深度图子区域中的前景边缘像素区域对于自由视点视频的重建质量非常关键,故对于深度图子区域中的前景边缘像素区域采用的第一恒定量化参数的参数值小于对于所述深度图子区域中非前景边缘像素区域采用的第二恒定量化参数的参数值,可以减小深度图区域的压缩损失,进而可以提高重建得到的自由视点视频的重建质量。
附图说明
图1是本说明书实施例中一种自由视点视频展示的具体应用系统示意图;
图2是本说明书实施例中一种终端设备交互界面示意图;
图3是本说明书实施例中一种采集设备设置方式的示意图;
图4是本说明书实施例中另一种终端设备交互界面示意图;
图5是本说明书实施例中一种自由视点视频数据生成过程的示意图;
图6是本说明书实施例中一种6DoF视频数据的生成及处理的示意图;
图7是本说明书实施例中一种数据头文件的结构示意图;
图8是本说明书实施例中一种用户侧对6DoF视频数据处理的示意图;
图9是本说明书实施例中一种视频帧的拼接图像的结构示意图;
图10是本说明书实施例中一种视频压缩方法的流程图;
图11是本说明书实施例中另一种拼接图像的结构示意图;
图12是本说明书实施例中另一种拼接图像的结构示意图;
图13是本说明书实施例中另一种拼接图像的结构示意图;
图14是本说明书实施例中另一种拼接图像的结构示意图;
图15是本说明书实施例中另一种拼接图像的结构示意图;
图16是本说明书实施例中一种纹理图的像素数据分布的示意图;
图17是本说明书实施例中另一种纹理图的像素数据分布的示意图;
图18是本发明实施例中一种拼接图像中数据存储的示意图;
图19是本发明实施例中另一种拼接图像中数据存储的示意图;
图20是本说明书实施例中一具体应用场景的视频压缩方法的示意图;
图21是本说明书实施例中一种视频解压方法的流程图;
图22是本说明书实施例中一种视频压缩装置的结构示意图;
图23是本说明书实施例中一种视频解压装置的结构示意图;
图24是本说明书实施例中一种电子设备的结构示意图;
图25是本说明书实施例中一种视频处理系统的结构示意图。
具体实施方式
为使本领域技术人员更好地理解和实施本说明书中的实施例,以下首先结合附图及具体应用场景对自由视点视频的实现方式进行示例性介绍。
参考图1,本发明实施例中一种自由视点视频展示的具体应用系统,可以包括多个采集设备的采集系统11、服务器12和显示设备13,其中采集系统11,可以对待观看区域进行图像采集;采集系统11或者服务器12,可以对获取到的同步的多个纹理图进行处理,生成能够支持显示设备13进行虚拟视点切换的多角度自由视角数据。显示设备13可以展示基于多角度自由视角数据生成的重建图像,重建图像对应于虚拟视点,根据用户指示可以展示对应于不同虚拟视点的重建图像,切换观看的位置和观看角度。
在具体实现中,进行图像重建,得到重建图像的过程可以由显示设备13实施,也可以由位于内容分发网络(Content Delivery Network,CDN)的设备以边缘计算的方式实施。可以理解的是,图1仅为示例,并非对采集系统、服务器、终端设备以及具体实现方式的限制。
继续参考图1,用户可以通过显示设备13对待观看区域进行观看,在本实施例中,待观看区域为篮球场。如前所述,观看的位置和观看角度是可以切换的。
举例而言,用户可以在屏幕滑动,以切换虚拟视点。在本发明一实施例中,结合参考图2,用户手指沿D22方向滑动屏幕时,可以切换进行观看的虚拟视点。继续参考图3,滑动前的虚拟视点的位置可以是VP1,滑动屏幕切换虚拟视点后,虚拟视点的位置可以是VP2。结合参考图4,在滑动屏幕后,屏幕展示的重建图像可以如图4所示。重建图像,可以是基于由实际采集情境中的多个采集设备采集到的图像生成的多角度自由视角数据进行图像重建得到的。
可以理解的是,切换前进行观看的图像,也可以是重建图像。重建图像可以是视频流中的帧图像。另外,根据用户指示切换虚拟视点的方式可以是多样的,在此不做限制。
在具体实施中,视点可以用6自由度(Degree of Freedom,DoF)的坐标表示,其中,视点的空间位置可以表示为(x,y,z),视角可以表示为三个旋转方向
Figure PCTCN2021107507-appb-000001
相应地,基于6自由度的坐标,可以确定虚拟视点,包括位置和视角。
虚拟视点是一个三维概念,生成重建图像需要三维信息。在一种具体实现方式中,多角度自由视角数据中可以包括深度图数据,用于提供平面图像外的第三维信息。相比于其它实现方式,例如通过点云数据提供三维信息,深度图数据的数据量较小。
在本发明实施例中,虚拟视点的切换可以在一定范围内进行,该范围即为多角度自由视角范围。也即,在多角度自由视角范围内,可以任意切换虚拟视点的位置以及视角。
多角度自由视角范围与采集设备的布置相关,采集设备的拍摄覆盖范围越广,则多角度自由视角范围越大。终端设备展示的画面质量,与采集设备的数量相关,通常,设置的采集设备的数量越多,展示的画面中空洞区域越少。
此外,多角度自由视角的范围与采集设备的空间分布相关。可以基于采集设备的空间分布关系设置多角度自由视角的范围以及在终端侧与显示设备的交互方式。
本领域技术人员可以理解的是,上述各实施例以及对应的附图仅为举例示意性说明,并非对采集设备的设置以及多角度自由视角范围之间关联关系的限定,也并非对交互方式以及显示设备展示效果的限定。
结合参照图5,为进行自由视点视频重建,需要进行纹理图的采集和深度图计算,包括了三个主要步骤,分别为多摄像机视频采集(Multi-camera Video Capturing),摄像机内外参计算(Camera Parameter Estimation),以及深度图计算(Depth Map Calculation)。对于多摄像机视频采集来说,要求各个摄像机采集的视频可以帧级对齐。其中,通过多摄像机的视频采集可以得到纹理图(Texture Image);通过摄像机内外参计算,可以得到摄像机参数(Camera Parameter),摄像机参数可以包括摄像机内部参数数据和外部参数数据;通过深度图计算,可以得到深度图(Depth Map),同步的多个视角的纹理图及对应视角的深度图和摄像机参数,形成6DoF视频数据。
在本说明书实施例方案中,并不需要特殊的摄像机,比如光场摄像机,来做视频的采集。同样的,也不需要在采集前先进行复杂的摄像机校准的工作。可以布局和安排多摄像机的位置,以更好的拍摄需要拍摄的物体或者场景。
在以上的三个步骤处理完后,就得到了从多摄像机采集来的纹理图,所有摄像机的摄像机参数,以及每个摄像机的深度图。可以把这三部分数据称作为多角度自由视角视频数据中的数据文件,也可以称作6自由度视频数据(6DoF video data)。因为有了这些数据,用户端就可以根据虚拟的6自由度(Degree of Freedom,DoF)位置,来生成虚拟视点,从而提供6DoF的视频体验。
结合参考图6,6DoF视频数据以及指示性数据可以经过压缩和传输到达用户侧,用户侧可以根据接收到的数据,获取用户侧6DoF表达,也即前述的6DoF视频数据和元数据。其中,指示性数据也可以称作元数据(Metadata),其中,视频数据包括多摄像机对应的各视点的纹理图和深度图数据,纹理图和深度图可以按照一定的拼接规则或拼接模式进行拼接,形成拼接图像。
结合参考图7,元数据可以用来描述6DoF视频数据的数据模式,具体可以包括:拼接模式元数据(Stitching Pattern metadata),用来指示拼接图像中多个纹理图的像素数据以及深度图数据的存储规则;边缘保护元数据(Padding pattern metadata),可以用于指示对拼接图像中进行边缘保护的方式,以及其它元数据(Other metadata)。元数据可以存储于数据头文件,具体的存储顺序可以如图7所示,或者以其它顺序存储。
结合参考图8,用户侧得到了6DoF视频数据,其中包括了摄像机参数,拼接图像(纹理图以及深度图),以及描述元数据(元数据),除此之外,还有用户端的交互行为数据。通过这些数据,用户侧可以采用基于深度图的渲染(DIBR,Depth Image-Based Rendering)方式进行的6DoF渲染,从而在一个特定的根据用户交互行为产生的6DoF位置产生虚拟视点的图像,也即根据用户指示,确定与该指示对应的6DoF位置的虚拟视点。
参照图9所示的一种视频帧的拼接图像的结构示意图,在一些具体实施方式中,将同步的多个视点的纹理图和对应的深度图进行拼接,并对拼接后的拼接图像采用统一的编码方式进行压缩后传输到终端侧(如显示设备),在终端侧进行解码及DIBR的图像重建。
发明人经研究发现,由于传输带宽的限制,对拼接图像采用统一的编码方式进行压缩的方式会造成深度图的压缩损失,进而对DIBR重建得到的虚拟视点的图像质量会产生很大影响。
针对上述问题,本说明书实施例所采用的方案中,对于视频帧的拼接图像中的纹理图区域和深度图区域分别采用不同的编码方式进行压缩,其中,对于深度图区域采用基于感兴趣区域(Region of Interest,RoI)的编码方式进行压缩,能够按照深度图区域的图像特征进行压缩,从而可以减少深度图的压缩损失,进而可以提高基于深度图重建得到的自由视点视频的图像质量。
参照图10所示的视频压缩方法的流程图,在具体实施中,对于自由视点视频帧,可以采用如下步骤进行视频压缩:
S101,获取自由视点视频帧,所述视频帧包括同步的多个视点的纹理图和对应视点的深度图形成的拼接图像。
在具体实施中,自由视点视频帧可以为纹理图和深度图拼接而成。如前实施例所述的多角度自由视角视频,例如6DoF视频帧,每一视频帧可以包括同步的多个视点的纹理图和对应视点的深度图拼接形成的拼接图像。在本说明书一实施例中,视频帧中包含的拼接图像的结构如图9所示,其中,包括同步的8个视点的纹理图和对应的8个视点的深度图。
S102,识别所述拼接图像中的纹理图区域和深度图区域。
在具体实施中,通过图像特征识别,或者通过获取拼接图像的拼接模式元数据等方式,可以识别得到所述拼接图像中的纹理图区域和深度图区域。如图9所示,可以识别得到拼接图像的上半区域为纹理图区域,下半区域为深度图区域。
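作为一个极简的示意(并非本申请的实际实现),下面给出按图9所示"上半区域为纹理图区域、下半区域为深度图区域"的拼接布局划分拼接图像的草图,函数名与布局均为示例性假设:

```python
import numpy as np

def split_stitched_frame(frame: np.ndarray):
    """按图9所示布局划分拼接图像:上半区域为同步多视点的纹理图区域,
    下半区域为对应视点的深度图区域(布局为示意性假设)。"""
    h = frame.shape[0]
    texture_region = frame[: h // 2]   # 纹理图区域,采用第一编码方式
    depth_region = frame[h // 2 :]     # 深度图区域,采用基于感兴趣区域的第二编码方式
    return texture_region, depth_region
```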
S103,对于所述纹理图区域,采用预设的第一编码方式进行编码,对于所述深度图区域,采用预设的第二编码方式进行编码,得到压缩视频帧;其中,所述第二编码方式为基于感兴趣区域的编码方式。
在本说明书一些实施例中,对于所述深度图区域中的ROI像素区域,可以基于预设的恒定量化参数(Constant Quantization Parameter,CQP)进行编码。
如前所述,拼接图像由同步的多个视点的纹理图和对应视点的深度图拼接形成,其中,对于每个视点对应的纹理图区域可以称为相应视点的纹理图子区域,对于相应视点对应的深度图区域可以称为相应视点的深度图子区域。
发明人经研究发现,深度图子区域中的前景边缘像素区域的质量对于基于虚拟视点重建得到的图像质量是非常关键的。基于此,为提高重建得到的虚拟视点的图像质量,在本说明书一些实施例中,对于所述深度图区域中包含的各视点对应的深度图子区域中的前景边缘像素区域采用第一恒定量化参数进行编码,对于各视点对应的深度图子区域中非前景边缘像素区域采用第二恒定量化参数进行编码,所述第一恒定量化参数的参数值小于所述第二恒定量化参数的参数值。
其中,量化参数(Quantization Parameter,QP)值越大,表示量化步长越大,编码后得到的压缩视频的图像质量越低,QP为0表示进行无损编码,采用CQP,瞬时码率会随着场景复杂度波动。
作为一具体示例,所述第一恒定量化参数取值为10,所述第二恒定量化参数取值为40。可以理解的是,所述第一恒定量化参数和所述第二恒定量化参数也可以取值为其他数值,只要所述第一恒定量化参数的参数值小于所述第二恒定量化参数的参数值即可。为保证重建得到的虚拟视点的图像质量,所述第一恒定量化参数的取值建议小于或等于22。
在具体实施中,所述第一恒定量化参数和所述第二恒定量化参数可以基于预设的像素块大小进行取值。例如,像素块大小为3*3像素点;又如,像素块大小为32*32像素点。
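以下给出一个按像素块为深度图区域生成QP表的示意性草图(仅为基于上文取值的假设性示例,前景边缘掩码的求取方式与具体编码器接口均不在本例范围内):

```python
import numpy as np

def build_depth_qp_map(edge_mask: np.ndarray, block: int = 32,
                       qp_edge: int = 10, qp_other: int = 40) -> np.ndarray:
    """按像素块为深度图区域生成恒定量化参数(QP)表:
    含前景边缘像素的块使用较小的第一恒定量化参数(如10),
    其余块使用较大的第二恒定量化参数(如40)。
    edge_mask 为与深度图区域同尺寸的前景边缘掩码,非零表示前景边缘像素。"""
    h, w = edge_mask.shape
    rows = (h + block - 1) // block
    cols = (w + block - 1) // block
    qp_map = np.full((rows, cols), qp_other, dtype=np.int32)
    for r in range(rows):
        for c in range(cols):
            blk = edge_mask[r * block:(r + 1) * block, c * block:(c + 1) * block]
            if blk.any():  # 该像素块落在前景边缘像素区域
                qp_map[r, c] = qp_edge
    return qp_map
```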
在一些其他实施例中,也可以保持所述纹理图区域采用恒定量化参数,且可以与所述深度图区域中非前景边缘像素区域采用的量化参数相同,也可以不同。
在本说明书另一实施例中,对于所述拼接图像的纹理图区域、所述深度图区域的非前景边缘像素区域和前景边缘像素区域,均可以采用恒定量化参数,其中,所述深度图区域的前景边缘像素区域对应的量化参数取值最小,作为一可选示例,三者取值大小可以为依次递减。
在本说明书另一些实施例中,对于所述深度图区域中的ROI像素区域,可以采用预设的恒定质量因子(Constant Quality Factor,CQF)进行编码。与CQP类似,但CQF追求主观感知到的质量恒定,瞬时码率也会随着场景复杂度波动。
在具体实施中,对于所述拼接图像,可以采用恒定码率进行率失真优化。例如,可以在压缩装置(如编码器或包含编码器的装置)中设置一码率控制模块,通过选择一系列编码参数,来控制编码视频的码率满足具体的应用需求,并且使编码失真尽可能地小。在本说明书一些实施例中,对于视频帧,可以控制整个拼接图像的码率系数恒定,与此同时,对于所述深度图区域中的前景边缘像素区域,可以采用第一恒定量化参数进行编码,对于所述深度图区域中的非前景边缘像素区域和所述纹理图区域,可以基于整个拼接图像的码率系数和所述深度图区域中的前景边缘像素区域所占比例,确定相应的量化参数值,所得到的量化参数值可能是固定的,也可能是瞬时变化的。
在具体实施中,所述拼接图像中深度图子区域的像素点可以为与对应视点的纹理图子区域中的像素点一一对应的像素点集合中的全部或部分像素点。如图9所示,所述拼接图像中深度图的像素点为与对应视点的纹理图中的像素点一一对应的像素点集合中的全部像素点。其中,各对应视点的纹理图可以为相机采集到的原始纹理图,也可以是原始纹理图降低分辨率后的纹理图,相应地,各对应视点的深度图可以为基于相应视点的原始纹理图估计得到的原始深度图,也可以为原始深度图降低分辨率后的深度图。
在本说明书一些实施例中,所述纹理图区域包括多个与视点对应的纹理图子区域,所述深度图区域包括多个与视点对应的深度图子区域,所述深度图子区域小于所述纹理图子区域。如图11或图12所示的拼接图像的结构示意图,所述纹理图区域包括8个与视点对应的纹理图子区域,所述深度图区域包括8个与视点对应的深度图子区域,所述深度图子区域小于所述纹理图子区域。此外,拼接图像中纹理图区域和深度图区域可以划分为多个,如图12所示,深度图区域被划分为在所述纹理图之上的区域和所述纹理图之下的区域,即:纹理图区域是一个连续的区域,深度图区域包括两个连续的区域。
如图11和图12所示的拼接图像,其中,所述拼接图像中的深度图子区域可以为对与对应视点的纹理图中的像素点一一对应的原始深度图进行降采样后得到。
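对原始深度图进行降采样得到深度图子区域的一种示意做法如下(此处以2倍最近邻抽取为例,实际降采样倍数与插值方式均为假设):

```python
import numpy as np

def downsample_depth(original_depth: np.ndarray, factor: int = 2) -> np.ndarray:
    """对与对应视点纹理图像素一一对应的原始深度图降采样,
    得到尺寸小于纹理图子区域的深度图子区域(最近邻抽取,仅作示意)。"""
    return original_depth[::factor, ::factor]
```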
可以理解的是,图11或图12等并非对以非等分方式对拼接图像进行划分的限制,在具体实施中,拼接图像中纹理图和深度图的数量基于同步采集的视点的数量确定,拼接图像的像素量、长宽比可以是多样的,划分方式也可以是多样的。
可以理解的是,在具体实施中,深度图区域的像素数量也可以多于纹理图区域的像素数据占用的像素数量。
或者,参见图13和图14的拼接图像的结构示意图,纹理图区域可以包括两个连续的区域,深度图区域也可以包括两个连续的区域。纹理图区域与深度图区域可以间隔排布。
又或,参见图15,纹理图区域包括的纹理图子区域可以与深度图区域包括的深度图子区域间隔排布。纹理图区域包括的连续区域的数量,可以与纹理图子区域的数量相等,深度图区域包括的连续区域的数量,可以与深度图子区域的数量相等。
在具体实施中,对于每个纹理图的像素数据,可以按照像素点排布的顺序存储至所述纹理图子区域。对于每个纹理图对应的深度数据,也可以按照像素点排布的顺序存储至所述深度图子区域。
结合参考图16至图18,图16中以9个像素示意了纹理图1,图17中以9个像素示意了纹理图2,纹理图1和纹理图2是同步的不同视点(或者说是不同视角)的两个图像。根据纹理图1和纹理图2,可以得到对应纹理图1的深度数据,包括纹理图1深度值1至纹理图1深度值9,以及对应纹理图2的深度数据,包括纹理图2深度值1至纹理图2深度值9。
参见图18,在将纹理图1存储至图像子区域,可以按照像素点排布的顺序,将纹理图1存储至左上的图像子区域,也即,在图像子区域中,像素点的排布可以是与纹理图1相同的。将纹理图2存储至图像子区域,同样可以是以该方式存储至右上的图像子区域。
类似的,将纹理图1的深度数据存储至深度图子区域,可以是按照类似的方式,在深度值与纹理图的像素值一一对应的情况下,可以按照如图18中示出的方式存储。若深度值为对原始深度图进行降采样后得到的,则可以按照降采样后得到的深度图的像素点排布的顺序,存储至深度图子区域。
本领域技术人员可以理解,对图像进行压缩的压缩率,与图像中各个像素点的关联相关,关联性越强,压缩率越高。由于拍摄得到的图像是对应真实世界的,各个像素点的关联性较强,通过按照像素点排布的顺序,存储图像的像素数据以及深度数据,可以使得对拼接图像进行压缩时,压缩率更高,也即,可以使得在压缩前数据量相同的情况下在压缩后的数据量更小。
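下面给出按像素点排布顺序将多视点纹理图与对应深度图存入拼接图像的一个示意性草图(假设各视点纹理图与深度图分辨率一致、深度为8位单通道,且各子区域按单行并排排列,是对图9布局的简化假设):

```python
import numpy as np

def assemble_stitched_image(textures, depths):
    """将同步多视点纹理图(H×W×3)与对应深度图(H×W,8位单通道)按像素点排布顺序
    存入拼接图像:上半部并排存放各视点纹理图子区域,下半部并排存放深度图子区域。"""
    h, w, _ = textures[0].shape
    n = len(textures)
    canvas = np.zeros((2 * h, n * w, 3), dtype=np.uint8)
    for i, (tex, dep) in enumerate(zip(textures, depths)):
        canvas[:h, i * w:(i + 1) * w] = tex              # 纹理图子区域
        canvas[h:, i * w:(i + 1) * w] = dep[..., None]   # 深度图子区域(单通道值写入三通道)
    return canvas
```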
通过对拼接图像进行划分,划分为纹理图区域和深度图区域,在纹理图区域中多个纹理图子区域相邻,或者深度图区域中多个深度图子区域相邻的情况下,由于各个纹理图子区域中存储的数据是不同视点(不同角度)对待观看区域进行拍摄的图像(纹理图)或视频中帧图像得到的,深度图区域中存储的均为深度图,故在对拼接图像进行压缩时,也可以获得更高的压缩率。
在具体实施中,可以对所述纹理图子区域和所述深度图子区域中的全部或部分进行边缘保护。边缘保护的形式可以是多样的,例如,以图14中视点1深度图为例,可以在原视点1深度图的周边,设置冗余的像素;或者也可以在保持原视点1深度图的像素数量不变,周边留出不存放实际像素数据的冗余像素,将原始视点1深度图缩小后存储至其余像素中;或者也可以以其它方式,最终使得视点1深度图与其周围的其它图像之间留出冗余像素。
由于拼接图像中包括多个纹理图以及深度图,各个图像相邻的边界的关联性较差,通过进行边缘保护,使得在对拼接图像进行压缩时,可以降低拼接图像中的纹理图以及深度图的质量损失。
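边缘保护的一种示意做法是在子区域四周留出若干冗余像素,例如以边缘复制方式填充(填充宽度与填充方式均为假设):

```python
import numpy as np

def pad_subregion(subregion: np.ndarray, pad: int = 4) -> np.ndarray:
    """在纹理图或深度图子区域四周设置冗余像素(边缘保护),此处以边缘复制填充作示意。"""
    pad_width = ((pad, pad), (pad, pad)) + ((0, 0),) * (subregion.ndim - 2)
    return np.pad(subregion, pad_width, mode="edge")
```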
在具体实施中,纹理图子区域的像素字段可以存储三通道数据,所述深度图子区域的像素字段可以存储单通道数据。纹理图子区域的像素字段用于存储同步的多个纹理图中任一个纹理图的像素数据,像素数据通常为三通道数据,例如RGB数据或者YUV数据。
深度图子区域用于存储深度图的深度数据,若深度值为8位二进制数据,则可以采用像素字段的单通道进行存储,若深度值为16位二进制数据,则可以采用像素字段的双通道进行存储。或者,深度值也可以采用更大的像素区域进行存储。例如,若同步的多个纹理图均为1920*1080的图像,深度值为16位二进制数据,也可以将深度值存储至2倍的1920*1080深度图区域,每个纹理图区域均存储为单通道。拼接图像也可以结合该具体存储方式进行划分。
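若深度值为16位二进制数据,采用像素字段双通道进行存储的一种示意做法如下(高、低字节的拆分顺序为假设):

```python
import numpy as np

def pack_depth16(depth16: np.ndarray) -> np.ndarray:
    """将16位深度值拆分为高、低两个8位通道存储。"""
    high = (depth16 >> 8).astype(np.uint8)
    low = (depth16 & 0xFF).astype(np.uint8)
    return np.stack([high, low], axis=-1)

def unpack_depth16(two_channel: np.ndarray) -> np.ndarray:
    """由双通道数据还原16位深度值。"""
    return (two_channel[..., 0].astype(np.uint16) << 8) | two_channel[..., 1].astype(np.uint16)
```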
拼接图像的未经压缩的数据量,按照每个像素的每个通道占用8bit的方式进行存储,可以按照如下公式计算:同步的多个纹理图的数量*(纹理图的像素数据的数据量+深度图的像素的数据量)。
若原始纹理图为1080P的分辨率,也即1920*1080像素,逐行扫描的格式,原始深度图也可以占用1920*1080像素,为单通道。则原始纹理图的像素数据量为:1920*1080*8*3bit,原始深度图的数据量为1920*1080*8bit,若相机数量为30个,则拼接图像的像素数据量为30*(1920*1080*8*3+1920*1080*8)bit,约为237M,若不经压缩,则占用系统资源较多,延时较大。特别是带宽较小的情况下,例如带宽为1Mbps时,一个未经压缩的拼接图像需要约237s进行传输,实时性较差,用户体验有待提升。
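上述数据量可按如下算式核算(仅为对上文估算的算术示意):

```python
cameras = 30
texture_bits = 1920 * 1080 * 8 * 3   # 每视点纹理图像素数据量,三通道,每通道8bit
depth_bits = 1920 * 1080 * 8         # 每视点单通道深度图数据量
total_bits = cameras * (texture_bits + depth_bits)
print(total_bits / 8 / 1024 / 1024)  # 约237(MB),与上文估算一致
```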
通过规律性的存储以获得更高的压缩率,对原始纹理图降低分辨率,或者以降低分辨率后的像素数据作为纹理图的像素数据,或者对原始深度图中的一个或多个进行降采样等方式中的一种或者多种,可以减少拼接图像的数据量。
例如,若原始纹理图的分辨率为4K的分辨率,即4096*2160的像素分辨率,降采样为540P分辨率,也即960*540的像素分辨率,则拼接图像的像素个数约为降采样前的十六分之一。结合上述其它减少数据量方式中的任一种或多种,可以使得数据量更少。
可以理解的是,若带宽支持,且进行数据处理的设备的解码能力可以支持更高分辨率的拼接图像,则也可以生成分辨率更高的拼接图像,以提升画质。
本领域技术人员可以理解的是,在不同的应用场景中,同步的多个纹理图的像素数据以及深度数据,也可以以其它的方式存储,例如,以像素点为单位存储至拼接图像。参见图16、图17和图19,对于图16和图17所示的纹理图1和纹理图2,可以以图19的方式存储至拼接图像。
综上,多个视点的纹理图的像素数据以及对应视点深度图的深度数据可以存储至拼接图像,拼接图像可以以多种方式划分为纹理图区域以及深度图区域,或者也可以不进行划分,以预设的顺序存储纹理图的像素数据以及深度数据。
在具体实施中,同步的多个纹理图也可以是解码多个视频得到的同步的多个帧图像。视频可以是通过多个摄像机获取的,其设置可以与前文中获取纹理图的相机相同或类似。
前述实施例中存储为拼接图像的方式,例如图9及图11至图19示意出的存储方式,均可以有相应的关联关系字段描述,以使得进行数据处理的设备可以根据关联关系字段获取相关联的纹理图以及深度图数据。
在具体实施中,拼接图像的图片格式可以是BMP、PNG、JPEG、Webp等图像格式中的任一种,或者也可以是其它图像格式。自由视点视频数据(多角度自由视角图像数据)中像素数据和深度数据的存储方式并不仅限制于拼接图像的方式。可以以其他方式进行存储,也可以有相应的关联关系字段描述。
参照图20所示的一具体应用场景的视频压缩方法的示意图,对于获取到的任一自由视点视频帧中的拼接图像I,采用前述步骤S102所述方法,可以识别所述拼接图像I中的纹理图区域Zt和深度图区域Zd。进而,对于所述纹理图区域Zt,采用预设的第一编码方式进行编码,可以得到压缩纹理图区域Zt',对于所述深度图区域Zd,采用预设的第二编码方式进行编码,可以得到压缩深度图区域Zd',其中,所述第二编码方式为基于感兴趣区域的编码方式;基于所述压缩纹理图区域Zt'和所述压缩深度图区域Zd',可以得到压缩视频帧I'。
经上述视频压缩方法进行压缩后得到的压缩视频帧传输至终端侧(如显示设备)后,可以采用相应的解压装置进行解压缩处理,并基于解压后得到的拼接图像进行自由视点视频图像的重建。
参照图21所示的解压缩方法的流程图,对于获取到的采用前述实施例方式进行压缩处理所得到的压缩视频帧,具体可以采用如下方式相应地进行解压缩处理:
S211,获取压缩视频帧和所述压缩视频帧中像素块的量化参数,所述压缩视频帧中包括采用第一编码方式进行编码的纹理图区域和采用第二编码方式编码的深度图区域,所述第二编码方式为基于感兴趣区域的编码方式。
其中,所述纹理图区域包括同步的多个视点的纹理图,所述深度图区域包括各纹理图对应视点的深度图。
S212,基于所述压缩视频帧中的像素块的量化参数,对所述压缩视频帧按照像素块进行解压缩处理,得到自由视点视频帧的拼接图像,所述拼接图像包括所述同步的多个视点的纹理图区域和对应视点的深度图区域。
由于所述压缩视频帧中的纹理图区域和深度图区域采用了不同的编码方式,因而相应的像素块中的量化参数会有差异,故可以基于所述压缩视频帧中像素块的量化参数,对所述压缩视频帧按照像素块进行解压缩处理,从而可以得到自由视点视频帧的拼接图像。
采用上述方式对压缩视频帧进行解压缩处理,由于所述压缩视频帧中包括采用第一编码方式进行编码的纹理图区域和采用第二编码方式编码的深度图区域,所述第二编码方式为基于感兴趣区域的编码方式进行编码,能够按照深度图区域的图像特征进行压缩,因此解压缩后得到的深度图区域具有较高的图像质量,故可以提高基于深度图重建得到的自由视点视频的图像质量。
采用上述实施例方法得到自由视点视频帧后,可以基于获取到的自由视点以及所述自由视点视频帧对应的参数数据,基于所述自由视点视频帧中相应视点的纹理图和深度图实现所述虚拟视点的图像重建。
在具体实施中,可以基于自由视点,选择所述自由视点视频帧中部分视点的纹理图和对应的深度图,并采用所选择的部分视点的纹理图和对应的深度图进行组合渲染,得到所述虚拟视点的重建图像。
在本说明书一些实施例中,可以基于虚拟视点与所述自由视点视频帧中各视点的空间位置关系确定,具体可以基于所述自由视点视频帧对应的参数数据确定。
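基于虚拟视点与各视点空间位置关系选取部分视点的一种示意做法如下(以欧氏距离最近的k个视点为例,选取准则与视点数均为假设):

```python
import numpy as np

def select_reference_views(virtual_position, camera_positions, k: int = 2):
    """根据虚拟视点位置与各采集视点位置的距离,选取最近的 k 个视点,
    其纹理图与对应深度图用于组合渲染得到虚拟视点的重建图像。"""
    cams = np.asarray(camera_positions, dtype=float)
    dist = np.linalg.norm(cams - np.asarray(virtual_position, dtype=float), axis=1)
    return np.argsort(dist)[:k].tolist()
```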
在具体实施中,为丰富用户视觉体验,可以在重建得到的自由视点图像中植入增强现实(Augmented Reality,AR)特效。为实现增强现实特效的植入,需要确定纹理图区域中的特效渲染像素区域,以及待植入的虚拟信息图像。
其中,对于特效渲染像素区域,可以基于用户交互生成,也可以基于某些预设触发条件或第三方指令确定。
对于待植入的虚拟信息图像,可以直接提供,也可以基于交互实时生成。在确定特效渲染像素区域后,可以获取对应的增强现实特效输入数据,进而生成匹配的虚拟信息图像。
在具体实施中,可以通过三维标定得到特效渲染像素区域,进而获取与所述特效渲染像素区域匹配的虚拟信息图像。为标定所述特效渲染像素区域,可以基于与所述特效渲染像素区域对应的虚拟渲染目标对象在重建得到的图像中的位置进行确定。
通过标定得到与所述特效渲染像素区域中的虚拟渲染对象位置匹配的虚拟信息图像,从而可以使得到的虚拟信息图像与所述虚拟渲染目标对象在三维空间中的位置更加匹配,进而所展示的虚拟信息图像更加符合三维空间中的真实状态,因而所展示的合成图像更加真实生动,增强用户的视觉体验。
在具体实施中,可以基于虚拟渲染目标对象的增强现实特效输入数据,按照预设的特效生成方式,生成所述目标对象对应的虚拟信息图像。具体可以采用多种特效生成方式。
例如,可以将所述目标对象的增强现实特效输入数据输入至预设的三维模型,基于三维标定得到的所述虚拟渲染目标对象在所述图像中的位置,输出与所述虚拟渲染目标对象匹配的虚拟信息图像;
又如,可以将所述虚拟渲染目标对象的增强现实特效输入数据,输入至预设的机器学习模型,基于三维标定得到的所述虚拟渲染目标对象在所述图像中的位置,输出与所述虚拟渲染目标对象匹配的虚拟信息图像。
在本说明书实施例中,在视频压缩过程中,为避免植入特效的区域的纹理图区域的压缩损失,可以对纹理图区域进行基于ROI的编码压缩。在本说明书一实施例中,对于所述纹理图区域中的特效渲染像素区域,采用适于增强现实特效的编码方式进行编码。
本说明书实施例还提供了相应的视频压缩装置的实施例,如图22所示的视频压缩装置的结构示意图,其中,视频压缩装置220可以包括:视频帧获取单元221、识别单元222、第一编码单元223和第二编码单元224,其中:
所述视频帧获取单元221,适于获取自由视点视频帧,所述视频帧包括多个视点的纹理图和对应视点的深度图形成的拼接图像;
所述识别单元222,适于识别所述拼接图像中的纹理图区域和深度图区域;
所述第一编码单元223,对于所述纹理图区域,采用预设的第一编码方式进行编码;
所述第二编码单元224,对于所述深度图区域,采用预设的第二编码方式进行编码,所述第二编码方式为基于感兴趣区域的编码方式。
采用上述视频压缩装置,对于视频帧的拼接图像中的纹理图区域和深度图区域分别采用不同的编码方式进行压缩,其中,对于深度图区域采用基于感兴趣区域的编码方式进行压缩,能够按照深度图区域的图像特征进行压缩,从而可以减少深度图的压缩损失,进而可以提高基于深度图重建得到的自由视点视频的图像质量。
在具体实施中,视频压缩装置可以以一个硬件编码器、编码芯片或者多个模块所组成的编码模组实现,所述硬件编码器、编码芯片或编码模组中可以设置专门的硬件逻辑器件或/和软件算法,以对获取到的自由视点视频帧进行压缩处理,得到压缩视频帧。
本说明书实施例还提供了与视频压缩装置对应的视频解压装置,参照图23所示的视频解压装置的结构示意图,在本说明书一些实施例中,如图23所示,视频解压装置230可以包括获取单元231、解压缩单元232,其中:
所述获取单元231,适于获取压缩视频帧和所述压缩视频帧中像素块的量化参数,所述压缩视频帧中包括采用第一编码方式进行编码的纹理图区域和采用第二编码方式编码的深度图区域,其中,所述纹理图区域包括同步的多个视点的纹理图,所述深度图区域包括各纹理图对应视点的深度图,所述第二编码方式为基于感兴趣区域的编码方式;
所述解压缩单元232,适于基于所述压缩视频帧中的像素块的量化参数,对所述压缩视频帧按照像素块进行解压缩处理,得到自由视点视频帧的拼接图像,所述拼接图像包括所述同步的多个视点的纹理图区域和对应视点的深度图区域。
在具体实施中,视频解压装置可以以一个硬件解码器、解码芯片或者多个模块所组成的解码模组实现,所述硬件解码器、解码芯片或解码模组中可以设置专门的硬件逻辑器件或/和软件算法,以对获取到的自由视点压缩视频帧进行解压缩处理,得到自由视点视频帧。
在本说明书实施例中,在视频压缩或者解压缩过程中,除了可以基于系统的识别结果自动选择编码方式或者解码方式外,具体的编码方式及解码方式还可以基于用户交互选取。具体地,响应于编码选择交互操作,可以生成编码选择指令,进而可以触发相应的编码单元(如具体的编码器、编码模块、编码芯片、编码软件、编码系统等其中一种或多种)进行编码;类似地,响应于解码选择交互操作,可以生成解码选择指令,进而可以触发相应的解码单元(如具体的解码器、解码模块、解码芯片、解码软件、解码系统等其中一种或多种)进行解码。
本说明书实施例还提供了一种电子设备,参照图24所示的电子设备的结构示意图,其中,电子设备240可以包括存储器241和处理器242,所述存储器241上存储有可在所述处理器242上运行的计算机指令,其中,所述处理器运行所述计算机指令时可以执行前述任一实施例所述方法的步骤。
基于所述电子设备在整个视频处理系统所处位置,所述电子设备还可以包括其他的电子部件或组件。
例如,继续参照图24,电子设备240还可以包括通信组件243,所述通信组件可以与采集系统或云端服务器通信,获得用于生成自由视点视频帧的同步的多个视点的纹理图,或者获取采用本说明书前述实施例方法进行压缩处理后得到的自由视点压缩视频帧,进而可以由处理器242对基于通信组件243获取到的压缩视频帧进行解压缩处理,以及根据虚拟视点位置,进行自由视点视频重建。
又如,在某些电子设备中,继续参照图24,电子设备240还可以包括显示组件244(如显示器、触摸屏、投影仪),以对重建得到的虚拟视点的图像进行显示。
作为服务端设备,所述电子设备可以设置为云端服务器或服务器集群,或者作为本地服务器对自由视点视频帧在传输前进行压缩处理。作为终端设备,所述电子设备具体可以是手机等手持电子设备、笔记本电脑、台式电脑、机顶盒等具有视频处理及播放功能的电子设备,采用所述终端设备,可以对接收到的压缩视频帧进行解压缩处理,并基于解压缩后的视频帧,以及获取到的虚拟视点,重建得到所述虚拟视点的图像。其中,所述虚拟视点可以基于用户交互行为确定,或者基于预先设置的虚拟视点位置信息确定。
在本说明书一些实施例中,存储器、处理器、通信组件和显示组件之间可以通过总线网络进行通信。
在具体实施中,如图24所示,通信组件243和显示组件244等可以为设置在所述电子设备240内部的组件,也可以为通过扩展接口、扩展坞、扩展线等扩展组件连接的外接设备。
在具体实施中,所述处理器242可以通过中央处理器(Central Processing Unit,CPU)(例如单核处理器、多核处理器)、CPU组、图形处理器(Graphics Processing Unit,GPU)、AI芯片、FPGA芯片等其中任意一种或多种协同实施。
在具体实施中,对于大量的自由视点视频帧,为减小处理时延,也可以采用多个电子设备组成的电子设备集群协同实施。
为使本领域技术人员更好地理解和实施,以下以一个具体的应用场景进行说明。参照图25所示的视频处理系统的结构示意图,如图25所示,为一种应用场景中视频处理系统的结构示意图,其中,示出了一场篮球赛的数据处理系统的布置场景,所述视频处理系统A0包括由多个采集设备组成的采集阵列A1、数据处理设备A2、云端的服务器集群A3、播放控制设备A4,播放终端A5和交互终端A6。
参照图25,以左侧的篮球框作为核心看点,以核心看点为圆心,与核心看点位于同一平面的扇形区域作为预设的多角度自由视角范围。所述采集阵列A1中各采集设备可以根据所述预设的多角度自由视角范围,成扇形置于现场采集区域不同位置,可以分别从相应角度实时同步采集视频数据流。
在具体实施中,采集设备还可以设置在篮球场馆的顶棚区域、篮球架上等。各采集设备可以沿直线、扇形、弧线、圆形或者不规则形状排列分布。具体排列方式可以根据具体的现场环境、采集设备数量、采集设备的特点、成像效果需求等一种或多种因素进行设置。所述采集设备可以是任何具有摄像功能的设备,例如,普通的摄像机、手机、专业摄像机等。
而为了不影响采集设备工作,所述数据处理设备A2可以置于现场非采集区域,可视为现场服务器。所述数据处理设备A2可以通过无线局域网向所述采集阵列A1中各采集设备分别发送拉流指令,所述采集阵列A1中各采集设备基于所述数据处理设备A2发送的拉流指令,将获得的视频数据流实时传输至所述数据处理设备A2。其中,所述采集阵列A1中各采集设备可以通过交换机A7将获得的视频数据流实时传输至所述数据处理设备A2。采集阵列A1和交换机A7一起形成采集系统。
当所述数据处理设备A2接收到视频帧截取指令时,从接收到的多路视频数据流中对指定帧时刻的视频帧截取得到多个同步视频帧的帧图像,并将获得的所述指定帧时刻的多个同步视频帧上传至云端的服务器集群A3。
相应地,云端的服务器集群A3将接收的多个同步视频帧的原始纹理图作为图像组合,确定所述图像组合相应的参数数据及所述图像组合中各原始纹理图对应的估计深度图,并基于所述图像组合相应的参数数据、所述图像组合中纹理图的像素数据和对应深度图的深度数据,基于获取到的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频数据。
其中,作为深度图后处理环节,可以采用本说明书前述实施例介绍的深度图校正方法对相应视点的估计深度图进行深度图校正。
服务器可以置于云端,并且为了能够更快速地并行处理数据,可以按照处理数据的不同,由多个不同的服务器或服务器组组成云端的服务器集群A3。
例如,所述云端的服务器集群A3可以包括:第一云端服务器A31,第二云端服务器A32,第三云端服务器A33,第四云端服务器A34。其中,第一云端服务器A31可以用于确定所述图像组合相应的参数数据;第二云端服务器A32可以用于确定所述图像组合中各视点的原始纹理图的估计深度图以及进行深度图校正处理;第三云端服务器A33可以根据虚拟视点的位置信息,基于所述图像组合相应的参数数据、所述图像组合的纹理图和深度图,使用基于深度图的虚拟视点重建(Depth Image Based Rendering,DIBR)算法,进行帧图像重建,得到虚拟视点的图像;所述第四云端服务器A34可以用于生成自由视点视频(多角度自由视角视频)。
可以理解的是,所述第一云端服务器A31、第二云端服务器A32、第三云端服务器A33、第四云端服务器A34也可以为服务器阵列或服务器子集群组成的服务器组,本发明实施例不做限制。
然后,播放控制设备A4可以将接收到的自由视点视频帧插入待播放视频流中,播放终端A5接收来自所述播放控制设备A4的待播放视频流并进行实时播放。其中,播放控制设备A4可以为人工播放控制设备,也可以为虚拟播放控制设备。在具体实施中,可以设置专门的可以自动切换视频流的服务器作为虚拟播放控制设备进行数据源的控制。导播控制设备如导播台可以作为本发明实施例中的一种播放控制设备。
可以理解的是,所述数据处理设备A2可以根据具体情景置于现场非采集区域或云端,所述服务器(集群)和播放控制设备可以根据具体情景置于现场非采集区域,云端或者终端接入侧,本实施例并不用于限制本发明的具体实现和保护范围。
本说明书实施例还提供了一种计算机可读存储介质,其上存储有计算机指令,其中, 所述计算机指令运行时执行前述任一实施例所述方法的步骤,具体可以参见前述实施例介绍,此处不再赘述。
在具体实施中,所述计算机可读存储介质可以是光盘、机械硬盘、固态硬盘等各种适当的可读存储介质。
虽然本说明书实施例披露如上,但本发明并非限定于此。任何本领域技术人员,在不脱离本说明书实施例的精神和范围内,均可作各种更动与修改,因此本发明的保护范围应当以权利要求所限定的范围为准。

Claims (14)

  1. 一种视频压缩方法,其中,包括:
    获取自由视点视频帧,所述视频帧包括同步的多个视点的纹理图和对应视点的深度图形成的拼接图像;
    识别所述拼接图像中的纹理图区域和深度图区域;
    对于所述纹理图区域,采用预设的第一编码方式进行编码,对于所述深度图区域,采用预设的第二编码方式进行编码,得到压缩视频帧;其中,所述第二编码方式为基于感兴趣区域的编码方式。
  2. 根据权利要求1所述的方法,其中,所述对于所述深度图区域,采用预设的第二编码方式进行编码,包括:
    对于所述深度图区域中的ROI像素区域,基于预设的恒定量化参数进行编码。
  3. 根据权利要求2所述的方法,其中,所述对于所述深度图区域,采用预设的第二编码方式进行编码,包括:
    对于所述深度图区域中包含的各视点对应的深度图子区域中的前景边缘像素区域采用第一恒定量化参数进行编码,对于各视点对应的深度图子区域中非前景边缘像素区域采用第二恒定量化参数进行编码,所述第一恒定量化参数的参数值小于所述第二恒定量化参数的参数值。
  4. 根据权利要求1所述的方法,其中,所述对于所述深度图区域,采用预设的第二编码方式进行编码,包括:
    对于所述深度图区域中的ROI像素区域,采用预设的恒定质量因子进行编码。
  5. 根据权利要求1所述的方法,其中,所述纹理图区域包括特效渲染像素区域;所述对于所述纹理图区域,采用预设的第一编码方式进行编码,包括:
    对于所述纹理图区域中的特效渲染像素区域,采用适于增强现实特效的编码方式进行编码。
  6. 根据权利要求1至5任一项所述的方法,其中,还包括:
    对于所述拼接图像,采用恒定码率。
  7. 根据权利要求1至5任一项所述的方法,其中,所述拼接图像中深度图子区域的像素点为与对应视点的纹理图子区域中的像素点一一对应的像素点集合中的全部或部分像素点。
  8. 根据权利要求7所述的方法,其中,所述纹理图区域包括多个与视点对应的纹理图子区域,所述深度图区域包括多个与视点对应的深度图子区域,所述深度图子区域小于所述纹理图子区域。
  9. 根据权利要求8所述的方法,其中,所述拼接图像中的深度图子区域为对与对应视点的纹理图中的像素点一一对应的原始深度图进行降采样后得到。
  10. 一种视频解压方法,其中,包括:
    获取压缩视频帧和所述压缩视频帧中像素块的量化参数,所述压缩视频帧中包括采用第一编码方式进行编码的纹理图区域和采用第二编码方式编码的深度图区域,其中,所述纹理图区域包括同步的多个视点的纹理图,所述深度图区域包括各纹理图对应视点的深度图,所述第二编码方式为基于感兴趣区域的编码方式;
    基于所述压缩视频帧中的像素块的量化参数,对所述压缩视频帧按照像素块进行解压缩处理,得到自由视点视频帧的拼接图像,所述拼接图像包括所述同步的多个视点的纹理图区域和对应视点的深度图区域。
  11. 一种视频压缩装置,其中,包括:
    视频帧获取单元,适于获取自由视点视频帧,所述视频帧包括多个视点的纹理图和对应视点的深度图形成的拼接图像;
    识别单元,适于识别所述拼接图像中的纹理图区域和深度图区域;
    第一编码单元,对于所述纹理图区域,采用预设的第一编码方式进行编码;
    第二编码单元,对于所述深度图区域,采用预设的第二编码方式进行编码,所述第二编码方式为基于感兴趣区域的编码方式。
  12. 一种视频解压装置,其中,包括:
    获取单元,适于获取压缩视频帧和所述压缩视频帧中像素块的量化参数,所述压缩视频帧中包括采用第一编码方式进行编码的纹理图区域和采用第二编码方式编码的深度图区域,其中,所述纹理图区域包括同步的多个视点的纹理图,所述深度图区域包括各纹理图对应视点的深度图,所述第二编码方式为基于感兴趣区域的编码方式;
    解压缩单元,适于基于所述压缩视频帧中的像素块的量化参数,对所述压缩视频帧按照像素块进行解压缩处理,得到自由视点视频帧的拼接图像,所述拼接图像包括所述同步的多个视点的纹理图区域和对应视点的深度图区域。
  13. 一种电子设备,包括存储器和处理器,所述存储器上存储有可在所述处理器上运行的计算机指令,其中,所述处理器运行所述计算机指令时执行权利要求1至9任一项或权利要求10所述方法的步骤。
  14. 一种计算机可读存储介质,其上存储有计算机指令,其中,所述计算机指令运行时执行权利要求1至9任一项或权利要求10所述方法的步骤。
PCT/CN2021/107507 2020-07-28 2021-07-21 视频压缩方法、解压方法、装置、电子设备及存储介质 WO2022022348A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010740743.8 2020-07-28
CN202010740743.8A CN114007059A (zh) 2020-07-28 2020-07-28 视频压缩方法、解压方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022022348A1 true WO2022022348A1 (zh) 2022-02-03

Family

ID=79920620

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/107507 WO2022022348A1 (zh) 2020-07-28 2021-07-21 视频压缩方法、解压方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN114007059A (zh)
WO (1) WO2022022348A1 (zh)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11770505B1 (en) * 2022-02-04 2023-09-26 Lytx, Inc. Adaptive storage reduction of image and sensor data and intelligent video stream restoration
CN114697633B (zh) * 2022-03-29 2023-09-19 联想(北京)有限公司 视频传输方法、装置、设备及存储介质


Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7292257B2 (en) * 2004-06-28 2007-11-06 Microsoft Corporation Interactive viewpoint video system and process
CN101330631A (zh) * 2008-07-18 2008-12-24 浙江大学 一种立体电视系统中深度图像的编码方法
GB0907870D0 (en) * 2009-05-07 2009-06-24 Univ Catholique Louvain Systems and methods for the autonomous production of videos from multi-sensored data
JP5858380B2 (ja) * 2010-12-03 2016-02-10 国立大学法人名古屋大学 仮想視点画像合成方法及び仮想視点画像合成システム
CN102413353B (zh) * 2011-12-28 2014-02-19 清华大学 立体视频编码过程的多视点视频和深度图的码率分配方法
CN102801997B (zh) * 2012-07-11 2014-06-11 天津大学 基于感兴趣深度的立体图像压缩方法
CN103179405B (zh) * 2013-03-26 2016-02-24 天津大学 一种基于多级感兴趣区域的多视点视频编码方法
CN103763564B (zh) * 2014-01-09 2017-01-04 太原科技大学 基于边缘无损压缩的深度图编码方法
CN104159095B (zh) * 2014-02-19 2016-12-07 上海大学 一种多视点纹理视频和深度图编码的码率控制方法
EP3193504A4 (en) * 2014-10-07 2017-11-01 Samsung Electronics Co., Ltd. Multi-view image encoding/decoding method and apparatus
GB2534136A (en) * 2015-01-12 2016-07-20 Nokia Technologies Oy An apparatus, a method and a computer program for video coding and decoding
CN105049866B (zh) * 2015-07-10 2018-02-27 郑州轻工业学院 基于绘制失真模型的多视点加深度编码的码率分配方法
CN107770511A (zh) * 2016-08-15 2018-03-06 中国移动通信集团山东有限公司 一种多视点视频的编解码方法、装置和相关设备
CN109803135A (zh) * 2017-11-16 2019-05-24 科通环宇(北京)科技有限公司 一种基于sdi系统的视频图像传输方法及数据帧结构
KR20190099566A (ko) * 2018-02-19 2019-08-28 부산대학교 산학협력단 카메라 시점 변화에 강인한 물체 인식 및 물체 영역 추출 방법
CN110458932B (zh) * 2018-05-07 2023-08-22 阿里巴巴集团控股有限公司 图像处理方法、装置、系统、存储介质和图像扫描设备
CN109741383A (zh) * 2018-12-26 2019-05-10 西安电子科技大学 基于空洞卷积和半监督学习的图像深度估计系统与方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101374242A (zh) * 2008-07-29 2009-02-25 宁波大学 一种应用于3dtv与ftv系统的深度图编码压缩方法
CN103077542A (zh) * 2013-02-28 2013-05-01 哈尔滨工业大学 一种深度图的感兴趣区域压缩方法
CN109803134A (zh) * 2017-11-16 2019-05-24 科通环宇(北京)科技有限公司 一种基于hdmi系统的视频图像传输方法及数据帧结构
CN108513131A (zh) * 2018-03-28 2018-09-07 浙江工业大学 一种自由视点视频深度图感兴趣区域编码方法
US20190373241A1 (en) * 2018-05-31 2019-12-05 Intel Corporation Bit depth coding mechanism
CN110012310A (zh) * 2019-03-28 2019-07-12 北京大学深圳研究生院 一种基于自由视点的编解码方法及装置

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DÀN JIÀNG: "Hip-hop 3 and CBA are all using the free perspective. How did Ali Entertainment create it?", HEART OF THE MACHINE REPORT, BAIDU, CN, 1 January 2020 (2020-01-01), CN, pages 1 - 6, XP055893618, Retrieved from the Internet <URL:https://baijiahao.baidu.com/s?id=1677974686517807925&wfr=spider&for=pc> [retrieved on 20220221] *
RONGGANG, LIVEVIDEOSTACK: "China's AVS UHD coding standard system and ecological construction (with some videos)", BLOG CSDN, CSDN, 1 January 2020 (2020-01-01), pages 1 - 18, XP055893624, Retrieved from the Internet <URL:https://blog.csdn.net/vn9PLgZvnPs1522s82g/article/details/104787624> [retrieved on 20220221] *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116506629A (zh) * 2023-06-27 2023-07-28 上海伯镭智能科技有限公司 用于矿山无人驾驶矿车协同控制的路况数据压缩方法
CN116506629B (zh) * 2023-06-27 2023-08-25 上海伯镭智能科技有限公司 用于矿山无人驾驶矿车协同控制的路况数据压缩方法

Also Published As

Publication number Publication date
CN114007059A (zh) 2022-02-01


Legal Events

  • 121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21849028; Country of ref document: EP; Kind code of ref document: A1)
  • NENP Non-entry into the national phase (Ref country code: DE)
  • 122 Ep: pct application non-entry in european phase (Ref document number: 21849028; Country of ref document: EP; Kind code of ref document: A1)
Kind code of ref document: A1