CN116982085A - Patch-based remodelling and metadata for volumetric video - Google Patents

Patch-based remodelling and metadata for volumetric video

Publication number: CN116982085A
Application number: CN202280021309.1A
Authority: CN (China)
Legal status: Pending
Original and current assignee: Dolby Laboratories Licensing Corp
Inventors: 苏冠铭, 尹鹏
Other languages: Chinese (zh)
Prior art keywords: patch, patches, remodelling, video, data
Priority claimed from: PCT/US2022/029381 (WO2022245695A1)

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

An input 3D point cloud is received that contains a spatial distribution of points. A patch including pre-remodelled patch data is generated from the input 3D point cloud. Encoder-side remodelling is performed on the pre-remodelled patch data to generate remodelled patch data for the patch. The remodelled patch data is encoded into a 3D video signal. A recipient device of the 3D video signal may decode the 3D video signal to generate a reconstructed 3D point cloud approximating the input 3D point cloud.

Description

Patch-based remodelling and metadata for volumetric video
Cross reference to related applications
The present application claims priority to European patent application 21175192.0, filed on 21 May 2021, and U.S. provisional application 63/191,480, filed on 21 May 2021, each of which is incorporated herein by reference in its entirety.
Technical Field
This disclosure relates generally to video coding, and more particularly, to patch-based remodelling and metadata for 3D video coding, including but not limited to video-based point cloud compression (V-PCC) and visual volumetric video-based coding (V3C).
Background
Video technology is being developed to support the transmission and rendering of three-dimensional (3D) video content based on available bandwidth supported by contemporary computing and network infrastructure. For example, MPEG video encoders and decoders may be extended or reused to support encoding and decoding MPEG-based 3D video content for rendering with a wide variety of computing devices that incorporate MPEG codecs. Other video encoders and decoders may also be implemented or developed to support encoding and decoding non-MPEG based 3D video content for rendering with computing devices that incorporate non-MPEG codecs.
Consumer devices, such as hand-held devices or wearable devices, may be installed or configured with a limited set of video codecs. Thus, if 3D video content is not encoded and delivered in an expected video format, such a device will likely not be able to find a suitable video decoder to decode the 3D video content. Even when decoding and rendering are possible, the decoded 3D video content may include incorrect or inaccurate interpretations or representations of the original 3D video content and may exhibit visible artifacts in shape, color, and brightness values in the rendered images.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section.
Drawings
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements, and in which:
FIGS. 1A-1D illustrate example codec architectures;
FIGS. 2A and 2B illustrate example projections of an input 3D point cloud;
FIG. 2C illustrates an example process flow for generating multiple patches; FIG. 2D illustrates an example layout for assembling multiple patches; FIGS. 2E and 2F illustrate example spatial transform and scaling operations;
FIG. 3A illustrates example patch level remodelling; FIG. 3B illustrates example patch level reverse remodeling; FIG. 3C illustrates an example atlas frame having multiple partitions to store atlas information; FIG. 3D illustrates an example applicable coding syntax specification;
FIGS. 4A-4C illustrate example process flows; and
FIG. 5 illustrates an example hardware platform on which a computer or computing device described herein may be implemented.
Detailed Description
Example embodiments are described herein that relate to encoding, decoding, and representing 3D video content, including, but not limited to, patch-based remodelling and metadata for video-based point cloud compression (V-PCC) and visual volumetric video-based coding (V3C). In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are not described in exhaustive detail, in order to avoid unnecessarily occluding, obscuring, or obfuscating the present invention.
Example embodiments are described herein according to the following outline:
1. General overview
2. Patch-based video encoder
3. Patch-based video decoder
4. Patch-based remodelling
5. Single-patch remodelling operations
6. Multi-patch remodelling optimization
7. Scene-based cases
8. Patch-based remodelling syntax
9. Video metadata
10. Example process flows
11. Implementation mechanisms - hardware overview
12. Equivalents, extensions, alternatives and miscellaneous items
1. General overview
This summary presents a basic description of some aspects of example embodiments of the invention. It should be noted that this summary is not an extensive or exhaustive overview of the various aspects of the example embodiments. Moreover, it should be noted that this summary is not intended to identify any particularly significant aspects or elements of the example embodiments, nor to delineate any scope of the example embodiments in particular or of the invention in general. This summary merely presents some concepts related to the example embodiments in a condensed and simplified format, and should be understood as a conceptual prelude to the more detailed description of example embodiments that follows.
Augmented reality (AR), virtual reality (VR), mixed reality (MR), and other immersive volumetric video applications have demonstrated many types of new, unique, and compelling viewing experiences. These new viewing experiences may be delivered or implemented by a variety of different types of technologies. Under some approaches, (3D) point clouds may be used to represent 3D (visual) scenes, but the amount of data used to represent the point clouds may be significant, especially for point cloud video applications. Each of the plurality of points in a point cloud has its own 3D position or geometry data and is associated with a number of point-specific attributes (e.g., color, reflectivity, surface normals, etc.).
The techniques described herein support performing patch-based remodelling operations on patch data (e.g., original, pre-remodeled, etc.) in patches generated from input 3D video content or point clouds to form remodeled patch data. The operating parameters for the patch-based remodelling operation may be generated as part of the patch-based remodelling operation. A 3D video signal carrying remodeled patch data with operating parameters including, but not limited to, video metadata for a patch-based remodelling operation may be transmitted to downstream receiver devices in a particular signal format. The patch-based remodelling operations may be communicated to the recipient device by means of video metadata. The operating parameters for the patch-based remodelling operation enable the recipient device to reconstruct an output 3D point cloud that is the same as or closely approximates the input 3D video content or point cloud. The display image may be generated from the reconstructed 3D video content or point cloud and rendered on an image display operating with the recipient device.
The 3D video signal described herein, from an upstream device performing patch-based remodelling to a downstream recipient device performing reconstruction of 3D video content or point clouds, may comprise a plurality of component 2D video signals, which may also be referred to as 2D timing signals. These component 2D video signals may be time-based signals in the time domain and are time-indexed, time-synchronized, or otherwise time-correlated with one another in the 3D video signal. Each of the component 2D video signals in the same 3D video signal may or may not be (e.g., intended as, etc.) a displayable video signal by itself. However, the component 2D video signals collectively may be used to transfer the displayable 3D video content from the upstream device to the downstream recipient device.
Patch-based video data derived from a 3D point cloud may be efficiently compressed and decompressed with relatively high performance and fast response under the techniques described herein. In some operating cases, some or all of these techniques may be implemented with existing video codecs related to one or more of the following: Moving Picture Experts Group (MPEG), Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), AOMedia Video 1 (AV1), Essential Video Coding (EVC), Point Cloud Compression (PCC), video-based point cloud compression (V-PCC), visual volumetric video-based coding (V3C), and the like. Patch-based video data compression or decompression associated with PCC or V-PCC may be implemented by these codecs as part of image preprocessing operations, video/image processing operations, and/or video/image post-processing operations. Available video codecs (e.g., MPEG, AVC, HEVC, VVC, AV1, EVC, etc.) can be extended or reused for compressing or decompressing patch-based video data in conjunction with a 3D point cloud.
Although V-PCC coding operations are used in some of the discussions herein for simplicity, it should be noted that in various operation cases, some or all of the techniques described herein may similarly be implemented, performed, or used in other types of 3D video coding including, but not limited to, V3C coding operations to achieve the same or similar goals or benefits.
The 3D or volumetric video coding operations described herein may be implemented as a video/image processing tool, a video/image pre-processing tool, a video/image post-processing tool, or a combination of the foregoing. In some operational cases, any of these tools may be implemented with available (e.g., existing, enhanced, etc.) video codecs that may or may not have built-in in-loop remodelling modules or processing blocks. As used herein, in-loop processing/operations may refer to normative video processing defined in applicable (e.g., industry standard, proprietary, industry standard with proprietary extensions, etc.) video coding specifications as part of video encoding and/or decoding operations. Out-of-loop processing/operations may refer to video/image processing operations (e.g., preprocessing operations, post-processing operations, etc.) that are not specified in the video coding specification as part of video encoding and/or decoding operations.
The 3D or volumetric video coding operations may include patch-based remodelling to increase coding efficiency and reduce banding artifacts (or false contours) in reconstructed 3D point clouds, including, but not limited to, high dynamic range (HDR) point clouds. Patch-based remodelling may be performed as an in-loop operation included in a 3D or volumetric video coding operation, such as an (e.g., enhanced, etc.) V-PCC or V3C coding operation.
Additionally, optionally or alternatively, patch-based remodelling may be performed as an out-of-loop operation, such as a pre-processing or post-processing operation performed in combination with (e.g., normative, standards-based, etc.) 3D or volumetric video coding operations such as V-PCC or V3C coding operations.
Patches may be derived from a 3D point cloud with or without 2D projection. A patch described herein may refer to a (e.g., rectangular, etc.) spatial region or 2D bounding box within an atlas that is associated with volumetric information represented in a 3D point cloud. An atlas may refer to a collection of patches or 2D bounding boxes, together with their associated patch data, placed onto a (e.g., rectangular, etc.) frame and corresponding to a volume in 3D space in which the volumetric data is represented or rendered.
Patches derived from a 3D point cloud with 2D projections may be referred to as projected patches. Patch-based video data in a patch may be generated from projecting (e.g., 3D, un-projected, pre-projected, etc.) points in a 3D point cloud onto one or more 2D projection planes or at a depth represented by the projection planes. As used herein, a projected patch or data unit may include its corresponding occupancy, geometry, and attribute patch data with reference to a particular 2D projection plane or a particular depth represented by a particular 2D projection plane.
Patches derived from a 3D point cloud without 2D projection may be referred to as original and/or enhanced occupancy mode (EOM) patches. As used herein, an EOM patch or data unit may include geometry and attribute data for points in a 3D point cloud that are positioned at intermediate depth locations not represented in the projected patches (such points may be referred to as EOM coded points, etc.). An original patch or data unit may include geometry and attribute data of non-projected points in one or more regions of the 3D point cloud, to be stored directly in the original patch or data unit.
Under the techniques described herein, different remodelling maps/functions may be used or applied to remodel different patches. Additionally, optionally or alternatively, patch-based (e.g., in-loop, out-of-loop, etc.) remodeling operations described herein may explore or consider different dynamic ranges in different patches for the purpose of achieving relatively high coding compression gains.
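By way of illustration only, the sketch below shows one very simple form of per-patch remodelling: each patch's samples are linearly remapped to the full codeword range of the coded bit depth based on that patch's own minimum and maximum, and the (min, max) pair is retained as per-patch metadata so that a decoder can invert the mapping. The function names and the linear mapping are illustrative assumptions made for this sketch, not the remodelling functions actually specified by this disclosure, which may be considerably more elaborate (e.g., non-linear luma/chroma mappings). Because each patch receives its own mapping, a dark low-dynamic-range patch and a bright patch each use the full coded range independently, which is the kind of per-patch dynamic-range exploitation mentioned above.

```python
import numpy as np

def forward_reshape_patch(samples: np.ndarray, coded_bit_depth: int = 10):
    """Linearly stretch one patch's samples to the full coded codeword range.

    Returns the remodelled samples plus the per-patch metadata needed to
    invert the mapping on the decoder side.  Purely illustrative.
    """
    lo, hi = float(samples.min()), float(samples.max())
    max_code = (1 << coded_bit_depth) - 1
    scale = max_code / (hi - lo) if hi > lo else 1.0
    reshaped = np.round((samples - lo) * scale).astype(np.uint16)
    metadata = {"min": lo, "max": hi, "bit_depth": coded_bit_depth}
    return reshaped, metadata

def inverse_reshape_patch(reshaped: np.ndarray, metadata: dict) -> np.ndarray:
    """Decoder-side inverse remodelling using the signalled per-patch metadata."""
    max_code = (1 << metadata["bit_depth"]) - 1
    scale = (metadata["max"] - metadata["min"]) / max_code
    return reshaped.astype(np.float64) * scale + metadata["min"]

# Two patches with very different dynamic ranges get different mappings.
dark_patch = np.random.uniform(0.00, 0.05, size=(16, 16))   # low dynamic range
bright_patch = np.random.uniform(0.20, 0.90, size=(16, 16))
for patch in (dark_patch, bright_patch):
    coded, meta = forward_reshape_patch(patch)
    restored = inverse_reshape_patch(coded, meta)
    assert np.allclose(restored, patch, atol=1e-3)
```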
The remodelling operation parameters specified or used for the remodelling operation and/or the corresponding reverse remodelling operation may be stored, cached, or included as remodelling metadata that forms part of the overall video metadata. The 3D video signals described herein may be generated or represented in a suitable signal format to include a set of component 2D video signals or 2D timing signals. The component 2D signals in the 3D video signal may respectively carry the different (e.g., occupancy, geometry, attribute, etc.) types of remodelled patch data in the patches, packaged together with the corresponding video metadata for the remodelled patch data.
As used herein, the term "signal format" may refer to a format of a video signal defined or specified according to an applicable signal format specification (e.g., lossy compression, lossless compression, a combination of lossy compressed sample data and lossless compressed video metadata, etc.). The signal format specification may be incorporated into a proprietary or standards-based video coding specification. Example signal formats or video coding specifications may include, but are not necessarily limited to, any of the following: V-PCC specification, V3C specification, MPEG specification, non-MPEG specification, proprietary or non-proprietary specification, extensions of proprietary or non-proprietary specification, and the like. As used herein, a 3D video coding specification may provide a specification of syntax and syntax elements that may be supported by the specification. These syntax and syntax elements including sample data or video metadata may be transmitted or signaled from an upstream device, such as a patch-based video encoder, to a downstream device, such as a patch-based video decoder.
The 3D video encoding and decoding described herein may be driven by flexible 3D video coding syntax and syntax elements. A patch-based video encoder may generate a 3D video signal using 3D video coding syntax and syntax elements that conform to one or more different 3D video coding specifications. These different 3D video coding specifications may be marked or identified with different versions including different combinations of major and/or minor version numbers (other ways of identifying a particular 3D video coding specification may be similarly used).
The 3D video coding syntax and syntax elements provide (e.g., complete, etc.) roadmaps or guidelines for patch-based video decoders to efficiently perform 3D decoding operations, e.g., in reverse data flow. This approach allows parallel and continuous optimization of patch-based video encoder and decoder designs with improved algorithms, implementation costs, speed, etc. Additionally, optionally or alternatively, 3D video coding syntax and syntax elements and patch-based video data may be transmitted and signaled by a patch-based video encoder to a patch-based video decoder in an efficient manner that exploits redundancy between current video metadata portions and previously sent video metadata.
Sample data coded in a patch-based video signal, e.g., sample/pixel values in the spatial or temporal domain, transform coefficients of sample/pixel values in the transform domain, etc., may involve one or more of: atlas, occupancy, geometry (e.g., location of points or samples/pixels, coordinate values, etc.), attributes (e.g., visual texture, color, reflective properties such as reflectivity, surface normals, time stamps, material IDs, etc.), etc.
Video metadata accompanying the sample data coded in a patch-based video signal may be specified with syntaxes or syntax elements, coded in the 3D video signal, that are part of an applicable signal format or video coding specification. These syntaxes and syntax elements provide a common medium that supports encoding, transmitting, and decoding patches or patch data and the accompanying video metadata. For example, these syntaxes and syntax elements may be predefined, used, or supported by a patch-based video codec (e.g., implemented with an available 2D video codec, etc.) in an upstream video processing system to specify and carry, in a 3D video signal, particular operating parameters for a particular patch-based remodelling operation as described herein. Correspondingly, these syntaxes and syntax elements may be used or supported by a patch-based video codec in a downstream recipient system (e.g., also implemented with an available 2D video codec, etc.) to decode or parse the patches or patch data and accompanying video metadata encoded by the upstream video processing system using the same syntaxes and syntax elements.
Syntax and/or syntax elements of video metadata in patch-based video data may be categorized into one of a sequence level, an atlas level, a frame level, a patch level, an occupied data level, a geometric data level, an attribute data level, a partition level (e.g., 2D or 3D bounding boxes in depicted 3D space, quadtree nodes, octree nodes, blocks, component polygons, etc.), or a function/operation level (e.g., a prediction operation, a reshaping operation, a color space conversion operation, a forward or reverse mapping, a remapping operation, an interpolation operation, a spatial scaling operation, a spatial rotation operation, an image warping operation, an image fusion operation, a sampling format conversion operation, a content mapping operation, a quantization operation, an arithmetic coding, etc.).
Example embodiments described herein relate to encoding 3D visual content. An input 3D point cloud is received. The input 3D point cloud includes a spatial distribution of points positioned at a plurality of spatial locations in a represented 3D space. A plurality of patches is generated from the input 3D point cloud. Each patch of the plurality of patches includes pre-remodelled patch data of one or more patch data types. The pre-remodelled patch data is derived at least in part from visual properties of a subset of points in the input 3D point cloud. Encoder-side remodelling is performed on the pre-remodelled patch data contained in the plurality of patches to generate remodelled patch data of the one or more patch data types for the plurality of patches. The remodelled patch data of the one or more patch data types of the plurality of patches is encoded into a 3D video signal in place of the pre-remodelled patch data of the one or more patch data types of the plurality of patches. The 3D video signal causes a recipient device of the 3D video signal to generate a reconstructed 3D point cloud that approximates the input 3D point cloud.
Example embodiments described herein relate to decoding 3D visual content. Remodelled patch data of one or more data types from a plurality of patches of a three-dimensional (3D) video signal is decoded. Decoder-side remodelling is performed on remodelled patch data of the plurality of patches to generate reconstructed patch data of one or more patch data types for the plurality of patches. A reconstructed 3D point cloud is generated based on reconstructed patch data of one or more patch data types of the plurality of patches.
In some example embodiments, the mechanisms described herein form part of a media processing system, including but not limited to any of the following: handheld devices, gaming machines, televisions, laptop computers, netbook computers, tablet computers, desktop computers, computer workstations, computer kiosks, or various other types of computing devices and media processing units.
Various modifications to the preferred embodiments and the generic principles and features described herein will be readily apparent to those skilled in the art. Thus, the present disclosure is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features described herein.
2. Patch-based video encoder
Fig. 1A illustrates an example encoder architecture 102 that may be implemented by a patch-based video encoder to generate a 3D video signal based on coding syntax or syntax elements that conform to one or more 3D video coding specifications. The patch-based video encoder may implement the encoder architecture 102 using one or more computing devices. Example patch-based video encoders described herein may include, but are not necessarily limited to, video codecs related to any of the following: V-PCC, V3C, MPEG, AVC, HEVC, VVC, AV1, EVC, etc.
As illustrated in fig. 1A, the 3D video coding architecture (102) includes computational program logic for performing 3D video coding operations. For example, patch generation and patch packing blocks receive and process a sequence of input 3D images, each of which may be indexed by a respective time index representing a time instance in a sequence of consecutive time instances represented in an input 3D point cloud (e.g., in a sequence of input 3D point clouds to be processed into a 3D video signal by the patch-based video encoder, etc.), to generate, pack, and pad patches of various patch types.
The 3D video coding architecture (102) also includes an atlas coding block for coding atlas data specifying how the generated patches are arranged (e.g., scaled, rotated, arranged, packed, etc.) in an atlas frame of an atlas bitstream of the 3D video signal.
The 3D video coding architecture (102) further includes occupancy generation, geometry generation, and attribute generation blocks for generating occupancy data, geometry data, and attribute data for each of the generated patches.
Video compression logic blocks, such as occupancy video coding, geometry video coding, and attribute video coding blocks, may be implemented in the 3D video coding architecture (102) to compress the sample data represented by the occupancy, geometry, and attribute data in occupancy, geometry, and attribute frames of the occupancy, geometry, and attribute video bitstreams, respectively.
Geometry smoothing and attribute smoothing blocks may be implemented in the 3D video coding architecture (102) to perform spatial and/or temporal smoothing operations on geometry and attribute images and to generate corresponding geometry smoothing and attribute smoothing operating parameters. A multiplexing block (denoted as a "multiplexer") multiplexes the atlas bitstream, the occupancy, geometry, and attribute video streams, the geometry smoothing parameters, the attribute smoothing parameters, and so forth into the 3D video signal.
The geometric smoothing parameters may specify or define operational parameters for geometric smoothing operations implemented or performed by the recipient device to combine or incorporate data from neighboring patches to improve the spatial resolution or spatial consistency of geometric data in image regions near patch boundaries. The attribute smoothing parameters may specify or define operational parameters for an attribute smoothing operation that is performed or performed by the recipient device to combine or incorporate data from neighboring patches to improve the spatial resolution or attribute consistency of the attribute data in the image region near the patch boundary.
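As a rough illustration only of the kind of operation such smoothing parameters might control, the sketch below pulls reconstructed points flagged as patch-boundary points toward the centroid of their nearest neighbours (drawn from all patches), which tends to close small seams between adjacent patches. The k-nearest-neighbour rule and the 'k' and 'blend' parameters are assumptions made for this sketch, not the normative smoothing behavior defined by the signalled parameters.

```python
import numpy as np

def smooth_boundary_points(points: np.ndarray,
                           is_boundary: np.ndarray,
                           k: int = 8,
                           blend: float = 0.5) -> np.ndarray:
    """Pull points flagged as patch-boundary points toward the centroid of
    their k nearest neighbours (neighbours taken from every patch), which
    tends to close small seams between adjacent patches.

    Illustrative sketch only; 'k' and 'blend' stand in for the geometry
    smoothing parameters signalled in the bitstream.
    """
    smoothed = points.copy()
    for i in np.flatnonzero(is_boundary):
        d = np.linalg.norm(points - points[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]          # exclude the point itself
        centroid = points[neighbours].mean(axis=0)
        smoothed[i] = (1.0 - blend) * points[i] + blend * centroid
    return smoothed

# Example: 200 reconstructed points, the last 20 flagged as boundary points.
pts = np.random.rand(200, 3)
boundary = np.zeros(200, dtype=bool)
boundary[-20:] = True
pts_smoothed = smooth_boundary_points(pts, boundary)
```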
Additionally, optionally or alternatively, the geometry generation and attribute generation blocks may be used to generate auxiliary geometry data and auxiliary attribute data for the original and/or EOM patches. The geometry video coding and attribute video coding blocks may be used to compress the sample data represented in the geometry and attribute data of the original and/or EOM patches in auxiliary geometry and attribute frames of auxiliary geometry and attribute video bitstreams, respectively.
An example description of patch-based video coding operations may be found in: "Emerging MPEG Standards for Point Cloud Compression" (Special Issue: Immersive Video Coding and Transmission), IEEE Journal on Emerging and Selected Topics in Circuits and Systems (March 2019), Vol. 9, No. 1, pp. 133-148; "An Overview of Ongoing Point Cloud Compression Standardization Activities: Video-Based (V-PCC) and Geometry-Based (G-PCC)", APSIPA Transactions on Signal and Information Processing (April 2020); "Video-Based Point-Cloud-Compression Standard in MPEG: From Evidence Collection to Committee Draft [Standards in a Nutshell]", IEEE Signal Processing Magazine (May 2019), Vol. 36, No. 3, pp. 118-123; and "Information technology - Coded Representation of Immersive Media - Part 5: Visual Volumetric Video-based Coding (V3C) and Video-based Point Cloud Compression (V-PCC)", ISO/IEC 23090-5:2020; the entire contents of which are hereby incorporated herein by reference as if fully set forth herein.
The 3D images represented with the input 3D point cloud may correspond to respective time instances in a sequence of consecutive time instances. Patches may be generated or derived from the 3D point cloud. The patches may include occupancy patch data, geometry patch data, and attribute patch data. The projected patches among the patches may include respective portions of the occupancy patch data, respective portions of the geometry patch data, and respective portions of the attribute patch data.
Occupancy patch data for the patches generated from the 3D point cloud for a given time index or instance may be packed into (e.g., two-dimensional, etc.) occupancy frames according to the atlas data. Likewise, geometry patch data for the patches of the 3D images for a given time index or instance may be packed into (e.g., two-dimensional, etc.) geometry frames according to the atlas data. Attribute patch data for the patches of the 3D images for a given time index or instance may be packed into (e.g., two-dimensional, etc.) attribute frames according to the atlas data.
The patch generation block may be implemented to project an input 3D point cloud (representing a 3D image) onto 2D image planes to form the projected patches among the patches. Video codecs capable of encoding (e.g., two-dimensional, etc.) video frames can be reused to encode the occupancy, geometry, and attribute frames containing the occupancy, geometry, and attribute patch data of the 3D images into a 3D video signal in the form of a set of component 2D video signals or timing signals. For the purpose of reconstructing a 3D image or point cloud, each component 2D video signal in the set of component 2D video signals or timing signals may not be (or be intended as) an independent 2D video signal, but may be time-correlated with the other component 2D video signals in the same set. The set of component 2D video signals or timing signals includes atlas, occupancy, geometry, and attribute component video signals (e.g., sub-bitstreams, sub-streams, layers, video components, etc.) in a 2D video signal format, etc. Patches represented in the set of component 2D video signals (or 2D timing signals) in the overall 3D video signal may be clocked or indexed by logical time instances such as a sequence ID, a frame ID, etc. Different types (e.g., occupancy, geometry, attributes, etc.) of patch data sharing the same time index may be combined by the recipient device to generate a reconstructed 3D image or point cloud for the time instance corresponding to that time index. Example video codecs to be reused or extended for 3D video coding as described herein may include, but are not necessarily limited to, any of the following: MPEG, AVC, HEVC, VVC, AV1, EVC, etc.
Fig. 2A illustrates one or more example 2D projections 204 of an input 3D point cloud 202, which may be implemented or executed by a patch generation block of a patch-based video encoder to generate a patch including patch data of occupancy, geometry, and attributes. Some or all of the 2D projections (204) may be made with respect to one or more reference projection planes of fig. 2B.
The 2D projection (204) may be selected or performed with respect to one or more normals (directions) of one or more 2D projection planes selected for some or all points in the input 3D point cloud (202). Additionally, optionally or alternatively, the 2D projection (204) may be selected or performed with respect to one or more perspectives from one or more reference or designated cameras that are logically or physically positioned at one or more 2D projection planes onto which the input 3D point cloud (202) is projected. Additionally, optionally or alternatively, the 2D projection (204) may be selected or performed with respect to one or more dynamically determined perspectives of one or more virtual/reference viewers or real viewers (e.g., simultaneously consuming or viewing video/image content generated from the same media program, including inputting a 3D visual scene depicted or represented in the 3D point cloud (202), etc.). Additionally, optionally or alternatively, the 2D projection (204) may be selected or performed with respect to one or more perspectives selected based on artistic or creative intent.
As illustrated in fig. 2A, one or more projected patches 206 comprising patch data of occupancy (or occupancy patch data) 208, patch data of geometry (or geometry patch data) 210, and patch data of attributes (or attribute patch data) 212 are generated from 2D projections (204) of an input 3D point cloud (202). It should be noted that in various operational cases, one or more projected patches may be generated from a single 2D projection of a 3D point cloud, as described herein.
Occupancy patch data (208) may be packed into one or more (e.g., two-dimensional, etc.) occupancy frames. A plurality of pixels or locations in the occupancy frame may be populated with a plurality of occupancy samples or pixel values, respectively, from the occupancy patch data (208). In most, if not all, operating cases, not all pixels or positions in the occupied frame may be occupied by or projected from a corresponding point in the 3D point cloud (202). The occupancy patch data (208) may include a plurality of individual occupancy sample values or pixel values that occupy a plurality of pixels or locations in the frame. Each occupancy sample or pixel value of the plurality of individual occupancy samples or pixel values indicates whether a respective individual pixel or location of a plurality of pixels or locations to which the individual sample or pixel value corresponds is projected from one or more points (e.g., at least one point, a closest point, zero or more unoccluded points, zero or more intermediate points, etc.) specified or defined in the input 3D point cloud.
As illustrated in fig. 2A, the black pixels in the occupancy patch data (208) indicate that there is no occupancy from or projection from any point in the input 3D point cloud (202). Conversely, white pixels in the occupancy patch data (208) indicate occupancy or projection from one or more points (e.g., at least one point, a closest point, zero or more unoccluded points, zero or more intermediate points, etc.) in the input 3D point cloud (202).
The pixel values in the occupancy patch data (208) may or may not be limited to a single bit value, such as a binary value of 0 or 1, true or false, yes or no, etc. In some operational cases, each pixel value in the occupancy patch data (208) may be a multi-bit value, such as an 8-bit value, a 16-bit value, a 32-bit value, or the like. The upstream encoding device may encode multi-bit values within a represented range of values (e.g., 0-255, etc.) into the 3D video signal, which may be decoded by a receiver decoding device of the 3D video signal, and then binarized into binary values of 0 or 1, true or false, yes or no, etc. In some cases of operation, the multi-bit value may be obtained by some or all pre-encoding operations such as stretching (e.g., linear stretching, etc.), half-toning, dithering, etc.
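For example, a recipient device might binarize a decoded multi-bit occupancy map by simple thresholding, as in the sketch below; the 8-bit representation and the threshold of 128 are assumptions made for illustration, and the actual binarization rule is whatever the applicable specification or signalled metadata dictates.

```python
import numpy as np

def binarize_occupancy(occupancy_8bit: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Map an 8-bit occupancy frame decoded from the video sub-bitstream back
    to binary occupancy: 1 = pixel is projected from at least one point,
    0 = padding.  The threshold value is illustrative only."""
    return (occupancy_8bit >= threshold).astype(np.uint8)

# A decoded 4x4 occupancy block whose values were stretched to 0/255 before
# lossy video coding and came back slightly perturbed.
decoded = np.array([[  3, 250, 255,   0],
                    [  0, 247, 251,   5],
                    [  1, 130, 249,   2],
                    [  0,   4,   7,   0]], dtype=np.uint8)
print(binarize_occupancy(decoded))
```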
The geometric patch data (210) may be packed into one or more (e.g., two-dimensional, etc.) geometric frames. A plurality of pixels or locations in the geometric frame may be filled in with a plurality of geometric samples or pixel values, respectively, from the geometric patch data (210). Each of a plurality of individual geometric samples or pixel values, for a pixel or location projected from a point (e.g., closest point, farthest point, etc.) in the input 3D point cloud, indicates a distance (e.g., in integer values, in floating point values, in decimal point values, etc.) of the point to a 2D projection plane used by the patch generation block to project the input 3D point cloud into one or more 2D projection planes of the projected patch (206).
The attribute patch data (212) may be packed into one or more (e.g., two-dimensional, etc.) attribute frames. A plurality of pixels or locations in the attribute frame may be populated with a plurality of attribute samples or pixel values, respectively, from the attribute patch data (212). Each of the plurality of individual attribute samples or pixel values, for a pixel or location projected from a point (e.g., closest point, farthest point, etc.) in the input 3D point cloud, indicates an attribute of the point (e.g., R, G, B, Y, Cb, Cr, reflectivity, color, surface normal, timestamp, material ID, etc.).
Fig. 2B illustrates example (e.g., reference, designated, etc.) 2D projections of an input 3D point cloud (e.g., 202, etc.). These 2D projections may be made with reference to 2D projection planes whose camera positions are on the faces of a 3D polyhedron, e.g., a polyhedron in which each face intersects (or does not intersect) at least one other face.
As illustrated in fig. 2B, the input 3D point cloud (202) may be projected into a first plurality of 2D projection planes corresponding to a plurality (e.g., 6, etc.) of raw camera positions 214. Additionally, optionally or alternatively, the input 3D point cloud (202) may be projected by rotation along a first axis (e.g., a Y-axis in an X-Y-Z cartesian coordinate system of the input 3D point cloud (202)) into a second plurality of 2D projection planes corresponding to a plurality (e.g., 4, etc.) of rotated camera positions 216. Additionally, optionally or alternatively, the input 3D point cloud (202) may be projected into a third plurality of 2D projection planes corresponding to a plurality (e.g., 4, etc.) of rotated camera positions 218 by rotation along a second axis (e.g., a Z-axis in a cartesian coordinate system of the input 3D point cloud (202)). Additionally, optionally or alternatively, the input 3D point cloud (202) may be projected by rotation along a third axis (e.g., an X-axis in a cartesian coordinate system of the input 3D point cloud (202)) into a fourth plurality of 2D projection planes corresponding to the plurality (e.g., 4, etc.) of rotated camera positions 220.
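The candidate projection directions described above can be enumerated as unit normals, as in the sketch below: the six axis-aligned directions plus additional directions obtained by rotations about the Y, Z, and X axes. The 45 degree rotation step used here is an illustrative assumption; the description above only states that additional camera positions are obtained by rotation about each axis.

```python
import numpy as np

def rotation_matrix(axis: str, degrees: float) -> np.ndarray:
    """Right-handed rotation matrix about a coordinate axis."""
    c, s = np.cos(np.radians(degrees)), np.sin(np.radians(degrees))
    if axis == "x":
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == "y":
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])      # axis == "z"

# The six original axis-aligned projection directions (unit normals).
original = [np.array(v, dtype=float) for v in
            [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
             (0, -1, 0), (0, 0, 1), (0, 0, -1)]]

# Additional directions from rotating about the Y, Z, and X axes; the
# 45-degree step is an illustrative assumption (any angle could be used).
rotated = []
for axis, base in (("y", (1, 0, 0)), ("z", (1, 0, 0)), ("x", (0, 1, 0))):
    for quarter in range(4):
        n = rotation_matrix(axis, 45 + 90 * quarter) @ np.array(base, float)
        rotated.append(n / np.linalg.norm(n))

print(len(original) + len(rotated))   # 6 + 12 = 18 candidate projection planes
```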
A single 2D projection of the input 3D point cloud may be performed by a patch-based video encoder or patch generation block therein to generate a single projected patch or multiple projected patches.
FIG. 2C illustrates an example process flow for generating multiple patches from a 2D projection of an input 3D point cloud (e.g., 202, etc.).
In view of the input 3D point cloud (202) and the 2D projection plane to be used or referenced by the 2D projection, block 222 includes estimating, determining, and/or evaluating a normal direction (or simply, normal) 232 of points in the input 3D point cloud (202) with respect to (e.g., geometrically related, spatially related, angularly related, directionally related, etc.) the 2D projection plane.
For example, given a particular point in the 3D point cloud (202), neighboring points in the same 3D point cloud (202) that are adjacent to the particular point may be determined, identified, or otherwise selected. A (e.g., relatively smooth, logical, best-fit, etc.) surface may be constructed from the particular point and some or all of its neighboring points. The normal of the particular point may then be set to the normal (e.g., perpendicular direction, etc.) of the surface on which the particular point and some or all of the adjacent points are located.
Based on the normal directions (232) of the points of the input 3D point cloud (202), the alignment of the normal direction of each point with the normal (e.g., perpendicular direction, etc.) of a 2D projection plane may be determined and evaluated, for example, via the dot product of the (individual) normal direction of the (individual) point with the normal of the 2D projection plane. In an example, a subset of points that are relatively well aligned with the normal of the 2D projection plane may be identified from some or all points in the input 3D point cloud (202) by comparison with a minimum alignment threshold (e.g., aligned within an angular difference of less than 10 degrees, aligned within an angular difference of less than 20 degrees, etc.). The subset of points that are relatively well aligned with the normal of the 2D projection plane may be projected onto the 2D projection plane to produce an initial 2D projection image. In another example, the maximum dot product or inner product between the normal of a particular point and the normal of a particular 2D projection plane may be identified among some or all dot products or inner products between the normal of the particular point and the normals of some or all 2D projection planes (e.g., one or more 2D projection planes, up to 18 projection planes, etc.). All points having their maximum dot product or inner product with respect to a particular 2D projection plane may be grouped (e.g., considered, determined, etc.) into a subset of points that are aligned with that particular 2D projection plane and thus projected onto that particular 2D projection plane.
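The assignment rule in the second example above can be sketched directly: each point is assigned to the projection plane whose normal yields the largest inner product with the point's estimated normal. In the sketch below, the point normals are assumed to be already estimated (e.g., from a best-fit surface over each point's neighborhood, as described above).

```python
import numpy as np

def assign_points_to_planes(point_normals: np.ndarray,
                            plane_normals: np.ndarray) -> np.ndarray:
    """Return, for each point, the index of the projection plane whose normal
    is best aligned with the point's normal (maximum inner product)."""
    # (num_points, 3) @ (3, num_planes) -> (num_points, num_planes) dot products
    scores = point_normals @ plane_normals.T
    return np.argmax(scores, axis=1)

# Six axis-aligned projection-plane normals, as in the original camera positions.
planes = np.array([[ 1, 0, 0], [-1, 0, 0],
                   [ 0, 1, 0], [ 0, -1, 0],
                   [ 0, 0, 1], [ 0, 0, -1]], dtype=float)

# Toy point normals (assumed already estimated), normalized to unit length.
normals = np.array([[ 0.9,  0.1,  0.1],
                    [-0.2, -0.9,  0.3],
                    [ 0.1,  0.2,  0.95]])
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

print(assign_points_to_planes(normals, planes))   # [0 3 4]
```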
Block 224 includes segmenting the initial 2D projection image into a plurality of initial clusters (or candidate patches) 234 by initial segmentation or clustering in a first stage of the multi-stage segmentation solution. The initial segments or clusters may be implemented based on one or more segmentation or clustering algorithms or methods that satisfy one or more optimization objectives, for example, in minimizing intra-cluster distances within each of the initial clusters (or candidate patches) while maximizing inter-cluster distances between different ones of all the initial clusters (or candidate patches).
Block 226 includes refining the segmentation of the initial cluster (or candidate patch), such as performing a clean-up or smoothing of the initial cluster (or candidate patch) in a second stage of the multi-stage segmentation solution to produce a refined cluster (or candidate patch) 236. In an example, points in the initial cluster (or candidate patches) that have a large difference in geometry, properties, etc. from neighboring points in the initial cluster (e.g., within a relatively small neighborhood, etc.) may be deleted or removed from the initial cluster. In another example, an initial cluster that is smaller than the minimum cluster/patch size may be deleted, removed, or merged with a neighboring initial/refined cluster/patch.
Block 228 includes performing a connected component algorithm on the refined cluster (or candidate patch) to determine, estimate, and/or generate a connected cluster (or candidate patch) 238 from the refined cluster (or candidate patch). The connected clusters (or candidate patches) may include relatively spatially contiguous neighboring points that meet a minimum connectivity threshold.
Block 230 includes performing depth filtering on the connected clusters (or candidate patches) to generate the (e.g., final, complete, output, etc.) patches from the connected clusters (or candidate patches).
One, two, or more patches may be generated from a 2D projection of an input 3D point cloud (202). In some operational cases, as illustrated in FIG. 2C, two patches 206-1 and 206-2 are generated from an input 3D point cloud (202).
Additionally, optionally or alternatively, one or more 2D projections (e.g., 3 projections, 18 projections, etc.) may be performed with respect to the input 3D point cloud (202) and represented in the 3D video signal generated by the patch-based video encoder. By way of example and not limitation, three 2D projections may be used to project the input 3D point cloud onto three 2D projection planes, respectively, such that a portion of the input 3D point cloud that is occluded in one of the three 2D projections may be captured, depicted, or otherwise not occluded in one or two of the other 2D projections among the three. In various operational cases, one, two, three, or more 2D projections may be used to generate the patches to be encoded in a 3D (patch-based) video signal, as described herein. Each of the 2D projections may be clustered or segmented to produce one, two, three, or more patches.
After generating a patch using one or more 2D projections of a 3D point cloud (202), a patch-based video encoder or patch packing block therein may assemble each type of patch data in the patch into a respective 2D video/image frame. For example, all occupancy patch data in a patch may be assembled and/or packaged into a 2D occupancy frame. Likewise, all geometric patch data in the patch may be assembled and/or packed into a 2D geometric frame. All attribute patch data in the patch may be assembled and/or packaged into a 2D attribute frame.
Each type of patch data in a 2D video/image frame may be fed or input into an available video codec of a patch-based video encoder to be compressed/encoded into a respective component 2D video signal (or 2D timing signal) or sub-bitstream (e.g., occupied video component/signal, geometric video component/signal, attribute video component/signal, etc.) of an output 3D video signal or bitstream.
In some operational cases, as illustrated in fig. 1A, an optimized layout is determined or identified by the patch-based video encoder or a patch packing block therein for assembling or packing the patches (e.g., projected patches, original and/or EOM patches, etc.) of a given time instance into an atlas.
The optimized layout of the atlas may be determined or identified to arrange the patch data into a particular layout that reduces or minimizes the overall number of padding pixels. As described herein, a padding pixel may refer to an invalid pixel or location onto which no point of the input 3D point cloud (202) is projected and which is instead filled by padding. For example, an optimized layout of patches in an atlas may permit patches to overlap at sample/pixel locations of the layout (or between or among the bounding boxes of the patches) so long as there is only one valid pixel or location at any given position, e.g., where projected or filled points from the input 3D point cloud (202) come from only one patch of two or more (e.g., spatially, etc.) overlapping patches at any given pixel or location in the optimized layout.
By way of example and not limitation, four patches or bounding boxes, namely, patch 0 through patch 3, are generated by the patch-based video encoder or patch generation blocks therein. As illustrated in fig. 2D, four patches or bounding boxes may be assembled into the optimal or predetermined layout 240. To reduce or minimize the overall padded pixels in the optimized layout (240), some or all of these patches or bounding boxes may overlap each other in the optimized layout (240). For example, two patches or bounding boxes, patch 1 and patch 2, may partially overlap in the optimized layout (240) to reduce or minimize the overall padded pixels in the optimized layout (240).
In some cases of operation, patches or bounding boxes may be ordered, for example, by size (e.g., bounding box size, etc.). Each of the patches may be enclosed in a bounding box, such as a rectangle having horizontal and vertical dimensions equal to the maximum horizontal and vertical sample/pixel position differences of the enclosed patches. The spatial area in the bounding box that is not occupied by a valid pixel of the patch is a padding area.
A first bounding box enclosing a first patch (e.g., a largest patch, etc.) is placed into an upper left spatial region of the optimized layout (240). A second bounding box enclosing the second patch may be placed into the optimized layout (240) to occupy as much of the unoccupied area within the first bounding box as possible. This patch (or bounding box) placement process may be repeated until all patches (or bounding boxes) are placed into the optimized layout (240).
Based at least in part on the patch order and/or block-to-patch mapping, the optimal or predetermined layout may be packaged or implemented into the atlas 242. The patch order may be defined/specified explicitly or implicitly using the respective index values assigned to the patches or bounding boxes (e.g., distinct, numerical ordering, patchidx=0, 1, 2, and 3, etc.). The block-to-patch mapping may be defined/specified using block-to-patch mapping information (labeled "block2patch information") in the atlas (242) to identify a particular patch (e.g., patch 0-patch 3, etc.) to which a block of samples/pixels (e.g., a block of 4x4 samples/pixels, a block of 8x8 samples/pixels, a block of 16x16 samples/pixels, a block of 32x32 samples/pixels, etc.) is mapped.
In some operation cases, each of the block-to-patch maps indicates a unique patch to which the block (e.g., 4x4 sample/pixel block, 16x16 sample/pixel block, 32x32 sample/pixel block, etc.) is mapped (e.g., single, etc.). The unique patch may be the last patch or last bounding box that has been placed over the block during the patch (or bounding box) placement process.
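The placement and block-to-patch bookkeeping described above may be sketched as follows: bounding boxes are sorted by size and placed greedily at the first block-aligned position where the patch's valid pixels do not collide with valid pixels already placed, and each block is mapped to the last patch placed over it. The raster-scan search, the fixed block size of 16, and the collision rule are simplifying assumptions made for illustration, not the normative packing algorithm.

```python
import numpy as np

BLOCK = 16                      # block size used for the block-to-patch map

def pack_patches(patch_masks, canvas_w, canvas_h):
    """Greedily place patch occupancy masks (2D boolean arrays) onto a canvas.

    Overlapping bounding boxes are allowed as long as valid (occupied) pixels
    never collide.  Returns per-patch placement offsets and a block-to-patch
    map in which each block holds the index of the last patch placed over it.
    Illustrative sketch only.
    """
    occupied = np.zeros((canvas_h, canvas_w), dtype=bool)
    block2patch = -np.ones((canvas_h // BLOCK, canvas_w // BLOCK), dtype=int)
    placements = []

    # Sort patches by bounding-box area, largest first.
    order = sorted(range(len(patch_masks)),
                   key=lambda i: patch_masks[i].size, reverse=True)
    for idx in order:
        mask = patch_masks[idx]
        h, w = mask.shape
        for y in range(0, canvas_h - h + 1, BLOCK):
            for x in range(0, canvas_w - w + 1, BLOCK):
                if not (occupied[y:y + h, x:x + w] & mask).any():
                    occupied[y:y + h, x:x + w] |= mask
                    by, bx = y // BLOCK, x // BLOCK
                    bh, bw = -(-h // BLOCK), -(-w // BLOCK)   # ceiling division
                    block2patch[by:by + bh, bx:bx + bw] = idx
                    placements.append((idx, x, y))
                    break
            else:
                continue
            break
    return placements, block2patch

# Two toy patch masks: a large patch with padding on its right, and a small one.
big = np.zeros((32, 48), dtype=bool)
big[:, :32] = True                     # valid pixels only in the left portion
small = np.ones((16, 16), dtype=bool)  # a fully valid 16x16 patch
placements, block2patch = pack_patches([big, small], canvas_w=64, canvas_h=64)
print(placements)   # [(0, 0, 0), (1, 32, 0)]: small nests into big's padding area
```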
In some cases of operation, to enable parallel processing and random access of the atlas information of a patch (e.g., represented in a clean random access coded atlas, etc.), instead of using the last patch in a block-to-patch map, the block-to-patch map of a block may be set with the index of the first patch placed on the block, which may then be used to prevent subsequent patches from occupying the same block as valid pixels. However, the subsequent patch may still overlap the first patch as long as valid pixels from the subsequent patch are not present in the block.
Additionally, optionally or alternatively, the same or similar optimized layout may be used to package or assemble corresponding geometric patch data in the patch into the 2D geometric frame 244. Likewise, the same or similar optimized layout may be used to package or assemble corresponding attribute patch data in the patch into the 2D attribute frame 246.
To make the optimized layout relatively compact, zero, one or more of all patches may be spatially transformed by rotating and exchanging x and y coordinates into spatially transformed patches. The spatially transformed patches may be assembled or packed into 2D frames instead of corresponding (pre-spatially transformed) patches.
As illustrated in fig. 2E, a patch 248, which may be defined using two spatial coordinates x and y, may be spatially transformed into a first transformed patch 250 by exchanging the x and y coordinates. Additionally, optionally or alternatively, the patch (248) may be spatially transformed into a second transformed patch 252 by a 90 degree rotation. The patch (248) may be spatially transformed into a third transformed patch 254 by a 180 degree rotation. The patch (248) may be spatially transformed into a fourth transformed patch 256 by a 270 degree rotation. The patch (248) may be spatially transformed into a fifth transformed patch 258 by a 180 degree rotation followed by mirroring. The patch (248) may be spatially transformed into a sixth transformed patch 260 by a 270 degree rotation followed by mirroring. The patch (248) may be spatially transformed into a seventh transformed patch 262 by mirroring.
As illustrated in fig. 2F, the patch (leftmost in fig. 2F) may be spatially transformed into a transformed patch by scaling. In an example, the patch may be scaled by a scale factor (e.g., transmitted in a 3D video signal on a per patch basis, etc.) (e.g., one-half) along the horizontal direction of fig. 2F to produce a transformed patch (e.g., with a lower level of detail than the original patch, etc.). In another example, the patch may be scaled by a scale factor (e.g., one-half) along the vertical direction of fig. 2F to produce a transformed patch (e.g., with a lower level of detail than the original patch, etc.). In yet another example, the patch may be scaled by a scale factor (e.g., one-half) along each of the horizontal and vertical directions of fig. 2F to produce a transformed patch (e.g., with a much lower level of detail than the original patch, etc.).
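These per-patch spatial transforms map directly onto standard array operations, as in the sketch below (numpy conventions; the half-resolution scaling by plain pixel decimation is an illustrative assumption, since the signalled level-of-detail scaling could use any resampling filter).

```python
import numpy as np

patch = np.arange(12).reshape(3, 4)           # a toy 3x4 patch of samples

swapped    = patch.T                          # exchange x and y coordinates
rot90      = np.rot90(patch, k=1)             # 90-degree rotation
rot180     = np.rot90(patch, k=2)             # 180-degree rotation
rot270     = np.rot90(patch, k=3)             # 270-degree rotation
mirrored   = np.fliplr(patch)                 # mirror reflection
rot180_mir = np.fliplr(np.rot90(patch, k=2))  # 180-degree rotation + mirror
rot270_mir = np.fliplr(np.rot90(patch, k=3))  # 270-degree rotation + mirror

# Level-of-detail scaling by one half along each axis; plain decimation is
# used here only for illustration (a real encoder could filter first).
half_horizontal = patch[:, ::2]
half_vertical   = patch[::2, :]
half_both       = patch[::2, ::2]

print(rot90)
```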
It should be noted that in various operational cases, these and other spatial transform operations may be performed on patches to obtain spatially transformed patches that are to be packed into an atlas and encoded into a 3D video signal according to the atlas, as described herein.
After determining the optimized layout of the patch in the atlas, the patch-based video encoder or occupancy, geometry, and attribute generation blocks therein may assemble or package each type (e.g., occupancy, geometry, attribute, etc.) of patch data into a respective 2D video/image frame.
The 2D video/image frames containing the respective types of patch data may then be encoded/compressed into (the respective component 2D video signals or sub-bitstreams of) the 3D video signals or bitstreams by a patch-based video encoder or video coding block (e.g., a reused 2D video codec implementing the occupied, geometric, and attribute video coding blocks of fig. 1A, etc.).
The atlas information, such as patch specific information (e.g., patch position, corresponding 3D position, orientation, level of detail such as scale factors, spatial transforms, radial reflections, etc.), and patch packing information (including, but not limited to, specifying some or all packing order and block-to-patch mapping of the optimized layout) in conjunction with the optimized layout may be encoded or compressed, e.g., in a lossless manner, by a patch-based video encoder or an atlas coding block (of fig. 1A) therein, into one or more atlas frames in a (component) atlas signal or sub-bitstream of a 3D video signal or bitstream. In some cases of operation, a plurality of atlas parameters and their corresponding values (e.g., constituting a relatively high-level parameter set, etc.), such as a set thereof, etc., may be used to specify patch information and patch packing information for an optimized layout as part of an atlas sequence (e.g., an atlas frame sequence, etc.) encoded into an atlas bitstream (e.g., an atlas frame, etc.). The encoding of an atlas sequence in an atlas bitstream may be referred to as atlas sequence coding. Advanced parameter sets in the sequence of atlases may be used by the recipient device for decoding and reconstruction operations.
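For concreteness, the per-patch atlas information might be organized along the lines of the sketch below. The field names are illustrative stand-ins for the corresponding atlas syntax elements (2D position and size of the bounding box in the atlas frame, 3D offset of the patch, projection plane index, orientation, and level-of-detail scale factors) and are not the actual coding syntax.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PatchAtlasInfo:
    """Illustrative per-patch atlas record (field names are not normative)."""
    patch_idx: int            # packing order / patch index
    pos_2d: tuple             # (u, v) of the bounding box in the atlas frame
    size_2d: tuple            # (width, height) of the bounding box
    offset_3d: tuple          # (x, y, z) offset of the patch in 3D space
    projection_plane: int     # index of the 2D projection plane / camera
    orientation: int          # rotation / mirror applied before packing
    lod_scale: tuple = (1, 1) # level-of-detail scale factors (x, y)

@dataclass
class AtlasFrame:
    """One atlas frame: the patch list plus the block-to-patch map."""
    patches: List[PatchAtlasInfo] = field(default_factory=list)
    block_to_patch: List[List[int]] = field(default_factory=list)

frame = AtlasFrame(patches=[
    PatchAtlasInfo(0, (0, 0), (48, 32), (10, 0, 22), projection_plane=0, orientation=0),
    PatchAtlasInfo(1, (32, 0), (16, 16), (64, 5, 22), projection_plane=4, orientation=2),
])
```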
3. Patch-based video decoder
Fig. 1B illustrates an example decoder architecture 152 that may be implemented by a patch-based video decoder to decode a 3D video signal based on coding syntax or syntax elements (e.g., explicitly or implicitly specified in the 3D video signal, etc.) that conform to one or more 3D video coding specifications. The patch-based video decoder may implement the decoder architecture 152 using one or more computing devices. Example patch-based video decoders described herein may include, but are not necessarily limited to, video codecs related to any of the following: V-PCC, V3C, MPEG, AVC, HEVC, VVC, AV1, EVC, etc.
The patch-based video decoder may perform decoder operations with additional pre-processing and/or post-processing modules/blocks in reverse order (as compared to the encoder operations performed by the patch-based video encoder of fig. 1A). Some or all of the pre-processing and/or post-processing procedures may be implemented by additional processing modules/blocks to reduce the likelihood of visual artifacts (e.g., caused or introduced by patch generation and coding on the encoder side, etc.).
As illustrated in fig. 1B, the 3D video decoding architecture (152) includes computational program logic blocks for performing 3D video decoding operations, such as a demultiplexing block (labeled "demultiplexer") to demultiplex atlas bitstreams, occupancy, geometry and attribute video streams, geometry smoothing parameters, attribute smoothing parameters, and the like from a 3D video signal.
The 3D video decoding architecture (152) also includes video decompression logic, such as an occupancy, geometry, and attribute video decoding block to decode or decompress atlas data from the atlas bitstream and to decode or decompress occupancy, geometry, and attribute video decoding block from sample data (e.g., occupancy, geometry, and attribute data, etc.) represented in occupancy, geometry, and attribute data, respectively, of the occupancy, geometry, and attribute video bitstream.
A nominal format conversion block may be implemented in the 3D video decoding architecture (152) to convert decoded sample data from a decoded format (e.g., as specified in an applicable video coding standard, etc.) to a nominal format or representation. The decoded format may specify a particular resolution, bit depth, frame rate, composition time index, chroma format, etc., while the nominal format may specify a resolution, bit depth, frame rate, composition time index, chroma format, etc., some of which may differ from those of the decoded format.
A pre-reconstruction block may be implemented in the 3D video decoding architecture (152) to receive output video data from the nominal format conversion and further perform preparation operations prior to reconstruction, including, but not limited to, some or all of the following: extracting, generating, or reconstructing patches based on the atlas data (e.g., patch order, block-to-patch mapping, spatial transformations such as rotation, specular reflection, scaling, etc.) and unpacking (e.g., reverse packing, etc.) of the decoded occupancy, geometry, and attribute data; patch/boundary filtering; removal of boundary points associated with decoding errors; and so forth.
The reconstruction blocks may be implemented in a 3D video decoding architecture (152) to receive and reconstruct an initial reconstructed 3D point cloud from patches including 2D occupancy, geometry, and attribute patch data, including but not limited to projecting patches back to (3D) points in a local or patch-specific portion of 3D space (which may be specific to an individual patch), transforming (e.g., translating, rotating, scaling, etc.) points in local patch space to 3D space (not specific to an individual patch) for representing points projected as a whole from all patches.
The post-reconstruction block may be implemented in a 3D video decoding architecture (152) to perform attribute transfer and smoothing operations on the geometry and attributes of the reconstructed 3D point cloud based on smoothing and other auxiliary information in the decoded sample data or atlas information (to handle discontinuities due to compression at patch boundaries), to incorporate non-projected points of the input 3D point cloud (e.g., encoded in one or more raw and/or Enhanced Occupancy Map (EOM) data units of the 3D video signal), to resolve conflicts and inconsistencies among reconstructed points resulting from patches with respect to different 2D projection planes, to perform hole or gap filling for missing points, and so forth.
The adaptation block may be implemented in a 3D video decoding architecture (152) to adapt an initial reconstructed 3D point cloud into an adapted (e.g., final, completed, output, etc.) 3D point cloud based at least in part on an anchor point (e.g., a location where a reference or actual user/viewer is located, etc.), scaling (e.g., scaling of the 3D point cloud relative to the anchor point, etc.), rotation (e.g., rotating the initial point cloud based on an orientation of a perspective of the reference or actual user/viewer, etc.), translation (e.g., translating or moving the initial point cloud based on a linear location of the reference or actual user/viewer, etc.), and so forth.
4. Patch-based remodeling
The contrast sensitivity of the Human Visual System (HVS) depends not only on properties such as brightness, but also on masking characteristics of the visual content such as noise and texture, and the adaptation state of the HVS. In other words, the video/image data may be quantized at least in part with respect to noise level or texture characteristics of visual content represented in the video/image data.
Content Adaptive Quantization (CAQ) may be applied to codewords in various types of patch data in patches derived from a 3D point cloud. For particular types (e.g., input, pre-reshaped, etc.) of patch data in a given patch (e.g., occupancy patch data type, geometry patch data type, attribute patch type, etc.), the patch data may be represented in an input bit depth, and a noise mask generation may be applied to the patch data to generate a noise mask image of the patch. The noise mask image of a patch characterizes each pixel in a particular type of patch data in a given patch in terms of its perceived relevance in masking quantization noise. The noise mask histogram may be generated based on the patch data in the given patch and the noise mask image generated for the patch data in the given patch. A masking noise level to bit depth (mapping) function may be applied to the noise mask histogram to produce a minimum bit depth value for each bin in the noise mask histogram. The reshaping function, which may also be referred to as a codeword mapping function, may be generated based on the input bit depth, the target bit depth, and the minimum bit depth value. The remodelling function may be applied to a particular type of (e.g., input, pre-remodeled, etc.) patch data in the patch to produce a particular type of remodelled patch in the patch in a target bit depth that may be the same as or different from the input bit depth.
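By way of illustration only, the following Python sketch (with numpy) shows one way a block-based noise mask of the kind described above might be computed for a single patch channel. The function name, the block size, and the use of a per-block standard deviation as the masking measure are illustrative assumptions of this description rather than part of any V3C/V-PCC specification.

```python
import numpy as np

def block_noise_mask(patch: np.ndarray, block: int = 8) -> np.ndarray:
    """Per-pixel masking-noise estimate for one patch channel.

    Each pixel is assigned the standard deviation of the non-overlapping
    block x block region it falls in, used here as a simple proxy for how
    well local texture/noise masks quantization error.
    """
    h, w = patch.shape
    noise = np.zeros_like(patch, dtype=np.float64)
    for y in range(0, h, block):
        for x in range(0, w, block):
            blk = patch[y:y + block, x:x + block].astype(np.float64)
            noise[y:y + block, x:x + block] = blk.std()
    return noise

# Usage: a synthetic 10-bit patch standing in for one type of patch data.
rng = np.random.default_rng(0)
patch = rng.integers(0, 1024, size=(64, 48))
mask = block_noise_mask(patch)
```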
The particular type of patch data (e.g., of the occupancy patch data type, the geometry patch data type, the attribute patch data type, etc.) to be reshaped constitutes pre-reshaped patch data; the reshaping is driven at least in part by a target visual property, such as the target bit depth.
Under the techniques described herein, patch-based (forward) reshaping may be performed as part of a 3D video encoding operation (e.g., a V-PCC video encoding operation, a V3C video encoding operation, etc.). For example, an encoder-side processing block for patch-based (forward) reshaping may be executed to improve coding efficiency after the optimized layout for packing the patches of a given time instance is decided, and before the different types (e.g., occupancy, geometry, attribute, etc.) of (forward) reshaped patch data in the patches are packed into 2D video frames according to the optimized layout.
Some or all of these patch-based (forward) remodelling operations at the encoder side may be performed in parallel. While the patches are generated via 2D projections from the input 3D point cloud (in 3D space) to one or more 2D projection planes, those of the already generated patches may be (forward) reshaped such that the (forward) reshaped patch data of the already generated patches may be packed into 2D video frames concurrently or consecutively.
Conversely, patch-based (inverse) reshaping may be performed as part of a 3D video decoding operation (e.g., a V-PCC video decoding operation, a V3C video decoding operation, etc.). For example, the decoder-side processing block for patch-based (inverse) reshaping may be executed after the (forward reshaped) patches are unpacked or extracted from the 2D video frames according to the signaled layout (e.g., conveyed via the atlas data signaled in the received 3D video signal) with which the encoder packed the (forward reshaped) patches.
In some operation cases, patch-based forward and patch-based reverse remodelling, which may yield relatively significant benefits in coding gain/efficiency, may be inserted into 3D video encoding and decoding architecture, respectively, in a manner that introduces relatively minor changes to the architecture.
For example, encoder-side remodeling may be performed on pre-remodeled patch data included in a plurality of patches generated from a 3D point cloud based on a plurality of patch-based remodelling functions. The plurality of patch-based remodelling functions may include patch-based remodelling functions related to one or more of: a multi-segment polynomial, a three-dimensional look-up table (3DLUT), a cross color channel predictor, a multi-color channel multiple regression (MMR) predictor, a predictor with a B-spline function as a basis function, a tensor product B-spline (TPB) predictor, etc.
In an embodiment, the plurality of patch-based remodelling functions includes a first patch-based remodelling function for remodelling a first portion of the pre-remodeled patch data included in a first patch of the plurality of patches; the plurality of patch-based remodelling functions includes a second different patch-based remodelling function for remodelling a second portion of the pre-remodeled patch data included in a second different patch of the plurality of patches.
Example reshaping functions using multi-segment polynomials are described in the following applications: PCT Application No. PCT/US2017/50980, filed on September 11, 2017; and U.S. Provisional Application Serial No. 62/404,307, filed on October 5, 2016 (also published on April 5, 2018 as U.S. Patent Application Publication No. 2018/0098094), the entire contents of which are hereby incorporated by reference as if fully set forth herein. An example MMR-based reshaping function is described in U.S. Patent 8,811,490, which is incorporated herein by reference as if fully set forth herein. Example TPB reshaping functions are described in U.S. Provisional Application Serial No. 62/908,770 (Attorney Docket No. 60178-0417), entitled "TENSOR-PRODUCT B-SPLINE PREDICTOR," filed on October 1, 2019, which is incorporated herein by reference as if fully set forth herein.
Fig. 1C illustrates an example encoder architecture 1002 that may be implemented by a patch-based video encoder to generate (reshaped) a 3D video signal based on coding syntax or syntax elements that conform to one or more 3D video coding specifications. The patch-based video encoder may implement the encoder architecture 1002 using one or more computing devices. Example patch-based video encoders described herein may include, but are not necessarily limited to, video codecs related to any of the following: V-PCC, V3C, MPEG, AVC, HEVC, VVC, AV1, EVC, etc.
As illustrated in fig. 1C, the 3D video coding architecture (1002) includes computational program logic blocks for performing 3D video coding operations (e.g., the operations illustrated in fig. 1A). In some operational cases, each patch may be generated and processed relatively independently of the other patches, except for the placement or packing of different types of patch data in the patch into the respective 2D video/image frame according to the applicable (optimized) layout. The individual size and individual location of the patch may be signaled in the atlas data encoded in the atlas bitstream of the reshaped 3D video signal.
Rate-distortion (R-D) tradeoff and/or banding mitigation of High Dynamic Range (HDR) 3D video content under the techniques described herein may be controlled at a relatively fine granularity, as patch-based remodeling may be performed at the patch level (e.g., local remodeling of 2D video frames of a packaged patch, etc.) rather than at the picture level (e.g., global remodeling of 2D video frames as a whole, etc.).
In some operational cases, patch-based (forward) reshaping may be performed as part of 2D frame generation, in which the occupancy, geometry, and attribute data in each patch are copied into or contained by respective 2D video frames (e.g., represented as corresponding 2D arrays of sample or pixel locations, etc.), e.g., after the optimized layout of the patch packing is decided or completed prior to 2D video frame generation. In these operational cases, the encoder-side processing blocks (e.g., occupancy, geometry, and attribute video blocks, etc.) that implement or perform 2D (e.g., occupancy, geometry, and attribute) video/image frame generation of fig. 1A or 1C include or implement the encoder-side processing blocks (e.g., the occupancy, geometry, and attribute reshaping blocks of fig. 1C, etc.) that implement or perform patch-based (forward) reshaping, as described herein.
In some operational cases, patch-based (forward) reshaping may be performed prior to 2D video frame generation, in which the occupancy, geometry, and attribute data in each patch are copied into or contained by respective 2D video frames (e.g., represented as corresponding 2D arrays of sample or pixel locations, etc.), but after the optimized layout of the patch packing is decided or completed prior to 2D frame generation. In these operational cases, the encoder-side processing blocks (e.g., occupancy, geometry, and attribute video blocks, etc.) that implement or perform 2D (e.g., occupancy, geometry, and attribute) video/image frame generation operate in conjunction with, but separately from, the encoder-side processing blocks (e.g., the occupancy, geometry, and attribute reshaping blocks of fig. 1C, etc.) that implement or perform the patch-based (forward) reshaping described herein, as depicted in fig. 1C.
Fig. 3A illustrates example patch level remodeling performed by the remodel block of fig. 1C in a patch-based video encoder with respect to a plurality of (e.g., original, pre-remodeled, etc.) patches (e.g., patch 0, patch 1, patch 2, etc.). The patch-based video encoder or a remodel block therein may perform patch-based remodelling on these original patches to produce remodeled patches (labeled "remodel 0" through "remodel 2").
The patch-based video encoder or video coding blocks therein may assemble or pack these remodeled patches into 2D video/image frames by video coding operations based on specified locations and sizes in the determined layout of the 2D video/image frames. For example, an optimized layout of an original (e.g., pre-remodeled, etc.) patch may be determined prior to applying a patch-based (forward) remodelling operation to the patch. After the remodeling operation is applied, each of the resulting remodeled patches generated from remodelling the original patch may have its own remodelling parameters separate from other remodelling parameters of other ones of the resulting remodeled patches.
Video codecs implementing occupied, geometric, and attribute video coding blocks may take or receive as input 2D video/image frames containing or packaging (e.g., individually, differentially, etc.) remodeled sample data of a remodeled patch and compress the remodeled sample data of the remodeled patch into a (remodeled) 3D video signal.
As previously described, each patch as represented in the atlas or 2D video frame generated from the optimized layout represented in the atlas may have its own remodeling parameters separate from other patches or other remodeling parameters of other patches therein. These patch-based reshaping parameters may be signaled in a signal (e.g., dedicated, component, etc.) or sub-bitstream of the 3D video signal using syntax and syntax elements applicable to the video specification (e.g., newly defined, existing, reused, etc.).
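By way of illustration only, the following Python sketch shows how individually reshaped patches might be packed into a 2D frame according to a previously decided layout while per-patch reshaping parameters are collected for signaling. The data structures and field names are hypothetical and are not defined by any video coding specification.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class PackedPatch:
    # Placement decided by the (optimized) layout; reshaping parameters are
    # kept per patch so the decoder can invert each patch individually.
    x: int
    y: int
    data: np.ndarray                       # reshaped 2D patch samples (e.g., uint16)
    reshaping_params: dict = field(default_factory=dict)

def pack_patches(frame_h: int, frame_w: int,
                 patches: list[PackedPatch]) -> tuple[np.ndarray, list[dict]]:
    frame = np.zeros((frame_h, frame_w), dtype=np.uint16)
    per_patch_metadata = []
    for p in patches:
        ph, pw = p.data.shape
        frame[p.y:p.y + ph, p.x:p.x + pw] = p.data
        # Reshaping metadata travels alongside the atlas-style placement info.
        per_patch_metadata.append({"x": p.x, "y": p.y, "w": pw, "h": ph,
                                   **p.reshaping_params})
    return frame, per_patch_metadata
```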
Fig. 1D illustrates an example decoder architecture 1052 that may be implemented by a patch-based video decoder to decode (reshape) a 3D video signal based on a coding syntax (e.g., explicitly or implicitly specified in the 3D video signal, etc.) that conforms to one or more 3D video coding specifications. The patch-based video decoder may implement the decoder architecture 1052 using one or more computing devices. Example patch-based video decoders described herein may include, but are not necessarily limited to, video codecs related to any of the following: V-PCC, V3C, MPEG, AVC, HEVC, VVC, AV1, EVC, etc.
The patch-based video decoder may perform decoder-side operations in reverse order (compared to the encoder operations performed by the patch-based video encoder of fig. 1C) with additional pre-processing and/or post-processing modules/blocks. Pre-processing and/or post-processing procedures may be implemented by additional processing modules/blocks to reduce the likelihood of visual artifacts (e.g., caused or introduced by patch generation and coding on the encoder side, etc.).
As illustrated in fig. 1D, the 3D video decoding architecture (1052) includes computational program logic blocks for performing 3D video decoding operations, such as a demultiplexing block (labeled "demultiplexer") to demultiplex atlas bitstreams, occupancy, geometry and attribute video streams, geometry smoothing parameters, attribute smoothing parameters, and the like from a 3D video signal.
The 3D video decoding architecture (1052) also includes video decompression logic, such as an atlas decoding block to decode or decompress atlas data from the atlas bitstream, and occupancy, geometry, and attribute video decoding blocks to decode or decompress reshaped sample data (e.g., reshaped occupancy, geometry, and attribute data, etc.) of patches packed or represented (e.g., individually reshaped, etc.) in the 2D occupancy, geometry, and attribute data of the occupancy, geometry, and attribute video bitstreams, respectively.
The 3D video decoding architecture (1052) further includes inverse remodelling blocks, such as occupancy, geometry and attribute remodelling blocks for inversely remodelling the remodelled sample data (e.g., occupancy, geometry and attribute data, inversely remodelled), respectively.
The nominal format conversion block may be implemented in a 3D video decoding architecture (1052) to convert the inversely remodelled decoded sample data from a decoded format (e.g., as specified in an applicable video coding standard, etc.) to a nominal format or representation.
Pre-reconstruction blocks may be implemented in a 3D video decoding architecture (1052) to unpack (e.g., reverse pack, etc.), extract, generate, or reconstruct patches including 2D occupancy, geometry, and attribute patch data from inversely remodelled decoded occupancy, geometry, and attribute data based on atlas data (e.g., patch order, block-to-patch mapping, spatial transformations, such as rotation, specular reflection, scaling, etc.); the reconstruction block is to reconstruct an initial reconstructed 3D point cloud from a patch comprising 2D occupancy, geometry and attribute patch data.
The post-reconstruction block may be implemented in a 3D video decoding architecture (1052) to resolve conflicts and inconsistencies among reconstructed points generated from the patch with respect to different projection planes.
The adaptation block may be implemented in a 3D video decoding architecture (1052) to adapt an initial reconstructed 3D point cloud into an adaptive (e.g., final, completed, output, etc.) 3D point cloud based at least in part on decoded sample data (e.g., some or all geometric and attribute smoothing parameters decoded from a received 3D video signal).
As illustrated in fig. 1D, at the decoder side, patch-based (inverse) remodelling may be performed by the patch-based video decoder or inverse remodelling block therein after remodelling occupancy, geometry, and attribute data of the patch is demultiplexed, decoded, or decompressed from the received (remodelled) 3D video signal produced by the upstream patch-based video encoder. Reverse remodeling in each patch represented in a 2D video/image frame decoded from a 3D video signal may be specified or determined using video metadata (including, but not limited to, remodeling metadata including patch-based remodeling parameters) decoded or extracted from a bitstream of the 3D video signal (e.g., newly defined, existing or encoded in a reuse syntax or syntax element in a bitstream of the 3D video signal, etc.). The reverse remodeling may be followed by a reconstruction-related block of the patch-based video decoder to produce a reconstructed or reverse-remodeled patch, which may be the same as or similar to the original pre-remodeled patch. These reconstructed or inversely remodeled patches may then be used to derive or generate a reconstructed 3D point cloud.
In some operational cases, the reconstructed 3D point cloud is the same as or closely approximates the input 3D point cloud from which the upstream patch-based video encoder generated the patch packed or encoded in the 3D video signal. Additionally, optionally or alternatively, the reconstructed 3D point cloud may be represented in the same domain as the input 3D point cloud-e.g., in the same time domain as one 3D point cloud of a time instance in a time series of 3D point clouds, the same spatial coordinate system as the input 3D point cloud, the same color space, the same dynamic range or color gamut with which properties or attributes of points in the reconstructed 3D point cloud are represented.
Additionally, optionally or alternatively, the reconstructed 3D point cloud may be represented in or converted into a domain different from the domain of the input 3D point cloud. For example, the domain of the reconstructed 3D point cloud on the decoder side may differ from the domain of the input 3D point cloud on the encoder side in that the reconstructed 3D point cloud may be represented in or converted into one or more of: different spatial coordinate systems, different color spaces, different dynamic ranges or gamuts with which properties or attributes of points in the reconstructed 3D point cloud are represented.
The reconstructed 3D point cloud, whether in the same domain as the domain in which the 3D point cloud was input or not, may be further processed to render the visual image to a viewer (e.g., user, etc.).
Fig. 3B illustrates example patch level or patch-based reverse remodeling performed by the reverse remodelling block of fig. 1D in a patch-based video decoder with respect to a plurality of (forward) remodelled patches (e.g., patch 0, patch 1, patch 2, etc.).
The patch-based video decoder or a reverse remodelling block therein may perform patch-based reverse remodelling on these remodelled patches obtained from the 2D image decoding block of the patch-based video decoder to produce a reconstructed or reverse remodelled patch (labeled "Inv remodelling 0" through "Inv remodelling 2"; the same or similar to the original pre-remodelled patch on the encoder side). The patch-based video decoder or reconstruction-related blocks therein may generate a reconstructed 3D point cloud based at least in part on the inversely reshaped patch. Each of the (decoded) remodelled patches from the 3D video signal may be remodelled in reverse using its own remodelling parameters separate from the other remodelling parameters of the other remodelled patches.
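By way of illustration only, the following Python sketch shows decoder-side patch-based inverse reshaping applied with a separate inverse mapping per decoded patch. The function names and the use of inverse 1D-LUTs to represent the inverse reshaping functions are illustrative assumptions of this description.

```python
import numpy as np

def inverse_reshape_patch(reshaped: np.ndarray, inv_lut: np.ndarray) -> np.ndarray:
    """Apply a per-patch inverse-reshaping 1D-LUT to decoded integer patch samples."""
    return inv_lut[reshaped]

def inverse_reshape_all(decoded_patches: list[np.ndarray],
                        inv_luts: list[np.ndarray]) -> list[np.ndarray]:
    # Each decoded (forward-reshaped) patch is inverted with its own LUT,
    # independently of the other patches, before reconstruction proceeds.
    return [inverse_reshape_patch(p, lut)
            for p, lut in zip(decoded_patches, inv_luts)]
```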
While each individual patch, or each individual type of patch data in a patch, has been shown to be reshaped (at the encoder side) or inversely reshaped (at the decoder side), it should be noted that in various operational cases, patches and/or their patch data may be reshaped or inversely reshaped individually or collectively to some extent. In some example operational cases, reshaping and/or inverse reshaping may be applied to a single type of sample data, such as only (any) one of the occupancy, geometry, and attribute types. In some example operational cases, reshaping and/or inverse reshaping may be applied to only two types of sample data, such as any combination of two of the occupancy, geometry, and attribute types. In some example operational cases, reshaping and/or inverse reshaping may be applied to all of the occupancy, geometry, and attribute types. Additionally, optionally or alternatively, one or more patches and/or one or more types of patch data in one or more patches may be reshaped or inversely reshaped as groups.
5. Single patch remodelling operations
The (e.g., original, pre-reshaped, etc.) i-th sample/pixel value of the (e.g., original, pre-reshaped, etc.) patch data of type T (where T may be one of occupancy, geometry, or attribute) of the k-th (projected) patch of the j-th time instance (e.g., a time instance in a time instance sequence corresponding to the 3D point cloud sequence, etc.) is denoted as $v^{T}_{j,k,i}$.
All three types of patch data in the same patch may have the same spatial dimensions. For example, without loss of generality, the patch may have width and height dimensions $W_{j,k} \times H_{j,k}$, measured or represented in sample or pixel locations along the width and height dimensions. Thus, the total number of samples or pixels (or sample or pixel locations) in the k-th patch is $N_{j,k} = W_{j,k} \cdot H_{j,k}$.
Let $v^{T,\max}_{j,k}$ and $v^{T,\min}_{j,k}$ denote the maximum and minimum values of the T-type patch data in the patch. The dynamic range of the T-type patch data in the patch can then be expressed as $R^{T}_{j,k} = v^{T,\max}_{j,k} - v^{T,\min}_{j,k}$.
The patch-level reshaping function/mapping to be applied to the T-type (occupancy, geometry, attribute) patch data of the k-th patch of the j-th time instance is denoted as $F^{T}_{j,k}(\cdot)$. In some operating cases, the reshaping function/mapping may be represented as a single-channel multi-segment polynomial (e.g., a first- or second-order P-segment polynomial, etc.) or as a cross color channel reshaping function/mapping. Example multi-segment reshaping functions/mappings and cross color channel reshaping functions/mappings may be found in U.S. Provisional Patent Application No. 62/640,808, filed on March 9, 2018, the entire contents of which are hereby incorporated by reference as if fully set forth herein.
The t-th order reshaping (function/mapping) coefficient of the p-th polynomial segment in the P-segment polynomial can be denoted as $m^{T}_{j,k,p,t}$. For simplicity, all reshaping coefficients used to specify or define the reshaping function/mapping $F^{T}_{j,k}(\cdot)$ can be collectively denoted as $\{m^{T}_{j,k}\}$.
Patch data in the patch may be represented in one or more channels (e.g., occupancy value channel, depth value channel, R, G, B, Y, cb, cr, reflection property channel, color, reflectivity, surface normal, time stamp, material ID, etc.).
For illustration purposes only, a single-channel reshaping function/mapping is used as an example. The output of the single-channel reshaping function/mapping $F^{T}_{j,k}(\cdot)$, applied to the i-th sample/pixel value, can be denoted as $s^{T}_{j,k,i} = F^{T}_{j,k}\left(v^{T}_{j,k,i}\right)$. The inverse reshaping function/mapping may be denoted as $G^{T}_{j,k}(\cdot)$. The inverse reshaping function/mapping may be created to ensure (e.g., ideally, approximately, etc.) that $G^{T}_{j,k}\left(F^{T}_{j,k}\left(v^{T}_{j,k,i}\right)\right) \approx v^{T}_{j,k,i}$. The t-th order reshaping function/mapping coefficient of the p-th polynomial segment in the inverse reshaping function/mapping (or P-segment polynomial) can be denoted as $n^{T}_{j,k,p,t}$. For simplicity, all inverse reshaping coefficients used to specify or define the inverse reshaping function/mapping can be collectively denoted as $\{n^{T}_{j,k}\}$.
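By way of illustration only, the following Python sketch evaluates a single-channel P-segment polynomial reshaping function and builds an approximate inverse by backtracking it. The pivot/coefficient layout and the LUT-based inversion are illustrative assumptions of this description, and the inversion assumes the forward mapping is monotonically non-decreasing.

```python
import numpy as np

def piecewise_poly_forward(v: np.ndarray, pivots: np.ndarray,
                           coeffs: np.ndarray) -> np.ndarray:
    """Evaluate a P-segment polynomial reshaping function F(v).

    pivots: P+1 segment boundaries over the pre-reshaped codeword range.
    coeffs: (P, order+1) array; coeffs[p, t] is the t-th order coefficient
            of segment p (e.g., order = 2 for a second-order polynomial).
    """
    seg = np.clip(np.searchsorted(pivots, v, side="right") - 1, 0, len(coeffs) - 1)
    out = np.zeros_like(v, dtype=np.float64)
    for t in range(coeffs.shape[1]):
        out += coeffs[seg, t] * np.power(v.astype(np.float64), t)
    return out

def invert_via_lut(pivots: np.ndarray, coeffs: np.ndarray,
                   bit_depth: int = 10) -> np.ndarray:
    codewords = np.arange(2 ** bit_depth)
    fwd = piecewise_poly_forward(codewords, pivots, coeffs)
    # Approximate inverse by backtracking the forward mapping (assumes the
    # reshaped domain uses the same bit depth and fwd is non-decreasing).
    return np.interp(np.arange(2 ** bit_depth), fwd, codewords)
```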
Rounding of values, such as values associated with integer constraints and/or other numerical problems in video/image processing operations including reshaping and/or inverse reshaping, may introduce distortion.
The reconstructed sample/pixel value (from inverse reshaping) of the reconstructed T-type patch data (where T may be occupancy, geometry, or attribute) of the k-th patch of the j-th time instance (e.g., a time instance in the time instance sequence corresponding to the reconstructed 3D point cloud sequence, etc.) is denoted as $\hat{v}^{T}_{j,k,i}$.
Video compression distortion may be expressed as a function (e.g., a distance, an error, etc.) between the input and the reconstructed sample/pixel values. For example, the distortion or measurement introduced by these operations (e.g., receiving an input sample/pixel value, reshaping it into a reshaped sample/pixel value, generating the corresponding output sample/pixel from the inversely reshaped sample/pixel value, etc.) may be denoted $d^{T}_{j,k,i}$. The total distortion or measurement for the patch (denoted $D^{T}_{j,k}$) may be calculated or determined as follows: $D^{T}_{j,k} = \sum_{i=0}^{N_{j,k}-1} d^{T}_{j,k,i}$.
The goal of optimizing the reshaping function/mapping coefficients $\{m^{T}_{j,k}\}$ is to find optimized coefficient values $\{m^{T,\mathrm{opt}}_{j,k}\}$ such that the total distortion is optimized or minimized, as follows: $\{m^{T,\mathrm{opt}}_{j,k}\} = \arg\min_{\{m^{T}_{j,k}\}} D^{T}_{j,k}$.
The optimized coefficient values $\{m^{T,\mathrm{opt}}_{j,k}\}$ may be used to reshape the patch to produce a corresponding reshaped patch to be packed or encoded in a 2D video/image frame in the (reshaped) 3D video signal. Furthermore, the optimized coefficient values $\{m^{T,\mathrm{opt}}_{j,k}\}$ can be signaled in a bitstream of the same (reshaped) 3D video signal.
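By way of illustration only, the following Python sketch scores candidate forward/inverse LUT pairs by the total patch distortion defined above and keeps the best one. Exhaustively scoring a small candidate set is an illustrative stand-in for whatever analytic or numerical optimization an encoder may actually perform.

```python
import numpy as np

def total_distortion(v: np.ndarray, fwd_lut: np.ndarray, inv_lut: np.ndarray) -> float:
    # v -> forward reshape -> (integer codewords stand in for coding) -> inverse reshape.
    # v, fwd_lut and inv_lut are integer-valued arrays; LUTs have 2**bit_depth entries.
    s = fwd_lut[v]
    v_hat = inv_lut[s]
    return float(np.sum((v_hat.astype(np.float64) - v.astype(np.float64)) ** 2))

def pick_best_reshaping(v: np.ndarray,
                        candidate_luts: list[tuple[np.ndarray, np.ndarray]]):
    # Keep the (forward, inverse) LUT pair with the lowest total patch distortion.
    scores = [total_distortion(v, f, b) for f, b in candidate_luts]
    best = int(np.argmin(scores))
    return candidate_luts[best], scores[best]
```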
Thus, a patch may be treated like an entire image to be reshaped. Example image reshaping operations can be found in the following patents: U.S. Patent No. 10,419,762, issued September 17, 2019; and U.S. Patent No. 10,032,262, issued July 24, 2018, the entire contents of which are hereby incorporated by reference as if fully set forth herein.
The risk of banding artifacts occurring in a reconstructed patch may be measured in each of a plurality of luma sub-ranges (or luma bins) in the overall luma (e.g., brightness, etc.) range. The entire luma range comprising the plurality of luma sub-ranges may be used to code or represent luma sample or pixel values in the (e.g., attribute, etc.) patch data of the pre-reshaped patch. The measurement of the banding artifact risk in each luma sub-range may be used to determine or estimate a respective total number of (luma) codewords to be allocated to that luma sub-range for coding the reshaped patch when reshaping the pre-reshaped patch. The respective total numbers of (luma) codewords to be allocated to all the luma sub-ranges can then be used to construct a mapping curve, e.g., by curve smoothing.
By way of example and not limitation, the measurement of the banding artifact risk in each luma sub-range may be based on a block-based standard deviation (labeled BLKSTD), as computed in the process flow described below.
The P-segment polynomial may be represented as a simple one-dimensional look-up table (1D-LUT), denoted $L^{T}_{j,k}$. In some operational cases, the block-based standard deviation BLKSTD may be calculated on sample or pixel values that have been normalized from the pre-reshaped domain to the normalized reshaped domain $[0, 1]$.
Fig. 4A illustrates an example process flow for performing a remodeling operation to avoid or reduce banding artifacts in a patch, according to an example embodiment of the invention. In some example embodiments, one or more computing devices or components may perform this process flow.
In some operational cases, each patch (e.g., the k-th patch of the j-th time instance) may be partitioned into a plurality of non-overlapping U×U sample/pixel blocks. The m-th block in the patch is denoted as $B_{j,k,m}$.
In block 402, a patch-based video encoder (e.g., 1002 of fig. 1C, etc.) performs initialization to create a plurality of (e.g., uniform, non-uniform, etc.) non-overlapping codeword bins (e.g., luma sub-ranges or bins, non-luma sub-ranges or bins, etc.) with a (e.g., fixed, etc.) interval θ over the entire range of codeword values available for encoding or representing the (pre-reshaped) codewords of a given type of patch data (e.g., luma channel, non-luma channel, occupancy value channel, geometry value channel, etc.) in the patch. The minimum required codewords (denoted $\tau_x$) to be allocated, in the reshaped patch, for encoding or representing the portion of the (e.g., original, pre-reshaped, etc.) patch corresponding to each codeword bin x may be initialized to 0, as follows: $\tau_x = 0$. The bin count (denoted $c_x$) used to record the total number of pre-reshaped codewords (e.g., counts, occurrences, etc.) of the given type of patch data in the patch that fall into codeword bin x may also be initialized to 0, as follows: $c_x = 0$.
As used herein, the term "minimum required codeword" may refer to the number of codewords (e.g., integer number, etc.) required to minimize the perceived error in the reshaped codeword generated from the reshaped input codeword. In some operating cases, the minimum required codewords may be represented by a relative number (e.g., fractional value, etc.), such as a ratio of the number of codewords required to minimize the perceived error divided by the total number of codewords available in their color space or channel. When the total number of available codewords is represented as a bit depth (e.g., 8 bits, 10 bits, 12 bits, etc.), the smallest desired codeword or total number of bits (or bit depth portions) to be allocated for each input codeword cell or each codeword of an input codeword cell may be represented as a fractional bit depth.
The reshaping function/mapping used to reshape the (pre-reshaped) codewords of the (pre-reshaped) patch into reshaped codewords of the reshaped patch may be referred to as, for example, a codeword or luma transfer function. The overall bit depth used to allocate the total number of reshaped codewords in a codeword channel, and thus to generate the transfer function, may be referred to as the transfer function bit depth.
In block 404, the patch-based video encoder calculates the mean and standard deviation (BLKSTD) of each block m, denoted $\mu_{j,k,m}$ and $\sigma_{j,k,m}$, respectively, as follows:

$\mu_{j,k,m} = \frac{1}{U^2} \sum_{i \in B_{j,k,m}} v^{T}_{j,k,i} \qquad (3)$

$\sigma_{j,k,m} = \sqrt{\frac{1}{U^2} \sum_{i \in B_{j,k,m}} \left( v^{T}_{j,k,i} - \mu_{j,k,m} \right)^2} \qquad (4)$
In block 406, the patch-based video encoder determines, for each sample/pixel (value) in each block m, which codeword bin the sample/pixel (value) belongs to, for example by using the sample/pixel (value) to determine the corresponding bin index x of the codeword bin to which it belongs, as follows: $x = \left\lfloor v^{T}_{j,k,i} / \theta \right\rfloor$.
The bin count of the determined codeword bin may then be incremented as follows: $c_x = c_x + 1$.
In block 408, the patch-based video encoder assigns a new or current value to the minimum required codewords of the codeword bin by adding or accumulating the block-based standard deviation of the sample/pixel block to the previous value for that codeword bin, as follows: $\tau_x = \tau_x + \sigma_{j,k,m}$.
The standard deviation calculated by expressions (3) and (4) for a block represents the (masking) noise level of the block. It should be noted, however, that the masking noise level described herein may be calculated or measured with standard deviation values or with non-standard-deviation values. Example masking noise calculations can be found in the following patents: U.S. Patent No. 10,701,375, issued June 30, 2020; and U.S. Patent No. 10,701,404, issued June 30, 2020, the entire contents of which are hereby incorporated by reference as if fully set forth herein.
In block 410, the patch-based video encoder calculates the average of the accumulated standard deviations for each bin x, as follows: $b_x = \tau_x / c_x$.
In block 412, the patch-based video encoder applies a particular masking-noise-to-bit-depth mapping function (labeled q) to the per-bin masking noise level (e.g., the bin average $b_x$ of the block-based standard deviations, or another type of masking noise measure, etc.) to determine or find the corresponding minimum required codewords, as follows: $Q_x = q(b_x)$.
example mapping functions that map masking noise to the (minimum) bit depth of an input codeword bin, such as an input luma bin, or the minimum required (reshaped) codeword, can be found in the previously mentioned U.S. patent nos. 10,701,375 and 10,701,404.
In some operational cases, post-processing operations may be performed by the patch-based video encoder to generate the patch-level reshaping function/mapping, as described herein. For example, the patch-based video encoder may determine the unused codewords of the entire codeword range (e.g., normalized to one (1), etc.), for example as $u = 1 - \sum_{x} Q_x$, and use a power function to evenly, uniformly, or non-uniformly distribute the unused codewords of the entire codeword range to the minimum required codewords of all codeword bins or sub-ranges, for the purpose of improving the use of reshaped codewords in these bins or sub-ranges. The minimum required codewords (e.g., after the unused codewords are distributed, etc.) as determined for some or all codeword bins may then be used to generate the patch-level reshaping function/mapping.
Additionally, optionally or alternatively, in some operational cases, a smoothing operation may be performed to smooth the patch-level remodelling function/map to produce a smoothed patch-level remodelling function/map that may be used to remodel or map a remodelled codeword of a patch to a remodelled codeword of the remodelled patch.
Example operations of constructing a reshaping function/map from the smallest desired codeword in a bin and smoothing the constructed reshaping function/map can be found in the previously mentioned U.S. patent No. 10,701,375 and U.S. patent No. 10,701,404.
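By way of illustration only, the following Python sketch follows blocks 402 through 412 and the post-processing step above to build a forward-reshaping 1D-LUT for one patch: it bins the pre-reshaped codewords, averages the masking noise per bin, maps noise to a per-bin codeword budget with a hypothetical q(), distributes the unused budget, and integrates the result into a monotonic LUT. The specific q(), the equal distribution of unused codewords, and the assumption that the bin count divides the codeword range are choices made for this sketch only.

```python
import numpy as np

def build_patch_reshaping_lut(patch: np.ndarray, noise: np.ndarray,
                              n_bins: int = 64, bit_depth_in: int = 10,
                              bit_depth_out: int = 10) -> np.ndarray:
    """Forward-reshaping 1D-LUT for one patch from its per-pixel noise mask.

    `noise` is a per-pixel masking-noise estimate (e.g., a block-based
    standard deviation). Assumes n_bins divides 2**bit_depth_in.
    """
    in_range = 2 ** bit_depth_in
    theta = in_range / n_bins                               # bin interval
    bins = np.minimum((patch.ravel() / theta).astype(int), n_bins - 1)
    counts = np.bincount(bins, minlength=n_bins).astype(np.float64)
    tau = np.bincount(bins, weights=noise.ravel(), minlength=n_bins)
    avg_noise = np.divide(tau, counts, out=np.zeros(n_bins), where=counts > 0)

    # Hypothetical q(): noisier bins tolerate coarser quantization, so they
    # require a smaller fraction of the reshaped codeword range.
    required = np.where(counts > 0, 1.0 / (1.0 + avg_noise), 0.0) / n_bins
    unused = max(1.0 - required.sum(), 0.0)                 # leftover fraction
    required[counts > 0] += unused / max(int((counts > 0).sum()), 1)
    required /= max(required.sum(), 1e-12)                  # guard against rounding

    # Integrate per-bin budgets into a monotonic mapping curve (the 1D-LUT).
    per_codeword = np.repeat(required / theta, int(theta))
    cdf = np.concatenate(([0.0], np.cumsum(per_codeword)))[:in_range]
    return np.round(cdf * (2 ** bit_depth_out - 1)).astype(np.uint16)

# Usage with a crude stand-in for a real noise mask.
rng = np.random.default_rng(1)
patch = rng.integers(0, 1024, size=(64, 48))
noise = np.full(patch.shape, patch.std())
lut = build_patch_reshaping_lut(patch, noise)
```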
6. Multi-patch remodeling optimization
Depending on different optimization objectives, different solutions may be implemented to determine the remodeling parameters for each of the plurality of patches. In different operating cases, multiple patches, or patch data thereof, may be remodeled individually or collectively.
In some operational cases, the optimization objective may be to achieve individual (optimized) banding mitigation for individual patches. To achieve this, a solution may be implemented to reduce or avoid banding artifacts in the texture of the (e.g., final, reconstructed, etc.) 3D point cloud by applying reshaping functions or mappings that are (e.g., sufficiently, optimally, etc.) optimized to reduce or avoid banding artifacts in the texture of the individual patches used to reconstruct the 3D point cloud. Independent or individual reshaping functions/mappings may be used to reshape different patches, or their patch data, based on different codeword mappings or allocations for the different patches (e.g., locally, individually, etc.). Thus, an overall reshaping function/mapping need not be used or applied to all spatial regions or patches, or their patch data, represented in the entire 2D video/image frame. Rather, a patch-based reshaping or patch-level banding mitigation solution may be implemented in an adaptive manner for each (local) patch represented in a 2D video/image frame, or its patch data, thereby enabling relatively efficient codeword use and relatively high performance improvements.
Additionally, optionally or alternatively, the optimization objective may be to achieve (optimized) banding mitigation for multiple patches as a group. To achieve this, a solution may be implemented to reduce or avoid all banding artifacts in the texture of the (e.g., final, reconstructed, etc.) 3D point cloud. A set of per-group reshaping functions/mappings may be generated concurrently or together as a group and used to reshape all the individual patches in the same group, or their patch data. In some operational cases, the reshaping functions/mappings may be optimized using a weighted distortion (for the group) calculated as a weighted sum of the individual patch-level distortions, as follows: $D^{T}_{j} = \sum_{k=0}^{K-1} w^{T}_{j,k} \, D^{T}_{j,k}$
where $w^{T}_{j,k}$ denotes the individual weighting factor assigned to the k-th individual patch in the group, or to its T-type patch data. The higher the weighting factor, the more important the patch to which the weighting factor is assigned. The individual importance of the individual patches may be set based on one or more importance selection factors including, but not limited to, any, some, or all of the size, location, depth, texture, occupancy, etc., of these patches.
The reshaping coefficients of the reshaping functions/mappings, denoted $\{m^{T}_{j,k}\}$ for k = 0, 1, ..., K-1, may be generated by solving an optimization problem to minimize the group-level distortion, as follows:

$\{m^{T,\mathrm{opt}}_{j,k}\}_{k=0}^{K-1} = \arg\min_{\{m^{T}_{j,k}\}_{k=0}^{K-1}} \sum_{k=0}^{K-1} w^{T}_{j,k} \, D^{T}_{j,k} \qquad (8)$
The optimization problem represented or formulated in expression (8) may or may not have a closed-form solution. A relatively simple solution is to first construct the (pre-adjusted) individual reshaping functions for the individual patches in the group, or for their patch data. Depending on the individual importance of the individual patches, different weighting factors (or different weighting factor values) may be assigned to the patches. The (pre-adjusted) reshaping function of an individual patch may then be adjusted or readjusted to produce an adjusted reshaping function for that patch, for example via simple scaling as follows:

$\tilde{L}^{T}_{j,k} = \alpha^{T}_{j,k} \cdot L^{T}_{j,k} + \beta^{T}_{j,k} \qquad (9)$
where $\tilde{L}^{T}_{j,k}$ denotes the adjusted 1D-LUT representing the adjusted reshaping function for reshaping the T-type patch data of the k-th patch in the group; $L^{T}_{j,k}$ denotes the pre-adjusted 1D-LUT representing the pre-adjusted reshaping function for reshaping the T-type patch data in the group; $\alpha^{T}_{j,k}$ denotes a scaling or multiplication factor; and $\beta^{T}_{j,k}$ denotes an offset.
In some operational cases, the scaling factor $\alpha^{T}_{j,k}$ may be determined based on one or more redistribution methods according to: (1) the weighting factors $w^{T}_{j,k}$ assigned to the different patches; and/or (2) the individual overall dynamic ranges $R^{T}_{j,k}$ determined respectively from the codeword distributions of the different patches.
In an example, a first redistribution method based on simple normalized weighting may be used. Under this approach, the weighting factors (e.g., which may be specified by a user such as a designated or authorized user, etc.) may be constrained or normalized such that $\sum_{k=0}^{K-1} w^{T}_{j,k} = 1$.
In some operating cases, the scaling factor $\alpha^{T}_{j,k}$ used for adjusting the (pre-adjusted) reshaping function/mapping into the (adjusted) reshaping function/mapping can simply be set to the normalized weighting factor $w^{T}_{j,k}$.
In another example, a second redistribution method based on normalized weighting with discounted inverse scaling may be used. Under this approach, the original content complexity may be considered. Each of the scaling factors includes two (constituent) parts. The first part is related to the original (overall) dynamic range of the patch, which is regarded as a representation or proxy of the original content importance/complexity of the patch. The second part of the scaling factor is the weighting factor $w^{T}_{j,k}$.
The full-dynamic-range-based adjustment may be reduced or attenuated for patches with relatively low original content importance/complexity. More specifically, except in edge cases, when adjusting the reshaping function/mapping, the original dynamic range of a patch may be scaled to the full range, which includes all allowable codeword values (available for coding all possible patches), independent of the particular codewords actually used to represent or code the patch. Edge cases can occur when a small original dynamic range of the patch data in a patch is scaled up too much in order to cover the full range, resulting in excessive bits being consumed to code a visually relatively unimportant patch. To avoid or handle these edge cases, the scaling in these edge cases may be partially or completely reduced, attenuated, or suppressed.
In some operational cases, the scaling factor $\alpha^{T}_{j,k}$ can be discounted or reduced to some extent by introducing an additional factor γ (> 1) that aims to avoid excessive adjustment of the reshaping function/mapping by the first part of the scaling factor; the adjustment depends on the ratio of the original dynamic range $R^{T}_{j,k}$ of the T-type patch data in the patch to the full dynamic range $2^{B}$ with bit depth B.
In some operational cases, the offset $\beta^{T}_{j,k}$ in expression (9) above may simply be set to 0, or to a value that shifts the respective center of each individual reshaping function (each of which is used to reshape one of the plurality of patches, or its T-type patch data) to the respective midpoint of the codeword range (of the T-type patch data of the plurality of patches) for better compression performance.
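By way of illustration only, the following Python sketch adjusts a group of pre-adjusted per-patch forward LUTs according to expression (9), using normalized weights as scaling factors and an offset that re-centers each adjusted LUT in the codeword range. The particular normalization (keeping the average scale near one) is an assumption of this sketch rather than a requirement of the description above.

```python
import numpy as np

def adjust_group_luts(luts: list[np.ndarray], weights: list[float],
                      bit_depth: int = 10) -> list[np.ndarray]:
    """Rescale per-patch forward-reshaping 1D-LUTs for group-level adjustment."""
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()                                  # normalized weighting factors
    full = 2 ** bit_depth - 1
    adjusted = []
    for lut, wk in zip(luts, w):
        scale = wk * len(w)                          # keep the average scale near 1
        scaled = lut.astype(np.float64) * scale
        # Offset per expression (9): re-center the adjusted LUT in the codeword range.
        offset = full / 2.0 - (scaled.max() + scaled.min()) / 2.0
        adjusted.append(np.clip(np.round(scaled + offset), 0, full).astype(np.uint16))
    return adjusted
```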
In some operational cases, additional approximation may be performed on the (e.g., adjusted, polynomial, etc.) coefficients $\{m^{T}_{j,k}\}$, or on a 1D-LUT representing a reshaping function specified with these coefficients, for example using the approximation algorithms described in the previously mentioned U.S. Patent No. 10,701,375 and U.S. Patent No. 10,701,404.
In some operating cases, an inverse reshaping function/mapping corresponding to a (forward) reshaping function/mapping may be constructed or structured by reversing or backtracking the reshaping function. Example constructions or approximations of inverse reshaping functions from the corresponding forward reshaping functions may be found in U.S. Patent No. 10,080,026, issued September 18, 2018, the entire contents of which are hereby incorporated by reference as if fully set forth herein.
7. Scene-based cases
Patch-based reshaping described herein may be applied to reshape patches, or their patch data, in a relatively flexible manner. By way of example and not limitation, a (time) sequence of consecutive 3D point clouds at a plurality of consecutive times (or time instances), respectively, may depict the same visual scene over those consecutive times (or time instances). The (time) sequence of consecutive 3D point clouds may be projected by 2D projection to generate a plurality of consecutive (time) sequences of patches (e.g., patches, patches in tiles, etc.) at the plurality of consecutive times (or time instances), respectively.
Each successive patch sequence of the plurality of successive patch sequences may depict a visual sub-scene with respect to a corresponding 2D projection (e.g., a 2D projection plane, a camera logically or physically positioned at the 2D projection plane, etc.), and may share the same or substantially the same patch position. For example, if the overlap or intersection between two patches at two time instances exceeds a minimum patch overlap threshold (e.g., 90%, 80%, etc.), two patches at two different time instances in the same patch sequence may be considered to share the same or substantially the same patch location.
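By way of illustration only, the following Python sketch applies the minimum patch overlap test described above to decide whether two patches at different time instances share the same or substantially the same patch position. The bounding-box representation and the overlap ratio relative to the smaller patch are illustrative choices of this sketch.

```python
def patch_overlap_ratio(a: tuple[int, int, int, int],
                        b: tuple[int, int, int, int]) -> float:
    """Overlap of two patch bounding boxes (x, y, w, h), relative to the smaller one."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    return (ix * iy) / min(aw * ah, bw * bh)

def same_patch_position(a, b, min_overlap: float = 0.9) -> bool:
    # Two patches at different time instances are treated as sharing the same
    # patch position when their overlap exceeds the minimum overlap threshold.
    return patch_overlap_ratio(a, b) >= min_overlap
```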
At the decoder side, the decoder may be driven by the video metadata received with the patch-based video/image data in the 3D video signal. For example, the reshaping operations of patch-based reshaping may be performed by the patch-based video decoder with respect to a consecutive patch (time) sequence, as guided by reshaping metadata in the video metadata signaled by the upstream patch-based video encoder in the (reshaped) 3D video signal. Thus, regardless of whether the upstream patch-based video encoder decides to use different reshaping operational parameters for different (e.g., temporally adjacent, consecutive, etc.) patches in the consecutive patch sequence, or decides to reuse the same reshaping operational parameters for those patches, the patch-based video decoder may still perform the reshaping operations, as guided by the reshaping metadata, in the same or substantially the same manner. For example, based on the applicable coding syntax, the patch-based video decoder may receive a reuse flag in the video metadata, or in the reshaping metadata therein (e.g., indicated in the 3D video signal generated by the upstream patch-based video encoder, etc.), and may simply continue to use the same reshaping operational parameters in the decoding processing loop to inversely reshape the current or subsequent patches in the consecutive patch sequence.
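By way of illustration only, the following Python sketch shows how a decoder might consume per-patch reshaping metadata that carries a reuse indication. The field names, including reuse_flag, are hypothetical stand-ins for whatever syntax elements the bitstream actually defines.

```python
def apply_reshaping_metadata(patches, metadata_stream, last_params=None):
    """Pair each decoded patch with the reshaping parameters it should use.

    When the (hypothetical) reuse_flag is set, the previously received
    reshaping parameters are reused for the current patch; otherwise the
    parameters carried in the current metadata entry take effect.
    """
    out = []
    for patch, meta in zip(patches, metadata_stream):
        if meta.get("reuse_flag", False) and last_params is not None:
            params = last_params
        else:
            params = meta["reshaping_params"]
            last_params = params
        out.append((patch, params))
    return out, last_params
```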
At the encoder side, a patch-based video encoder may determine whether to perform, for example, patch-level remodeling on different individual patches in a sequence of consecutive patches with different remodeling operating parameters on an individual patch basis, or whether to perform remodeling on a sequence of consecutive patches with the same remodeling operating parameters on an individual scene basis, or to perform remodeling on a subset or subdivision of a sequence of consecutive patches with the same remodeling operating parameters on an individual subset or subdivision basis, or the like.
In some operational cases, scene-based reshaping may be performed to help achieve relatively high coding gain or relatively fast coding performance. For each patch sequence that shares the same or substantially the same patch position over the multiple times (or time instances in the time domain) of the same scene, a single reshaping function or mapping may be constructed by the patch-based video encoder. Because the reshaping function/mapping is stable for the entire scene, this approach also facilitates or helps perform motion estimation and compensation across different video frames (e.g., different patches, different 2D images/patches, etc.), e.g., in a relatively predictable and stable manner.
In some cases, different patches have different video, image, or codeword content and are reshaped with different reshaping functions/mappings. Intra- or inter-frame prediction (or motion estimation) may not produce meaningful motion vectors that cross the patch boundaries separating patches of one consecutive patch sequence in a scene from patches of other consecutive patch sequences in the same scene, especially for those meaningful pixels in a patch (e.g., projected from at least one point in the 3D point cloud, etc.).
Tile-based encoding may be performed by the patch-based video codecs described herein for the purpose of separating intra- or inter-frame prediction (or motion estimation) for different patches or different patch sequences in the same scene. Video tiles may be independent coding units defined by an applicable video or image coding specification and may be referred to as slices, partitions, tiles, sub-pictures, and so forth. Different patches may be specified to be represented in different tiles, such as temporal Motion Constrained Tile Sets (MCTS) in HEVC, sub-pictures in VVC, and so on. Intra- or inter-frame prediction (or motion estimation) for different patches in the same scene may then be performed independently of one another. In some operational cases, patches enclosed in their respective bounding boxes in the form of tiles may be independently reshaped. In these operational cases, the reshaping of a patch enclosed in a tile, based on the reshaping operational parameters specifically selected for that patch, is limited to within the tile boundaries of that tile. The reshaping of other patches outside the tile boundaries of that tile is based on other reshaping operational parameters specifically selected for those other patches.
Example tiles may include, but are not necessarily limited to, any of the following: an intra atlas tile type or I_TILE (a tile that is fully decodable using only information in the tile, e.g., not inter-predicted, possibly intra-predicted, etc.); an inter atlas tile type or P_TILE (a tile that is decodable using information in the tile and information inter-predicted from one or more other tiles); a SKIP atlas tile type or SKIP_TILE (all of the tile information for this tile is copied directly from another tile with the same ID); etc.
Example patches may be represented or coded in a tile in a patch mode (or patch coding mode), such as a patch SKIP mode or P_SKIP (all patch information for this patch is copied directly from another patch with the same ID); a patch MERGE mode or P_MERGE (a patch that is decodable using information in the patch together with information inherited or predicted from another patch); an INTER-predicted patch mode or P_INTER (a patch that is decodable using information in the patch and information inter-predicted from one or more other patches); a non-predicted or possibly intra-predicted patch mode or P_INTRA; a RAW patch mode or P_RAW (for storing non-projected points); an EOM point patch mode or P_EOM; etc.
Tile-based encoding may be performed as an in-loop or out-of-loop video/image processing operation. As used herein, in-loop video/image processing operations operate within the encoding and decoding loop. In some operational cases, video/image processing operations, which may include, but are not limited to, tile setup and reshaping, may be performed as out-of-loop operations to help improve coding efficiency and avoid visual artifacts caused by applying different reshaping functions/mappings to adjacent patches in a sequence of adjacent patches (e.g., having adjacent patch positions, etc.).
Fig. 3C illustrates an example tile-based encoding of a time sequence of a patch set according to a time sequence of an atlas (or atlas sequence) encoded in an atlas bitstream of a 3D video signal.
The atlas information parameters and values of the patch sets in the time series of patch sets may be specified or defined in the respective atlas in the atlas sequence. The atlas information parameters and values in the respective atlas identify the packing order and block-to-patch mapping of patches in the optimally laid out patch set for packing patches in the patch set in the 2D video/image frame.
Data or information redundancy, as packed from an atlas in the atlas sequence and as encoded in the set of component 2D video signals or sub-bitstreams of the 3D video signal, may be reduced, compressed, or predicted in a manner similar to how video data redundancy is reduced, compressed, or predicted via motion vectors. A patch may reference another patch in the same 2D video/image frame (e.g., intra-frame, etc.) or in a previous or later 2D video/image frame (e.g., inter-frame, etc.) for the purpose of copying or reusing some or all of the patch information of the referenced patch. Example patch information described herein may include, but is not necessarily limited to, any, some, or all of the following: atlas information, patch-specific video metadata portions, and patch-specific reshaping metadata portions, in the atlas bitstream of the 3D video signal, in other bitstreams of the 3D video signal, and so forth.
As shown in fig. 3C, the atlas information parameters and values in the atlas may specify or define an atlas frame height (labeled "asps_frame_height"; where "asps" represents the "atlas sequence parameter set" defined for one or more atlas sequences) and an atlas frame width (labeled "asps_frame_width"). Patches may be assembled or packaged into an atlas by means of a 2D partition array. For example, the atlas frame height may be partitioned into (linear) height partitions labeled "partitionHeight [0]" to "partitionHeight [3]" and the atlas frame width may be partitioned into (linear) width partitions labeled "partitionWidth [0]" to "partitionWidth [3 ]". Thus, the 2D partitioned array may be formed as a rectangle or rectangular spatial region in the atlas.
Patches derived from the 3D point cloud may be represented in tiles or bounding boxes. As illustrated in fig. 3C, some or all of the partitions in the 2D partition array in the atlas may be assigned to form tiles (labeled "tile 0" through "tile 6"), which represent bounding boxes of patches. Zero or more partitions (labeled "unassigned") in the 2D partition array may not be assigned to any tile. For example, four 2D partitions in the 2D partition array may be assigned to form the tile "tile 0", having a horizontal width labeled "tileWidth[0]" and a vertical height labeled "tileHeight[0]".
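By way of illustration only, the following Python sketch derives a tile's position and size from a run of atlas partitions, in the spirit of the partitionWidth[]/partitionHeight[] and tileWidth[]/tileHeight[] labels of fig. 3C. The function signature and the assumption that a tile is formed from a rectangular run of partitions are illustrative.

```python
def tile_bounds(partition_widths, partition_heights, col0, row0, n_cols, n_rows):
    """Bounding box of a tile assembled from a rectangular run of partitions.

    partition_widths/heights play the role of partitionWidth[]/partitionHeight[]
    in the atlas; (col0, row0) is the top-left partition of the tile and
    n_cols x n_rows is the run of partitions assigned to it.
    """
    x = sum(partition_widths[:col0])
    y = sum(partition_heights[:row0])
    tile_width = sum(partition_widths[col0:col0 + n_cols])
    tile_height = sum(partition_heights[row0:row0 + n_rows])
    return x, y, tile_width, tile_height

# Usage: "tile 0" formed from a 2x2 block of partitions at the atlas origin.
widths, heights = [256, 256, 256, 256], [128, 128, 128, 128]
print(tile_bounds(widths, heights, col0=0, row0=0, n_cols=2, n_rows=2))
```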
Each of the tiles specified in the atlas may be assigned to carry mapping information (e.g., block-to-patch mapping, etc.) for a corresponding one of the patches to be assembled or packed into a 2D video/image frame (e.g., an occupancy map/frame, a geometry map/frame, an attribute map/frame, etc.) according to the atlas. For example, each tile (e.g., one of "tile 0" through "tile 6", etc.) may carry the mapping information for the patch assigned to that tile.
The atlas parameters and values in the atlas encoded in the atlas bitstream may be used by the patch-based video encoder or an atlas coding block therein to specify or define an explicit or implicit partition/patch order and partition height and partition width of an array of partitions in the atlas (e.g., as part of a high-level parameter set).
Each of tiles 0 through 6, or each of the patches assigned thereto, may be independently decoded (e.g., with random access, etc.) by a recipient device of the 3D video signal without reference to other tiles. Each of tiles 0 through 6, or each of the patches assigned thereto, can be accessed using only the video data and video metadata in that tile. Thus, some or all of the patch-based video/image processing operations may be performed in parallel.
Tile-based encoding enables parallelization and random access to particular portions or subsets of an input 3D point cloud represented by projected or non-projected points in patches (e.g., projected patches, raw and/or EOM patches, etc.). For example, to decode a tile assigned a patch whose patch data depicts a human head, it may only be necessary to decode the information represented in that tile, without referencing other tiles assigned other patches whose patch data depict other visual features/objects.
8. Patch-based remodelling syntax
Coding syntaxes and syntax elements for coding or decoding a 3D video signal may include reshaping (related) syntaxes and syntax elements. The reshaping operational parameters defined or specified with the reshaping syntaxes and syntax elements may be carried or encoded in a separate bitstream, or alternatively integrated into an existing bitstream (e.g., the atlas bitstream in the 3D video signal). Example video coding syntaxes and syntax elements at various levels can be found in U.S. Patent No. 10,136,162, issued November 20, 2018, the entire contents of which are hereby incorporated by reference as if fully set forth herein.
The remodelling syntax and syntax elements may be specified or defined as part of the applicable coding syntax specifications-or extended from the applicable coding syntax specifications to encode remodelling operating parameters into and derive these operating parameters from the 3D video signal.
Fig. 3D illustrates, at a relatively high level, example applicable coding syntax specifications, such as the V3C and V-PCC coding syntax specifications. Such a specification may specify or define several high-level syntaxes (HLS) applicable to high-level coding data constructs at the frame or sequence level as a whole. These HLSs may include, but are not necessarily limited to, any, some, or all of the following: a V-PCC parameter set (VPS) comprising high (e.g., highest, etc.) level V-PCC parameters; a Sequence Parameter Set (SPS) comprising sequence level syntax elements (e.g., atlas SPS parameters or ASPS, etc.) applied to the sequence as a whole; a Frame Parameter Set (FPS) including frame-level syntax elements (e.g., atlas FPS parameters or AFPS, etc.) applied to the frame as a whole; a Picture Parameter Set (PPS) including picture level syntax elements applied to a picture/image as a whole; an Adaptive Parameter Set (APS) comprising operating parameters for adaptation (e.g., atlas APS parameters or AAPS, etc.); a Picture Header (PH) comprising frame-level header parameters; Supplemental Enhancement Information (SEI) including syntax elements for signaling SEI messages at a frame level or higher; etc.
Additionally, optionally or alternatively, the applicable coding syntax specification may specify or define a number of low-level syntaxes applicable to relatively low-level coding data constructs at the subframe level. These low-level syntaxes may include, but are not necessarily limited to, any, some, or all of the following: a Slice Header (SH) including syntax elements to code the slice header; a Tile Header (TH) comprising syntax elements to code the tile header; a Data Unit (DU) syntax comprising syntax elements to code audio or video data; etc.
As previously described, the 3D video signal may include a coded (e.g., V3C, etc.) video component or a set of component 2D video signals or timing signals. The component 2D video signal may be a coded bitstream or video stream that is used to carry atlas data, occupancy patch data, geometry patch data, attribute patch data, original and/or EOM patch data, and so forth.
As illustrated in fig. 3D, in some operational cases, a 3D video signal may be formatted or coded into a stream of V-PCC (data) units according to a V-PCC video coding specification. The V-PCC unit stream comprises a sequence of V-PCC (data) units that is decodable by a recipient device of the 3D video signal along a given decoding order. These V-PCC units may be of various V-PCC or V3C (data) unit types identified by different unique numerical identifiers (labeled "vuh_unit_type") (VUH). These unique numerical identifiers correspond to different identifiers (labeled "identifiers") enumerated for different types of data units formatted or encoded in the 3D video signal according to the V-PCC video coding specification.
More specifically, data units specifying a V-PCC parameter set may be identified by an identifier or tag of "V3C-VPS" and carried in a V-PCC unit having a value of 0 "vuh_unit_type". The data units of the atlas data specifying the atlas may be identified by an identifier or tag of "V3C-AD" and carried in V-PCC units with a value of 1 "vuh_unit_type". The data units encapsulating the occupied patch data (or occupied video data) may be identified by an identifier or tag of "V3C-OVD" and carried in a V-PCC unit having a value of 2 "vuh_unit_type". The data units encapsulating the geometric patch data (or geometric video data) may be identified by an identifier or tag of "V3C-GVD" and carried in V-PCC units having a value of 3 "vuh_unit_type". The data units encapsulating the attribute patch data (or attribute video data) may be identified by an identifier or tag of "V3C-AVD" and carried in a V-PCC unit having a value of 4 "vuh_unit_type".
V-PCC data units in a 3D video signal in V-PCC unit stream format as depicted in fig. 3D may constitute, and logically represent, a number of V-PCC or V3C sample streams. These sample streams may include a sample stream of V-PCC parameter sets (identified by identifier "v3c_vps"), a sample stream of atlas data (identified by identifier "v3c_ad"), a sample stream of occupancy patch/video data (identified by identifier "v3c_ovd"), a sample stream of geometry patch/video data (identified by identifier "v3c_gvd"), a sample stream of attribute patch/video data (identified by identifier "v3c_avd"), and so forth.
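By way of illustration and not limitation, the following Python sketch maps the "vuh_unit_type" values 0 through 4 described above to the corresponding unit types and groups the units of a V-PCC unit stream into per-type sample streams. The (type, payload) tuples and the demultiplexing loop are simplifying assumptions; the normative unit syntax is defined by the V-PCC/V3C specification.

```python
# Simplified demultiplexer sketch; the real V-PCC/V3C unit syntax carries
# additional header fields that are omitted here.
from enum import IntEnum
from collections import defaultdict

class VuhUnitType(IntEnum):
    V3C_VPS = 0   # V-PCC/V3C parameter set
    V3C_AD = 1    # atlas data
    V3C_OVD = 2   # occupancy video data
    V3C_GVD = 3   # geometry video data
    V3C_AVD = 4   # attribute video data

def demux(units):
    # Group the units of a V-PCC unit stream into per-type sample streams.
    streams = defaultdict(list)
    for vuh_unit_type, payload in units:
        streams[VuhUnitType(vuh_unit_type)].append(payload)
    return streams

units = [(0, b"vps"), (1, b"atlas"), (2, b"occ"), (3, b"geo"), (4, b"attr")]
for kind, payloads in demux(units).items():
    print(kind.name, len(payloads))
```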
An instance of a sample stream may be further split into smaller units, referred to as sub-bitstreams.
More particularly, as illustrated in fig. 3D, a sample stream of data units encapsulating V-PCC parameter sets in a 3D video signal may be split into sub-bitstreams, each of which includes: (1) A VPS data unit header formatted or coded using an applicable syntax or syntax element identified by an identifier "v3c_vps"; and (2) a VPS data unit payload, which is formatted or coded using an applicable syntax or syntax element labeled "v3c_parameter_set ()".
A sample stream or an atlas stream of atlas data in a 3D video signal may be split into sub-bitstreams, each of which comprises: (1) An atlas data unit header, which is formatted or coded using an applicable syntax or syntax element identified by the identifier "v3c_ad"; and (2) an atlas data unit payload, which is formatted or coded using an applicable syntax or syntax element labeled "atlas_sub_stream ()".
A sample stream or an occupancy stream of occupancy data in a 3D video signal may be split into sub-bitstreams, each of which includes: (1) An occupancy data unit header formatted or coded using an applicable syntax or syntax element identified by the identifier "v3c_ovd"; and (2) an occupancy data unit payload, which is formatted or coded using an applicable syntax or syntax element labeled "video_sub_stream ()".
A sample stream or a geometric stream of geometric data in a 3D video signal may be split into sub-bitstreams, each of which includes: (1) A geometry data unit header formatted or coded using an applicable syntax or syntax element identified by the identifier "v3c_gvd"; and (2) a geometry data unit payload, which is formatted or coded using a "video_sub_stream ()" syntax or syntax element.
A sample stream or an attribute stream of attribute data in a 3D video signal may be split into sub-bitstreams, each of which includes: (1) An attribute data unit header formatted or coded using an applicable syntax or syntax element identified by the identifier "v3c_avd"; and (2) an attribute data unit payload, which is formatted or coded using a "video_sub_stream ()" syntax or syntax element.
The sub-bitstream may be a Network Adaptation Layer (NAL) sample stream that includes NAL units, each of which includes a NAL unit header and a NAL unit payload (e.g., a raw byte sequence payload or RBSP) and carries the respective data. For example, as depicted in fig. 3D, an atlas sub-bitstream may be a NAL sample stream that includes NAL units that carry: (1) an ASPS header formatted or coded using an applicable syntax or syntax element labeled "nal_asps" and an ASPS payload formatted or coded using an applicable syntax or syntax element labeled "atlas_sequence_parameter_set_rbsp ()"; (2) an AAPS header formatted or coded using an applicable syntax or syntax element labeled "nal_aaps" and an AAPS payload formatted or coded using an applicable syntax or syntax element labeled "atlas_adaptation_parameter_set_rbsp ()"; (3) an AFPS header formatted or coded using an applicable syntax or syntax element labeled "nal_afps" and an AFPS payload formatted or coded using an applicable syntax or syntax element labeled "atlas_frame_parameter_set_rbsp ()"; (4) an essential supplemental enhancement information (ESEI) header formatted or coded using an applicable syntax or syntax element labeled "nal_esei" and an ESEI payload formatted or coded using an applicable syntax or syntax element labeled "sei_rbsp ()"; (5) a NAL header (for an I tile group) formatted or coded using an applicable syntax or syntax element labeled "ACL NAL unit type" (where "ACL" is an abbreviation for "atlas coding layer") and an atlas tile group layer payload formatted or coded using an applicable syntax or syntax element labeled "atlas_tile_group_layer_rbsp ()"; (6) a NAL header (for a P tile group) formatted or coded using the "ACL NAL unit type" syntax or syntax element, and an atlas tile group layer payload formatted or coded using the "atlas_tile_group_layer_rbsp ()" syntax or syntax element; (7) a SUFFIX ESEI header formatted or coded using an applicable syntax or syntax element labeled "nal_suffix_esei" and a SUFFIX ESEI payload formatted or coded using the "sei_rbsp ()" syntax or syntax element; etc.
As further depicted in fig. 3D, the atlas TILE group layer payload of the I TILE group may carry an atlas tile group header and an atlas tile group data unit formatted or coded using an applicable syntax or syntax element labeled "i_tile_grp". The atlas tile group data unit may include one or more of: (1) Patch data units of I intra-type (labeled "I_INTRA") tiles, such as formatted or coded using an applicable syntax or syntax element labeled "patch_data_unit ()"; (2) Patch data units of I raw-type (labeled "I_RAW") tiles, such as formatted or coded using a "patch_data_unit ()" syntax or syntax element; (3) Patch data units of I EOM-type (labeled "I_EOM") tiles, such as formatted or coded using a "patch_data_unit ()" syntax or syntax element; (4) a delimiter, labeled "I_END"; etc.
Also, as further depicted in fig. 3D, the atlas TILE group layer payload of the P TILE group may carry an atlas tile group header and an atlas tile group data unit formatted or coded using an applicable syntax or syntax element labeled "p_tile_grp". The atlas tile group data unit may include one or more of: (1) Patch data units of P skip-type (labeled "P_SKIP") tiles, such as formatted or coded using a "patch_data_unit ()" syntax or syntax element; (2) Patch data units of P merge-type (labeled "P_MERGE") tiles, such as formatted or coded using a "patch_data_unit ()" syntax or syntax element; (3) Patch data units of P inter-type (labeled "P_INTER") tiles, such as formatted or coded using a "patch_data_unit ()" syntax or syntax element; (4) Patch data units of P intra-type (labeled "P_INTRA") tiles, such as formatted or coded using a "patch_data_unit ()" syntax or syntax element; (5) Patch data units of P raw-type (labeled "P_RAW") tiles, such as formatted or coded using a "patch_data_unit ()" syntax or syntax element; (6) Patch data units of P EOM-type (labeled "P_EOM") tiles, such as formatted or coded using a "patch_data_unit ()" syntax or syntax element; (7) a delimiter, labeled "P_END"; etc.
Control flags to enable reshaping and to indicate the presence of corresponding metadata in the 3D video signal may be indicated in the SPS, PPS, APS, PH, SEI, SH, or the like, at various levels. Additionally, optionally or alternatively, syntax elements used to carry or encode the remodelling operation parameters may be specified or defined in the coding syntax at various levels.
In some operation cases, the patch information data syntax may include syntax elements to internally encode or carry video/image processing operation data within an atlas tile data unit (labeled "atlas_tile_data_unit ()") at the patch level. An example patch information data syntax, labeled "patch_information_data ()", is illustrated in Table 1 below (where the "descriptor" column may be used to indicate the base data type and/or data size).
TABLE 1
In some operation cases, a control flag (labeled "asps_reshaping_enabled_flag" using a single bit 0 or 1) to enable reshaping may be specified, defined, or added to a atlas SPS in a coding syntax for encoding or decoding a group of pictures (GOP) or a (picture) sequence in a visual scene, as illustrated in table 2 below.
TABLE 2
A control flag ("asps_reshaping_enabled_flag") equal to 1 specifies that patch-based remodelling is enabled for the sequence. On the other hand, a control flag ("asps_reshaping_enabled_flag") equal to 0 specifies disabling patch-based remodelling of the sequence.
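By way of illustration and not limitation, the Python sketch below shows how such a sequence-level control flag might gate the parsing of patch-level remodelling metadata. Only "asps_reshaping_enabled_flag" is taken from the syntax described herein; the bit reader and the remaining field names are hypothetical simplifications.

```python
# Illustrative parsing sketch; only asps_reshaping_enabled_flag is taken from
# the described syntax, everything else is a simplifying assumption.
class BitReader:
    def __init__(self, bits):
        self.bits, self.pos = bits, 0

    def u(self, n):  # read n bits as an unsigned value
        val = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return val

def parse_asps(reader):
    return {"asps_reshaping_enabled_flag": reader.u(1)}

def parse_patch_information_data(reader, asps):
    patch = {"patch_mode": reader.u(2)}
    if asps["asps_reshaping_enabled_flag"]:
        # Only present when reshaping is enabled for the sequence.
        patch["patch_reshaping_metadata"] = {"num_pivots": reader.u(4)}
    return patch

r = BitReader("1" + "01" + "0011")  # flag=1, mode=1, 3 pivots
asps = parse_asps(r)
print(parse_patch_information_data(r, asps))
```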
The remodelling metadata (e.g., labeled "patch_reshaping_metadata ()", etc.) including remodelling operational parameters (e.g., coefficients or parameters specifying remodelling functions/mappings, etc.) may be carried within the 3D video signal according to a patch level syntax (labeled "patch_information_data ()") as illustrated in table 3 below.
TABLE 3
In some operation cases, a patch level syntax ("patch_information_data ()") may support the carrying or inclusion of various data units based on a particular atlas frame type (labeled "ath_type") and/or based on a particular patch mode (labeled "patch mode"; e.g., syntax elements in an atlas bitstream or sub-bitstream that indicate how patches are defined and associated with other components and provide information on how such components are reconstructed, etc.).
The atlas frame type may refer to a particular type of atlas frame storing P tiles, a particular type of atlas frame storing I tiles, etc. In some operation cases, a patch level syntax ("patch_information_data ()") may support the carrying or inclusion of various data units, regardless of the atlas frame type ("ath_type") and/or regardless of the patch mode ("patch mode").
In some operational cases, remodelling metadata (e.g., labeled "reshaping_metadata ()", etc.) may be added as a part or subset of a patch data unit (labeled "patch_data_unit ()"), an original patch data unit (labeled "raw_patch_data_unit ()"), an enhanced occupancy mode (EOM) patch data unit (labeled "eom_patch_data_unit ()"), etc. that corresponds to a particular patch mode. As used herein, the unprojected points of the 3D point cloud may be encoded into one or more EOM patch data units in the 3D video signal, as described herein, while the projected points may be encoded in occupancy, geometry, and attribute patch data in the 3D video signal. For other patch modes (e.g., P_SKIP, P_MERGE, P_INTER, etc.) other than the particular patch mode, some or all of the remodelling metadata or values therein may be inferred from the syntax or values of the syntax elements of the particular patch mode.
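As a non-limiting illustration of this carriage-versus-inference behavior, the Python sketch below lets intra, raw, and EOM patch data units carry their own remodelling metadata while skip/merge/inter patches inherit (infer) the metadata of the patch they reference. The dictionaries and the reference-resolution step are assumptions made for illustration and do not reproduce the normative syntax.

```python
# Sketch of per-mode carriage vs. inference of reshaping metadata.
CARRIES_METADATA = {"I_INTRA", "I_RAW", "I_EOM", "P_INTRA", "P_RAW", "P_EOM"}

def resolve_reshaping_metadata(patch, reference_patches):
    """Return the reshaping metadata that applies to `patch`."""
    if patch["mode"] in CARRIES_METADATA:
        return patch["reshaping_metadata"]
    # P_SKIP / P_MERGE / P_INTER: infer from the referenced patch.
    ref = reference_patches[patch["ref_idx"]]
    return resolve_reshaping_metadata(ref, reference_patches)

refs = [{"mode": "I_INTRA", "reshaping_metadata": {"pivots": [0, 512, 1023]}}]
p_inter = {"mode": "P_INTER", "ref_idx": 0}
print(resolve_reshaping_metadata(p_inter, refs))
```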
Examples of adding remodelling metadata ("reshaping_metadata ()") as a part or subset of patch data units ("patch_data_unit ()") are illustrated in table 4 below. The same or similar syntax may be used or applied to other patch data units, such as original patch data units ("raw_patch_data_unit ()") and EOM patch data units ("eom_patch_data_unit ()").
TABLE 4
In various operating cases, the remodeling metadata described herein may specify or define a remodeling function/map based on one or more of the following: piecewise linear segments, piecewise polynomial segments, spline curves, look-up tables (LUTs), and so forth.
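By way of example and not limitation, the Python sketch below evaluates a remodelling function/map defined by piecewise linear segments and builds the equivalent LUT; the pivot and target values shown are arbitrary illustrative choices rather than values mandated by any syntax.

```python
import bisect

def piecewise_linear(x, pivots, targets):
    # pivots/targets define the segment endpoints of the reshaping curve.
    if x <= pivots[0]:
        return targets[0]
    if x >= pivots[-1]:
        return targets[-1]
    i = bisect.bisect_right(pivots, x) - 1
    t = (x - pivots[i]) / (pivots[i + 1] - pivots[i])
    return targets[i] + t * (targets[i + 1] - targets[i])

# Example 10-bit forward reshaping curve (arbitrary illustrative pivots).
pivots = [0, 256, 768, 1023]
targets = [0, 128, 896, 1023]
lut = [round(piecewise_linear(v, pivots, targets)) for v in range(1024)]

print(piecewise_linear(300.0, pivots, targets), lut[300])
```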
In some operation cases, the 3D video signals described herein may be coded or decoded based on VVC coding specifications. Patch level remodelling metadata (e.g., "patch_reshaping_metadata ()", etc.) may be specified or defined as a Luma Mapping with Chroma Scaling (LMCS) data unit (labeled "lmcs_data ()") as an extension of the VVC coding specification, as illustrated in table 5 below.
TABLE 5
By way of example and not limitation, in the example coding syntax or syntax elements illustrated in tables 6-8 below, patch level remodelling metadata may specify or define the remodelling function/map as a multi-segment polynomial. Patch level remodelling metadata may be specified or defined by a coding syntax using a data construct (e.g., vdr_rpu_data_payload (), etc.), wherein syntax elements num_y_parts_minus1 and num_x_parts_minus1 may be set to zero (0).
TABLE 6
TABLE 7
TABLE 8
In some operational cases, other types of video metadata, such as display management metadata, L1 metadata, etc., may be included for each patch or group of patches. Example Display Management (DM) metadata, L1 metadata, and the like, may be found in U.S. patent No. 10,460,699 issued on 10, 29, 2019, the entire contents of which are hereby incorporated by reference as if set forth in full herein.
Instead of using an existing bitstream, e.g., an atlas bitstream, the remodelling metadata may also be encoded or carried in a separate bitstream, e.g., a custom bitstream. In some operation cases, a for loop may be implemented to include or encode remodelling metadata for each patch of a plurality of patches into the bitstream. Each patch may be mapped to a corresponding patch in the atlas bitstream, e.g., using or sharing the same identifier for the patch defined in the atlas bitstream.
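One possible (purely illustrative) carriage loop of this kind is sketched below in Python: the encoder writes one remodelling metadata record per patch, keyed by the same patch identifier used in the atlas bitstream, and the decoder rebuilds the per-patch mapping. The byte layout is an assumption made for this sketch, not a defined bitstream format.

```python
import struct

def encode_reshaping_stream(per_patch_metadata):
    # per_patch_metadata maps a patch identifier (shared with the atlas
    # bitstream) to a list of (pivot, target) pairs of its reshaping curve.
    out = bytearray(struct.pack(">H", len(per_patch_metadata)))
    for patch_id, pairs in per_patch_metadata.items():
        out += struct.pack(">HB", patch_id, len(pairs))
        for pivot, target in pairs:
            out += struct.pack(">HH", pivot, target)
    return bytes(out)

def decode_reshaping_stream(buf):
    pos = 0
    (num_patches,) = struct.unpack_from(">H", buf, pos); pos += 2
    result = {}
    for _ in range(num_patches):
        patch_id, num_pairs = struct.unpack_from(">HB", buf, pos); pos += 3
        pairs = []
        for _ in range(num_pairs):
            pairs.append(struct.unpack_from(">HH", buf, pos)); pos += 4
        result[patch_id] = pairs
    return result

meta = {0: [(0, 0), (1023, 1023)], 7: [(0, 64), (512, 700), (1023, 1023)]}
assert decode_reshaping_stream(encode_reshaping_stream(meta)) == meta
print("round-trip ok")
```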
The described reshaping techniques may be relatively efficiently integrated into a 3D video coding specification for the purpose of improving coding efficiency and performance by reusing and extending existing image preprocessing, processing, and post-processing operations according to a 3D video coding syntax. As previously described, in some operational cases, patches may be assigned different quality importance or weighting factors to allow patches that are visually relatively significant to be more accurately encoded and less affected by compression, quantization, or coding errors. The remodelling operations (e.g., reverse remodelling, point cloud reconstruction, etc.) may be specified by patch-based remodelling metadata, DM metadata, etc. according to 3D video coding specifications.
9. Video metadata
In some operational cases, video metadata generated by an upstream device, such as a patch-based video encoder (e.g., 102 of fig. 1A, 1002 of fig. 1C, etc.), which may be related to a reference processing unit or RPU, may be provided to a downstream device in a plurality of Network Abstraction Layer (NAL) data units. Each NAL data unit includes a NAL header and a raw byte sequence payload (RBSP).
The video metadata units may be used as a common medium to deliver video metadata from an upstream device to a downstream device, where the video metadata units may be associated with any of a plurality of video coding specifications (e.g., different versions, etc.).
Video metadata units may be carried or encoded in RBSPs of NAL data units. The video metadata unit may include a video metadata header and a video metadata payload. The video metadata header may include a header field that identifies a codec or coding system type and a particular video coding specification among a plurality of different video coding specifications. The video metadata header may also include one or more high-level (e.g., sequence level, frame level, etc.) portions of video metadata carried in the video metadata units.
The video metadata payload may be used by an upstream device to transmit, to a downstream device, descriptors (or syntax descriptions) of sets of flags, operations, and parameters that may be used to decode a 3D video signal and to generate output 3D video content, e.g., the same as or closely approximating the input 3D point cloud used by the upstream device to generate the 3D video signal. The flags, operations, and parameters described by the video metadata payload may be used by a downstream device to perform reshaping operations (or reverse reshaping operations) to reconstruct the 3D point cloud. Additionally, optionally or alternatively, one or more functions, operations, and parameters may be used for other video/image processing operations, such as DM operations to be performed by downstream devices to generate display images from the reconstructed 3D point cloud.
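A possible in-memory view of such a video metadata unit is sketched below in Python; the field names (codec_type, spec_version, and the payload dictionary) are invented for illustration and do not correspond to any published syntax.

```python
from dataclasses import dataclass, field

@dataclass
class VideoMetadataHeader:
    codec_type: str          # e.g. "V-PCC" or "V3C" (identifies the coding system)
    spec_version: str        # which of several video coding specifications applies
    sequence_level: dict = field(default_factory=dict)

@dataclass
class VideoMetadataUnit:
    header: VideoMetadataHeader
    payload: dict            # flags/operations/parameters for reconstruction

unit = VideoMetadataUnit(
    header=VideoMetadataHeader("V3C", "ed.1", {"frame_rate": 30}),
    payload={"inverse_reshaping": {"pivots": [0, 512, 1023]},
             "display_management": {"l1_min": 0.01, "l1_max": 1000.0}})
print(unit.header.codec_type, sorted(unit.payload))
```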
A coding syntax including one or more syntax elements that conform to the 3D video coding specification may be transmitted/signaled in the 3D video signal by the patch-based video encoder to the patch-based video decoder. Syntax elements may specify flags, operations, and parameters used in 3D encoding operations and corresponding 3D decoding operations. The parameters represented in the syntax elements may be of different coefficient types and may be specified as logical values, integer (fixed point) values, or floating point values, with various accuracies, bit lengths, word lengths, or the like.
Some of the coding syntax or syntax elements may be classified as sequence level information, which remains unchanged for a complete sequence of consecutive images. In various operating cases, the complete image sequence as specified by the sequence level information may correspond to one of: a complete sequence of consecutive 2D video/image frames for the encapsulated patches; a complete sequence of consecutive patches in the complete sequence of consecutive 2D video/image frames; a complete sequence of consecutive occupancy, geometry or attribute patch data; a complete sequence of consecutive 3D images represented by a complete sequence of consecutive 3D point clouds; etc. For transmission efficiency reasons, sequence level information for each image in the complete sequence of consecutive images may not be sent by the patch-based video encoder to the patch-based video decoder. Instead, the sequence level parameters for each sequence of consecutive images may be sent once. However, some or all of the sequence level information may be repeated once, twice, etc. by the patch-based video encoder within the same sequence of consecutive images for random access, error correction, and/or robustness reasons.
Some syntax elements in the coding syntax may be classified as frame level information, which remains unchanged for the entire frame/picture. In various operation cases, the entire frame/image as specified by the frame level information may correspond to one of: an entire 2D frame for encapsulating the patch; the whole patch; full occupancy, geometry, or attribute patch data in the entire patch; etc.
In some embodiments, the (entire) frame may be logically divided into one or more (e.g., non-overlapping, using a quadtree structure, using an octree structure, etc.) partitions. Some syntax elements in the coding syntax may be classified as (e.g., low-level, etc.) partition level information that remains unchanged for the entire partition of the frame. In various operating cases, a partition as specified by the partition level information may correspond to one of the following: partitions (e.g., blocks, etc.) for encapsulating the entire 2D frame of the patch; all occupied, geometric or attribute patch data partitions in the entire patch; etc.
10. Example Process flow
FIG. 4B illustrates an example process flow according to an embodiment. In some embodiments, one or more computing devices or components (e.g., one or more video codecs, encoding devices/modules, transcoding devices/modules, decoding devices/modules, inverse tone mapping devices/modules, media devices/modules, inverse map generation and application systems, etc.) may perform this process flow. In block 422, the image processing system receives an input 3D point cloud. The input 3D point cloud includes a spatial distribution of points positioned at a plurality of spatial locations in the represented 3D space.
In block 424, the image processing system generates a plurality of patches from the input 3D point cloud. Each patch of the plurality of patches includes pre-remodeled patch data of one or more patch data types. The pre-remodeled patch data is derived at least in part from visual properties of a subset of points in the input 3D point cloud.
In block 426, the image processing system performs encoder-side remodelling on the pre-remodeled patch data contained in the plurality of patches to generate remodelled patch data of one or more patch data types for the plurality of patches.
In block 428, the image processing system encodes the remodelled patch data of the one or more patch data types of the plurality of patches into the 3D video signal in place of the pre-remodeled patch data of the one or more patch data types of the plurality of patches. The 3D video signal causes a recipient device of the 3D video signal to generate a reconstructed 3D point cloud that approximates the input 3D point cloud.
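By way of illustration and not limitation, the blocks above may be summarized by the non-normative Python sketch below; every helper is a placeholder standing in for the patch generation, encoder-side remodelling, and encoding stages already described, and the trivial range scaling is an assumption, not the claimed remodelling method.

```python
# High-level, non-normative sketch of the encoder-side flow of FIG. 4B.
def generate_patches(point_cloud):
    # Block 424: project points into patches carrying pre-reshaped data.
    return [{"geometry": g, "attribute": a} for g, a in point_cloud]

def reshape_patch(patch):
    # Block 426: encoder-side reshaping (here a trivial range scaling).
    return {k: [v // 4 for v in vals] for k, vals in patch.items()}

def encode_signal(patches):
    # Block 428: pack reshaped patch data into the 3D video signal.
    return {"patches": patches, "reshaping_metadata": {"scale": 4}}

point_cloud = [([100, 200, 300], [512, 512, 512])]
signal = encode_signal([reshape_patch(p) for p in generate_patches(point_cloud)])
print(signal)
```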
In an embodiment, the remodeled patch data of the one or more patch data types of the plurality of patches is generated from the pre-remodeled patch data based on a plurality of remodelling functions.
In an embodiment, the plurality of remodelling functions includes a first remodelling function for remodelling a first patch of the plurality of patches; the plurality of remodelling functions includes a second, different remodelling function for remodelling a second patch of the plurality of patches.
In an embodiment, the first remodeling function for reshaping the first patch is specified by a first remodeling metadata portion; the second remodelling function for remodelling the second patch is specified by a second different remodelling metadata portion; both the first remodelling metadata portion and the second remodelling metadata portion are encoded in the 3D video signal.
In an embodiment, the plurality of remodelling functions includes a particular remodelling function for remodelling a particular patch of the plurality of patches; the particular remodeling function is determined based at least in part on a noise level calculated from a patch data portion in the particular patch.
In an embodiment, the image processing system further performs: determining a subset of two or more patches among the plurality of patches; generating two or more pre-adjusted remodelling functions for the two or more patches, each of the two or more pre-adjusted remodelling functions corresponding to a respective patch of the two or more patches; assigning two or more weighting factors to the two or more patches, each of the two or more weighting factors being assigned to a respective patch of the two or more patches; the two or more weighting factors are used to adjust the two or more pre-adjusted remodelling functions to two or more remodelling functions included in the plurality of remodelling functions.
In an embodiment, the two or more pre-adjusted remodelling functions include pre-adjusted remodelling functions corresponding to patches of the plurality of patches; the pre-adjusted remodelling function is adjusted to a remodelling function included in the plurality of remodelling functions based at least in part on a codeword distribution determined from codewords included in the patch.
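One possible (purely illustrative) way to apply such weighting factors is sketched below in Python: patches with larger importance weights are granted a larger share of the remodelled codeword range, and their pre-adjusted remodelling curves are scaled accordingly. The allocation rule and the curve representation are assumptions made for this sketch.

```python
def adjust_reshaping_functions(pre_adjusted, weights, total_codewords=1024):
    """Scale per-patch codeword budgets by importance weights (illustrative)."""
    weight_sum = sum(weights)
    adjusted = []
    for curve, w in zip(pre_adjusted, weights):
        budget = total_codewords * w / weight_sum
        peak = max(curve["targets"]) or 1
        scale = budget / peak
        adjusted.append({"pivots": curve["pivots"],
                         "targets": [min(total_codewords - 1, round(t * scale))
                                     for t in curve["targets"]]})
    return adjusted

pre = [{"pivots": [0, 1023], "targets": [0, 1023]},   # patch 0 (e.g., a face)
       {"pivots": [0, 1023], "targets": [0, 1023]}]   # patch 1 (background)
print(adjust_reshaping_functions(pre, weights=[3.0, 1.0]))
```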
In an embodiment, the one or more patch data types include at least one of: occupancy patch data type, geometry patch data type, attribute patch data type, etc.
In an embodiment, the foregoing process flow or method is performed by a 3D video encoder implemented at least in part by one or more video codecs associated with one of: MPEG, AVC, HEVC, VVC, AV1, EVC, PCC, V-PCC, V3C, etc.
In an embodiment, the 3D video signal is further encoded with remodelling metadata that enables the receiver device to perform reverse remodelling on the remodelled patch data decoded from the 3D video signal.
In an embodiment, the pre-reshaped patch data of the one or more patch data types of the plurality of patches is encoded in one or more video frames of the 3D video signal according to an optimal or predetermined layout of an atlas of the plurality of patches; atlas information specifying the optimal or predetermined layout of the atlas is encoded in the 3D video signal according to a 3D video coding specification.
In an embodiment, the plurality of patches includes a projected patch; the projected patch is generated by applying one or more 2D projections to the input 3D point cloud.
In an embodiment, the dynamic range represented in the remodeled patch data is different from the input dynamic range represented in the pre-remodeled patch data.
In an embodiment, the encoder-side remodelling of the pre-remodeled patch data included in the plurality of patches is performed based on a plurality of patch-based remodelling functions; the plurality of patch-based remodelling functions includes at least one patch-based remodelling function associated with one of: a multi-segment polynomial, a three-dimensional look-up table (3D LUT), a cross color channel predictor, a multi-color channel multiple regression (MMR) predictor, a predictor with a B-spline function as a basis function, a tensor product B-spline (TPB) predictor, etc.
In an embodiment, the plurality of patch-based remodelling functions includes a first patch-based remodelling function for remodelling a first portion of the pre-remodeled patch data included in a first patch of the plurality of patches; the plurality of patch-based remodelling functions includes a second different patch-based remodelling function for remodelling a second portion of the pre-remodeled patch data included in a second different patch of the plurality of patches.
FIG. 4C illustrates an example process flow according to an embodiment. In some embodiments, one or more computing devices or components (e.g., one or more video codecs, encoding devices/modules, transcoding devices/modules, decoding devices/modules, inverse tone mapping devices/modules, media devices/modules, inverse map generation and application systems, etc.) may perform this process flow. In block 442, the image processing system decodes the remodeled patch data for one or more data types from the plurality of patches of the 3D video signal.
In block 444, the image processing system performs decoder-side remodelling on remodelled patch data for the plurality of patches to generate reconstructed patch data of one or more patch data types for the plurality of patches.
In block 446, the image processing system generates a reconstructed 3D point cloud based on the reconstructed patch data of one or more patch data types of the plurality of patches.
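A mirror-image, non-normative Python sketch of this decoder-side flow is given below; the helpers are placeholders, and the inverse scaling simply undoes the scaling assumed in the encoder-side sketch above rather than implementing the claimed remodelling method.

```python
# Non-normative sketch of the decoder-side flow of FIG. 4C.
def decode_signal(signal):
    # Block 442: decode reshaped patch data and reshaping metadata.
    return signal["patches"], signal["reshaping_metadata"]

def inverse_reshape(patch, metadata):
    # Block 444: decoder-side (inverse) reshaping back to the original range.
    return {k: [v * metadata["scale"] for v in vals] for k, vals in patch.items()}

def reconstruct_point_cloud(patches):
    # Block 446: assemble reconstructed patch data into a 3D point cloud.
    return [(p["geometry"], p["attribute"]) for p in patches]

signal = {"patches": [{"geometry": [25, 50, 75], "attribute": [128, 128, 128]}],
          "reshaping_metadata": {"scale": 4}}
patches, meta = decode_signal(signal)
cloud = reconstruct_point_cloud([inverse_reshape(p, meta) for p in patches])
print(cloud)  # the reconstruction approximates the input point cloud
```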
In an embodiment, the image processing system further performs rendering of the display image derived from the reconstructed 3D point cloud on an image display.
In an embodiment, a computing device, such as a display device, mobile device, set-top box, multimedia device, or the like, is configured to perform any of the foregoing methods. In an embodiment, an apparatus includes a processor and is configured to perform any of the foregoing methods. In an embodiment, a non-transitory computer-readable storage medium stores software instructions that when executed by one or more processors cause performance of any of the foregoing methods.
In an embodiment, a computing device includes one or more processors and one or more storage media storing a set of instructions that when executed by the one or more processors cause performance of any of the foregoing methods.
Note that although separate embodiments are discussed herein, any combination of the embodiments and/or portions of the embodiments discussed herein may be combined to form further embodiments.
11. Implementation mechanism-hardware overview
Embodiments of the invention may be implemented with the following: a computer system; a system configured in the electronic circuitry and components; an Integrated Circuit (IC) device, such as a microcontroller, a Field Programmable Gate Array (FPGA), or another configurable or Programmable Logic Device (PLD); a discrete time or Digital Signal Processor (DSP); an Application Specific IC (ASIC); and/or apparatus comprising one or more of such systems, devices, or components. The computer and/or IC may execute (or control) instructions related to adaptive perceptual quantization of images having enhanced dynamic range, such as the images described herein. The computer and/or IC may calculate any of a variety of parameters or values related to the adaptive perceptual quantization process described herein. Image and video embodiments may be implemented in hardware, software, firmware, and various combinations thereof.
Certain embodiments of the invention include a computer processor executing software instructions that cause the processor to perform the methods of the present disclosure. For example, one or more processors in a display, encoder, set top box, transcoder, or the like may implement the methods related to adaptive perceptual quantization of HDR images described above by executing software instructions in a program memory accessible to the processor. Embodiments of the present invention may also be provided in the form of a program product. The program product may comprise any non-transitory medium carrying a set of computer readable signals comprising instructions that, when executed by a data processor, cause the data processor to perform the methods of embodiments of the present invention. The program product according to embodiments of the invention may take any of a wide variety of forms. The program product may comprise, for example, physical media such as: magnetic data storage media including floppy disks, hard disk drives; an optical data storage medium comprising a CD ROM, DVD; an electronic data storage medium including ROM, flash RAM, or the like. The computer readable signal on the program product may optionally be compressed or encrypted.
Where a component (e.g., a software module, processor, assembly, device, circuit, etc.) is referred to above, unless otherwise indicated, reference to that component (including a reference to a "means") should be interpreted as including, as equivalents of that component, any component which performs the function of the described component (e.g., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure which performs the function in the illustrated example embodiments of the invention.
According to one embodiment, the techniques described herein are implemented by one or more special purpose computing devices. The special purpose computing device may be hardwired to perform the techniques, or may include a digital electronic device, such as one or more Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs), that is permanently programmed to perform the techniques, or may include one or more general purpose hardware processors that are programmed to perform the techniques in accordance with program instructions in firmware, memory, other storage devices, or a combination. Such special purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to implement the techniques. The special purpose computing device may be a desktop computer system, portable computer system, handheld device, networking device, or any other device that incorporates hardwired and/or program logic to implement the techniques.
For example, FIG. 5 is a block diagram illustrating a computer system 500 upon which an embodiment of the invention may be implemented. Computer system 500 includes a bus 502 or other communication mechanism for communicating information, and a hardware processor 504 coupled with bus 502 for processing information. The hardware processor 504 may be, for example, a general purpose microprocessor.
Computer system 500 also includes a main memory 506, such as a Random Access Memory (RAM) or another dynamic storage device, coupled to bus 502 for storing information and instructions to be executed by processor 504. Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504. Such instructions, when stored in a non-transitory storage medium accessible to the processor 504, present the computer system 500 as a special purpose machine customized to perform the operations specified in the instructions.
Computer system 500 further includes a Read Only Memory (ROM) 508 or another static storage device coupled to bus 502 for storing static information and instructions for processor 504. A storage device 510, such as a magnetic disk or optical disk, is provided and coupled to bus 502 for storing information and instructions.
Computer system 500 may be coupled via bus 502 to a display 512, such as a liquid crystal display, to display information to a computer user. An input device 514, including alphanumeric and other keys, is coupled to bus 502 for communicating information and command selections to processor 504. Another type of user input device is cursor control 516, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 504 and for controlling cursor movement on display 512. Such an input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), which allows the device to specify positions in a plane.
Computer system 500 may implement the techniques described herein using custom hard-wired logic, one or more ASICs or FPGAs, firmware, and/or program logic in conjunction with a computer system to render computer system 500 as a special-purpose machine or to program computer system 500 as a special-purpose machine. According to one embodiment, the techniques described herein are performed by computer system 500 in response to processor 504 executing one or more sequences of one or more instructions contained in main memory 506. Such instructions may be read into main memory 506 from another storage medium, such as storage device 510. Execution of the sequences of instructions contained in main memory 506 causes processor 504 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
The term "storage medium" as used herein refers to any non-transitory medium that stores data and/or instructions that cause a machine to operate in a specific manner. Such storage media may include non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 510. Volatile media includes dynamic memory, such as main memory 506. Common forms of storage media include, for example, a floppy disk (floppy disk), a hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, and EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.
Storage media is different from, but may be used in conjunction with, transmission media. Transmission media participates in transferring information between storage media. Transmission media includes, for example, coaxial cables, copper wire and fiber optics, including the wires that comprise bus 502. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 504 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 500 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector can receive the data carried in the infrared signal and appropriate circuitry can place the data on bus 502. Bus 502 carries the data to main memory 506, from which processor 504 retrieves and executes the instructions. The instructions received by main memory 506 may optionally be stored on storage device 510 either before or after execution by processor 504.
Computer system 500 also includes a communication interface 518 coupled to bus 502. Communication interface 518 provides a two-way data communication coupling to a network link 520 that is connected to a local network 522. For example, communication interface 518 may be an Integrated Services Digital Network (ISDN) card, a cable modem, a satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line, or the like. As another example, communication interface 518 may be a Local Area Network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 518 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 520 typically provides data communication through one or more networks to other data devices. For example, network link 520 may provide a connection through local network 522 to a host computer 524 or to data equipment operated by an Internet Service Provider (ISP) 526. ISP 526 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 528. Local network 522 and internet 528 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 520 and through communication interface 518, which carry the digital data to computer system 500 and carry the digital data from computer system 500, are example forms of transmission media.
Computer system 500 can send messages and receive data, including program code, through the network(s), network link 520 and communication interface 518. In the Internet example, a server 530 might transmit a requested code for an application program through Internet 528, ISP 526, local network 522 and communication interface 518.
The received code may be executed by processor 504 as it is received, and/or stored in storage device 510, or other non-volatile storage for later execution.
12. Equivalents, extensions, alternatives and miscellaneous items
In the foregoing specification, examples of the application have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the application, and is intended by the applicants to be the application, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Thus, any limitations, elements, properties, features, advantages or attributes that are not explicitly recited in the claims should not limit the scope of such claims in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Exemplary embodiments of enumeration
The invention may be embodied in any of the forms described herein including, but not limited to, the example embodiments (EEEs) enumerated below that describe the structure, features, and functionality of some portions of the embodiments of the present invention.
Eee1. A method comprising:
receiving an input three-dimensional (3D) point cloud, wherein the input 3D point cloud includes a spatial distribution of points positioned at a plurality of spatial locations in the represented 3D space;
generating a plurality of patches from the input 3D point cloud, wherein each patch of the plurality of patches includes pre-remodeled patch data of one or more patch data types, wherein the pre-remodeled patch data is derived at least in part from visual properties of a subset of the points in the input 3D point cloud;
performing encoder-side remodelling on the pre-remodeled patch data included in the plurality of patches to generate remodelled patch data of the one or more patch data types for the plurality of patches;
the remodelled patch data of the one or more data types of the plurality of patches is encoded into a 3D video signal in place of the pre-remodeled patch data of the one or more data types of the plurality of patches, wherein the 3D video signal causes a recipient device of the 3D video signal to generate a reconstructed 3D point cloud approximating the input 3D point cloud.
EEE2. The method of EEE1 wherein the remodeled patch data of the one or more patch data types of the plurality of patches is generated from the pre-remodeled patch data based on a plurality of remodelling functions.
EEE3. The method of EEE2 wherein the plurality of remodelling functions comprises a first remodelling function for remodelling a first patch of the plurality of patches, wherein the plurality of remodelling functions comprises a second, different remodelling function for remodelling a second patch of the plurality of patches.
EEE4. The method of EEE2 or EEE3 wherein the first remodelling function for remodelling the first patch is specified by a first remodelling metadata portion, wherein the second remodelling function for remodelling the second patch is specified by a second, different remodelling metadata portion, wherein both the first remodelling metadata portion and the second remodelling metadata portion are encoded in the 3D video signal.
EEE5. The method of any one of EEE 2-EEE 4 wherein the plurality of remodelling functions comprises a particular remodelling function for remodelling a particular patch of the plurality of patches, wherein the particular remodelling function is determined based at least in part on a noise level calculated from patch data portions in the particular patch.
EEE6. The method according to any one of EEE2 to EEE5, further comprising:
determining a subset of two or more patches among the plurality of patches;
generating two or more pre-adjusted remodelling functions for the two or more patches, wherein each of the two or more pre-adjusted remodelling functions corresponds to a respective patch of the two or more patches;
assigning two or more weighting factors to the two or more patches, wherein each of the two or more weighting factors is assigned to a respective patch of the two or more patches;
the two or more weighting factors are used to adjust the two or more pre-adjusted remodelling functions to two or more remodelling functions included in the plurality of remodelling functions.
EEE7. The method of EEE6 wherein the two or more pre-conditioned remodeling functions comprise pre-conditioned remodeling functions corresponding to patches of the plurality of patches; wherein the pre-adjusted remodelling function is adjusted to a remodelling function included in the plurality of remodelling functions based at least in part on a codeword distribution determined from codewords included in the patch.
EEE8. The method of any one of EEE 1-EEE 7, wherein the one or more patch data types comprise at least one of: occupancy patch data type, geometry patch data type, or attribute patch data type.
EEE9. The method of any one of EEE 1-EEE 8, wherein the method is performed by a 3D video encoder implemented at least in part by one or more video codecs associated with one of: Moving Picture Experts Group (MPEG), Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), AOMedia Video 1 (AV1), Essential Video Coding (EVC), Point Cloud Compression (PCC), video-based point cloud compression (V-PCC), or visual volumetric video-based coding (V3C).
EEE10. The method according to any one of EEEs 1-9 wherein the 3D video signal is further encoded with remodelling metadata that enables the receiver device to perform reverse remodelling of the remodelled patch data decoded from the 3D video signal.
EEE11. The method of any one of EEEs 1-10, wherein the pre-reshaped patch data of the one or more patch data types of the plurality of patches is encoded in one or more video frames of the 3D video signal according to an optimal layout of an atlas of the plurality of patches, wherein atlas information specifying the optimal layout of the atlas is encoded in the 3D video signal according to a 3D video coding specification.
EEE12. The method of any one of EEE 1-EEE 11 wherein the plurality of patches comprises projected patches, wherein the projected patches are generated by applying one or more 2D projections to the input 3D point cloud.
EEE13. The method according to any one of EEE 1-EEE 12 wherein the dynamic range represented in the remodeled patch data is different from the input dynamic range represented in the pre-remodeled patch data.
EEE14. The method of any of EEE 1-EEE 13, wherein the encoder-side remodelling of the pre-remodeled patch data contained in the plurality of patches is performed based on a plurality of patch-based remodelling functions, wherein the plurality of patch-based remodelling functions includes at least one patch-based remodelling function related to one of: a multi-segment polynomial, a three-dimensional look-up table (3D LUT), a cross color channel predictor, a predictor with a B-spline function as a basis function, or a tensor product B-spline (TPB) predictor.
EEE15. The method of EEE14 wherein the plurality of patch-based remodeling functions comprises a first patch-based remodeling function for remodeling a first portion of the pre-remolded patch data contained in a first patch of the plurality of patches, wherein the plurality of patch-based remodeling functions comprises a second, different patch-based remodeling function for remolding a second portion of the pre-remolded patch data contained in a second, different patch of the plurality of patches.
Eee16. A method comprising:
decoding remodeled patch data of one or more data types from a plurality of patches of a three-dimensional (3D) video signal;
performing decoder-side remodelling on the remodelled patch data of the plurality of patches to generate reconstructed patch data of the one or more patch data types for the plurality of patches;
a reconstructed 3D point cloud is generated based on the reconstructed patch data of the one or more patch data types of the plurality of patches.
EEE17. The method of EEE16 further comprising rendering a display image derived from the reconstructed 3D point cloud on an image display.
EEE18. An apparatus that performs any of the methods according to EEE1 through EEE17.
EEE19. A non-transitory computer readable medium storing software instructions that when executed by one or more processors cause performance of steps of any of the methods according to EEE 1-EEE 17.

Claims (15)

1. A method, comprising:
receiving an input three-dimensional (3D) point cloud, wherein the input 3D point cloud includes a spatial distribution of points positioned at a plurality of spatial locations in the represented 3D space;
generating a plurality of patches from the input 3D point cloud, wherein each patch of the plurality of patches includes pre-remodeled patch data of one or more patch data types, wherein the pre-remodeled patch data is at least partially indicative of a target visual property and is at least partially derived from visual properties of a subset of the points in the input 3D point cloud;
performing encoder-side remodelling on the pre-remodeled patch data included in the plurality of patches to generate remodelled patch data of the one or more patch data types for the plurality of patches;
the remodelled patch data of the one or more data types of the plurality of patches is encoded into a 3D video signal in place of the pre-remodeled patch data of the one or more data types of the plurality of patches, wherein the 3D video signal causes a recipient device of the 3D video signal to generate a reconstructed 3D point cloud approximating the input 3D point cloud.
2. The method of claim 1, wherein the remodeled patch data of the one or more patch data types of the plurality of patches is generated from the pre-remodeled patch data based on a plurality of remodelling functions.
3. The method of claim 2, wherein the plurality of remodelling functions includes a first remodelling function for remodelling a first patch of the plurality of patches, wherein the plurality of remodelling functions includes a second, different remodelling function for remodelling a second patch of the plurality of patches.
4. The method of claim 3, wherein the first remodelling function for remodelling the first patch is specified by a first remodelling metadata portion, wherein the second remodelling function for remodelling the second patch is specified by a second, different remodelling metadata portion, wherein both the first remodelling metadata portion and the second remodelling metadata portion are encoded in the 3D video signal.
5. The method of any of claims 1-3, wherein the 3D video signal is further encoded with remodel metadata that enables the recipient device to perform reverse remodelling on the remodelled patch data decoded from the 3D video signal.
6. The method of any one of claims 2-5, wherein the plurality of remodelling functions includes a particular remodelling function for remodelling a particular patch of the plurality of patches, wherein the particular remodelling function is determined based at least in part on a noise level calculated from patch data portions in the particular patch.
7. The method of any one of claims 2 to 6, further comprising:
determining a subset of two or more patches among the plurality of patches;
generating two or more pre-adjusted remodelling functions for the two or more patches, wherein each of the two or more pre-adjusted remodelling functions corresponds to a respective patch of the two or more patches;
assigning two or more weighting factors to the two or more patches, wherein each of the two or more weighting factors is assigned to a respective one of the two or more patches and is set based on one or more importance selection factors such as size, location, depth, texture, or occupancy of each of the two or more patches;
the two or more weighting factors are used to adjust the two or more pre-adjusted remodelling functions to two or more remodelling functions included in the plurality of remodelling functions.
8. The method of any one of claims 1-7, wherein the one or more patch data types comprise at least one of: occupancy patch data type, geometry patch data type, or attribute patch data type.
9. The method of any one of claims 1-8, wherein the pre-reshaped patch data of the one or more patch data types of the plurality of patches is encoded in one or more video frames of the 3D video signal according to a predetermined layout of an atlas of the plurality of patches, wherein atlas information specifying the predetermined layout of the atlas is encoded in the 3D video signal according to a 3D video coding specification.
10. The method of any one of claims 2-9, wherein the plurality of patches includes a projected patch, wherein the projected patch is generated by applying one or more 2D projections to the input 3D point cloud.
11. The method of any one of claims 1-10, wherein the encoder-side remodelling of the pre-remodeled patch data included in the plurality of patches is performed based on a plurality of patch-based remodelling functions, wherein the plurality of patch-based remodelling functions includes at least one patch-based remodelling function related to one of: a three-dimensional look-up table (3D LUT), a cross color channel predictor, or a predictor with B-spline functions as basis functions.
12. A method, comprising:
decoding remodeled patch data of one or more data types from a plurality of patches of a three-dimensional (3D) video signal;
performing decoder-side remodelling on the remodelled patch data of the plurality of patches to generate reconstructed patch data of the one or more patch data types for the plurality of patches;
a reconstructed 3D point cloud is generated based on the reconstructed patch data of the one or more patch data types of the plurality of patches.
13. The method of claim 12, further comprising rendering a display image derived from the reconstructed 3D point cloud on an image display.
14. An apparatus performing any of the methods of claims 1-13.
15. A non-transitory computer-readable medium storing software instructions which, when executed by one or more processors, cause performance of the steps of any of the methods of claims 1-13.
CN202280021309.1A 2021-05-21 2022-05-16 Patch-based remodelling and metadata for volumetric video Pending CN116982085A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202163191480P 2021-05-21 2021-05-21
US63/191,480 2021-05-21
EP21175192.0 2021-05-21
PCT/US2022/029381 WO2022245695A1 (en) 2021-05-21 2022-05-16 Patch-based reshaping and metadata for volumetric video

Publications (1)

Publication Number Publication Date
CN116982085A true CN116982085A (en) 2023-10-31

Family

ID=88478231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280021309.1A Pending CN116982085A (en) 2021-05-21 2022-05-16 Patch-based remodelling and metadata for volumetric video

Country Status (1)

Country Link
CN (1) CN116982085A (en)

Similar Documents

Publication Publication Date Title
CN113615204B (en) Point cloud data transmitting device, point cloud data transmitting method, point cloud data receiving device and point cloud data receiving method
US11523135B2 (en) Apparatus, a method and a computer program for volumetric video
KR102292195B1 (en) A method for transmitting point clode data, An apparatus for transmitting point cloud data, A method for receiving point cloud data, and An apparatus for receiving point cloud data
EP3614674A1 (en) An apparatus, a method and a computer program for volumetric video
CN114503571A (en) Point cloud data transmitting device and method, and point cloud data receiving device and method
US11659151B2 (en) Apparatus, a method and a computer program for volumetric video
CN115443652B (en) Point cloud data transmitting device, point cloud data transmitting method, point cloud data receiving device and point cloud data receiving method
CN114930813A (en) Point cloud data transmitting device, point cloud data transmitting method, point cloud data receiving device, and point cloud data receiving method
US20230063575A1 (en) Patch zippering for mesh compression
US20220232261A1 (en) Video-Based Point Cloud Compression (V-PCC) Timing Information
US20220230360A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
US20230107834A1 (en) Method and apparatus of adaptive sampling for mesh compression by encoders
CN116349229A (en) Point cloud data transmitting device and method, and point cloud data receiving device and method
EP4329311A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
WO2023144445A1 (en) A method, an apparatus and a computer program product for video encoding and video decoding
CN117730539A (en) Point cloud data transmitting device and method, and point cloud data receiving device and method
US20230129875A1 (en) A method, an apparatus and a computer program product for volumetric video encoding and video decoding
US20220385928A1 (en) Processing a point cloud
CN116982085A (en) Patch-based remodelling and metadata for volumetric video
CN115804096A (en) Point cloud data transmitting device, point cloud data transmitting method, point cloud data receiving device, and point cloud data receiving method
EP4341903A1 (en) Patch-based reshaping and metadata for volumetric video
WO2019234290A1 (en) An apparatus, a method and a computer program for volumetric video
US11922664B2 (en) Method and apparatus of adaptive sampling for mesh compression by decoders
US20230419557A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
US20230088886A1 (en) Coding of uv coordinates

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication