CN117376568A - Point cloud rate distortion coding method and related device based on perceptual weighting - Google Patents


Info

Publication number
CN117376568A
CN117376568A (application CN202310183247.0A)
Authority
CN
China
Prior art keywords
projection image
distortion
point cloud
projection
coding
Prior art date
Legal status
Pending
Application number
CN202310183247.0A
Other languages
Chinese (zh)
Inventor
丁克勤 (Ding Keqin)
张云 (Zhang Yun)
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202310183247.0A priority Critical patent/CN117376568A/en
Publication of CN117376568A publication Critical patent/CN117376568A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/62 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding by frequency transforming in three dimensions

Abstract

The application discloses a point cloud rate distortion coding method based on perceptual weighting and a related device. The method comprises: obtaining a projection image corresponding to a point cloud to be encoded and a plurality of reference projection images corresponding to the projection image; obtaining gradient weights and structural similarity distortion degrees for the coding blocks of the projection image and of each reference projection image, and calculating the perceived distortion degree of the projection image based on the gradient weights and structural similarity distortion degrees of the coding blocks; and determining the rate distortion cost corresponding to the point cloud to be encoded according to the perceived distortion degree of the projection image, and encoding the projection image based on the rate distortion cost. The method generates two-dimensional projection images through projection, determines the perceived distortion degree of each projection image on a per-coding-block basis, and makes coding decisions based on the perceived distortion degree, so that point cloud coding is better matched to visual perception quality and coding efficiency is improved.

Description

Point cloud rate distortion coding method and related device based on perceptual weighting
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a point cloud rate distortion encoding method and related device based on perceptual weighting.
Background
In recent years, three-dimensional technologies such as Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR) have become popular in many applications, such as 3D movie viewing, heritage protection, navigation, immersive telephony, and tele-surgery, because they can provide users with unique six-degree-of-freedom (6DoF) interactive, realistic, and immersive 3D visual experiences. Among these, Dynamic Point Clouds (DPCs) have become one of the mainstream representations of emerging immersive VR, AR, and MR media due to their realistic rendering capabilities.
A Dynamic Point Cloud (DPC) represents a three-dimensional scene with a large number of unstructured high-dimensional points. A DPC is a series of temporally continuous point clouds that reflect motion and temporal illumination changes; each point includes a geometric component identifying its location in three-dimensional space and attribute information reflecting light and object properties, such as RGB color, reflectance, transparency, and the like. However, because of the large number of high-dimensional points involved, dynamic point clouds also produce a huge amount of data, which requires large storage space and large network bandwidth to transfer.
To solve this problem, a dynamic point cloud needs to be compressed before it is stored and transmitted, so as to reduce the required storage space and transmission bandwidth. The compression technology currently in common use for dynamic point clouds is the video-based point cloud compression (V-PCC) model, which measures distortion using signal differences between the original point cloud and the distorted point cloud, such as the point-to-point (D1), point-to-plane (D2), and point-to-mesh (P2mesh) metrics. However, the existing V-PCC model does not consider the visual perception quality of the dynamic point cloud and cannot exploit its perceptual redundancy, which limits coding efficiency.
There is thus a need for improvement and development in the art.
Disclosure of Invention
The technical problem to be solved by the application is to provide a point cloud rate distortion coding method based on perceptual weighting and a related device aiming at the defects of the prior art.
In order to solve the above technical problem, a first aspect of an embodiment of the present application provides a point cloud rate distortion coding method based on perceptual weighting, where the method includes:
obtaining a projection image corresponding to a point cloud to be encoded and a plurality of reference projection images corresponding to the projection image, wherein the projection image and the plurality of reference projection images each comprise a geometric projection image and a texture projection image;
obtaining gradient weights and structural similarity distortion degrees of the coding blocks of the projection image and of each reference projection image, and calculating the perceived distortion degree of the projection image based on the gradient weights and the structural similarity distortion degrees of the coding blocks;
and determining the rate distortion cost corresponding to the point cloud to be encoded according to the perceived distortion degree of the projection image, and encoding the projection image based on the rate distortion cost.
In the above point cloud rate distortion coding method based on perceptual weighting, the reference projection images are obtained by downsampling the projection image, and the image scales of the projection image and the plurality of reference projection images are different from each other.
The point cloud rate distortion coding method based on the perceptual weighting, wherein the gradient weight acquisition process specifically comprises the following steps:
and calculating the gradient value of each pixel point in the coding block, and calculating the gradient weight of the coding block based on the gradient value of each pixel point.
The method for rate-distortion encoding of point clouds based on perceptual weighting, wherein the determining the rate-distortion cost corresponding to the point clouds to be encoded according to the perceived distortion degree of the projection image specifically comprises:
converting the structural similarity distortion degree in the perceived distortion degree into a mean square error distortion degree to obtain a perception coefficient corresponding to the projection image;
determining a perceived Lagrangian multiplier corresponding to the projection image based on the perceived coefficient;
and determining the rate distortion cost corresponding to the point cloud to be encoded based on the perceived distortion degree and the perceived Lagrangian multiplier.
In the above point cloud rate distortion coding method based on perceptual weighting, the correspondence between the structural similarity distortion degree and the mean square error distortion degree is:

D_{k,i}^{SSIM} = D_{k,i}^{MSE} / Φ_{k,i},  with  Φ_{k,i} = E_i · (1/N) · Σ_{n=1..N} ρ_n

wherein D_{k,i}^{SSIM} denotes the structural similarity distortion degree of coding block i of the kth target projection image, D_{k,i}^{MSE} denotes the mean square error distortion degree of coding block i of the kth target projection image, Φ_{k,i} denotes the perceptual coefficient of coding block i of the kth target projection image, blk_n denotes the nth sub-image block of coding block i, ρ_n denotes the linear model parameter of the nth sub-image block blk_n, N denotes the number of sub-image blocks, and E_i denotes the pixel content weight. The kth target projection image is a projection image in the projection image set formed by the projection image and the plurality of reference projection images.
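A minimal numeric sketch of this correspondence, under the assumption (the exact form is fixed by the filing's equation) that the perceptual coefficient Φ_{k,i} is the pixel content weight E_i times the mean of the sub-block linear-model parameters ρ_n; function names are illustrative:

```python
def perceptual_coefficient(rho, pixel_content_weight):
    """Phi_{k,i}: E_i times the mean linear-model parameter over the
    block's N sub-image blocks (reconstructed, assumed aggregation)."""
    return pixel_content_weight * sum(rho) / len(rho)

def ssim_to_mse_distortion(d_ssim, phi):
    """Equivalent MSE distortion: D_MSE = Phi * D_SSIM, i.e. the inverse
    of the relation D_SSIM = D_MSE / Phi."""
    return phi * d_ssim

# Example: two sub-blocks with linear-model parameters 2.0 and 4.0,
# pixel content weight 0.5.
phi = perceptual_coefficient([2.0, 4.0], 0.5)
d_mse = ssim_to_mse_distortion(2.0, phi)
```

With these numbers, phi is 1.5 and the equivalent MSE distortion of an SSIM distortion of 2.0 is 3.0.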
In the above method, when the perceptual Lagrangian multiplier is used for encoding, determining the perceptual Lagrangian multiplier corresponding to the projection image based on the perceptual coefficients specifically comprises:
calculating a first target perception coefficient based on the perception coefficient of each coding block of the projection image, and calculating the ratio of the first target perception coefficient to the perception coefficient of the coding block to obtain a first perception coefficient ratio;
and calculating the product of the first perception coefficient ratio and the mean square error Lagrange multiplier of the projection image to obtain the perception Lagrange multiplier corresponding to the coding block.
In the above method, when the perceptual Lagrangian multiplier is used for mode decision, determining the perceptual Lagrangian multiplier corresponding to the projection image based on the perceptual coefficients specifically comprises:
calculating a second target perception coefficient based on the perception coefficients of the coding blocks of the projection image, and calculating the ratio of the second target perception coefficient to the square root of the perception coefficient of the coding block to obtain a second perception coefficient ratio;
and calculating the product of the second perceptual coefficient ratio and the square root of the mean square error Lagrangian multiplier of the projection image to obtain a perceptual Lagrangian multiplier corresponding to the coding block.
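The two multiplier constructions above can be sketched together. This is illustrative only: the filing does not spell out here how the first and second target perception coefficients are aggregated from the per-block coefficients, so the mean is an assumption, as are all names:

```python
import math

def lambda_for_encoding(phi_blocks, phi_i, lambda_mse):
    """Perceptual multiplier for encoding: the ratio of the target
    coefficient (assumed: mean over the image's blocks) to the block's
    own coefficient, times the MSE Lagrangian multiplier."""
    phi_target = sum(phi_blocks) / len(phi_blocks)
    return (phi_target / phi_i) * lambda_mse

def lambda_for_mode_decision(phi_blocks, phi_i, lambda_mse):
    """Perceptual multiplier for mode decision: the square roots of the
    block coefficient and of the MSE multiplier enter the ratio/product."""
    phi_target = sum(phi_blocks) / len(phi_blocks)
    return (phi_target / math.sqrt(phi_i)) * math.sqrt(lambda_mse)
```

For example, with per-block coefficients [1.0, 3.0], a block coefficient of 2.0 and an MSE multiplier of 8.0, the encoding multiplier stays 8.0 because the block coefficient equals the mean.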
A second aspect of an embodiment of the present application provides a point cloud rate distortion coding system based on perceptual weighting, the system comprising:
the acquisition module is used for obtaining a projection image corresponding to the point cloud to be encoded and a plurality of reference projection images corresponding to the projection image, wherein the projection image and the plurality of reference projection images each comprise a geometric projection image and a texture projection image;
the computing module is used for acquiring the gradient weights and the structural similarity distortion degrees of the coding blocks of the projection image and each reference projection image, and computing the perceived distortion degrees of the projection image based on the gradient weights and the structural similarity distortion degrees of the coding blocks;
and the encoding module is used for determining the rate distortion cost corresponding to the point cloud to be encoded according to the perceived distortion degree of the projection image and encoding the projection image based on the rate distortion cost.
A third aspect of the embodiments of the present application provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the point cloud rate distortion coding method based on perceptual weighting as described in any of the above.
A fourth aspect of the present embodiment provides a terminal device, including: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in the point cloud rate distortion coding method based on perceptual weighting as described in any of the above.
Beneficial effects: compared with the prior art, the present application provides a point cloud rate distortion coding method based on perceptual weighting and a related device. The method comprises: obtaining a projection image corresponding to a point cloud to be encoded and a plurality of reference projection images corresponding to the projection image; obtaining gradient weights and structural similarity distortion degrees for the coding blocks of the projection image and of each reference projection image, and calculating the perceived distortion degree of the projection image based on the gradient weights and the structural similarity distortion degrees of the coding blocks; and determining the rate distortion cost corresponding to the point cloud to be encoded according to the perceived distortion degree of the projection image, and encoding the projection image based on the rate distortion cost. The method generates two-dimensional projection images through projection, determines the perceived distortion degree of each projection image on a per-coding-block basis, and makes coding decisions based on the perceived distortion degree, so that point cloud coding is better matched to visual perception quality and coding efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without creative effort for a person of ordinary skill in the art.
Fig. 1 is a flowchart of a point cloud rate distortion coding method based on perceptual weighting.
Fig. 2 is a schematic flow chart of a point cloud rate distortion coding method based on perceptual weighting.
Fig. 3 is a scatter plot of the correlation between the approximation model and the original model on the texture projection map.
Fig. 4 is a scatter plot of the correlation between the approximation model and the original model on the geometric projection map.
Fig. 5 is a schematic structural diagram of a point cloud rate distortion coding system based on perceptual weighting.
Fig. 6 is a schematic structural diagram of a terminal device provided in the present application.
Detailed Description
The application provides a point cloud rate distortion coding method based on perceptual weighting and a related device, and in order to make the purposes, technical schemes and effects of the application clearer and more definite, the application is further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
It should be understood that the sequence numbers of the steps in this embodiment do not imply an order of execution; the execution order of the processes is determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
It has been found that in recent years, three-dimensional technologies such as Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR) have become popular in many applications, such as 3D movie viewing, heritage protection, navigation, immersive telephony, and tele-surgery, because they can provide users with unique six-degree-of-freedom (6DoF) interactive, realistic, and immersive 3D visual experiences. Among these, Dynamic Point Clouds (DPCs) have become one of the mainstream representations of emerging immersive VR, AR, and MR media due to their realistic rendering capabilities.
A Dynamic Point Cloud (DPC) represents a three-dimensional scene with a large number of unstructured high-dimensional points. A DPC is a series of temporally continuous point clouds that reflect motion and temporal illumination changes; each point includes a geometric component identifying its location in three-dimensional space and attribute information reflecting light and object properties, such as RGB color, reflectance, transparency, and the like. However, because of the large number of high-dimensional points involved, dynamic point clouds also produce a huge amount of data, which requires large storage space and large network bandwidth to transfer.
To solve this problem, a dynamic point cloud needs to be compressed before it is stored and transmitted, so as to reduce the required storage space and transmission bandwidth. The compression technology currently in common use for dynamic point clouds is the video-based point cloud compression (V-PCC) model, which measures distortion using signal differences between the original point cloud and the distorted point cloud, such as the point-to-point (D1), point-to-plane (D2), and point-to-mesh (P2mesh) metrics. However, the existing V-PCC model does not consider the visual perception quality of the dynamic point cloud and cannot exploit its perceptual redundancy, which limits coding efficiency.
For this reason, combining visual perception with coding has been a popular topic in the video and image processing fields. In existing research, Jiang et al. proposed a new perceptual coding scheme conforming to the H.265/HEVC standard by using temporal-spatial saliency. Wu et al. proposed a rate distortion (RD) model based on the perceptually weighted mean squared error (PWMSE) and derived the Lagrangian multiplier for the rate distortion optimization (RDO) process from the equivalent distortion. Based on WS-PSNR, Li et al. optimized the RDO task with a spherical evaluation index for 360-degree video. Li et al. also proposed an RDO method for spherical video to solve the problem of inaccurate calculation of distortion on the two-dimensional image plane. The basic idea of the above methods is to use an HVS model or its approximation as the distortion measure in the coding module and to choose the best coding mode or parameters under the constraint of minimizing perceived distortion. However, these methods were proposed for conventional 2D or 360-degree video coding and cannot be used for dynamic point clouds.
In order to exploit visual redundancy in point clouds, Li et al. adopted an occupancy-map-based RDO method that designs an occupancy-map-guided video compression framework, using occupancy maps to enhance DPC compression performance. Since the current RDO geometric distortion is inconsistent with the evaluation criteria D1 and D2, Xiong et al. proposed an EPM-based RDO method that first describes the relationship between the current distortion model and the geometric quality measure, and then modifies the three-dimensional geometric distance by estimating the normal vector of the CU so as to estimate D1 and D2. However, the quality of the reconstructed point cloud in these rate-distortion coding schemes is still measured by D1 and D2, which cannot truly reflect human perception. Therefore, existing point cloud coding schemes do not fully consider the perceptual characteristics of the point cloud and do not exploit its perceptual redundancy.
In order to solve the above problems, in an embodiment of the present application, a projection image corresponding to a point cloud to be encoded and a plurality of reference projection images corresponding to the projection image are obtained; gradient weights and structural similarity distortion degrees are obtained for the coding blocks of the projection image and of each reference projection image, and the perceived distortion degree of the projection image is calculated based on the gradient weights and the structural similarity distortion degrees of the coding blocks; the rate distortion cost corresponding to the point cloud to be encoded is then determined according to the perceived distortion degree of the projection image, and the projection image is encoded based on the rate distortion cost. The method generates two-dimensional projection images through projection, determines the perceived distortion degree of each projection image on a per-coding-block basis, and makes coding decisions based on the perceived distortion degree, so that point cloud coding is better matched to visual perception quality and coding efficiency is improved.
The application will be further described by the description of embodiments with reference to the accompanying drawings.
The embodiment provides a point cloud rate distortion coding method based on perceptual weighting, as shown in fig. 1 and fig. 2, the method includes:
s10, obtaining projection images corresponding to point clouds to be encoded and a plurality of reference projection images corresponding to the projection modules.
Specifically, the projection image and the plurality of reference projection images each comprise a geometric projection image and a texture projection image. The projection image is formed by performing patch segmentation and packing on the point cloud to be encoded; the geometric projection image and the texture projection image are two-dimensional images with the same image scale. Each of the plurality of reference projection images is an image group comprising a reference geometric projection image and a reference texture projection image, wherein the image scales of the reference geometric projection images in the respective reference projection images are different from each other and each differs from the image scale of the geometric projection image; likewise, the image scales of the reference texture projection images in the respective reference projection images are different from each other and each differs from the image scale of the texture projection image.
The reference projection images are obtained by downsampling the projection image; that is, each reference geometric projection image is obtained by downsampling the geometric projection image, and each reference texture projection image is obtained by downsampling the texture projection image. In one implementation, the plurality of reference projection images may be obtained by Gaussian-pyramid downsampling: the geometric projection image and the texture projection image included in the projection image are input into a Gaussian pyramid, and the outputs of the layers of the Gaussian pyramid are taken as the plurality of reference projection images. It will be appreciated that at each layer of the Gaussian pyramid, the resolution of the output is one half that of the input. For example, if k denotes the scale index, k=1 corresponds to the image scale of the projection image, and k=2 corresponds to downsampling the projection image once, so that the downsampled reference projection image has one half the image scale of the projection image, and so on, yielding the several projection images.
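As an illustration, the repeated 2x downsampling described above can be sketched in Python. The patent gives no code; the 2x2 average pooling here is an assumed stand-in for the Gaussian pyramid's smoothing-and-decimation step, and all names are illustrative:

```python
def downsample_2x(image):
    """Halve the resolution of a 2D image (list of rows) by 2x2 averaging."""
    h, w = len(image), len(image[0])
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w // 2)
        ]
        for r in range(h // 2)
    ]

def build_reference_images(projection_image, levels):
    """Return `levels` reference images, each half the scale of the previous."""
    refs, current = [], projection_image
    for _ in range(levels):
        current = downsample_2x(current)
        refs.append(current)
    return refs

# Example: a 4x4 geometry projection image gives 2x2 and 1x1 references.
img = [[1, 1, 3, 3],
       [1, 1, 3, 3],
       [5, 5, 7, 7],
       [5, 5, 7, 7]]
refs = build_reference_images(img, 2)
```

In an actual encoder, the same routine would be applied to both the geometric and the texture projection image so that each reference image group keeps matching scales.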
Of course, in practical applications, after the projection image is generated by projection, the plurality of reference projection images may be obtained in other ways. For example, the reference projection images may be ordered from large to small by image scale as a first reference projection image, a second reference projection image, through an nth reference projection image, where the first reference projection image is obtained by downsampling the projection image by a factor of 2, the second reference projection image is obtained by downsampling the projection image by a factor of 4, and so on up to the nth reference projection image. In addition, the relationship between the image scales of the n reference projection images may also differ; for example, the image scale of the projection image may be 3 times that of the first reference projection image, the image scale of the first reference projection image may be 3 times that of the second reference projection image, and so on.
S20, obtaining gradient weights and structural similarity distortion degrees of the coding blocks of the projection image and of each reference projection image, and calculating the perceived distortion degree of the projection image based on the gradient weights and the structural similarity distortion degrees of the coding blocks.
Specifically, the gradient weight is determined based on the gradients of the pixel points in the coding block, and the structural similarity distortion degree reflects the distortion of the distorted block corresponding to the coding block, where the distorted block is determined from the geometric reconstruction image and the texture reconstruction image formed from the reconstructed point cloud corresponding to the point cloud to be encoded. That is, when the projection image and the plurality of reference projection images are acquired, a reconstructed projection image of the reconstructed point cloud corresponding to the point cloud to be encoded is acquired synchronously, the reconstructed projection image comprising a geometric reconstruction image and a texture reconstruction image; the reconstructed projection image is then downsampled to obtain a plurality of reference reconstructed projection images. The reconstructed projection image and the plurality of reference reconstructed projection images are determined in the same way as the projection image and the plurality of reference projection images, which is not repeated here. It is only noted that the projection image corresponds to the reconstructed projection image, the plurality of reference projection images correspond one-to-one to the plurality of reference reconstructed projection images, and the image scale of each reference projection image is the same as that of its corresponding reference reconstructed projection image; that is, the image scale of the reference geometric projection image in a reference projection image equals the image scale of the reference geometric reconstruction image in the corresponding reference reconstructed projection image, and the image scale of the reference texture projection image in a reference projection image equals the image scale of the reference texture reconstruction image in the corresponding reference reconstructed projection image.
In one implementation manner, the gradient weight acquiring process specifically includes:
and calculating the gradient value of each pixel point in the coding block, and calculating the gradient weight of the coding block based on the gradient value of each pixel point.
Specifically, the gradient value of each pixel point in the coding block includes a horizontal gradient and a vertical gradient, which are calculated respectively as:

G_{i,j}^{H} = F_H * I_i(j),  G_{i,j}^{V} = F_V * I_i(j)

wherein F_H and F_V denote the gradient operators in the horizontal and vertical directions respectively, I_i denotes the ith coding block, of size M x N, j denotes the jth pixel point in the coding block, and * denotes convolution.
After obtaining the pixel gradient value of each pixel point, the candidate gradient value of the coding block is calculated from the pixel gradient values of the pixel points in the coding block; the calculation formula of the candidate gradient value may be:

G̃_i = (1 / (M·N)) · Σ_j sqrt( (G_{i,j}^{H})^2 + (G_{i,j}^{V})^2 )

wherein G̃_i denotes the candidate gradient value of the ith coding block.
And after the candidate gradient values of each coding block, carrying out normalization operation on each candidate gradient value to obtain the gradient value of each coding block, wherein the gradient value of each coding block is used for reflecting the importance degree of the coding block relative to all the coding blocks.
The gradient value is calculated as follows:
where W_i denotes the gradient value of the coding block and Nomal(·) denotes the normalization operation, computed as:
where the C(·) operation constrains the calculated values to [0, 1], i.e., a value less than 0 is set to 0 and a value greater than 1 is set to 1, μ_x denotes the mean of the x_i, and σ_x denotes the standard deviation of the x_i.
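The gradient-weight computation described above can be sketched as follows. The concrete gradient operator F_H and the block-level aggregate are not reproduced in this excerpt, so a Sobel-style kernel and the mean gradient magnitude are assumed here; the normalization follows the described standardize-then-clip rule.

```python
import numpy as np

def gradient_weights(blocks):
    """Per-block gradient weights from horizontal/vertical pixel gradients.

    `blocks` is a list of 2-D numpy arrays (the M x N coding blocks of one
    projection image).  The gradient operator is an assumption (the patent
    does not reproduce F_H in this excerpt); a Sobel-style kernel is used.
    """
    fh = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal
    fv = fh.T                                                          # vertical

    def conv2(img, k):
        # 'valid' 2-D convolution, small and dependency-free
        h, w = img.shape
        kh, kw = k.shape
        out = np.zeros((h - kh + 1, w - kw + 1))
        for y in range(out.shape[0]):
            for x in range(out.shape[1]):
                out[y, x] = np.sum(img[y:y + kh, x:x + kw] * k[::-1, ::-1])
        return out

    # candidate gradient value per block: mean gradient magnitude (assumed aggregate)
    cand = np.array([
        np.mean(np.hypot(conv2(b, fh), conv2(b, fv))) for b in blocks
    ])
    # normalization: standardize over all blocks, then clip to [0, 1]
    mu, sigma = cand.mean(), cand.std()
    return np.clip((cand - mu) / (sigma + 1e-12), 0.0, 1.0)
```

Blocks with strong edges thus receive weights near 1, flat blocks weights near 0, matching the stated role of the weight as a relative importance measure.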
Further, the structural similarity distortion is the structural similarity distortion between a coding block and its corresponding reconstructed coding block. After the gradient weight and the structural similarity distortion of each coding block are obtained, the perceived distortion corresponding to the projection image is determined based on them. The texture-block distortion of the i-th texture distortion block in the k-th target projection image of the projection image set formed by the projection image and the plurality of reference projection images can be expressed as:
where W_{k,i} denotes the gradient weight of the i-th coding block in the k-th image, and D^{SSIM}_{k,i} denotes the SSIM distortion of the i-th coding block relative to the i-th reconstructed coding block in the k-th image.
Of course, it should be noted that when calculating the perceived distortion of the coding blocks, the geometric projection image and the texture projection image each yield their own perceived distortion; since the two calculations are identical, this embodiment describes the determination process using "the projection image" generically.
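The gradient-weighted distortion just described can be sketched as follows. Plain (non-windowed) block SSIM and a weighted sum over blocks are assumptions here, since the exact SSIM window and pooling rule are not reproduced in this excerpt.

```python
import numpy as np

def block_ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Plain (non-windowed) SSIM of one coding block vs. its reconstruction."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def perceived_distortion(blocks, recon_blocks, weights):
    """Gradient-weighted SSIM distortion of one projection image.

    Per-block distortion is taken as 1 - SSIM (a common SSIM distortion
    measure); the combination over blocks is assumed to be a weighted sum.
    """
    d = 0.0
    for b, r, w in zip(blocks, recon_blocks, weights):
        d += w * (1.0 - block_ssim(b, r))
    return d
```

A perfectly reconstructed image yields zero distortion; distortion in high-gradient (high-weight) blocks is penalized more than the same distortion in flat blocks.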
S30, determining the rate distortion cost corresponding to the point cloud to be encoded according to the perceived distortion degree of the projection image, and encoding the projection image based on the rate distortion cost.
Specifically, the objective function of the rate-distortion cost is J = D + λ·R,
where J denotes the total rate-distortion (RD) cost, D denotes the distortion between the point cloud to be encoded and the reconstructed point cloud corresponding to the unit to be encoded, R denotes the coding bits, and λ denotes the Lagrangian multiplier used to balance distortion and bit rate.
Based on this, the objective function of the rate-distortion cost corresponding to the point cloud to be encoded, determined according to the perceived distortion of the projection image, can be expressed as J = D_PPCM + λ·R,
where D_PPCM denotes the perceived distortion.
Since the V-PCC standard encodes geometry video and texture video separately using HEVC, their compression is treated as two independent processes: the distortion of the texture video is related only to the texture video coding parameters, and the distortion of the geometry video is related only to the geometry video coding parameters. Thus, the rate-distortion objectives of the geometry projection image and the texture projection image are expressed independently as:
J_G = D_G^PPCM + λ_G · R_G and J_T = D_T^PPCM + λ_T · R_T, where J_G denotes the independent objective of the geometry projection image and J_T that of the texture projection image; when encoding geometry, R_T is a constant, and when encoding texture, R_G is a constant.
In addition, since the coding decision process of the texture projection image is the same as that of the geometric projection image, only the coding decision process itself is described here, without distinguishing between texture and geometry; both the texture projection image and the geometric projection image can adopt the following coding decision process.
In one implementation, the determining of the rate-distortion cost corresponding to the point cloud to be encoded according to the perceived distortion of the projection image specifically includes:
converting the structural similarity distortion degree in the perceived distortion degree into a mean square error distortion degree to obtain a perception coefficient corresponding to the projection image;
determining a perceived Lagrangian multiplier corresponding to the projection image based on the perceived coefficient;
and determining code rate distortion cost corresponding to the point cloud to be encoded based on the perceived distortion degree and the perceived Lagrangian multiplier.
In particular, under the high-resolution quantization approximation, the encoding process preserves as much luminance information as possible. Moreover, for different video contents, even at high QP settings, the Pearson correlation coefficient (PCC) between the block means μ before and after encoding exceeds 0.99. Based on this, the SSIM value of the i-th pixel can be rewritten as:
where e_i² denotes the pixel-level squared error, E_i denotes the pixel content weight, M denotes the number of coding blocks, C_2 is the C_2 constant in SSIM, ε_t denotes the filter coefficient at the t-th position of the current coding block, and σ²_{i−t} and σ̂²_{i−t} denote the variances at position i−t before and after encoding of the coding block.
Further, in order to reduce the amount of computation, the SSIM distortion of the coding block is set to the average over its 16 sub-blocks, with n denoting the n-th sub-image block of the coding block, that is:
the following relationship is known for coding units in HEVC
Wherein ρ is n Is a linear model parameter related to image content, wherein the previous frame is encodedThe result of the code co-located block is taken as reference image content (except the first frame), Q n Is the quantization step size of a sub-block and is typically applied to all sub-blocks with the same value.
For coding blocks, D_MSE can be expressed as:
where D^{SSIM}_{k,i} denotes the structural similarity distortion of coding block i of the k-th projection image, D^{MSE}_{k,i} denotes the mean-square-error distortion of coding block i of the k-th projection image, Φ_{k,i} denotes the perceptual coefficient of coding block i of the k-th projection image, blk_n denotes the n-th sub-image block of coding block i, ρ_n denotes the linear model parameter of the n-th sub-image block of coding block i, N denotes the number of sub-image blocks, and E_i denotes the pixel content weight; the k-th projection image is a projection image in the projection image set formed by the projection image and the reference projection images.
Based on this, after an appropriate transformation, the loss in image MSE caused by video compression can be approximated by the SSIM loss, i.e., D_PPCM can be expressed as:
Further, when the perceptual coefficients are obtained, the perceptual Lagrangian multiplier corresponding to the projection image is determined; the determination process may be as follows.
When the PPCM-based distortion metric D_PPCM is applied to the RD objective function in V-PCC, the Lagrangian multipliers λ_T and λ_G need to be adapted so that, in each channel φ (where the channels comprise a texture channel and a geometry channel, both denoted φ here), an optimal trade-off is made between the perceived distortion D_φ^PPCM and the bit rate R_φ. The MSE-based rate-distortion cost at each coding block i is calculated as J_i^MSE = D_i^MSE + λ_MSE · R_i^MSE,
where D_i^MSE, R_i^MSE and λ_MSE are the MSE-based distortion, bit rate and Lagrangian multiplier in HEVC; the relationship between the rate R_i^MSE and the distortion D_i^MSE can be modeled as
D_i^MSE = α · σ_i² · 2^(−2·R_i^MSE), where σ_i² is the variance of the coded residual after inter or intra prediction and α is a proportionality constant. Taking the partial derivative of J_i^MSE with respect to R_i^MSE and setting it to zero gives:
Solving the above yields the optimal R_i^MSE and D_i^MSE as R_i^MSE = (1/2)·log2(2·ln2·α·σ_i²/λ_MSE) and D_i^MSE = λ_MSE/(2·ln2).
The total bit rate of a video is the sum of the bits of all blocks, which can be calculated as R_MSE = Σ_{i=1}^{M} R_i^MSE,
where M is the number of coding blocks in a frame/video.
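The closed-form optimum implied by the derivation above can be checked numerically. The exponential model D = α·σ²·2^(−2R) is assumed from the description of σ² and α; the solved forms follow from setting dJ/dR = 0.

```python
import math

def optimal_rate_distortion(alpha, sigma2, lam):
    """Closed-form optimum of J(R) = alpha*sigma2*2^(-2R) + lam*R.

    Follows the classic exponential R-D model the description points to
    (residual variance sigma2, proportionality constant alpha); the closed
    forms come from setting the partial derivative dJ/dR to zero.
    """
    r_opt = 0.5 * math.log2(2 * math.log(2) * alpha * sigma2 / lam)
    d_opt = lam / (2 * math.log(2))
    return r_opt, d_opt
```

A numerical derivative of J at the returned rate is zero, confirming that the solved forms are the stationary point of the cost.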
Likewise, taking the partial derivative of the perceptually weighted rate-distortion objective with respect to D_MSE, and setting the partial derivative to zero, gives:
Solving the above yields:
The total number of bits of the encoded channel phi is calculated as:
Based on this, the perceptual Lagrangian multiplier can be determined from the total number of bits of the coding channel φ.
In one implementation, when the perceptual Lagrangian multiplier is used for encoding, the determining of the perceptual Lagrangian multiplier corresponding to the projection image based on the perceptual coefficients specifically includes:
calculating a first target perception coefficient based on the perception coefficient of each coding block of the projection image, and calculating the ratio of the first target perception coefficient to the perception coefficient of the coding block to obtain a first perception coefficient ratio;
and calculating the product of the first perception coefficient ratio and the mean square error Lagrange multiplier of the projection image to obtain the perception Lagrange multiplier corresponding to the coding block.
Specifically, in the V-PCC model, the attribute and geometry videos are encoded by an MSE-based encoder. The MSE-based total bit rate R_MSE of V-PCC is equal to the perceived-distortion-based total bit rate R_φ of V-PCC for the different point cloud coding channels φ, i.e., R_MSE = R_φ; from this, the relation between λ_φ and λ_MSE can be obtained:
Thus, based on equations (1) and (2), the RD cost used for the mode decision is:
Accordingly, the perceptual Lagrangian multiplier is:
In one implementation, when the perceptual Lagrangian multiplier is used for mode decision, the determining of the perceptual Lagrangian multiplier corresponding to the projection image based on the perceptual coefficients specifically includes:
calculating a second target perception coefficient based on the perception coefficient of each coding block of the projection image, and calculating the ratio of the second target perception coefficient to the square root of the perception coefficient of the coding block to obtain a second perception coefficient ratio;
and calculating the product of the second perceptual coefficient ratio and the square root of the average absolute error Lagrangian multiplier of the projection image to obtain a perceptual Lagrangian multiplier corresponding to the coding block.
In conventional MSE-based video encoders, SAD/MAD is used as the distortion term in rate-distortion optimization to avoid the squaring operation and keep the complexity low. The RD cost is calculated as:
where D^MAD, R^MAD and λ_MAD are the MAD-based distortion, bit rate and Lagrangian multiplier in HEVC, respectively; the Lagrangian multiplier of motion estimation, λ_ME, is likewise related to λ_MSE of equation (1) by a square root.
Thus, the RD cost is:
Accordingly, the perceptual Lagrangian multiplier based on the perceived distortion is updated as:
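The two multiplier variants above can be sketched as follows. The "target perceptual coefficient" aggregate is not specified in this excerpt, so the arithmetic mean is assumed; the second (mode-decision) variant is the element-wise square root of the first, consistent with the square-root relation between the SAD/MAD and MSE multipliers.

```python
import numpy as np

def perceptual_lambdas(phi, lam_mse):
    """Per-block perceptual Lagrangian multipliers from perceptual coefficients.

    `phi` holds the perceptual coefficient Phi_{k,i} of each coding block of
    one projection image.  Returns (lam_enc, lam_pred): the encoding-stage
    multiplier (phi_bar / phi_i) * lam_mse and the SAD/MAD mode-decision
    multiplier, its element-wise square root, matching the two variants
    described above.  The aggregate phi_bar is an assumption (mean).
    """
    phi = np.asarray(phi, dtype=float)
    phi_bar = phi.mean()                                   # assumed aggregate
    lam_enc = (phi_bar / phi) * lam_mse                    # first variant
    lam_pred = np.sqrt(phi_bar / phi) * np.sqrt(lam_mse)   # second variant
    return lam_enc, lam_pred
```

Blocks with higher perceptual coefficients (distortion less visible) get smaller multipliers and thus spend fewer bits, which is the intended perceptual bit reallocation.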
to further illustrate the encoding process of this embodiment, the V-PCC encoding standard maps the geometric features and texture features of the three-dimensional point cloud into two video sequences, respectively, recorded as a geometric projection sequence and a texture projection sequence, and then compresses the video sequence of the dynamic point cloud using the existing HEVC/VVC or other video encoder, where some meta information needs to be generated during the encoding process, such as: the occupancy map and the auxiliary block information are used to describe the two video sequences, and the encoding process is the same as the existing process and will not be described in detail here.
A point cloud projection patch (Patch) in V-PCC is a collection of information comprising the three-dimensional bounding box of the point cloud, the related geometric and texture information, and the atlas information needed for three-dimensional reconstruction. Patch division projects each frame of the point cloud from three-dimensional space onto given two-dimensional planes and divides the point cloud into as few patches with smooth boundaries as possible, so as to minimize the reconstruction error. The projection proceeds as follows: first, the normal vector of each point in the input point cloud is computed; then the whole point cloud is clustered and projected onto the six faces of a cube according to the normal vectors, forming an initial clustering. Next, the cluster index of each point is iteratively updated according to its normal vector and the indices of its nearest points, achieving a finer patch division. Finally, connected patches are extracted by a connected-component method and merged into larger patches, yielding the final patch set. The patch set is then packed and reorganized on the two-dimensional plane, converting the geometric and texture information of three-dimensional space into two two-dimensional images; this realizes the generation and padding of the geometry and texture images and produces the geometry projection image sequence and the texture projection image sequence.
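The initial clustering step of the patch division above can be sketched as follows. Normal estimation itself (e.g., by local PCA) is assumed to have been done already, and only the assignment to the six cube faces is shown; the subsequent iterative refinement is omitted.

```python
import numpy as np

# The six axis-aligned projection directions of the V-PCC cube.
CUBE_NORMALS = np.array([
    [ 1, 0, 0], [-1, 0, 0],
    [ 0, 1, 0], [ 0, -1, 0],
    [ 0, 0, 1], [ 0, 0, -1],
], dtype=float)

def initial_clustering(normals):
    """Assign every point to the cube face whose normal it best matches.

    `normals` is an (N, 3) array of unit point normals.  This reproduces
    only the initial clustering step of V-PCC patch generation: each point
    goes to the face maximizing the dot product with its normal.
    """
    scores = normals @ CUBE_NORMALS.T      # (N, 6) dot products
    return np.argmax(scores, axis=1)       # face index per point
```

The refinement stage then smooths these indices over nearest neighbors before connected patches are extracted and merged.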
The following description is given on the premise that a geometric projection map sequence and a texture projection map sequence have been acquired, and the encoding process of the geometric projection map sequence and the texture projection map sequence includes:
a) Performing geometry projection image analysis to obtain the perceptual coefficient of each coding block in the geometry projection image, where, except for the first reference frame, the parameter ρ_n of equation (2) is obtained based on the current coding block and the encoded co-located block of the previous frame;
b) Determining a perceptual Lagrangian multiplier based on the perceptual coefficients;
c) Encoding the current geometry projection frame; if all the geometry projection images corresponding to the video frames have been coded, proceeding to step d) for texture coding, otherwise returning to step a) to code the next frame;
d) Performing texture projection image analysis to obtain the perceptual coefficient of each coding block in the texture projection image, where, except for the first reference frame, the parameter ρ_n of equation (2) is obtained based on the current coding block and the encoded co-located block of the previous frame;
e) Determining a perceptual Lagrangian multiplier based on the perceptual coefficients;
f) Encoding the current texture projection image, then returning to step d) to encode the next texture projection image, until all texture projection images are encoded.
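The steps a)-f) above can be sketched as a two-pass driver. `analyse` and `encode` are placeholder hooks for the real per-block analysis and HEVC encoding (their names are illustrative, not part of any actual API), and the mean is assumed as the target perceptual coefficient.

```python
def encode_sequence(frames, analyse, encode, lam_mse):
    """Two-pass V-PCC-style driver for steps a)-f) above.

    `frames` is a sequence of (geometry_image, texture_image) pairs (it is
    iterated twice); `analyse` returns per-block perceptual coefficients and
    `encode` performs the actual video encoding -- both are placeholders for
    the real encoder hooks.
    """
    # steps a)-c): geometry channel first
    for geo, _ in frames:
        phi = analyse(geo)                                         # a) coefficients
        lam = [lam_mse * (sum(phi) / len(phi)) / p for p in phi]   # b) multipliers
        encode(geo, lam)                                           # c) encode frame
    # steps d)-f): then the texture channel
    for _, tex in frames:
        phi = analyse(tex)                                         # d)
        lam = [lam_mse * (sum(phi) / len(phi)) / p for p in phi]   # e)
        encode(tex, lam)                                           # f)
```

All geometry frames are finished before any texture frame starts, mirroring the ordering in steps c) and d).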
In summary, this embodiment provides a perceptually weighted point cloud rate-distortion encoding method that comprises: acquiring a projection image corresponding to the point cloud to be encoded and a plurality of reference projection images corresponding to that projection image; obtaining the gradient weight and structural similarity distortion of each coding block of the projection image and of each reference projection image, and calculating the perceived distortion of the projection image based on them; and determining the rate-distortion cost corresponding to the point cloud to be encoded according to the perceived distortion, and encoding the projection image based on that cost. By generating two-dimensional projection images through projection, determining the perceived distortion of each projection image on a per-coding-block basis, and making coding decisions based on the perceived distortion, the method improves the match between point cloud coding and visual perceptual quality and thus improves the coding efficiency.
Further, to illustrate the accuracy of this embodiment, a statistical analysis was performed on the Longdress and Andrew point clouds. Each point cloud was compressed with 17 quantization parameter (QP) pairs, ranging from 20 to 32 and from 27 to 42, and projected onto a two-dimensional image decomposed into 400 blocks, giving 400 × 17 sample blocks. As shown in Figs. 3 and 4, R² is used to measure the accuracy of the approximation model between the measured distortion and its model prediction; the average R² values over the attribute and geometry channels are 0.9732 and 0.9556 respectively, indicating that the perceptual approximation is accurate.
To verify the coding efficiency of the proposed method, it is implemented on the V-PCC reference software TMC2-10.0 and the corresponding HEVC reference software HM16.20-SCM8.8. The latest V-PCC (TMC2-10.0 + HM16.20-SCM8.8) is used as the anchor for comparison. In addition, two state-of-the-art V-PCC RDO methods are used as benchmarks: the occupancy-map-based RDO, denoted "OC-RDO", and the EPM-based RDO, denoted "EPM-RDO". Since the proposed coding method applies to both intra and inter coding of V-PCC, V-PCC is configured with the random-access (RA) coding scheme for geometry and texture coding. Five bit-rate points, from low (r1) to high (r5), meeting the V-PCC common test conditions (CTC) are selected for testing. For a fair comparison, GraphSIM (independent of the proposed PPCM) is used to measure the perceived quality of the compressed point clouds under the different coding schemes, and the Bjontegaard delta bit rate (BDBR) is used to compare the RD performance of the proposed and baseline schemes.
Table 1 shows the RD comparison on the 6 tested DPCs, where the visual quality of the compressed video is measured with GraphSIM and R_T denotes the total bit rate (Mbps). OC-RDO achieves a bit-rate reduction of 0.34% to 17.05% over V-PCC, 9.61% on average; EPM-RDO achieves BDBR savings of −4.08% to 5.61%, 1.17% on average. The proposed PWRDO achieves a BDBR reduction of 7.09% to 22.79%, 13.52% on average, compared with V-PCC, clearly exceeding the OC-RDO and EPM-RDO coding optimizations. To evaluate the perceptual coding efficiency of the proposed PWRDO, five different perceptual PCQA metrics, including D1, GraphSIM, MPED, SIAT-PCQA and the proposed PPCM, are used to measure the visual quality of the compressed DPCs under the different schemes. A BDBR is then calculated for each PCQA metric, where the total coded bits include geometry bits, attribute bits and metadata bits. A negative BDBR means bits are saved, while a positive one indicates reduced coding efficiency compared with the anchor. Table 1 shows the RD comparison of the proposed PWRDO and the baseline RDO schemes under the five perceptual PCQA metrics; these comparison results demonstrate that the proposed PWRDO stably obtains higher coding gains under all PCQA metrics, confirming its effectiveness.
Table 1. Gains of the proposed method and two point cloud rate-distortion optimization methods on five evaluation metrics
Based on the above-mentioned point cloud rate distortion coding method based on perceptual weighting, the present embodiment provides a point cloud rate distortion coding system based on perceptual weighting, as shown in fig. 5, the system includes:
the acquisition module 100 is configured to acquire a projection image corresponding to a point cloud to be encoded and a plurality of reference projection images corresponding to the projection image, where the projection image and the plurality of reference projection images each include a geometric projection map and a texture projection map;
the computing module 200 is configured to obtain gradient weights and structural similarity distortion degrees of each encoding block of the projection image and each reference projection image, and compute a perceived distortion degree of the projection image based on the gradient weights and the structural similarity distortion degrees of each encoding block;
the encoding module 300 is configured to determine a rate distortion cost corresponding to the point cloud to be encoded according to a perceived distortion degree of a projection image, and encode the projection image based on the rate distortion cost.
Based on the above-mentioned point cloud rate distortion coding method based on perceptual weighting, the present embodiment provides a computer readable storage medium storing one or more programs executable by one or more processors to implement the steps in the point cloud rate distortion coding method based on perceptual weighting as described in the above-mentioned embodiment.
Based on the above-mentioned point cloud rate distortion coding method based on perceptual weighting, the present application also provides a terminal device, as shown in fig. 6, which includes at least one processor (processor) 20; a display screen 21; and a memory (memory) 22, which may also include a communication interface (Communications Interface) 23 and a bus 24. Wherein the processor 20, the display 21, the memory 22 and the communication interface 23 may communicate with each other via a bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may invoke logic instructions in the memory 22 to perform the methods of the embodiments described above.
Further, the logic instructions in the memory 22 described above may be implemented in the form of software functional units and stored in a computer readable storage medium when sold or used as a stand alone product.
The memory 22, as a computer readable storage medium, may be configured to store a software program, a computer executable program, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 performs functional applications and data processing, i.e. implements the methods of the embodiments described above, by running software programs, instructions or modules stored in the memory 22.
The memory 22 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the terminal device, etc. In addition, the memory 22 may include high-speed random access memory, and may also include nonvolatile memory. For example, a plurality of media capable of storing program codes such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or a transitory storage medium may be used.
In addition, the specific processes that the storage medium and the plurality of instruction processors in the terminal device load and execute are described in detail in the above method, and are not stated here.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A point cloud rate distortion coding method based on perceptual weighting, the method comprising:
obtaining a projection image corresponding to a point cloud to be encoded and a plurality of reference projection images corresponding to the projection image, wherein the projection image and the plurality of reference projection images each comprise a geometric projection map and a texture projection map;
obtaining gradient weights and structural similarity distortion degrees of all coding blocks of a projection image and all reference projection images, and calculating the perceived distortion degrees of the projection image based on the gradient weights and the structural similarity distortion degrees of all the coding blocks;
and determining the rate distortion cost corresponding to the point cloud to be encoded according to the perceived distortion degree of the projection image, and encoding the projection image based on the rate distortion cost.
2. The perceptually weighted point cloud rate distortion coding method of claim 1, wherein the reference projected image is obtained by downsampling the projected image, and the projected image and the plurality of reference projected images differ in image scale from each other.
3. The method for encoding the point cloud rate distortion based on the perceptual weighting according to claim 1, wherein the gradient weight obtaining process specifically comprises:
calculating the gradient value of each pixel point in the coding block, and calculating the gradient weight of the coding block based on the gradient value of each pixel point.
4. The method for rate-distortion encoding of point cloud based on perceptual weighting according to claim 1, wherein said determining the rate-distortion cost corresponding to the point cloud to be encoded according to the perceived distortion of the projected image specifically comprises:
converting the structural similarity distortion degree in the perceived distortion degree into a mean square error distortion degree to obtain a perception coefficient corresponding to the projection image;
determining a perceived Lagrangian multiplier corresponding to the projection image based on the perceived coefficient;
and determining the rate distortion cost corresponding to the point cloud to be encoded based on the perceived distortion degree and the perceived Lagrangian multiplier.
5. The perceptual weighting-based point cloud rate distortion coding method of claim 4, wherein the correspondence between the structural similarity distortion degree and the mean square error distortion degree is:
wherein D^{SSIM}_{k,i} denotes the structural similarity distortion of coding block i of the k-th target projection image, D^{MSE}_{k,i} denotes the mean-square-error distortion of coding block i of the k-th target projection image, Φ_{k,i} denotes the perceptual coefficient of coding block i of the k-th target projection image, blk_n denotes the n-th sub-image block of coding block i, ρ_n denotes the linear model parameter of the n-th sub-image block of coding block i, N denotes the number of sub-image blocks, and E_i denotes the pixel content weight; the k-th target projection image is a projection image in the projection image set formed by the projection image and the reference projection images.
6. The method according to claim 4, wherein when the perceptual lagrangian multiplier is used for encoding, the determining the perceptual lagrangian multiplier corresponding to the projected image based on the perceptual coefficients specifically comprises:
calculating a first target perception coefficient based on the perception coefficient of each coding block of the projection image, and calculating the ratio of the first target perception coefficient to the perception coefficient of the coding block to obtain a first perception coefficient ratio;
and calculating the product of the first perception coefficient ratio and the mean square error Lagrange multiplier of the projection image to obtain the perception Lagrange multiplier corresponding to the coding block.
7. The method according to claim 4, wherein when the perceptual lagrangian multiplier is used for mode decision, the determining the perceptual lagrangian multiplier corresponding to the projected image based on the perceptual coefficients specifically comprises:
Calculating a second target perception coefficient based on the perception coefficient of each coding block of the projection image, and calculating the ratio of the second target perception coefficient to the square root of the perception coefficient of the coding block to obtain a second perception coefficient ratio;
and calculating the product of the second perceptual coefficient ratio and the square root of the mean square error Lagrangian multiplier of the projection image to obtain a perceptual Lagrangian multiplier corresponding to the coding block.
8. A point cloud rate distortion coding system based on perceptual weighting, the system comprising:
the acquisition module is used for acquiring a projection image corresponding to the point cloud to be encoded and a plurality of reference projection images corresponding to the projection image, wherein the projection image and the plurality of reference projection images each comprise a geometric projection map and a texture projection map;
the computing module is used for acquiring the gradient weights and the structural similarity distortion degrees of the coding blocks of the projection image and each reference projection image, and computing the perceived distortion degrees of the projection image based on the gradient weights and the structural similarity distortion degrees of the coding blocks;
and the encoding module is used for determining the rate distortion cost corresponding to the point cloud to be encoded according to the perceived distortion degree of the projection image and encoding the projection image based on the rate distortion cost.
9. A computer readable storage medium storing one or more programs executable by one or more processors to implement the steps in the perceptually weighted based point cloud rate-distortion coding method of any of claims 1-7.
10. A terminal device, comprising: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in the perceptually weighted based point cloud rate-distortion coding method as defined in any of claims 1-7.
CN202310183247.0A 2023-02-22 2023-02-22 Point cloud rate distortion coding method and related device based on perceptual weighting Pending CN117376568A (en)

Publication: CN117376568A, 2024-01-09 (family ID 89395281).



Legal Events: PB01 Publication; SE01 Entry into force of request for substantive examination.