CN116828166A - Volume video coding and decoding method based on inter-frame multiplexing - Google Patents

Publication number
CN116828166A
Authority
CN
China
Legal status: Pending
Application number
CN202310865717.1A
Other languages
Chinese (zh)
Inventor
赵东
马华东
黄成豪
王义总
高腾
郭子玄
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202310865717.1A
Publication of CN116828166A
Legal status: Pending

Abstract

A volume video coding and decoding method based on inter-frame multiplexing relates to the field of volume video coding and decoding and mainly comprises the following steps: picture similarity detection, point cloud block similarity detection, inter-frame multiplexing, an octree-optimized coding algorithm, inverse Morton order rearrangement, a 2D video coding algorithm, inter-frame decoding, an octree-optimized decoding algorithm, and a 2D video decoding algorithm. The invention solves the problem of an insufficient decoding frame rate of the inter-frame-multiplexing-based decoding algorithm at the mobile terminal, and greatly reduces the bandwidth required for transmission by applying inter-frame multiplexing to volume video streaming media systems. The method performs motion estimation between adjacent frames of the volume video, for the first time, with a deep learning point cloud registration model based on both geometric and color information, achieving fast and accurate elimination of redundant information. The invention also compresses the color information of the volume video with a 2D video coding algorithm for the first time, realizing inter-frame coding of the volume video's color information.

Description

Volume video coding and decoding method based on inter-frame multiplexing
Technical Field
The invention relates to the technical field of volume video coding and decoding, in particular to a volume video coding and decoding method based on inter-frame multiplexing.
Background
Volumetric video is regarded as the next-generation media technology: a new medium that provides a highly immersive and interactive user experience. Unlike 2D video and 360-degree video, volumetric video consists of 3D data, enabling users to view content with six degrees of freedom and providing a more immersive experience. However, despite its great potential, volumetric video streaming systems face two key technical challenges: (1) the volume of video data is excessive. The most common format for volumetric video is the point cloud, with single-frame sizes typically between 4 MB and 15 MB; transmitting the uncompressed original video stream requires 1 Gbps to 3.6 Gbps, far exceeding the capabilities of current WiFi or 5G networks. (2) the decoding speed cannot reach 30 fps. Users view volumetric video with six degrees of freedom, so content decoding and updating must reach 30 fps, a condition that commonly used volumetric video codecs still cannot meet.
Currently, some methods use viewport adaptation, 3D super-resolution, and similar techniques to reduce the size of the volumetric video transport stream, while using multi-threading, improved decoding algorithms, and the like to optimize the decoding rate. Viewport adaptation reduces the bandwidth requirement by optimizing data according to the user's viewport: content inside the viewport is kept at high definition while content outside it is degraded to low definition. Although this reduces bandwidth, it requires the mobile terminal to predict the user's viewing angle in real time and the server to optimize the point cloud transport stream in real time to meet the motion-perception delay; the method depends heavily on prediction accuracy, and a wrong prediction gives the user a poor viewing experience. The 3D super-resolution approach reduces bandwidth but requires a super-resolution model deployed on the mobile terminal, placing high performance demands on it. Increasing the decoding rate through multi-threading generally requires a high-performance mobile device, and such methods drain power quickly when actually deployed. Improving the decoding algorithm raises the mobile decoding rate with good results, but the bandwidth required by the volume video streaming system after optimization with the matching encoding algorithm is still high. In general, the compressed volume video stream produced by the above optimization methods still needs hundred-megabit-level bandwidth for reliable transmission, leaving substantial room for improvement in volume video streaming systems.
Our exploration shows that the volume video codecs used in volume video streaming systems still focus on single-frame processing and ignore the inter-frame content redundancy of volume video. They provide compression ratios of only 4 to 8, leaving a large gap compared with the hundred-fold compression ratios of 2D video coding algorithms such as H.264 and H.265, which exploit inter-frame coding.
Disclosure of Invention
The invention aims to provide a volume video coding and decoding method based on inter-frame multiplexing that solves the problem of the excessively high bandwidth required by current volume video transmission. The invention reduces the bandwidth required for volume video transmission by eliminating redundant information before transmission, while optimizing the decoding algorithm on the basis of inter-frame multiplexing to meet the decoding frame rate.
The technical scheme adopted by the invention for solving the technical problems is as follows:
the invention discloses a volume video coding and decoding method based on inter-frame multiplexing, which comprises the following steps:
step S1: inter-frame multiplexing coding;
step S1.1: detecting the picture similarity;
projecting the volume video of different frames into two-dimensional pictures at different angles, and then rapidly detecting the picture similarity between adjacent frames by using a structural similarity algorithm;
step S1.2: detecting the similarity of the point cloud blocks;
the adjacent frames which are detected to be similar by the picture similarity are subjected to block processing, and the similarity between two point cloud blocks is judged by using a heuristic algorithm;
step S1.3: inter-frame multiplexing;
after down-sampling the two point cloud blocks found similar by the point cloud block similarity detection, the two blocks are registered as the input of a sparse-convolution-based deep learning point cloud registration model, which calculates the transformation matrix between them; this transformation matrix serves as the intermediate file of inter-frame multiplexing;
step S1.4: octree optimization coding algorithm;
performing intra-frame coding on geometric information of the volume video by adopting an octree optimization coding algorithm, and outputting an intermediate file of the geometric information;
step S1.5: reverse Morton order rearrangement;
rearranging color information of each frame of the volume video by adopting an inverse Morton sequence, projecting the color information to a two-dimensional plane to form a picture, and splicing multiple frames of pictures into a video stream;
step S1.6: a 2D video coding algorithm;
using a color information compression algorithm based on 2D video compression technology, the pictures of different frames of the volume video are spliced in their original order into a video stream, and the H.264 algorithm performs intra-frame and inter-frame compression on the stream to output the intermediate file of the color information;
step S2: inter-frame multiplexing decoding;
step S2.1: inter-frame decoding;
first judge whether the intermediate file is the inter-frame-multiplexed intermediate file of step S1.3; if so, reconstruct the point cloud block using the previous frame's point cloud block and its transformation matrix; if not, perform step S2.2 and step S2.3 respectively;
step S2.2: octree optimization decoding algorithm;
for the intermediate file which is not inter-frame multiplexing in the step S1.3, decoding the geometric information of the volume video by using an octree optimization decoding algorithm, and outputting the decoded geometric information of the volume video; the decoding frame rate is ensured by improving the octree structure and reasonably distributing GPU and CPU resources used during decoding of the mobile terminal;
step S2.3: a 2D video decoding algorithm;
for the intermediate file which is not inter-frame multiplexing in the step S1.3, decoding the volume video color information by using a 2D video decoding algorithm, and outputting the decoded volume video color information; and re-ordering by adopting a Morton order, and splicing the decoded volume video geometric information and the volume video color information to obtain single-frame volume video data.
Further, the specific operation steps of step S1.2 are as follows:
by traversing each point in the source point cloud block and finding its nearest point in the target point cloud block, the point matching relationship between the two point cloud blocks is obtained; two point cloud blocks are considered similar when the following three rules are satisfied simultaneously:
(1) the error between N(i,j) and N(i+1,j) is not more than 10%;
(2) the average value of D(P(i,j,u), P'(i+1,j,u)) is not more than 0.01 m;
(3) 80% of the values of D(P(i,j,u), P'(i+1,j,u)) are less than 0.01 m;
where N(i,j) represents the number of points in the j-th point cloud block of the i-th frame, N(i+1,j) represents the number of points in the j-th point cloud block of the (i+1)-th frame, P(i,j,u) represents a point u in the j-th point cloud block of the i-th frame, P'(i+1,j,u) represents the point of the j-th point cloud block of the (i+1)-th frame nearest to P(i,j,u), and D(P(i,j,u), P'(i+1,j,u)) represents the Euclidean distance between them; similar point cloud blocks are screened out through these three rules.
Further, the specific operation steps of step S1.3 are as follows:
projecting the point cloud geometric information in the point cloud block into a grid; for each grid cell, calculating an internal center point; for each center point, the relative positions of the surrounding points are calculated and stored in a sparse tensor, and the point cloud geometric information is converted into the sparse tensor to serve as the input of the deep learning point cloud registration model.
Further, in step S1.3, the sparse convolution-based deep learning point cloud registration model includes: the device comprises a geometric information coding module, a color information coding module, an attention fusion module and a decoder module; the geometric information coding module extracts local and global features of the point cloud geometric information by using four sparse tensor convolution layers; the color information coding module adopts a pretrained ResNet34 to extract color information characteristics; the attention fusion module fuses the output of the geometric information coding module and the output of the color information coding module based on an attention mechanism, and generates a unique point cloud description vector for each point cloud block; the decoder module inputs point cloud description vectors which are respectively encoded by two point cloud blocks; the decoder module decodes each point cloud description vector by using four sparse tensor convolution layers, and simultaneously inputs information decoded by two point cloud description vectors into a full-connection layer, and the full-connection layer outputs a transformation matrix between two point cloud blocks as an intermediate file of inter-frame multiplexing.
Further, in step S1.3, the inter-frame-multiplexed intermediate file is expressed as:

H(i,j) = | A_3x3  T_3x1 |
         | O_1x3    S   |

A_3x3 = | a_11 a_12 a_13 |     T_3x1 = | t_x |
        | a_21 a_22 a_23 |             | t_y |
        | a_31 a_32 a_33 |             | t_z |

where H(i,j) represents the intermediate file when the j-th point cloud block of the i-th frame adopts inter-frame multiplexing, i.e. the transformation matrix relative to the point cloud block of the previous frame; A_3x3 represents the rotation matrix, T_3x1 the translation vector, O_1x3 the zero vector, and S the overall scale factor; a_11 to a_13 represent the scaling factors along the x-, y-, and z-axes, a_21 to a_23 the shear factors along the x-, y-, and z-axes, a_31 to a_33 the rotation factors along the x-, y-, and z-axes, and t_x, t_y, t_z the translation amounts along the x-, y-, and z-axes.
Further, the specific operation steps of step S1.4 are as follows:
in the encoding process, different parts of the octree structure are encoded in different ways. The octree structure is divided into two parts: the part above the third-to-last layer is encoded with the existing octree coding algorithm; for the part below the third-to-last layer, the dependency of nodes on their parent nodes is broken, and the path of the last three layers is generated independently for each leaf node, the path containing the node number of each layer. The path information of each leaf node is stored in a hash keyed by the node number of the fourth-to-last layer, so that the corresponding node can be found quickly after decoding.
Further, the specific operation steps of step S1.5 are as follows:
the inverse morton sequence decodes the one-dimensional position sequence number of a certain point into a two-dimensional coordinate, wherein the values of the two coordinate axes are respectively an even bit and an odd bit in the binary representation of the one-dimensional position sequence number, and then the generated two-dimensional coordinate is used for mapping the color information of the point to the corresponding position of the picture.
Further, in step S2.1, when reconstructing the point cloud block from the previous frame's point cloud block and its transformation matrix, the following inter-frame multiplexing formula is adopted:

C(i,j) = A_3x3 · C(i-1,j) · S + T_3x1 · S

where C(i,j) represents the j-th point cloud block of the i-th frame and C(i-1,j) represents the j-th point cloud block of the (i-1)-th frame. Based on the transformation matrix H(i,j) (step S1.3) and the previous frame's point cloud block C(i-1,j), the current frame's point cloud block is obtained by rotating, translating, and scaling the previous frame's block.
Further, the specific operation steps of step S2.2 are as follows:
step S2.2.1: since the decoding speed of the last three layers of the octree drops sharply, the octree structure is divided into two parts at the third-to-last layer; the coded stream above the third-to-last layer (the first part) is decoded with the existing octree decoding method, while the coded stream below the third-to-last layer (the second part) is decoded in parallel for each node;
step S2.2.2: the allocation of computing resources during decoding at the mobile terminal is optimized: CPU resources decode the first part of the coded stream while GPU resources decode the second part in parallel to refine the point positions, which are then rendered directly in a fully parallel manner.
Further, the specific operation steps of step S2.3 are as follows:
the Morton order converts the two-dimensional coordinates of a point on the two-dimensional picture into binary numbers and uses them as the even and odd bits of the binary representation of the one-dimensional position sequence number, obtaining that sequence number; the decoded color information is then aligned and spliced with the decoded geometric information, restoring the original volume video.
The beneficial effects of the invention are as follows:
according to the volume video encoding and decoding method based on the inter-frame multiplexing, the redundancy among the volume video frames is utilized to further compress the data, so that the volume video transmission flow is reduced, the bandwidth required by a volume video streaming media system is greatly reduced, and the volume video playing is optimized. Compared with the prior art, the invention has the following advantages:
1) The invention adopts the inter-frame multiplexing technology in the field of the volume video streaming media system, thereby realizing the aim of greatly reducing the bandwidth required by transmission;
2) According to the method, the motion estimation is performed between the adjacent frames of the volume video by using the deep learning point cloud registration model based on the geometric information and the color information for the first time, so that quick and accurate redundant information elimination is realized;
3) The invention compresses the color information of the volume video by utilizing the 2D video coding algorithm for the first time, and realizes the inter-frame coding of the color information of the volume video;
4) The invention solves the problem of insufficient decoding frame rate of a decoding algorithm based on the inter-frame multiplexing at a mobile terminal.
Drawings
Fig. 1 is a flow chart of a method for encoding and decoding a volume video based on inter-frame multiplexing according to the present invention.
Detailed Description
In a first aspect, as shown in fig. 1, the present invention provides a volume video encoding and decoding method based on inter-frame multiplexing. With reference to fig. 1, the method mainly comprises the following steps:
step S1: inter-frame multiplexing coding;
the inter-frame multiplexing coding part mainly comprises six steps, namely picture similarity detection, point cloud block similarity detection, inter-frame multiplexing, octree optimization coding algorithm, inverse Morton order rearrangement and 2D video coding algorithm. The specific operation flow is as follows:
step S1.1: detecting the picture similarity;
in the field of 2D video research, judging whether pictures are similar via algorithms such as structural similarity is a common operation. Volumetric video, however, consists of 3D video data: reality can be realistically mapped into the digital world through millions of colored 3D points, giving the user a highly immersive experience, and similarity matching over millions of points would consume a great deal of computation and time. Therefore, the invention designs a picture similarity detection module based on two-dimensional pictures: the volume video (3D video data) of different frames is projected into two-dimensional pictures at different angles, and a structural similarity algorithm then rapidly detects the similarity of the two-dimensional pictures to determine whether the point clouds of adjacent frames have large inter-frame redundancy; only when the redundancy is large is the more computation- and time-intensive point cloud block similarity detection module used for matching.
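The picture-level screening above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it uses a simplified single-window SSIM over whole grayscale projections (a production system would use the full windowed structural similarity algorithm), and `frames_similar`, the 0.9 threshold, and the list-of-projections input are all assumed names and values.

```python
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    # Simplified single-window SSIM between two grayscale projections:
    # ((2*mu_x*mu_y + c1)(2*cov + c2)) / ((mu_x^2 + mu_y^2 + c1)(var_x + var_y + c2))
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)
    return float(num / den)

def frames_similar(projs_a, projs_b, threshold=0.9):
    # Adjacent frames count as redundant only if every projection angle agrees;
    # only then is the costlier point-cloud-block check worth running.
    return all(ssim_global(a, b) >= threshold for a, b in zip(projs_a, projs_b))
```

A high score across all projection angles is only a cheap pre-filter; the block-level heuristic of step S1.2 makes the actual multiplexing decision.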
Step S1.2: detecting the similarity of the point cloud blocks;
adjacent frames detected as similar by the picture similarity detection need to be divided into blocks for further point cloud block similarity detection. Therefore, the invention designs a point cloud block similarity detection module with an embedded heuristic algorithm that can rapidly detect whether two point cloud blocks are similar. First, the matching relationship of points between the two point cloud blocks is generated: each point in the source point cloud block is traversed to find its nearest point in the target point cloud block, yielding the matching relationship of corresponding points. Two point cloud blocks are considered similar when the following three rules are satisfied simultaneously:
(1) the error between N(i,j) and N(i+1,j) is not more than 10%;
(2) the average value of D(P(i,j,u), P'(i+1,j,u)) is not more than 0.01 m;
(3) 80% of the values of D(P(i,j,u), P'(i+1,j,u)) are less than 0.01 m;
where N(i,j) represents the number of points in the j-th point cloud block of the i-th frame, N(i+1,j) represents the number of points in the j-th point cloud block of the (i+1)-th frame, P(i,j,u) represents a point u in the j-th point cloud block of the i-th frame, P'(i+1,j,u) represents the point of the j-th point cloud block of the (i+1)-th frame nearest to P(i,j,u), and D(P(i,j,u), P'(i+1,j,u)) represents the Euclidean distance between them. These three rules screen out similar point cloud blocks to serve as the input of the inter-frame multiplexing module.
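The three rules can be sketched as a small heuristic check. This is an assumption-laden illustration rather than the patent's code: it uses brute-force nearest-neighbour search (a KD-tree would be used for blocks with many points), and the function and parameter names are invented.

```python
import numpy as np

def blocks_similar(src: np.ndarray, dst: np.ndarray,
                   count_tol: float = 0.10, dist_tol: float = 0.01,
                   quantile: float = 0.80) -> bool:
    """Heuristic similarity between two point cloud blocks (N x 3 arrays
    of xyz coordinates in metres), following the three rules."""
    n_i, n_j = len(src), len(dst)
    # Rule (1): point counts N(i,j) and N(i+1,j) differ by at most 10%.
    if abs(n_i - n_j) > count_tol * max(n_i, n_j):
        return False
    # Nearest-neighbour distance from every source point to the target block.
    diffs = src[:, None, :] - dst[None, :, :]
    d = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    # Rule (2): mean nearest-neighbour distance <= 0.01 m.
    # Rule (3): at least 80% of the distances below 0.01 m.
    return d.mean() <= dist_tol and (d < dist_tol).mean() >= quantile
```

Blocks passing all three rules go on to the registration model of step S1.3; the others fall through to intra-frame coding.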
Step S1.3: inter-frame multiplexing;
and (2) according to the similar point cloud blocks obtained in the step (S1.2), taking the down-sampled point cloud blocks as the input of a point cloud registration algorithm to quickly obtain a corresponding transformation matrix. However, conventional point cloud registration algorithms typically only process small-scale point cloud data, which can become very slow or not at all for large-scale point cloud data. Meanwhile, the traditional point cloud registration algorithm can only rely on the geometric information of the point cloud for registration, and the color information of the point cloud is ignored. Therefore, the invention designs an inter-frame multiplexing module, and embeds a deep learning point cloud registration model based on sparse convolution in the inter-frame multiplexing module.
Specifically, firstly, point cloud geometric information in a point cloud block is projected into a grid; for each grid cell, calculating an internal center point; for each center point, calculating the relative positions of surrounding points of the center point, and storing the relative positions in a sparse tensor; through the steps, the point cloud geometric information can be converted into a sparse tensor to be used as the input of the deep learning point cloud registration model.
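The grid conversion described above can be sketched minimally as follows. The names (`to_sparse_voxels`) are invented, and taking the cell mean as the "internal center point" is an assumption; the real model would feed an actual sparse tensor into sparse convolution layers rather than a Python dict.

```python
import numpy as np

def to_sparse_voxels(points: np.ndarray, cell: float = 0.05):
    """Project point cloud geometry into a voxel grid: for each occupied
    cell, compute an internal centre point and the positions of the cell's
    points relative to that centre (the sparse-tensor input of the
    registration model). Returns {cell_index: (centre, relative_offsets)}."""
    idx = np.floor(points / cell).astype(np.int64)
    voxels = {}
    for p, key in zip(points, map(tuple, idx)):
        voxels.setdefault(key, []).append(p)
    out = {}
    for key, pts in voxels.items():
        pts = np.asarray(pts)
        centre = pts.mean(axis=0)          # assumed: centre = mean of cell's points
        out[key] = (centre, pts - centre)  # relative positions around the centre
    return out
```

Only occupied cells are stored, which is what makes the representation sparse and cheap to convolve.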
The sparse-convolution-based deep learning point cloud registration model comprises four main components: a geometric information coding module, a color information coding module, an attention fusion module, and a decoder module. The geometric information coding module uses four sparse tensor convolution layers to extract local and global features of the point cloud geometric information; the color information coding module adopts a pretrained ResNet34 (deep residual network), which extracts color information features well; the attention fusion module fuses the outputs of the geometric and color information coding modules based on an attention mechanism and generates a unique point cloud description vector for each point cloud block; the decoder module takes as input the point cloud description vectors encoded from the two point cloud blocks, decodes each with four sparse tensor convolution layers, and feeds the decoded information of both vectors into a fully connected layer, which outputs the transformation matrix between the two point cloud blocks as the intermediate file of inter-frame multiplexing.
The obtained intermediate file can be expressed as:

H(i,j) = | A_3x3  T_3x1 |
         | O_1x3    S   |

A_3x3 = | a_11 a_12 a_13 |     T_3x1 = | t_x |
        | a_21 a_22 a_23 |             | t_y |
        | a_31 a_32 a_33 |             | t_z |

where H(i,j) represents the intermediate file when the j-th point cloud block of the i-th frame adopts inter-frame multiplexing, i.e. the transformation matrix relative to the point cloud block of the previous frame; A_3x3 represents the rotation matrix, T_3x1 the translation vector, O_1x3 the zero vector, and S the overall scale factor; a_11 to a_13 represent the scaling factors along the x-, y-, and z-axes, a_21 to a_23 the shear factors along the x-, y-, and z-axes, a_31 to a_33 the rotation factors along the x-, y-, and z-axes, and t_x, t_y, t_z the translation amounts along the x-, y-, and z-axes.
Step S1.4: octree optimization coding algorithm;
the invention uses the optimized octree coding algorithm to perform intra-frame coding of the geometric information of the volume video (the 3D video data) and outputs the intermediate file of the geometric information. When encoding with the conventional octree coding algorithm, the decoding speed of the last three layers drops sharply, which severely limits the decoding speed at the mobile terminal. Therefore, the invention designs an octree-optimized coding module: during encoding, different parts of the octree structure are encoded in different ways. The octree structure is divided into two parts; the part above the third-to-last layer is encoded with the conventional octree coding algorithm, while for the part below the third-to-last layer the dependency of nodes on their parent nodes is broken and the path of the last three layers is generated independently for each leaf node, the path containing the node number of each layer. The path information of each leaf node is stored in a hash keyed by the node number of the fourth-to-last layer so that the corresponding node can be located quickly after decoding.
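The leaf-path bookkeeping can be illustrated as follows. This is a hedged sketch: each leaf is represented as its root-to-leaf sequence of octant indices (0-7), and the path prefix stands in for the "node number of the fourth-to-last layer" used as the hash key; no bit-level entropy coding is shown, and the names are invented.

```python
def split_leaf_paths(leaf_paths):
    """For each leaf (a sequence of octant indices 0-7 from root to leaf),
    keep the upper part for conventional octree coding and store the last
    three levels independently, hashed by the fourth-to-last node's path,
    so a decoded node can be located without walking the whole tree."""
    upper_part, last3 = set(), {}
    for path in leaf_paths:
        key = tuple(path[:-3])          # node number at the fourth-to-last layer
        upper_part.add(key)             # encoded by the conventional octree coder
        last3.setdefault(key, []).append(tuple(path[-3:]))  # independent 3-level path
    return upper_part, last3
```

Because each entry in `last3` no longer depends on its parent's decode order, the decoder can process these paths in parallel per node (step S2.2).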
Step S1.5: reverse Morton order rearrangement;
the color information of a volume video (stereoscopic video data) is compressed using a 2D video coding algorithm, and it is necessary to project the color information of a single frame to a two-dimensional plane. However, using the conventional raster scan order method, i.e., arranging color information in an S-shape on a picture from left to right, results in a large color difference in the vertical direction, resulting in poor intra-frame compression effect. Therefore, the invention adopts the inverse Morton order rearrangement module to rearrange the color information of each frame of the volume video and project the color information to a two-dimensional plane to form a picture. Specifically, the inverse morton sequence decodes a one-dimensional position sequence number of a certain point into a two-dimensional coordinate, wherein the values of two coordinate axes are respectively an even bit and an odd bit in a binary representation of the one-dimensional position sequence number, and then the generated two-dimensional coordinate is used for mapping the color information of the point to a corresponding position of a picture. The method can maximize the locality of the colors of adjacent points in the 8×8 pixel block, and generate smoother color patterns for the color information of each frame of the volume video, so as to achieve better intra-frame compression effect.
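The inverse Morton mapping itself is simple bit de-interleaving, sketched below; which coordinate axis receives the even bits is an assumption, since the text does not fix it.

```python
def morton_decode(n: int):
    """Inverse Morton order: de-interleave the bits of a 1-D position
    index n into 2-D picture coordinates (even bits -> x, odd bits -> y)."""
    x = y = 0
    bit = 0
    while n:
        x |= (n & 1) << bit   # even bit goes to the x axis
        n >>= 1
        y |= (n & 1) << bit   # odd bit goes to the y axis
        n >>= 1
        bit += 1
    return x, y

def morton_encode(x: int, y: int) -> int:
    """Forward Morton order, used when reordering on the decoder side (step S2.3)."""
    n = 0
    bit = 0
    while x or y:
        n |= (x & 1) << (2 * bit)
        n |= (y & 1) << (2 * bit + 1)
        x >>= 1
        y >>= 1
        bit += 1
    return n
```

Consecutive 1-D indices land in the same small picture neighbourhood (a Z-order curve), which is what keeps adjacent points' colors local within 8x8 pixel blocks.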
Step S1.6: a 2D video coding algorithm;
because the compression ratio of existing methods for the color information of volume video is too low and the inter-frame redundancy of the color information is not fully exploited, the high bandwidth required by the color information cannot be reduced well. Therefore, the invention designs a 2D video coding module with an embedded color information compression algorithm based on 2D video compression technology: the color information of each frame of the volume video is projected onto a two-dimensional plane to form a picture using the inverse Morton order of step S1.5, the pictures of different frames are spliced in their original order into a video stream, and the existing H.264 algorithm performs intra-frame and inter-frame compression on the stream.
Step S2: inter-frame multiplexing decoding;
the inter-multiplexing decoding section mainly includes three steps, which are inter-decoding, octree-optimized decoding algorithm, and 2D video decoding algorithm, respectively. The specific operation flow is as follows:
step S2.1: inter-frame decoding;
first judge whether the intermediate file is the inter-frame-multiplexed intermediate file of step S1.3, i.e. whether inter-frame multiplexing coding was adopted. If so, the point cloud block is reconstructed from the previous frame's point cloud block and its transformation matrix; if not, step S2.2 and step S2.3 are performed respectively. The inter-frame multiplexing formula adopted in reconstruction is as follows:
C(i,j) = A_3x3 · C(i-1,j) · S + T_3x1 · S

where C(i,j) represents the j-th point cloud block of the i-th frame and C(i-1,j) represents the j-th point cloud block of the (i-1)-th frame. Based on the transformation matrix H(i,j) (step S1.3) and the previous frame's point cloud block C(i-1,j), the current frame's point cloud block is obtained by rotating, translating, and scaling the previous frame's block.
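The reconstruction formula can be applied directly, as in this minimal sketch with invented names, treating A as a 3x3 matrix applied row-wise to an N x 3 block of points.

```python
import numpy as np

def reconstruct_block(prev_block: np.ndarray, A: np.ndarray,
                      T: np.ndarray, S: float) -> np.ndarray:
    """Inter-frame reconstruction C(i,j) = A_3x3 * C(i-1,j) * S + T_3x1 * S:
    the current block is recovered by transforming the previous frame's
    block with A and the scale S, then translating it by T * S."""
    # prev_block: (N, 3) points of C(i-1, j); A: 3x3 matrix; T: length-3 vector.
    return (prev_block @ A.T) * S + T * S
```

A block that multiplexes in this way costs only the 16 numbers of H(i,j) on the wire instead of a full geometry-plus-color payload.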
Step S2.2: octree optimization decoding algorithm;
for video frames that are not inter-multiplexed in step S1.3, i.e. not inter-multiplexed encoded, the encoded volumetric video geometry information needs to be decoded. The invention decodes the volume video geometric information by using the optimized octree decoding algorithm, outputs the decoded volume video geometric information, and ensures the decoding frame rate by improving the octree structure and reasonably distributing the GPU and CPU resources used during decoding at the mobile terminal. The specific operation steps are as follows:
Step S2.2.1: since the decoding speed drops sharply in the last three layers of the octree, the octree structure is divided into two parts at the third-to-last layer. The encoded stream above the third-to-last layer is decoded in the traditional octree manner; for the encoded stream below the third-to-last layer, each node is decoded in parallel.
Step S2.2.2: to optimize the allocation of computing resources during decoding at the mobile terminal, the encoded stream of the volume video geometry information is handled as two parts: the conventional octree encoded stream is decoded with CPU resources, while the encoded stream of the optimized last three layers of the octree is decoded in parallel with GPU resources to refine the point cloud positions.
Step S2.3: a 2D video decoding algorithm;
The volume video color information is decoded using an H.264 decoder and reordered using the Morton order. Specifically, the Morton order converts the two-dimensional coordinates of a point on the two-dimensional picture into binary numbers and uses them respectively as the even bits and odd bits of the binary representation of a one-dimensional position serial number to obtain that serial number; the decoded color information is then aligned and spliced with the decoded geometry information to recover the original volume video (stereoscopic video data).
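The Morton reordering described above amounts to bit interleaving. A minimal Python sketch, assuming x supplies the even bits and y the odd bits of the serial number (the exact axis-to-bit assignment is an assumption, not stated in the patent):

```python
def morton_index(x: int, y: int, bits: int = 16) -> int:
    """Morton (Z-order) index of a 2D pixel coordinate: interleave the
    bits of x and y so that x occupies the even bits and y the odd bits
    of the one-dimensional position serial number."""
    idx = 0
    for b in range(bits):
        idx |= ((x >> b) & 1) << (2 * b)       # even bits come from x
        idx |= ((y >> b) & 1) << (2 * b + 1)   # odd bits come from y
    return idx
```

Neighbouring pixels map to nearby serial numbers, which is why this ordering preserves spatial locality when the color plane is serialized.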
In a second aspect, the present invention provides a volume video encoding and decoding system based on inter-frame multiplexing, mainly used to implement the above volume video encoding and decoding method based on inter-frame multiplexing. The system mainly comprises the following modules:
the device comprises an inter-frame multiplexing coding module and an inter-frame multiplexing decoding module, wherein the inter-frame multiplexing coding module mainly comprises:
the picture similarity detection module is used for rendering a two-dimensional picture from different angles for each frame of the volume video, and then rapidly detecting the picture similarity between adjacent frames by using a structural similarity algorithm;
the point cloud block similarity detection module is used for carrying out block division processing on adjacent frames which are similar in picture similarity detection and judging the similarity between the point cloud blocks by using a heuristic algorithm;
the inter-frame multiplexing module is used for registering the similar point cloud blocks by using a sparse convolution-based deep learning point cloud registration model, calculating a transformation matrix between the two point cloud blocks to replace the original point cloud blocks, and taking the transformation matrix as an inter-frame multiplexing intermediate file;
the octree optimizing and encoding module is used for carrying out intra-frame encoding on the single-frame point cloud geometric information and outputting an intermediate file of the geometric information;
the inverse Morton order rearrangement module is used for rearranging the color information of each frame of the volume video and projecting the color information to a two-dimensional plane to form a picture;
the 2D video coding module is used for, by means of a color information compression algorithm based on 2D video compression technology, projecting the color information of each frame of the volume video onto a two-dimensional plane in inverse Morton order to form a picture, splicing the pictures of different frames of the volume video into a video stream in their original order, and performing intra-frame and inter-frame compression on the video stream with the existing H.264 algorithm to output an intermediate file of the color information.
The judging module is used for judging whether a point cloud block adopts inter-frame multiplexing coding.
the inter-frame multiplexing decoding module mainly comprises:
the inter-frame decoding module is used for reconstructing the intermediate file adopting inter-frame multiplexing coding based on the previous frame point cloud block and the transformation matrix of the previous frame point cloud block;
the octree optimizing and decoding module is used for decoding the coded volume video geometric information by adopting an octree decoding algorithm for video frames which are not coded by adopting inter-frame multiplexing, and outputting the decoded volume video geometric information;
the 2D video decoding module is used for decoding the volume video color information coded stream by adopting an H.264 decoder, reordering by adopting a Morton order and outputting the decoded volume video color information; and the single-frame volume video data is obtained by splicing the decoded volume video geometric information and the volume video color information.
In order to verify the feasibility of the volume video encoding and decoding method based on inter-frame multiplexing, a number of verification tests were carried out. The results show that, compared with a single-frame volume video codec, the method improves the overall video stream compression ratio from about 9 to about 31.27 and the compression ratio of the geometric information from about 12 to about 19, and about 21% of the point cloud blocks between adjacent frames are similar enough to be inter-frame multiplexed. With a lossless 2D video codec, the color information compression ratio increases from about 7 to 10.4 with no loss of color information; with a lossy 2D video codec, the color information compression ratio increases from about 7 to about 50 while the structural similarity between the rendered picture and the lossless picture remains around 0.95, a small loss of visual quality. In addition, the decoding frame rate at the mobile terminal exceeds 50 fps, supporting smooth six-degree-of-freedom rendering of volume video on mobile devices.
The volume video encoding and decoding method based on inter-frame multiplexing takes the redundant information between adjacent frames into account during encoding and reduces the bandwidth required for volume video transmission by eliminating that redundancy before transmission. The invention can further reduce the required bandwidth on the basis of existing volume video streaming media systems, lowers the application threshold of volume video, can be applied to virtual reality, augmented reality and similar applications, and effectively alleviates the large bandwidth requirement of volume video streaming media systems.
The foregoing is merely a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention, which are intended to be comprehended within the scope of the present invention.

Claims (10)

1. The volume video coding and decoding method based on the inter-frame multiplexing is characterized by comprising the following steps:
step S1: inter-frame multiplexing coding;
step S1.1: detecting the picture similarity;
projecting the volume video of different frames into two-dimensional pictures at different angles, and then rapidly detecting the picture similarity between adjacent frames by using a structural similarity algorithm;
step S1.2: detecting the similarity of the point cloud blocks;
the adjacent frames which are detected to be similar by the picture similarity are subjected to block processing, and the similarity between two point cloud blocks is judged by using a heuristic algorithm;
step S1.3: inter-frame multiplexing;
after down-sampling two point cloud blocks which are similar through the point cloud block similarity detection, registering the two point cloud blocks as the input of a sparse convolution-based deep learning point cloud registration model, and calculating a transformation matrix between the two point cloud blocks, wherein the transformation matrix is used as an intermediate file of interframe multiplexing;
step S1.4: octree optimization coding algorithm;
performing intra-frame coding on geometric information of the volume video by adopting an octree optimization coding algorithm, and outputting an intermediate file of the geometric information;
step S1.5: reverse Morton order rearrangement;
rearranging color information of each frame of the volume video by adopting an inverse Morton sequence, projecting the color information to a two-dimensional plane to form a picture, and splicing multiple frames of pictures into a video stream;
step S1.6: a 2D video coding algorithm;
using a color information compression algorithm based on 2D video compression technology, splicing the pictures of different frames of the volume video into a video stream in their original order, and performing intra-frame and inter-frame compression on the video stream with the H.264 algorithm to output an intermediate file of the color information;
step S2: inter-frame multiplexing decoding;
step S2.1: inter-frame decoding;
firstly judging whether the intermediate file is the intermediate file multiplexed among frames in the step S1.3, if so, reconstructing the point cloud block by utilizing the transformation matrix of the point cloud block of the previous frame and the point cloud block of the previous frame; if not, respectively executing the step S2.2 and the step S2.3;
step S2.2: octree optimization decoding algorithm;
for the intermediate file which is not inter-frame multiplexing in the step S1.3, decoding the geometric information of the volume video by using an octree optimization decoding algorithm, and outputting the decoded geometric information of the volume video; the decoding frame rate is ensured by improving the octree structure and reasonably distributing GPU and CPU resources used during decoding of the mobile terminal;
step S2.3: a 2D video decoding algorithm;
for the intermediate file which is not inter-frame multiplexing in the step S1.3, decoding the volume video color information by using a 2D video decoding algorithm, and outputting the decoded volume video color information; and re-ordering by adopting a Morton order, and splicing the decoded volume video geometric information and the volume video color information to obtain single-frame volume video data.
2. The method for video encoding and decoding based on inter-frame multiplexing as claimed in claim 1, wherein the specific operation steps of step S1.2 are as follows:
each point in the source point cloud block is traversed to find the nearest point in the target point cloud block, yielding the point matching relationship between the two point cloud blocks; two point cloud blocks are considered similar when the following three rules are satisfied simultaneously:
(1) the error between N(i,j) and N(i+1,j) is not more than 10%;
(2) the average value of d(P(i,j,u), P̂(i+1,j,u)) over all points u is not more than 0.01 m;
(3) 80% of the distances d(P(i,j,u), P̂(i+1,j,u)) are less than 0.01 m;
wherein N(i,j) represents the number of points in the j-th point cloud block of the i-th frame, N(i+1,j) represents the number of points in the j-th point cloud block of the (i+1)-th frame, P(i,j,u) represents a point u in the j-th point cloud block of the i-th frame, P̂(i+1,j,u) represents the point in the j-th point cloud block of the (i+1)-th frame nearest to P(i,j,u), and d(P(i,j,u), P̂(i+1,j,u)) represents the Euclidean distance between them; similar point cloud blocks are screened out by these three rules.
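The three rules above can be sketched as a short similarity test. The following is a hypothetical Python implementation with the thresholds taken from this claim (10% count error, 0.01 m, 80%); it uses a brute-force nearest-neighbour search for self-containment, whereas a KD-tree would be used in practice, and the function and parameter names are illustrative:

```python
import numpy as np

def blocks_similar(src, dst, count_tol=0.10, dist_tol=0.01, frac=0.80):
    """Heuristic similarity test between the j-th blocks of adjacent frames.

    src: (N, 3) point cloud block of frame i
    dst: (M, 3) point cloud block of frame i+1
    """
    n, m = len(src), len(dst)
    # Rule 1: point counts differ by at most 10%.
    if abs(n - m) / n > count_tol:
        return False
    # For each source point, distance to its nearest neighbour in dst.
    d = np.sqrt(((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)).min(axis=1)
    # Rule 2: mean nearest-neighbour distance <= 0.01 m.
    # Rule 3: at least 80% of the distances are below 0.01 m.
    return bool(d.mean() <= dist_tol and (d < dist_tol).mean() >= frac)
```

Identical blocks trivially pass all three rules, while a block displaced by a large rigid motion fails rules 2 and 3 and is sent to intra-frame coding instead.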
3. The method for video encoding and decoding based on inter-frame multiplexing as claimed in claim 1, wherein the specific operation steps of step S1.3 are as follows:
projecting the point cloud geometric information in the point cloud block into a grid; for each grid cell, calculating an internal center point; for each center point, the relative positions of the surrounding points are calculated and stored in a sparse tensor, and the point cloud geometric information is converted into the sparse tensor to serve as the input of the deep learning point cloud registration model.
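The sparse-tensor preparation described in this claim can be sketched as follows. This is a hedged Python illustration: the voxel size and the dictionary layout are assumptions, and a real implementation would emit a sparse tensor consumable by the registration model rather than a dictionary:

```python
import numpy as np

def voxelize(points, voxel_size=0.05):
    """Project point cloud geometry into a grid: bucket points into cells,
    compute each cell's internal centre point, and express the member
    points by their relative positions around that centre.

    Returns {cell_index: (centre_point, relative_offsets)}.
    """
    cells = {}
    keys = np.floor(points / voxel_size).astype(int)
    for key, p in zip(map(tuple, keys), points):
        cells.setdefault(key, []).append(p)
    out = {}
    for key, pts in cells.items():
        pts = np.asarray(pts)
        centre = pts.mean(axis=0)           # internal centre point of the cell
        out[key] = (centre, pts - centre)   # relative positions of members
    return out
```

The relative offsets are what would be packed into the sparse tensor fed to the deep learning registration model, since they are invariant to the cell's absolute position.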
4. The method according to claim 1, wherein in step S1.3 the sparse convolution-based deep learning point cloud registration model comprises: a geometric information encoding module, a color information encoding module, an attention fusion module and a decoder module; the geometric information encoding module extracts local and global features of the point cloud geometric information using four sparse tensor convolution layers; the color information encoding module extracts color information features with a pretrained ResNet34; the attention fusion module fuses the outputs of the geometric information encoding module and the color information encoding module based on an attention mechanism and generates a unique point cloud description vector for each point cloud block; the decoder module takes as input the point cloud description vectors encoded from the two point cloud blocks, decodes each description vector with four sparse tensor convolution layers, and feeds the decoded information of both vectors into a fully connected layer, which outputs the transformation matrix between the two point cloud blocks as the intermediate file of inter-frame multiplexing.
5. The method of claim 1, wherein in step S1.3, the inter-frame multiplexed intermediate file is represented as:
wherein H (i, j) represents an intermediate file when the j-th block point cloud block of the i-th frame adopts inter-frame multiplexing, namely a transformation matrix relative to the point cloud block of the previous frame 3X3 Representing a rotation matrix, T 3X1 Represents translation vector, O 1X3 Represents the zero vector, S represents the overall scale factor, a 11 -a 13 Scaling factors in x-axis, y-axis, z-axis, a 21 -a 23 Respectively representing the shear factors in the x-axis, the y-axis and the z-axis, a 31 -a 33 Respectively representing the rotation factors in the x-axis, the y-axis and the z-axis, t x Representing the amount of translation in the x-axis, t y Representing the amount of translation on the y-axis, t z Representing the amount of translation in the z-axis.
6. The method for video encoding and decoding based on inter-frame multiplexing as claimed in claim 1, wherein the specific operation steps of step S1.4 are as follows:
in the encoding process, different parts of the octree structure are encoded in different modes; the octree structure is divided into two parts: the part above the third-to-last layer is encoded with the existing octree encoding algorithm; for the part below the third-to-last layer, the dependency of nodes on the upper-layer nodes is broken, and the path through the last three layers is generated independently for each leaf node, the path containing the node number of each layer; the path information of each leaf node is stored in a hash keyed by the node number of the fourth-to-last layer, so that the corresponding node can be found quickly during decoding.
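The hash layout described in this claim might look as follows. This is an illustrative Python sketch under stated assumptions: each leaf is represented by its full path of child indices (0-7, one per level), and the integer key encoding and all names are hypothetical:

```python
def leaf_path_table(leaf_codes, depth, split=3):
    """Cut each leaf path 'split' levels above the leaves: the node id at
    the cut layer (the fourth-to-last layer for split=3) becomes the hash
    key, and the independent last-three-layer sub-path is the value, so a
    decoder can locate a leaf without walking its parents."""
    table = {}
    for path in leaf_codes:                      # path: child index per level
        upper, lower = path[:depth - split], path[depth - split:]
        key = 0
        for c in upper:                          # encode upper path as one int
            key = key * 8 + c
        table.setdefault(key, []).append(tuple(lower))
    return table
```

Because every value is a self-contained three-level sub-path, the entries under one key can be decoded independently, which is what enables the per-node parallel decoding of step S2.2.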
7. The method for video encoding and decoding based on inter-frame multiplexing as claimed in claim 1, wherein the specific operation steps of step S1.5 are as follows:
the inverse morton sequence decodes the one-dimensional position sequence number of a certain point into a two-dimensional coordinate, wherein the values of the two coordinate axes are respectively an even bit and an odd bit in the binary representation of the one-dimensional position sequence number, and then the generated two-dimensional coordinate is used for mapping the color information of the point to the corresponding position of the picture.
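The inverse Morton decoding of a one-dimensional serial number into a two-dimensional coordinate can be sketched as follows; which axis receives the even bits and which the odd bits is an assumption:

```python
def inverse_morton(idx: int):
    """Split a one-dimensional position serial number into its even bits
    (one coordinate axis) and odd bits (the other axis) to recover the
    two-dimensional pixel coordinate."""
    x = y = 0
    b = 0
    while idx:
        x |= (idx & 1) << b    # even bits of idx -> x
        idx >>= 1
        y |= (idx & 1) << b    # odd bits of idx -> y
        idx >>= 1
        b += 1
    return x, y
```

Applying this to consecutive serial numbers walks the picture in Z-order, so each point's color is written to a spatially coherent pixel position.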
8. The method of claim 1, wherein in step S2.1, the following formula is used for the inter-frame multiplexing when reconstructing the point cloud block by using the transformation matrix of the point cloud block of the previous frame and the point cloud block of the previous frame:
C(i,j) = A_3x3 · C(i-1,j) · S + T_3x1 · S
wherein C(i,j) represents the j-th point cloud block of the i-th frame and C(i-1,j) represents the j-th point cloud block of the (i-1)-th frame; based on the transformation matrix H(i,j) of the point cloud block obtained in step S1.3 and the point cloud block C(i-1,j) of the previous frame, the point cloud block of the current frame is obtained by rotating, translating and scaling the point cloud block of the previous frame.
9. The method for video encoding and decoding based on inter-frame multiplexing as claimed in claim 1, wherein the specific operation steps of step S2.2 are as follows:
step S2.2.1: according to the fact that the decoding speed of the last three layers of the octree drops sharply, the octree structure is divided into two parts at the third-to-last layer; the encoded stream above the third-to-last layer, as the first part of the encoded stream, is decoded using the existing octree decoding mode; the encoded stream below the third-to-last layer, as the second part of the encoded stream, is decoded with each node in parallel;
step S2.2.2: and (3) optimizing the computing resource allocation during decoding of the mobile terminal, decoding the first part of coded stream by utilizing CPU resources, simultaneously decoding the second part of coded stream in parallel by utilizing GPU resources to refine the point cloud position, and directly rendering in a completely parallel mode.
10. The method for video encoding and decoding based on inter-frame multiplexing as claimed in claim 1, wherein the specific operation steps of step S2.3 are as follows:
the Morton order converts the two-dimensional coordinates of a point on the two-dimensional picture into binary numbers and uses them respectively as the even bits and odd bits of the binary representation of the one-dimensional position serial number to obtain that serial number; the decoded color information is then aligned and spliced with the decoded geometric information, so that the original volume video is restored.
CN202310865717.1A 2023-07-14 2023-07-14 Volume video coding and decoding method based on inter-frame multiplexing Pending CN116828166A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310865717.1A CN116828166A (en) 2023-07-14 2023-07-14 Volume video coding and decoding method based on inter-frame multiplexing

Publications (1)

Publication Number Publication Date
CN116828166A true CN116828166A (en) 2023-09-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination