WO2023277128A1

WO2023277128A1 - Point cloud decoding device, point cloud decoding method, and program

Info

Publication number: WO2023277128A1
Application number: PCT/JP2022/026209
Authority: WO
Inventors: 恭平海野; 圭河村
Original assignee: Kddi株式会社
Priority date: 2021-07-02
Filing date: 2022-06-30
Publication date: 2023-01-05
Also published as: JP2023007934A

Abstract

A point cloud decoding device 200 according to the present invention is provided with: a geometric information reconstructing unit 2040 for reconstructing position information of a point cloud; an upsampling unit 2120 for upsampling the point cloud reconstructed by the geometric information reconstructing unit 2040; and a tree synthesizing unit 2020 for performing inter prediction with reference to the upsampled point cloud. The upsampling unit 2120 calculates an offset vector from the coordinates of each point in the reconstructed point cloud, and performs the upsampling using the offset vectors.

Description

Point cloud decoding device, point cloud decoding method, and program

The present invention relates to a point cloud decoding device, a point cloud decoding method, and a program.

Non-Patent Document 1 discloses an inter-prediction technique for geometric information.

However, in Non-Patent Document 1, the presence or absence of points in each subspace (node) of the point cloud to be encoded is estimated based on the distribution of points in the motion-compensated reference point cloud. In this case, particularly when both the reference point cloud and the point cloud to be encoded are sparse, there is a problem that even a slight deviation in the motion compensation result prevents the correlation in the time direction from being fully exploited.

Therefore, the present invention has been made in view of the above-mentioned problems, and an object thereof is to provide a point cloud decoding device, a point cloud decoding method, and a program that can increase the encoding efficiency of inter prediction of position information by upsampling the reference point cloud.

The gist of a first feature of the present invention is a point cloud decoding device including: a geometric information reconstruction unit configured to reconstruct position information of a point cloud; an upsampling unit configured to upsample the point cloud reconstructed by the geometric information reconstruction unit; and a tree synthesis unit configured to perform inter prediction with reference to the upsampled point cloud, wherein the upsampling unit is configured to calculate an offset vector from the coordinates of each point in the reconstructed point cloud and to perform the upsampling using the offset vector.

The gist of a second feature of the present invention is a point cloud decoding device including a geometric information decoding unit configured to decode a first flag that controls whether or not Trisoup at multiple levels is permitted.

The gist of a third feature of the present invention is a point cloud decoding method including: a step A of upsampling a point cloud reconstructed by a geometric information reconstruction unit configured to reconstruct position information of the point cloud; and a step B of performing inter prediction with reference to the upsampled point cloud, wherein, in the step A, an offset vector is calculated from the coordinates of each point in the reconstructed point cloud and the upsampling is performed using the offset vector.

The gist of a fourth feature of the present invention is a program for causing a computer to function as a point cloud decoding device, wherein the point cloud decoding device includes: a geometric information reconstruction unit configured to reconstruct position information of a point cloud; an upsampling unit configured to upsample the point cloud reconstructed by the geometric information reconstruction unit; and a tree synthesis unit configured to perform inter prediction with reference to the upsampled point cloud, and wherein the upsampling unit is configured to calculate an offset vector from the coordinates of each point in the reconstructed point cloud and to perform the upsampling using the offset vector.

According to the present invention, it is possible to provide a point cloud decoding device, a point cloud decoding method, and a program that can improve the encoding efficiency of position information based on inter prediction by upsampling the reference point cloud.

FIG. 1 is a diagram showing an example of the configuration of a point cloud processing system 10 according to one embodiment. FIG. 2 is a diagram showing an example of functional blocks of the point cloud decoding device 200 according to one embodiment. FIG. 3 is a diagram showing an example of the configuration of encoded data (bitstream) received by the geometric information decoding unit 2010 of the point group decoding device 200 according to one embodiment. FIG. 4 is a diagram showing an example of the syntax configuration of GPS2011. FIG. 5 is a flowchart showing an example of upsampling in the upsampling unit 2120 of the point cloud decoding device 200 according to one embodiment. FIG. 6A is a diagram for explaining the operation of the upsampling unit 2120 of the point cloud decoding device 200 according to one embodiment. FIG. 6B is a diagram for explaining the operation of the upsampling unit 2120 of the point cloud decoding device 200 according to one embodiment. FIG. 7 is a flowchart showing an example of processing in the tree synthesizing unit 2020 of the point cloud decoding device 200 according to one embodiment. FIG. 8 is a flowchart showing an example of decoding processing of Trisoup information by the tree synthesizing unit 2020 of the point cloud decoding device 200 according to one embodiment. FIG. 9 is a diagram showing an example of functional blocks of the point cloud encoding device 100 according to one embodiment. FIG. 10 is a diagram showing an example of functional blocks of the tree synthesizing unit 2020 of the point cloud decoding device 200 according to one embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. It should be noted that constituent elements in the following embodiments can be appropriately replaced with existing constituent elements and the like, and various variations including combinations with other existing constituent elements are possible. Therefore, the following description of the embodiments is not intended to limit the scope of the invention described in the claims.

(First embodiment)
A point cloud processing system 10 according to a first embodiment of the present invention will be described below with reference to FIGS. 1 to 10. FIG. 1 is a diagram showing the point cloud processing system 10 according to this embodiment.

As shown in FIG. 1, the point cloud processing system 10 has a point cloud encoding device 100 and a point cloud decoding device 200.

The point cloud encoding device 100 is configured to generate encoded data (bitstream) by encoding an input point cloud signal. The point cloud decoding device 200 is configured to generate an output point cloud signal by decoding the bitstream.

The input point cloud signal and the output point cloud signal are composed of position information and attribute information of each point in the point cloud. The attribute information is, for example, color information and reflectance of each point.

Here, such a bitstream may be transmitted from the point cloud encoding device 100 to the point cloud decoding device 200 via a transmission channel. Also, the bitstream may be stored in a storage medium and then provided from the point cloud encoding device 100 to the point cloud decoding device 200.

(Point cloud decoding device 200)
The point cloud decoding device 200 according to the present embodiment will be described below with reference to FIGS. 2 and 10. FIG. 2 is a diagram showing an example of functional blocks of the point cloud decoding device 200 according to this embodiment.

As shown in FIG. 2, the point cloud decoding device 200 includes a geometric information decoding unit 2010, a tree synthesis unit 2020, an approximate surface synthesis unit 2030, a geometric information reconstruction unit 2040, an inverse coordinate transformation unit 2050, an attribute information decoding unit 2060, an inverse quantization unit 2070, a RAHT unit 2080, a LoD calculation unit 2090, an inverse lifting unit 2100, an inverse color conversion unit 2110, an upsampling unit 2120, and a frame buffer 2130.

The geometric information decoding unit 2010 is configured to receive as input a bitstream related to geometric information (geometric information bitstream) among the bitstreams output from the point cloud encoding device 100 and decode the syntax.

The decoding process is, for example, context adaptive binary arithmetic decoding process. Here, for example, the syntax includes control data (flags and parameters) for controlling decoding processing of position information.

The tree synthesis unit 2020 is configured to receive as input the control data decoded by the geometric information decoding unit 2010 and an occupancy code indicating in which nodes of the tree, described later, points exist, and to generate tree information indicating in which regions of the decoding target space points exist.

It should be noted that the decoding process of the occupancy code may be configured to be performed inside the tree synthesis unit 2020.

In this process, tree information can be generated by dividing the decoding target space into rectangular parallelepipeds, referring to the occupancy code to determine whether points exist in each rectangular parallelepiped, further dividing each rectangular parallelepiped in which points exist into multiple rectangular parallelepipeds, and recursively repeating this division and reference to the occupancy code.
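The recursive division described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the decoding process defined by this document: it assumes cubic nodes (the "Octree" case), one already entropy-decoded 8-bit occupancy code per occupied node, breadth-first node order, and a hypothetical bit-to-child mapping. The function name `decode_octree` is likewise hypothetical.

```python
from collections import deque


def decode_octree(occupancy_codes, root_log2_size):
    """Return the origins of the occupied unit cubes.

    Assumptions (not taken from the source): one 8-bit code per occupied
    node, nodes visited breadth-first, bit k set means child k is occupied,
    and child k sits at offset ((k >> 2) & 1, (k >> 1) & 1, k & 1) in units
    of half the parent size.
    """
    codes = iter(occupancy_codes)
    queue = deque([((0, 0, 0), root_log2_size)])
    leaves = []
    while queue:
        (x, y, z), log2_size = queue.popleft()
        if log2_size == 0:
            # A node of size 1 is a leaf: it holds a decoded point position.
            leaves.append((x, y, z))
            continue
        code = next(codes)
        half = 1 << (log2_size - 1)
        for k in range(8):
            if code & (1 << k):
                child = (x + half * ((k >> 2) & 1),
                         y + half * ((k >> 1) & 1),
                         z + half * (k & 1))
                queue.append((child, log2_size - 1))
    return leaves
```

Only occupied nodes are divided further, so empty regions of the decoding target space consume no occupancy codes below the level at which they are found to be empty.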

Here, inter prediction, which will be described later, may be used when decoding such an occupancy code.

In this embodiment, a method called "Octree", which recursively performs octree division with the above-mentioned rectangular parallelepiped always being a cube, and a method called "QtBt", which performs quadtree division and binary tree division in addition to octree division, can be used. Whether or not "QtBt" is used is transmitted from the point cloud encoding device 100 as control data.

Alternatively, when the control data specifies that Predictive coding is used, the tree synthesis unit 2020 is configured to decode the coordinates of each point based on an arbitrary tree configuration determined by the point cloud encoding device 100.

The approximate surface synthesizing unit 2030 is configured to generate approximate surface information using the tree information generated by the tree synthesizing unit 2020, and decode the point group based on the approximate surface information.

For example, when decoding three-dimensional point cloud data of an object whose points are densely distributed on its surface, the approximate surface information represents the region in which the points exist as an approximation by small planes, instead of decoding the individual points.

Specifically, the approximate surface synthesizing unit 2030 can generate approximate surface information and decode the point group by, for example, a technique called "Trisoup". A specific processing example of "Trisoup" will be described later. Also, when decoding a sparse point group acquired by Lidar or the like, this process can be omitted.

The geometric information reconstruction unit 2040 is configured to reconstruct geometric information (position information in the coordinate system assumed by the decoding process).

The inverse coordinate transformation unit 2050 is configured to receive as input the geometric information reconstructed by the geometric information reconstruction unit 2040, transform the coordinate system assumed by the decoding process into the coordinate system of the output point cloud signal, and output the position information.

The upsampling unit 2120 is configured to receive the geometric information reconstructed by the geometric information reconstructing unit 2040 and perform upsampling processing as described later.

The frame buffer 2130 is configured to receive the point cloud upsampled by the upsampling unit 2120 and store it as a reference frame. The stored reference frames are read from the frame buffer 2130 and used as reference frames when the tree synthesis unit 2020 inter-predicts temporally different frames.

Here, which reference frame (that is, the frame at which time) is used for each frame may be determined, for example, based on control data transmitted as a bitstream from the point cloud encoding device 100.

Here, an example of storing in the frame buffer 2130 after being upsampled by the upsampling unit 2120 has been described, but this order may be reversed.

That is, the geometric information output from the geometric information reconstruction unit 2040 may be stored in the frame buffer 2130 as a reference frame as it is, and then, when inter prediction is performed by the tree synthesis unit 2020, the reference frame stored in the frame buffer 2130 may be upsampled by the upsampling unit 2120 and then input to the tree synthesis unit 2020.

As yet another example, the upsampling unit 2120 may be configured to be included inside the tree synthesis unit 2020, as shown in FIG. 10. The configuration example shown in FIG. 10 will be described below.

The tree synthesis unit 2020 acquires from the frame buffer 2130 an already-decoded point cloud that differs in time from the point cloud to be decoded, and inputs it to the motion compensation unit 2021.

The motion compensation unit 2021 is configured to perform motion compensation using motion information (motion vectors, etc.) decoded from the control data. Since known methods can be applied as the specific motion compensation method, details are omitted.

Here, the motion compensation unit 2021 may be configured to perform class classification processing on each point in the decoded point group and apply different motion compensation processing to each class.

For example, the motion compensation unit 2021 may be configured to assign a class of either "ground" or "object" to each point, and apply motion compensation only to points considered to be "object".

The upsampling unit 2120 is configured to receive the motion-compensated point cloud as input and perform upsampling processing as described later.

As described above, when motion compensation processing based on class classification is performed, the upsampling unit 2120 may be configured to perform upsampling processing only for a specific class.

For example, the upsampling unit 2120 may be configured to perform upsampling processing only on the "ground" points, or to perform upsampling processing only on the "object" points.

For example, when the above-described class classification is performed based on the value of the z-axis coordinate (height component) of each point, the upsampling unit 2120 may be configured to control whether or not to execute the upsampling process based on the z-coordinate of each point in the motion-compensated point cloud.

Specifically, the upsampling unit 2120 may be configured to perform upsampling only when the z-axis coordinate is greater than or equal to a first threshold (or greater than the first threshold) and less than or equal to a second threshold (or less than the second threshold).

Conversely, the upsampling unit 2120 may be configured to perform upsampling only when the z-axis coordinate is less than or equal to the first threshold (or less than the first threshold) or greater than or equal to the second threshold (or greater than the second threshold). Here, it is assumed that the second threshold is greater than the first threshold.
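The two threshold rules above can be illustrated with a short sketch. The function name `should_upsample` and the `inside` switch are hypothetical, and the sketch arbitrarily uses the inclusive comparisons; the text equally allows the strict ones.

```python
def should_upsample(z, first_threshold, second_threshold, inside=True):
    """Decide from the z coordinate whether a point is upsampled.

    inside=True : upsample only when first <= z <= second.
    inside=False: upsample only when z <= first or z >= second.
    Inclusive comparisons are an assumption; strict ones are also allowed.
    """
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must be greater than the first")
    if inside:
        return first_threshold <= z <= second_threshold
    return z <= first_threshold or z >= second_threshold
```

For instance, if points between the two thresholds are the ones classified as "object", `inside=True` restricts upsampling to that class and `inside=False` restricts it to the remaining points.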

The probability estimation unit 2022 is configured to receive the upsampled point cloud as input and estimate the probability that the value of each bit of the occupancy code is 0 or the probability that the value of each bit of the occupancy code is 1.

The occupancy information decoding unit 2023 is configured to decode the occupancy code by entropy decoding based on the probability estimated by the probability estimation unit 2022.

The tree configuration unit 2024 is configured to construct the tree based on the value of the above-mentioned occupancy code.

The attribute information decoding unit 2060 is configured to receive a bitstream related to attribute information (attribute information bitstream) among the bitstreams output from the point cloud encoding device 100 and decode the syntax.

The decoding process is, for example, context adaptive binary arithmetic decoding process. Here, for example, the syntax includes control data (flags and parameters) for controlling attribute information decoding processing.

Also, the attribute information decoding unit 2060 is configured to decode the quantized residual information from the decoded syntax.

The inverse quantization unit 2070 is configured to perform inverse quantization based on the quantized residual information decoded by the attribute information decoding unit 2060 and the quantization parameter, which is one of the control data decoded by the attribute information decoding unit 2060, and to generate inverse quantized residual information.

The inverse quantized residual information is output to either the RAHT unit 2080 or the LoD calculation unit 2090 according to the features of the point cloud to be decoded. Which of the two it is output to is specified by control data decoded by the attribute information decoding unit 2060.

The RAHT unit 2080 is configured to receive as input the inverse quantized residual information generated by the inverse quantization unit 2070 and the geometric information generated by the geometric information reconstruction unit 2040, and to decode the attribute information of each point using a type of Haar transform (in the decoding process, an inverse Haar transform) called RAHT (Region Adaptive Hierarchical Transform). As the specific RAHT processing, for example, the method described in Non-Patent Document 1 can be used.

The LoD calculation unit 2090 is configured to receive the geometric information generated by the geometric information reconstruction unit 2040 and generate LoD (Level of Detail).

LoD is information for defining a reference relationship between points (the referring point and the referenced point).

In other words, LoD is information that defines a hierarchical structure in which each point included in the geometric information is classified into a plurality of levels, and the attributes of points belonging to lower levels are encoded or decoded using the attribute information of points belonging to higher levels.

As a specific method for determining LoD, for example, the method described in Non-Patent Document 1 may be used.

The inverse lifting unit 2100 is configured to use the LoD generated by the LoD calculation unit 2090 and the inverse quantized residual information generated by the inverse quantization unit 2070 to decode the attribute information of each point based on the hierarchical structure defined by the LoD. As the specific inverse lifting processing, for example, the method described in Non-Patent Document 1 can be used.

The inverse color conversion unit 2110 is configured to perform inverse color conversion processing on the attribute information output from the RAHT unit 2080 or the inverse lifting unit 2100 when the attribute information to be decoded is color information and color conversion has been performed on the point cloud encoding device 100 side. Whether or not to execute the inverse color conversion processing is determined by the control data decoded by the attribute information decoding unit 2060.

The point cloud decoding device 200 is configured to decode and output the attribute information of each point in the point cloud through the above processing.

(Geometric information decoding unit 2010)
The control data decoded by the geometric information decoding unit 2010 will be described below with reference to FIGS. 3 and 4.

FIG. 3 is an example of the structure of encoded data (bitstream) received by the geometric information decoding unit 2010.

First, the bitstream may contain GPS2011. GPS2011 is also called a geometry parameter set, and is a set of control data related to decoding of geometric information. A specific example will be described later. Each GPS 2011 includes at least GPS id information for individual identification when multiple GPS 2011 exist.

Second, the bitstream may contain GSH2012A/2012B. GSH2012A/2012B is also called geometry slice header or geometry data unit header, and is a set of control data corresponding to slices described later. In the following description, the term "slice" will be used, but the term "slice" can also be read as a data unit. A specific example will be described later. GSH 2012A/2012B includes at least GPS id information for designating GPS 2011 corresponding to each GSH 2012A/2012B.

Third, the bitstream may contain slice data 2013A/2013B next to GSH 2012A/2012B. The slice data 2013A/2013B includes data obtained by encoding geometric information. An example of the slice data 2013A/2013B is an occupancy code, which will be described later.

As described above, the bitstream has a configuration in which one GSH 2012A/2012B and one GPS 2011 correspond to each slice data 2013A/2013B.

As described above, in GSH2012A/2012B, which GPS 2011 to refer to is designated by GPS id information, so a common GPS 2011 can be used for multiple slice data 2013A/2013B.

In other words, the GPS 2011 does not necessarily need to be transmitted for each slice. For example, as shown in FIG. 3, it is possible to configure the bitstream such that the GPS 2011 is not encoded immediately before the GSH 2012B and the slice data 2013B.

It should be noted that the configuration in FIG. 3 is just an example. As long as GSH 2012A/2012B and GPS 2011 correspond to each slice data 2013A/2013B, elements other than those described above may be added as components of the bitstream.

For example, the bitstream may include a sequence parameter set (SPS) 2001, as shown in FIG. 3. Similarly, the bitstream may be arranged in a configuration different from that in FIG. 3 at the time of transmission. Furthermore, it may be combined with the bitstream decoded by the attribute information decoding unit 2060 and transmitted as a single bitstream.

FIG. 4 is an example of the syntax configuration of GPS2011.

The syntax names explained below are just examples. As long as the functionality of the syntax elements described below is the same, the syntax names may differ.

The GPS 2011 may include GPS id information (gps_geom_parameter_set_id) for identifying each GPS 2011.

The Descriptor column in FIG. 4 means how each syntax is encoded. ue(v) means an unsigned zero-order exponential Golomb code, and u(1) means a 1-bit flag.
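As an illustration of the two descriptors, the following sketch reads a u(1) flag and a ue(v) value (zero-order exponential Golomb code) from a byte buffer. The `BitReader` class is a hypothetical helper, and most-significant-bit-first bit order is an assumption; the arithmetic decoding that the geometric information decoding unit 2010 may apply is not modeled.

```python
class BitReader:
    """Minimal MSB-first bit reader (hypothetical helper, not from the source)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def u(self, n: int) -> int:
        """Read n bits as an unsigned integer; u(1) is a 1-bit flag."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def ue(self) -> int:
        """Read an unsigned zero-order exponential Golomb code, ue(v)."""
        leading_zeros = 0
        while self.u(1) == 0:
            leading_zeros += 1
        # The code word is k zeros, a one, then k suffix bits.
        return (1 << leading_zeros) - 1 + self.u(leading_zeros)


reader = BitReader(bytes([0xA6, 0x40]))  # bits: 1 010 011 00100 + padding
values = [reader.ue() for _ in range(4)]  # code words for 0, 1, 2, 3
```

Small values take few bits under ue(v) (the value 0 is a single bit), which suits identifiers and size parameters that are usually small.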

The GPS 2011 may include a flag (interprediction_enabled_flag) that controls whether or not the tree synthesis unit 2020 performs inter prediction.

For example, it may be defined that inter prediction is not performed when the value of interprediction_enabled_flag is "0", and that inter prediction is performed when the value of interprediction_enabled_flag is "1".

Note that interprediction_enabled_flag may be included in the SPS 2001 instead of the GPS 2011.

The geometric information decoding unit 2010 may be configured to additionally decode the following syntax when performing inter prediction, that is, when the value of interprediction_enabled_flag is "1".

The GPS 2011 may include a flag (reference_upsampling_enabled_flag) that controls whether to apply upsampling by the upsampling unit 2120 when performing inter prediction.

For example, it may be defined that upsampling is not applied when the value of reference_upsampling_enabled_flag is "0", and that upsampling is applied when the value of reference_upsampling_enabled_flag is "1".

The geometric information decoding unit 2010 may be configured to additionally decode the following syntax when upsampling is applied, that is, when the value of reference_upsampling_enabled_flag is "1".

The GPS 2011 may include syntax (upsampling_number_div2) that controls the number of upsampling points. How to specifically use the value of the syntax will be described later.

The GPS 2011 may include a syntax (upsampling_interval) that controls the interval between upsampling points. How to specifically use the value of the syntax will be described later.

The GPS 2011 may include a flag (trisoup_enabled_flag) that controls whether Trisoup is used in the approximate surface synthesis unit 2030.

For example, it may be defined that Trisoup is not used when the value of trisoup_enabled_flag is "0", and that Trisoup is used when the value of trisoup_enabled_flag is "1".

The geometric information decoding unit 2010 may be configured to additionally decode the following syntax when using Trisoup, that is, when the value of trisoup_enabled_flag is "1".

Note that trisoup_enabled_flag may be included in SPS2001 instead of GPS2011.

The GPS 2011 may include a flag (trisoup_multilevel_enabled_flag, first flag) that controls whether to allow Trisoup at multiple levels.

For example, it may be defined that Trisoup at multiple levels is not permitted when the value of trisoup_multilevel_enabled_flag is "0", and that Trisoup at multiple levels is permitted when the value of trisoup_multilevel_enabled_flag is "1".

If the syntax is not included in GPS2011, the value of the syntax may be regarded as the value for Trisoup at a single level, that is, "0".

Note that trisoup_multilevel_enabled_flag may be defined to be included in SPS2001 instead of GPS2011. In this case, if the SPS 2001 does not include the trisoup_multilevel_enabled_flag, the value of the syntax may be regarded as the value for Trisoup at a single level, ie, "0".

The geometric information decoding unit 2010 may be configured to additionally decode the following syntax when Trisoup at multiple levels is permitted, that is, when the value of trisoup_multilevel_enabled_flag is "1".

The GPS 2011 may include syntax (log2_trisoup_max_node_size_minus3) that defines the maximum value of the Trisoup node size when allowing Trisoup at multiple levels.

The syntax may be expressed as a logarithmic value with base 2 for the maximum value of the actual Trisoup node size. Further, the syntax may express the maximum value of the actual Trisoup node size as a value obtained by subtracting 3 after converting to base 2 logarithm.

The GPS 2011 may include syntax (log2_trisoup_min_node_size_minus3) that defines the minimum value of the Trisoup node size when allowing Trisoup at multiple levels.

The syntax may be expressed as a logarithmic value with base 2 for the minimum value of the actual Trisoup node size. Furthermore, the syntax may express the minimum value of the actual Trisoup node size as a value after converting to base 2 logarithm and then subtracting 3.

In addition, the value of the syntax may be constrained to always be 0 or more and log2_trisoup_max_node_size_minus3 or less.

The geometric information decoding unit 2010 may be configured to additionally decode the following syntax when Trisoup at multiple levels is not permitted, that is, when the value of trisoup_multilevel_enabled_flag is "0".

The GPS 2011 may include syntax (log2_trisoup_node_size_minus3) to specify the Trisoup node size when not allowing Trisoup at multiple levels.

The syntax may be expressed as a logarithmic value with base 2 for the actual Trisoup node size. Further, the syntax may express the actual Trisoup node size as a value after converting to base 2 logarithm and then subtracting 3.
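The conditional structure of the GPS syntax described above can be summarized in a short sketch. The field order and the choice of descriptor for each element (flag or ue(v)) are assumptions made for illustration, since FIG. 4 is not reproduced here. `parse_gps`, `read_flag` and `read_ue` are hypothetical names; the two callables stand for whatever routine returns the next decoded flag or ue(v) value.

```python
def parse_gps(read_flag, read_ue):
    """Parse the GPS elements discussed in the text (assumed order/descriptors)."""
    gps = {"gps_geom_parameter_set_id": read_ue()}

    gps["interprediction_enabled_flag"] = read_flag()
    if gps["interprediction_enabled_flag"]:
        gps["reference_upsampling_enabled_flag"] = read_flag()
        if gps["reference_upsampling_enabled_flag"]:
            gps["upsampling_number_div2"] = read_ue()
            gps["upsampling_interval"] = read_ue()

    gps["trisoup_enabled_flag"] = read_flag()
    # When absent, the flag is regarded as "0" (Trisoup at a single level).
    gps["trisoup_multilevel_enabled_flag"] = 0
    if gps["trisoup_enabled_flag"]:
        gps["trisoup_multilevel_enabled_flag"] = read_flag()
        if gps["trisoup_multilevel_enabled_flag"]:
            gps["log2_trisoup_max_node_size_minus3"] = read_ue()
            gps["log2_trisoup_min_node_size_minus3"] = read_ue()
            # Constraint from the text: 0 <= min <= max.
            if (gps["log2_trisoup_min_node_size_minus3"]
                    > gps["log2_trisoup_max_node_size_minus3"]):
                raise ValueError("min node size exceeds max node size")
        else:
            gps["log2_trisoup_node_size_minus3"] = read_ue()
    return gps
```

With this representation, the base-2 logarithm of a Trisoup node size is recovered by adding 3 to the decoded value, for example `gps["log2_trisoup_node_size_minus3"] + 3`.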

(Up-sampling unit 2120)
Upsampling in the upsampling unit 2120 will be described below with reference to FIGS. 5 and 6.

FIG. 5 is a flowchart showing an example of upsampling in the upsampling unit 2120.

As shown in FIG. 5, in step S501, the upsampling unit 2120 determines whether upsampling has been completed for all points included in the point cloud reconstructed from the geometric information, that is, the position information, output from the geometric information reconstruction unit 2040.

When all points have been processed, the upsampling unit 2120 ends the process in step S504. On the other hand, if the processing of all points has not been completed, the upsampling unit 2120 proceeds to step S502.

In step S502, the upsampling unit 2120 calculates an offset vector.

An example of obtaining an offset vector for a point P on the xyz space shown in FIG. 6A will be described below. Here, the coordinates of the point P are (X, Y, Z), and the vector starting from the origin O and ending at the point P is vp.

The offset vector vu may be calculated as normalized so that the L2 norm of vu is 1, for example. That is, the upsampling section 2120 may calculate the offset vector vu based on the following equation.

$$v_u = \frac{v_p}{(X^2 + Y^2 + Z^2)^{1/2}}$$

Note that the upsampling unit 2120 may use approximate calculations based on integer operations for the square root calculation and division described above. Also, in such approximate calculations, if the absolute value of vp is too small, the error due to the integer calculation becomes too large, so i·vu, which will be described later, may be obtained directly instead of vu.

The offset vector vu may be calculated by a method other than the above, as long as, for the point P to be upsampled, it is calculated as a vector that points in the same direction as the vector vp from the origin O to the point P and that has a predetermined magnitude (for example, L2 norm).

In other words, assuming that the point P lies on a sphere centered on the origin O, any method other than the above may be used as long as the offset vector vu is calculated so as to be normal to the spherical surface.

Also, the upsampling section 2120 may derive the offset vector vu by a method that includes some approximation errors with respect to the direction and magnitude described above.
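One way to obtain the scaled offset i·vu with integer operations only is sketched below. This is an assumed approximation for illustration, not the approximation used by the upsampling unit 2120: it uses Python's `math.isqrt` for the integer square root and rounds each component to the nearest integer. The function name `scaled_offset_int` is hypothetical. Computing i·vu directly avoids rounding a unit-length vector, whose components all have magnitude at most 1, to integers.

```python
from math import isqrt


def scaled_offset_int(point, interval):
    """Approximate interval * vu with integer arithmetic (illustrative only)."""
    x, y, z = point
    norm = isqrt(x * x + y * y + z * z)  # floor of the L2 norm of vp
    if norm == 0:
        # Assumption: the direction is undefined at or very near the origin.
        return (0, 0, 0)

    def div_round(a, b):
        # Rounds a / b to the nearest integer for b > 0 (ties round up).
        return (2 * a + b) // (2 * b)

    return tuple(div_round(interval * c, norm) for c in point)
```

Because `isqrt` truncates, the result carries a small error in magnitude, which is the kind of approximation error the text says may be tolerated.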

After calculating the offset vector vu, the upsampling unit 2120 moves to step S503.

In step S503, the upsampling unit 2120 performs upsampling based on the offset vector vu calculated in step S502.

Again, the case of upsampling the point P in FIG. 6A will be described as an example. FIG. 6B shows an example of how the point P looks after upsampling.

Here, i shown in FIG. 6B is a parameter that controls the upsampling interval, and the value decoded as the syntax (upsampling_interval) that controls the interval between upsampling points, included in the GPS 2011, is used.

Also, N shown in FIG. 6B is a parameter that controls the number of points generated by upsampling for one input point, and the value decoded as the syntax (upsampling_number_div2) that controls the number of upsampling points, included in the GPS 2011, is used.

Upsampling can be realized, for example, by adding a point having a coordinate value vs as follows, where vp is the coordinate of point P.

$$v_s = v_p \pm n \cdot i \cdot v_u \quad (n = 1, 2, \ldots, N)$$

In this case, 2N points are added to one input point by upsampling.
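Steps S502 and S503 for a single point can be sketched in floating point as follows. The function name `upsample_point` is hypothetical. Keeping the input point in the output and returning a point at the origin unchanged are assumptions of this sketch.

```python
import math


def upsample_point(p, interval, number_div2):
    """Return p plus the 2N points vp +/- n*i*vu, n = 1..N.

    interval corresponds to i (upsampling_interval) and number_div2 to N
    (upsampling_number_div2).
    """
    x, y, z = p
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        return [p]  # assumption: no direction is defined at the origin
    vu = (x / norm, y / norm, z / norm)  # unit-length offset vector (S502)
    points = [p]
    for n in range(1, number_div2 + 1):  # add points along +/- vu (S503)
        for sign in (1, -1):
            points.append(tuple(c + sign * n * interval * u
                                for c, u in zip(p, vu)))
    return points
```

For example, `upsample_point((0.0, 0.0, 10.0), 2, 2)` keeps the input point and adds four points at z = 12, 8, 14 and 6, all on the line through the origin O and the point P.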

After performing upsampling as described above, the upsampling unit 2120 returns to step S501 and continues processing.

(Tree synthesis unit 2020)
The processing of the tree synthesis unit 2020 will be described below with reference to FIGS. 7 and 8. FIG. 7 is a flowchart showing an example of processing in the tree synthesis unit 2020. An example of synthesizing a tree using "Octree" will be described below.

In step S701, the tree synthesizing unit 2020 confirms whether or not all depth processes have been completed. Note that the depth number may be included as control data in a bitstream transmitted from the point cloud encoding device 100 to the point cloud decoding device 200 .

The tree synthesizing unit 2020 calculates the node size of the Depth. For "Octree", the node size of the first Depth may be defined as "2 raised to the power of the number of Depths". That is, when the number of Depths is N, the node size of the first Depth may be defined as 2 to the Nth power.

Also, the node size at the second and subsequent Depths may be defined by decreasing N by one at a time. That is, the node size of the second Depth may be defined as "2 to the (N−1) power", the node size of the third Depth as "2 to the (N−2) power", and so on.

Alternatively, since the node size is always defined as a power of 2, the value of the exponent part (N, N−1, N−2, etc.) may be simply considered as the node size. In the following description, node size refers to the value of the exponent part.

If the processing of all depths is completed, the tree synthesis unit 2020 proceeds to step S709, and if the processing of all depths is not completed, the tree synthesis unit 2020 proceeds to step S702.

In other words, when the Depth is the n-th Depth, if (N−n)=0, the tree synthesizing unit 2020 proceeds to step S709, and if (N−n)>0, the tree synthesizing unit 2020 proceeds to step S702.

Here, when the flag (trisoup_enabled_flag) that controls whether or not to use Trisoup indicates that Trisoup is used, that is, when the value of trisoup_enabled_flag is "1", the tree synthesizing unit 2020 may vary the number of Depths to be processed based on the value of the syntax that defines the minimum Trisoup node size (log2_trisoup_min_node_size_minus3) or the syntax that defines the Trisoup node size (log2_trisoup_node_size_minus3). In such a case, the number of Depths to be processed may be defined, for example, as follows.

Number of processing Depths = Total number of Depths − (minimum) Trisoup node size

Here, the minimum Trisoup node size can be defined by, for example, (log2_trisoup_min_node_size_minus3+3). Similarly, the Trisoup node size can be defined as (log2_trisoup_node_size_minus3+3).

In this case, if the processing for all the processing depth numbers has been completed, the tree synthesizing unit 2020 proceeds to step S709; otherwise, the tree synthesizing unit 2020 proceeds to step S702.

In other words, if (number of processing Depths−n)=0, the tree synthesizing unit 2020 proceeds to step S709, and if (number of processing Depths−n)>0, the tree synthesizing unit 2020 proceeds to step S702.

Also, the tree synthesizing unit 2020 may determine that Trisoup is applied to all nodes having a node size (N-number of processing depths) when proceeding to step S709.
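
As a rough illustration of the relation above, the number of Depths to be processed could be computed as in the following sketch. The helper names `trisoup_node_size` and `processing_depth_count` are hypothetical, and node sizes are treated as exponents, as in the description above.

```python
def trisoup_node_size(log2_size_minus3):
    """Recover a (minimum) Trisoup node size exponent from its *_minus3 syntax value."""
    return log2_size_minus3 + 3

def processing_depth_count(total_depths, trisoup_enabled_flag, log2_size_minus3=0):
    """Number of Depths handled by Octree division before Trisoup takes over.

    Assumption: when Trisoup is not used, all Depths are processed.
    """
    if not trisoup_enabled_flag:
        return total_depths
    return total_depths - trisoup_node_size(log2_size_minus3)
```

For instance, with 10 Depths in total and a syntax value of 1 (node size exponent 4), this sketch yields 6 Depths to be processed.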

In step S702, the tree synthesizing unit 2020 determines whether it is necessary to decode Trisoup_applied_flag, which will be described later, at the relevant depth.

For example, when "Trisoup at multiple levels is enabled (the value of trisoup_multilevel_enabled_flag is "1")" and "the node size (N−n) of the relevant Depth is equal to or smaller than the maximum Trisoup node size", the tree synthesizing unit 2020 may determine that "decoding of Trisoup_applied_flag is required".

Also, the tree synthesis unit 2020 may determine that "decoding of Trisoup_applied_flag is unnecessary" when the above condition is not satisfied.

Here, the maximum Trisoup node size can be defined, for example, by (log2_trisoup_max_node_size_minus3+3).
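
The determination in step S702 might be expressed as in the following sketch. The function name `needs_trisoup_applied_flag` and its parameter names are hypothetical; the node size is again the exponent value (N−n).

```python
def needs_trisoup_applied_flag(trisoup_multilevel_enabled_flag, node_size,
                               log2_trisoup_max_node_size_minus3):
    """Return True when Trisoup_applied_flag must be decoded at this Depth."""
    max_trisoup_node_size = log2_trisoup_max_node_size_minus3 + 3
    return bool(trisoup_multilevel_enabled_flag) and node_size <= max_trisoup_node_size
```

With a maximum Trisoup node size of 5 (syntax value 2), a Depth whose node size is 5 requires the flag, while a Depth whose node size is 6 does not.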

After the above determination is completed, the tree synthesizing unit 2020 moves to step S703.

In step S703, the tree synthesizing unit 2020 determines whether or not all nodes included in the depth have been processed.

When it is determined that the processing of all nodes of the depth has been completed, the tree synthesizing unit 2020 moves to step S701 and performs the processing of the next depth.

On the other hand, if the processing of all nodes of the relevant depth has not been completed, the tree synthesizing unit 2020 moves to step S704.

In step S704, the tree synthesizing unit 2020 confirms whether decoding of the Trisoup_applied_flag determined in step S702 is necessary.

If it is determined that decoding of Trisoup_applied_flag is necessary, the tree synthesizing unit 2020 proceeds to step S705, and if it is determined that decoding of Trisoup_applied_flag is not required, the tree synthesizing unit 2020 proceeds to step S708.

In step S705, the tree synthesizing unit 2020 decodes Trisoup_applied_flag.

Trisoup_applied_flag is a 1-bit flag (second flag) that indicates whether or not Trisoup is applied to the target node. For example, it may be defined that Trisoup is applied to the target node when the flag value is "1", and that Trisoup is not applied to the target node when the flag value is "0".

After decoding Trisoup_applied_flag, the tree synthesizing unit 2020 moves to step S706.

In step S706, the tree synthesizing unit 2020 checks the value of Trisoup_applied_flag decoded in step S705.

When Trisoup is applied to the target node, that is, when the value of Trisoup_applied_flag is "1", the tree synthesizing unit 2020 moves to step S707.

When Trisoup is not applied to the target node, that is, when the value of Trisoup_applied_flag is "0", the tree synthesizing unit 2020 proceeds to step S708.

In step S707, the tree synthesizing unit 2020 stores the target node as a node to which Trisoup is applied, that is, as a Trisoup node. It is assumed that the node division by "Octree" is no longer applied to such a target node. After that, the tree synthesizing unit 2020 advances to step S703 to process the next node.

In step S708, the tree synthesizing unit 2020 decodes information called the occupancy code.

In the case of "Octree", the occupancy code is information indicating, for each of the 8 nodes (called child nodes) obtained by dividing the target node in half along each of the x, y, and z axes, whether or not a point to be decoded is included.

For example, the occupancy code assigns 1-bit information to each child node. If the 1-bit information is "1", it may be defined that a point to be decoded is included in the child node, and if the 1-bit information is "0", it may be defined that no point to be decoded is included in the child node.
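
The following sketch shows one way an 8-bit occupancy code could be interpreted. The mapping of bit positions to child nodes (bit 2 for x, bit 1 for y, bit 0 for z of the child index) is an assumption made only for illustration and is not specified in the text above; `occupied_children` is a hypothetical name.

```python
def occupied_children(occupancy_code, origin, child_size):
    """List the origins of the child nodes marked as containing a point.

    occupancy_code -- 8-bit integer, one bit per child node
    origin         -- (x, y, z) corner of the target node
    child_size     -- edge length of each child node (half the target node)
    """
    children = []
    for index in range(8):
        if (occupancy_code >> index) & 1:
            # Assumed layout: child index = (dx << 2) | (dy << 1) | dz.
            dx, dy, dz = (index >> 2) & 1, (index >> 1) & 1, index & 1
            children.append((origin[0] + dx * child_size,
                             origin[1] + dy * child_size,
                             origin[2] + dz * child_size))
    return children
```

Under this assumed layout, the code 0b10000001 for a node at (0, 0, 0) with child size 4 marks the two diagonally opposite children at (0, 0, 0) and (4, 4, 4).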

When decoding such an occupancy code, the tree synthesizing unit 2020 may estimate in advance the probability that each child node contains a point to be decoded, and entropy-decode the bit corresponding to each child node based on that probability.

Similarly, the point group encoding device 100 may perform entropy encoding. Furthermore, inter-prediction may be used to estimate such probabilities. As a specific inter-prediction method, for example, the method described in Non-Patent Document 1 can be applied. Furthermore, a point cloud upsampled by the upsampling unit 2120 may be used as a reference point cloud for inter prediction.

After decoding the occupancy code, the tree synthesizing unit 2020 advances to step S703 to process the next node.

In step S709, the tree synthesizing unit 2020 decodes the Trisoup information. The tree synthesizing unit 2020 executes the process of step S709 only when Trisoup is used, that is, when the value of trisoup_enabled_flag is "1". That is, if Trisoup is not used, the tree synthesizing unit 2020 advances to step S710 and terminates the process.

FIG. 8 is a flowchart showing an example of decoding processing for Trisoup information.

As shown in FIG. 8, in step S801, the tree synthesizing unit 2020 decodes syntax that controls sampling intervals of decoding points.

In step S802, the tree synthesizing unit 2020 determines whether the processing of all Trisoup hierarchies has been completed. Here, the total number of Trisoup hierarchies can be defined as follows.

When allowing Trisoup at multiple levels, that is, when the value of trisoup_multilevel_enabled_flag is "1", the total number of Trisoup layers can be defined as (maximum Trisoup node size - minimum Trisoup node size + 1).

That is, in this case, the total number of Trisoup hierarchies can be defined by (log2_trisoup_max_node_size_minus3−log2_trisoup_min_node_size_minus3+1).

When Trisoup at multiple levels is not permitted, that is, when the value of trisoup_multilevel_enabled_flag is "0", the total number of Trisoup layers is 1.
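
The total number of Trisoup hierarchies defined above can be written compactly, for example as in the following sketch. The function name `total_trisoup_layers` is hypothetical.

```python
def total_trisoup_layers(trisoup_multilevel_enabled_flag,
                         log2_trisoup_max_node_size_minus3,
                         log2_trisoup_min_node_size_minus3):
    """Total number of Trisoup hierarchies, following the two cases described above."""
    if not trisoup_multilevel_enabled_flag:
        return 1
    # The "+3" offsets of the maximum and minimum node sizes cancel out.
    return log2_trisoup_max_node_size_minus3 - log2_trisoup_min_node_size_minus3 + 1
```

For example, syntax values of 3 (maximum) and 1 (minimum) give three Trisoup hierarchies when Trisoup at multiple levels is permitted.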

When the processing of all Trisoup hierarchies has been completed, the tree synthesizing unit 2020 proceeds to step S807 and ends the processing. If the processing of all Trisoup hierarchies has not been completed, the tree synthesizing unit 2020 moves to step S803.

In step S803, the tree synthesizing unit 2020 decodes the number of unique segments in the Trisoup hierarchy. The number of unique segments is the number of sides forming a Trisoup node belonging to the Trisoup hierarchy.

In step S803, instead of decoding the number of unique segments as described above, the tree synthesizing unit 2020 may decode a flag indicating whether one or more unique segments exist in the Trisoup hierarchy.

Furthermore, when Trisoup at multiple levels is not permitted, that is, when the value of trisoup_multilevel_enabled_flag is "0", it is obvious that a unique segment exists in the Trisoup hierarchy, so the tree synthesizing unit 2020 may omit the decoding of the flag indicating whether one or more unique segments exist in the Trisoup hierarchy.

After decoding the number of unique segments, the tree synthesizing unit 2020 moves to step S804.

In step S804, the tree synthesizing unit 2020 confirms the number of unique segments belonging to the Trisoup hierarchy.

If the number of unique segments is 0, that is, if the Trisoup hierarchy does not include even one Trisoup node, the tree synthesizing unit 2020 proceeds to step S802 to proceed to the processing of the next Trisoup hierarchy.

If the number of unique segments is greater than 0, the tree synthesizing unit 2020 moves to step S805.

Note that, when the flag indicating whether one or more unique segments exist in the Trisoup hierarchy is decoded in step S803, if unique segments exist in the Trisoup hierarchy, the tree synthesizing unit 2020 additionally decodes the number of unique segments in the hierarchy and then proceeds to step S805, and if no unique segment exists in the Trisoup hierarchy, the tree synthesizing unit 2020 proceeds to step S802.

In step S805, the tree synthesizing unit 2020 decodes whether each unique segment includes a vertex used for Trisoup processing.

It should be noted that the number of vertices that can exist on each unique segment may be limited to one. In this case, the number of unique segments on which a vertex exists is equal to the number of vertices.

After decoding the presence/absence of vertices for all unique segments in the Trisoup hierarchy, the tree synthesizing unit 2020 moves to step S806.

In step S806, the tree synthesizing unit 2020 decodes position information indicating where the vertex exists on each unique segment for each unique segment determined to have a vertex in step S805.

For example, if the Trisoup node size in the Trisoup hierarchy is L (2 to the L power), the position information may be encoded with an L-bit fixed-length code.
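
Reading such an L-bit fixed-length (equal-length) code might look like the following sketch. The bitstream is modeled as a plain list of 0/1 values and the most significant bit is assumed to come first; both choices, as well as the name `read_fixed_length`, are assumptions for illustration only.

```python
def read_fixed_length(bits, pos, length):
    """Read `length` bits starting at `pos` and return (value, new position).

    bits   -- sequence of 0/1 integers standing in for the bitstream
    pos    -- index of the first bit to read
    length -- L, the Trisoup node size exponent of the current hierarchy
    """
    if pos + length > len(bits):
        raise ValueError("not enough bits left in the stream")
    value = 0
    for bit in bits[pos:pos + length]:
        value = (value << 1) | bit  # assumed MSB-first ordering
    return value, pos + length
```

With L = 3, the bits 1, 0, 1 decode to vertex position 5 on a segment of length 2 to the 3rd power, and the read position advances by three bits.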

After decoding the vertex positions for all unique segments having vertices in the Trisoup hierarchy, the tree synthesizing unit 2020 proceeds to step S802 to proceed to the processing of the next Trisoup hierarchy.

(Point group encoding device 100)
The point group encoding device 100 according to this embodiment will be described below with reference to FIG. 9. FIG. 9 is a diagram showing an example of functional blocks of the point group encoding device 100 according to this embodiment.

As shown in FIG. 9, the point group encoding device 100 includes a coordinate transformation unit 1010, a geometric information quantization unit 1020, a tree analysis unit 1030, an approximate surface analysis unit 1040, a geometric information encoding unit 1050, a geometric information reconstruction unit 1060, a color conversion unit 1070, an attribute transfer unit 1080, a RAHT unit 1090, a LoD calculation unit 1100, a lifting unit 1110, an attribute information quantization unit 1120, an attribute information encoding unit 1130, an upsampling unit 1140, and a frame buffer 1150.

The coordinate transformation unit 1010 is configured to perform transformation processing from the three-dimensional coordinate system of the input point group to any different coordinate system. Coordinate transformation may transform the x, y, z coordinates of the input point cloud to arbitrary s, t, u coordinates, for example, by rotating the input point cloud. Also, as one variation of transformation, the coordinate system of the input point cloud may be used as it is.

The geometric information quantization unit 1020 is configured to quantize the position information of the input point group after coordinate transformation and remove points with overlapping coordinates. Note that when the quantization step size is 1, the position information of the input point group matches the position information after quantization. That is, when the quantization step size is 1, it is equivalent to not performing quantization.
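
A minimal sketch of this quantization and duplicate removal is shown below. Rounding to the nearest integer is an assumption, since the rounding rule is not specified above, and `quantize_positions` is a hypothetical name.

```python
import math

def quantize_positions(points, step):
    """Quantize coordinates by `step` and drop points with overlapping coordinates."""
    seen = set()
    quantized = []
    for point in points:
        # Assumed rounding rule: round half up to the nearest integer.
        q = tuple(math.floor(c / step + 0.5) for c in point)
        if q not in seen:
            seen.add(q)
            quantized.append(q)
    return quantized
```

With a step size of 1, integer input positions are returned unchanged, which corresponds to the statement that a step size of 1 is equivalent to not performing quantization.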

The tree analysis unit 1030 is configured to receive the position information of the quantized point group as input and to generate, based on the tree structure described later, an occupancy code indicating in which nodes of the encoding target space points exist.

The tree analysis unit 1030 is configured to generate a tree structure in this process by recursively partitioning the encoding target space into rectangular parallelepipeds.

Here, a tree structure can be generated by recursively dividing each rectangular parallelepiped that contains a point into multiple rectangular parallelepipeds until the rectangular parallelepipeds reach a predetermined size. Each rectangular parallelepiped is called a node. In addition, each rectangular parallelepiped generated by dividing a node is called a child node, and the occupancy code expresses, as 0 or 1, whether or not a point is included in each child node.

As described above, the tree analysis unit 1030 is configured to generate an occupancy code while recursively dividing a node until it reaches a predetermined size.
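
The recursive division can be illustrated for the "Octree" case with the following sketch, which processes the tree one Depth at a time. The child index layout (bit 2 for x, bit 1 for y, bit 0 for z), the root node placed at the origin, and the stopping size of 1 are assumptions for illustration, and `build_occupancy_codes` is a hypothetical name.

```python
def build_occupancy_codes(points, log2_size):
    """Generate one 8-bit occupancy code per occupied node, Depth by Depth.

    points    -- iterable of integer (x, y, z) positions inside the root cube
    log2_size -- exponent of the root node size (edge length 2 ** log2_size)
    """
    codes = []
    nodes = [((0, 0, 0), list(points))]
    for size in range(log2_size, 0, -1):
        half = 1 << (size - 1)
        next_nodes = []
        for origin, node_points in nodes:
            buckets = {}
            for p in node_points:
                index = 0
                for axis in range(3):
                    # A set bit means the point lies in the upper half on this axis.
                    index = (index << 1) | int(p[axis] >= origin[axis] + half)
                buckets.setdefault(index, []).append(p)
            code = 0
            for index in buckets:
                code |= 1 << index
            codes.append(code)
            for index in sorted(buckets):
                child_origin = tuple(origin[a] + ((index >> (2 - a)) & 1) * half
                                     for a in range(3))
                next_nodes.append((child_origin, buckets[index]))
        nodes = next_nodes
    return codes
```

For the two points (0, 0, 0) and (3, 3, 3) in a root node of size 2 to the 2nd power, the sketch emits one code for the root and one code for each of the two occupied child nodes.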

In this embodiment, a method called "Octree", which recursively performs octree division with the above-mentioned rectangular parallelepiped always being a cube, and a method called "QtBt", which performs quadtree division and binary tree division in addition to octree division, can be used.

Here, whether or not to use "QtBt" is transmitted to the point cloud decoding device 200 as control data.

Alternatively, it may be specified to use Predictive coding with an arbitrary tree structure. In this case, the tree analysis unit 1030 determines the tree structure, and the determined tree structure is transmitted to the point cloud decoding device 200 as control data.

For example, tree-structured control data may be configured so that it can be decoded according to the procedure described in FIGS.

The approximate surface analysis unit 1040 is configured to use the tree information generated by the tree analysis unit 1030 to generate approximate surface information.

For example, when decoding 3D point cloud data of an object in which the points are densely distributed on the surface of the object, the approximate surface information is information that approximates the region in which the point group exists with small planes, instead of decoding individual points.
Specifically, the approximate surface analysis unit 1040 may be configured to generate approximate surface information by, for example, a technique called "Trisoup". Also, when decoding a sparse point group acquired by Lidar or the like, this process can be omitted.

The geometric information encoding unit 1050 is configured to encode syntax such as the occupancy code generated by the tree analysis unit 1030 and the approximate surface information generated by the approximate surface analysis unit 1040 to generate a bitstream (geometric information bitstream). Here, the bitstream may contain, for example, the syntax described in FIG.

The encoding process is, for example, context adaptive binary arithmetic encoding process. Here, for example, the syntax includes control data (flags and parameters) for controlling decoding processing of position information.

Based on the tree information generated by the tree analysis unit 1030 and the approximate surface information generated by the approximate surface analysis unit 1040, the geometric information reconstruction unit 1060 is configured to reconstruct the geometric information of each point of the point cloud data to be encoded (in the coordinate system assumed by the encoding process, that is, the position information after the coordinate transformation in the coordinate transformation unit 1010).

The upsampling unit 1140 is configured to receive the geometric information reconstructed by the geometric information reconstructing unit 1060 and perform the upsampling described with reference to FIGS. 5 and 6, for example.

The frame buffer 1150 is configured to receive the point cloud upsampled by the upsampling unit 1140 and store it as a reference frame.

The stored reference frames are read from the frame buffer 1150 and used as reference frames when the tree analysis unit 1030 inter-predicts temporally different frames.

Here, which reference frame (at which time) is used for each frame is determined, for example, based on the value of a cost function representing the coding efficiency, and information on the reference frame to be used is transmitted to the point group decoding device 200 as control data.

Here, an example of storing in the frame buffer 1150 after upsampling by the upsampling unit 1140 has been described, but this order may be reversed.

That is, the geometric information output from the geometric information reconstruction unit 1060 may be stored in the frame buffer 1150 as a reference frame, and then, when inter prediction is performed by the tree analysis unit 1030, the reference frame stored in the frame buffer 1150 may be upsampled by the upsampling unit 1140 and input to the tree analysis unit 1030.

As yet another example, the upsampling unit 1140 may be configured to be included inside the tree analysis unit 1030.

The color conversion unit 1070 is configured to perform color conversion when the input attribute information is color information. Color conversion does not necessarily need to be executed, and whether or not to execute color conversion processing is encoded as part of control data and transmitted to the point cloud decoding device 200.

The attribute transfer unit 1080 is configured to correct the attribute values so that distortion of the attribute information is minimized, based on the position information of the input point group, the position information of the point group after reconstruction by the geometric information reconstruction unit 1060, and the attribute information after color conversion by the color conversion unit 1070. As a specific correction method, for example, the method described in Non-Patent Document 2 can be applied.

The RAHT unit 1090 is configured to receive as input the attribute information transferred by the attribute transfer unit 1080 and the geometric information generated by the geometric information reconstruction unit 1060, and to generate residual information for each point using a type of Haar transform called RAHT (Region Adaptive Hierarchical Transform). As a specific processing of RAHT, for example, the method described in Non-Patent Document 2 can be used.

The LoD calculation unit 1100 is configured to receive the geometric information generated by the geometric information reconstruction unit 1060 and generate LoD (Level of Detail).

As a specific method for determining LoD, for example, the method described in Non-Patent Document 2 may be used.

The lifting unit 1110 is configured to generate residual information through a lifting process using the LoD generated by the LoD calculation unit 1100 and the attribute information after the attribute transfer by the attribute transfer unit 1080.

As a specific lifting process, for example, the method described in Non-Patent Document 2 may be used.

The attribute information quantization unit 1120 is configured to quantize the residual information output from the RAHT unit 1090 or the lifting unit 1110. Here, the case where the quantization step size is 1 is equivalent to the case where no quantization is performed.

The attribute information encoding unit 1130 is configured to perform encoding processing using the quantized residual information output from the attribute information quantization unit 1120 as syntax, and to generate a bitstream related to attribute information (attribute information bitstream).

The encoding process is, for example, context adaptive binary arithmetic encoding process. Here, for example, the syntax includes control data (flags and parameters) for controlling attribute information decoding processing.

Through the above processing, the point cloud encoding device 100 is configured to receive position information and attribute information of each point in the point cloud as input, perform encoding processing, and output a geometric information bitstream and an attribute information bitstream.

Also, the point cloud encoding device 100 and the point cloud decoding device 200 described above may be implemented as a program that causes a computer to execute each function (each process).

In the above-described embodiments, the present invention is applied to the point group encoding device 100 and the point group decoding device 200 as examples, but the present invention is not limited to such examples. The present invention can be similarly applied to a point cloud encoding/decoding system having the functions of the point cloud encoding device 100 and the point cloud decoding device 200.

In addition, according to this embodiment, it is possible, for example, to improve the overall service quality in video communication, which makes it possible to contribute to Goal 9 of the United Nations-led Sustainable Development Goals (SDGs), "Develop resilient infrastructure, promote sustainable industrialization, and expand innovation."

10 ... Point group processing system
100 ... Point group encoding device
1010 ... Coordinate transformation unit
1020 ... Geometric information quantization unit
1030 ... Tree analysis unit
1040 ... Approximate surface analysis unit
1050 ... Geometric information encoding unit
1060 ... Geometric information reconstruction unit
1070 ... Color conversion unit
1080 ... Attribute transfer unit
1090 ... RAHT unit
1100 ... LoD calculation unit
1110 ... Lifting unit
1120 ... Attribute information quantization unit
1130 ... Attribute information encoding unit
1140 ... Upsampling unit
1150 ... Frame buffer
200 ... Point group decoding device
2010 ... Geometric information decoding unit
2020 ... Tree synthesis unit
2030 ... Approximate surface synthesis unit
2040 ... Geometric information reconstruction unit
2050 ... Inverse coordinate transformation unit
2060 ... Attribute information decoding unit
2070 ... Inverse quantization unit
2080 ... RAHT unit
2090 ... LoD calculation unit
2100 ... Inverse lifting unit
2110 ... Inverse color conversion unit
2120 ... Upsampling unit
2130 ... Frame buffer

Claims

A point cloud decoding device,
a geometric information reconstruction unit configured to reconstruct position information of the point cloud;
an upsampling unit configured to upsample the point cloud reconstructed by the geometric information reconstruction unit;
a tree synthesizer configured to perform inter prediction with reference to the upsampled point cloud;
The point cloud decoding device, wherein the up-sampling unit is configured to calculate an offset vector from the coordinates of each point in the reconstructed point cloud, and perform up-sampling using the offset vector.
The point cloud decoding device according to claim 1, wherein the up-sampling unit is configured to calculate the offset vector as a vector that points in the same direction as a vector starting at the origin of the coordinate space and pointing to the coordinates of each point in the reconstructed point cloud, and that has a predetermined size.
The point cloud decoding device according to claim 2, wherein the predetermined size is defined as "the L2 norm is 1".
The point cloud decoding device according to any one of claims 1 to 3, wherein the up-sampling unit is configured to calculate upsampling points based on the offset vector, a parameter for controlling an upsampling interval, and a parameter for controlling the number of points generated by the upsampling.
The point cloud decoding device according to claim 4, further comprising a geometric information decoding unit configured to decode, from a bitstream, each of the parameter for controlling the upsampling interval and the parameter for controlling the number of points generated by the upsampling as control data.
A point cloud decoding device,
A point cloud decoding device, comprising: a geometric information decoding unit configured to decode a first flag that controls whether or not Trisoup at multiple levels is permitted.
The point cloud decoding device according to claim 6, wherein the geometric information decoding unit is configured to decode a maximum Trisoup node size and a minimum Trisoup node size when the value of the first flag indicates that Trisoup at the multiple levels is permitted.
The point cloud decoding device according to claim 6, further comprising a tree synthesizing unit configured to decode, for each node, a second flag indicating whether or not to apply Trisoup to the target node when the value of the first flag indicates that Trisoup at the multiple levels is permitted.
The point cloud decoding device according to claim 8, wherein, when the value of the first flag indicates that Trisoup at the multiple levels is permitted, the tree synthesizing unit is configured to determine, for each Depth, whether decoding of the second flag is necessary, and to decode the second flag for each node only for nodes belonging to a Depth for which the decoding is determined to be necessary.
The point cloud decoding device according to claim 8, wherein, when the value of the first flag indicates that Trisoup at the multiple levels is permitted and the node size at the Depth is equal to or less than the maximum Trisoup node size and greater than the minimum Trisoup node size, the tree synthesizing unit is configured to determine that the second flag needs to be decoded, and to decode the second flag for each node only for nodes belonging to that Depth.
A point cloud decoding method comprising:
a geometric information reconstruction unit configured to reconstruct position information of the point cloud;
A step A of upsampling the point cloud reconstructed by the geometric information reconstruction unit;
A step B of performing inter prediction with reference to the upsampled point cloud,
A point cloud decoding method, wherein in the step A, an offset vector is calculated from the coordinates of each point in the reconstructed point cloud, and upsampling is performed using the offset vector.
A program that causes a computer to function as a point cloud decoding device,
The point group decoding device is
a geometric information reconstruction unit configured to reconstruct position information of the point cloud;
an upsampling unit configured to upsample the point cloud reconstructed by the geometric information reconstruction unit;
a tree synthesizer configured to perform inter prediction with reference to the upsampled point cloud;
The program, wherein the up-sampling unit is configured to calculate an offset vector from the coordinates of each point in the reconstructed point group, and perform up-sampling using the offset vector.