WO2023113417A1 - Transmission device for point cloud data and method performed by the transmission device, and reception device for point cloud data and method performed by the reception device
- Publication number
- WO2023113417A1 (PCT/KR2022/020191)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- track
- point cloud
- temporal scalability
- attribute
- information
- Prior art date
Classifications
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/31 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability, in the temporal domain
- H04N19/597 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
- H04N19/70 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
- H04N21/2343 — Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/236 — Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data
- H04N21/434 — Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream
- H04N21/4402 — Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
Definitions
- the present disclosure relates to a method and apparatus for processing point cloud content.
- the point cloud content is content expressed as a point cloud, which is a set of points belonging to a coordinate system representing a 3D space.
- Point cloud content can represent three-dimensional media and is used to provide various services such as VR (virtual reality), AR (augmented reality), MR (mixed reality), and autonomous driving services. Since tens of thousands to hundreds of thousands of points are required to represent point cloud content, a method for efficiently processing this vast amount of point data is required.
- the present disclosure provides an apparatus and method for efficiently processing point cloud data.
- the present disclosure provides a point cloud data processing method and apparatus for solving latency and encoding/decoding complexity.
- the present disclosure provides apparatus and methods for supporting temporal scalability in the carriage of geometry-based point cloud compression (G-PCC) data.
- the present disclosure proposes an apparatus and methods for providing a point cloud content service that efficiently stores a G-PCC bitstream in a single track in a file or divides and stores a G-PCC bitstream in a plurality of tracks and provides signaling therefor.
- the present disclosure proposes apparatus and methods for processing a file storage scheme to support efficient access to a stored G-PCC bitstream.
- the present disclosure proposes an apparatus and method for specifying the track that may carry temporal scalability information when temporal scalability is supported.
- a method performed by a point cloud data receiving device may include obtaining temporal scalability information of a point cloud in a 3D space based on a G-PCC file, and restoring the 3D point cloud based on the temporal scalability information, wherein the temporal scalability information is obtained from one or more tracks, and the temporal scalability information of a first track may be derived from the temporal scalability information of a second track based on the component carried by the first track.
- a method performed by a point cloud data transmission device may include determining whether temporal scalability is applied to point cloud data in a 3D space, and generating a G-PCC file including temporal scalability information and the point cloud data, wherein the temporal scalability information is carried by one or more tracks, and whether temporal scalability information exists in a first track may be determined based on the component carried by the first track.
- an apparatus for receiving point cloud data may include a memory and at least one processor, wherein the at least one processor obtains temporal scalability information of a point cloud in a 3D space based on a G-PCC file and restores the 3D point cloud based on the temporal scalability information, the temporal scalability information being obtained from one or more tracks, and the temporal scalability information of a first track being derivable from the temporal scalability information of a second track based on the component carried by the first track.
- an apparatus for transmitting point cloud data may include a memory and at least one processor, wherein the at least one processor determines whether temporal scalability is applied to point cloud data in a 3D space and generates a G-PCC file including temporal scalability information and the point cloud data, the temporal scalability information being carried by one or more tracks, and whether temporal scalability information exists in a first track being determined based on the component carried by the first track.
- a computer readable medium for storing a G-PCC bitstream or file is disclosed.
- the G-PCC bitstream or file may be generated by a method performed by a device for transmitting point cloud data.
- a method of transmitting a G-PCC bitstream or file is disclosed.
- the G-PCC bitstream or file may be generated by a method performed by a device for transmitting point cloud data.
- Apparatus and method according to embodiments of the present disclosure can process point cloud data with high efficiency.
- Devices and methods according to embodiments of the present disclosure may provide a high quality point cloud service.
- Devices and methods according to embodiments of the present disclosure may provide point cloud content for providing general-purpose services such as VR services and autonomous driving services.
- Apparatus and method according to embodiments of the present disclosure may provide temporal scalability capable of effectively accessing a desired component among G-PCC components.
- the apparatus and method according to embodiments of the present disclosure may reduce signaling overhead and improve video encoding/decoding efficiency by specifying a track in which temporal scalability information may exist when temporal scalability is supported.
- the apparatus and method according to the embodiments of the present disclosure can manipulate data at a high level consistent with a network function or a decoder function by supporting temporal scalability, thereby improving the performance of a point cloud content providing system.
- Apparatus and methods according to embodiments of the present disclosure may divide and store a G-PCC bitstream in one or more tracks of a file.
- the apparatus and method according to embodiments of the present disclosure may enable smooth and progressive playback by reducing the complexity of playback.
- FIG. 1 is a block diagram illustrating an example of a point cloud content providing system according to embodiments of the present disclosure.
- FIG. 2 is a block diagram illustrating an example of a process of providing point cloud content according to embodiments of the present disclosure.
- FIG. 3 shows an example of a point cloud encoding apparatus according to embodiments of the present disclosure.
- FIG. 4 illustrates an example of a voxel according to embodiments of the present disclosure.
- FIG. 5 shows an example of an octree and occupancy code according to embodiments of the present disclosure.
- FIG. 6 shows an example of a point configuration for each LOD according to embodiments of the present disclosure.
- FIG. 7 is a block diagram illustrating an example of a point cloud decoding apparatus according to embodiments of the present disclosure.
- FIG. 8 is a block diagram illustrating another example of a point cloud decoding apparatus according to embodiments of the present disclosure.
- FIG. 9 is a block diagram illustrating another example of a transmission device according to embodiments of the present disclosure.
- FIG. 10 is a block diagram illustrating another example of a receiving device according to embodiments of the present disclosure.
- FIG. 11 illustrates an example of a structure capable of interworking with a method/apparatus for transmitting and receiving point cloud data according to embodiments of the present disclosure.
- FIG. 12 shows an example of a file including a single track according to an embodiment of the present disclosure.
- FIG. 13 shows an example of a file including multiple tracks according to an embodiment of the present disclosure.
- FIGS. 14 and 15 show examples of methods performed by a point cloud receiving device or a point cloud transmitting device according to an embodiment of the present disclosure.
- the terms first and second are used only to distinguish one element from another, and do not limit the order or importance of elements unless otherwise specified. Accordingly, within the scope of the present disclosure, a first component in one embodiment may be referred to as a second component in another embodiment, and similarly, a second component in one embodiment may be referred to as a first component in another embodiment.
- components that are distinguished from each other are intended to clearly explain each characteristic, and do not necessarily mean that the components are separated. That is, a plurality of components may be integrated to form a single hardware or software unit, or a single component may be distributed to form a plurality of hardware or software units. Accordingly, even such integrated or distributed embodiments are included in the scope of the present disclosure, even if not mentioned separately.
- components described in various embodiments do not necessarily mean essential components, and some may be optional components. Accordingly, an embodiment comprising a subset of elements described in one embodiment is also included in the scope of the present disclosure. In addition, embodiments including other components in addition to the components described in various embodiments are also included in the scope of the present disclosure.
- the present disclosure relates to encoding and decoding of point cloud-related data, and terms used in the present disclosure may have common meanings commonly used in the technical field to which the present disclosure belongs unless newly defined in the present disclosure.
- “/” and “,” may be interpreted as “and/or”.
- “A/B” and “A, B” could be interpreted as “A and/or B”.
- “A/B/C” and “A, B, C” may mean “at least one of A, B and/or C”.
- This disclosure relates to compression of point cloud related data.
- Various methods or embodiments of the present disclosure may be applied to a point cloud compression or point cloud coding (PCC) standard (ex. G-PCC or V-PCC standard) of the Moving Picture Experts Group (MPEG) or a next-generation video/image coding standard.
- a “point cloud” may mean a set of points located in a 3D space.
- “point cloud content” is content expressed as a point cloud and may mean “point cloud video/image”.
- hereinafter, “point cloud video/image” is referred to as “point cloud video”.
- a point cloud video may include one or more frames, and one frame may be a still image or a picture. Accordingly, the point cloud video may include a point cloud image/frame/picture, and may be referred to as a “point cloud image”, “point cloud frame”, or “point cloud picture”.
- point cloud data may mean data or information related to each point in a point cloud.
- Point cloud data may include geometry and/or attributes.
- point cloud data may further include meta data.
- Point cloud data may be referred to as “point cloud content data” or “point cloud video data” or the like.
- point cloud data may be referred to as "point cloud content”, “point cloud video”, “G-PCC data”, and the like.
- a point cloud object corresponding to point cloud data may be represented in a box shape based on a coordinate system, and the box shape based on the coordinate system may be referred to as a bounding box. That is, the bounding box may be a rectangular cuboid capable of containing all points of a point cloud, and may be a rectangular cuboid including a source point cloud frame.
- the geometry includes the position (or position information) of each point, where the position consists of parameters representing a point in a three-dimensional coordinate system (e.g., an x-axis value, a y-axis value, and a z-axis value). Geometry may be referred to as “geometry information”.
- the attribute includes properties of each point, and a point's properties may include one or more of texture information, color (RGB or YCbCr), reflectance (r), transparency, and the like. Attributes may be referred to as “attribute information”.
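- To make the data model above concrete, the following sketch (illustrative, not part of the disclosure) represents one frame's geometry and attributes as NumPy arrays and computes the axis-aligned bounding box described above; all names are hypothetical.

```python
import numpy as np

# Hypothetical in-memory layout of one point cloud frame:
# geometry holds (x, y, z) positions; attributes hold e.g. RGB colors.
geometry = np.array([[1.2, 0.5, 3.1],
                     [0.0, 2.4, 1.7],
                     [4.3, 1.1, 0.9]], dtype=np.float32)   # N x 3 positions
attributes = np.array([[255, 0, 0],
                       [0, 255, 0],
                       [0, 0, 255]], dtype=np.uint8)       # N x 3 RGB colors

# The bounding box is the axis-aligned cuboid containing all points.
bbox_min = geometry.min(axis=0)
bbox_max = geometry.max(axis=0)
print("bounding box:", bbox_min, bbox_max)
```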
- Meta data may include various data related to acquisition in an acquisition process described later.
- FIG. 1 shows an example of a system for providing point cloud content (hereinafter referred to as a 'point cloud content providing system') according to embodiments of the present disclosure.
- FIG. 2 shows an example of a process in which a point cloud content providing system provides point cloud content.
- the point cloud content providing system may include a transmission device 10 and a reception device 20 .
- through the operations of the transmission device 10 and the reception device 20, the point cloud content providing system may perform the acquisition process (S20), encoding process (S21), transmission process (S22), decoding process (S23), rendering process (S24), and/or feedback process (S25) illustrated in FIG. 2.
- the transmission device 10 may acquire point cloud data and output a bitstream through a series of processes (e.g., an encoding process) on the acquired point cloud data (the original point cloud data).
- the point cloud data may be output in the form of a bitstream through an encoding process.
- the transmission device 10 may transmit the output bitstream in the form of a file or streaming (streaming segment) to the reception device 20 through a digital storage medium or a network.
- Digital storage media may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, and SSD.
- the receiving device 20 may process (e.g., decode or restore) the received data (e.g., encoded point cloud data) back into the original point cloud data and render it.
- Point cloud content can be provided to the user through these processes, and the present disclosure can provide various embodiments required to effectively perform these series of processes.
- the transmission device 10 may include an acquisition unit 11, an encoding unit 12, an encapsulation processing unit 13, and a transmission unit 14, and the reception device 20 may include a receiving unit 21, a decapsulation processing unit 22, a decoding unit 23 and a rendering unit 24.
- the acquisition unit 11 may perform a process (S20) of acquiring a point cloud video through a capture, synthesis, or generation process. Accordingly, the acquisition unit 11 may be referred to as 'point cloud video acquisition'.
- Point cloud data (geometry and/or attributes, etc.) for a plurality of points may be generated by the acquisition process (S20).
- meta data related to point cloud video acquisition may be generated through the acquisition process ( S20 ).
- mesh data (eg, triangular data) representing connection information between point clouds may be generated by the acquisition process ( S20 ).
- Meta data may include initial viewing orientation metadata.
- the initial viewing orientation metadata may indicate whether the point cloud data is data showing the front or the back.
- Meta data may be referred to as "auxiliary data" which is meta data about a point cloud.
- the acquired point cloud video may include files in the Polygon File Format, also known as the Stanford Triangle Format (PLY). Since a point cloud video consists of one or more frames, the acquired point cloud video may include one or more PLY files.
- the PLY file may include point cloud data of each point.
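- As a minimal illustration of the PLY format mentioned above, the sketch below writes one frame's points and RGB colors to an ASCII PLY file. The header fields are standard PLY; the file name and sample arrays are illustrative.

```python
import numpy as np

def write_ascii_ply(path, geometry, attributes):
    """Write points (N x 3 float) and RGB colors (N x 3 uint8) to an ASCII PLY file."""
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {len(geometry)}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write("property uchar red\nproperty uchar green\nproperty uchar blue\n")
        f.write("end_header\n")
        for (x, y, z), (r, g, b) in zip(geometry, attributes):
            f.write(f"{x} {y} {z} {r} {g} {b}\n")

# One PLY file per frame (a point cloud video has one or more frames).
geometry = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]], dtype=np.float32)
colors = np.array([[255, 0, 0], [0, 255, 0]], dtype=np.uint8)
write_ascii_ply("frame0000.ply", geometry, colors)
```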
- the acquiring unit 11 may consist of a combination of camera equipment capable of obtaining depth (depth information) and an RGB camera capable of extracting color information corresponding to the depth information.
- the camera equipment capable of obtaining depth information may be a combination of an infrared pattern projector and an infrared camera.
- the acquisition unit 11 may be composed of a LiDAR, which measures the position coordinates of a reflector by measuring the time it takes for an emitted laser pulse to be reflected and return.
- the acquisition unit 11 may extract the shape of a geometry composed of points in a 3D space from the depth information, and may extract an attribute representing the color or reflectance of each point from the RGB information.
- Methods for extracting (or capturing, acquiring, etc.) a point cloud video include an inward-facing method for capturing a central object and an outward-facing method for capturing the external environment.
- the encoder 12 may perform an encoding process (S21) of encoding the data (geometry, attributes, and/or metadata, mesh data, etc.) generated by the acquisition unit 11 into one or more bitstreams. Accordingly, the encoder 12 may be referred to as a 'point cloud video encoder'. The encoder 12 may encode the data generated by the acquisition unit 11 serially or in parallel.
- the encoding process (S21) performed by the encoder 12 may be geometry-based point cloud compression (G-PCC).
- the encoder 12 may perform a series of procedures such as prediction, transformation, quantization, and entropy coding for compression and coding efficiency.
- Encoded point cloud data may be output in the form of a bitstream.
- the encoder 12 may encode the point cloud data by dividing it into geometry and attributes as will be described later.
- the output bitstream may include a geometry bitstream including encoded geometry and an attribute bitstream including encoded attributes.
- the output bitstream may further include one or more of a metadata bitstream including meta data, an auxiliary bitstream including auxiliary data, and a mesh data bitstream including mesh data.
- the encoding process (S21) will be described in more detail below.
- a bitstream containing encoded point cloud data may be referred to as a 'point cloud bitstream' or a 'point cloud video bitstream'.
- the encapsulation processor 13 may perform a process of encapsulating one or more bitstreams output from the encoder 12 in the form of a file or a segment. Accordingly, the encapsulation processor 13 may be referred to as a 'file/segment encapsulation module'.
- although the encapsulation processing unit 13 is shown as a component/module separate from the transmission unit 14, according to embodiments, the encapsulation processing unit 13 may be included in the transmission unit 14.
- the encapsulation processing unit 13 may encapsulate the data in a file format such as ISOBMFF (ISO Base Media File Format) or process it in the form of DASH segments or the like.
- the encapsulation processing unit 13 may include metadata in the file format. The metadata may be included, for example, in boxes at various levels of the ISOBMFF file format, or may be included as data in a separate track within the file.
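- For context on the ISOBMFF structure referred to above: an ISOBMFF file is a sequence of boxes, each beginning with a 4-byte big-endian size and a 4-byte type. The sketch below serializes a generic box; the example payload is illustrative and does not reproduce the exact box layout the disclosure defines for G-PCC tracks.

```python
import struct

def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize an ISOBMFF box: 4-byte big-endian size, 4-byte type, payload."""
    assert len(box_type) == 4
    size = 8 + len(payload)                 # header (8 bytes) + payload
    return struct.pack(">I", size) + box_type + payload

# Illustrative 'ftyp' box: major brand, minor version, compatible brands.
ftyp = make_box(b"ftyp", b"isom" + struct.pack(">I", 0) + b"isom")
print(ftyp.hex())
```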
- the encapsulation processing unit 13 may also encapsulate the metadata itself into a file. The metadata processed by the encapsulation processing unit 13 may be received from a metadata processing unit not shown in the drawing.
- the meta data processing unit may be included in the encoding unit 12 or may be configured as a separate component/module.
- the transmission unit 14 may perform a transmission process (S22) of applying processing (processing for transmission) according to a file format to the 'encapsulated point cloud bitstream'.
- the transmission unit 14 may transmit the bitstream or a file/segment including the bitstream to the reception unit 21 of the reception device 20 through a digital storage medium or a network. Accordingly, the transmission unit 14 may be referred to as a 'transmitter' or a 'communication module'.
- the receiving unit 21 may receive the bitstream transmitted by the transmission device 10 or a file/segment including the bitstream. Depending on the channel used for transmission, the receiving unit 21 may receive the bitstream or a file/segment including the bitstream through a broadcast network or through broadband. Alternatively, the receiving unit 21 may receive the bitstream or a file/segment including the bitstream through a digital storage medium.
- the receiver 21 may process the received bitstream or a file/segment including the corresponding bitstream according to a transmission protocol.
- the receiving unit 21 may perform a reverse process of transmission processing (processing for transmission) to correspond to the processing for transmission performed by the transmission device 10 .
- the receiving unit 21 may transfer encoded point cloud data to the decapsulation processing unit 22 and transfer meta data to the meta data parsing unit.
- Meta data may be in the form of a signaling table. According to embodiments, the reverse process of processing for transmission may be performed in the receiving processing unit.
- Each of the reception processing unit, the decapsulation processing unit 22, and the meta data parsing unit may be included in the reception unit 21 or configured as components/modules separate from the reception unit 21.
- the decapsulation processing unit 22 may decapsulate point cloud data (ie, a bitstream in the form of a file) received from the reception unit 21 or the reception processing unit. Accordingly, the decapsulation processor 22 may be referred to as a 'file/segment decapsulation module'.
- the decapsulation processing unit 22 may obtain a point cloud bitstream or a meta data bitstream by decapsulating files according to ISOBMFF or the like.
- metadata may be included in the point cloud bitstream.
- the obtained point cloud bitstream may be delivered to the decoder 23, and the obtained metadata bitstream may be delivered to the metadata processor.
- the meta data processing unit may be included in the decoding unit 23 or may be configured as a separate component/module.
- metadata acquired by the decapsulation processing unit 22 may be in the form of a box or track in the file format.
- the decapsulation processing unit 22 may receive, if necessary, metadata required for decapsulation from the metadata processing unit. The metadata may be transferred to the decoder 23 and used in the decoding process (S23), or transferred to the rendering unit 24 and used in the rendering process (S24).
- the decoder 23 may perform a decoding process (S23) of decoding the point cloud bitstream (encoded point cloud data) by receiving the bitstream and performing an operation corresponding to the operation of the encoder 12. Accordingly, the decoder 23 may be referred to as a 'point cloud video decoder'.
- the decoder 23 may decode the point cloud data by dividing it into geometry and attributes. For example, the decoder 23 may restore (decode) the geometry from a geometry bitstream included in the point cloud bitstream, and restore (decode) the attributes based on an attribute bitstream included in the point cloud bitstream and the restored geometry. A 3D point cloud video/image may be reconstructed based on the position information according to the restored geometry and the attributes (color, texture, etc.) according to the decoded attributes.
- the decoding process (S23) will be described in more detail below.
- the rendering unit 24 may perform a rendering process ( S24 ) of rendering the restored point cloud video. Accordingly, the rendering unit 24 may be referred to as a 'renderer'.
- the rendering process (S24) may refer to a process of rendering and displaying point cloud content on a 3D space.
- rendering may be performed according to a desired rendering method based on position information and attribute information of points decoded through the decoding process.
- the feedback process (S25) may include a process of transferring various feedback information that may be obtained in the rendering process (S24) or the display process to the transmitting device 10 or to other components in the receiving device 20.
- the feedback process (S25) may be performed by one or more of the components included in the receiving device 20 of FIG. 1, or may be performed by one or more of the components shown in FIGS. 7 and 8. According to embodiments, the feedback process (S25) may be performed by a 'feedback unit' or a 'sensing/tracking unit'.
- FIG. 3 shows an example of a point cloud encoding apparatus 400 according to embodiments of the present disclosure.
- the point cloud encoding device 400 of FIG. 3 may correspond to the encoder 12 of FIG. 1 in configuration and function.
- the point cloud encoding apparatus 400 may include a coordinate system conversion unit 405, a geometry quantization unit 410, an octree analysis unit 415, an approximation unit 420, a geometry encoding unit 425, a restoration unit 430, a color conversion unit 435, an attribute conversion unit 440, a RAHT conversion unit 445, an LOD generation unit 450, a lifting unit 455, an attribute quantization unit 460, and/or an attribute encoding unit 465.
- the point cloud data acquired by the acquisition unit 11 may go through processes for adjusting the quality (e.g., lossless, lossy, or near-lossless) of the point cloud content according to network conditions or applications.
- each point of the obtained point cloud content may be transmitted without loss, but in this case, real-time streaming may not be possible because the size of the point cloud content is large. Therefore, in order to smoothly provide the point cloud content, a process of reconstructing the point cloud content according to the maximum target bitrate is required.
- Processes for adjusting the quality of point cloud content may include a process of reconstructing and encoding position information (position information included in geometry information) or color information (color information included in attribute information) of points.
- a process of reconstructing and encoding position information of points may be referred to as geometry coding, and a process of reconstructing and encoding attribute information associated with each point may be referred to as attribute coding.
- Geometry coding may include a geometry quantization process, a voxelization process, an octree analysis process, an approximation process, a geometry encoding process, and/or a coordinate system conversion process. Also, geometry coding may further include a geometry restoration process. Attribute coding may include a color transformation process, an attribute transformation process, a prediction transformation process, a lifting transformation process, a RAHT transformation process, an attribute quantization process, an attribute encoding process, and the like.
- the process of converting the coordinate system may correspond to a process of converting a coordinate system for positions of points. Accordingly, the process of transforming the coordinate system may be referred to as 'transform coordinates'.
- the coordinate system conversion process may be performed by the coordinate system conversion unit 405 .
- the coordinate system conversion unit 405 converts the positions of the points from the global space coordinate system into position information of a 3-dimensional space (eg, a 3-dimensional space represented by X-axis, Y-axis, and Z-axis coordinate systems).
- Position information in a 3D space may be referred to as 'geometry information'.
- the geometry quantization process may correspond to a process of quantizing position information of points and may be performed by the geometry quantization unit 410 .
- the geometry quantization unit 410 may find the minimum (x, y, z) values among the position information of the points and subtract these minimum values from the position information of each point.
- the geometry quantization unit 410 may then perform the quantization process by multiplying the subtracted values by a preset quantization scale value and rounding the result to the nearest integer.
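- A minimal sketch of the two quantization steps just described, assuming NumPy and an illustrative quantization scale value:

```python
import numpy as np

points = np.array([[1.23, 4.56, 7.89],
                   [0.10, 2.50, 3.30]], dtype=np.float64)

# 1) Subtract the minimum (x, y, z) values from every point.
shifted = points - points.min(axis=0)

# 2) Multiply by a preset quantization scale and round to the nearest integer.
scale = 10.0                                   # illustrative value
quantized = np.round(shifted * scale).astype(np.int64)
print(quantized)
```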
- the voxelization process may correspond to a process of matching geometry information quantized through the quantization process to a specific voxel existing in a 3D space.
- a voxelization process may also be performed by the geometry quantization unit 410 .
- the geometry quantization unit 410 may perform octree-based voxelization based on position information of points in order to reconstruct each point to which the quantization process is applied.
- a voxel may mean a space for storing information of points existing in 3D, similar to a pixel, which is a minimum unit having information of a 2D image/video.
- Voxel is a compound word combining volume and pixel.
- a voxel may estimate spatial coordinates from a positional relationship with a voxel group, and may have color or reflectance information like a pixel.
- one voxel may contain more than one point; that is, information related to several points may exist in one voxel. Alternatively, the information related to the points included in one voxel may be integrated into a single point. These adjustments may be performed selectively. When a voxel is represented by a single point, the position value of the voxel's center may be set based on the position values of the points existing in the voxel, and a related attribute conversion process needs to be performed.
- for example, the attribute conversion process may set the attribute to the average color or reflectance of the points included in the voxel, or of the points adjacent to the voxel's center position within a specific radius.
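- The sketch below illustrates the merging just described: points falling into the same voxel are integrated into one point at the voxel center whose attribute is the average of the merged points' attributes. The voxel size is an illustrative assumption.

```python
import numpy as np

points = np.array([[0.1, 0.2, 0.3], [0.2, 0.1, 0.4], [1.5, 1.6, 1.7]])
colors = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.float64)

voxel_size = 1.0                               # illustrative
voxel_idx = np.floor(points / voxel_size).astype(np.int64)

# Group points by voxel and average the attributes of each group.
unique_voxels, inverse = np.unique(voxel_idx, axis=0, return_inverse=True)
merged_colors = np.zeros((len(unique_voxels), 3))
for v in range(len(unique_voxels)):
    merged_colors[v] = colors[inverse == v].mean(axis=0)

# Represent each merged point at its voxel's center.
merged_points = (unique_voxels + 0.5) * voxel_size
print(merged_points, merged_colors)
```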
- the octree analyzer 415 may use an octree to efficiently manage the region/position of a voxel.
- An example of an octree according to embodiments of the present disclosure is shown in (a) of FIG. 5 .
- as an analogy in two dimensions, a quadtree can be used as a data structure that divides an area until each leaf node becomes a pixel, efficiently managing the size and location of each region.
- the same approach may be applied to efficiently manage a 3D space by the position and size of each subspace.
- dividing the 3D space along the x-axis, y-axis, and z-axis creates 8 subspaces, and each of these subspaces can again be divided into 8 smaller subspaces.
- the octree analyzer 415 may divide regions until leaf nodes become voxels, using an octree data structure in which each node manages 8 child node regions, in order to efficiently manage the size and position of each region.
- An octree may be expressed as an occupancy code, and an example of an occupancy code according to embodiments of the present disclosure is shown in (b) of FIG. 5 .
- the octree analyzer 415 may set a node's occupancy bit to 1 if the node contains a point and to 0 if it does not, so that each node's occupancy code describes which of its 8 child nodes are occupied.
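- As a sketch of the occupancy code described above, the function below computes an 8-bit code for one octree node, setting one bit per child octant that contains a point. The bit ordering is illustrative; the normative ordering is defined by the G-PCC specification.

```python
import numpy as np

def occupancy_code(points, node_min, node_size):
    """Return the 8-bit occupancy code of one octree node (illustrative bit order)."""
    half = node_size / 2.0
    center = node_min + half
    code = 0
    for p in points:
        # Child index from the 3 comparison bits (x, y, z vs. node center).
        octant = (int(p[0] >= center[0]) << 2) | \
                 (int(p[1] >= center[1]) << 1) | \
                 int(p[2] >= center[2])
        code |= 1 << octant                    # mark that octant as occupied
    return code

pts = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]])
print(bin(occupancy_code(pts, np.zeros(3), 1.0)))  # two occupied octants
```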
- the geometry encoding process may correspond to a process of performing entropy coding on occupancy codes.
- the geometry encoding process may be performed by the geometry encoding unit 425 .
- the geometry encoding unit 425 may perform entropy coding on the occupancy code.
- the generated occupancy code may be directly encoded or may be encoded through an intra/inter coding process to increase compression efficiency.
- the receiving device 20 may reconstruct the octree through the occupancy code.
- Positions of points in a specific region may be directly transmitted, or positions of points in a specific region may be reconstructed based on voxels using a surface model.
- a mode for directly transmitting the location of each point for a specific node may be a direct mode.
- the point cloud encoding apparatus 400 may check whether conditions for enabling the direct mode are satisfied.
- the conditions for enabling direct mode may be: 1) the option to use direct mode must be enabled, 2) the particular node must not be a leaf node, 3) the number of points within the particular node must be below a threshold, and 4) the total number of points to be directly transmitted must not exceed a limit.
- the point cloud encoding apparatus 400 may directly entropy code and transmit the position value of a point of a corresponding specific node through the geometry encoding unit 425 .
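- The four direct-mode conditions above can be summarized as a single predicate, sketched below with hypothetical parameter names and values:

```python
def direct_mode_eligible(use_direct_mode: bool, is_leaf: bool,
                         points_in_node: int, point_threshold: int,
                         total_direct_points: int, direct_point_limit: int) -> bool:
    """Check the four conditions for enabling direct mode (names are illustrative)."""
    return (use_direct_mode                        # 1) option enabled
            and not is_leaf                        # 2) not a leaf node
            and points_in_node < point_threshold   # 3) few enough points in the node
            and total_direct_points <= direct_point_limit)  # 4) global limit

print(direct_mode_eligible(True, False, 2, 4, 100, 1000))  # True
```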
- a mode for reconstructing the positions of points in a specific area on a voxel basis using a surface model may be a trisoup mode.
- the trisoup mode may be performed by the approximation unit 420.
- the approximation unit 420 may determine a specific level of the octree and, from the determined level, reconstruct the positions of points in the node region on a voxel basis using a surface model.
- the point cloud encoding apparatus 400 may selectively apply the trisoup mode. Specifically, when using the trisoup mode, the point cloud encoding apparatus 400 may designate the level (specific level) at which the trisoup mode is applied. For example, if the designated level is equal to the depth (d) of the octree, the trisoup mode is not applied. That is, the designated level must be less than the depth value of the octree.
- a 3D cube area of nodes of a specified level is called a block, and one block may include one or more voxels.
- a block or voxel may correspond to a brick.
- Each block may have 12 edges, and the approximation unit 420 may check whether each edge is adjacent to an occupied voxel having a point.
- Each edge may be adjacent to several occupied voxels.
- a specific position on an edge adjacent to a voxel is called a vertex, and when several occupied voxels are adjacent to one edge, the approximation unit 420 may determine the average of the corresponding positions as the vertex.
- the point cloud encoding apparatus 400 may entropy-code, through the geometry encoding unit 425, the starting point (x, y, z) of each edge, the direction vector (Δx, Δy, Δz) of the edge, and the vertex position value (a relative position value within the edge).
- the geometry restoration process may correspond to a process of generating a restored geometry by reconstructing the octree and/or the approximated octree.
- the geometry restoration process may be performed by the restoration unit 430 .
- the restoration unit 430 may perform a geometry restoration process through triangle reconstruction, up-sampling, voxelization, and the like.
- the restoration unit 430 may reconstruct a triangle based on the starting point of the edge, the direction vector of the edge, and the position value of the vertex.
- the reconstructor 430 may perform an upsampling process to add points in the middle along the edges of the triangle and convert them into voxels.
- the restoration unit 430 may generate additional points based on the upsampling factor and the width of the block. These points can be called refined vertices.
- the restoration unit 430 may voxelize the refined vertices, and the point cloud encoding device 400 may perform attribute coding based on the voxelized position values.
- the geometry encoding unit 425 may increase compression efficiency by applying context adaptive arithmetic coding.
- the geometry encoding unit 425 may directly entropy code the occupancy code using the arithmetic code.
- the geometry encoding unit 425 may adaptively perform encoding based on the occupancy of neighboring nodes (intra-coding) or based on the occupancy code of the previous frame (inter-coding).
- a frame may mean a set of point cloud data generated at the same time. Since intra-coding and inter-coding are optional processes, they may be omitted.
- Attribute coding may correspond to a process of coding attribute information based on the restored (reconstructed) geometry and the geometry before coordinate system conversion (original geometry). Since an attribute may be dependent on geometry, the reconstructed geometry may be utilized for attribute coding.
- attributes may include color, reflectance, and the like.
- the same attribute coding method may be applied to information or parameters included in attributes. Color has three components and reflectance has one component, and each component can be processed independently.
- Attribute coding may include a color transformation process, an attribute transformation process, a prediction transformation process, a lifting transformation process, a RAHT transformation process, an attribute quantization process, an attribute encoding process, and the like.
- the prediction transformation process, the lifting transformation process, and the RAHT transformation process may be selectively used, or a combination of one or more may be used.
- the color conversion process may correspond to a process of converting a color format within an attribute into another format.
- a color conversion process may be performed by the color conversion unit 435 . That is, the color conversion unit 435 may convert colors within attributes.
- the color conversion unit 435 may perform a coding operation of converting a color within an attribute from RGB to YCbCr.
- the operation of the color conversion unit 435, i.e., the color conversion process, may be optionally applied according to the color values included in the attributes.
- when the points existing in a voxel are integrated into a single point for that voxel, the position value of the point may be set to the voxel's center. Accordingly, a process of converting the values of the attributes associated with those points may be required. The attribute conversion process may also be performed when the trisoup mode is used.
- the attribute transformation process may correspond to a process of transforming an attribute based on a position where geometry coding is not performed and/or a reconstructed geometry.
- the attribute conversion process may correspond to a process of transforming an attribute of a point at a corresponding position based on a position of a point included in a voxel.
- the attribute conversion process may be performed by the attribute conversion unit 440 .
- the attribute conversion unit 440 may calculate the average of the attribute values of the points (neighboring points) adjacent to the voxel's center position within a specific radius.
- the attribute conversion unit 440 may apply a weight according to each point's distance from the center position to its attribute value and calculate the average of the weighted attribute values.
- in this case, each voxel has a position and a calculated attribute value.
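- A sketch of the distance-weighted averaging just described, using inverse-distance weights as one plausible choice; the actual weight derivation used by the attribute conversion unit 440 may differ.

```python
import numpy as np

def weighted_attribute(center, neighbor_positions, neighbor_attrs, eps=1e-9):
    """Average neighbor attributes, weighting each by inverse distance to center."""
    dists = np.linalg.norm(neighbor_positions - center, axis=1)
    weights = 1.0 / (dists + eps)              # closer points weigh more
    weights /= weights.sum()
    return (weights[:, None] * neighbor_attrs).sum(axis=0)

center = np.array([0.5, 0.5, 0.5])
positions = np.array([[0.4, 0.5, 0.5], [1.0, 1.0, 1.0]])
attrs = np.array([[200.0, 0.0, 0.0], [0.0, 200.0, 0.0]])
print(weighted_attribute(center, positions, attrs))
```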
- the prediction conversion process may correspond to a process of predicting an attribute value of a current point based on attribute values of one or more points (neighboring points) neighboring the current point (a point corresponding to a prediction target).
- the prediction conversion process may be performed by the level of detail (LOD) generating unit 450 .
- Prediction transformation is a method to which an LOD transformation technique is applied, and the LOD generation unit 450 may calculate and set the LOD value of each point based on the LOD distance value of each point.
- the LOD generator 450 may generate a predictor for each point for predictive transformation. Accordingly, if there are N points, N predictors may be generated.
- the neighboring points may be points existing within a distance set for each LOD from the current point.
- the predictor may multiply the attribute values of the neighboring points by set weight values and use the average of the weighted attribute values as the predicted attribute value of the current point.
- An attribute quantization process may be performed on a residual attribute value obtained by subtracting the predicted attribute value of the current point from the attribute value of the current point.
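- The sketch below illustrates the predictive transform steps above: the current point's attribute is predicted as a weighted average of its neighbors' attributes, and the residual is what would then be quantized and entropy-coded. The weights are assumed to be given; G-PCC derives them from LOD distances.

```python
import numpy as np

def predict_and_residual(current_attr, neighbor_attrs, weights):
    """Predicted attribute = weighted average of neighbors; residual = actual - predicted."""
    weights = np.asarray(weights, dtype=np.float64)
    weights /= weights.sum()
    predicted = (weights[:, None] * neighbor_attrs).sum(axis=0)
    return predicted, current_attr - predicted

current = np.array([100.0, 120.0, 90.0])
neighbors = np.array([[98.0, 118.0, 92.0], [104.0, 121.0, 88.0]])
pred, resid = predict_and_residual(current, neighbors, [0.7, 0.3])
print(pred, resid)   # the residual is what gets quantized and entropy-coded
```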
- the lifting transformation process may correspond to a process of reconstructing points into a set of detail levels through an LOD generation process.
- the lifting conversion process may be performed by the lifting unit 455 .
- like the prediction transformation, the lifting transformation process may include generating a predictor for each point, setting the calculated LOD in the predictor, registering neighboring points, and setting weights according to the distances between the current point and its neighboring points.
- the RAHT transformation process may correspond to a method of predicting the attribute information of nodes at a higher level of the octree using the attribute information associated with nodes at a lower level. That is, the RAHT transformation process may correspond to an attribute-information intra-coding method through a backward scan of the octree.
- the RAHT conversion process may be performed by the RAHT conversion unit 445 .
- the RAHT conversion unit 445 scans from voxels to the entire area and may perform a RAHT conversion process up to the root node while summing (merging) the voxels into a larger block at each step. Since the RAHT conversion unit 445 performs the RAHT conversion process only for occupied nodes, in the case of an empty node that is not occupied, the RAHT conversion process may be performed for a node of an upper level immediately above it.
- the attribute quantization process may correspond to a process of quantizing attributes output from the RAHT conversion unit 445, the LOD generation unit 450, and/or the lifting unit 455.
- the attribute quantization process may be performed by the attribute quantization unit 460 .
- the attribute encoding process may correspond to a process of outputting an attribute bitstream by encoding the quantized attribute.
- the attribute encoding process may be performed by the attribute encoding unit 465 .
- FIG. 7 shows an example of a point cloud decoding apparatus 1000 according to an embodiment of the present disclosure.
- the point cloud decoding apparatus 1000 of FIG. 7 may correspond to the decoding unit 23 of FIG. 1 in configuration and function.
- the point cloud decoding apparatus 1000 may perform a decoding process based on data (bitstream) transmitted from the transmission apparatus 10 .
- the decoding process may include restoring (decoding) the point cloud video by performing an operation corresponding to the above-described encoding operation on the bitstream.
- the decoding process may include a geometry decoding process and an attribute decoding process.
- the geometry decoding process may be performed by the geometry decoding unit 1010
- the attribute decoding process may be performed by the attribute decoding unit 1020. That is, the point cloud decoding apparatus 1000 may include a geometry decoding unit 1010 and an attribute decoding unit 1020 .
- the geometry decoding unit 1010 may restore geometry from the geometry bitstream, and the attribute decoding unit 1020 may restore attributes based on the restored geometry and the attribute bitstream.
- the point cloud decoding apparatus 1000 may restore a 3D point cloud video (point cloud data) based on position information according to the restored geometry and attribute information according to the restored attribute.
- the point cloud decoding apparatus 1100 may include a geometry decoding unit 1105, an octree synthesis unit 1110, an approximation synthesis unit 1115, a geometry restoration unit 1120, a coordinate system inverse transformation unit 1125, an attribute decoding unit 1130, an attribute inverse quantization unit 1135, a RAHT transform unit 1150, an LOD generator 1140, an inverse lifting unit 1145, and/or a color inverse transform unit 1155.
- the geometry decoding unit 1105, the octree synthesis unit 1110, the approximation synthesis unit 1115, the geometry restoration unit 1120, and the coordinate system inverse transformation unit 1125 may perform geometry decoding.
- Geometry decoding may be performed in a reverse process to the geometry coding described in FIGS. 1 to 6 .
- Geometry decoding may include direct coding and trisoup geometry decoding. Direct coding and trisoup geometry decoding may be selectively applied.
- the geometry decoding unit 1105 may decode the received geometry bitstream based on Arithmetic coding.
- An operation of the geometry decoding unit 1105 may correspond to the reverse of the operation performed by the geometry encoding unit 425.
- the octree synthesizer 1110 may generate an octree by obtaining an occupancy code from a decoded geometry bitstream (or information on geometry secured as a result of decoding). An operation of the octree synthesis unit 1110 may correspond to a reverse process of an operation performed by the octree analysis unit 415 .
- the approximation synthesis unit 1115 may synthesize a surface based on the decoded geometry and/or the generated octree when trisoup geometry encoding was applied.
- the geometry restoration unit 1120 may restore geometry based on the surface and the decoded geometry.
- the geometry restoration unit 1120 may directly bring in and add the position information of points to which direct coding was applied.
- the geometry restoration unit 1120 may perform a reconstruction operation, eg, triangle reconstruction, up-sampling, voxelization, and the like, to restore the geometry.
- the reconstructed geometry may include a point cloud picture or frame that does not include attributes.
- the coordinate system inverse transformation unit 1125 may obtain the positions of points by transforming the coordinate system based on the restored geometry. For example, the coordinate system inverse transformation unit 1125 may inversely transform the positions of points from a 3-dimensional space (e.g., a space represented by X-axis, Y-axis, and Z-axis coordinates) into position information in a global space coordinate system.
- the attribute decoding unit 1130, the attribute inverse quantization unit 1135, the RAHT transform unit 1150, the LOD generator 1140, the inverse lifting unit 1145, and/or the color inverse transform unit 1155 may perform attribute decoding. Attribute decoding may include RAHT transform decoding, predictive transform decoding, and lifting transform decoding. These three decoding methods may be used selectively, or a combination of one or more of them may be used.
- the attribute decoding unit 1130 may decode the attribute bitstream based on arithmetic coding. For example, when the attribute value of the current point was directly entropy-encoded because there were no neighboring points in the point's predictor, the attribute decoding unit 1130 may decode the (unquantized) attribute value of the current point. As another example, when a quantized residual attribute value was entropy-encoded because neighboring points exist in the current point's predictor, the attribute decoding unit 1130 may decode the quantized residual attribute value.
- the attribute inverse quantization unit 1135 may inverse quantize a decoded attribute bitstream or information about an attribute obtained as a result of decoding, and output inverse quantized attributes (or attribute values). For example, when the quantized residual attribute value is output from the attribute decoding unit 1130, the attribute inverse quantization unit 1135 may inversely quantize the quantized residual attribute value and output the residual attribute value.
- the inverse quantization process may be selectively applied based on whether the point cloud encoding apparatus 400 quantized the attributes. That is, when the attribute value of the current point was directly encoded because there were no neighboring points in the point's predictor, the attribute decoding unit 1130 may output the unquantized attribute value of the current point, and the inverse quantization process may be skipped.
- the RAHT transform unit 1150, the LOD generator 1140, and/or the inverse lifting unit 1145 may process the reconstructed geometry and the inverse-quantized attributes.
- the RAHT transform unit 1150, the LOD generator 1140, and/or the inverse lifting unit 1145 may selectively perform a decoding operation corresponding to the encoding operation of the point cloud encoding apparatus 400.
- the inverse color transform unit 1155 may perform inverse transform coding to inverse transform color values (or textures) included in decoded attributes. The operation of the color inverse transform unit 1155 may be selectively performed based on whether the color transform unit 435 is operated.
- the transmission device includes a data input unit 1205, a quantization processing unit 1210, a voxelization processing unit 1215, an octree occupancy code generation unit 1220, a surface model processing unit 1225, an intra/inter coding processing unit 1230, an arithmetic coder 1235, a meta data processing unit 1240, a color conversion processing unit 1245, an attribute conversion processing unit 1250, a prediction/lifting/RAHT conversion processing unit 1255, an arithmetic coder 1260, and a transmission processing unit 1265.
- the function of the data input unit 1205 may correspond to the acquisition process performed by the acquisition unit 11 of FIG. 1. That is, the data input unit 1205 may acquire a point cloud video and generate point cloud data for a plurality of points. Geometry information (position information) in the point cloud data may be generated in the form of a geometry bitstream through the quantization processing unit 1210, the voxelization processing unit 1215, the octree occupancy code generation unit 1220, the surface model processing unit 1225, the intra/inter coding processing unit 1230, and the arithmetic coder 1235.
- Attribute information in the point cloud data may be generated in the form of an attribute bitstream through a color conversion processing unit 1245, an attribute conversion processing unit 1250, a prediction/lifting/RAHT conversion processing unit 1255, and an arithmetic coder 1260.
- the geometry bitstream, the attribute bitstream, and/or the meta data bitstream may be transmitted to the receiving device through processing by the transmission processor 1265.
- the function of the quantization processing unit 1210 may correspond to the quantization process performed by the geometry quantization unit 410 of FIG. 3 and/or the function of the coordinate system conversion unit 405.
- the function of the voxelization processing unit 1215 may correspond to the voxelization process performed by the geometry quantization unit 410 of FIG. 3, and the function of the octree occupancy code generation unit 1220 may correspond to the function performed by the octree analysis unit 415 of FIG. 3.
- the function of the surface model processing unit 1225 may correspond to the function performed by the approximation unit 420 of FIG. 3, and the functions of the intra/inter coding processing unit 1230 and the arithmetic coder 1235 may correspond to the function performed by the geometry encoding unit 425 of FIG. 3.
- a function of the meta data processor 1240 may correspond to that of the meta data processor described in FIG. 1 .
- the function of the color conversion processing unit 1245 may correspond to the function performed by the color conversion unit 435 of FIG. 3, and the function of the attribute conversion processing unit 1250 may correspond to the function performed by the attribute conversion unit 440 of FIG. 3.
- the function of the prediction/lifting/RAHT conversion processing unit 1255 may correspond to the functions performed by the RAHT transform unit 445, the LOD generation unit 450, and the lifting unit 455 of FIG. 3, and the function of the arithmetic coder 1260 may correspond to the function of the attribute encoding unit 465 of FIG. 3.
- a function of the transmission processing unit 1265 may correspond to a function performed by the transmission unit 14 and/or the encapsulation processing unit 13 of FIG. 1 .
- the receiving device includes a receiving unit 1305, a reception processing unit 1310, an arithmetic decoder 1315, a meta data parser 1335, an octree reconstruction processing unit 1320 based on an occupancy code, a surface model processing unit 1325, an inverse quantization processing unit 1330, an arithmetic decoder 1340, an inverse quantization processing unit 1345, a prediction/lifting/RAHT inverse transformation processing unit 1350, a color inverse transformation processing unit 1355, and a renderer 1360.
- the function of the receiver 1305 may correspond to the function performed by the receiver 21 of FIG. 1, and the function of the reception processor 1310 may correspond to the function performed by the decapsulation processor 22 of FIG. 1. That is, the receiving unit 1305 may receive a bitstream from the transmission processing unit 1265, and the reception processing unit 1310 may extract a geometry bitstream, an attribute bitstream, and/or a meta data bitstream through decapsulation processing.
- the geometry bitstream may be reconstructed into (restored) position values (position information) through the arithmetic decoder 1315, the octree reconstruction processing unit 1320 based on the occupancy code, the surface model processing unit 1325, and the inverse quantization processing unit 1330.
- the attribute bitstream may be generated as an attribute value reconstructed through an Arithmetic decoder 1340, an inverse quantization processor 1345, a prediction/lifting/RAHT inverse transform processor 1350, and a color inverse transform processor 1355.
- the meta data bitstream may be generated as meta data (or meta data information) restored through the meta data parser 1335 .
- a position value, attribute value, and/or meta data may be rendered by the renderer 1360 to provide a VR/AR/MR/autonomous driving experience to the user.
- the function of the arithmetic decoder 1315 may correspond to the function performed by the geometry decoding unit 1105 of FIG. 8, and the function of the octree reconstruction processing unit 1320 based on the occupancy code may correspond to the function performed by the octree synthesis unit 1110 of FIG. 8.
- the function of the surface model processing unit 1325 may correspond to the function performed by the approximation synthesis unit 1115 of FIG. 8, and the function of the inverse quantization processing unit 1330 may correspond to the function performed by the geometry restoration unit 1120 of FIG. 8.
- a function of the meta data parser 1335 may correspond to a function performed by the meta data parser described in FIG. 1 .
- the function of the arithmetic decoder 1340 may correspond to the function performed by the attribute decoding unit 1130 of FIG. 8, and the function of the inverse quantization processing unit 1345 may correspond to the function performed by the attribute inverse quantization unit 1135 of FIG. 8.
- the function of the prediction/lifting/RAHT inverse transformation processing unit 1350 may correspond to the functions performed by the RAHT transform unit 1150, the LOD generator 1140, and the inverse lifting unit 1145 of FIG. 8, and the function of the color inverse transformation processing unit 1355 may correspond to the function performed by the inverse color transform unit 1155 of FIG. 8.
- FIG. 11 illustrates an example of a structure capable of interworking with a method/apparatus for transmitting and receiving point cloud data according to embodiments of the present disclosure.
- the structure of FIG. 11 represents a configuration in which at least one of an AI server, a robot, a self-driving vehicle, an XR device, a smartphone, a home appliance, and/or an HMD is connected to a cloud network. The robot, the self-driving vehicle, the XR device, the smartphone, or the home appliance may be referred to as a device. In addition, the XR device may correspond to or interwork with a point cloud data (PCC) device according to embodiments.
- a cloud network may refer to a network that constitutes a part of a cloud computing infrastructure or exists within a cloud computing infrastructure.
- the cloud network may be configured using a 3G network, a 4G or Long Term Evolution (LTE) network, or a 5G network.
- the server may be connected to at least one or more of robots, self-driving vehicles, XR devices, smart phones, home appliances, and/or HMDs through a cloud network, and may assist at least part of processing of the connected devices.
- the HMD may represent one of the types in which an XR device and/or a PCC device according to embodiments may be implemented.
- An HMD type device may include a communication unit, a control unit, a memory unit, an I/O unit, a sensor unit, and a power supply unit.
- the XR/PCC device may be implemented as an HMD, a HUD provided in a vehicle, a television, a mobile phone, a smartphone, a computer, a wearable device, a home appliance, digital signage, a vehicle, a fixed robot, or a mobile robot, etc., to which PCC and/or XR technology is applied.
- the XR/PCC device may analyze 3D point cloud data or image data obtained through various sensors or from an external device to generate position (geometry) data and attribute data for 3D points, thereby obtaining information about the surrounding space or real objects, and may render and output an XR object to be displayed. For example, the XR/PCC device may output an XR object including additional information about a recognized object in correspondence with the recognized object.
- the XR/PCC device may be implemented as a mobile phone or the like to which PCC technology is applied.
- the mobile phone may decode and display point cloud content based on PCC technology.
- Self-driving vehicles can be implemented as mobile robots, vehicles, unmanned air vehicles, etc. by applying PCC technology and XR technology.
- An autonomous vehicle to which XR/PCC technology is applied may refer to an autonomous vehicle equipped with a means for providing an XR image or an autonomous vehicle subject to control/interaction within the XR image.
- autonomous vehicles that are controlled/interacted within an XR image are distinguished from XR devices and may be interlocked with each other.
- An autonomous vehicle equipped with a means for providing an XR/PCC image may acquire sensor information from sensors including cameras, and output an XR/PCC image generated based on the acquired sensor information.
- an autonomous vehicle may provide an XR/PCC object corresponding to a real object or an object in a screen to a passenger by outputting an XR/PCC image with a HUD.
- when the XR/PCC object is output to the HUD, at least a part of the XR/PCC object may be output to overlap the real object toward which the passenger's gaze is directed.
- when an XR/PCC object is output to a display provided inside an autonomous vehicle, at least a part of the XR/PCC object may be output to overlap an object in the screen.
- an autonomous vehicle may output XR/PCC objects corresponding to objects such as lanes, other vehicles, traffic lights, traffic signs, two-wheeled vehicles, pedestrians, and buildings.
- VR technology is a display technology that provides objects or backgrounds of the real world only as CG images.
- AR technology means a technology that shows a virtual CG image on top of a real object image.
- MR technology is similar to the aforementioned AR technology in that it mixes and combines virtual objects in the real world.
- MR technology is distinct from AR technology in that, in AR technology, the distinction between real objects and virtual objects made of CG images is clear and virtual objects are used in a form that complements real objects, whereas in MR technology virtual objects are regarded as equivalent to real objects. More specifically, a hologram service is an example to which the above-described MR technology is applied. VR, AR, and MR technologies may be integrated and collectively referred to as XR technology.
- Point cloud data may represent a volumetric encoding of a point cloud consisting of a sequence of frames (point cloud frames).
- Each point cloud frame may include a number of points, positions of points, and attributes of points. The number of points, positions of points, and attributes of points may vary from frame to frame.
- Each point cloud frame may refer to a set of 3D points specified by Cartesian coordinates (x, y, z) and zero or more attributes of the 3D points at a particular time instance.
- the Cartesian coordinate system (x, y, z) of 3D points may be a position or geometry.
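- As a rough data-structure sketch of such a frame (the class and field names are illustrative, not from the specification), geometry and per-point attributes can be modeled as follows:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class PointCloudFrame:
    """One frame: Cartesian positions (geometry) and zero or more attributes per point."""
    positions: List[Tuple[float, float, float]]                        # (x, y, z) per point
    attributes: Dict[str, List[float]] = field(default_factory=dict)   # e.g. color, reflectance

frame = PointCloudFrame(
    positions=[(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)],
    attributes={"reflectance": [0.5, 0.9]},
)
```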
- the present disclosure may further perform a spatial division process of dividing the point cloud data into one or more 3D blocks before encoding the point cloud data.
- a 3D block may mean all or a partial area of a 3D space occupied by point cloud data.
- a 3D block may mean one or more of a tile group, a tile, a slice, a coding unit (CU), a prediction unit (PU), or a transform unit (TU).
- a tile corresponding to a 3D block may mean all or a partial area of a 3D space occupied by point cloud data.
- a slice corresponding to a 3D block may mean all or a partial area of a 3D space occupied by point cloud data.
- a tile may be divided into one or more slices based on the number of points included in one tile.
- a tile may be a group of slices having bounding box information. Bounding box information of each tile may be specified in a tile inventory (or tile parameter set (TPS)).
- a tile may overlap another tile in the bounding box.
- a slice may be a unit of data in which encoding is independently performed or a unit of data in which decoding is independently performed.
- a slice can be a set of points that can be independently encoded or decoded.
- a slice may be a series of syntax elements representing part or all of a coded point cloud frame.
- Each slice may include an index for identifying a tile to which the corresponding slice belongs.
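- A minimal sketch of this tile-to-slice division (the point-count threshold and the slice structure used here are assumptions for illustration):

```python
def split_tile_into_slices(points, tile_id, max_points_per_slice):
    """Split one tile's points into slices, each carrying the index of its owning tile."""
    slices = []
    for start in range(0, len(points), max_points_per_slice):
        slices.append({
            "tile_id": tile_id,                                   # index of the tile the slice belongs to
            "points": points[start:start + max_points_per_slice], # at most max_points_per_slice points
        })
    return slices
```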
- the spatially divided 3D blocks may be independently or non-independently processed.
- spatially divided 3D blocks may be encoded or decoded independently or non-independently, and transmitted or received independently or non-independently.
- the spatially divided 3D blocks may be quantized or inverse-quantized independently or non-independently of each other, and may be transformed or inverse-transformed independently or non-independently of each other.
- the space-divided 3D blocks may be independently or non-independently rendered.
- encoding or decoding may be performed in units of slices or units of tiles.
- quantization or inverse quantization may be performed differently for each tile or slice, and may be performed differently for each transformed or inverse transformed tile or slice.
- when point cloud data is spatially divided into one or more 3D blocks and the spatially divided 3D blocks are processed independently or non-independently, the 3D blocks can be processed in real time and, at the same time, with low delay.
- random access and parallel encoding or parallel decoding on a 3D space occupied by point cloud data may be possible, and errors accumulated during encoding or decoding may be prevented.
- the signaling information may include information for decoding some point cloud data, information related to 3D space areas for supporting spatial access, and the like.
- the signaling information may include 3D bounding box information, 3D space area information, tile information, and/or tile inventory information.
- Signaling information may be stored and signaled in a sample in a track, a sample entry, a sample group, a track group, or a separate metadata track.
- the signaling information may be signaled in units of a sequence parameter set (SPS) for sequence-level signaling, a geometry parameter set (GPS) for signaling of geometry coding information, an attribute parameter set (APS) for signaling of attribute coding information, and a tile parameter set (TPS) (or tile inventory) for tile-level signaling. Also, signaling information may be signaled in units of coding units such as slices or tiles.
- the encapsulation processor mentioned in this disclosure may create a sample group by grouping one or more samples.
- the encapsulation processor, meta data processor, or signaling processor mentioned in this disclosure may signal signaling information related to a sample group to a sample, a sample group, or a sample entry. That is, sample group information associated with a sample group may be added to a sample, sample group, or sample entry.
- the sample group information may be 3D bounding box sample group information, 3D region sample group information, 3D tile sample group information, 3D tile inventory sample group information, and the like.
- the encapsulation processor mentioned in this disclosure may create a track group by grouping one or more tracks.
- the encapsulation processor, meta data processor, or signaling processor mentioned in this disclosure may signal signaling information related to a track group to a sample, track group, or sample entry. That is, track group information associated with a track group may be added to a sample, track group, or sample entry.
- the track group information may be 3D bounding box track group information, point cloud composition track group information, spatial area track group information, 3D tile track group information, 3D tile inventory track group information, and the like.
- FIG. 12 is a diagram for explaining an ISOBMFF-based file including a single track.
- FIG. 12(a) shows an example of the layout of an ISOBMFF-based file including a single track.
- FIG. 12(b) shows a sample structure of an mdat box when a G-PCC bitstream is stored in a single track of a file.
- FIG. 13 is a diagram for explaining an ISOBMFF-based file including multiple tracks.
- FIG. 13(a) shows an example of the layout of an ISOBMFF-based file including multiple tracks.
- FIG. 13(b) shows an example of the sample structure of an mdat box when a G-PCC bitstream is stored in multiple tracks of a file.
- a stsd box (SampleDescriptionBox) included in the moov box of the file may include a sample entry for a single track storing a G-PCC bitstream.
- SPS, GPS, APS, and tile inventories can be included in sample entries in the moov box or samples in the mdat box in the file.
- a geometry slice and zero or more attribute slices may be included in the sample of the mdat box in the file.
- each sample may contain multiple G-PCC components. That is, each sample may consist of one or more TLV encapsulation structures.
- a single track sample entry can be defined as follows.
- the sample entry type 'gpe1' or 'gpeg' is mandatory, and one or more sample entries may exist.
- a G-PCC track can use a VolumetricVisualSampleEntry having a sample entry type of 'gpe1' or 'gpeg'.
- the sample entry of the G-PCC track may include a G-PCC decoder configuration box (GPCCConfigurationBox), and the G-PCC decoder configuration box may include a G-PCC decoder configuration record (GPCCDecoderConfigurationRecord()).
- GPCCDecoderConfigurationRecord() may include at least one of configurationVersion, profile_idc, profile_compatibility_flags, level_idc, numOfSetupUnitArrays, SetupUnitType, completeness, numOfSetupUnits, and setupUnit.
- the setupUnit array field included in GPCCDecoderConfigurationRecord() may include TLV encapsulation structures including one SPS.
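- A parsing sketch for the record fields listed above is shown below; the field widths, ordering, and bit packing used here are assumptions for illustration, not the normative layout:

```python
def parse_gpcc_decoder_configuration_record(buf: bytes):
    configurationVersion = buf[0]
    profile_idc = buf[1]
    profile_compatibility_flags = int.from_bytes(buf[2:5], "big")  # assumed 24 bits
    level_idc = buf[5]
    numOfSetupUnitArrays = buf[6]
    pos = 7
    arrays = []
    for _ in range(numOfSetupUnitArrays):
        completeness = buf[pos] >> 7        # assumed 1-bit flag
        setupUnitType = buf[pos] & 0x3F     # assumed 6-bit TLV type
        numOfSetupUnits = buf[pos + 1]
        pos += 2
        units = []
        for _ in range(numOfSetupUnits):
            size = int.from_bytes(buf[pos:pos + 4], "big")  # assumed 32-bit unit length
            units.append(buf[pos + 4:pos + 4 + size])       # one TLV encapsulation structure
            pos += 4 + size
        arrays.append((setupUnitType, completeness, units))
    return (configurationVersion, profile_idc,
            profile_compatibility_flags, level_idc, arrays)
```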
- when the sample entry type is 'gpe1', all parameter sets such as SPS, GPS, APS, and tile inventory may be included in the array of setupUnits.
- when the sample entry type is 'gpeg', the above parameter sets may be included in the array of setupUnits (i.e., in the sample entry) or in the corresponding stream (i.e., in samples).
- An example of the syntax of a G-PCC sample entry (GPCCSampleEntry) having a sample entry type of 'gpe1' is as follows.
- a G-PCC sample entry (GPCCSampleEntry) having a sample entry type of 'gpe1' may include GPCCConfigurationBox, 3DboundingBoxInfoBox(), CubicRegionInfoBox(), and TileInventoryBox().
- 3DboundingBoxInfoBox( ) may indicate 3D bounding box information of point cloud data related to samples carried to a corresponding track.
- CubicRegionInfoBox( ) may indicate one or more pieces of spatial domain information of point cloud data carried by samples within a corresponding track.
- TileInventoryBox() may indicate 3D tile inventory information of point cloud data carried by samples within a corresponding track.
- the sample may include TLV encapsulation structures including geometry slices. Additionally, a sample may include TLV encapsulation structures that include one or more parameter sets. Additionally, a sample may contain TLV encapsulation structures containing one or more attribute slices.
- each geometry slice or attribute slice may be mapped to an individual track.
- a geometry slice may be mapped to track 1
- an attribute slice may be mapped to track 2.
- a track (track 1) carrying a geometry slice may be referred to as a geometry track or a G-PCC geometry track
- a track (track 2) carrying an attribute slice may be referred to as an attribute track or a G-PCC attribute track.
- the geometry track may be defined as a volumetric visual track carrying geometry slices
- the attribute track may be defined as a volumetric visual track carrying attribute slices.
- a track carrying a part of a G-PCC bitstream including both a geometry slice and an attribute slice may be referred to as a multiplexed track.
- each sample in a track may include at least one TLV encapsulation structure carrying data of a single G-PCC component.
- that is, each sample may not contain both geometry and attributes, and also may not contain multiple attributes.
- Multi-track encapsulation of the G-PCC bitstream can enable a G-PCC player to effectively access one of the G-PCC components.
- the following conditions need to be satisfied for a G-PCC player to effectively access one of the G-PCC components.
- a new box is added to indicate the role of the stream included in the corresponding track.
- the new box may be the aforementioned G-PCC component type box (GPCCComponentTypeBox). That is, GPCCComponentTypeBox can be included in sample entries for multiple tracks.
- a track reference is introduced from a track carrying only the G-PCC geometry bitstream to a track carrying the G-PCC attribute bitstream.
- GPCCComponentTypeBox may include GPCCComponentTypeStruct(). If GPCCComponentTypeBox is present in a sample entry of tracks carrying part or all of the G-PCC bitstream, GPCCComponentTypeStruct() may indicate the type (e.g., geometry, attribute) of one or more G-PCC components carried by each track. For example, if the value of the gpcc_type field included in GPCCComponentTypeStruct() is 2, it may indicate a geometry component, and if it is 4, it may indicate an attribute component. In addition, when the value of the gpcc_type field is 4, that is, when it indicates an attribute component, an AttrIdx field indicating the identifier of the attribute signaled in the SPS may be further included.
- the syntax of the sample entry can be defined as follows.
- the sample entry type 'gpc1' or 'gpcg' is mandatory, and one or more sample entries may be present. Multiple tracks (e.g., geometry or attribute tracks) may use a VolumetricVisualSampleEntry with a sample entry type of 'gpc1' or 'gpcg'.
- all parameter sets can be present in the setupUnit array.
- the parameter set may exist in the corresponding array or stream.
- the GPCCComponentTypeBox may not exist.
- the SPS, GPS and tile inventories may be present in the SetupUnit array of tracks carrying the G-PCC geometry bitstream. All relevant APSs may be present in the SetupUnit array of tracks carrying the G-PCC attribute bitstream.
- SPS, GPS, APS or tile inventory may exist in the corresponding array or stream.
- a GPCCComponentTypeBox may need to be present.
- the compressorname of the base class VolumetricVisualSampleEntry may indicate the name of the compressor used, with the value "\013GPCC coding" being recommended.
- in "\013GPCC coding", the first byte, \013 (octal 13, i.e., decimal 11), is the byte count of the remaining string and may indicate the number of bytes of the string "GPCC coding".
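- For example, the length-prefixed name can be built as below; padding the field to a fixed 32 bytes is an assumption carried over from the usual sample entry convention:

```python
name = b"\x0bGPCC coding"                 # 0x0b == 11 == len(b"GPCC coding")
assert name[0] == len(name) - 1 == 11
compressorname = name.ljust(32, b"\x00")  # assumed fixed-size 32-byte field
```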
- config may include G-PCC decoder configuration information.
- info may indicate G-PCC component information carried in each track. info may indicate a component tile carried in a track, and may also indicate an attribute name, index, and attribute type of a G-PCC component carried in a G-PCC attribute track.
- the syntax for the sample format is as follows.
- each sample corresponds to a single point cloud frame and may be composed of one or more TLV encapsulation structures belonging to the same presentation time.
- Each TLV encapsulation structure may contain a single type of TLV payload.
- one sample may be independent (e.g., a sync sample).
- GPCCLength indicates the length of a corresponding sample
- gpcc_unit may include an instance of a TLV encapsulation structure including a single G-PCC component (eg, a geometry slice).
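- A sketch of walking the TLV encapsulation structures that make up one sample is shown below; the 8-bit type / 32-bit length layout is an assumption taken from the G-PCC TLV format, and the value 4 denoting an attribute slice follows the payloadType description later in this document:

```python
import struct

def iter_tlv_units(sample: bytes):
    """Yield (tlv_type, payload) for each TLV encapsulation structure in a sample."""
    pos = 0
    while pos < len(sample):
        tlv_type = sample[pos]                                            # assumed 8-bit tlv_type
        num_payload_bytes = struct.unpack_from(">I", sample, pos + 1)[0]  # assumed 32-bit length
        yield tlv_type, sample[pos + 5:pos + 5 + num_payload_bytes]
        pos += 5 + num_payload_bytes
```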
- each sample may correspond to a single point cloud frame, and samples contributing to the same point cloud frame in different tracks may have the same presentation time.
- Each sample may consist of one or more G-PCC units of the G-PCC component indicated in the GPCCComponentInfoBox of the sample entry and zero or more G-PCC units carrying either a parameter set or a tile inventory. If a G-PCC unit containing a parameter set or tile inventory exists in a sample, that G-PCC unit may need to appear before the G-PCC unit of the G-PCC component.
- Each sample may include one or more G-PCC units containing an attribute data unit and zero or more G-PCC units carrying a parameter set.
- in order to enable access to each TLV encapsulation structure in a sample, subsamples may be used. That is, if one sample is composed of multiple TLV encapsulation structures, each of the multiple TLV encapsulation structures may be stored as a subsample. A subsample may be referred to as a G-PCC subsample.
- for example, when a sample contains a parameter set TLV encapsulation structure containing parameter sets, a geometry TLV encapsulation structure containing geometry slices, and an attribute TLV encapsulation structure containing attribute slices, the parameter set TLV encapsulation structure, the geometry TLV encapsulation structure, and the attribute TLV encapsulation structure may each be stored as a subsample.
- in this case, in order to identify each subsample, the type of the TLV encapsulation structure carried in the corresponding subsample may be required.
- the G-PCC subsample may contain only one TLV encapsulation structure.
- One SubSampleInformationBox may exist in the sample table box (SampleTableBox, stbl) of the moov box, or may exist in the track fragment box (TrackFragmentBox, traf) of each movie fragment box (MovieFragmentBox, moof). If SubSampleInformationBox exists, an 8-bit type value of a TLV encapsulation structure may be included in a 32-bit codec_specific_parameters field of a subsample entry in SubSampleInformationBox.
- in addition, when the TLV encapsulation structure in the subsample contains an attribute payload, a 6-bit value of the attribute index may be included in the 32-bit codec_specific_parameters field of the subsample entry in SubSampleInformationBox.
- the type of each subsample may be identified by parsing the codec_specific_parameters field of the subsample entry in SubSampleInformationBox.
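- A sketch of that parsing step for a subsample entry with flags equal to 0; the big-endian bit packing of the 32-bit field is an assumption for illustration:

```python
def parse_codec_specific_parameters(csp: int) -> dict:
    payload_type = (csp >> 24) & 0xFF         # 8-bit tlv_type of the subsample
    info = {"payloadType": payload_type}
    if payload_type == 4:                     # attribute payload
        info["attrIdx"] = (csp >> 18) & 0x3F  # 6-bit attribute index
    return info
```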
- codec_specific_parameters of SubSampleInformationBox can be defined as follows.

if (flags == 0) {
    unsigned int(8) payloadType;
    if (payloadType == 4) {    // attribute payload
        unsigned int(6) attrIdx;
        bit(18) reserved = 0;
    }
    else
        bit(24) reserved = 0;
}
else if (flags == 1) {
    bit(1) tile_data;
    bit(7) reserved = 0;
    if (tile_data)
        unsigned int(24) tile_id;
    else
        bit(24) reserved = 0;
}
- payloadType may indicate the tlv_type of the TLV encapsulation structure in the corresponding subsample. For example, if the value of payloadType is 4, an attribute slice may be indicated.
- attrIdx may indicate an identifier of attribute information of a TLV encapsulation structure including an attribute payload in a corresponding subsample.
- attrIdx may be the same as ash_attr_sps_attr_idx of the TLV encapsulation structure including the attribute payload in the corresponding subsample.
- tile_data may indicate whether a subsample contains data belonging to one G-PCC tile or not.
- if the value of tile_data is 1, it may indicate that the subsample includes the TLV encapsulation structure(s) containing a geometry data unit or attribute data unit corresponding to one G-PCC tile. If the value of tile_data is 0, it may indicate that the subsample includes the TLV encapsulation structure(s) containing each parameter set, tile inventory, or frame boundary marker. tile_id may indicate the index of the G-PCC tile to which the subsample is associated within the tile inventory.
- a track reference tool may be used.
- One TrackReferenceTypeBox can be added to the TrackReferenceBox in the TrackBox of the G-PCC track.
- TrackReferenceTypeBox may contain an array of track_IDs specifying the tracks referenced by the G-PCC track.
- the present disclosure relates to the carriage of G-PCC data (hereinafter, which may be referred to as a G-PCC bitstream, an encapsulated G-PCC bitstream, or a G-PCC file).
- Devices and methods for supporting temporal scalability may be provided.
- the present disclosure may propose apparatuses and methods for providing a point cloud content service that efficiently stores a G-PCC bitstream in a single track of a file, or divides and stores it over a plurality of tracks, and that provides signaling therefor.
- the present disclosure proposes apparatus and methods for processing a file storage scheme to support efficient access to a stored G-PCC bitstream.
- Temporal scalability can refer to functionality that allows for the possibility of extracting one or more subsets of independently coded frames.
- temporal scalability may refer to a function of dividing G-PCC data into a plurality of different temporal levels and independently processing each G-PCC frame belonging to the different temporal levels. If temporal scalability is supported, the G-PCC player (or the transmission device and/or the reception device of the present disclosure) can effectively access a desired component (target component) among G-PCC components.
- temporal scalability support can be expressed as more flexible temporal sub-layering at the system level.
- temporal scalability allows a system that processes G-PCC data (a point cloud content provision system) to manipulate data at a high level to match network capabilities, decoder capabilities, etc., so that the performance of the point cloud content providing system can be improved.
- G-PCC content can be carried in a plurality of tile tracks, and information about temporal scalability can be signaled.
- the temporal scalability information may be carried using a box present in a track or tile base track and a box present in a tile track, that is, a box for temporal scalability information (hereinafter, a 'temporal scalability information box' or a 'scalability information box').
- a box present in a GPCC track or a tile base track carrying temporal scalability information may be GPCCScalabilityInfoBox
- a box present in a tile track may be GPCCTileScalabilityInfoBox.
- GPCCTileScalabilityInfoBox may exist in each tile track associated with a tile base track in which GPCCScalabilityInfoBox exists.
- the tile base track may be a track having a sample entry type of 'gpeb' or 'gpcb'.
- when the GPCCScalabilityInfoBox exists in a track having a sample entry of type 'gpe1', 'gpeg', 'gpc1', 'gpcg', 'gpcb', or 'gpeb', it may indicate that temporal scalability is supported, and it can provide information about the temporal levels present in the track. Such a box may not be present in a track if temporal scalability is not used.
- a GPCC track including a box for temporal scalability information (ie, a temporal scalability information box) in a sample entry may be expressed as a temporal level track.
- when G-PCC data is carried in multiple tracks, each track may carry one type of component of the G-PCC data (e.g., geometry or attribute). If information on temporal scalability (e.g., GPCCScalabilityInfoBox) is signaled in each of these tracks, the same scalability information may be signaled redundantly, which may be inefficient.
- accordingly, the present disclosure proposes a technique for specifying the track in which temporal scalability information, that is, a temporal scalability information box, exists, and for processing the temporal scalability information when the corresponding box does not exist.
- GPCCSampleEntry() extends VolumetricVisualSampleEntry(type) { // type is 'gpc1' or 'gpcg'
    GPCCConfigurationBox config;
    GPCCComponentInfoBox info;
    GPCCScalabilityInfoBox temporalLevelInfo;
    // optional boxes
}

G-PCC scalability information box
Definition
Box Types: 'gsci'
Container: GPCCSampleEntry('gpe1', 'gpeg', 'gpc1', 'gpcg', 'gpcb', 'gpeb')
Mandatory: No
Quantity: Zero or one

This box signals scalability information for a G-PCC track.
- when this box is present in tracks with sample entries of type 'gpe1', 'gpeg', 'gpc1', 'gpcg', 'gpcb', and 'gpeb', it indicates that temporal scalability is supported and provides information about the temporal levels present in that G-PCC track.
- This box shall not be present in a track when temporal scalability is not used.
- This box shall not be present in tracks with sample entries of type 'gpe1', 'gpeg', 'gpc1', 'gpcg', 'gpcb', and 'gpeb', when all the frames are signaled in one temporal level.
- This box shall not be present in tracks with a sample entry of type 'gpt1'.
- a G-PCC track containing a GPCCScalabilityInfoBox in the sample entry is referred to as a temporal level track.
- for a track with sample entry type 'gpc1' or 'gpcg', GPCCScalabilityInfoBox may be present only if the track carries the geometry component.
- for a track with sample entry type 'gpc1' or 'gpcg' that carries an attribute component, GPCCScalabilityInfoBox is not present, but it is inferred from the GPCCScalabilityInfoBox in the sample entry of the corresponding track that carries the geometry component.
- that is, the scalability information (i.e., temporal scalability information) may exist only in tracks carrying specific components.
- a specific sample entry type may be 'gpc1' or 'gpcg'
- a specific component may be a geometry component.
- the temporal scalability information may be carried by a specific temporal scalability information box (eg, GPCCScalabilityInfoBox), and when one track carries another component, the temporal scalability information box does not exist in the corresponding track.
- the other component may include an attribute component (attribute component) as an example.
- one track may have a specific sample entry type, and the specific sample entry type may be as described above.
- instead, temporal scalability information in a track carrying another component (e.g., an attribute component) may be derived from another track. For example, a temporal scalability information box in a track carrying a first component (e.g., an attribute component) may be derived from the temporal scalability information box in a track carrying a second component (e.g., a geometry component).
- the sample entry types (eg, 'gpc1' and 'gpcg') of the two tracks may be the same.
- the track carrying the second component (e.g., the geometry component), from which the temporal scalability information of the track carrying the first component (e.g., the attribute component) is derived, may be a track having the same sample entry type as the corresponding track.
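- A sketch of this inference rule is shown below; the track object and its fields (component, sample_entry_type, scalability_info) are hypothetical names for illustration:

```python
def resolve_scalability_info(track, all_tracks):
    if track.scalability_info is not None:  # box present in the sample entry
        return track.scalability_info
    if track.sample_entry_type in ("gpc1", "gpcg") and track.component == "attribute":
        # Infer from the geometry track having the same sample entry type.
        geometry = next((t for t in all_tracks
                         if t.component == "geometry"
                         and t.sample_entry_type == track.sample_entry_type), None)
        return geometry.scalability_info if geometry else None
    return None                             # temporal scalability not used for this track
```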
- a temporal scalability information box may exist after a component information box (e.g., GPCCComponentInfoBox).
- as described above, since a temporal scalability information box exists only in a track conveying a specific component (e.g., the geometry component) and the temporal scalability information of a track conveying an attribute component is derived from it, box signaling overhead can be reduced and image encoding/decoding efficiency can be improved.
- This box signals scalability information for a G-PCC track.
- when this box is present in tracks with sample entries of type 'gpe1', 'gpeg', 'gpc1', 'gpcg', 'gpcb', and 'gpeb', it indicates that temporal scalability is supported and provides information about the temporal levels present in that G-PCC track.
- This box shall not be present in a track when temporal scalability is not used.
- This box shall not be present in tracks with sample entries of type 'gpe1', 'gpeg', 'gpc1', 'gpcg', 'gpcb', and 'gpeb', when all the frames are signaled in one temporal level.
- This box shall not be present in tracks with a sample entry of type 'gpt1'.
- a G-PCC track containing a GPCCScalabilityInfoBox in the sample entry is referred to as a temporal level track.
- when temporal scalability is used and G-PCC data is stored/carried in tracks, if a track has a specific sample entry type (e.g., 'gpc1' or 'gpcg'), the values of all syntax elements included in the temporal scalability information of a track carrying a specific component (e.g., an attribute component) may be the same as the corresponding syntax elements included in the temporal scalability information of a track carrying another component (e.g., the geometry component). For example, some or all of the temporal scalability information included in the temporal scalability information boxes of the two tracks may be the same.
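- A sketch of checking this constraint, with each box modeled as a mapping from syntax element name to value (a simplification for illustration):

```python
def scalability_boxes_consistent(attr_box: dict, geom_box: dict) -> bool:
    # Every syntax element of the attribute track's box must equal its
    # counterpart in the geometry track's box.
    return all(geom_box.get(name) == value for name, value in attr_box.items())
```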
- through this, the temporal scalability information of a track carrying a specific component may not be separately signaled, or the corresponding track (e.g., a track carrying an attribute component) may be configured to carry the same temporal scalability information as another track (e.g., a track carrying a geometry component).
- in this way, the signaling overhead of the temporal scalability information box of a track conveying an attribute component can be reduced, contradictions between temporal scalability information can be prevented by equalizing it with the temporal scalability information of tracks conveying other components, and image encoding/decoding efficiency can be improved.
- FIG. 14 is an example of a method performed by an apparatus for receiving point cloud data
- FIG. 15 is an example of a method performed by an apparatus for transmitting point cloud data.
- the receiving device or the transmitting device may include those described with reference to the drawings in this disclosure, and may be the same as the receiving device or transmitting device assumed to describe the embodiment above. That is, it is obvious that the receiving device performing FIG. 14 and the transmitting device performing FIG. 15 may implement other embodiments described above.
- the receiving device may obtain temporal scalability information of a point cloud in a 3D space based on a G-PCC file (S1401).
- the G-PCC file may be obtained by being transmitted from a transmission device.
- the receiving device may restore the 3D point cloud based on the temporal scalability information (S1402). The temporal scalability information is obtained from one or more tracks, and the one or more tracks may include a first track through an n-th track.
- the temporal scalability information of the first track may be derived from the temporal scalability information of the second track based on a component carried by the first track.
- the temporal scalability information can be derived from the temporal scalability information of another track based on the component carried by the track.
- when the first track is a track carrying a first component (e.g., an attribute component), the temporal scalability information of the first track may be derived from the temporal scalability information of the second track.
- the sample entry type of the first track may be any one of a specific type (eg, 'gpc1' or 'gpcg').
- the second track may be a track carrying a second component (eg, a geometric component).
- the temporal scalability information of one track can be derived (or inferred) from the temporal scalability information of tracks carrying other components in addition to the components carried by one track.
- the sample entry type of the second track may be, for example, 'gpc1' or 'gpcg'.
- temporal scalability information can be obtained only from tracks carrying specific components (eg, geometric components). That is, temporal scalability information can be obtained based on components carried by tracks. As an example, temporal scalability information may not exist or may not be obtained in a track carrying components other than a specific component.
- the temporal scalability information may be obtained from all tracks whose sample entry type is not a specific type (e.g., 'gpt1'). That is, temporal scalability information may be obtained based on the sample entry type of a track. In other words, temporal scalability information may be obtained or derived based on the sample entry type and/or the component the track carries. Meanwhile, the temporal scalability information may be obtained by being included in a scalability information box (e.g., GPCCScalabilityInfoBox). In addition, it may be signaled prior to a component information box (e.g., GPCCComponentInfoBox).
- the transmission device may determine whether temporal scalability is applied to point cloud data in a 3D space (S1501), and may generate a G-PCC file including temporal scalability information and the point cloud data (S1502).
- the temporal scalability information is carried by one or more tracks, and as described above, the one or more tracks may include a first track through an n-th track. Whether temporal scalability information exists in the first track, that is, whether it is present, may be determined based on a component carried by the first track.
- whether temporal scalability information of a track exists may be determined based on the component carried by the track. If one track, the first track, carries a specific component (e.g., an attribute component), the temporal scalability information of the first track may be omitted. That is, it may not be signaled.
- in this case, the sample entry type of the track may be a specific type (e.g., 'gpc1' or 'gpcg'). That is, whether temporal scalability information of a track exists may be determined based on the sample entry type of the track. In other words, whether the temporal scalability information of the track exists may be determined based on at least one of the sample entry type of the track and/or the component carried by the track.
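- A transmitter-side sketch of this presence decision (S1501/S1502); the behavior assumed for sample entry types other than 'gpc1', 'gpcg', and 'gpt1' is an assumption for illustration:

```python
def scalability_box_present(temporal_scalability_used: bool,
                            sample_entry_type: str,
                            component: str) -> bool:
    if not temporal_scalability_used or sample_entry_type == "gpt1":
        return False                      # box shall not be present
    if sample_entry_type in ("gpc1", "gpcg"):
        return component == "geometry"    # omitted for attribute tracks
    return True
```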
- in this case, the transmitting device may determine that the temporal scalability information of the track is not signaled, and the receiving device may derive (infer) it based on other temporal scalability information, as described above.
- on the other hand, if one track carries a specific component (e.g., a geometry component), the temporal scalability information of that track may be included in the G-PCC file. That is, it may be explicitly signaled.
- the scope of the present disclosure includes software or machine-executable instructions (e.g., operating systems, applications, firmware, programs, etc.) that cause operations according to the methods of the various embodiments to be executed on a device or computer, and a non-transitory computer-readable medium in which such software or instructions are stored and executable on a device or computer.
- Embodiments according to this disclosure may be used to provide point cloud content. Also, embodiments according to the present disclosure may be used to encode/decode point cloud data.
Description
...

aligned(8) class GPCCSampleEntry() extends VolumetricVisualSampleEntry(type) { // type is 'gpc1' or 'gpcg'
    GPCCConfigurationBox config;
    GPCCComponentInfoBox info;
    GPCCScalabilityInfoBox temporalLevelInfo;
    // optional boxes
}

G-PCC scalability information box
Definition
Box Types: 'gsci'
Container: GPCCSampleEntry('gpe1', 'gpeg', 'gpc1', 'gpcg', 'gpcb', 'gpeb')
Mandatory: No
Quantity: Zero or one

This box signals scalability information for a G-PCC track. When this box is present in tracks with sample entries of type 'gpe1', 'gpeg', 'gpc1', 'gpcg', 'gpcb', and 'gpeb', it indicates that temporal scalability is supported and provides information about the temporal levels present in that G-PCC track. This box shall not be present in a track when temporal scalability is not used. This box shall not be present in tracks with sample entries of type 'gpe1', 'gpeg', 'gpc1', 'gpcg', 'gpcb', and 'gpeb', when all the frames are signalled in one temporal level. This box shall not be present in tracks with a sample entry of type 'gpt1'. A G-PCC track containing GPCCScalabilityInfoBox in the sample entry is referred to as a temporal level track. For a track with sample entry type 'gpc1' or 'gpcg', GPCCScalabilityInfoBox may be present only if the track carries the geometry component. For a track with sample entry type 'gpc1' or 'gpcg' that carries an attribute component, GPCCScalabilityInfoBox is not present but is inferred from the GPCCScalabilityInfoBox in the sample entry of the corresponding track that carries the geometry component.
...

G-PCC scalability information box
Definition
Box Types: 'gsci'
Container: GPCCSampleEntry('gpe1', 'gpeg', 'gpc1', 'gpcg', 'gpcb', 'gpeb')
Mandatory: No
Quantity: Zero or one

This box signals scalability information for a G-PCC track. When this box is present in tracks with sample entries of type 'gpe1', 'gpeg', 'gpc1', 'gpcg', 'gpcb', and 'gpeb', it indicates that temporal scalability is supported and provides information about the temporal levels present in that G-PCC track. This box shall not be present in a track when temporal scalability is not used. This box shall not be present in tracks with sample entries of type 'gpe1', 'gpeg', 'gpc1', 'gpcg', 'gpcb', and 'gpeb', when all the frames are signalled in one temporal level. This box shall not be present in tracks with a sample entry of type 'gpt1'. A G-PCC track containing GPCCScalabilityInfoBox in the sample entry is referred to as a temporal level track. When this box is present in tracks with sample entries of type 'gpc1' or 'gpcg' that contain an attribute component, it is constrained that the values of syntax elements in the box shall be the same as their corresponding syntax elements in the GPCCScalabilityInfoBox of the associated track that contains the geometry component. ...
Claims (14)
- A method performed by an apparatus for receiving point cloud data, the method comprising: obtaining temporal scalability information of a point cloud in a 3D space based on a G-PCC file; and reconstructing the 3D point cloud based on the temporal scalability information, wherein the temporal scalability information is obtained from one or more tracks, and temporal scalability information of a first track is derived from temporal scalability information of a second track based on a component carried by the first track.
- The method of claim 1, wherein, when the first track is a track carrying an attribute component, the temporal scalability information of the first track is derived from the temporal scalability information of the second track.
- The method of claim 1, wherein a sample entry type of the first track is either 'gpc1' or 'gpcg'.
- The method of claim 1, wherein the second track is a track carrying a geometry component.
- The method of claim 4, wherein a sample entry type of the second track is either 'gpc1' or 'gpcg'.
- The method of claim 1, wherein the temporal scalability information is obtained only from a track carrying a geometry component.
- The method of claim 1, wherein the temporal scalability information is obtained by being included in a scalability information box (GPCCScalabilityInfoBox).
- The method of claim 1, wherein the temporal scalability information is obtained from a track whose sample entry type is other than 'gpt1'.
- A method performed by an apparatus for transmitting point cloud data, the method comprising: determining whether temporal scalability is applied to point cloud data in a 3D space; and generating a G-PCC file including temporal scalability information and the point cloud data, wherein the temporal scalability information is carried by one or more tracks, and whether temporal scalability information is present in a first track is determined based on a component carried by the first track.
- The method of claim 9, wherein, when the first track carries an attribute component, the temporal scalability information of the first track is omitted.
- The method of claim 10, wherein a sample entry type of the first track is either 'gpc1' or 'gpcg'.
- The method of claim 9, wherein, when the first track carries a geometry component, the temporal scalability information of the first track is included in the G-PCC file.
- An apparatus for receiving point cloud data, the apparatus comprising: a memory; and at least one processor, wherein the at least one processor is configured to: obtain temporal scalability information of a point cloud in a 3D space based on a G-PCC file, and reconstruct the 3D point cloud based on the temporal scalability information, wherein the temporal scalability information is obtained from one or more tracks, and temporal scalability information of a first track is derived from temporal scalability information of a second track based on a component carried by the first track.
- An apparatus for transmitting point cloud data, the apparatus comprising: a memory; and at least one processor, wherein the at least one processor is configured to: determine whether temporal scalability is applied to point cloud data in a 3D space, and generate a G-PCC file including temporal scalability information and the point cloud data, wherein the temporal scalability information is carried by one or more tracks, and whether temporal scalability information is present in a first track is determined based on a component carried by the first track.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020247023520A KR20240125608A (ko) | 2021-12-14 | 2022-12-13 | 포인트 클라우드 데이터의 전송 장치와 이 전송 장치에서 수행되는 방법 및, 포인트 클라우드 데이터의 수신 장치와 이 수신 장치에서 수행되는 방법 |
CN202280081440.7A CN118355667A (zh) | 2021-12-14 | 2022-12-13 | 点云数据的发送装置和由发送装置执行的方法及点云数据的接收装置和由接收装置执行的方法 |
EP22907884.5A EP4451690A1 (en) | 2021-12-14 | 2022-12-13 | Transmission device for point cloud data and method performed by transmission device, and reception device for point cloud data and method performed by reception device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163289615P | 2021-12-14 | 2021-12-14 | |
US63/289,615 | 2021-12-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023113417A1 true WO2023113417A1 (ko) | 2023-06-22 |
Family
ID=86773061
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2022/020191 WO2023113417A1 (ko) | 2021-12-14 | 2022-12-13 | 포인트 클라우드 데이터의 전송 장치와 이 전송 장치에서 수행되는 방법 및, 포인트 클라우드 데이터의 수신 장치와 이 수신 장치에서 수행되는 방법 |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP4451690A1 (ko) |
KR (1) | KR20240125608A (ko) |
CN (1) | CN118355667A (ko) |
WO (1) | WO2023113417A1 (ko) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200045323A1 (en) * | 2017-03-27 | 2020-02-06 | Nokia Technologies Oy | An Apparatus, A Method And A Computer Program For Video Coding And Decoding |
KR20210004885A (ko) * | 2019-07-03 | 2021-01-13 | 엘지전자 주식회사 | 포인트 클라우드 데이터 송신 장치, 포인트 클라우드 데이터 송신 방법, 포인트 클라우드 데이터 수신 장치 및 포인트 클라우드 데이터 수신 방법 |
KR20210005524A (ko) * | 2019-07-04 | 2021-01-14 | 엘지전자 주식회사 | 포인트 클라우드 데이터 송신 장치, 포인트 클라우드 데이터 송신 방법, 포인트 클라우드 데이터 수신 장치 및 포인트 클라우드 데이터 수신 방법 |
WO2021123159A1 (en) * | 2019-12-20 | 2021-06-24 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Video data stream, video encoder, apparatus and methods for hrd timing fixes, and further additions for scalable and mergeable bitstreams |
US20210360268A1 (en) * | 2019-02-15 | 2021-11-18 | Virginie Drugeon | Encoder |
Also Published As
Publication number | Publication date |
---|---|
EP4451690A1 (en) | 2024-10-23 |
KR20240125608A (ko) | 2024-08-19 |
CN118355667A (zh) | 2024-07-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22907884 Country of ref document: EP Kind code of ref document: A1 |
WWE | Wipo information: entry into national phase |
Ref document number: 18716035 Country of ref document: US |
ENP | Entry into the national phase |
Ref document number: 20247023520 Country of ref document: KR Kind code of ref document: A |
WWE | Wipo information: entry into national phase |
Ref document number: 2022907884 Country of ref document: EP |
NENP | Non-entry into the national phase |
Ref country code: DE |
ENP | Entry into the national phase |
Ref document number: 2022907884 Country of ref document: EP Effective date: 20240715 |