WO2023202095A1

WO2023202095A1 - Point cloud media encoding method and apparatus, point cloud media decoding method and apparatus, and electronic device and storage medium

Info

Publication number: WO2023202095A1
Application number: PCT/CN2022/137764
Authority: WO
Inventors: 胡颖
Original assignee: 腾讯科技（深圳）有限公司
Priority date: 2022-04-22
Filing date: 2022-12-09
Publication date: 2023-10-26
Also published as: CN114697668A; CN116744007A; CN114697668B

Abstract

A point cloud media decoding method, which is executed by an electronic device. The method comprises: acquiring a point cloud media file, wherein the point cloud media file comprises point cloud samples, which are encapsulated in one or more tracks (S610); parsing a media file data box of each sub-sample in the point cloud samples, so as to obtain a value of a sub-sample flag bit field (S620); according to the value of the sub-sample flag bit field, acquiring index information of one or more point cloud subframes corresponding to data units in the sub-sample, wherein when one data unit in the sub-sample corresponds to the index information of at least two point cloud sub-frames, the at least two point cloud subframes have overlapped point cloud data (S630); and decapsulating and decoding the point cloud media file according to the index information of the one or more point cloud subframes, so as to obtain point cloud data (S640).

Description

Point cloud media encoding and decoding methods, devices, electronic equipment and storage media

This application claims priority to Chinese patent application No. 2022104281526, entitled "Encoding and decoding methods of point cloud media and related products", filed with the China Patent Office on April 22, 2022, the entire content of which is incorporated herein by reference.

Technical field

This application belongs to the field of audio and video technology, and specifically relates to a point cloud media encoding method, a point cloud media decoding method, a point cloud media encoding device, a point cloud media decoding device, a computer-readable medium, an electronic device and a computer program product.

Background technique

A point cloud is a set of discrete points randomly distributed in space that expresses the spatial structure and surface properties of a three-dimensional object or scene. After large-scale point cloud data is obtained through point cloud acquisition equipment, the point cloud data can be encoded and encapsulated for transmission, decoding and presentation to users. Some point cloud frames in point cloud media carry little content, and some point cloud frames have overlapping point cloud content. Encoding and decoding each point cloud frame separately therefore wastes computing resources and reduces the processing efficiency of point cloud media.

Contents of the invention

According to various embodiments of the present application, a point cloud media encoding method, a point cloud media decoding method, a point cloud media encoding device, a point cloud media decoding device, a computer-readable medium, an electronic device, and a computer program product are provided.

Additional features and advantages of the invention will be apparent from the detailed description which follows, or, in part, may be learned by practice of the invention.

According to one aspect of the embodiment of the present application, a method for decoding point cloud media is provided, which is executed by an electronic device, including:

Obtaining a point cloud media file, the point cloud media file including point cloud samples encapsulated in one or more tracks;

Parsing the media file data box of each subsample in the point cloud samples to obtain the value of the subsample flag field;

Obtaining, according to the value of the subsample flag field, the index information of one or more point cloud subframes corresponding to each data unit in the subsample; when one data unit in the subsample corresponds to the index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data; and

Decapsulating and decoding the point cloud media file according to the index information of the one or more point cloud subframes to obtain point cloud data.
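The relationship between data units and subframe indices described above can be illustrated with a minimal sketch. It assumes the subframe indices of each data unit have already been parsed from the media file data box; the `DataUnit` class and both helper functions are hypothetical names introduced only for illustration and are not defined by this application.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataUnit:
    """One coded data unit plus the subframe indices signalled for it (hypothetical layout)."""
    payload: bytes
    subframe_indices: list[int] = field(default_factory=list)


def overlapping_subframes(units: list[DataUnit]) -> set[frozenset[int]]:
    """Return each group of subframes that share at least one data unit.

    A data unit that maps to two or more subframe indices carries point
    cloud data common to those subframes.
    """
    return {
        frozenset(u.subframe_indices)
        for u in units
        if len(set(u.subframe_indices)) >= 2
    }


def units_for_subframe(units: list[DataUnit], index: int) -> list[DataUnit]:
    """Select the data units needed to reconstruct one subframe."""
    return [u for u in units if index in u.subframe_indices]


units = [
    DataUnit(b"geom-a", [0]),
    DataUnit(b"geom-b", [0, 1]),  # shared by subframes 0 and 1
    DataUnit(b"geom-c", [1]),
]
shared = overlapping_subframes(units)
needed_for_1 = [u.payload for u in units_for_subframe(units, 1)]
```

In this sketch, a player that only needs subframe 1 would decapsulate the second and third data units and could skip the first.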

According to one aspect of the embodiment of the present application, a method for encoding point cloud media is provided, which is executed by an electronic device, including:

Obtaining point cloud source data, the point cloud source data includes a point cloud frame having one or more point cloud subframes;

Encoding the point cloud frame to obtain at least one data unit; and

Encapsulating the at least one data unit to obtain a point cloud media file, the point cloud media file including point cloud samples encapsulated in one or more tracks; the media file data box of each subsample in the point cloud samples includes a subframe index field; the subframe index field is used to indicate index information of one or more point cloud subframes corresponding to each data unit in the subsample; when one data unit in the subsample corresponds to the index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data.

According to one aspect of the embodiment of the present application, a point cloud media decoding device is provided, including:

An acquisition module configured to acquire point cloud media files, where the point cloud media files include point cloud samples encapsulated in one or more tracks;

A parsing module configured to parse the media file data box of each sub-sample in the point cloud sample and obtain the value of the sub-sample flag field;

An index module configured to obtain, according to the value of the subsample flag field, the index information of one or more point cloud subframes corresponding to each data unit in the subsample; when one data unit in the subsample corresponds to the index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data; and

A decoding module configured to decapsulate and decode the point cloud media file according to the index information of the one or more point cloud subframes to obtain point cloud data.

According to one aspect of the embodiment of the present application, a point cloud media encoding device is provided, including:

An acquisition module configured to acquire point cloud source data, where the point cloud source data includes a point cloud frame having one or more point cloud subframes;

An encoding module configured to encode the point cloud frame to obtain at least one data unit; and

An encapsulation module configured to encapsulate the at least one data unit to obtain a point cloud media file, where the point cloud media file includes point cloud samples encapsulated in one or more tracks; the media file data box of each subsample in the point cloud samples includes a subframe index field; the subframe index field is used to indicate the index information of one or more point cloud subframes corresponding to each data unit in the subsample; when one data unit in the subsample corresponds to the index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data.

According to one aspect of the embodiment of the present application, a computer-readable medium is provided, on which computer-readable instructions are stored. When the computer-readable instructions are executed by a processor, the encoding and decoding method of point cloud media in the above technical solution is implemented.

According to an aspect of an embodiment of the present application, an electronic device is provided, the electronic device comprising: a processor; and a memory for storing computer-readable instructions of the processor; wherein the processor is configured to execute the computer-readable instructions so as to perform the point cloud media encoding and decoding method in the above technical solution.

According to an aspect of an embodiment of the present application, a computer program product is provided. The computer program product includes computer-readable instructions, and the computer-readable instructions are stored in a computer-readable storage medium. The processor of the computer device reads the computer-readable instructions from the computer-readable storage medium, and the processor executes the computer-readable instructions, so that the computer device performs the encoding and decoding method of point cloud media as in the above technical solution.

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below. Other features, objects and advantages of the application will become apparent from the description, drawings and claims. It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present application.

Description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without exerting creative efforts.

Figure 1 schematically shows an exemplary system architecture block diagram applying the technical solution of the present application.

Figure 2 shows a schematic diagram of the point cloud media encoding and decoding process in an application scenario according to the embodiment of the present application.

Figure 3 shows the syntax structure of encapsulating point cloud samples based on TLV code stream format in one embodiment of the present application.

Figure 4 shows the syntax structure of a data unit encapsulated based on the TLV code stream format in one embodiment of the present application.

Figure 5 shows a schematic diagram of the principle of multi-frame combination of point cloud data in one embodiment of the present application.

Figure 6 shows a flow chart of the steps of the point cloud media decoding method in one embodiment of the present application.

Figure 7 shows an exemplary structure of encapsulating point cloud samples in a single track according to one embodiment of the present application.

Figure 8 shows an exemplary structure of encapsulating geometry code streams and attribute code streams in multiple tracks according to one embodiment of the present application.

Figure 9 shows the syntax structure of the coding-related parameter field codec_specific_parameters of the SubSampleInformationBox data box in an application scenario according to the embodiment of the present application.

Figure 10 shows the syntax structure of the coding-related parameter field codec_specific_parameters of the SubSampleInformationBox data box in the application scenario of uniform identification of subframe overlap/non-overlap according to the embodiment of the present application.

Figure 11 shows the syntax structure of the extended sample group tool in an application scenario according to the embodiment of the present application.

Figure 12 shows the syntax structure of identifying point cloud subframe related information based on the point cloud sample level media file data box in an application scenario according to the embodiment of the present application.

Figure 13 shows the syntax structure of identifying subframe related information through the subsample subframe information data box SubsampleSubframeInfoBox in an application scenario according to the embodiment of the present application.

Figure 14 shows the syntax structure of identifying subframe presentation time information through the subsample subframe information data box SubsampleSubframeInfoBox in an application scenario according to the embodiment of the present application.

Figure 15 shows the syntax structure of identifying subframe presentation time information through an extended media file data box in an application scenario according to an embodiment of the present application.

Figure 16 shows a structural block diagram of determining the spatial block correspondence based on three sub-sample division methods of data unit, spatial block and point cloud subframe in one embodiment of the present application.

Figure 17 shows a structural block diagram for determining the spatial block correspondence based on two sub-sample division methods of data unit and spatial block in an embodiment of the present application.

Figure 18 shows the syntax structure for indicating the correspondence between point cloud subframes and spatial blocks through media file data boxes in an application scenario according to the embodiment of the present application.

Figure 19 shows a step flow chart of a point cloud media encoding method in one embodiment of the present application.

Figure 20 shows a flow chart of encoding and decoding point cloud data in a streaming media transmission application scenario according to an embodiment of the present application.

Figure 21 shows the encapsulation result of single-track encapsulation of point cloud samples in an application scenario where point cloud subframes do not overlap with each other according to the embodiment of the present application.

Figure 22 shows the encapsulation result of not dividing attribute tracks into sub-samples when multi-track encapsulation of point cloud samples is performed in an application scenario where point cloud sub-frames do not overlap with each other according to the embodiment of the present application.

Figure 23 shows the encapsulation result of attribute track division sub-samples when multi-track encapsulation of point cloud samples in an application scenario where point cloud sub-frames do not overlap with each other according to the embodiment of the present application.

Figure 24 shows the encapsulation result of single-track encapsulation of point cloud samples in an application scenario where point cloud subframes overlap in an embodiment of the present application.

Figure 25 schematically shows a structural block diagram of a point cloud media decoding device provided by an embodiment of the present application.

Figure 26 schematically shows a structural block diagram of a point cloud media encoding device provided by an embodiment of the present application.

Figure 27 schematically shows a structural block diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.

Detailed description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concepts of the example embodiments to those skilled in the art.

Furthermore, the described features, structures or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the present application. However, those skilled in the art will appreciate that the technical solutions of the present application may be practiced without one or more of the specific details, or other methods, components, devices, steps, etc. may be adopted. In other instances, well-known methods, apparatus, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the present application.

The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software form, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flowcharts shown in the drawings are only illustrative, and do not necessarily include all contents and operations/steps, nor must they be performed in the order described. For example, some operations/steps can be decomposed, and some operations/steps can be merged or partially merged, so the actual order of execution may change according to the actual situation.

The "plurality" mentioned in this article means two or more than two. "And/or" describes the relationship between related objects, indicating that there can be three relationships. For example, A and/or B can mean: A exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the related objects are in an "or" relationship.

The specific implementations of this application involve user-related data such as point cloud media transmission content, decoding content, and consumption content. When the embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

Relevant terms or abbreviations involved in the embodiments of this application are explained as follows.

Immersive media: media content that can bring immersive experience to consumers. Immersive media can be divided into 3DoF media, 3DoF+ media and 6DoF media according to the user's degree of freedom when consuming media content. Point cloud media is a typical 6DoF media.

DoF: Degree of Freedom. In this application, it refers to the degrees of freedom with which a user can move and interact with content while viewing immersive media.

3DoF: Three degrees of freedom, which refers to the three degrees of freedom for the user's head to rotate around the x, y, and z axes.

3DoF+: In addition to three degrees of freedom, the user also has limited degrees of freedom for movement along the x, y, and z axes.

6DoF: In addition to three degrees of freedom, the user also has the freedom to move freely along the x, y, and z axes.

Point cloud: Point cloud is a set of discrete points randomly distributed in space that expresses the spatial structure and surface properties of a three-dimensional object or scene. Each point in the point cloud has at least three-dimensional position information. Depending on the application scenario, it may also have color, material or other information. Typically, each point in a point cloud has the same number of additional attributes.

PCC: Point Cloud Compression.

G-PCC: Geometry-based Point Cloud Compression, that is, point cloud compression based on a geometric model.

Sample: the encapsulation unit in the media file encapsulation process. A media file consists of many samples. Taking video media as an example, a sample of video media is usually a video frame.

Slice: point cloud slice/point cloud strip, which represents a set of syntax elements (such as geometric slices and attribute slices) of part or all of the encoded point cloud frame data.

Tile: point cloud space tiles.

DASH: Dynamic Adaptive Streaming over HTTP, an adaptive bitrate streaming technology that enables high-quality streaming media to be delivered over the Internet through traditional HTTP web servers.

In terms of encoding methods, point cloud media can be divided into point cloud media (Video-based Point Cloud Compression, VPCC) that is compressed based on traditional video coding methods and point cloud media that is compressed based on geometric features (Geometry-based Point Cloud Compression, GPCC). In the file encapsulation of point cloud media, the three-dimensional position information is usually called the geometry component of the point cloud media file, and the attribute information is called the attribute component of the point cloud media file. A point cloud media file has only one geometric component, but there can be one or more attribute components.
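The split between one geometry component and one or more attribute components can be pictured with a minimal sketch. The `PointCloudFrame` class and its field names are hypothetical and serve only as an illustration; the check that every attribute has one value per point reflects the statement that each point typically has the same number of additional attributes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PointCloudFrame:
    """Hypothetical in-memory view of one point cloud frame."""
    # Single geometry component: one (x, y, z) position per point.
    geometry: list[tuple[float, float, float]]
    # Attribute components keyed by name, e.g. "color" or "reflectance".
    attributes: dict[str, list] = field(default_factory=dict)

    def add_attribute(self, name: str, values: list) -> None:
        """Attach an attribute component holding exactly one value per point."""
        if len(values) != len(self.geometry):
            raise ValueError(
                f"attribute {name!r} has {len(values)} values "
                f"for {len(self.geometry)} points"
            )
        self.attributes[name] = list(values)


frame = PointCloudFrame(geometry=[(0.0, 0.0, 0.0), (1.0, 0.5, 2.0)])
frame.add_attribute("color", [(255, 0, 0), (0, 255, 0)])
frame.add_attribute("reflectance", [0.2, 0.9])
```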

Point clouds can flexibly and conveniently express the spatial structure and surface properties of three-dimensional objects or scenes, so they are widely used. Their main application scenarios fall into two categories. 1) Machine-perceived point clouds, used for example in Computer Aided Design (CAD), Autonomous Navigation Systems (ANS), real-time inspection systems, Geography Information Systems (GIS), visual sorting robots, and rescue and disaster relief robots. 2) Human-eye-perceived point clouds, used for example in virtual reality (VR) games, digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive communication, and three-dimensional immersive interaction.

The main ways to obtain point clouds are computer generation, 3D laser scanning, 3D photogrammetry, and the like. Computers can generate point clouds of virtual three-dimensional objects and scenes. 3D scanning can obtain point clouds of static real-world three-dimensional objects or scenes, at a rate of millions of points per second. 3D photography can obtain point clouds of dynamic real-world three-dimensional objects or scenes, at a rate of tens of millions of points per second. In addition, in the medical field, point clouds of biological tissues and organs can be obtained from MRI, CT, and electromagnetic positioning information. These technologies reduce the cost and time of point cloud data acquisition and improve the accuracy of the data. Changes in point cloud data acquisition methods have made it possible to acquire large amounts of point cloud data. With the continuous accumulation of large-scale point cloud data, efficient storage, transmission, release, sharing and standardization of point cloud data have become the key to point cloud applications.

Figure 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiment of the present application can be applied.

As shown in Figure 1, the system architecture 100 includes a plurality of terminal devices that can communicate with each other through, for example, a network 150. For example, the system architecture 100 may include a first terminal device 110 and a second terminal device 120 interconnected through the network 150. In the embodiment of Figure 1, the first terminal device 110 and the second terminal device 120 perform one-way data transmission.

For example, the first terminal device 110 may encode point cloud data (such as point cloud data collected by the first terminal device 110) for transmission to the second terminal device 120 through the network 150, and the encoded point cloud data is transmitted in the form of one or more code streams. The second terminal device 120 can receive the encoded point cloud data from the network 150, decode the encoded point cloud data to restore the point cloud data, and display the point cloud content according to the restored point cloud data.

In one embodiment of the present application, the system architecture 100 may include a third terminal device 130 and a fourth terminal device 140 that perform bidirectional transmission of encoded point cloud data, which may occur, for example, during a video conference. For bidirectional data transmission, each of the third terminal device 130 and the fourth terminal device 140 may encode point cloud data (e.g., point cloud data collected by that terminal device) for transmission through the network 150 to the other of the third terminal device 130 and the fourth terminal device 140. Each of the third terminal device 130 and the fourth terminal device 140 may also receive the encoded point cloud data transmitted by the other terminal device, decode it to recover the point cloud data, and display the point cloud content on an accessible display device based on the recovered point cloud data.

In the embodiment of Figure 1, the first terminal device 110, the second terminal device 120, the third terminal device 130 and the fourth terminal device 140 may be servers, personal computers and smart phones, but the principles disclosed in this application are not limited thereto. Embodiments disclosed herein are suitable for use with laptops, tablets, media players, and/or dedicated video conferencing devices. The network 150 represents any number of networks that transmit encoded point cloud data between the first terminal device 110, the second terminal device 120, the third terminal device 130 and the fourth terminal device 140, including, for example, wired and/or wireless communication networks. The network 150 may exchange data in circuit-switched and/or packet-switched channels. The network may include telecommunications networks, local area networks, wide area networks, and/or the Internet. For purposes of this application, unless explained below, the architecture and topology of the network 150 may be immaterial to the operations disclosed herein.

The server in the embodiment of this application may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides cloud computing services. The terminal can be a smartphone, tablet, laptop, desktop computer, smart speaker, smart watch, vehicle terminal, smart TV, etc., but is not limited to this. The terminal and the server can be connected directly or indirectly through wired or wireless communication methods, which is not limited in this application.

After encoding the point cloud media, the encoded data stream needs to be encapsulated and transmitted to the user. Correspondingly, on the point cloud media player side, the point cloud file needs to be decapsulated first, then decoded, and finally the decoded data stream is presented.

The real-world visual scene A can be captured by collecting point cloud data through the collection device 210. The collection device 210 may be, for example, a set of cameras or a camera device with multiple lenses and sensors. The collection result is point cloud source data B, which is a frame sequence composed of a large number of point cloud frames. One or more point cloud frames may be encoded by the encoder 220 to obtain an encoded G-PCC bit stream, which may specifically include an encoded geometry bit stream and an attribute bit stream E. The file encapsulator 230 can encapsulate one or more encoded bit streams according to a specific media container file format to obtain a media file F for file playback or a series of initialization segments and media segments Fs for streaming transmission. In some embodiments of the present application, the media container file format may be, for example, the ISO basic media file format specified in ISO/IEC 14496-12 [ISOBMFF]. File encapsulator 230 may also encapsulate metadata in media files F or media segments Fs.

The media file F output by the file encapsulator 230 is the same as the media file F′ input to the file decapsulator 240. The file decapsulator 240 can extract the encoded bit stream E′ and parse the metadata by processing the media file F′ or the received media segments F′s. The decoder 250 may decode the G-PCC bit stream into a decoded signal D′ and generate point cloud data according to the decoded signal D′. When applicable, the point cloud data may be rendered by the renderer 260 and displayed on the screen of a head mounted display or any other display device, based on the current viewing position, viewing direction, or viewport determined by various types of sensors (e.g., head sensors). In addition to being used by the player to access the appropriate portion of the decoded point cloud data, the current viewing position or viewing direction can also be used for decoding optimization. In the viewport-dependent content distributor 270, the current viewing position and viewing direction are also passed to the policy module, which can use them to determine which track to receive.

In the transmission technology of point cloud media, streaming transmission technology is usually used to handle the transmission of media resources between the server and the client. Common media streaming transmission technologies include DASH (Dynamic Adaptive Streaming over HTTP), HLS (HTTP Live Streaming), SMT (Smart Media Transport) and other technologies.

Take DASH as an example. DASH is an adaptive bitrate streaming technology that enables high-quality streaming media to be delivered over the Internet through traditional HTTP web servers. DASH breaks the content into a series of small HTTP-based file fragments, each fragment contains a short length of playable content, and the total length of the content may be several hours (such as a movie or live sports event). Content will be cut into multiple bitrate alternatives to provide multiple bitrate versions for selection. When media content is played by a DASH client, the client will automatically select which alternative to download and play based on current network conditions. The client will select for playback the highest bitrate clip that can be downloaded in a timely manner, thus avoiding playback stutters or rebuffering events. Because of this, the DASH client can seamlessly adapt to changing network conditions and provide a high-quality playback experience with less lag and rebuffering.
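The client-side choice described above, playing the highest-bitrate alternative that can still be downloaded in time, can be sketched as follows. This is a minimal illustration rather than the adaptation logic of any particular DASH player; the function name, the `safety_factor` parameter and the fallback to the lowest bitrate are all assumptions made for the example.

```python
def select_representation(bitrates_bps, throughput_bps, safety_factor=0.8):
    """Pick the highest bitrate that fits within the measured throughput.

    bitrates_bps   -- bitrates of the available alternatives, in bits per second
    throughput_bps -- recently measured download throughput, in bits per second
    safety_factor  -- fraction of the throughput treated as usable (assumption)
    """
    budget = throughput_bps * safety_factor
    affordable = [b for b in bitrates_bps if b <= budget]
    # If nothing fits, fall back to the lowest bitrate to keep playback going.
    return max(affordable) if affordable else min(bitrates_bps)


ladder = [1_000_000, 3_000_000, 6_000_000]
good_network = select_representation(ladder, throughput_bps=5_000_000)
poor_network = select_representation(ladder, throughput_bps=500_000)
```

With a measured throughput of 5 Mbit/s and the default safety factor, the usable budget is 4 Mbit/s, so the 3 Mbit/s alternative is chosen; at 0.5 Mbit/s nothing fits and the lowest alternative is used.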

DASH uses existing HTTP web server infrastructure. It allows devices such as Internet TVs, TV set-top boxes, desktop computers, smartphones, tablets and other devices to consume multimedia content (such as videos, TV, radio, etc.) transmitted through the Internet, and can cope with changing Internet reception conditions.

Taking point cloud compression G-PCC based on geometric models as an example, each G-PCC point cloud sample corresponds to a point cloud frame and consists of one or more G-PCC data units belonging to the same presentation time.

Figure 3 shows the syntax structure of encapsulating point cloud samples based on the TLV code stream format in one embodiment of the present application. Each point cloud sample consists of one or more G-PCC data units, and each gpcc_unit contains a single G-PCC data unit. G-PCC data units in the same point cloud sample correspond to the same point cloud frame and belong to the same presentation time. The TLV code stream format, namely the Type-Length-Value bytestream format, refers to a structure composed of a data type (Type), a data length (Length) and a data value (Value). For specific information about the TLV code stream format, please refer to the standard ISO/IEC 23090-9.
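As a rough illustration of the Type-Length-Value structure, the following sketch splits a byte buffer into (type, payload) pairs. The field widths used here, an 8-bit type followed by a 32-bit big-endian payload length, are an assumption made for the example and should be checked against ISO/IEC 23090-9; `parse_tlv_units` is a hypothetical helper name.

```python
from __future__ import annotations

import struct

# Assumed header layout: 1-byte type, 4-byte big-endian payload length.
TLV_HEADER = struct.Struct(">BI")


def parse_tlv_units(buf: bytes) -> list[tuple[int, bytes]]:
    """Split a buffer holding back-to-back TLV units into (type, payload) pairs."""
    units = []
    offset = 0
    while offset < len(buf):
        if offset + TLV_HEADER.size > len(buf):
            raise ValueError("truncated TLV header")
        tlv_type, length = TLV_HEADER.unpack_from(buf, offset)
        offset += TLV_HEADER.size
        end = offset + length
        if end > len(buf):
            raise ValueError("truncated TLV payload")
        units.append((tlv_type, buf[offset:end]))
        offset = end
    return units


# Two units with placeholder payloads: type 0 and type 2.
sample = TLV_HEADER.pack(0, 3) + b"sps" + TLV_HEADER.pack(2, 4) + b"geom"
parsed = parse_tlv_units(sample)
```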

Figure 4 shows the syntax structure of a data unit encapsulated based on the TLV code stream format in one embodiment of the present application. Here, tlv_type is a type field used to indicate the type of the data unit. Table 1 shows the semantic description of different values of the data unit type field in an embodiment of the present application.

Table 1

tlv_type	Description
0	Sequence parameter set
1	Geometry parameter set
2	Geometry data unit
3	Attribute parameter set
4	Attribute data unit
5	Tile inventory
6	Frame boundary marker

As shown in Table 1, the type field with different values can be used to indicate different data unit types.

When the value of the type field is 0, it means that the type of the data unit is SPS (Sequence Parameter Set).

When the value of the type field is 1, it means that the type of the data unit is the geometry parameter set GPS (Geometry Parameter Set).

When the value of the type field is 2, it means that the type of the data unit is geometry data unit.

When the value of the type field is 3, it means that the type of the data unit is attribute parameter set APS (Attribute Parameter Set).

When the value of the type field is 4, it means that the type of the data unit is attribute data unit.

When the value of the type field is 5, it indicates that the type of the data unit is the tile collection Tile inventory.

When the value of the type field is 6, it indicates that the type of the data unit is Frame boundary marker.

For specific information on the above data unit types, please refer to the standard ISO/IEC 23090-9.
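As an illustration of how the type field drives parsing, the following sketch walks a TLV byte stream and labels each data unit using Table 1. It assumes a header made of a 1-byte tlv_type followed by a 4-byte big-endian payload length; that layout, and the names `TLV_TYPE_NAMES` and `parse_tlv_units`, are assumptions of this example and should be checked against ISO/IEC 23090-9.

```python
import struct

# Type values and descriptions taken from Table 1.
TLV_TYPE_NAMES = {
    0: "Sequence parameter set",
    1: "Geometry parameter set",
    2: "Geometry data unit",
    3: "Attribute parameter set",
    4: "Attribute data unit",
    5: "Tile inventory",
    6: "Frame boundary marker",
}


def parse_tlv_units(data: bytes):
    """Split a byte stream into (tlv_type, description, payload) tuples.

    Assumed header layout: 1-byte type, then 4-byte big-endian length.
    """
    units = []
    offset = 0
    while offset < len(data):
        if offset + 5 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from(">BI", data, offset)
        offset += 5
        if offset + length > len(data):
            raise ValueError("truncated TLV payload")
        payload = data[offset:offset + length]
        offset += length
        name = TLV_TYPE_NAMES.get(tlv_type, "unknown")
        units.append((tlv_type, name, payload))
    return units
```

For example, a stream holding one geometry data unit followed by a frame boundary marker yields two tuples with types 2 and 6.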

In the related technology of this application, each point cloud frame of the point cloud media is encoded separately, which leads to the following two problems in the encoding and decoding process of the point cloud media. The first problem is that in frame-based point cloud content, the file size of each frame may be relatively small, which is inefficient for the I/O interface. The second problem is that the decoder needs to start from the initial bounding box and partition every single frame, so initializing the decoder incurs considerable overhead on edge devices.

The first problem can be easily solved by concatenating the encoded bitstream of consecutive frames. However, the second problem is difficult to avoid unless the point clouds are combined before encoding. The embodiment of this application proposes a combined frame coding solution to solve these two problems by introducing frame index coding into the combined point cloud. In addition, the embodiments of the present application can greatly improve the coding efficiency, and are therefore also beneficial to the storage and use of frame-based point cloud content.

With combined frame coding technology, multiple point cloud frames in the original sequence are combined and then encoded. For example, if there are 100 point cloud frames in the original sequence and every 4 frames are combined, the original point cloud sequence is reconstructed into a 25-frame point cloud sequence, which is then encoded. This encoding method can greatly improve the encoding efficiency for scenes with few points in each frame or with strong correlation between consecutive frames.

Figure 5 shows a schematic diagram of the principle of multi-frame combination of point cloud data in one embodiment of the present application. As shown in Figure 5, a combined frame can be formed by combining the first point cloud frame Frame1 and the second point cloud frame Frame2. When combined frame coding is used, each frame of the newly obtained sequence contains multiple point cloud subframes. A point cloud subframe is a partial representation of a point cloud frame, composed of points with the same frame number or frame index attribute value. When two or more point cloud frames are correlated, the individual octrees of the point cloud frames have similar structures at the higher levels; the leaf nodes of the combined frame may also contain duplicate content from different frames, that is, overlapping point cloud data.
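The combination principle can be illustrated with a toy model in which each point is a coordinate tuple and the frame index is stored as a per-point attribute. This is only a sketch of the idea, not the G-PCC codec: the function names and the in-memory representation are assumptions made for the example.

```python
def combine_frames(frames, group_size):
    """Merge every group_size consecutive frames into one combined frame.

    Each point of the combined frame is stored as (xyz, frame_idx), where
    frame_idx is the position of the source frame inside its group.
    """
    combined = []
    for start in range(0, len(frames), group_size):
        group = frames[start:start + group_size]
        merged = [(xyz, idx) for idx, frame in enumerate(group) for xyz in frame]
        combined.append(merged)
    return combined


def split_subframes(combined_frame):
    """Group the points of a combined frame by their frame index attribute."""
    subframes = {}
    for xyz, frame_idx in combined_frame:
        subframes.setdefault(frame_idx, []).append(xyz)
    return subframes


def overlapping_points(combined_frame):
    """Return the positions that occur in more than one subframe."""
    seen = {}
    for xyz, frame_idx in combined_frame:
        seen.setdefault(xyz, set()).add(frame_idx)
    return {xyz: idxs for xyz, idxs in seen.items() if len(idxs) > 1}
```

With 100 input frames and a group size of 4, `combine_frames` returns 25 combined frames, matching the example in the text; `overlapping_points` exposes the duplicate leaf content that different subframes share.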

The technical solutions provided in this application, including the point cloud media encoding method, point cloud media decoding method, point cloud media encoding device, point cloud media decoding device, computer readable medium, electronic equipment and computer program product, are described in detail below in conjunction with specific embodiments. The technical solutions in the embodiments of this application can be applied to the server side, the player side or the intermediate nodes of an immersive media system.

Figure 6 shows a flow chart of the steps of the point cloud media decoding method in one embodiment of the present application. This method can be applied to various electronic devices in the server, client or intermediate node of the point cloud media system. The embodiment of the present application takes a point cloud media decoding method executed by a client device on which a point cloud decoding device is installed as an example. As shown in Figure 6, the point cloud media decoding method includes the following steps S610 to S640.

Step S610: Obtain a point cloud media file. The point cloud media file includes point cloud samples encapsulated in one or more tracks;

Step S620: Analyze the media file data box of each sub-sample in the point cloud sample to obtain the value of the sub-sample flag field;

Step S630: Obtain the index information of one or more point cloud subframes corresponding to each data unit in the subsample according to the value of the subsample flag field; when one data unit in the subsample corresponds to the index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data; and

Step S640: Decapsulate and decode the point cloud media file according to the index information of one or more point cloud subframes to obtain point cloud data.

Subsamples are data encapsulation units in point cloud samples. The subsample flag field can also be used to indicate the division method of subsamples. A point cloud sample can be divided into different types of subsamples based on different division methods. For example, a point cloud sample can be divided along different dimensions, such as data units, spatial blocks or point cloud subframes, to obtain subsamples with different data capacities.

A point cloud subframe is a partial representation of a point cloud frame consisting of points with the same index information (such as frame number or frame index attribute value). When a point cloud frame is a combined frame formed by combining multiple point cloud frames, each point cloud frame in the combined frame constitutes a point cloud subframe.

In the point cloud media decoding method provided by the embodiment of the present application, the media file data box of each subsample in the point cloud sample indicates the correspondence between each data unit in the subsample and the index information of one or more point cloud subframes. Based on this correspondence, the one or more point cloud subframes corresponding to each data unit in the point cloud sample can be jointly decoded as a combined frame. On the one hand, this reduces the waste of computing resources caused by separately decoding point cloud frames with little content; on the other hand, point cloud subframes with overlapping point cloud data can be identified, which improves the decoding efficiency of point cloud media.

The specific implementation of each method step in the point cloud media decoding method of the present application will be described in detail below in conjunction with multiple embodiments.

In step S610, a point cloud media file is obtained. The point cloud media file includes point cloud samples encapsulated in one or more tracks.

The point cloud media file may be a media file or media segment obtained after encoding and encapsulation processing as shown in Figure 2. The media file or media segment carries a point cloud code stream to be transmitted.

In one embodiment of the present application, the data source can encapsulate the point cloud code stream into a single track based on the geometric parameter information, attribute parameter information and point cloud slice parameter information contained in the point cloud code stream, or it can repackage a single-track point cloud media file into a point cloud media file containing multiple tracks.

A track refers to a volumetric visual track used to carry a coded geometry bitstream or a coded attribute bitstream, or a volumetric visual track that carries both a coded geometry bitstream and a coded attribute bitstream.

When the point cloud code stream is packaged in a single track, each point cloud sample can correspond to a complete point cloud frame.

In step S620, the media file data box of each sub-sample in the point cloud sample is analyzed to obtain the value of the sub-sample flag field.

The media file data box may be a data box based on the ISO basic media file format ISOBMFF (ISO Base Media File Format). For specific information on ISOBMFF, please refer to the standard ISO/IEC 14496-12.

When the G-PCC codestream is carried in a single track, simple ISOBMFF encapsulation can be utilized by storing the G-PCC codestream in a single track without further processing.

Figure 7 shows an exemplary structure of encapsulating point cloud samples in a single track according to one embodiment of the present application. Here, moov represents the metadata information of the point cloud samples, which includes various fields such as "trak", "stbl", "stsd", "gpe1", "gpcC", "xPS", "stsz" and "subs"; mdat represents the specific media data, including each point cloud sample. As shown in Figure 7, the component can still be described by a "subs" data box with a flag value of flags=0, and the subframe index subframe_idx is provided by another "subs" data box with a flag value of flags=2. This follows the ISOBMFF rule that when multiple SubSampleInformationBox data boxes exist in the same container box, the flag value in each SubSampleInformationBox should be different.

Continuing to refer to Figure 7, in the "subs" data box with a flag value of flags=2, each point cloud sample Sample 1...Sample n contains one or more subsamples divided based on subframes. For example, point cloud sample Sample 1 contains X subsamples, which correspond to the point cloud subframes with subframe index values subframe_idx=1...X. Each point cloud subframe includes different data units, such as the geometry data unit, attribute data unit and frame index attribute data unit shown in the figure.

Figure 8 shows an exemplary structure of encapsulating geometry code streams and attribute code streams in multiple tracks according to one embodiment of the present application. Among them, ftyp represents the file type and describes the version of the specification that the point cloud sample complies with; moov represents the metadata information of the point cloud sample; mdat represents the specific media data carried in the point cloud sample.

As shown in Figure 8, in multi-track packaging mode, the code stream data of each point cloud component is mapped to a separate track. There are two types of G-PCC component tracks: the G-PCC geometry track and the G-PCC attribute track. Each point cloud sample in a track contains at least one G-PCC data unit, which carries a single G-PCC component data unit rather than a multiplexing of geometry and attribute data units or of different attribute data units. A G-PCC attribute track should not multiplex different attribute substreams, such as color and reflectivity.

In one embodiment of the present application, the relevant information of the point cloud subframe can be identified by extending the encoding-related parameter field of the subsample information data box SubSampleInformationBox. In this embodiment of the present application, the subsample flag field is also used to indicate the division method of the subsamples, and the division method of each subsample in the point cloud sample may include:

When the value of the sub-sample flag field is the first value, divide the sub-samples based on the data unit so that one sub-sample contains one data unit;

When the value of the sub-sample flag field is the second value, the sub-samples are divided based on spatial blocks, so that one sub-sample contains one or more continuous data units corresponding to a first division object, where the first division object includes at least one of a spatial block, a parameter set, spatial block set information or a frame boundary identification; and

When the value of the sub-sample flag field is the third value, the sub-samples are divided based on point cloud subframes, so that one sub-sample contains one or more continuous data units corresponding to a second division object, where the second division object consists of a complete point cloud subframe.

In this embodiment, the different values of the sub-sample flag field distinguish several sub-sample division methods, namely the data unit division method, the spatial block division method and the point cloud subframe division method, so that sub-samples can be divided in different ways. This makes it possible to decode combinations of data units in sub-samples with different data capacities, which can reduce the waste of computing resources and improve decoding efficiency.

In one embodiment of the present application, when the value of the sub-sample flag field is the third value, the media file data box of the sub-sample includes:

A subframe index field, which is used to indicate the index information of the point cloud subframe contained in the current subsample. The point cloud subframe can therefore be determined from the index information in the subframe index field, and the one or more point cloud subframes corresponding to each data unit in the point cloud sample can be jointly decoded as a combined frame. On the one hand, this reduces the waste of computing resources caused by separately decoding point cloud frames with little content; on the other hand, point cloud subframes with overlapping point cloud data can be identified, which improves the decoding efficiency of point cloud media.

Taking the ISOBMFF data box as an example, for the subsample information data box SubSampleInformationBox in the point cloud file, the subsample definition should be based on the value of the flag field in the SubSampleInformationBox data box.

Regarding the implementation of the above multiple embodiments in specific application scenarios, Figure 9 shows the syntax structure of the coding-related parameter field codec_specific_parameters of the SubSampleInformationBox data box in an application scenario according to the embodiment of the present application.

As shown in Figure 9, when the value of the sub-sample flag field flags is 0, it means that the sub-sample division method based on the G-PCC data unit is adopted. A subsample contains only one G-PCC unit.

On this basis, the subsample information data box SubSampleInformationBox can include the following fields:

payloadType, indicating the tlv_type type of the G-PCC unit contained in the subsample;

attrIdx indicates the value of the ash_attr_sps_attr_idx field corresponding to the attribute data contained in the subsample.

When the value of the sub-sample flag field flags is 1, it indicates that the sub-sample division method based on spatial tiles is adopted. A subsample contains one or more continuous data units corresponding to a spatial block tile, or one or more continuous data units corresponding to a parameter set, spatial block set information, or frame boundary identification.

tile_data, when the value is 1, it means that the sub-sample contains the geometric data or attribute data of the corresponding tile; when the value is 0, it means that the sub-sample contains parameter set data, tile geometry information or frame boundary identification.

tile_id, indicating the tile index number associated with the data in the subsample.

When the value of the subsample flag field flags is 2, it indicates that the subframe-based subsample division method is used. A subsample contains continuous data units corresponding to a complete point cloud subframe.

The subframe index field subframe_idx indicates the value of the frame number attribute corresponding to the point cloud subframe contained in the current subsample.
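The three cases above can be summarized as a small dispatcher over the 32-bit codec_specific_parameters value of a SubSampleInformationBox entry. Figure 9 is not reproduced in this text, so the bit positions used below are assumptions chosen purely for illustration, as are the conditions under which attrIdx and tile_id are read; only the field names and their meanings come from the description.

```python
def interpret_codec_specific_parameters(flags: int, params: int) -> dict:
    """Interpret codec_specific_parameters according to the flags value.

    ASSUMED bit layout (not confirmed by the source):
      flags == 0: payloadType in bits 31..24, attrIdx in bits 23..16
      flags == 1: tile_data in bit 31, tile_id in bits 23..0
      flags == 2: subframe_idx in bits 31..16
    """
    if not 0 <= params <= 0xFFFFFFFF:
        raise ValueError("codec_specific_parameters must fit in 32 bits")
    if flags == 0:
        payload_type = (params >> 24) & 0xFF
        info = {"division": "data unit", "payloadType": payload_type}
        if payload_type == 4:  # attribute data unit (Table 1)
            info["attrIdx"] = (params >> 16) & 0xFF
        return info
    if flags == 1:
        tile_data = (params >> 31) & 0x1
        info = {"division": "tile", "tile_data": tile_data}
        if tile_data == 1:  # subsample carries data of a specific tile
            info["tile_id"] = params & 0xFFFFFF
        return info
    if flags == 2:
        return {"division": "subframe", "subframe_idx": (params >> 16) & 0xFFFF}
    raise ValueError(f"unsupported flags value: {flags}")
```

Keeping one function per flags value would work equally well; a single dispatcher simply mirrors how the flags value selects the interpretation of the same 32-bit field.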

In one embodiment of the present application, in order to unify two application scenarios where subframes do not overlap at all and subframes overlap, the division method of each subsample in the point cloud sample may include:

When the value of the subsample flag field is the third value, the subsamples are divided based on point cloud subframes, so that one subsample contains one or more continuous data units corresponding to a second division object, where the second division object includes one or more point cloud subframes.

In this embodiment, the different values of the sub-sample flag field distinguish several sub-sample division methods, namely the data unit division method, the spatial block division method and the point cloud subframe division method, and subframes can be divided in either an overlapping or a non-overlapping manner. This makes it possible to decode combinations of data units in subsamples with different data capacities, which can reduce the waste of computing resources and improve decoding efficiency.

The subframe complete flag field is used to indicate whether the current subsample contains all data that constitutes the point cloud subframe;

The number of subframes field is used to indicate the number of point cloud subframes corresponding to the current subsample; and

The subframe index field is used to indicate the index information of the point cloud subframe corresponding to the current subsample.

In this embodiment, the subframe complete flag field indicates whether all the data constituting the point cloud subframe is present, the subframe number field indicates the number of point cloud subframes, and the subframe index field indicates the index information of the point cloud subframe. The corresponding point cloud subframe can thus be described in detail through the media file data box of the subsample, so that the one or more point cloud subframes corresponding to each data unit in the point cloud sample can be jointly decoded as a combined frame. On the one hand, this reduces the waste of computing resources caused by separately decoding point cloud frames with little content; on the other hand, point cloud subframes with overlapping point cloud data can be identified, which improves the decoding efficiency of point cloud media.

In one embodiment of the present application, when the point cloud sample is packaged in one track, all data constituting the point cloud subframe include all geometric data and all attribute data; when the point cloud sample is packaged in multiple tracks, All data of point cloud subframes include all geometric data or all attribute data.

Regarding the implementation of the above multiple embodiments in specific application scenarios, Figure 10 shows the syntax structure of the coding-related parameter field codec_specific_parameters of the SubSampleInformationBox data box in the application scenario of uniform identification of subframe overlap/non-overlap according to the embodiment of the present application.

As shown in Figure 10, when the value of the sub-sample flag field flags is 0, it means that the sub-sample division method based on the G-PCC data unit is adopted. A subsample contains only one data unit.

payloadType, indicating the tlv_type type of the data unit contained in the subsample;

When the value of the subsample flag field flags is 2, it indicates that the subframe-based subsample division method is used. A subsample contains one or more continuous data units, corresponding to one or more point cloud subframes.

On this basis, the subsample information data box SubSampleInformationBox can include the following fields.

Subframe complete flag field complete_subframe_flag. A value of 1 means that the data units corresponding to the current subsample contain all the data that constitutes the corresponding subframe; a value of 0 means that the data units corresponding to the current subsample contain only part of the data of the corresponding subframe. (In single-track packaging mode, all data refers to all geometry and attribute data; in multi-track packaging mode, all data refers to all geometric data or attribute data of all feature types.)

The subframe number field num_subframes indicates the number of subframes corresponding to the data units in the subsample. When the value of this field is 1, the subframe corresponding to the current subsample is the one indicated by subframe_idx; when the value of this field is greater than 1, the subframes corresponding to the current subsample are the subframe indicated by subframe_idx and the subsequent num_subframes-1 subframes with consecutive frame numbers.
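The rule for num_subframes and subframe_idx amounts to expanding a start index and a count into a run of consecutive frame numbers. A minimal sketch follows; the function names are hypothetical and not part of the described syntax.

```python
def subframes_for_subsample(subframe_idx: int, num_subframes: int) -> list:
    """Return the frame numbers of all subframes a subsample corresponds to.

    The first subframe is subframe_idx; the remaining num_subframes - 1
    subframes have the consecutive frame numbers that follow it.
    """
    if num_subframes < 1:
        raise ValueError("num_subframes must be at least 1")
    return list(range(subframe_idx, subframe_idx + num_subframes))


def has_overlapping_subframes(num_subframes: int) -> bool:
    """A subsample mapped to two or more subframes carries overlapped data."""
    return num_subframes >= 2
```

For example, subframe_idx=5 with num_subframes=3 maps the subsample to subframes 5, 6 and 7, which by the rule stated for step S630 share overlapping point cloud data.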

In this embodiment, for single-track encapsulation and multi-track encapsulation, the encapsulation tracks of geometric data and attribute data can be flexibly set, which can be applied to various application scenarios and ensure decoding processing efficiency in different scenarios.

In one embodiment, when the value of the subsample flag field is the first value, the media file data box of the point cloud subframe includes: a related subframe number field, used to indicate the number of point cloud subframes corresponding to the current subsample; and a subframe index field, used to indicate the index information of the point cloud subframe corresponding to the current subsample. Specifically, the sample group tool in the media file data box can be extended to identify the relevant information of the point cloud subframe, including the related subframe number field and the subframe index field. The number and index information of the point cloud subframes can thus be described, so that the one or more point cloud subframes corresponding to each data unit in the point cloud sample can be jointly decoded as a combined frame. On the one hand, this reduces the waste of computing resources caused by separately decoding point cloud frames with little content; on the other hand, point cloud subframes with overlapping point cloud data can be identified, which improves the decoding efficiency of point cloud media.

In one embodiment of the present application, the relevant information of the point cloud subframe can be identified through the sample group tool in the extended media file data box. In this embodiment of the present application, when the value of the subsample flag field is the first value, the media file data box of the point cloud subframe includes:

Subsample number field, used to indicate the number of subsamples included in the current sample;

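The related subframe number field is used to indicate the number of point cloud subframes corresponding to the current subsample; and

The subframe index field is used to indicate the index information of the point cloud subframe corresponding to the current subsample.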

In this embodiment, the media file data box of the point cloud subframe includes a subsample number field, a related subframe number field and a subframe index field, which describe the number of subsamples as well as the number and index information of the point cloud subframes, so that the one or more point cloud subframes corresponding to each data unit in the point cloud sample can be jointly decoded as a combined frame. On the one hand, this reduces the waste of computing resources caused by separately decoding point cloud frames with little content; on the other hand, point cloud subframes with overlapping point cloud data can be identified, which improves the decoding efficiency of point cloud media.

Regarding the implementation of the above embodiment in a specific application scenario, Figure 11 shows the syntax structure of the extended sample group tool in an application scenario according to the embodiment of the present application.

As shown in Figure 11, the sample group tool in the media file data box can include the following fields:

The subsample number field subsample_count indicates the number of subsamples contained in the current sample.

The related subframe number field related_subframe_num indicates the number of point cloud subframes corresponding to the current subsample. When the value of this field is 0, it means that the information contained in the current subsample has nothing to do with the point cloud subframe division (such as tile set information or frame end identifier).

The subframe index field subframe_index indicates the point cloud subframe sequence number corresponding to the current subsample. The value of this sequence number is the same as the value in the frame number attribute.

In some optional implementations, the information related to the point cloud subframes may not be indicated at the subsample level; instead, it may be indicated only at the point cloud sample level. That is, in the case of overlapping point cloud subframes, only the samples that contain point cloud subframes and the corresponding point cloud subframe index numbers are identified at the system layer. Figure 12 shows the syntax structure for identifying point cloud subframe related information based on the point cloud sample level media file data box in an application scenario according to the embodiment of the present application.

As shown in Figure 12, the media file data box in this embodiment of the present application may include the following fields:

The related subframe number field related_subframe_num indicates the number of point cloud subframes corresponding to the current sample.

The subframe index field subframe_index indicates the subframe sequence number corresponding to the current sample. The value of this sequence number should be the same as the value in the frame number attribute.

In one embodiment of the present application, the information related to the point cloud subframe can be identified by defining a subsample subframe information data box SubsampleSubframeInfoBox. In this embodiment of the present application, when the value of the subsample flag field is the first value, the media file data box of the point cloud subframe includes:

Subframe related sample number field, used to indicate the number of point cloud samples containing multiple point cloud subframes;

The sample serial number difference field is used to indicate the serial number difference between the current point cloud sample containing multiple point cloud subframes and the previous point cloud sample containing multiple point cloud subframes in the decoding order;

Subsample number field, used to indicate the number of subsamples contained in the current point cloud sample;

The related subframe number field is used to indicate the number of point cloud subframes corresponding to the current subsample;

In this embodiment, the media file data box of the point cloud subframe also includes a subframe-related sample number field and a sample serial number difference field, which describe the number of point cloud samples containing multiple point cloud subframes and the serial number difference between such point cloud samples. This facilitates the combined decoding of the point cloud samples in order, which can reduce the waste of computing resources and improve decoding efficiency.

Regarding the implementation of the above multiple embodiments in specific application scenarios, Figure 13 shows the syntax structure of identifying subframe related information through the subsample subframe information data box SubsampleSubframeInfoBox in an application scenario according to the embodiment of the present application. The data box type of the subsample subframe information data box SubsampleSubframeInfoBox may be 'sbfi', for example, and is included in SampleEntry or TrackFragmentBox. The subsample subframe information data box is used to indicate the subframe information corresponding to each subsample divided based on the G-PCC data unit in a point cloud sample containing multiple subframes. When the data box exists in the track, the sub-sample flag field in the sub-sample information data box must have a value of 0, that is, the sub-sample division method based on data units is adopted.

As shown in Figure 13, the subsample subframe information data box SubsampleSubframeInfoBox can include the following fields:

The subframe related sample number field subframe_related_sample_num indicates the number of point cloud samples containing multiple point cloud subframes.

The sample serial number difference field sample_delta indicates the difference between the current sample serial number containing multiple subframes and the previous sample serial number containing multiple subframes in the decoding order. For the first point cloud sample containing multiple point cloud subframes, the value of this field is the serial number of the point cloud sample.
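Because the first sample_delta carries an absolute sample serial number and each later value is a difference from the previous entry, the serial numbers of all samples containing multiple subframes can be recovered with a running sum. The sketch below illustrates this; the function name is hypothetical.

```python
from itertools import accumulate


def sample_numbers_from_deltas(sample_deltas):
    """Recover sample serial numbers from a list of sample_delta values.

    The first delta is the serial number of the first sample containing
    multiple subframes; every later delta is the distance, in decoding
    order, from the previous such sample.
    """
    return list(accumulate(sample_deltas))
```

For example, sample_delta values of 3, 2 and 5 identify samples 3, 5 and 10 as the ones that contain multiple point cloud subframes.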

In one embodiment of the present application, the presentation time of the point cloud subframe can be indicated through the media file data box. In the embodiment of this application, when each sub-sample in the point cloud sample is divided into sub-samples based on point cloud sub-frames, the media file data box of the sub-sample includes:

The presentation time flag field is used to indicate whether each point cloud subframe included in the point cloud sample has the same presentation duration; and

The subsample duration field is used to indicate the presentation duration of the current subsample when each point cloud subframe contained in the point cloud sample has a different presentation duration.

In this embodiment, the presentation time flag field in the media file data box can be used to indicate whether the presentation duration of the point cloud subframes is the same, and the subsample duration field can be used to indicate the presentation duration of the subsamples, so that each point cloud subframe can be The presentation duration of the sub-sample is indicated so that the display can be performed based on the presentation duration to ensure the media presentation effect.

Regarding the implementation of the above embodiments in specific application scenarios, Figure 14 shows the syntax structure of identifying subframe presentation time information through the subsample subframe information data box SubsampleSubframeInfoBox in an application scenario according to the embodiment of the present application.

As shown in Figure 14, the coding-related parameter field codec_specific_parameters in the SubsampleSubframeInfoBox includes the following fields:

For the presentation time flag field with_unique_duration_flag, a value of 0 indicates that multiple subframes included in the sample have the same presentation time. The presentation duration of each subframe can be calculated based on the presentation time of the sample itself and the number of subframes in the sample. At this time, the value of the sample_delta field corresponding to the sample should be an integer multiple of the number of subframes. A value of 1 indicates that multiple subframes included in the sample have different presentation durations.

The subsample duration field sub_sample_duration indicates the presentation duration of the subsample. The sum of the values of this field for multiple subsamples in the sample should be equal to the value of the sample_delta field corresponding to the sample.
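The two timing cases can be sketched as follows. Here sample_delta denotes the presentation duration of the sample, as in the text above; the function name, its argument order, and the decision to raise an error on inconsistent values are assumptions of this example.

```python
def subframe_durations(sample_delta, num_subframes,
                       with_unique_duration_flag, sub_sample_durations=None):
    """Return the presentation duration of each subframe in a sample.

    Flag 0: all subframes share the sample duration equally, so
    sample_delta must be an integer multiple of num_subframes.
    Flag 1: each subsample carries its own sub_sample_duration, and the
    values must add up to sample_delta.
    """
    if num_subframes < 1:
        raise ValueError("num_subframes must be at least 1")
    if with_unique_duration_flag == 0:
        if sample_delta % num_subframes != 0:
            raise ValueError("sample_delta is not a multiple of num_subframes")
        return [sample_delta // num_subframes] * num_subframes
    if sub_sample_durations is None or len(sub_sample_durations) != num_subframes:
        raise ValueError("one sub_sample_duration per subframe is required")
    if sum(sub_sample_durations) != sample_delta:
        raise ValueError("sub_sample_duration values do not sum to sample_delta")
    return list(sub_sample_durations)
```

For a sample with sample_delta=1000 and four subframes, flag 0 yields 250 time units per subframe, while flag 1 with durations 200, 300 and 500 over three subframes is accepted because the values sum to 1000.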

In one embodiment of the present application, the presentation duration of each subframe of two subframe scenarios can be uniformly indicated by extending the media file data box. On this basis, the media file data box of the point cloud sample includes:

The number of subframes field is used to indicate the number of point cloud subframes contained in the current point cloud sample;

The presentation time flag field is used to indicate whether each point cloud subframe contained in the point cloud sample has the same presentation duration;

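The subframe index field is used to indicate the index information of the point cloud subframe corresponding to the current subsample when the point cloud subframes included in the point cloud sample have different presentation durations; and

The subsample duration field is used to indicate the presentation duration of the corresponding point cloud subframe.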

In this embodiment, the number of point cloud subframes is indicated through the subframe number field, and the index information of each point cloud subframe is indicated through the subframe index field, so that the one or more point cloud subframes corresponding to each data unit in the point cloud sample can be jointly decoded as a combined frame. On the one hand, this reduces the waste of computing resources caused by separately decoding point cloud frames with little content. On the other hand, point cloud subframes with overlapping point cloud data can be identified, which improves the decoding efficiency of point cloud media. In addition, the presentation time flag field in the media file data box indicates whether the point cloud subframes have the same presentation duration, and the subsample duration field indicates the presentation duration of each subframe, so that display can be performed based on the presentation duration and the media presentation effect is ensured.

Regarding the implementation of the above embodiment in a specific application scenario, Figure 15 shows the syntax structure of the extended media file data box identifying subframe presentation time information in an application scenario according to the embodiment of the present application.

As shown in Figure 15, the media file data box in the embodiment of this application includes the following fields:

The subframe number field nb_subframes indicates the number of subframes corresponding to the current sample.

For the presentation time flag field with_unique_duration_flag, a value of 0 indicates that the multiple corresponding subframes in the sample have the same presentation duration. In this case, the presentation duration of each subframe can be calculated from the presentation duration of the sample itself and the number of subframes in the sample, and the value of the sample_delta field corresponding to the sample should be an integer multiple of the number of subframes. A value of 1 indicates that the multiple subframes included in the sample have different presentation durations.

The subframe index field subframe_index indicates the sequence number of the point cloud subframe.

The subsample duration field sub_sample_duration indicates the presentation duration of the corresponding subframe. The sum of the values of this field for all corresponding subframes in the sample should be equal to the value of the sample_delta field corresponding to the sample.
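
As an illustration of how a player might use these fields, the following Python sketch derives the start offset of each subframe inside its sample. The classes SubframeEntry and SubFrameConfig and the function presentation_offsets are hypothetical names, and the sketch assumes that subframes are presented in the order in which they are listed; the field names themselves follow Figure 15.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SubframeEntry:
    subframe_index: int
    sub_sample_duration: int


@dataclass
class SubFrameConfig:
    nb_subframes: int
    with_unique_duration_flag: int
    # Entries are only expected when with_unique_duration_flag == 1.
    entries: List[SubframeEntry] = field(default_factory=list)


def presentation_offsets(cfg: SubFrameConfig, sample_delta: int) -> List[Tuple[int, int]]:
    """Return (start_offset, duration) per subframe, relative to the sample start."""
    if cfg.with_unique_duration_flag == 0:
        if sample_delta % cfg.nb_subframes != 0:
            raise ValueError("sample_delta must be a multiple of nb_subframes")
        durations = [sample_delta // cfg.nb_subframes] * cfg.nb_subframes
    else:
        if len(cfg.entries) != cfg.nb_subframes:
            raise ValueError("expected one entry per subframe")
        durations = [e.sub_sample_duration for e in cfg.entries]
        if sum(durations) != sample_delta:
            raise ValueError("durations must sum to sample_delta")
    offsets = []
    start = 0
    for duration in durations:
        offsets.append((start, duration))
        start += duration  # the next subframe starts where this one ends
    return offsets


cfg = SubFrameConfig(2, 1, [SubframeEntry(1, 10), SubframeEntry(2, 20)])
timeline = presentation_offsets(cfg, sample_delta=30)
```

With these illustrative values, the first subframe starts at offset 0 and the second at offset 10 within the sample.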

Based on the solution of identifying subframe related information through media file data boxes provided in the above embodiments of the present application, the spatial information of the point cloud subframe can be obtained implicitly. When multiple sub-sample division methods coexist, the corresponding relationship between the point cloud sub-frame and the spatial block tile can be found based on the information carried in each sub-sample division method.

Figure 16 shows a structural block diagram of determining the spatial block correspondence based on three sub-sample division methods (data unit, spatial block, and point cloud subframe) in one embodiment of the present application. When the flag value of a sub-sample is flag=0, the sub-sample corresponds to a G-PCC unit; when the flag value is flag=1, the sub-sample corresponds to a point cloud spatial tile; and when the flag value is flag=2, the sub-sample corresponds to a sub-frame.

As shown in Figure 16, sub-sample division based on point cloud subframes yields two point cloud sub-samples: the first subframe, with subframe index=0, and the second subframe, with subframe index=1.

By performing sub-sample division based on spatial blocks, a first spatial block tile0 and a second spatial block tile1 corresponding to the first subframe, and a third spatial block tile2 corresponding to the second subframe may be determined.

By dividing subsamples based on data units, multiple data units corresponding to the first spatial block tile0 can be determined, such as, as shown in the figure, the geometric slice Geo slice0, the geometric slice Geo slice1, the color attribute slice Attr color slice0, the color attribute slice Attr color slice1, the frame index attribute slice Attr frameIdx slice0, and the frame index attribute slice Attr frameIdx slice1.

At the same time, multiple data units corresponding to the second spatial block tile1 can be determined, such as, as shown in the figure, the geometric slice Geo slice2, the geometric slice Geo slice3, the color attribute slice Attr color slice2, the color attribute slice Attr color slice3, the frame index attribute slice Attr frameIdx slice2, and the frame index attribute slice Attr frameIdx slice3.

In addition, multiple data units corresponding to the third spatial block tile2 can also be determined, such as, as shown in the figure, the geometric slice Geo slice4, the geometric slice Geo slice5, the color attribute slice Attr color slice4, the color attribute slice Attr color slice5, the frame index attribute slice Attr frameIdx slice4, and the frame index attribute slice Attr frameIdx slice5.

As shown in Figure 17, by performing subsample division based on spatial tiles, the first spatial tile tile0 and the second spatial tile tile1 can be determined.

By dividing subsamples based on data units, multiple data units corresponding to the first spatial block tile0 can be determined, such as, as shown in the figure, the geometric slice Geo slice0, the color attribute slice Attr color slice0 and the frame index attribute slice Attr frameIdx slice0 corresponding to the sub-frame sub-frame0, as well as the geometric slice Geo slice1, the color attribute slice Attr color slice1 and the frame index attribute slice Attr frameIdx slice1 corresponding to the sub-frames sub-frame0 and sub-frame1.

At the same time, multiple data units corresponding to the second spatial block tile1 can be determined, such as, as shown in the figure, the geometric slice Geo slice2, the color attribute slice Attr color slice2 and the frame index attribute slice Attr frameIdx slice2 corresponding to the sub-frames sub-frame0 and sub-frame1, as well as the geometric slice Geo slice3, the color attribute slice Attr color slice3 and the frame index attribute slice Attr frameIdx slice3 corresponding to the sub-frame sub-frame1.
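
The implicit derivation described for Figure 17 can be sketched in Python as follows. The sketch assumes that a reader has already collected, from the tile-based and data-unit-based subsample divisions, which data units belong to each tile and which subframes each data unit corresponds to; the function name tiles_per_subframe and the dictionary layout are hypothetical, and only the geometric slices of Figure 17 are used as example data.

```python
from collections import defaultdict
from typing import Dict, List, Set


def tiles_per_subframe(units_in_tile: Dict[int, List[str]],
                       subframes_of_unit: Dict[str, List[int]]) -> Dict[int, List[int]]:
    """Invert tile -> data units -> subframes into subframe -> tiles."""
    result: Dict[int, Set[int]] = defaultdict(set)
    for tile_id, units in units_in_tile.items():
        for unit in units:
            for subframe in subframes_of_unit[unit]:
                result[subframe].add(tile_id)
    return {sf: sorted(tiles) for sf, tiles in sorted(result.items())}


# Geometric slices of Figure 17: tile0 holds slices 0 and 1, tile1 holds slices 2 and 3.
units_in_tile = {0: ["geo_slice0", "geo_slice1"], 1: ["geo_slice2", "geo_slice3"]}
subframes_of_unit = {
    "geo_slice0": [0],
    "geo_slice1": [0, 1],
    "geo_slice2": [0, 1],
    "geo_slice3": [1],
}
mapping = tiles_per_subframe(units_in_tile, subframes_of_unit)
# mapping == {0: [0, 1], 1: [0, 1]}
```

In this example both subframes touch both tiles, which is the overlapping situation that the explicit indication described next is intended to make directly readable.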

In one embodiment of the present application, the correspondence between the point cloud subframes and the spatial blocks can also be explicitly indicated in the media file data box. In the embodiment of this application, the media file data box of the point cloud sample includes:

The spatial block flag field is used to indicate whether the point cloud subframe in the current sample corresponds to one or more different spatial blocks;

Subframe index field, used to indicate the index information of the current point cloud subframe;

The spatial block number field is used to indicate the number of spatial blocks corresponding to the current point cloud subframe; and

Space block identification field, used to indicate the identifier of the current space block.

In this embodiment, the spatial block flag field indicates whether the point cloud subframes correspond to spatial blocks, the subframe index field indicates the index information of the point cloud subframe, the spatial block number field indicates the number of spatial blocks corresponding to the point cloud subframe, and the spatial block identification field indicates the identifier of the current spatial block. Based on the information in these fields of the media file data box of the point cloud sample, the one or more point cloud subframes corresponding to each data unit in the point cloud sample can be jointly decoded as a combined frame, which reduces the waste of computing resources caused by individually decoding point cloud frames with little content.

Regarding the specific implementation of the above embodiments, Figure 18 shows the syntax structure of the embodiment of the present application for indicating the correspondence between point cloud subframes and spatial blocks through media file data boxes in an application scenario.

As shown in Figure 18, the media file data box in this embodiment of the present application may include the following fields:

For the spatial tile flag field with_tile_info_flag, a value of 1 indicates that the subframes in the current sample correspond to one or more different point cloud spatial tiles, and a value of 0 indicates that the subframes in the current sample cannot be divided according to point cloud spatial tiles.

The spatial tile number field num_tiles indicates the number of point cloud spatial tiles corresponding to the current point cloud subframe.

The spatial tile identification field tile_id indicates the identifier of the corresponding point cloud spatial tile.
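
A minimal Python sketch of how the explicit indication could be queried is given below. The classes SubframeTileEntry and SubframeTileInfo and the function subframes_covering_tile are hypothetical names introduced for illustration, and the entry values are illustrative; the field names follow Figure 18.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubframeTileEntry:
    subframe_index: int
    tile_ids: List[int]

    @property
    def num_tiles(self) -> int:
        # num_tiles is derived here; in the file it is a signalled field.
        return len(self.tile_ids)


@dataclass
class SubframeTileInfo:
    with_tile_info_flag: int
    entries: List[SubframeTileEntry]


def subframes_covering_tile(info: SubframeTileInfo, tile_id: int) -> List[int]:
    """Return the indexes of the subframes that correspond to the given tile."""
    if info.with_tile_info_flag == 0:
        # Subframes cannot be divided according to spatial tiles.
        return []
    return [e.subframe_index for e in info.entries if tile_id in e.tile_ids]


info = SubframeTileInfo(1, [SubframeTileEntry(1, [0, 1]), SubframeTileEntry(2, [1, 2])])
needed = subframes_covering_tile(info, tile_id=1)  # both subframes touch tile 1
```

A client interested in only one spatial region could use such a lookup to decide which subframes need to be extracted.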

Figure 19 shows a flow chart of the steps of the point cloud media encoding method in one embodiment of the present application. This method can be applied to electronic devices in the server, client, intermediate node and other links of the point cloud media system. The embodiment of the present application takes, as an example, a client device on which a point cloud encoding device is installed executing the point cloud media encoding method. As shown in Figure 19, the point cloud media encoding method includes the following steps S1910 to S1930.

In step S1910, point cloud source data is obtained, and the point cloud source data includes a point cloud frame having one or more point cloud subframes.

In step S1920, the point cloud frame is encoded to obtain at least one data unit.

In step S1930, the at least one data unit is encapsulated to obtain a point cloud media file. The point cloud media file includes point cloud samples encapsulated in one or more tracks; the media file data box of each subsample in the point cloud sample includes a subframe index field; the subframe index field is used to indicate the index information of one or more point cloud subframes corresponding to each data unit in the subsample; and when one data unit in the subsample corresponds to the index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data.
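
The encapsulation-side bookkeeping of step S1930 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names and the dictionary layout are hypothetical, and the input is assumed to be a list of encoded data units already paired with the subframe indexes they belong to. The field names related_subframe_num and subframe_index follow the syntax used elsewhere in this application.

```python
from typing import Dict, List, Tuple


def build_subframe_index_entries(units: List[Tuple[str, List[int]]]) -> List[Dict]:
    """Build one subframe index entry per data unit, in encapsulation order."""
    entries = []
    for name, indexes in units:
        if not indexes:
            raise ValueError(f"data unit {name} has no subframe index")
        unique = sorted(set(indexes))
        entries.append({
            "unit": name,
            "related_subframe_num": len(unique),
            "subframe_index": unique,
        })
    return entries


def overlapping_subframes(entries: List[Dict]) -> List[Tuple[int, int]]:
    """Return the pairs of subframes that share at least one data unit."""
    pairs = set()
    for entry in entries:
        idx = entry["subframe_index"]
        for i in range(len(idx)):
            for j in range(i + 1, len(idx)):
                pairs.add((idx[i], idx[j]))
    return sorted(pairs)


# Hypothetical data units: the second one belongs to subframes 1 and 2.
entries = build_subframe_index_entries([("geo0", [1]), ("geo1", [1, 2]), ("geo2", [2])])
shared = overlapping_subframes(entries)
```

Here subframes 1 and 2 are reported as overlapping because the data unit geo1 carries index information for both.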

Point cloud source data includes point cloud videos (images and/or videos) representing objects and/or environments located in various 3D spaces (eg, 3D spaces representing real environments, 3D spaces representing virtual environments, etc.).

In one embodiment of the present application, the data source may use one or more cameras (for example, an infrared camera capable of securing depth information, an RGB camera capable of extracting color information corresponding to the depth information, etc.), a projector (for example, an infrared pattern projector used to secure depth information), LiDAR and other acquisition devices to capture point cloud source data. The shape of the geometric structure composed of points in the 3D space can be extracted from the depth information of the point cloud source data, and the attributes of each point can be extracted from the color information of the point cloud source data, so as to secure the point cloud source data.

Taking point cloud video data as an example, a point cloud video can include one or more point cloud frames, and one point cloud frame can represent one frame of point cloud image. In one embodiment of the present application, point cloud video data may be captured based on at least one of inward-facing technology and outward-facing technology.

Inward-facing technology refers to a technology that captures images of a central object with one or more cameras (or camera sensors) arranged around the central object. Inward-facing techniques can be used to generate point cloud content that provides the user with 360-degree images of key objects (e.g., VR/AR content that provides the user with 360-degree images of key objects such as characters, players, objects, or actors).

Outward-facing technology refers to a technology that uses one or more cameras (or camera sensors) arranged around the central object to capture the environment of the central object rather than the image of the central object. Point cloud content that provides the surrounding environment as it appears from the user's perspective may be generated using outward-facing techniques (eg, content representing the external environment that may be provided to a user of a self-driving vehicle).

When point cloud content is generated based on capture operations from one or more cameras, the coordinate system differs from camera to camera. Therefore, the data source can calibrate the one or more cameras to set a global coordinate system prior to the capture operation. Additionally, the data source may generate point cloud content by compositing arbitrary images and/or videos with images and/or videos captured via the capture techniques described above. The data source may perform post-processing on the captured images and/or videos, which may, for example, remove unwanted areas (such as the background), identify the spaces to which the captured images and/or videos are connected, and fill spatial holes when such holes are present.

The data source can generate a piece of point cloud content by performing coordinate transformations on the points of the point cloud video secured from each camera. The data source can perform coordinate transformations on points based on the coordinates of each camera location. Therefore, the data source can generate a point cloud content that represents a broad spatial extent, or it can generate point cloud content with a high density of points.
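
The coordinate transformation mentioned above can be illustrated with a simplified Python sketch. It assumes, purely for illustration, that each camera pose is known from calibration as a rotation about the vertical axis (yaw) plus a position in the global coordinate system; a real capture setup would generally use a full rotation matrix. The function names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]


def camera_to_global(points: Iterable[Point], yaw_rad: float,
                     camera_position: Point) -> List[Point]:
    """Rotate points about the z axis by yaw_rad, then translate to the camera position."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = camera_position
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz) for x, y, z in points]


def merge_cameras(captures: Iterable[Tuple[List[Point], float, Point]]) -> List[Point]:
    """Bring the points of every camera into the global frame and concatenate them."""
    merged: List[Point] = []
    for points, yaw_rad, position in captures:
        merged.extend(camera_to_global(points, yaw_rad, position))
    return merged


# Two hypothetical cameras, each contributing one point.
cloud = merge_cameras([
    ([(1.0, 0.0, 0.0)], 0.0, (0.0, 0.0, 0.0)),
    ([(1.0, 0.0, 0.0)], math.pi / 2, (5.0, 0.0, 0.0)),
])
```

Merging the transformed points of several cameras is what allows the resulting content to cover a broader spatial extent or a higher point density than any single camera provides.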

In this embodiment, the correspondence between each data unit in the sub-sample and the index information of one or more point cloud sub-frames is indicated through the media file data box of each sub-sample in the point cloud sample, so that the one or more point cloud subframes corresponding to each data unit in the point cloud sample can be jointly encoded as a combined frame. On the one hand, this reduces the waste of computing resources caused by separately encoding point cloud frames with little content. On the other hand, point cloud subframes with overlapping point cloud data can be identified, which improves the coding efficiency of point cloud media.

Figure 20 shows a flow chart of encoding and decoding point cloud data in a streaming media transmission application scenario according to an embodiment of the present application. As shown in Figure 20, the server, as the data source for producing point cloud media files, can encode the point cloud data and send it to the user's client. After the client decodes the point cloud media files, the point cloud data can be obtained for consumption by the user. The specific point cloud data encoding and decoding process may include the following steps.

Step S2010: The server determines one or more subframes corresponding to each geometric slice according to the subframe index number corresponding to each geometric slice in the point cloud code stream.

If each geometric slice only corresponds to one subframe, that is, the point cloud slices contained in each subframe do not overlap with each other, then when encapsulating the subsamples of the point cloud subframe, use the subsample division method of flags=2 and indicate the corresponding subframe index number.

If there are geometric slices corresponding to multiple subframes, that is, the point cloud slices contained in each subframe overlap, then when encapsulating the subsamples of the point cloud subframes, use the subsample division method of flags=0 and indicate it through metadata The corresponding subframe index number.

Step S2020: The server encapsulates the point cloud code stream into a point cloud file, in which the point cloud subframes are divided and indicated in the form of subsamples.

The server's encapsulation of point cloud code streams can be single-track encapsulation or component-based multi-track encapsulation.

When performing single-track encapsulation, in the sub-sample division method with flags=2, the sub-samples contain all geometric and attribute information corresponding to the corresponding sub-frame. In the subsample division method with flags=0, subsamples of geometric data type, attribute data type, and parameter data type can all correspond to corresponding subframes.

When multi-track packaging is performed, in the sub-sample division method with flags=2, only the geometric track is divided into sub-samples, and the sub-samples contain all geometric information corresponding to the corresponding sub-frame. In the subsample division method with flags=0, only subsamples of geometric data type and parameter data type can be mapped to the corresponding subframe.

Step S2030: For the point cloud subframes existing in the file, indicate the presentation time information of the point cloud subframes contained in these samples.

Step S2040: For the spatial information of the point cloud subframe, indicate the corresponding relationship between the point cloud subframe and the tile.

Step S2050: The server transmits the point cloud file to the client.

Step S2060: When the client decapsulates and decodes the point cloud file, it extracts each point cloud subframe based on the information related to the point cloud subframe.

Step S2070: After the client reorders the point cloud sequence, it combines the presentation time information of the point cloud subframes for presentation.
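
The choice made in step S2010 between the two subsample division methods can be sketched in Python as follows. The function name choose_subsample_flags and the input dictionary are hypothetical; the sketch assumes that the server has already read the subframe index numbers of each geometric slice from the point cloud code stream.

```python
from typing import Dict, List


def choose_subsample_flags(subframes_of_slice: Dict[str, List[int]]) -> int:
    """Return 2 when no geometric slice is shared between subframes, otherwise 0."""
    for indexes in subframes_of_slice.values():
        if len(set(indexes)) > 1:
            # At least one slice belongs to several subframes: the subframes overlap,
            # so data-unit-based division (flags=0) with metadata is used.
            return 0
    # Every slice belongs to exactly one subframe: subframe-based division (flags=2).
    return 2


non_overlapping = choose_subsample_flags({"geo0": [1], "geo1": [2]})
overlapping = choose_subsample_flags({"geo0": [1], "geo1": [1, 2], "geo2": [2]})
```

With these illustrative inputs the first call selects flags=2 and the second selects flags=0.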

The embodiment of this application proposes a file encapsulation method for point cloud subframes for GPCC point cloud media. At the file encapsulation level, this file encapsulation method defines the way point cloud subframes are encapsulated in samples under different scenarios, indicates the identity and duration of point cloud subframes, and indicates the correspondence between point cloud subframes and point cloud spatial blocks. This application can more flexibly support the encapsulation of point cloud subframes in files, thereby supporting more application scenarios and maximizing the coding efficiency improvement brought by point cloud subframes.

An implementation scheme of the embodiment of the present application in an application scenario where point cloud sub-frames do not overlap is as follows.

(1) The server determines the subframe corresponding to each geometric slice based on the subframe index number corresponding to each geometric slice in the point cloud code stream.

Assuming that the point cloud slices contained in each subframe do not overlap with each other, when encapsulating subsamples of the point cloud subframe, the subsample division method of flags=2 is used and the corresponding subframe index number is indicated.

Server S1 encapsulates the point cloud code stream into point cloud file F1 in a single-track manner, and the file encapsulation result is shown in Figure 21. In samples where point cloud subframes exist, the point cloud subframes are divided and indicated in the form of subsamples. The flags field in the SubSampleInformationBox data box has a value of 2, and sub-sample0 and sub-sample1 indicate sub-frame indexes 1 and 2, respectively. Each point cloud sample, including point cloud samples Sample0, Sample1 and SampleN, corresponds to its own point cloud frame, while point cloud sample Sample2 corresponds to point cloud sub-frame sub-frame1 and point cloud sub-frame sub-frame2.

The server S2 encapsulates the point cloud code stream into a point cloud file F2 in a component-based multi-track manner. In this case, the geometric track is divided into sub-samples. For attribute tracks, optionally, the data information belonging to the attribute track can be found through the index relationship between the geometry track and the attribute track. The encapsulation result is shown in Figure 22. In the geometry track and the attribute track, each point cloud sample corresponds to its own point cloud frame. In each track, point cloud samples Sample0, Sample1 and SampleN all correspond to their respective point cloud frames. In the geometry track, point cloud sample Sample2 corresponds to point cloud sub-frame sub-frame1 and point cloud sub-frame sub-frame2, while in the attribute track, point cloud sample Sample2 still corresponds to a whole point cloud frame.

In addition, the attribute track can also be divided into sub-samples correspondingly, and the encapsulation result is shown in Figure 23. In the geometry track and attribute track, each point cloud sample corresponds to its own point cloud frame. In each track, point cloud sample Sample 0, point cloud sample Sample 1 and point cloud sample Sample N all correspond to the corresponding point cloud frame; In the geometry track and attribute track, the respective point cloud sample Sample2 corresponds to the point cloud sub-frame sub-frame1 and the point cloud sub-frame sub-frame2. In samples where point cloud subframes exist, the point cloud subframes are divided and indicated in the form of subsamples. The flags field in the SubSampleInformationBox data box has a value of 2, indicating that the sub-frame indexes are 1 and 2 in sub-sample0 and sub-sample1 respectively.

(2) For point cloud subframes that exist in the file, indicate the presentation time information of the point cloud subframes contained within these samples.

SubFrameConfigurationGroupEntry:

{nb_subframes=2;

with_unique_duration_flag=1;

{subframe_index=1;sub_sample_duration=10}

{subframe_index=2;sub_sample_duration=20}

}

Through the characteristics of the sample group itself, sample2 can be identified as a sample containing sub-frames, and through the fields provided in the embodiment of this application, the presentation duration of each sub-frame can be known: 10 and 20 units of timescale (defined by), respectively.

(3) For the spatial information of the point cloud subframe, indicate the correspondence between the point cloud subframe and the tile.

SubFrameConfigurationGroupEntry:

{nb_subframes=2;

with_tile_info_flag=1;

{subframe_index=1;num_tiles=2;{tile_id=0,1}}

{subframe_index=2;num_tiles=1;{tile_id=2}}

}

Through the characteristics of the sample group itself, sample2 can be identified as a sample containing sub-frames, and through the fields provided in the embodiment of this application, the tile id information corresponding to each sub-frame can be known.

Finally, combined with the association indication of tile id and spatial information, the spatial information corresponding to each subframe can be known.

(4) The server transmits the point cloud file to the client.

(5) When the client decapsulates and decodes the point cloud file, it extracts each point cloud subframe based on the information related to the point cloud subframes, reorders the point cloud sequence, and performs presentation in combination with the presentation time information and spatial information of the point cloud subframes.

In client implementation, optionally, the point cloud sequence can be reordered during the decapsulation stage and then decoded. It can also be decapsulated and decoded first, and then reordered according to the subframe information.
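
Step (3) above ends by combining tile ids with spatial information. A minimal Python sketch of that combination is given below. It assumes, for illustration only, that the spatial information of each tile is available as an axis-aligned bounding box; the function name subframe_bounding_box and all box coordinates are hypothetical, while the tile ids per subframe are those of the example entry in step (3).

```python
from typing import Dict, List, Tuple

Corner = Tuple[float, float, float]
Box = Tuple[Corner, Corner]  # (minimum corner, maximum corner)


def subframe_bounding_box(tile_ids: List[int], tile_boxes: Dict[int, Box]) -> Box:
    """Return the smallest axis-aligned box enclosing all tiles of one subframe."""
    if not tile_ids:
        raise ValueError("the subframe has no associated tile")
    mins = [tile_boxes[t][0] for t in tile_ids]
    maxs = [tile_boxes[t][1] for t in tile_ids]
    lo = tuple(min(corner[i] for corner in mins) for i in range(3))
    hi = tuple(max(corner[i] for corner in maxs) for i in range(3))
    return lo, hi


# Hypothetical tile boxes laid out side by side along the x axis.
tile_boxes = {
    0: ((0, 0, 0), (10, 10, 10)),
    1: ((10, 0, 0), (20, 10, 10)),
    2: ((20, 0, 0), (30, 10, 10)),
}
tiles_of_subframe = {1: [0, 1], 2: [2]}  # from the example entry in step (3)
boxes = {sf: subframe_bounding_box(ids, tile_boxes) for sf, ids in tiles_of_subframe.items()}
```

Under these assumed tile boxes, subframe 1 spans the first two tiles and subframe 2 spans only the third.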

An implementation scheme of the embodiment of this application in an application scenario where point cloud subframes overlap is as follows.

Assuming that the point cloud slices contained in each subframe overlap, when encapsulating subsamples of the point cloud subframes, the subsample division method with flags=0 is used for division.

Server S1 encapsulates the point cloud code stream into point cloud file F1 in a single-track manner. In samples with point cloud subframes, the flags field in the SubSampleInformationBox data box has a value of 0, and each sub-sample contains one G-PCC data unit. The encapsulation result is shown in Figure 24. Each point cloud sample, including point cloud samples Sample0, Sample1 and SampleN, corresponds to its own point cloud frame, while the sub-samples of point cloud sample Sample2 each correspond to a G-PCC unit.

Combined with the information in the SubsampleSubframeInfoGroupEntry data box, the subframe information to which each G-PCC data unit in each sub-sample in sample2 belongs can be indicated.

{subsample_count=4

{related_subframe_num=1; subframe_index=1}

{related_subframe_num=2; subframe_index=1, 2}

{related_subframe_num=1; subframe_index=2}

}

According to the order of each sub-sample in sample2, the sub-sample can be associated with the corresponding subframe information.
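
The order-based association described above can be sketched in Python as follows. The function name units_for_subframe and the unit labels are hypothetical; the sketch assumes three sub-samples whose subframe_index lists match the three entries listed above, and that the i-th entry applies to the i-th sub-sample of the sample.

```python
from typing import List


def units_for_subframe(subsamples: List[str],
                       subframe_indexes: List[List[int]],
                       target_subframe: int) -> List[str]:
    """Select the sub-samples (G-PCC data units) that belong to one subframe."""
    if len(subsamples) != len(subframe_indexes):
        raise ValueError("one subframe index entry is expected per sub-sample")
    return [unit for unit, indexes in zip(subsamples, subframe_indexes)
            if target_subframe in indexes]


subsamples = ["unit0", "unit1", "unit2"]      # hypothetical labels, in file order
subframe_indexes = [[1], [1, 2], [2]]         # subframe_index values of the entries
first = units_for_subframe(subsamples, subframe_indexes, 1)
second = units_for_subframe(subsamples, subframe_indexes, 2)
```

The data unit shared by both subframes is returned for each of them, which reflects the overlapping point cloud data of the two subframes.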

In multi-track mode, sub-sample division is handled in the same way as in the previous embodiment: sub-samples can be divided only in the geometry track, or in both the geometry track and the attribute tracks.

SubFrameConfigurationGroupEntry:

{nb_subframes=2;

with_unique_duration_flag=0;

}

Through the characteristics of the sample group itself, sample2 can be identified as a sample containing sub-frames, and through the fields provided in the embodiment of this application, it can be known that the presentation duration of each sub-frame is the same. Assuming that the duration of sample2 is 20 units of timescale (defined by), the duration of each sub-frame is 10 units of timescale.

SubFrameConfigurationGroupEntry:

{nb_subframes=2;

with_tile_info_flag=1;

{subframe_index=1;num_tiles=2;{tile_id=0,1}}

{subframe_index=2;num_tiles=2;{tile_id=1,2}}

}

Finally, combined with the correlation indication between tile id and spatial information, the spatial information corresponding to each subframe can be known.

(4) The server transmits the point cloud file to the client.

It should be noted that although the various steps of the methods in this application are described in a specific order in the drawings, this does not require or imply that these steps must be performed in that specific order, or that all of the steps shown must be performed to achieve the desired results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution, etc.

The following describes device embodiments of the present application, which can be used to perform the point cloud media encoding and decoding methods in the above embodiments of the present application. Figure 25 schematically shows a structural block diagram of a point cloud media decoding device provided by an embodiment of the present application. As shown in Figure 25, the point cloud media decoding device 2500 includes:

The acquisition module 2510 is configured to acquire point cloud media files, where the point cloud media files include point cloud samples encapsulated in one or more tracks;

The parsing module 2520 is configured to parse the media file data box of each sub-sample in the point cloud sample to obtain the value of the sub-sample flag field;

The index module 2530 is configured to obtain the index information of one or more point cloud subframes corresponding to each data unit in the subsample according to the value of the subsample flag field; when one data unit in the subsample corresponds to the index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data; and

The decoding module 2540 is configured to decapsulate and decode the point cloud media file according to the index information of the one or more point cloud subframes to obtain point cloud data.

Figure 26 schematically shows a structural block diagram of a point cloud media encoding device provided by an embodiment of the present application. As shown in Figure 26, the point cloud media encoding device 2600 includes:

The acquisition module 2610 is configured to acquire point cloud source data, where the point cloud source data includes a point cloud frame having one or more point cloud subframes;

The encoding module 2620 is configured to encode the point cloud frame to obtain at least one data unit; and

The encapsulating module 2630 is configured to encapsulate the at least one data unit to obtain a point cloud media file. The point cloud media file includes a point cloud sample encapsulated in one or more tracks; the media file data box of each subsample in the point cloud sample includes a subframe index field; the subframe index field is used to indicate the index information of one or more point cloud subframes corresponding to each data unit in the subsample; and when one data unit in the subsample corresponds to the index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data.

The specific details of the point cloud media encoding device and decoding device provided in each embodiment of the present application have been described in detail in the corresponding method embodiments, and will not be described again here.

Figure 27 schematically shows a block diagram of a computer system used to implement an electronic device according to an embodiment of the present application.

It should be noted that the computer system 2700 of the electronic device shown in FIG. 27 is only an example, and should not impose any restrictions on the functions and scope of use of the embodiments of the present application.

As shown in Figure 27, the computer system 2700 includes a central processing unit 2701 (Central Processing Unit, CPU), which can perform various appropriate actions and processes according to computer-readable instructions stored in a read-only memory 2702 (Read-Only Memory, ROM) or computer-readable instructions loaded from a storage portion 2708 into a random access memory 2703 (Random Access Memory, RAM). The random access memory 2703 also stores various computer-readable instructions and data required for system operation. The central processing unit 2701, the read-only memory 2702 and the random access memory 2703 are connected to each other through a bus 2704. An input/output interface 2705 (Input/Output interface, i.e., I/O interface) is also connected to the bus 2704.

The following components are connected to the input/output interface 2705: an input portion 2706 including a keyboard, a mouse, etc.; an output portion 2707 including a cathode ray tube (Cathode Ray Tube, CRT), a liquid crystal display (Liquid Crystal Display, LCD), etc., and a speaker, etc.; a storage portion 2708 including a hard disk, etc.; and a communication portion 2709 including a network interface card such as a LAN card, a modem, etc. The communication portion 2709 performs communication processing via a network such as the Internet. A drive 2710 is also connected to the input/output interface 2705 as needed. Removable media 2711, such as magnetic disks, optical disks, magneto-optical disks, semiconductor memories, etc., are mounted on the drive 2710 as needed, so that computer-readable instructions read therefrom are installed into the storage portion 2708 as needed.

In particular, according to embodiments of the present application, the processes described in each method flowchart may be implemented as computer-readable instructions. For example, embodiments of the present application include a computer program product including computer-readable instructions carried on a computer-readable medium, the computer-readable instructions including instruction code for performing the method illustrated in the flowchart. In such embodiments, the computer-readable instructions may be downloaded and installed from a network via the communication section 2709 and/or installed from the removable media 2711. When the computer-readable instructions are executed by the central processing unit 2701, the various functions defined in the system of the present application are performed.

It should be noted that the computer-readable medium shown in the embodiments of the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM), flash memory, optical fiber, portable compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. As used herein, a computer-readable storage medium may be any tangible medium that contains or stores computer-readable instructions that may be used by or in connection with an instruction execution system, apparatus, or device. In this application, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit computer-readable instructions for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wireless, wired, etc., or any suitable combination of the above.

The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved. It will also be noted that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.

It should be noted that although several modules or units of equipment for action execution are mentioned in the above detailed description, this division is not mandatory. In fact, according to the embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided into being embodied by multiple modules or units.

Through the above description of the embodiments, those skilled in the art can easily understand that the example embodiments described here can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a portable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which can be a personal computer, a server, a touch terminal, a network device, etc.) to execute the method according to the embodiments of the present application.

Other embodiments of the present application will be readily apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary technical means in the technical field that are not disclosed in this application.

It is to be understood that the present application is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

A method for decoding point cloud media, executed by an electronic device, characterized by including:

Obtaining a point cloud media file, the point cloud media file including point cloud samples encapsulated in one or more tracks;

Parsing the media file data box of each sub-sample in the point cloud samples to obtain the value of a sub-sample flag field;

Obtaining, according to the value of the sub-sample flag field, the index information of one or more point cloud subframes corresponding to each data unit in the sub-sample, wherein when one data unit in the sub-sample corresponds to the index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data; and

Decapsulating and decoding the point cloud media file according to the index information of the one or more point cloud subframes to obtain point cloud data.
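The index-gathering step of the above method can be sketched as follows. This is a minimal illustration under stated assumptions, not the file format itself: the `DataUnit` and `SubSample` classes, their field names, and the helper functions are hypothetical, and the subframe indices are assumed to have already been parsed from the media file data box.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataUnit:
    payload: bytes
    subframe_indices: list  # hypothetical: indices recovered from the data box


@dataclass
class SubSample:
    flags: int  # value of the sub-sample flag field
    data_units: list = field(default_factory=list)


def group_by_subframe(subsamples):
    """Map each subframe index to the payloads of the data units that reference it."""
    groups = defaultdict(list)
    for sub in subsamples:
        for unit in sub.data_units:
            for index in unit.subframe_indices:
                groups[index].append(unit.payload)
    return dict(groups)


def shared_units(subsamples):
    """Payloads referenced by two or more subframes, i.e. overlapping point cloud data."""
    return [
        unit.payload
        for sub in subsamples
        for unit in sub.data_units
        if len(unit.subframe_indices) >= 2
    ]
```

A data unit that lists two subframe indices appears in both groups, which is how the sketch models subframes with overlapping point cloud data.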
The decoding method of point cloud media according to claim 1, characterized in that the sub-sample flag field is also used to indicate the division method of the sub-samples; the division method of each sub-sample in the point cloud sample includes:

When the value of the sub-sample flag field is a first value, dividing the sub-samples based on the data unit, so that one sub-sample contains one data unit;

When the value of the sub-sample flag field is a second value, dividing the sub-samples based on the spatial block, so that one sub-sample contains one or more consecutive data units corresponding to a first division object, the first division object including at least one of a spatial block, a parameter set, spatial block set information, or a frame boundary identifier; and

When the value of the sub-sample flag field is a third value, dividing the sub-samples based on the point cloud subframe, so that one sub-sample contains one or more consecutive data units corresponding to a second division object, the second division object including a complete point cloud subframe.
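The three division methods can be sketched as follows. The numeric flag values 0, 1 and 2, the `Division` enum, and the tuple layout of a data unit are assumptions made only for this illustration; the claim speaks only of a first, second and third value. The sketch also simplifies the second mode to spatial blocks alone, leaving out parameter sets, spatial block set information and frame boundary identifiers.

```python
from enum import IntEnum
from itertools import groupby


class Division(IntEnum):
    DATA_UNIT = 0  # hypothetical "first value": one data unit per sub-sample
    TILE = 1       # hypothetical "second value": division by spatial block
    SUBFRAME = 2   # hypothetical "third value": division by point cloud subframe


def divide(units, mode):
    """Split data units into sub-samples.

    units: list of (tile_id, subframe_index, payload) tuples in decoding order.
    Returns a list of sub-samples, each a list of consecutive data units.
    """
    if mode == Division.DATA_UNIT:
        return [[unit] for unit in units]
    if mode == Division.TILE:
        key = lambda unit: unit[0]
    else:
        key = lambda unit: unit[1]
    # groupby only merges adjacent items, matching "consecutive data units".
    return [list(run) for _, run in groupby(units, key=key)]
```

Because `groupby` never merges non-adjacent runs, a spatial block or subframe whose data units are interleaved with others would yield several sub-samples in this sketch.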
The decoding method of point cloud media according to claim 2, characterized in that when the value of the sub-sample flag field is a third value, the media file data box of the sub-sample includes:

Subframe index field, the subframe index field is used to indicate the index information of the point cloud subframe contained in the current subsample.
The decoding method of point cloud media according to claim 1, characterized in that the sub-sample flag field is also used to indicate the division method of the sub-samples; the division method of each sub-sample in the point cloud sample includes:

When the value of the sub-sample flag field is a first value, dividing the sub-samples based on the data unit, so that one sub-sample contains one data unit;

When the value of the sub-sample flag field is a second value, dividing the sub-samples based on the spatial block, so that one sub-sample contains one or more consecutive data units corresponding to a first division object, the first division object including at least one of a spatial block, a parameter set, spatial block set information, or a frame boundary identifier; and

When the value of the sub-sample flag field is a third value, dividing the sub-samples based on the point cloud subframe, so that one sub-sample contains one or more consecutive data units corresponding to a second division object, the second division object including one or more point cloud subframes.
The decoding method of point cloud media according to claim 4, characterized in that when the value of the sub-sample flag field is a third value, the media file data box of the sub-sample includes:

The subframe complete flag field is used to indicate whether the current subsample contains all data that constitutes the point cloud subframe;

The number of subframes field is used to indicate the number of point cloud subframes corresponding to the current subsample; and

The subframe index field is used to indicate the index information of the point cloud subframe corresponding to the current subsample.
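As an illustration of how these three fields might be read, the following sketch parses them from a byte buffer. The layout is an assumption: the claim does not give bit widths, so the sketch uses one byte for the subframe complete flag, one byte for the number of subframes, and a big-endian 16-bit value for each subframe index. The function name `parse_subframe_info` is hypothetical.

```python
import struct


def parse_subframe_info(buf: bytes) -> dict:
    """Parse (complete flag, subframe count, subframe indices) from an assumed layout."""
    # Assumed layout: uint8 complete flag, uint8 count, then count * uint16 indices.
    complete_flag, count = struct.unpack_from(">BB", buf, 0)
    indices = list(struct.unpack_from(f">{count}H", buf, 2))
    return {"complete": bool(complete_flag), "subframe_indices": indices}
```

A real parser would take the field widths from the file format specification rather than from this assumed layout.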
The decoding method of point cloud media according to claim 5, characterized in that when the point cloud samples are encapsulated in one track, all data constituting the point cloud subframe includes all geometric data and all attribute data; when the point cloud samples are encapsulated in multiple tracks, all data constituting the point cloud subframe includes all geometric data or all attribute data.
The decoding method of point cloud media according to claim 1, characterized in that when the value of the sub-sample flag field is a first value, the media file data box of the point cloud sub-frame includes:

The related subframe number field is used to indicate the number of point cloud subframes corresponding to the current subsample; and

The subframe index field is used to indicate the index information of the point cloud subframe corresponding to the current subsample.
The decoding method of point cloud media according to claim 7, characterized in that when the value of the sub-sample flag field is a first value, the media file data box of the point cloud sub-frame further includes:

Number of subsamples field, used to indicate the number of subsamples contained in the current sample.
The decoding method of point cloud media according to claim 8, characterized in that when the value of the sub-sample flag field is a first value, the media file data box of the point cloud sub-frame further includes:

The subframe-related sample number field is used to indicate the number of point cloud samples that contain multiple point cloud subframes; and

The sample serial number difference field is used to indicate the serial number difference between the current point cloud sample containing multiple point cloud subframes and the previous point cloud sample containing multiple point cloud subframes in the decoding order.
The decoding method of point cloud media according to any one of claims 1 to 9, characterized in that when the sub-samples in the point cloud sample are divided based on point cloud subframes, the media file data box of the sub-sample includes:

The presentation time flag field is used to indicate whether each point cloud subframe included in the point cloud sample has the same presentation duration; and

The subsample duration field is used to indicate the presentation duration of the current subsample when each point cloud subframe included in the point cloud sample has different presentation durations.
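The interplay of the presentation time flag and the subsample duration field can be sketched as follows. The function name and parameters are hypothetical, and one point is an assumption beyond the claim text: when all subframes share the same presentation duration, the sketch divides the sample duration evenly among them.

```python
def subframe_durations(sample_duration, n_subframes, same_duration, subsample_durations=None):
    """Return the presentation duration of each point cloud subframe in a sample."""
    if same_duration:
        # Assumption: equal durations are an even split of the sample duration.
        return [sample_duration / n_subframes] * n_subframes
    if subsample_durations is None or len(subsample_durations) != n_subframes:
        raise ValueError("one subsample duration is required per subframe")
    # Durations differ, so each comes from its own subsample duration field.
    return list(subsample_durations)
```

For example, a sample lasting 90 time units with three equal subframes gives 30 units each, while unequal subframes simply take the signalled values.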
The method for decoding point cloud media according to any one of claims 1 to 9, characterized in that the media file data box of the point cloud sample includes:

The number of subframes field is used to indicate the number of point cloud subframes contained in the current point cloud sample;

The presentation time flag field is used to indicate whether each point cloud subframe included in the point cloud sample has the same presentation duration;

The subframe index field is used to indicate the index information of the point cloud subframe corresponding to the current subsample when each point cloud subframe included in the point cloud sample has different presentation duration; and

The subsample duration field is used to indicate the presentation duration of the current subsample when each point cloud subframe included in the point cloud sample has different presentation durations.
The method for decoding point cloud media according to any one of claims 1 to 9, characterized in that the media file data box of the point cloud sample includes:

The spatial block flag field is used to indicate whether the point cloud subframe in the current sample corresponds to one or more different spatial blocks;

Subframe index field, used to indicate the index information of the current point cloud subframe;

The spatial block number field is used to indicate the number of spatial blocks corresponding to the current point cloud subframe; and

Spatial block identification field, used to indicate the identifier of the current spatial block.
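One possible use of these fields is selecting the subframes needed for a region of interest. The sketch below assumes the subframe index, spatial block number and spatial block identification fields have already been parsed into a dictionary that maps each subframe index to its spatial block identifiers. The function name and the dictionary shape are hypothetical.

```python
def subframes_for_tiles(subframe_to_tiles, wanted_tiles):
    """Return the sorted subframe indices that cover at least one wanted spatial block."""
    wanted = set(wanted_tiles)
    return sorted(
        index
        for index, tiles in subframe_to_tiles.items()
        if wanted & set(tiles)
    )
```

With a mapping of `{0: [1, 2], 1: [3], 2: [2, 3]}`, a request for spatial block 3 selects subframes 1 and 2, so a decoder could skip the data units belonging only to subframe 0.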
A point cloud media encoding method, executed by electronic equipment, is characterized by including:

Obtaining point cloud source data, the point cloud source data includes a point cloud frame having one or more point cloud subframes;

Encoding the point cloud frame to obtain at least one data unit; and

Encapsulating the at least one data unit to obtain a point cloud media file, the point cloud media file including point cloud samples encapsulated in one or more tracks; the media file data box of each sub-sample in the point cloud sample includes a subframe index field; the subframe index field is used to indicate index information of one or more point cloud subframes corresponding to each data unit in the sub-sample; when one data unit in the sub-sample corresponds to the index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data.
A point cloud media decoding device, characterized by including:

An acquisition module configured to acquire point cloud media files, where the point cloud media files include point cloud samples encapsulated in one or more tracks;

The parsing module is configured to parse the media file data box of each sub-sample in the point cloud sample and obtain the value of the sub-sample flag field;

An index module configured to obtain the index information of one or more point cloud subframes corresponding to each data unit in the sub-sample according to the value of the sub-sample flag field; when one data unit in the sub-sample corresponds to the index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data; and

The decoding module is configured to decapsulate and decode the point cloud media file according to the index information of the one or more point cloud subframes to obtain point cloud data.
A point cloud media encoding device, characterized by including:

An acquisition module configured to acquire point cloud source data, where the point cloud source data includes a point cloud frame having one or more point cloud subframes;

An encoding module configured to encode the point cloud frame to obtain at least one data unit; and

An encapsulation module configured to encapsulate the at least one data unit to obtain a point cloud media file, where the point cloud media file includes point cloud samples encapsulated in one or more tracks; the media file data box of each sub-sample in the point cloud sample includes a subframe index field; the subframe index field is used to indicate the index information of one or more point cloud subframes corresponding to each data unit in the sub-sample; when one data unit in the sub-sample corresponds to the index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data.
A computer-readable medium, characterized in that computer-readable instructions are stored on the computer-readable medium, and when the computer-readable instructions are executed by a processor, the method of any one of claims 1 to 13 is implemented.
An electronic device, characterized by including:

A processor; and

A memory for storing computer-readable instructions for the processor;

Wherein the processor is configured to cause the electronic device to perform the method of any one of claims 1 to 13 by executing the computer-readable instructions.
A computer program product comprising computer readable instructions, characterized in that when the computer readable instructions are executed by a processor, the method of any one of claims 1 to 13 is implemented.