CN114697668B - Encoding and decoding method of point cloud media and related products


Info

Publication number
CN114697668B
Authority
CN
China
Prior art keywords
point cloud
sub
sample
data
subframe
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210428152.6A
Other languages
Chinese (zh)
Other versions
CN114697668A (en)
Inventor
胡颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310890442.7A priority Critical patent/CN116744007A/en
Priority to CN202210428152.6A priority patent/CN114697668B/en
Publication of CN114697668A publication Critical patent/CN114697668A/en
Priority to PCT/CN2022/137764 priority patent/WO2023202095A1/en
Application granted granted Critical
Publication of CN114697668B publication Critical patent/CN114697668B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/184Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation

Abstract

The application belongs to the technical field of audio and video, and particularly relates to an encoding method of point cloud media and related products. The encoding method in the application comprises the following steps: acquiring a point cloud media file, wherein the point cloud media file comprises point cloud samples encapsulated in one or more tracks; parsing the media file data box of each sub-sample in the point cloud sample to obtain the value of a sub-sample flag bit field, wherein the sub-sample flag bit field is used for indicating the division mode of the sub-sample; acquiring index information of one or more point cloud subframes corresponding to each data unit in the sub-sample according to the value of the sub-sample flag bit field, wherein, when one data unit in the sub-sample corresponds to index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data; and de-encapsulating and decoding the point cloud media file according to the index information of the one or more point cloud subframes to obtain the point cloud data. The method and the device can avoid wasting computing resources and improve the processing efficiency of point cloud media.

Description

Encoding and decoding method of point cloud media and related products
Technical Field
The application belongs to the technical field of audio and video, and particularly relates to a point cloud media encoding method, a point cloud media decoding method, a point cloud media encoding device, a point cloud media decoding device, a computer readable medium, electronic equipment and a computer program product.
Background
A point cloud is a set of irregularly distributed discrete points in space that represent the spatial structure and surface attributes of a three-dimensional object or scene. After large-scale point cloud data is acquired by point cloud acquisition equipment, the point cloud data can be encoded and encapsulated for transmission, decoding, and presentation to a user. Some point cloud frames in point cloud media contain little content, and overlapping point cloud content exists among some point cloud frames, so encoding and decoding each point cloud frame independently wastes computing resources and reduces the processing efficiency of the point cloud media.
Disclosure of Invention
The application provides a method for encoding point cloud media, a method for decoding point cloud media, a device for encoding point cloud media, a device for decoding point cloud media, a computer readable medium, electronic equipment and a computer program product, and aims to avoid wasting computing resources and improve processing efficiency of point cloud media.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned in part by the practice of the application.
According to an aspect of an embodiment of the present application, there is provided a method for decoding point cloud media, including:
acquiring a point cloud media file, wherein the point cloud media file comprises point cloud samples encapsulated in one or more tracks;
parsing a media file data box of each sub-sample in the point cloud sample to obtain a value of a sub-sample flag bit field, wherein the sub-sample flag bit field is used for indicating a division mode of the sub-sample;
acquiring index information of one or more point cloud subframes corresponding to each data unit in the sub-sample according to the value of the sub-sample flag bit field, wherein, when one data unit in the sub-sample corresponds to index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data;
and de-encapsulating and decoding the point cloud media file according to the index information of the one or more point cloud subframes to obtain point cloud data.
According to an aspect of an embodiment of the present application, there is provided a method for encoding point cloud media, including:
acquiring point cloud source data, wherein the point cloud source data comprises a point cloud frame with one or more point cloud subframes;
encoding the point cloud frame to obtain at least one data unit;
encapsulating the at least one data unit to obtain a point cloud media file, wherein the point cloud media file comprises point cloud samples encapsulated in one or more tracks; the media file data box of each sub-sample in the point cloud sample comprises a sub-sample flag bit field and a subframe index field; the sub-sample flag bit field is used for indicating a division mode of the sub-sample, and the subframe index field is used for indicating index information of one or more point cloud subframes corresponding to each data unit in the sub-sample; when one data unit in the sub-sample corresponds to index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data.
According to an aspect of an embodiment of the present application, there is provided a decoding apparatus for point cloud media, including:
an acquisition module configured to acquire a point cloud media file comprising point cloud samples encapsulated in one or more tracks;
an analysis module configured to parse the media file data box of each sub-sample in the point cloud sample to obtain the value of a sub-sample flag bit field, wherein the sub-sample flag bit field is used for indicating the division mode of the sub-sample;
an index module configured to acquire index information of one or more point cloud subframes corresponding to each data unit in the sub-sample according to the value of the sub-sample flag bit field, wherein, when one data unit in the sub-sample corresponds to index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data;
and a decoding module configured to de-encapsulate and decode the point cloud media file according to the index information of the one or more point cloud subframes to obtain point cloud data.
According to an aspect of an embodiment of the present application, there is provided an encoding apparatus for point cloud media, including:
An acquisition module configured to acquire point cloud source data, the point cloud source data comprising a point cloud frame having one or more point cloud subframes;
the encoding module is configured to encode the point cloud frame to obtain at least one data unit;
an encapsulation module configured to encapsulate the at least one data unit to obtain a point cloud media file, wherein the point cloud media file comprises point cloud samples encapsulated in one or more tracks; the media file data box of each sub-sample in the point cloud sample comprises a sub-sample flag bit field and a subframe index field; the sub-sample flag bit field is used for indicating a division mode of the sub-sample, and the subframe index field is used for indicating index information of one or more point cloud subframes corresponding to each data unit in the sub-sample; when one data unit in the sub-sample corresponds to index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data.
According to an aspect of the embodiments of the present application, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a method of encoding and decoding point cloud media as in the above technical solution.
According to an aspect of the embodiments of the present application, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of encoding and decoding point cloud media as in the above technical solution via execution of the executable instructions.
According to an aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the encoding and decoding method of the point cloud medium as in the above technical solution.
According to the technical scheme provided by the embodiments of the application, the media file data box of each sub-sample in the point cloud sample can indicate the division mode of the sub-sample and the correspondence between each data unit in the sub-sample and the index information of one or more point cloud subframes. One or more point cloud subframes corresponding to each data unit in the point cloud sample can therefore be jointly encoded as a combined frame. On one hand, this reduces the waste of computing resources caused by independently encoding point cloud frames with little content; on the other hand, point cloud subframes with overlapping point cloud data can be identified, improving the encoding and decoding efficiency of the point cloud media.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
Fig. 1 schematically shows a block diagram of an exemplary system architecture to which the technical solution of the present application is applied.
Fig. 2 shows a schematic diagram of a point cloud media codec flow in an application scenario according to an embodiment of the present application.
Fig. 3 illustrates a syntax structure of encapsulating point cloud samples based on TLV code stream format in one embodiment of the present application.
Fig. 4 shows a syntax structure of a data unit encapsulated based on a TLV code stream format in one embodiment of the present application.
Fig. 5 is a schematic diagram of multi-frame combining of point cloud data according to an embodiment of the present application.
Fig. 6 is a flowchart illustrating steps of a method for decoding point cloud media in an embodiment of the present application.
Fig. 7 illustrates an exemplary structure for encapsulating a point cloud sample in a single track in one embodiment of the present application.
Fig. 8 illustrates an exemplary structure of encapsulating geometric and attribute code streams in multiple tracks according to one embodiment of the present application.
Fig. 9 shows a syntax structure of the coding-related parameter field codec_specific_parameters of the SubSampleInformationBox in an application scenario according to an embodiment of the present application.
Fig. 10 shows a syntax structure of the coding-related parameter field codec_specific_parameters of the SubSampleInformationBox in an application scenario that uniformly identifies overlapping and non-overlapping subframes in an embodiment of the present application.
Fig. 11 illustrates the syntax structure of the extended sample group tool in one application scenario according to an embodiment of the present application.
Fig. 12 shows a syntax structure for identifying point cloud subframe related information based on a sample-level media file data box in an application scenario according to an embodiment of the present application.
Fig. 13 shows a syntax structure for identifying subframe related information through a sub-sample subframe information data box in an application scenario according to an embodiment of the present application.
Fig. 14 shows a syntax structure of the sub-sample subframe information data box for identifying subframe presentation time information in an application scenario according to an embodiment of the present application.
Fig. 15 shows a syntax structure for identifying subframe presentation time information by extending a media file data box in one application scenario according to an embodiment of the present application.
Fig. 16 is a block diagram of determining spatial tile correspondences based on three sub-sample division modes (data unit, spatial tile, and point cloud subframe) in an embodiment of the present application.
Fig. 17 is a block diagram of determining spatial tile correspondences based on two sub-sample division modes (data unit and spatial tile) in an embodiment of the present application.
Fig. 18 shows a syntax structure of indicating correspondence between point cloud subframes and spatial tiles through a media file data box in one application scenario according to an embodiment of the present application.
Fig. 19 is a flowchart illustrating steps of a method for encoding point cloud media according to an embodiment of the present application.
Fig. 20 shows a flowchart of performing point cloud data encoding and decoding in a streaming media transmission application scenario according to an embodiment of the present application.
Fig. 21 shows the encapsulation result of single-track encapsulation of a point cloud sample in an application scenario in which point cloud subframes do not overlap with each other in an embodiment of the present application.
Fig. 22 shows the encapsulation result when the attribute track is not divided into sub-samples during multi-track encapsulation of a point cloud sample in an application scenario in which point cloud subframes do not overlap with each other in an embodiment of the present application.
Fig. 23 shows the encapsulation result when the attribute track is divided into sub-samples during multi-track encapsulation of a point cloud sample in an application scenario in which point cloud subframes do not overlap with each other in an embodiment of the present application.
Fig. 24 shows the encapsulation result of single-track encapsulation of a point cloud sample in an application scenario in which overlapping point cloud subframes exist in an embodiment of the present application.
Fig. 25 schematically shows a block diagram of a decoding device for point cloud media according to an embodiment of the present application.
Fig. 26 schematically shows a block diagram of a point cloud media encoding apparatus according to an embodiment of the present application.
Fig. 27 schematically illustrates a block diagram of a computer system suitable for use in implementing embodiments of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present application. One skilled in the relevant art will recognize, however, that the aspects of the application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
References herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A exists alone, both A and B exist, or B exists alone. The character "/" generally indicates that the associated objects are in an "or" relationship.
In the specific embodiments of the present application, when the embodiments are applied to specific products or technologies, user permission or consent must be obtained for data related to point cloud media, such as transmitted content, decoded content, and consumed content, and the collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Related terms or abbreviations involved in the embodiments of the present application are explained as follows.
Immersive media: immersive media can be classified into 3DoF media, 3DoF+ media, and 6DoF media according to the degrees of freedom available to the user when consuming the media content. Point cloud media is a typical 6DoF media.
DoF: Degree of Freedom. In the present application, it refers to the degrees of freedom a user has when viewing immersive media and interacting with its content.
3DoF: three degrees of freedom, referring to the three degrees of freedom of the user's head rotating about the x, y, and z axes.
3DoF+: on the basis of three degrees of freedom, the user also has a limited degree of freedom of movement along the x, y, and z axes.
6DoF: on the basis of three degrees of freedom, the user also has full freedom of movement along the x, y, and z axes.
Point cloud: a point cloud is a set of irregularly distributed discrete points in space that represent the spatial structure and surface attributes of a three-dimensional object or scene. Each point in the point cloud has at least three-dimensional position information, and may also have color, material, or other information depending on the application scenario. Typically, each point in the point cloud has the same number of additional attributes.
PCC: point Cloud Compression, point cloud compression.
G-PCC: geometry-based Point Cloud Compression, point cloud compression based on geometric model.
Sample: samples, a unit of encapsulation in the media file encapsulation process, a media file is composed of a plurality of samples. Taking video media as an example, one sample of video media is typically one video frame.
Slice: point cloud slice/point Yun Tiao represents a set of syntax elements (e.g., geometric slice, attribute slice) of partially or fully encoded point cloud frame data.
Tile: the point cloud space is partitioned.
DASH: dynamic adaptive streaming over HTTP dynamic adaptive streaming over HTTP is an adaptive bit rate streaming technique that allows high quality streaming media to be delivered over the internet via a conventional HTTP web server.
Point cloud media can further be classified into point cloud media compressed based on conventional video encoding (Video-based Point Cloud Compression, VPCC) and point cloud media compressed based on geometric features (Geometry-based Point Cloud Compression, GPCC). In the file encapsulation of point cloud media, the three-dimensional position information is generally called the geometry component (Geometry Component) of the point cloud media file, and the attribute information is called the attribute component (Attribute Component) of the point cloud media file. A point cloud media file has only one geometry component, but may have one or more attribute components.
Because a point cloud can flexibly and conveniently express the spatial structure and surface attributes of a three-dimensional object or scene, it is widely used, and its main application scenarios fall into two categories. 1) Machine-perception point clouds, such as computer-aided design (Computer Aided Design, CAD), autonomous navigation systems (Autonomous Navigation System, ANS), real-time inspection systems, geographic information systems (Geography Information System, GIS), vision-based sorting robots, and rescue and relief robots. 2) Human-perception point clouds, such as virtual reality (VR) games, digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive communication, and three-dimensional immersive interaction.
Point clouds are mainly acquired in the following ways: computer generation, 3D laser scanning, 3D photogrammetry, and the like. A computer can generate point clouds of virtual three-dimensional objects and scenes. 3D laser scanning can obtain point clouds of static real-world three-dimensional objects or scenes, acquiring millions of points per second. 3D photogrammetry can obtain point clouds of dynamic real-world three-dimensional objects or scenes, acquiring tens of millions of points per second. In addition, in the medical field, point clouds of biological tissues and organs can be obtained from MRI, CT, and electromagnetic localization information. These technologies reduce the acquisition cost and time period of point cloud data and improve the accuracy of the data. The transformation of point cloud data acquisition methods makes the acquisition of large amounts of point cloud data possible. With the continuous accumulation of large-scale point cloud data, efficient storage, transmission, release, sharing, and standardization of point cloud data have become key to point cloud applications.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application may be applied.
As shown in fig. 1, the system architecture 100 includes a plurality of terminal devices that can communicate with each other through, for example, a network 150. For example, the system architecture 100 may include a first terminal device 110 and a second terminal device 120 interconnected by a network 150. In the embodiment of fig. 1, the first terminal apparatus 110 and the second terminal apparatus 120 perform unidirectional data transmission.
For example, the first terminal device 110 may encode point cloud data (e.g., point cloud data collected by the terminal device 110) for transmission over the network 150 to the second terminal device 120. The encoded point cloud data is transmitted in one or more encoded point cloud code streams. The second terminal device 120 may receive the encoded point cloud data from the network 150, decode the encoded point cloud data to recover the point cloud data, and display the point cloud content according to the recovered point cloud data.
In one embodiment of the present application, the system architecture 100 may include a third terminal device 130 and a fourth terminal device 140 that perform bi-directional transmission of encoded point cloud data, such as may occur during a video conference. For bi-directional data transmission, each of the third terminal device 130 and the fourth terminal device 140 may encode point cloud data (e.g., point cloud data collected by the terminal device) for transmission to the other of the third terminal device 130 and the fourth terminal device 140 over the network 150. Each of the third terminal device 130 and the fourth terminal device 140 may also receive encoded point cloud data transmitted by the other of the third terminal device 130 and the fourth terminal device 140, and may decode the encoded point cloud data to recover the point cloud data, and may display the point cloud content on an accessible display device according to the recovered point cloud data.
In the embodiment of fig. 1, the first, second, third and fourth terminal apparatuses 110, 120, 130 and 140 may be servers, personal computers and smart phones, but the principles disclosed herein may not be limited thereto. Embodiments disclosed herein are applicable to laptop computers, tablet computers, media players, and/or dedicated video conferencing devices. Network 150 represents any number of networks, including wired and/or wireless communication networks, for example, that communicate encoded point cloud data between first terminal device 110, second terminal device 120, third terminal device 130, and fourth terminal device 140. The communication network 150 may exchange data in circuit-switched and/or packet-switched channels. The network may include a telecommunications network, a local area network, a wide area network, and/or the internet. For the purposes of this application, the architecture and topology of network 150 may be irrelevant to the operation disclosed herein, unless explained below.
The server in the embodiment of the application may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server for providing cloud computing service. The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, a smart television, etc. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
After the point cloud media is encoded, the encoded data stream needs to be encapsulated and transmitted to the user. Correspondingly, at the point cloud media player end, the point cloud file needs to be unpacked, then decoded, and finally the decoded data stream is presented.
Fig. 2 shows a schematic diagram of a point cloud media codec flow in an application scenario according to an embodiment of the present application.
A real-world visual scene A may be captured as point cloud data by an acquisition device 210, which may be, for example, a set of cameras or a camera device with multiple lenses and sensors. The acquisition result is point cloud source data B, a frame sequence consisting of a large number of point cloud frames. One or more point cloud frames may be encoded by the encoder 220 to obtain an encoded G-PCC bit stream, which may specifically include an encoded geometry bit stream and an attribute bit stream E. The file encapsulator 230 can encapsulate one or more coded bit streams according to a particular media container file format to obtain a media file F for file playback, or a series of initialization segments and media segments Fs for streaming. In some embodiments of the present application, the media container file format may be, for example, the ISO base media file format specified in ISO/IEC 14496-12 [ISOBMFF]. The file encapsulator 230 may also encapsulate the metadata in the media file F or the media segments Fs.
The media file F output by the file encapsulator 230 is identical to the media file F' input to the file decapsulator 240. The file decapsulator may extract the encoded bit stream E' and parse the metadata by processing the media file F' or the received media segments F's. The decoder 250 may decode the G-PCC bit stream into a decoded signal D' and generate point cloud data from the decoded signal D'. Where applicable, the point cloud data may be rendered by a renderer 260 and displayed on the screen of a head-mounted display or any other display device based on the current viewing position, viewing direction, or viewport determined by various types of sensors (e.g., head tracking sensors). In addition to being used by the player to access the appropriate portion of the decoded point cloud data, the current viewing position or viewing direction may also be used for decoding optimization. In viewport-dependent content distribution 270, the current viewing position and viewing direction are also passed to a policy module, which may be used to determine the tracks to receive.
In the transmission technology of point cloud media, streaming transmission technology is generally used to process media resource transmission between a server and a client. Common media streaming techniques include DASH (Dynamic Adaptive Streaming over HTTP), HLS (HTTP Live Streaming), SMT (Smart Media Transport), and the like.
Taking DASH as an example, DASH is an adaptive bit rate streaming technology, so that high quality streaming media can be delivered over the internet through a conventional HTTP web server. DASH breaks the content into a series of small HTTP-based file segments, each containing very short lengths of playable content, while the total length of the content may be as long as several hours (e.g., a movie or sports event live). The content will be made into multiple bit rate alternative segments to provide multiple bit rate versions for selection. When media content is played by a DASH client, the client will automatically select which alternative to download and play based on current network conditions. The client will select the highest bit rate clip that can be downloaded in time for playback, thereby avoiding play-over or rebuffering events. As such, DASH clients can seamlessly adapt to changing network conditions and provide a high quality playback experience with less incidence of chunking and rebuffering.
DASH uses existing HTTP web server infrastructure. It allows devices such as internet televisions, television set-top boxes, desktop computers, smartphones, tablet computers, etc. to consume multimedia content (e.g., video, television, broadcast, etc.) delivered over the internet and can cope with varying internet reception conditions.
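To make the segment-selection rule described above concrete, the following is a minimal sketch under stated assumptions: it is not the API of any real DASH client, and the function name and parameters are hypothetical. It simply picks the highest advertised bit rate that the measured throughput can still download in time.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical rate-selection helper: given the bit rates of the alternative
// segment versions and an estimate of current throughput, return the highest
// bit rate that can still be downloaded in time (illustrative only).
uint64_t PickVariantBps(std::vector<uint64_t> available_bps,
                        uint64_t measured_throughput_bps) {
    if (available_bps.empty()) return 0;         // no alternatives advertised
    std::sort(available_bps.begin(), available_bps.end());
    uint64_t chosen = available_bps.front();     // fall back to the lowest rate
    for (uint64_t bps : available_bps)
        if (bps <= measured_throughput_bps) chosen = bps;
    return chosen;
}
```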
Taking point cloud compression G-PCC based on a geometric model as an example, each G-PCC point cloud sample corresponds to one point cloud frame and consists of one or more G-PCC data units belonging to the same presentation time.
Fig. 3 illustrates a syntax structure for encapsulating point cloud samples based on the TLV code stream format in one embodiment of the present application. Each point cloud sample consists of one or more data units (G-PCC units). A gpcc_unit contains a single G-PCC data unit. G-PCC data units in the same point cloud sample correspond to the same point cloud frame and belong to the same presentation time. The TLV code stream format, i.e., the Type-Length-Value byte stream format, refers to a structure composed of the type of the data, the length of the data, and the value of the data. Specific information about the TLV code stream format may be found in the standard ISO/IEC 23090-9.
Fig. 4 shows a syntax structure of a data unit encapsulated based on the TLV code stream format in one embodiment of the present application, where tlv_type is a type field used to indicate the type of the data unit. Table 1 shows the semantics of different values of the data unit type field in one embodiment of the present application.
TABLE 1
tlv_type Description
0 Sequence parameter set
1 Geometry parameter set
2 Geometry data unit
3 Attribute parameter set
4 Attribute data unit
5 Tile inventory
6 Frame boundary marker
As shown in table 1, different valued type fields may be used to indicate different data unit types.
When the type field takes a value of 0, it indicates that the type of the data unit is the sequence parameter set SPS (Sequence Parameter Set).
When the type field takes a value of 1, it indicates that the type of the data unit is the set of geometric parameters GPS (Geometry Parameter Set).
When the type field takes a value of 2, it indicates that the type of the data unit is a geometric data unit Geometry data unit.
When the type field takes a value of 3, it indicates that the type of the data unit is the attribute parameter set APS (Attribute Parameter Set).
When the type field takes a value of 4, it indicates that the type of the data unit is the attribute data unit Attribute data unit.
When the type field takes a value of 5, it indicates that the type of the data unit is the tile inventory (Tile inventory).
When the type field takes a value of 6, it indicates that the type of the data unit is a frame boundary flag Frame boundary marker.
Specific information of the above data unit types may be referred to the standard ISO/IEC 23090-9.
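As an illustration of the TLV structure above, the following sketch walks a buffer of concatenated TLV units. The field widths assumed here (one type byte followed by a 4-byte big-endian length) are an assumption made for this sketch; the normative layout is the one defined in ISO/IEC 23090-9.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One TLV data unit: the type of the data, the length of the data, and the
// value (payload) of the data.
struct TlvUnit {
    uint8_t type;               // tlv_type, see Table 1 (0 = SPS, 2 = geometry, ...)
    std::vector<uint8_t> value; // raw payload bytes
};

// Walks a buffer holding concatenated TLV units and returns them in order.
std::vector<TlvUnit> ParseTlvStream(const uint8_t* data, size_t size) {
    std::vector<TlvUnit> units;
    size_t pos = 0;
    while (pos + 5 <= size) {                       // 1 type byte + 4 length bytes
        TlvUnit unit;
        unit.type = data[pos];
        const uint32_t length = (uint32_t(data[pos + 1]) << 24) |
                                (uint32_t(data[pos + 2]) << 16) |
                                (uint32_t(data[pos + 3]) << 8)  |
                                 uint32_t(data[pos + 4]);
        pos += 5;
        if (length > size - pos) break;             // truncated stream, stop
        unit.value.assign(data + pos, data + pos + length);
        pos += length;
        units.push_back(std::move(unit));
    }
    return units;
}
```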
In the related art, each point cloud frame of the point cloud media is encoded separately, which causes the following two problems in the encoding and decoding process. One problem is that in frame-based point cloud content, the file size per frame may be relatively small, which is inefficient for the I/O interface. Another problem is that the decoder needs to start from the initial bounding box and partition every single frame, so initializing the decoder on an edge device incurs significant overhead.
The first problem can be easily solved by concatenating the coded bit streams of consecutive frames. However, the second problem is difficult to avoid unless the point cloud is assembled prior to encoding. The embodiments of the present application propose a scheme of combined frame coding, which solves both problems by introducing coding of frame indexes in the combined point cloud. In addition, the embodiment of the application can greatly improve the coding efficiency, so that the storage and use of the point cloud content based on the frames are also facilitated.
With combined frame coding, multiple point cloud frames of the original sequence are combined and then encoded. For example, if 100 point cloud frames exist in the original sequence, recombining the original point cloud sequence 4 frames at a time yields a 25-frame point cloud sequence to be encoded. With such a coding scheme, a larger improvement in coding efficiency can be obtained for scenes with few points per frame or a strong correlation between adjacent frames.
Fig. 5 is a schematic diagram of multi-frame combining of point cloud data according to an embodiment of the present application. As shown in fig. 5, a combined frame (CombinedFrame) may be formed by combining the first point cloud frame Frame1 and the second point cloud frame Frame2. When the combined encoding technique is adopted, each frame of the newly obtained sequence contains multiple point cloud subframes, where a point cloud subframe is a partial representation of a point cloud frame consisting of points with the same frame number or frame index attribute value. When two or more point cloud frames are correlated, the individual octrees of the point cloud frames have similar structures at the higher levels; in the leaf nodes of the combined frame there is also some duplicate content from different frames, i.e., overlapping point cloud data.
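The mapping from original frames to combined frames and subframe indices in the example above (100 frames, merged 4 at a time) reduces to integer division, as the following illustrative sketch shows; the variable names are assumptions of this sketch, not terms from the standard.

```cpp
#include <cstdio>

int main() {
    const int kTotalFrames = 100;       // original sequence length
    const int kFramesPerCombined = 4;   // frames merged per combined frame
    for (int f = 0; f < kTotalFrames; ++f) {
        const int combined = f / kFramesPerCombined;  // 0..24, i.e. 25 combined frames
        const int subframe = f % kFramesPerCombined;  // frame index attribute 0..3
        if (f < 8)  // print the first two combined frames as a demonstration
            std::printf("original frame %d -> combined frame %d, subframe_idx %d\n",
                        f, combined, subframe);
    }
    return 0;
}
```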
The following describes in detail, with reference to specific embodiments, a method for encoding a point cloud medium, a method for decoding a point cloud medium, an apparatus for encoding a point cloud medium, an apparatus for decoding a point cloud medium, a computer readable medium, an electronic device, a computer program product, and other technical schemes. The technical schemes of the embodiment of the application can be applied to links such as a server side, a player side or an intermediate node of the immersive media system.
Fig. 6 is a flowchart illustrating the steps of a method for decoding point cloud media in an embodiment of the present application. The method may be applied to links such as the server, the client, or an intermediate node of a point cloud media system; the embodiment of the present application takes as an example a decoding method executed by a client device on which a point cloud decoding apparatus is installed. As shown in fig. 6, the decoding method of the point cloud media includes the following steps S610 to S640.
S610: acquiring a point cloud media file, wherein the point cloud media file comprises point cloud samples packaged in one or more tracks;
s620: analyzing the media file data box of each sub-sample in the point cloud sample to obtain the value of a sub-sample zone bit field, wherein the sub-sample zone bit field is used for indicating the division mode of the sub-sample;
S630: acquiring index information of one or more point cloud subframes corresponding to each data unit in the sub-sample according to the value of the sub-sample flag bit field; when one data unit in the sub-sample corresponds to index information of at least two point cloud subframes, the at least two point cloud subframes have overlapped point cloud data;
s640: and de-packaging and decoding the point cloud media file according to the index information of one or more point cloud subframes to obtain the point cloud data.
The sub-samples are data encapsulation units in the point cloud sample. Dividing the point cloud sample based on different division modes yields different types of sub-samples; for example, dividing based on different dimensions, such as data units, spatial tiles, or point cloud subframes, yields sub-samples with different data capacities.
A point cloud subframe is a partial representation of a point cloud frame made up of points having the same index information (e.g., frame number or frame index attribute value). When a point cloud frame is a combined frame formed by combining multiple point cloud frames, each constituent point cloud frame of the combined frame constitutes a point cloud subframe.
According to the decoding method of the point cloud media, the media file data box of each sub-sample in the point cloud sample can indicate the division mode of the sub-sample and the correspondence between each data unit in the sub-sample and the index information of one or more point cloud subframes. One or more point cloud subframes corresponding to each data unit in the point cloud sample can therefore be jointly encoded as a combined frame. On one hand, this reduces the waste of computing resources caused by independently encoding point cloud frames with little content; on the other hand, point cloud subframes with overlapping point cloud data can be identified, improving the encoding and decoding efficiency of the point cloud media.
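The control flow of steps S610 to S640 can be sketched as follows. The types and functions here (SubSampleInfo, DecodeSubSample, and so on) are hypothetical stand-ins for the file parser and decoder, used only to show how the flag value and subframe indices steer the decode.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical view of one sub-sample after parsing its 'subs' entry (S620):
// the flags value that names the division mode and the subframe index
// information recovered from the media file data box (S630).
struct SubSampleInfo {
    uint32_t flags;                      // sub-sample flag bit field
    std::vector<uint32_t> subframe_idx;  // point cloud subframe index information
    std::vector<uint8_t> payload;        // data units belonging to the sub-sample
};

struct PointCloudData { /* decoded points would live here */ };

// Stub standing in for de-encapsulation plus G-PCC decoding (S640).
PointCloudData DecodeSubSample(const SubSampleInfo&) { return {}; }

std::vector<PointCloudData> DecodeSample(const std::vector<SubSampleInfo>& subs) {
    std::vector<PointCloudData> out;
    for (const SubSampleInfo& s : subs) {
        // A data unit mapped to two or more subframe indices signals that those
        // subframes share overlapping point cloud data, so a player can decode
        // the shared data once instead of once per subframe.
        const bool overlapping = s.subframe_idx.size() >= 2;
        (void)overlapping;
        out.push_back(DecodeSubSample(s));
    }
    return out;
}
```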
The following describes in detail the specific implementation manner of each method step in the point cloud media decoding method according to the present application with reference to multiple embodiments.
In step S610, a point cloud media file is acquired, the point cloud media file including point cloud samples encapsulated in one or more tracks.
The point cloud media file may be a media file or a media segment obtained after encoding and packaging as shown in fig. 2, where the media file or the media segment carries a point cloud code stream to be transmitted.
In an embodiment of the present application, the data source may encapsulate the point cloud code stream into a single track according to the geometry parameter information, the attribute parameter information, and the parameter information of the point cloud slices included in the point cloud code stream, or may re-encapsulate the single-track point cloud media file into a point cloud media file comprising multiple tracks.
A track refers to a volumetric visual track (volumetric visual track) carrying the coded geometry bit stream or the coded attribute bit stream, or a volumetric visual track carrying both the coded geometry bit stream and the coded attribute bit stream.
In the case of a point cloud code stream packaged in a single track, each point cloud sample may correspond to a complete point cloud frame.
In step S620, the media file data box of each sub-sample in the point cloud sample is parsed to obtain the value of the sub-sample flag bit field, where the sub-sample flag bit field is used to indicate the division mode of the sub-sample.
The media file data box may be a data box based on the ISO base media file format ISOBMFF (ISO Base Media File Format). Specific information of the ISOBMFF may be referred to the standard ISO/IEC 14496-12.
When the G-PCC stream is carried in a single track, a simple ISOBMFF package may be utilized by storing the G-PCC stream in a single track without further processing.
Fig. 7 illustrates an exemplary structure of an embodiment of the present application for encapsulating a point cloud sample in a single track. Wherein moov represents metadata information metadata of the point cloud sample; mdat represents specific media data carried in the point cloud sample. As shown in the figure, the Component can still be described by a "subs" data box with flag value flag=0, while the subframe index subframe_idx is provided by a "subs" data box with another flag value flag=2. This is specified by ISOBMFF: when a plurality of subsampleinformation boxes exist in the same container box, the flag value in each subsampleinformation box should be different.
With continued reference to fig. 7, the "subs" data box with flags=2 describes, for each point cloud sample (Sample 1 … Sample N), one or more sub-samples based on subframe division. For example, point cloud Sample 1 includes X sub-samples, which respectively correspond to X point cloud subframes identified by subframe_idx. In turn, each point cloud subframe includes different data units, such as a geometry data unit, an attribute data unit, and a frame index attribute data unit, as shown in the figure.
Fig. 8 illustrates an exemplary structure for encapsulating geometry and attribute code streams in multiple tracks according to one embodiment of the present application, where ftyp represents the file type, describing the version of the specification with which the point cloud sample complies; moov represents the metadata information of the point cloud sample; and mdat represents the specific media data carried in the point cloud sample.
As shown in fig. 8, in the multi-track encapsulation mode, the code stream data of each point cloud component is mapped into a separate track. There are two types of G-PCC component tracks: G-PCC geometry tracks and G-PCC attribute tracks. Each point cloud sample in a track contains at least one data unit (G-PCC unit) that carries a single G-PCC component data unit, rather than a multiplexing of geometry and attribute data units or of different attribute data units. A G-PCC attribute track should not multiplex different attribute sub-streams, e.g., color and reflectivity.
In one embodiment of the present application, the relevant information of the point cloud subframe may be identified by extending the coding-related parameter field of the SubSampleInformationBox. In this embodiment, the division modes of the sub-samples in the point cloud sample may include:
dividing the sub-samples based on the data units when the value of the sub-sample flag bit field is a first value, so that one sub-sample contains one data unit;
dividing the sub-samples based on spatial tiles when the value of the sub-sample flag bit field is a second value, such that one sub-sample contains one or more consecutive data units corresponding to a first division object, the first division object including at least one of a spatial tile, a parameter set, spatial tile set information, or a frame boundary identifier;
dividing the sub-samples based on point cloud subframes when the value of the sub-sample flag bit field is a third value, such that one sub-sample contains one or more consecutive data units corresponding to a second division object, the second division object comprising one complete point cloud subframe.
In one embodiment of the present application, when the value of the sub-sample flag bit field is a third value, the media file data box of the sub-sample includes:
a subframe index field, used to indicate the index information of the point cloud subframe contained in the current sub-sample.
Taking the ISOBMFF data box as an example, for the SubSampleInformationBox in the point cloud file, the sub-sample definition should be based on the value of the flags field in the SubSampleInformationBox.
For the implementation of the above embodiments in a specific application scenario, fig. 9 shows the syntax structure of the coding-related parameter field codec_specific_parameters of the SubSampleInformationBox in one application scenario of the embodiment of the present application.
As shown in fig. 9, when the flags field of the sub-sample is 0, the sub-sample division mode based on G-PCC data units is adopted: one sub-sample contains only one G-PCC unit.
On this basis, the SubSampleInformationBox may include the following fields:
payloadType: the tlv_type of the G-PCC unit contained in the sub-sample;
attrIdx: indicates the value of the ash_attr_sps_attr_idx field corresponding to the attribute data contained in the sub-sample.
When the flag bit field flags of the sub-sample is 1, the sub-sample division mode based on spatial tiles is adopted: one sub-sample contains one or more consecutive data units corresponding to one spatial tile, or contains one or more consecutive data units corresponding to parameter sets, tile set information, or a frame boundary identifier.
On this basis, the SubSampleInformationBox may include the following fields:
tile_data: when the value is 1, the sub-sample contains the geometry data or attribute data of the corresponding tile; when the value is 0, the sub-sample contains parameter set data, tile inventory information, or a frame boundary identifier.
tile_id: indicates the index number of the tile associated with the data in the sub-sample.
When the flag bit field flags of the sub-sample is 2, the sub-sample division mode based on subframes is adopted: one sub-sample contains the consecutive data units corresponding to one complete point cloud subframe.
On this basis, the SubSampleInformationBox may include the following field:
subframe_idx: the subframe index field, indicating the value of the frame number attribute corresponding to the point cloud subframe contained in the current sub-sample.
In an embodiment of the present application, in order to unify the two application scenarios of completely non-overlapping subframes and overlapping subframes, the division modes of the sub-samples in the point cloud sample may include:
dividing the sub-samples based on the data units when the value of the sub-sample flag bit field is a first value, so that one sub-sample contains one data unit;
dividing the sub-samples based on spatial tiles when the value of the sub-sample flag bit field is a second value, such that one sub-sample contains one or more consecutive data units corresponding to a first division object, the first division object including at least one of a spatial tile, a parameter set, spatial tile set information, or a frame boundary identifier;
dividing the sub-samples based on point cloud subframes when the value of the sub-sample flag bit field is a third value, such that one sub-sample contains one or more consecutive data units corresponding to a second division object, the second division object comprising one or more point cloud subframes.
In one embodiment of the present application, when the value of the sub-sample flag bit field is a third value, the media file data box of the sub-sample includes:
a subframe completeness flag bit field, used to indicate whether the current sub-sample contains all the data constituting a point cloud subframe;
a subframe number field, used to indicate the number of point cloud subframes corresponding to the current sub-sample;
and a subframe index field, used to indicate the index information of the point cloud subframes corresponding to the current sub-sample.
In one embodiment of the present application, when the point cloud sample is encapsulated in one track, all the data constituting the point cloud subframe includes all geometry data and attribute data; when the point cloud sample is encapsulated in multiple tracks, all the data constituting the point cloud subframe includes all geometry data or all attribute data.
For the implementation of the above embodiments in a specific application scenario, fig. 10 shows the syntax structure of the coding-related parameter field codec_specific_parameters of the SubSampleInformationBox in an application scenario that uniformly identifies overlapping and non-overlapping subframes in the embodiment of the present application.
As shown in fig. 10, when the flags field of the sub-sample is 0, the sub-sample division mode based on G-PCC data units is adopted: one sub-sample contains only one data unit.
On this basis, the SubSampleInformationBox may include the following fields:
payloadType: the tlv_type of the data unit contained in the sub-sample;
attrIdx: indicates the value of the ash_attr_sps_attr_idx field corresponding to the attribute data contained in the sub-sample.
When the flag bit field flags of the sub-sample is 1, the sub-sample division mode based on spatial tiles is adopted: one sub-sample contains one or more consecutive data units corresponding to one spatial tile, or contains one or more consecutive data units corresponding to parameter sets, tile set information, or a frame boundary identifier.
On this basis, the SubSampleInformationBox may include the following fields:
tile_data: when the value is 1, the sub-sample contains the geometry data or attribute data of the corresponding tile; when the value is 0, the sub-sample contains parameter set data, tile inventory information, or a frame boundary identifier.
tile_id: indicates the index number of the tile associated with the data in the sub-sample.
When the flag bit field flags of the sub-sample is 2, the sub-sample division mode based on subframes is adopted: one sub-sample contains one or more consecutive data units, corresponding to one or more point cloud subframes.
On this basis, the SubSampleInformationBox may include the following fields:
complete_subframe_flag: when the value of this subframe completeness flag bit field is 1, the data units corresponding to the current sub-sample contain all the data constituting the corresponding subframe(s); when the value is 0, the data units corresponding to the current sub-sample contain part of the data constituting the corresponding subframe(s). (In the single-track encapsulation mode, all data refers to all geometry and attribute data; in the multi-track encapsulation mode, all data refers to all geometry data or all attribute data of a specific type.)
num_subframes: the subframe number field, indicating the number of subframes corresponding to the data units in the sub-sample. When the value of this field is 1, the subframe corresponding to the current sub-sample is the one indicated by subframe_idx; when the value of this field is greater than 1, the subframes corresponding to the current sub-sample are the subframe indicated by subframe_idx and the subframes with the num_subframes-1 consecutive following frame numbers.
subframe_idx: the subframe index field, indicating the value of the frame number attribute corresponding to the point cloud subframe contained in the current sub-sample.
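For the unified flags == 2 branch, resolving the set of subframes covered by a sub-sample follows directly from the semantics above. The bit positions in this sketch are again assumptions made for illustration; what matters is the num_subframes/subframe_idx expansion rule.

```cpp
#include <cstdint>
#include <vector>

struct SubframeMapping {
    bool complete = false;               // complete_subframe_flag semantics
    std::vector<uint32_t> subframe_ids;  // resolved frame number attribute values
};

SubframeMapping ResolveSubframes(uint32_t word) {
    SubframeMapping m;
    m.complete = ((word >> 31) & 0x1) != 0;              // assumed: top bit
    const uint32_t num_subframes = (word >> 24) & 0x7F;  // assumed: next 7 bits
    const uint32_t subframe_idx  = word & 0xFFFF;        // assumed: low 16 bits
    // num_subframes == 1: exactly the subframe named by subframe_idx;
    // num_subframes  > 1: that subframe plus the subframes with the
    // num_subframes-1 consecutive following frame numbers.
    for (uint32_t i = 0; i < num_subframes; ++i)
        m.subframe_ids.push_back(subframe_idx + i);
    return m;
}
```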
In one embodiment of the present application, the relevant information of the point cloud subframes may be identified by extending the sample group tool in the media file data box. In this embodiment of the present application, when the value of the sub-sample flag bit field is a first value, the media file data box of the point cloud subframe includes:
a sub-sample number field indicating the number of sub-samples contained in the current sample;
a related subframe number field, used to indicate the number of point cloud subframes corresponding to the current sub-sample;
and a subframe index field, used to indicate the index information of the point cloud subframes corresponding to the current sub-sample.
For the implementation of the above embodiments in a specific application scenario, fig. 11 shows the syntax structure of the extended sample group tool in one application scenario according to the embodiment of the present application.
As shown in fig. 11, the sample group tool in the media file data box may include the following fields:
The sub-sample number field subsample_count indicates the number of sub-samples contained in the current sample.
The related subframe number field related_subframe_num indicates the number of point cloud subframes corresponding to the current sub-sample. When this field takes the value 0, it indicates that the information contained in the current sub-sample is independent of the point cloud subframe partitioning (e.g., tile set information or an end-of-frame identifier).
The subframe index field subframe_index indicates the point cloud subframe sequence number corresponding to the current sub-sample, and the value of the sequence number is the same as that in the frame number attribute.
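As a small illustration, the extended sample group entry described above can be modeled as follows; the field names follow fig. 11, while the Python representation itself and the example values are assumptions made for this sketch.

from dataclasses import dataclass, field
from typing import List

@dataclass
class SubSampleSubFrameEntry:
    # related_subframe_num == 0 marks data that is independent of the
    # subframe partitioning (e.g., tile set information or an
    # end-of-frame identifier).
    related_subframe_num: int
    subframe_index: List[int] = field(default_factory=list)

@dataclass
class SubFrameSampleGroupEntry:
    subsample_count: int
    entries: List[SubSampleSubFrameEntry] = field(default_factory=list)

# Example: a sample with two sub-samples mapped to subframes 1 and 2.
group = SubFrameSampleGroupEntry(
    subsample_count=2,
    entries=[SubSampleSubFrameEntry(1, [1]), SubSampleSubFrameEntry(1, [2])],
)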
In some alternative embodiments, the information related to the point cloud subframes may not be indicated at the sub-sample level, but only at the sample level. That is, in the case of overlapping point cloud subframes, only the samples in which point cloud subframes are present and the corresponding point cloud subframe index numbers are identified at the system layer. Fig. 12 shows a syntax structure for identifying point cloud subframe related information based on a sample-level media file data box in an application scenario according to an embodiment of the present application.
As shown in fig. 12, the media file data box in the embodiment of the present application may include the following fields:
the related number of subframes field related_subframe_num indicates the number of point cloud subframes corresponding to the current sample.
The subframe index field subframe_index indicates the subframe number corresponding to the current sample, and the value of the subframe number should be the same as the value in the frame number attribute.
In one embodiment of the present application, the information about the point cloud subframes may be identified by defining a sub-sample subframe information data box SubSampleSubFrameInfoBox. In this embodiment of the present application, when the value of the sub-sample flag bit field is a first value, the media file data box of the point cloud subframe includes:
a subframe-related sample number field for indicating the number of point cloud samples comprising a plurality of point cloud subframes;
a sample sequence number difference field, configured to indicate a sequence number difference between a point cloud sample currently including a plurality of point cloud subframes and a point cloud sample previously including a plurality of point cloud subframes in a decoding order;
a sub-sample number field for indicating the number of sub-samples contained in the current point cloud sample;
a related subframe number field, configured to indicate the number of point cloud subframes corresponding to the current sub-sample;
and a subframe index field, configured to indicate the index information of the point cloud subframe corresponding to the current sub-sample.
For the implementation of the above embodiments in a specific application scenario, fig. 13 shows the syntax structure for identifying subframe related information by the sub-sample subframe information data box in one application scenario according to an embodiment of the present application. The sub-sample subframe information data box may be of the data box type 'sbfi' and contained, for example, in SampleEntry or TrackFragmentBox. The sub-sample subframe information data box is used to indicate the subframe information corresponding to each sub-sample divided based on G-PCC data units in a point cloud sample containing a plurality of subframes. When this data box exists in a track, the sub-sample flag bit field in the SubSampleInformationBox must take the value 0, that is, the data-unit-based sub-sample division method is adopted.
As shown in fig. 13, the sub-sample subframe information data box SubSampleSubFrameInfoBox may include the following fields:
the subframe-related sample number field subframe_correlated_sample_num indicates the number of point cloud samples containing a plurality of point cloud subframes.
The sample sequence number difference field sample_delta indicates the difference, in decoding order, between the sequence number of the sample currently containing a plurality of subframes and the sequence number of the previous sample containing a plurality of subframes. For the first point cloud sample containing a plurality of point cloud subframes, the value of this field is the sequence number of that point cloud sample.
The sub-sample number field subsample_count indicates the number of sub-samples contained in the current sample.
The related subframe number field related_subframe_num indicates the number of point cloud subframes corresponding to the current sub-sample. When this field takes the value 0, it indicates that the information contained in the current sub-sample is independent of the point cloud subframe partitioning (e.g., tile set information or an end-of-frame identifier).
The subframe index field subframe_index indicates the point cloud subframe sequence number corresponding to the current sub-sample, and the value of the sequence number is the same as that in the frame number attribute.
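A non-normative sketch of walking the structure described above follows; the byte widths of the fields are guesses made for illustration, and the small BytesReader class is a stand-in for a real ISOBMFF parser.

class BytesReader:
    # Minimal big-endian byte reader used only for this sketch.
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def read_uint(self, n: int) -> int:
        v = int.from_bytes(self.data[self.pos:self.pos + n], "big")
        self.pos += n
        return v

def read_sbfi(r: BytesReader) -> list:
    samples = []
    for _ in range(r.read_uint(4)):          # subframe_correlated_sample_num
        sample = {"sample_delta": r.read_uint(4), "subsamples": []}
        for _ in range(r.read_uint(4)):      # subsample_count
            related = r.read_uint(1)         # related_subframe_num
            idxs = [r.read_uint(2) for _ in range(related)]  # subframe_index
            sample["subsamples"].append(
                {"related_subframe_num": related, "subframe_index": idxs})
        samples.append(sample)
    return samples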
In one embodiment of the present application, the presentation time of the point cloud subframes may be indicated by the media file data box. In the embodiment of the present application, when the division manner of each sub-sample in the point cloud sample is to divide the sub-samples based on point cloud subframes, the media file data box of the sub-sample includes:
a presentation time flag bit field, configured to indicate whether each point cloud subframe contained in the point cloud sample has the same presentation duration;
and a sub-sample duration field, configured to indicate the presentation duration of the current sub-sample when the point cloud subframes contained in the point cloud sample have different presentation durations.
For the implementation of the above embodiments in a specific application scenario, fig. 14 shows the syntax structure for identifying subframe presentation time information by the sub-sample subframe information data box in one application scenario according to the embodiment of the present application.
As shown in fig. 14, the coding-related parameter field codec_specific_parameters in the sub-sample subframe information data box SubSampleSubFrameInfoBox includes the following fields:
The presentation time flag bit field with_unique_duration_flag: when the value is 0, it indicates that the plurality of subframes contained in the sample have the same presentation duration, and the presentation duration of each subframe can be calculated from the presentation duration of the sample and the number of subframes in the sample. In this case, the sample_delta field corresponding to the sample should be an integer multiple of the number of subframes. A value of 1 indicates that the plurality of subframes contained in the sample have different presentation durations.
The sub-sample duration field sub_sample_duration indicates the presentation duration of the sub-sample. The sum of this field's values over the plurality of sub-samples in a sample is equal to the sample_delta field value corresponding to that sample.
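The duration arithmetic above can be sketched as follows; this illustrates the two cases of with_unique_duration_flag and is not normative code.

def subframe_durations(sample_delta, with_unique_duration_flag,
                       num_subframes=None, sub_sample_durations=None):
    # Return the presentation duration of each subframe of one sample,
    # where sample_delta is the sample's total duration in timescale units.
    if with_unique_duration_flag == 0:
        # Equal durations: sample_delta must be an integer multiple of
        # the number of subframes in the sample.
        assert sample_delta % num_subframes == 0
        return [sample_delta // num_subframes] * num_subframes
    # Distinct durations: the per-sub-sample values must sum to sample_delta.
    assert sum(sub_sample_durations) == sample_delta
    return list(sub_sample_durations)

print(subframe_durations(20, 0, num_subframes=2))                 # [10, 10]
print(subframe_durations(30, 1, sub_sample_durations=[10, 20]))   # [10, 20]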
In one embodiment of the present application, the presentation duration of each subframe in both subframe scenarios (overlapping and non-overlapping) can be indicated in a unified way by extending the media file data box. On this basis, the media file data box of the point cloud sample comprises:
a subframe number field for indicating the number of point cloud subframes included in the current point cloud sample;
a presentation time flag bit field, configured to indicate whether each point cloud subframe contained in the point cloud sample has the same presentation duration;
a subframe index field, configured to indicate the index information of the corresponding point cloud subframe when the point cloud subframes contained in the point cloud sample have different presentation durations;
and a sub-sample duration field, configured to indicate the presentation duration of the current sub-sample when the point cloud subframes contained in the point cloud sample have different presentation durations.
For the implementation of the above embodiments in a specific application scenario, fig. 15 shows the syntax structure for identifying subframe presentation time information by extending the media file data box in one application scenario according to the embodiment of the present application.
As shown in fig. 15, the media file data box in the embodiment of the present application includes the following fields:
the number of subframes field nb_subframes indicates the number of subframes corresponding to the current sample.
The presentation time flag bit field with_unique_duration_flag: when the value is 0, it indicates that the subframes corresponding to the sample have the same presentation duration, and the presentation duration of each subframe can be calculated from the presentation duration of the sample and the number of subframes in the sample. In this case, the sample_delta field corresponding to the sample should be an integer multiple of the number of subframes. A value of 1 indicates that the plurality of subframes contained in the sample have different presentation durations.
The subframe index field subframe_index indicates the sequence number of the point cloud subframe.
The sub-sample duration field sub_sample_duration indicates the presentation duration of the corresponding subframe. The sum of this field's values over all subframes corresponding to the sample is equal to the sample_delta field value corresponding to the sample.
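The extended entry of fig. 15 can be represented roughly as below; the names follow the fields above, and the dataclass form is an assumed illustration rather than the normative syntax.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SubFrameDuration:
    subframe_index: int        # sequence number of the point cloud subframe
    sub_sample_duration: int   # presentation duration of that subframe

@dataclass
class SubFrameConfigurationGroupEntry:
    nb_subframes: int
    with_unique_duration_flag: int
    # Present only when with_unique_duration_flag == 1; the durations
    # must sum to the sample_delta of the corresponding sample.
    durations: Optional[List[SubFrameDuration]] = None

entry = SubFrameConfigurationGroupEntry(
    nb_subframes=2, with_unique_duration_flag=1,
    durations=[SubFrameDuration(1, 10), SubFrameDuration(2, 20)])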
Based on the scheme of identifying subframe related information through the media file data box provided in the above embodiments of the present application, the spatial information of the point cloud subframes can be acquired implicitly. When multiple sub-sample division modes coexist, the correspondence between point cloud subframes and spatial blocks (tiles) can be found from the information carried in each sub-sample division mode.
Fig. 16 is a block diagram of a structure for determining a spatial block correspondence based on three sub-sample division modes of a data unit, a spatial block and a point cloud subframe in an embodiment of the present application.
As shown in fig. 16, based on the point cloud subframe division, two sub-samples may be obtained: a first subframe with a subframe index value of subframe_index=0 and a second subframe with a subframe index value of subframe_index=1.
Based on the sub-sample division by spatial block, a first spatial block tile0 and a second spatial block tile1 corresponding to the first subframe, and a third spatial block tile2 corresponding to the second subframe, may be determined.
Based on the sub-sample division by data unit, a plurality of data units corresponding to the first spatial block tile0 may be determined, such as the geometry slices GeoSlice0 and GeoSlice1, the color attribute slices AttrColorSlice0 and AttrColorSlice1, and the frame index attribute slices AttrFrameIdxSlice0 and AttrFrameIdxSlice1 shown in the figure.
Meanwhile, a plurality of data units corresponding to the second spatial block tile1 may be determined, such as the geometry slices GeoSlice2 and GeoSlice3, the color attribute slices AttrColorSlice2 and AttrColorSlice3, and the frame index attribute slices AttrFrameIdxSlice2 and AttrFrameIdxSlice3 shown in the figure.
In addition, a plurality of data units corresponding to the third spatial block tile2 may be determined, such as the geometry slices GeoSlice4 and GeoSlice5, the color attribute slices AttrColorSlice4 and AttrColorSlice5, and the frame index attribute slices AttrFrameIdxSlice4 and AttrFrameIdxSlice5 shown in the figure.
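The chained lookup implied by fig. 16 can be sketched as follows; the dictionaries below simply restate the figure's mapping, and in a real file the mappings would be recovered from the byte ranges of the three sub-sample division modes.

subframe_to_tiles = {0: ["tile0", "tile1"], 1: ["tile2"]}
tile_to_units = {
    "tile0": ["GeoSlice0", "GeoSlice1", "AttrColorSlice0", "AttrColorSlice1",
              "AttrFrameIdxSlice0", "AttrFrameIdxSlice1"],
    "tile1": ["GeoSlice2", "GeoSlice3", "AttrColorSlice2", "AttrColorSlice3",
              "AttrFrameIdxSlice2", "AttrFrameIdxSlice3"],
    "tile2": ["GeoSlice4", "GeoSlice5", "AttrColorSlice4", "AttrColorSlice5",
              "AttrFrameIdxSlice4", "AttrFrameIdxSlice5"],
}

def units_for_subframe(subframe_idx: int) -> list:
    # Chain the two mappings: subframe -> tiles -> data units.
    return [u for t in subframe_to_tiles[subframe_idx] for u in tile_to_units[t]]

print(units_for_subframe(0))  # all data units of tile0 and tile1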
Fig. 17 is a block diagram of a structure for determining a spatial block correspondence based on two sub-sample division modes of a data unit and a spatial block in an embodiment of the present application.
As shown in fig. 17, the first spatial block tile0 and the second spatial block tile1 may be determined by sub-sample division based on the spatial blocks.
Based on the sub-sample division by data unit, a plurality of data units corresponding to the first spatial block tile0 may be determined, as shown in the figure: a geometry slice GeoSlice0, a color attribute slice AttrColorSlice0, and a frame index attribute slice AttrFrameIdxSlice0 corresponding to subframe sub-frame0; and a geometry slice GeoSlice1, a color attribute slice AttrColorSlice1, and a frame index attribute slice AttrFrameIdxSlice1 corresponding to subframes sub-frame0 and sub-frame1.
Meanwhile, a plurality of data units corresponding to the second spatial block tile1 may be determined, as shown in the figure: a geometry slice GeoSlice2, a color attribute slice AttrColorSlice2, and a frame index attribute slice AttrFrameIdxSlice2 corresponding to subframes sub-frame0 and sub-frame1; and a geometry slice GeoSlice3, a color attribute slice AttrColorSlice3, and a frame index attribute slice AttrFrameIdxSlice3 corresponding to subframe sub-frame1.
In one embodiment of the present application, the correspondence between the point cloud subframes and the spatial tiles may also be explicitly indicated in the media file data box. In an embodiment of the present application, a media file data box of a point cloud sample includes:
a spatial block flag bit field for indicating whether the point cloud subframes in the current sample correspond to one or more different spatial blocks;
a subframe index field for indicating index information of a current point cloud subframe;
a spatial block number field for indicating the number of spatial blocks corresponding to the current point cloud subframe;
a spatial chunk identification field for indicating an identifier of the current spatial chunk.
For a specific implementation manner of the above embodiment, fig. 18 shows a syntax structure of the correspondence between the point cloud subframes and the spatial blocks indicated by the media file data box in one application scenario in the embodiment of the present application.
As shown in fig. 18, the media file data box in the embodiment of the present application may include the following fields:
The spatial block flag bit field has_tile_info_flag: when the value is 1, the subframes in the current sample each correspond to one or more different point cloud spatial blocks; when the value is 0, the subframes in the current sample cannot be distinguished by point cloud spatial block.
The subframe index field subframe_index indicates the sequence number of the point cloud subframe.
The spatial block number field num_tiles indicates the number of point cloud spatial blocks corresponding to the corresponding point cloud subframe.
The spatial chunk identification field tile_id indicates the identifier of the corresponding point cloud spatial chunk.
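Reading the explicit mapping of fig. 18 might look roughly like this; the dict-based entry is an illustrative stand-in for the parsed data box, and the example values follow the field semantics above.

def subframe_tile_map(entry: dict) -> dict:
    if not entry["has_tile_info_flag"]:
        return {}  # subframes cannot be distinguished by spatial block
    return {sf["subframe_index"]: sf["tile_id"] for sf in entry["subframes"]}

entry = {
    "has_tile_info_flag": 1,
    "subframes": [
        {"subframe_index": 1, "num_tiles": 2, "tile_id": [0, 1]},
        {"subframe_index": 2, "num_tiles": 1, "tile_id": [2]},
    ],
}
print(subframe_tile_map(entry))  # {1: [0, 1], 2: [2]}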
Fig. 19 is a flowchart illustrating steps of a method for encoding point cloud media according to an embodiment of the present application, where the method may be applied to links such as a server, a client, and an intermediate node of a point cloud media system, and the embodiment of the present application uses a client device installed with a point cloud encoding apparatus to execute the method for encoding point cloud media as an example. As shown in fig. 19, the encoding method of the point cloud media includes the following steps S1910 to S1930.
In step S1910, point cloud source data is acquired, the point cloud source data including a point cloud frame having one or more point cloud subframes.
In step S1920, the point cloud frame is encoded to obtain at least one data unit.
In step S1930, encapsulation processing is performed on the at least one data unit to obtain a point cloud media file, where the point cloud media file includes point cloud samples encapsulated in one or more tracks; the media file data box of each sub-sample in the point cloud samples includes a sub-sample flag bit field and a subframe index field; the sub-sample flag bit field is used for indicating the division mode of the sub-sample, and the subframe index field is used for indicating the index information of one or more point cloud subframes corresponding to each data unit in the sub-sample; when one data unit in the sub-sample corresponds to the index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data.
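As a rough illustration of the metadata written in step S1930, a dict-based stand-in for the sub-sample fields might look like this; the structure is an assumption made for this sketch, not the box syntax itself.

def make_subsample_metadata(subframe_indices, flag):
    # When one data unit corresponds to two or more subframe indices,
    # those subframes share overlapping point cloud data.
    return {"flag": flag, "subframe_index": list(subframe_indices)}

meta = make_subsample_metadata([1, 2], flag=0)
print(meta)  # a data unit shared by overlapping subframes 1 and 2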
The point cloud source data includes point cloud videos (images and/or videos) representing objects and/or environments located in various 3D spaces (e.g., 3D spaces representing real environments, 3D spaces representing virtual environments, etc.).
In one embodiment of the present application, the data source may capture the point cloud source data using acquisition devices such as one or more cameras (e.g., infrared cameras capable of capturing depth information, RGB cameras capable of extracting color information corresponding to the depth information, etc.), projectors (e.g., infrared pattern projectors for acquiring depth information), and LiDAR. The shape of the geometric structure composed of points in 3D space may be extracted from the depth information, and the attribute of each point may be extracted from the color information, so as to obtain the point cloud source data.
Taking point cloud video data as an example, the point cloud video may include one or more point cloud frames, and one point cloud frame may represent one frame of point cloud image. In one embodiment of the present application, the point cloud video data may be captured based on at least one of an inward facing technique and an outward facing technique.
Inward facing technology refers to technology that captures an image of a center object with one or more cameras (or camera sensors) disposed around the center object. Point cloud content that provides a 360 degree image of a key object to a user (e.g., VR/AR content that provides a 360 degree image of an object (e.g., a key object such as a character, player, object, or actor) to a user) may be generated using an inward facing technique.
Outward facing technology refers to technology that captures, with one or more cameras (or camera sensors) disposed around a center object, the environment of the center object rather than an image of the center object. Point cloud content for providing the surrounding environment that appears from the perspective of the user (e.g., content representing an external environment that may be provided to the user of a self-driving vehicle) may be generated using an outward facing technique.
When point cloud content is generated based on a capture operation of one or more cameras, the coordinate system differs from camera to camera, so the data source may calibrate the one or more cameras to set a global coordinate system before the capture operation. In addition, the data source may generate point cloud content by compositing arbitrary images and/or video with the images and/or video captured by the capture techniques described above. The data source may also perform post-processing on the captured images and/or video, e.g., removing unwanted areas (such as the background), identifying the space to which the captured images and/or video are connected, and filling spatial holes when they are present.
The data source may generate one piece of point cloud content by performing a coordinate transformation on the points of the point cloud video acquired from each camera. The data source may perform the coordinate transformation on the points based on the coordinates of each camera position. Thus, the data source may generate point cloud content that represents a wide spatial range, or point cloud content with a high density of points.
Fig. 20 shows a flowchart of performing point cloud data encoding and decoding in a streaming media transmission application scenario according to an embodiment of the present application. As shown in fig. 20, the server is used as a data source for producing the point cloud media file, and can encode and send the point cloud data to the client where the user is located, and the client decodes the point cloud media file to obtain the point cloud data for consumption by the user. The specific point cloud data encoding and decoding process may include the following steps.
S2010: and the server determines one or more subframes corresponding to each geometric slice according to the subframe index number corresponding to each geometric slice in the point cloud code stream.
If each geometry slice corresponds to only one subframe, that is, the point clouds contained in the subframes do not overlap with each other, then when the point cloud subframes are encapsulated as sub-samples, the sub-sample division mode with flag=2 is adopted and the corresponding subframe index numbers are indicated.
If a geometry slice corresponds to a plurality of subframes, that is, the point clouds contained in the subframes overlap, then when the point cloud subframes are encapsulated as sub-samples, the sub-sample division mode with flag=0 is adopted, and the corresponding subframe index numbers are indicated by metadata.
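The selection rule of S2010 can be sketched as follows, assuming a hypothetical mapping from each geometry slice to the set of subframe indices it carries.

def choose_subsample_flag(geometry_slices: dict) -> int:
    # flag=2 (subframe-based division) when every slice maps to exactly
    # one subframe, i.e. the subframes do not overlap; flag=0 otherwise.
    if all(len(subframes) == 1 for subframes in geometry_slices.values()):
        return 2
    return 0

print(choose_subsample_flag({"geo0": {0}, "geo1": {1}}))     # 2
print(choose_subsample_flag({"geo0": {0}, "geo1": {0, 1}}))  # 0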
S2020: and the server packages the point cloud code stream into a point cloud file, wherein the point cloud subframes are divided and indicated in the form of subsamples.
The server may encapsulate the point cloud code stream using single-track encapsulation or component-based multi-track encapsulation.
In single-track encapsulation, under the sub-sample division mode with flag=2, a sub-sample contains all geometry and attribute information corresponding to the corresponding subframe. Under the sub-sample division mode with flag=0, sub-samples of the geometry data type, the attribute data type, and the parameter data type may each correspond to a corresponding subframe.
In multi-track encapsulation, under the sub-sample division mode with flag=2, only the geometry track is divided into sub-samples, and a sub-sample contains all geometry information corresponding to the corresponding subframe. Under the sub-sample division mode with flag=0, only sub-samples of the geometry data type and the parameter data type may correspond to corresponding subframes.
S2030: for the samples in the file in which point cloud subframes exist, the presentation time information of the point cloud subframes contained within these samples is indicated.
S2040: for the spatial information of the point cloud subframes, the correspondence between the point cloud subframes and tiles is indicated.
S2050: and the server transmits the point cloud file to the client.
S2060: when the client decapsulates and decodes the point cloud file, each point cloud subframe is extracted according to the point cloud subframe related information.
S2070: after reordering the point cloud sequence, the client performs presentation in combination with the presentation time information of the point cloud subframes.
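Steps S2060 and S2070 on the client might look roughly like the following; the sample and duration structures are hypothetical stand-ins for the parsed file metadata.

def present(units, duration):
    print(f"presenting {len(units)} data unit(s) for {duration} timescale units")

def extract_and_present(samples, durations):
    subframes = {}
    for sample in samples:                       # S2060: extract subframes
        for sub in sample["subsamples"]:
            for idx in sub["subframe_index"]:
                subframes.setdefault(idx, []).append(sub["data"])
    for idx in sorted(subframes):                # S2070: reorder, then present
        present(subframes[idx], durations.get(idx, 0))

samples = [{"subsamples": [{"subframe_index": [1], "data": b"g0"},
                           {"subframe_index": [1, 2], "data": b"g1"},
                           {"subframe_index": [2], "data": b"g2"}]}]
extract_and_present(samples, durations={1: 10, 2: 20})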
The embodiment of the present application provides a file encapsulation method for point cloud subframes in G-PCC point cloud media. The file encapsulation method defines the encapsulation modes of point cloud subframes within samples under different scenarios, indicates the identification and duration of point cloud subframes, and indicates the correspondence between point cloud subframes and point cloud spatial blocks. The method can support the encapsulation of point cloud subframes in a file more flexibly, thereby supporting more application scenarios and maximizing the coding-efficiency gains obtained by using point cloud subframes.
One implementation scheme for the application scenario in which the point cloud subframes do not overlap with each other is as follows.
(1) And the server determines subframes corresponding to the geometric slices according to the subframe index numbers corresponding to the geometric slices in the point cloud code stream.
Assuming that the point clouds contained in the subframes do not overlap with each other, when the point cloud subframes are encapsulated as sub-samples, the sub-sample division mode with flag=2 is adopted and the corresponding subframe index numbers are indicated.
The server S1 encapsulates the point cloud code stream into a point cloud file F1 in a single-track manner, and the file encapsulation result is shown in fig. 21. In the samples where point cloud subframes exist, the point cloud subframes are divided and indicated in the form of sub-samples. The value of the flags field in the SubSampleInformationBox is 2, and the subframe indexes indicated in sub-sample0 and sub-sample1 are 1 and 2, respectively.
The server S2 encapsulates the point cloud code stream into a point cloud file F2 in a component-based multi-track manner, with sub-sample division performed on the geometry track. For the attribute track, optionally, the data information of the attribute track can be found correspondingly through the index relation between the geometry track and the attribute track; this encapsulation result is shown in fig. 22. Alternatively, the attribute track may also be divided into sub-samples, and that encapsulation result is shown in fig. 23. In the samples where point cloud subframes exist, the point cloud subframes are divided and indicated in the form of sub-samples. The value of the flags field in the SubSampleInformationBox is 2, and the subframe indexes indicated in sub-sample0 and sub-sample1 are 1 and 2, respectively.
(2) For the samples in the file in which point cloud subframes exist, the presentation time information of the point cloud subframes contained within these samples is indicated.
SubFrameConfigurationGroupEntry:
{nb_subframes=2;
with_unique_duration_flag=1;
{subframe_index=1;sub_sample_duration=10}
{subframe_index=2;sub_sample_duration=20}
}
By means of the characteristics of the sample group, sample2 can be indexed as a sample in which subframes exist, and then, from the fields provided by the embodiment of the present application, the presentation duration of each subframe can be known: 10 and 20 timescale units respectively (per the definition of timescale).
(3) For the spatial information of the point cloud subframes, the correspondence between the point cloud subframes and tiles is indicated.
SubFrameConfigurationGroupEntry:
{nb_subframes=2;
with_tile_info_flag=1;
{subframe_index=1;num_tiles=2;{tile_id=0,1}}
{subframe_index=2;num_tiles=1;{tile_id=2}}
}
By means of the characteristics of the sample group, sample2 can be indexed as a sample in which subframes exist, and then, from the fields provided by the embodiment of the present application, the tile_id information corresponding to each subframe can be known.
Finally, by combining the association indication between tile_id and spatial information, the spatial information corresponding to each subframe can be known.
(4) And the server transmits the point cloud file to the client.
(5) When the client decapsulates and decodes the point cloud file, each point cloud subframe is extracted according to the point cloud subframe related information, and after the point cloud sequence is reordered, presentation is performed in combination with the presentation time information and the spatial information of the point cloud subframes.
In the client implementation, optionally, the point cloud sequence may be reordered in the decapsulation stage and then decoded; alternatively, decapsulation and decoding may be performed first, and the point cloud sequence reordered afterwards according to the subframe information.
One implementation scheme for the application scenario in which the point cloud subframes overlap is as follows.
(1) And the server determines subframes corresponding to the geometric slices according to the subframe index numbers corresponding to the geometric slices in the point cloud code stream.
Assuming that the point clouds contained in the subframes overlap, when the point cloud subframes are encapsulated as sub-samples, the sub-sample division mode with flag=0 is adopted.
The server S1 encapsulates the point cloud code stream into a point cloud file F1 in a single-track manner. In the samples where point cloud subframes exist, the flags field in the SubSampleInformationBox takes the value 0, and each sub-sample is one G-PCC data unit. The encapsulation result is shown in fig. 24.
The subframe information of each G-PCC data unit in each sub-sample of sample2 may be indicated in combination with the information in the sub-sample subframe information group entry (SubSampleSubFrameInfoGroupEntry):
{subsample_count=4
{related_subframe_num=1;subframe_index=1}
{related_subframe_num=1;subframe_index=1}
{related_subframe_num=2;subframe_index=1,2}
{related_subframe_num=1;subframe_index=2}
}
The sub-samples are associated with the corresponding subframe information according to the order of the sub-samples within sample2.
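This order-based association can be sketched as follows; the list mirrors the four entries above, and the lookup is an illustrative assumption about how a reader would apply them.

subsample_entries = [
    {"related_subframe_num": 1, "subframe_index": [1]},
    {"related_subframe_num": 1, "subframe_index": [1]},
    {"related_subframe_num": 2, "subframe_index": [1, 2]},
    {"related_subframe_num": 1, "subframe_index": [2]},
]

def subframes_of(subsample_order: int) -> list:
    # subsample_order is the 0-based position of the sub-sample in sample2.
    return subsample_entries[subsample_order]["subframe_index"]

print(subframes_of(2))  # [1, 2]: this data unit is shared by subframes 1 and 2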
The sub-sample division in the multi-track encapsulation mode is handled in the same way as in the previous embodiment: sub-samples may be divided only in the geometry track, or in both the geometry and attribute tracks.
(2) For the samples in the file in which point cloud subframes exist, the presentation time information of the point cloud subframes contained within these samples is indicated.
SubFrameConfigurationGroupEntry:
{nb_subframes=2;
with_unique_duration_flag=0;
}
By means of the characteristics of the sample group, sample2 can be indexed as a sample in which subframes exist, and then, from the field provided by the embodiment of the present application, it can be known that each subframe has the same presentation duration. Assuming the duration of sample2 is 20 timescale units (per the definition of timescale), each subframe is 10 units.
(3) For the spatial information of the point cloud subframes, the correspondence between the point cloud subframes and tiles is indicated.
SubFrameConfigurationGroupEntry:
{nb_subframes=2;
with_tile_info_flag=1;
{subframe_index=1;num_tiles=2;{tile_id=0,1}}
{subframe_index=2;num_tiles=2;{tile_id=1,2}}
}
By means of the characteristics of the sample group, sample2 can be indexed as a sample in which subframes exist, and then, from the fields provided by the embodiment of the present application, the tile_id information corresponding to each subframe can be known.
Finally, by combining the association indication between tile_id and spatial information, the spatial information corresponding to each subframe can be known.
(4) And the server transmits the point cloud file to the client.
(5) When the client decapsulates and decodes the point cloud file, each point cloud subframe is extracted according to the point cloud subframe related information, and after the point cloud sequence is reordered, presentation is performed in combination with the presentation time information and the spatial information of the point cloud subframes.
In the client implementation, optionally, the point cloud sequence may be reordered in the decapsulation stage and then decoded; alternatively, decapsulation and decoding may be performed first, and the point cloud sequence reordered afterwards according to the subframe information.
It should be noted that although the steps of the methods in the present application are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
The following describes an embodiment of the apparatus of the present application, which may be used to execute the method for encoding and decoding the point cloud media in the foregoing embodiment of the present application. Fig. 25 schematically shows a block diagram of a decoding device for point cloud media according to an embodiment of the present application. As shown in fig. 25, the decoding apparatus 2500 for point cloud media includes:
an acquisition module 2510 configured to acquire a point cloud media file comprising point cloud samples encapsulated in one or more tracks;
the analyzing module 2520 is configured to parse the media file data boxes of the sub-samples contained in the point cloud samples to obtain the value of a sub-sample flag bit field, where the sub-sample flag bit field is used for indicating the division mode of the sub-samples;
the index module 2530 is configured to obtain, according to the value of the sub-sample flag bit field, the index information of one or more point cloud subframes corresponding to each data unit in the sub-sample; when one data unit in the sub-sample corresponds to the index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data;
the decoding module 2540 is configured to perform decapsulation and decoding processing on the point cloud media file according to the index information of the one or more point cloud subframes, so as to obtain point cloud data.
Fig. 26 schematically shows a block diagram of a point cloud media encoding apparatus according to an embodiment of the present application. As shown in fig. 26, the encoding device 2600 for point cloud media includes:
an acquisition module 2610 configured to acquire point cloud source data, the point cloud source data including a point cloud frame having one or more point cloud subframes;
the encoding module 2620 is configured to encode the point cloud frame to obtain at least one data unit;
an encapsulation module 2630 configured to encapsulate the at least one data unit to obtain a point cloud media file, where the point cloud media file includes point cloud samples encapsulated in one or more tracks; the media file data box of each sub-sample in the point cloud samples includes a sub-sample flag bit field and a subframe index field; the sub-sample flag bit field is used for indicating the division mode of the sub-sample, and the subframe index field is used for indicating the index information of one or more point cloud subframes corresponding to each data unit in the sub-sample; when one data unit in the sub-sample corresponds to the index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data.
Specific details of the encoding device and decoding device for point cloud media provided in each embodiment of the present application have been described in detail in the corresponding method embodiments, and are not described herein again.
Fig. 27 schematically shows a block diagram of a computer system for implementing an electronic device according to an embodiment of the present application.
Note that the computer system 2700 of the electronic device shown in fig. 27 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present application.
As shown in fig. 27, the computer system 2700 includes a central processing unit 2701 (Central Processing Unit, CPU) which can execute various appropriate actions and processes according to a program stored in a Read-Only Memory 2702 (ROM) or a program loaded from the storage portion 2708 into a random access Memory 2703 (Random Access Memory, RAM). In the random access memory 2703, various programs and data necessary for system operation are also stored. The central processing unit 2701, the read only memory 2702 and the random access memory 2703 are connected to each other through a bus 2704. An Input/Output interface 2705 (i.e., an I/O interface) is also connected to bus 2704.
The following components are connected to the input/output interface 2705: an input portion 2706 including a keyboard, a mouse, and the like; an output portion 2707 including a cathode ray tube (CRT), a liquid crystal display (Liquid Crystal Display, LCD), a speaker, and the like; a storage portion 2708 including a hard disk and the like; and a communication portion 2709 including a network interface card such as a local area network card, a modem, or the like. The communication portion 2709 performs communication processing via a network such as the internet. A drive 2710 is also connected to the input/output interface 2705 as needed. A removable medium 2711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 2710 as needed, so that a computer program read out therefrom is installed into the storage portion 2708 as needed.
In particular, according to embodiments of the present application, the processes described in the various method flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 2709 and/or installed from the removable medium 2711. The computer programs, when executed by the central processor 2701, perform the various functions defined in the system of the present application.
It should be noted that, the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer-readable signal medium may include a data signal that propagates in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, in accordance with embodiments of the present application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a usb disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (17)

1. A method for decoding point cloud media, comprising:
acquiring a point cloud media file, wherein the point cloud media file comprises point cloud samples packaged in one or more tracks;
analyzing a media file data box of each sub-sample contained in the point cloud samples to obtain a value of a sub-sample flag bit field, wherein the sub-sample flag bit field is used for indicating a division mode of the sub-sample;
acquiring index information of one or more point cloud subframes corresponding to each data unit in the sub-sample according to the value of the sub-sample flag bit field; when one data unit in the sub-sample corresponds to index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data;
and de-packaging and decoding the point cloud media file according to the index information of the one or more point cloud subframes to obtain point cloud data.
2. The method for decoding point cloud media according to claim 1, wherein the dividing manner of each sub-sample in the point cloud sample comprises:
dividing the sub-samples based on the data units when the value of the sub-sample flag bit field is a first value, so that one sub-sample contains one data unit;
Dividing the sub-samples based on spatial partitioning when the value of the sub-sample flag bit field is a second value, such that one sub-sample contains one or more consecutive data units corresponding to a first partitioning object, the first partitioning object including at least one of spatial partitioning, parameter set, spatial partitioning set information, or frame boundary identification;
when the value of the sub-sample flag bit field is a third value, dividing the sub-samples based on point cloud subframes, so that one sub-sample contains one or more consecutive data units corresponding to a second division object, and the second division object comprises a complete point cloud subframe.
3. The method for decoding point cloud media according to claim 2, wherein when the value of the sub-sample flag bit field is a third value, the media file data box of the sub-sample includes:
and a subframe index field, wherein the subframe index field is used for indicating index information of the point cloud subframe contained in the current sub-sample.
4. The method of decoding point cloud media according to claim 2, wherein among the plurality of sub-samples, the second division object corresponding to at least one of the sub-samples includes a plurality of point cloud sub-frames.
5. The method for decoding point cloud media as claimed in claim 4, wherein when the value of the sub-sample flag bit field is a third value, the media file data box of the sub-sample includes:
a subframe complete flag bit field for indicating whether the current sub-sample contains all data constituting a point cloud subframe;
a subframe number field, configured to indicate the number of point cloud subframes corresponding to the current sub-sample;
and a subframe index field, configured to indicate index information of the point cloud subframe corresponding to the current sub-sample.
6. The method according to claim 5, wherein when the point cloud samples are encapsulated in one track, all data constituting the point cloud sub-frame includes all geometric data and attribute data; when the point cloud sample is packaged in a plurality of tracks, all data forming the point cloud subframe comprise all geometric data or all attribute data.
7. The method for decoding point cloud media according to claim 1, wherein when the value of the sub-sample flag bit field is a first value, the media file data box of the point cloud sub-frame includes:
a related subframe number field, configured to indicate the number of point cloud subframes corresponding to the current sub-sample;
and a subframe index field, configured to indicate index information of the point cloud subframe corresponding to the current sub-sample.
8. The method for decoding point cloud media according to claim 7, wherein when the value of the sub-sample flag bit field is a first value, the media file data box of the point cloud sub-frame further comprises:
a number of sub-samples field for indicating the number of sub-samples contained in the current sample.
9. The method for decoding point cloud media according to claim 8, wherein when the value of the sub-sample flag bit field is a first value, the media file data box of the point cloud sub-frame further comprises:
a subframe-related sample number field for indicating the number of point cloud samples comprising a plurality of point cloud subframes;
and a sample sequence number difference field for indicating a sequence number difference between a point cloud sample currently containing a plurality of point cloud subframes and a point cloud sample previously containing a plurality of point cloud subframes in decoding order.
10. The decoding method of point cloud media according to any one of claims 1 to 9, wherein when the division manner of each sub-sample in the point cloud sample is to divide the sub-samples based on point cloud subframes, a media file data box of the sub-sample includes:
a presentation time flag bit field, configured to indicate whether each point cloud subframe contained in the point cloud sample has the same presentation duration;
and a sub-sample duration field, configured to indicate a presentation duration of a current sub-sample when the point cloud subframes contained in the point cloud sample have different presentation durations.
11. The method for decoding point cloud media according to any of claims 1 to 9, wherein the media file data box of the point cloud sample comprises:
a subframe number field for indicating the number of point cloud subframes included in the current point cloud sample;
a presentation time flag bit field, configured to indicate whether each point cloud subframe contained in the point cloud sample has the same presentation duration;
a subframe index field, configured to indicate index information of the corresponding point cloud subframe when the point cloud subframes contained in the point cloud sample have different presentation durations;
and a sub-sample duration field, configured to indicate a presentation duration of a current sub-sample when the point cloud subframes contained in the point cloud sample have different presentation durations.
12. The method for decoding point cloud media according to any of claims 1 to 9, wherein the media file data box of the point cloud sample comprises:
a spatial block flag bit field for indicating whether the point cloud subframes in the current sample correspond to one or more different spatial blocks;
a subframe index field for indicating index information of a current point cloud subframe;
a spatial block number field for indicating the number of spatial blocks corresponding to the current point cloud subframe;
a spatial chunk identification field for indicating an identifier of the current spatial chunk.
13. A method for encoding point cloud media, comprising:
acquiring point cloud source data, wherein the point cloud source data comprises a point cloud frame with one or more point cloud subframes;
encoding the point cloud frame to obtain at least one data unit;
packaging the at least one data unit to obtain a point cloud media file, wherein the point cloud media file comprises point cloud samples packaged in one or more tracks; the media file data box of each sub-sample in the point cloud samples comprises a sub-sample flag bit field and a subframe index field; the sub-sample flag bit field is used for indicating a division mode of the sub-sample, and the subframe index field is used for indicating index information of one or more point cloud subframes corresponding to each data unit in the sub-sample; when one data unit in the sub-sample corresponds to index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data.
14. A decoding device for point cloud media, comprising:
an acquisition module configured to acquire a point cloud media file comprising point cloud samples encapsulated in one or more tracks;
the analysis module is configured to analyze the media file data boxes of the sub-samples contained in the point cloud samples to obtain the value of a sub-sample flag bit field, wherein the sub-sample flag bit field is used for indicating the division mode of the sub-samples;
the index module is configured to acquire index information of one or more point cloud subframes corresponding to each data unit in the sub-sample according to the value of the sub-sample flag bit field; when one data unit in the sub-sample corresponds to index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data;
and the decoding module is configured to de-encapsulate and decode the point cloud media file according to the index information of the one or more point cloud subframes to obtain point cloud data.
15. A point cloud media encoding apparatus, comprising:
an acquisition module configured to acquire point cloud source data, the point cloud source data comprising a point cloud frame having one or more point cloud subframes;
The encoding module is configured to encode the point cloud frame to obtain at least one data unit;
the packaging module is configured to perform packaging processing on the at least one data unit to obtain a point cloud media file, wherein the point cloud media file comprises point cloud samples packaged in one or more tracks; the media file data box of each sub-sample in the point cloud samples comprises a sub-sample flag bit field and a subframe index field; the sub-sample flag bit field is used for indicating a division mode of the sub-sample, and the subframe index field is used for indicating index information of one or more point cloud subframes corresponding to each data unit in the sub-sample; when one data unit in the sub-sample corresponds to index information of at least two point cloud subframes, the at least two point cloud subframes have overlapping point cloud data.
16. A computer readable medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1 to 13.
17. An electronic device, comprising:
A processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to cause the electronic device to perform the method of any one of claims 1 to 13 via execution of the executable instructions.
CN202210428152.6A 2022-04-22 2022-04-22 Encoding and decoding method of point cloud media and related products Active CN114697668B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202310890442.7A CN116744007A (en) 2022-04-22 2022-04-22 Encoding and decoding method of point cloud media and related products
CN202210428152.6A CN114697668B (en) 2022-04-22 2022-04-22 Encoding and decoding method of point cloud media and related products
PCT/CN2022/137764 WO2023202095A1 (en) 2022-04-22 2022-12-09 Point cloud media encoding method and apparatus, point cloud media decoding method and apparatus, and electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210428152.6A CN114697668B (en) 2022-04-22 2022-04-22 Encoding and decoding method of point cloud media and related products

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202310890442.7A Division CN116744007A (en) 2022-04-22 2022-04-22 Encoding and decoding method of point cloud media and related products

Publications (2)

Publication Number Publication Date
CN114697668A CN114697668A (en) 2022-07-01
CN114697668B true CN114697668B (en) 2023-06-30

Family

ID=82145147

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210428152.6A Active CN114697668B (en) 2022-04-22 2022-04-22 Encoding and decoding method of point cloud media and related products
CN202310890442.7A Pending CN116744007A (en) 2022-04-22 2022-04-22 Encoding and decoding method of point cloud media and related products

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202310890442.7A Pending CN116744007A (en) 2022-04-22 2022-04-22 Encoding and decoding method of point cloud media and related products

Country Status (2)

Country Link
CN (2) CN114697668B (en)
WO (1) WO2023202095A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2613853B (en) * 2021-12-16 2024-01-24 Canon Kk Method, device, and computer program for optimizing encapsulation of point cloud data
CN114697668B (en) * 2022-04-22 2023-06-30 腾讯科技(深圳)有限公司 Encoding and decoding method of point cloud media and related products
WO2024041239A1 (en) * 2022-08-22 2024-02-29 腾讯科技(深圳)有限公司 Data processing method and apparatus for immersive media, device, storage medium, and program product
CN115834857B (en) * 2022-11-24 2024-03-19 腾讯科技(深圳)有限公司 Point cloud data processing method, device, equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114079781A (en) * 2020-08-18 2022-02-22 腾讯科技(深圳)有限公司 Data processing method, device and equipment for point cloud media and storage medium

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
WO2011144097A2 (en) * 2011-05-26 2011-11-24 Huawei Technologies Co., Ltd. Method, apparatus and system for rearrangement and extraction of media data in segment
US11012713B2 (en) * 2018-07-12 2021-05-18 Apple Inc. Bit stream structure for compressed point cloud data
US11568573B2 (en) * 2018-09-18 2023-01-31 Vid Scale, Inc. Methods and apparatus for point cloud compression bitstream format
EP3855398A4 (en) * 2018-09-21 2021-10-20 Panasonic Intellectual Property Corporation of America Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
WO2020100866A1 (en) * 2018-11-13 2020-05-22 Panasonic Intellectual Property Corporation of America Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
GB2580602A (en) * 2019-01-14 2020-07-29 Vividq Ltd Holographic display system and method
US11595670B2 (en) * 2019-10-02 2023-02-28 Nokia Technologies Oy Method and apparatus for storage and signaling of sub-sample entry descriptions
EP4131961A4 (en) * 2020-04-13 2023-09-13 LG Electronics, Inc. Device for transmitting point cloud data, method for transmitting point cloud data, device for receiving point cloud data, and method for receiving point cloud data
CN114241119A (en) * 2020-09-07 2022-03-25 Shenzhen Jinghong Technology Co., Ltd. Game model generation method, device and system and computer storage medium
CN114697668B (en) * 2022-04-22 2023-06-30 Tencent Technology (Shenzhen) Co., Ltd. Encoding and decoding method of point cloud media and related products

Also Published As

Publication number Publication date
CN116744007A (en) 2023-09-12
WO2023202095A1 (en) 2023-10-26
CN114697668A (en) 2022-07-01

Similar Documents

Publication Title
CN114697668B (en) Encoding and decoding method of point cloud media and related products
US11805304B2 (en) Method, device, and computer program for generating timed media data
CN109691094A Method for sending omnidirectional video, method for receiving omnidirectional video, device for sending omnidirectional video, and device for receiving omnidirectional video
KR102655630B1 (en) Method and device for generating media files containing 3D video content and method and device for playing 3D video content
EP3888375A1 (en) Method, device, and computer program for encapsulating media data into a media file
JP7434574B2 (en) Point cloud data transmitting device, point cloud data transmitting method, point cloud data receiving device, and point cloud data receiving method
CN115379189B (en) Data processing method of point cloud media and related equipment
CN114697631B Immersive media processing method, device, equipment and storage medium
WO2023226504A1 (en) Media data processing methods and apparatuses, device, and readable storage medium
KR20220071256A (en) File Formats for Point Cloud Data
WO2023169003A1 (en) Point cloud media decoding method and apparatus and point cloud media coding method and apparatus
CN115396647B Data processing method, device and equipment for immersive media and storage medium
CN114374675B (en) Media file encapsulation method, media file decapsulation method and related equipment
CN115396646B (en) Data processing method of point cloud media and related equipment
CN115243053B (en) Point cloud encoding and decoding method and related equipment
KR102661694B1 (en) Media file encapsulation method, media file decapsulation method, and related devices
US20230360277A1 (en) Data processing method and apparatus for immersive media, device and storage medium
CN116347118A Data processing method of immersive media and related equipment
CN115396647A Data processing method, device and equipment for immersive media and storage medium
CN115426502A (en) Data processing method, device and equipment for point cloud media and storage medium
CN115037943A (en) Media data processing method, device, equipment and readable storage medium
CN115061984A (en) Data processing method, device, equipment and storage medium of point cloud media
JP2023551010A (en) Multi-atlas encapsulation of immersive media
CN117082262A (en) Point cloud file encapsulation and decapsulation method, device, equipment and storage medium
CN116781675A (en) Data processing method, device, equipment and medium of point cloud media

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40070805; Country of ref document: HK)

GR01 Patent grant