GB2614100A - Method, device, and computer program for optimizing encapsulation of point cloud data - Google Patents


Info

Publication number
GB2614100A
GB2614100A (application GB2205705.3A)
Authority
GB
United Kingdom
Prior art keywords
sub
frame
frames
timing information
point cloud
Prior art date
Legal status
Granted
Application number
GB2205705.3A
Other versions
GB2614100B (en)
GB202205705D0 (en)
Inventor
Denoual Franck
Maze Frédéric
Ouedraogo Naël
Ruellan Hervé
Tocze Lionel
Current Assignee
Canon Inc
Original Assignee
Canon Inc
Priority date
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Publication of GB202205705D0
Priority to PCT/EP2022/086218 (published as WO2023111214A1)
Publication of GB2614100A
Application granted
Publication of GB2614100B
Legal status: Active
Anticipated expiration

Classifications

    • G06T 9/00 Image coding
    • G06T 9/001 Model-based coding, e.g. wire frame
    • H04N 19/188 Adaptive coding characterised by the coding unit, the unit being a video data packet, e.g. a network abstraction layer [NAL] unit
    • H04N 19/42 Coding characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 19/597 Predictive coding specially adapted for multi-view video sequence encoding
    • H04N 21/2343 Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N 21/4223 Cameras
    • H04N 21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/816 Monomedia components involving special video data, e.g. 3D video
    • H04N 21/8456 Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • H04N 21/85406 Content authoring involving a specific file format, e.g. MP4 format
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Abstract

A method of encapsulating point cloud data in an ISOBMFF media file 700 comprising obtaining 3D points organised in frames; determining sub-frames of the frames with associated timing information; generating metadata comprising the timing information; and encapsulating the 3D points and generated metadata in a media file, wherein each frame is encapsulated as a sample. The timing information may be determined using the acquisition or rendering time of the associated points, may be defined as one of time interval, time stamp, or frame index, and may be relative to a frame or a track. The generated metadata may comprise a frame rate enabling determining of the timing of sub-frames when combined with the frame index. Timing information common to multiple sub-frames may be provided in one of a sub-frame configuration, a sub-sample description 725, or a sample group description. The sub-frames may be described in different tracks. The number of sub-frames may vary between frames and the number of 3D-points may vary between sub-frames. A respective decapsulation method is also disclosed.

Description

METHOD, DEVICE, AND COMPUTER PROGRAM FOR OPTIMIZING
ENCAPSULATION OF POINT CLOUD DATA
FIELD OF THE DISCLOSURE
The present disclosure relates to encapsulation of data, in particular of uncompressed or compressed point cloud data, in a standard and interoperable format, for example to store or transmit point cloud frames comprising 3D points acquired sequentially from a sensor and/or to be rendered sequentially.
BACKGROUND OF THE DISCLOSURE
The Moving Picture Experts Group (MPEG) is standardizing the compression and storage of point cloud data (also called volumetric media data). Point cloud information consists of sets of 3D points with associated attribute information such as colour, reflectance, and frame index.
On the one hand, MPEG-I Part-9 (ISO/IEC 23090-9) specifies Geometry-based Point Cloud Compression (G-PCC) and a bit-stream syntax for point cloud information. According to MPEG-I Part-9, a point cloud is an unordered list of points comprising geometry information, optional attributes, and associated metadata.
Geometry information describes the location of the points in a three-dimensional Cartesian coordinate system. Attributes are typed properties of each point, such as colour or reflectance. Metadata are items of information used to interpret the geometry information and the attributes. The G-PCC compression specification (MPEG-I Part-9) defines specific attributes like the frame index attribute or the frame number attribute, with a reserved attribute label value (3 to indicate a frame index attribute and 4 to indicate a frame number attribute), it being recalled that, according to MPEG-I Part-9, a point cloud frame is a set of points at a particular time instance. A point cloud frame may be partitioned into one or more ordered sub-frames. A sub-frame is a partial representation of a point cloud frame consisting of points with the same frame number or frame index attribute value.
For example, a sub-frame may be a set of points with their attributes within a point cloud frame that share a common acquisition, capture, or rendering time. As another example, a sub-frame may be a set of points with their attributes within a point cloud frame that were successively acquired or captured during a given time range or that should be rendered in a given time range. As yet another example, a sub-frame may be a set of points with their attributes within a point cloud frame that were acquired according to a laser shot direction or corresponding to a part of the scanning path of the 3D sensor. Still in MPEG-I Part-9, a point cloud frame is indicated by a FrameCtr variable, possibly using a frame boundary marker data unit or parameters in some data unit header (a frame_ctr_lsb syntax element).
On the other hand, MPEG-I Part-18 (ISO/IEC 23090-18) specifies a media format that makes it possible to store and deliver geometry-based point cloud compression data. It also supports flexible extraction of geometry-based point cloud compression data at delivery and/or decoding time. According to MPEG-I Part-18, point cloud frames are encapsulated in one or more G-PCC tracks, a sample in a G-PCC track corresponding to a single point cloud frame. Each sample comprises one or more G-PCC units which belong to the same presentation or composition time. A G-PCC unit is one type-length-value (TLV) encapsulation structure containing one of SPS, GPS, APS, tile inventory, frame boundary marker, geometry data unit, and attribute data units. The syntax of the TLV encapsulation structure is defined in Annex B of ISO/IEC 23090-9.
While the ISOBMFF file format has proven to be efficient to encapsulate point cloud data, there is a need to improve encapsulation efficiency, in particular to improve the description of encapsulated data in order to optimize the access to particular items of data.
SUMMARY OF THE DISCLOSURE
The present disclosure has been devised to address one or more of the foregoing concerns.
In this context, there is provided a solution for improving encapsulation of point cloud data.
According to a first aspect of the disclosure there is provided a method for encapsulating point cloud data in a file compliant with an ISOBMFF based standard, the method comprising: obtaining point cloud data comprising 3D points, the obtained point cloud data being organized in at least one point cloud frame; determining at least two sub-frames of the at least one point cloud frame, a timing information being associated with each of the at least two sub-frames, the timing information associated with one of the at least two sub-frames being different from the timing information associated with the other of the at least two sub-frames; generating metadata describing the at least two sub-frames, the generated metadata comprising the timing information associated with each of the at least two sub-frames; and encapsulating the obtained 3D points and the generated metadata in the file, each point cloud frame being encapsulated as one or more samples.
Accordingly, the method of the disclosure makes it possible to describe sub-frames in a file containing encapsulated point cloud data, making it possible, in particular, to identify or access and to extract specific point cloud data, without decoding data units that are not requested, enabling for example partial rendering.
According to some embodiments, the timing information associated with at least one of the at least two sub-frames is determined as a function of acquisition timing information associated with 3D points of the at least one of the at least two sub-frames or as a function of rendering timing information associated with 3D points of the at least one of the at least two sub-frames.
Still according to some embodiments, the timing information associated with at least one of the at least two sub-frames is defined as a time interval or as a timestamp. Still according to some embodiments, the timing information associated with at least one of the at least two sub-frames is a frame index.
Still according to some embodiments, the generated metadata further comprise a description of the at least one point cloud frame, the description of the at least one point cloud frame comprising a frame rate enabling determining timing of sub-frames when combined with the frame index.
Still according to some embodiments, the timing information is relative to a frame or is relative to a track.
Still according to some embodiments, common timing information associated with the at least two sub-frames is provided within a sub-frame configuration structure of the metadata.
Still according to some embodiments, the timing information associated with at least one of the at least two sub-frames is provided within a sub-sample description.
Still according to some embodiments, the timing information associated with at least one of the at least two sub-frames is provided within a sample group description. Still according to some embodiments, the at least two sub-frames are described in different tracks.
Still according to some embodiments, the number of sub-frames per frame varies from one frame to another.
Still according to some embodiments, the number of 3D points within one of the at least two sub-frames is different from the number of 3D points within the other of the at least two sub-frames.
According to a second aspect of the disclosure there is provided a method for parsing point cloud data encapsulated in a file compliant with an ISOBMFF based standard, the point cloud data comprising 3D points and being organized in at least one point cloud frame, each point cloud frame being encapsulated as one or more samples, the method comprising: obtaining metadata from the file, identifying, from the obtained metadata, timing information associated with each of at least two sub-frames of the at least one point cloud frame, the timing information associated with one of the at least two sub-frames being different from the timing information associated with the other of the at least two sub-frames, and obtaining 3D points of at least one of the at least two sub-frames, the 3D points being obtained as a function of the timing information associated with the at least one of the at least two sub-frames.
Accordingly, the method of the disclosure makes it possible to obtain a description of sub-frames in a file containing encapsulated point cloud data, making it possible, in particular, to access and to extract specific point cloud data, without decoding data units that are not requested, enabling for example partial rendering.
According to some embodiments, the timing information associated with at least one of the at least two sub-frames is representative of acquisition timing information associated with 3D points of the at least one of the at least two sub-frames or representative of rendering timing information associated with 3D points of the at least one of the at least two sub-frames.
Still according to some embodiments, the timing information associated with at least one of the at least two sub-frames is defined as a time interval, as a timestamp, or as a frame index.
Still according to some embodiments, the at least two sub-frames are described in different tracks, wherein the number of sub-frames per frame varies from one frame to another, and/or wherein the number of 3D points within one of the at least two sub-frames is different from the number of 3D points within the other of the at least two sub-frames.
Still according to some embodiments, the timing information associated with each of the at least two sub-frames is provided within a sub-frame configuration structure, a sub-sample description, or within a sample group description.
According to another aspect of the disclosure there is provided a device comprising a processing unit configured for carrying out each of the steps of the method described above.
This aspect of the disclosure has advantages similar to those mentioned above.
At least parts of the methods according to the disclosure may be computer implemented. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", "module" or "system". Furthermore, the present disclosure may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since the solution of the present disclosure can be implemented in software, the solution of the present disclosure can be embodied as computer readable code for provision to a programmable apparatus on any suitable carrier medium. A tangible carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid state memory device and the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
BRIEF DESCRIPTION OF THE DRAWINGS
Some embodiments of the disclosure will now be described, by way of example only, and with reference to the following drawings in which:
Figure 1 illustrates an example of a system wherein the invention can be implemented;
Figure 2 illustrates the acquisition or capture of point cloud data using a 3D sensor of the LIDAR type;
Figure 3 illustrates an example of steps of an encapsulation process according to some embodiments of the invention;
Figures 4a, 4b, and 4c illustrate an example of different G-PCC frame and sub-frame configurations;
Figures 5a to 5e illustrate different track organizations for the encapsulation of PCC frames and sub-frames into a media file;
Figure 6 illustrates an example of steps of a parsing process according to some embodiments of the invention, making it possible to identify metadata structures providing ATI in G-PCC tracks and to extract a subset of data corresponding to a G-PCC sub-frame or a set of G-PCC sub-frames;
Figure 7 illustrates encapsulation of additional timing information (ATI) in a sub-sample information box ('subs' box) of a G-PCC track;
Figure 8 illustrates a media file having a G-PCC track containing an additional timing information (ATI) description defined in a sample group description box that is used with a G-PCC unit mapping sample group;
Figure 9 illustrates a media file having a single G-PCC track, described in a 'trak' box of a 'moov' box, containing a sub-frame description through packing of samples;
Figure 10 illustrates an example of the organization of point cloud data to encapsulate and a resulting encapsulated media file, using a single PCC track, providing sub-frame description and access;
Figure 11 illustrates an encapsulated media file with multiple PCC tracks, based on sub-frames; and
Figure 12 schematically illustrates a processing device configured to implement at least one embodiment of the present disclosure.
DETAILED DESCRIPTION OF THE DISCLOSURE
According to some embodiments of the disclosure, additional metadata are provided in G-PCC tracks to provide description of sub-frames (i.e. parts of a PCC sample with a same acquisition or capture time or a same rendering or presentation time or composition time). This improves time granularity and enables, depending on the 3D sensor that is used, providing spatial access (e.g. the direction of a laser shot) within a PCC sample.
These additional metadata may comprise additional timing information (ATI) such as acquisition time or rendering time information in the description of PCC tracks. This timing information is additional to classical sample timing information, like decoding time or composition time, contained in the sample description boxes. This ATI may be associated with sub-samples to provide a description of PCC sub-frames (e.g. from MPEG-I Part-9) in G-PCC tracks. ATI may also be called "sub-frame descriptor", "sub-frame description", or "sub-frame description structure". Such additional information allows identifying, accessing, or extracting data units corresponding to a sub-frame, i.e. data units within a sample corresponding to a given acquisition or rendering time (or a consecutive range of acquisition or rendering times) or to a given shot direction of the sensor producing the point cloud.
In other words, timestamp information provided by 3D sensors (e.g. radar based on laser beam scanning, spinning LIDARs, MEMS (Micro Electro Mechanical Systems) LIDARs, etc.) able to generate a point cloud with timestamp information associated with the recorded points may be used during the encapsulation process, in particular to encapsulate sub-frames.
Figure 1 illustrates an example of a system wherein the invention can be implemented. More precisely, the invention may be used in a media file writer such as media file writer 100 or in a media player such as media player 130 or in both.
As illustrated, media file writer 100 takes point cloud data (or volumetric data), such as point cloud data 150, as input. Point cloud data 150 may be obtained from a 3D sensor, as described by reference to Figure 2. The point cloud data may be received as uncompressed raw data or as a compressed bit-stream, for example a compressed bit-stream complying with the MPEG-I Part-9 standard. Media file writer 100 comprises encapsulation module 102.
Media file writer 100 may be connected, via a network interface (not represented), to a communication network 120 to which may also be connected, via a network interface (not represented), media player (or reader) 130 comprising a de-encapsulation module 132.
Media file writer 100 may be used to serve media files, for example using a protocol for dynamic adaptive streaming on HTTP like DASH (Dynamic Adaptive Streaming over HTTP) or HTTP Live Streaming. These protocols require a streaming manifest, such as a Media Presentation Description (MPD), or a playlist. When used to stream encapsulated media content, media file writer 100 may contain a manifest generation module such as manifest generation module 104. Media file writer 100 may also contain a compression module such as compression module 106 to compress the input point cloud data into a compressed bit-stream, for example using a point cloud compression algorithm like the one described in MPEG-I Part-9.
Encapsulation module 102 may encapsulate received point cloud data according to an ISOBMFF-based format like MPEG-I Part-18, for interoperability purposes, in order to generate a media file like media file 152 that may be stored for later use by a player or by an image analysis tool or that may be transmitted to a media player or streaming client. The encapsulation process carried out in encapsulation module 102 is further described in reference to Figure 3.
Media file writer 100 may be controlled and parameterized by a user, for example through a graphical user interface, or by an application, for example by application code or scripting. To process compressed point cloud data, for example to process a bit-stream of compressed point cloud data complying with MPEG-I Part-9, encapsulation module 102 may contain a G-PCC unit parser that can read the header of G-PCC units, for example to determine the length (e.g. in bytes, like the tlv_num_payload_bytes syntax element) of the data corresponding to the unit or the unit type (e.g. the tlv_type syntax element). The G-PCC unit parser may also be able to parse header information for some G-PCC units, like for example the attribute header (to obtain its type), and may also be able to parse parameter sets to obtain general information on the bit-stream. To process uncompressed point cloud data, for example data obtained directly from a 3D sensor, encapsulation module 102 may contain a point cloud data parser that can read the point positions and their attributes directly from the captured raw data (e.g. a .ply or .pcd file parser). The media writer may be embedded in a recording device, in a multi-sensor camera device, on a vehicle embedding 3D sensors, or be part of software tools in a studio where volumetric data is acquired.
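For illustration purposes only, a minimal TLV reading loop of the kind such a G-PCC unit parser may implement is sketched below in Python, assuming the simplified TLV layout of ISO/IEC 23090-9 Annex B (an 8-bit tlv_type followed by a 32-bit tlv_num_payload_bytes); the function name and field widths are assumptions to be checked against the specification:

    import struct

    def parse_tlv_units(data: bytes):
        # Yield (tlv_type, payload) pairs from a G-PCC TLV byte stream.
        # Assumes each unit starts with an 8-bit tlv_type followed by a
        # 32-bit big-endian tlv_num_payload_bytes (simplified sketch of
        # ISO/IEC 23090-9 Annex B, not a conformant parser).
        offset = 0
        while offset + 5 <= len(data):
            tlv_type = data[offset]
            (num_payload_bytes,) = struct.unpack_from('>I', data, offset + 1)
            payload = data[offset + 5:offset + 5 + num_payload_bytes]
            yield tlv_type, payload
            offset += 5 + num_payload_bytes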
Media file 152 may consist of a single media file or of a set of media segment files, for example ISOBMFF segments (an ISO base media file containing one or more segments). The media file may be a fragmented file, for example for live acquisition or capture and encapsulation, or for live (or low-latency) streaming. It may comply with the ISOBMFF standard or with standard specifications derived from ISOBMFF (e.g. MPEG-I Part-18).
Media player (or reader) 130 may be a streaming client, the streaming features being handled by a streaming module like streaming module 134, for example implementing a DASH or HLS client, for requesting a media file such as media file 152 and for adapting the transmission parameters. Media player 130 may also contain a decompression module 136 taking as input a bit-stream representing compressed point cloud data, for example a bit-stream complying with MPEG-I Part-9, and generating point cloud data (or volumetric data) for rendering or analysis. Media file 152 may be read from a storage location or streamed using the streaming module 134. The data may be read at once or by chunks, segments, or fragments and provided to de-encapsulation module (or parsing module) 132.
De-encapsulation module (or parsing module) 132 then extracts the, or a subset of the, encapsulated point cloud data, depending on the player configuration, on the choices of a user, or on the parameters of an application using the media player. The extracted point cloud data may result in a bit-stream such as a bit-stream complying with MPEG-I Part-9. In such a case, the bit-stream is provided to a decompression module (an external decompression module or an internal decompression module, e.g. internal decompression module 136) for the reconstruction of the point cloud data 154 for usage by a user or application (for example visualization or analysis). The parsing process is further described in reference to Figure 6. A media player may be embedded in a display device (e.g. a smartphone, tablet, PC, vehicle with multimedia screen, etc.) or in software tools in a studio for volumetric data production.
Acquiring point cloud data
The inventors have observed that while point cloud data are often organized by frames, each frame being associated with a particular time instance, the 3D points of a point cloud frame are generally acquired sequentially by the sensor that is used. Storing acquisition, capture, or rendering time information in relation with encapsulated point cloud data may be useful, for example to enable direct access to specific data within a frame.
Figure 2 illustrates the acquisition or capture of point cloud data using a 3D sensor of the LIDAR (light detection and ranging) type. Some sensors, like rotating or spinning LIDARs, may cover an entire sphere, as illustrated in Figure 2 with reference 'A', by successively scanning the sphere from a point of origin. For example, a typical LIDAR sensor emits pulsed light waves into the surrounding environment by first scanning along the elevation angle, as illustrated in Figure 2 with reference 'B', and then along the azimuth angle, as illustrated in Figure 2 with reference 'C'. These emitted pulses (or laser shots) bounce off surrounding objects and return to the sensor. The sensor uses the time it took for each pulse to return to the sensor to calculate the distance it travelled. Each laser shot for a given direction (i.e. a pair of angles (azimuth, elevation) or (phi, theta)) makes it possible to obtain a distance, the direction and the obtained distance defining a 3D point, possibly described by Cartesian coordinates as illustrated in Figure 2 with reference 'A'. As illustrated in Figure 2 with reference 'D', the measured distances may be stored in cells of an array wherein rows and columns represent elevation and azimuth, respectively, it being noted that some cells may be empty. By repeating this process, the sensor can build a 3D map of the environment. The 3D map may cover a 360° field of view or less than 360°, depending on the sensor characteristics (or configuration). The characteristics may describe the field of view, for example as covered theta or phi ranges, and the angular resolution (or angle step) that indicates the difference between two shots in units of azimuth or elevation angle.
An emitted laser shot hitting an object and returning to the sensor generates one (or possibly more than one) point on the resulting image. The acquisition may be done at a frame rate, which is another parameter of the 3D sensor, defining the acquisition time between two frames, a frame corresponding, for example, to the capture of the maximum field of view supported by the device (e.g. a complete 360° capture when the device allows covering a 360° field of view). As observed above, the points contained in a frame may be successively acquired. As a consequence, these points may have different acquisition times and/or may have different rendering times later on.
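For illustration, recovering Cartesian coordinates from a shot direction and measured distance is simple trigonometry; the Python sketch below assumes elevation is measured from the horizontal plane and azimuth around the vertical axis (actual sensor conventions may differ, and the function name is illustrative):

    import math

    def shot_to_cartesian(azimuth_rad: float, elevation_rad: float, distance: float):
        # Convert an (azimuth, elevation, distance) laser return to (x, y, z),
        # assuming elevation from the horizontal plane and azimuth around
        # the vertical axis; real sensors may use other conventions.
        x = distance * math.cos(elevation_rad) * math.cos(azimuth_rad)
        y = distance * math.cos(elevation_rad) * math.sin(azimuth_rad)
        z = distance * math.sin(elevation_rad)
        return x, y, z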
The set of frames obtained from the 3D sensor results in a point cloud sequence. Each frame may be associated with a frame counter or frame time and points within the frame may also have additional timing information, corresponding for example to their capture time by the 3D sensor.
Naturally, there exist other types of 3D sensors wherein 3D data of the same frame are acquired successively.
It is observed that there exist applications that can take advantage of using such timing information. For example, there exist neural networks that may rely on such timing information to guide autonomous vehicles (autonomous guided vehicles, cars, drones, or robots) in a real environment. For such applications, a precise measurement of the timing information associated with the cloud of points may be necessary for accurate estimation. There may also be applications for augmented reality where a 3D object is composed onto natural video, for telepresence, or for topography measurement applications. Accordingly, 3D sensors may associate timestamp information with each detected point. Such timestamps may be stored as attributes, in addition to colour and reflectance. It is also to be noted that for compressed point cloud data, several frames may be packed or combined into one combined or aggregated frame to improve the compression efficiency; thus a frame in a point cloud bit-stream may not correspond to a frame at the output of a 3D sensor, and may have different timing information (a timing resulting from encoding choices and a timing corresponding to a capture time).
Encapsulation process, G-PCC configuration, and track organization
Figure 3 illustrates an example of steps of an encapsulation process according to some embodiments of the invention, making it possible to identify timing information in input data, for example in point cloud data 150 in Figure 1, and to add description of sub-frames in G-PCC tracks. For the sake of illustration, the steps illustrated in Figure 3 may be carried out by encapsulation module 102 in Figure 1. According to the illustrated example, a first step (step 300) is directed to initializing the reception of the point cloud data (or volumetric data) to encapsulate. Next, the encapsulation module is configured (step 305) as a function of the type of data to be processed (e.g., depending on whether the point cloud data are compressed or not, or on whether the point cloud data are analyzed or not). This can be done by a user or by an application. For the sake of illustration, the configuration may comprise choosing a single track or multi-track encapsulation, choosing a live or offline encapsulation, choosing a description mode, for example describing tiles or not, or determining the granularity of the timing, for example whether timing information should be associated with each point of the cloud or with sets of points over a capture time range, etc. In the following, it is considered that timing information is associated with each point of the cloud or with sets of points of the cloud forming sub-frames. It is to be noted that when combined or aggregated frames are encoded as a single frame, each frame that is part of the combination or aggregation may be considered as a sub-frame, or as a G-PCC sub-frame when compressed with MPEG-I Part-9. Several points may share the same timing information or each point may have its own timing information. Timing information may be incremental from one point to another, following a scanning path of the 3D sensor. The timing increments may be fixed or variable depending on the 3D sensor.
Configuration step 305 may also be used to indicate whether the same configuration parameters should apply for encapsulating the whole point cloud data (static configuration) or may change when encapsulating the point cloud data (dynamic configuration). In case the media file writer also contains a compression module, for example compression module 106 in Figure 1, the configuration step may comprise setting parameters for the encoder: for example setting the display frame rate, setting an acquisition frame rate, setting a maximum number of sub-frames allowed in a frame, setting the number of sub-frames per frame, setting whether the number of sub-frames per frame is fixed or not, setting whether the sampling rate or time difference between sub-frames is constant or variable, or setting parameters of the 3D sensor like field of view, angular resolution, description of the scanning path, etc. The compression module may encode these configuration parameters, for example as additional syntax elements of a G-PCC bit-stream, for example in the Sequence Parameter Set, in a volumetric usage information unit with a dedicated TLV type, or as a supplemental enhancement information message, also with a dedicated TLV type. When the point cloud data are received by the encapsulation module as a G-PCC bit-stream (e.g. a bit-stream generated by compression module 106 in Figure 1, or by an external compression module), the configuration of the encapsulation module may use information from the parameter sets of the bit-stream or supplemental information associated with the bit-stream (sometimes called SEI (Supplemental Enhancement Information) messages).
Supplemental information means encoded parameters that are not mandatory to decode the point cloud data but that may help applications using these point cloud data by providing additional information.
Further to the configuration of the encapsulation module, metadata structures of a media file such as top-level boxes (e.g., 'ftyp' or 'styp', 'moov', 'trak', 'mdat', and boxes for sample description like 'stbl', 'stsd', etc.) are created during an initialization step (step 310). Such an initialization step may comprise reading parameter sets (e.g. geometry and attribute parameter sets) from an encoded bit-stream of point cloud data or may comprise obtaining information about the sensor (for uncompressed data) like a number of points or the types of attributes associated with the points (e.g., colour, reflectance, timestamp, areas of interest, etc.). It is noted that some of the setting parameters defined in configuration step 305 may be reflected in the track description or sample description. As well, supplemental information, when available, may be entirely or partially included in the media file. Parts of the configuration information (configuration parameters, parameter sets, or supplemental information) may be gathered in a metadata structure dedicated to sub-frame configuration information. For example, a metadata structure dedicated to sub-frame configuration information may contain an acquisition frame rate, a point sampling rate, a number of sub-frames within a frame, or a number of frames combined in a combined or aggregated frame. Depending on static or dynamic configuration, a metadata structure dedicated to sub-frame configuration information may be included in different parts of a track description. For static configuration, it may be included as an additional optional box in a sample entry or within a GPCCConfigurationBox box, or as a dedicated box in the track description. For dynamic configuration, a metadata structure dedicated to sub-frame configuration information may be provided as entries in a SampleGroupDescriptionBox with a specific grouping type so that groups of samples sharing the same configuration can be mapped to one of these entries. In a variant for the dynamic configuration, a metadata structure dedicated to sub-frame configuration information may be contained within the sub-frame description itself, for example as a SubFrameConfigurationGroupEntry inheriting from a VolumetricVisualSampleGroupEntry:

    class SubFrameConfigurationGroupEntry() extends VolumetricVisualSampleGroupEntry('sfcf') {
        unsigned int(16) nb_subframes;
        // optional parameters:
        unsigned int(16) max_nb_subframes;
        unsigned int(32) capture_timescale;
        unsigned int(32) sampling_rate;
        unsigned int(16) angular_resolution;
        // any other parameters describing the 3D sensor or sub-frame configurations
    }

where nb_subframes provides the number of sub-frames for the samples mapped to this sample group entry. The capture_timescale may be an integer that specifies the number of time units that pass in one second. It may be present when the timescale from the 'mdhd' box is not suitable, for example when it does not have sufficient granularity. The max_nb_subframes parameter optionally indicates the maximum number of sub-frames allowed in a sample. It may be indicated for dynamic configuration when the number of sub-frames per sample may vary along a point cloud sequence. The sampling_rate parameter may be an integer providing an acquisition rate of the 3D sensor, expressed in capture_timescale units when present or from the timescale in the 'mdhd' box otherwise. It may be used to compute a presentation or composition time for the sub-frames.
For instance, in a sample, the timing or presentation time or composition time of a sub-frame, identified by a given frame index attribute value subframe_idx, may be computed as follows when the capture_timescale parameter is defined:

    CT(sub-frame) = CT(sample) + subframe_idx * sampling_rate / capture_timescale

where CT(sub-frame) is the presentation or composition time for the sub-frame, CT(sample) is the presentation or composition time for the sample containing the sub-frame, and subframe_idx is the index of the sub-frame in the sample.
Alternatively, when the capture_timescale parameter is not defined, the timing or presentation time or composition time of the sub-frame may be computed as follows:

    CT(sub-frame) = CT(sample) + subframe_idx * sampling_rate / timescale

where CT(sub-frame) is the presentation or composition time for the sub-frame, CT(sample) is the presentation or composition time for the sample containing the sub-frame, subframe_idx is the index of the sub-frame in the sample, and timescale is the timescale from the 'mdhd' box.
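Expressed as code, the two computations above differ only in the timescale that is used; the following Python sketch mirrors the formulas with the parameter names of the SubFrameConfigurationGroupEntry (the function itself is illustrative, not part of any specification):

    from typing import Optional

    def subframe_composition_time(ct_sample: float, subframe_idx: int,
                                  sampling_rate: int, timescale: int,
                                  capture_timescale: Optional[int] = None) -> float:
        # Composition time of a sub-frame, following the formulas above:
        # uses capture_timescale when defined, otherwise the timescale
        # from the 'mdhd' box (illustrative helper, not normative).
        scale = capture_timescale if capture_timescale is not None else timescale
        return ct_sample + subframe_idx * sampling_rate / scale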
It is to be noted that, as a variant for the static configuration, a default sample grouping, further indicated as a static sample group (by setting the static group description and static mapping flags values to 1), may also be used to describe sub-frame configuration information. When a metadata structure dedicated to sub-frame configuration information is included as an additional optional box, for example named SubFrameConfigurationBox and identified by the 'sfcg' four-character code, the fields of this box may be the same as in a SubFrameConfigurationGroupEntry.
Next, the encapsulation process enters a processing loop for processing each frame of the point cloud data.
After reading a point cloud frame (step 315), for example unit by unit from a bit-stream of compressed point cloud data or point by point from a sensor record (e.g. a .ply or .pcd file), the encapsulation module determines whether the read data correspond to attribute data (test 320). Determining the type of the read data may consist in determining the type of a G-PCC unit by reading its type using a G-PCC unit parser (not represented). If it is determined that the read data correspond to attribute data, the type of attribute data is further checked by the G-PCC unit parser (test 325) to determine whether the attribute data comprise additional timing information. It is recalled that for a G-PCC bit-stream, attribute data correspond to a TLV unit of ADU (Attribute Data Unit) type. Therefore, at step 325, the encapsulation module may check if the attribute data unit read in step 315 provides additional timing information for some points of the frame.
For example, for a G-PCC bit-stream, such points of the frame may correspond to a G-PCC sub-frame (according to MPEG-I Part-9). Accordingly, the additional timing information may come as a G-PCC attribute data unit of type frame index (attr_label equal to 3) or of type frame number (attr_label equal to 4). It may also come, in case of uncompressed received data, as a timestamp value associated with the point of the cloud, as determined from the sensor data (using a .ply or .pcd file reader).
If additional timing information is provided for some points of the frame, a new metadata structure for additional timing information (ATI) is created (step 330). The creation of this metadata structure may consist in collecting parameters describing the timing information associated with some points of the frame, for example within a sub-frame. The timing information for a frame is described as part of the sample description, for example in the 'stts' or 'ctts' box or, in case of a fragmented media file, in a 'trun' box or in a box for track fragment description. The parameters are, for example, an index (e.g., within a frame, for example the value of a frame index attribute for a G-PCC bit-stream, or within a sequence, for example the value of a frame number attribute for a G-PCC bit-stream). It may also be a range of indexes (start, end or first, last) to aggregate points with different timestamps into the same sub-frame.
In another variant, the ATI may contain a time indication, as a decoding time in microseconds, milliseconds, or as timescale units (for example the presentation timescale declared in the MediaHeaderBox 'mdhd' box of the track, or a capture timescale in the sub-frame configuration structure). In another variant, the ATI may contain a time range indication, as a decoding time in milliseconds or as timescale units (for example the timescale declared in the MediaHeaderBox 'mdhd' box of the track, or a capture timescale in the sub-frame configuration structure), to aggregate points with different timestamps into a same sub-frame. For aggregated points, the ATI may consist of a time value (in microseconds, milliseconds, or as timescale units) being the average of the different timestamps. In a variant, the time value for aggregated points is the timestamp of the first point of the aggregated points (the earliest captured one). For aggregated points, the ATI may contain a presentation duration, especially when the time information is not provided as a range.
The timing information, either as an index, a time unit, or as a range of indexes or time units, may be relative to a frame or relative to the point cloud sequence (in other words, absolute time).
The ATI may also contain parameters describing the direction of the laser shot as a pair of angles, e.g. (azimuth, elevation) or (theta, phi), or may contain a description of the 3D region containing the points having the same ATI. Alternative embodiments of the metadata structure for ATI in the media file are described later in reference to Figures 7 to 11. In a first variant, the additional timing information is generated once, when the read data are the first data containing the additional timing information. In another variant, the ATI is generated for each attribute data unit with additional timing information.
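To summarize the variants above, the following Python sketch models the items of information an ATI structure may carry; all field names are illustrative and not normative, and in practice only one of the index or time variants would typically be present:

    from dataclasses import dataclass
    from typing import Optional, Tuple

    @dataclass
    class AdditionalTimingInfo:
        # Illustrative model of an ATI record; field names are not normative.
        frame_index: Optional[int] = None              # frame index or frame number attribute value
        index_range: Optional[Tuple[int, int]] = None  # (first, last) indexes for aggregated points
        time: Optional[int] = None                     # time indication, in timescale units
        time_range: Optional[Tuple[int, int]] = None   # time range, in timescale units
        duration: Optional[int] = None                 # presentation duration for aggregated points
        shot_direction: Optional[Tuple[float, float]] = None  # (azimuth, elevation) of the laser shot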
Next, the data unit is indexed and stored (step 335) as for other data units. Indexing may consist in updating the sample description, especially the sample size, considering the number of bytes of the read data unit. Storage may consist in appending the data unit to a Media Data Box (e.g. 'mdat', 'idat', or 'imda' box). When the last data unit for a sample is reached (not represented), the sample description is appended in the appropriate sub-boxes of the sample table box (or in the 'trun' box when the media file is fragmented). The above steps are repeated until no more data are to be encapsulated (test 340).
It is to be noted that the sub-frames may be defined with a range of values characterizing a time difference between timing information associated with 3D points of the sub-frames. This may be useful when the encoding provides very high granularity (e.g. a few microseconds) for the frame index attribute and a trade-off needs to be set between access granularity (e.g. a millisecond) and the byte overhead of the sub-frame description. This trade-off may be defined as part of the configuration step of the encapsulation process, as illustrated by the sketch below.
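As an illustration of this trade-off, the Python sketch below groups per-point timestamps (assumed here to be in microseconds) into sub-frames of a configurable duration; this is one possible way an encapsulation module might coarsen the timing granularity, not a normative procedure:

    def group_points_into_subframes(timestamps_us, granularity_us=1000):
        # Map each point (by index) to a sub-frame bucket of the given
        # duration; coarsening microsecond timestamps to e.g. 1 ms buckets
        # reduces the number of sub-frames to describe, at the cost of
        # access granularity.
        subframes = {}
        for point_idx, t in enumerate(timestamps_us):
            subframes.setdefault(t // granularity_us, []).append(point_idx)
        return subframes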
Figures 4a, 4b, and 4c illustrate an example of different G-PCC frame and sub-frame configurations.
As illustrated in Figure 4a, configuration 400 corresponds to point cloud data forming a point cloud sequence comprising several point cloud frames, here point cloud frames 405-1 to 405-m. According to this configuration, each frame is composed of the same number of sub-frames (four in this example), for example sub-frames 410-11 to 410-14 and sub-frames 410-m1 to 410-m4, and the point sampling rate is constant resulting in the same number of points in each sub-frame.
According to the example illustrated in Figure 4b, configuration 420 corresponds to point cloud data forming a point cloud sequence comprising several point cloud frames referenced 425-1 to 425-n. Each frame is composed of sub-frames whose number may vary from one frame to another. For the sake of illustration, frame 425-1 comprises three sub-frames referenced 430-11 to 430-13 while frame 425-n comprises four sub-frames referenced 430-n1 to 430-n4. This may come from a different mapping of acquisition or rendering times to sub-frames, considering compression efficiency. Still according to this example, the point sampling rate is constant, resulting in the same number of points in each sub-frame.
Configuration 440 illustrated in Figure 4c corresponds to point cloud data forming a point cloud sequence comprising several point cloud frames referenced 445-1 to 445-p. Each frame may be composed of a different (as illustrated) or the same number of sub-frames, and the point sampling rate is not constant, resulting in a different number of points in each sub-frame. For example, sub-frame 450-21 of frame 445-2 contains more points than sub-frame 450-22 of the same frame. This may be because the acquisition or rendering time range associated with sub-frame 450-21 is greater than the one of sub-frame 450-22.
Figures 5a to 5e illustrate different track organizations for the encapsulation of PCC frames and sub-frames into a media file (the media file being a single file, a fragmented file, or segment files). The illustration uses a single file for the sake of clarity.
The example illustrated in Figure 5a is directed to a configuration corresponding to a single track encapsulation scheme according to which media file 500 comprises a metadata part ('moov' box 502) and a media data part ('mdat' box 504). Track description 506 describes a volumetric track (e.g. a G-PCC track) as indicated by a 'volv' handler type in handler box 508 of track description 506. Track description 506 also contains a sample description (e.g. 'stbl' box 510) and sub-frame description 512, according to embodiments of the invention. The sub-frame description may also be called additional timing information. The sub-frame description may be fully included in sample description 510, in track description 506, or may have a portion within track description 506 and another portion within sample description 510. The track and sample description describe the data units for the considered point cloud sequence according to any configuration described by reference to Figures 4a to 4c. For single track encapsulation of G-PCC data in ISOBMFF, the SubSampleInformationBox with flags equal to 0 in the SampleTableBox, or in the TrackFragmentBox of each of its MovieFragmentBoxes, is present. A SubSampleInformationBox with flags equal to 2 (frame index) or 3 (frame number) in the SampleTableBox, or in the TrackFragmentBox of each of its MovieFragmentBoxes, may also be present. Embodiments for single track encapsulation are further described in reference to Figures 7 to 10.
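As a rough sketch of how such sub-sample information might expose sub-frames, the Python fragment below builds one sub-sample entry per sub-frame of a sample, carrying the frame index in the codec_specific_parameters field, mirroring the flags equal to 2 usage mentioned above; the class and field mapping are assumptions (the actual semantics are defined in reference to Figure 7):

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class SubSampleEntry:
        size: int                       # sub-sample size in bytes
        codec_specific_parameters: int  # assumed here to carry the frame index (flags == 2)

    def build_subsample_entries(subframe_sizes: List[int]) -> List[SubSampleEntry]:
        # One 'subs' entry per sub-frame of a sample, ordered by frame index
        # (illustrative sketch of SubSampleInformationBox content only).
        return [SubSampleEntry(size=s, codec_specific_parameters=idx)
                for idx, s in enumerate(subframe_sizes)]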
The example illustrated in Figure 5b is directed to a configuration corresponding to a multi-track encapsulation scheme using sub-frame tracks wherein media file 520 comprises a metadata part ('moov' box 522) and a media data part ('mdat' box 524), the metadata part ('moov' box 522) containing multiple volumetric tracks referenced 526-1 to 526-3 (e.g. G-PCC tracks) as indicated by a 'volv' handler type in corresponding handler boxes such as handler box 528-1. Each track description contains a sample description (e.g. 'stbl' box 530-1) and a sub-frame description (e.g. sub-frame description 532-2), according to some embodiments of the invention. Again, the sub-frame description may also be called additional timing information. As for other track configurations, the sub-frame description of a track may be fully included in its sample description (e.g. in sample description 530-1), in its track description (e.g. in sub-frame description 532-2), or may have a portion within a track description and another portion within a sample description. The track and sample descriptions describe the data units for the point cloud sequence according to any configuration described by reference to Figures 4a to 4c. The particular point here is that data are stored as contiguous bytes for sequences of sub-frames, each track describing one or more sequences of sub-frames. For example, track 526-1 describes the sub-frames having a first index, track 526-2 describes the sub-frames having indexes 2 and 3, while track 526-3 describes the sub-frames having index 4. Embodiments for sub-frame tracks are further described in reference to Figure 11.
The example illustrated in Figure 5c is directed to a configuration corresponding to a multi-track encapsulation scheme wherein media file 540 comprises a metadata part ('moov' box 542) and a media data part ('mdat' box 544), the metadata part ('moov' box 542) comprising a geometry track, for example geometry track 546-1, and one or more attribute tracks, for example attribute tracks 546-2 and 546-3. The geometry track 546-1 references the attribute tracks 546-2 and 546-3 via a track reference of type 'gpca'. Each track may contain a sub-frame description in its track description, in its sample description, or in both. The geometry track may contain a metadata structure dedicated to sub-frame configuration information. The sub-frame information allows retrieving sub-frame related data units in 'mdat' box 544 for each geometry and attribute track. For this multi-track encapsulation of G-PCC data in ISOBMFF, when sub-samples are present, a SubSampleInformationBox with flags equal to 1 in the SampleTableBox, or in the TrackFragmentBox of each of its MovieFragmentBoxes, should be present. A SubSampleInformationBox with flags equal to 2 or 3 in the SampleTableBox, or in the TrackFragmentBox of each of its MovieFragmentBoxes, may also be present. A SubSampleInformationBox with flags equal to 0 in the SampleTableBox, or in the TrackFragmentBox of each of its MovieFragmentBoxes, should not be present, because it would be useless (each track conveying its own type of data). The syntax and semantics of the sub-sample with flags equal to 1, 2, or 3 as defined in reference to Figure 7 apply here.
Configuration 560 illustrated in Figure 5d corresponds to a multi-track encapsulation of volumetric data with tile tracks (as indicated by the handler box referenced 568-1 having a handler type set to the 'volv' type). For example, track 566-1 is a tile base track referencing one or more tile tracks such as tile tracks 566-2 and 566-3. Each tile track may contain a sub-frame description in its track description, in its sample description, or in both. The sub-frame description allows retrieving sub-frame related data units in the 'mdat' box (e.g. in 'mdat' box 564) for a tile or for a set of tiles. A tile track may contain the description and data corresponding to one or more sub-frames.
The tile base track may contain a metadata structure dedicated to sub-frame configuration information (not represented). For this multi-track encapsulation of tiled G-PCC data in ISOBMFF, the SubSampleInformationBox with flags equal to 1 in the SampleTableBox, or in the TrackFragmentBox of each of its MovieFragmentBoxes, should be present when max_num_tile_ids_in_track > 1 or dynamic_num_tiles_flag = 1 in the GPCCTileSampleEntry of the track. When a G-PCC tile track carries a single G-PCC component (as indicated in the GPCCComponentInfoBox in the sample entry), multiple sub-samples are present, each sub-sample carrying that G-PCC component for a single G-PCC tile. A SubSampleInformationBox with flags equal to 0 in the SampleTableBox, or in the TrackFragmentBox of each of its MovieFragmentBoxes, should not be present. A SubSampleInformationBox with flags equal to 1, 2, or 3 in the SampleTableBox, or in the TrackFragmentBox of each of its MovieFragmentBoxes, may be present. The syntax and semantics of the sub-sample with flags equal to 1, 2, or 3 as defined in reference to Figure 7 applies here.
Configuration 580 illustrated in Figure 5e corresponds to a multi-track encapsulation of volumetric data with a tile base track (e.g. tile base track 586-1) referencing one or more geometry tracks like geometry tracks 586-2 and 586-3. Each tile track for geometry information (geometry tile track) may itself reference zero or more attribute tracks like attribute tile tracks 586-21 and 586-22 for track 586-2 and attribute tile tracks 586-31 and 586-32 for track 586-3. As illustrated, each track has a handler box with handler type set to 'volv', indicating volumetric data. Each track may contain a sub-frame description in its track description, in its sample description, or in both. The sub-frame description allows retrieving sub-frame related data units in the 'mdat' box 584 for a given tile or for a set of tiles. A geometry tile track and its associated attribute tile tracks may contain the description and data corresponding to one or more sub-frames.
The tile base track may contain a metadata structure dedicated to sub-frame configuration information (not represented) when the configuration is the same for all tiles. Otherwise, a geometry track may contain a metadata structure dedicated to sub-frame configuration information applying to a tile or a set of tiles carried in this geometry track and its associated attribute tracks. When a G-PCC tile track carries all of the G-PCC components, multiple sub-samples, each sub-sample carrying all the G-PCC components of a G-PCC tile, may be present. A SubSampleInformationBox with any flags value in the SampleTableBox, or in the TrackFragmentBox of each of its MovieFragmentBoxes, may be present. The syntax and semantics of the sub-sample with flags equal to 1, 2, or 3 as defined in reference to Figure 7 applies here. It is to be noted that for the different track configurations of Figures 5a to 5e, the sub-frame description may use one of the embodiments described in reference to Figure 7 (use of SubSampleInformationBox) or Figure 8 (use of sample grouping).
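For quick reference, the presence rules for the 'subs' flags values described for the configurations of Figures 5a to 5e may be summarized programmatically, as in the following sketch. The sketch is illustrative only: the dictionary keys, the structure, and the function name are our own, and the conditional requirements (e.g. tile tracks requiring flags equal to 1 only when max_num_tile_ids_in_track > 1 or dynamic_num_tiles_flag = 1) are simplified into unconditional entries.

    # Illustrative (non-normative) summary of 'subs' flags usage per track
    # configuration; the keys and layout are hypothetical.
    SUBS_FLAGS_RULES = {
        "single_track":         {"required": {0}, "optional": {2, 3}},
        "multi_track_geo_attr": {"required": {1}, "optional": {2, 3}, "forbidden": {0}},
        "tile_tracks":          {"optional": {1, 2, 3}, "forbidden": {0}},
        "tile_base_with_geo":   {"optional": {0, 1, 2, 3}},
    }

    def check_subs_flags(configuration, present_flags):
        """Return a list of rule violations for the 'subs' flags found in a track."""
        rules = SUBS_FLAGS_RULES[configuration]
        issues = []
        for f in rules.get("required", set()) - present_flags:
            issues.append("missing required 'subs' box with flags=%d" % f)
        for f in present_flags & rules.get("forbidden", set()):
            issues.append("'subs' box with flags=%d should not be present" % f)
        return issues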
Parsing (de-encapsulation) process Figure 6 illustrates an example of steps of a parsing process according to some embodiments of the invention, making it possible to identify metadata structures providing ATI in G-PCC tracks and to extract a subset of data corresponding to a G-PCC sub-frame or a set of G-PCC sub-frames. The ATI or sub-frame description also allows extraction of a sample containing specific sub-frames, or frame(s) that have been combined or aggregated (as illustrated with reference 950 in Figure 9).
As illustrated, a first step 600 consists in receiving the media file to parse. It may be streamed using a streaming module, for example streaming module 134 in Figure 1 or it may be read from a storage location.
Next, the parser (or de-encapsulation module, such as de-encapsulation module 132 in Figure 1) is initialized (step 605). The initialization may be carried out by parsing the top-level boxes of the media file, for example the 'moov' box, the 'trak' boxes, and sample description boxes. When the media player contains a decoder (or a decompression module, e.g. decompression module 136 in Figure 1), the decoder may also be initialized during this step, for example using decoder configuration information from the sample description (e.g. the G-PCC configuration box 'gpcC').
Next, a sample is read (step 610). To that end, the parser (or the de-encapsulation module) reads sample description from the metadata part of the received media file to locate corresponding media data in the media data box of the received media file.
After having read the sample, the parser (or de-encapsulation module) looks for additional timing information or sub-frame description (test 615). Such a test may consist in looking for a metadata structure providing ATI (as described in reference to Figures 7, 8, 9, and 10 for a single track or to Figure 11 for multiple tracks).
If additional timing information is identified, the additional timing information is read (step 620). The parser is then able to determine the presence of one or more partial representations of a point cloud for the frame. Depending on the user settings or application needs, it may select a partial representation of the point cloud frame if it contains a point cloud representation with appropriate timing precision. In a variant, the parser may select one (or more) partial representation of a point cloud frame to read or to extract (step 625). For G-PCC, it may correspond to one or more G-PCC sub-frames within a G-PCC frame. Based on the ATI, the parser can retrieve the data units corresponding to the selected partial representation of a point cloud frame (step 630); the retrieval may be based on timing information, on spatial information corresponding to a partial representation, on a laser shot direction, or on any other parameter conveyed in the ATI.
This variant may require that the compression or coding configuration of the point cloud frame permits extraction of a subset of units that corresponds to the selected representation, for example slices that are constrained so that the entropy parsing of a data unit does not depend upon the final entropy parsing state of a data unit in the preceding slice, for example by setting entropy_continuation_enabled to the value 0 in the G-PCC bit-stream. When this constraint is not fulfilled, the ATI may contain a list of dependencies to follow and process for the extraction of the subset of units that corresponds to the selected representation. There may exist advantageous bit-stream configurations providing access to a given sub-frame, for example when the bit-stream is encoded such that there is at least one GDU per sub_frame_idx, or when a GDU does not contain more sub-frames than an indicated sub-frame index or sub-frame index range in ATI or sub-frame description.
Next, the retrieved data units are transmitted to a decoder when they are compressed, or to a display or an application using the point cloud data (step 635).
If no additional timing information is identified, the sample data are read (step 640) and transmitted to a decoder when they are compressed, or to a display or an application using the point cloud data (step 635).
As illustrated, these steps are iterated until no more samples are to be read (test 645).
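As an illustration only, the loop of steps 610 to 645 may be sketched as follows; the parser and decoder objects and their method names (has_more_samples, read_sample, find_ati, and so on) are hypothetical and do not correspond to any existing library.

    def parse_point_cloud_track(parser, decoder):
        # Minimal sketch of the de-encapsulation loop of Figure 6 (steps 610 to 645).
        while parser.has_more_samples():                      # test 645
            sample = parser.read_sample()                     # step 610
            ati = parser.find_ati(sample)                     # test 615
            if ati is not None:
                parser.read_ati(ati)                          # step 620
                selection = parser.select_subframes(ati)      # step 625
                data_units = parser.retrieve_units(sample, selection)  # step 630
            else:
                data_units = sample.data                      # step 640
            decoder.feed(data_units)                          # step 635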
Description of sub-frames as sub-samples of point cloud tracks Figure 7 illustrates encapsulation of additional timing information (ATI) in a sub-sample information box ('subs' box) of a G-PCC track.
Illustrated media file 700 represents a media file (e.g. media file 152 in Figure 1) encapsulated according to embodiments of the invention. It is depicted as a non-fragmented ISO Base Media File, but it could have been a fragmented ISOBMFF file or ISOBMFF segments without changing the encapsulation of additional timing information, since the 'subs' box is also allowed in movie fragments and in media segment files. According to this embodiment, point cloud data may be encapsulated in a single track, described in 'trak' box 710 of 'moov' box 705, for example as a G-PCC track, as indicated by the sample entry type 'gpe1' referenced 715, 'gpeg', or any other four-character code (4CC) representing a single G-PCC track.
It is to be noted that a single track embedding a sub-frame description could also be indicated by a specific sample entry type (instead of 'gpe1' 715), not conflicting with other existing four-character codes. This would allow the parser to determine that a volumetric track may provide finer timing information than the sample duration, decoding, or composition time information usually provided in the sample description.
According to first and second embodiments, a new value of the flags field, dedicated to PCC sub-frame description, is used.
According to the first embodiment, the definition of a sub-sample for a G-PCC track is extended as follows to allow sub-frame description in G-PCC tracks: a new value for the flags field of the 'subs' box is defined, for example the value 2 (or any other value not conflicting with other values already in use).
The flags value specifies the type of sub-sample information given in a 'subs' box within a G-PCC track as follows:
- 0: G-PCC unit based sub-samples. A sub-sample contains only one G-PCC unit (e.g. box 720 in Figure 7),
- 1: tile-based sub-samples. A sub-sample either contains one or more contiguous G-PCC units corresponding to one G-PCC tile, or contains one or more contiguous G-PCC units which contain either each parameter set, tile inventory, or frame boundary marker,
- 2: sub-frame-based sub-samples. A sub-frame-based sub-sample contains either:
* one or more contiguous G-PCC units corresponding to one G-PCC sub-frame, or
* one or more contiguous G-PCC units corresponding to a set of contiguous G-PCC sub-frames (i.e. a range of frame index or frame number attribute values).
Other values of flags are reserved.
The codec_specific_parameters field of the SubSampleInformationBox ('subs' box) may be extended with a new flags value, as follows:

    if (flags == 0) {
        unsigned int(8) payloadType;
        if (payloadType == 4) { // attribute payload
            unsigned int(6) attrIdx;
            bit(18) reserved = 0;
        } else {
            bit(24) reserved = 0;
        }
    } else if (flags == 1) {
        unsigned int(1) tile_data;
        bit(7) reserved = 0;
        if (tile_data)
            unsigned int(24) tile_id;
        else
            bit(24) reserved = 0;
    } else if (flags == 2) {
        unsigned int(32) subframe_idx;
    }
where subframe_idx is an example of ATI as the index of the G-PCC sub-frame-based sub-sample. It may correspond to a value indicated in a frame index attribute in a G-PCC data unit or to a range of values of frame index attributes in G-PCC data units. It provides the index of the sub-frame-based sub-samples within a sample. A reader may use subframe_idx to compute a presentation or composition time for the sub-frame, for example, when the number of sub-frames per sample is constant, considering sample_duration divided by the number of sub-frames per sample, in units of the timescale indicated in the MediaHeaderBox or in units of a capture timescale. When the number of sub-frames per sample is variable, the presentation or composition time for the sub-frame may be determined using a look-up table providing, per sample or per group of samples, the number of sub-frames (the number of sub-frames per sample and the capture timescale are examples of parameters declared in a metadata structure dedicated to sub-frame configuration information).
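As a non-normative illustration of the constant-rate case, the composition time of a sub-frame could be derived as sketched below; the function name and arguments are hypothetical, and all times are assumed to be expressed in the same timescale units.

    def subframe_composition_time(sample_time, sample_duration,
                                  subframe_idx, subframes_per_sample):
        # Each sub-frame is assumed to occupy an equal share of the sample
        # duration, in units of the MediaHeaderBox timescale (or a capture
        # timescale when one is declared).
        return sample_time + subframe_idx * (sample_duration / subframes_per_sample)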
However, there may be gaps between subframe_idx values (for example when the media file is edited and some data units for a subframe_idx are removed, or when the media file is streamed and some data units are not yet received). A specific value of subframe_idx (for example the value 0xFFFFFFFF) is reserved for sub-samples not mapped to any sub-frame (like sub-samples 730). This may be useful when a point cloud sample does not need to be entirely indexed into sub-frame-based sub-samples (for example, 3D spaces with a very low number of points or 3D spaces where some analytics did not detect any object of interest).
In addition, sub-samples mapped to this specific value may have their discardable parameter set to 1, meaning that these sub-samples are not required to decode the current sample; alternatively, sub-samples mapped to this specific value may have their subsample_priority set to 0 to indicate the low importance of these sub-samples. Conversely, sub-samples mapped to sub-frames or even tiles (using a sub-sample information box with flags=1) that may have some interest for an application, or that contain a large number of points, may have their subsample_priority set to a high value, for example 0xFF. It is also recommended that the subsample_priority for sub-samples corresponding to geometry data units is set to a high value, or their discardable parameter set to 0, to indicate the importance of these units (to obtain point positions) when a sub-sample information box is present with flags=0.
This embodiment makes it possible to define several sub-sample descriptions in the same track description. For example, a first sub-sample description may be directed to the 'subs' box with flags set to 0, mandatory according to MPEG-I Part-18 (as illustrated in Figure 7 with reference 720), and a second sub-sample description may be directed to sub-frame description using a 'subs' box with a flags value set to 2 (as illustrated in Figure 7 with reference 725). Other sub-sample descriptions may be defined, for example using a 'subs' box with a flags value set to 1 for the mapping of sub-samples to tiles.
Alternatively, subframe_idx may correspond to a value indicated in a frame number attribute in a G-PCC data unit or to a range of values of frame number attributes in G-PCC data units after conversion to an index value within a frame.
In another variant, two new values for the flags field may be defined depending on how the PCC sub-frame is indicated in the bit-stream. For example, the value 2 may indicate that the subframe_idx corresponds to a value indicated in a frame index attribute in a G-PCC data unit or to a range of values of frame index attributes in G-PCC data units and the value 3 may indicate a subframe_number corresponding to a value indicated in a frame number attribute of a G-PCC data unit, or to a range of values of frame number attributes in G-PCC data units.
In another variant, two different flags values may be defined, as follows:

    if (flags == 0) {
        unsigned int(8) payloadType;
        if (payloadType == 4) { // attribute payload
            unsigned int(6) attrIdx;
            bit(18) reserved = 0;
        } else {
            bit(24) reserved = 0;
        }
    } else if (flags == 1) {
        unsigned int(1) tile_data;
        bit(7) reserved = 0;
        if (tile_data)
            unsigned int(24) tile_id;
        else
            bit(24) reserved = 0;
    } else if (flags == 2) {
        unsigned int(32) subframe_idx;
    } else if (flags == 3) {
        unsigned int(32) subframe_number;
    }

where, if flags is equal to 2, subframe_idx is the index of the sub-frame-based sub-sample and may correspond to a value indicated in a frame index attribute in a G-PCC data unit or to a range of values of frame index attributes in G-PCC data units, and if flags is equal to 3, subframe_number is the number of the sub-frame-based sub-sample within the sequence or track and may correspond to a value indicated in a frame number attribute in a G-PCC data unit or to a range of values of frame number attributes in G-PCC data units.
A reader may use subframe_idx to compute a presentation or composition time for the sub-frame as already described above in reference to Figure 3.
A reader may use subframe_number to compute a presentation or composition time for the sub-frame. For example, the sampling_rate parameter defined in SubFrameConfigurationGroupEntry may provide the acquisition rate of the 3D sensor expressed in capture_timescale units. In such a case, when the capture of sub-frames is regularly performed according to the sampling_rate parameter specified in SubFrameConfigurationGroupEntry, the timing or presentation time or composition time of the sub-frame may be computed as follows:

CT(sub-frame) = subframe_number * sampling_rate / capture_timescale

where CT(sub-frame) is the presentation or composition time for the sub-frame, subframe_number is the value of the frame number attribute of the sub-frame in the sample sequence, and capture_timescale is the number of time units that pass in one second.
Alternatively, the sampling_rate parameter defined in SubFrameConfigurationGroupEntry may provide the acquisition rate of the 3D sensor expressed in the timescale defined in the 'mdhd' box. In such a case, when the capture of sub-frames is regularly performed according to the sampling_rate parameter specified in SubFrameConfigurationGroupEntry, the timing or presentation time or composition time of the sub-frame may be computed as follows:

CT(sub-frame) = subframe_number * sampling_rate / timescale

where CT(sub-frame) is the presentation or composition time for the sub-frame and subframe_number is the value of the frame number attribute of the sub-frame in the sample sequence.
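Both computations above reduce to the same sketch (non-normative; the helper name is hypothetical), the divisor being either capture_timescale or the 'mdhd' timescale, depending on the units in which sampling_rate is expressed:

    def subframe_ct(subframe_number, sampling_rate, timescale):
        # CT(sub-frame) in seconds, assuming sub-frames are captured regularly
        # every sampling_rate ticks of the given timescale (capture_timescale
        # or the timescale of the 'mdhd' box).
        return subframe_number * sampling_rate / timescale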
Flags values 2 and 3 are examples and may be any value not conflicting with other values already in use.
In another variant, a dedicated parameter may also be defined to signal whether the sub-frame identifier is a frame index or a frame number, as follows:

    if (flags == 0) {
        unsigned int(8) payloadType;
        if (payloadType == 4) { // attribute payload
            unsigned int(6) attrIdx;
            bit(18) reserved = 0;
        } else {
            bit(24) reserved = 0;
        }
    } else if (flags == 1) {
        unsigned int(1) tile_data;
        bit(7) reserved = 0;
        if (tile_data)
            unsigned int(24) tile_id;
        else
            bit(24) reserved = 0;
    } else if (flags == 2) {
        unsigned int(1) subframe_id_type;
        unsigned int(31) subframe_id;
    }

where, if flags is equal to 2 (or any value not conflicting with other values already in use), subframe_id_type indicates that, if set to 0, subframe_id is the index of the sub-frame-based sub-sample within a frame and may correspond to a value indicated in a frame index attribute in a G-PCC data unit or to a range of values of frame index attributes in G-PCC data units; and that, if set to 1, subframe_id is the number of the sub-frame-based sub-sample within the sequence or track and may correspond to a value indicated in a frame number attribute in a G-PCC data unit or to a range of values of frame number attributes in G-PCC data units.
In another variant, sub-frame indication can be signalled using the SubSampleInformationBox 'subs' with an existing flags value, for example with the flags value 0 signalling the G-PCC unit based sub-samples. For example, the syntax of 'subs' with flags value 0 could be defined as follows:

    if (flags == 0) {
        unsigned int(8) payloadType;
        if (payloadType == 4) { // attribute payload
            unsigned int(6) attrIdx;
            bit(1) subframe_id_type;
            bit(16) subframe_id;
            bit(1) reserved = 0;
        } else {
            bit(1) subframe_id_type;
            bit(16) subframe_id;
            bit(7) reserved = 0;
        }
    }

where subframe_id_type indicates that, if set to 0, subframe_id is the index of the sub-frame-based sub-sample within a frame and corresponds to a value indicated in a frame index attribute in a G-PCC data unit or to a range of values of frame index attributes in G-PCC data units; and that, if set to 1, subframe_id is the number of the sub-frame-based sub-sample within the sequence or track and corresponds to a value indicated in a frame number attribute in a G-PCC data unit or to a range of values of frame number attributes in G-PCC data units.
According to the second embodiment, wherein a new flags value dedicated to PCC sub-frame description is used, still using a new flags value set for example to 2 and with the same features as the first embodiment, the subframe_idx is described as a pair of values:

    } else if (flags == 2) {
        unsigned int(16) frame_idx_start;
        unsigned int(16) frame_idx_end;
    }

where frame_idx_start indicates the lowest value of the frame index attribute contained in the group of sub-frame-based sub-samples, and frame_idx_end indicates the highest value of the frame index attribute contained in the group of sub-frame-based sub-samples. It is to be noted that this second parameter may be coded as a difference from frame_idx_start.
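Since both values fit in 16 bits, the pair packs into the 32-bit codec_specific_parameters field of the 'subs' box; a possible (hypothetical) packing is sketched below.

    def pack_frame_idx_range(frame_idx_start, frame_idx_end):
        # Pack the pair into a single 32-bit codec_specific_parameters value:
        # frame_idx_start in the high 16 bits, frame_idx_end in the low 16 bits.
        assert 0 <= frame_idx_start <= frame_idx_end <= 0xFFFF
        return (frame_idx_start << 16) | frame_idx_end

    def unpack_frame_idx_range(codec_specific_parameters):
        return ((codec_specific_parameters >> 16) & 0xFFFF,
                codec_specific_parameters & 0xFFFF)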
Similarly to previous variants, the frame indexes (represented by frame_idx_start and frame_idx_end) may represent frame numbers, and/or a flag parameter or new flags values may be defined to differentiate the signalling of frame index attribute values from frame number attribute values.
According to a third embodiment, wherein the sub-frames are described as sub-samples of point cloud tracks, an existing flags value is reused for PCC sub-frame description.
According to this embodiment, an existing flags value is reused instead of a new flags value such as 2, for example the value 1 currently used for tile-based sub-samples. The syntax could be adapted to handle tile or sub-frame indexation. The semantics may be the following:

    } else if (flags == 1) {
        unsigned int(2) data_type;
        bit(6) reserved = 0;
        if (data_type == 1)
            unsigned int(24) tile_id;
        else if (data_type == 2)
            unsigned int(24) subframe_idx;
        else
            bit(24) reserved = 0;
    }

where subframe_idx has the same semantics as the one described by reference to the first embodiment.
The parameter called data_type indicates the type of data represented by the G-PCC units. When it is set to 1, it indicates that the sub-sample contains G-PCC units carrying either geometry data units or attribute data units corresponding to one G-PCC tile. When it is set to 2, it indicates that the sub-sample contains G-PCC units carrying either geometry data units or attribute data units corresponding to one or more G-PCC sub-frames.
Otherwise, it indicates that the sub-sample contains G-PCC units which contain either each parameter set, tile inventory, or frame boundary marker.
The new syntax remains backward compatible with the previous syntax and with the semantics of the previous tile_data parameter.
This embodiment may be used when either tile or sub-frame information is present in the point cloud data to encapsulate. When both tiles and sub-frames are used, the first or the second embodiment may be preferred.
However, when the encoding is configured to allow extracting the G-PCC units corresponding to one G-PCC sub-frame, an advantageous tile/slice/G-PCC-unit set-up would allow this variant to be used without penalty. For example, a slice contains one geometry and zero or more attribute data units for one sub-frame, and there may be a tile corresponding to one or more slices, so that the value of subsample_count in the 'subs' box corresponds to the number of sub-frame-based sub-samples.
It is to be noted that in any variant of this embodiment, the subframe_idx parameter may be replaced by another type of information, such as timing information for a sub-frame-based sub-sample, a pair of (azimuth, elevation) angles, a region identifier if regions are defined in the track, or any information describing a sub-frame-based sub-sample or additional timing information, provided that it can be represented on 32 bits (the size of the codec_specific_parameters field in the 'subs' box). In case ATI requires more than 32 bits for its description, a new version (for example, version = 2) of the 'subs' box may be defined to allow more bits for the codec_specific_parameters, as follows:

    aligned(8) class SubSampleInformationBox extends FullBox('subs', version, flags) {
        unsigned int(32) entry_count;
        int i, j;
        for (i = 0; i < entry_count; i++) {
            unsigned int(32) sample_delta;
            unsigned int(16) subsample_count;
            if (subsample_count > 0) {
                for (j = 0; j < subsample_count; j++) {
                    if (version == 1 || version == 2)
                        unsigned int(32) subsample_size;
                    else
                        unsigned int(16) subsample_size;
                    unsigned int(8) subsample_priority;
                    unsigned int(8) discardable;
                    unsigned int(32) codec_specific_parameters;
                    if (version == 2)
                        unsigned int(32) codec_specific_parameters_extension;
                }
            }
        }
    }

where the codec_specific_parameters_extension is defined by the codec in use and should be interpreted as additional information to the codec_specific_parameters for a type of sub-samples determined by the flags value of the 'subs' box. This allows, for example, indicating, for a sub-frame-based sub-sample, timing information in the codec_specific_parameters and laser shot orientation in the codec_specific_parameters_extension, or any combination of possible parameters for the ATI.
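A reader for this hypothetical version 2 could parse the box body as sketched below, assuming the FullBox header (size, type, version, flags) has already been consumed; this is an illustration of the syntax above, not an implementation of any published specification.

    import struct

    def read_subs_v2_body(buf, version):
        # Parse the body of the hypothetical version-2 'subs' box sketched above.
        off = 0
        (entry_count,) = struct.unpack_from(">I", buf, off); off += 4
        entries = []
        for _ in range(entry_count):
            sample_delta, subsample_count = struct.unpack_from(">IH", buf, off); off += 6
            subsamples = []
            for _ in range(subsample_count):
                if version in (1, 2):
                    (size,) = struct.unpack_from(">I", buf, off); off += 4
                else:
                    (size,) = struct.unpack_from(">H", buf, off); off += 2
                priority, discardable = struct.unpack_from(">BB", buf, off); off += 2
                (csp,) = struct.unpack_from(">I", buf, off); off += 4
                csp_ext = None
                if version == 2:
                    (csp_ext,) = struct.unpack_from(">I", buf, off); off += 4
                subsamples.append({"size": size, "priority": priority,
                                   "discardable": discardable,
                                   "codec_specific_parameters": csp,
                                   "codec_specific_parameters_extension": csp_ext})
            entries.append({"sample_delta": sample_delta, "subsamples": subsamples})
        return entries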
Mapping of data units to PCC sub-frame description in point cloud tracks Figure 8 illustrates a media file 800 (e.g. media file 152 in Figure 1) having a G-PCC track described in 'trak' box 810 of 'moov' box 805, containing an additional timing information (ATI) or sub-frame description defined in a sample group description box 815 that is used with a G-PCC unit mapping sample group (references 820 and 825).
For the sake of clarity and regarding their use here, it is considered in the following that "G-PCC units" are equivalent to "TLV units".
According to this embodiment, the file writer (e.g. file writer 100 in Figure 1) generates a first sample grouping to map samples onto entries with different TLV unit patterns. This first sample grouping results in a TLV mapping sample group indicated, for example, by a grouping_type = 'tlvm' (here reference 830) in a SampleToGroupBox (here 'sbgp' box 820) and in an associated SampleGroupDescriptionBox (here 'sgpd' box 825). As illustrated, SampleToGroupBox 820 defines groups of samples and, for each group, indicates an entry (reference 835) in the SampleGroupDescriptionBox 825. Each entry 835 is a specific VolumetricVisualSampleGroupEntry, called TLVMapGroupEntry. A TLVMapGroupEntry can be used to map a range of contiguous TLV units into an entry in a second 'sgpd' box. By default, this second 'sgpd' box may be a SampleGroupDescriptionBox with a grouping type indicating that the box contains SubFrameInformationGroupEntry entries (reference 840), for example the grouping type 'sfif' (sub-frame information) as illustrated with reference 845. A SubFrameInformationGroupEntry (reference 840) is also a specific kind of VolumetricVisualSampleGroupEntry. When several TLV mappings are in use in a same G-PCC track, the grouping_type_parameter (reference 850) of the SampleToGroupBox with grouping_type 'tlvm' may be set equal to the four-character code of 'sgpd' box 815 ('sfif' in this example) to explicitly indicate into which kind of volumetric sample group entries the TLV units are mapped. According to the example illustrated in Figure 8, setting grouping_type_parameter 850 to 'sfif' indicates that TLVMapGroupEntries 835 map a range of contiguous TLV units to an entry in 'sgpd' box 815 of the 'sfif' type.
TLVMapGroupEntry (e.g. TLVMapGroupEntry 835) may be defined as follows.
The TLVMapGroupEntry may be used to assign an identifier, called groupID, to each TLV unit. The TLVMapGroupEntry, when present, may be linked to a sample group description providing the semantics of that groupID. This link may be provided by setting the grouping_type_parameter of the SampleToGroupBox of the 'tlvm' type to the four-character code of the associated sample grouping type. Consequently, a SampleToGroupBox of the 'tlvm' type may never use version 0 of the box.
A PCC track should not contain both a SampleToGroupBox of the 'tlvm' type associated with a grouping_type_parameter equal to a particular value groupType and a SampleToGroupBox of the groupType type. When a track contains a SampleToGroupBox of the 'tlvm' type associated with a grouping_type_parameter groupType, TLV units of the mapped sample are indirectly associated with the sample group description of type groupType through the groupID of the TLVMapGroupEntry applicable for that sample. When a track contains a SampleToGroupBox of type groupType, each sample is directly mapped to the sample group description of type groupType through the SampleToGroupBox of type groupType and all TLV units of the mapped sample are associated with the same groupID.
The syntax of TLVMapGroupEntry may be defined as follows:

    class TLVMapGroupEntry() extends VolumetricVisualSampleGroupEntry('tlvm') {
        bit(6) reserved = 0;
        unsigned int(1) large_size;
        unsigned int(1) rle;
        if (large_size) {
            unsigned int(16) entry_count;
        } else {
            unsigned int(8) entry_count;
        }
        for (i = 1; i <= entry_count; i++) {
            if (rle) {
                if (large_size) {
                    unsigned int(16) TLV_start_number;
                } else {
                    unsigned int(8) TLV_start_number;
                }
            }
            unsigned int(16) groupID;
        }
    }

with the following semantics: large_size indicates whether the number of TLV unit entries in the track samples is represented on 8 or 16 bits, rle indicates whether run-length encoding is used (for example value 1) to assign groupID to TLV units or not (for example value 0), and entry_count specifies the number of entries in the map. It is noted that when rle indicates that run-length encoding is used, entry_count corresponds to the number of runs where consecutive TLV units are associated with the same group.
When rle indicates that run-length encoding is not used, entry_count represents the total number of TLV units. TLV_start_number is the 1-based TLV unit index in the sample of the first TLV unit in the current run associated with groupID, and groupID specifies the unique identifier of the group. All TLV units mapped to the same group with a particular groupID value have the same properties in all the sample groups that indicate a mapping to that particular groupID value and are contained in this track. More information about the group is provided by the sample group description entry with this groupID and grouping_type equal to the grouping_type_parameter of the SampleToGroupBox of type 'tlvm'.
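For illustration, a reader could expand a run-length-encoded TLV map into a per-unit groupID list as sketched below (non-normative; the function and argument names are ours):

    def expand_tlv_map(runs, total_tlv_units):
        # runs: list of (TLV_start_number, groupID) pairs, TLV_start_number
        # being 1-based; each run extends up to the start of the next run
        # (or to the last TLV unit of the sample for the final run).
        group_ids = []
        for i, (start, gid) in enumerate(runs):
            end = runs[i + 1][0] if i + 1 < len(runs) else total_tlv_units + 1
            group_ids.extend([gid] * (end - start))
        return group_ids  # group_ids[k] is the groupID of TLV unit k+1

For example, expand_tlv_map([(1, 7), (4, 9)], 6) returns [7, 7, 7, 9, 9, 9]: the first three TLV units are mapped to groupID 7 and the next three to groupID 9.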
A SubFrameInformationGroupEntry (e.g. SubFrameInformationGroupEntry 840) may be defined as follows:

    class SubFrameInformationGroupEntry() extends VolumetricVisualSampleGroupEntry('sfif') {
        unsigned int(32) frame_index;
        // or, alternatively to a single frame index:
        unsigned int(32) frame_index_start;
        unsigned int(32) frame_index_end;
        // other optional or alternative parameters:
        3DBoundingBox region; // or a reference to an existing one via a region identifier parameter
        unsigned int(32) decoding_time;
        // a laser shot direction:
        unsigned int(32) azimuth_angle;
        unsigned int(32) elevation_angle;
        // randomly accessible sub-frame or not:
        unsigned int(1) random_accessible;
        unsigned int(7) reserved = 0;
        if (random_accessible == 0) {
            unsigned int(8) num_dependencies;
            unsigned int(32) dependent_subframes[num_dependencies];
        }
    }

where frame_index is the index of the sub-frame. It may correspond to a value indicated in a frame index attribute in a G-PCC data unit. When, instead of a single frame index, two frame index values are indicated, frame_index_start and frame_index_end indicate a range of values of frame index attributes in G-PCC data units, mapped onto a sub-frame at file format level.
Alternatively, instead of indicating a frame index of the sub-frame within a frame, a frame number of the sub-frame may be indicated within the sequence or track.
It may correspond to a value indicated in a frame number attribute in a G-PCC data unit or to a range of values of frame number attributes in G-PCC data units after conversion to an index value within a frame. Other parameters like the following may be used to describe a sub-frame at file format level:
-a decoding time, that may be expressed in timescale units (the timescale being the number of timestamp values that represent a duration of one second), reusing the timescale declared in the movie header box of the media file. The timescale should then be computed so that the sampling rate for PCC frames and PCC sub-frames can be expressed as units of this timescale,
-an orientation of the sensor, as the pair of angles that led to the generation of the points in a sub-frame, for example as azimuth and elevation angles in degree units or as factors of an angle resolution that can be encoded as a parameter of the sensor in some configuration information, and
-an indication of whether a sub-frame can be randomly accessed, i.e. whether it can be decoded independently of other sub-frames within the frame. When random_accessible is set to 0, the sub-frame descriptor may contain a list of dependencies (the dependent_subframes array parameter), possibly as sub-frame indexes onto which the sub-frame depends. The number of depended-on sub-frames may be indicated as another parameter in the sub-frame descriptor.
On the parsing side, when structures like boxes 815, 820, and 825 in Figure 8 are present in a media file, a media player may retrieve, sample by sample, the TLV units corresponding to a sub-frame by inspecting the entries of 'sgpd' box 815. Then, from these entries, the player may select one or more sub-frames (as described with reference to step 625 in Figure 6), based on sub-frame description parameters. The one or more identified entries (referenced 840 in Figure 8) are kept in memory (to apply to following samples if the selection criteria do not change over time; otherwise, the step is repeated on a sample basis) and the TLV units mapped to the identified entries can be selected for output, rendering, or transmission.
Other embodiments According to particular embodiments, the metadata structure dedicated to sub-frame configuration information may indicate the presence of additional timing information within a track. For instance, the metadata structure dedicated to sub-frame configuration information (e.g. SubFrameConfigurationBox, GPCCConfigurationBox, or GPCCDecoderConfigurationRecord) may include an additional parameter, for example an additional_timing_information_flag parameter, indicating the presence of Frame Index or Frame Number Attribute units. This avoids parsers having to check the attribute types to identify the presence or absence of frame index or frame number attributes. It also warns parsers, when this parameter is set to 1, that ATI may be present in a track (for step 615). In a variant, the presence of Frame Index or Frame Number Attribute units may be indicated in (by a writer) or determined from (by a parser) a GPCCComponentInfoBox with a gpcc_type value equal to 4 and attr_type equal to 3 or 4, to indicate frame index or frame number attributes respectively. This parameter, or the presence of attr_type = 3 or 4 in the GPCCComponentInfoBox, may be exposed as a parameter in a MIME type to inform players of the presence of additional timing information. When the media file is for streaming, the presence of additional timing information may be indicated in a codecs attribute of a DASH Representation describing a G-PCC track. Optionally, a DASH Representation for a sub-frame track may contain a DASH descriptor, as a SupplementalProperty or EssentialProperty with a specific scheme_id_uri attribute value indicating that sub-frame information is provided and a value attribute indicating the one or more sub-frame index values comprised in the sub-frame track. Having an indication of the presence of sub-frames at file format level could be useful in several applications: for example, to determine that one sample may contain several frames. The application could then initialize the buffer for displaying the multiple point cloud frames. In addition, this indication is necessary to envisage access to a sub-representation of the point cloud based on an acquisition time (and not the sample time, which may depend on compression choices).
According to other embodiments, the metadata structure dedicated to sub-frame configuration information may indicate constraints for the additional timing information associated with the points of a sample. For instance, a cross_sample_reordering flag may specify, when equal to 0, that the timing information of all points of a sample indicates a timing in the range of sample_time to the sum of sample_time and sample_duration, inclusive, wherein sample_time is the start time of the sample and sample_duration is the duration of the sample. In other words, it indicates that sub-frame presentation or composition times in a given sample should not exceed the presentation or composition time of the sample immediately following this given sample. When equal to 1, the flag specifies that the timing of the points in a sample may or may not be in the range of sample_time to sample_time plus sample_duration, inclusive. This flag may be used to determine whether the reordering of the points after decoding of the point cloud can be done after decoding of a single sample or not. This allows a player to reorder frames contained within a combined frame. In a variant, when the cross_sample_reordering flag is equal to 1, the additional timing information description may further describe the minimal number of consecutive samples needed to reorder the points of the point cloud according to the additional timing information. Having flexibility in the ordering of frames within a combined frame may be useful to optimize the compression.
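The condition signalled by cross_sample_reordering equal to 0 can be checked as in the following sketch (illustrative only; the function name and argument layout are assumptions):

    def timings_confined_to_sample(point_times, sample_time, sample_duration):
        # True when every point timing lies in [sample_time, sample_time + sample_duration],
        # i.e. the constraint signalled by cross_sample_reordering == 0.
        return all(sample_time <= t <= sample_time + sample_duration
                   for t in point_times)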
In a variant of the described embodiments, the metadata structure dedicated to sub-frame configuration information may further describe the characteristics of the timing associated with the points of a sample. It can be, for example, the number of different timing values within a sample. For instance, it may indicate the number of different frame number or frame index values present within a sample. In a variant, it may also indicate the maximum or minimum number of different timings within a sample. In another variant, when the information is a frame index, the additional timing information description may indicate the maximum range of frame index values.
In yet another embodiment, the metadata structure dedicated to sub-frame configuration information may indicate whether the encoding of the samples allows accessing a partial representation of the point cloud frame that is associated with a particular value of timing. Typically, it can be specified using an additional parameter, for example named subframe_accessible_type. When set to 0, no access to a partial representation is possible. When set to 1, access to a partial representation is possible (but may require decoding more units than the units associated with this partial representation; this may require the parser or reader to check possible dependencies between sub-frames or partial representations, for example indicated as ATI). When set to 2, access to and extraction of a partial representation are possible (the partial representation is decodable without data units from other partial representations). When set to 3, access to a partial representation is possible but requires bit-stream inspection (in other words, access to the partial representation is not described in the metadata part of the media file).
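These four values could be captured by a small enumeration on the reader side (illustrative sketch; the enum and its member names are ours, not part of any specification):

    from enum import IntEnum

    class SubframeAccessType(IntEnum):
        NO_ACCESS = 0                 # no access to a partial representation
        ACCESS_WITH_DEPENDENCIES = 1  # access possible, extra units may be needed
        INDEPENDENT_EXTRACTION = 2    # partial representation decodable on its own
        BITSTREAM_INSPECTION = 3      # access requires inspecting the bit-stream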
Packing sub-frames or sample aggregation in point cloud tracks Figure 9 illustrates a media file 900 (e.g. media file 152 in Figure 1) having a single G-PCC track described in 'trak' box 910 of 'moov' box 905, containing a sub-frame description through packing (or interleaving) of samples. These specific samples are described as volumetric sample entry types 920 in the sample description box 'stsd' referenced 915. Then, the additional timing information may be provided in the sample description.
According to this embodiment, a sub-frame is encapsulated as a sample, meaning that it has its own byte offset, number, duration, and size indication in the sub-boxes of the SampleTableBox ('stbl') referenced 925 (or in track fragment boxes when the media file is fragmented). This is illustrated in the 'mdat' box 930: data units (TLV or PCC units) corresponding to a sub-frame (with the same frame index attribute value or within a range of frame index attribute values, depending on the encapsulation settings) are stored one after the other and indexed as a sample.
In order to provide full point cloud frame access, sample entry 920 or sample description box 915 may contain a specific metadata structure 935 providing information on how the sub-frames carried in a run of consecutive samples are packed or interleaved to form full point cloud frames and/or information for identifying the grouping_type value of the sample group used to group together sub-frames belonging to a same point cloud frame (the 'sfif' grouping type in box 935). This sample group may be a compact sample group such as compact sample group 940 that defines one or more patterns describing how samples are packed or interleaved and, for each element of a pattern (a sample), provides an index in a sample group description box (e.g. 'sgpd' box 945). The entries of 'sgpd' box 945 are sub-frame descriptors, indicated here as SubFrameInformationGroupEntry ('sfif') as described in reference to Figure 8 (reference 840). The composition time of a G-PCC frame resulting from the combination of a run of consecutive samples coincides with the composition time of the last sample in the run. In a variant, the composition time of a G-PCC frame resulting from the combination of a run of consecutive samples coincides with the composition time of the first sample in the run. The duration of a G-PCC frame resulting from the combination of a run of consecutive samples is the sum of the durations of these consecutive samples. This encapsulation mode may be used in cases, e.g., where two or more point cloud frames have been combined together to form a single point cloud frame, for the sake of compression efficiency.
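The timing rule above can be made concrete with a small sketch (non-normative; the tuple layout and function name are assumptions):

    def combined_frame_timing(run, use_last_sample_ct=True):
        # run: list of (composition_time, duration) pairs, one per consecutive
        # sample forming the combined G-PCC frame. The frame composition time
        # is that of the last sample in the run (or the first, in the variant),
        # and the frame duration is the sum of the sample durations.
        ct = run[-1][0] if use_last_sample_ct else run[0][0]
        duration = sum(d for _, d in run)
        return ct, duration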
Alternatively, when a sub-frame descriptor describing the ATI associated with each sub-frame is not needed by the application, but only access to sub-frames or to the full frame is needed, the combined frame (reference 950) that is encoded may be encapsulated as packed or interleaved samples, where the frames are stored as one or more samples in a track (e.g. track 910), thus allowing a finer temporal access, at sub-frame level, through the PackedSampleDescriptionBox 935 signalling that sub-frames in consecutive samples are grouped together using a sample grouping of type 'gfra' (referenced 921 in Figure 9) indicating how to access a G-PCC frame, as follows: Grouping Type: 'gfra'
Container: Sample Group Description Box ('sgpd')
Mandatory: No Quantity: Zero or one per track A sample group with grouping_type 'gfra' allows signalling runs of consecutive samples, each run of consecutive samples forming a G-PCC frame.
Each sample shall contain data corresponding to one or more G-PCC sub-frames (for example like the data units in 'mdat' box 930).
A G-PCC frame results from the combination of the data of all samples belonging to a same run of consecutive samples.
The composition time of a G-PCC frame resulting from the combination of a run of consecutive samples coincides with the composition time of the last sample in the run. In a variant, the composition time of a G-PCC frame resulting from the combination of a run of consecutive samples coincides with the composition time of the first sample in the run. The duration of a G-PCC frame resulting from the combination of a run of consecutive samples is the sum of the duration of these consecutive samples.
The grouping_type_parameter may not be set to any particular value for the SampleToGroupBox with grouping type 'gfra':

    aligned(8) class GPCCFrameEntry() extends VolumetricVisualSampleGroupEntry('gfra') {
    }

In an alternative, both sample groupings 'sfif' and 'gfra' may be defined in a track containing an interleaving of sub-frames in consecutive samples.
Description of sub-frames using data entries
Figure 10 illustrates an example of the organization of point cloud data 1000 (e.g. point cloud data 150 in Figure 1) to encapsulate and a resulting encapsulated media file 1005 (e.g. media file 152 in Figure 1), using a single PCC track referenced 1010, providing sub-frame description and access. According to the illustrated example, the point cloud data to encapsulate contain a set of PCC frames referenced 1015-1 to 1015-n. Still according to the illustrated example, each PCC frame contains four sub-frames.
For example, frame 1015-1 contains sub-frames 1020-10 to 1020-13. Each sub-frame has a sub-frame index or frame number, for example starting at 0, incremented by 1 from one sub-frame to another within the PCC frame for frame index and within the sequence for frame number.
According to this embodiment, the track description 1010 contains a DataReferenceBox 'dref' referenced 1025 that contains a table of data references (for example URLs). The purpose of the 'dref' box is to declare the location(s) of the media data used within the presentation or media file. A data reference box like 'dref' box 1025 contains at least one data entry box. There exist several kinds of data entry boxes, the default one being used to indicate that media data are present in the media file in the 'mdat' box.
The data entries in 'dref' box 1025 are referenced in a sample description box like 'stsd' box 1030. Other kinds of data entries inform parsers that there may be identifiable media data boxes, e.g. 'imda' boxes, like 'imda' boxes 1035-1 to 1035-4, that store the data for samples referencing this type of data entry in their sample description (i.e. in 'stsd' box 1030). In this embodiment, a new DataEntry box is proposed to indicate that data are stored in an identified media data box based on a sub-frame sequence number. The sequence number applies to the frame index or frame number of point cloud sub-frames. When the sub-frame indication comes as a frame index attribute, the sequence number for a sub-frame in a frame N is equal to the frame index of the sub-frame plus the frame index of the last sub-frame of frame N-1 (in other words, the frame index is transformed into an absolute frame number, instead of a frame-relative one). Then, the data for a sample referencing this new DataEntry may be split into 'imda' boxes so that the data for a sub-frame with index Si is stored in an 'imda' box having its imda_identifier parameter set to the sub-frame index Si. This results in sub-frames stored in different media data boxes and allows collecting the data for a sequence of sub-frames as a contiguous byte range. The new data entry type may be defined as follows:

    aligned(8) class DataEntrySubFrameIndexBox(bit(24) flags)
        extends DataEntryBaseBox('sfix', flags) {
    }

wherein DataEntrySubFrameIndexBox identifies the IdentifiedMediaDataBox containing the media data accessed through the data_reference_index corresponding to this DataEntrySubFrameIndexBox. When a data_reference_index included in a sample entry refers to a DataEntrySubFrameIndexBox, each sample referring to the sample entry may have its data split into a number of IdentifiedMediaDataBoxes corresponding to the number of sub-frames within this sample (this number may be indicated or obtained in a metadata structure dedicated to sub-frame configuration information). The media data offset 0 points to the first byte of the payload of the IdentifiedMediaDataBox that has imda_identifier equal to the sub-frame number modulo the number of sub-frames per sample.
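The offset rule at the end of the paragraph could be implemented as below (illustrative; the function name is ours):

    def imda_identifier_for(subframe_sequence_number, subframes_per_sample):
        # Media data offset 0 of the sample points into the IdentifiedMediaDataBox
        # whose imda_identifier equals this value.
        return subframe_sequence_number % subframes_per_sample

For example, with four sub-frames per sample, the sub-frame with sequence number 6 resolves to the 'imda' box whose imda_identifier is 2.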
Multi-track encapsulation for sub-frame tracks Figure 11 illustrates an encapsulated media file 1100 (e.g. media file 130) with multiple PCC tracks, based on sub-frames.
As illustrated, media file 1100 contains a metadata part represented by movie box 'moov' 1105 and a media data part represented by 'mdat' box 1150. Media file 1100 also contains an 'ftyp' box 1110 and optionally a GroupListBox 'grpl' referenced 1115. This GroupListBox may declare a group of sub-frame tracks (e.g. sub-frame tracks 1120-1 to 1120-3) encapsulating point cloud data from a same point cloud sequence (e.g. point cloud data 150 in Figure 1). This can be done by defining in this 'grpl' box an entity group listing the sub-frame tracks from a same point cloud sequence. A new entity grouping type may be defined for this specific use of EntityToGroupBox, for example the 'sfgp' type, for sub-frame group. An entity group of this type may reference sub-frame tracks in its list of entity_IDs. Each frame in this sequence is encapsulated into samples of sub-frame tracks, each sample representing a consistent set of points, from a sampling or timing point of view, within a frame of the point cloud sequence. Each sub-frame track contains a sample description (with at least a sample table box (e.g. 'stbl' boxes 1125-1 to 1125-3) and a sample description box 'stsd'). The 'stsd' box may contain a specific sample entry type indicating that the samples of this track contain a partial representation, or partial sample, corresponding to a set of points from a point cloud frame sharing common properties, like for example timing information.
The common properties and their type, for example timing information, may be indicated in a sample group description ('sgpd' and optionally 'sbgp' boxes, not represented) with a specific grouping_type, for example for providing sub-frame information. It may be indicated, for example, by the four-character code 'sfif' (or any other value not conflicting with other 4CCs already in use). The 'sgpd' box may contain any variant of the SubFrameInformationGroupEntry as described in reference to Figure 8 (reference 840) or Figure 9 (within box 945). The properties contained in an 'sgpd' box of a sub-frame track associate properties to groups of samples in the 'mdat' box (e.g. 'mdat' box 1150). For example, track 1120-1 describes samples corresponding to a sub-frame with index "A", having samples indicated with references 1130-1 and 1130-2. Similarly, tracks 1120-2 and 1120-3 describe samples corresponding to sub-frames with index "B" or "C" and samples corresponding to a sub-frame with index "D", respectively.
Data in 'mdat' box 1150 come as bursts of consecutive samples for a given sub-frame (e.g., 1130-1 and 1130-2) or a given set of sub-frames (e.g., 1131-1 and 1131-2) corresponding to a track. The sample description boxes 1125-1 to 1125-3 provide byte offsets and lengths to access the samples corresponding to a sub-frame or a set of sub-frames. It is to be noted that the same would apply with track fragments and track run boxes in the case of a fragmented media file. Each sub-frame track contains the different data units for the point cloud data (i.e. parameter sets, geometry, and attributes). As an alternative to the 'grpl' box 1115, each track may register itself in a track group, using a TrackGroupBox (not represented) with a new track grouping type. For example, the 4CC 'sfgp' may define a group of sub-frame tracks that, when combined together, lead to the reconstruction of a complete point cloud frame. It is to be noted that it may also be possible to combine a subset of sub-frame tracks within a group of sub-frame tracks to build a partial reconstruction of a point cloud frame.
As another alternative, instead of grouping tracks either with a 'grpl' or a track group box, a base track (not represented) may be created to reference the sub-frame tracks corresponding to the same point cloud sequence. A base track has a specific track reference type in a TrackReferenceBox ('tref' box) indicating that the base track describes, in timing increasing order (following the declaration order in the 'tref' box), sub-frame tracks whose data units, when assembled together, may lead to a bit-stream for the complete point cloud sequence. The assembling is done sample by sample. A base track may contain descriptive metadata, parameter set units, or data units that are common to all sub-frame tracks. For example, the base track may contain sensor information like the acquisition rate, angular resolution, field of view, etc., or a metadata structure dedicated to sub-frame configuration information. Each sub-frame track contains in its track or sample description, or both, the sub-frame description or additional timing information, like for example acquisition, capture, or rendering time, as for the tracks 1120-1 to 1120-3 illustrated in Figure 11, for example using a sample group providing sub-frame information like sample group description boxes 1135-1 to 1135-3.
Sub-frame description for Point Cloud items
MPEG-I Part-18 defines G-PCC items and a sub-sample item property under the 'meta' box of a media file. Similarly to G-PCC tracks, a G-PCC item corresponding to one point cloud frame may provide a description of the sub-frames within this frame. The sub-sample item property may then be extended to support the sub-sample information box as described in reference to Figure 7, defining new flags value(s) or reusing an existing flags value. The sub-sample item property may then provide ATI for a G-PCC item. As well, when a G-PCC item is split into a geometry item with zero or more attribute items, each geometry or attribute item may have a sub-sample item property providing ATI. The geometry item may have a property dedicated to sub-frame configuration information (an item property either containing parameters that may be present in a metadata structure dedicated to sub-frame configuration information, or this metadata structure itself). In a variant to ATI in a sub-sample item property, a specific sub-frame item property may be created and indicated in the media file. This specific sub-frame item property may contain ATI. Each sub-frame may also be stored and described in a media file as a separate sub-frame PCC item and associated with a G-PCC item representing the full frame via a specific item reference. A sub-frame G-PCC item may be identified by a specific item_type, for example 'gpst' for a G-PCC sub-frame item. A sub-frame G-PCC item may be associated with a specific sub-frame item property or with a sub-sample item property containing ATI. When sub-frame G-PCC items are linked to the full frame G-PCC item, only the full G-PCC item may contain a property dedicated to sub-frame configuration information. A new entity group may also reference all the sub-frame G-PCC items that, when combined together, lead to the reconstruction of a full G-PCC item. This new entity group is indicated by a specific grouping_type 'sfig' for a sub-frame item group. A sub-frame G-PCC item may be linked to samples in a sub-frame G-PCC track through a 'stmi' sample group, for example when the G-PCC item is used as a thumbnail for the G-PCC track or as a cover image for the sub-frame G-PCC track. A metadata structure dedicated to sub-frame configuration information may be associated with G-PCC items to provide information on the sensor. When associated with items, the metadata structure dedicated to sub-frame configuration information may be an item property, for example identified with the 'sfcg' type for a Sub-Frame ConFiGuration property. Alternatively, it may be merged (its parameters may be added) in the G-PCC configuration item property.
Hardware for carrying out steps of some embodiments of the disclosure Figure 12 is a schematic block diagram of a computing device 1200 for implementation of one or more embodiments of the disclosure. The computing device 1200 may be a device such as a micro-computer, a workstation, or a light portable device. The computing device 1200 comprises a communication bus 1202 connected to:
-a central processing unit (CPU) 1204, such as a microprocessor;
-a random access memory (RAM) 1208 for storing the executable code of the method of embodiments of the disclosure as well as the registers adapted to record variables and parameters necessary for implementing the method for encapsulating, indexing, de-encapsulating, and/or accessing data, the memory capacity thereof being expandable by an optional RAM connected to an expansion port, for example;
-a read only memory (ROM) 1206 for storing computer programs for
implementing embodiments of the disclosure;
-a network interface 1212 that is, in turn, typically connected to a communication network 1214 over which digital data to be processed are transmitted or received. The network interface 1212 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 1204;
-a user interface (UI) 1216 for receiving inputs from a user or to display information to a user;
-a hard disk (HD) 1210; and/or
-an I/O module 1218 for receiving/sending data from/to external devices such as a video source or display.
The executable code may be stored either in the read only memory 1206, on the hard disk 1210, or on a removable digital medium such as a disk, for example. According to a variant, the executable code of the programs can be received by means of a communication network, via the network interface 1212, in order to be stored in one of the storage means of the computing device 1200, such as the hard disk 1210, before being executed.
The central processing unit 1204 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the disclosure, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 1204 is capable of executing instructions from the main RAM 1208 relating to a software application after those instructions have been loaded from the program ROM 1206 or the hard disk (HD) 1210, for example. Such a software application, when executed by the CPU 1204, causes the steps of the flowcharts shown in the previous figures to be performed. In this embodiment, the apparatus is a programmable apparatus which uses software to implement the method of the disclosure. However, alternatively, the method of the present disclosure may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).
Although the present disclosure has been described hereinabove with reference to specific embodiments, the present disclosure is not limited to the specific embodiments, and modifications which lie within the scope of the present disclosure will be apparent to a person skilled in the art.
Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the disclosure, that being determined solely by the appended claims. In particular the different features from different embodiments may be interchanged, where appropriate.
In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.

Claims (20)

1. A method of encapsulating point cloud data in a file compliant with an ISOBMFF based standard, the method comprising: obtaining point cloud data comprising 3D points, the obtained point cloud data being organized in at least one point cloud frame; determining at least two sub-frames of the at least one point cloud frame, a timing information being associated with each of the at least two sub-frames, the timing information associated with one of the at least two sub-frames being different from the timing information associated with the other of the at least two sub-frames; generating metadata describing the at least two sub-frames, the generated metadata comprising the timing information associated with each of the at least two sub-frames; and encapsulating the obtained 3D points and the generated metadata in the file, each point cloud frame being encapsulated as one or more samples.
2. The method of claim 1, wherein the timing information associated with at least one of the at least two sub-frames is determined as a function of acquisition timing information associated with 3D points of the at least one of the at least two sub-frames or as a function of rendering timing information associated with 3D points of the at least one of the at least two sub-frames.
3. The method of claim 1 or claim 2, wherein the timing information associated with at least one of the at least two sub-frames is defined as a time interval or as a timestamp.
4. The method of claim 1 or claim 2, wherein the timing information associated with at least one of the at least two sub-frames is a frame index.
5. The method of claim 4, wherein the generated metadata further comprise a description of the at least one point cloud frame, the description of the at least one point cloud frame comprising a frame rate enabling determining timing of sub-frames when combined with the frame index.
6. The method of any one of claims 1 to 5, wherein the timing information is relative to a frame or is relative to a track.
7. The method of any one of claims 1 to 6, wherein common timing information associated with the at least two sub-frames is provided within a sub-frame configuration structure of the metadata.
8. The method of any one of claims 1 to 7, wherein the timing information associated with at least one of the at least two sub-frames is provided within a sub-sample description.
9. The method of any one of claims 1 to 8, wherein the timing information associated with at least one of the at least two sub-frames is provided within a sample group description.
10. The method of any one of claims 1 to 9, wherein the at least two sub-frames are described in different tracks.
11. The method of any one of claims 1 to 10, wherein the number of sub-frames per frame varies from one frame to another.
12. The method of any one of claims 1 to 11, wherein the number of 3D points within one of the at least two sub-frames is different from the number of 3D points within the other of the at least two sub-frames.
13. A method for parsing point cloud data encapsulated in a file compliant with an ISOBMFF based standard, the point cloud data comprising 3D points and being organized in at least one point cloud frame, each point cloud frame being encapsulated as one or more samples, the method comprising: obtaining metadata from the file, identifying, from the obtained metadata, timing information associated with each of at least two sub-frames of the at least one point cloud frame, the timing information associated with one of the at least two sub-frames being different from the timing information associated with the other of the at least two sub-frames, and obtaining 3D points of at least one of the at least two sub-frames, the 3D points being obtained as a function of the timing information associated with the at least one of the at least two sub-frames.
14. The method of claim 13, wherein the timing information associated with at least one of the at least two sub-frames is representative of acquisition timing information associated with 3D points of the at least one of the at least two sub-frames or representative of rendering timing information associated with 3D points of the at least one of the at least two sub-frames.
15. The method of claim 13 or claim 14, wherein the timing information associated with at least one of the at least two sub-frames is defined as a time interval, as a timestamp, or as a frame index.
16. The method of any one of claims 13 to 15, wherein the at least two sub-frames are described in different tracks, wherein the number of sub-frames per frame varies from one frame to another, and/or wherein the number of 3D points within one of the at least two sub-frames is different from the number of 3D points within the other of the at least two sub-frames.
17. The method of any one of claims 13 to 16, wherein the timing information associated with each of the at least two sub-frames is provided within a sub-frame configuration structure, a sub-sample description, or within a sample group description.
18. A computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing each of the steps of the method according to any one of claims 1 to 17 when loaded into and executed by the programmable apparatus.
19. A non-transitory computer-readable storage medium storing instructions of a computer program for implementing each of the steps of the method according to any one of claims 1 to 17.
20. A device comprising a processing unit configured for carrying out each of the steps of the method according to any one of claims 1 to 17.
GB2205705.3A 2021-12-16 2022-04-19 Method, device, and computer program for optimizing encapsulation of point cloud data Active GB2614100B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2022/086218 WO2023111214A1 (en) 2021-12-16 2022-12-15 Method, device, and computer program for enhancing encoding and encapsulation of point cloud data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB2118338.9A GB2613853B (en) 2021-12-16 2021-12-16 Method, device, and computer program for optimizing encapsulation of point cloud data

Publications (3)

Publication Number Publication Date
GB202205705D0 GB202205705D0 (en) 2022-06-01
GB2614100A true GB2614100A (en) 2023-06-28
GB2614100B GB2614100B (en) 2024-02-28

Family

ID=81581481

Family Applications (3)

Application Number Title Priority Date Filing Date
GB2118338.9A Active GB2613853B (en) 2021-12-16 2021-12-16 Method, device, and computer program for optimizing encapsulation of point cloud data
GB2205011.6A Pending GB2614099A (en) 2021-12-16 2022-04-05 Method, device, and computer program for enhancing encoding and encapsulation of point cloud data
GB2205705.3A Active GB2614100B (en) 2021-12-16 2022-04-19 Method, device, and computer program for optimizing encapsulation of point cloud data

Family Applications Before (2)

Application Number Title Priority Date Filing Date
GB2118338.9A Active GB2613853B (en) 2021-12-16 2021-12-16 Method, device, and computer program for optimizing encapsulation of point cloud data
GB2205011.6A Pending GB2614099A (en) 2021-12-16 2022-04-05 Method, device, and computer program for enhancing encoding and encapsulation of point cloud data

Country Status (1)

Country Link
GB (3) GB2613853B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021262936A1 (en) * 2020-06-26 2021-12-30 Qualcomm Incorporated Attribute parameter coding for geometry-based point cloud compression
CN114697668A (en) * 2022-04-22 2022-07-01 腾讯科技(深圳)有限公司 Encoding and decoding method of point cloud media and related product

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11012713B2 (en) * 2018-07-12 2021-05-18 Apple Inc. Bit stream structure for compressed point cloud data
US11159811B2 (en) * 2019-03-15 2021-10-26 Tencent America LLC Partitioning of coded point cloud data
JPWO2021049333A1 (en) * 2019-09-11 2021-03-18
CN113114608B (en) * 2020-01-10 2022-06-10 上海交通大学 Point cloud data packaging method and transmission method
CN116915966A (en) * 2020-04-12 2023-10-20 Lg电子株式会社 Point cloud data transmitting device and method, and point cloud data receiving device and method
WO2023176419A1 (en) * 2022-03-17 2023-09-21 ソニーグループ株式会社 Information processing device and method


Also Published As

Publication number Publication date
GB2614099A (en) 2023-06-28
GB2614100B (en) 2024-02-28
GB202205705D0 (en) 2022-06-01
GB202205011D0 (en) 2022-05-18
GB2613853A (en) 2023-06-21
GB2613853B (en) 2024-01-24

Similar Documents

Publication Publication Date Title
EP3603080B1 (en) Method and apparatus for encoding media data comprising generated content
US20200381022A1 (en) Method and apparatus for storage and signaling of compressed point clouds
US11595670B2 (en) Method and apparatus for storage and signaling of sub-sample entry descriptions
KR101897945B1 (en) Method, device, and computer program for encapsulating partitioned timed media data using sub-track feature
KR101889247B1 (en) Method, device, and computer program for encapsulating partitioned timed media data using a generic signaling for coding dependencies
JP7383785B2 (en) Methods, devices, and computer programs for improving encapsulation of media content
JP2020537376A (en) Methods, devices, and computer programs for generating timed media data
KR102655630B1 (en) Method and device for generating media files containing 3D video content and method and device for playing 3D video content
KR20150097723A (en) Method, device, and computer program for encapsulating partitioned timed media data
US11412267B2 (en) Storage of multiple atlases from one V-PCC elementary stream in ISOBMFF
CN114079781B (en) Data processing method, device and equipment of point cloud media and storage medium
US20230370659A1 (en) Method, device, and computer program for optimizing indexing of portions of encapsulated media content data
KR20220123693A (en) point cloud data processing
CN115396647B (en) Data processing method, device and equipment for immersion medium and storage medium
GB2614100A (en) Method, device, and computer program for optimizing encapsulation of point cloud data
WO2023111214A1 (en) Method, device, and computer program for enhancing encoding and encapsulation of point cloud data
CN115102932B (en) Data processing method, device, equipment, storage medium and product of point cloud media
GB2617550A (en) Method, device, and computer program for improving multitrack encapsulation of point cloud data
GB2602643A (en) Method, device, and computer program for optimizing encapsulation of images
GB2617359A (en) Method and apparatus for describing subsamples in a media file
GB2620582A (en) Method, device, and computer program for improving indexing of portions of encapsulated media data
WO2023194179A1 (en) Method and apparatus for describing subsamples in a media file
GB2623523A (en) Method and apparatus describing subsamples in a media file
CN116781675A (en) Data processing method, device, equipment and medium of point cloud media
CN115061984A (en) Data processing method, device, equipment and storage medium of point cloud media