WO2022148651A1 - Method, device, and computer program for optimizing encapsulation of images


Info

Publication number: WO2022148651A1
Authority: WIPO (PCT)
Prior art keywords: image, sample, data, box, data chunks
Application number: PCT/EP2021/087209
Other languages: French (fr)
Inventors: Franck Denoual, Frédéric Maze, Eric Nassor, Naël OUEDRAOGO
Original Assignee: Canon Kabushiki Kaisha, Canon Europe Limited


Classifications

    • H04N 21/8456: Structuring of content, e.g. decomposing content into time segments, by decomposing the content in the time domain, e.g. in time segments
    • H04N 19/46: Embedding additional information in the video signal during the compression process
    • H04L 65/61: Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio
    • H04L 65/70: Media network packetisation
    • H04L 65/765: Media network packet handling intermediate
    • H04N 21/85406: Content authoring involving a specific file format, e.g. MP4 format
    • H04N 5/77: Interface circuits between a recording apparatus and a television camera
    • H04N 9/8205: Transformation of the television signal for recording; inverse transformation for playback; the individual colour picture signal components being recorded simultaneously only, involving the multiplexing of an additional signal and the colour video signal

Definitions

  • the present disclosure relates to encapsulation of images or sequences of images, in particular of uncompressed images or uncompressed sequences of images, in a standard and interoperable format, for example to store or transmit images acquired from high throughput image sensors.
  • Very high throughput image sensors are used in many applications, for example in medical or industrial devices. In such applications, it is necessary to acquire high-definition images at a high throughput rate.
  • multi-tap sensors have been created to increase the frame rates of image sensors.
  • the image frames are split into two or more areas that are clocked out in parallel.
  • the surface of a multi-tap sensor is divided into multiple tap areas (e.g. dual-tap, quad-tap, etc.) as illustrated in Figure 1.
  • a quad-tap sensor can output ultra-high definition frames (e.g. 7680 x 4320 pixels) at a frame rate near to 1,000 frames per second.
  • each tap area has its own electronic circuit, named the tap, for creating a signal and an individual output for each of the tap areas.
  • the image data from the tap areas are shifted, amplified, and selected by the taps simultaneously over shorter distances, which enables faster frame rates.
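  • To give an order of magnitude of the data rates such sensors produce, the short calculation below is provided as an illustration only; the 8 bits per pixel figure is an assumption not stated above, and real sensors may differ.

      # Rough raw-data-rate estimate for the quad-tap example above
      # (7680 x 4320 pixels at about 1,000 frames per second; the 8 bits
      # per pixel figure is an assumption made only for this illustration).
      width, height = 7680, 4320
      frames_per_second = 1000
      bytes_per_pixel = 1  # assumed for illustration

      bytes_per_frame = width * height * bytes_per_pixel
      total_rate_gb_per_s = bytes_per_frame * frames_per_second / 1e9
      per_tap_rate_gb_per_s = total_rate_gb_per_s / 4  # each tap reads one quarter

      print(f"total: {total_rate_gb_per_s:.1f} GB/s, per tap: {per_tap_rate_gb_per_s:.1f} GB/s")
      # prints roughly: total: 33.2 GB/s, per tap: 8.3 GB/s
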
  • Figure 1 illustrates examples of multi-tap sensors.
  • each of the dual-tap sensors referenced 100-1 and 100-2 comprises two tap areas, referenced 105-11 and 105-12 and 105-21 and 105-22 respectively, that may be of a same size or not and that may be arranged according to different configurations.
  • each of the quad-tap sensors referenced 110-1 and 110-2 comprises four tap areas that may be of a same size or not and that may be arranged according to different configurations.
  • the data obtained from the different sensor areas must be reassembled into images. This reassembling may be done by what is generally called a frame grabber.
  • the acquired data should preferably be stored or transmitted according to an interoperable format so that application specific images may become available to any common client (e.g. smartphone or personal computer) without requiring specific software other than a standard media player.
  • the MPEG standard allows storage of compressed video sequences, based on a common basis format, the ISO Base Media File Format standardized by ISO as ISO/IEC 14496-12. Extensions of this standard, like ISO/IEC 14496-15, define codec-specific storage formats, based on NAL units (Network Abstraction Layer units). Video codec specifications, e.g. AVC (ISO/IEC 14496-10), HEVC (ISO/IEC 23008-2), and VVC (ISO/IEC 23090-3), define the NAL unit types and payloads.
  • the NALU-based File Format (ISO/IEC 14496-15) defines the storage of these NAL units so that any compliant file format parser can build a bit-stream that is decodable by a video decoder conforming to, for example, AVC, HEVC, or VVC.
  • the MPEG group is now considering building a new standard (ISO/IEC 23001-17) to offer interoperability for the storage of uncompressed videos (that do not use NAL units).
  • ISOBMFF handles timing at sample level (decoding time or composition time is handled, for example, in ‘stts’ and ‘ctts’ boxes)
  • ISOBMFF does not define metadata to handle timing (or order) information at a finer granularity than samples (i.e. image level), for example at subsample level (i.e. sub-image or image part level).
  • the present disclosure has been devised to address one or more of the foregoing concerns.
  • a solution for improving encapsulation of uncompressed images is provided.
  • a method for encapsulating an image in a file compliant with the ISOBMFF standard comprising: obtaining a sequence of data chunks, each data chunk including image data of a portion of the image; generating metadata providing a relationship between the sequence of data chunks and a predetermined scan of the portions of the image, encapsulating the data chunks of the sequence of data chunks and the generated metadata in the file.
  • the method of the disclosure makes it possible to output image parts in parallel, with no need to buffer the image parts before storage, for example for reordering purposes. This reduces the need for memory and processing power in recording devices and/or in intermediate devices located between acquisition devices and transmission devices. Moreover, this provides low latency access to acquired images when they are remotely accessed, for example via HTTP requests by remote viewing devices such as smartphones, remote computers, and surveillance screens.
  • the data of each of the portions are obtained from a tap of a multi-tap image capturing apparatus, wherein the order of the sequence of data chunks depends on the number of taps of the multi-tap image capturing apparatus and on a scanning order of pixels of each of the portions.
  • the data chunks are encapsulated into samples according to the order of the sequence of data chunks.
  • the file comprises an indicator that the sequence of data chunks is ordered differently than the predetermined scan of the image.
  • the relationship between the sequence of data chunks and a predetermined scan of the portions of the image describes the encapsulation of all images encapsulated within the file.
  • the relationship between the sequence of data chunks and a predetermined scan of the portions of the image describes the encapsulation of some of the images encapsulated within the file.
  • the generated metadata are encapsulated within a multi tap description box associated with a sample description box describing encapsulation of the image. According to some embodiments, the generated metadata are encapsulated within a sample entry.
  • the relationship between the sequence of data chunks and a predetermined scan of the portions of the image is described using a sample group mechanism so that the relationship between the sequence of data chunks and a predetermined scan of the portions of the image describes each encapsulated image of a group of encapsulated images.
  • the file comprises a sample to group box defining at least one group of samples, the image belonging to the group of samples, and wherein the file further comprises at least one sample group description box associated with the sample to group box, the at least one sample group description box comprising the generated metadata.
  • a first relationship between a sequence of data chunks and a predetermined scan of portions of an image is associated with a first group of samples and a second relationship between a sequence of data chunks and a predetermined scan of portions of an image, different from the first relationship, is associated with a second group of samples.
  • the relationship between the sequence of data chunks and a predetermined scan of the portions of the image is described within a subsample information box using codec specific parameters.
  • the generated metadata make it possible to parse only a spatial portion of the encapsulated image.
  • the image is an uncompressed image.
  • a method for parsing a file compliant with the ISOBMFF standard comprising: obtaining, from the file, metadata providing a relationship between a sequence of data chunks and a predetermined scan of portions of an image, obtaining, from the file, a sequence of data chunks, each data chunk including image data of a portion of the image, ordering the obtained data chunks according to the obtained metadata, and generating the image from the ordered data chunks.
  • the second aspect of the disclosure has advantages similar to those mentioned above.
  • the metadata are obtained from a multi tap description box associated with a sample description box describing encapsulation of the image.
  • the metadata are obtained from a sample group mechanism, the relationship between a sequence of data chunks and a predetermined scan of portions of an image describing each encapsulated image of a group of encapsulated images.
  • the metadata are obtained from a subsample information box of a sample table box using codec specific parameters.
  • a device for encapsulating an image within a file or a device for parsing an image encapsulated within a file comprising a processing unit configured for carrying out each of the steps of the method described above.
  • the third aspect of the disclosure has advantages similar to those mentioned above. At least parts of the methods according to the disclosure may be computer implemented. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", "module” or "system”. Furthermore, the present disclosure may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
  • a tangible carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid state memory device and the like.
  • a transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
  • Figure 1 illustrates examples of multi-tap sensors
  • Figure 2 illustrates different examples of scan orders of portions of an image given different tap configurations in image sensors
  • Figure 3 illustrates an example of streaming image data from a server to a client according to some embodiments of the disclosure
  • Figure 4 illustrates an example of steps for encapsulating sets of data according to some embodiments of the disclosure
  • Figure 5 illustrates an example of steps for parsing received data according to some embodiments of the disclosure
  • Figure 6 illustrates examples of encapsulation of image data obtained from a quad-tap sensor
  • Figure 7 illustrates a first embodiment wherein the storage order or reordering instructions for the data chunks of a sample are described in a sample entry;
  • Figures 8 and 9 illustrate examples of data encapsulation wherein the reordering instructions or storage order of the data chunks for the samples are described using a sample group mechanism
  • Figure 10 illustrates a particular embodiment in which the reordering instructions for data chunks are indicated at subsample level
  • Figure 11 schematically illustrates a processing device configured to implement at least one embodiment of the present disclosure.
  • image data obtained from a multi-tap sensor are encoded to be stored or transmitted as they are obtained from the sensor, in the read order, without reordering and without buffering needs.
  • This avoids memory and CPU consumption between data acquisition and data storage or data transmission.
  • this provides low latency access to acquired images when they are remotely accessed, for example via HTTP requests, by a viewing device.
  • Reconstructing the images can be done by a parser based on reordering instructions added in the metadata part of the file.
  • these metadata make it possible to access directly any spatial part of images. It is assumed that a multi-tap sensor may be made of a single device or a set of multiple devices, for example a set of two dual-tap devices forming a quad-tap sensor.
  • the inventors have observed that not only the number of taps as well as the order of accessing the data from these taps may vary from one sensor to another, but the way the pixels are scanned within each area corresponding to each tap may also vary.
  • Figure 2 illustrates different examples of scan orders of portions of an image given different tap configurations in image sensors.
  • the configurations referenced 200 and 205 correspond to configurations of a dual-tap sensor using an interlaced mode. More precisely, according to configuration 200, the data of the even rows are obtained from a first tap of the sensor, from left to right and from top to bottom, and the data from the odd rows are obtained from a second tap of the sensor, in the same order (i.e. the left to right and top to bottom order). Similarly, according to configuration 205, the data of the even columns are obtained from a first tap of the sensor, from top to bottom and from left to right, and the data from the odd columns are obtained from a second tap of the sensor, in the same order (i.e. the top to bottom and left to right order).
  • Configurations referenced 210 and 215 are other examples of image scan configuration of a dual-tap sensor.
  • the sensor area is split into two vertical parts. A first tap is used to obtain the data of the left part of the image while a second tap is used to obtain the data of the right part, both in a raster scan order, from left to right and from top to bottom.
  • the sensor area is split into two horizontal parts, a first tap is used to obtain the data of the upper part of the image while a second tap is used to obtain the data of the lower part, both in a raster scan order, from left to right and from top to bottom.
  • each tap may obtain data in a reverse raster scan order (i.e. from bottom to top and right to left) or any other combination.
  • the configurations referenced 220 to 235 correspond to some possible configurations of a quad-tap sensor.
  • the sensor area is split into four parts, each part corresponding to a particular tap: two upper parts and two lower parts, comprising two left parts and two right parts.
  • the data of each part are obtained in a raster scan order (i.e. from left to right and from top to bottom) from the corresponding tap.
  • the data of each of the upper parts are obtained in a raster scan order (i.e. from left to right and from top to bottom) from the corresponding tap and the data of each of the lower parts are obtained in a reverse scan order compared to the raster scan order, i.e. from left to right and from bottom to top, from the corresponding tap.
  • according to another configuration, the data of each part are obtained as follows:
  • - for the upper left part, the data are obtained from left to right and from top to bottom by a first tap;
  • - for the upper right part, the data are obtained from right to left and from top to bottom by a second tap;
  • - for the lower left part, the data are obtained from left to right and from bottom to top by a third tap;
  • - for the lower right part, the data are obtained from right to left and from bottom to top by a fourth tap.
  • the data of each part are obtained as follows
  • the data are obtained from right to left and from bottom to top by a first tap;
  • the data are obtained from left to right and from bottom to top by a second tap;
  • the data are obtained from right to left and from top to bottom by a third tap;
  • the data are obtained from left to right and from top to bottom by a fourth tap.
  • the sensor area can be split differently and other scan orders exist for quad-tap sensors. For the sake of conciseness, they are not described or illustrated here.
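  • As an illustration of the reconstruction such scan orders imply on the reader side, the sketch below reassembles a full image from four tap streams for a configuration in which the upper parts are read top to bottom and the lower parts bottom to top, each row left to right. It is only a sketch under assumptions made for clarity (equal tap sizes, row-based emission, hypothetical names), not a definitive implementation.

      def reassemble_quad_tap(tap_streams, height):
          """tap_streams: dict with keys 'ul', 'ur', 'll', 'lr'; each value is the
          list of half-width rows in the order the corresponding tap emitted them
          (upper taps top to bottom, lower taps bottom to top, rows left to right)."""
          half_h = height // 2
          ul = tap_streams['ul']
          ur = tap_streams['ur']
          ll = list(reversed(tap_streams['ll']))   # lower taps were read bottom-up
          lr = list(reversed(tap_streams['lr']))
          top = [ul[r] + ur[r] for r in range(half_h)]
          bottom = [ll[r] + lr[r] for r in range(half_h)]
          return top + bottom

      # Tiny 4 x 4 example with one-character "pixels":
      streams = {
          'ul': ["ab", "ef"],   # rows 0, 1 of the left half, top to bottom
          'ur': ["cd", "gh"],   # rows 0, 1 of the right half, top to bottom
          'll': ["mn", "ij"],   # rows 3, 2 of the left half (bottom to top)
          'lr': ["op", "kl"],   # rows 3, 2 of the right half (bottom to top)
      }
      assert reassemble_quad_tap(streams, 4) == ["abcd", "efgh", "ijkl", "mnop"]
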
  • Figure 3 illustrates an example of streaming image data from a server to a client according to some embodiments of the disclosure.
  • a server 300 comprises an encapsulation module 305.
  • the server 300 may be connected, via a network interface (not represented), to a communication network 310 to which is also connected, via a network interface (not represented), a client 320 comprising a de-encapsulation module 315.
  • Server 300 processes data, e.g. uncompressed video data, uncompressed sequence of images, and/or uncompressed images, for streaming or for storage. To that end, server 300 obtains or receives data comprising, for example, the recording of a scene by one or more cameras or image sensors, referred to as a source video. It may also obtain or receive video or image data from a single camera or image sensor but as different parts (or zones or partitions) of an image in parallel. The source video is received by the server as an original sequence of pictures, or images, or picture parts or image parts, for example image data 325-1 and 325-2.
  • an image may be received as two or more image parts in parallel, a first part corresponding to the left part of the image and a second part corresponding to the right part of the image.
  • Data obtained or received by the server 300 and corresponding to an image part are called a data chunk.
  • a data chunk may correspond to the whole image part or there may be several data chunks for an image part (for example one data chunk per pixel row of the image part). It consists of a set of bytes representing the values of the pixels of the image part.
  • the combination of the image parts leads to a full image or complete image corresponding to an image (or frame or picture) of the source video.
  • the server may encapsulate the sequence of pictures or picture parts into a media file or media segments 330 without any compression, as they are received and as soon as they are received (i.e. without any delay), using encapsulation module 305.
  • the server may encode a sequence of pictures into media data (i.e. a bit-stream) using one or more media encoders (e.g. a video encoder encoding either an image in a multi-camera system or a picture partition in a single camera system), not represented, and encapsulate the media data in one or more media files or media segments 330 using encapsulation module 305.
  • the incoming video may contain multiple sequences of pictures (e.g. video data corresponding to image data 325-1 and video data corresponding to image data 325-2).
  • Encapsulation module 305 comprises at least one of a writer or a packager to encapsulate the media data (either uncompressed or compressed).
  • the one or more media encoders may be implemented within encapsulation module 305 to encode received data or may be distinct from encapsulation module 305. Encoding is done as soon as the sequence of pictures or picture parts (or partitions) is obtained (live encoding).
  • Encapsulation is preferably done in a live mode, or with low delay or low latency, i.e. by producing media fragments or segments.
  • Client 320 is used for processing data received from communication network 310, for example for processing media file 330, using de-encapsulation module 315 (also known as a parser).
  • the de-encapsulated data (or parsed data) corresponding to uncompressed video may be stored, displayed, or output.
  • when the de-encapsulated data (or parsed data) correspond to a media data bit-stream, the bit-stream is decoded, forming, for example, video or image data that may be stored, displayed, or output.
  • the media decoder may be implemented within de-encapsulation module 315 or it may be distinct from de-encapsulation module 315.
  • Client or server may be user devices but may also be, for example, network nodes processing the media files being transmitted or stored.
  • media file 330 may be communicated to de-encapsulation module 315 in different ways.
  • encapsulation module 305 may generate media file 330 with a media description (e.g. DASH MPD) and may communicate (or stream) it directly to de-encapsulation module 315 upon receiving a request from client 320.
  • the media file 330 may also be downloaded by client 320 and stored within a storing module of client 320.
  • media file 330 may encapsulate media data (e.g. uncompressed or encoded video, possibly with audio) into boxes according to the ISO Base Media File Format (ISOBMFF, ISO/IEC 14496-12 and ISO/IEC 14496-15 standards for compressed video).
  • media file 330 may correspond to one or more media files (indicated by a FileTypeBox ‘ftyp’).
  • media file 330 may include two kinds of boxes, one or more “media data boxes”, identified as ‘mdat’ or ‘imda’, containing the media data, and “metadata boxes” (e.g. ‘moov’ or ‘moof’ box) containing metadata defining placement and timing of the media data.
  • the media data box(es) contain(s) all the data for image or image parts (for example image data 325-1, 325-2). According to the disclosure, these data are stored in the order they are obtained by server 300, for example from camera(s), sensor(s), or sensor taps.
  • the metadata boxes are extended to describe the storage order of the images or image parts in the media data box(es).
  • the storage order is indicated as reordering instructions at sample level.
  • the reordering instructions are provided at a finer level than the samples, e.g. at a subsample level.
  • These embodiments have variants that, in addition to reordering instructions, also provide spatial information for images or image parts allowing spatial access into the recorded, stored, or encapsulated video, which can be useful for extracting and viewing or storing regions of interest.

Encapsulation
  • Figure 4 illustrates an example of steps for encapsulating sets of data according to some embodiments of the disclosure. These steps may be carried out, for example, in encapsulation module 305 in Figure 3.
  • a first step (400) is directed to configuring the encapsulation module, for example encapsulation module 305 of server 300 in Figure 3, that encapsulates the received images or image parts, optionally with an encoding step but preferably as uncompressed data.
  • the configuration step may consist in defining a number of cameras or sensors providing images or a number of image parts, in providing camera or sensor characteristics like a frame rate or a number of taps and their scan order (e.g. as described with reference to Figure 2) in case of image sensors.
  • the configuration information may be hard coded in the server or may be user-specified through a graphical user interface.
  • the configuration step may also be used to indicate whether the same configuration parameters should apply for processing the whole encapsulated media data (static configuration) or may change when processing the encapsulated media data (dynamic configuration).
  • When the encapsulation module is configured, metadata structures of a media file (for example media file 330 in Figure 3) are created. If the media file is generated as media segments, an initialization segment is created.
  • in case of ISOBMFF encapsulation, this comprises creating a ‘moov’ box comprising a ‘trak’ box for the description of the video samples corresponding to recorded images or image parts.
  • the ‘trak’ box contains boxes for the sample description, starting with a ‘stbl’ box and a ‘stsd’ box providing one or more sample entries.
  • the encapsulation module inserts a new brand as a major brand or in a list of compatible brands in a ‘ftyp’ box, a ‘styp’ box, a ‘ttyp’ box, or an ‘etyp’ box.
  • This new brand indicates that the media file contains samples that require reordering before being reconstructed and displayed, for example the brand ‘oofs’ (for “out-of-order storage”).
  • a test is carried out to determine whether the configuration is a static configuration or a dynamic configuration (step 402).
  • when the configuration is static, the encapsulation module provides items of information about the reception or acquisition order for images or image parts to be encapsulated as a sample, i.e. having a same timestamp (step 404). Since the configuration is static, these items of information are preferably provided once for all the samples, at sample level, for example in a sample entry as described hereafter (first embodiment).
  • next, data chunks corresponding to images or image parts are received (step 406) and stored in the media data box (step 408).
  • as illustrated with step 410, these steps of receiving and storing data chunks are repeated until the last data chunk for a given time is received and stored.
  • the sample description is completed (not depicted), for example by setting a sample time, a sample size, and a sample offset providing the location of the sample within the media data box, and a description for a new sample may begin.
  • the media file may consist of a fragmented file.
  • the items of information directed to the data order that are provided at step 404 preferably apply to the samples in all the fragments.
  • the sample sizes, durations, or offsets could be described within a ‘trun’ box providing a description of a run of samples for a media fragment.
  • the order of the received data chunks composing a sample and stored in a ‘mdat’ box follows the items of information stored in the sample description at step 404.
  • when the configuration is dynamic, a test is carried out to determine whether or not there exist predetermined configurations (step 412), i.e. whether or not the configuration of the encapsulation module may change from one predefined configuration to another predefined configuration, the list of predefined configurations being determined in the configuration step (step 400).
  • the encapsulation module obtains the index of the predefined configuration order for the current sample (step 414).
  • the current sample is associated with the predefined configuration order corresponding to the obtained index (step 416), as explained according to the second embodiment, at sample group level.
  • data chunks corresponding to images or image parts of the current sample are received (step 418) and appended in the media data box of the media file (step 420). These steps of receiving and storing data chunks are repeated until the last data chunk for the current sample is received and stored or until the end of data reception.
  • when all the data for the current sample are received, the encapsulation module obtains an index of the predefined configuration order to be used for the next sample (step 414) and iterates on steps 416 to 424 until the end of data reception.
  • the encapsulation module keeps on receiving and storing data chunks in the media data box.
  • the media file may also be fragmented. Indeed, since associating samples with predefined configurations is handled by the sample grouping mechanism, this is also available for fragmented files, by design of ISOBMFF.
  • when there is no predefined configuration, the encapsulation module cannot indicate the storage order at sample level.
  • the data chunks are then received and stored in media data boxes as they are received (steps 426 and 428), each data chunk being described in a metadata structure with ordering information (step 430), as described according to the third embodiment (relating to storage indication at subsample level).
  • the data chunk may come with header information indicating the position of the data chunk in the source video (or full or complete image) or address offset into the full or complete image. This is for example the case for some industrial cameras using the GigE Vision Streaming Protocol.
  • This header information can be used to obtain ordering information for step 430.
  • header information is removed before storage of the actual data chunk. This is done until no more data are received (test 432).
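  • As a purely hypothetical illustration of this header-based ordering, the sketch below strips an assumed 8-byte header carrying an address offset from an incoming data chunk; the header layout is an assumption made for this example and is not the actual GigE Vision Streaming Protocol header.

      import struct

      def split_chunk(raw: bytes):
          """Return (address_offset, payload); the header is removed before storage,
          and the offset can feed the ordering information of step 430."""
          address_offset, payload_length = struct.unpack(">II", raw[:8])
          return address_offset, raw[8:8 + payload_length]

      # Example: a chunk destined for byte offset 4096 of the full image.
      header = struct.pack(">II", 4096, 3)
      assert split_chunk(header + b"abc") == (4096, b"abc")
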
  • the sample description is completed (e.g. the sample time, the sample size, and the sample offset are set) and a description for a new sample begins.
  • the encapsulation process ends by terminating the media file, possibly with computation of indexes (e.g. indexes of the ‘sidx’, ‘ssix’, or ‘mfra’ are set) for the generated media file or media segment files (step 434).
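  • To make the static-configuration path of Figure 4 more concrete, the following sketch (hypothetical names, box serialization omitted) appends each data chunk to the future media data payload in arrival order and only keeps per-sample size and offset bookkeeping for the sample description; it is an illustration, not an actual ISOBMFF writer.

      class SampleRecorder:
          def __init__(self):
              self.mdat_payload = bytearray()   # future 'mdat' box payload
              self.sample_offsets = []          # one entry per sample ('stco'-like)
              self.sample_sizes = []            # one entry per sample ('stsz'-like)
              self._current_size = 0

          def start_sample(self):
              self.sample_offsets.append(len(self.mdat_payload))
              self._current_size = 0

          def add_chunk(self, chunk: bytes):
              # Chunks are stored exactly as received, without reordering or buffering.
              self.mdat_payload += chunk
              self._current_size += len(chunk)

          def end_sample(self):
              self.sample_sizes.append(self._current_size)

      # Usage: for each capture time, call start_sample(), add_chunk() for every chunk
      # received from the taps, then end_sample(); the reordering instructions are
      # written once in the sample entry (first embodiment) rather than per sample.
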
Parsing

  • Figure 5 illustrates an example of steps for parsing received data according to some embodiments of the disclosure.
  • a first step is directed to receiving a media file or media segment files (step 500), generated according to some embodiments of the disclosure, for example according to the steps illustrated in Figure 4.
  • a reception step may comprise a downloading step, a copying step, or a streaming or progressive transmission process.
  • the reader or parser is configured (step 502).
  • the configuration step may comprise reading the ‘moov’ box of the received media file (or the initialization segment when the media file consists of one or more media segments).
  • This configuration step allows the reader or parser to allocate memory to store reconstructed images.
  • the parser may determine from a specific brand whether samples need reordering or not. If no such brand is present, for example as a major brand or in the compatible brands of the ‘ftyp’, ‘styp’, ‘ttyp’, or ‘etyp’ box, this information may be obtained from the sample description (step 504).
  • the parser can get information on the kind of samples and their organization (step 504).
  • the parser, while inspecting the sample entry, may find a box (see e.g. the first embodiment) indicating a scan order for the samples of the video track.
  • the parser may determine whether a reordering process is to be applied (step 506). If no reordering process is needed, the received data are parsed on a standard basis. On the contrary, if a reordering process is to be applied, another test is carried out to determine whether the configuration is static or not (step 508). This may be performed by determining whether or not a specific box indicating a static configuration (as described herein below) is present. As set forth above, the configuration may comprise a number of cameras or sensors providing images or a number of image parts, camera or sensor characteristics like a frame rate or a number of taps and their scan order (e.g. as described with reference to Figure 2) in case of image sensors.
  • the parser reads the sample ordering description in the corresponding box (step 510) so as to determine the reordering process to apply for the data chunks of a given sample (step 512).
  • the reading step is performed here on a subsample (data chunk) basis, the ordering of each subsample (data chunk) within a sample being determined according to the scan pattern defined in the sample ordering description (step 510).
  • the number of subsamples and their size can be read from the ‘subs’ box (e.g. the subsample_count parameter).
  • the sample is provided to the video decoder or to the reader (step 516) depending on whether the video or images are compressed or not.
  • the parser iterates on the next sample until no more sample can be read (step 518).
  • the parser further inspects the sample description to determine whether some sample groups provide indication for sample reordering (step 520). This may be performed by looking for a specific grouping type dedicated to “sample data reordering” (or data chunk reordering or scan order indication), for example represented by the four-character code ‘sdro’ or ‘msif’. It is to be noted here that the four-character code (or 4cc) is just an example; any other four-character code, not conflicting with existing and registered ones, may be used.
  • the sample group index is read at step 522, providing an entry in a sample group description with same grouping_type ‘sdro’.
  • This sample group entry provides the description for the data chunks reordering to apply in step 526 once the data chunks have been read (524).
  • the parser may rely on the sample size to detect the end of a sample or on the number of subsamples (data chunks) in the ‘subs’ box.
  • the reconstructed image is provided to a video player (step 528), either a decoder (in case of compressed video or images) or a video renderer (in case of uncompressed video or images). This is done until no more sample is to be read (step 530).
  • the parser reads data chunks (step 532) one after another and data chunk reordering items of information (step 534), according to the third embodiment described herein below, for example in subsample description.
  • the parser iterates on data chunks composing the current sample (step 536) or while data corresponding to the sample size have not been read (i.e. another criterion for step 536). Once all the data chunks for a sample have been read, they are reordered (step 538) to provide the reconstructed image to the video player (step 540).
  • the parser iterates on the samples until no data remain to read (step 542).
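  • The reordering performed in steps 512, 526, or 538 can be summarized by the following sketch, in which the ordering numbers are assumed to have already been derived from the sample entry, sample group, or subsample metadata described above; names are illustrative, not an actual parser API.

      def reorder_sample(chunks, order):
          """chunks: byte ranges of one sample in storage order; order: for each
          stored chunk, its 1-based position in image (raster) order."""
          if len(chunks) != len(order):
              raise ValueError("one ordering number is expected per data chunk")
          image_ordered = [None] * len(chunks)
          for chunk, position in zip(chunks, order):
              image_ordered[position - 1] = chunk
          return b"".join(image_ordered)

      # Example: chunks stored as [part3, part1, part2] with ordering numbers [3, 1, 2].
      assert reorder_sample([b"C", b"A", b"B"], [3, 1, 2]) == b"ABC"
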
  • Figure 6 illustrates examples of encapsulation of image data obtained from a quad-tap sensor.
  • in these examples, the image data of the encapsulated media data are obtained from a quad-tap sensor. Each tap sends image data chunks to a server or a video recorder, for example server 300 in Figure 3.
  • the data chunks are stored according to an interleaving pattern that applies to all samples. For example, data chunk 1 is received from the first tap, next data chunk 2 is received from the second tap, next data chunk 3 is received from the third tap, next data chunk 4 is received from the fourth tap, next data chunk 5 is received from the first tap, next data chunk 6 is received from the second tap, and so on.
  • This repetition can be exploited to describe the sample reordering instructions (from parser point of view), or image scan order (from image acquisition point of view), once for all samples of a video sequence.
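  • With such a fixed interleaving, the tap a stored chunk came from can be recovered arithmetically, as the small helper below illustrates (a hypothetical sketch assuming four taps and the chunk numbering of the example above).

      def chunk_to_tap_and_index(chunk_number, tap_count=4):
          """Map a 1-based stored chunk number to (tap, chunk index within that tap),
          assuming the fixed interleaving tap1, tap2, ..., tapN, tap1, tap2, ..."""
          tap = (chunk_number - 1) % tap_count + 1
          index_in_tap = (chunk_number - 1) // tap_count + 1
          return tap, index_in_tap

      # With the interleaving described above, chunk 5 is the second chunk of tap 1
      # and chunk 6 is the second chunk of tap 2.
      assert chunk_to_tap_and_index(5) == (1, 2)
      assert chunk_to_tap_and_index(6) == (2, 2)
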
  • Encapsulated media data 605, 610, and 615 illustrate other storage order examples according to which the reception order cannot be determined in advance.
  • the reception order from the different taps may vary.
  • data chunks 7 and 8 in the second sample come before data chunks 5 and 6 in the same sample while data chunks 7 and 8 in the first sample come after data chunks 5 and 6 in the same sample.
  • sample grouping may be used to indicate the sample reordering instructions (from the parser point of view) or the image scan order (from the image acquisition point of view).
  • each sensor tap may send data chunks with a varying order.
  • the tap corresponding to the bottom right sent all its data (data chunks 4 and 8) before the other taps sent all their data.
  • not only the data chunk length may vary but also the number of data chunks per sample, as can be seen in the second sample.
  • Figure 7 illustrates a first embodiment wherein the storage order or the reordering instructions for the data chunks of a sample are described in a sample entry.
  • the sample description box (‘stsd’), referenced 700, contains sample entry descriptions, for example an uncompressedVideoSampleEntry related to uncompressed video, referenced 705.
  • Multi tap description box 710 contains a description of the scan order of the image obtained from the image sensor(s) that is(are) used.
  • the parameters of the multi tap description box make it possible to describe various configurations of multi tap sensors, in particular the ones described by reference to Figure 2.
  • multi tap description box 710 may contain a first parameter denoted tap_number providing the number of taps. Still for the sake of illustration, its value may be set to zero to indicate that the sensor is a dual-tap sensor and to one to indicate that the sensor is a quad-tap sensor. Accordingly, it may then be coded on 1 bit. If a greater number of taps needs to be supported, additional bits can be allocated to this parameter.
  • Multi tap description box 710 may also contain a second parameter, for example tap_chunk_num, to indicate a number of bursts or of chunks corresponding to the number of reads or emissions per tap for an image.
  • the tap_chunk_num can be provided for each tap, for example in a loop on the number of taps (not represented in multi tap description box 710).
  • the tap_chunk_num information may be used with the tap_number information to obtain the number of subsamples within a sample.
  • a subsample information box is then added in the description, each subsample providing the byte range (through the subsample_size parameter) for a data chunk.
  • Multi tap description box 710 may also contain a third parameter, for example tap_interlaced, to indicate, in case of dual-tap sensors, whether the scan of the image is interlaced or not (like in configurations 200 and 205 in Figure 2), and a fourth parameter, for example tap_mode, to indicate the orientation of the tap border, for example horizontal as illustrated with reference to sensor 100-2 in Figure 1 or vertical as illustrated with reference to sensor 100-1 in Figure 1.
  • the multi tap description box 710 may also contain, for each tap, a tap_offset providing the position of the top left corner of a tap and a tap size (e.g. in pixels), in case they are not equally sized.
  • multi tap description box 710 may contain an additional parameter to indicate the row scan order, either top to bottom or bottom to top. By default (when this parameter is not present), the row scan order may be set to top to bottom.
  • multi tap description box 710 may contain an additional parameter to indicate the column scan order, either left to right or right to left. Again, by default (when this parameter is not present), the column scan order may be set left to right.
  • according to a variant relying on a limited set of predefined tap patterns, the multi tap description box simply consists of providing a reference into this list of predefined tap patterns, for example as an 8-bit index.
  • the limited set of tap patterns may contain some of the most common configurations, for example the configurations illustrated in Figure 2, and defined as Codec independent Code Points (to be used as reference by encapsulation modules and parsers).
  • while the MultiTapDescriptionBox has been described in relation to a sample entry for uncompressed video (705), a multi tap description box like multi tap description box 710 can be used as an optional box in sample entries defined for video codecs like AVC, HEVC, or VVC.
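  • As an informal illustration of the parameters listed above, the sketch below models a possible in-memory form of the multi tap description; the field types and the coding of tap_mode are assumptions made for clarity, since the exact bit layout of the box is not reproduced here.

      from dataclasses import dataclass, field
      from typing import List, Tuple

      @dataclass
      class MultiTapDescription:
          tap_number: int          # e.g. 0 = dual-tap, 1 = quad-tap (1-bit field above)
          tap_chunk_num: int       # reads/emissions per tap and per image
          tap_interlaced: bool     # dual-tap: interlaced scan or not
          tap_mode: int            # 0 = horizontal tap border, 1 = vertical (assumed coding)
          tap_offsets: List[Tuple[int, int]] = field(default_factory=list)  # optional, per tap
          tap_sizes: List[Tuple[int, int]] = field(default_factory=list)    # optional, per tap

          def taps(self) -> int:
              return 2 if self.tap_number == 0 else 4

          def subsamples_per_sample(self) -> int:
              # As noted above, tap_number and tap_chunk_num together give the number
              # of subsamples (data chunks) within a sample.
              return self.taps() * self.tap_chunk_num

      # Example: a quad-tap sensor emitting 4 chunks per tap -> 16 subsamples per sample.
      assert MultiTapDescription(1, 4, False, 0).subsamples_per_sample() == 16
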
  • Figures 8 and 9 illustrate a second embodiment where the scan order may vary from one sample to another, the reordering instructions or storage order of the data chunks for a given sample being described using a sample group mechanism. This description may be part of sample description (‘stbl’ box or ‘traf for fragmented files).
  • Figure 8 illustrates a first example of data encapsulation wherein the reordering instructions or storage order of the data chunks for the samples are described using a sample group mechanism.
  • a first sample to group box referenced 800 is created to group samples having a same scan order (e.g. a same number of data bursts, the data burst being always received in same order, etc.) like samples 805-1 to 805-3 in encapsulated media data 810.
  • Sample to group box 800 uses a new grouping type called for example ‘sspm’ (for “sub-sample mapping”) referenced 815.
  • the description or properties associated with each group of samples declared in sample to group box 800 are provided in a sample group description box having the same grouping_type ‘sspm’, like sample group description box 820.
  • in the illustrated example, sample to group box 800 contains a single entry (entry_count equal to 1), indicating that all the samples of the considered track, in particular samples 805-1 to 805-3, use the same data chunk order, as depicted in encapsulated media data 810.
  • the associated sample group description box with grouping type ‘sspm’ provides subsample map entries (e.g. entries 825).
  • the associated sample group description box 820 may alternatively provide byte-range map entries (e.g. entries 825).
  • the grouping type is set to another four-character code (or 4cc) like ‘brmp’ for byte-range map entries.
  • a subsample map entry may be used to assign an identifier, called entry_idc, to each subsample (i.e. data chunk) within the samples of a same group of samples. It relies on a subsample information box (‘subs’).
  • entry_count specifies the number of entries in the map. According to some embodiments, when rle indicates that run-length encoding is used to assign entry_idc to subsamples, entry_count corresponds to the number of runs where consecutive subsamples are associated with the same group. When rle indicates that run-length encoding is not used to assign entry_idc to subsamples, entry_count represents the total number of subsamples.
  • subs_start_number is the 1-based subsample index (from the ‘subs’ box) in the sample of the first subsample in the current run associated with entry_idc.
  • entry_idc specifies the index of a visual sample group entry (e.g. the indices for group entries 845 and 850) in a sample group description box (e.g. in sample group description box 835) with grouping_type equal to the grouping_type_parameter of the SampleToGroupBox of type 'sspm'.
  • according to some embodiments, entry_idc directly indicates the ordering number of a subsample or of a run of subsamples.
  • in that case, a sample to group box with ‘sspm’ grouping type shall have its grouping_type_parameter set to ‘vsso’, indicating virtual subsample ordering. Virtual here indicates that no sample group description like sample group description box 835 is present, the entry_idc providing directly the property of the subsample or run of subsamples (in the given example, the order within the sample).
  • a byte range map entry may be used to assign an identifier, also called entry_idc, to byte ranges (i.e. data chunks) within the samples of a same group of samples.
  • a byte range map entry may not require a subsample information box (‘subs’) to be present. Moreover, it is relevant for uncompressed data where the number of bytes per pixel may be constant from one sample to another (allowing the mapping for a group of samples).
  • entry_count specifies the number of entries in the map.
  • byte_range_size specifies the size of the byte range.
  • entry_idc specifies the index of a visual sample group entry (e.g. the indices for group entries 845 and 850) in a sample group description box (e.g. in sample group description box 835) with grouping_type equal to the grouping_type_parameter of the SampleToGroupBox of type 'brmp'.
  • according to some embodiments, entry_idc directly indicates the ordering number of a byte range.
  • in that case, a sample to group box with ‘brmp’ grouping type has its grouping_type_parameter set to ‘vbro’, indicating virtual byte-range ordering.
  • Virtual here indicates that no sample group description like sample group description box 835 is present, the entry_idc providing directly the property of the byte range (e.g. the order in the image).
  • the entries 825 are mapped into entries (group entries 845 and 850) in a second sample group description box (sample group description box 835).
  • the mapping of entries 825 is done by providing an entry_idc value (for example entry_idc 830) for one or more subsamples or byte ranges.
  • this entry_idc value indicates an entry (for example group entry 845 or 850) in the sample group description box (e.g. sample group description box 835), starting numbering from value 1, the value 0 being reserved.
  • subsamples or data chunks 3 and 4 are mapped into the first entry referenced 845 of sample group description box 835 and subsamples or data chunks 1, 2 and 5 are mapped into the second entry referenced 850.
  • for subsamples or data chunks mapped into a same entry, their relative order is their declaration order in the sample group description 820.
  • the sample group description box 820 indicates that data chunks 3 and 4 correspond, in this order, to an image part described by the first ‘2dsr’ entry in the sample group description box 835.
  • data chunks 1, 2, and 5 compose, in this order, the second spatial part of the full image as described in the second entry of the sample group description box 835.
  • the entries 845 and 850 provide properties applying to the subsamples with which they are associated.
  • the kind of properties depends on the grouping type (e.g. grouping type 840) in the sample group description box (e.g. sample group description box 835).
  • the grouping type may be a ‘2dsr’ providing 2D spatial relationship information like 2D spatial relationship information 855 or a ‘2dps’ for 2D spatial positions information like 2D spatial positions information 860 (as x and y positions or as an address_offset in the full image or complete image).
  • the constraint for this grouping_type is that it is also declared in the grouping_type_parameter of the sample to group box (e.g. sample to group box 800).
  • this sample group description may then contain more or less items of information, like an ordering number for the subsample (‘sspo’, referenced 865), a spatial position (‘2dps’, referenced 860) within the image, or a spatial position and a size (‘2dsr’, referenced 855) within the image.
  • the two latter allow a spatial access to data chunks corresponding to a given region. Moreover, they provide absolute positioning in the image, while ‘sspo’ provides relative ordering between subsamples or byte ranges.
  • when ‘sspo’ is used in the sample group description box 835, there may be as many entries in sample group description 820 as in sample group description 835, because each data chunk may have one ordering number (865).
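  • The following sketch illustrates, under assumed in-memory structures (this is not an actual ISOBMFF parser), how a parser could resolve subsample map entries into per-chunk properties through their entry_idc values, including the run-length encoded case described above.

      def resolve_subsample_mapping(map_entries, group_entries, subsample_count=None, rle=False):
          """map_entries: list of dicts with an 'entry_idc' key (1-based, 0 reserved)
          and, when rle is True, a 'subs_start_number' key giving the 1-based index of
          the first subsample of the run. group_entries: entries of the second sample
          group description box. Returns one property per subsample, in storage order."""
          if not rle:
              return [group_entries[e["entry_idc"] - 1] for e in map_entries]
          properties = []
          starts = [e["subs_start_number"] for e in map_entries]
          ends = starts[1:] + [subsample_count + 1]   # subsample_count from the 'subs' box
          for entry, end in zip(map_entries, ends):
              run_length = end - entry["subs_start_number"]
              properties.extend([group_entries[entry["entry_idc"] - 1]] * run_length)
          return properties

      # Example mirroring Figure 8: chunks 3 and 4 map to the first '2dsr' entry,
      # chunks 1, 2 and 5 map to the second one (non run-length case).
      entries_2dsr = [{"region": "first image part"}, {"region": "second image part"}]
      mapping = [{"entry_idc": 2}, {"entry_idc": 2}, {"entry_idc": 1},
                 {"entry_idc": 1}, {"entry_idc": 2}]
      props = resolve_subsample_mapping(mapping, entries_2dsr)
      assert props[2]["region"] == "first image part"   # third stored chunk -> first entry
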
  • Figure 9 illustrates a second example of data encapsulation wherein the reordering instructions or storage order of the data chunks for the samples are described using a sample group mechanism, enabling the use of different scan orders from one sample to another.
  • each group of samples is mapped into one entry in a first sample group description box, referenced here 910, containing subsample map entries or byte-range map entries (as explained in reference to Figure 8).
  • Each entry in this sample group description box is then mapped into a set of properties in a second sample group description box referenced 920.
  • the type of properties in this second sample group description box depends on the grouping type referenced 925, this grouping_type being the same as the one in the grouping_type_parameter in sample to group box 900.
  • the set of properties in sample group description box 920 may be of different kinds, like an ordering property (reference 865 in Figure 8), a spatial position property (reference 860 in Figure 8), or a 2D spatial relationship property (reference 855 in Figure 8).
  • sample group description box 920 may be omitted when using virtual subsample ordering or virtual byte range ordering, depending on the grouping_type_parameter value set in the sample to group box 900 (indicating an actual sample group description box of a given grouping_type or a virtual sample grouping type, like for example the ‘vsso’ or ‘vbro’ as explained in reference to Figure 8).
  • according to a variant, the sample groups defined in a sample to group box, for example in sample to group box 800 in Figure 8 (respectively in sample to group box 900 in Figure 9), may be directly mapped into sample group entries describing a pre-defined scan order within a sample, for example into group entries 845 and 850 (respectively into group entries 930 and 935 in Figure 9), these sample group entries providing information like the ones indicated in the MultiTapDescriptionBox.
  • This avoids intermediate mapping of the subsamples or byte ranges.
  • Each scanning configuration may be described as a specific visual sample group entry providing information for samples resulting from multiple scan areas or orders (e.g. samples acquired from multi-tap sensors).
  • a MultiScanDescriptionGroupEntry sample group entry may be defined as follows: aligned(8) class MultiScanDescriptionGroupEntry extends VisualSampleGroupEntry(‘msif’)
  • This sample group entry may alternatively contain variants of the multi tap description box 710. This requires presence of a ‘subs’ box to indicate the different data chunks (as sub-samples) composing a sample.
  • the value for entry_count in the ‘subs’ box should be greater than or equal to the number of sample group entries of type ‘msif’.
  • the ‘subs’ box may also have a flag value indicating that the subsamples are not stored in the image order but in a scan order.
  • according to other embodiments, the NAL unit (NALU) mapping mechanism may be used instead of the subsample or byte-range mapping in the sample group description boxes, for example sample group description boxes 820 in Figure 8 and 910 in Figure 9.
  • the grouping_type of the sample to group box and of the sample group description box, for example of sample to group box 800 and sample group description box 820 in Figure 8 or of sample to group box 900 and sample group description box 910 in Figure 9, would then be set to ‘nalm’ instead of ‘sspm’ or ‘brmp’.
  • the grouping_type_parameter of the sample to group box may take, for example, a value among ‘trif’ from ISO/IEC 14496-15 for spatial information, ‘rrif’ from an amendment to ISO/IEC 14496-15 also for spatial information, ‘sspo’ (e.g. reference 865 in Figure 8) for ordering indication, ‘2dps’ (e.g. reference 860 in Figure 8) for spatial position information, or ‘2dsr’ (e.g. reference 855 in Figure 8) for 2D spatial relationship information.
  • the use of NALU mapping requires however properties like the ones described in relation to references 855 to 865 in Figure 8, further extended with a groupID parameter allowing the identification of a property. Indeed, NAL unit mapping refers to properties by their IDs (groupID) rather than by their index in a sample group description box.
  • the subsample mapping may use virtual sample grouping, using entry_idc in sample group description boxes, for example in sample group description box 820 in Figure 8 or in sample group description box 910 in Figure 9, without referring any more to any entry in a second sample group description, for example by referencing one of the predefined configurations (or tap patterns).
  • These pre-defined patterns may be defined once and for all by the standard as fixed values, like codec-independent code points, and be referred to in entry_idc parameters.
  • These pre-defined patterns may be described for example with the information contained in a MultiTapDescriptionBox: number of taps, number of chunks per tap, tap scan order, etc.
  • Figure 10 illustrates an embodiment in which the reordering instructions for data chunks are indicated at subsample level, within a subsample information box (‘subs’) of a sample table box (‘stbl’), for example within subsample information box 1010 of sample table box 1005 of movie box 1000 (it is to be noted that the ‘subs’ box could also be used in movie fragments, e.g. in a ‘moof’ box).
  • a subsample information box can be extended via codec_specific_parameters, for example via codec_specific_parameters 1015. While these parameters are defined for video codecs like AVC, HEVC, or VVC, no definition is available for uncompressed video.
  • subsamples for uncompressed video may be defined as follows: for the use of the subsample information box (8.7.7 of ISO/IEC 14496-12) in an uncompressed video stream, a subsample is defined as one or more contiguous byte ranges within a sample. A subsample is further defined on the basis of the value of flags of the subsample information box ‘subs’ as specified below. The presence of this box is optional. However, if it is present in a track containing uncompressed video data, it must comply with a predetermined semantic, for example the following one wherein the flag value specifies the type of subsample information given in this box:
  • 0: a subsample corresponds to a byte range in order;
  • 1: a subsample corresponds to a byte range stored out of order (with reordering information provided elsewhere, for example in a box in the sample entry (e.g. in multi tap description box 710) or via a sample grouping mechanism like ‘sspm’, ‘brmp’, or ‘msif’ according to the second embodiment); this indicates that subsamples do not follow the image order but rather a scan order, for example from the image sensor;
  • 2: a subsample corresponds to a byte range stored out of order with order information provided as indicated in codec_specific_parameters (e.g. codec_specific_parameters 1020), i.e. as a numbering order in the sample; this number varies from 1 to N, 1 corresponding for example to the data chunk for the top left corner of the image and N to the bottom right of the image;
  • 3: a subsample corresponds to a byte range stored out of order with order information provided as horizontal and vertical position offsets in pixels in the source image, for example using 16 bits of codec_specific_parameters 1015 for the horizontal offset and 16 other bits of this same codec_specific_parameters for the vertical offset;
  • 4: a subsample corresponds to a byte range stored out of order with order information provided as an index into a spatial layout, for example using 16 bits of codec_specific_parameters 1015 to refer to an index into a spatial layout described as a grid. This information allows a parser or reader to position the data chunk relative to other data chunks in the 2D referential of the reconstructed image. It also allows a parser or reader to retrieve data given a region in the image by determining a possible intersection between this subsample and the given region;
  • 5: a subsample corresponds to a byte range stored in order with information on its spatial position in the source image as horizontal and vertical position offsets in pixels, for example using 16 bits of codec_specific_parameters 1015 for the horizontal offset and 16 other bits of this same codec_specific_parameters for the vertical offset. As for value 4, this allows a parser or reader to retrieve data given a region in the image;
  • 6: a subsample corresponds to a byte range stored in order with information on its spatial position in the source image provided as an index into a spatial layout, for example using 16 bits of codec_specific_parameters 1015 to refer to an index into a spatial layout described as a grid. Such a spatial layout is described hereafter. As for values 4 and 5, this allows a parser or reader to retrieve data given a region in the image.
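  • As a purely illustrative, hedged sketch (the bit allocation below is an assumption, not a layout defined by this disclosure or by ISO/IEC 23001-17), the 32 bits of codec_specific_parameters could be interpreted as follows for the flag values listed above:
    // hypothetical interpretation of codec_specific_parameters per flags value
    if (flags == 2) {
        unsigned int(32) chunk_order;        // 1..N position of the data chunk in image order
    } else if (flags == 3 || flags == 5) {
        unsigned int(16) horizontal_offset;  // in pixels, from the image origin
        unsigned int(16) vertical_offset;    // in pixels, from the image origin
    } else if (flags == 4 || flags == 6) {
        unsigned int(16) layout_index;       // index into the spatial layout described as a grid
        bit(16) reserved;
    }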
  • a metadata structure defining this spatial layout is required.
  • This metadata structure may be a specific box as a sub-box of the ‘trak’ box (or of the ‘traf’ box if the spatial layout changes over time). It describes the positions and sizes of the different image parts or image partitions (e.g. image data 325-1, 325-2) composing an image from the source video (the full or complete image).
  • a spatial layout information box, called for example SpatialLayoutBox, may be defined as follows: Box Type: 'splt'
  • hor_offset and vert_offset parameters respectively indicate the horizontal and vertical position, for example in pixel units, of an image part or partition in the source video (the full or complete image); and partition_width and partition_height respectively indicate the width and height, for example in pixel units, of an image part or partition.
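  • Assembling the box header and the parameters just described, a hedged sketch of the SpatialLayoutBox could read as follows; the FullBox container, the partition_count field, and the bit widths are assumptions, and only hor_offset, vert_offset, partition_width, and partition_height come from the description above:
    aligned(8) class SpatialLayoutBox extends FullBox('splt', version = 0, flags = 0) {
        unsigned int(16) partition_count;       // number of image parts or partitions (assumed)
        for (i = 0; i < partition_count; i++) {
            unsigned int(16) hor_offset;        // horizontal position of the partition, e.g. in pixels
            unsigned int(16) vert_offset;       // vertical position of the partition, e.g. in pixels
            unsigned int(16) partition_width;   // width of the partition, e.g. in pixels
            unsigned int(16) partition_height;  // height of the partition, e.g. in pixels
        }
    }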
  • the spatial layout information box implicitly provides indices for image parts or partitions that can be used in subsample description or any metadata structure for spatial position indication based on a pre-defined spatial layout.
  • the spatial layout, especially when regular, may be described as a number of horizontal partitions and a number of vertical partitions; implicit indices and positions for the partitions would then follow a raster-scan order of the partitions.
  • Other ISOBMFF boxes providing the same kind of information (at least horizontal and vertical positions and the indices for the partitions) could be used as an alternative.
  • references 1020 and 1025 allow a parser to rearrange data stored in a media data box, for example in mdat 1030, to provide an image, for example image 1035.
  • The use of codec_specific_parameters has the advantage of not breaking the processing of the ‘subs’ box by legacy players.
  • An alternative to the use of this parameter could be to use a new version of the ‘subs’ box with fewer parameters (removing some existing parameters like discardable or sample_priority that may not be relevant for samples or subsamples from uncompressed video).
  • This new version of the box may be dedicated to subsamples that do not follow the image order but rather a scan order.
  • This new version of the box contains a new parameter indicating the order of each subsample in the reconstructed image.
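  • As a hedged illustration of this alternative (the class name, version number, and field widths below are assumptions, not definitions from the disclosure), such a reduced subsample information box could look like:
    aligned(8) class OutOfOrderSubSampleInformationBox extends FullBox('subs', version = 2, flags) {  // hypothetical new version
        unsigned int(32) entry_count;
        for (i = 0; i < entry_count; i++) {
            unsigned int(32) sample_delta;
            unsigned int(16) subsample_count;
            for (j = 0; j < subsample_count; j++) {
                unsigned int(32) subsample_size;
                unsigned int(16) subsample_order;  // position of the subsample in the reconstructed image
            }
        }
    }
  • Compared to the existing ‘subs’ box, the discardable and sample_priority fields are dropped and a subsample order parameter is added, in line with the two preceding items.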
  • Figure 11 is a schematic block diagram of a computing device 1100 for implementation of one or more embodiments of the disclosure.
  • the computing device 1100 may be a device such as a micro-computer, a workstation, or a light portable device.
  • the computing device 1100 comprises a communication bus 1102 connected to:
  • a central processing unit (CPU) 1104;
  • a random access memory (RAM) 1108 for storing the executable code of the method of embodiments of the disclosure as well as the registers adapted to record variables and parameters necessary for implementing the method for encapsulating, indexing, de-encapsulating, and/or accessing data, the memory capacity thereof being expandable by an optional RAM connected to an expansion port, for example;
  • a read only memory (ROM) 1106 for storing computer programs for implementing embodiments of the disclosure;
  • a network interface 1112 that is, in turn, typically connected to a communication network 1114 over which digital data to be processed are transmitted or received;
  • the network interface 1112 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 1104;
  • a user interface (UI) 1116 for receiving inputs from a user or to display information to a user;
  • a hard disk (HD) 1110, on which executable code and data may be stored;
  • an I/O module 1118 for receiving/sending data from/to external devices such as a video source or display.
  • the executable code may be stored either in read only memory 1106, on the hard disk 1110, or on a removable digital medium such as, for example, a disk.
  • the executable code of the programs can be received by means of a communication network, via the network interface 1112, in order to be stored in one of the storage means of the communication device 1100, such as the hard disk 1110, before being executed.
  • the central processing unit 1104 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the disclosure, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 1104 is capable of executing instructions from main RAM memory 1108 relating to a software application after those instructions have been loaded from the program ROM 1106 or the hard disk (HD) 1110, for example. Such a software application, when executed by the CPU 1104, causes the steps of the flowcharts shown in the previous figures to be performed.
  • the apparatus is a programmable apparatus which uses software to implement the method of the disclosure.
  • the method of the present disclosure may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).


Abstract

According to some embodiments of the disclosure, it is provided a method for encapsulating an image in a file compliant with the ISOBMFF standard, the method comprising: obtaining a sequence of data chunks, each data chunk including image data of a portion of the image; generating metadata providing a relationship between the sequence of data chunks and a predetermined scan of the portions of the image, encapsulating the data chunks of the sequence of data chunks and the generated metadata in the file.

Description

METHOD, DEVICE, AND COMPUTER PROGRAM FOR OPTIMIZING ENCAPSULATION OF IMAGES
FIELD OF THE DISCLOSURE
The present disclosure relates to encapsulation of images or sequences of images, in particular of uncompressed images or uncompressed sequences of images, in a standard and interoperable format, for example to store or transmit images acquired from high throughput image sensors.
BACKGROUND OF THE DISCLOSURE
Very high throughput image sensors are used in many applications, for example in medical or industrial devices. In such applications, it is necessary to acquire high-definition images at a high throughput rate. To that end, multi-tap sensors have been created to increase the frame rates of image sensors. According to the multi-tap structure, the image frames are split into two or more areas that are clocked out in parallel. Thus, the surface of a multi-tap sensor is divided into multiple tap areas (e.g. dual-tap, quad-tap, etc.) as illustrated in Figure 1. For the sake of illustration, a quad-tap sensor can output ultra-high definition frames (e.g. 7680 x 4320 pixels) at a frame rate near to 1,000 frames per second. Frame rates of 100,000 frames per second are foreseen. Generally, each tap area has its own electronic circuit, named the tap, for creating a signal and an individual output for each of the tap areas. The image data from the tap areas are shifted, amplified, and selected by the taps simultaneously over shorter distances, which enables faster frame rates. Figure 1 illustrates examples of multi-tap sensors.
As illustrated, each of the dual-tap sensors referenced 100-1 and 100-2 comprises two tap areas, referenced 105-11 and 105-12 and 105-21 and 105-22 respectively, that may be of a same size or not and that may be arranged according to different configurations. Likewise, each of the quad-tap sensors referenced 110-1 and 110-2 comprises four tap areas that may be of a same size or not and that may be arranged according to different configurations.
The data obtained from the different sensor areas must be reassembled into images. This reassembling may be done by what is generally called a frame grabber.
For portability issues, the acquired data should preferably be stored or transmitted according to an interoperable format so that application specific images may become available to any common client (e.g. smartphone or personal computer) without requiring specific software other than a standard media player.
The MPEG standard allows storage of compressed video sequences, based on a common basis format, the ISO Base Media File Format standardized by ISO as ISO/IEC 14496-12. Extensions of this standard, like ISO/IEC 14496-15, define codec- specific storage formats, based on NAL units (Network Abstraction Layer units). Video codec specifications, e.g. AVC (ISO/IEC 14496-10), HEVC (ISO/IEC 23008-2), and VVC (ISO/IEC 23090-3), define the NAL unit types and payloads. The NALU-based File Format (ISO/IEC 14496-15) defines the storage of these NAL units so that any compliant file format parser can build a bit-stream that is decodable by a video decoder conforming to, for example, AVC, HEVC, or VVC.
The MPEG group is now considering building a new standard (ISO/IEC 23001-17) to offer interoperability for the storage of uncompressed videos (that do not use NAL units). However, since ISOBMFF handles timing at sample level (decoding time or composition time is handled, for example, in ‘stts’ and ‘ctts’ boxes), ISOBMFF does not define metadata to handle timing (or order) information at a finer granularity than samples (i.e. image level), for example at subsample level (i.e. sub-image or image part level).
The present disclosure has been devised to address one or more of the foregoing concerns.
SUMMARY OF THE DISCLOSURE
The present disclosure has been devised to address one or more of the foregoing concerns. In this context, there is provided a solution for improving encapsulation of uncompressed images.
According to a first aspect of the disclosure there is provided a method for encapsulating an image in a file compliant with the ISOBMFF standard, the method comprising: obtaining a sequence of data chunks, each data chunk including image data of a portion of the image; generating metadata providing a relationship between the sequence of data chunks and a predetermined scan of the portions of the image, encapsulating the data chunks of the sequence of data chunks and the generated metadata in the file. Accordingly, the method of the disclosure makes it possible to output image parts in parallel, with no need to buffer the image parts before storage, for example for reordering purpose. This reduces needs for memory and for processing power in recording devices and/or in intermediate devices located between acquisition devices and transmission devices. Moreover, this provides low latency access to acquired images when they are remotely accessed, for example via HTTP requests by remote viewing devices such as smartphone, remote computers, and surveillance screens.
According to some embodiments, the data of each of the portions are obtained from a tap of a multi-tap image capturing apparatus, wherein the order of the sequence of data chunks depends on the number of taps of the multi-tap image capturing apparatus and on a scanning order of pixels of each of the portions.
According to some embodiments, the data chunks are encapsulated into samples according to the order of the sequence of data chunks.
According to some embodiments, the file comprises an indicator that the sequence of data chunks is ordered differently than the predetermined scan of the image.
According to some embodiments, the relationship between the sequence of data chunks and a predetermined scan of the portions of the image describes the encapsulation of all images encapsulated within the file.
According to some embodiments, the relationship between the sequence of data chunks and a predetermined scan of the portions of the image describes the encapsulation of some of the images encapsulated within the file.
According to some embodiments, the generated metadata are encapsulated within a multi tap description box associated with a sample description box describing encapsulation of the image. According to some embodiments, the generated metadata are encapsulated within a sample entry.
According to some embodiments, the relationship between the sequence of data chunks and a predetermined scan of the portions of the image is described using a sample group mechanism so that the relationship between the sequence of data chunks and a predetermined scan of the portions of the image describes each encapsulated image of a group of encapsulated images.
According to some embodiments, the file comprises a sample to group box defining at least one group of samples, the image belonging to the group of samples, and wherein the file further comprises at least one sample group description box associated with the sample to group box, the at least one sample group description box comprising the generated metadata.
According to some embodiments, a first relationship between a sequence of data chunks and a predetermined scan of portions of an image is associated with a first group of samples and a second relationship between a sequence of data chunks and a predetermined scan of portions of an image, different from the first relationship, is associated with a second group of samples.
According to some embodiments, the relationship between the sequence of data chunks and a predetermined scan of the portions of the image is described within a subsample information box using codec specific parameters.
According to some embodiments, the generated metadata make it possible to parse only a spatial portion of the encapsulated image.
According to some embodiments, the image is an uncompressed image.
According to a second aspect of the disclosure there is provided a method for parsing a file compliant with the ISOBMFF standard, the method comprising: obtaining, from the file, metadata providing a relationship between a sequence of data chunks and a predetermined scan of portions of an image, obtaining, from the file, a sequence of data chunks, each data chunk including image data of a portion of the image, ordering the obtained data chunks according to the obtained metadata, and generating the image from the ordered data chunks.
The second aspect of the disclosure has advantages similar to those mentioned above.
According to some embodiments, the metadata are obtained from a multi tap description box associated with a sample description box describing encapsulation of the image.
According to some embodiments, the metadata are obtained from a sample group mechanism, the relationship between a sequence of data chunks and a predetermined scan of portions of an image describing each encapsulated image of a group of encapsulated images.
According to some embodiments, the metadata are obtained from a subsample information box of a sample table box using codec specific parameters.
According to a third aspect of the disclosure there is provided a device for encapsulating an image within a file or a device for parsing an image encapsulated within a file, the device comprising a processing unit configured for carrying out each of the steps of the method described above.
The third aspect of the disclosure has advantages similar to those mentioned above. At least parts of the methods according to the disclosure may be computer implemented. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", "module" or "system". Furthermore, the present disclosure may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since the solution of the present disclosure can be implemented in software, the solution of the present disclosure can be embodied as computer readable code for provision to a programmable apparatus on any suitable carrier medium. A tangible carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid state memory device and the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
BRIEF DESCRIPTION OF THE DRAWINGS
Some embodiments of the disclosure will now be described, by way of example only, and with reference to the following drawings in which:
Figure 1 illustrates examples of multi-tap sensors;
Figure 2 illustrates different examples of scan orders of portions of an image given different tap configurations in image sensors;
Figure 3 illustrates an example of streaming image data from a server to a client according to some embodiments of the disclosure;
Figure 4 illustrates an example of steps for encapsulating sets of data according to some embodiments of the disclosure;
Figure 5 illustrates an example of steps for parsing received data according to some embodiments of the disclosure;
Figure 6 illustrates examples of encapsulation of image data obtained from a quad-tap sensor;
Figure 7 illustrates a first embodiment wherein the storage order or reordering instructions for the data chunks of a sample are described in a sample entry;
Figures 8 and 9 illustrate examples of data encapsulation wherein the reordering instructions or storage order of the data chunks for the samples are described using a sample group mechanism;
Figure 10 illustrates a particular embodiment in which the reordering instructions for data chunks are indicated at subsample level; and
Figure 11 schematically illustrates a processing device configured to implement at least one embodiment of the present disclosure.
DETAILED DESCRIPTION OF THE DISCLOSURE
According to some embodiments of the disclosure, image data obtained from a multi-tap sensor are encoded to be stored or transmitted as they are obtained from the sensor, in the read order, without reordering and without buffering needs. This avoids memory and CPU consumption between data acquisition and data storage or data transmission. In addition, this provides low latency access to acquired images when they are remotely accessed, for example via HTTP requests, by a viewing device. Reconstructing the images can be done by a parser based on reordering instructions added in the metadata part of the file. In addition, these metadata make it possible to access directly any spatial part of images. It is assumed that a multi-tap sensor may be made of a single device or a set of multiple devices, for example a set of two dual-tap devices forming a quad-tap sensor.
The inventors have observed that not only the number of taps as well as the order of accessing the data from these taps may vary from one sensor to another, but the way the pixels are scanned within each area corresponding to each tap may also vary.
Figure 2 illustrates different examples of scan orders of portions of an image given different tap configurations in image sensors.
For the sake of illustration, the configurations referenced 200 and 205 correspond to configurations of a dual-tap sensor using an interlaced mode. More precisely, according to configuration 200, the data of the even rows are obtained from a first tap of the sensor, from left to right and from top to bottom, and the data from the odd rows are obtained from a second tap of the sensor, in the same order (i.e. the left to right and top to bottom order). Similarly, according to configuration 205, the data of the even columns are obtained from a first tap of the sensor, from top to bottom and from left to right, and the data from the odd columns are obtained from a second tap of the sensor, in the same order (i.e. the top to bottom and left to right order).
Configurations referenced 210 and 215 are other examples of image scan configuration of a dual-tap sensor. According to configuration 210, the sensor area is split into two vertical parts. A first tap is used to obtain the data of the left part of the image while a second tap is used to obtain the data of the right part, both in a raster scan order, from left to right and from top to bottom. Similarly, according to configuration 215, the sensor area is split into two horizontal parts, a first tap is used to obtain the data of the upper part of the image while a second tap is used to obtain the data of the lower part, both in a raster scan order, from left to right and from top to bottom.
Naturally, other scan orders exist for dual-tap sensors. For example, each tap may obtain data in a reverse raster scan order (i.e. from bottom to top and right to left) or any other combination. For the sake of conciseness, they are not described or illustrated here. Still for the sake of illustration, the configurations referenced 220 to 235 correspond to some possible configurations of a quad-tap sensor. According to the illustrated examples, the sensor area is split into four parts, each part corresponding to a particular tap: two upper parts and two lower parts, comprising two left parts and two right parts. According to configuration 220, the data of each part are obtained in a raster scan order (i.e. from left to right and from top to bottom) from the corresponding tap. According to configuration 225, the data of each of the upper parts are obtained in a raster scan order (i.e. from left to right and from top to bottom) from the corresponding tap and the data of each of the lower parts are obtained in a reverse scan order compared to the raster scan order, i.e. from left to right and from bottom to top, from the corresponding tap.
According to configuration 230, the data of each part are obtained as follows:
- for the upper left part, the data are obtained from left to right and from top to bottom by a first tap;
- for the upper right part, the data are obtained from right to left and from top to bottom by a second tap;
- for the lower left part, the data are obtained from left to right and from bottom to top by a third tap; and
- for the lower right part, the data are obtained from right to left and from bottom to top by a fourth tap.
According to configuration 235, the data of each part are obtained as follows:
- for the upper left part, the data are obtained from right to left and from bottom to top by a first tap;
- for the upper right part, the data are obtained from left to right and from bottom to top by a second tap;
- for the lower left part, the data are obtained from right to left and from top to bottom by a third tap; and
- for the lower right part, the data are obtained from left to right and from top to bottom by a fourth tap.
Again, the sensor area can be split differently and other scan orders exist for quad-tap sensors. For the sake of conciseness, they are not described or illustrated here.
Figure 3 illustrates an example of streaming image data from a server to a client according to some embodiments of the disclosure.
As illustrated, a server 300 comprises an encapsulation module 305. The server 300 may be connected, via a network interface (not represented), to a communication network 310 to which is also connected, via a network interface (not represented), a client 320 comprising a de-encapsulation module 315.
Server 300 processes data, e.g. uncompressed video data, uncompressed sequence of images, and/or uncompressed images, for streaming or for storage. To that end, server 300 obtains or receives data comprising, for example, the recording of a scene by one or more cameras or image sensors, referred to as a source video. It may also obtain or receive video or image data from a single camera or image sensor but as different parts (or zones or partitions) of an image in parallel. The source video is received by the server as an original sequence of pictures, or images, or picture parts or image parts, for example image data 325-1 and 325-2. For example, an image may be received as two or more image parts in parallel, a first part corresponding to the left part of the image and a second part corresponding to the right part of the image. Data obtained or received by the server 300, corresponding to an image part, are called a data chunk. A data chunk may correspond to the whole image part or there may be several data chunks for an image part (for example one data chunk per pixel row of the image part). It consists in a set of bytes representing the values of the pixels of the image part. The combination of the image parts lead to a full image or complete image corresponding to an image (or frame or picture) of the source video.
The server may encapsulate the sequence of pictures or picture parts into a media file or media segments 330 without any compression, as they are received and as soon as they are received (i.e. without any delay ) using encapsulation module 305. According to some embodiments, the server may encode a sequence of pictures into media data (i.e. bit-stream) using one or more media encoders (e.g. a video encoder encoding either an image in a multi-camera system or a picture partition in a single camera system), not represented, and encapsulates the media data in one or more media files or media segments 330 using encapsulation module 305. The incoming video may contain multiple sequences of pictures (e.g. video data corresponding to image data 325-1 and video data corresponding to image data 325-2).
Encapsulation module 305 comprises at least one of a writer or a packager to encapsulate the media data (either uncompressed or compressed). In case of compressed video or images, the one or more media encoders may be implemented within encapsulation module 305 to encode received data or may be distinct from encapsulation module 305. Encoding is done as soon as the sequence of pictures or picture parts (or partitions) is obtained (live encoding).
Encapsulation is preferably done in a live mode, or with low delay or low latency, i.e. by producing media fragments or segments.
Client 320 is used for processing data received from communication network 310, for example for processing media file 330. After the received data have been de-encapsulated in de-encapsulation module 315 (also known as a parser), the de-encapsulated data (or parsed data), corresponding to uncompressed video, may be stored, displayed, or output. In the case according to which the de-encapsulated data (or parsed data) correspond to a media data bit-stream, it is decoded, forming, for example, video or image data that may be stored, displayed, or output. For compressed video, the media decoder may be implemented within de-encapsulation module 315 or it may be distinct from de-encapsulation module 315.
Client or server may be user devices but may also be, for example, network nodes processing the media files being transmitted or stored.
It is noted that media file 330 may be communicated to de-encapsulation module 315 in different ways. In particular, encapsulation module 305 may generate media file 330 with a media description (e.g. DASH MPD) and may communicate (or streams) it directly to de-encapsulation module 315 upon receiving a request from client 320. The media file 330 may also be downloaded by client 320 and stored within a storing module of client 320.
For the sake of illustration, media file 330 may encapsulate media data (e.g. uncompressed or encoded video, possibly with audio) into boxes according to ISO Base Media File Format (ISOBMFF, ISO/IEC 14496-12 and ISO/IEC 14496-15 standards for compressed video). In such a case, media file 330 may correspond to one or more media files (indicated by a FileTypeBox ‘ftyp’). According to ISOBMFF, media file 330 may include two kinds of boxes, one or more “media data box”, identified as ‘mdat’ or ‘imda’, containing the media data, and “metadata boxes” (e.g. ‘moov’ or ‘moof’ box) containing metadata defining placement and timing of the media data. The media data box(es) contain(s) all the data for image or image parts (for example image data 325-1, 325-2). According to the disclosure, these data are stored in the order they are obtained by server 300, for example from camera(s), sensor(s), or sensor taps. The metadata boxes are extended to describe the storage order of the images or image parts in the media data box(es).
According to some embodiments, the storage order is indicated as reordering instructions at sample level. According to other embodiments, the reordering instructions are provided at a finer level than the samples, e.g. at a subsample level. These embodiments have variants that, in addition to reordering instructions, also provide spatial information for images or image parts allowing spatial access into the recorded, stored, or encapsulated video which can be useful for extracting and viewing or storing regions of interest.
Encapsulation
Figure 4 illustrates an example of steps for encapsulating sets of data according to some embodiments of the disclosure. These steps may be carried out, for example, in encapsulation module 305 in Figure 3.
As illustrated, a first step (400) is directed to configuring the encapsulation module, for example encapsulation module 305 of server 300 in Figure 3, that encapsulates the received images or image parts, optionally with an encoding step but preferably as uncompressed data. The configuration step may consist in defining a number of cameras or sensors providing images or a number of image parts, in providing camera or sensor characteristics like a frame rate or a number of taps and their scan order (e.g. as described with reference to Figure 2) in case of image sensors. The configuration information may be hard coded in the server or may be user-specified through a graphical user interface. The configuration step may also be used to indicate whether the same configuration parameters should apply for processing the whole encapsulated media data (static configuration) or may change when processing the encapsulated media data (dynamic configuration). When the encapsulation module is configured, metadata structures of a media file (for example media file 330 in Figure 3) are created. If the media file is generated as media segments, an initialization segment is created. According to ISOBMFF encapsulation, it comprises creating a ‘moov’ box comprising a ‘trak’ box for the description of the video samples corresponding to recorded images or image parts. The ‘trak’ box contains boxes for the sample description, starting with a ‘stbl’ box and ‘stsd’ box providing one or more sample entries. Optionally, the encapsulation module inserts a new brand as a major brand or in a list of compatible brands in a ‘ftyp’ box, a ‘styp’ box, a ‘ttyp’, or a ‘etyp’ box. This new brand indicates that the media file contains samples that require reordering before being reconstructed and displayed, for example the brand ‘oofs’ (for “out-of-order storage”).
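For context, the brand signalling mentioned above relies on the FileTypeBox of ISOBMFF; an abridged rendering of its structure (reproduced from memory, so the exact normative text may differ) is:
    aligned(8) class FileTypeBox extends Box('ftyp') {
        unsigned int(32) major_brand;
        unsigned int(32) minor_version;
        unsigned int(32) compatible_brands[];  // list of brands, to the end of the box
    }
A writer following this step would thus list a brand such as ‘oofs’ in compatible_brands (or use it as major_brand) so that a parser knows, before reading any sample data, that data chunks may require reordering.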
Next, a test is carried out to determine whether the configuration is a static configuration or a dynamic configuration (step 402).
If the configuration is defined as being static over time, the encapsulation module provides items of information about the reception or acquisition order for images or image parts to be encapsulated as a sample, i.e. having a same timestamp (step 404). Since the configuration is static, these items of information are preferably provided once for all the samples, at sample level, for example in a sample entry as described hereafter (first embodiment).
Next, data chunks corresponding to images or image parts are received (step 406) and appended in the media data box of the media file (step 408). As illustrated with step 410, these steps of receiving and storing data chunks are repeated until the last data chunk for a given time is received and stored. Then, the sample description is completed (not depicted), for example by setting a sample time, a sample size, and a sample offset providing the location of the sample within the media data box, and a description for a new sample may begin.
It is to be noted that the media file, depending on the configuration, may consist in a fragmented file. In such a case and if the configuration is static, the items of information directed to the data order, that are provided at step 404, preferably apply to the samples in all the fragments. The sample sizes, durations, or offsets could be described within a ‘trun’ box providing a description of a run of samples for a media fragment. The order of the received data chunks composing a sample and stored in a ‘mdat’ box follows the items of information stored in the sample description at step 404.
If the configuration is not defined as being static over time (step 402), another test is carried out to determine whether or not there exist predetermined configurations (step 412), i.e. whether or not the configuration of the encapsulation module may change from one predefined configuration to another predefined configuration, the list of predefined configurations being determined in the configuration step (step 400).
If the configuration is determined to be dynamic, changing from one predefined configuration to another predefined configuration, the encapsulation module obtains the index of the predefined configuration order for the current sample (step 414). Next, the current sample is associated with the predefined configuration order corresponding to the obtained index (step 416), as explained according to the second embodiment, at sample group level. Next, data chunks corresponding to images or image parts of the current sample are received (step 418) and appended in the media data box of the media file (step 420). These steps of receiving and storing data chunks are repeated until the last data chunk for the current sample is received and stored or until the end of data reception. As illustrated, when all the data for the current sample are received (i.e. if the tests 422 and 424 are true), the encapsulation module obtains an index of the predefined configuration order to be used for the next sample (step 414) and iterates on steps 416 to 424 until the end of data reception. When data for a given sample are not fully received (i.e. if test 424 is false), the encapsulation module keeps on receiving and storing data chunks in the media data box. It is to be noted that when predefined configurations are available, the media file may also be fragmented. Indeed, associating samples with predefined configurations being handled by sample grouping mechanism, this is also available for fragmented files, by design of the ISOBMFF.
If the configuration is determined to be dynamic but does not change from one predefined configuration to another predefined configuration (i.e. steps 402 and 412 are false), that is to say when the configuration may change on a sample basis, for example when the reception order and/or the number of images or image parts are not known in advance, the encapsulation module cannot indicate storage order at sample level. The data chunks are then received and stored in media data boxes as they are received (steps 426 and 428), each data chunk being described in a metadata structure with ordering information (step 430), as described according to the third embodiment (relating to storage indication at subsample level). It is to be noted that in some cases, the data chunk may come with header information indicating the position of the data chunk in the source video (or full or complete image) or address offset into the full or complete image. This is for example the case for some industrial cameras using the GigE Vision Streaming Protocol. This header information can be used to obtain ordering information for step 430. When present, header information is removed before storage of the actual data chunk. This is done until no more data are received (test 432). When all the data of a given sample are received (not depicted), the sample description is completed (e.g. the sample time, the sample size, and the sample offset are set) and a description for a new sample begins. As illustrated, the encapsulation process ends by terminating the media file, possibly with computation of indexes (e.g. indexes of the ‘sidx’, ‘ssix’, or ‘mfra’ are set) for the generated media file or media segment files (step 434).
Parsing
Figure 5 illustrates an example of steps for parsing received data according to some embodiments of the disclosure.
As illustrated, a first step is directed to receiving a media file or media segment files (step 500), generated according to some embodiments of the disclosure, for example according to the steps illustrated in Figure 4. Such a reception step may comprise a downloading step, a copying step, or a streaming or progressive transmission process. Next, the reader or parser is configured (step 502). For the sake of illustration, the configuration step may comprise reading the ‘moov’ box of the received media file (or initialization segment when the media file consists in one or more media segments). This configuration step allows the reader or parser to allocate memory to store reconstructed images. At the same time, the parser may determine from a specific brand whether samples need reordering or not. If no such brand is present, for example as a major brand or in a compatible brand in the ‘ftyp’, ‘styp’, ‘ttyp’, or ‘etyp’, this information may be obtained from the sample description (step 504).
Still at the same step (or at another moment), the parser can get information on the kind of samples and their organization (step 504). In particular, the parser, while inspecting the sample entry, may find a box (see e.g. the first embodiment) indicating a scan order for the samples of the video track.
From these items of information, the parser may determine whether a reordering process is to be applied (step 506). If no reordering process is needed, the received data are parsed on a standard basis. On the contrary, if a reordering process is to be applied, another test is carried out to determine whether the configuration is static or not (step 508). This may be performed by determining whether or not a specific box indicating a static configuration (as described herein below) is present. As set forth above, the configuration may comprise a number of cameras or sensors providing images or a number of image parts, camera or sensor characteristics like a frame rate or a number of taps and their scan order (e.g. as described with reference to Figure 2) in case of image sensors.
If it is determined that the configuration is static, the parser reads the sample ordering description in the corresponding box (step 510) so as to determine the reordering process to apply for the data chunks of a given sample (step 512). The reading step is performed here on a subsample (data chunk) basis, the ordering of each subsample (data chunk) within a sample being determined according to the scan pattern defined in the sample ordering description (step 510). The number of subsamples and their size can be read from the ‘subs’ box (e.g. the subsample_count parameter). Once the data chunks of the current sample are read (step 512) and reordered (step 514), the sample is provided to the video decoder or to the reader (step 516) depending on whether the video or images are compressed or not. Next, the parser iterates on the next sample until no more sample can be read (step 518).
If no box indicating a static configuration for the sample reordering is found in the sample entry (i.e. if test 508 is false), the parser further inspects the sample description to determine whether some sample groups provide indication for sample reordering (step 520). This may be performed by looking for a specific grouping type dedicated to “sample data reordering” (or data chunks reordering or scan order indication), for example represented by the four-character code ‘sdro’ or ‘msif’. It is to be noted here that the four-character code (or 4cc) is just an example, any other four-character code, not conflicting with existing and registered ones, may be used.
If the media file contains predefined configurations for data chunk reordering, the sample group index is read at step 522, providing an entry in a sample group description with the same grouping_type ‘sdro’. This sample group entry provides the description for the data chunks reordering to apply in step 526 once the data chunks have been read (step 524). The parser may rely on the sample size to detect the end of a sample or on the number of subsamples (data chunks) in the ‘subs’ box. When data chunks are reordered (step 526), the reconstructed image is provided to a video player (step 528), either a decoder (in case of compressed video or images) or a video renderer (in case of uncompressed video or images). This is done until no more sample is to be read (step 530).
If the media file does not contain any predefined configuration for data chunks reordering (i.e. if test 520 is false), the parser reads data chunks (step 532) one after another and data chunk reordering items of information (step 534), according to the third embodiment described herein below, for example in subsample description. The parser iterates on data chunks composing the current sample (step 536) or while data corresponding to the sample size have not been read (i.e. another criterion for step 536). Once all the data chunks for a sample have been read, they are reordered (step 538) to provide the reconstructed image to the video player (step 540). The parser iterates on the samples until no data remain to read (step 542).
Example of data encapsulation according to some embodiments of the disclosure
Figure 6 illustrates examples of encapsulation of image data obtained from a quad-tap sensor.
For the sake of illustration, the image data of the encapsulated media data 600 are obtained according to the scan order described in reference to configuration 230 in Figure 2. Each tap, in turn, sends image data chunks to a server or a video recorder, for example server 300 in Figure 3. The data chunks are stored in an interleaving that applies to all samples. For example, data chunk 1 is received from the first tap, next data chunk 2 is received from the second tap, next data chunk 3 is received from the third tap, next data chunk 4 is received from the fourth tap, next data chunk 5 is received from the first tap, next data chunk 6 is received from the second tap, and so on. This repetition can be exploited to describe the sample reordering instructions (from the parser point of view), or the image scan order (from the image acquisition point of view), once for all samples of a video sequence. This is explained in more detail hereafter by reference to Figures 7 to 9. It is to be noted that this could apply to any configuration of sensor taps illustrated in Figure 2. This configuration is an input to step 400 in Figure 4, to set up an encapsulation module, for example encapsulation module 305 in Figure 3.
Encapsulated media data 605, 610, and 615 illustrate other storage order examples according to which the reception order cannot be determined in advance. For example, regarding encapsulated media data 605, from one sample to another, the reception order from the different taps may vary. As an example, data chunks 7 and 8 in the second sample come before data chunks 5 and 6 in the same sample while data chunks 7 and 8 in the first sample come after data chunks 5 and 6 in the same sample. For such cases, if the samples follow one or another storage order, sample grouping may be used to indicate the sample reordering instructions (from the parser point of view) or the image scan order (from the image acquisition point of view). Regarding encapsulated media data 610, each sensor tap may send data chunks with a varying order. For example, the tap corresponding to the bottom right sent all its data (data chunks 4 and 8) before the other taps sent all their data. Regarding encapsulated media data 615, the data chunk length may vary but also the number of data chunks per sample, as can be seen in the second sample.
To handle such cases, a subsample-based approach may be used, its granularity depending on the number of varying parameters (data chunk order, data chunk length, number of data chunks per sample, etc.). On the contrary, if these parameters do not vary, a description at sample level may be more relevant. If these parameters vary, but over a limited number of configurations, a description at sample level but as a sample group may prove to be relevant.
First embodiment (storage order indication at sample level: use of sample entries)
Figure 7 illustrates a first embodiment wherein the storage order or the reordering instructions for the data chunks of a sample are described in a sample entry.
As illustrated, the sample description box (‘stsd’), referenced 700, contains sample entry description. For example, an uncompressedVideoSampleEntry related to uncompressed video, referenced 705, provides a configuration box plus a multi tap description box (MultiTapDescriptionBox) referenced 710. Multi tap description box 710 contains a description of the scan order of the image obtained from the image sensor(s) that is(are) used. The parameters of the multi tap description box make it possible to describe various configurations of multi tap sensors, in particular the ones described by reference to Figure 2.
For the sake of illustration, multi tap description box 710 may contain a first parameter denoted tap_number providing the number of taps. Still for the sake of illustration, its value may be set to zero to indicate that the sensor is a dual-tap sensor and to one to indicate that the sensor is a quad-tap sensor. Accordingly, it may then be coded on 1 bit. If a greater number of taps needs to be supported, additional bits can be allocated to this parameter.
Multi tap description box 710 may also contain a second parameter, for example tap_chunk_num, to indicate a number of bursts or of chunks corresponding to the number of reads or emissions per tap for an image. For example, considering a dual-tap sensor, one part of the image may be read and sent as a given number of data bursts or chunks. According to encapsulated media data 600 in Figure 6, the data of each quadrant are read and sent as two data bursts. It is noted that in the cases where all the data obtained from each tap are read and sent in a single burst or chunk, this parameter may be omitted (assuming one data chunk per tap when not present). When the number of bursts or chunks differs from one tap to another, the tap_chunk_num can be provided for each tap, for example in a loop on the number of taps (not represented in multi tap description box 710). During encapsulation, the tap_chunk_num information may be used with the tap_number information to obtain the number of subsamples within a sample. A subsample information box is then added in the description, each subsample providing the byte range (through the subsample_size parameter) for a data chunk.
Multi tap description box 710 may also contain a third parameter, for example tap_interlaced, to indicate, in case of dual-tap sensors, whether the scan of the image is interlaced or not (like in configurations 200 and 205 in Figure 2) and a fourth parameter, for example tap_mode, to indicate the orientation of the tap border, for example horizontal as illustrated with reference to sensor 100-2 in Figure 1 or vertical as illustrated with reference to sensor 100-1 in Figure 1.
For some configurations like 110-2 in Figure 1 , the multi tap description box 710 may also contain, for each tap, a tap_offset providing the position of the top left corner of a tap and a tap size (e.g. in pixels), in case they are not equally sized. By default, the order of tap declaration in the multi tap description box 710 is assumed to be the reception order for data chunks read by each tap. It then corresponds to a subsample index in the ‘subs’ box.
According to some other embodiments (not represented in multi tap description box 710), multi tap description box 710 may contain an additional parameter to indicate the row scan order, either top to bottom or bottom to top. By default (when this parameter is not present), the row scan order may be set to top to bottom. Likewise, multi tap description box 710 may contain an additional parameter to indicate the column scan order, either left to right or right to left. Again, by default (when this parameter is not present), the column scan order may be set left to right. These two optional parameters may be defined for all the taps if they share the same scan order (like configurations 210, 220, and 215 in Figure 2) or per tap when the scan order differs from one tap to another (like configurations 225, 230, 235).
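Gathering the parameters described above, a hedged sketch of the MultiTapDescriptionBox is given below; the ‘mtdb’ box code, the FullBox container, the bit widths, and the exact layout of the optional per-tap fields are assumptions for illustration, while only the parameter names and their meaning come from the preceding paragraphs:
    aligned(8) class MultiTapDescriptionBox extends FullBox('mtdb', version = 0, flags = 0) {
        unsigned int(1) tap_number;        // 0: dual-tap sensor, 1: quad-tap sensor
        unsigned int(1) tap_interlaced;    // dual-tap sensors: interlaced scan or not
        unsigned int(1) tap_mode;          // orientation of the tap border (e.g. 0: vertical, 1: horizontal)
        bit(5) reserved;
        unsigned int(8) tap_chunk_num;     // number of data chunks (bursts) per tap; may be signalled per tap instead
        // optional per-tap fields, when tap areas are not equally sized or scan orders differ (assumed):
        // unsigned int(16) tap_offset_x;  unsigned int(16) tap_offset_y;
        // unsigned int(16) tap_width;     unsigned int(16) tap_height;
        // unsigned int(1)  row_scan_order;     // 0: top to bottom, 1: bottom to top
        // unsigned int(1)  column_scan_order;  // 0: left to right, 1: right to left
    }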
In a variant, only a limited set of tap patterns is defined for interoperability purposes and the multi tap description box simply consists in providing a reference into this predefined list of tap patterns, for example as an 8-bit index. The limited set of tap patterns may contain some of the most common configurations, for example the configurations illustrated in Figure 2, and be defined as Codec Independent Code Points (to be used as reference by encapsulation modules and parsers).
While the MultiTapDescriptionBox has been described in relation to a sample entry for uncompressed video (705), a multi tap description box like multi tap description box 710 can be used as one optional box in sample entries defined for video codecs like AVC, HEVC, or VVC.
Second embodiment (storage order indication at sample level: use of sample groups)
Figures 8 and 9 illustrate a second embodiment where the scan order may vary from one sample to another, the reordering instructions or storage order of the data chunks for a given sample being described using a sample group mechanism. This description may be part of sample description (‘stbl’ box or ‘traf’ for fragmented files).
Figure 8 illustrates a first example of data encapsulation wherein the reordering instructions or storage order of the data chunks for the samples are described using a sample group mechanism.
As illustrated, a first sample to group box referenced 800 is created to group samples having a same scan order (e.g. a same number of data bursts, the data bursts being always received in the same order, etc.) like samples 805-1 to 805-3 in encapsulated media data 810. Sample to group box 800 uses a new grouping type called for example ‘sspm’ (for “sub-sample mapping”) referenced 815. It is to be noted here that the four-character code (or 4cc) is just an example, any other four-character code, not conflicting with existing and registered ones, may be used. According to sample to group box 800, the description or properties associated with each group of samples declared in sample to group box 800 are provided in a sample group description box having the same grouping_type ‘sspm’, like sample group description box 820.
In the illustrated example, only one sample group is indicated in sample to group box 800 (i.e. entry_count=1). This means that all the samples of the considered track, in particular samples 805-1 to 805-3, use the same data chunk order, as depicted in encapsulated media data 810.
The associated sample group description box with grouping type ‘sspm’ (e.g. sample group description box 820) provides subsample map entries (e.g. entries 825). The associated sample group description box 820 may alternatively provide byte-range map entries (e.g. entries 825). In such a case, the grouping type is set to another four-character code (or 4cc) like ‘brmp’ for byte-range map entries.
A subsample map entry may be used to assign an identifier, called entry_idc, to each subsample (i.e. data chunk) within the samples of a same group of samples. It relies on a subsample information box (‘subs’).
A subsample map entry may be defined as a specific VisualSampleGroupEntry as follows:
class SubSampleMapEntry() extends VisualSampleGroupEntry ('sspm') {
    bit(7) reserved = 0;
    unsigned int(1) rle;
    unsigned int(16) entry_count;
    for (i=1; i<= entry_count; i++) {
        if (rle) {
            unsigned int(16) subs_start_number;
        }
        unsigned int(16) entry_idc;
    }
}
with the following semantics: rle indicates whether run-length encoding is used to assign entry_idc to subsamples (e.g. value 1) or not (e.g. value 0); entry_count specifies the number of entries in the map. According to some embodiments, when rle indicates that run-length encoding is used to assign entry_idc to subsamples, entry_count corresponds to the number of runs where consecutive subsamples are associated with the same group. When rle indicates that run-length encoding is not used to assign entry_idc to subsamples, entry_count represents the total number of subsamples. The total number of subsamples should not be greater than the subsample_count in the ‘subs’ box. subs_start_number is the 1-based subsample index (from the ‘subs’ box) in the sample of the first subsample in the current run associated with entry_idc, and entry_idc specifies the index of a visual sample group entry (e.g. the indices for group entries 845 and 850) in a sample group description box (e.g. in sample group description box 835) with grouping_type equal to the grouping_type_parameter of the SampleToGroupBox of type 'sspm'. The index starts at 1, the value entry_idc = 0 being reserved, for example, to let some subsamples not be mapped to any visual sample group entry. In a variant called virtual subsample ordering, the entry_idc directly indicates the ordering number of a subsample or of a run of subsamples. According to this variant, a sample to group box with ‘sspm’ grouping type shall have its grouping_type_parameter set to ‘vsso’, indicating virtual subsample ordering. Virtual here indicates that no sample group description like sample group description box 835 is present, the entry_idc directly providing the property of the subsample or run of subsamples (in the given example, the order within the sample).
A byte-range map entry may be used to assign an identifier, also called entry_idc, to byte ranges (i.e. data chunks) within the samples of a same group of samples. A byte-range map entry does not require a subsample information box (‘subs’) to be present. Moreover, it is relevant for uncompressed data where the number of bytes per pixel may be constant from one sample to another (allowing the mapping to apply to a group of samples). A byte-range map entry may be defined as a specific VisualSampleGroupEntry as follows:

class ByteRangeMapEntry() extends VisualSampleGroupEntry('brmp') {
    unsigned int(16) entry_count;
    for (i=1; i <= entry_count; i++) {
        unsigned int(32) byte_range_size;
        unsigned int(16) entry_idc;
    }
}

with the following semantics:
entry_count specifies the number of entries in the map;
byte_range_size specifies the size of the byte range; and
entry_idc specifies the index of a visual sample group entry (e.g. the indices for group entries 845 and 850) in a sample group description box (e.g. in sample group description box 835) with grouping_type equal to the grouping_type_parameter of the SampleToGroupBox of type 'brmp'. The index starts at 1, the value entry_idc = 0 being reserved, for example, to leave some byte ranges not mapped to any visual sample group entry.
In a variant called virtual byte-range ordering, entry_idc directly indicates the ordering number of a byte range. According to this variant, a sample to group box with ‘brmp’ grouping type has its grouping_type_parameter set to ‘vbro’, indicating virtual byte-range ordering. “Virtual” here indicates that no sample group description like sample group description box 835 is present, entry_idc directly providing the property of the byte range (e.g. the order in the image).
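As a minimal illustration (Python; the function and variable names are hypothetical and not taken from any standard API), a reader could use the byte_range_size values to split a sample's payload into its data chunks before applying the mapping:

def split_sample_into_byte_ranges(sample_bytes, byte_range_sizes):
    """Split the raw bytes of a sample into consecutive byte ranges (data chunks)."""
    chunks = []
    offset = 0
    for size in byte_range_sizes:
        chunks.append(sample_bytes[offset:offset + size])
        offset += size
    if offset != len(sample_bytes):
        raise ValueError("byte range sizes do not cover the whole sample")
    return chunks

# Example: a 12-byte sample split into three 4-byte chunks.
print(split_sample_into_byte_ranges(bytes(range(12)), [4, 4, 4]))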
The entries 825 (either subsample map entries or byte-range map entries, depending on the grouping_type 815 in use) are mapped into entries (group entries 845 and 850) in a second sample group description box (sample group description box 835). The mapping of entries 825 (subsamples or byte ranges, depending on grouping_type 815) is done by providing an entry_idc value (for example entry_idc 830) for one or more subsamples or byte ranges. This entry_idc value indicates an entry (for example group entry 845 or 850) in the sample group description box (e.g. sample group description box 835), numbering starting from value 1, the value 0 being reserved. For example, subsamples or data chunks 3 and 4 are mapped into the first entry referenced 845 of sample group description box 835 and subsamples or data chunks 1, 2, and 5 are mapped into the second entry referenced 850. When multiple data chunks are mapped into a same entry, their relative order is their declaration order in the sample group description box 820. For example, in Figure 8, the sample group description box 820 indicates that data chunks 3 and 4 correspond, in this order, to an image part described by the first ‘2dsr’ entry in the sample group description box 835. Likewise, data chunks 1, 2, and 5 compose, in this order, the second spatial part of the full image as described in the second entry of the sample group description box 835. The entries 845 and 850 provide properties applying to the subsamples with which they are associated.
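For illustration only, the following Python sketch (hypothetical data structures; not part of the ISOBMFF syntax) resolves this two-level mapping for the Figure 8 example: each data chunk is associated with the group entry designated by its entry_idc, and chunks mapped to the same entry keep their declaration order:

def group_chunks_by_entry(chunk_entry_idcs):
    """Map each group entry index to the ordered list of 1-based chunk numbers it covers.

    chunk_entry_idcs[i] is the entry_idc of data chunk i+1 (0 meaning "not mapped").
    Insertion order is preserved (Python 3.7+ dict), so declaration order is kept.
    """
    groups = {}
    for chunk_number, idc in enumerate(chunk_entry_idcs, start=1):
        if idc == 0:
            continue  # reserved value: chunk not mapped to any visual sample group entry
        groups.setdefault(idc, []).append(chunk_number)
    return groups

# Figure 8 example: chunks 3 and 4 -> first entry (845); chunks 1, 2 and 5 -> second entry (850).
print(group_chunks_by_entry([2, 2, 1, 1, 2]))   # {2: [1, 2, 5], 1: [3, 4]}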
The kind of properties depends on the grouping type (e.g. grouping type 840) in the sample group description box (e.g. sample group description box 835). As illustrated in Figure 8, the grouping type may be ‘2dsr’, providing 2D spatial relationship information like 2D spatial relationship information 855, or ‘2dps’, for 2D spatial position information like 2D spatial position information 860 (as x and y positions or as an address_offset in the full image or complete image). The constraint for this grouping_type is that it is also declared in the grouping_type_parameter of the sample to group box (e.g. sample to group box 800). This sample group description may then contain more or fewer items of information, like an ordering number for the subsample (‘sspo’, referenced 865), a spatial position (‘2dps’, referenced 860) within the image, or a spatial position and a size (‘2dsr’, referenced 855) within the image. The two latter allow spatial access to data chunks corresponding to a given region. Moreover, they provide absolute positioning in the image, while ‘sspo’ provides relative ordering between subsamples or byte ranges. When ‘sspo’ is used in the sample group description box 835, there may be as many entries in sample group description 820 as in sample group description 835, because each data chunk may have one ordering number (865). However, consecutive data chunks stored in a media data box (‘mdat’) that do not need reordering may be mapped into a same entry in sample group description 835 with grouping type ‘sspo’. In such a case, there may be fewer entries in sample group description box 820 than in the sample group description box 835.

Figure 9 illustrates a second example of data encapsulation wherein the reordering instructions or storage order of the data chunks for the samples are described using a sample group mechanism, enabling the use of different scan orders from one sample to another. Figure 9 may be considered as another example of Figure 8, in which the samples do not necessarily follow the same subsample or data chunk order from one sample to another. As illustrated within sample to group box 900, this leads to different sample groups in the sample to group box (entry_count=2), where there was only one in Figure 8.
Again, according to this embodiment, each group of samples is mapped into one entry in a first sample group description box, referenced here 910, containing subsample map entries or byte-range map entries (as explained in reference to Figure 8). Each entry in this sample group description box is then mapped into a set of properties in a second sample group description box referenced 920. The type of properties in this second sample group description box depends on the grouping type referenced 925, this grouping_type being the same as the one in the grouping_type_parameter in sample to group box 900. Like the set of properties described in reference to Figure 8, the set of properties in sample group description box 920 may be of different kinds, like an ordering property (reference 865 in Figure 8), a spatial position property (reference 860 in Figure 8), or spatial relationship information (reference 855 in Figure 8). This kind is indicated by the grouping_type 925. Likewise, sample group description box 920 may be omitted when using virtual subsample ordering or virtual byte-range ordering, depending on the grouping_type_parameter value set in the sample to group box 900 (indicating an actual sample group description box of a given grouping_type or a virtual sample grouping type, like for example ‘vsso’ or ‘vbro’ as explained in reference to Figure 8).

In a variant, the sample groups defined in a sample to group box, for example in sample to group box 800 in Figure 8 (respectively in sample to group box 900 in Figure 9), are directly mapped into sample group entries describing a pre-defined scan order within a sample, for example into group entries 845 and 850 (respectively into group entries 930 and 935 in Figure 9) providing information like that indicated in the MultiTapDescriptionBox. This avoids intermediate mapping of the subsamples or byte ranges. It allows handling varying scanning configurations from one sample to another, but only for a pre-defined or limited set of configurations. Each scanning configuration may be described as a specific visual sample group entry providing information for samples resulting from multiple scan areas or orders (e.g. samples acquired from multi-tap sensors). For example, a MultiScanDescriptionGroupEntry sample group entry may be defined as follows:

aligned(8) class MultiScanDescriptionGroupEntry extends VisualSampleGroupEntry('msif') {
    unsigned int(1) tap_number;        // 0 => 2 taps; 1 => 4 taps
    unsigned int(7) tap_chunk_num;
    if (tap_number == 0) {             // 2 taps
        unsigned int(1) tap_interlaced;
        unsigned int(1) tap_mode;      // 0 => horizontal; 1 => vertical
    } else {                           // 4 taps
        for (i=1; i<=4; i++) {
            unsigned int(1) tap_row_scan;   // from top or from bottom
            unsigned int(1) tap_col_scan;   // from left or from right
        }
    }
}
The parameters of this sample group entry have the same meanings as those of the multi-tap description box 710. This sample group entry may alternatively contain variants of the multi-tap description box 710. This requires the presence of a ‘subs’ box to indicate the different data chunks (as sub-samples) composing a sample. The value of entry_count in the ‘subs’ box should be greater than or equal to the number of sample group entries of type ‘msif’. The ‘subs’ box may also have a flags value indicating that the subsamples are not stored in the image order but in a scan order.
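The following Python sketch is purely illustrative and is not part of the described format: the quadrant assignment and the flip semantics of tap_row_scan / tap_col_scan are assumptions made only to show how a reader might interpret a 4-tap configuration described by such an entry:

def quadrant_for_tap(tap_index, image_width, image_height):
    """Return (x, y, w, h) of the image quadrant assumed to be read by tap 1..4.

    Assumption (not mandated by the description above): taps are numbered in
    raster order over the four quadrants of the full image.
    """
    half_w, half_h = image_width // 2, image_height // 2
    col = (tap_index - 1) % 2
    row = (tap_index - 1) // 2
    return (col * half_w, row * half_h, half_w, half_h)

def needs_flip(tap_row_scan, tap_col_scan):
    """Whether the chunk data from this tap must be flipped vertically / horizontally.

    Assumption: tap_row_scan == 1 means scanning from the bottom and
    tap_col_scan == 1 means scanning from the right, so stored lines/pixels are reversed.
    """
    return bool(tap_row_scan), bool(tap_col_scan)

# Example: tap 4 of a 1920x1080 image would cover the bottom-right quadrant.
print(quadrant_for_tap(4, 1920, 1080))   # (960, 540, 960, 540)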
When applied to compressed videos, the sample group description boxes, for example sample group description boxes 820 in Figure 8 and 910 in Figure 9, may be replaced by a NALU mapping. The grouping_type of the sample to group box and of the sample group description box, for example of sample to group box 800 and sample group description box 820 in Figure 8 or sample to group box 900 and sample group description box 910 in Figure 9, would then be set to ‘nalm’ instead of ‘sspm’ or ‘brmp’. The grouping_type_parameter of the sample to group box, for example of sample to group box 800 in Figure 8 or sample to group box 900 in Figure 9, may take, for example, a value among ‘trif’ from ISO/IEC 14496-15 for spatial information, ‘rrif’ from an amendment to ISO/IEC 14496-15 also for spatial information, ‘sspo’ (e.g. reference 865 in Figure 8) for ordering indication, ‘2dps’ (e.g. reference 860 in Figure 8) for spatial position information, or ‘2dsr’ (e.g. reference 855 in Figure 8) for 2D spatial relationship information. The use of NALU mapping however requires properties like those described in relation to references 855 to 865 in Figure 8, further extended with a groupID parameter allowing the identification of a property. Indeed, NAL unit mapping refers to properties by their IDs (groupID) rather than by their index in a sample group description box.
In another variant, when a limited set of tap patterns (or scan orders) is predefined and used, the subsample mapping may use virtual sample grouping, using entry_idc in sample group description boxes, for example in sample group description box 820 in Figure 8 or in sample group description box 910 in Figure 9, without referring to any entry in a second sample group description, for example by referencing one of the predefined configurations (or tap patterns). These pre-defined patterns may be defined once and for all by the standard, as fixed values like codec-independent code points, and be referred to in entry_idc parameters. These pre-defined patterns may be described, for example, with the information contained in a MultiTapDescriptionBox: number of taps, number of chunks per tap, tap scan order, etc.
An alternative to the previous embodiments (storage order indication at sample level: use of sample entries) consists in always using a sample grouping approach and marking the sample group as static when the scan order does not vary from one sample to another. Unifying the description may make implementation easier: rather than creating different instructions for one case or another, the same instructions for writing or parsing sample groups and sample group descriptions are used.
Third embodiment (storage order indication at subsample level)

Figure 10 illustrates an embodiment in which the reordering instructions for data chunks are indicated at subsample level, within a subsample information box (‘subs’) of a sample table box (‘stbl’), for example within subsample information box 1010 of sample table box 1005 of movie box 1000 (it is to be noted that the ‘subs’ box could also be used in movie fragments, e.g. in a ‘moof’ box). It is recalled here that a subsample information box can be extended via codec_specific_parameters, for example via codec_specific_parameters 1015. While these parameters are defined for video codecs like AVC, HEVC, or VVC, no definition is available for uncompressed video.
According to some embodiments, subsamples for uncompressed video may be defined as follows: for the use of the subsample information box (8.7.7 of ISO/IEC 14496-12) in an uncompressed video stream, a subsample is defined as one or more contiguous byte ranges within a sample. A subsample is further defined on the basis of the value of flags of the subsample information box ‘subs’ as specified below. The presence of this box is optional. However, if it is present in a track containing uncompressed video data, it must comply with a predetermined semantic, for example the following one wherein the flag value specifies the type of subsample information given in this box:
0: a subsample corresponds to a byte range in order;
1: a subsample corresponds to a byte range stored out of order (with reordering information provided elsewhere, for example in a box in the sample entry (e.g. in multi-tap description box 710) or via a sample grouping mechanism like ‘sspm’, ‘brmp’, or ‘msif’ according to the second embodiment); this indicates that subsamples do not follow image order but rather a scan order, for example from an image sensor;
2: a subsample corresponds to a byte range stored out of order with order information provided as indicated in codec_specific_parameters (e.g. codec_specific_parameters 1020), i.e. as a numbering order in the sample. Considering N subsamples in a sample, this number varies from 1 to N, 1 corresponding for example to the data chunk for the top-left corner of the image and N to the data chunk for the bottom-right corner of the image;
3: a subsample corresponds to a byte range stored out of order with order information provided as horizontal and vertical position offsets in pixels in the source image, for example using 16 bits of codec_specific_parameters 1015 for the horizontal offset and 16 other bits of this same codec_specific_parameters for the vertical offset;
4: a subsample corresponds to a byte range stored out of order with order information provided as an index into a spatial layout, for example using 16 bits of codec_specific_parameters 1015 to refer to an index into a spatial layout described as a grid. Such a spatial layout is described hereafter. This information allows a parser or reader to position the data chunk relative to other data chunks in the 2D referential of the reconstructed image. It also allows a parser or reader to retrieve data given a region in the image by determining possible intersections between this subsample and the given region;
5: a subsample corresponds to a byte range stored in order with information on its spatial position in the source image as horizontal and vertical position offsets in pixels in the source image, for example using 16 bits of codec_specific_parameters 1015 for the horizontal offset and 16 other bits of this same codec_specific_parameters for the vertical offset. As for value 4, this allows a parser or reader to retrieve data given a region in the image; and
6: a subsample corresponds to a byte range stored in order with information on its spatial position in the source image provided as an index into a spatial layout, for example using 16 bits of codec_specific_parameters 1015 to refer to an index into a spatial layout described as a grid. Such a spatial layout is described hereafter. As for values 4 and 5, this allows a parser or reader to retrieve data given a region in the image. A possible packing of such position offsets into codec_specific_parameters is sketched after this list.
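For illustration only, and assuming the example bit allocation mentioned above (high-order 16 bits for the horizontal offset, low-order 16 bits for the vertical offset — this split is an assumption, not a normative layout), the codec_specific_parameters field could be packed and unpacked as follows in Python:

def pack_offsets(hor_offset, ver_offset):
    """Pack horizontal/vertical pixel offsets into a 32-bit codec_specific_parameters value."""
    if not (0 <= hor_offset < 1 << 16 and 0 <= ver_offset < 1 << 16):
        raise ValueError("offsets must fit in 16 bits each")
    return (hor_offset << 16) | ver_offset

def unpack_offsets(codec_specific_parameters):
    """Recover (hor_offset, ver_offset) from a 32-bit codec_specific_parameters value."""
    return (codec_specific_parameters >> 16) & 0xFFFF, codec_specific_parameters & 0xFFFF

# Example: a chunk whose top-left pixel is at (1024, 512) in the source image.
value = pack_offsets(1024, 512)
print(hex(value), unpack_offsets(value))   # 0x4000200 (1024, 512)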
For flags values (e.g. flags value 4 or 6) according to which the subsample provides information on a spatial position in a spatial layout, the presence of a metadata structure defining this spatial layout is required. This metadata structure may be a specific box as a sub-box of the ‘trak’ box (or of the ‘traf’ box if the spatial layout changes along time). It describes the positions and sizes of the different image parts or image partitions (e.g. image data 325-1, 325-2) composing an image from the source video (the full or complete image). For the sake of illustration, a spatial layout information box, called for example SpatialLayoutBox, may be defined as follows:

Box Type: 'splt'
Container: Track Box or Track Fragment Box
Mandatory: No
Quantity: Zero or one (per track or per track fragment)

aligned(8) class SpatialLayoutBox extends FullBox('splt', version=0, flags=0) {
    unsigned int(16) num_partitions;
    for (i=1; i <= num_partitions; i++) {
        unsigned int(16) hor_offset[i];
        unsigned int(16) vert_offset[i];
        unsigned int(16) partition_width[i];
        unsigned int(16) partition_height[i];
    }
}

where the hor_offset and vert_offset parameters respectively indicate the horizontal and vertical position, for example in pixel units, of an image part or partition in the source video (the full or complete image); and partition_width and partition_height respectively indicate the width and height, for example in pixel units, of an image part or partition. These parameters are optional and can be omitted, especially when the spatial layout consists of a regular grid.
The spatial layout information box implicitly provides indices for image parts or partitions that can be used in the subsample description or in any metadata structure for spatial position indication based on a pre-defined spatial layout. As a variant of the SpatialLayoutBox, the spatial layout, especially when regular, may be described as a number of horizontal partitions and a number of vertical partitions; implicit indices and positions for the partitions would then follow a raster-scan order of the partitions. Other ISOBMFF boxes providing the same kind of information (at least horizontal and vertical positions and the indices for the partitions) could be used as an alternative.
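As a minimal sketch (Python; the grid dimensions and partition size are illustrative assumptions), the implicit raster-scan indexing of a regular grid can be turned back into pixel positions as follows:

def partition_position(index, columns, partition_width, partition_height):
    """Return the (hor_offset, vert_offset), in pixels, of the 1-based partition 'index'
    in a regular grid scanned in raster order (left to right, then top to bottom)."""
    row, col = divmod(index - 1, columns)
    return col * partition_width, row * partition_height

# Example: a 4x2 grid of 480x540 partitions covering a 1920x1080 image.
for i in range(1, 9):
    print(i, partition_position(i, columns=4, partition_width=480, partition_height=540))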
Turning back to Figure 10, an example of a subsample with a reordering instruction is illustrated with references 1020 and 1025. These items of information allow a parser to rearrange data stored in a media data box, for example in ‘mdat’ box 1030, to provide an image, for example image 1035. Using the codec_specific_parameters has the advantage of not breaking the processing of the ‘subs’ box by legacy players. An alternative to the use of this parameter could be to use a new version of the ‘subs’ box with fewer parameters (removing some existing parameters like discardable or sample_priority that may not be relevant for samples or subsamples from uncompressed video). This new version of the box may be dedicated to subsamples that do not follow the image order but rather a scan order. It contains a new parameter indicating the order of each subsample in the reconstructed image.
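By way of illustration (Python; the data layout and names are hypothetical), a parser holding, for each subsample, its byte range in the ‘mdat’ box and its order number in the reconstructed image (as with flags value 2 above) could rebuild the image-ordered payload as follows:

def rebuild_image_payload(mdat_bytes, subsamples):
    """Concatenate subsample byte ranges in image order.

    'subsamples' is a list of (offset, size, order_number) tuples, where
    order_number goes from 1 (e.g. top-left data chunk) to N (e.g. bottom-right).
    """
    ordered = sorted(subsamples, key=lambda s: s[2])
    return b"".join(mdat_bytes[offset:offset + size] for offset, size, _ in ordered)

# Example: three chunks stored in scan order within a 12-byte 'mdat' payload.
mdat = bytes(range(12))
chunks = [(0, 4, 2), (4, 4, 3), (8, 4, 1)]
print(rebuild_image_payload(mdat, chunks))   # chunk at offset 8 first, then 0, then 4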
Hardware for carrying out steps of some embodiments of the disclosure
Figure 11 is a schematic block diagram of a computing device 1100 for implementation of one or more embodiments of the disclosure. The computing device 1100 may be a device such as a micro-computer, a workstation, or a light portable device. The computing device 1100 comprises a communication bus 1102 connected to:
- a central processing unit (CPU) 1104, such as a microprocessor;
- a random access memory (RAM) 1108 for storing the executable code of the method of embodiments of the disclosure, as well as the registers adapted to record variables and parameters necessary for implementing the method for encapsulating, indexing, de-encapsulating, and/or accessing data; the memory capacity thereof can be expanded by an optional RAM connected to an expansion port, for example;
- a read only memory (ROM) 1106 for storing computer programs for implementing embodiments of the disclosure;
- a network interface 1112 that is, in turn, typically connected to a communication network 1114 over which digital data to be processed are transmitted or received. The network interface 1112 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 1104;
- a user interface (UI) 1116 for receiving inputs from a user or for displaying information to a user;
- a hard disk (HD) 1110; and/or
- an I/O module 1118 for receiving/sending data from/to external devices such as a video source or display.
The executable code may be stored either in read only memory 1106, on the hard disk 1110, or on a removable digital medium such as a disk, for example. According to a variant, the executable code of the programs can be received by means of a communication network, via the network interface 1112, in order to be stored in one of the storage means of the communication device 1100, such as the hard disk 1110, before being executed.
The central processing unit 1104 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the disclosure, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 1104 is capable of executing instructions from main RAM memory 1108 relating to a software application after those instructions have been loaded from the program ROM 1106 or the hard disk (HD) 1110, for example. Such a software application, when executed by the CPU 1104, causes the steps of the flowcharts shown in the previous figures to be performed.
In this embodiment, the apparatus is a programmable apparatus which uses software to implement the method of the disclosure. However, alternatively, the method of the present disclosure may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).
Although the present disclosure has been described hereinabove with reference to specific embodiments, the present disclosure is not limited to the specific embodiments, and modifications will be apparent to a person skilled in the art which lie within the scope of the present disclosure. Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the disclosure, that being determined solely by the appended claims. In particular the different features from different embodiments may be interchanged, where appropriate.
In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.

Claims

1. A method of encapsulating an image in a file compliant with the ISOBMFF standard, the method comprising: obtaining a sequence of data chunks, each data chunk including image data of a portion of the image; generating metadata providing a relationship between the sequence of data chunks and a predetermined scan of the portions of the image; and encapsulating the data chunks of the sequence of data chunks and the generated metadata in the file.
2. The method of claim 1, wherein the data of each of the portions are obtained from a tap of a multi-tap image capturing apparatus, wherein the order of the sequence of data chunks depends on the number of taps of the multi-tap image capturing apparatus and on a scanning order of pixels of each of the portions.
3. The method of claim 1 or claim 2, wherein the data chunks are encapsulated into samples according to the order of the sequence of data chunks.
4. The method according to any one of claims 1 to 3, wherein the file comprises an indicator that the sequence of data chunks is ordered differently than the predetermined scan of the image.
5. The method according to any one of claims 1 to 4, wherein the relationship between the sequence of data chunks and a predetermined scan of the portions of the image describes the encapsulation of all images encapsulated within the file.
6. The method according to any one of claims 1 to 4, wherein the relationship between the sequence of data chunks and a predetermined scan of the portions of the image describes the encapsulation of some of the images encapsulated within the file.
7. The method according to any one of claims 1 to 5, wherein the generated metadata are encapsulated within a multi tap description box associated with a sample description box describing encapsulation of the image.
8. The method according to claim 7, wherein the generated metadata are encapsulated within a sample entry.
9. The method according to any one of claims 1 to 6, wherein the relationship between the sequence of data chunks and a predetermined scan of the portions of the image is described using a sample group mechanism so that the relationship between the sequence of data chunks and a predetermined scan of the portions of the image describes each encapsulated image of a group of encapsulated images.
10. The method according to claim 9, wherein the file comprises a sample to group box defining at least one group of samples, the image belonging to the group of samples, and wherein the file further comprises at least one sample group description box associated with the sample to group box, the at least one sample group description box comprising the generated metadata.
11. The method according to claim 9 or 10, wherein a first relationship between a sequence of data chunks and a predetermined scan of portions of an image is associated with a first group of samples and a second relationship between a sequence of data chunks and a predetermined scan of portions of an image, different from the first relationship, is associated with a second group of samples.
12. The method according to any one of claims 1 to 6, wherein the relationship between the sequence of data chunks and a predetermined scan of the portions of the image is described within a subsample information box using codec specific parameters.
13. The method according to any one of claims 9 to 12, wherein the generated metadata make it possible to parse only a spatial portion of the encapsulated image.
14. The method according to any one of claims 1 to 13, wherein the image is an uncompressed image.
15. A method of parsing a file compliant with the ISOBMFF standard, the method comprising: obtaining, from the file, metadata providing a relationship between a sequence of data chunks and a predetermined scan of portions of an image; obtaining, from the file, a sequence of data chunks, each data chunk including image data of a portion of the image; ordering the obtained data chunks according to the obtained metadata; and generating the image from the ordered data chunks.
16. The method according to claim 15, wherein the metadata are obtained from a multi tap description box associated with a sample description box describing encapsulation of the image.
17. The method according to claim 15, wherein the metadata are obtained from a sample group mechanism, the relationship between a sequence of data chunks and a predetermined scan of portions of an image describing each encapsulated image of a group of encapsulated images.
18. The method according to claim 15, wherein the metadata are obtained from a subsample information box of a sample table box using codec specific parameters.
19. A computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing each of the steps of the method according to any one of claims 1 to 18 when loaded into and executed by the programmable apparatus.
20. A non-transitory computer-readable storage medium storing instructions of a computer program for implementing each of the steps of the method according to any one of claims 1 to 18.
21. A device for encapsulating an image within a file or for parsing an image encapsulated within a file, the device comprising a processing unit configured for carrying out each of the steps of the method according to any one of claims 1 to 18.
PCT/EP2021/087209 2021-01-06 2021-12-22 Method, device, and computer program for optimizing encapsulation of images WO2022148651A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB2100154.0 2021-01-06
GB2100154.0A GB2602643B (en) 2021-01-06 2021-01-06 Method, device, and computer program for optimizing encapsulation of images

Publications (1)

Publication Number Publication Date
WO2022148651A1 true WO2022148651A1 (en) 2022-07-14

Family

ID=74566374

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/087209 WO2022148651A1 (en) 2021-01-06 2021-12-22 Method, device, and computer program for optimizing encapsulation of images

Country Status (2)

Country Link
GB (1) GB2602643B (en)
WO (1) WO2022148651A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130195171A1 (en) * 2012-01-30 2013-08-01 Qualcomm Incorporated Method of coding video and storing video content
US20160014360A1 (en) * 2014-07-11 2016-01-14 Imperx, Inc. Area scan interline transfer ccd imaging device and apparatus with tdi scanning mode
US20170346873A1 (en) * 2016-05-27 2017-11-30 Canon Kabushiki Kaisha Method, device, and computer program for encapsulating and parsing timed media data
WO2018171758A1 (en) * 2017-03-24 2018-09-27 Mediatek Inc. Method and apparatus for deriving vr projection, packing, roi and viewport related tracks in isobmff and supporting viewport roll signaling
WO2020058494A1 (en) * 2018-09-20 2020-03-26 Canon Kabushiki Kaisha Method, device, and computer program for improving transmission of encoded media data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JON PIESING TP VISION: "ts_103285v010301edithelp_review_2.docx", 17 January 2020 (2020-01-17), XP017858211, Retrieved from the Internet <URL:https://member.dvb.org/wg/TM-IPI/documentRevision/download/43254 ts_103285v010301edithelp_review_2.docx> [retrieved on 20200117] *

Also Published As

Publication number Publication date
GB202100154D0 (en) 2021-02-17
GB2602643A (en) 2022-07-13
GB2602643B (en) 2023-04-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21844670

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21844670

Country of ref document: EP

Kind code of ref document: A1