EP3977750A1

EP3977750A1 - An apparatus, a method and a computer program for video coding and decoding

Info

Publication number: EP3977750A1
Application number: EP20814633.2A
Authority: EP
Inventors: Emre Aksu; Miska Hannuksela
Original assignee: Nokia Technologies Oy
Current assignee: Nokia Technologies Oy
Priority date: 2019-05-29
Filing date: 2020-05-27
Publication date: 2022-04-06
Also published as: EP3977750A4; WO2020240089A1

Abstract

A method comprising authoring items into a media file, said items being associated with a plurality of referenced items to be processed in a specific order; and authoring one or more properties of item references of said referenced items into said media file, wherein said properties include one or more of the following:indication if the item references are strictly ordered; indication if the referenced items are removable without making a referencing item invalid; a checksum generated from ID values of the referenced items in the order they are listed.

Description

AN APPARATUS, A METHOD AND A COMPUTER PROGRAM FOR VIDEO

CODING AND DECODING

TECHNICAL FIELD

[0001] The present invention relates to an apparatus, a method and a computer program for video coding and decoding.

BACKGROUND

[0002] The syntax of many media file formats is based on a hierarchical list of type and length prefixed data chunks or boxes, where the naming depends on the format in question.In a container file according to ISO Base Media File Format (ISOBMFF; ISO/IEC 14496-12), the media data and metadata is arranged in various types of boxes. Many formats are derived from ISOBMFF, including the High Efficiency Image File Format (HEIF, ISO/IEC 23008- 12), MPEG-4 file format (ISO/IEC 14496-14, also known as the MP4 format), file format for NAL unit structured video (ISO/IEC 14496-15) and 3GPP file format (3GPP TS 26.244, also known as the 3GP format). These formats use the same box-structured basic structure.

[0003] The HEIF is a standard developed by the Moving Picture Experts Group (MPEG) for storage of images and image sequences. HEIF includes a rich set of features building on top of the ISOBMFF, making HEIF feature-wise superior compared to many other image file formats. One such feature of the file format is its capability to store multiple images in the same file. These images, called image items, can have logical relationships to each other.

[0004] In some image item storage use cases, it is desirable to indicate that the order of referenced items in the box structure is strict and shall not be modified. This is particularly important in cases where an order-critical process, such as predictively coded image items, is applied to the referenced items, wherein the items that are referenced must be processed in a specific order so that the resulting bitstream after item data concatenation is a decodable video bitstream. However, the features of HEIF are inadequate for handling such order-critical processes.

SUMMARY

[0005] Now in order to at least alleviate the above problems, an enhanced file authoring method is introduced herein.

[0006] A method according to a first aspect comprises authoring items into a media file, said items being associated with a plurality of referenced items to be processed in a specific order; and authoring one or more properties of item references of said referenced items into said media file, wherein said properties include one or more of the following: indication if the item references are strictly ordered, indication if the referenced items are removable without making a referencing item invalid, a checksum generated from ID values of the referenced items in the order they are listed.

[0007] An apparatus according to a second aspect comprises means for authoring items into a media file, said items being associated with a plurality of referenced items to be processed in a specific order; and means for authoring one or more properties of item references of said referenced items into said media file, wherein said properties include one or more of the following: indication if the item references are strictly ordered, indication if the referenced items are removable without making a referencing item invalid, a checksum generated from ID values of the referenced items in the order they are listed.

[0008] According to an embodiment, the apparatus further comprises means for authoring said one or more properties of item references of said references items into a

ItemReferenceBox according to ISO Base Media File Format (ISOBMFF).

[0009] According to an embodiment, at least one further data structure is included in a syntax of the ItemReferenceBox to indicate the strictly ordered item references.

[0010] According to an embodiment, flag of the syntax of the ItemReferenceBox are used to indicate the strictly ordered item references.

[0011] According to an embodiment, at least one further box is defined in accordance with ISOBMFF syntax to indicate the strictly ordered item references.

[0012] According to an embodiment, the items are image items and the media file format is a High Efficiency Image File Format or a High Efficiency Image File compatible storage format.

[0013] According to an embodiment, a checksum generation algorithm is pre-defined or indicated in the media file.

[0014] A method according to a third aspect comprises: receiving a media file authored with image items comprising a plurality of referenced items to be processed in a specific order; reading one or more properties of item references of said references items from said media file, wherein said properties include one or more of the following: indication if the item references are strictly ordered, indication if the referenced items are removable without making a referencing item invalid, a checksum generated from ID values of the referenced items in the order they are listed; and parsing the media file according to said one or more properties. [0015] An apparatus according to a fourth aspect comprises comprising means for receiving a media file authored with image items comprising a plurality of referenced items to be processed in a specific order; means for reading one or more properties of item references of said references items from said media file, wherein said properties include one or more of the following: indication if the item references are strictly ordered, indication if the referenced items are removable without making a referencing item invalid, a checksum generated from ID values of the referenced items in the order they are listed; and means for parsing the media file according to said one or more properties.

[0016] According to an embodiment, the apparatus further comprises means for receiving an instruction to remove a first item from a media file; means for checking from

ItemReferenceBoxes if the first item is referenced by any other item in the media file; and means for removing the first item if it is not referenced by any other item or if it is indicated in the media file that the first item is removable without making the referencing item(s) invalid.

[0017] According to an embodiment, the apparatus further comprises means for receiving an instruction to reorder a first item in a first item reference; and means for reordering the first item if the first item reference is not indicated to be strictly ordered.

[0018] The further aspects relate to apparatuses and computer readable storage media stored with code thereon, which are arranged to carry out the above methods and one or more of the embodiments related thereto.

BRIEF DESCRIPTION OF THE DRAWINGS

[0019] For better understanding of the present invention, reference will now be made by way of example to the accompanying drawings in which:

[0020] Figure 1 shows schematically an electronic device employing embodiments of the invention;

[0021] Figure 2 shows schematically a user equipment suitable for employing

embodiments of the invention;

[0022] Figure 3 further shows schematically electronic devices employing embodiments of the invention connected using wireless and wired network connections;

[0023] Figure 4 shows a flow chart of a file authoring method according to an embodiment of the invention;

[0024] Figure 5 shows schematically an encoder suitable for implementing embodiments of the invention; [0025] Figure 6 shows a schematic diagram of a decoder suitable for implementing embodiments of the invention;

[0026] Figure 7 shows a flow chart of a file parsing method according to an embodiment of the invention; and

[0027] Figure 8 shows a schematic diagram of an example multimedia communication system within which various embodiments may be implemented.

DETAILED DESCRIPTON OF SOME EXAMPLE EMBODIMENTS

[0028] The following describes in further detail suitable apparatus and possible mechanisms for implementing the embodiments described below. In this regard reference is first made to Figures 1 and 2, where Figure 1 shows a block diagram of a video coding system according to an example embodiment as a schematic block diagram of an exemplary apparatus or electronic device 50, which may incorporate a codec according to an

embodiment of the invention. Figure 2 shows a layout of an apparatus according to an example embodiment. The elements of Figs. 1 and 2 will be explained next.

[0029] The electronic device 50 may for example be a mobile terminal or user equipment of a wireless communication system. However, it would be appreciated that embodiments of the invention may be implemented within any electronic device or apparatus which may require encoding and decoding or encoding or decoding video images.

[0030] The apparatus 50 may comprise a housing 30 for incorporating and protecting the device. The apparatus 50 further may comprise a display 32 in the form of a liquid crystal display. In other embodiments of the invention the display may be any suitable display technology suitable to display an image or video. The apparatus 50 may further comprise a keypad 34. In other embodiments of the invention any suitable data or user interface mechanism may be employed. For example the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.

[0031] The apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input. The apparatus 50 may further comprise an audio output device which in embodiments of the invention may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection. The apparatus 50 may also comprise a battery (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator). The apparatus may further comprise a camera capable of recording or capturing images and/or video. The apparatus 50 may further comprise an infrared port for short range line of sight communication to other devices. In other embodiments the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/firewire wired connection.

[0032] The apparatus 50 may comprise a controller 56, processor or processor circuitry for controlling the apparatus 50. The controller 56 may be connected to memory 58 which in embodiments of the invention may store both data in the form of image and audio data and/or may also store instructions for implementation on the controller 56. The controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of audio and/or video data or assisting in coding and decoding carried out by the controller.

[0033] The apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a UICC and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.

[0034] The apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network. The apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and for receiving radio frequency signals from other apparatus(es).

[0035] The apparatus 50 may comprise a camera capable of recording or detecting individual frames which are then passed to the codec 54 or the controller for processing. The apparatus may receive the video image data for processing from another device prior to transmission and/or storage. The apparatus 50 may also receive either wirelessly or by a wired connection the image for coding/decoding. The structural elements of apparatus 50 described above represent examples of means for performing a corresponding function.

[0036] With respect to Figure 3, an example of a system within which embodiments of the present invention can be utilized is shown. The system 10 comprises multiple communication devices which can communicate through one or more networks. The system 10 may comprise any combination of wired or wireless networks including, but not limited to a wireless cellular telephone network (such as a GSM, UMTS, CDMA network etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet. [0037] The system 10 may include both wired and wireless communication devices and/or apparatus 50 suitable for implementing embodiments of the invention.

[0038] For example, the system shown in Figure 3 shows a mobile telephone network 11 and a representation of the internet 28. Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.

[0039] The example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22. The apparatus 50 may be stationary or mobile when carried by an individual who is moving. The apparatus 50 may also be located in a mode of transport including, but not limited to, a car, a truck, a taxi, a bus, a train, a boat, an airplane, a bicycle, a motorcycle or any similar suitable mode of transport.

[0040] The embodiments may also be implemented in a set-top box; i.e. a digital TV receiver, which may/may not have a display or wireless capabilities, in tablets or (laptop) personal computers (PC), which have hardware or software or combination of the

encoder/decoder implementations, in various operating systems, and in chipsets, processors, DSPs and/or embedded systems offering hardware/software based coding.

[0041] Some or further apparatus may send and receive calls and messages and

communicate with service providers through a wireless connection 25 to a base station 24.

The base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the internet 28. The system may include additional communication devices and communication devices of various types.

[0042] The communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time divisional multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.11 and any similar wireless communication technology. A

communications device involved in implementing various embodiments of the present invention may communicate using various media including, but not limited to, radio, infrared, laser, cable connections, and any suitable connection. [0043] Available media file format standards include ISO base media file format (ISO/IEC 14496-12, which may be abbreviated ISOBMFF) and file format for NAL unit structured video (ISO/IEC 14496-15), which derives from the ISOBMFF.

[0044] Some concepts, structures, and specifications of ISOBMFF are described below as an example of a container file format, based on which the embodiments may be implemented. The aspects of the invention are not limited to ISOBMFF, but rather the description is given for one possible basis on top of which the invention may be partly or fully realized.

[0045] A basic building block in the ISO base media file format is called a box. Each box has a header and a payload. The box header indicates the type of the box and the size of the box in terms of bytes. A box may enclose other boxes, and the ISO file format specifies which box types are allowed within a box of a certain type. Furthermore, the presence of some boxes may be mandatory in each file, while the presence of other boxes may be optional. Additionally, for some box types, it may be allowable to have more than one box present in a file. Thus, the ISO base media file format may be considered to specify a hierarchical structure of boxes.

[0046] According to the ISO family of file formats, a file includes media data and metadata that are encapsulated into boxes. Each box is identified by a four character code (4CC) and starts with a header which informs about the type and size of the box.

[0047] The syntax of a box is as follows:

aligned (8) class Box (unsigned int(32) boxtype,

optional unsigned int(8) [16] extended_type) {

unsigned int(32) size;

unsigned int(32) type = boxtype;

if (size==l) {

unsigned int(64) largesize;

} else if (size==0) {

// box extends to end of file

}

if (boxtype== ' uuid ' ) {

unsigned int(8) [16] usertype = extended_type;

}

[0048] In the above syntax, s i ze is an integer that specifies the number of bytes in this box, including all its fields and contained boxes; if size is 1 then the actual size is in the field largesize; if size is 0, then this box must be in a top-level container, and be the last box in that container (typically, a file or data object delivered over a protocol), and its contents extend to the end of that container (normally only used for a MediaDataBox). type identifies the box type; user extensions use an extended type, and in this case, the type field is set to 'uuid'.

[0049] A FullBox extends the Box syntax by adding version and flags fields into the box header. The version field is an integer that specifies the version of this format of the box. The flags field is a map or a bit field of flags. Parsers may be required that to ignore and skip boxes that have an unrecognized version value. The syntax of a FullBox may be specified as follows:

aligned (8) class FullBox (unsigned int(32) boxtype, unsigned int(8) v, bit (24) f)

extends Box (boxtype) {

unsigned int(8) version = v;

bit (24) flags = f;

}

[0050] Many formats are derived from ISOBMFF, including the High Efficiency Image File Format (HEIF, ISO/IEC 23008-12), MPEG-4 file format (ISO/IEC 14496-14, also known as the MP4 format), file format for NAL unit structured video (ISO/IEC 14496-15) and 3GPP file format (3GPP TS 26.244, also known as the 3GP format). These formats use the same box-structured basic structure.

[0051] According to the ISO base media file format, a file includes media data and metadata that are encapsulated into boxes. Each box is identified by a four character code (4CC) and starts with a header which informs about the type and size of the box.

[0052] Many files formatted according to the ISO base media file format start with a file type box, also referred to as FileTypeBox or the ftyp box. The ftyp box contains information of the brands labeling the file. The ftyp box includes one major brand indication and a list of compatible brands. The major brand identifies the most suitable file format specification to be used for parsing the file. The compatible brands indicate which file format specifications and/or conformance points the file conforms to. It is possible that a file is conformant to multiple specifications. All brands indicating compatibility to these specifications should be listed, so that a reader only understanding a subset of the compatible brands can get an indication that the file can be parsed. Compatible brands also give a permission for a file parser of a particular file format specification to process a file containing the same particular file format brand in the ftyp box. A file player may check if the ftyp box of a file comprises brands it supports, and may parse and play the file only if any file format specification supported by the file player is listed among the compatible brands.

[0053] In files conforming to the ISO base media file format, the media data may be provided in one or more instances of MediaDataBox (‘mdat‘) and the MovieBox (‘moov’) may be used to enclose the metadata for timed media. In some cases, for a file to be operable, both of the‘mdat’ and‘moov’ boxes may be required to be present. The‘moov’ box may include one or more tracks, and each track may reside in one corresponding TrackBox (‘trak’). Each track is associated with a handler, identified by a four-character code, specifying the track type. Video, audio, and image sequence tracks can be collectively called media tracks, and they contain an elementary media stream. Other track types comprise hint tracks and timed metadata tracks.

[0054] Tracks comprise samples, such as audio or video frames. For video tracks, a media sample may correspond to a coded picture or an access unit. A media track refers to samples (which may also be referred to as media samples) formatted according to a media

compression format (and its encapsulation to the ISO base media file format). A hint track refers to hint samples, containing cookbook instructions for constructing packets for transmission over an indicated communication protocol. A timed metadata track may refer to samples describing referred media and/or hint samples.

[0055] The 'trak' box includes in its hierarchy of boxes the SampleTableBox (also known as the sample table or the sample table box). The SampleTableBox contains the

SampleDescriptionBox, which gives detailed information about the coding type used, and any initialization information needed for that coding. The SampleDescriptionBox contains an entry-count and as many sample entries as the entry-count indicates. The format of sample entries is track-type specific but derive from generic classes (e.g. VisualSampleEntry, AudioSampleEntry). Which type of sample entry form is used for derivation the track-type specific sample entry format is determined by the media handler of the track.

[0056] Movie fragments may be used, for example, when recording content to ISO files, for example, in order to avoid losing data if a recording application crashes, runs out of memory space, or some other incident occurs. Without movie fragments, data loss may occur because the file format may require that all metadata, for example, a movie box, be written in one contiguous area of the file. Furthermore, when recording a file, there may not be sufficient amount of memory space to buffer a movie box for the size of the storage available, and re-computing the contents of a movie box when the movie is closed may be too slow. Moreover, movie fragments may enable simultaneous recording and playback of a file using a regular ISO file parser. Furthermore, a smaller duration of initial buffering may be required for progressive downloading, e.g., simultaneous reception and playback of a file when movie fragments are used and the initial movie box is smaller compared to a file with the same media content but structured without movie fragments. [0057] The movie fragment feature may enable splitting the metadata that otherwise might reside in the movie box into multiple pieces. Each piece may correspond to a certain period of time of a track. In other words, the movie fragment feature may enable interleaving file metadata and media data. Consequently, the size of the movie box may be limited and the use cases mentioned above be realized.

[0058] In some examples, the media samples for the movie fragments may reside in an mdat box. For the metadata of the movie fragments, however, a moof box may be provided. The moof box may include the information for a certain duration of playback time that would previously have been in the moov box. The moov box may still represent a valid movie on its own, but in addition, it may include an mvex box indicating that movie fragments will follow in the same file. The movie fragments may extend the presentation that is associated to the moov box in time.

[0059] Within the movie fragment there may be a set of track fragments, including anywhere from zero to a plurality per track. The track fragments may in turn include anywhere from zero to a plurality of track runs, each of which document is a contiguous run of samples for that track (and hence are similar to chunks). Within these structures, many fields are optional and can be defaulted. The metadata that may be included in the moof box may be limited to a subset of the metadata that may be included in a moov box and may be coded differently in some cases. Details regarding the boxes that can be included in a moof box may be found from the ISOBMFF specification.

[0060] Transformed media tracks may have resulted by applying one or more

transformations of different types for a conventional media track. A transformed media track may for example be an encrypted or protected media track or an incomplete media track. Incomplete tracks may result, for example, samples are received partially.

[0061] The ISO Base Media File Format contains three mechanisms for timed metadata that can be associated with particular samples: sample groups, timed metadata tracks, and sample auxiliary information. Derived specification may provide similar functionality with one or more of these three mechanisms.

[0062] Per-sample sample auxiliary information may be stored anywhere in the same file as the sample data itself; for self-contained media files, this is typically in a MediaDataBox or a box from a derived specification. It is stored either (a) in multiple chunks, with the number of samples per chunk, as well as the number of chunks, matching the chunking of the primary sample data or (b) in a single chunk for all the samples in a movie sample table (or a movie fragment). The Sample Auxiliary Information for all samples contained within a single chunk (or track run) is stored contiguously (similarly to sample data).

[0063] Sample Auxiliary Information, when present, is always stored in the same file as the samples to which it relates as they share the same data reference (’dref) structure.

However, this data may be located anywhere within this file, using auxiliary information offsets ('saio') to indicate the location of the data.

[0064] Files conforming to the ISOBMFF may contain any non-timed objects, referred to as items, meta items, or metadata items, in a meta box (fourCC:‘meta’), which may also be called MetaBox. While the name of the meta box refers to metadata, items can generally contain metadata or media data. The meta box may reside at the top level of the file, within a movie box (fourCC:‘moov’), and within a track box (fourCC:‘trak’), but at most one meta box may occur at each of the file level, movie level, or track level. The meta box may be required to contain a‘hdlr’ box indicating the structure or format of the‘meta’ box contents. The meta box may list and characterize any number of items that can be referred and each one of them can be associated with a file name and are uniquely identified with the file by item identifier (item id) which is an integer value. The metadata items may be for example stored in the 'idaf box of the meta box or in an 'mdaf box or reside in a separate file. If the metadata is located external to the file then its location may be declared by the DatalnformationBox (fourCC:‘dinf ). In the specific case that the metadata is formatted using XML syntax and is required to be stored directly in the MetaBox, the metadata may be encapsulated into either the XMLBox (fourCC:‘xml‘) or the BinaryXMLBox (fourcc:‘bxml’). An item may be stored as a contiguous byte range, or it may be stored in several extents, each being a contiguous byte range. In other words, items may be stored fragmented into extents, e.g. to enable interleaving. An extent is a contiguous subset of the bytes of the resource; the resource can be formed by concatenating the extents.

[0065] A uniform resource identifier (URI) may be defined as a string of characters used to identify a name of a resource. Such identification enables interaction with representations of the resource over a network, using specific protocols. A URI is defined through a scheme specifying a concrete syntax and associated protocol for the URI. The uniform resource locator (URL) and the uniform resource name (URN) are forms of URI. A URL may be defined as a URI that identifies a web resource and specifies the means of acting upon or obtaining the representation of the resource, specifying both its primary access mechanism and network location. A URN may be defined as a URI that identifies a resource by name in a particular namespace. A URN may be used for identifying a resource without implying its location or how to access it.

[0066] The ISO base media file format does not limit a presentation to be contained in one file. As such, a presentation may be comprised within several files. As an example, one file may include the metadata for the whole presentation and may thereby include all the media data to make the presentation self-contained. Other files, if used, may not be required to be formatted to ISO base media file format, and may be used to include media data, and may also include unused media data, or other information. The ISO base media file format concerns the structure of the presentation file only. The format of the media-data files may be constrained by the ISO base media file format or its derivative formats only in that the media-data in the media files is formatted as specified in the ISO base media file format or its derivative formats.

[0067] The ability to refer to external files may be realized through data references. In some examples, a sample description box included in each track may provide a list of sample entries, each providing detailed information about the coding type used, and any initialization information needed for that coding. All samples of a chunk and all samples of a track fragment may use the same sample entry. A chunk may be defined as a contiguous set of samples for one track. The Data Reference (dref) box, which may also be included in each track, may define an indexed list of uniform resource locators (URLs), uniform resource names (URNs), and/or self-references to the file containing the metadata. A sample entry may point to one index of the Data Reference box (which, in the syntax, may be referred to as DataReferenceBox), thereby indicating the file containing the samples of the respective chunk or track fragment.

[0068] DataReferenceBox contains a list of boxes that declare the potential location(s) of the media data referred to by the file. DataReferenceBox is contained by

DatalnformationBox, which in turn is contained by MedialnformationBox or MetaBox. When contained in the MedialnformationBox, each sample entry of the track contains a data reference index referring to a list entry of the list of box(es) in the DataReferenceBox. The box(es) in the DataReferenceBox are extended from FullBox, i.e. contain the version and the flags field in the box header. Two box types have been specified to be included in the DataReferenceBox: DataEntryUrlBox and DataEntryUmBox provide a URL and URN data reference, respectively. When the least significant bit of the flags field of either

DataEntryUrlBox or DataEntryUmBox is equal 1 (which may be called the "self-containing" flag or self-contained flag), the respective data reference refers to the containing file itself and no URL or URN string is provided within the DataEntryUrlBox or the DataEntryUmBox.

[0069] In ISOBMFF the exact location of samples referred to by a TrackBox (i.e.

excluding samples referred to by movie fragments) may be computed using information provided by (a) DataReferenceBox (b) SampleToChunkBox (c) ChunkOffsetBox, and (d) SampleSizesBox. Furthermore, the locating of a sample involves an offset calculation using the start of the file. For sample referred to by movie fragments, the exact location of samples may be computed using information provided in TrackFragmentHeaderBox, and

TrackFragmentRunBox, and the locating of a sample may involve an offset calculating using either the start of the file or the start of the MovieFragmentBox as a reference. The use of offsets may render the file fragile to any edits. For example, it may be sufficient to simply add or delete a byte between the start of a file and a MediaDataBox to destroy the computed offsets and render the file non-decodab le. This means that any entity that is editing a file should be careful to ensure that all offsets computed and set in the file must be valid after it completes its editing.

[0070] The ItemFocationBox provides for each item, an indication if the item is located in this or other files and in the latter case the URN/URF of the other files, the base reference of byte offsets within the file, and the extents from which the item is constructed. For each extent, the ItemFocationBox is indicative of the byte offset relative to the base reference and the length of the extent. For each item the ItemFocationBox comprises data reference index, which is a 1 -based index referring to a list entry of the list of box(es) in the

DataReferenceBox contained in the MetaBox, and which identifies the file containing the item. For each item the ItemFocationBox also comprises the construction method field, which indicates the base reference within the file and can be one of the following:

flle offset: data reference index equal to 0 indicates the same file as the file containing the MetaBox. When the MetaBox is in a Movie Fragment and the data reference indicates‘same file’, the data origin is the first byte of the enclosing MovieFragmentBox. Otherwise, byte offsets are absolute byte offsets into the file (from the start of the file) identified by data reference index. idat offset: byte offsets are relative to the ItemDataBox in the same MetaBox item offset: byte offsets are relative to the start of an item data for an item indicated by the item reference index field.

[0071] High Efficiency Image File Format (HEIF) is a standard developed by the Moving Picture Experts Group (MPEG) for storage of images and image sequences. Among other things, the standard facilitates file encapsulation of data coded according to High Efficiency Video Coding (HEVC) standard. HEIF includes features building on top of the used ISO Base Media File Format (ISOBMFF).

[0072] The ISOBMFF structures and features are used to a large extent in the design of HEIF. The basic design for HEIF comprises that still images are stored as items and image sequences are stored as tracks. HEIF enables to store multiple images in the same file. These images, called image items, can have logical relationships to each other.

[0073] In the context of HEIF, the following boxes may be contained within the root-level 'meta' box and may be used as described in the following. In HEIF, the handler value of the Handler box of the 'meta' box is 'picf . The resource (whether within the same file, or in an external file identified by a uniform resource identifier) containing the coded media data is resolved through the Data Information ('din ) box, whereas the Item Focation ('iloc') box stores the position and sizes of every item within the referenced file. The Item Reference ('ire ) box documents relationships between items using typed referencing. If there is an item among a collection of items that is in some way to be considered the most important compared to others then this item is signaled by the Primary Item ('pitm') box. Apart from the boxes mentioned here, the 'meta' box is also flexible to include other boxes that may be necessary to describe items.

[0074] Any number of image items can be included in the same file. Given a collection images stored by using the 'meta' box approach, it sometimes is essential to qualify certain relationships between images. Examples of such relationships include indicating a cover image for a collection, providing thumbnail images for some or all of the images in the collection, and associating some or all of the images in a collection with auxiliary image such as an alpha plane. A cover image among the collection of images is indicated using the 'pitm' box. A thumbnail image or an auxiliary image is linked to the primary image item using an item reference of type 'thmb' or 'auxf, respectively.

[0075] The Item Reference (’iref) box has the following syntax:

aligned (8) class SingleltemTypeReferenceBox (referenceType) extends

Box (referenceType) {

unsigned int(16) from item ID;

unsigned int(16) reference count;

for (j=0; j <reference_count; j++) {

unsigned int(16) to item ID;

}

} aligned (8) class SingleltemTypeReferenceBoxLarge (referenceType) extends Box (referenceType) {

unsigned int(32) from item ID;

unsigned int(16) reference count;

for (j=0; j <reference_count; j++) {

unsigned int(32) to item ID;

}

aligned(8) class ItemReferenceBox extends FullBox ( ' iref ' , version, 0) { if (version==0) {

SingleltemTypeReferenceBox

references [ ] ;

} else if (version==l) {

SingleltemTypeReferenceBoxLarge references [] ;

}

where

reference_type contains an indication of the type of the reference

from_item_ID contains the item_ID of the item that refers to other items

reference_count is the number of references

to_item_ID contains the item_ID of the item referred to.

[0076] In some image item storage use cases, it is desirable to indicate that the order of referenced items in the‘iref box is strict and shall not be modified. This is particularly important in cases where an order-critical process is applied to the referenced items, wherein items that are referenced must be processed in a specific order so that the resulting bitstream after item data concatenation is a decodable video bitstream. One example of an order-critical process is predictively coded image items, which are now included in the Working Draft of upcoming HEIF standard amendment.

[0077] Although HEIF provides the mechanism to provide item references from one image item to other image items, it does not provide any mechanism to indicate that the referenced items must be processed in the order they are listed in the item reference box. Lack of such a mechanism may cause for example the following consequences:

File editing might cause item references to be reordered and thus readers might have incorrect interpretation of those item references that are strictly ordered.

If all item references were treated as strictly ordered in file editing, the editing processes would become less flexible. For example, removal of an item might not be possible unless the file editor correctly interprets all item references pointing to the item to be removed.

Rather than using item references, it may be possible to specify additional data structures to indicate the order of items such as properties or entity groups. However, such additional data structures may be specific to the type of the item reference and are therefore not generically applicable. Moreover, the additional data structures are located in the file separately from the item references and hence make file parsing more complicated.

[0078] It is remarked that the support for items is specified in ISOBMFF. This support includes the specification of MetaBox and its child boxes, such as ItemLocationBox ('iloc'), ItemlnfoBox (’iinf), ItemDataBox (’idaf), ItemReferenceBox (’iref),

ItemPropertyContainerBox ('ipco'), and ItemPropertyAssociationBox ('ipma'). Item-related features of ISOBMFF are included by reference in the HEIF standard.

[0079] Some paragraphs and embodiments are described in relation to HEIF. It needs to be understood that the paragraphs and embodiments similarly apply to other file formats, such as ISOBMFF.

[0080] Some paragraphs and embodiments are described in relation to ISOBMFF. It needs to be understood that the paragraphs and embodiments similarly apply to other file formats, such as HEIF.

[0081] Now an improved method for file authoring and related apparatus are introduced for alleviating the above problems.

[0082] The method according to an aspect, as shown in Figure 4, comprises authoring (400) items into a media file, said items being associated with a plurality of referenced items to be processed in a specific order; and authoring (402) one or more properties of item references of said referenced item into said media file, wherein said properties include one or more of the following:

indication if the item references are strictly ordered;

indication if the referenced items are removable without making a referencing item invalid;

a checksum generated from ID values of the referenced items in the order they are listed.

[0083] According to an embodiment, a checksum generation algorithm may be pre-defined e.g. in a standard or may be indicated in the file. For example, the MD5 checksum may be used. A file parser may derive the checksum from the referenced item ID values and compare the derived checksum value to the checksum value contained in the file to verify that the item references are intact.

[0084] A cryptographic hash function may be defined as a hash function that is intended to be practically impossible to invert, i.e. to create the input data based on the hash value alone. Cryptographic hash function may comprise e.g. the MD5 function. An MD5 value may be a null-terminated string of UTF-8 characters containing a base64 encoded MD5 digest of the input data. One method of calculating the string is specified in IETF RFC 1864. It should be understood that instead of or in addition to MD5, other types of integrity check schemes could be used in various embodiments, such as different forms of the cyclic redundancy check (CRC), such as the CRC scheme used in ITU-T Recommendation H.271.

[0085] A checksum or hash sum may be defined as a small-size datum from an arbitrary block of digital data which may be used for the purpose of detecting errors which may have been introduced during its transmission or storage. The actual procedure which yields the checksum, given a data input may be called a checksum function or checksum algorithm. A checksum algorithm will usually output a significantly different value, even for small changes made to the input. This is especially true of cryptographic hash functions, which may be used to detect many data corruption errors and verify overall data integrity; if the computed checksum for the current data input matches the stored value of a previously computed checksum, there is a high probability the data has not been altered or corrupted.

[0086] The term checksum may be defined to be equivalent to a cryptographic hash value or alike.

[0087] According to an embodiment, the properties are interpreted by a file editor to conclude which types of editing operations may be performed.

[0088] Hence, the method enables to indicate that the referenced items must be processed in the order they are listed in the item reference box. Thus, the file editor knows that the indicated item references shall not be reordered. By indicating only the item references which are to be treated as strictly ordered in file editing, the flexibility of the editing processes is increased.

[0089] New definitions and constraints for HEIF file format are provided for indicating the strict ordering of referenced items. Separate embodiments are presented for each of the following approaches:

1) New FullBox versions in ItemReferenceBox;

2) New FullBox flags in ItemReferenceBox;

3) New structure in ItemReferenceBox; 4) New Box definition;

5) New Box definitions for SingleltemTypeReferenceBox and

SingleltemTypeReferenceLargeBox;

6) Extending SingleltemTypeReferenceBox and SingleltemTypeReferenceLargeBox.

[0090] Embodiments are presented below with reference to particular syntax. It needs to be understood that the presented syntax is merely an example of a realizing the invention, and embodiments can be formed similarly by alternative syntax.

[0091] Each of the previously presented embodiments for the new definitions and constraints are discussed in more detailed manner in below. The detailed discussion provides information on the definition, syntax and semantics of each box.

[0092] 1) New FullBox versions in ItemReferenceBox

[0093] A new ItemReferenceBox syntax and data structure is defined in order to indicate strictly ordered item references. By introducing a new version, it is possible to indicate that strict ordering of image items is applied. Hence, required functionality is achieved with no additional data structure definitions other the item referencing.

[0094] A new version of the ItemReferenceBox is defined as follows:

aligned(8) class ItemReferenceBox extends FullBox (' iref ' , version, 0) { if (version==0) {

SingleltemTypeReferenceBox references [ ] ;

} else if (version==l) {

SingleltemTypeReferenceBoxLarge references [] ;

} else if (version==2) {

SingleltemTypeReferenceBox references [ ] ;

} else if (version==3) {

SingleltemTypeReferenceBoxLarge references [] ;

}

[0095] Thus, two new FullBox versions, referred to as version 2 and 3, are added to the ItemReferenceBox data structure. It is noted that version numbers 2 and 3 are given only as examples, and any other numbering may be used, as well.

[0096] If the version is equal to 2, then the references[] in SingleltemTypeReferenceBox are strictly ordered and their order shall not be modified/changed and referenced items shall be processed in the provided order.

[0097] If the version is equal to 3, then the references[] in

SingleltemTypeReferenceLargeBox are strictly ordered and their order shall not be modified/changed and referenced items shall be processed in the provided order. [0098] Zero or one ItemReferenceBoxes that have version 0 or 1 are allowed in a MetaBox to carry item references that are not strictly ordered. Zero or one ItemReferenceBoxes that have version 2 or 3 are allowed in a MetaBox to carry item references that are strictly ordered.

[0099] This is a backwards-compatible approach. A file reader which does not understand version 2 and version 3 shall not be able to process the contents of the item reference box with version equal to 2 or 3 but rather ignores the contents.

[0100] 2) New FullBox flags in ItemReferenceBox

[0101] In this approach, flags of the FullBox are utilized to indicate strict ordering of referenced items:

aligned(8) class ItemReferenceBox extends FullBox (' iref ' , version, flags) { if (version==0) {

SingleltemTypeReferenceBox references [ ] ;

} else if (version==l) {

SingleltemTypeReferenceBoxLarge references [] ;

}

[0102] Herein, flags value other than 0 indicates that all the items listed in this

ItemReferenceBox are strictly ordered and they shall be processed in the provided order.

[0103] 3) New structure in ItemReferenceBox

[0104] In this approach, a new data structure is included to the ItemReferenceBox to indicate the strict ordering of item references. Additionally, flags field or version field could be utilized to indicate the presence of the flag:

aligned(8) class ItemReferenceBox extends FullBox (' iref ' , version, 0) {

if (version==0) {

SingleltemTypeReferenceBox references [ ] ;

} else if (version==l) {

SingleltemTypeReferenceBoxLarge references [] ;

}

unsigned int(l) strict ordering flag;

unsigned int(7) reserved;

}

[0105] Herein, strict ordering flag indicates that the items listed in this ItemReferenceBox are strictly ordered and they shall be processed in the provided order. Again, the name of the flag and its location in the data structure is given as an example.

[0106] This is a non-backward compatible approach, where the legacy readers will not be able to interpret this data. [0107] In another embodiment, strict ordering is indicated inside a new Box data structure as follows:

aligned (8) class StrictOrderingBox ( 'stro' ) extends Box('stro') {

unsigned int(l) strict ordering flag;

unsigned int(7) reserved;

}

[0108] Herein, StrictOrderingBox encapsulates the strict ordering related data structure, and strict ordering flag indicates that the items listed in this ItemReferenceBox are strictly ordered and they shall be processed in the provided order.

aligned(8) class ItemReferenceBox extends FullBox ( ¹ iref ¹ , version, 0) { if (version==0) {

SingleltemTypeReferenceBox references [ ] ;

} else if (version==l) {

SingleltemTypeReferenceBoxLarge references [] ;

}

StrictOrderingBox StrictOrderlndicator ( ) ;

}

[0109] This approach is backwards compatible, since the data structure can be read and ignored by a reader which does not understand StrictOrderingBox.

[0110] Additionally, a new flag may be defined for the ItemReferenceBox in order to indicate the presence of the StrictOrderingBox.

[0111] 4) A new Box definition

[0112] In this approach, a new box called StrictltemReferenceBox may be defined as follows:

aligned(8) class StrictltemReferenceBox extends FullBox ( ¹ sref ¹ , version, flags) {

if (version==0) {

SingleltemTypeReferenceBox references [ ] ;

} else if (version==l) {

SingleltemTypeReferenceBoxLarge references [] ;

}

[0113] As in other embodiments described here, the name StrictltemReferenceBox is given only as an example. For signalling the presence of said box, a new 4CC code may be used. As an example,‘sref is used in the above data structure.

[01 14] 5) New Box definitions for SingleltemTypeReferenceBox and

singlcltcm 1 ypeReterenceLargeBox [0115] In this approach, the following two new boxes are defined in order to indicate strict ordering of items.

aligned (8) class SingleltemTypeStrictReferenceBox (referenceType) extends Box (referenceType) {

unsigned int(16) from item ID;

unsigned int(16) reference count;

for (j=0; j <reference_count; j++) {

unsigned int(16) to item ID;

}

aligned (8) class SingleltemTypeStrictReferenceBoxLarge (referenceType) extends Box (referenceType) {

unsigned int(32) from item ID;

unsigned int(16) reference count;

for (j=0; j <reference_count; j++) {

unsigned int(32) to item ID;

}

[01 16] Again, the names of the new boxes are given as examples. ItemReferenceBox can be extended with new versions in order to cover the strict ordering cases as follows:

SingleltemTypeReferenceBox references [ ] ;

} else if (version==l) {

SingleltemTypeReferenceBoxLarge references [] ;

} else if (version==2) {

SingleltemTypeStrictReferenceBox references [] ;

} else if (version==3) {

SingleltemTypeStrictReferenceBoxLarge references [] ;

}

[01 17] This approach is similar to Approach 1, but differing in the definition of new boxes that signal strict ordering of items.

[0118] According to another embodiment, ItemReferenceBox is allowed to carry child boxes of both indicating strict ordering and not indicating strict ordering of item references. The following syntax may be used:

aligned(8) class ItemReferenceBox extends FullBox (' iref ' , version, 0) { if (version==0 | | version==2) {

SingleltemTypeReferenceBox references [ ] ;

} else if (version==l | | version==3) { SingleltemTypeReferenceBoxLarge references [] ;

}

if (version==2) {

SingleltemTypeStrictReferenceBox ordered references [] ;

} else if (version==3) {

SingleltemTypeStrictReferenceBoxLarge ordered references [] ;

}

[01 19] According to yet another embodiment, ItemReferenceBox is allowed to carry child boxes of both indicating strict ordering and not indicating strict ordering of item references. The order of child boxes is not constrained and the version field of the ItemReferenceBox is merely used to differentiate between 16-bit and 32-bit item ID values. ShortltemldBox may be SingleltemTypeReferenceBox or SingleltemTypeStrictReferenceBox and need not be the same type of a box in each entry. LongltemldBox may be

SingleltemTypeReferenceBoxLarge or SingleltemTypeStrictReferenceBoxLarge and need not be the same of a box in each entry.

ShortltemldBox references [] ;

} else if (version==l) {

LongltemldBox references [] ;

}

[0120] 6) Extending SingleltemTypeReferenceBox and

SingleltemTypeReferenceLargeBox

[0121] In this approach, SingleltemTypeReferenceBox and

SingleltemTypeReferenceLargeBox are extended to indicate properties of the references. The properties may comprise but are not limited to one or more of the following:

Indication if the item references are strictly ordered.

Indication if one or more items indicated in the to item ID fields may be removed without making the item identified by the from item ID invalid.

A checksum that the file creator generated from the to item ID values in the order they are listed in the loop that lists the to item ID values. A checksum generation algorithm may be pre-defined e.g. in a standard or may be indicated in the file. For example, the MD5 checksum may be used. A file parser may derive the checksum from the to item ID values and compare the derived checksum value to the checksum value contained in the box to verify that the item references are intact. [0122] As an example, SingleltemTypeReferenceBox may be extended as follows. It needs to be understood that SingleltemTypeReferenceBoxLarge could be similarly extended. It is assumed that such extensions are backward-compatible, since existing readers would stop parsing immediately after the loop.

aligned (8) class SingleltemTypeReferenceBox (referenceType) extends

Box (referenceType) {

unsigned int(16) from item ID;

unsigned int(16) reference count;

for (j=0; j <reference_count; j++) {

unsigned int(16) to item ID;

}

unsigned int(l) strict ordering flag;

unsigned int(l) references essential flag;

unsigned int(l) md5 checksum flag;

unsigned int(21) reserved = 0;

if (md5 checksum flag)

string md5 string;

}

[0123] The semantics of the additional syntax elements may be specified for example as follows.

strict ordering flag equal to 1 indicates that the order of the to item ID values affects the interpretation of the item identified by the from item ID value

strict ordering flag equal to 0 indicates that the order of the to item ID values might or might not affect the interpretation of the item identified by the from item ID value references essential flag equal to 0 indicates that item(s) identified by to item ID value(s) may be removed without affecting the interpretation of the item identified by the from item ID value references essential flag equal to 1 indicates that removal of an item identified by any to item ID value affects the interpretation of the item identified by the from item ID value. Thus, if an item identified by any to item ID value is removed from the file, the item identified by the from item ID shall also be removed from the file.

md5_string is a null-terminated string of UTF-8 characters containing a base64 encoded MD5 digest of all the to item ID values in the order they are listed in the containing box.

[0124] The embodiments as disclosed herein may relate to but are not limited to image items, i.e. still images are stored as items. Figure 5 shows a block diagram of a video encoder suitable for obtaining such image items. Figure 5 presents an encoder for two layers, but it would be appreciated that presented encoder could be similarly extended to encode more than two layers. Figure 5 illustrates an embodiment of a video encoder comprising a first encoder section 500 for a base layer and a second encoder section 502 for an enhancement layer. Each of the first encoder section 500 and the second encoder section 502 may comprise similar elements for encoding incoming pictures. The encoder sections 500, 502 may comprise a pixel predictor 302, 402, prediction error encoder 303, 403 and prediction error decoder 304, 404. Figure 4 also shows an embodiment of the pixel predictor 302, 402 as comprising an inter-predictor 306, 406, an intra-predictor 308, 408, a mode selector 310, 410, a filter 316, 416, and a reference frame memory 318, 418. The pixel predictor 302 of the first encoder section 500 receives 300 base layer images of a video stream to be encoded at both the inter predictor 306 (which determines the difference between the image and a motion compensated reference frame 318) and the intra-predictor 308 (which determines a prediction for an image block based only on the already processed parts of current frame or picture). The output of both the inter-predictor and the intra-predictor are passed to the mode selector 310. The intra predictor 308 may have more than one intra-prediction modes. Hence, each mode may perform the intra-prediction and provide the predicted signal to the mode selector 310. The mode selector 310 also receives a copy of the base layer picture 300. Correspondingly, the pixel predictor 402 of the second encoder section 502 receives 400 enhancement layer images of a video stream to be encoded at both the inter-predictor 406 (which determines the difference between the image and a motion compensated reference frame 418) and the intra predictor 408 (which determines a prediction for an image block based only on the already processed parts of current frame or picture). The output of both the inter-predictor and the intra-predictor are passed to the mode selector 410. The intra-predictor 408 may have more than one intra-prediction modes. Hence, each mode may perform the intra-prediction and provide the predicted signal to the mode selector 410. The mode selector 410 also receives a copy of the enhancement layer picture 400.

[0125] Depending on which encoding mode is selected to encode the current block, the output of the inter-predictor 306, 406 or the output of one of the optional intra-predictor modes or the output of a surface encoder within the mode selector is passed to the output of the mode selector 310, 410. The output of the mode selector is passed to a first summing device 321, 421. The first summing device may subtract the output of the pixel predictor 302, 402 from the base layer picture 300/enhancement layer picture 400 to produce a first prediction error signal 320, 420 which is input to the prediction error encoder 303, 403. [0126] The pixel predictor 302, 402 further receives from a preliminary reconstructor 339, 439 the combination of the prediction representation of the image block 312, 412 and the output 338, 438 of the prediction error decoder 304, 404. The preliminary reconstructed image 314, 414 may be passed to the intra-predictor 308, 408 and to a filter 316, 416. The filter 316, 416 receiving the preliminary representation may filter the preliminary

representation and output a final reconstructed image 340, 440 which may be saved in a reference frame memory 318, 418. The reference frame memory 318 may be connected to the inter-predictor 306 to be used as the reference image against which a future base layer picture 300 is compared in inter-prediction operations. Subject to the base layer being selected and indicated to be source for inter-layer sample prediction and/or inter-layer motion information prediction of the enhancement layer according to some embodiments, the reference frame memory 318 may also be connected to the inter-predictor 406 to be used as the reference image against which a future enhancement layer pictures 400 is compared in inter-prediction operations. Moreover, the reference frame memory 418 may be connected to the inter predictor 406 to be used as the reference image against which a future enhancement layer picture 400 is compared in inter-prediction operations.

[0127] Filtering parameters from the filter 316 of the first encoder section 500 may be provided to the second encoder section 502 subject to the base layer being selected and indicated to be source for predicting the filtering parameters of the enhancement layer according to some embodiments.

[0128] The prediction error encoder 303, 403 comprises a transform unit 342, 442 and a quantizer 344, 444. The transform unit 342, 442 transforms the first prediction error signal 320, 420 to a transform domain. The transform is, for example, the DCT transform. The quantizer 344, 444 quantizes the transform domain signal, e.g. the DCT coefficients, to form quantized coefficients.

[0129] The prediction error decoder 304, 404 receives the output from the prediction error encoder 303, 403 and performs the opposite processes of the prediction error encoder 303,

403 to produce a decoded prediction error signal 338, 438 which, when combined with the prediction representation of the image block 312, 412 at the second summing device 339, 439, produces the preliminary reconstructed image 314, 414. The prediction error decoder may be considered to comprise a dequantizer 361, 461, which dequantizes the quantized coefficient values, e.g. DCT coefficients, to reconstruct the transform signal and an inverse

transformation unit 363, 463, which performs the inverse transformation to the reconstructed transform signal wherein the output of the inverse transformation unit 363, 463 contains reconstructed block(s). The prediction error decoder may also comprise a block filter which may filter the reconstructed block(s) according to further decoded information and filter parameters.

[0130] The entropy encoder 330, 430 receives the output of the prediction error encoder 303, 403 and may perform a suitable entropy encoding/variable length encoding on the signal to provide error detection and correction capability. The outputs of the entropy encoders 330, 430 may be inserted into a bitstream e.g. by a multiplexer 508.

[0131] Figure 6 shows a block diagram of a video decoder suitable for employing embodiments of the invention. Figure 6 depicts a structure of a two-layer decoder, but it would be appreciated that the decoding operations may similarly be employed in a single layer decoder.

[0132] The video decoder 550 comprises a first decoder section 552 for a base layer and a second decoder section 554 a predicted layer. Block 556 illustrates a demultiplexer for delivering information regarding base layer pictures to the first decoder section 552 and for delivering information regarding predicted layer pictures to the second decoder section 554. Reference P’n stands for a predicted representation of an image block. Reference D’n stands for a reconstructed prediction error signal. Blocks 704, 804 illustrate preliminary

reconstructed images (Fn). Reference R’n stands for a final reconstructed image. Blocks 703, 803 illustrate inverse transform Blocks 702, 802 illustrate inverse quantization (Q ¹). Blocks 701, 801 illustrate entropy decoding (E ¹). Blocks 705, 805 illustrate a reference frame memory (RFM). Blocks 706, 806 illustrate prediction (P) (either inter prediction or intra prediction). Blocks 707, 807 illustrate filtering (F). Blocks 708, 808 may be used to combine decoded prediction error information with predicted base layer/predicted layer images to obtain the preliminary reconstructed images (Fn). Preliminary reconstructed and filtered base layer images may be output 709 from the first decoder section 552 and preliminary reconstructed and filtered base layer images may be output 809 from the first decoder section 554.

[0133] Herein, the decoder should be interpreted to cover any operational unit capable to carry out the decoding operations, such as a player, a (file) editor, a receiver, a gateway, a demultiplexer and/or a decoder.

[0134] Figure 7 shows a flow chart of the operation of a file editor upon receiving the media file. The method comprises receiving (700) a media file authored with items comprising a plurality of referenced items to be processed in a specific order; reading (702) one or more properties of item references of said references items from said media file, wherein said properties include one or more of the following: indication if the item references are strictly ordered, indication if the referenced items are removable without making a referencing item invalid, a checksum generated from ID values of the referenced items in the order they are listed; and parsing (704) the media file parsing the media file according to said one or more properties.

[0135] In an embodiment, a file editor receives an instruction to remove a first item from a media file. The file editor checks from ItemReferenceBoxes if the first item is referenced by any other item in the media file. The file editor removes the first item if it is not referenced by any other item or if it is indicated in the media file that the first item is removable without making the referencing item(s) invalid.

[0136] In an embodiment, a file editor receives an instruction to reorder a first item in a first item reference. The file editor reorders the first item if the first item reference is not indicated to be strictly ordered.

[0137] Figure 8 is a graphical representation of an example multimedia communication system within which various embodiments may be implemented. A data source 1510 provides a source signal in an analog, uncompressed digital, or compressed digital format, or any combination of these formats. An encoder 1520 may include or be connected with a pre processing, such as data format conversion and/or filtering of the source signal. The encoder 1520 encodes the source signal into a coded media bitstream. It should be noted that a bitstream to be decoded may be received directly or indirectly from a remote device located within virtually any type of network. Additionally, the bitstream may be received from local hardware or software. The encoder 1520 may be capable of encoding more than one media type, such as audio and video, or more than one encoder 1520 may be required to code different media types of the source signal. The encoder 1520 may also get synthetically produced input, such as graphics and text, or it may be capable of producing coded bitstreams of synthetic media. In the following, only processing of one coded media bitstream of one media type is considered to simplify the description. It should be noted, however, that typically real-time broadcast services comprise several streams (typically at least one audio, video and text sub-titling stream). It should also be noted that the system may include many encoders, but in the figure only one encoder 1520 is represented to simplify the description without a lack of generality. It should be further understood that, although text and examples contained herein may specifically describe an encoding process, one skilled in the art would understand that the same concepts and principles also apply to the corresponding decoding process and vice versa. [0138] The coded media bitstream may be transferred to a storage 1530. The storage 1530 may comprise any type of mass memory to store the coded media bitstream. The format of the coded media bitstream in the storage 1530 may be an elementary self-contained bitstream format, or one or more coded media bitstreams may be encapsulated into a container file, or the coded media bitstream may be encapsulated into a Segment format suitable for DASH (or a similar streaming system) and stored as a sequence of Segments. If one or more media bitstreams are encapsulated in a container file, a file generator (not shown in the figure) may be used to store the one more media bitstreams in the file and create file format metadata, which may also be stored in the file. The encoder 1520 or the storage 1530 may comprise the file generator, or the file generator is operationally attached to either the encoder 1520 or the storage 1530. Some systems operate“live”, i.e. omit storage and transfer coded media bitstream from the encoder 1520 directly to the sender 1540. The coded media bitstream may then be transferred to the sender 1540, also referred to as the server, on a need basis. The format used in the transmission may be an elementary self-contained bitstream format, a packet stream format, a Segment format suitable for DASH (or a similar streaming system), or one or more coded media bitstreams may be encapsulated into a container file. The encoder 1520, the storage 1530, and the server 1540 may reside in the same physical device or they may be included in separate devices. The encoder 1520 and server 1540 may operate with live real-time content, in which case the coded media bitstream is typically not stored

permanently, but rather buffered for small periods of time in the content encoder 1520 and/or in the server 1540 to smooth out variations in processing delay, transfer delay, and coded media bitrate.

[0139] The server 1540 sends the coded media bitstream using a communication protocol stack. The stack may include but is not limited to one or more of Real-Time Transport Protocol (RTP), User Datagram Protocol (UDP), Hypertext Transfer Protocol (HTTP), Transmission Control Protocol (TCP), and Internet Protocol (IP). When the communication protocol stack is packet-oriented, the server 1540 encapsulates the coded media bitstream into packets. For example, when RTP is used, the server 1540 encapsulates the coded media bitstream into RTP packets according to an RTP payload format. Typically, each media type has a dedicated RTP payload format. It should be again noted that a system may contain more than one server 1540, but for the sake of simplicity, the following description only considers one server 1540.

[0140] If the media content is encapsulated in a container file for the storage 1530 or for inputting the data to the sender 1540, the sender 1540 may comprise or be operationally attached to a "sending file parser" (not shown in the figure). In particular, if the container file is not transmitted as such but at least one of the contained coded media bitstream is encapsulated for transport over a communication protocol, a sending file parser locates appropriate parts of the coded media bitstream to be conveyed over the communication protocol. The sending file parser may also help in creating the correct format for the communication protocol, such as packet headers and payloads. The multimedia container file may contain encapsulation instructions, such as hint tracks in the ISOBMFF, for

encapsulation of the at least one of the contained media bitstream on the communication protocol.

[0141] The server 1540 may or may not be connected to a gateway 1550 through a communication network, which may e.g. be a combination of a CDN, the Internet and/or one or more access networks. The gateway may also or alternatively be referred to as a middle- box. For DASH, the gateway may be an edge server (of a CDN) or a web proxy. It is noted that the system may generally comprise any number gateways or alike, but for the sake of simplicity, the following description only considers one gateway 1550. The gateway 1550 may perform different types of functions, such as translation of a packet stream according to one communication protocol stack to another communication protocol stack, merging and forking of data streams, and manipulation of data stream according to the downlink and/or receiver capabilities, such as controlling the bit rate of the forwarded stream according to prevailing downlink network conditions.

[0142] The system includes one or more receivers 1560, typically capable of receiving, de modulating, and de-capsulating the transmitted signal into a coded media bitstream. The coded media bitstream may be transferred to a recording storage 1570. The recording storage 1570 may comprise any type of mass memory to store the coded media bitstream. The recording storage 1570 may alternatively or additive ly comprise computation memory, such as random access memory. The format of the coded media bitstream in the recording storage 1570 may be an elementary self-contained bitstream format, or one or more coded media bitstreams may be encapsulated into a container file. If there are multiple coded media bitstreams, such as an audio stream and a video stream, associated with each other, a container file is typically used and the receiver 1560 comprises or is attached to a container file generator producing a container file from input streams. Some systems operate“live,” i.e. omit the recording storage 1570 and transfer coded media bitstream from the receiver 1560 directly to the decoder 1580. In some systems, only the most recent part of the recorded stream, e.g., the most recent 10-minute excerption of the recorded stream, is maintained in the recording storage 1570, while any earlier recorded data is discarded from the recording storage 1570.

[0143] The coded media bitstream may be transferred from the recording storage 1570 to the decoder 1580. If there are many coded media bitstreams, such as an audio stream and a video stream, associated with each other and encapsulated into a container file or a single media bitstream is encapsulated in a container file e.g. for easier access, a file parser (not shown in the figure) is used to decapsulate each coded media bitstream from the container file. The recording storage 1570 or a decoder 1580 may comprise the file parser, or the file parser is attached to either recording storage 1570 or the decoder 1580. It should also be noted that the system may include many decoders, but here only one decoder 1570 is discussed to simplify the description without a lack of generality

[0144] The coded media bitstream may be processed further by a decoder 1570, whose output is one or more uncompressed media streams. Finally, a Tenderer 1590 may reproduce the uncompressed media streams with a loudspeaker or a display, for example. The receiver 1560, recording storage 1570, decoder 1570, and Tenderer 1590 may reside in the same physical device or they may be included in separate devices.

[0145] A sender 1540 and/or a gateway 1550 may be configured to perform switching between different representations e.g. for switching between different viewports of 360- degree video content, view switching, bitrate adaptation and/or fast start-up, and/or a sender 1540 and/or a gateway 1550 may be configured to select the transmitted representation(s). Switching between different representations may take place for multiple reasons, such as to respond to requests of the receiver 1560 or prevailing conditions, such as throughput, of the network over which the bitstream is conveyed. In other words, the receiver 1560 may initiate switching between representations. A request from the receiver can be, e.g., a request for a Segment or a Subsegment from a different representation than earlier, a request for a change of transmitted scalability layers and/or sub-layers, or a change of a rendering device having different capabilities compared to the previous one. A request for a Segment may be an HTTP GET request. A request for a Subsegment may be an HTTP GET request with a byte range. Additionally or alternatively, bitrate adjustment or bitrate adaptation may be used for example for providing so-called fast start-up in streaming services, where the bitrate of the transmitted stream is lower than the channel bitrate after starting or random-accessing the streaming in order to start playback immediately and to achieve a buffer occupancy level that tolerates occasional packet delays and/or retransmissions. Bitrate adaptation may include multiple representation or layer up-switching and representation or layer down-switching operations taking place in various orders.

[0146] A decoder 1580 may be configured to perform switching between different representations e.g. for switching between different viewports of 360-degree video content, view switching, bitrate adaptation and/or fast start-up, and/or a decoder 1580 may be configured to select the transmitted representation(s). Switching between different representations may take place for multiple reasons, such as to achieve faster decoding operation or to adapt the transmitted bitstream, e.g. in terms of bitrate, to prevailing conditions, such as throughput, of the network over which the bitstream is conveyed. Faster decoding operation might be needed for example if the device including the decoder 1580 is multi-tasking and uses computing resources for other purposes than decoding the video bitstream. In another example, faster decoding operation might be needed when content is played back at a faster pace than the normal playback speed, e.g. twice or three times faster than conventional real-time playback rate.

[0147] In the above, where the example embodiments have been described with reference to a file generator or file writer, it needs to be understood that the resulting file and a file parser may have corresponding elements in them. Likewise, where the example embodiments have been described with reference to a file parser, it needs to be understood that the file writer may have structure and/or computer program for generating the file to be parsed by the file parser.

[0148] In the above, where the example embodiments have been described with reference to an encoder, it needs to be understood that the resulting bitstream and the decoder may have corresponding elements in them. Likewise, where the example embodiments have been described with reference to a decoder, it needs to be understood that the encoder may have structure and/or computer program for generating the bitstream to be decoded by the decoder.

[0149] The embodiments of the invention described above describe the codec in terms of separate encoder and decoder apparatus in order to assist the understanding of the processes involved. However, it would be appreciated that the apparatus, structures and operations may be implemented as a single encoder-decoder apparatus/structure/operation. Furthermore, it is possible that the coder and decoder may share some or all common elements.

[0150] Although the above examples describe embodiments of the invention operating within a codec within an electronic device, it would be appreciated that the invention as defined in the claims may be implemented as part of any video codec. Thus, for example, embodiments of the invention may be implemented in a video codec which may implement video coding over fixed or wired communication paths.

[0151] Thus, user equipment may comprise a video codec such as those described in embodiments of the invention above. It shall be appreciated that the term user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers.

[0152] Furthermore elements of a public land mobile network (PLMN) may also comprise video codecs as described above.

[0153] In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

[0154] The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.

[0155] The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on multi-core processor architecture, as non-limiting examples. [0156] Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.

[0157] Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.

[0158] The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar

modifications of the teachings of this invention will still fall within the scope of this invention.

Claims

CLAIMS:

1. A method comprising:

authoring items into a media file, said items being associated with a plurality of referenced items to be processed in a specific order; and

authoring one or more properties of item references of said referenced items into said media file, wherein said properties include one or more of the following:

indication if the item references are strictly ordered;

2. The method according to claim 1, further comprising

authoring said one or more properties of item references of said references items into a ItemReferenceBox according to ISO Base Media File Format (ISOBMFF).

3. An apparatus comprising

means for authoring items into a media file, said items being associated with a plurality of referenced items to be processed in a specific order; and

means for authoring one or more properties of item references of said referenced items into said media file, wherein said properties include one or more of the following: indication if the item references are strictly ordered;

4. The apparatus according to claim 3, further comprising

means for authoring said one or more properties of item references of said referenced items into a ItemReferenceBox according to ISO Base Media File Format (ISOBMFF).

5. The apparatus according to claim 4, wherein at least one further data structure is included in a syntax of the ItemReferenceBox to indicate the strictly ordered item references.

6. The apparatus according to claim 4, wherein flag of the syntax of the ItemReferenceBox are used to indicate the strictly ordered item references.

7. The apparatus according to claim 3, wherein at least one further box is defined in

accordance with ISOBMFF syntax to indicate the strictly ordered item references.

8. The apparatus according to any of claims 3 - 7, wherein the items are image items and the media file format is a High Efficiency Image File Format or a High Efficiency Image File compatible storage format.

9. The apparatus according to any of claims 3 - 8, wherein a checksum generation algorithm is pre-defined or indicated in the media file.

10. A method comprising :

receiving a media file authored with image items comprising a plurality of referenced items to be processed in a specific order;

reading one or more properties of item references of said references items from said media file, wherein said properties include one or more of the following: indication if the item references are strictly ordered, indication if the referenced items are removable without making a referencing item invalid, a checksum generated from ID values of the referenced items in the order they are listed; and

parsing the media file according to said one or more properties.

11. The method according to claim 10, further comprising

receiving an instruction to remove a first item from a media file;

checking from ItemReferenceBoxes if the first item is referenced by any other item in the media file; and

removing the first item if it is not referenced by any other item or if it is indicated in the media file that the first item is removable without making the referencing item(s) invalid.

12. The method according to claim 10, further comprising

receiving an instruction to reorder a first item in a first item reference; and reordering the first item if the first item reference is not indicated to be strictly ordered.

13. An apparatus comprising

means for receiving a media file authored with image items comprising a plurality of referenced items to be processed in a specific order;

means for reading one or more properties of item references of said references items from said media file, wherein said properties include one or more of the following: indication if the item references are strictly ordered, indication if the referenced items are removable without making a referencing item invalid, a checksum generated from ID values of the referenced items in the order they are listed; and

means for parsing the media file parsing the media file according to said one or more properties.

14. The apparatus according to claim 13, further comprising

means for receiving an instruction to remove a first item from a media file;

means for checking from ItemReferenceBoxes if the first item is referenced by any other item in the media file; and

means for removing the first item if it is not referenced by any other item or if it is indicated in the media file that the first item is removable without making the referencing item(s) invalid.

15. The apparatus according to claim 13, further comprising

means for receiving an instruction to reorder a first item in a first item reference; and

means for reordering the first item if the first item reference is not indicated to be strictly ordered.