WO2023194816A1 - Method and apparatus for track group entry information

Method and apparatus for track group entry information

Info

Publication number
WO2023194816A1
Authority
WO
WIPO (PCT)
Prior art keywords
track group
box
track
tracks
timed media
Prior art date
Application number
PCT/IB2023/052013
Other languages
English (en)
Inventor
Lukasz Kondrad
Lauri Aleksi ILOLA
Miska Matias Hannuksela
Kashyap KAMMACHI SREEDHAR
Original Assignee
Nokia Technologies Oy
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy
Publication of WO2023194816A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85 Assembly of content; Generation of multimedia applications
    • H04N 21/854 Content authoring
    • H04N 21/85406 Content authoring involving a specific file format, e.g. MP4 format
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N 21/262 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
    • H04N 21/26258 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists for generating a list of items to be played back in a given order, e.g. playlist, or scheduling item distribution according to such list
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N 21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N 21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Definitions

  • the examples and non-limiting embodiments relate generally to multimedia transport, and more particularly, to track group entry information.
  • An example apparatus includes: at least one processor; and at least one non-transitory memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform: indicate that the plurality of timed media tracks comprise a track group; create a track group entry information structure to indicate a dependency or relation between at least one of the plurality of timed media tracks of the track group or describe track group characteristics; associate the track group entry information structure with the track group; and store the track group entry information structure and the plurality of timed media tracks in the file.
  • the example apparatus may further include, wherein the apparatus is further caused to obtain a plurality of timed media bitstreams with a known dependency or relation; and encapsulate the plurality of timed media bitstreams as a plurality of timed media tracks in a file.
  • the example apparatus may further include, wherein the apparatus is further caused to provide or transmit the file to an entity, wherein the entity uses the track group entry information structure to understand the dependency or relation between at least one of the plurality of timed media tracks of the track group or the track group characteristics.
  • the example apparatus may further include, wherein the plurality of timed media tracks in the track group comprises one version of a media presentation which is made available for user selection.
  • the example apparatus may further include, wherein the plurality of timed media tracks in the track group comprises one version of a media presentation which is selected by the entity by an automatic process.
  • the example apparatus may further include, wherein the track group comprises information to indicate whether the track group is a default track among a plurality of track groups of the media presentation.
  • the example apparatus may further include, wherein the plurality of tracks within the track group are at least one of: simultaneously or substantially simultaneously decoded in a media session; or simultaneously or substantially simultaneously presented in the media session.
  • the example apparatus may further include, wherein the track group entry information structure comprises a track group entry box.
  • the example apparatus may further include, wherein the apparatus is further caused to create a track group description box to store at least one of a description of the dependency or relation between the plurality of timed media tracks of the track group or the track group characteristics.
  • the example apparatus may further include, wherein a movie box comprises the track group description box.
  • the example apparatus may further include, wherein a movie header box comprises the track group description box.
  • the example apparatus may further include, wherein the apparatus is further caused to define a new version of the movie header box to signal the presence of the track group description box in the movie header box.
  • the example apparatus may further include, wherein the apparatus is further caused to set a bit of a flag in the movie header box to signal the presence of the track group description box in the movie header box.
  • the example apparatus may further include, wherein the track group description box is a companion to a movie header box, and wherein the track group description box follows the movie header box within a movie box.
  • the example apparatus may further include, wherein the apparatus is further caused to provide the track group description box by using a track group entry box, wherein a track group entry box in the track group description box is uniquely identified by using a track group entry type and a track group id.
  • the example apparatus may further include, wherein the track group description box comprises one or more track group entry boxes with a unique track group entry type.
  • the example apparatus may further include, wherein a track group type box in a track is uniquely assigned to the track group entry box when the track group type box and the track group entry box comprise the corresponding track group type and track group entry type and when the track group type box and the track group entry box comprise the same track group id.
  • the example apparatus may further include, wherein the track group type box signals that a given track group type is associated with the track group entry box in the track group description box.
  • the example apparatus may further include, wherein the apparatus is further caused to use a version of the track group type box to indicate that the track group comprises an associated track group entry box.
  • the example apparatus may further include, wherein the apparatus is further caused to use a flags field of the track group type box to indicate that the track group comprises an associated track group entry box.
  • the example apparatus may further include, wherein the apparatus is caused to define a preselection track group entry box to provide information about the track group that is to be used by a parser to create a document for adaptive streaming; an illustrative sketch of such boxes follows.
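  • As an illustration only, the track group description box and track group entry box described above might be declared in ISOBMFF-style syntax along the following lines; the class names, the four-character code 'tgds', and the field layout are hypothetical sketches for this description, not definitions quoted from the publication:

        aligned(8) class TrackGroupDescriptionBox extends Box('tgds') {
            // contains one or more TrackGroupEntryBoxes, each uniquely
            // identified by its track_group_entry_type and track_group_id
        }

        aligned(8) class TrackGroupEntryBox(unsigned int(32) track_group_entry_type)
            extends FullBox(track_group_entry_type, version = 0, flags = 0) {
            // binds this entry to the TrackGroupTypeBoxes of the tracks that
            // carry the corresponding track_group_type and the same id
            unsigned int(32) track_group_id;
            // fields describing the dependency, relation, or characteristics
            // of the track group would follow, per entry type; a preselection
            // entry, for example, could carry the information a parser needs
            // to generate a document for adaptive streaming
        }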
  • An example method includes: indicating that the plurality of timed media tracks comprise a track group; creating a track group entry information structure for indicating a dependency or relation between at least one of the plurality of timed media tracks of the track group or describing track group characteristics; associating the track group entry information structure with the track group; and storing the track group entry information structure and the plurality of timed media tracks in the file.
  • the example method may further include obtaining a plurality of timed media bitstreams with a known dependency or relation; and encapsulating the plurality of timed media bitstreams as a plurality of timed media tracks in a file.
  • the example method may further include providing or transmitting the file to an entity, wherein the entity uses the track group entry information structure to understand the dependency or relation between at least one of the plurality of timed media tracks of the track group or the track group characteristics.
  • the example method may further include, wherein the plurality of timed media tracks in the track group comprises one version of a media presentation which is made available for user selection.
  • the example method may further include, wherein the plurality of timed media tracks in the track group comprises one version of a media presentation which is selected by the entity by an automatic process.
  • the example method may further include, wherein the track group comprises information to indicate whether the track group is a default track among a plurality of track groups of the media presentation.
  • the example method may further include, wherein the plurality of tracks within the track group are at least one of: simultaneously or substantially simultaneously decoded in a media session; or simultaneously or substantially simultaneously presented in the media session.
  • the example method may further include, wherein the track group entry information structure comprises a track group entry box.
  • the example method may further include creating a track group description box to store at least one of a description of the dependency or relation between the plurality of timed media tracks of the track group or the track group characteristics.
  • the example method may further include, wherein a movie box comprises the track group description box.
  • the example method may further include, wherein a movie header box comprises the track group description box.
  • the example method may further include defining a new version of the movie header box to signal the presence of the track group description box in the movie header box.
  • the example method may further include setting a bit of a flag in the movie header box to signal the presence of the track group description box in the movie header box.
  • the example method may further include, wherein the track group description box is a companion to a movie header box, and wherein the track group description box follows the movie header box within a movie box.
  • the example method may further include providing the track group description box by using a track group entry box, wherein a track group entry box in the track group description box is uniquely identified by using a track group entry type and a track group id.
  • the example method may further include, wherein the track group description box comprises one or more track group entry boxes with a unique track group entry type.
  • the example method may further include, wherein a track group type box in a track is uniquely assigned to the track group entry box when the track group type box and the track group entry box comprise the corresponding track group type and track group entry type and when the track group type box and the track group entry box comprise the same track group id.
  • the example method may further include, wherein the track group type box signals that a given track group type is associated with the track group entry box in the track group description box.
  • the example method may further include using a version of the track group type box to indicate that the track group comprises an associated track group entry box.
  • the example method may further include using a flags field of the track group type box to indicate that the track group comprises an associated track group entry box.
  • the example method may further include defining a preselection track group entry box to provide information about the track group that is to be used by a parser to create a document for adaptive streaming.
  • Another example apparatus includes: at least one processor; and at least one non-transitory memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform: receive a file, wherein an entity uses a track group entry information structure stored in the file to understand a dependency or relation between at least one of a plurality of timed media tracks of a track group or track group characteristics, and wherein: the plurality of timed media tracks are indicated to comprise the track group; the track group entry information structure is created to indicate the dependency or relation between at least one of the plurality of timed media tracks of the track group or describe the track group characteristics; the track group entry information structure is associated with the track group; and the plurality of timed media tracks are stored in the file; and decode or present the plurality of tracks within the track group in a media session.
  • the example apparatus may further include, wherein a plurality of timed media bitstreams are obtained with a known dependency or relation; and the plurality of timed media bitstreams are encapsulated as the plurality of timed media tracks in the file.
  • the example apparatus may further include, wherein the plurality of tracks within the track group are decoded or presented simultaneously or substantially simultaneously.
  • the example apparatus may further include, wherein the apparatus is further caused to perform the methods as described in any of the previous paragraphs.
  • An example computer readable medium includes program instructions for causing an apparatus to perform at least the following: indicate that the plurality of timed media tracks comprise a track group; create a track group entry information structure to indicate a dependency or relation between at least one of the plurality of timed media tracks of the track group or describe track group characteristics; associate the track group entry information structure with the track group; and store the track group entry information structure and the plurality of timed media tracks in the file.
  • the example computer readable medium may further include, wherein the computer readable medium further causes the apparatus to obtain a plurality of timed media bitstreams with a known dependency or relation; and encapsulate the plurality of timed media bitstreams as a plurality of timed media tracks in a file.
  • the example computer readable medium may further include, wherein the computer readable medium includes a non-transitory computer readable medium.
  • the example computer readable medium may further include, wherein the computer readable medium further causes the apparatus to perform the methods as described in any of the previous paragraphs.
  • Another example computer readable medium includes program instructions for causing an apparatus to perform at least the following: receive a file, wherein an entity uses a track group entry information structure stored in the file to understand a dependency or relation between at least one of a plurality of timed media tracks of a track group or track group characteristics, and wherein: the plurality of timed media tracks are indicated to comprise the track group; the track group entry information structure is created to indicate the dependency or relation between at least one of the plurality of timed media tracks of the track group or describe the track group characteristics; the track group entry information structure is associated with the track group; and the plurality of timed media tracks are stored in the file; and decode or present the plurality of tracks within the track group in a media session.
  • the example computer readable medium may further include, wherein a plurality of timed media bitstreams are obtained with a known dependency or relation; and wherein the plurality of timed media bitstreams are encapsulated as the plurality of timed media tracks in the file.
  • the example computer readable medium may further include, wherein the plurality of tracks within the track group are decoded or presented simultaneously or substantially simultaneously.
  • the example computer readable medium may further include, wherein the computer readable medium comprises a non-transitory computer readable medium.
  • the example computer readable medium may further include, wherein the computer readable medium further causes the apparatus to perform the methods as described in any of the previous paragraphs.
  • Yet another example apparatus includes: means for indicating that the plurality of timed media tracks comprise a track group; means for creating a track group entry information structure to indicate a dependency or relation between at least one of the plurality of timed media tracks of the track group or describe track group characteristics; means for associating the track group entry information structure with the track group; and means for storing the track group entry information structure and the plurality of timed media tracks in the file.
  • the example apparatus may further include means for obtaining a plurality of timed media bitstreams with a known dependency or relation; and means for encapsulating the plurality of timed media bitstreams as a plurality of timed media tracks in a file.
  • the example apparatus may further include, wherein the apparatus further comprises means for performing the methods as described in any of the previous paragraphs.
  • Still another example apparatus includes: means for receiving a file, wherein an entity uses a track group entry information structure stored in the file to understand a dependency or relation between at least one of a plurality of timed media tracks of a track group or track group characteristics, and wherein: the plurality of timed media tracks are indicated to comprise the track group; the track group entry information structure is created to indicate the dependency or relation between at least one of the plurality of timed media tracks of the track group or describe the track group characteristics; the track group entry information structure is associated with the track group; and the plurality of timed media tracks are stored in the file; and means for decoding or presenting the plurality of tracks within the track group in a media session.
  • the example apparatus may further include, wherein a plurality of timed media bitstreams are obtained with a known dependency or relation; and the plurality of timed media bitstreams are encapsulated as the plurality of timed media tracks in the file.
  • the example apparatus may further include, wherein the plurality of tracks within the track group are decoded or presented simultaneously or substantially simultaneously.
  • the example apparatus may further include, wherein the apparatus further comprises means for performing the methods as described in any of the previous paragraphs.
  • Another example method includes receiving a file, wherein an entity uses a track group entry information structure stored in the file to understand a dependency or relation between at least one of a plurality of timed media tracks of a track group or track group characteristics, and wherein: the plurality of timed media tracks are indicated to comprise the track group; the track group entry information structure is created to indicate the dependency or relation between at least one of the plurality of timed media tracks of the track group or describe the track group characteristics; the track group entry information structure is associated with the track group; and the plurality of timed media tracks are stored in the file; and decoding or presenting the plurality of tracks within the track group in a media session.
  • the example method may further include, wherein a plurality of timed media bitstreams are obtained with a known dependency or relation; and the plurality of timed media bitstreams are encapsulated as the plurality of timed media tracks in the file.
  • the example method may further include, wherein the plurality of tracks within the track group are decoded or presented simultaneously or substantially simultaneously.
  • FIG. 1 shows schematically an electronic device employing embodiments of the examples described herein.
  • FIG. 2 shows schematically a user equipment suitable for employing embodiments of the examples described herein.
  • FIG. 3 further shows schematically electronic devices employing embodiments of the examples described herein connected using wireless and wired network connections.
  • FIG. 4 shows schematically a block diagram of an encoder on a general level.
  • FIG. 5 illustrates a system configured to support streaming of media data from a source to a client device.
  • FIG. 6 is a block diagram of an apparatus that may be specifically configured in accordance with an example embodiment.
  • FIG. 7 is an example apparatus configured to implement mechanisms for tracking group entry information and/or processing track group entry information, in accordance with an embodiment.
  • FIG. 8 is an example method for tracking group entry information, in accordance with an embodiment.
  • FIG. 9 is an example method for processing track group entry information, in accordance with an embodiment.
  • FIG. 10 is a block diagram of one possible and non-limiting system in which the example embodiments may be practiced.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
  • AIoT AI-enabled IoT
  • a.k.a. also known as
  • EDRAP extended dependent random access point
  • eNB (or eNodeB) evolved Node B (for example, an LTE base station)
  • EN-DC E-UTRA-NR dual connectivity
  • en-gNB or En-gNB node providing NR user plane and control plane protocol terminations towards the UE, and acting as secondary node in EN-DC
  • E-UTRA evolved universal terrestrial radio access, for example, the LTE radio access technology
  • F1 or F1-C interface between CU and DU control interface
  • gNB (or gNodeB) base station for 5G/NR, for example, a node providing NR user plane and control plane protocol terminations towards the UE, and connected via the NG interface to the 5GC
  • H.222.0 MPEG-2 Systems is formally known as ISO/IEC 13818-1 and as ITU-T Rec. H.222.0
  • H.26x family of video coding standards in the domain of the ITU-T
  • LZMA2 simple container format that can include both uncompressed data and LZMA data
  • WLAN wireless local area network
  • circuitry refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even when the software or firmware is not physically present.
  • the term 'circuitry' also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware.
  • the term 'circuitry' as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
  • a 'computer-readable storage medium', which refers to a non-transitory physical storage medium (e.g., volatile or non-volatile memory device), may be differentiated from a 'computer-readable transmission medium', which refers to an electromagnetic signal.
  • a method, apparatus and computer program product are provided in accordance with an example embodiment in order to implement mechanisms for tracking group entry information and/or processing track group entry information.
  • FIG. 1 shows an example block diagram of an apparatus 50.
  • the apparatus may be an internet of things (IoT) apparatus configured to perform various functions, for example, gathering information by one or more sensors, receiving or transmitting information, analyzing information gathered or received by the apparatus, or the like.
  • the apparatus may comprise a video coding system, which may incorporate a codec.
  • FIG. 2 shows a layout of an apparatus according to an example embodiment. The elements of FIG. 1 and FIG. 2 will be explained next.
  • the apparatus 50 may for example be a mobile terminal or user equipment of a wireless communication system, a sensor device, a tag, or a lower power device.
  • embodiments of the examples described herein may be implemented within any electronic device or apparatus.
  • the apparatus 50 may comprise a housing 30 for incorporating and protecting the device.
  • the apparatus 50 may further comprise a display 32, for example, in the form of a liquid crystal display, light emitting diode display, organic light emitting diode display, and the like.
  • the display may be any suitable display technology suitable to display media or multimedia content, for example, an image or a video.
  • the apparatus 50 may further comprise a keypad 34.
  • any suitable data or user interface mechanism may be employed.
  • the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.
  • the apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input.
  • the apparatus 50 may further comprise an audio output device which in embodiments of the examples described herein may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection.
  • the apparatus 50 may also comprise a battery (or in other embodiments of the examples described herein the device may be powered by any suitable mobile energy device such as a solar cell, fuel cell, or clockwork generator).
  • the apparatus may further comprise a camera capable of recording or capturing images and/or video.
  • the apparatus 50 may further comprise an infrared port for short range line of sight communication to other devices. In other embodiments the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/firewire wired connection.
  • the apparatus 50 may comprise a controller 56, a processor or a processor circuitry for controlling the apparatus 50.
  • the controller 56 may be connected to a memory 58 which in embodiments of the examples described herein may store both data in the form of an image, audio data, video data, and/or may also store instructions for implementation on the controller 56.
  • the controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and/or decoding of audio, image, and/or video data or assisting in coding and/or decoding carried out by the controller.
  • the apparatus 50 may further comprise a card reader 48 and a smart card 46, for example, a UICC and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
  • the apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals, for example, for communication with a cellular communications network, a wireless communications system or a wireless local area network.
  • the apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and/or for receiving radio frequency signals from other apparatus(es).
  • the apparatus 50 may comprise a camera 42 capable of recording or detecting individual frames which are then passed to the codec 54 or the controller for processing.
  • the apparatus may receive the video image data for processing from another device prior to transmission and/or storage.
  • the apparatus 50 may also receive either wirelessly or by a wired connection the image for coding/decoding.
  • the structural elements of apparatus 50 described above represent examples of means for performing a corresponding function.
  • the system 10 comprises multiple communication devices which can communicate through one or more networks.
  • the system 10 may comprise any combination of wired or wireless networks including, but not limited to, a wireless cellular telephone network (such as a GSM, UMTS, CDMA, LTE, 4G, 5G network, and the like) , a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth® personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.
  • the system 10 may include both wired and wireless communication devices and/or apparatus 50 suitable for implementing embodiments of the examples described herein.
  • the system shown in FIG. 3 shows a mobile telephone network 11 and a representation of the Internet 28.
  • Connectivity to the Internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
  • the example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22.
  • the apparatus 50 may be stationary or mobile when carried by an individual who is moving.
  • the apparatus 50 may also be located in a mode of transport including, but not limited to, a car, a truck, a taxi, a bus, a train, a boat, an airplane, a bicycle, a motorcycle or any similar suitable mode of transport .
  • the embodiments may also be implemented in a set-top box; for example, a digital TV receiver, which may/may not have a display or wireless capabilities, in tablets or (laptop) personal computers (PC), which have hardware and/or software to process neural network data, in various operating systems, and in chipsets, processors, DSPs and/or embedded systems offering hardware/software based coding.
  • Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24.
  • the base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the internet 28.
  • the system may include additional communication devices and communication devices of various types.
  • the communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time divisional multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.11, 3GPP Narrowband IoT and any similar wireless communication technology.
  • a channel may refer either to a physical channel or to a logical channel.
  • a physical channel may refer to a physical transmission medium such as a wire
  • a logical channel may refer to a logical connection over a multiplexed medium, capable of conveying several logical channels.
  • a channel may be used for conveying an information signal, for example a bitstream, from one or several senders (or transmitters) to one or several receivers.
  • the embodiments may also be implemented in internet of things (IoT) devices.
  • the IoT may be defined, for example, as an interconnection of uniquely identifiable embedded computing devices within the existing Internet infrastructure.
  • IoT devices are provided with an IP address as a unique identifier.
  • the IoT devices may be provided with a radio transmitter, such as a WLAN or Bluetooth® transmitter or an RFID tag.
  • IoT devices may have access to an IP-based network via a wired network, such as an Ethernet-based network or a powerline connection (PLC).
  • the apparatuses of FIGs. 1 to 3 may be configured to perform encoding, decoding, signalling, and/or transporting of an image file format, in accordance with various embodiments.
  • An MPEG-2 transport stream (TS), specified in ISO/IEC 13818-1 or equivalently in ITU-T Recommendation H.222.0, is a format for carrying audio, video, and other media as well as program metadata or other metadata, in a multiplexed stream.
  • a packet identifier (PID) is used to identify an elementary stream (a.k.a. packetized elementary stream) within the TS.
  • Available media file format standards include ISO base media file format (ISO/IEC 14496-12, which may be abbreviated ISOBMFF) and file format for NAL unit structured video (ISO/IEC 14496-15) , which derives from the ISOBMFF.
  • A video codec includes an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can decompress the compressed video representation back into a viewable form.
  • a video encoder and/or a video decoder may also be separate from each other, for example, they need not form a codec.
  • the encoder discards some information in the original video sequence in order to represent the video in a more compact form (e.g., at a lower bitrate).
  • pixel values in a certain picture area are predicted, for example, by motion compensation means (finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or by spatial means (using the pixel values around the block to be coded in a specified manner) .
  • the prediction error, for example, the difference between the predicted block of pixels and the original block of pixels, is coded. This is typically done by transforming the difference in pixel values using a specified transform (for example, Discrete Cosine Transform (DCT) or a variant of it), quantizing the coefficients, and entropy coding the quantized coefficients.
  • by varying the fidelity of the quantization process, the encoder can control the balance between the accuracy of the pixel representation (picture quality) and the size of the resulting coded video representation (file size or transmission bitrate).
  • in temporal prediction, the sources of prediction are previously decoded pictures (a.k.a. reference pictures). In intra block copy (IBC), prediction is applied similarly to temporal prediction, but the reference picture is the current picture and only previously decoded samples can be referred to in the prediction process.
  • Inter-layer or inter-view prediction may be applied similarly to temporal prediction, but the reference picture is a decoded picture from another scalable layer or from another view, respectively.
  • in some cases, inter prediction may refer to temporal prediction only, while in other cases inter prediction may refer collectively to temporal prediction and any of intra block copy, inter-layer prediction, and inter-view prediction, provided that they are performed with the same or a similar process as temporal prediction.
  • Inter prediction or temporal prediction may sometimes be referred to as motion compensation or motion-compensated prediction.
  • Inter prediction which may also be referred to as temporal prediction, motion compensation, or motion-compensated prediction, reduces temporal redundancy.
  • in inter prediction, the sources of prediction are previously decoded pictures.
  • Intra prediction utilizes the fact that adjacent pixels within the same picture are likely to be correlated.
  • Intra prediction can be performed in spatial or transform domain, for example, either sample values or transform coefficients can be predicted. Intra prediction is typically exploited in intra coding, where no inter prediction is applied.
  • One outcome of the coding procedure is a set of coding parameters, such as motion vectors and quantized transform coefficients. Many parameters can be entropy-coded more efficiently when they are predicted first from spatially or temporally neighboring parameters. For example, a motion vector may be predicted from spatially adjacent motion vectors and only the difference relative to the motion vector predictor may be coded. Prediction of coding parameters and intra prediction may be collectively referred to as in-picture prediction.
  • FIG. 4 shows a block diagram of a general structure of a video encoder.
  • FIG. 4 presents an encoder for two layers, but it would be appreciated that presented encoder could be similarly extended to encode more than two layers.
  • FIG. 4 illustrates a video encoder comprising a first encoder section 500 for a base layer and a second encoder section 502 for an enhancement layer.
  • Each of the first encoder section 500 and the second encoder section 502 may comprise similar elements for encoding incoming pictures.
  • the encoder sections 500, 502 may comprise a pixel predictor 302, 402, prediction error encoder 303, 403 and prediction error decoder 304, 404.
  • FIG. 4 also shows an embodiment of the pixel predictor 302, 402 as comprising an inter-predictor 306, 406, an intra-predictor 308, 408, a mode selector 310, 410, a filter 316, 416, and a reference frame memory 318, 418.
  • the pixel predictor 302 of the first encoder section 500 receives base layer image(s) 300 of a video stream to be encoded at both the inter-predictor 306 (which determines the difference between the image and a motion compensated reference frame) and the intra-predictor 308 (which determines a prediction for an image block based only on the already processed parts of the current frame or picture).
  • the output of both the inter-predictor and the intra-predictor are passed to the mode selector 310.
  • the intra-predictor 308 may have more than one intra-prediction modes. Hence, each mode may perform the intraprediction and provide the predicted signal to the mode selector 310.
  • the mode selector 310 also receives a copy of the base layer image 300.
  • the pixel predictor 402 of the second encoder section 502 receives enhancement layer image(s) 400 of a video stream to be encoded at both the inter-predictor 406 (which determines the difference between the image and a motion compensated reference frame) and the intra-predictor 408 (which determines a prediction for an image block based only on the already processed parts of the current frame or picture).
  • the output of both the inter-predictor and the intra-predictor are passed to the mode selector 410.
  • the intra-predictor 408 may have more than one intra-prediction modes. Hence, each mode may perform the intra-prediction and provide the predicted signal to the mode selector 410.
  • the mode selector 410 also receives a copy of the enhancement layer image 400.
  • the output of the inter-predictor 306, 406 or the output of one of the optional intra-predictor modes or the output of a surface encoder within the mode selector is passed to the output of the mode selector 310, 410.
  • the output of the mode selector 310, 410 is passed to a first summing device 321, 421.
  • the first summing device may subtract the output of the pixel predictor 302, 402 from the base layer image 300 or the enhancement layer image 400 to produce a first prediction error signal 320, 420 which is input to the prediction error encoder 303, 403.
  • the pixel predictor 302, 402 further receives from a preliminary reconstructor 339, 439 the combination of the prediction representation of the image block 312, 412 and the output 338, 438 of the prediction error decoder 304, 404.
  • the preliminary reconstructed image 314, 414 may be passed to the intra-predictor 308, 408 and to a filter 316, 416.
  • the filter 316, 416 receiving the preliminary representation may filter the preliminary representation and output a final reconstructed image 340, 440 which may be saved in a reference frame memory 318, 418.
  • the reference frame memory 318 may be connected to the inter-predictor 306 to be used as the reference image against which a future base layer image 300 is compared in inter-prediction operations.
  • the reference frame memory 318 may also be connected to the inter-predictor 406 to be used as the reference image against which future enhancement layer images 400 are compared in inter-prediction operations. Moreover, the reference frame memory 418 may be connected to the inter-predictor 406 to be used as the reference image against which a future enhancement layer image 400 is compared in inter-prediction operations.
  • Filtering parameters from the filter 316 of the first encoder section 500 may be provided to the second encoder section 502, subject to the base layer being selected and indicated to be the source for predicting the filtering parameters of the enhancement layer, according to some embodiments.
  • the prediction error encoder 303, 403 comprises a transform unit 342, 442 and a quantizer 344, 444.
  • the transform unit 342, 442 transforms the first prediction error signal 320, 420 to a transform domain.
  • the transform is, for example, the DCT transform.
  • the quantizer 344, 444 quantizes the transform domain signal, for example, the DCT coefficients, to form quantized coefficients.
  • the prediction error decoder 304, 404 receives the output from the prediction error encoder 303, 403 and performs the opposite processes of the prediction error encoder 303, 403 to produce a decoded prediction error signal 338, 438 which, when combined with the prediction representation of the image block 312, 412 at the second summing device 339, 439, produces the preliminary reconstructed image 314, 414.
  • the prediction error decoder may be considered to comprise a dequantizer 346, 446, which dequantizes the quantized coefficient values, for example, DCT coefficients, to reconstruct the transform signal, and an inverse transformation unit 348, 448, which performs the inverse transformation to the reconstructed transform signal, wherein the output of the inverse transformation unit 348, 448 includes reconstructed block(s).
  • the prediction error decoder may also comprise a block filter which may filter the reconstructed block(s) according to further decoded information and filter parameters.
  • the entropy encoder 330, 430 receives the output of the prediction error encoder 303, 403 and may perform a suitable entropy encoding/variable length encoding on the signal to provide error detection and correction capability.
  • the outputs of the entropy encoders 330, 430 may be inserted into a bitstream, for example, by a multiplexer 508.
  • the method and apparatus of an example embodiment may be utilized in a wide variety of systems, including systems that rely upon the compression and decompression of media data and possibly also the associated metadata.
  • the method and apparatus are configured to compress the media data and associated metadata streamed from a source via a content delivery network to a client device, at which point the compressed media data and associated metadata is decompressed or otherwise processed.
  • FIG. 5 depicts an example of such a system 510 that includes a source 512 of media data and associated metadata.
  • the source may be, in one embodiment, a server. However, the source may be embodied in other manners when desired.
  • the source is configured to stream the media data and associated metadata to the client device 514.
  • the client device may be embodied by a media player, a multimedia system, a video system, a smart phone, a mobile telephone or other user equipment, a personal computer, a tablet computer or any other computing device configured to receive and decompress the media data and process associated metadata.
  • media data and metadata are streamed via a network 516, such as any of a wide variety of types of wireless networks and/or wireline networks.
  • the client device is configured to receive structured information including media, metadata and any other relevant representation of information including the media and the metadata and to decompress the media data and process the associated metadata (e.g. for proper playback timing of decompressed media data) .
  • An apparatus 600 is provided in accordance with an example embodiment as shown in FIG. 6.
  • the apparatus of FIG. 6 may be embodied by the source 512, such as a file writer which, in turn, may be embodied by a server, that is configured to stream a compressed representation of the media data and associated metadata.
  • the apparatus may be embodied by a client device 514, such as a file reader which may be embodied, for example, by any of the various computing devices described above.
  • the apparatus of an example embodiment is associated with or is in communication with a processing circuitry 602, one or more memory devices 604, a communication interface 606, and optionally a user interface.
  • the processing circuitry 602 may be in communication with the memory device 604 via a bus for passing information among components of the apparatus 600.
  • the memory device may be non-transitory and may include, for example, one or more volatile and/or non-volatile memories.
  • the memory device may be an electronic storage device (e.g., a computer readable storage medium) comprising gates configured to store data (e.g., bits) that may be retrievable by a machine (e.g., a computing device like the processing circuitry).
  • the memory device may be configured to store information, data, content, applications, instructions, or the like for enabling the apparatus to carry out various functions in accordance with an example embodiment of the present disclosure.
  • the apparatus 600 may, in some embodiments, be embodied in various computing devices as described above. However, in some embodiments, the apparatus may be embodied as a chip or chip set. In other words, the apparatus may comprise one or more physical packages (e.g., chips) including materials, components and/or wires on a structural assembly (e.g., a baseboard).
  • the structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon.
  • the apparatus may therefore, in some cases, be configured to implement an embodiment of the present disclosure on a single chip or as a single 'system on a chip'.
  • a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein.
  • the processing circuitry 602 may be embodied in a number of different ways.
  • the processing circuitry may be embodied as one or more of various hardware processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.
  • the processing circuitry may include one or more processing cores configured to perform independently.
  • a multi-core processing circuitry may enable multiprocessing within a single physical package.
  • the processing circuitry may include one or more processors configured in tandem via the bus to enable independent execution of instructions, pipelining and/or multithreading.
  • the processing circuitry 602 may be configured to execute instructions stored in the memory device 604 or otherwise accessible to the processing circuitry. Alternatively or additionally, the processing circuitry may be configured to execute hard coded functionality. As such, whether configured by hardware or software methods, or by a combination thereof, the processing circuitry may represent an entity (e.g., physically embodied in circuitry) capable of performing operations according to an embodiment of the present disclosure while configured accordingly. Thus, for example, when the processing circuitry is embodied as an ASIC, FPGA or the like, the processing circuitry may be specifically configured hardware for conducting the operations described herein.
  • the instructions may specifically configure the processing circuitry to perform the algorithms and/or operations described herein when the instructions are executed.
  • the processing circuitry may be a processor of a specific device (e.g., an image or video processing system) configured to employ an embodiment by further configuration of the processing circuitry by instructions for performing the algorithms and/or operations described herein.
  • the processing circuitry may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processing circuitry.
  • the communication interface 606 may be any means such as a device or circuitry embodied in either hardware or a combination of hardware and software that is configured to receive and/or transmit data, including video bitstreams.
  • the communication interface may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network. Additionally or alternatively, the communication interface may include the circuitry for interacting with the antenna(s) to cause transmission of signals via the antenna(s) or to handle receipt of signals received via the antenna(s). In some environments, the communication interface may alternatively or also support wired communication.
  • the communication interface may include a communication modem and/or other hardware/software for supporting communication via cable, digital subscriber line (DSL), universal serial bus (USB) or other mechanisms.
  • the apparatus 600 may optionally include a user interface that may, in turn, be in communication with the processing circuitry 602 to provide output to a user, such as by outputting an encoded video bitstream and, in some embodiments, to receive an indication of a user input.
  • the user interface may include a display and, in some embodiments, may also include a keyboard, a mouse, a joystick, a touch screen, touch areas, soft keys, a microphone, a speaker, or other input/output mechanisms.
  • the processing circuitry may comprise user interface circuitry configured to control at least some functions of one or more user interface elements such as a display and, in some embodiments, a speaker, ringer, microphone and/or the like.
  • the processing circuitry and/or user interface circuitry comprising the processing circuitry may be configured to control one or more functions of one or more user interface elements through computer program instructions (e.g., software and/or firmware) stored on a memory accessible to the processing circuitry (e.g., memory device, and/or the like).
  • Some of the available media file format standards include the ISO base media file format (ISO/IEC 14496-12, which may be abbreviated ISOBMFF) and the file format for NAL unit structured video (ISO/IEC 14496-15), which derives from the ISOBMFF.
  • a basic building block in the ISO base media file format is called a box.
  • Each box has a header and a payload.
  • the box header indicates the type of the box and the size of the box in terms of bytes.
  • a box may enclose other boxes, and the ISO file format specifies which box types are allowed within a box of a certain type. Furthermore, the presence of some boxes may be mandatory in each file, while the presence of other boxes may be optional. Additionally, for some box types, it may be allowable to have more than one box present in a file. Thus, the ISO base media file format may be considered to specify a hierarchical structure of boxes.
  • a file includes media data and metadata that are encapsulated into boxes. Each box is identified by a four character code (4CC) and starts with a header which informs about the type and size of the box.
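  • For reference, the generic box header specified in ISO/IEC 14496-12 can be summarized in the file format's syntax description language roughly as follows; this is a simplified sketch, with the 'uuid' extended-type variant omitted:

        aligned(8) class Box (unsigned int(32) boxtype) {
            unsigned int(32) size;            // size of the box in bytes, header included
            unsigned int(32) type = boxtype;  // the four-character code (4CC)
            if (size == 1) {
                unsigned int(64) largesize;   // used when 32 bits do not suffice
            } else if (size == 0) {
                // the box extends to the end of the file
            }
            // the box payload, possibly including further contained boxes, follows
        }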
  • the media data may be provided in a media data 'mdat' box (also called MediaDataBox) and the movie 'moov' box (also called MovieBox) may be used to enclose the metadata.
  • both of the 'mdat' and 'moov' boxes may be required to be present.
  • the movie 'moov' box may include one or more tracks, and each track may reside in one corresponding track 'trak' box (may also be called TrackBox).
  • a track may be one of the many types, including a media track that refers to samples formatted according to a media compression format (and its encapsulation to the ISO base media file format) .
  • Tracks may share a particular characteristic or have a particular relationship; to indicate this, the ISO base media file format uses a TrackGroupBox comprised in a TrackBox.
  • TrackGroupBox includes zero or more boxes, and the particular characteristic or the relationship is indicated by the box type of the contained boxes.
  • the contained boxes include an identifier, which may be used to determine the tracks belonging to the same track group.
  • the tracks that include the same type of a contained box within the TrackGroupBox and have the same identifier value within these contained boxes belong to the same track group.
  • Track groups are not used to indicate dependency or relationships between tracks. Instead, the TrackReferenceBox is used for such purposes.
  • the syntax of the TrackGroupBox and the TrackGroupTypeBox contained in it is approximately the following (a non-normative sketch of the ISOBMFF definition; see ISO/IEC 14496-12 for the exact syntax):
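      aligned(8) class TrackGroupBox extends Box('trgr') {
          // contains zero or more TrackGroupTypeBoxes
      }

      aligned(8) class TrackGroupTypeBox(unsigned int(32) track_group_type)
          extends FullBox(track_group_type, version = 0, flags = 0) {
          unsigned int(32) track_group_id;
          // the remaining data, if any, is specified per track_group_type;
          // tracks whose TrackGroupTypeBoxes have the same box type and the
          // same track_group_id belong to the same track group
      }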
  • ISO base media file format also allows storing non-timed objects, referred to as items, meta items, or metadata items, in a 'meta' box (also called MetaBox). While the name of the meta box refers to metadata, items may generally include metadata or media data.
  • the MetaBox may reside at the top level of the file, within a 'moov' box (may also be called MovieBox), and within a TrackBox, but at most one MetaBox may occur at each of the file level, movie level, or track level.
  • the MetaBox may be required to include a 'hdlr' box indicating the structure or format of the MetaBox contents.
  • the MetaBox may list and characterize any number of items that may be referred to.
  • Each item may be associated with a file name and may be uniquely identified within the file by an item identifier (item_id), which may be an integer value.
  • the metadata items may be, for example, stored in an 'idat' box of the MetaBox, stored in a MediaDataBox, or reside in a separate file.
  • Items may share a particular characteristic or have a particular relationship; to indicate this, the ISO base media file format uses an EntityToGroupBox included in a GroupsListBox.
  • An entity group is a grouping of items, which may also group tracks.
  • Entity groups are indicated in a GroupsListBox.
  • Entity groups specified in a GroupsListBox of a file-level MetaBox refer to tracks or file-level items.
  • Entity groups specified in a GroupsListBox of a movie-level MetaBox refer to movie-level items.
  • Entity groups specified in a GroupsListBox of a track-level MetaBox refer to track-level items of that track.
  • GroupsListBox includes EntityToGroupBoxes, each specifying one entity group.
  • the four-character box type of EntityToGroupBox denotes a defined grouping type.
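  • for illustration, the EntityToGroupBox syntax may be sketched as follows (a non-normative sketch of the ISOBMFF definition):

      aligned(8) class EntityToGroupBox(grouping_type, version, flags)
          extends FullBox(grouping_type, version, flags) {
          unsigned int(32) group_id;                // identifier of the entity group
          unsigned int(32) num_entities_in_group;
          for (i = 0; i < num_entities_in_group; i++)
              unsigned int(32) entity_id;           // resolves to an item or a track
          // the remaining data, if any, is specified per grouping_type
      }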
  • HTTP is easy to configure and is typically granted traversal of firewalls and network address translators (NAT) , which makes it attractive for multimedia streaming applications.
  • Chunked HTTP delivery enables servers to respond to an HTTP GET request in multiple parts. However, chunked HTTP delivery does not remove the inherent encoding and encapsulation delay caused by creating self-standing movie fragments. Chunked HTTP delivery is specified in IETF RFC 7230.
  • Adaptive HTTP streaming was first standardized in Release 9 of 3rd Generation Partnership Project (3GPP) packet-switched streaming (PSS) service (3GPP TS 26.234 Release 9: 'Transparent end-to-end packet-switched streaming service (PSS) ; protocols and codecs' ) .
  • MPEG took 3GPP AHS (adaptive HTTP streaming) Release 9 as a starting point for the MPEG DASH standard (ISO/IEC 23009-1: 'Dynamic adaptive streaming over HTTP (DASH) - Part 1: Media presentation description and segment formats,' International Standard, 2nd Edition, 2014).
  • 3GPP continued to work on adaptive HTTP streaming in communication with MPEG and published 3GP-DASH (Dynamic Adaptive Streaming over HTTP; 3GPP TS 26.247: 'Transparent end-to-end packet-switched streaming Service (PSS); Progressive download and dynamic adaptive Streaming over HTTP (3GP-DASH)').
  • MPEG DASH and 3GP-DASH are technically close to each other and may therefore be collectively referred to as DASH.
  • Streaming systems similar to MPEG-DASH include, for example, HTTP Live Streaming (a.k.a. HLS), specified in IETF RFC 8216.
  • the multimedia content may be stored on an HTTP server and may be delivered using HTTP.
  • the content may be stored on the server in two parts: Media Presentation Description (MPD) , which describes a manifest of the available content, its various alternatives, their URL addresses, and other characteristics; and segments, which include the actual multimedia bitstreams in the form of chunks, in a single or multiple files.
  • the MPD provides the necessary information for clients to establish dynamic adaptive streaming over HTTP.
  • the MPD includes information describing the media presentation, such as an HTTP uniform resource locator (URL) of each Segment for making a GET Segment request.
  • the DASH client may obtain the MPD e.g. by using HTTP, email, thumb drive, broadcast, or other transport methods.
  • the DASH client may become aware of the program timing, media-content availability, media types, resolutions, minimum and maximum bandwidths, and the existence of various encoded alternatives of multimedia components, accessibility features and required digital rights management (DRM) , media-component locations on the network, and other content characteristics. Using this information, the DASH client may select the appropriate encoded alternative and start streaming the content by fetching the segments using e.g. HTTP GET requests. After appropriate buffering to allow for network throughput variations, the client may continue fetching the subsequent segments and monitor the network bandwidth fluctuations. The client may decide how to adapt to the available bandwidth by fetching segments of different alternatives (with lower or higher bitrates) to maintain an adequate buffer.
  • a media content component or a media component may be defined as one continuous component of the media content with an assigned media component type that can be encoded individually into a media stream.
  • Media content may be defined as one media content period or a contiguous sequence of media content periods.
  • Media content component type may be defined as a single type of media content such as audio, video, or text.
  • a media stream may be defined as an encoded version of a media content component.
  • a hierarchical data model may be used to structure media presentation as follows.
  • a media presentation includes a sequence of one or more periods, each period includes one or more groups, each group includes one or more adaptation sets, each adaptation set includes one or more representations, each representation includes one or more segments.
  • a Group may be defined as a collection of adaptation sets that are not expected to be presented simultaneously.
  • An adaptation set may be defined as a set of interchangeable encoded versions of one or several media content components.
  • a representation is one of the alternative choices of the media content or a subset thereof typically differing by the encoding choice, e.g. by bitrate, resolution, language, codec, and the like.
  • the Segment includes a certain duration of media data, and metadata to decode and present the included media content.
  • a Segment is identified by a URI and may typically be requested by, e.g., a HTTP GET request.
  • a segment may be defined as a unit of data associated with an HTTP-URL and optionally a byte range that are specified by an MPD.
  • An initialization segment may be defined as a Segment including metadata that is necessary to present the media streams encapsulated in Media Segments.
  • an initialization segment may comprise the Movie Box ('moov') which might not include metadata for any samples, e.g., any metadata for samples is provided in 'moof' boxes.
  • a media segment may include a certain duration of media data for playback at a normal speed; such a duration is referred to as media segment duration or segment duration.
  • the content producer or service provider may select the segment duration according to the desired characteristics of the service. For example, a relatively short segment duration may be used in a live service to achieve a short end-to-end latency. The reason is that Segment duration is typically a lower bound on the end-to-end latency perceived by a DASH client since a segment is a discrete unit of generating media data for DASH. Content generation is typically done in such a manner that a whole Segment of media data is made available for a server. Furthermore, many client implementations use a segment as the unit for GET requests.
  • a Segment can be requested by a DASH client only when the whole duration of the Media Segment is available as well as encoded and encapsulated into a Segment.
  • different strategies of selecting segment duration may be used.
  • a segment may be further partitioned into subsegments e.g. to enable downloading segments in multiple parts.
  • the subsegments may be required to include complete access units.
  • the subsegments may be indexed by a segment index box, which includes information to map presentation time range and byte range for each subsegment.
  • the segment index box may also describe subsegments and stream access points in the segment by signaling their durations and byte offsets.
  • a DASH client may use the information obtained from segment index box(es) to make an HTTP GET request for a specific subsegment using a byte-range HTTP request.
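  • for illustration, the segment index box ('sidx') may be sketched as follows (a non-normative sketch of the ISOBMFF definition):

      aligned(8) class SegmentIndexBox extends FullBox('sidx', version, 0) {
          unsigned int(32) reference_ID;
          unsigned int(32) timescale;               // ticks per second for the time fields
          if (version == 0) {
              unsigned int(32) earliest_presentation_time;
              unsigned int(32) first_offset;
          } else {
              unsigned int(64) earliest_presentation_time;
              unsigned int(64) first_offset;
          }
          unsigned int(16) reserved = 0;
          unsigned int(16) reference_count;
          for (i = 1; i <= reference_count; i++) {
              bit(1)           reference_type;      // 1: references another 'sidx'
              unsigned int(31) referenced_size;     // byte range of the subsegment
              unsigned int(32) subsegment_duration; // presentation time range
              bit(1)           starts_with_SAP;     // subsegment starts with a SAP
              unsigned int(3)  SAP_type;
              unsigned int(28) SAP_delta_time;
          }
      }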
  • for a long segment duration (e.g., a segment of duration greater than 7,000 milliseconds), subsegments may be used to keep the size of HTTP responses reasonable and flexible for bitrate adaptation.
  • the indexing information of a segment may be put in a single box at the beginning of that segment or spread among many indexing boxes in the segment. Different methods of spreading are possible, such as hierarchical, daisy chain, and hybrid. This technique may avoid adding a large box at the beginning of the segment and therefore may prevent a possible initial download delay.
  • DASH supports rate adaptation by dynamically requesting media segments from different representations within an adaptation set to match varying network bandwidth.
  • coding dependencies within the representation have to be taken into account.
  • a representation switch may only happen at a random access point (RAP) , which is typically used in video coding techniques such as H.264/AVC.
  • a stream access point (SAP) is specified as a position in a representation that enables playback of a media stream to be started using only the information included in representation data starting from that position onwards (preceded by initializing data in the initialization segment, when present).
  • representation switching may be performed at an SAP.
  • DASH provides a preselection concept, which allows grouping a subset of media components in a bundle that are expected to be consumed jointly.
  • a bundle is a set of media components which may be consumed jointly by a single decoder instance.
  • Elements are addressable and separable components of a bundle and may be selected or deselected dynamically by the application, either directly or indirectly by using preselections.
  • Media components are mapped to adaptation sets by either a one-to-one mapping or by the inclusion of multiple media components in a single adaptation set.
  • representations in one adaptation set may include multiple media components that are multiplexed on elementary stream level or on file container level. In the multiplexing case each media component is mapped to a Media Content component.
  • Each media component in the bundle is therefore identified and referenced by the '@id' of a Media Content component, or, when only a single media component is included in the adaptation set, by the '@id' of an adaptation set.
  • Each bundle includes a main media component that includes the decoder specific information and bootstraps the decoder.
  • the adaptation set that includes the main media component is referred to as main adaptation set.
  • the main media component shall always be included in any Preselection that is associated to a bundle.
  • each bundle may include one or multiple partial adaptation sets. The partial adaptation sets may only be processed in combination with the main adaptation set.
  • a Preselection defines a subset of media components in a bundle that are expected to be consumed jointly.
  • a preselection is identified by a unique tag towards the decoder. Multiple preselection instances can refer to the same set of streams in a bundle. Only media components of the same bundle can contribute to the decoding and rendering of a preselection.
  • An end-to-end system for DASH may be described as follows.
  • the media content is provided by an origin server, which may be a conventional web (HTTP) server.
  • the origin server may be connected with a content delivery network (CDN) over which the streamed content is delivered to and stored in edge servers.
  • CDN content delivery network
  • the MPD allows signaling of multiple base URLs for the content, which may be used to announce the availability of the content in different edge servers.
  • the content server may be directly connected to the Internet.
  • Web proxies may reside on the path of routing the HTTP traffic between the DASH clients and the origin or edge server from which the content is requested. The web proxies cache HTTP messages and hence can serve clients' requests with the cached content.
  • DASH clients are connected to the Internet through an access network, such as a mobile cellular network.
  • the mobile network may comprise mobile edge servers or mobile edge cloud, operating similarly to a CDN edge server and/or web proxy.
  • the ISO/IEC SC29 WG3 is working on specifying additional signaling to represent the MPEG DASH preselection concept on ISOBMFF level. This is in line with the preselection requirements described in exploration on Multi-stream Support for CMAF (WG03N0213_20275) . This metadata may be used to construct manifests for DASH and HLS when parsing ISOBMFF.
  • Various embodiments provide a method and/or an apparatus to:
  o obtain a plurality of timed media bitstreams with a known dependency, relation, or association;
  o encapsulate the plurality of timed media bitstreams as a plurality of timed media tracks in a file;
  o indicate that the plurality of timed media tracks form a track group;
  o create a track group entry information structure indicating the dependency or relation between the tracks of the track group and/or describing track group characteristics. Some examples of the track group characteristics include, but are not limited to, a priority of preselection, a label, a language, and an audio rendering indication;
  o associate the track group entry information structure with the track group;
  o store the track group entry information structure and the tracks in a file; and/or
  o provide or transmit the file to another entity, which can use the track group entry information structure to understand the dependency or relation between the plurality of timed media tracks of the track group and/or the track group characteristics.
  • Some example advantages include:
  o Parsers have fast access to some or all information about groups;
  o In case of a preselection, a parser can decide based on one box which one to select;
  o No duplication of data, as the information is stored in one place, and this results in a reduced size of the file; and/or
  o When tracks are added or removed, a parser needs to update information only in a single place, and hence file editing becomes simpler.
  • Various embodiments provide a method and apparatus that create a number of tracks, group the tracks based on their characteristics, and provide group entry information that may be used by another entity to understand the dependency or relation between the plurality of timed media tracks and/or the track group characteristics.
  • another entity may be a DASH/HLS packager that may translate the data provided by the group entry information in an ISOBMFF file into the outgoing manifest format.
  • another entity may be a media player that may expose the group entry information in an ISOBMFF file to a user to allow selection of content to consume.
  • based on the group entry information, the media player may generate the appropriate action on the decoder for the content selected by the user.
  • the tracks in the track group may be one version of a media presentation which is made available for user selection.
  • the tracks in the track group may be one version of a media presentation which is selected by an entity by an automatic process.
  • the tracks in a track group may belong to a version of media presentation.
  • the said track group may additionally carry information on whether the track group is the default track group, among multiple track groups of the media presentation, which is made available for selection.
  • the tracks within the said track group may be simultaneously or substantially simultaneously decoded and/or simultaneously or substantially simultaneously presented, for example, in a media presentation session.
  • a TrackGroupDescriptionBox may be created directly under the Movie box, 'moov'.
  • TrackGroupDescriptionBox may be used to store the description of the dependency or relation between the tracks of a given track group and/or the track group characteristics.
  • the TrackGroupDescriptionBox may be part of the MovieHeaderBox.
  • the presence of TrackGroupDescriptionBox in the MovieHeaderBox may be signalled either by defining a new version of the MovieHeaderBox (e.g., version 2) or by setting a specific bit of the flag in the MovieHeaderBox (e.g., 0x000001) .
  • the TrackGroupDescriptionBox may be a companion to the MovieHeaderBox, where the TrackGroupDescriptionBox may follow the MovieHeaderBox within the MovieBox. In an embodiment, the TrackGroupDescriptionBox may immediately follow the MovieHeaderBox within the MovieBox.
  • At most one TrackGroupDescriptionBox is present within the MovieBox.
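  • a possible shape of the TrackGroupDescriptionBox is sketched below; the four-character code 'tgdb' is an illustrative assumption, not an established box type:

      aligned(8) class TrackGroupDescriptionBox extends Box('tgdb') {  // 'tgdb' is hypothetical
          // contains zero or more TrackGroupEntryBoxes, each describing the
          // dependency or relation between the tracks of one track group
          // and/or the characteristics of that track group
      }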
  • the description of the dependency or relation between the tracks of a track group and/or the track group characteristics in TrackGroupDescriptionBox may be provided through a TrackGroupEntryBox .
  • the entry may be uniquely identified through track_group_entry_type and track_group_id.
  • TrackGroupEntryBox may be extended from a Box or a FullBox, e.g., as depicted in the sketch below:
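      // Illustrative sketch of the two variants; the fields beyond
      // track_group_id depend on the specific track_group_entry_type.

      aligned(8) class TrackGroupEntryBox(unsigned int(32) track_group_entry_type)
          extends Box(track_group_entry_type) {
          unsigned int(32) track_group_id;  // matches the track_group_id of the
                                            // associated TrackGroupTypeBoxes
          // fields specific to track_group_entry_type follow
      }

      aligned(8) class TrackGroupEntryBox(unsigned int(32) track_group_entry_type)
          extends FullBox(track_group_entry_type, version, flags) {
          unsigned int(32) track_group_id;
          // fields specific to track_group_entry_type follow
      }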
  • one or more TrackGroupEntryBoxes, each with a unique track_group_entry_type, may be present within the TrackGroupDescriptionBox.
  • a TrackGroupTypeBox in a given track may be uniquely assigned to one TrackGroupEntryBox when both boxes (the TrackGroupTypeBox and the TrackGroupEntryBox) include the corresponding track_group_type and track_group_entry_type, and when both boxes include the same track_group_id.
  • a file format signalling provides a hint or information to a parser.
  • TrackGroupTypeBox may signal that a given track group type may be associated with a TrackGroupEntryBox in a TrackGroupDescriptionBox.
  • a version field of TrackGroupTypeBox may indicate that the track group includes an associated TrackGroupEntryBox, e.g., a version equal to 1 in TrackGroupTypeBox may indicate that the track group has an associated TrackGroupEntryBox.
  • a flags field of TrackGroupTypeBox may indicate that the track group includes an associated TrackGroupEntryBox, e.g., flags equal to 0x000001 in TrackGroupTypeBox may indicate that the track group has an associated TrackGroupEntryBox.
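  • as an illustration, this hint may be sketched as follows; the version and flags values are the example values from the embodiments above, and the sketch is not a normative definition:

      aligned(8) class TrackGroupTypeBox(unsigned int(32) track_group_type)
          extends FullBox(track_group_type, version, flags) {
          unsigned int(32) track_group_id;
          // version == 1, or (flags & 0x000001) != 0, hints to a parser that a
          // TrackGroupEntryBox with a corresponding track_group_entry_type and
          // the same track_group_id is present in the TrackGroupDescriptionBox
      }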
  • a PreselectionTrackGroupEntryBox is defined to provide information about the group that may be used by a parser to create an MPD document for adaptive HTTP streaming, e.g., DASH or HLS.
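  • a possible shape of the PreselectionTrackGroupEntryBox is sketched below; the box type 'pres' and the field names are illustrative assumptions derived from the track group characteristics listed earlier (priority of preselection, label, language, audio rendering indication), not a normative definition:

      aligned(8) class PreselectionTrackGroupEntryBox
          extends TrackGroupEntryBox('pres') {    // 'pres' is hypothetical
          unsigned int(8) preselection_priority;  // priority of the preselection
          utf8string      preselection_label;     // label exposed for selection
          utf8string      language;               // e.g., a language tag
          // further characteristics, e.g., an audio rendering indication, may follow
      }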
  • FIG. 7 is an example apparatus 700, which may be implemented in hardware, configured to implement mechanisms for tracking group entry information and/or processing track group entry information, based on the examples described herein.
  • the apparatus 700 comprises a processor 702, at least one non-transitory memory 704 including computer program code 705, wherein the at least one memory 704 and the computer program code 705 are configured to, with the at least one processor 702, cause the apparatus to implement mechanisms for tracking group entry information and/or processing track group entry information 706.
  • the apparatus 700 optionally includes a display 708 that may be used to display content during rendering.
  • the apparatus 700 optionally includes one or more network (NW) interfaces (I/F(s)) 710.
  • the NW I/F(s) 710 may be wired and/or wireless and communicate over the Internet/other network(s) via any communication technique.
  • the NW I/F(s) 710 may comprise one or more transmitters and one or more receivers.
  • the N/W I/F(s) 710 may comprise standard well-known components such as an amplifier, filter, frequency-converter, (de)modulator, encoder/decoder circuitry(ies), and one or more antennas.
  • the apparatus 700 may be a remote, virtual or cloud apparatus.
  • the apparatus 700 may be either a coder or a decoder, or both a coder and a decoder.
  • the at least one memory 704 may be implemented using any suitable data storage technology, such as semiconductor based memory devices, flash memory, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the at least one memory 704 may comprise a database for storing data.
  • the apparatus 700 need not comprise each of the features mentioned, or may comprise other features as well.
  • the apparatus 700 may correspond to or be another embodiment of the apparatus 50 shown in FIG. 1 and FIG. 2, or any of the apparatuses shown in FIG. 3.
  • the apparatus 700 may correspond to or be another embodiment of the apparatuses shown in FIG. 10, including UE 110, RAN node 170, or network element (s) 190.
  • FIG. 8 is an example method 800 for tracking group entry information, in accordance with an embodiment.
  • the apparatus 700 includes means, such as the processing circuitry 702 or the like, for implementing mechanisms for tracking group entry information.
  • the method 800 includes indicating that the plurality of timed media tracks comprise a track group.
  • the method 800 includes creating a track group entry information structure for indicating a dependency or relation between at least one of the plurality of timed media tracks of the track group or describing track group characteristics.
  • the method 800 includes associating the track group entry information structure with the track group.
  • the method 800 includes storing the track group entry information structure and the plurality of timed media tracks in the file.
  • the method 800 may further include obtaining a plurality of timed media bitstreams with a known dependency or relation and encapsulating the plurality of timed media bitstreams as a plurality of timed media tracks in a file.
  • the method 800 may further include providing or transmitting the file to an entity.
  • the entity may use the track group entry information structure to understand the dependency or relation between at least one of the plurality of timed media tracks of the track group or the track group characteristics.
  • the entity may be a packager (e.g., a DASH/HLS) that may translate the data provided by group entry information in the file (e.g., an ISOBMFF file) into the outgoing manifest format.
  • the entity may be a media player that may expose the group entry information in the file (e.g., the ISOBMFF file) to a user to allow selection of content to consume. Based on the group entry information, the media player may generate the appropriate action on the decoder for the content selected by the user.
  • FIG. 9 is an example method 900 for processing track group entry information, in accordance with an embodiment.
  • the apparatus 700 includes means, such as the processing circuitry 702 or the like, for implementing mechanisms for processing track group entry information.
  • the method 900 includes receiving a file, wherein an entity uses a track group entry information structure stored in the file to understand a dependency or relation between at least one of a plurality of timed media tracks of a track group or track group characteristics.
  • the method 900 includes wherein the plurality of timed media tracks are indicated to comprise the track group.
  • the method 900 includes wherein the track group entry information structure is created to indicate the dependency or relation between at least one of the plurality of timed media tracks of the track group or describe the track group characteristics.
  • the method 900 includes wherein the track group entry information structure is associated with the track group.
  • the method 900 includes wherein the plurality of timed media tracks are stored in the file.
  • the method 900 includes decoding or presenting the plurality of tracks within the track group in a media session.
  • the method may further include, wherein a plurality of timed media bitstreams are obtained with a known dependency or relation, and wherein the plurality of timed media bitstreams are encapsulated as the plurality of timed media tracks in the file.
  • FIG. 10 shows a block diagram of one possible and non-limiting system in which the example embodiments may be practiced.
  • a user equipment (UE) 110, a radio access network (RAN) node 170, and network element(s) 190 are illustrated.
  • the user equipment (UE) 110 is in wireless communication with a wireless network 100.
  • a UE is a wireless device that can access the wireless network 100.
  • the UE 110 includes one or more processors 120, one or more memories 125, and one or more transceivers 130 interconnected through one or more buses 127.
  • Each of the one or more transceivers 130 includes a receiver, Rx, 132 and a transmitter, Tx, 133.
  • the one or more buses 127 may be address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like.
  • the one or more transceivers 130 are connected to one or more antennas 128.
  • the one or more memories 125 include computer program code 123.
  • the UE 110 includes a module 140, comprising one of or both parts 140-1 and/or 140-2, which may be implemented in a number of ways.
  • the module 140 may be implemented in hardware as module 140-1, such as being implemented as part of the one or more processors 120.
  • the module 140-1 may be implemented also as an integrated circuit or through other hardware such as a programmable gate array.
  • the module 140 may be implemented as module 140-2, which is implemented as computer program code 123 and is executed by the one or more processors 120.
  • the one or more memories 125 and the computer program code 123 may be configured to, with the one or more processors 120, cause the user equipment 110 to perform one or more of the operations as described herein.
  • the UE 110 communicates with the RAN node 170 via a wireless link 111.
  • the RAN node 170 in this example is a base station that provides access by wireless devices such as the UE 110 to the wireless network 100.
  • the RAN node 170 may be, for example, a base station for 5G, also called New Radio (NR) .
  • the RAN node 170 may be an NG-RAN node, which is defined as either a gNB or an ng-eNB.
  • a gNB is a node providing NR user plane and control plane protocol terminations towards the UE, and connected via the NG interface to a 5GC (such as, for example, the network element(s) 190).
  • the ng-eNB is a node providing E-UTRA user plane and control plane protocol terminations towards the UE, and connected via the NG interface to the 5GC.
  • the NG-RAN node may include multiple gNBs, which may also include a central unit (CU) (gNB-CU) 196 and distributed unit(s) (DUs) (gNB-DUs), of which DU 195 is shown.
  • the DU may include or be coupled to and control a radio unit (RU) .
  • the gNB-CU is a logical node hosting radio resource control (RRC), SDAP, and PDCP protocols of the gNB, or RRC and PDCP protocols of the en-gNB, that controls the operation of one or more gNB-DUs.
  • the gNB-CU terminates the F1 interface connected with the gNB-DU.
  • the F1 interface is illustrated as reference 198, although reference 198 also illustrates a link between remote elements of the RAN node 170 and centralized elements of the RAN node 170, such as between the gNB-CU 196 and the gNB-DU 195.
  • the gNB-DU is a logical node hosting RLC, MAC and PHY layers of the gNB or en-gNB, and its operation is partly controlled by gNB-CU.
  • One gNB-CU supports one or multiple cells.
  • One cell is supported by only one gNB-DU.
  • the gNB-DU terminates the F1 interface 198 connected with the gNB-CU.
  • the DU 195 is considered to include the transceiver 160, for example, as part of a RU, but some examples of this may have the transceiver 160 as part of a separate RU, for example, under control of and connected to the DU 195.
  • the RAN node 170 may also be an eNB (evolved NodeB) base station, for LTE (long term evolution) , or any other suitable base station or node.
  • the RAN node 170 includes one or more processors 152, one or more memories 155, one or more network interfaces (N/W I/F(s)) 161, and one or more transceivers 160 interconnected through one or more buses 157.
  • Each of the one or more transceivers 160 includes a receiver, Rx, 162 and a transmitter, Tx, 163.
  • the one or more transceivers 160 are connected to one or more antennas 158.
  • the one or more memories 155 include computer program code 153.
  • the CU 196 may include the processor(s) 152, memories 155, and network interfaces 161.
  • the DU 195 may also include its own memory/memories and processor(s), and/or other hardware, but these are not shown.
  • the RAN node 170 includes a module 150, comprising one of or both parts 150-1 and/or 150-2, which may be implemented in a number of ways.
  • the module 150 may be implemented in hardware as module 150-1, such as being implemented as part of the one or more processors 152.
  • the module 150-1 may be implemented also as an integrated circuit or through other hardware such as a programmable gate array.
  • the module 150 may be implemented as module 150-2, which is implemented as computer program code 153 and is executed by the one or more processors 152.
  • the one or more memories 155 and the computer program code 153 are configured to, with the one or more processors 152, cause the RAN node 170 to perform one or more of the operations as described herein.
  • the functionality of the module 150 may be distributed, such as being distributed between the DU 195 and the CU 196, or be implemented solely in the DU 195.
  • the one or more network interfaces 161 communicate over a network such as via the links 176 and 131.
  • Two or more gNBs 170 may communicate using, for example, link 176.
  • the link 176 may be wired or wireless or both and may implement, for example, an Xn interface for 5G, an X2 interface for LTE, or other suitable interface for other standards.
  • the one or more buses 157 may be address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, wireless channels, and the like.
  • the one or more transceivers 160 may be implemented as a remote radio head (RRH) 195 for LTE or a distributed unit (DU) 195 for gNB implementation for 5G, with the other elements of the RAN node 170 possibly being physically in a different location from the RRH/DU, and the one or more buses 157 could be implemented in part as, for example, fiber optic cable or other suitable network connection to connect the other elements (for example, a central unit (CU), gNB-CU) of the RAN node 170 to the RRH/DU 195.
  • Reference 198 also indicates those suitable network link(s) .
  • each cell performs functions, but it should be clear that equipment which forms the cell may perform the functions.
  • the cell makes up part of a base station. That is, there can be multiple cells per base station. For example, there could be three cells for a single carrier frequency and associated bandwidth, each cell covering one-third of a 360 degree area so that the single base station's coverage area covers an approximate oval or circle.
  • each cell can correspond to a single carrier and a base station may use multiple carriers. So when there are three 120 degree cells per carrier and two carriers, then the base station has a total of 6 cells.
  • the wireless network 100 may include a network element or elements 190 that may include core network functionality, and which provides connectivity via a link or links 181 with a further network, such as a telephone network and/or a data communications network (for example, the Internet) .
  • core network functionality for 5G may include access and mobility management function(s) (AMF(s)) and/or user plane function(s) (UPF(s)) and/or session management function(s) (SMF(s)).
  • Such core network functionality for LTE may include MME (Mobility Management Entity) /SGW (Serving Gateway) functionality.
  • the RAN node 170 is coupled via a link 131 to the network element 190.
  • the link 131 may be implemented as, for example, an NG interface for 5G, or an S1 interface for LTE, or other suitable interface for other standards.
  • the network element 190 includes one or more processors 175, one or more memories 171, and one or more network interfaces (N/W I/F(s)) 180, interconnected through one or more buses 185.
  • the one or more memories 171 include computer program code 173.
  • the one or more memories 171 and the computer program code 173 are configured to, with the one or more processors 175, cause the network element 190 to perform one or more operations.
  • the wireless network 100 may implement network virtualization, which is the process of combining hardware and software network resources and network functionality into a single, software-based administrative entity, a virtual network.
  • Network virtualization involves platform virtualization, often combined with resource virtualization.
  • Network virtualization is categorized as either external, combining many networks, or parts of networks, into a virtual unit, or internal, providing network-like functionality to software containers on a single system. Note that the virtualized entities that result from the network virtualization are still implemented, at some level, using hardware such as processors 152 or 175 and memories 155 and 171, and also such virtualized entities create technical effects.
  • the computer readable memories 125, 155, and 171 may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor based memory devices, flash memory, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the computer readable memories 125, 155, and 171 may be means for performing storage functions.
  • the processors 120, 152, and 175 may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on a multi-core processor architecture, as non-limiting examples.
  • the processors 120, 152, and 175 may be means for performing functions, such as controlling the UE 110, RAN node 170, network element (s) 190, and other functions as described herein .
  • the various embodiments of the user equipment 110 may include, but are not limited to, cellular telephones such as smart phones, tablets, personal digital assistants (PDAs) having wireless communication capabilities, portable computers having wireless communication capabilities, image capture devices such as digital cameras having wireless communication capabilities, gaming devices having wireless communication capabilities, music storage and playback appliances having wireless communication capabilities, Internet appliances permitting wireless Internet access and browsing, tablets with wireless communication capabilities, as well as portable units or terminals that incorporate combinations of such functions.
  • modules 140-1, 140-2, 150-1, and 150-2 may be configured to implement mechanisms for tracking group entry information and/or processing track group entry information based on the examples described herein.
  • Computer program code 173 may also be configured to implement mechanisms for tracking group entry information and/or processing track group entry information based on the examples described herein.
  • FIGs. 8 and 9 include flowcharts of an apparatus (e.g., 50, 600, or 700), method, and computer program product according to certain example embodiments. It will be understood that each block of the flowcharts, and combinations of blocks in the flowcharts, may be implemented by various means, such as hardware, firmware, processor, circuitry, and/or other devices associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, the computer program instructions which embody the procedures described above may be stored by a memory of an apparatus employing an embodiment and executed by processing circuitry of the apparatus.
  • any such computer program instructions may be loaded onto a computer or other programmable apparatus (e.g., hardware) to produce a machine, such that the resulting computer or other programmable apparatus implements the functions specified in the flowchart blocks.
  • These computer program instructions may also be stored in a computer-readable memory that may direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture, the execution of which implements the function specified in the flowchart blocks.
  • the computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide operations for implementing the functions specified in the flowchart blocks.
  • a computer program product is therefore defined in those instances in which the computer program instructions, such as computer-readable program code portions, are stored by at least one non-transitory computer-readable storage medium with the computer program instructions, such as the computer-readable program code portions, being configured, upon execution, to perform the functions described above, such as in conjunction with the flowcharts of FIGs. 8 and 9.
  • the computer program instructions, such as the computer-readable program code portions, need not be stored or otherwise embodied by a non-transitory computer-readable storage medium, but may, instead, be embodied by a transitory medium with the computer program instructions, such as the computer-readable program code portions, still being configured, upon execution, to perform the functions described above.
  • blocks of the flowcharts support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flowcharts, and combinations of blocks in the flowcharts, may be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions.
  • certain ones of the operations above may be modified or further amplified.
  • additional optional operations may be included. Modifications, additions, or amplifications to the operations above may be performed in any order and in any combination.

Abstract

Various embodiments of the present invention relate to an example apparatus, method, and computer program product. The apparatus comprises: at least one processor; and at least one non-transitory memory comprising computer program code; the at least one memory and the computer program code being configured to, with the at least one processor, cause the apparatus at least to: indicate that the plurality of timed media tracks comprises a track group; create a track group entry information structure for indicating a dependency or relation between the plurality of timed media tracks of the track group and/or describing track group characteristics; associate the track group entry information structure with the track group; and store the track group entry information structure and the plurality of timed media tracks in the file.
PCT/IB2023/052013 2022-04-07 2023-03-03 Method and apparatus for track group entry information WO2023194816A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263362620P 2022-04-07 2022-04-07
US63/362,620 2022-04-07

Publications (1)

Publication Number Publication Date
WO2023194816A1 true WO2023194816A1 (fr) 2023-10-12

Family

ID=85703449

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2023/052013 WO2023194816A1 (fr) Method and apparatus for track group entry information

Country Status (1)

Country Link
WO (1) WO2023194816A1 (fr)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200245041A1 (en) * 2017-10-12 2020-07-30 Canon Kabushiki Kaisha Method, device, and computer program for generating timed media data

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Transparent end-to-end packet-switched streaming Service (PSS); Progressive download and dynamic adaptive Streaming over HTTP (3GP-DASH", 3GPP TS 26.247
"Transparent end-to-end packet-switched streaming service (PSS); protocols and codecs", 3GPP TS 26.234
STEPHAN SCHREINER ET AL: "Signaling of Preselections in ISOBMFF", no. m58088, 6 October 2021 (2021-10-06), XP030298790, Retrieved from the Internet <URL:https://dms.mpeg.expert/doc_end_user/documents/136_OnLine/wg11/m58088-v1-m58088-ISOBMFF-Preselections.zip m58088-ISOBMFF-Preselections.docx> [retrieved on 20211006] *
Y-K WANG (BYTEDANCE): "[6.1][ISOBMFF] On preselection signalling in ISOBMFF", no. m58904, 12 January 2022 (2022-01-12), XP030299641, Retrieved from the Internet <URL:https://dms.mpeg.expert/doc_end_user/documents/137_OnLine/wg11/m58904-v1-m58904-v1_preselectioninFF.zip m58904-v1_preselection in FF.docx> [retrieved on 20220112] *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23711789

Country of ref document: EP

Kind code of ref document: A1