WO2022257567A1 - A media data processing method and related device - Google Patents


Info

Publication number
WO2022257567A1
Authority
WO
WIPO (PCT)
Prior art keywords: track, time, domain, field, value
Prior art date
Application number
PCT/CN2022/083960
Other languages
English (en)
French (fr)
Inventor
胡颖 (Hu Ying)
Original Assignee
Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)
Priority date
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to EP22819172.2A, published as EP4354868A1
Priority to US18/072,975, published as US12034947B2
Publication of WO2022257567A1

Classifications

    • H04N 19/31: coding/decoding of digital video signals using hierarchical techniques, e.g. scalability in the temporal domain
    • H04L 65/60: network streaming of media packets
    • H04N 19/124: quantisation
    • H04N 19/176: adaptive coding where the coding unit is an image region that is a block, e.g. a macroblock
    • H04N 21/234327: reformatting of video elementary streams by decomposing into layers, e.g. base layer and one or more enhancement layers
    • H04N 21/845: structuring of content, e.g. decomposing content into time segments
    • H04N 21/8456: structuring of content by decomposing it in the time domain, e.g. into time segments
    • H04N 21/85406: content authoring involving a specific file format, e.g. MP4 format

Definitions

  • This application relates to the field of computer technology, in particular to the processing of media data.
  • Existing AVS3 video coding supports time-domain layer division, and system-layer encapsulation technology can indicate the different time-domain layers within a track.
  • However, existing system-layer encapsulation techniques consider only the scenario in which a video bitstream is encapsulated into a single track.
  • The embodiments of the present application provide a media data processing method and related devices, which can improve flexibility when encapsulating different time-domain levels.
  • The description data box includes time-domain track indication information;
  • the time-domain track indication information is used to indicate the track encapsulation mode of the N time-domain levels;
  • the time-domain track indication information includes the time-domain level information of the time-domain levels encapsulated by the j-th track, where j is a positive integer and j ≤ M;
  • determine the time-domain level of each media frame according to the inter-frame dependency of each media frame included in the media data, to obtain media frames at N time-domain levels, where N is a positive integer greater than 1;
  • the description data box of the j-th track in the M tracks includes time-domain track indication information;
  • the time-domain track indication information is used to indicate the track encapsulation mode of the N time-domain levels;
  • the time-domain track indication information includes the time-domain level information of the time-domain levels encapsulated by the j-th track, where M is a positive integer greater than 1.
  • An embodiment of the present application provides an apparatus for processing media data, where the media data includes multiple media frames, the multiple media frames are divided into N time-domain levels, and the multiple media frames are encapsulated into M tracks, where M and N are both positive integers greater than 1; the apparatus includes:
  • an acquisition unit configured to acquire a description data box of the j-th track in the M tracks, where the description data box includes time-domain track indication information used to indicate the track encapsulation mode of the N time-domain levels, and the time-domain track indication information includes the time-domain level information of the time-domain levels encapsulated by the j-th track, where j is a positive integer and j ≤ M;
  • a processing unit configured to decode the media data according to the time-domain track indication information.
  • the embodiment of the present application provides another device for processing media data, including:
  • a determining unit configured to determine the time-domain level of each media frame according to the inter-frame dependency of each media frame included in the media data, to obtain media frames at N time-domain levels, where N is a positive integer greater than 1;
  • a processing unit configured to encapsulate the media frames of the N time-domain levels into M tracks and generate corresponding description data boxes, where each description data box includes time-domain track indication information used to indicate the track encapsulation mode of the N time-domain levels, and the time-domain track indication information includes the time-domain level information of the time-domain levels encapsulated by the j-th track, where M is a positive integer greater than 1.
  • an embodiment of the present application provides a computer device, including:
  • a processor adapted to execute one or more instructions; and
  • a memory storing one or more instructions, where the one or more instructions are suitable to be loaded by the processor to execute the media data processing method of the above aspect.
  • An embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program that, when executed, performs the method for processing media data in the above aspect.
  • an embodiment of the present application provides a computer program product including instructions, which, when run on a computer, cause the computer to execute the method for processing media data in the above aspects.
  • In the embodiments of the present application, a content generation device is supported in encapsulating media frames of multiple time-domain levels into different tracks. During encapsulation, time-domain track indication information is recorded in the description data box of each track, so that tracks of different time-domain levels are associated, the time-domain level information within each track is indicated, the reference track is marked, and the policy information for combining multiple time-domain-level tracks is indicated. A content consumption device can then select an appropriate time-domain level according to the description data box in each track, and combine the samples of different tracks for decoding and presentation, thus ensuring the flexibility of multi-track encapsulation and saving decoding computing resources to the greatest extent.
  • FIG. 1 shows a flowchart of video processing provided by an exemplary embodiment of the present application;
  • FIG. 2 shows a schematic flowchart of a method for processing media data provided by an exemplary embodiment of the present application;
  • FIG. 3 shows a schematic diagram of a coding unit provided by an exemplary embodiment of the present application;
  • FIG. 4 shows a schematic diagram of time-domain level division provided by an exemplary embodiment of the present application;
  • FIG. 5 shows a schematic flowchart of a method for processing media data provided by an exemplary embodiment of the present application;
  • FIG. 6a shows a schematic diagram of a multi-time-domain multi-track encapsulation method provided by an exemplary embodiment of the present application;
  • FIG. 6b shows a schematic diagram of a multi-time-domain multi-track encapsulation method provided by an exemplary embodiment of the present application;
  • FIG. 7 shows a schematic flowchart of a method for processing media data provided by an exemplary embodiment of the present application;
  • FIG. 8 shows a schematic structural diagram of a media data processing apparatus provided by an exemplary embodiment of the present application;
  • FIG. 9 shows a schematic structural diagram of an apparatus for processing media data provided by an exemplary embodiment of the present application;
  • FIG. 10 shows a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application.
  • Media data refers to composite data formed from media such as text, graphics, images, sound, animation, and moving images that are interrelated in content.
  • Media data mainly includes audio data composed of sound, and video data composed of images and sound. The embodiments of this application mainly take video data as an example to describe the data processing process of media data; the processing of audio data can be carried out with reference to the embodiments of the present application.
  • The media data processing process involved in the embodiments of the present application mainly includes media data collection, media data encoding, media data file encapsulation, media data file transmission, media data decoding, and final data presentation. When the media data is video data, the complete processing flow can be as shown in FIG. 1, and specifically includes: video capture, video encoding, video file encapsulation, video transmission, video file decapsulation, video decoding, and final video presentation.
  • Video capture is used to convert an analog video signal into digital video and save it in the format of a digital video file; that is, video capture converts a video signal into binary digital information. The binary data stream converted from the video signal can also be called the code stream or bitstream of the video signal. Video encoding then converts a file in the original video format into a file in another video format through compression technology.
  • Video media content is generated in two ways: real-world scenes captured by cameras and screen-content scenes generated by computers. From the perspective of acquisition, video signals can thus be divided into camera-captured and computer-generated; because their statistical characteristics differ, the corresponding compression coding methods may also differ.
  • Modern mainstream video coding technology is based on the international video coding standards HEVC (High Efficiency Video Coding, H.265) and VVC (Versatile Video Coding, H.266), and the Chinese national video coding standard AVS (Audio Video Coding Standard), including its third generation, AVS3, produced by the AVS standard group.
  • AVS3, the third-generation video coding standard, adopts a hybrid coding framework and performs the following series of operations on the input original video signal, as shown in FIG. 2:
  • Block partition structure: the input image (such as a media frame in video data) is divided into several non-overlapping processing units of a given size, and a similar compression operation is performed on each processing unit.
  • This processing unit is called a CTU (Coding Tree Unit) or LCU (Largest Coding Unit).
  • Starting from the largest coding unit, the CTU can be further divided top-down into finer partitions, obtaining one or more basic coding units, each called a CU (Coding Unit).
  • Each CU is the most basic element of the encoding process. The various encoding modes that may be adopted for each CU are described below; the relationship between the LCU (or CTU) and the CUs may be as shown in FIG. 3.
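The LCU-to-CU relationship described above can be illustrated with a toy top-down quad-tree split (a sketch only: a real encoder chooses splits per block by rate-distortion cost and supports additional partition shapes; the sizes and depth limit here are illustrative):

```python
def split_ctu(x, y, size, min_cu=8, depth=0, max_depth=3):
    """Recursively split a CTU into CUs with a toy top-down quad-tree.

    Splits every block down to min_cu; a real encoder decides per block
    based on content and rate-distortion cost.
    Returns a list of (x, y, size) leaf coding units.
    """
    if size <= min_cu or depth >= max_depth:
        return [(x, y, size)]
    half = size // 2
    cus = []
    for dy in (0, half):
        for dx in (0, half):
            cus.extend(split_ctu(x + dx, y + dy, half, min_cu, depth + 1, max_depth))
    return cus

# A 64x64 CTU fully split down to 8x8 yields 64 leaf CUs.
cus = split_ctu(0, 0, 64)
```

Because the split is exhaustive here, the leaf count is simply 4 raised to the depth; the point is only the containment relation between one CTU and its CUs.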
  • Predictive coding: includes intra-frame prediction and inter-frame prediction; the original video signal is predicted from a selected, already-reconstructed video signal to obtain a residual video signal.
  • The encoding end needs to select the most suitable of the many possible predictive coding modes for the current CU and inform the decoding end.
  • Intra (picture) prediction: the prediction signal comes from an area that has already been coded and reconstructed within the same image.
  • Inter (picture) prediction: the prediction signal comes from an already-coded image (called a reference image) different from the current image.
  • Transform & quantization: the residual video signal is converted into the transform domain through a transform operation such as the DFT (Discrete Fourier Transform) or DCT (Discrete Cosine Transform), producing transform coefficients.
  • The signal in the transform domain is further subjected to a lossy quantization operation, which discards certain information so that the quantized signal is more amenable to compressed representation.
  • The encoding end also needs to select one of the transform methods for the current CU and inform the decoding end.
  • The fineness of quantization is usually determined by the quantization parameter (QP).
  • A larger QP value means that coefficients over a larger value range will be quantized to the same output, generally bringing greater distortion and a lower code rate; conversely, a smaller QP value means that coefficients over a smaller range will be quantized to the same output, generally bringing smaller distortion and corresponding to a higher code rate.
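The QP trade-off described above can be illustrated with a toy uniform quantizer (an illustrative sketch, not the exact AVS3/HEVC quantizer; in real codecs the step size is derived from QP roughly exponentially):

```python
def quantize(coeff, step):
    """Uniform scalar quantization: map a coefficient to an integer level."""
    return round(coeff / step)

def dequantize(level, step):
    """Reconstruct an approximate coefficient from its quantized level."""
    return level * step

# A larger step (larger QP) maps a wider range of inputs to the same
# output level, so distortion grows while the number of distinct levels
# to entropy-code (and hence the rate) shrinks.
coarse = [quantize(c, 10) for c in (8, 10, 12)]  # all collapse to level 1
fine = [quantize(c, 2) for c in (8, 10, 12)]     # distinct levels survive
```

With step 10 the input 8 reconstructs as 10 (distortion 2); with step 2 it reconstructs exactly, at the cost of more levels to code.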
  • Entropy coding, or statistical coding: the quantized transform-domain signal is statistically compressed and encoded according to the frequency of occurrence of each value, finally outputting a binary (0 or 1) compressed code stream. Encoding also produces other information, such as the selected modes and motion vectors, which likewise requires entropy coding to reduce the code rate.
  • Statistical coding is a lossless coding method that can effectively reduce the code rate required to express the same signal.
  • Common statistical coding methods include variable-length coding (VLC, Variable Length Coding) and context-based adaptive binary arithmetic coding (CABAC, Context Adaptive Binary Arithmetic Coding).
  • Loop filtering: a coded image can be reconstructed (decoded) through dequantization, inverse transform, and prediction compensation (the inverse of operations 2 to 4 above). Compared with the original image, the reconstructed image differs in some information due to the influence of quantization, resulting in distortion. Applying filtering operations to the reconstructed image, such as deblocking, SAO (Sample Adaptive Offset), or ALF (Adaptive Loop Filter), can effectively reduce the degree of distortion produced by quantization. Since these filtered reconstructed images serve as references for predicting future signals in subsequently coded images, the above filtering operation is also called loop filtering, i.e., a filtering operation within the encoding loop.
  • FIG. 2 shows the basic flow of the video encoder, taking the k-th CU (denoted S_k[x, y]) as an example, where k is a positive integer no greater than the number of CUs in the current image, S_k[x, y] is the pixel at coordinates [x, y] in the k-th CU, x is the abscissa of the pixel, and y is its ordinate. S_k[x, y] is passed through motion compensation or intra prediction (whichever performs better) to obtain the prediction signal; subtracting the prediction signal from S_k[x, y] gives the residual signal U_k[x, y], which is then transformed and quantized.
  • The quantized output data is used in two ways: first, it is sent to the entropy encoder for entropy coding, and the coded code stream is output to a buffer to be stored while awaiting transmission; second, it undergoes inverse quantization and inverse transform to obtain the signal U'_k[x, y]. Adding U'_k[x, y] to the prediction signal yields the new reconstruction S*_k[x, y], which is stored in the buffer of the current image.
  • S*_k[x, y] undergoes intra-image prediction to obtain f(S*_k[x, y]), and loop filtering to obtain S'_k[x, y], which is sent to the decoded picture buffer so that the reconstructed video can be generated.
  • Motion-compensated prediction from S'_k[x, y] yields the reference block S'_r[x+m_x, y+m_y], where m_x and m_y denote the horizontal and vertical components of the motion vector, respectively.
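The signal flow above can be sketched per sample (a 1-D toy in which a uniform quantizer is the only lossy step and the transform is omitted; variable names follow the S_k, U_k, U'_k, S*_k notation of the text):

```python
def encode_reconstruct(s_k, pred, step=4):
    """Toy per-sample encoder loop.

    s_k: original samples; pred: prediction signal (from intra or
    motion-compensated prediction); step: quantizer step size.
    Returns (levels, s_star), where levels are the quantized residuals
    to be entropy-coded and s_star is the reconstruction the encoder
    keeps as a reference for predicting future signals.
    """
    u_k = [s - p for s, p in zip(s_k, pred)]           # residual U_k
    levels = [round(u / step) for u in u_k]            # quantize (transform omitted)
    u_prime = [lv * step for lv in levels]             # inverse quantize -> U'_k
    s_star = [up + p for up, p in zip(u_prime, pred)]  # reconstruction S*_k
    return levels, s_star

levels, s_star = encode_reconstruct([100, 104, 97], [100, 100, 100])
```

Note that the encoder reconstructs from the quantized residual (not the original), so its reference S*_k matches what the decoder will see despite the quantization loss.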
  • Video file encapsulation refers to storing the encoded and compressed video and audio in a file in a certain format according to an encapsulation format (also called a container, or file container).
  • Common encapsulation formats include the AVI format (Audio Video Interleaved) and ISOBMFF (ISO Based Media File Format, the media file format based on ISO (International Organization for Standardization) standards). ISOBMFF is the encapsulation standard for media files, and the most typical ISOBMFF file is the MP4 (Moving Picture Experts Group 4) file. The main improvements of the embodiments of the present application also target the ISOBMFF data boxes.
  • The audio code stream and the video code stream are encapsulated in a file container according to a file format such as ISOBMFF to form an encapsulated file.
  • An encapsulated file consists of multiple samples; that is, during the encapsulation of a media file, a media frame is usually encapsulated as one sample. When the media data is video media, a sample is a video frame; when the media data is audio media, a sample is an audio frame. In other words, an encapsulated file of video media includes multiple video frames, and an encapsulated file of audio media includes multiple audio frames.
  • In the embodiments of the present application, the case where the media data is video media and a sample in the encapsulated file is a media frame of the video media is taken as an example for illustration.
  • The encapsulated file is transmitted to the user terminal, and the user terminal can present the final video content after reverse operations such as decapsulation and decoding.
  • The encapsulated file can be sent to the user terminal through a transmission protocol such as DASH (Dynamic Adaptive Streaming over HTTP, an adaptive-bit-rate streaming technology); transmission using DASH enables high-quality streaming media to be delivered over the Internet through conventional HTTP web servers.
  • In DASH, media segment information is described by MPD (Media Presentation Description) signaling. A combination of one or more media components, such as a video file of a certain resolution, can be regarded as a Representation; the multiple Representations it contains can be regarded as an Adaptation Set (a collection of video streams); and a DASH presentation can contain one or more Adaptation Sets.
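The MPD hierarchy described above can be pictured as nested data (an illustrative sketch rather than real MPD XML; the field names and values below are assumptions):

```python
# One hypothetical Adaptation Set (a set of interchangeable video
# streams) holding several Representations (one encoding each).
mpd = {
    "adaptation_sets": [
        {
            "content_type": "video",
            "representations": [
                {"id": "v-1080p", "width": 1920, "height": 1080, "bandwidth": 5_000_000},
                {"id": "v-720p", "width": 1280, "height": 720, "bandwidth": 2_500_000},
            ],
        }
    ]
}

def pick_representation(mpd, available_bandwidth):
    """Choose the highest-bandwidth Representation the client can sustain."""
    reps = mpd["adaptation_sets"][0]["representations"]
    fitting = [r for r in reps if r["bandwidth"] <= available_bandwidth]
    return max(fitting, key=lambda r: r["bandwidth"])["id"] if fitting else None

choice = pick_representation(mpd, 3_000_000)
```

This is the adaptive part of DASH: the client switches between Representations of one Adaptation Set as its measured bandwidth changes.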
  • The file decapsulation process at the user terminal is the inverse of the file encapsulation process above; the user terminal can decapsulate the encapsulated file according to the file-format requirements used at encapsulation time to obtain the audio code stream and the video code stream.
  • The decoding process at the user terminal is likewise the inverse of the encoding process; for example, the user terminal can decode the audio code stream to restore the audio content.
  • At the decoding end, for each CU, the decoder first performs entropy decoding on the obtained compressed code stream to recover the various mode information and the quantized transform coefficients; each coefficient is then inversely quantized and inversely transformed to obtain the residual signal.
  • On the other hand, according to the known coding-mode information, the prediction signal corresponding to the CU can be obtained; adding the two yields the reconstructed signal. Finally, the reconstructed values of the decoded image are loop-filtered to generate the final output signal.
  • Video coding technology also involves a time-domain layering technology, which divides different media frames into different time-domain levels according to their dependency relationships during decoding. Specifically, with time-domain layering, media frames are divided into levels such that a frame never depends on frames at a higher time-domain level for its decoding.
  • In FIG. 4, the arrows indicate the dependencies during decoding: the arrow from the I_0 frame to the B_1 frame indicates that the B_1 frame needs to refer to the I_0 frame for decoding, that is, the decoding of the B_1 frame must depend on the decoding of the I_0 frame, and the relationships between the other frames can be deduced by analogy.
  • The types of media frames mainly include I frames (intra slices), B frames, and P frames.
  • An I frame, also called a key frame, belongs to intra-frame compression; its decoding needs to refer only to the information of the I frame itself.
  • A B frame is a bidirectionally predicted coded frame; its decoding needs to refer to both a preceding frame and a following frame.
  • A P frame is a forward-predicted coded frame; that is, a P frame needs to refer to the information of a preceding related frame to be decoded. The Arabic-numeral subscripts attached to the I, B, and P frames in FIG. 4 indicate their corresponding time-domain levels.
  • A media frame does not depend on higher time-domain levels when decoding. In particular, media frames at the lowest time-domain level (such as the L0 level above) do not depend on frames belonging to any other time-domain level; that is, media frames at the lowest time-domain level can be independently decoded and displayed, so the media frames classified into the lowest time-domain level must include I frames.
  • Media frames belonging to low time-domain levels never need to refer to media frames of higher time-domain levels when decoding. As shown in FIG. 4, assume the media frames of the video data are divided into the four time-domain levels L0 to L3, with the arrows indicating each frame's decoding dependencies. The arrow from the I_0 frame to the B_1 frame means the B_1 frame at level L1 must refer to the I_0 frame at level L0 when decoding, and the B_1 frame at L1 also refers to the P_0 frame at L0. The first B_2 frame at level L2 refers to the I_0 frame at L0 and the B_1 frame at L1 when decoding, and the second B_2 frame at L2 refers to the B_1 frame at the L1 time-domain level. Among the frames at level L3, the second B_3 frame refers to the first B_2 frame at L2 and the B_1 frame at L1; the third B_3 frame refers to the B_1 frame at L1 and the second B_2 frame at L2; and the fourth B_3 frame refers to the second B_2 frame at L2 and the P_0 frame at L0.
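The key property of the hierarchy in FIG. 4, that no frame references a higher time-domain level, means the upper levels can be dropped and the remaining frames stay decodable. A minimal check (the frame names follow the figure, but the dependency lists here are illustrative, modelled on the arrows described above):

```python
# (frame, time-domain level, referenced frames) for part of a GOP.
frames = [
    ("I0", 0, []),
    ("P0", 0, ["I0"]),
    ("B1", 1, ["I0", "P0"]),
    ("B2", 2, ["I0", "B1"]),
    ("B3", 3, ["B2", "B1"]),
]

def decodable_subset(frames, max_level):
    """Keep frames at or below max_level; this is safe because
    references never point to a higher time-domain level."""
    kept = {name for name, lvl, _ in frames if lvl <= max_level}
    for name, lvl, refs in frames:
        if name in kept:
            assert all(r in kept for r in refs), "a reference was dropped"
    return kept

half_rate = decodable_subset(frames, 1)  # drop L2/L3, remainder still decodable
```

Dropping levels this way trades frame rate for decoding cost without ever breaking a reference chain.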
  • The existing AVS3 video coding technology can support the time-domain layer division technology (also called time-domain layering technology), and in the system-layer encapsulation technology the different time-domain layers within a track can also be indicated.
  • Existing technologies that support temporal layering during the encapsulation of media frames can indicate the number of temporal layers in the video code stream corresponding to a track through the temporal-layer-number field (temporal_layer_num) in the encapsulated file, and can indicate the temporal layer of each media frame in the video code stream corresponding to the track through the temporal-layer identification field (temporal_layer_id).
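Given per-frame temporal_layer_id values, samples can be bucketed by layer (a sketch; only the field names temporal_layer_num and temporal_layer_id come from the text, and the id values below are illustrative):

```python
def group_by_layer(samples):
    """Map temporal_layer_id -> list of sample indices.

    samples: list of (sample_index, temporal_layer_id) pairs, as would
    be recovered from the encapsulation metadata.
    """
    layers = {}
    for idx, layer_id in samples:
        layers.setdefault(layer_id, []).append(idx)
    return layers

# Illustrative ids for an 8-frame GOP with 4 temporal layers.
layers = group_by_layer(
    [(0, 0), (1, 3), (2, 2), (3, 3), (4, 1), (5, 3), (6, 2), (7, 3)]
)
```

The number of distinct keys plays the role of temporal_layer_num, and each bucket holds the samples a consumer would keep or drop together.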
  • A track is a series of samples with time attributes that follow the ISO Base Media File Format (ISOBMFF) encapsulation; for example, a video track is obtained by encapsulating, according to the ISOBMFF specification, the code stream generated by the video encoder after encoding each frame.
  • The existing AVS3 decoder configuration record (i.e., the description data box) provides decoder configuration information for the AVS3 encoding method; the decoding configuration information can be represented as configuration information 1, specifically as follows:
  • bit(6) reserved = '111111'b; // reserved field: fields generally must occupy an integer number of bytes, so reserved bits are used for padding
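The byte-alignment role of the reserved bits can be shown with a toy packer (the 2-bit payload field is hypothetical; only the 6-bit '111111'b reserved pattern comes from the configuration record above):

```python
def pack_config_byte(two_bit_field):
    """Pack a 2-bit field followed by 6 reserved bits set to '111111'b,
    so the record stays byte-aligned."""
    assert 0 <= two_bit_field < 4
    return (two_bit_field << 6) | 0b111111

def unpack_config_byte(byte):
    """Recover the 2-bit field; readers ignore the reserved bits."""
    return byte >> 6

b = pack_config_byte(0b10)
```

Setting reserved bits to all ones is a common convention that lets future specifications redefine them without ambiguity.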
  • The processing method for media data proposed in the embodiments of the present application can support multi-track file encapsulation for the time-domain hierarchical division technology in AVS3 encoding.
  • the specific implementation steps are as follows:
  • The content generation device can determine the time-domain level of each media frame according to the inter-frame dependencies between the media frames of the video data;
  • the video bitstream is then encapsulated into multiple different tracks, and the specific time-domain level information contained in each file track is indicated in that track, including the time-domain level ids and the frame-rate and bit-rate information corresponding to each time-domain level;
  • the content consumption device where the user is located can select one or more tracks corresponding to the required time-domain levels according to its own device capabilities and the policy information for combining multiple time-domain levels, decapsulate these different tracks, and reconstruct them into a bitstream for decoding, finally achieving the goal of flexibly selecting file tracks and saving decoding computing resources.
  • the embodiments of this application add several descriptive fields at the system layer, taking the extension of the existing ISOBMFF data box as an example, and define related fields to support the multi-track file encapsulation technology for AVS3 time-domain hierarchical division, as follows.
  • the media data processing method proposed in the embodiments of this application will now be described in detail. The method can be executed by any content consumption device that consumes media content; it can be understood that the content consumption device includes the terminal device where the media content consumer is located. The media data includes a plurality of media frames, the plurality of media frames are divided into N time-domain levels and encapsulated into M tracks, and both M and N are positive integers greater than 1.
  • the method may specifically include:
  • the time-domain track indication information is used to indicate the track encapsulation mode of the N time-domain levels, and includes the time-domain level information of the time-domain levels encapsulated in the j-th track, where j is a positive integer and j ≤ M.
  • the description data box obtained by the content consumption device is generated when the content generation device encodes and encapsulates the media data.
  • the media frames of each time-domain level are encapsulated into a plurality of different tracks. The media data includes a plurality of media frames divided into N time-domain levels, and the plurality of media frames can be encapsulated into M tracks, where M and N are both positive integers.
  • according to the decoding characteristics of the media frames they encapsulate, the tracks encapsulating the multiple media frames can be divided into reference tracks and non-reference tracks. A reference track is a track whose encapsulated media frames can be decoded independently, that is, the media frames encapsulated in the reference track do not refer to media frames in any other track during decoding. It can therefore be understood that when the media data is video data, the media frames encapsulated in the reference track must include I frames; accordingly, the media frames assigned to the lowest time-domain level must also include I frames, that is, the time-domain levels encapsulated in the reference track must include the lowest time-domain level.
  • taking the case where the media data is video data as an example: when the content generation device needs to send the video data to the user side for consumption and display, it can first determine the time-domain level of each media frame, then encapsulate the video bitstream into multiple different tracks according to the time-domain levels, and indicate the specific time-domain level information in each file track through the description data box; the user side then processes the file correspondingly.
  • a plurality of media frames included in one piece of media data belong to N time-domain levels respectively. The content generation device encapsulates the media frames belonging to the N time-domain levels into one or more tracks and generates the corresponding description data box in each track, so that the content consumption device (such as the terminal device on the user side) can determine, based on the records in the description data box, how the content generation device encapsulated the media frames belonging to the N time-domain levels, and can further select media frames of appropriate time-domain layers for decoding and display.
  • the description data box supports the multi-track file encapsulation technology for AVS3 time-domain hierarchical division by adding time-domain track information to the existing ISOBMFF data box. It can be understood that adding the time-domain track information to the ISOBMFF data box includes extending the ISOBMFF data box with one or more related fields.
  • the related fields extended in the description data box can be shown as configuration information 2, which is specifically as follows:
  • the fields included in the time-domain track indication information of the description data box, as shown in configuration information 2 above, are:
  • the multi-time-domain track identification field (multi_temporal_track_flag);
  • the total time-domain layer number field (total_temporal_layer_num);
  • the time-domain layer number field (temporal_layer_num);
  • the time-domain level identification field (temporal_layer_id[i]);
  • the frame rate field (frame_rate_code[i]);
  • the low bit rate field (temporal_bit_rate_lower[i]);
  • the high bit rate field (temporal_bit_rate_upper[i]);
  • the reference track identification field (base_track_flag);
  • the track identification field (track_ID[i]);
  • the priority decoding presentation field (is_output_track_flag[i]);
  • the alternative track field (is_alternative_track_flag[i]);
  • the alternative track identification field (alternate_track_ID).
  • among the above fields, the time-domain level identification field (temporal_layer_id[i]), the frame rate field (frame_rate_code[i]), the low bit rate field (temporal_bit_rate_lower[i]), and the high bit rate field (temporal_bit_rate_upper[i]) are used to indicate the specific time-domain layer information in the corresponding track (such as the above-mentioned j-th track).
  • the multi-time-domain track identification field (multi_temporal_track_flag) is used to indicate the track encapsulation mode of the N time-domain levels of the media data; the track encapsulation modes include multi-track encapsulation and single-track encapsulation.
  • when the multi-time-domain track identification field is the first value, it indicates that the media frames belonging to the N time-domain levels are encapsulated into multiple different tracks; when the multi-time-domain track identification field is the second value, it indicates that the media frames belonging to the N time-domain levels are encapsulated into a single track.
  • the first value may be 1, and the second value may be 0.
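The mapping from the multi-time-domain track identification field to the encapsulation mode can be sketched as follows; the constant and function names are illustrative, only the 1/0 values come from the description above:

```python
# Values as stated above: first value 1 (multi-track), second value 0 (single-track).
MULTI_TRACK_FLAG = 1
SINGLE_TRACK_FLAG = 0


def encapsulation_mode(multi_temporal_track_flag: int) -> str:
    """Map the multi-time-domain track identification field to the
    track encapsulation mode it indicates."""
    if multi_temporal_track_flag == MULTI_TRACK_FLAG:
        return "multi-track"
    if multi_temporal_track_flag == SINGLE_TRACK_FLAG:
        return "single-track"
    raise ValueError("unknown multi_temporal_track_flag value")
```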
  • the time-domain layer number field (temporal_layer_num) is used to indicate the number of time-domain layers contained in the current track (that is, the aforementioned j-th track).
  • the value of the time-domain layer number field may be greater than 1, meaning that the j-th track encapsulates a number of time-domain layers equal to that value; alternatively, in multi-time-domain-track encapsulation, the value of this field may be 1, meaning that one time-domain level of the media data is encapsulated into the j-th track. The content consumption device can further read the values of the related fields from the description data box during decoding, so as to decode and display according to those values; here the description data box is a data box of type 'tlin'.
  • the description data box in the j-th track includes the specific time-domain layer information of the j-th track. The time-domain layer information includes a time-domain level identification field (temporal_layer_id[i]), which is used to indicate the identifier (ID) of a single time-domain level; each of the N time-domain levels corresponds to one temporal_layer_id. That is, the time-domain level identification field can indicate, within the time-domain hierarchy encapsulated by the j-th track, the level identifier of the i-th time-domain level.
  • the time-domain level information in the j-th track also includes a frame rate field (frame_rate_code[i]) and bit rate information. The frame rate field is used to indicate the frame rate accumulated up to the media frames belonging to the i-th time-domain level; the bit rate information is used to indicate the bit rate accumulated up to the media frames belonging to the i-th time-domain level (i.e. the time-domain level equal to temporal_layer_id[i]).
  • the bit rate information includes a low bit rate field (temporal_bit_rate_lower[i]), which is used to indicate the lower 18 bits of the bit rate accumulated up to the media frames belonging to the i-th time-domain level, and a high bit rate field (temporal_bit_rate_upper[i]), which is used to indicate the upper 12 bits of that bit rate.
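The split into a low 18-bit field and a high 12-bit field means the reader must recombine them into one 30-bit cumulative bit rate. A minimal sketch (the function name is illustrative, the bit widths come from the field definitions above):

```python
def temporal_bit_rate(upper_12: int, lower_18: int) -> int:
    """Recombine the cumulative bit rate for the i-th time-domain level from
    temporal_bit_rate_upper[i] (upper 12 bits) and temporal_bit_rate_lower[i]
    (lower 18 bits)."""
    assert 0 <= upper_12 < (1 << 12) and 0 <= lower_18 < (1 << 18)
    return (upper_12 << 18) | lower_18
```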
  • when the value of the multi-time-domain track identification field in the description data box is 0, indicating that the media frames belonging to the N time-domain levels of the media data are encapsulated into a single track, the content consumption device can determine the frame rate and bit rate corresponding to each time-domain level by reading the values of the time-domain level identification field, the frame rate field, the low bit rate field, and the high bit rate field, so that the content consumption device can, in light of its own decoding performance, select media frames belonging to some or all of the time-domain levels for decoding and display.
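The single-track selection step described above can be sketched as follows. The dict keys and the frame-rate-only selection criterion are illustrative assumptions; the cumulative semantics of 'frame_rate' and 'bit_rate' follow the field definitions above:

```python
def select_levels(levels, max_frame_rate):
    """levels: list of dicts with keys 'temporal_layer_id', 'frame_rate',
    and 'bit_rate', where frame_rate/bit_rate are cumulative up to that
    level (as recorded in the 'tlin' box). Keep the largest set of levels
    whose cumulative frame rate the device can decode."""
    kept = []
    for level in sorted(levels, key=lambda lv: lv["temporal_layer_id"]):
        if level["frame_rate"] > max_frame_rate:
            break  # higher levels only add frames, so stop at the first mismatch
        kept.append(level["temporal_layer_id"])
    return kept
```

A device capable of 60 fps would keep levels L0 and L1 of a 30/60/120 fps hierarchy and discard L2.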
  • when the value of the multi-time-domain track identification field is 1, it means that the media frames belonging to the N time-domain levels of the media data are encapsulated into multiple different tracks.
  • in this case, before reading the values of the frame rate field, the low bit rate field, and the high bit rate field to determine the corresponding frame rate and bit rate, the content consumption device also needs to read the values of some other fields, including some or all of the fields described below:
  • the reference track identification field (base_track_flag) included in the time-domain track indication information.
  • the reference track identification field is used to indicate whether the j-th track is a reference track: when it takes the first value, the j-th track is a reference track; when it takes the second value, the j-th track is a non-reference track. The media frames encapsulated in the reference track are decoded independently. The first value may be 1, and the second value may be 0.
  • the frame rate and bit rate accumulated up to the media frames of each time-domain level are recorded in the reference track; that is to say, only the frame rate field, low bit rate field, and high bit rate field in the description data box of the reference track have values, while in the description data box of a non-reference track these fields are empty.
  • if the content consumption device reads the value 1 for the reference track identification field from the description data box of the j-th track, the j-th track is the reference track. The content consumption device can then also read the values of the frame rate field, the low bit rate field, and the high bit rate field from the description data box of the j-th track, so as to determine the frame rate and bit rate accumulated up to each time-domain level.
  • the total time-domain layer number field (total_temporal_layer_num).
  • the total time-domain layer number field is used to indicate the total number of time-domain layers corresponding to all the tracks contained in the current file, that is, the total number of time-domain layers encapsulated in the M tracks.
  • when the content generation device sets the value of the time-domain level identification field (temporal_layer_id[i]) in the description data box of the reference track, it records an identifier for each time-domain layer in the reference track's description data box according to the total time-domain layer number field.
  • the index type identification field is used to define, when multiple time-domain levels are encapsulated with multiple tracks, the index relationship between the reference track (also called the reference time-domain level track) and the non-reference tracks (tracks of higher time-domain levels). The reference time-domain level track is the track containing the lowest time-domain level ID; there is only one reference time-domain level track in a file, and the other tracks containing the remaining time-domain levels are tracks of higher time-domain levels.
  • a high time-domain level track shall be indexed, via a TrackReferenceBox, to the reference time-domain level track on which its decoding depends.
  • the TrackReferenceBox contains the corresponding track index type data box (TrackReferenceTypeBox), which indicates the referenced track (i.e. the reference time-domain level track) through track_IDs. The index relationship between a non-reference track and the reference track is identified by the corresponding index type identification field (reference_type) in the TrackReferenceTypeBox, which is defined as:
  • 'tlrf': the indexed track is the reference time-domain level track.
  • when the content consumption device reads the value 0 for the reference track identification field from the description data box of the j-th track, the j-th track is a non-reference track. In that case, the j-th track also includes a track index data box, which contains a track index type data box; the track index type data box includes a track identification field and an index type identification field. The track identification field stores the identifier of the reference track, and the index type identification field indicates that the indexed track is a reference track.
  • when the j-th track is a non-reference track, the values of the frame rate field, the low bit rate field, and the high bit rate field in its description data box are all empty, so the content consumption device cannot read those values through the description data box of the j-th track and therefore cannot determine the frame rate and bit rate accumulated up to each time-domain level. In this case, the content consumption device can use the index type identification field in the track index type data box, included in the track index data box of the j-th track, to index from the non-reference track (that is, the j-th track) into the reference track, and read the values of the frame rate field, the low bit rate field, and the high bit rate field from the reference track.
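The lookup described above, from a non-reference track through its 'tlrf' reference to the reference track, can be sketched with plain dicts. The dict layout (keys `base_track_flag`, `track_references`, `reference_type`, `track_IDs`) mirrors the field names in the text but is an illustrative assumption:

```python
def resolve_base_track(track, tracks_by_id):
    """A non-reference track carries no frame-rate/bit-rate values, so follow
    its 'tlrf' track reference to the reference track, where those values are
    recorded."""
    if track.get("base_track_flag") == 1:
        return track  # already the reference track
    for ref in track.get("track_references", []):
        if ref["reference_type"] == "tlrf":
            return tracks_by_id[ref["track_IDs"][0]]
    raise ValueError("non-reference track has no 'tlrf' reference")
```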
  • the track combination strategy information included in the time-domain track indication information includes the track identification field (track_ID[i]), the priority decoding presentation field (is_output_track_flag[i]), the alternative track field (is_alternative_track_flag[i]), and the alternative track identification field (alternate_track_ID).
  • the track identification field is used to indicate the identifier (ID) of a track containing part of the time-domain hierarchy; each of the M tracks corresponds to one track_ID.
  • the priority decoding presentation field is used to indicate whether the j-th track (that is, the current track) is a track that is preferentially decoded and presented: when it takes the first value, the j-th track is a preferentially decoded and presented track; when it takes the second value, it is not. The first value may be 1, and the second value may be 0.
  • the alternative track field is used to indicate whether the j-th track (that is, the current track) is an alternative to one of the M tracks: when it takes the first value, the j-th track is an alternative to one of the M tracks; when it takes the second value, it is not an alternative track. The first value may be 1, and the second value may be 0.
  • the alternative track identification field is used to indicate the identifier of the track replaced by the j-th track (that is, the current track).
  • the way the content generation device encapsulates the multiple media frames included in the media data can be divided into the following two situations:
  • media frames belonging to N temporal hierarchies can be packed into a single track.
  • after the content consumption device obtains the encapsulation file of the media data, it can read the time-domain level identification field, the frame rate field, the low bit rate field, and the high bit rate field to determine the level identifier of the i-th of the N time-domain levels, as well as the corresponding frame rate and bit rate, so that the content consumption device can, in light of its own decoding capability, select media frames of some or all time-domain levels for decoding; S502 can then be executed.
  • media frames belonging to N time-domain levels can be encapsulated into multiple different tracks.
  • when the content generation device uses multi-track encapsulation for the media frames, it records in the reference track the combination strategy for the time-domain levels encapsulated in each track, as well as the frame rate and bit rate accumulated up to the media frames of each time-domain level, and the other tracks are indexed into the reference track through the index type identification field. The content consumption device can then, based on the relevant information recorded in the reference track and its own decoding capability, select some or all of the media frames for decoding, that is, go to S502.
  • after the content consumption device acquires the description data box in the j-th track, it decodes the media data according to the time-domain track indication information in the description data box. Specifically, the content consumption device can, according to the time-domain track indication information and the decoding performance of its decoding device, reserve the time-domain levels among the N time-domain levels that match the decoding performance, and decode and display the media frames of the reserved time-domain levels.
  • when the content consumption device reserves the time-domain levels matching the decoding performance among the N time-domain levels according to the time-domain track indication information and the decoding performance of the decoding device, the time-domain track indication information includes the multi-time-domain track identification field, the time-domain level identification field, the frame rate field, and the bit rate information, where the bit rate information includes the low bit rate field and the high bit rate field. The content consumption device can read the value of the multi-time-domain track identification field; when the value read is the second value, indicating that the media frames of the N time-domain levels are encapsulated into a single track, it reads the values of the time-domain level identification field, the frame rate field, and the low and high bit rate fields in the bit rate information. From those values the content consumption device can determine the frame rate and bit rate accumulated up to each time-domain level, and reserve the time-domain levels matching the decoding performance.
  • the process of decoding and consumption by the content consumption device is as follows:
  • the content generation device encodes and encapsulates video content A. Assume that video content A has three time-domain levels L0-L2, and that the media frames belonging to these three time-domain levels are all encapsulated into one track. The frame rate and bit rate accumulated up to each time-domain level are as follows:
  • the content generation device sends the video file A to user 1 and user 2 respectively according to the requests of their content consumption devices; user 1 and user 2 each receive the corresponding file A and decode and consume it according to the frame rate and bit rate corresponding to each time-domain level in the track information. Specifically:
  • the decoding performance of the content consumption device where user 1 is located is relatively good, so it chooses to keep all the media frames from L0 to L2 for decoding and presentation, obtaining the best viewing effect.
  • the decoding performance of the content consumption device where user 2 is located is poorer, so it chooses to keep only the media frames of L0, discards the media frames of L1 and L2, and decodes and presents the reserved media frames belonging to L0.
  • when the content consumption device reserves the time-domain levels matching the decoding performance among the N time-domain levels according to the time-domain track indication information and the decoding performance of the decoding device, it can read the value of the multi-time-domain track identification field in the time-domain track indication information. When the value read is the first value, the media frames of the N time-domain levels are encapsulated into multiple different tracks. When the time-domain levels of the tracks do not overlap, the content consumption device can read, from the reference track, the values of the time-domain level identification field, the frame rate field, and the low and high bit rate fields in the bit rate information; the media frames encapsulated in the reference track are decoded independently. Then, according to those values read from the reference track together with the decoding performance of the decoding device, it reserves the time-domain levels in the tracks that partially or fully match the decoding performance.
  • when the content generation device encapsulates the media frames of the video data (or video content) belonging to the N time-domain levels into multiple different tracks, and the time-domain levels of the tracks do not overlap, the process of decoding and consumption by the content consumption device can be shown in Figure 6a, specifically as follows:
  • the content generation device encodes and encapsulates video content A. It is assumed that video content A has three time-domain levels L0-L2, and the media frames belonging to these three time-domain levels are respectively encapsulated into three different tracks, where track1 is the reference track, and track2 and track3 are indexed to track1 with type 'tlrf'. The reference track indicates the frame rate and bit rate accumulated up to each time-domain level, as follows:
  • the content generation device sends the video file A to user 1 and user 2 respectively according to the requests of their content consumption devices; user 1 and user 2 each receive the corresponding file A and decode and consume it according to the frame rate and bit rate corresponding to each time-domain level in the track information. Specifically:
  • the decoding device of the content consumption device where user 1 is located has better performance, so it chooses to keep all the media frames of track1 to track3 for decoding and presentation, obtaining the best viewing effect.
  • the decoding performance of the content consumption device where user 2 is located is poorer, so it chooses to keep only the media frames of track1, discards the media frames of track2 and track3, and decodes and presents the reserved media frames belonging to track1.
  • the time-domain track indication information also includes track combination strategy information, which includes the track identification field, the priority decoding presentation field, the alternative track field, and the alternative track identification field. The content consumption device can read the value of each field in the track combination strategy information from the reference track and, based on those values and the decoding performance of the decoding device, reserve the time-domain levels in the tracks that partially or fully match the decoding performance.
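Reading the track combination strategy information can be sketched as follows. The per-track dict keys mirror the field names above (`track_ID`, `is_output_track_flag`, `is_alternative_track_flag`, `alternate_track_ID`), but the layout and function name are illustrative assumptions:

```python
def plan_track_combination(strategy):
    """strategy: list of per-track dicts from the reference track's combination
    strategy info. Returns the track IDs to decode and present by default,
    plus a map from each replaced track to its alternative track."""
    output = [t["track_ID"] for t in strategy
              if t.get("is_output_track_flag") == 1]
    alternatives = {t["alternate_track_ID"]: t["track_ID"]
                    for t in strategy
                    if t.get("is_alternative_track_flag") == 1}
    return output, alternatives
```

A device that cannot decode one of the default output tracks could consult the returned alternatives map to substitute the replacement track.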
  • when the content generation device encapsulates the media frames of the video data (or video content) belonging to the N time-domain levels into multiple different tracks, and the time-domain levels of the tracks overlap, the process of decoding and consumption by the content consumption device can be as follows:
  • the content generation device encodes and encapsulates video content A. It is assumed that video content A has three time-domain levels L0-L2, and the media frames belonging to these three time-domain levels are respectively encapsulated into three different tracks, where track1 is the reference track, and track2 and track3 are indexed to track1 with type 'tlrf'. track2 and track3 each contain a part of the L1 and L2 media frames without overlapping each other; the decoding of track2 and track3 depends on track1, but there is no dependency between track2 and track3.
  • the reference track indicates the information used when combining the individual tracks:
  • the content generation device sends the video file A to user 1 and user 2 respectively according to the requests of their content consumption devices; user 1 and user 2 each receive the corresponding file A and decode and consume it according to the frame rate and bit rate corresponding to each time-domain level in the track information. Specifically:
  • the decoding device of the content consumption device where user 1 is located has better performance, so it chooses to keep all the media frames of track1 to track3 for decoding and presentation, obtaining the best viewing effect.
  • the decoding performance of the content consumption device where user 2 is located is poorer, so it chooses to keep all the media frames of track1 and track2, discards the media frames of track3, and decodes and presents the reserved media frames of track1 and track2.
  • when the number of reserved media frames is one or more and the content consumption device decodes and displays the media frames of the reserved time-domain levels, it can reorder (that is, reconstruct) the reserved media frames according to the decoding time of each media frame, and then decode and display the reordered media frames. That is to say, when combining the media frames of different tracks, the content consumption device arranges all the media frames in the selected tracks in decoding-time order according to the decoding time recorded for each media frame during encapsulation, and decodes them after reconstruction.
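The reordering step above, merging the samples of several selected tracks into one decode-order sequence, can be sketched as a sorted merge. The (decode_time, frame) tuple layout is an illustrative assumption:

```python
import heapq


def reconstruct_stream(selected_tracks):
    """selected_tracks: list of sample lists, each a sequence of
    (decode_time, frame) pairs already in decode order within its own track.
    Merge all selected tracks into a single sequence ordered by decode time
    before handing the frames to the decoder."""
    return [frame for _, frame in heapq.merge(*selected_tracks)]
```

For example, merging a reference track holding the I and P frames with a higher-level track holding the B frames interleaves them back into decode order.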
  • in the embodiments of this application, the content generation device is supported in encapsulating media frames of multiple time-domain levels into different tracks. During encapsulation, the time-domain track indication information is recorded in the description data box of each track, so that tracks of different time-domain levels are associated, the time-domain level information in each track is indicated, the reference track is marked, and the strategy information for combining the multi-time-domain-level tracks is indicated. The content consumption device can then select the appropriate time-domain levels according to the description data boxes in the tracks and combine the samples of different tracks for decoding and presentation, which ensures the flexibility of multi-track encapsulation and saves decoding computing resources to the greatest extent.
  • FIG. 7 is a schematic flowchart of a media data processing method provided in the embodiment of the present application. As shown in FIG. 7, the method may include:
  • S702: Encapsulate the media frames of the N time-domain levels into the M tracks respectively, and generate corresponding description data boxes. The description data box of the j-th of the M tracks includes time-domain track indication information, which is used to indicate the track encapsulation mode of the N time-domain levels and includes the time-domain level information of the time-domain levels encapsulated in the j-th track, where M is a positive integer greater than 1.
  • the inter-frame dependencies of the media frames in the media data may be as shown in FIG. 4, and the time-domain level of each media frame may then be determined based on the inter-frame dependencies.
  • the time-domain track indication information includes a multi-time-domain track identification field, which is used to indicate the track encapsulation mode of the N time-domain levels.
  • when the content generation device generates the description data box of the j-th track according to the encapsulation process of the media frames: if it encapsulates the media frames of the N time-domain levels into multiple different tracks, it generates the first value for the multi-time-domain track identification field; if it encapsulates the media frames of the N time-domain levels into a single track, it generates the second value for the multi-time-domain track identification field.
  • the time-domain track indication information also includes the total time-domain layer number field, so when the content generation device generates the description data box of the j-th track according to the media frame encapsulation process, it also generates the value of the total time-domain layer number field according to the total number of time-domain layers encapsulated in the M tracks.
  • the time-domain layer information of the j-th track includes a time-domain layer number field. When the content generation device generates the description data box of the j-th track according to the encapsulation process of the media frames, it generates the value of the time-domain layer number field according to the number of time-domain layers encapsulated in the j-th track.
  • the time-domain level information of the j-th track includes a time-domain level identification field, a frame rate field, and code rate information.
  • the level identification of the i-th time-domain level can be stored in the time-domain level identification field, the frame rate accumulated up to the media frames belonging to the i-th time-domain level is stored in the frame rate field, and the code rate accumulated up to the media frames belonging to the i-th time-domain level is used as the code rate information.
  • the code rate information includes a low code rate field and a high code rate field; then, when the content generation device uses the code rate accumulated up to the media frames belonging to the i-th time-domain level as the code rate information, the low 18 bits of that code rate can be stored in the low code rate field, and the high 12 bits of that code rate can be stored in the high code rate field.
  • the time-domain track indication information also includes a reference track identification field, so when the content generation device generates the description data box of the j-th track according to the media frame encapsulation process, if the j-th track is a reference track, the value of the reference track identification field is generated as the first value, and if the j-th track is a non-reference track, the value of the reference track identification field is generated as the second value; the media frames encapsulated in the reference track can be decoded independently.
  • the time-domain track indication information also includes track combination strategy information
  • the track combination strategy information includes a track identification field, a priority decoding presentation field, a substitute track field, and a substitute track identification field; when generating the description data box of the j-th track, the identifier of a track containing part of the time-domain levels can be stored in the track identification field, and if the j-th track is a track that is preferentially decoded and presented, the value of the priority decoding presentation field is generated as the first value
  • if the j-th track is not a track that is preferentially decoded and presented, the value of the priority decoding presentation field is generated as the second value; and, if the j-th track is a substitute track of another track, the value of the substitute track field is generated as the first value
  • and the identifier of the track replaced by the j-th track is stored in the substitute track identification field; if the j-th track is not a substitute track, the value of the substitute track field is generated as the second value, where the first value may be 1 and the second value may be 0.
  • the content generation device will generate a track index data box of the j-th track, the track index data box includes a track index type data box, and the track index type data box includes a track identification field and an index type identification field.
  • the content generation device may store the identification of the reference track in the track identification field, and index the jth track to the reference track according to the index type identification field.
  • the content generation device can determine the time-domain level of each media frame through the inter-frame dependencies among the media frames included in the media data, encapsulate the media frames of the N time-domain levels into M tracks, and, based on the encapsulation process of the media frames, generate the description data box of the j-th track, setting corresponding values in it for the fields included in the time-domain track indication information; the values of these fields associate tracks of different time-domain levels, indicate the time-domain level information within a track, mark the reference track, and indicate the strategy for combining tracks of multiple time-domain levels, thereby indicating to the content consumption device the media frame encapsulation process of the content generation device, so that the content consumption device can select appropriate time-domain levels for decoding and presentation according to the values of the fields in the track description data boxes, which ensures the flexibility of multi-track encapsulation and maximizes the saving of decoding computing resources.
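A minimal Python sketch of the S702 step described above: frames tagged with a time-domain level are distributed over tracks, and each track receives a description record carrying the indication fields named in the text. The dictionary keys and the `level_to_track` mapping are illustrative assumptions, not the normative box syntax.

```python
# Hypothetical sketch of S702: distribute level-tagged frames over tracks and
# build per-track description records with the indication fields from the text.

def encapsulate(frames, level_to_track, reference_track):
    """frames: list of (frame_id, level); level_to_track: level -> track id."""
    tracks = {}
    for frame_id, level in frames:
        tracks.setdefault(level_to_track[level], []).append((frame_id, level))

    multi_track = len(set(level_to_track.values())) > 1
    boxes = {}
    for track_id, samples in tracks.items():
        levels = sorted({lvl for _, lvl in samples})
        boxes[track_id] = {
            # first value (1) when levels span multiple tracks, else second value (0)
            "multi_temporal_track_flag": 1 if multi_track else 0,
            "total_temporal_layer_num": len(set(level_to_track)),  # over all M tracks
            "temporal_layer_num": len(levels),                     # in this track only
            "temporal_layer_ids": levels,
            "is_reference_track": 1 if track_id == reference_track else 0,
        }
    return tracks, boxes
```

In a real encapsulator these values would be written into the ISOBMFF description data box rather than a Python dictionary; the point is only the mapping from the encapsulation decision to the field values.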
  • FIG. 8 shows a schematic structural diagram of a media data processing device provided by an exemplary embodiment of the present application
  • the media data processing device may be a computer program (including program code), for example, the media data processing device may be an application software in the content consumption device.
  • the apparatus for processing media data may include: an obtaining unit 801 and a processing unit 802 .
  • the media data processing device can be used to perform corresponding steps in the method shown in FIG. 5;
  • the media data includes a plurality of media frames, the plurality of media frames are divided into N time-domain levels, and the plurality of media frames are encapsulated into M tracks, where M and N are both positive integers greater than 1; then:
  • the acquiring unit 801 is configured to acquire the description data box of the j-th track in the M tracks, the description data box includes time domain track indication information, and the time domain track indication information is used to indicate that the N time domain A track encapsulation method at the domain level, the time domain track indication information includes time domain level information at the time domain level of the jth track encapsulation, where j is a positive integer, and j ⁇ M;
  • the processing unit 802 is configured to decode the media data according to the time domain track indication information.
  • the time-domain track indication information includes a multi-time-domain track identification field, and the multi-time-domain track identification field is used to indicate the track encapsulation mode of the N time-domain levels;
  • the multi-time-domain track identification field is used to indicate that multiple media frames belonging to the N time-domain levels are encapsulated into multiple different tracks;
  • the multi-time-domain track identification field is used to indicate that multiple media frames belonging to the N time-domain levels are encapsulated into a single track.
  • the time-domain track indication information includes a field of total number of time-domain layers; the field of total number of time-domain layers is used to indicate the total number of time-domain layers encapsulated by the M tracks.
  • the time-domain layer information of the jth track includes a time-domain layer number field, and the time-domain layer number field is used to indicate the number of time-domain layers encapsulated by the j-th track.
  • the time domain level information of the jth track includes a time domain level identification field, a frame rate field and code rate information
  • the time-domain level identification field is used to indicate the level identification of the i-th time-domain level among the time-domain levels encapsulated in the j-th track;
  • the frame rate field is used to indicate the frame rate accumulated to the media frame belonging to the i-th time domain level
  • the code rate information is used to indicate the code rate accumulated to the media frames belonging to the i-th time domain level.
  • the code rate information includes a low code rate field and a high code rate field
  • the low code rate field is used to indicate the low 18 bits of the code rate accumulated to the media frame belonging to the i-th time domain level
  • the high code rate field is used to indicate the upper 12 bits of the code rate accumulated to the media frame belonging to the i-th time domain level.
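The two fields above split a single cumulative code rate value of up to 30 bits into an 18-bit low part and a 12-bit high part. A sketch of the split and its reassembly (the field names in the comments are assumptions for illustration):

```python
# Split a cumulative bit rate into the low 18 bits and the high 12 bits,
# as the low code rate field and high code rate field described in the text.

def split_bitrate(bitrate):
    assert 0 <= bitrate < (1 << 30), "code rate must fit in 30 bits"
    low = bitrate & ((1 << 18) - 1)   # low code rate field: low 18 bits
    high = bitrate >> 18              # high code rate field: high 12 bits
    return low, high

def join_bitrate(low, high):
    # the consumer recombines the two fields into the original value
    return (high << 18) | low
```

A 5 Mbit/s cumulative rate, for example, splits into `high = 19` and `low = 19264`, and `join_bitrate` recovers the original value exactly.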
  • the time-domain track indication information includes a reference track identification field; the reference track identification field is used to indicate whether the j-th track is a reference track;
  • when the reference track identification field is the first value, the reference track identification field is used to indicate that the j-th track is a reference track; when the reference track identification field is the second value, the reference track identification field is used to indicate that the j-th track is a non-reference track;
  • the media frames encapsulated in the reference track are independently decoded.
  • the time-domain track indication information further includes track combination strategy information
  • the track combination strategy information includes a track identification field, a priority decoding presentation field, a replacement track field, and a replacement track identification field;
  • the track identification field is used to indicate the identification of the track containing part of the time domain hierarchy
  • the priority decoding presentation field is used to indicate whether the jth track is a track for priority decoding presentation; when the priority decoding presentation field is the first value, the priority decoding presentation field is used to indicate the jth track The track is a track for priority decoding presentation; when the priority decoding presentation field is a second value, the priority decoding presentation field is used to indicate that the jth track is not a track for priority decoding presentation;
  • the alternative track field is used to indicate whether the jth track is an alternative track of one of the M tracks; when the alternative track field is the first value, the alternative track field is used to indicate the The jth track is a substitute track for one of the M tracks; when the substitute track field is a second value, the substitute track field is used to indicate that the jth track is not a substitute track;
  • the replacement track identification field is used to indicate the identification of a track replaced by the jth track.
  • if the j-th track is a non-reference track, the j-th track further includes a track index data box, and the track index data box includes a track index type data box;
  • the track index type data box includes a track identification field and an index type identification field
  • the track identification field is used to store the identification of the reference track
  • the index type identification field is used to indicate that the indexed track is the reference track.
  • processing unit 802 is specifically configured to:
  • according to the decoding performance of the decoding device, reserve the time-domain levels matching the decoding performance among the N time-domain levels;
  • the time-domain track indication information includes a multi-time-domain track identification field, a time-domain level identification field, a frame rate field, and code rate information
  • the code rate information includes a low code rate field and a high code rate field
  • the processing unit 802 is specifically configured to:
  • read the multi-time-domain track identification field in the time-domain track indication information, and when the read multi-time-domain track identification field is the second value, it indicates that the media frames of the N time-domain levels are encapsulated into a single track; then read the value of the time-domain level identification field, the value of the frame rate field, and the value of the low code rate field and the value of the high code rate field in the code rate information;
  • according to the decoding performance of the decoding device, reserve the time-domain levels matching the decoding performance among the N time-domain levels.
  • the time-domain track indication information includes a multi-time-domain track identification field, a time-domain level identification field, a frame rate field, and code rate information
  • the code rate information includes a low code rate field and a high code rate field
  • the processing unit 802 is specifically configured to:
  • the value of the low code rate field and the value of the high code rate field; the media frames encapsulated in the reference track can be decoded independently;
  • according to the decoding performance of the decoding device, part or all of the time-domain levels in the tracks matching the decoding performance are reserved.
  • the time-domain track indication information further includes track combination strategy information
  • the track combination strategy information includes a track identification field, a priority decoding presentation field, a replacement track field, and a replacement track identification field;
  • the processing unit 802 is further configured to, when the read multi-time-domain track identification field is a first value, indicate that the media frames of the N time-domain levels are encapsulated into multiple different tracks, and in each track There is overlap in the time-domain hierarchy, and the value of each field in the track combination strategy information is read from the reference track;
  • the processing unit 802 is further configured to, according to the values of the fields in the track combination strategy information and the decoding performance of the decoding device, reserve part or all of the time domain levels in the tracks that match the decoding performance .
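The level-selection step above can be sketched as follows: given the per-level cumulative frame rates signaled in the description data boxes, the consumer keeps only the levels whose cumulative frame rate the decoder can sustain. Modeling the decoding performance as a single `max_fps` ceiling is an assumption for illustration.

```python
# Keep the time-domain levels whose cumulative frame rate (as signaled in the
# frame rate field per level) fits within the decoder's capability.

def select_levels(level_frame_rates, max_fps):
    """level_frame_rates: {level_id: cumulative fps up to and including this level}."""
    return [lvl for lvl, fps in sorted(level_frame_rates.items()) if fps <= max_fps]
```

Because the frame rates are cumulative and levels only depend downward, the kept set is always a prefix of the level list, which is exactly what makes discarding the higher levels safe.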
  • processing unit 802 is specifically configured to:
  • the content generation device is supported in encapsulating media frames of multiple time-domain levels into different tracks, and during the encapsulation process, the time-domain track indication information is recorded in the description data box of each track, so as to associate tracks of different time-domain levels, indicate the time-domain level information within a track, mark the reference track, and indicate the strategy information for combining tracks of multiple time-domain levels.
  • the processing unit 802 can select appropriate time-domain levels according to the description data boxes in the tracks, and combine the samples of different tracks for decoding and presentation, thereby ensuring the flexibility of multi-track encapsulation and maximizing the saving of decoding computing resources.
  • FIG. 9 shows a schematic structural diagram of a media data processing device provided in an exemplary embodiment of the present application
  • the media data processing device may be a computer program (including program code), for example, the media data processing device may be an application software in the content generating device.
  • the apparatus for processing media data may include: a determination unit 901 and a processing unit 902 .
  • the device for processing media data may be used to perform corresponding steps in the method shown in FIG. 7; then:
  • the determining unit 901 is configured to determine the time-domain level of each media frame according to the inter-frame dependency of each media frame included in the media data, and obtain N media frames of the time-domain level; wherein, N is greater than 1 positive integer;
  • the processing unit 902 is configured to encapsulate the media frames of the N time-domain levels into M tracks respectively, and generate corresponding description data boxes; the description data box of the j-th track in the M tracks includes time-domain track indication information, the time-domain track indication information is used to indicate the track encapsulation mode of the N time-domain levels, and the time-domain track indication information includes the time-domain level information of the time-domain levels encapsulated in the j-th track, where M is a positive integer greater than 1.
  • the time-domain track indication information includes a multi-time-domain track identification field, and the multi-time-domain track identification field is used to indicate the track encapsulation mode of the N time-domain levels; the processing unit 902 is specifically configured to:
  • the value of the multi-time-domain track identification field is generated to be a second value.
  • the time-domain track indication information includes a total time-domain layer number field; the processing unit 902 is specifically configured to:
  • the time-domain layer information of the jth track includes a time-domain layer number field; the processing unit 902 is specifically configured to:
  • the time domain level information of the jth track includes a time domain level identification field, a frame rate field and code rate information; the processing unit 902 is specifically configured to:
  • the code rate accumulated to the media frame belonging to the i-th time domain level is used as the code rate information.
  • the code rate information includes a low code rate field and a high code rate field; the processing unit 902 is specifically configured to:
  • the upper 12 bits of the code rate of the media frame belonging to the i-th time domain level are stored in the high code rate field.
  • the time-domain track indication information includes a reference track identification field; the processing unit 902 is specifically configured to:
  • the media frames encapsulated in the reference track are independently decoded.
  • the time-domain track indication information further includes track combination strategy information
  • the track combination strategy information includes a track identification field, a priority decoding presentation field, a replacement track field, and a replacement track identification field
  • the processing unit 902 specifically for:
  • if the j-th track is a track that is preferentially decoded and presented, generate the value of the priority decoding presentation field as the first value; if the j-th track is not a track that is preferentially decoded and presented, generate the value of the priority decoding presentation field as the second value;
  • the jth track is a replacement track of a track, generate the value of the replacement track field as the first value, and store the identifier of a track replaced by the jth track in the replacement track identifier field; if the jth track is not a substitute track, generate the value of the substitute track field as the second value.
  • the processing unit 902 is further configured to generate a track index data box of the j-th track if the j-th track is a non-reference track, and the track index data box includes track index type data box; the track index type data box includes a track identification field and an index type identification field;
  • the processing unit 902 is further configured to store the identifier of the reference track in the track identifier field, and index the jth track to the reference track according to the index type identifier field.
  • the processing unit 902 can determine the time-domain level of each media frame through the inter-frame dependencies among the media frames included in the media data, encapsulate the media frames of the N time-domain levels into M tracks, and, based on the encapsulation process of the media frames, generate the description data box of the j-th track, setting corresponding values in it for the fields included in the time-domain track indication information; the values of these fields associate tracks of different time-domain levels, indicate the time-domain level information within a track, mark the reference track, and indicate the strategy for combining tracks of multiple time-domain levels, thereby indicating to the content consumption device the media frame encapsulation process of the processing unit 902, so that the content consumption device can select appropriate time-domain levels for decoding and presentation according to the values of the fields in the track description data boxes, which ensures the flexibility of multi-track encapsulation and maximizes the saving of decoding computing resources.
  • FIG. 10 is a schematic structural block diagram of a computer device provided by an embodiment of the present application.
  • the computer device may be the above-mentioned content consumption device, or may also be the above-mentioned content generation device, wherein the computer device may be a server or a terminal device.
  • the computer device in this embodiment as shown in FIG. 10 may include: one or more processors 101 ; one or more input devices 102 , one or more output devices 103 and memory 104 .
  • the aforementioned processor 101 , input device 102 , output device 103 and memory 104 are connected through a bus 105 .
  • the memory 104 is used to store computer programs, and the computer program includes program instructions, and the processor 101 is used to execute the program instructions stored in the memory 104 .
  • the memory 104 may include a volatile memory, such as a random-access memory (RAM); the memory 104 may also include a non-volatile memory, such as a flash memory or a solid-state drive (SSD); the memory 104 may also include a combination of the above types of memory.
  • the processor 101 may be a central processing unit (central processing unit, CPU).
  • the processor 101 may further include a hardware chip.
  • the aforementioned hardware chip may be an application-specific integrated circuit (application-specific integrated circuit, ASIC), a programmable logic device (programmable logic device, PLD), and the like.
  • the PLD may be a field-programmable gate array (field-programmable gate array, FPGA), a general array logic (generic array logic, GAL) or the like.
  • the processor 101 may also be a combination of the above structures.
  • the memory 104 is used to store computer programs, the computer programs include program instructions, and the processor 101 is used to execute the program instructions stored in the memory 104 to implement the above-mentioned media data processing method shown in FIG. 5, where the media data includes a plurality of media frames, the plurality of media frames are divided into N time-domain levels, and the plurality of media frames are encapsulated into M tracks, where M and N are both positive integers greater than 1.
  • the processor 101 is configured to invoke the program instructions for executing:
  • the description data box includes time-domain track indication information
  • the time-domain track indication information is used to indicate the track packaging mode of the N time-domain levels
  • the time-domain track indication information includes time-domain level information of the time-domain level encapsulated by the j-th track, where j is a positive integer, and j ⁇ M;
  • the memory 104 is used to store a computer program
  • the computer program includes program instructions
  • the processor 101 is used to execute the program instructions stored in the memory 104, and can also be used to implement the above-mentioned steps of the corresponding method in FIG. 7 .
  • the processor 101 is configured to invoke the program instructions for executing:
  • according to the inter-frame dependency of each media frame included in the media data, determine the time-domain level of each media frame, and obtain media frames of N time-domain levels; wherein N is a positive integer greater than 1;
  • the description data box of the j-th track in the M tracks includes time-domain track indication information
  • the time-domain track indication information is used to indicate the track encapsulation mode of the N time-domain levels
  • the time-domain track indication information includes the time-domain level information of the time-domain levels encapsulated in the j-th track, where M is a positive integer greater than 1.
  • an embodiment of the present application further provides a storage medium, where the storage medium is used to store a computer program, and the computer program is used to execute the method provided in the foregoing embodiments.
  • the embodiment of the present application also provides a computer program product including instructions, which, when run on a computer, causes the computer to execute the method provided in the foregoing embodiments.


Abstract

Embodiments of this application provide a media data processing method and related devices. The method includes: obtaining the description data box of the j-th track among the M tracks, where the description data box includes time-domain track indication information, the time-domain track indication information is used to indicate the track encapsulation mode of the N time-domain levels, and the time-domain track indication information includes the time-domain level information of the time-domain levels encapsulated in the j-th track, where j is a positive integer and j ≤ M; and decoding the media data according to the time-domain track indication information. This improves the flexibility of multi-track encapsulation of different time-domain levels.

Description

A media data processing method and related device
This application claims priority to Chinese patent application No. 202110656768.4, entitled "A media data processing method and related device", filed with the China National Intellectual Property Administration on June 11, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to the processing of media data.
Background
Related video coding technologies support the division of media frames into temporal levels, and system-layer encapsulation technologies also indicate the different temporal levels within a track. However, some system-layer encapsulation technologies only consider the scenario in which a video bitstream is encapsulated into a single track.
Summary
Embodiments of this application provide media data processing methods and related devices, which can improve the flexibility of encapsulating different time-domain levels.
In one aspect, an embodiment of this application provides a media data processing method. The media data includes a plurality of media frames, the plurality of media frames are divided into N time-domain levels, and the plurality of media frames are encapsulated into M tracks, where M and N are both positive integers greater than 1. The method includes:
obtaining the description data box of the j-th track among the M tracks, where the description data box includes time-domain track indication information, the time-domain track indication information is used to indicate the track encapsulation mode of the N time-domain levels, and the time-domain track indication information includes the time-domain level information of the time-domain levels encapsulated in the j-th track, where j is a positive integer and j ≤ M;
decoding the media data according to the time-domain track indication information.
In one aspect, an embodiment of this application provides a media data processing method, including:
determining the time-domain level of each media frame according to the inter-frame dependency of each media frame included in the media data, to obtain media frames of N time-domain levels, where N is a positive integer greater than 1;
encapsulating the media frames of the N time-domain levels into M tracks respectively, and generating corresponding description data boxes, where the description data box of the j-th track among the M tracks includes time-domain track indication information, the time-domain track indication information is used to indicate the track encapsulation mode of the N time-domain levels, and the time-domain track indication information includes the time-domain level information of the time-domain levels encapsulated in the j-th track, where M is a positive integer greater than 1.
In one aspect, an embodiment of this application provides a media data processing apparatus. The media data includes a plurality of media frames, the plurality of media frames are divided into N time-domain levels, and the plurality of media frames are encapsulated into M tracks, where M and N are both positive integers greater than 1. The apparatus includes:
an obtaining unit, configured to obtain the description data box of the j-th track among the M tracks, where the description data box includes time-domain track indication information, the time-domain track indication information is used to indicate the track encapsulation mode of the N time-domain levels, and the time-domain track indication information includes the time-domain level information of the time-domain levels encapsulated in the j-th track, where j is a positive integer and j ≤ M;
a processing unit, configured to decode the media data according to the time-domain track indication information.
In one aspect, an embodiment of this application provides another media data processing apparatus, including:
a determination unit, configured to determine the time-domain level of each media frame according to the inter-frame dependency of each media frame included in the media data, to obtain media frames of N time-domain levels, where N is a positive integer greater than 1;
a processing unit, configured to encapsulate the media frames of the N time-domain levels into M tracks respectively, and generate corresponding description data boxes, where the description data box includes time-domain track indication information, the time-domain track indication information is used to indicate the track encapsulation mode of the N time-domain levels, and the time-domain track indication information includes the time-domain level information of the time-domain levels encapsulated in the j-th track, where M is a positive integer greater than 1.
In one aspect, an embodiment of this application provides a computer device, including:
a processor, adapted to implement one or more instructions; and
a memory storing one or more instructions, the one or more instructions being adapted to be loaded by the processor to execute the media data processing method of the above aspects.
In one aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program, the computer program being used to execute the media data processing method of the above aspects.
In yet another aspect, an embodiment of this application provides a computer program product including instructions which, when run on a computer, cause the computer to execute the media data processing method of the above aspects.
In the embodiments of this application, the content generation device is supported in encapsulating media frames of multiple time-domain levels into different tracks, and during the encapsulation process, time-domain track indication information is recorded in the description data box of each track, so as to associate tracks of different time-domain levels, indicate the time-domain level information within a track, mark the reference track, and indicate the strategy information for combining tracks of multiple time-domain levels. The content consumption device can then select appropriate time-domain levels according to the description data boxes in the tracks, and combine samples from different tracks for decoding and presentation, which ensures the flexibility of multi-track encapsulation and maximizes the saving of decoding computing resources.
Brief Description of the Drawings
FIG. 1 shows a flowchart of video processing provided by an exemplary embodiment of this application;
FIG. 2 shows a schematic flowchart of a method for processing media data provided by an exemplary embodiment of this application;
FIG. 3 shows a schematic diagram of coding units provided by an exemplary embodiment of this application;
FIG. 4 shows a schematic diagram of time-domain level division provided by an exemplary embodiment of this application;
FIG. 5 shows a schematic flowchart of a media data processing method provided by an exemplary embodiment of this application;
FIG. 6a shows a schematic diagram of a multi-time-domain multi-track encapsulation mode provided by an exemplary embodiment of this application;
FIG. 6b shows a schematic diagram of a multi-time-domain multi-track encapsulation mode provided by an exemplary embodiment of this application;
FIG. 7 shows a schematic flowchart of a media data processing method provided by an exemplary embodiment of this application;
FIG. 8 shows a schematic structural diagram of a media data processing apparatus provided by an exemplary embodiment of this application;
FIG. 9 shows a schematic structural diagram of a media data processing apparatus provided by an exemplary embodiment of this application;
FIG. 10 shows a schematic structural diagram of a computer device provided by an exemplary embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The embodiments of this application relate to media data processing technology. Media data (also called multimedia data) refers to composite data formed from data of content-related media such as text, graphics, images, sound, animation, and moving pictures. The media data mentioned in the embodiments of this application mainly includes audio data composed of sound and video data composed of images and sound. In the embodiments of this application, the data processing process is described in detail mainly by taking the media data being video data as an example; when the media data is audio data, reference may be made to these embodiments. The processing of media data involved in the embodiments of this application mainly includes media data acquisition, media data encoding, media data file encapsulation, media data file transmission, media data decoding, and final data presentation. When the media data is video data, the complete processing procedure for the video data may be as shown in FIG. 1, and may specifically include: video acquisition, video encoding, video file encapsulation, video transmission, video file decapsulation, video decoding, and final video presentation.
Video acquisition is used to convert analog video into digital video and save it in the format of a digital video file; that is, video acquisition converts a video signal into binary digital information. The binary information converted from the video signal is a binary data stream, which may also be called the code stream or bitstream of the video signal. Video encoding refers to converting a file in an original video format into a file of another video format through compression technology. The generation of video media content mentioned in the embodiments of this application includes real-world scenes captured by cameras and screen content scenes generated by computers. In terms of how a video signal is obtained, video signals can be divided into two types: captured by a camera or generated by a computer. Due to their different statistical characteristics, the corresponding compression encoding methods may also differ. Modern mainstream video coding technologies, taking the international video coding standards HEVC (High Efficiency Video Coding, HEVC/H.265) and VVC (Versatile Video Coding, VVC/H.266), and the Chinese national video coding standard AVS (Audio Video Coding Standard) or AVS3 (the third-generation video coding standard introduced by the AVS standards group) as examples, adopt a hybrid coding framework and perform the following series of operations and processing on the input original video signal, as shown in FIG. 2:
① Block partition structure: the input image (such as a media frame in the video data) is divided into several non-overlapping processing units according to a given size, and a similar compression operation is performed on each processing unit. Such a processing unit is called a CTU (Coding Tree Unit) or an LCU (Largest Coding Unit). The coding tree unit is generally divided downward starting from the largest coding unit; below the CTU, finer division can continue, yielding one or more basic coding units called CUs (Coding Units). Each CU is the most basic element in a coding step. The following describes the various coding modes that may be adopted for each CU; the relationship between an LCU (or CTU) and CUs may be as shown in FIG. 3.
② Predictive Coding: includes intra prediction, inter prediction, and other modes. After the original video signal is predicted from the selected reconstructed video signal, a residual video signal is obtained. The encoder needs to decide, among the many possible predictive coding modes, the most suitable one for the current CU, and inform the decoder.
a. Intra (picture) Prediction: the predicted signal comes from a region within the same image that has already been coded and reconstructed.
b. Inter (picture) Prediction: the predicted signal comes from other images (called reference images) that have already been coded and are different from the current image.
③ Transform & Quantization: the residual video signal undergoes transform operations such as the DFT (Discrete Fourier Transform) or the DCT (Discrete Cosine Transform, a subset of the DFT) to convert the signal into the transform domain, where it is represented as transform coefficients. The signal in the transform domain further undergoes a lossy quantization operation, losing some information, so that the quantized signal is favorable for compressed representation.
In some video coding standards, there may be more than one transform to choose from, so the encoder also needs to select one of them for the current coding CU and inform the decoder. The fineness of quantization is usually determined by the quantization parameter (QP). A larger QP value means that coefficients within a larger value range will be quantized to the same output, and therefore usually brings greater distortion and a lower bit rate; conversely, a smaller QP value means that coefficients within a smaller value range will be quantized to the same output, and therefore usually brings less distortion while corresponding to a higher bit rate.
④ Entropy Coding or statistical coding: the quantized transform-domain signal is statistically compressed and coded according to the frequency of occurrence of each value, and finally a binarized (0 or 1) compressed code stream is output. At the same time, other information produced by encoding, such as the selected mode and motion vectors, also needs to be entropy coded to reduce the bit rate.
Statistical coding is a lossless coding method that can effectively reduce the bit rate required to express the same signal. Common statistical coding methods include Variable Length Coding (VLC) or Content Adaptive Binary Arithmetic Coding (CABAC).
⑤ Loop Filtering: a coded image, after inverse quantization, inverse transform, and prediction compensation (the inverse operations of ② to ④ above), yields a reconstructed decoded image. Compared with the original image, due to the effect of quantization, part of the information in the reconstructed image differs from the original image, producing distortion. Performing filtering operations on the reconstructed image, such as deblocking, SAO (Sample Adaptive Offset), or ALF (Adaptive Loop Filter), can effectively reduce the degree of distortion produced by quantization. Since these filtered reconstructed images serve as references for subsequently coded images to predict future signals, the above filtering operations are also called loop filtering, i.e., filtering operations within the coding loop.
FIG. 2 shows the basic flow of a video encoder, taking the k-th CU (denoted S_k[x,y]) as an example, where k is a positive integer from 1 to the number of CUs in the input current image, S_k[x,y] denotes the pixel with coordinates [x,y] in the k-th CU, x denotes the horizontal coordinate of the pixel, and y denotes the vertical coordinate. S_k[x,y] undergoes one of the better processes among motion compensation, intra prediction, and the like to obtain the prediction signal Ŝ_k[x,y]. Subtracting Ŝ_k[x,y] from S_k[x,y] gives the residual signal U_k[x,y], which is then transformed and quantized. The quantized output data goes to two different destinations: one is the entropy coder for entropy coding, with the coded code stream output to a buffer to be saved and await transmission; the other is inverse quantization and inverse transform, yielding the signal U'_k[x,y]. Adding U'_k[x,y] to Ŝ_k[x,y] gives a new prediction signal S*_k[x,y], which is saved to the buffer of the current image. S*_k[x,y] undergoes intra-image prediction to obtain f(S*_k[x,y]), and undergoes loop filtering to obtain S'_k[x,y], which is sent to the decoded image buffer to be saved for generating the reconstructed video. S'_k[x,y] undergoes motion-compensated prediction to obtain S'_r[x+m_x, y+m_y], where S'_r[x+m_x, y+m_y] denotes a reference block and m_x and m_y denote the horizontal and vertical components of the motion vector, respectively.
After the media data is encoded, the encoded data stream needs to be encapsulated and transmitted to the user. Video file encapsulation refers to storing the encoded and compressed video and audio in a single file in a certain format according to an encapsulation format (or container, or file container). Common encapsulation formats include the AVI format (Audio Video Interleaved) or ISOBMFF (ISO Based Media File Format, the media file format based on the ISO (International Standard Organization) standard). ISOBMFF is the encapsulation standard for media files, and the most typical ISOBMFF file is the MP4 (Moving Picture Experts Group 4) file; the main improvements of the embodiments of this application are also directed at ISOBMFF data boxes. In one embodiment, the audio code stream and the video code stream are encapsulated in a file container according to a file format such as ISOBMFF to form an encapsulated file. In the encapsulated file, a sample is the encapsulation unit in the file encapsulation process, and an encapsulated file consists of multiple samples. That is, in the encapsulation of a media file, one media frame is usually encapsulated as one sample to generate the encapsulated file. When the media data is video media, the media frame is a video frame; when the media data is audio media, the media frame is an audio frame. In other words, an encapsulated file of video media includes multiple video frames, and an encapsulated file of audio media includes multiple audio frames. In the embodiments of this application, the description takes the media data being video media, and one sample in the encapsulated file being one media frame in the video media, as an example.
The encapsulated file is transmitted to the user terminal via video transmission, and the user terminal, after performing inverse operations such as decapsulation and decoding, presents the final video content on the user terminal. The encapsulated file may be sent to the user terminal via a transmission protocol, for example DASH (dynamic adaptive streaming over HTTP, an adaptive bit-rate streaming technology). Transmission with DASH enables high-quality streaming media to be delivered over the Internet through conventional HTTP web servers. In DASH, the MPD (media presentation description, the media presentation description signaling in DASH) describes media segment information. In DASH, a combination of one or more media components, such as a video file of a certain resolution, may be regarded as a Representation, and multiple Representations contained therein may be regarded as an Adaptation Set (a collection of video streams); one DASH presentation may contain one or more Adaptation Sets.
It can be understood that the file decapsulation process of the user terminal is the inverse of the file encapsulation process described above. The user terminal can decapsulate the encapsulated file according to the file format requirements used at encapsulation time, obtaining the audio code stream and the video code stream. The decoding process of the user terminal is also the inverse of the encoding process; the user terminal can decode the audio code stream to restore the audio content. From the encoding process above, it can be seen that, at the decoder side, for each CU, after obtaining the compressed code stream, the decoder first performs entropy decoding to obtain the various mode information and the quantized transform coefficients. Each coefficient undergoes inverse quantization and inverse transform to obtain the residual signal. On the other hand, according to the known coding mode information, the prediction signal corresponding to the CU can be obtained; after the two are added, the reconstructed signal is obtained. Finally, the reconstructed values of the decoded image undergo loop filtering to produce the final output signal.
Video coding technology also involves a temporal layering technique, which divides different media frames into different time-domain levels according to their dependencies at decoding time. Specifically, when time-domain levels are divided using this temporal layering technique, a media frame divided into a lower level does not need to refer to media frames of higher levels when being decoded. As shown in FIG. 4, the arrows indicate the dependencies at decoding time: the arrow from frame I0 to frame B1 indicates that frame B1 needs to refer to frame I0 when being decoded, i.e., the decoding of frame B1 must depend on the decoding of frame I0, and the relationships among the remaining frames follow by analogy. From the inter-frame dependencies indicated by the arrows in FIG. 4, all media frames are divided according to inter-frame dependencies into the four time-domain levels L0 to L3, and the media frames belonging to each time-domain level do not depend on media frames of higher levels when being decoded. It should be noted that the "low" and "high" of time-domain levels mentioned in the embodiments of this application are relative concepts: for the four time-domain levels L0 to L3 determined in FIG. 4, with respect to the L0 time-domain level, L1 to L3 are all higher time-domain levels, while with respect to the L1 time-domain level, the L3 time-domain level is a higher time-domain level of L1, and the L0 time-domain level is a lower time-domain level of L1.
如图4所示，媒体帧的类型主要包括I帧（Intra Slice，帧内条带）、B帧和P帧，其中，I帧也被称为关键帧，属于帧内压缩，在解码时仅需参考I帧其本身的信息即可，B帧为双向预测编码帧，在解码时既需要参考前面已有的帧，又需要参考后面待解码的帧，而P帧为前向预测编码帧，即P帧在解码时需要参考前面相关帧的信息才能解码，而在图4中针对I帧、B帧和P帧下添加的阿拉伯数字下标用于表示其各自所处的对应的时域层级。可以理解，基于I帧、P帧和B帧这三类媒体帧在解码时的特性，由于要使得进行时域层级划分后，在属于各时域层级的媒体帧中，属于低时域层级的媒体帧在进行解码时不依赖高时域层级，那么也就可以理解，最低时域层级（如上述的L0时域层级）中的媒体帧在解码时，将不依赖于属于其他任何时域层级的媒体帧，也即属于最低时域层级的媒体帧可进行独立解码显示，那么，被划分到最低时域层级的媒体帧必然包括I帧。
由于在对媒体帧进行时域层级划分时，属于低时域层级的媒体帧在解码时无需参考高时域层级的媒体帧，如图4所示，假设视频数据中的媒体帧一共包括L0~L3这四个时域层级，且图4中的箭头用于表示各媒体帧在解码时的依赖关系，也就是说，从I0帧至B1帧的箭头表示，处于L1时域层级的B1帧在解码时需要参考处于L0时域层级的I0帧，处于L1时域层级的B1帧在解码时需要参考处于L0时域层级的P0帧，处于L2时域层级的第一个B2帧在解码时需要参考处于L0时域层级的I0帧，以及处于L1时域层级的B1帧，处于L2时域层级的第二个B2帧在解码时需要参考处于L1时域层级的B1帧，以及处于L0时域层级的P0帧，处于L3时域层级的第一个B3帧在解码时需要参考处于L2时域层级的第一个B2帧，以及处于L0时域层级的I0帧，处于L3时域层级的第二个B3帧在解码时需要参考处于L2时域层级的第一个B2帧，以及处于L1时域层级的B1帧，处于L3时域层级的第三个B3帧在解码时需要参考处于L1时域层级的B1帧，以及处于L2时域层级的第二个B2帧，处于L3时域层级的第四个B3帧在解码时需要参考处于L2时域层级的第二个B2帧，以及处于L0时域层级的P0帧。
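上述帧间依赖关系与时域层级的性质，可用如下Python示意代码加以验证：丢弃高于某一时域层级的所有帧后，剩余各帧解码所参考的帧依然全部被保留。其中帧名与依赖关系按图4的文字描述构造（P0帧参考I0帧为简化假设），仅为示意性草图，并非编码器实现。

```python
# 按图4描述构造的帧间依赖：键为帧名，值为(时域层级, 解码时参考的帧列表)
# 假设：P0帧参考I0帧（原文仅说明P帧参考前面的相关帧，此处简化）
frames = {
    "I0":  (0, []),
    "P0":  (0, ["I0"]),
    "B1":  (1, ["I0", "P0"]),
    "B2a": (2, ["I0", "B1"]),
    "B2b": (2, ["B1", "P0"]),
    "B3a": (3, ["B2a", "I0"]),
    "B3b": (3, ["B2a", "B1"]),
    "B3c": (3, ["B1", "B2b"]),
    "B3d": (3, ["B2b", "P0"]),
}

def keep_layers(max_layer):
    """保留时域层级不超过 max_layer 的帧，返回保留的帧集合。"""
    return {f for f, (layer, _) in frames.items() if layer <= max_layer}

def decodable(kept):
    """检查保留的每一帧所参考的帧是否都仍被保留（即解码依赖完整）。"""
    return all(ref in kept for f in kept for ref in frames[f][1])

# 丢弃 L2、L3，仅保留 L0~L1：依赖仍然完整，可独立解码
assert decodable(keep_layers(1))
```

由此可见，按层级自上而下丢帧不会破坏剩余帧的可解码性，这正是时域分层技术支持按能力选层消费的基础。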
现有的AVS3视频编码技术，可支持时域层级划分技术（或称之为时域分层技术），同时在AVS3系统层封装技术中，针对轨道内的不同时域层级也进行了指示。采用现有的时域分层支持技术，在对媒体帧进行封装时，可通过封装文件中的时域层数字段（temporal_layer_num）指示封装各媒体帧的轨道对应视频码流中的时域层数，此外，还将通过时域层级标识字段（temporal_layer_id）指示封装目标媒体帧的轨道对应的视频码流中各媒体帧所属的时域层级。其中，轨道是指一系列有时间属性的、按照ISO基本媒体文件格式（ISO base media file format，ISOBMFF）封装方式的样本，比如视频track，视频track是通过将视频编码器编码每一帧后产生的码流按照ISOBMFF的规范封装后得到的。现有的AVS3解码器配置记录（即描述数据盒）给出了针对AVS3编码方式的解码器配置信息，该解码器配置信息可采用配置信息1进行表示，该配置信息1具体如下：
class Avs3DecoderConfigurationRecord{//AVS3解码器配置记录
unsigned int(8)configurationVersion;//8位无符号整数的配置版本字段
unsigned int(8)profile_id;//档次标识符
unsigned int(8)level_id;//级别标识符
bit(6)reserved='111111'b;//保留字段，一般字段需要整数个byte，所以需要用保留的bit（位）来补足
unsigned int(2)chroma_format;//色度格式
bit(5)reserved='11111'b;//保留字段
unsigned int(3)encoding_precision;//编码精度
bit(4)reserved='1111'b;//保留字段
unsigned int(4)frame_rate_code;//编码帧率
bit(6)reserved='111111'b;//保留字段
unsigned int(2)library_indication;//库指示
bit(5)reserved='11111'b;//保留字段
unsigned int(3)temporal_layer_num;//时域层数字段
}
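作为对配置信息1中各字段位宽布局的一个示意，下面的Python代码按上述声明的位宽，从一段8字节数据中解析出各字段（保留位与有效位在每个字节内的摆放按"保留位在高位、字段在低位"假设，仅为基于本文描述的解析草图，并非标准实现）。

```python
def parse_avs3_decoder_configuration_record(data: bytes) -> dict:
    """按配置信息1的位宽布局，从8字节数据中解析各字段（示意）。"""
    assert len(data) >= 8
    return {
        "configurationVersion": data[0],            # 8位配置版本
        "profile_id": data[1],                      # 8位档次标识符
        "level_id": data[2],                        # 8位级别标识符
        "chroma_format": data[3] & 0x03,            # 高6位保留，低2位色度格式
        "encoding_precision": data[4] & 0x07,       # 高5位保留，低3位编码精度
        "frame_rate_code": data[5] & 0x0F,          # 高4位保留，低4位编码帧率
        "library_indication": data[6] & 0x03,       # 高6位保留，低2位库指示
        "temporal_layer_num": data[7] & 0x07,       # 高5位保留，低3位时域层数
    }

# 示例输入：保留位全置1，各字段取示例值
record = parse_avs3_decoder_configuration_record(
    bytes([1, 0x22, 0x42, 0xFD, 0xFA, 0xF3, 0xFD, 0xEB])
)
```

由解析结果可直接读出该轨道对应视频码流中的时域层数（temporal_layer_num）等信息。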
由上述可知,虽然在AVS3视频编码技术中,时域层级划分的技术得到了支持,同时在AVS3系统层封装技术中,针对轨道内的不同时域层级也进行了指示。然而,当前系统层的封装技术中,仅考虑了使用一个轨道封装视频位流的情景。若用户将一个包含不同时域层级的视频位流封装到多个视频文件轨道,则现有技术难以提供足够的信息以支持用户有选择性地通过不同文件轨道重构视频位流并消费。
基于此,本申请实施例提出的媒体数据的处理方法,可支持针对AVS3编码中时域层级划分技术的多轨道文件封装,在该媒体数据为视频数据时,具体实施步骤如下:
1、在视频编码环节,内容生成设备可根据视频数据的各媒体帧之间的帧间依赖关系,为不同媒体帧确定时域层级;
2、根据媒体帧的时域层级,将视频位流封装为多个不同的轨道,并在每个文件轨道中指示该轨道包含的具体时域层级信息,包括时域层级id、时域层级对应的帧率和码率信息等;
3、对于包含最低时域层级的轨道,将其标识为基准轨道,其余高时域层级的轨道通过’tlrf’索引至该基准轨道。同时,在基准轨道的文件封装中,给出多时域层级相互组合的策略信息;
4、在用户消费侧,用户所在的内容消费设备可根据自身设备能力以及多个时域层级相互组合的策略信息,选择一个或多个所需时域层级对应的轨道,解封装这些不同的轨道并重构为一个位流进行解码。最终达到灵活选择文件轨道,节省解码计算资源的目的。
为支持上述步骤,本申请实施例在系统层添加了若干描述性字段,以扩展现有ISOBMFF数据盒的形式举例,定义了相关的字段以支持AVS3时域层级划分的多轨道文件封装技术,下面结合图5,对本申请实施例提出的媒体数据的处理方法进行详细说明,其中,该媒体数据的处理方法可由任意进行媒体内容消费的内容消费设备执行,可以理解,该内容消费设备包括进行媒体内容消费的用户所在的终端设备(或服务器),其中,该媒体数据包括多个媒体帧,所述多个媒体帧被划分为N个时域层级,所述多个媒体帧被封装到M个轨道中,M和N均为大于1的正整数。
如图5所示,该方法具体可包括:
S501,获取M个轨道中的第j个轨道的描述数据盒,描述数据盒包括时域轨道指示信息。
其中,时域轨道指示信息用于指示N个时域层级的轨道封装方式,时域轨道指示信息包括第j个轨道封装的时域层级的时域层级信息,其中j为正整数,且j≤M。
首先,内容消费设备获取的描述数据盒是基于内容生成设备对媒体数据的编码封装生成的,内容生成设备在对媒体数据进行封装时,可基于各媒体帧所属的时域层级,将属于不同时域层级的媒体帧封装到多个不同的轨道中,其中,该媒体数据包括多个媒体帧,多个媒体帧被划分为N个时域层级,那么,该多个媒体帧可被封装到M个轨道中,其中M和N均为正整数。需要说明的是,封装该多个媒体帧的轨道,根据在轨道中所封装的媒体帧在解码时的特征,可将该M个轨道划分为基准轨道和非基准轨道,其中,基准轨道是指该轨道中所封装的媒体帧能被独立解码,即基准轨道中封装的媒体帧在解码时,不会参考其他任何轨道中的媒体帧,那么可以理解,在媒体数据为视频数据时,该基准轨道中封装的媒体帧必然包括I帧,那么综合上述的,被分到最低时域层级的媒体帧也必然包括I帧,即是说,基准轨道中封装的时域层级必然包括最低时域层级。
也就是说,如果媒体数据是视频数据,内容生成设备在需要将视频数据发送到用户侧进行消费显示时,可先在视频编码时,根据各个媒体帧的帧间依赖关系,为不同媒体帧确定时域层级,进而可根据时域层级,将视频位流封装到多个不同的轨道,并通过描述数据盒在每个文件轨道中指示具体的时域层级信息,那么相应的,在用户侧进行消费时,则可基于自身终端设备的解码能力进行时域层级的选取,从而可节省解码计算资源。在一个实施例中,若一个媒体数据包括的多个媒体帧分别属于N个时域层级,那么,内容生成设备在将属于N个时域层级的媒体帧封装到一个或多个轨道后,将在每个轨道生成相应的描述数据盒,以使得内容消费设备(如用户侧的终端设备)可基于该描述数据盒的记录,确定内容生成设备对属于N个时域层级的媒体帧的封装方式,并进一步地选择合适的时域层级的媒体帧进行解码显示。
在本申请实施例中，该描述数据盒是通过在现有的ISOBMFF数据盒中添加时域轨道信息，从而实现的支持AVS3时域层级划分的多轨道文件封装技术，可以理解，在该ISOBMFF数据盒中添加时域轨道信息包括在该ISOBMFF数据盒中扩展添加的一个或多个相关字段。在本申请实施例中，以该M个轨道中第j个轨道的描述数据盒为例，对在描述数据盒中扩展的相关字段进行详细说明。其中，在该描述数据盒中扩展的相关字段可如配置信息2所示，该配置信息2具体如下：
（配置信息2的语法定义在原文中以图片形式给出，其包含的各字段及语义见下文说明。）
其中，该描述数据盒中包括的时域轨道指示信息所包括的字段分别为上述配置信息2中的多时域轨道标识字段（multi_temporal_track_flag）、总时域层数字段（total_temporal_layer_num）、时域层数字段（temporal_layer_num）、时域层级标识字段（temporal_layer_id[i]）、帧率字段（frame_rate_code[i]）、低码率字段（temporal_bit_rate_lower[i]）、高码率字段（temporal_bit_rate_upper[i]）、基准轨道标识字段（base_track_flag）、轨道标识字段（track_ID[i]），优先解码呈现字段（is_output_track_flag[i]），替代轨道字段（is_alternative_track_flag[i]），以及替代轨道标识字段（alternate_track_ID）。其中，上述提及的字段中的时域层级标识字段（temporal_layer_id[i]）、帧率字段（frame_rate_code[i]）、低码率字段（temporal_bit_rate_lower[i]）以及高码率字段（temporal_bit_rate_upper[i]）用于指示相应轨道（如上述的第j个轨道）中具体的时域层级信息。
下面,将对配置信息2中涉及的语义和语法进行详细说明:
(1)多时域轨道标识字段(multi_temporal_track_flag)用于指示媒体数据的N个时域层级的轨道封装方式,该轨道封装方式包括:多轨道封装方式和单轨道封装方式,其中,在多时域轨道标识字段为第一数值时,多时域轨道标识字段用于指示属于N个时域层级的多个媒体帧被封装到多个不同的轨道中,而在当多时域轨道标识字段为第二数值时,多时域轨道标识字段用于指示属于N个时域层级的多个媒体帧被封装到单个轨道中,具体地,该第一数值可以为1,而该第二数值则可以为0。
(2)时域层数字段（temporal_layer_num）用于指示当前轨道（即上述的第j个轨道）包含的时域层级的数量。如配置信息2所示，在时域层数字段的取值大于1，即该第j个轨道封装了多个时域层级时，或者，在多时域轨道标识字段的取值为1，即该媒体数据的时域层级被封装到多个不同的轨道（第j个轨道封装其中的部分时域层级）时，内容消费设备在解码时，可进一步从该描述数据盒中读取相关字段的取值，从而根据各相关字段的取值进行解码显示，其中，该描述数据盒是一个'tlin'类型的数据盒。
(3)该第j个轨道中的描述数据盒中包括了该第j个轨道具体的时域层级信息,该时域层级信息包括时域层级标识字段(temporal_layer_id[i]),该时域层级字段用于指示单个时域层级的ID(Identity document,一种唯一的身份标识),其中,N个时域层级中的一个时域层级对应一个temporal_layer_id,也就是说,该时域层级标识字段可用于指示第j个轨道封装的时域层级中,第i个时域层级的层级标识。结合配置信息2可知,在该第j个轨道的描述数据盒中,第j个轨道封装的时域层级中第i个时域层级的层级标识将被记录在temporal_layer_id[i]中。
此外，该第j个轨道中的时域层级信息还包括帧率字段（frame_rate_code[i]）和码率信息，该帧率字段用于指示累计到属于第i个时域层级的媒体帧（即时域层级等于temporal_layer_id[i]）时的帧率，该码率信息用于指示累计到属于第i个时域层级的媒体帧（即时域层级等于temporal_layer_id[i]）时的码率，该码率信息包括低码率字段（temporal_bit_rate_lower[i]），该低码率字段用于指示累计到属于第i个时域层级的媒体帧时的码率的低18位，此外，该码率信息还包括高码率字段（temporal_bit_rate_upper[i]），该高码率字段用于指示累计到属于第i个时域层级的媒体帧时的码率的高12位。其中，累计是指：假设temporal_layer_id[i]=3，那么，累计到temporal_layer_id[i]=3时对应的帧率（和码率），是时域层级标识取值小于或等于3的所有媒体帧共同累计的帧率（和码率）。
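码率在低18位与高12位两个字段之间的拆分与还原，可用如下Python示意代码说明（字段名沿用上文，具体的存储方式为按位宽描述作出的示意性假设）：

```python
def split_bit_rate(bit_rate: int):
    """将累计码率拆分为低18位和高12位两个字段（示意）。"""
    assert 0 <= bit_rate < (1 << 30)  # 18 + 12 = 30 位可表示的范围
    temporal_bit_rate_lower = bit_rate & ((1 << 18) - 1)  # 低18位
    temporal_bit_rate_upper = bit_rate >> 18              # 高12位
    return temporal_bit_rate_lower, temporal_bit_rate_upper

def merge_bit_rate(lower: int, upper: int) -> int:
    """由低码率字段和高码率字段还原累计码率。"""
    return (upper << 18) | lower

# 以累计到某一时域层级时码率为1.5mbps（1500000 bit/s）为例
lower, upper = split_bit_rate(1_500_000)
assert merge_bit_rate(lower, upper) == 1_500_000
```

内容消费设备读取这两个字段后按上式还原，即可得到累计到该时域层级时的完整码率。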
如上述的配置信息2所示,如果描述数据盒中多时域轨道标识字段的取值为0,即指示该媒体数据中属于N个时域层级的多个媒体帧被封装到单个轨道时,内容消费设备可通过读取时域层级标识字段,帧率字段,低码率字段,以及高码率字段中的取值,确定累计到各时域层级时对应的帧率和码率,从而该内容消费设备可结合自身的解码性能,选取属于部分或全部时域层级的媒体帧进行解码显示。
(4)在多时域轨道标识字段的取值为1时,说明该媒体数据中属于N个时域层级的多个媒体帧被封装到多个不同的轨道中,在属于N个时域层级的多个媒体帧被封装到多个不同的轨道的情况下,内容消费设备在读取帧率字段,低码率字段,以及高码率字段中的取值,以确定累计到各时域层级对应的帧率和码率之前,内容消费设备还需要读取一些其他字段的值,其中,内容消费设备还需要读取的这些字段包括如下的①-④中提及的部分或全部字段:
①时域轨道指示信息包括的基准轨道标识字段(base_track_flag)。
基准轨道标识字段用于指示第j个轨道是否为基准轨道;当基准轨道标识字段为第一数值时,基准轨道标识字段用于指示第j个轨道为基准轨道,当基准轨道标识字段为第二数值时,基准轨道标识字段用于指示第j个轨道为非基准轨道;其中,基准轨道中封装的媒体帧被独立解码,其中,该第一数值可以为1,该第二数值可以为0。由上述的配置信息2可知,在内容生成设备对属于N个时域层级的媒体帧进行多轨封装时,累计到属于各个时域层级的媒体帧时的帧率和码率是被记录在基准轨道中的,也就是说,只有在基准轨道的描述数据盒中的帧率字段、低码率字段和高码率字段存在取值,而在非基准轨道的描述数据盒中,该帧率字段、低码率字段和高码率字段为空。
在一个实施例中，在内容消费设备从第j个轨道的描述数据盒中，读取到基准轨道标识字段的取值为1时，说明该第j个轨道为基准轨道，进而，内容消费设备还可从该第j个轨道的描述数据盒中读取帧率字段、低码率字段和高码率字段的取值，从而确定出累计到各时域层级时所对应的帧率和码率。
②总时域层数字段(total_temporal_layer_num)。
总时域层数字段用于指示当前文件包含的所有轨道对应的时域层级总数,即用于指示在M个轨道中所封装的时域层级的总数量。在多个时域层级采用多轨封装的方式时,如果第j个轨道为基准轨道,那么,内容生成设备在确定基准轨道的描述数据盒中时域层级标识字段(temporal_layer_id[i])的取值时,将基于总时域层数字段,在基准轨道的描述数据盒中记录每个时域层级的标识。
③索引类型标识字段('tlrf')。
索引类型标识字段用于在多时域层级使用多轨道封装时,定义基准轨道(或基准时域层级轨道)和非基准轨道(或高时域层级轨道)之间的索引关系,其中,基准时域层级轨道为包含最低的时域层级ID的轨道,基准时域层级轨道在一个文件中仅有一个,其余包含各时域层级的轨道均为高时域层级的轨道。
高时域层级轨道应通过轨道索引数据盒(TrackReferenceBox)索引至其解码所依赖的基准时域层级轨道。而在该高时域层级轨道的TrackReferenceBox中应添加对应的轨道索引类型数据盒(TrackReferenceTypeBoxes),其中,TrackReferenceTypeBoxes数据盒中通过track_IDs指示当前的基准轨道(或称之为基准时域层级轨道),其中,该非基准轨道和该基准轨道之间的索引通过TrackReferenceTypeBoxes中对应的索引类型标识字段(reference_type)标识,该类型字段定义为:
'tlrf':被索引的轨道为基准时域层级轨道。
也就是说，如果内容消费设备从第j个轨道的描述数据盒中，读取到基准轨道标识字段的取值为0，则说明该第j个轨道为非基准轨道，而在该第j个轨道为非基准轨道时，第j个轨道还包括轨道索引数据盒，轨道索引数据盒包括轨道索引类型数据盒；轨道索引类型数据盒包括轨道标识字段和索引类型标识字段；轨道标识字段用于存储基准轨道的标识，索引类型标识字段用于指示被索引的轨道为基准轨道。也就可以理解，在第j个轨道为非基准轨道时，由于在非基准轨道的描述数据盒中，帧率字段、低码率字段以及高码率字段的取值均为空，所以，在第j个轨道为非基准轨道的情况下，内容消费设备通过第j个轨道的描述数据盒将无法读取到帧率字段、低码率字段以及高码率字段的取值，进而无法确定出累计到每个时域层级的帧率和码率，那么，在这种情况下，内容消费设备可通过第j个轨道中的轨道索引数据盒所包括的轨道索引类型数据盒中的索引类型标识字段，从非基准轨道（即该第j个轨道）索引到基准轨道，并从基准轨道中读取上述的帧率字段、低码率字段和高码率字段的取值。
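非基准轨道经'tlrf'索引到基准轨道、再从基准轨道读取时域层级信息的过程，可用如下Python示意代码说明。其中轨道与数据盒被简化为字典结构，layer_info中"累计到各时域层级时的(帧率, 码率)"取值均为示例假设，并非真实文件解析实现。

```python
# 简化表示：每个轨道记录自身是否为基准轨道、'tlrf'索引的轨道ID，
# 以及（仅基准轨道携带的）累计到各时域层级时的 (帧率fps, 码率bps)
tracks = {
    1: {"base_track_flag": 1, "tref": {},
        "layer_info": {0: (20, 1_000_000), 1: (30, 1_500_000)}},
    2: {"base_track_flag": 0, "tref": {"tlrf": [1]},
        "layer_info": None},  # 非基准轨道中这些字段为空
}

def layer_info_for(track_id):
    """基准轨道直接读取；非基准轨道先经'tlrf'索引到基准轨道再读取。"""
    track = tracks[track_id]
    if track["base_track_flag"] == 1:
        return track["layer_info"]
    base_id = track["tref"]["tlrf"][0]  # 'tlrf'索引到的即基准时域层级轨道
    return tracks[base_id]["layer_info"]

# 从非基准轨道2出发，读到的是基准轨道1中记录的时域层级信息
assert layer_info_for(2) == layer_info_for(1)
```

这样，无论内容消费设备先解析到哪个轨道，都能定位到记录完整时域层级信息的基准轨道。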
④时域轨道指示信息包括的轨道组合策略信息,该轨道组合策略信息包括轨道标识字段(track_ID[i]),优先解码呈现字段(is_output_track_flag[i]),替代轨道字段(is_alternative_track_flag[i]),以及替代轨道标识字段(alternate_track_ID)。
其中,轨道标识字段用于指示包含部分时域层级的轨道的标识(ID),该M个轨道中的一个轨道对应一个track_ID。
优先解码呈现字段用于指示第j个轨道(即当前轨道)是否为优先解码呈现的轨道;当优先解码呈现字段为第一数值时,优先解码呈现字段用于指示第j个轨道为优先解码呈现的轨道;当优先解码呈现字段为第二数值时,优先解码呈现字段用于指示第j个轨道不是优先解码呈现的轨道;其中,该第一数值可以为1,该第二数值可以为0。
替代轨道字段用于指示第j个轨道(即当前轨道)是否为M个轨道中的一个轨道的替代轨道;当替代轨道字段为第一数值时,替代轨道字段用于指示第j个轨道为M个轨道中的一个轨道的替代轨道;当替代轨道字段为第二数值时,替代轨道字段用于指示第j个轨道不是替代轨道;其中,该第一数值可以为1,该第二数值可以为0。
该替代轨道标识字段用于指示所述第j个轨道(即当前轨道)替代的一个轨道的标识。
综合上述对配置信息2中的语法和语义的说明,在媒体数据中包括的多个媒体帧属于N个不同的时域层级时,内容生成设备在对媒体数据所包括的多个媒体帧进行封装时可分为以下两种情况:
第一种情况,可将属于N个时域层级的媒体帧封装到单个轨道中。那么内容消费设备在获取到针对媒体数据的封装文件后,就可从封装分别属于N个时域层级的媒体帧的单个轨道的描述数据盒中,通过读取层级标识字段,帧率字段,低码率字段和高码率字段,确定出该N个时域层级中,第i个时域层级的层级标识,以及相应的帧率和码率,进而使得内容消费设备可结合自身的解码能力,选取部分或全部的时域层级的媒体帧进行解码,即可转而执行S502。
第二种情况,可将属于N个时域层级的媒体帧封装到多个不同的轨道中,内容生成设备在采用多轨封装的方式进行媒体帧的封装时,将在基准轨道中记录各轨道中封装的各时域层级相互组合的策略,以及累计到各时域层级对应的媒体帧的帧率和码率,并将其他轨道通过索引类型标识字段索引到基准轨道中,那么内容消费设备就可通过基准轨道记录的相关的信息,并结合自身的解码能力,选择部分或全部的媒体帧进行解码,即转而执行S502。
S502,根据时域轨道指示信息,对媒体数据进行解码。
在内容消费设备获取到第j个轨道中的描述数据盒后，将根据该描述数据盒中的时域轨道指示信息，对媒体数据进行解码，具体地，内容消费设备可根据时域轨道指示信息及解码设备的解码性能，保留N个时域层级中与解码性能匹配的时域层级，并对保留的时域层级的媒体帧进行解码显示。在内容消费设备根据时域轨道指示信息及解码设备的解码性能，保留N个时域层级中与解码性能匹配的时域层级时，在一种实现方式中，由于时域轨道指示信息包括多时域轨道标识字段、时域层级标识字段、帧率字段和码率信息，码率信息包括低码率字段和高码率字段，那么，该内容消费设备则可读取时域轨道指示信息中的多时域轨道标识字段的值，在读取的多时域轨道标识字段为第二数值时，指示N个时域层级的媒体帧被封装到单个轨道，并读取时域层级标识字段的值，帧率字段的值，以及码率信息中低码率字段的值和高码率字段的值；从而使得内容消费设备可根据时域层级标识字段的值，帧率字段的值，以及码率信息中低码率字段的值和高码率字段的值，以及解码设备（即上述的内容消费设备）的解码性能，保留N个时域层级中与解码性能匹配的时域层级。
若该媒体数据为视频数据,则内容生成设备将视频数据(或视频内容)属于N个时域层级的媒体帧封装到单个轨道后,内容消费设备进行解码消费的过程具体如下:
内容生成设备对视频内容A进行编码、封装，假设视频内容A存在3个时域层级L0~L2，属于这三个时域层级的媒体帧均被封装入同一个轨道，且累计到各个时域层级时，对应的帧率和码率如下：
L0:20fps,bitrate=1mbps;
L1:30fps,bitrate=1.5mbps;
L2:60fps,bitrate=3mbps。
内容生成设备根据内容消费设备的请求，将视频文件A分别发送给用户1和用户2，用户1和用户2分别收到对应的文件A，根据轨道中各个时域层级对应的帧率和码率信息，解码消费。具体为：
用户1所在的内容消费设备的解码设备性能较好,选择保留L0~L2的全部媒体帧解码呈现,获得最佳观看效果,而用户2所在的内容消费设备的解码设备性能较差,选择保留L0的全部媒体帧,并可丢弃L1~L2的媒体帧,仅对保留的属于L0的媒体帧解码呈现。
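按各时域层级的累计帧率/码率与设备解码能力选取层级的过程，可示意如下（能力阈值为假设值，选择逻辑为按上例归纳的示意性草图）：

```python
# 累计到各时域层级时对应的 (帧率fps, 码率bps)，与上例一致
cumulative = {0: (20, 1_000_000), 1: (30, 1_500_000), 2: (60, 3_000_000)}

def select_max_layer(max_fps, max_bps):
    """选出解码能力（帧率/码率上限）可支持的最高时域层级，更高层级的帧被丢弃。"""
    chosen = -1
    for layer in sorted(cumulative):
        fps, bps = cumulative[layer]
        if fps <= max_fps and bps <= max_bps:
            chosen = layer
        else:
            break  # 累计指标随层级单调增长，更高层级必然也超出能力
    return chosen

assert select_max_layer(60, 4_000_000) == 2   # 用户1：能力较好，保留L0~L2全部层级
assert select_max_layer(25, 1_200_000) == 0   # 用户2：能力较差，仅保留L0
```

由于帧率和码率按层级累计记录，设备只需逐层比较累计值与自身上限即可完成选层。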
在内容消费设备根据时域轨道指示信息及解码设备的解码性能,保留N个时域层级中与解码性能匹配的时域层级时,在另一种实现方式中,如果内容消费设备可读取时域轨道指示信息中的多时域轨道标识字段的值,在内容消费设备读取的多时域轨道标识字段为第一数值时,指示N个时域层级的媒体帧被封装到多个不同的轨道,在各轨道中的时域层级无重叠时,该内容消费设备可从基准轨道中读取时域层级标识字段的值,帧率字段的值,以及码率信息中低码率字段的值和高码率字段的值;基准轨道中封装的媒体帧被独立解码,并进而根据从基准轨道读取的时域层级标识字段的值,帧率字段的值,以及码率信息中低码率字段的值和高码率字段的值,以及解码设备的解码性能,保留部分或全部与解码性能匹配的轨道中的时域层级。
若该媒体数据为视频数据，则内容生成设备将视频数据（或视频内容）属于N个时域层级的媒体帧封装到多个不同轨道，且各轨道的时域层级无重叠，内容消费设备进行解码消费的过程可如图6a所示，具体如下：
内容生成设备对视频内容A进行编码、封装，假设视频内容A存在3个时域层级L0~L2，且属于这三个时域层级的媒体帧被分别封装至三个不同轨道，其中，track1为基准轨道，track2和track3以'tlrf'类型索引至track1。在基准轨道中，指示累计到各个时域层级时，对应的帧率和码率如下：
L0:20fps,bitrate=1mbps;
L1:30fps,bitrate=1.5mbps;
L2:60fps,bitrate=3mbps。
内容生成设备根据内容消费设备的请求,将视频文件A分别发送给用户1和用户2,用户1和用户2分别收到对应的文件A,根据轨道中各个时域层级对应的帧率和码率信息,解码消费。具体为:
用户1所在的内容消费设备的解码设备性能较好,选择保留track1~track3的全部媒体帧解码呈现,获得最佳观看效果,而用户2所在的内容消费设备的解码设备性能较差,选择保留track1的全部媒体帧,并可丢弃track2和track3的媒体帧,仅对保留的属于track1的媒体帧解码呈现。
在另一种实现方式中，若内容消费设备读取的多时域轨道标识字段为第一数值，并指示N个时域层级的媒体帧被封装到多个不同的轨道，在各轨道中的时域层级存在重叠时，该内容消费设备则可从基准轨道中读取轨道组合策略信息中的各字段的取值，并根据轨道组合策略信息中的各字段的取值，以及解码设备的解码性能，保留部分或全部与解码性能匹配的轨道中的时域层级，其中，时域轨道指示信息还包括轨道组合策略信息，轨道组合策略信息包括轨道标识字段，优先解码呈现字段，替代轨道字段，以及替代轨道标识字段。如图6b所示，若该媒体数据为视频数据，则内容生成设备将视频数据（或视频内容）属于N个时域层级的媒体帧封装到多个不同轨道，且各轨道时域层级存在重叠，那么内容消费设备进行解码消费的过程可具体如下：
内容生成设备对视频内容A进行编码、封装，假设视频内容A存在3个时域层级L0~L2，且属于这三个时域层级的媒体帧被分别封装至三个不同轨道，其中，track1为基准轨道，track2和track3以'tlrf'类型索引至track1，track2和track3各自包含一部分L1和L2的媒体帧，且互不重叠，track2和track3的解码均依赖track1，但track2和track3之间没有依赖关系。在基准轨道中，指示各个轨道在组合时的信息：
track1:is_output_track_flag=1;is_alternative_track_flag=0;
track2:is_output_track_flag=1;is_alternative_track_flag=0;
track3:is_output_track_flag=0;is_alternative_track_flag=1;alternate_track_ID=2。
内容生成设备根据内容消费设备的请求,将视频文件A分别发送给用户1和用户2,用户1和用户2分别收到对应的文件A,根据轨道中各个时域层级对应的帧率和码率信息,解码消费。具体为:
用户1所在的内容消费设备的解码设备性能较好，选择保留track1~track3的全部媒体帧解码呈现，获得最佳观看效果，而用户2所在的内容消费设备的解码设备性能较差，选择保留track1和track2的全部媒体帧，并可丢弃track3的媒体帧，解码track1和track2中的媒体帧进行呈现。
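基准轨道中轨道组合策略字段在消费侧的一种解读方式可示意如下。其中strategy按上例的字段取值构造，而"能力较好时把替代轨道一并选上、能力较差时仅保留优先解码呈现的轨道"的选择逻辑是按上例结果归纳的示意性假设，并非标准规定的唯一策略。

```python
# 基准轨道中记录的组合策略，与上例一致（alternate_track_ID指示其替代的轨道）
strategy = {
    1: {"is_output_track_flag": 1, "is_alternative_track_flag": 0},
    2: {"is_output_track_flag": 1, "is_alternative_track_flag": 0},
    3: {"is_output_track_flag": 0, "is_alternative_track_flag": 1,
        "alternate_track_ID": 2},
}

def tracks_to_decode(decoder_is_strong):
    """能力较差时仅保留优先解码呈现的轨道；能力较好时连同替代轨道一起消费。"""
    output = {t for t, f in strategy.items() if f["is_output_track_flag"] == 1}
    if not decoder_is_strong:
        return output
    # 能力较好：把替代轨道也一并选上，获得更完整的时域层级
    alternates = {t for t, f in strategy.items()
                  if f["is_alternative_track_flag"] == 1}
    return output | alternates

assert tracks_to_decode(True) == {1, 2, 3}    # 用户1：track1~track3全部保留
assert tracks_to_decode(False) == {1, 2}      # 用户2：丢弃track3
```
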
在一个实施例中,内容消费设备在保留的媒体帧的数量为一个或多个,对保留的时域层级的媒体帧进行解码显示时,则可根据保留的一个或多个媒体帧中每个媒体帧的解码时间,对保留的一个或多个媒体帧按照解码时间重新排序(即重构),进而可对重新排序后的一个或多个媒体帧进行解码显示。也就是说,在组合不同轨道的媒体帧时,内容消费设备根据封装时每个媒体帧对应的解码时间,按照解码时间排列所选的多个轨道中所有媒体帧,重构之后再进行解码。
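组合所选多个轨道的样本并按解码时间重构位流的过程，可用如下Python示意代码说明（样本以(解码时间, 帧名)元组表示，各轨道内样本本身已按解码时间升序排列；帧名与解码时间均为示例假设）：

```python
import heapq

def rebuild_bitstream(*track_samples):
    """把所选多个轨道中的样本按解码时间归并为单一序列（重构后送解码器）。"""
    # 每个轨道内的样本已按解码时间升序排列，用堆归并即可得到全局解码顺序
    return [frame for _, frame in heapq.merge(*track_samples)]

track1 = [(0, "I0"), (8, "P0")]        # 基准轨道：L0层级的帧
track2 = [(4, "B1")]                   # L1层级的帧
track3 = [(2, "B2a"), (6, "B2b")]      # L2层级的帧

order = rebuild_bitstream(track1, track2, track3)
assert order == ["I0", "B2a", "B1", "B2b", "P0"]
```

归并后得到的单一样本序列即重构出的位流顺序，内容消费设备按该顺序逐帧解码显示。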
在本申请实施例中,支持内容生成设备将多时域层级的媒体帧封装到不同的轨道,并在封装过程中,将在每个轨道的描述数据盒中记录时域轨道指示信息,从而实现关联不同时域层级的轨道,指示轨道内的时域层级信息,标注基准轨道,指示多时域层级轨道相互组合的策略信息。那么,内容消费设备则可根据轨道内的描述数据盒选取合适的时域层级,将不同轨道的样本组合后进行解码呈现,从而保证了多轨道封装的灵活性,并最大化地节省了解码计算资源。
下面,将结合图7,对内容生成设备对媒体数据的封装过程进行说明,该内容生成设备具体可以是服务器,或者也可以是终端设备,其中,该服务器可以是独立的服务器,也可以是多个服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务和人工智能平台等基础云计算服务的云服务器。请参见图7,是本申请实施例提供的一种媒体数据的处理方法的示意流程图,如图7所示,该方法可包括:
S701,根据媒体数据包括的每个媒体帧的帧间依赖关系,确定每个媒体帧的时域层级,得到N个时域层级的媒体帧;其中,N为大于1的正整数。
S702,分别将N个时域层级的媒体帧封装到M个轨道中,并生成对应的描述数据盒;M 个轨道中第j个轨道的描述数据盒包括时域轨道指示信息,时域轨道指示信息用于指示N个时域层级的轨道封装方式,时域轨道指示信息包括第j个轨道封装的时域层级的时域层级信息,其中,M为大于1的正整数。
在S701和S702中，媒体数据中各媒体帧的帧间依赖关系可如图4所示，进而可基于该帧间依赖关系确定每个媒体帧的时域层级。其中，时域轨道指示信息包括多时域轨道标识字段，多时域轨道标识字段用于指示N个时域层级的轨道封装方式，那么，内容生成设备在根据媒体帧的封装过程，生成第j个轨道的描述数据盒时，若内容生成设备将N个时域层级的多个媒体帧封装到多个不同的轨道中，则生成多时域轨道标识字段的取值为第一数值；而如果该内容生成设备将N个时域层级的多个媒体帧封装到单个轨道中，则生成多时域轨道标识字段的取值为第二数值。此外，该时域轨道指示信息还包括总时域层数字段，那么内容生成设备在根据媒体帧的封装过程，生成第j个轨道的描述数据盒时，还将根据M个轨道封装的时域层级的总数量，生成总时域层数字段的取值。
其中，该第j个轨道的时域层级信息包括时域层数字段，那么，该内容生成设备在根据媒体帧的封装过程，生成第j个轨道的描述数据盒时，将根据第j个轨道封装的时域层级的数量，生成时域层数字段的取值。此外，该第j个轨道的时域层级信息包括时域层级标识字段、帧率字段和码率信息，那么，内容生成设备在根据媒体帧的封装过程，生成第j个轨道的描述数据盒时，可将第j个轨道封装的时域层级中，第i个时域层级的层级标识存储在时域层级标识字段中，将累计到属于第i个时域层级的媒体帧时的帧率存储到帧率字段中，以及将累计到属于第i个时域层级的媒体帧时的码率，作为码率信息。在一个实施例中，码率信息包括低码率字段和高码率字段；那么，内容生成设备在将累计到属于第i个时域层级的媒体帧时的码率，作为码率信息时，可将累计到属于第i个时域层级的媒体帧时的码率的低18位，存储在低码率字段中，并将累计到属于第i个时域层级的媒体帧时的码率的高12位，存储在高码率字段中。
时域轨道指示信息还包括基准轨道标识字段,那么内容生成设备在根据媒体帧的封装过程,生成第j个轨道的描述数据盒时,若第j个轨道为基准轨道,则生成基准轨道标识字段的取值为第一数值,而若第j个轨道为非基准轨道,则生成基准轨道标识字段的取值为第二数值;其中,基准轨道中封装的媒体帧被独立解码。此外,时域轨道指示信息还包括轨道组合策略信息,轨道组合策略信息包括轨道标识字段,优先解码呈现字段,替代轨道字段,以及替代轨道标识字段;那么,内容生成设备在根据媒体帧的封装过程,生成第j个轨道的描述数据盒时,可将包含部分时域层级的轨道的标识存储在轨道标识字段中,并在第j个轨道为优先解码呈现的轨道,生成优先解码呈现字段的取值为第一数值,若第j个轨道不是优先解码呈现的轨道,则生成优先解码呈现字段的取值为第二数值;以及,若第j个轨道为一个轨道的替代轨道,则生成替代轨道字段的取值为第一数值,并将第j个轨道替代的一个轨道的标识存储在替代轨道标识字段中;若第j个轨道不是替代轨道,则生成替代轨道字段的取值为第二数值,其中,该第一数值可以为1,该第二数值可以为0。
在一个实施例中，若第j个轨道为非基准轨道，则内容生成设备将生成第j个轨道的轨道索引数据盒，轨道索引数据盒包括轨道索引类型数据盒；轨道索引类型数据盒包括轨道标识字段和索引类型标识字段，此外，该内容生成设备可将基准轨道的标识存储在轨道标识字段中，并根据索引类型标识字段，将第j个轨道索引至基准轨道。
在本申请实施例中,内容生成设备通过媒体数据中所包括的媒体帧之间的帧间依赖关系,可确定每个媒体帧的时域层级,进而可将N个时域层级的媒体帧分别封装到M个轨道中,并基于媒体帧的封装过程,生成第j个轨道的描述数据盒,并在描述数据盒中,为时域轨道指示信息包括的各字段设置相应的取值,以通过各字段的取值将不同时域层级的轨道进行关联,指示轨道内的时域层级信息,对基准轨道进行标注,指示多时域层级轨道相互组合的策略,从而实现指示内容消费设备,该内容生成设备的媒体帧封装过程,那么也就使得内容消费设备可根据轨道中描述数据盒中的各字段的取值,选取合适的时域层级进行解码呈现,保证了多轨道封装的灵活性,并可最大化节省解码计算资源。
上述详细阐述了本申请实施例的方法,为了便于更好地实施本申请实施例的上述方案,相应地,下面提供了本申请实施例的装置。
请参见图8，图8示出了本申请一个示例性实施例提供的一种媒体数据的处理装置的结构示意图；该媒体数据的处理装置可以是运行于上述内容消费设备中的一个计算机程序（包括程序代码），例如该媒体数据的处理装置可以是内容消费设备中的一个应用软件。如图8所示，该媒体数据的处理装置可包括：获取单元801和处理单元802。
在一个示例性实施例中,该媒体数据的处理装置可以用于执行图5所示的方法中的相应步骤;所述媒体数据包括多个媒体帧,所述多个媒体帧被划分为N个时域层级,所述多个媒体帧被封装到M个轨道中,其中M和N均为大于1的正整数;则:
获取单元801,用于获取所述M个轨道中的第j个轨道的描述数据盒,所述描述数据盒包括时域轨道指示信息,所述时域轨道指示信息用于指示所述N个时域层级的轨道封装方式,所述时域轨道指示信息包括所述第j个轨道封装的时域层级的时域层级信息,其中j为正整数,且j≤M;
处理单元802,用于根据所述时域轨道指示信息,对所述媒体数据进行解码。
在一个实施例中,所述时域轨道指示信息包括多时域轨道标识字段,所述多时域轨道标识字段用于指示所述N个时域层级的轨道封装方式;
当所述多时域轨道标识字段为第一数值时,所述多时域轨道标识字段用于指示属于所述N个时域层级的多个媒体帧被封装到多个不同的轨道中;
当所述多时域轨道标识字段为第二数值时,所述多时域轨道标识字段用于指示属于所述N个时域层级的多个媒体帧被封装到单个轨道中。
在一个实施例中,所述时域轨道指示信息包括总时域层数字段;所述总时域层数字段用于指示所述M个轨道封装的时域层级的总数量。
在一个实施例中,所述第j个轨道的时域层级信息包括时域层数字段,所述时域层数字段用于指示所述第j个轨道封装的时域层级的数量。
在一个实施例中,所述第j个轨道的时域层级信息包括时域层级标识字段、帧率字段和码率信息;
所述时域层级标识字段用于指示所述第j个轨道封装的时域层级中,第i个时域层级的层级标识;
所述帧率字段用于指示累计到属于第i个时域层级的媒体帧时的帧率;
所述码率信息用于指示累计到属于第i个时域层级的媒体帧时的码率。
在一个实施例中,所述码率信息包括低码率字段和高码率字段;
所述低码率字段用于指示累计到属于第i个时域层级的媒体帧时的码率的低18位;
所述高码率字段用于指示累计到属于第i个时域层级的媒体帧时的码率的高12位。
在一个实施例中,所述时域轨道指示信息包括基准轨道标识字段;所述基准轨道标识字段用于指示所述第j个轨道是否为基准轨道;
当所述基准轨道标识字段为第一数值时,所述基准轨道标识字段用于指示所述第j个轨道为基准轨道,当所述基准轨道标识字段为第二数值时,所述基准轨道标识字段用于指示所述第j个轨道为非基准轨道;
其中,所述基准轨道中封装的媒体帧被独立解码。
在一个实施例中,所述时域轨道指示信息还包括轨道组合策略信息,所述轨道组合策略信息包括轨道标识字段,优先解码呈现字段,替代轨道字段,以及替代轨道标识字段;
所述轨道标识字段用于指示包含部分时域层级的轨道的标识;
所述优先解码呈现字段用于指示所述第j个轨道是否为优先解码呈现的轨道;当所述优先解码呈现字段为第一数值时,所述优先解码呈现字段用于指示所述第j个轨道为优先解码呈现的轨道;当所述优先解码呈现字段为第二数值时,所述优先解码呈现字段用于指示所述第j个轨道不是优先解码呈现的轨道;
所述替代轨道字段用于指示所述第j个轨道是否为所述M个轨道中的一个轨道的替代轨道;当所述替代轨道字段为第一数值时,所述替代轨道字段用于指示所述第j个轨道为所述M个轨道中的一个轨道的替代轨道;当所述替代轨道字段为第二数值时,所述替代轨道字段用于指示所述第j个轨道不是替代轨道;
所述替代轨道标识字段用于指示所述第j个轨道替代的一个轨道的标识。
在一个实施例中,若所述第j个轨道为非基准轨道,则所述第j个轨道还包括轨道索引数据盒,所述轨道索引数据盒包括轨道索引类型数据盒;
所述轨道索引类型数据盒包括轨道标识字段和索引类型标识字段;
所述轨道标识字段用于存储基准轨道的标识,所述索引类型标识字段用于指示被索引的轨道为基准轨道。
在一个实施例中,所述处理单元802,具体用于:
根据所述时域轨道指示信息及解码设备的解码性能,保留所述N个时域层级中与所述解码性能匹配的时域层级;
对保留的时域层级的媒体帧进行解码显示。
在一个实施例中，所述时域轨道指示信息包括多时域轨道标识字段、时域层级标识字段、帧率字段和码率信息，所述码率信息包括低码率字段和高码率字段；所述处理单元802，具体用于：
读取所述时域轨道指示信息中的多时域轨道标识字段的值,在读取的所述多时域轨道标识字段为第二数值时,指示所述N个时域层级的媒体帧被封装到单个轨道,并读取所述 时域层级标识字段的值,所述帧率字段的值,以及所述码率信息中低码率字段的值和高码率字段的值;
根据所述时域层级标识字段的值,所述帧率字段的值,以及所述码率信息中低码率字段的值和高码率字段的值,以及所述解码设备的解码性能,保留所述N个时域层级中与所述解码性能匹配的时域层级。
在一个实施例中，所述时域轨道指示信息包括多时域轨道标识字段、时域层级标识字段、帧率字段和码率信息，所述码率信息包括低码率字段和高码率字段；所述处理单元802，具体用于：
读取所述时域轨道指示信息中的多时域轨道标识字段的值,在读取的所述多时域轨道标识字段为第一数值时,指示所述N个时域层级的媒体帧被封装到多个不同的轨道,在各轨道中的时域层级无重叠时,从基准轨道中读取所述时域层级标识字段的值,所述帧率字段的值,以及所述码率信息中低码率字段的值和高码率字段的值;所述基准轨道中封装的媒体帧被独立解码;
根据从所述基准轨道读取的所述时域层级标识字段的值,所述帧率字段的值,以及所述码率信息中低码率字段的值和高码率字段的值,以及所述解码设备的解码性能,保留部分或全部与所述解码性能匹配的轨道中的时域层级。
在一个实施例中,所述时域轨道指示信息还包括轨道组合策略信息,所述轨道组合策略信息包括轨道标识字段,优先解码呈现字段,替代轨道字段,以及替代轨道标识字段;
所述处理单元802,还用于在读取的所述多时域轨道标识字段为第一数值时,指示所述N个时域层级的媒体帧被封装到多个不同的轨道,在各轨道中的时域层级存在重叠,从基准轨道中读取所述轨道组合策略信息中的各字段的取值;
所述处理单元802,还用于根据所述轨道组合策略信息中的各字段的取值,以及所述解码设备的解码性能,保留部分或全部与所述解码性能匹配的轨道中的时域层级。
在一个实施例中,所述处理单元802,具体用于:
根据保留的一个或多个媒体帧中每个媒体帧的解码时间,对保留的一个或多个媒体帧按照所述解码时间重新排序;
对重新排序后的一个或多个媒体帧进行解码显示。
在本申请实施例中，支持内容生成设备将多时域层级的媒体帧封装到不同的轨道，并在封装过程中，将在每个轨道的描述数据盒中记录时域轨道指示信息，从而实现关联不同时域层级的轨道，指示轨道内的时域层级信息，标注基准轨道，指示多时域层级轨道相互组合的策略信息。那么，处理单元802则可根据轨道内的描述数据盒选取合适的时域层级，将不同轨道的样本组合后进行解码呈现，从而保证了多轨道封装的灵活性，并最大化地节省了解码计算资源。
上述详细阐述了本申请实施例的方法,为了便于更好地实施本申请实施例的上述方案,相应地,下面提供了本申请实施例的装置。
请参见图9，图9示出了本申请一个示例性实施例提供的一种媒体数据的处理装置的结构示意图；该媒体数据的处理装置可以是运行于上述内容生成设备中的一个计算机程序（包括程序代码），例如该媒体数据的处理装置可以是内容生成设备中的一个应用软件。如图9所示，该媒体数据的处理装置可包括：确定单元901和处理单元902。
在一个示例性实施例中,该媒体数据的处理装置可以用于执行图7所示的方法中的相应步骤;则:
确定单元901,用于根据媒体数据包括的每个媒体帧的帧间依赖关系,确定所述每个媒体帧的时域层级,得到N个时域层级的媒体帧;其中,N为大于1的正整数;
处理单元902,用于分别将所述N个时域层级的媒体帧封装到M个轨道中,并生成对应的描述数据盒;所述M个轨道中第j个轨道的描述数据盒包括时域轨道指示信息,所述时域轨道指示信息用于指示所述N个时域层级的轨道封装方式,所述时域轨道指示信息包括所述第j个轨道封装的时域层级的时域层级信息,其中,M为大于1的正整数。
在一个实施例中,所述时域轨道指示信息包括多时域轨道标识字段,所述多时域轨道标识字段用于指示所述N个时域层级的轨道封装方式;所述处理单元902,具体用于:
若将所述N个时域层级的多个媒体帧封装到多个不同的轨道中，则生成所述多时域轨道标识字段的取值为第一数值；
若将所述N个时域层级的多个媒体帧封装到单个轨道中，则生成所述多时域轨道标识字段的取值为第二数值。
在一个实施例中,所述时域轨道指示信息包括总时域层数字段;所述处理单元902,具体用于:
根据所述M个轨道封装的时域层级的总数量,生成所述总时域层数字段的取值。
在一个实施例中,所述第j个轨道的时域层级信息包括时域层数字段;所述处理单元902,具体用于:
根据所述第j个轨道封装的时域层级的数量,生成所述时域层数字段的取值。
在一个实施例中,所述第j个轨道的时域层级信息包括时域层级标识字段、帧率字段和码率信息;所述处理单元902,具体用于:
将所述第j个轨道封装的时域层级中,第i个时域层级的层级标识存储在所述时域层级标识字段中;
将累计到属于第i个时域层级的媒体帧时的帧率存储到所述帧率字段中;
将累计到属于第i个时域层级的媒体帧时的码率,作为所述码率信息。
在一个实施例中,所述码率信息包括低码率字段和高码率字段;所述处理单元902,具体用于:
将累计到属于第i个时域层级的媒体帧时的码率的低18位，存储在所述低码率字段中；
将累计到属于第i个时域层级的媒体帧时的码率的高12位，存储在所述高码率字段中。
在一个实施例中,所述时域轨道指示信息包括基准轨道标识字段;所述处理单元902,具体用于:
若所述第j个轨道为基准轨道,则生成所述基准轨道标识字段的取值为第一数值;
若所述第j个轨道为非基准轨道,则生成所述基准轨道标识字段的取值为第二数值;
其中,所述基准轨道中封装的媒体帧被独立解码。
在一个实施例中,所述时域轨道指示信息还包括轨道组合策略信息,所述轨道组合策略信息包括轨道标识字段,优先解码呈现字段,替代轨道字段,以及替代轨道标识字段;所述处理单元902,具体用于:
将包含部分时域层级的轨道的标识存储在所述轨道标识字段中;
若所述第j个轨道为优先解码呈现的轨道,则生成所述优先解码呈现字段的取值为第一数值,若所述第j个轨道不是优先解码呈现的轨道,则生成所述优先解码呈现字段的取值为第二数值;
若所述第j个轨道为一个轨道的替代轨道,则生成所述替代轨道字段的取值为第一数值,并将所述第j个轨道替代的一个轨道的标识存储在所述替代轨道标识字段中;若所述第j个轨道不是替代轨道,则生成所述替代轨道字段的取值为第二数值。
在一个实施例中,所述处理单元902,还用于若所述第j个轨道为非基准轨道,则生成第j个轨道的轨道索引数据盒,所述轨道索引数据盒包括轨道索引类型数据盒;所述轨道索引类型数据盒包括轨道标识字段和索引类型标识字段;
所述处理单元902,还用于将所述基准轨道的标识存储在所述轨道标识字段中,并根据所述索引类型标识字段,将所述第j个轨道索引至基准轨道。
在本申请实施例中,处理单元902通过媒体数据中所包括的媒体帧之间的帧间依赖关系,可确定每个媒体帧的时域层级,进而可将N个时域层级的媒体帧分别封装到M个轨道中,并基于媒体帧的封装过程,生成第j个轨道的描述数据盒,并在描述数据盒中,为时域轨道指示信息包括的各字段设置相应的取值,以通过各字段的取值将不同时域层级的轨道进行关联,指示轨道内的时域层级信息,对基准轨道进行标注,指示多时域层级轨道相互组合的策略,从而实现指示内容消费设备,该处理单元902的媒体帧封装过程,那么也就使得内容消费设备可根据轨道中描述数据盒中的各字段的取值,选取合适的时域层级进行解码呈现,保证了多轨道封装的灵活性,并可最大化节省解码计算资源。
请参见图10,是本申请实施例提供的一种计算机设备的结构示意性框图,该计算机设备可以是上述的该内容消费设备,或者也可以是上述的内容生成设备,其中,该计算机设备可以是服务器,也可以是终端设备。如图10所示的本实施例中的计算机设备可包括:一个或多个处理器101;一个或多个输入设备102,一个或多个输出设备103和存储器104。上述处理器101、输入设备102、输出设备103和存储器104通过总线105连接。存储器104用于存储计算机程序,所述计算机程序包括程序指令,处理器101用于执行所述存储器104存储的程序指令。
所述存储器104可以包括易失性存储器(volatile memory),如随机存取存储器(random-access memory,RAM);存储器104也可以包括非易失性存储器(non-volatile memory),如快闪存储器(flash memory),固态硬盘(solid-state drive,SSD)等;存储器104还可以包括上述种类的存储器的组合。
所述处理器101可以是中央处理器（central processing unit，CPU）。所述处理器101还可以进一步包括硬件芯片。上述硬件芯片可以是专用集成电路（application-specific integrated circuit，ASIC），可编程逻辑器件（programmable logic device，PLD）等。该PLD可以是现场可编程逻辑门阵列（field-programmable gate array，FPGA），通用阵列逻辑（generic array logic，GAL）等。所述处理器101也可以为上述结构的组合。
本申请实施例中,所述存储器104用于存储计算机程序,所述计算机程序包括程序指令,处理器101用于执行存储器104存储的程序指令,用来实现上述如图5中媒体数据的处理方法所涉及的步骤,其中,所述媒体数据包括多个媒体帧,所述多个媒体帧被划分为N个时域层级,所述多个媒体帧被封装到M个轨道中,其中M和N均为大于1的正整数。
在一个实施例中，所述处理器101被配置为调用所述程序指令，用于执行：
获取所述M个轨道中的第j个轨道的描述数据盒,所述描述数据盒包括时域轨道指示信息,所述时域轨道指示信息用于指示所述N个时域层级的轨道封装方式,所述时域轨道指示信息包括所述第j个轨道封装的时域层级的时域层级信息,其中j为正整数,且j≤M;
根据所述时域轨道指示信息,对所述媒体数据进行解码。
在一个实施例中,所述存储器104用于存储计算机程序,所述计算机程序包括程序指令,处理器101用于执行存储器104存储的程序指令,还可用来实现上述如图7中相应方法的步骤。
在一个实施例中，所述处理器101被配置为调用所述程序指令，用于执行：
根据媒体数据包括的每个媒体帧的帧间依赖关系,确定所述每个媒体帧的时域层级,得到N个时域层级的媒体帧;其中,N为大于1的正整数;
分别将所述N个时域层级的媒体帧封装到M个轨道中,并生成对应的描述数据盒;所述M个轨道中第j个轨道的描述数据盒包括时域轨道指示信息,所述时域轨道指示信息用于指示所述N个时域层级的轨道封装方式,所述时域轨道指示信息包括所述第j个轨道封装的时域层级的时域层级信息,其中,M为大于1的正整数。
另外,本申请实施例还提供了一种存储介质,所述存储介质用于存储计算机程序,所述计算机程序用于执行上述实施例提供的方法。
本申请实施例还提供了一种包括指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述实施例提供的方法。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)或随机存储记忆体(Random Access Memory,RAM)等。
以上所揭露的仅为本申请较佳实施例而已,当然不能以此来限定本申请之权利范围,因此依本申请权利要求所作的等同变化,仍属本申请所涵盖的范围。

Claims (28)

  1. 一种媒体数据的处理方法,所述媒体数据包括多个媒体帧,所述多个媒体帧被划分为N个时域层级,所述多个媒体帧被封装到M个轨道中,其中M和N均为大于1的正整数;所述方法由内容消费设备执行,所述方法包括:
    获取所述M个轨道中的第j个轨道的描述数据盒,所述描述数据盒包括时域轨道指示信息,所述时域轨道指示信息用于指示所述N个时域层级的轨道封装方式,所述时域轨道指示信息包括所述第j个轨道封装的时域层级的时域层级信息,其中j为正整数,且j≤M;
    根据所述时域轨道指示信息,对所述媒体数据进行解码。
  2. 如权利要求1所述的方法,所述时域轨道指示信息包括多时域轨道标识字段,所述多时域轨道标识字段用于指示所述N个时域层级的轨道封装方式;
    当所述多时域轨道标识字段为第一数值时,所述多时域轨道标识字段用于指示属于所述N个时域层级的多个媒体帧被封装到多个不同的轨道中;
    当所述多时域轨道标识字段为第二数值时,所述多时域轨道标识字段用于指示属于所述N个时域层级的多个媒体帧被封装到单个轨道中。
  3. 如权利要求1所述的方法,所述时域轨道指示信息包括总时域层数字段;所述总时域层数字段用于指示所述M个轨道封装的时域层级的总数量。
  4. 如权利要求1所述的方法,所述第j个轨道的时域层级信息包括时域层数字段,所述时域层数字段用于指示所述第j个轨道封装的时域层级的数量。
  5. 如权利要求1所述的方法,所述第j个轨道的时域层级信息包括时域层级标识字段、帧率字段和码率信息;
    所述时域层级标识字段用于指示所述第j个轨道封装的时域层级中,第i个时域层级的层级标识;
    所述帧率字段用于指示累计到属于第i个时域层级的媒体帧时的帧率;
    所述码率信息用于指示累计到属于第i个时域层级的媒体帧时的码率。
  6. 如权利要求5所述的方法,所述码率信息包括低码率字段和高码率字段;
    所述低码率字段用于指示累计到属于第i个时域层级的媒体帧时的码率的低18位;
    所述高码率字段用于指示累计到属于第i个时域层级的媒体帧时的码率的高12位。
  7. 如权利要求1所述的方法,所述时域轨道指示信息包括基准轨道标识字段;所述基准轨道标识字段用于指示所述第j个轨道是否为基准轨道;
    当所述基准轨道标识字段为第一数值时,所述基准轨道标识字段用于指示所述第j个轨道为基准轨道,当所述基准轨道标识字段为第二数值时,所述基准轨道标识字段用于指示所述第j个轨道为非基准轨道;
    其中,所述基准轨道中封装的媒体帧被独立解码。
  8. 如权利要求1所述的方法,所述时域轨道指示信息还包括轨道组合策略信息,所述轨道组合策略信息包括轨道标识字段,优先解码呈现字段,替代轨道字段,以及替代轨道标识字段;
    所述轨道标识字段用于指示包含部分时域层级的轨道的标识;
    所述优先解码呈现字段用于指示所述第j个轨道是否为优先解码呈现的轨道;当所述优先解码呈现字段为第一数值时,所述优先解码呈现字段用于指示所述第j个轨道为优先解码呈现的轨道;当所述优先解码呈现字段为第二数值时,所述优先解码呈现字段用于指示所述第j个轨道不是优先解码呈现的轨道;
    所述替代轨道字段用于指示所述第j个轨道是否为所述M个轨道中的一个轨道的替代轨道;当所述替代轨道字段为第一数值时,所述替代轨道字段用于指示所述第j个轨道为所述M个轨道中的一个轨道的替代轨道;当所述替代轨道字段为第二数值时,所述替代轨道字段用于指示所述第j个轨道不是替代轨道;
    所述替代轨道标识字段用于指示所述第j个轨道替代的一个轨道的标识。
  9. 如权利要求1所述的方法,若所述第j个轨道为非基准轨道,则所述第j个轨道还包括轨道索引数据盒,所述轨道索引数据盒包括轨道索引类型数据盒;
    所述轨道索引类型数据盒包括轨道标识字段和索引类型标识字段;
    所述轨道标识字段用于存储基准轨道的标识,所述索引类型标识字段用于指示被索引的轨道为基准轨道。
  10. 如权利要求1~9任一项所述的方法,所述根据所述时域轨道指示信息,对所述媒体数据进行解码,包括:
    根据所述时域轨道指示信息及解码设备的解码性能,保留所述N个时域层级中与所述解码性能匹配的时域层级;
    对保留的时域层级的媒体帧进行解码。
  11. 如权利要求10所述的方法，所述时域轨道指示信息包括多时域轨道标识字段、时域层级标识字段、帧率字段和码率信息，所述码率信息包括低码率字段和高码率字段；
    所述根据所述时域轨道指示信息及所述解码设备的解码性能,保留所述N个时域层级中与所述解码性能匹配的时域层级,包括:
    读取所述时域轨道指示信息中的多时域轨道标识字段的值,在读取的所述多时域轨道标识字段为第二数值时,指示所述N个时域层级的媒体帧被封装到单个轨道,并读取所述时域层级标识字段的值,所述帧率字段的值,以及所述码率信息中低码率字段的值和高码率字段的值;
    根据所述时域层级标识字段的值,所述帧率字段的值,以及所述码率信息中低码率字段的值和高码率字段的值,以及所述解码设备的解码性能,保留所述N个时域层级中与所述解码性能匹配的时域层级。
  12. 如权利要求10所述的方法，所述时域轨道指示信息包括多时域轨道标识字段、时域层级标识字段、帧率字段和码率信息，所述码率信息包括低码率字段和高码率字段；
    所述根据所述时域轨道指示信息及所述解码设备的解码性能,保留所述N个时域层级中与所述解码性能匹配的时域层级,包括:
    读取所述时域轨道指示信息中的多时域轨道标识字段的值，在读取的所述多时域轨道标识字段为第一数值时，指示所述N个时域层级的媒体帧被封装到多个不同的轨道，在各轨道中的时域层级无重叠时，从基准轨道中读取所述时域层级标识字段的值，所述帧率字段的值，以及所述码率信息中低码率字段的值和高码率字段的值；所述基准轨道中封装的媒体帧被独立解码；
    根据从所述基准轨道读取的所述时域层级标识字段的值,所述帧率字段的值,以及所述码率信息中低码率字段的值和高码率字段的值,以及所述解码设备的解码性能,保留部分或全部与所述解码性能匹配的轨道中的时域层级。
  13. 如权利要求12所述的方法,所述时域轨道指示信息还包括轨道组合策略信息,所述轨道组合策略信息包括轨道标识字段,优先解码呈现字段,替代轨道字段,以及替代轨道标识字段;所述方法还包括:
    在读取的所述多时域轨道标识字段为第一数值时,指示所述N个时域层级的媒体帧被封装到多个不同的轨道,在各轨道中的时域层级存在重叠,从基准轨道中读取所述轨道组合策略信息中的各字段的取值;
    根据所述轨道组合策略信息中的各字段的取值,以及所述解码设备的解码性能,保留部分或全部与所述解码性能匹配的轨道中的时域层级。
  14. 如权利要求10所述的方法,保留的媒体帧的数量为一个或多个,所述对保留的时域层级的媒体帧进行解码,包括:
    根据保留的一个或多个媒体帧中每个媒体帧的解码时间,对保留的一个或多个媒体帧按照所述解码时间重新排序;
    对重新排序后的一个或多个媒体帧进行解码。
  15. 一种媒体数据的处理方法,所述方法由内容生成设备执行,所述方法包括:
    根据媒体数据包括的每个媒体帧的帧间依赖关系,确定所述每个媒体帧的时域层级,得到N个时域层级的媒体帧;其中,N为大于1的正整数;
    分别将所述N个时域层级的媒体帧封装到M个轨道中,并生成对应的描述数据盒;所述M个轨道中第j个轨道的描述数据盒包括时域轨道指示信息,所述时域轨道指示信息用于指示所述N个时域层级的轨道封装方式,所述时域轨道指示信息包括所述第j个轨道封装的时域层级的时域层级信息,其中,M为大于1的正整数。
  16. 如权利要求15所述的方法,所述时域轨道指示信息包括多时域轨道标识字段,所述多时域轨道标识字段用于指示所述N个时域层级的轨道封装方式;
    所述根据媒体帧的封装过程,生成第j个轨道的描述数据盒,包括:
    若将所述N个时域层级的多个媒体帧封装到多个不同的轨道中，则生成所述多时域轨道标识字段的取值为第一数值；
    若将所述N个时域层级的多个媒体帧封装到单个轨道中，则生成所述多时域轨道标识字段的取值为第二数值。
  17. 如权利要求15所述的方法,所述时域轨道指示信息包括总时域层数字段;
    所述根据媒体帧的封装过程,生成第j个轨道的描述数据盒,包括:
    根据所述M个轨道封装的时域层级的总数量,生成所述总时域层数字段的取值。
  18. 如权利要求15所述的方法,所述第j个轨道的时域层级信息包括时域层数字段;
    所述根据媒体帧的封装过程,生成第j个轨道的描述数据盒,包括:
    根据所述第j个轨道封装的时域层级的数量,生成所述时域层数字段的取值。
  19. 如权利要求15所述的方法,所述第j个轨道的时域层级信息包括时域层级标识字段、帧率字段和码率信息;
    所述根据媒体帧的封装过程,生成第j个轨道的描述数据盒,包括:
    将所述第j个轨道封装的时域层级中,第i个时域层级的层级标识存储在所述时域层级标识字段中;
    将累计到属于第i个时域层级的媒体帧时的帧率存储到所述帧率字段中;
    将累计到属于第i个时域层级的媒体帧时的码率,作为所述码率信息。
  20. 如权利要求19所述的方法,所述码率信息包括低码率字段和高码率字段;所述将累计到属于第i个时域层级的媒体帧时的码率,作为所述码率信息,包括:
    将累计到属于第i个时域层级的媒体帧时的码率的低18位，存储在所述低码率字段中；
    将累计到属于第i个时域层级的媒体帧时的码率的高12位，存储在所述高码率字段中。
  21. 如权利要求15所述的方法,所述时域轨道指示信息包括基准轨道标识字段;
    所述根据媒体帧的封装过程,生成第j个轨道的描述数据盒,包括:
    若所述第j个轨道为基准轨道,则生成所述基准轨道标识字段的取值为第一数值;
    若所述第j个轨道为非基准轨道,则生成所述基准轨道标识字段的取值为第二数值;
    其中,所述基准轨道中封装的媒体帧被独立解码。
  22. 如权利要求15所述的方法,所述时域轨道指示信息还包括轨道组合策略信息,所述轨道组合策略信息包括轨道标识字段,优先解码呈现字段,替代轨道字段,以及替代轨道标识字段;
    所述根据媒体帧的封装过程,生成第j个轨道的描述数据盒,包括:
    将包含部分时域层级的轨道的标识存储在所述轨道标识字段中;
    若所述第j个轨道为优先解码呈现的轨道,则生成所述优先解码呈现字段的取值为第一数值,若所述第j个轨道不是优先解码呈现的轨道,则生成所述优先解码呈现字段的取值为第二数值;
    若所述第j个轨道为一个轨道的替代轨道,则生成所述替代轨道字段的取值为第一数值,并将所述第j个轨道替代的一个轨道的标识存储在所述替代轨道标识字段中;若所述第j个轨道不是替代轨道,则生成所述替代轨道字段的取值为第二数值。
  23. 如权利要求15所述的方法,所述方法还包括:
    若所述第j个轨道为非基准轨道,则生成第j个轨道的轨道索引数据盒,所述轨道索引数据盒包括轨道索引类型数据盒;所述轨道索引类型数据盒包括轨道标识字段和索引类型标识字段;
    将所述基准轨道的标识存储在所述轨道标识字段中,并根据所述索引类型标识字段,将所述第j个轨道索引至基准轨道。
  24. 一种媒体数据的处理装置,所述媒体数据包括多个媒体帧,所述多个媒体帧被划分为N个时域层级,所述多个媒体帧被封装到M个轨道中,其中M和N均为大于1的正整数;所述装置包括:
    获取单元,用于获取所述M个轨道中的第j个轨道的描述数据盒,所述描述数据盒包括时域轨道指示信息,所述时域轨道指示信息用于指示所述N个时域层级的轨道封装方式,所述时域轨道指示信息包括所述第j个轨道封装的时域层级的时域层级信息,其中j为正整数,且j≤M;
    处理单元,用于根据所述时域轨道指示信息,对所述媒体数据进行解码。
  25. 一种媒体数据的处理装置,包括:
    确定单元,用于根据媒体数据包括的每个媒体帧的帧间依赖关系,确定所述每个媒体帧的时域层级,得到N个时域层级的媒体帧;其中,N为大于1的正整数;
    处理单元,用于分别将所述N个时域层级的媒体帧封装到M个轨道中,并生成对应的描述数据盒;所述M个轨道中第j个轨道的描述数据盒包括时域轨道指示信息,所述时域轨道指示信息用于指示所述N个时域层级的轨道封装方式,所述时域轨道指示信息包括所述第j个轨道封装的时域层级的时域层级信息,其中,M为大于1的正整数。
  26. 一种计算机设备,包括:
    处理器,适于实现一条或多条指令;以及,
    存储器,存储有一条或多条指令,所述一条或多条指令适于由所述处理器加载并执行如权利要求1~14任一项所述的方法,或者,执行如权利要求15~23任一项所述的方法。
  27. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序包括第一程序指令和第二程序指令,所述第一程序指令当被处理器执行时使所述处理器执行如权利要求1~14任一项所述的方法,所述第二程序指令当被处理器执行时使所述处理器执行如权利要求15~23任一项所述的方法。
  28. 一种包括指令的计算机程序产品,当其在计算机上运行时,使得所述计算机执行权利要求1~14任一项所述的方法,或者,执行权利要求15~23任一项所述的方法。
PCT/CN2022/083960 2021-06-11 2022-03-30 一种媒体数据的处理方法及相关设备 WO2022257567A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22819172.2A EP4354868A1 (en) 2021-06-11 2022-03-30 Media data processing method and related device
US18/072,975 US12034947B2 (en) 2021-06-11 2022-12-01 Media data processing method and related device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110656768.4 2021-06-11
CN202110656768.4A CN115474053A (zh) 2021-06-11 2021-06-11 一种媒体数据的处理方法及相关设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/072,975 Continuation US12034947B2 (en) 2021-06-11 2022-12-01 Media data processing method and related device

Publications (1)

Publication Number Publication Date
WO2022257567A1 true WO2022257567A1 (zh) 2022-12-15

Family

ID=84364609

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/083960 WO2022257567A1 (zh) 2021-06-11 2022-03-30 一种媒体数据的处理方法及相关设备

Country Status (4)

Country Link
US (1) US12034947B2 (zh)
EP (1) EP4354868A1 (zh)
CN (1) CN115474053A (zh)
WO (1) WO2022257567A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116708938A (zh) * 2023-08-03 2023-09-05 腾讯科技(深圳)有限公司 视频处理方法、装置、设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101518087A (zh) * 2006-08-24 2009-08-26 诺基亚公司 用于指示媒体文件中轨道关系的系统和方法
GB2506911A (en) * 2012-10-12 2014-04-16 Canon Kk Streaming data corresponding to divided image portions (tiles) via a description file including spatial and URL data
CN106664446A (zh) * 2014-07-01 2017-05-10 佳能株式会社 用于封装hevc分层媒体数据的方法、设备和计算机程序
CN112804256A (zh) * 2021-02-09 2021-05-14 腾讯科技(深圳)有限公司 多媒体文件中轨道数据的处理方法、装置、介质及设备

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100128112A1 (en) 2008-11-26 2010-05-27 Samsung Electronics Co., Ltd Immersive display system for interacting with three-dimensional content
WO2017024990A1 (en) 2015-08-07 2017-02-16 Mediatek Inc. Method and apparatus of bitstream random access and synchronization for multimedia applications
EP3406079A4 (en) 2016-02-03 2019-07-10 MediaTek Inc. PUSH PATTERN METHOD AND SYSTEM AND URL LIST FOR DASH PROTOCOL ON INTEGRAL DUPLEX PROTOCOLS
EP3411975B1 (en) 2016-02-19 2022-05-04 Mediatek Inc. Method and system of adaptive application layer fec for mpeg media transport
TWI650994B (zh) 2016-09-02 2019-02-11 聯發科技股份有限公司 提升品質遞送及合成處理
US20180075576A1 (en) 2016-09-09 2018-03-15 Mediatek Inc. Packing projected omnidirectional videos
US10623635B2 (en) 2016-09-23 2020-04-14 Mediatek Inc. System and method for specifying, signaling and using coding-independent code points in processing media contents from multiple media sources
US11197040B2 (en) 2016-10-17 2021-12-07 Mediatek Inc. Deriving and signaling a region or viewport in streaming media
US10742999B2 (en) 2017-01-06 2020-08-11 Mediatek Inc. Methods and apparatus for signaling viewports and regions of interest
US10805620B2 (en) 2017-01-11 2020-10-13 Mediatek Inc. Method and apparatus for deriving composite tracks
US11139000B2 (en) 2017-03-07 2021-10-05 Mediatek Inc. Method and apparatus for signaling spatial region information
US10542297B2 (en) 2017-03-07 2020-01-21 Mediatek Inc. Methods and apparatus for signaling asset change information for media content
US10565616B2 (en) 2017-07-13 2020-02-18 Misapplied Sciences, Inc. Multi-view advertising system and method
CN113615207A (zh) 2019-03-21 2021-11-05 Lg电子株式会社 点云数据发送装置、点云数据发送方法、点云数据接收装置和点云数据接收方法
US11831861B2 (en) 2019-08-12 2023-11-28 Intel Corporation Methods for viewport-dependent adaptive streaming of point cloud content

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101518087A (zh) * 2006-08-24 2009-08-26 诺基亚公司 用于指示媒体文件中轨道关系的系统和方法
GB2506911A (en) * 2012-10-12 2014-04-16 Canon Kk Streaming data corresponding to divided image portions (tiles) via a description file including spatial and URL data
CN106664446A (zh) * 2014-07-01 2017-05-10 佳能株式会社 用于封装hevc分层媒体数据的方法、设备和计算机程序
CN112804256A (zh) * 2021-02-09 2021-05-14 腾讯科技(深圳)有限公司 多媒体文件中轨道数据的处理方法、装置、介质及设备

Also Published As

Publication number Publication date
EP4354868A1 (en) 2024-04-17
US12034947B2 (en) 2024-07-09
CN115474053A (zh) 2022-12-13
US20230091266A1 (en) 2023-03-23

Similar Documents

Publication Publication Date Title
KR102406887B1 (ko) 시간 설정형 미디어 데이터를 발생시키는 방법, 디바이스, 및 컴퓨터 프로그램
TWI656786B (zh) 取樣自適應偏移裝置
US6959116B2 (en) Largest magnitude indices selection for (run, level) encoding of a block coded picture
US6871006B1 (en) Processing of MPEG encoded video for trick mode operation
KR102028527B1 (ko) 영상 디코딩 방법 및 이를 이용하는 장치
CN103237218B (zh) 图像信息解码设备和方法、图像信号编码设备和方法及程序
TW201743611A (zh) 以高效率視訊寫碼及分層高效率視訊寫碼檔案格式之圖塊分組及樣本之映射
CN107211168A (zh) 在分层视频文件格式中的样本条目及操作点发信设计
TW201836355A (zh) 視訊解碼方法
CN105144713A (zh) 用于解码器设置的对视频进行编码的方法及其装置以及基于解码器设置对视频进行解码的方法及其装置
US20230022526A1 (en) Video processing method and apparatus, device, and storage medium
WO2022257567A1 (zh) 一种媒体数据的处理方法及相关设备
US20240080487A1 (en) Method, apparatus for processing media data, computer device and storage medium
CN112565815A (zh) 文件封装方法、文件传输方法、文件解码方法及相关设备
WO2024078066A1 (zh) 视频解码方法、视频编码方法、装置、存储介质及设备
TWI794076B (zh) 多媒體資源中軌道資料的處理方法、裝置、媒體及設備
US9706201B2 (en) Region-based processing of predicted pixels
US7423652B2 (en) Apparatus and method for digital video decoding
Rabie et al. PixoComp: a novel video compression scheme utilizing temporal pixograms
CN113905255B (zh) 媒体数据的编辑方法、媒体数据的封装方法及相关设备
CN108366263A (zh) 视频解码方法、设备及存储介质
TW202431841A (zh) 使用圖塊到光柵重排序的增強型視訊解碼器
JP2005057345A (ja) 画像データの保存管理方法とその装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22819172

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022819172

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2022819172

Country of ref document: EP

Effective date: 20240111