US9554143B2 - System and method for efficient multi-bitrate and multi-spatial resolution media encoding - Google Patents


Info

Publication number
US9554143B2
Authority
US
United States
Prior art keywords: pictures, encoding, combinations, encoding parameters, CTB
Legal status: Active, expires
Application number: US14/064,224
Other versions: US20140119451A1 (en)
Inventor: Sriram Sethuraman
Current assignee: Ittiam Systems Pvt Ltd
Original assignee: Ittiam Systems Pvt Ltd
Application filed by Ittiam Systems Pvt Ltd
Assigned to ITTIAM SYSTEMS (P) LTD. (assignor: SETHURAMAN, SRIRAM)
Publication of US20140119451A1
Priority to US15/369,898 (published as US9654791B1)
Application granted
Publication of US9554143B2
Legal status: Active; expiration adjusted

Classifications

    • H: Electricity
    • H04: Electric communication technique
    • H04N: Pictorial communication, e.g. television
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/513: Predictive coding involving temporal prediction; motion estimation or motion compensation; processing of motion vectors
    • H04N19/33: Hierarchical techniques (scalability) in the spatial domain
    • H04N19/105: Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/124: Quantisation
    • H04N19/39: Multiple description coding [MDC], i.e. with separate layers being structured as independently decodable descriptions of input picture data
    • H04N19/61: Transform coding in combination with predictive coding
    • H04N19/172: Adaptive coding where the coding unit is an image region being a picture, frame or field
    • H04N19/176: Adaptive coding where the coding unit is an image region being a block, e.g. a macroblock
    • H04N19/184: Adaptive coding where the coding unit is bits, e.g. of the compressed video stream
    • H04N19/96: Tree coding, e.g. quad-tree coding

Definitions

  • Embodiments of the present invention relate to media encoding. More particularly, embodiments of the present invention relate to efficient multi-bitrate and multi-spatial resolution media encoding.
  • Media streams are typically encoded or compressed into a certain encoding format prior to their transmission over a network.
  • Typical encoding for hypertext transfer protocol (HTTP) based adaptive streaming technology requires producing multiple bitrate streams from a single input media to allow adaptation to changing bandwidth during content delivery (e.g., dynamic adaptive streaming over HTTP, or DASH).
  • The encoding service also produces media streams of multiple spatial resolutions, such as a quarter video graphics array (QVGA), a half-size video graphics array (HVGA), a video graphics array (VGA), a wide video graphics array (WVGA), 720p, 1080p/i, and so on.
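  • The resolution names above map to conventional pixel dimensions. The sketch below records these common values for reference; the dimensions are well-known conventions, not figures taken from the patent.

```python
# Common pixel dimensions for the spatial resolutions named above
# (conventional values, not taken from the patent text).
RESOLUTIONS = {
    "QVGA":  (320, 240),
    "HVGA":  (480, 320),
    "VGA":   (640, 480),
    "WVGA":  (800, 480),
    "720p":  (1280, 720),
    "1080p": (1920, 1080),
}

def luma_samples(name):
    """Number of luma samples per picture at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height

print(luma_samples("720p"))  # 921600
```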
  • The process of encoding can be performed on a public cloud infrastructure or a private cloud infrastructure, using software-based or custom hardware appliances.
  • Newer video coding standards, such as high efficiency video coding (HEVC), require a much higher computational complexity compared to older video coding standards; therefore, efficient encoding becomes even more critical to help keep encoding costs low.
  • The information contained in one encoded stream is not that useful for coding the lower-resolution streams, and optimal rate-distortion performance across the output formats may not be possible with such re-use.
  • Coding tools in newer standards, such as H.264 or HEVC (High Efficiency Video Coding), include features such as weighted prediction, a hierarchical group of pictures (GOP) structure, multiple-reference motion estimation (ME), and the like, whereas older standards, such as MPEG-2 (Moving Pictures Expert Group-2), do not have these features. Some encoders, such as the open-source x264 encoder, implement these coding tools.
  • FIG. 1A illustrates an architecture for efficiently encoding a sequence of pictures at a given spatial resolution to an encoding format at a plurality of output bitrates, according to one embodiment
  • FIGS. 1B-1D illustrate architectures for efficiently encoding the sequence of pictures at the given spatial resolution to the encoding format at the plurality of output bitrates, according to three different embodiments, respectively;
  • FIG. 2 illustrates an architecture for encoding a sequence of pictures at a given spatial resolution to an encoding format at a plurality of output spatial resolutions, according to one embodiment
  • FIG. 2A is a block diagram illustrating an intelligent encoding module, such as one shown in FIG. 2 , coupled to a plurality of normative encoding modules for encoding the sequence of pictures to the encoding format at a plurality of output bit rates, according to one embodiment;
  • FIG. 3A is a block diagram illustrating components of an intelligent encoding module shown in FIGS. 1A-1D, 2 and 2A , according to one embodiment;
  • FIG. 3B is a block diagram illustrating components of a normative encoding module, shown in FIGS. 1A-1D and 2A , according to one embodiment
  • FIGS. 4A-4B are block diagrams illustrating a coding tree and a coding tree block (CTB), respectively, in the context of the present invention
  • FIGS. 5A-5B are flow diagrams of methods for efficiently encoding a sequence of pictures at a given spatial resolution to an encoding format at a plurality of output bitrates and a plurality of output spatial resolutions respectively, according to two different embodiments that can also be combined;
  • FIGS. 6A-6B are systems for performing efficient media encoding, according to two different embodiments.
  • FIG. 1A illustrates an architecture 100 A for efficiently encoding a sequence of pictures at a given spatial resolution to an encoding format at a plurality of output bitrates, according to one embodiment.
  • the architecture 100 A includes an intelligent encoding module 102 and a set of normative encoding modules 104 A-N. Further, the intelligent encoding module 102 is coupled to the normative encoding modules 104 A-N.
  • the intelligent encoding module 102 receives the sequence of pictures. Further, the intelligent encoding module 102 can optionally determine a picture type for each of the pictures or receive the picture type for each of the pictures. The intelligent encoding module 102 then identifies best combinations of encoding parameters from a determined set of combinations of encoding parameters for each coding tree block (CTB) in each of the pictures supported by the encoding format for the associated picture type.
  • A combination of encoding parameters for each CTB in each of the pictures is composed of one or more of: coding tree structure options; a prediction type of each leaf block of each coding tree option; a partition type of each leaf block of each coding tree option; a prediction mode within the prediction type for each partition of each leaf block of each coding tree option; motion vectors with respect to a set of reference pictures for each leaf block of each coding tree option; and a transform tree for each leaf block or leaf block partition of each coding tree option.
  • An example coding tree 400 A is shown in FIG. 4A ; the corresponding coding tree block 402 , an example leaf block 404 , and an example leaf block partition 406 are shown in FIG. 4B .
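  • The parameter combination described above can be pictured as a small data structure. The sketch below is a hypothetical, simplified representation; the class and field names are illustrative, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class LeafBlock:
    prediction_type: str                 # "intra" or "inter"
    partition_type: str                  # e.g. "2Nx2N", "NxN"
    prediction_mode: int                 # mode index within the prediction type
    motion_vectors: List[Tuple[int, int]] = field(default_factory=list)  # one per reference picture
    transform_tree_depth: int = 0        # depth of the residual transform tree

@dataclass
class CTBCombination:
    coding_tree: List[int]               # quadtree split flags in scan order
    leaves: List[LeafBlock]

# One candidate combination: split a 64x64 CTB once into four 32x32 leaves.
combo = CTBCombination(
    coding_tree=[1, 0, 0, 0, 0],
    leaves=[LeafBlock("inter", "2Nx2N", 0, [(3, -1)]) for _ in range(4)],
)
print(len(combo.leaves))  # 4
```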
  • the intelligent encoding module 102 evaluates a cost function that depends on one of the output bitrates for each combination of encoding parameters in the determined set of combinations of encoding parameters for each CTB in each of the pictures supported by the encoding format for the associated picture types.
  • The cost function can be evaluated in terms of one or more of: input distortion to the encoder (such as the sum of absolutes of the residuals being encoded, in either the spatial domain or a transform domain); output distortion of the encoder (such as the sum of squared errors between the original samples and the reconstructed samples); partial estimated or actual rate (such as the number of bits to encode the non-residual elements of the bit stream); and estimated or actual rate (i.e., the total number of bits required to encode, including residuals).
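  • The distortion and rate terms listed above are typically combined through a Lagrangian cost J = D + λR, which is standard rate-distortion optimization practice (the Lagrangian form itself is an assumption here, not a formula quoted from the patent). A minimal sketch:

```python
# J = D + lambda * R, where D may be the SAD of the residual (input
# distortion) or the SSE against the reconstruction (output distortion),
# and R is an estimated or actual bit count.

def sad(original, prediction):
    """Sum of absolute differences of the residual (input distortion)."""
    return sum(abs(o - p) for o, p in zip(original, prediction))

def sse(original, reconstruction):
    """Sum of squared errors against reconstructed samples (output distortion)."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstruction))

def rd_cost(distortion, rate_bits, lam):
    """Lagrangian cost J = D + lambda * R."""
    return distortion + lam * rate_bits

orig = [10, 12, 14, 16]   # original samples (flattened block)
pred = [11, 12, 13, 18]   # predicted samples
print(sad(orig, pred))                    # 4
print(rd_cost(sad(orig, pred), 20, 0.5))  # 14.0
```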
  • the intelligent encoding module 102 then identifies the best combinations of encoding parameters for each CTB in each of the pictures from the determined set of combinations of encoding parameters based on the evaluated cost function.
  • the intelligent encoding module 102 encodes the pictures in the encoding format based on the best of the identified combinations of encoding parameters for each CTB in each of the pictures to create a media bit stream at the one of the output bitrates. For example, components residing in the intelligent encoding module 102 are explained in more detail with reference to FIG. 3A .
  • The normative encoding modules 104 A-N, one for each of the remaining output bitrates, each receive the sequence of pictures, their determined picture types, and the identified best combinations of encoding parameters for each CTB in each of the pictures from the intelligent encoding module 102 .
  • Each of the normative encoding modules 104 A-N then encodes the sequence of pictures using one combination of encoding parameters derived using the identified best combinations of encoding parameters for each CTB in each of the pictures to create a media bit stream at respective one of the remaining output bitrates.
  • each of the normative encoding modules 104 A-N re-evaluates the cost function that is dependent on respective one of the remaining output bitrates for a set of combinations of encoding parameters derived using the identified best combinations of encoding parameters for each CTB in each of the pictures to determine the one combination of encoding parameters used for encoding each CTB in each of the pictures at the respective one of the remaining output bitrates.
  • Each of the normative encoding modules 104 A-N then encodes the sequence of pictures based on the one of the best combinations of encoding parameters for each CTB in each of the pictures to create the media bit stream at respective one of the remaining output bitrates.
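  • The re-evaluation step above can be pictured as re-scoring a small candidate list with a bitrate-specific Lagrange multiplier. A hypothetical sketch; the function name and candidate values are illustrative:

```python
def pick_for_bitrate(candidates, lam):
    """candidates: list of (distortion, rate_bits) per combination.
    Returns the index of the combination minimizing D + lam * R."""
    costs = [d + lam * r for d, r in candidates]
    return costs.index(min(costs))

# Two candidate combinations for one CTB: (distortion, rate_bits).
cands = [(100.0, 40.0), (160.0, 10.0)]
print(pick_for_bitrate(cands, lam=0.5))  # 0: generous rate budget, lower distortion wins
print(pick_for_bitrate(cands, lam=5.0))  # 1: tight rate budget, the cheaper combination wins
```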
  • Each of the normative encoding modules 104 A-N performs motion compensation or intra prediction and implements the entire encoding loop (residual computation, forward transform, quantization (simple or RD-optimized), inverse quantization, inverse transform, and reconstruction).
  • Each of the normative encoding modules 104 A-N also performs entropy coding. Without loss of generality, each of the normative encoding modules 104 A-N also includes in-loop filtering functionality, such as deblocking, sample adaptive offset, adaptive loop filtering, and so on. This is explained in more detail with reference to FIG. 3B .
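  • Putting the pieces together, the FIG. 1A flow amounts to one expensive search pass whose shortlisted combinations seed several cheap passes. A toy sketch under that assumption; all names and the top-2 shortlist size are illustrative:

```python
def intelligent_pass(ctbs, lam, search):
    """Full search: returns, per CTB, a shortlist of best (d, r) candidates."""
    shortlists = []
    for ctb in ctbs:
        cands = sorted(search(ctb), key=lambda dr: dr[0] + lam * dr[1])
        shortlists.append(cands[:2])  # keep the top-2 as the 'best combinations'
    return shortlists

def normative_pass(shortlists, lam):
    """Cheap pass: pick one combination per CTB from the shortlist only."""
    return [min(s, key=lambda dr: dr[0] + lam * dr[1]) for s in shortlists]

def fake_search(ctb):  # stand-in for the real per-CTB mode/MV search
    return [(100.0, 40.0), (160.0, 10.0), (400.0, 5.0)]

short = intelligent_pass(range(3), lam=0.5, search=fake_search)
print(normative_pass(short, lam=5.0))  # [(160.0, 10.0), (160.0, 10.0), (160.0, 10.0)]
```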
  • FIG. 1B illustrates an architecture 100 B for efficiently encoding the sequence of pictures at the given spatial resolution to the encoding format at a plurality of output bitrates, according to one embodiment.
  • the architecture 100 B includes the intelligent encoding module 102 and the normative encoding modules 104 A-N and a coarse-to-fine mapping module 106 . Further, the coarse-to-fine mapping module 106 is coupled to the intelligent encoding module 102 . Furthermore, the intelligent encoding module 102 is coupled to the normative encoding modules 104 A-N.
  • The coarse-to-fine mapping module 106 receives combinations of encoding parameters for each CTB in each of the pictures at a reduced spatial resolution version derived from the sequence of pictures at a given spatial resolution, and maps them to the corresponding blocks at the given spatial resolution, which is higher than the reduced spatial resolution.
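  • Coarse-to-fine mapping can be sketched as projecting block positions and scaling motion vectors by the resolution ratio. A hypothetical illustration; the factor-of-2 ratio and function names are assumptions:

```python
def map_mv_coarse_to_fine(mv, scale):
    """Scale a motion vector (x, y) found at the coarse resolution."""
    return (mv[0] * scale, mv[1] * scale)

def coarse_block_for(x, y, scale):
    """Coarse-resolution position co-located with fine-resolution position (x, y)."""
    return (x // scale, y // scale)

print(map_mv_coarse_to_fine((3, -2), scale=2))  # (6, -4)
print(coarse_block_for(130, 70, scale=2))       # (65, 35)
```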
  • the intelligent encoding module 102 receives the sequence of pictures and the mapped outputs from the coarse-to-fine mapping module 106 and determines a set of combinations of encoding parameters supported by the encoding format for each CTB in each of the pictures by using the mapped outputs of the coarse-to-fine mapping module 106 . Furthermore, the intelligent encoding module 102 can optionally determine a picture type for each of the pictures or receive the picture type for each of the pictures. The intelligent encoding module 102 then identifies best combinations of encoding parameters for each CTB in each of the pictures from the determined set of combinations of encoding parameters supported by the encoding format for the associated picture type.
  • the intelligent encoding module 102 encodes the pictures based on the best of the identified combinations of encoding parameters for each CTB in each of the pictures to create the media bit stream at one of the output bitrates.
  • The normative encoding modules 104 A-N, one for each of the remaining output bitrates, each receive the pictures, their determined picture types, and the identified best combinations of encoding parameters for each CTB in each of the pictures from the intelligent encoding module 102 .
  • Each of the normative encoding modules 104 A-N then encodes the pictures using one combination of encoding parameters derived using the identified best combinations of encoding parameters for each CTB in each of the pictures to create the media bit stream at respective one of the remaining output bitrates. This is explained in more detail with reference to FIG. 1A .
  • FIG. 1C illustrates another architecture 100 C for efficiently encoding the sequence of pictures at the given spatial resolution to the encoding format at the output bitrates, according to one embodiment.
  • the architecture 100 C includes an estimation module 108 , the intelligent encoding module 102 and the normative encoding modules 104 A-N. Further, the estimation module 108 is coupled to the intelligent encoding module 102 . Furthermore, the intelligent encoding module 102 is coupled to the normative encoding modules 104 A-N.
  • the estimation module 108 receives sequence of pictures at a reduced spatial resolution version and estimates the set of combinations of encoding parameters for the CTBs in each of the pictures at the given spatial resolution based on the reduced spatial resolution version of the sequence of pictures relative to the given spatial resolution. Further, the intelligent encoding module 102 receives the pictures and the estimated set of combinations of encoding parameters for the CTBs in each of the pictures at the given spatial resolution from the estimation module 108 . Furthermore, the intelligent encoding module 102 determines the set of combinations of encoding parameters supported by the encoding format for each CTB in each of the pictures using the estimated set of combinations of encoding parameters for the CTBs in each of the pictures at the given spatial resolution received from the estimation module 108 .
  • the intelligent encoding module 102 can optionally determine the picture type for each of the pictures or receive the picture type for each of the pictures. The intelligent encoding module 102 then identifies the best combinations of encoding parameters from the determined set of combinations of encoding parameters for each CTB in each of the pictures supported by the encoding format for the associated picture type.
  • the intelligent encoding module 102 encodes the pictures based on the best of the identified combinations of encoding parameters for each CTB in each of the pictures to create the media bit stream at one of the output bitrates.
  • The normative encoding modules 104 A-N, one for each of the remaining output bitrates, each receive the pictures, their determined picture types, and the identified best combinations of encoding parameters for each CTB in each of the pictures from the intelligent encoding module 102 .
  • Each of the normative encoding modules 104 A-N then encodes the pictures using one combination of encoding parameters derived using the identified best combinations of encoding parameters for each CTB in each of the pictures to create the media bit stream at respective one of the remaining output bitrates. This is explained in more detail with reference to FIG. 1A .
  • FIG. 1D illustrates another architecture 100 D for efficiently encoding the sequence of pictures to the encoding format at the output bitrates. The architecture 100 D includes the intelligent encoding module 102 , the normative encoding modules 104 A-N, a media decoding module (MDM) 110 , and a format-to-format mapping module 112 .
  • the MDM 110 is coupled to the format-to-format mapping module 112 and the intelligent encoding module 102 .
  • the format-to-format mapping module 112 is coupled to the intelligent encoding module 102 .
  • the intelligent encoding module 102 is coupled to the normative encoding modules 104 A-N.
  • the architecture 100 D also optionally includes a mid-processing module 114 coupled between the MDM 110 and intelligent encoding module 102 .
  • The MDM 110 receives and decodes the media bit stream in a certain input encoding format to produce a decompressed sequence of pictures.
  • the MDM 110 also produces a set of coding parameters in the input encoding format that are obtained while decoding the media stream.
  • The coding parameters comprise one or more of the encoding parameters described earlier and, additionally, parameters such as the quantization parameter for each leaf block of each coding tree option, the number of bits for coding the non-residual syntax elements in the input encoding format, and the number of bits for coding the residual syntax elements in the input encoding format.
  • The format-to-format mapping module 112 receives the coding parameters from the MDM 110 and maps the coding parameters available for each CTB in the sequence of pictures in the certain input encoding format to a set of encoding parameters supported by the encoding format. Further, the intelligent encoding module 102 receives the decompressed sequence of pictures from the MDM 110 and the mapped outputs of the format-to-format mapping module 112 . In one example embodiment, the mid-processing module 114 performs one or more of removing coding artifacts in the decompressed sequence of pictures, de-interlacing the sequence of pictures, and enhancing the decompressed sequence of pictures; in that case, the intelligent encoding module 102 receives the sequence of pictures from the mid-processing module 114 .
  • the intelligent encoding module 102 determines the set of combinations of encoding parameters supported by the encoding format for each CTB in each of the pictures using the mapped outputs received from the format-to-format mapping module 112 .
  • The intelligent encoding module 102 can optionally determine the picture type for each of the pictures or receive the picture type for each of the pictures as an input. The intelligent encoding module 102 then identifies the best combinations of encoding parameters for each CTB in each of the pictures from the determined set of combinations of encoding parameters supported by the encoding format for the associated picture type.
  • the intelligent encoding module 102 encodes the pictures based on the best of the identified combinations of encoding parameters for each CTB in each of the pictures to create a media bit stream at one of the output bitrates.
  • The normative encoding modules 104 A-N, one for each of the remaining output bitrates, each receive the pictures, their determined picture types, and the identified best combinations of encoding parameters for each CTB in each of the pictures from the intelligent encoding module 102 .
  • Each of the normative encoding modules 104 A-N then encodes the pictures using one combination of encoding parameters derived using the identified best combinations of encoding parameters for each CTB in each of the pictures to create a media bit stream at the respective one of the remaining output bitrates. This is explained in more detail with reference to FIG. 1A .
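  • Format-to-format mapping can be sketched as pooling the coding parameters of the input format's smaller blocks into candidates for the output format's larger CTBs. A hypothetical illustration; the H.264 16x16-macroblock-to-32x32-region grouping is an illustrative choice, not the patent's specification:

```python
def map_h264_to_hevc(mb_params):
    """mb_params: dict {(mb_x, mb_y): mv} of per-macroblock motion vectors.
    Returns {(region_x, region_y): [mvs]}, pooling the motion vectors of
    the four 16x16 macroblocks inside each 32x32 region."""
    ctb_candidates = {}
    for (mx, my), mv in mb_params.items():
        key = (mx // 2, my // 2)  # four 16x16 MBs per 32x32 region
        ctb_candidates.setdefault(key, []).append(mv)
    return ctb_candidates

mbs = {(0, 0): (1, 0), (1, 0): (1, 0), (0, 1): (2, 1), (1, 1): (1, 0)}
print(map_h264_to_hevc(mbs))  # {(0, 0): [(1, 0), (1, 0), (2, 1), (1, 0)]}
```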
  • FIG. 2 is an architecture 200 for encoding a sequence of pictures at a given spatial resolution to an encoding format at a plurality of output spatial resolutions, according to one embodiment.
  • the architecture 200 includes a media transformation module (MTM) 202 , the estimation module 108 , a set of coarse-to-fine mapping modules 204 A-N, one for each of the output spatial resolutions other than a lowest spatial resolution of the output spatial resolutions, and a set of intelligent encoding modules 206 A-N, one for each of the output spatial resolutions.
  • the MTM 202 is coupled to the estimation module 108 and the intelligent encoding modules 206 A-N. Furthermore, each of the coarse-to-fine mapping modules 204 A-N is coupled to respective one of the intelligent encoding modules 206 A-N.
  • the architecture 200 also optionally includes a fine-to-coarse mapping module 208 that is coupled to the estimation module 108 . Further, the architecture 200 optionally includes a look-ahead processing module 210 that is coupled to the estimation module 108 and the intelligent encoding modules 206 A-N.
  • the MTM 202 transforms the sequence of pictures at the given spatial resolution to the output spatial resolutions and/or an additional spatial resolution that is lower than the output spatial resolutions.
  • The MTM 202 performs one or more of: resizing, to derive a sequence of pictures at a higher or lower resolution than the given resolution; de-interlacing, to convert interlaced to progressive content; cadence generation; frame rate up/down conversion; and the like.
  • the MTM 202 transforms decompressed pictures at the given spatial resolution to the output spatial resolutions and/or an additional spatial resolution.
  • the estimation module 108 estimates a set of combinations of encoding parameters for each CTB in each of the pictures at a lowest output spatial resolution among the output spatial resolutions based on the additional spatial resolution.
  • the fine-to-coarse mapping module 208 receives combinations of encoding parameters at a higher spatial resolution and provides a set of possible encoding parameters for a corresponding region at the additional spatial resolution to the estimation module 108 using which it estimates the set of combinations of encoding parameters.
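  • Fine-to-coarse mapping can similarly be sketched as collapsing the higher-resolution parameters of a region into a de-duplicated candidate set at the lower resolution. A hypothetical illustration; the scale factor and rounding choice are assumptions:

```python
def fine_to_coarse_candidates(fine_mvs, scale=2):
    """Collapse fine-resolution MVs into a de-duplicated coarse candidate set."""
    seen, out = set(), []
    for x, y in fine_mvs:
        mv = (round(x / scale), round(y / scale))
        if mv not in seen:
            seen.add(mv)
            out.append(mv)
    return out

print(fine_to_coarse_candidates([(6, -4), (6, -4), (8, -4), (2, 0)]))
# [(3, -2), (4, -2), (1, 0)]
```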
  • each of the coarse-to-fine mapping modules 204 A-N maps combinations of encoding parameters for each CTB in each of the pictures at an immediate coarser spatial resolution to corresponding blocks at their respective output spatial resolution.
  • The intelligent encoding modules 206 A-N, one for each of the plurality of output spatial resolutions, each receive the sequence of pictures at the respective output spatial resolution and a picture type for each of the pictures.
  • the look-ahead processing module 210 receives the sequence of pictures at the given spatial resolution or the sequence of pictures at lower spatial resolutions from the MTM 202 and determines group of pictures (GOP) structure and hence the picture type for each of the pictures across all the intelligent encoding modules 206 A-N.
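  • The look-ahead decision can be pictured as assigning one shared picture type per frame from a GOP pattern, so that every intelligent encoding module uses the same types. The fixed pattern below is an illustrative simplification; a real look-ahead would analyze the content:

```python
def assign_picture_types(num_pictures, gop_size=8, b_frames=2):
    """Assign I/P/B types from a fixed GOP pattern (illustrative only)."""
    types = []
    for i in range(num_pictures):
        if i % gop_size == 0:
            types.append("I")                       # GOP boundary
        elif (i % gop_size) % (b_frames + 1) == 0:
            types.append("P")                       # anchor between B runs
        else:
            types.append("B")
    return types

print(assign_picture_types(9))  # ['I', 'B', 'B', 'P', 'B', 'B', 'P', 'B', 'I']
```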
  • Each of the intelligent encoding modules 206 A-N determines a set of combinations of encoding parameters supported by the encoding format for each CTB in each of the pictures based on the outputs of one or more of the estimation module 108 and the respective one of the coarse-to-fine mapping modules 204 A-N. In one example embodiment, when the respective output spatial resolution is lower than the given resolution, each of the intelligent encoding modules 206 A-N receives the output of the fine-to-coarse mapping module 208 .
  • A combination of encoding parameters for each CTB in each of the pictures is composed of one or more of: coding tree structure options; a prediction type of each leaf block of each coding tree option; a partition type of each leaf block of each coding tree option; a prediction mode within the prediction type for each partition of each leaf block of each coding tree option; motion vectors with respect to a set of reference pictures for each leaf block of each coding tree option; and a transform tree for each leaf block or leaf block partition of each coding tree option.
  • An example coding tree 400 A is shown in FIG. 4A , and the corresponding coding tree block 402 , an example leaf block 404 , and an example leaf block partition 406 are shown in FIG. 4B .
  • each of the intelligent encoding modules 206 A-N identifies best combinations of encoding parameters from the determined set of combinations of encoding parameters for each CTB in each of the pictures supported by the encoding format for the associated picture type.
  • the intelligent encoding module 102 evaluates a cost function for each combination of encoding parameters in the set of combinations of encoding parameters for each CTB in each of the pictures supported by the encoding format for the associated picture type.
  • the intelligent encoding module 102 then identifies the best combinations of encoding parameters from the set of combinations of encoding parameters for each CTB in each of the pictures based on the evaluated cost function.
  • each of the intelligent encoding modules 206 A-N encodes the pictures in the encoding format based on the best of the identified combinations of encoding parameters for each CTB in each of the pictures to create a media bit stream at the respective output spatial resolution.
  • an intelligent encoding module 206 (e.g., intelligent encoding modules 206 A-N) is coupled to a set of normative encoding modules 212 A-N as shown in FIG. 2A . Further, the normative encoding modules 212 A-N, one for each of the other desired output bitrates at a respective output spatial resolution, each receive the sequence of pictures at the respective output spatial resolution, their determined picture types, and the identified best combinations of encoding parameters for each CTB in each of the pictures from the intelligent encoding module 206 .
  • each of the normative encoding modules 212 A-N encodes the pictures in the encoding format based on one combination of encoding parameters derived using the identified best combinations of encoding parameters for each CTB in each of the pictures to create a media bit stream at the respective desired output bitrate at the respective output spatial resolution.
  • FIG. 3A is a block diagram 300 A illustrating an intelligent encoding module 302 (e.g., one of the intelligent encoding module 102 and the intelligent encoding modules 206 A-N shown in FIGS. 1A-1D, 2 and 2A , respectively).
  • the intelligent encoding module 302 includes a rate control module 304 , an intra-prediction mode selection or refinement module 306 , a motion estimation or refinement module 308 , and a mode decision module 310 .
  • the intelligent encoding module 302 at a given spatial resolution includes one or more of the rate control module 304 , the intra-prediction mode selection or refinement module 306 , and the mode decision module 310 .
  • the intelligent encoding module 302 is capable of taking in information from one or more mapping modules, such as a coarse-to-fine mapping module (e.g., one of the coarse-to-fine mapping module 106 of FIG. 1B and the coarse-to-fine mapping modules 204 A-N of FIG. 2 ), a fine-to-coarse mapping module (e.g., the fine-to-coarse mapping module 208 of FIG. 2 ) that receives combinations of encoding parameters for each CTB in each of the pictures at a higher spatial resolution and maps the received combinations of encoding parameters to corresponding blocks at the given spatial resolution, an estimation module (e.g., the estimation module 108 ), and a format-to-format mapping module (e.g., the format-to-format mapping module 112 of FIG. 1C ) that produces outputs in a form suitable for consumption at the given spatial resolution. The intelligent encoding module 302 also receives the determined picture type for each picture and a set of reference frames for each inter-coded picture, along with an associated optional weight and offset for each reference frame for weighted prediction, from a look-ahead processing module (e.g., the look-ahead processing module 210 of FIG. 2 ).
  • the intelligent encoding module 302 produces an optimal encoded bit stream in the encoding format at a bitrate selected amongst the output bitrates at that spatial resolution and provides information input to the coarse-to-fine mapping module at a higher resolution and optionally to other normative encoding modules (e.g., normative encoding modules 104 A-N of FIGS. 1A-1D or normative encoding modules 212 A-N of FIG. 2A ) at the given spatial resolution.
  • the rate control module 304 computes frame level and coding-tree level quantization parameters for the determined picture type of each picture and corresponding to the operating bitrate selected for the intelligent encoding module 302 out of the output bitrates.
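How an operating bitrate can translate into a frame-level quantization parameter is sketched below under a very simplified bits-per-pixel model. The constants and the per-picture-type offsets are assumptions for illustration; a real rate control module would also use buffer models and feedback from actual coded sizes:

```python
import math

def frame_qp(target_bitrate, frame_rate, width, height, picture_type):
    """Estimate a frame-level quantization parameter (QP) from the
    operating bitrate using a bits-per-pixel heuristic (illustrative)."""
    bpp = target_bitrate / (frame_rate * width * height)
    # Lower bits-per-pixel -> coarser quantization.  Anchor: 0.1 bpp ~ QP 30.
    qp = 30.0 - 6.0 * math.log2(bpp / 0.1)
    # I pictures are typically quantized more finely than P/B pictures.
    offset = {"I": -3, "P": 0, "B": 2}[picture_type]
    return max(0, min(51, round(qp + offset)))
```

For example, 1 Mbps at 25 fps and 640x360 gives roughly 0.17 bpp, i.e. a finer QP for the I picture than for B pictures at the same operating point.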
  • the intra-prediction mode selection or refinement module 306 takes in the combinations of encoding parameters for each CTB in each picture from one or more of the mapping modules, determines a refinement set of combinations of encoding parameters at the given spatial resolution based on the outputs of the mapping modules, evaluates a cost function for each combination of encoding parameters that depends on the computed quantization parameter, and identifies the top few combinations of encoding parameters that have the lowest evaluated cost across the set of combinations of encoding parameters.
  • the encoding parameters relevant for the intra-prediction mode selection or refinement module 306 within the combination of encoding parameters include a coding tree structure, a partition type of each leaf block of the coding tree, an intra prediction mode of each leaf block partition, and a transform tree for each leaf block or leaf block partition.
  • the intra-prediction mode selection or refinement module 306 performs the evaluation of the cost function by using intra prediction performed using neighboring pixel values in the input sequence of pictures, or by using intra prediction performed using neighboring pixel values that are reconstructed pixel values, obtained by quantizing the transformed or non-transformed (intra or inter) prediction residuals of the causal or non-causal neighboring blocks, de-quantizing them, inverse transforming them (if transformed), and adding the (intra or inter) predicted values for those neighboring pixels.
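Intra prediction from neighboring pixel values, as referred to above, can be sketched with its simplest mode. DC prediction fills the block with the mean of the causal neighbors, whether those come from the input pictures or from reconstructed samples (the 128 fallback for a missing neighborhood is an assumption for illustration):

```python
def dc_intra_predict(top_neighbors, left_neighbors, block_size):
    """DC intra prediction: fill a block_size x block_size block with the
    rounded mean of the causal neighboring pixel values (source or
    reconstructed).  Falls back to mid-gray when no neighbors exist."""
    neighbors = list(top_neighbors) + list(left_neighbors)
    dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors) if neighbors else 128
    return [[dc] * block_size for _ in range(block_size)]
```

The cost function would then compare this prediction against the original block, e.g. via a sum of absolute residuals.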
  • the motion estimation or refinement module 308 takes in the combinations of encoding parameters for each CTB in each inter-picture from one or more of the mapping modules, determines a refinement set of combinations of encoding parameters at the given spatial resolution based on the outputs of the mapping modules, evaluates a cost function for each combination of encoding parameters that depends on the computed quantization parameter, and identifies the top few combinations of encoding parameters that have the lowest evaluated cost across the set of combinations.
  • the encoding parameters relevant for this module within the combination of encoding parameters include coding tree structures, a partition type of each leaf block of each coding tree, a prediction mode of each leaf block partition, such as unidirectional or bi-directional prediction, sub-pixel or full-pixel accurate motion vectors with reference to a set of reference frames for each leaf block partition, and a transform tree for each leaf block or leaf block partition.
  • the mode decision module 310 maintains the strict neighbor dependencies required by the encoding format and receives the identified best combinations of encoding parameters from the intra prediction mode selection or refinement module 306 and the motion estimation or refinement module 308 for each CTB in each picture.
  • the mode decision module 310 evaluates a cost function (that depends on the computed quantization parameter) across the received combinations of encoding parameters as well as additional modes for each leaf block of the coding tree in each combination for which the decoder-side can infer the mode, motion vectors, and/or reference frames based on a signaled neighbor index, where the cost function evaluation can require intra prediction and motion compensation as well as (a) evaluation of exact encoding distortion incurred due to quantization (after transforming or in the spatial domain) and the bits required for encoding the syntax elements for the encoding format or (b) model based estimation of the encoding distortion and the bits required, for each combination of encoding parameters for each CTB.
  • the mode decision module 310 identifies the set of combinations of encoding parameters that result in the lowest cost across all the evaluated combinations for each CTB in each picture, performs the normative steps for obtaining the quantized residuals for the best combination of encoding parameters for a CTB, reconstructs the pixels by inverse quantizing the quantized residuals and inverse transforming (if residuals were forward transformed) and adding the prediction values, and performs in-loop operations, such as in-loop deblocking, sample adaptive offset, adaptive loop filtering that are selected for the encoder and supported by the encoding format.
  • the mode decision module 310 encodes the set of combinations of encoding parameters according to the syntax and semantics of the encoding format along with the encoding of the quantized residuals according to the syntax and semantics of the encoding format to produce a media bit stream at the selected output bitrate.
  • FIG. 3B is a block diagram 300 B illustrating a normative encoding module 312 (e.g., one of the normative encoding modules 104 A-N and normative encoding modules 212 A-N shown in FIGS. 1A-1D and 2A , respectively).
  • the normative encoding module 312 includes a rate control module 314 , a mode selection module 316 , an intra prediction module 318 , a motion compensation module 320 , an encoding loop module 322 , an in-loop post-processing module 324 , and an entropy coding module 326 .
  • the rate control module 314 computes frame level and coding-tree level quantization parameters for the picture type of each picture and corresponding to the operating bitrate selected for the normative encoding module out of the output bitrates. Further, the mode selection module 316 evaluates a cost function for a set of combinations of encoding parameters derived using best combinations of encoding parameters provided by the intelligent encoding module 302 operating at the same spatial resolution to determine the best combination of encoding parameters.
  • the mode selection module 316 performs the evaluation of the cost function, which can be based on exact encoding distortion and the exact bits required to encode, or based on a model based estimation of the distortion and bits required to encode, for each of the set of combinations of encoding parameters derived using the best combinations of encoding parameters for each CTB in each picture.
  • the intra prediction module 318 predicts sample values of a leaf block of the CTB based on already reconstructed neighbor samples and the selected prediction mode for the leaf block.
  • the motion compensation module 320 computes sub-pixel accurate prediction based on attributes of a leaf block partition of a CTB, such as a sub-pixel accurate motion vector, a reference frame index, weight and offset for that reference frame index, and whether the leaf block partition is predicted from a single reference or from two references.
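The weighted bi-directional prediction computed by the motion compensation module can be sketched as follows. Sub-pixel interpolation is omitted for brevity, and the default weights and offset are illustrative assumptions:

```python
def weighted_biprediction(ref0_block, ref1_block, w0=0.5, w1=0.5, offset=0):
    """Combine two motion-compensated reference blocks into one
    bi-directional prediction with per-reference weights and an offset,
    clipping to the 8-bit sample range.  Sub-pixel interpolation of the
    reference blocks is assumed to have already happened."""
    return [
        [min(255, max(0, round(w0 * a + w1 * b + offset)))
         for a, b in zip(row0, row1)]
        for row0, row1 in zip(ref0_block, ref1_block)
    ]
```

A uni-directional, weighted prediction is the degenerate case with one reference block and w1 = 0.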
  • the encoding loop module 322 performs residual computation as a difference between the original samples and the intra-predicted or motion compensated samples, optional forward transforming of the residuals, quantization based on the selected quantization parameter, inverse quantization, inverse transform (if forward transforming of residuals was done), and addition of intra predicted or motion compensated sample values to obtain the reconstructed sample values.
  • the in-loop post-processing module 324 performs additional optional in-loop processing steps supported by the encoding format, such as in-loop deblocking, sample adaptive offset, adaptive loop filtering, and so on.
  • the entropy coding module 326 encodes the best combination of encoding parameters for each CTB according to syntax and semantics of the encoding format along with the encoding of the quantized and suitably scanned residual coefficients to create a media bit stream at the required output bitrate.
  • FIG. 5A is a flow diagram 500 A of a method for efficiently encoding a sequence of pictures at a given spatial resolution to an encoding format at a plurality of output bitrates, according to one embodiment.
  • the pictures and a picture type for each of the pictures are received.
  • best combinations of encoding parameters from a determined set of combinations of the encoding parameters supported by the encoding format for each CTB in each of the pictures are identified based on the respective picture types.
  • a combination of encoding parameters for each CTB in each of the pictures is composed of one or more of, coding tree structure options, a prediction type of each leaf block of each coding tree option, a partition type of each leaf block of each coding tree option, a prediction mode within the prediction type for each partition of each leaf block partition of each coding tree option, motion vectors with respect to a set of reference pictures for each leaf block of each coding tree option, and a transform tree for each leaf block or leaf block partition of each coding tree option.
  • the sequence of pictures are encoded using the best of the identified combinations of encoding parameters for each CTB of each of the pictures to create media bit streams at the plurality of output bitrates. This is explained in more detail with reference to FIGS. 1A-1D .
  • FIG. 5B is a flow diagram 500 B of a method for efficiently encoding a sequence of pictures at a given spatial resolution to an encoding format at a plurality of output spatial resolutions, according to one embodiment.
  • the sequence of pictures at the given spatial resolution is transformed to the output spatial resolutions and/or an additional spatial resolution that is lower than the plurality of output spatial resolutions.
  • the sequence of pictures at respective output spatial resolutions and a picture type for each of the pictures are received.
  • one or more of estimating a set of combinations of encoding parameters for each CTB in each of the pictures at a lowest output spatial resolution of the output spatial resolutions based on the additional spatial resolution, and mapping combinations of encoding parameters for each CTB in each of the pictures at an immediate coarser spatial resolution to corresponding blocks at the respective output spatial resolution are performed.
  • a set of combinations of encoding parameters supported by the encoding format for each CTB in each of the pictures is determined based on the estimated set of combinations of encoding parameters or the mapped outputs.
  • best combinations of encoding parameters from the determined set of combinations of encoding parameters are identified based on the respective picture types.
  • the pictures in the encoding format are encoded based on the identified best combinations of encoding parameters for each CTB in each of the pictures to create a media bit stream at the respective output spatial resolution.
  • the steps of receiving, performing, determining, identifying, and encoding are repeated at each of the remaining output spatial resolutions to create a media bit stream at each of the remaining output spatial resolutions. This is explained in more detail with reference to FIG. 2 .
  • FIG. 6A illustrates a system 600 A for performing efficient media encoding, according to one embodiment.
  • the system 600 A includes a plurality of computing devices 602 A-N that are communicatively coupled to each other. Further, the computing devices 602 A-N include one or more processors 606 A-N and memories 608 A-N, respectively.
  • the efficient media encoding module 612 (e.g., 100 A, 100 B, 100 C, 100 D, 200 or 200 including 200 A of FIGS. 1A-1D, 2 and 2A , respectively) is a program that resides in the memory 610 and is executed on one or more of the processors 606 A-N.
  • the memory 610 or the memories 608 A-N can be a shared memory that is accessible by the computing devices 602 A-N.
  • the shared memory is accomplished through an integrated memory map visible to each of the computing devices 602 A-N that are interconnected physically through either serial or parallel inter-processor inter-connects.
  • the shared memory can be a common random access memory (RAM) (internal or external) accessible over a shared bus, or it can be a memory mapped area that is physically accessed over a processor interconnect (e.g., Ethernet, Peripheral Component Interconnect Express (PCIe), serial rapid IO, Intel QuickPath Interconnect™, Texas Instruments' Hyperlink™, and so on).
  • the efficient media encoding module 612 can run sequentially on a single processor or can be scheduled to run concurrently on a set of threads, processors, or computing devices with inter-device interconnects to access RAM associated with a remote computing device and a memory mapping mechanism that provides a unified memory map for each computing device.
  • the efficient media encoding module 612 includes multiple modules to perform efficient media encoding and each module itself may run on a single core or multi-cores on a single computing device or across multiple computing devices.
  • the different modules can also run on distributed computing devices made available as part of an infrastructure-as-a-service (IaaS) offering, a private cloud infrastructure, or a platform-as-a-service (PaaS) offering.
  • the system 600 B includes logic that comprises combinational circuits or sequential circuits 614 A-N that are communicatively coupled to each other.
  • the combinational circuits or sequential circuits 614 A-N include memories 620 A-N.
  • one or more of the combinational circuits or sequential circuits 614 A-N implement logical circuits 622 (e.g., 100 A, 100 B, 100 C, 100 D, 200 or 200 including 200 A of FIGS. 1A-1D, 2 and 2A ), with their associated memories 620 A-N, for performing efficient media encoding.
  • the systems and methods described in FIGS. 1 through 6 propose a technique for efficient multi-bitrate and multi-spatial resolution media encoding that achieves a significant reduction in computational complexity over existing independent encoding of the sequence of pictures.
  • the proposed technique also works with both simulcast (coding of multiple resolutions and multiple bit-rates at each resolution as independently decodable streams) and scalable representations (coding of multiple resolutions and multiple bit-rates at each resolution as a set of dependent streams).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A system and method for multi-bitrate and multi-spatial resolution media encoding are disclosed. In an embodiment for encoding a sequence of pictures at a given spatial resolution to an encoding format at a plurality of output bitrates, the sequence of pictures and a picture type for each of the pictures are received. Further, best combinations of encoding parameters for each coding tree block (CTB) in each of the pictures are identified from a determined set of combinations of encoding parameters supported by the encoding format for the associated picture type. Furthermore, the sequence of pictures are encoded using the best of the identified combinations of encoding parameters for each CTB in each of the pictures to create media bit streams at the plurality of output bitrates.

Description

Benefit is claimed under 35 U.S.C. 119(a) to Indian Provisional Patent Application Ser. No. 4480/CHE/2012 entitled “SYSTEM AND ARCHITECTURE FOR EFFICIENT MULTI-RESOLUTION MULTI-BITRATE MEDIA TRANSCODING” by Ittiam Systems (P) Ltd. filed on Oct. 26, 2012.
FIELD OF TECHNOLOGY
Embodiments of the present invention relate to media encoding. More particularly, embodiments of the present invention relate to efficient multi-bitrate and multi-spatial resolution media encoding.
BACKGROUND
Generally, media streams are encoded or compressed in a certain encoding format prior to transmission of the media streams over a network. Typical encoding for hypertext transfer protocol (HTTP) based adaptive streaming technology requires producing multiple bitrate streams from a single input media to allow adaptation to changing bandwidth during content delivery (e.g., dynamic adaptive streaming over HTTP or DASH). Further, for end devices having varying screen resolutions, the encoding service produces media streams of multiple spatial resolutions, such as a quarter video graphics array (QVGA), a half-size video graphics array (HVGA), a video graphics array (VGA), a wide video graphics array (WVGA), 720p, 1080p/i, and so on. The process of encoding can be performed on a public cloud infrastructure or a private cloud infrastructure using software-based or custom hardware appliances.
Given the slew of output media streams to be produced from a single input media, it is important to consider an efficient means of encoding. Newer video coding standards, such as high efficiency video coding (HEVC), require a much higher computational complexity compared to older video coding standards, and, therefore, efficient encoding becomes even more critical to help keep encoding costs lower. There are existing methods that leverage the information in the input encoded media bit-stream to influence the encoding. However, as the input media stream is at a high resolution, the information contained in it is not that useful for coding the lower resolution streams. Also, given the coding tool differences between input formats and the output formats, optimal rate-distortion performance with the output formats may not be possible with re-use. For example, coding tools in standards such as H.264 or HEVC (High Efficiency Video Coding) have features such as, weighted prediction, hierarchical group of pictures (GOP) structure, multiple reference motion estimation (ME), and the like, whereas older standards such as, MPEG-2 (Moving Pictures Expert Group-2) do not have the above mentioned features. While some encoders (such as the open-source x264 encoder) leverage first pass information in terms of look-ahead processing and GOP structure when encoding multiple bitrate streams at a given spatial resolution, they still treat each encoding instance independently and repeat much of the encoding parameter estimation process for each instance, which results in nearly a linear increase in computational complexity with the number of output bitrates.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the present invention are illustrated by way of an example and not limited to the figures in the accompanying drawings, in which like references indicate similar elements and in which:
FIG. 1A illustrates an architecture for efficiently encoding a sequence of pictures at a given spatial resolution to an encoding format at a plurality of output bitrates, according to one embodiment;
FIGS. 1B-1D illustrate architectures for efficiently encoding the sequence of pictures at the given spatial resolution to the encoding format at the plurality of output bitrates, according to three different embodiments, respectively;
FIG. 2 illustrates an architecture for encoding a sequence of pictures at a given spatial resolution to an encoding format at a plurality of output spatial resolutions, according to one embodiment;
FIG. 2A is a block diagram illustrating an intelligent encoding module, such as one shown in FIG. 2, coupled to a plurality of normative encoding modules for encoding the sequence of pictures to the encoding format at a plurality of output bit rates, according to one embodiment;
FIG. 3A is a block diagram illustrating components of an intelligent encoding module shown in FIGS. 1A-1D, 2 and 2A, according to one embodiment;
FIG. 3B is a block diagram illustrating components of a normative encoding module, shown in FIGS. 1A-1D and 2A, according to one embodiment;
FIGS. 4A-4B are block diagrams illustrating a coding tree and a coding tree block (CTB), respectively, in the context of the present invention;
FIGS. 5A-5B are flow diagrams of methods for efficiently encoding a sequence of pictures at a given spatial resolution to an encoding format at a plurality of output bitrates and a plurality of output spatial resolutions respectively, according to two different embodiments that can also be combined; and
FIGS. 6A-6B are systems for performing efficient media encoding, according to two different embodiments.
Other features of the present embodiments will be apparent from the accompanying drawings and from the detailed description that follows.
DETAILED DESCRIPTION
A system and method for efficient multi-bitrate and multi-spatial resolution media encoding are disclosed. In the following detailed description of the embodiments of the invention, reference is made to the accompanying drawings that form a part hereof, and in which are shown, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
FIG. 1A illustrates an architecture 100A for efficiently encoding a sequence of pictures at a given spatial resolution to an encoding format at a plurality of output bitrates, according to one embodiment. As shown in FIG. 1A, the architecture 100A includes an intelligent encoding module 102 and a set of normative encoding modules 104A-N. Further, the intelligent encoding module 102 is coupled to the normative encoding modules 104A-N.
In operation, the intelligent encoding module 102 receives the sequence of pictures. Further, the intelligent encoding module 102 can optionally determine a picture type for each of the pictures or receive the picture type for each of the pictures. The intelligent encoding module 102 then identifies best combinations of encoding parameters from a determined set of combinations of encoding parameters for each coding tree block (CTB) in each of the pictures supported by the encoding format for the associated picture type. For example, a combination of encoding parameters for each CTB in each of the pictures is composed of one or more of, coding tree structure options, a prediction type of each leaf block of each coding tree option, a partition type of each leaf block of each coding tree option, a prediction mode within the prediction type for each partition of each leaf block partition of each coding tree option, motion vectors with respect to a set of reference pictures for each leaf block of each coding tree option, and a transform tree for each leaf block or leaf block partition of each coding tree option. An example coding tree 400A is shown in FIG. 4A, the corresponding coding tree block 402, an example leaf block 404, and an example leaf block partition 406 are shown in FIG. 4B.
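The combination of encoding parameters enumerated above can be pictured as a per-CTB record. The following sketch is illustrative only; the field names and types are assumptions, not taken from the patent or any standard's syntax:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class LeafBlockParams:
    """Parameters for one leaf block of a coding tree option
    (field names are illustrative)."""
    prediction_type: str                 # "intra" or "inter"
    partition_type: str                  # e.g. "2Nx2N", "NxN"
    prediction_mode: int                 # mode index within the prediction type
    motion_vectors: List[Tuple[int, int]] = field(default_factory=list)
    reference_indices: List[int] = field(default_factory=list)
    transform_tree_depth: int = 0

@dataclass
class CtbParameterCombination:
    """One candidate combination of encoding parameters for a CTB."""
    coding_tree: List[int]               # e.g. split depths of the tree leaves
    leaf_blocks: List[LeafBlockParams] = field(default_factory=list)
```

A determined set of combinations for a CTB is then simply a list of such records, over which the cost function is evaluated.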
In one embodiment, the intelligent encoding module 102 evaluates a cost function that depends on one of the output bitrates for each combination of encoding parameters in the determined set of combinations of encoding parameters for each CTB in each of the pictures supported by the encoding format for the associated picture types. For example, the cost function can be evaluated in terms of one or more of, input distortion to encoder (such as sum of absolute of the residuals being encoded in either the spatial domain or a transform domain), output distortion of encoder (such as sum of squared error between the original samples and the reconstructed samples), partial estimated or actual rate (such as number of bits to encode the non-residual elements of the bit stream), and estimated or actual rate (i.e., total number of bits required to encode including residuals). In this embodiment, the intelligent encoding module 102 then identifies the best combinations of encoding parameters for each CTB in each of the pictures from the determined set of combinations based on the evaluated cost function.
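A cost evaluation of this kind is commonly a Lagrangian rate-distortion cost J = D + λ·R, with D one of the distortion measures listed above and R the (estimated or actual) rate. The sketch below is a minimal illustration under that assumption; the dictionary keys and the value of λ are illustrative:

```python
def select_best_combinations(candidates, lam, keep=3):
    """Rank candidate parameter combinations for one CTB by a Lagrangian
    cost J = D + lambda * R and return the `keep` lowest-cost candidates,
    best first.  Each candidate carries precomputed "distortion" and
    "rate" values (illustrative representation)."""
    scored = sorted(candidates, key=lambda c: c["distortion"] + lam * c["rate"])
    return scored[:keep]

def sse(original, reconstructed):
    """Sum of squared errors between two equally sized sample blocks,
    one possible output-distortion measure."""
    return sum((o - r) ** 2
               for orow, rrow in zip(original, reconstructed)
               for o, r in zip(orow, rrow))
```

Keeping the top few candidates, rather than only the single best, is what later lets the normative encoding modules re-evaluate the cost at a different operating bitrate.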
Moreover in operation, the intelligent encoding module 102 encodes the pictures in the encoding format based on the best of the identified combinations of encoding parameters for each CTB in each of the pictures to create a media bit stream at the one of the output bitrates. For example, components residing in the intelligent encoding module 102 are explained in more detail with reference to FIG. 3A.
In addition, the normative encoding modules 104A-N, one for each of the remaining output bitrates, each receive the sequence of pictures, their determined picture types, and the identified best combinations of encoding parameters for each CTB in each of the pictures from the intelligent encoding module 102. Each of the normative encoding modules 104A-N then encodes the sequence of pictures using one combination of encoding parameters derived using the identified best combinations of encoding parameters for each CTB in each of the pictures to create a media bit stream at the respective one of the remaining output bitrates. In one embodiment, each of the normative encoding modules 104A-N re-evaluates the cost function that is dependent on the respective one of the remaining output bitrates for a set of combinations of encoding parameters derived using the identified best combinations of encoding parameters for each CTB in each of the pictures to determine the one combination of encoding parameters used for encoding each CTB in each of the pictures at the respective one of the remaining output bitrates. Each of the normative encoding modules 104A-N then encodes the sequence of pictures based on the one of the best combinations of encoding parameters for each CTB in each of the pictures to create the media bit stream at the respective one of the remaining output bitrates.
In one example implementation, each of the normative encoding modules 104A-N performs motion compensation or intra prediction and implements the entire encoding loop (residual computation, forward transform, quantization (simple or RD-optimized), inverse quantization, and/or inverse transform, reconstruction). Each of the normative encoding modules 104A-N also performs entropy coding. Without loss of generality, each of the normative encoding modules 104A-N also includes in-loop filtering functionality, such as deblocking, sample adaptive offset, adaptive loop filtering, and so on. This is explained in more detail with reference to FIG. 3B.
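The encoding loop named in the preceding paragraph can be sketched for one block. For brevity this sketch stays in the spatial domain (no forward transform) and uses a uniform scalar quantizer; a real codec would use integer DCT-like transforms and standard-conformant quantization:

```python
def encode_loop_spatial(original, predicted, qstep):
    """Minimal encoding loop for one block in the spatial domain:
    residual -> quantize -> dequantize -> add prediction -> reconstruct.
    Returns (quantized_levels, reconstructed_samples)."""
    levels, recon = [], []
    for orow, prow in zip(original, predicted):
        lrow, rrow = [], []
        for o, p in zip(orow, prow):
            residual = o - p
            level = int(round(residual / qstep))          # quantization
            rec = min(255, max(0, p + level * qstep))     # dequantize + predict
            lrow.append(level)
            rrow.append(rec)
        levels.append(lrow)
        recon.append(rrow)
    return levels, recon
```

The reconstructed samples, not the originals, are what subsequent intra prediction and in-loop filtering operate on, which is why the inverse steps sit inside the loop.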
FIG. 1B illustrates an architecture 100B for efficiently encoding the sequence of pictures at the given spatial resolution to the encoding format at a plurality of output bitrates, according to one embodiment. As shown in FIG. 1B, the architecture 100B includes the intelligent encoding module 102 and the normative encoding modules 104A-N and a coarse-to-fine mapping module 106. Further, the coarse-to-fine mapping module 106 is coupled to the intelligent encoding module 102. Furthermore, the intelligent encoding module 102 is coupled to the normative encoding modules 104A-N.
In operation, the coarse-to-fine mapping module 106 receives combinations of encoding parameters for each CTB in each of the pictures at a reduced spatial resolution version derived from the sequence of pictures at a given spatial resolution and maps them to corresponding blocks at the given spatial resolution, which is higher than this reduced spatial resolution.
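By way of illustration, a coarse-to-fine mapping of a motion vector can be sketched as a scaling by the resolution ratio; the function name and the rounding choice are assumptions for the example, and the scaled vector is only a seed that the intelligent encoding module would refine at the fine resolution.

```python
# Illustrative sketch (not the normative procedure of this description) of
# mapping a motion vector found at a reduced spatial resolution to the
# corresponding CTB at the full (given) spatial resolution.

def map_mv_coarse_to_fine(mv, coarse_res, fine_res):
    """Scale a (mv_x, mv_y) motion vector by the (width, height) ratio.

    The scaled vector is a search seed: a refinement module is expected to
    search a small window around it at the fine resolution.
    """
    sx = fine_res[0] / coarse_res[0]
    sy = fine_res[1] / coarse_res[1]
    return (round(mv[0] * sx), round(mv[1] * sy))

# A (-4, 6) vector found at 960x540 maps to (-8, 12) at 1920x1080.
assert map_mv_coarse_to_fine((-4, 6), (960, 540), (1920, 1080)) == (-8, 12)
```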
Further, the intelligent encoding module 102 receives the sequence of pictures and the mapped outputs from the coarse-to-fine mapping module 106 and determines a set of combinations of encoding parameters supported by the encoding format for each CTB in each of the pictures by using the mapped outputs of the coarse-to-fine mapping module 106. Furthermore, the intelligent encoding module 102 can optionally determine a picture type for each of the pictures or receive the picture type for each of the pictures. The intelligent encoding module 102 then identifies best combinations of encoding parameters for each CTB in each of the pictures from the determined set of combinations of encoding parameters supported by the encoding format for the associated picture type.
Also, the intelligent encoding module 102 encodes the pictures based on the best of the identified combinations of encoding parameters for each CTB in each of the pictures to create the media bit stream at one of the output bitrates. Further, each of the normative encoding modules 104A-N, one for each of the remaining output bitrates, receives the pictures, their determined picture types, and the identified best combinations of encoding parameters for each CTB in each of the pictures from the intelligent encoding module 102. Each of the normative encoding modules 104A-N then encodes the pictures using one combination of encoding parameters derived using the identified best combinations of encoding parameters for each CTB in each of the pictures to create the media bit stream at the respective one of the remaining output bitrates. This is explained in more detail with reference to FIG. 1A.
Referring now to FIG. 1C, which is another architecture 100C for efficiently encoding the sequence of pictures at the given spatial resolution to the encoding format at the output bitrates, according to one embodiment. As shown in FIG. 1C, the architecture 100C includes an estimation module 108, the intelligent encoding module 102, and the normative encoding modules 104A-N. Further, the estimation module 108 is coupled to the intelligent encoding module 102. Furthermore, the intelligent encoding module 102 is coupled to the normative encoding modules 104A-N.
In operation, the estimation module 108 receives the sequence of pictures at a reduced spatial resolution and estimates the set of combinations of encoding parameters for the CTBs in each of the pictures at the given spatial resolution based on the reduced spatial resolution version of the sequence of pictures relative to the given spatial resolution. Further, the intelligent encoding module 102 receives the pictures and the estimated set of combinations of encoding parameters for the CTBs in each of the pictures at the given spatial resolution from the estimation module 108. Furthermore, the intelligent encoding module 102 determines the set of combinations of encoding parameters supported by the encoding format for each CTB in each of the pictures using the estimated set of combinations of encoding parameters for the CTBs in each of the pictures at the given spatial resolution received from the estimation module 108. In addition, the intelligent encoding module 102 can optionally determine the picture type for each of the pictures or receive the picture type for each of the pictures. The intelligent encoding module 102 then identifies the best combinations of encoding parameters from the determined set of combinations of encoding parameters for each CTB in each of the pictures supported by the encoding format for the associated picture type.
Also, the intelligent encoding module 102 encodes the pictures based on the best of the identified combinations of encoding parameters for each CTB in each of the pictures to create the media bit stream at one of the output bitrates. Further, each of the normative encoding modules 104A-N, one for each of the remaining output bitrates, receives the pictures, their determined picture types, and the identified best combinations of encoding parameters for each CTB in each of the pictures from the intelligent encoding module 102. Each of the normative encoding modules 104A-N then encodes the pictures using one combination of encoding parameters derived using the identified best combinations of encoding parameters for each CTB in each of the pictures to create the media bit stream at the respective one of the remaining output bitrates. This is explained in more detail with reference to FIG. 1A.
Referring now to FIG. 1D, which is yet another architecture 100D for efficiently encoding a media bit stream at a given spatial resolution to an encoding format at a plurality of output bitrates, according to one embodiment. As shown in FIG. 1D, the architecture 100D includes the intelligent encoding module 102, the normative encoding modules 104A-N, a media decoding module (MDM) 110, a format-to-format mapping module 112. Further, the MDM 110 is coupled to the format-to-format mapping module 112 and the intelligent encoding module 102. Furthermore, the format-to-format mapping module 112 is coupled to the intelligent encoding module 102. In addition, the intelligent encoding module 102 is coupled to the normative encoding modules 104A-N. The architecture 100D also optionally includes a mid-processing module 114 coupled between the MDM 110 and intelligent encoding module 102.
In operation, the MDM 110 receives and decodes the media bit stream in a certain input encoding format to produce a decompressed sequence of pictures. The MDM 110 also produces a set of coding parameters in the input encoding format that are obtained while decoding the media stream. For example, the coding parameters comprise one or more of the encoding parameters described earlier, and additionally, parameters such as a quantization parameter for each leaf block of each coding tree option, the number of bits for coding the non-residual syntax elements in the input encoding format, and the number of bits for coding the residual syntax elements in the input encoding format. Further, the format-to-format mapping module 112 receives the coding parameters from the MDM 110 and maps the coding parameters available for each CTB in the sequence of pictures in the certain input encoding format to a set of encoding parameters supported by the encoding format. Further, the intelligent encoding module 102 receives the decompressed sequence of pictures from the MDM 110 and the mapped outputs of the format-to-format mapping module 112. In one example embodiment, the mid-processing module 114 performs one or more of removing coding artifacts in the decompressed sequence of pictures, de-interlacing the sequence of pictures, and enhancing the decompressed sequence of pictures. In that case, the intelligent encoding module 102 receives the sequence of pictures from the mid-processing module 114.
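By way of illustration only, a format-to-format mapping can be sketched as widening each decoded block decision of the input format into a candidate set for the target format. The H.264-style 16×16 macroblock input, the candidate block sizes, and the function name below are assumptions for the example and are not prescribed by this description.

```python
# Minimal, hypothetical sketch of format-to-format mapping: a decoded
# macroblock decision from the input format seeds a candidate set of
# (block_size, prediction_type, mv_seed) tuples for the target format.

def map_mb_to_ctb_candidates(mb_type, mb_mv=None):
    """Return candidate (block_size, prediction_type, mv_seed) tuples."""
    if mb_type == "intra":
        # An intra macroblock suggests trying intra at nearby block sizes.
        return [(32, "intra", None), (16, "intra", None)]
    # An inter macroblock suggests inter candidates seeded with its decoded
    # motion vector, including a larger block in case neighbors share motion.
    return [(32, "inter", mb_mv), (16, "inter", mb_mv)]

assert map_mb_to_ctb_candidates("intra")[0] == (32, "intra", None)
assert map_mb_to_ctb_candidates("inter", (3, -1))[1] == (16, "inter", (3, -1))
```

The intelligent encoding module would then evaluate its cost function only over such a derived candidate set rather than over the full space of combinations supported by the encoding format.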
Furthermore, the intelligent encoding module 102 determines the set of combinations of encoding parameters supported by the encoding format for each CTB in each of the pictures using the mapped outputs received from the format-to-format mapping module 112. In addition, the intelligent encoding module 102 can optionally determine the picture type for each of the pictures or receive the picture type for each of the pictures as an input. The intelligent encoding module 102 then identifies the best combinations of encoding parameters for each CTB in each of the pictures from the determined set of combinations of encoding parameters supported by the encoding format for the associated picture type.
Also, the intelligent encoding module 102 encodes the pictures based on the best of the identified combinations of encoding parameters for each CTB in each of the pictures to create a media bit stream at one of the output bitrates. Further, each of the normative encoding modules 104A-N, one for each of the remaining output bitrates, receives the pictures, their determined picture types, and the identified best combinations of encoding parameters for each CTB in each of the pictures from the intelligent encoding module 102. Each of the normative encoding modules 104A-N then encodes the pictures using one combination of encoding parameters derived using the identified best combinations of encoding parameters for each CTB in each of the pictures to create a media bit stream at the respective one of the remaining output bitrates. This is explained in more detail with reference to FIG. 1A.
Referring now to FIG. 2, which is an architecture 200 for encoding a sequence of pictures at a given spatial resolution to an encoding format at a plurality of output spatial resolutions, according to one embodiment. As shown in FIG. 2, the architecture 200 includes a media transformation module (MTM) 202, the estimation module 108, a set of coarse-to-fine mapping modules 204A-N, one for each of the output spatial resolutions other than a lowest spatial resolution of the output spatial resolutions, and a set of intelligent encoding modules 206A-N, one for each of the output spatial resolutions.
Further, the MTM 202 is coupled to the estimation module 108 and the intelligent encoding modules 206A-N. Furthermore, each of the coarse-to-fine mapping modules 204A-N is coupled to respective one of the intelligent encoding modules 206A-N. The architecture 200 also optionally includes a fine-to-coarse mapping module 208 that is coupled to the estimation module 108. Further, the architecture 200 optionally includes a look-ahead processing module 210 that is coupled to the estimation module 108 and the intelligent encoding modules 206A-N.
In operation, the MTM 202 transforms the sequence of pictures at the given spatial resolution to the output spatial resolutions and/or an additional spatial resolution that is lower than the output spatial resolutions. In one example embodiment, the MTM 202 performs one or more of resizing to derive a sequence of pictures at a higher or lower resolution than the given resolution, de-interlacing to convert interlaced to progressive content, cadence generation, frame rate up/down conversion, and the like. In one example, the MTM 202 transforms decompressed pictures at the given spatial resolution to the output spatial resolutions and/or an additional spatial resolution.
Further, the estimation module 108 estimates a set of combinations of encoding parameters for each CTB in each of the pictures at a lowest output spatial resolution among the output spatial resolutions based on the additional spatial resolution. In one example embodiment, the fine-to-coarse mapping module 208 receives combinations of encoding parameters at a higher spatial resolution and provides a set of possible encoding parameters for a corresponding region at the additional spatial resolution to the estimation module 108, which uses them to estimate the set of combinations of encoding parameters. Furthermore, each of the coarse-to-fine mapping modules 204A-N maps combinations of encoding parameters for each CTB in each of the pictures at an immediate coarser spatial resolution to corresponding blocks at their respective output spatial resolution.
In addition, each of the intelligent encoding modules 206A-N, one for each of the plurality of output spatial resolutions, receives the sequence of pictures at the respective output spatial resolution and a picture type for each of the pictures. In one example implementation, the look-ahead processing module 210 receives the sequence of pictures at the given spatial resolution or the sequence of pictures at lower spatial resolutions from the MTM 202 and determines the group of pictures (GOP) structure, and hence the picture type for each of the pictures, across all the intelligent encoding modules 206A-N. Each of the intelligent encoding modules 206A-N determines a set of combinations of encoding parameters supported by the encoding format for each CTB in each of the pictures based on the outputs of one or more of the estimation module 108 and the respective one of the coarse-to-fine mapping modules 204A-N. In one example embodiment, when the respective output spatial resolution is lower than the given resolution, each of the intelligent encoding modules 206A-N receives the output of the fine-to-coarse mapping module 208. For example, a combination of encoding parameters for each CTB in each of the pictures is composed of one or more of: coding tree structure options, a prediction type of each leaf block of each coding tree option, a partition type of each leaf block of each coding tree option, a prediction mode within the prediction type for each leaf block partition of each coding tree option, motion vectors with respect to a set of reference pictures for each leaf block of each coding tree option, and a transform tree for each leaf block or leaf block partition of each coding tree option. An example coding tree 400A is shown in FIG. 4A; the corresponding coding tree block 402, an example leaf block 404, and a leaf block partition 406 are shown in FIG. 4B.
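By way of illustration, such a combination of encoding parameters can be represented as a simple data structure; the field names and values below are assumptions chosen for readability and do not come from this description.

```python
# Hypothetical sketch of one "combination of encoding parameters" for a CTB.
from dataclasses import dataclass, field

@dataclass
class LeafBlock:
    prediction_type: str          # "intra" or "inter"
    partition_type: str           # e.g. "2Nx2N", "NxN"
    prediction_mode: int          # intra direction or inter mode index
    motion_vectors: list = field(default_factory=list)  # per reference picture
    transform_tree: list = field(default_factory=list)  # transform split flags

@dataclass
class CtbCombination:
    coding_tree: list             # quadtree split flags for the CTB
    leaf_blocks: list             # one LeafBlock per leaf of the coding tree

combo = CtbCombination(
    coding_tree=[1, 0, 0, 0, 0],
    leaf_blocks=[LeafBlock("inter", "2Nx2N", 0, motion_vectors=[(2, -3)])],
)
assert combo.leaf_blocks[0].motion_vectors == [(2, -3)]
```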
Moreover, each of the intelligent encoding modules 206A-N identifies best combinations of encoding parameters from the determined set of combinations of encoding parameters for each CTB in each of the pictures supported by the encoding format for the associated picture type. In one embodiment, the intelligent encoding module 102 evaluates a cost function for each combination of encoding parameters in the set of combinations of encoding parameters for each CTB in each of the pictures supported by the encoding format for the associated picture type. In this embodiment, the intelligent encoding module 102 then identifies the best combinations of encoding parameters from the set of combinations of encoding parameters for each CTB in each of the pictures based on the evaluated cost function. Also in operation, each of the intelligent encoding modules 206A-N encodes the pictures in the encoding format based on the best of the identified combinations of encoding parameters for each CTB in each of the pictures to create a media bit stream at the respective output spatial resolution.
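The identification of the best combinations can, for illustration, be sketched as ranking the supported combinations by the common Lagrangian cost J = D + λ·R and keeping the top few; the function name, the tuple representation, and the toy numbers are assumptions for the example.

```python
# Hypothetical sketch: rank the candidate combinations for a CTB by
# Lagrangian cost and keep the k best. The single best drives this module's
# own bit stream; the others can seed normative modules at other bitrates.
import heapq

def best_combinations(combos, lam, k=3):
    """Return the k combinations with the lowest cost D + lam * R.

    `combos` is a list of (name, distortion, rate_bits) tuples.
    """
    return heapq.nsmallest(k, combos, key=lambda c: c[1] + lam * c[2])

combos = [("split", 50.0, 200), ("2Nx2N", 80.0, 90), ("skip", 120.0, 5)]
top = best_combinations(combos, lam=0.5, k=2)
assert top[0][0] == "skip"     # 120 + 0.5*5 = 122.5, the lowest cost
```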
In one example embodiment, an intelligent encoding module 206 (e.g., one of the intelligent encoding modules 206A-N) is coupled to a set of normative encoding modules 212A-N as shown in FIG. 2A. Further, each of the normative encoding modules 212A-N, one for each of the other desired output bitrates at a respective output spatial resolution, receives the sequence of pictures at the respective output spatial resolution, their determined picture types, and the identified best combinations of encoding parameters for each CTB in each of the pictures from the intelligent encoding module 206. Furthermore, each of the normative encoding modules 212A-N encodes the pictures in the encoding format based on one combination of encoding parameters derived using the identified best combinations of encoding parameters for each CTB in each of the pictures to create a media bit stream at the respective desired output bitrate at the respective output spatial resolution.
Referring now to FIG. 3A, which is a block diagram 300A illustrating an intelligent encoding module 302 (e.g., one of the intelligent encoding module 102 and the intelligent encoding modules 206A-N shown in FIGS. 1A-1D, 2 and 2A, respectively). As shown in FIG. 3A, the intelligent encoding module 302 includes a rate control module 304, an intra-prediction mode selection or refinement module 306, a motion estimation or refinement module 308, and a mode decision module 310. In one embodiment, the intelligent encoding module 302 at a given spatial resolution includes one or more of the rate control module 304, the intra-prediction mode selection or refinement module 306, and the mode decision module 310.
In one example embodiment, the intelligent encoding module 302 is capable of taking in information from one or more mapping modules, such as a coarse-to-fine mapping module (e.g., one of the coarse-to-fine mapping module 106 of FIG. 1B and the coarse-to-fine mapping modules 204A-N of FIG. 2), a fine-to-coarse mapping module (e.g., the fine-to-coarse mapping module 208 of FIG. 2) that receives combinations of encoding parameters for each CTB in each of the pictures at a higher spatial resolution and maps the received combinations of encoding parameters to corresponding blocks at the given spatial resolution, an estimation module (e.g., the estimation module 108 of FIGS. 1C and 2), or a format-to-format mapping module (e.g., the format-to-format mapping module 112 of FIG. 1D), each of which produces outputs in a form suitable for consumption at the given spatial resolution. The intelligent encoding module 302 can also take in the determined picture type for each picture and a set of reference frames for each inter-coded picture, along with an associated optional weight and offset for each reference frame for weighted prediction, from a look-ahead processing module (e.g., the look-ahead processing module 210 of FIG. 2). Furthermore, the intelligent encoding module 302 produces an optimal encoded bit stream in the encoding format at a bitrate selected amongst the output bitrates at that spatial resolution and provides information input to the coarse-to-fine mapping module at a higher resolution and optionally to other normative encoding modules (e.g., the normative encoding modules 104A-N of FIGS. 1A-1D or the normative encoding modules 212A-N of FIG. 2A) at the given spatial resolution.
In one example embodiment, the rate control module 304 computes frame-level and coding-tree-level quantization parameters for the determined picture type of each picture, corresponding to the operating bitrate selected for the intelligent encoding module 302 out of the output bitrates. Further, the intra-prediction mode selection or refinement module 306 takes in the combinations of encoding parameters for each CTB in each picture from one or more of the mapping modules, determines a refinement set of combinations of encoding parameters at the given spatial resolution based on the outputs of the mapping modules, evaluates a cost function for each combination of encoding parameters that depends on the computed quantization parameter, and identifies the top few combinations of encoding parameters that have the lowest evaluated cost across the set of combinations of encoding parameters. For example, the encoding parameters relevant for the intra-prediction mode selection or refinement module 306 within the combination of encoding parameters include a coding tree structure, a partition type of each leaf block of the coding tree, an intra prediction mode of each leaf block partition, and a transform tree for each leaf block or leaf block partition.
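By way of illustration, the frame-level quantization parameter computation of a rate control module such as 304 can be sketched with a simple assumed rate model, bits ≈ c / Qstep, together with the HEVC-style relation Qstep = 2^((QP − 4) / 6); both the model and the function name are assumptions for the example, not the module's actual algorithm.

```python
# Hypothetical rate-control sketch: derive a frame-level QP from a bit
# budget using the simple inverse model bits ~ c / qstep and the HEVC-style
# relation qstep = 2 ** ((QP - 4) / 6).
import math

def frame_qp(target_bits, complexity_c, qp_min=0, qp_max=51):
    """Pick the QP whose quantization step best meets the bit budget."""
    qstep = complexity_c / max(target_bits, 1)
    qp = round(4 + 6 * math.log2(qstep))
    return min(max(qp, qp_min), qp_max)

# Under this model, halving the bit budget doubles the quantization step,
# i.e. raises QP by about 6.
assert frame_qp(100_000, 1_600_000) == 28
assert frame_qp(50_000, 1_600_000) == 34
```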
In one example, the intra-prediction mode selection or refinement module 306 performs the evaluation of the cost function either by using intra prediction performed with neighboring pixel values from the input sequence of pictures, or by using intra prediction performed with neighboring pixel values that are the reconstructed pixel values, obtained by quantizing the transformed or non-transformed (intra or inter) prediction residuals for the causal or non-causal neighboring blocks, de-quantizing, inverse transforming if transformed, and summing up the (intra or inter) predicted values for those neighboring pixels.
Furthermore in this example embodiment, the motion estimation or refinement module 308 takes in the combinations of encoding parameters for each CTB in each inter-coded picture from one or more of the mapping modules, determines a refinement set of combinations of encoding parameters at the given spatial resolution based on the outputs of the mapping modules, evaluates a cost function for each combination of encoding parameters that depends on the computed quantization parameter, and identifies the top few combinations of encoding parameters that have the lowest evaluated cost across the set of combinations. For example, the encoding parameters relevant for this module within the combination of encoding parameters include coding tree structures, a partition type of each leaf block of each coding tree, a prediction mode of each leaf block partition, such as unidirectional or bi-directional prediction, sub-pixel or full-pixel accurate motion vectors with reference to a set of reference frames for each leaf block partition, and a transform tree for each leaf block or leaf block partition.
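The refinement aspect can, for illustration, be sketched as a small window search around a seed vector supplied by a mapping module instead of a full-range search; the function name, the window radius, and the toy cost callback are assumptions for the example.

```python
# Hypothetical sketch of motion-vector refinement around a mapped seed:
# only a small +/-radius window around the seed is evaluated, which is the
# source of the computational savings over a full search.

def refine_mv(seed, cost, radius=1):
    """Return the vector with the lowest cost within +/-radius of `seed`.

    `cost` stands in for a block-matching cost function (e.g. SAD plus a
    motion-vector rate term).
    """
    best_mv, best_cost = seed, cost(seed)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            mv = (seed[0] + dx, seed[1] + dy)
            c = cost(mv)
            if c < best_cost:
                best_mv, best_cost = mv, c
    return best_mv

# Toy cost with a minimum at (5, -2); the seed (4, -2) is refined to it.
toy_cost = lambda mv: (mv[0] - 5) ** 2 + (mv[1] + 2) ** 2
assert refine_mv((4, -2), toy_cost) == (5, -2)
```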
Also in this example embodiment, the mode decision module 310 maintains the strict neighbor dependencies required by the encoding format and receives the identified best combinations of encoding parameters from the intra-prediction mode selection or refinement module 306 and the motion estimation or refinement module 308 for each CTB in each picture. Further, the mode decision module 310 evaluates a cost function (that depends on the computed quantization parameter) across the received combinations of encoding parameters as well as additional modes for each leaf block of the coding tree in each combination for which the decoder side can infer the mode, motion vectors, and/or reference frames based on a signaled neighbor index, where the cost function evaluation can require intra prediction and motion compensation as well as (a) evaluation of the exact encoding distortion incurred due to quantization (after transforming or in the spatial domain) and the bits required for encoding the syntax elements for the encoding format or (b) model-based estimation of the encoding distortion and the bits required, for each combination of encoding parameters for each CTB.
Furthermore, the mode decision module 310 identifies the set of combinations of encoding parameters that result in the lowest cost across all the evaluated combinations for each CTB in each picture, performs the normative steps for obtaining the quantized residuals for the best combination of encoding parameters for a CTB, reconstructs the pixels by inverse quantizing the quantized residuals and inverse transforming (if residuals were forward transformed) and adding the prediction values, and performs in-loop operations, such as in-loop deblocking, sample adaptive offset, adaptive loop filtering that are selected for the encoder and supported by the encoding format. In addition, the mode decision module 310 encodes the set of combinations of encoding parameters according to the syntax and semantics of the encoding format along with the encoding of the quantized residuals according to the syntax and semantics of the encoding format to produce a media bit stream at the selected output bitrate.
Referring now to FIG. 3B, which is a block diagram 300B illustrating a normative encoding module 312 (e.g., one of the normative encoding modules 104A-N and normative encoding modules 212A-N shown in FIGS. 1A-1D and 2A, respectively). As shown in FIG. 3B, the normative encoding module 312 includes a rate control module 314, a mode selection module 316, an intra prediction module 318, a motion compensation module 320, an encoding loop module 322, an in-loop post-processing module 324, and an entropy coding module 326.
In one embodiment, the rate control module 314 computes frame-level and coding-tree-level quantization parameters for the picture type of each picture, corresponding to the operating bitrate selected for the normative encoding module out of the output bitrates. Further, the mode selection module 316 evaluates a cost function for a set of combinations of encoding parameters derived using the best combinations of encoding parameters provided by the intelligent encoding module 302 operating at the same spatial resolution to determine the best combination of encoding parameters. In one example, the mode selection module 316 performs the evaluation of the cost function based either on the exact encoding distortion and the exact bits required to encode, or on a model-based estimation of the distortion and bits required to encode, each of the set of combinations of encoding parameters derived using the best combinations of encoding parameters for each CTB in each picture.
Furthermore, the intra prediction module 318 predicts sample values of a leaf block of the CTB based on already reconstructed neighbor samples and the selected prediction mode for the leaf block. In addition, the motion compensation module 320 computes sub-pixel accurate prediction based on attributes of a leaf block partition of a CTB, such as a sub-pixel accurate motion vector, a reference frame index, the weight and offset for that reference frame index, and whether the leaf block partition is predicted from a single reference or from two references. Moreover, the encoding loop module 322 performs residual computation as a difference between the original samples and the intra-predicted or motion compensated samples, optional forward transforming of the residuals, quantization based on the selected quantization parameter, inverse quantization, inverse transform (if forward transforming of the residuals was done), and addition of the intra predicted or motion compensated sample values to obtain the reconstructed sample values.
Also, the in-loop post-processing module 324 performs additional optional in-loop processing steps supported by the encoding format, such as in-loop deblocking, sample adaptive offset, adaptive loop filtering, and so on. Further, the entropy coding module 326 encodes the best combination of encoding parameters for each CTB according to the syntax and semantics of the encoding format along with the encoding of the quantized and suitably scanned residual coefficients to create a media bit stream at the required output bitrate.
Referring now to FIG. 5A, which is a flow diagram 500A of a method for efficiently encoding a sequence of pictures at a given spatial resolution to an encoding format at a plurality of output bitrates, according to one embodiment. At block 502A, the pictures and a picture type for each of the pictures are received. At block 504A, best combinations of encoding parameters from a determined set of combinations of the encoding parameters supported by the encoding format for each CTB in each of the pictures are identified based on the respective picture types. For example, a combination of encoding parameters for each CTB in each of the pictures is composed of one or more of: coding tree structure options, a prediction type of each leaf block of each coding tree option, a partition type of each leaf block of each coding tree option, a prediction mode within the prediction type for each leaf block partition of each coding tree option, motion vectors with respect to a set of reference pictures for each leaf block of each coding tree option, and a transform tree for each leaf block or leaf block partition of each coding tree option. At block 506A, the sequence of pictures is encoded using the best of the identified combinations of encoding parameters for each CTB of each of the pictures to create media bit streams at the plurality of output bitrates. This is explained in more detail with reference to FIGS. 1A-1D.
Referring now to FIG. 5B, which is a flow diagram 500B of a method for efficiently encoding a sequence of pictures at a given spatial resolution to an encoding format at a plurality of output spatial resolutions, according to one embodiment. At block 502B, the sequence of pictures at the given spatial resolution is transformed to the output spatial resolutions and/or an additional spatial resolution that is lower than the plurality of output spatial resolutions. At block 504B, the sequence of pictures at respective output spatial resolutions and a picture type for each of the pictures are received. At block 506B, one or more of estimating a set of combinations of encoding parameters for each CTB in each of the pictures at a lowest output spatial resolution of the output spatial resolutions based on the additional spatial resolution, and mapping combinations of encoding parameters for each CTB in each of the pictures at an immediate coarser spatial resolution to corresponding blocks at the respective output spatial resolution are performed. At block 508B, a set of combinations of encoding parameters supported by the encoding format for each CTB in each of the pictures is determined based on the estimated set of combinations of encoding parameters or the mapped outputs. At block 510B, best combinations of encoding parameters from the determined set of combinations of encoding parameters are identified based on the respective picture types. At block 512B, the pictures in the encoding format are encoded based on the identified best combinations of encoding parameters for each CTB in each of the pictures to create a media bit stream at the respective output spatial resolution. At block 514B, the steps of receiving, performing, determining, identifying, and encoding are repeated at each of the remaining output spatial resolutions to create a media bit stream at each of the remaining output spatial resolutions. This is explained in more detail with reference to FIG. 2.
Referring now to FIG. 6A, which illustrates a system 600A for performing efficient media encoding, according to one embodiment. As shown in FIG. 6A, the system 600A includes a plurality of computing devices 602A-N that are communicatively coupled to each other. Further, the computing devices 602A-N include one or more processors 606A-N and memories 608A-N, respectively. The efficient media encoding module 612 (e.g., 100A, 100B, 100C, 100D, 200 or 200 including 200A of FIGS. 1A-1D, 2 and 2A, respectively) can be implemented as a program (which performs the processes described in FIGS. 5A and 5B) that resides in any of the memories 608A-N or in a media server 604.
In one example embodiment, the efficient media encoding module 612 is a program that resides in the memory 610 and is executed on one or more of the processors 606A-N. For example, the memory 610 or the memories 608A-N can be a shared memory that is accessible by the computing devices 602A-N. In this example, the shared memory is accomplished through an integrated memory map visible to each of the computing devices 602A-N that are interconnected physically through either serial or parallel inter-processor inter-connects.
In other words, the shared memory can be a common random access memory (RAM) (internal or external) accessible over a shared bus, or it can be a memory mapped area that is physically accessed over a processor interconnect (e.g., Ethernet, Peripheral Component Interconnect Express (PCIe), serial rapid IO, Intel quick-path Interconnect™, Texas Instruments' Hyperlink™, and so on). Further, the efficient media encoding module 612 can run sequentially on a single processor or can be scheduled to run concurrently on a set of threads, processors, or computing devices with inter-device interconnects to access the RAM associated with remote computing devices and a memory mapping mechanism that provides a unified memory map for each computing device. The efficient media encoding module 612 includes multiple modules to perform efficient media encoding, and each module itself may run on a single core or multiple cores on a single computing device or across multiple computing devices. The different modules can also run on distributed computing devices made available as part of an infrastructure-as-a-service (IaaS) offering, a private cloud infrastructure, or a platform-as-a-service (PaaS) offering.
Referring now to FIG. 6B, which illustrates a system 600B for performing efficient media encoding, according to one embodiment. As shown in FIG. 6B, the system 600B includes logic that is comprised of combinational circuits or sequential circuits 614A-N that are communicatively coupled to each other. Furthermore, the combinational circuits or sequential circuits 614A-N include memories 620A-N. Further, one or more of the combinational circuits or sequential circuits 614A-N implement logical circuits 622 (e.g., 100A, 100B, 100C, 100D, 200 or 200 including 200A of FIGS. 1A-1D, 2 and 2A), with their associated memories 620A-N, for performing efficient media encoding.
As an example of the proposed technique, consider a scenario where a 1920×1080@30 fps ingested bit stream is to be encoded to an encoding format at 4 different bit-rates, one each at 1920×1080, 1280×720, 852×480, and 640×360. Let x be the cycles taken by an encoder, without input from any mapping module, to code the highest resolution. Let y be the cycles taken by a normative encoding module to code the highest resolution. Let z be the cycles taken by an intelligent encoding module. The computational complexity reduction with the proposed arrangement compared to brute-force encoding of all the streams would be:
  • 1.75*4*x (for the brute-force encoding) vs. 1.75*(z+3*y)
  • (where 1.75 is approximately the ratio of the total number of pixels across all resolutions to the number of pixels at 1920×1080 resolution)
  • Computational savings factor=(4x)/(z+3y)
  • For typical medium- to high-quality encoders, x=5y and z=2.5y. For these values, the computational savings factor becomes (20y)/(5.5y)≈3.6. In other words, 4 streams are encoded at roughly the same computational complexity as one brute-force encoding.
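The arithmetic above can be reproduced directly. The sketch below computes the pixel-ratio constant (~1.75) from the four target resolutions and the savings factor under the stated assumptions x=5y and z=2.5y; the function and variable names are illustrative, not taken from the patent:

```python
# Reproduces the computational-savings estimate from the example above.
# x, y, z are expressed in units of y (cycles for one normative encode
# of the highest resolution), per the assumptions in the text.

RESOLUTIONS = [(1920, 1080), (1280, 720), (852, 480), (640, 360)]

def pixel_ratio(resolutions, base=(1920, 1080)):
    """Total pixels across all resolutions relative to the base resolution."""
    total = sum(w * h for w, h in resolutions)
    return total / (base[0] * base[1])

def savings_factor(x_over_y=5.0, z_over_y=2.5, num_streams=4):
    """Brute-force cost (num_streams independent encodes, 4x) divided by
    the proposed cost (one intelligent encode plus the remaining normative
    encodes, z + 3y). The 1.75 pixel factor cancels out of the ratio."""
    brute_force = num_streams * x_over_y             # 4x = 20y
    proposed = z_over_y + (num_streams - 1) * 1.0    # z + 3y = 5.5y
    return brute_force / proposed

print(round(pixel_ratio(RESOLUTIONS), 2))   # ~1.75
print(round(savings_factor(), 1))           # ~3.6
```

Note that the 1.75 factor multiplies both the brute-force and proposed costs, so it cancels in the savings ratio.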
In various embodiments, the systems and methods described in FIGS. 1 through 6 propose a technique for efficient multi-bitrate and multi-spatial resolution media encoding that achieves a significant reduction in computational complexity over existing independent encoding of the sequence of pictures. The proposed technique also works with both simulcast representations (coding of multiple resolutions, and multiple bit-rates at each resolution, as independently decodable streams) and scalable representations (coding of multiple resolutions, and multiple bit-rates at each resolution, as a set of dependent streams).
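The split between the intelligent and normative encoding modules described above can be sketched as follows. This is an illustrative simplification, not the patented implementation: the intelligent pass performs the full search over candidate parameter combinations per coding tree block (CTB) against a bitrate-dependent cost function, while each normative pass only re-evaluates the cost over the small shortlist the intelligent pass identified. The names `intelligent_encode`, `normative_encode`, and the `keep` parameter are assumptions for the sketch:

```python
# Sketch of the intelligent/normative encoder split: the expensive full
# search runs once, and the per-bitrate passes search only the shortlist.

def intelligent_encode(ctbs, candidates, cost, bitrate, keep=3):
    """Full search: rank every candidate combination per CTB by the
    bitrate-dependent cost and keep the top `keep` combinations."""
    best = {}
    for ctb in ctbs:
        ranked = sorted(candidates, key=lambda c: cost(ctb, c, bitrate))
        best[ctb] = ranked[:keep]
    return best

def normative_encode(ctbs, best_per_ctb, cost, bitrate):
    """Restricted search: re-evaluate the cost at this bitrate over only
    the shortlisted combinations and pick one per CTB."""
    return {ctb: min(best_per_ctb[ctb], key=lambda c: cost(ctb, c, bitrate))
            for ctb in ctbs}
```

Because the shortlist is far smaller than the full candidate set, each normative pass costs a fraction of a full encode, which is the source of the z + 3y total in the worked example.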
In addition, it will be appreciated that the various operations, processes, and methods disclosed herein may be embodied in a machine-readable medium and/or a machine accessible medium compatible with a data processing system (e.g., a computer system), and may be performed in any order (e.g., including using means for achieving the various operations). Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims (18)

What is claimed is:
1. A system for encoding a sequence of pictures at a given spatial resolution, with an associated picture type for each picture, to an encoding format at a plurality of output bitrates, comprising:
an intelligent encoding module that receives the sequence of pictures, identifies best combinations of encoding parameters for each coding tree block (CTB) in each of the pictures from a determined set of combinations of encoding parameters supported by the encoding format for the associated picture type, and encodes the pictures in the encoding format based on the best of the identified combinations of encoding parameters for each CTB in each of the pictures to create a media bit stream at one of the plurality of output bitrates; and
a set of normative encoding modules, one for each of remaining output bitrates, each of which receives the sequence of pictures, their picture types, and the identified best combinations of encoding parameters for each CTB in each of the pictures and encodes each CTB in the sequence of pictures using one combination of encoding parameters derived using the identified best combinations of encoding parameters for each CTB in each of the pictures to create a media bit stream at respective one of the remaining output bitrates.
2. The system of claim 1 wherein a particular combination of encoding parameters for each CTB in each of the pictures is composed of one or more of, coding tree structure options, a prediction type of each leaf block of each coding tree, a partition type of each leaf block of each coding tree, a prediction mode within the prediction type for each partition of each leaf block partition of each coding tree, motion vectors with respect to a set of reference pictures for each leaf block of each coding tree, and a transform tree for each leaf block or leaf block partition of each coding tree.
3. The system of claim 1, wherein the intelligent encoding module is configured to:
evaluate a cost function that depends on the one of the plurality of output bitrates, for each combination of encoding parameters for each CTB in each of the pictures in the determined set of combinations of encoding parameters supported by the encoding format for the associated picture type; and
identify the best combinations of encoding parameters for each CTB in each of the pictures from the set of combinations of encoding parameters based on the evaluated cost function.
4. The system of claim 3, wherein each of the set of normative encoding modules is configured to:
re-evaluate the cost function that is dependent on respective one of the remaining output bitrates for a set of combinations of encoding parameters for each CTB in each of the pictures derived using the identified best combinations of encoding parameters for each CTB in each of the pictures to determine the combination of encoding parameters used for encoding each CTB in each of the pictures at the respective one of the remaining output bitrates.
5. The system of claim 1, further comprising at least one of:
a coarse-to-fine mapping module that receives combinations of encoding parameters for each CTB in each of the pictures at a reduced spatial resolution version derived from the sequence of pictures at the given spatial resolution and maps the received combinations of encoding parameters to corresponding blocks at the given resolution;
an estimation module that estimates a set of combinations of encoding parameters for the CTBs in each of the pictures at the given spatial resolution based on a reduced spatial resolution version of the sequence of pictures relative to the given spatial resolution;
a format-to-format mapping module that maps a set of coding parameters available for each CTB in the sequence of pictures in a certain input encoding format to a set of encoding parameters supported by the encoding format; and
a fine-to-coarse mapping module that receives combinations of encoding parameters for each CTB in each of the pictures at a higher spatial resolution than the given spatial resolution and maps the received combinations of encoding parameters to corresponding blocks at the given resolution.
6. The system of claim 5, wherein the intelligent encoding module determines the set of combinations of encoding parameters supported by the encoding format for the associated picture type based on the outputs of at least one of the coarse-to-fine mapping module, the estimation module, the format-to-format mapping module, and the fine-to-coarse mapping module.
7. The system of claim 5, wherein the set of coding parameters available for each coding tree block in the sequence of pictures in a certain input encoding format is comprised of one or more of coding tree structure options, a prediction type of each leaf block of each coding tree, a partition type of each leaf block of each coding tree, a quantization parameter for each leaf block of each coding tree, a header bits value for each leaf block of each coding tree, a residual coding bits value for each leaf block of each coding tree, a prediction mode within the prediction type for each partition of each leaf block partition of each coding tree, motion vectors with respect to a set of reference pictures for each leaf block of each coding tree, and a transform tree for each leaf block or leaf block partition of each coding tree.
8. The system of claim 5, further comprising:
a media decoding module (MDM) that decodes a media bit stream in the certain input encoding format to produce the sequence of pictures and the set of coding parameters that constitutes the input to the format-to-format mapping module.
9. The system of claim 8, further comprising:
a mid-processing module that performs at least one of removing coding artifacts in the sequence of pictures, de-interlacing the sequence of pictures, and enhancing the sequence of pictures.
10. The system of claim 1, wherein the intelligent encoding module and the set of normative encoding modules are programs that reside in a memory and are executed by one or more processors or sets of logical circuits that are implemented using combinational circuits and/or sequential circuits, with their associated memories.
11. The system of claim 10, wherein the associated memories and program memory are shared memories that are accessible by one or more computing devices comprising the one or more processors or sets of logical circuits.
12. The system of claim 11, wherein the shared memory is accomplished through an integrated memory map visible to each of the one or more computing devices that are interconnected physically through either serial or parallel inter-processor inter-connects.
13. A method for encoding a sequence of pictures with associated picture types for the pictures at a given spatial resolution to an encoding format at a plurality of output bitrates, comprising:
receiving the sequence of pictures and the picture type for each of the pictures;
identifying best combinations of encoding parameters for each coding tree block (CTB) in each of the pictures from a determined set of combinations of encoding parameters supported by the encoding format for the associated picture type; and
encoding the sequence of pictures using the best of the identified combinations of encoding parameters for each CTB in each of the pictures to create media bit streams at the plurality of output bitrates.
14. The method of claim 13, wherein a particular combination of encoding parameters for each CTB in each of the pictures is composed of one or more of, coding tree structure options, a prediction type of each leaf block of each coding tree, a partition type of each leaf block of each coding tree, a prediction mode within the prediction type for each partition of each leaf block partition of each coding tree, motion vectors with respect to a set of reference pictures for each leaf block of each coding tree, and a transform tree for each leaf block or leaf block partition of each coding tree.
15. The method of claim 13, wherein identifying the best combinations of encoding parameters for each CTB in each of the pictures from the determined set of combinations of encoding parameters, comprises:
evaluating a cost function that depends on the one of the plurality of output bitrates, for each combination of encoding parameters for each CTB in each of the pictures in the determined set of combinations of encoding parameters supported by the encoding format for the associated picture type; and
identifying the best combinations of encoding parameters for each CTB in each of the pictures from the set of combinations of encoding parameters based on the evaluated cost function.
16. A non-transitory computer-readable storage medium having instructions that, when executed by a computing device, cause the computing device to:
receive a sequence of pictures and a picture type for each of the pictures;
identify best combinations of encoding parameters from a determined set of combinations of encoding parameters for each coding tree block (CTB) in each of the pictures supported by an encoding format for the associated picture type; and
encode the sequence of pictures using the best of the identified combinations of encoding parameters for each CTB in each of the pictures to create media bit streams at a plurality of output bitrates.
17. The non-transitory computer-readable storage medium of claim 16, wherein a particular combination of encoding parameters for each CTB in each of the pictures is composed of one or more of, coding tree structure options, a prediction type of each leaf block of each coding tree, a partition type of each leaf block of each coding tree, a prediction mode within the prediction type for each partition of each leaf block partition of each coding tree, motion vectors with respect to a set of reference pictures for each leaf block of each coding tree, and a transform tree for each leaf block or leaf block partition of each coding tree.
18. The non-transitory computer-readable storage medium of claim 16, wherein the instructions further cause the computing device to:
evaluate a cost function, that depends on one of the plurality of output bitrates, for each combination of encoding parameters in the set of combinations of encoding parameters for each CTB in each of the pictures supported by the encoding format for the associated picture type; and
identify the best combinations of encoding parameters for each CTB in each of the pictures from the set of combinations of encoding parameters based on the evaluated cost function.
US14/064,224 2012-10-26 2013-10-28 System and method for efficient multi-bitrate and multi-spatial resolution media encoding Active 2035-05-06 US9554143B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/369,898 US9654791B1 (en) 2012-10-26 2016-12-06 System and method for efficient multi-bitrate and multi-spatial resolution media encoding

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN4480/CHE/2012 2012-10-26
IN4480CH2012 2012-10-26

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/369,898 Division US9654791B1 (en) 2012-10-26 2016-12-06 System and method for efficient multi-bitrate and multi-spatial resolution media encoding

Publications (2)

Publication Number Publication Date
US20140119451A1 US20140119451A1 (en) 2014-05-01
US9554143B2 true US9554143B2 (en) 2017-01-24

Family

ID=50547173

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/064,224 Active 2035-05-06 US9554143B2 (en) 2012-10-26 2013-10-28 System and method for efficient multi-bitrate and multi-spatial resolution media encoding
US15/369,898 Active US9654791B1 (en) 2012-10-26 2016-12-06 System and method for efficient multi-bitrate and multi-spatial resolution media encoding


Country Status (1)

Country Link
US (2) US9554143B2 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10390071B2 (en) * 2016-04-16 2019-08-20 Ittiam Systems (P) Ltd. Content delivery edge storage optimized media delivery to adaptive bitrate (ABR) streaming clients
US11146608B2 (en) 2017-07-20 2021-10-12 Disney Enterprises, Inc. Frame-accurate video seeking via web browsers
GB2567835B (en) * 2017-10-25 2020-11-18 Advanced Risc Mach Ltd Selecting encoding options
CN108965873B (en) * 2018-07-24 2020-02-14 北京大学 Adaptive partitioning method for pulse array coding
US11803493B2 (en) * 2020-11-30 2023-10-31 Dell Products L.P. Systems and methods for management controller co-processor host to variable subsystem proxy
US20240223816A1 (en) * 2023-01-03 2024-07-04 Qualcomm Incorporated Adaptive loop filter classifiers

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110305273A1 (en) * 2010-06-11 2011-12-15 Microsoft Corporation Parallel multiple bitrate video encoding
US20140056356A1 (en) * 2012-08-21 2014-02-27 Motorola Mobility Llc Method and apparatus for efficient signaling of weighted prediction in advanced coding schemes

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5830761B2 (en) * 2011-05-10 2015-12-09 株式会社Joled Display device and electronic device


Also Published As

Publication number Publication date
US20170150166A1 (en) 2017-05-25
US9654791B1 (en) 2017-05-16
US20140119451A1 (en) 2014-05-01


Legal Events

Date Code Title Description
AS Assignment

Owner name: ITTIAM SYSTEMS (P) LTD., INDIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SETHURAMAN, SRIRAM;REEL/FRAME:031498/0886

Effective date: 20131025

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 4