US9866846B2 - Method and apparatus for video processing with complexity information - Google Patents

Method and apparatus for video processing with complexity information

Info

Publication number
US9866846B2
Authority
US
United States
Prior art keywords
num
max
filterings
tap
instances
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related, expires
Application number
US14/877,776
Other versions
US20160105680A1 (en
Inventor
Felix Carlos Fernandes
Madhukar Budagavi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd
Priority to US14/877,776 (US9866846B2)
Priority to PCT/KR2015/010852 (WO2016060478A1)
Assigned to SAMSUNG ELECTRONICS CO., LTD. Assignment of assignors interest (see document for details). Assignors: BUDAGAVI, MADHUKAR; FERNANDES, FELIX CARLOS
Publication of US20160105680A1
Application granted
Publication of US9866846B2
Legal status: Expired - Fee Related
Adjusted expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/127 Prioritisation of hardware or computational resources
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness

Definitions

  • The present application relates generally to video processing devices and, more specifically, to methods for dynamic voltage and frequency scaling for video processing in order to reduce power usage.
  • Video codecs in mobile devices can be implemented using either software on the CPU, such as an ARM (Advanced RISC Machines) platform, or hardware via dedicated ASIC (application specific integrated circuit) design. Recent advances in circuit design have demonstrated that power consumption can be reduced if circuits are placed into a low-power state, which uses a slower clock rate and a lower supply voltage.
  • This disclosure provides methods and apparatuses for implementing complexity-based video processing and corresponding power reduction in a display screen.
  • a decoder for video processing includes a receiver configured to receive, from an encoder, a bitstream associated with a video.
  • the decoder also includes a processor configured to: parse the bitstream to determine a percentage of at least one of a number of six tap filterings or a number of alpha point deblocking instances, in a specified period; determine a voltage and frequency to be used for decoding the video as a function of the percentage of the at least one of the number of six tap filterings or the number of alpha point deblocking instances, in the specified period; and decode the video at the determined voltage and frequency.
  • an encoder for video processing includes a transmitter configured to transmit, to a decoder, a bitstream associated with a video.
  • the encoder also includes a processor configured to code a video to have at least one variable of a number of six tap filterings or a number of alpha point deblocking instances, in a specified period.
  • the processor is also configured to determine a percentage of the at least one of the number of six tap filterings or the number of alpha point deblocking instances, in the specified period.
  • the processor is further configured to generate the bitstream containing the percentage of the at least one of the number of six tap filterings or the number of alpha point deblocking instances, in the specified period.
  • a method for video processing includes parsing, at a decoder, a bitstream associated with a video to determine a percentage of at least one of a number of six tap filterings or a number of alpha point deblocking instances, in a specified period.
  • the method also includes determining, at the decoder, a voltage and frequency to be used for decoding the video according to the percentage of the at least one of the number of six tap filterings or the number of alpha point deblocking instances, in the specified period.
  • the method further includes decoding, at the decoder, the video at the determined voltage and frequency.
  • The term “couple” and its derivatives refer to any direct or indirect communication between two or more elements, whether or not those elements are in physical contact with one another.
  • The term “controller” means any device, system or part thereof that controls at least one operation. Such a controller may be implemented in hardware or a combination of hardware and software and/or firmware. The functionality associated with any particular controller may be centralized or distributed, whether locally or remotely.
  • The phrase “at least one of,” when used with a list of items, means that different combinations of one or more of the listed items may be used, and only one item in the list may be needed.
  • For example, “at least one of: A, B, and C” includes any of the following combinations: A, B, C, A and B, A and C, B and C, and A and B and C.
  • Various functions described below can be implemented or supported by one or more computer programs, each of which is formed from computer readable program code and embodied in a computer readable medium.
  • The terms “application” and “program” refer to one or more computer programs, software components, sets of instructions, procedures, functions, objects, classes, instances, related data, or a portion thereof adapted for implementation in a suitable computer readable program code.
  • The phrase “computer readable program code” includes any type of computer code, including source code, object code, and executable code.
  • The phrase “computer readable medium” includes any type of medium capable of being accessed by a computer, such as read only memory (ROM), random access memory (RAM), a hard disk drive, a compact disc (CD), a digital video disc (DVD), or any other type of memory.
  • ROM read only memory
  • RAM random access memory
  • CD compact disc
  • DVD digital video disc
  • a “non-transitory” computer readable medium excludes wired, wireless, optical, or other communication links that transport transitory electrical signals or other signals.
  • a non-transitory computer readable medium includes media where data can be permanently stored and media where data can be stored and later overwritten, such as a rewritable optical disc or an erasable memory device.
  • FIG. 1A is a high level diagram illustrating an example network within which devices may implement complexity-based video processing according to this disclosure
  • FIG. 1B is a front view of an example user device from the network of FIG. 1A within which complexity-based video processing can be implemented according to this disclosure;
  • FIG. 1C is a high level block diagram of the functional components in the example user device of FIG. 1A according to this disclosure
  • FIG. 2A is a high level block diagram of an example content server from the network of FIG. 1A within which complexity-based video processing can be implemented according to this disclosure;
  • FIG. 2B is an example functional architecture to implement complexity-based video processing according to this disclosure.
  • FIG. 3 illustrates the quarter-sample interpolation of the 4×4 block consisting of samples G, H, I, J, M, N, P, Q, R, S, V, W, T, U, X, Y according to this disclosure.
  • FIG. 4 illustrates a 16×16 luma block, in which upper-case roman numerals reference columns of samples and lower-case roman numerals reference rows of samples according to this disclosure.
  • FIGS. 1A through 4 discussed below, and the various embodiments used to describe the principles of this disclosure in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the disclosure. Those skilled in the art will understand that the principles of this disclosure may be implemented in any suitably arranged wired or wireless communication system, such as with a battery-powered smartphone, laptop, or other device having a wired or wireless network connection.
  • metadata used for display adaptation is embedded within a video stream or other video content information using a Supplemental Enhancement Information (SEI) message, which is parsed at a decoder to help with display power reduction.
  • SEI Supplemental Enhancement Information
  • the metadata can be delivered out-of-band using a transport mechanism, storage medium, or the like.
  • Elements in an extended SEI message can be derived at the encoder during video encoding.
  • FIG. 1A is a high-level diagram illustrating an example network 100 within which devices may implement complexity-based video processing according to this disclosure.
  • the network 100 includes a content encoder 101 , which can include a data processing system having an encoder controller configured to encode video content.
  • the content encoder 101 can be communicably coupled to (or alternatively integrated with) a content server 102 , which can include a data processing system configured to deliver video content to user devices.
  • the content server 102 can be coupled by a communications network, such as the Internet 103 and a wireless communication system including a base station (BS) 104 , for delivery of the video content to a user device 105 .
  • BS base station
  • the user device 105 can also be referred to as a user equipment (UE) or a mobile station (MS).
  • UE user equipment
  • MS mobile station
  • the user device 105 can be a “smart” phone, tablet, or other device capable of functions other than wireless voice communications, including at least playing video content.
  • the user device 105 can be a laptop computer or other wired or wireless device, such as any device that is primarily battery-powered during at least periods of typical operation.
  • FIG. 1B is a front view of an example user device 105 from the network 100 of FIG. 1A within which complexity-based video processing can be implemented according to this disclosure.
  • FIG. 1C is a high level block diagram of the functional components in the example user device 105 of FIG. 1A according to this disclosure.
  • the user device 105 in this example represents a mobile phone or smartphone and includes a display 106 .
  • a processor 107 coupled to the display 106 can control content that is presented on the display 106 .
  • the processor 107 and other components within the user device 105 can be powered by a battery or other power source that can be recharged by an external power source or can be powered by an external power source.
  • a memory 108 coupled to the processor 107 can store or buffer video content for playback by the processor 107 and presentation on the display 106 and can also store a video player application (or “app”) 109 for performing such video playback.
  • the video content being played can be received, either contemporaneously (such as overlapping in time) with the playback of the video content or prior to the playback, via a transceiver 110 connected to an antenna 111 .
  • the video content can be received in wireless communications from a base station 104 .
  • FIG. 2A is a high level block diagram of an example content server 102 from the network 100 of FIG. 1A within which complexity-based video processing can be implemented according to this disclosure.
  • the server 200 includes a bus system 205 , which can be configured to support communication between at least one processing device 210 , at least one storage device 215 , at least one communications unit 220 , and at least one input/output (I/O) unit 225 .
  • I/O input/output
  • the processing device 210 is configured to execute instructions that can be loaded into a memory 230 .
  • the server 200 can include any suitable number(s) and type(s) of processing devices 210 in any suitable arrangement.
  • Example processing devices 210 can include microprocessors, microcontrollers, digital signal processors, field programmable gate arrays, application specific integrated circuits, and discrete circuitry.
  • the processing device(s) 210 can be configured to execute processes and programs resident in the memory 230 , such as operations for generating display adaptation metadata and complexity information.
  • the memory 230 and a persistent storage 235 are examples of storage devices 215 , which represent any structure(s) capable of storing and facilitating retrieval of information (such as data, program code, or other suitable video information on a temporary or permanent basis).
  • the memory 230 can represent a random access memory or any other suitable volatile or non-volatile storage device(s).
  • the persistent storage 235 can contain one or more components or devices supporting longer-term storage of data, such as a read only memory, hard drive, Flash memory, or optical disc.
  • the communications unit 220 is configured to support communications with other systems or devices.
  • the communications unit 220 can include a network interface card or a wireless transceiver facilitating communications over the network 103 .
  • the communications unit 220 can be configured to support communications through any suitable physical or wireless communication link(s).
  • the I/O unit 225 is configured to allow for input and output of data.
  • the I/O unit 225 can be configured to provide a connection for user input through a keyboard, mouse, keypad, touchscreen, or other suitable input device.
  • the I/O unit 225 can also be configured to send output to a display, printer, or other suitable output device.
  • the I/O unit 225 can be configured to allow the input or output of complexity information embedded within SEI message(s).
  • While FIG. 2A is described as representing the server 102 of FIG. 1A, the same or similar structure can be used in one or more different user devices.
  • a laptop or desktop computer can have the same or similar structure as that shown in FIG. 2A .
  • FIG. 2B is an example functional architecture to implement complexity-based video processing according to this disclosure.
  • DA provides Green Metadata having complexity metrics.
  • the functional architecture 300 can include a transmitter 310 and a receiver 350 .
  • the transmitter 310 can include a media pre-processor 312 , a first green metadata generator 314 , a video encoder 316 , a second green metadata generator 318 , and a power optimizer module 320 .
  • the receiver 350 can include a media decoder 352 , a presentation subsystem 354 , a green metadata extractor 356 , and a power optimizer module 358 .
  • the MPEG-4 Simple Profile Standard provides some complexity metrics in Clause 6.3.5.1 of the ISO/IEC 14496-2 International Standard (“MPEG-4 Simple Profile”). Although these metrics are efficiently represented, they cannot be applied to complexity-based video processing in the widely used AVC standard.
  • H.264/MPEG AVC is a decoding technology that is widely used in the industry. Certain embodiments provide methods to compute efficient complexity metrics for widely used decoders such as AVC. By analyzing the worst-case characteristics of the computationally intensive interpolation and deblocking modules, our methods pack each complexity metric into 8 bits, independent of the applicability period.
  • the MPEG Green Metadata International Standard (IS) text provides the following four complexity metrics for C-DVFS:
  • num_six_tap_filterings (32 bits)—indicates the number of 6-tap filterings in the specified period, as defined in ISO/IEC 14496-10 which is incorporated by reference into this patent document in its entirety. Each half-pel interpolation requires a 6-tap filtering operation and each quarter-pel interpolation requires either one or two 6-tap filtering operations.
  • num_alpha_point_deblocking_instances (32 bits)—indicates the number of alpha-point deblocking instances in the specified period. Using the notation in ISO/IEC 14496-10, an alpha-point deblocking instance is defined as a single filtering operation that produces either a single, filtered output p′0 or a single, filtered output q′0, where p′0 and q′0 are filtered samples across a 4×4 block edge. Therefore the number of alpha point deblocking instances is the total number of filtering operations applied to produce filtered samples of the type p′0 or q′0.
  • num_non_zero_macroblocks indicates the number of non-zero macroblocks in the specified period.
  • num_intra_coded_macroblocks indicates the number of intra-coded macroblocks in the specified period.
  • period_type specifies the type of upcoming period over which the four complexity metrics are applicable.
  • For period_type values 0, 1, 2, and 3, the complexity metrics are respectively applicable over a single picture, all pictures up to (but not including) the picture containing the next I-slice, a specified time interval (in seconds), or a specified number of pictures.
  • If the period_type is 2 or 3, then the period_type signals the duration of a scene over which the complexity metrics are applicable.
  • the worst-case characteristics of each metric are analyzed and then the metric is normalized by the largest occurrence in the worst-case.
  • the resulting fraction that lies in the [0,1] interval is packed into a byte.
  • Embodiments of this disclosure introduce the percentage of six-tap filterings having a size of a single byte, which allows an efficient representation.
  • To define max_num_six_tap_filterings_pic(i), denote the width and height of the reference picture luma array by PicWidthInSamples_L and refPicHeightEffectiveL, respectively.
  • STFs six-tap filterings
  • FIG. 3 illustrates the quarter-sample interpolation of the 4×4 block 360 consisting of samples G, H, I, J, M, N, P, Q, R, S, V, W, T, U, X, Y according to this disclosure.
  • the embodiment of the quarter-sample interpolation of the 4×4 block 360 is for illustration only. Other embodiments could be used without departing from the scope of the present disclosure.
  • upper-case letters represent integer samples and lower-case letters represent fractional sample positions. Subscripts are used to indicate the integer sample that is associated with a fractional sample position.
  • the worst-case largest number of STFs is analyzed for the interpolation of the 4×4 block consisting of samples G, H, I, J, M, N, P, Q, R, S, V, W, T, U, X, Y.
  • MV motion vector
  • sixteen points need to be computed for each of the other fractional-sample positions (b_G, c_G, . . . , r_G) that the MV could point to.
  • the STFs required for each fractional-sample position that the MV could point to are counted.
  • the worst-case largest number of STFs is fifty-two, when the MV points to j_G, f_G, i_G, k_G, or q_G. Since the overhead of filtering samples outside the block is smaller for larger block sizes, the worst-case number of STFs occurs when all partitions are 4×4 blocks and two MVs are used for each block (one from each refPicList).
  • a reference picture list (refPicList) specifies the reference pictures, as defined in the H.264 or ISO/IEC 14496 AVC specification, both of which are incorporated herein by reference.
  • num_six_tap_filterings is also reduced by a factor of N.
  • the preceding analysis also assumes an efficient implementation in which filtering is not repeated.
  • the samples bb, b_G, s_M, gg, hh are not re-computed but are re-used from a prior filtering operation.
  • filterings may be repeated because it is simpler to re-filter rather than to access a stored value.
  • the worst-case largest number of STFs in a picture is of the order of 1664 ⁇ , where ⁇ >1.
  • num_six_tap_filterings is also increased by a factor of ⁇ .
  • Embodiments of this disclosure introduce the percentage of alpha-point deblocking instances that allows an efficient representation.
  • Consider a macroblock containing a 16×16 luma block in which the samples have been numbered in raster-scan order, as shown in FIG. 4 .
  • Upper-case roman numerals are used to reference columns of samples and lower-case roman numerals are used to reference rows of samples.
  • Column IV refers to the column of Samples 4, 20, . . . 244 and Row xiii refers to the row of samples 193, 194, . . . , 208.
  • Edges are indicated by an ordered pair that specifies the columns or rows on either side of the edge.
  • Edge (IV, V) refers to the vertical edge between Columns IV and V.
  • Edge (xii, xiii) indicates the horizontal edge between Rows xii and xiii. Note that the leftmost vertical edge and the topmost horizontal edge are denoted by (0, I) and (0, i) respectively.
  • FIG. 4 illustrates a 16×16 luma block 400 , in which upper-case roman numerals reference columns of samples and lower-case roman numerals reference rows of samples according to this disclosure.
  • the embodiment of the 16×16 luma block 400 shown in FIG. 4 is for illustration only. Other embodiments could be used without departing from the scope of the present disclosure.
  • the Vertical Edges (0, I), (IV, V), (VIII, IX) and (XII, XIII) are filtered first.
  • the Horizontal Edges (0, i), (iv, v), (viii, ix) and (xii, xiii) are filtered.
  • an APDI will occur on each row of the edge because the q0 samples 1, 17, . . . 241 will all be APDIs. Therefore, 16 APDIs will occur in Vertical Edge (0, I).
  • the worst-case number of APDIs is determined by the chroma sampling relative to the luma sampling.
  • the worst-case analysis for each chroma block is identical to that of the 16×16 luma block. Therefore, 256 APDIs are produced by worst-case deblocking of the two chroma blocks.
  • max_num_alpha_point_deblocking_instances_pic(i) = 128 * chroma_format_multiplier * PicSizeInMbs, where chroma_format_multiplier depends on the AVC variables separate_colour_plane_flag and chroma_format_idc as shown in the following table.
      chroma_format_multiplier   separate_colour_plane_flag   chroma_format_idc   Comment
      1                          1                            any value           separate colour plane
      1                          0                            0                   monochrome
      1.5                        0                            1                   4:2:0 sampling
      2                          0                            2                   4:2:2 sampling
      3                          0                            3                   4:4:4 sampling
  • Embodiments of this disclosure introduce the percentage of non-zero macroblocks that allows an efficient representation.
  • Embodiments of this disclosure introduce the percentage of intra-coded macroblocks that allows an efficient representation.
  • the logarithm (to base 2, or any other base) of the percentage metric can be used to emphasize the lower range of the metric.
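
The worst-case normalization described in the list above can be sketched in Python as follows. This is only an illustrative sketch, not the patent's implementation: the function names are hypothetical, the per-macroblock figure of 1664 six-tap filterings is inferred here as 52 STFs per 4×4 block times 16 blocks times 2 motion vectors (which matches the 1664 figure quoted in the text), and the round-to-nearest rule used to pack the fraction into a byte is an assumption.

```python
from fractions import Fraction

# Worst-case six-tap filterings (STFs) per macroblock, inferred from the text:
# 52 STFs per 4x4 block, 16 such blocks per macroblock, two MVs per block.
STFS_PER_4X4_BLOCK = 52
BLOCKS_PER_MB = 16
MVS_PER_BLOCK = 2
MAX_STFS_PER_MB = STFS_PER_4X4_BLOCK * BLOCKS_PER_MB * MVS_PER_BLOCK


def chroma_format_multiplier(separate_colour_plane_flag, chroma_format_idc):
    """Look up the multiplier from the table given in the text."""
    if separate_colour_plane_flag == 1:
        return Fraction(1)  # separate colour planes, any chroma_format_idc
    table = {0: Fraction(1), 1: Fraction(3, 2), 2: Fraction(2), 3: Fraction(3)}
    return table[chroma_format_idc]


def max_apdi_pic(pic_size_in_mbs, separate_colour_plane_flag, chroma_format_idc):
    """128 * chroma_format_multiplier * PicSizeInMbs, as stated in the text."""
    m = chroma_format_multiplier(separate_colour_plane_flag, chroma_format_idc)
    return int(128 * m * pic_size_in_mbs)


def pack_fraction(count, worst_case):
    """Quantize count / worst_case, clamped to [0, 1], into one byte (0..255).

    The rounding rule (round to nearest) is an assumption of this sketch.
    """
    if worst_case <= 0:
        raise ValueError("worst_case must be positive")
    frac = min(max(count / worst_case, 0.0), 1.0)
    return int(frac * 255 + 0.5)


def unpack_fraction(byte_value):
    """Recover the approximate fraction from the packed byte."""
    return byte_value / 255.0
```

For example, a picture of 8160 macroblocks (1920×1088 luma samples) with 4:2:0 sampling has a worst case of `max_apdi_pic(8160, 0, 1)` alpha-point deblocking instances, and any observed count for that picture can then be reduced to a single byte with `pack_fraction`.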

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A decoder for video processing includes a receiver configured to receive a bitstream associated with a video from a coder. The decoder also includes a processor configured to parse the bitstream to determine a percentage of at least one of a number of six tap filterings or a number of alpha point deblocking instances, in a specified period. The processor is further configured to determine a voltage and frequency to be used for decoding the video proportional to the percentage of the at least one of the number of six tap filterings or the number of alpha point deblocking instances. The processor is configured to decode the video at the determined voltage and frequency. Other embodiments including an encoder and a method are also disclosed.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS AND CLAIMS OF PRIORITY
This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application No. 62/063,824, entitled “EFFICIENT COMPLEXITY METRICS FOR VIDEO PROCESSING”, filed Oct. 14, 2014, which is hereby incorporated by reference into this patent document in its entirety.
TECHNICAL FIELD
The present application relates generally to video processing devices and, more specifically, to methods for dynamic voltage and frequency scaling for video processing in order to reduce power usage.
BACKGROUND
Power consumption is an increasingly critical issue for video-capable mobile devices, where video processing requires a significant amount of energy for video encoding, decoding and associated memory transfers. Video codecs in mobile devices can be implemented using either software on the CPU, such as an ARM (Advanced RISC Machines) platform, or hardware via dedicated ASIC (application specific integrated circuit) design. Recent advances in circuit design have demonstrated that power consumption can be reduced if circuits are placed into a low-power state, which uses a slower clock rate and a lower supply voltage.
SUMMARY
This disclosure provides methods and apparatuses for implementing complexity-based video processing and corresponding power reduction in a display screen.
In a first example, a decoder for video processing is provided. The decoder includes a receiver configured to receive, from an encoder, a bitstream associated with a video. The decoder also includes a processor configured to: parse the bitstream to determine a percentage of at least one of a number of six tap filterings or a number of alpha point deblocking instances, in a specified period; determine a voltage and frequency to be used for decoding the video as a function of the percentage of the at least one of the number of six tap filterings or the number of alpha point deblocking instances, in the specified period; and decode the video at the determined voltage and frequency.
In a second example, an encoder for video processing is provided. The encoder includes a transmitter configured to transmit, to a decoder, a bitstream associated with a video. The encoder also includes a processor configured to code a video to have at least one variable of a number of six tap filterings or a number of alpha point deblocking instances, in a specified period. The processor is also configured to determine a percentage of the at least one of the number of six tap filterings or the number of alpha point deblocking instances, in the specified period. The processor is further configured to generate the bitstream containing the percentage of the at least one of the number of six tap filterings or the number of alpha point deblocking instances, in the specified period.
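The encoder-side steps in the second example (count the operations, express them as a percentage, and place the percentage in the bitstream) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names, the three-byte payload layout, and the round-to-nearest quantization are all hypothetical and are not the SEI syntax used by any standard.

```python
import struct


def quantize_percentage(count, worst_case):
    """Map count / worst_case, clamped to [0, 1], onto one byte (0..255)."""
    if worst_case <= 0:
        raise ValueError("worst_case must be positive")
    frac = min(max(count / worst_case, 0.0), 1.0)
    return int(frac * 255 + 0.5)  # round to nearest; this rule is assumed


def build_complexity_payload(period_type, num_stf, max_stf, num_apdi, max_apdi):
    """Pack period_type and the two percentage metrics, one byte each.

    The layout (period_type, six-tap percentage, deblocking percentage)
    is a hypothetical example, not a standardized message format.
    """
    return struct.pack(
        "BBB",
        period_type,
        quantize_percentage(num_stf, max_stf),
        quantize_percentage(num_apdi, max_apdi),
    )
```

For instance, `build_complexity_payload(0, 832, 1664, 64, 128)` describes a single-picture period in which half of the worst-case six-tap filterings and half of the worst-case alpha-point deblocking instances occur.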
In a third example, a method for video processing is provided. The method includes parsing, at a decoder, a bitstream associated with a video to determine a percentage of at least one of a number of six tap filterings or a number of alpha point deblocking instances, in a specified period. The method also includes determining, at the decoder, a voltage and frequency to be used for decoding the video according to the percentage of the at least one of the number of six tap filterings or the number of alpha point deblocking instances, in the specified period. The method further includes decoding, at the decoder, the video at the determined voltage and frequency.
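The decoder-side method in the third example can be illustrated with a short Python sketch. Everything specific here is an assumption made for illustration: the three-byte payload layout, the table of operating points, and the policy of taking the larger of the two percentages and choosing the slowest operating point that covers it. The text only requires that voltage and frequency be determined from the percentage; it does not prescribe this mapping.

```python
import struct

# Hypothetical operating points as (frequency in MHz, voltage in volts),
# sorted from slowest to fastest. Real values depend on the hardware.
OPERATING_POINTS = [(300, 0.80), (600, 0.90), (900, 1.00), (1200, 1.10)]


def parse_complexity_payload(payload):
    """Read period_type and two one-byte percentages from a payload.

    The three-byte layout is a hypothetical example, not real SEI syntax.
    """
    period_type, pct_stf, pct_apdi = struct.unpack("BBB", payload[:3])
    return period_type, pct_stf / 255, pct_apdi / 255


def select_operating_point(frac_stf, frac_apdi, points=OPERATING_POINTS):
    """Pick the slowest operating point that covers the signalled load.

    The load is taken as the larger of the two fractions and scaled
    against the fastest available frequency (an assumed policy).
    """
    load = max(frac_stf, frac_apdi)
    needed_mhz = load * points[-1][0]
    for freq_mhz, volts in points:
        if freq_mhz >= needed_mhz:
            return freq_mhz, volts
    return points[-1]
```

A decoder loop would call `parse_complexity_payload` at the start of each period, apply the result of `select_operating_point` through the platform's DVFS interface, and then decode the pictures in that period at the chosen setting.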
Other technical features may be readily apparent to one skilled in the art from the following figures, descriptions, and claims.
Before undertaking the DETAILED DESCRIPTION below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document. The term “couple” and its derivatives refer to any direct or indirect communication between two or more elements, whether or not those elements are in physical contact with one another. The terms “transmit,” “receive,” and “communicate,” as well as derivatives thereof, encompass both direct and indirect communication unless explicitly specified. The terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation. The term “or” is inclusive, meaning “and/or.” The phrase “associated with,” as well as derivatives thereof, means to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, have a relationship to or with, or the like. The term “controller” means any device, system or part thereof that controls at least one operation. Such a controller may be implemented in hardware or a combination of hardware and software and/or firmware. The functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. The phrase “at least one of,” when used with a list of items, means that different combinations of one or more of the listed items may be used, and only one item in the list may be needed. For example, “at least one of: A, B, and C” includes any of the following combinations: A, B, C, A and B, A and C, B and C, and A and B and C.
Moreover, various functions described below can be implemented or supported by one or more computer programs, each of which is formed from computer readable program code and embodied in a computer readable medium. The terms “application” and “program” refer to one or more computer programs, software components, sets of instructions, procedures, functions, objects, classes, instances, related data, or a portion thereof adapted for implementation in a suitable computer readable program code. The phrase “computer readable program code” includes any type of computer code, including source code, object code, and executable code. The phrase “computer readable medium” includes any type of medium capable of being accessed by a computer, such as read only memory (ROM), random access memory (RAM), a hard disk drive, a compact disc (CD), a digital video disc (DVD), or any other type of memory. A “non-transitory” computer readable medium excludes wired, wireless, optical, or other communication links that transport transitory electrical signals or other signals. A non-transitory computer readable medium includes media where data can be permanently stored and media where data can be stored and later overwritten, such as a rewritable optical disc or an erasable memory device.
Definitions for other certain words and phrases are provided throughout this patent document. Those of ordinary skill in the art should understand that in many if not most instances, such definitions apply to prior uses as well as future uses of such defined words and phrases.
BRIEF DESCRIPTION OF THE DRAWINGS
For a more complete understanding of this disclosure and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which like reference numerals represent like parts:
FIG. 1A is a high level diagram illustrating an example network within which devices may implement complexity-based video processing according to this disclosure;
FIG. 1B is a front view of an example user device from the network of FIG. 1A within which complexity-based video processing can be implemented according to this disclosure;
FIG. 1C is a high level block diagram of the functional components in the example user device of FIG. 1A according to this disclosure;
FIG. 2A is a high level block diagram of an example content server from the network of FIG. 1A within which complexity-based video processing can be implemented according to this disclosure;
FIG. 2B is an example functional architecture to implement complexity-based video processing according to this disclosure;
FIG. 3 illustrates the quarter-sample interpolation of the 4×4-block consisting of samples G, H, I, J, M, N, P, Q, R, S, V, W, T, U, X, Y according to this disclosure; and
FIG. 4 illustrates a 16×16 luma block, in which upper-case roman numerals reference columns of samples and lower-case roman numerals reference rows of samples according to this disclosure.
DETAILED DESCRIPTION
FIGS. 1A through 4, discussed below, and the various embodiments used to describe the principles of this disclosure in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the disclosure. Those skilled in the art will understand that the principles of this disclosure may be implemented in any suitably arranged wired or wireless communication system, such as with a battery-powered smartphone, laptop, or other device having a wired or wireless network connection.
The following documents and standards descriptions are hereby incorporated into the present disclosure as if fully set forth herein: (1) ISO/IEC 23001-11 International Standard, “MPEG Green Metadata”; (2) ISO/IEC 14496-2 International Standard, “MPEG-4 Simple Profile”; (3) ITU-T H.264 or ISO/IEC 14496-10 International Standard, MPEG-4 AVC; and (4) U.S. patent application Ser. No. 14/091,238, “DYNAMIC VOLTAGE/FREQUENCY SCALING FOR VIDEO PROCESSING USING EMBEDDED COMPLEXITY METRICS,” filed on Nov. 26, 2013.
In embodiments of this disclosure, metadata used for display adaptation is embedded within a video stream or other video content information using a Supplemental Enhancement Information (SEI) message, which is parsed at a decoder to help with display power reduction. In other embodiments, the metadata can be delivered out-of-band using a transport mechanism, storage medium, or the like. Elements in an extended SEI message can be derived at the encoder during video encoding.
FIG. 1A is a high-level diagram illustrating an example network 100 within which devices may implement complexity-based video processing according to this disclosure. As shown in FIG. 1A, the network 100 includes a content encoder 101, which can include a data processing system having an encoder controller configured to encode video content. The content encoder 101 can be communicably coupled to (or alternatively integrated with) a content server 102, which can include a data processing system configured to deliver video content to user devices. The content server 102 can be coupled by a communications network, such as the Internet 103 and a wireless communication system including a base station (BS) 104, for delivery of the video content to a user device 105. The user device 105 can also be referred to as a user equipment (UE) or a mobile station (MS). As noted above, the user device 105 can be a “smart” phone, tablet, or other device capable of functions other than wireless voice communications, including at least playing video content. Alternatively, the user device 105 can be a laptop computer or other wired or wireless device, such as any device that is primarily battery-powered during at least periods of typical operation.
FIG. 1B is a front view of an example user device 105 from the network 100 of FIG. 1A within which complexity-based video processing can be implemented according to this disclosure. FIG. 1C is a high level block diagram of the functional components in the example user device 105 of FIG. 1A according to this disclosure. The user device 105 in this example represents a mobile phone or smartphone and includes a display 106. A processor 107 coupled to the display 106 can control content that is presented on the display 106. The processor 107 and other components within the user device 105 can be powered by a battery or other power source that can be recharged by an external power source or can be powered by an external power source. A memory 108 coupled to the processor 107 can store or buffer video content for playback by the processor 107 and presentation on the display 106 and can also store a video player application (or “app”) 109 for performing such video playback. The video content being played can be received, either contemporaneously (such as overlapping in time) with the playback of the video content or prior to the playback, via a transceiver 110 connected to an antenna 111. As described above, the video content can be received in wireless communications from a base station 104.
FIG. 2A is a high level block diagram of an example content server 102 from the network 100 of FIG. 1A within which complexity-based video processing can be implemented according to this disclosure. As shown in FIG. 2A, the server 200 includes a bus system 205, which can be configured to support communication between at least one processing device 210, at least one storage device 215, at least one communications unit 220, and at least one input/output (I/O) unit 225.
The processing device 210 is configured to execute instructions that can be loaded into a memory 230. The server 200 can include any suitable number(s) and type(s) of processing devices 210 in any suitable arrangement. Example processing devices 210 can include microprocessors, microcontrollers, digital signal processors, field programmable gate arrays, application specific integrated circuits, and discrete circuitry. The processing device(s) 210 can be configured to execute processes and programs resident in the memory 230, such as operations for generating display adaptation metadata and complexity information.
The memory 230 and a persistent storage 235 are examples of storage devices 215, which represent any structure(s) capable of storing and facilitating retrieval of information (such as data, program code, or other suitable video information on a temporary or permanent basis). The memory 230 can represent a random access memory or any other suitable volatile or non-volatile storage device(s). The persistent storage 235 can contain one or more components or devices supporting longer-term storage of data, such as a read only memory, hard drive, Flash memory, or optical disc.
The communications unit 220 is configured to support communications with other systems or devices. For example, the communications unit 220 can include a network interface card or a wireless transceiver facilitating communications over the network 103. The communications unit 220 can be configured to support communications through any suitable physical or wireless communication link(s).
The I/O unit 225 is configured to allow for input and output of data. For example, the I/O unit 225 can be configured to provide a connection for user input through a keyboard, mouse, keypad, touchscreen, or other suitable input device. The I/O unit 225 can also be configured to send output to a display, printer, or other suitable output device. In some embodiments, the I/O unit 225 can be configured to allow the input or output of complexity information embedded within SEI message(s).
Note that while FIG. 2A is described as representing the server 102 of FIG. 1A, the same or similar structure can be used in one or more different user devices. For example, a laptop or desktop computer can have the same or similar structure as that shown in FIG. 2A.
FIG. 2B is an example functional architecture to implement complexity-based video processing according to this disclosure. Generally, DA provides Green Metadata having complexity metrics. As illustrated in FIG. 2B, the functional architecture 300 can include a transmitter 310 and a receiver 350. The transmitter 310 can include a media pre-processor 312, a first green metadata generator 314, a video encoder 316, a second green metadata generator 318, and a power optimizer module 320. The receiver 350 can include a media decoder 352, a presentation subsystem 354, a green metadata extractor 356, and a power optimizer module 358.
The MPEG-4 Simple Profile Standard provides some complexity metrics in Clause 6.3.5.1 (of ISO/IEC 14496-2 International Standard, “MPEG-4 Simple Profile”). Although these metrics are efficiently represented, they cannot be applied to complexity-based video processing in the widely-used AVC standard.
Power consumption is an increasingly critical issue for video-capable mobile devices, where video processing requires a significant amount of energy for video encoding, decoding and associated memory transfers. Recent advances in circuit design have demonstrated that power consumption can be reduced if circuits are placed into low-power states which use slower clock rates and lower supply voltages. To exploit these low-power states, complexity metrics that indicate decoding complexity are embedded in the bitstream and they are used to set the optimum low-power state of the decoding circuitry. This is the Codec Dynamic Voltage/Frequency Scaling (C-DVFS) decoder-power reduction technique.
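As an illustration of how a decoder might use an embedded complexity metric to pick a low-power state, the following Python sketch maps a one-byte metric (0 to 255, as in the byte representation described in this document) to a voltage/frequency pair. The operating-point table, its values, and the names OperatingPoint and select_operating_point are hypothetical and are not taken from this disclosure or from any standard; a real device would use its own characterized operating points and selection policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    voltage_v: float
    frequency_mhz: int


# Hypothetical operating points, ordered from lowest to highest power.
OPERATING_POINTS = (
    OperatingPoint(0.80, 200),
    OperatingPoint(0.90, 400),
    OperatingPoint(1.00, 600),
    OperatingPoint(1.10, 800),
)


def select_operating_point(percent_metric: int) -> OperatingPoint:
    """Map a byte-sized complexity metric (0..255) to an operating point."""
    if not 0 <= percent_metric <= 255:
        raise ValueError("metric must fit in one byte")
    # Split the 0..255 range into equal bands, one band per operating point.
    index = percent_metric * len(OPERATING_POINTS) // 256
    return OPERATING_POINTS[index]
```

The equal-band mapping is only one possible policy; it is chosen here because it keeps the sketch short and makes the relationship between higher complexity and a faster clock easy to see.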
Other systems fail to provide efficient complexity metrics that apply C-DVFS to widely-used decoders such as H.264/MPEG AVC. Depending on the applicability period, the prior art uses up to 32 bits for each AVC complexity metric.
Hence, there is a need for efficient complexity metrics that apply C-DVFS to widely-used decoders.
H.264/MPEG AVC is a decoding technology that is widely used in the industry. Certain embodiments provide methods to compute efficient complexity metrics for widely used decoders such as AVC. By analyzing the worst-case characteristics of the computationally intensive interpolation and deblocking modules, our methods pack each complexity metric into 8 bits, independent of the applicability period.
The MPEG Green Metadata International Standard (IS) text provides the following four complexity metrics for C-DVFS:
1. num_six_tap_filterings (32 bits)—indicates the number of 6-tap filterings in the specified period, as defined in ISO/IEC 14496-10 which is incorporated by reference into this patent document in its entirety. Each half-pel interpolation requires a 6-tap filtering operation and each quarter-pel interpolation requires either one or two 6-tap filtering operations.
2. num_alpha_point_deblocking_instances (32 bits)—indicates the number of alpha-point deblocking instances in the specified period. Using the notation in ISO/IEC 14496-10 an alpha-point deblocking instance is defined as a single filtering operation that produces either a single, filtered output p′0 or a single, filtered output q′0 where p′0 and q′0 are filtered samples across a 4×4 block edge. Therefore the number of alpha point deblocking instances is the total number of filtering operations applied to produce filtered samples of the type p′0 or q′0.
3. num_non_zero_macroblocks—indicates the number of non-zero macroblocks in the specified period.
4. num_intra_coded_macroblocks—indicates the number of intra-coded macroblocks in the specified period.
Note that there are four types of periods over which the metrics are applicable as defined by the period_type in the IS text, where the period_type specifies the type of upcoming period over which the four complexity metrics are applicable. For period_type=0, 1, 2, 3, the complexity metrics will be respectively applicable over a single picture, all pictures up to (but not including) the picture containing the next I-slice, a specified time interval (in seconds) or a specified number of pictures. When the period_type is 2 or 3, then the period_type signals the duration of a scene over which the complexity metrics are applicable.
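The four period types can be summarized in a short Python sketch. The numeric values 0 through 3 come from the text above; the enumeration member names and the helper signals_scene_duration are illustrative assumptions and are not identifiers from the IS text.

```python
from enum import IntEnum


class PeriodType(IntEnum):
    # Member names are illustrative; only the values 0..3 come from the text.
    SINGLE_PICTURE = 0
    UP_TO_NEXT_I_SLICE = 1
    TIME_INTERVAL_SECONDS = 2
    NUMBER_OF_PICTURES = 3


def signals_scene_duration(period_type: int) -> bool:
    """Return True when the period type is 2 or 3 (scene duration)."""
    return PeriodType(period_type) in (
        PeriodType.TIME_INTERVAL_SECONDS,
        PeriodType.NUMBER_OF_PICTURES,
    )
```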
To provide an efficient representation for the four complexity metrics, the worst-case characteristics of each metric are analyzed and then the metric is normalized by the largest occurrence in the worst-case. The resulting fraction that lies in the [0,1] interval is packed into a byte.
Byte Representation for the Six-Tap Filterings Metric
Embodiments of this disclosure introduce the percentage of six-tap filterings having a size of a single byte, which allows an efficient representation. The percentage of six-tap filterings is defined as follows:
percent_six_tap_filterings=Floor[(num_six_tap_filterings/max_num_six_tap_filterings)*255]   (1)
with max_num_six_tap_filterings defined as
max_num_six_tap_filterings=Σ_{i=1}^{num_pics_per_period} max_num_six_tap_filterings_pic(i)   (2)
where: num_pics_per_period=the number of pictures in the specified period; and max_num_six_tap_filterings_pic(i)=the maximum number of six-tap filterings in the i-th picture within the specified period; and Floor(x) is the greatest integer less than or equal to x.
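Equations (1) and (2) can be sketched in Python as follows. The function name pack_percent and its parameter names are assumptions made for illustration. Integer division is used because, for non-negative integers, (num*255)//max equals Floor[(num/max)*255] without floating-point rounding.

```python
from typing import Sequence


def pack_percent(num_in_period: int, max_per_picture: Sequence[int]) -> int:
    """Normalize a count by its worst-case maximum and pack it into a byte.

    max_per_picture holds the worst-case count for each picture in the
    period; their sum is the denominator of Equation (1).
    """
    max_in_period = sum(max_per_picture)  # Equation (2)
    if max_in_period <= 0:
        raise ValueError("worst-case maximum must be positive")
    if not 0 <= num_in_period <= max_in_period:
        raise ValueError("count must lie between 0 and the worst-case maximum")
    # Equation (1): Floor[(num / max) * 255], computed exactly with integers.
    return (num_in_period * 255) // max_in_period
```

For example, a count equal to half of the worst-case maximum packs to 127, and a count equal to the maximum packs to 255.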
To determine max_num_six_tap_filterings_pic(i), denote the width and height of the reference picture luma array by PicWidthInSamples_L and refPicHeightEffectiveL respectively. At the decoder, in the worst case, the largest number of six-tap filterings (STFs) occurs in a picture when all partitions consist of 4×4 blocks that will be interpolated. The 4×4 blocks produce the largest number of STFs because the overhead from interpolating samples that are outside the block is larger for 4×4 blocks than for 8×8 blocks as explained below.
FIG. 3 illustrates the quarter-sample interpolation of the 4×4-block 360 consisting of samples G, H, I, J, M, N, P, Q, R, S, V, W, T, U, X, Y according to this disclosure. The embodiment of the quarter-sample interpolation of the 4×4-block 360 is for illustration only. Other embodiments could be used without departing from the scope of the present disclosure.
In FIG. 3, upper-case letters represent integer samples and lower-case letters represent fractional sample positions. Subscripts are used to indicate the integer sample that is associated with a fractional sample position. The worst-case largest number of STFs is analyzed for the interpolation of the 4×4-block consisting of samples G, H, I, J, M, N, P, Q, R, S, V, W, T, U, X, Y. This interpolation must be performed when a motion vector (MV) points to one of the following fractional-sample positions: aG, bG, cG, dG, eG, fG, gG, hG, iG, jG, kG, nG, pG, qG, rG. If the MV points to aG, then aG must be computed and the 15 points (aH, aI, . . . ) that have the same respective relative locations to H, I, J, M, N, P, Q, R, S, V, W, T, U, X, Y that aG has to G must be computed.
Similarly, sixteen points need to be computed for each of the other fractional-sample positions (bG, cG, . . . , rG) that the MV could point to. To determine the worst-case largest number of STFs for the interpolation of the 4×4 block, the STFs required for each fractional-sample position that the MV could point to are counted.
Case 1. If the MV points to bG, then to interpolate bG, one STF is applied to E, F, G, H, I, J, which are already available as integer samples. So we need 16 STFs to compute bG, . . . , bY for the 4×4 block.
Case 2. If the MV points to hG, then to interpolate hG, one STF is applied to A, C, G, M, R, T, which are already available as integer samples. As such, sixteen STFs are needed to compute hG, . . . , hY for the 4×4 block.
Case 3. If the MV points to jG, then to interpolate jG, six STFs are needed to compute aa, bb, bG, sM, gg, hh because these are unavailable. Next one STF is needed to compute jG from aa, bb, bG, sM, gg, hh. So we need 7 STFs for jG:
a. To get jM, the samples bb, bG, sM, gg, hh, ii are needed. Only ii is unavailable. As such, two STFs are needed for jM (one for ii and one for jM);
b. To get jR, two STFs are needed (one for jj and one for jR); and
c. To get jT, two STFs are needed (one for kk and one for jT).
Therefore, for jG, jM, jR and jT, 7+2+2+2=13 STFs are needed. Since the computation is identical for each of the four columns GMRT, HNSU, IPVX and JQWY, 13*4=52 STFs are needed to compute jG, . . . jY for the 4×4 block.
Case 4. If the MV points to aG, then to interpolate aG, one STF is needed to get bG (from case 1) and therefore sixteen STFs are needed to compute aG, . . . , aY for the 4×4 block.
Case 5. If the MV points to cG, then to interpolate cG, one STF is needed to get bG (from Case 1) and therefore sixteen STFs are needed to compute cG, . . . , cY for the 4×4 block.
Case 6. If the MV points to dG, then to interpolate dG, one STF is needed to get hG (from Case 2) and therefore sixteen STFs are needed to compute dG, . . . , dY for the 4×4 block.
Case 7. If the MV points to nG, then to interpolate nG, one STF is needed to get hG (from Case 2) and therefore sixteen STFs are needed to compute nG, . . . , nY for the 4×4 block.
Case 8. If the MV points to fG, then to interpolate fG, seven STFs are needed to get jG (from Case 3). Note that bG is included in these 7 STFs. Therefore, from Case 3, 52 STFs are required to compute fG, . . . fY for the 4×4 block.
Case 9. If the MV points to iG, then to interpolate iG, seven STFs are needed to get jG. Note that hG is computed by one of these seven STFs. Therefore, fifty-two STFs are required to compute iG, . . . iY for the 4×4 block. For this analysis, the row jG, jH, jI, jJ is computed first (in order to obtain hG) and then this process is repeated for the other three rows (MNPQ, RSVW, TUXY) in the 4×4 block. Previously, in Case 3 Column GMRT was analyzed and then repeated for the other three columns (HNSU, IPVX, JQWY).
Case 10. If the MV points to kG, then to interpolate kG, seven STFs are needed to get jG. Note that mG is computed by one of these seven STFs. Therefore, fifty-two STFs are required to compute kG, . . . kY for the 4×4 block.
Case 11. If the MV points to qG, then to interpolate qG, seven STFs are needed to get jG. Note that sG is computed by one of these seven STFs. Therefore, fifty-two STFs are required to compute qG, . . . qY for the 4×4 block.
Case 12. If the MV points to eG, then to interpolate eG, two STFs are needed to get bG and hG (from Case 1, Case 2). Therefore thirty-two STFs are needed to compute eG, . . . , eY for the 4×4 block.
Case 13. If the MV points to gG, then to interpolate gG, two STFs are needed to get bG and mH. Therefore, thirty-two STFs are needed to compute gG, . . . , gY for the 4×4 block.
Case 14. If the MV points to pG, then to interpolate pG, two STFs are needed to get hG and sG. Therefore, thirty-two STFs are needed to compute pG, . . . , pY for the 4×4 block.
Case 15. If the MV points to rG, then to interpolate rG, two STFs are needed to get mG and sG. Therefore, thirty-two STFs are needed to compute rG, . . . , rY for the 4×4 block.
From Cases 1 through 15, the worst-case largest number of STFs is fifty-two, when the MV points to jG, fG, iG, kG or qG. Since the overhead of filtering samples outside the block is smaller for larger block sizes, the worst-case number of STFs occurs when all partitions are 4×4 blocks and two MVs are used for each block (one from each refPicList). A reference picture list (refPicList) specifies the reference pictures, as defined in the H.264 or ISO/IEC 14496 AVC specification, both of which are incorporated herein by reference.
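The per-case counts from Cases 1 to 15 can be tabulated and checked with a short Python sketch. The dictionary keys are the fractional-sample position letters used above (without the G subscript); the variable names are illustrative assumptions.

```python
# STFs needed to interpolate a whole 4x4 block, keyed by the fractional
# position the MV points to, as counted in Cases 1 to 15.
STFS_PER_4X4_BLOCK = {
    "b": 16, "h": 16,                           # Cases 1-2: half-pel, one STF per sample
    "j": 52,                                    # Case 3: centre half-pel position
    "a": 16, "c": 16, "d": 16, "n": 16,         # Cases 4-7: quarter-pel next to b or h
    "f": 52, "i": 52, "k": 52, "q": 52,         # Cases 8-11: quarter-pel next to j
    "e": 32, "g": 32, "p": 32, "r": 32,         # Cases 12-15: diagonal quarter-pel
}

# Case 3 arithmetic: 7 STFs for the first sample in a column, then 2 for
# each of the remaining three samples, repeated for the four columns.
stfs_per_column_case3 = 7 + 2 + 2 + 2
stfs_case3 = stfs_per_column_case3 * 4

worst_case_stfs = max(STFS_PER_4X4_BLOCK.values())
worst_case_positions = sorted(
    pos for pos, count in STFS_PER_4X4_BLOCK.items() if count == worst_case_stfs
)
```

Evaluating the sketch gives a worst case of 52 STFs, reached at positions f, i, j, k and q, which matches the conclusion of the case analysis.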
In this case, the worst-case largest number of STFs in a picture is:
max_num_six_tap_filterings_pic(i)=(worst case #STFs in a 4×4 block)*(#refPicLists)*(#MBs in the picture)*(#4×4 luma blocks per MB)=52*2*PicSizeInMbs*16=1664*PicSizeInMbs  (3)
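A worked instance of Equation (3) is sketched below in Python. The helper pic_size_in_mbs derives a macroblock count from pixel dimensions by rounding each dimension up to a multiple of 16; that helper, and the 1920×1080 example, are assumptions added for illustration and simplify how AVC actually derives PicSizeInMbs.

```python
WORST_CASE_STFS_PER_4X4_BLOCK = 52
NUM_REF_PIC_LISTS = 2
LUMA_4X4_BLOCKS_PER_MB = 16
MB_SIZE = 16  # a macroblock covers 16x16 luma samples


def pic_size_in_mbs(width: int, height: int) -> int:
    """Illustrative macroblock count: round each dimension up to 16."""
    mbs_wide = (width + MB_SIZE - 1) // MB_SIZE
    mbs_high = (height + MB_SIZE - 1) // MB_SIZE
    return mbs_wide * mbs_high


def max_num_six_tap_filterings_pic(num_mbs: int) -> int:
    """Equation (3): 52 * 2 * PicSizeInMbs * 16 = 1664 * PicSizeInMbs."""
    return (
        WORST_CASE_STFS_PER_4X4_BLOCK
        * NUM_REF_PIC_LISTS
        * num_mbs
        * LUMA_4X4_BLOCKS_PER_MB
    )


# Example: a 1920x1080 picture is 120 x 68 = 8160 macroblocks under the
# rounding assumption above, giving 1664 * 8160 = 13,578,240 STFs.
example_mbs = pic_size_in_mbs(1920, 1080)
example_max = max_num_six_tap_filterings_pic(example_mbs)
```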
The preceding analysis assumes that a processing unit performs a single six-tap filtering. However, in certain embodiments in which a processing unit performs N six-tap filtering operations simultaneously, where N>1, then the worst-case largest number of STFs in a picture is of the order of (1664/N)*PicSizeInMbs. In such embodiments, num_six_tap_filterings is also reduced by a factor of N.
The preceding analysis also assumes an efficient implementation in which filtering is not repeated. For example in Case 3a, the samples bb, bG, sM, gg, hh are not re-computed but are re-used from a prior filtering operation. In other embodiments, filterings may be repeated because it is simpler to re-filter rather than to access a stored value. In such embodiments, the worst-case largest number of STFs in a picture is of the order of 1664α*PicSizeInMbs, where α>1. In such embodiments, num_six_tap_filterings is also increased by a factor of α.
Byte Representation for the Alpha-Point Deblocking Instances Metric
Embodiments of this disclosure introduce the percentage of alpha-point deblocking instances that allows an efficient representation. In one embodiment, in order to satisfy a size of a single byte, the percentage of alpha-point deblocking instances is defined as follows:
percent_alpha_point_deblocking_instances=Floor[(num_alpha_point_deblocking_instances/max_num_alpha_point_deblocking_instances)*255]  (4)
with max_num_alpha_point_deblocking_instances defined as
Σ_{i=1}^{num_pics_per_period} max_num_alpha_point_deblocking_instances_pic(i)  (5)
where: num_pics_per_period=the number of pictures in the specified period; and max_num_alpha_point_deblocking_instances_pic(i)=the maximum number of alpha-point deblocking instances in the ith picture within the specified period.
To determine max_num_alpha_point_deblocking_instances_pic(i), the worst-case, largest number of Alpha-Point Deblocking Instances (APDIs) that can occur when deblocking the picture at the decoder must be determined.
Consider a macroblock containing a 16×16 luma block in which the samples have been numbered in raster-scan order, as shown in FIG. 4. Upper-case roman numerals are used to reference columns of samples and lower-case roman numerals are used to reference rows of samples. For example, Column IV refers to the column of Samples 4, 20, . . . 244 and Row xiii refers to the row of samples 193, 194, . . . , 208. Edges are indicated by an ordered pair that specifies the columns or rows on either side of the edge. For example, Edge (IV, V) refers to the vertical edge between Columns IV and V. Similarly, Edge (xii, xiii) indicates the horizontal edge between Rows xii and xiii. Note that the leftmost vertical edge and the topmost horizontal edge are denoted by (0, I) and (0, i) respectively.
The maximum number of APDIs occurs when the 4×4 transform is used on each block and a single APDI occurs in every set of eight samples across a 4×4 block horizontal or vertical edge denoted as pi and qi with i=0, . . . , 3 (as shown in FIG. 8-11 of the ITU-T H.264 or ISO/IEC 14496-10 International Standard, MPEG-4 AVC spec).
FIG. 4 illustrates a 16×16 luma block 400, in which upper-case roman numerals reference columns of samples and lower-case roman numerals reference rows of samples according to this disclosure. The embodiment of the 16×16 luma block 400 shown in FIG. 4 is for illustration only. Other embodiments could be used without departing from the scope of the present disclosure.
For the macroblock in FIG. 4, the Vertical Edges (0, I), (IV, V), (VIII, IX) and (XII, XIII) are filtered first. Then, the Horizontal Edges (0, i), (iv, v), (viii, ix) and (xii, xiii) are filtered. Now, when Vertical Edge (0, I) is filtered, in the worst-case, an APDI will occur on each row of the edge because the q0 Samples 1, 17, . . . 241 will all be APDIs. Therefore, 16 APDIs will occur in Vertical Edge (0, I). Similarly, when Vertical Edge (IV, V) is filtered, there will also be 16 APDIs corresponding to the 16 (p0, q0) sample pairs (4, 5), (20, 21), . . . (244, 245). Thus, there will be 16*4=64 APDIs from vertical-edge filtering. After horizontal-edge filtering, there will be an additional 64 APDIs because each horizontal edge will contribute 16 APDIs. For example, Horizontal Edge (viii, ix) will contribute the 16 APDIs corresponding to the (p0, q0) sample pairs (113, 129), (114, 130), . . . , (128, 144). Hence, in the worst-case, deblocking the luma block in a macroblock produces 128 APDIs.
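The sample numbering and the luma count of 128 can be reproduced with a short Python sketch. The function names are illustrative assumptions; rows and columns are 1-based to match the roman-numeral labels of FIG. 4.

```python
BLOCK_SIZE = 16  # the luma block is 16x16 samples


def sample_number(row: int, col: int) -> int:
    """Raster-scan sample number (1..256) for a 1-based row and column."""
    return (row - 1) * BLOCK_SIZE + col


def vertical_edge_pairs(left_col: int):
    """(p0, q0) pairs across the edge between left_col and left_col + 1."""
    return [
        (sample_number(row, left_col), sample_number(row, left_col + 1))
        for row in range(1, BLOCK_SIZE + 1)
    ]


def horizontal_edge_pairs(upper_row: int):
    """(p0, q0) pairs across the edge between upper_row and upper_row + 1."""
    return [
        (sample_number(upper_row, col), sample_number(upper_row + 1, col))
        for col in range(1, BLOCK_SIZE + 1)
    ]


# Four vertical and four horizontal edges, each contributing 16 APDIs
# in the worst case, as counted in the text.
EDGES_PER_DIRECTION = 4
worst_case_luma_apdis = 2 * EDGES_PER_DIRECTION * BLOCK_SIZE
```

Here vertical_edge_pairs(4) lists the pairs across Edge (IV, V) and horizontal_edge_pairs(8) lists the pairs across Edge (viii, ix).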
Next, the two chroma blocks corresponding to the luma block in the macroblock are considered. The worst-case number of APDIs is determined by the chroma sampling relative to the luma sampling.
For each chroma block in YUV 4:2:0 format, two vertical edges and two horizontal edges are filtered. Each edge contributes 8 APDIs, in the worst-case. So, 8*4*2=64 APDIs are produced by worst-case deblocking of the two chroma blocks.
For YUV 4:2:2 format, two vertical edges and four horizontal edges are filtered. Each vertical edge contributes 16 APDIs and each horizontal edge contributes 8 APDIs. So, 2*(2*16+4*8)=128 APDIs are produced by worst-case deblocking of the two chroma blocks.
For YUV 4:4:4 format, the worst-case analysis for each chroma block is identical to that of the 16×16 luma block. Therefore, 256 APDIs are produced by worst-case deblocking of the two chroma blocks.
Finally, for separate color planes, the worst-case analysis of a 16×16 block is identical to that of a 16×16 luma block.
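The chroma counts above can be cross-checked with a small Python sketch. It assumes that each block has one filtered edge per four samples in each direction and that each edge contributes one APDI per sample along its length; this rule is an assumption chosen because it reproduces the edge counts stated in the text, not a restatement of the AVC deblocking process. The chroma block dimensions (8×8, 8×16, 16×16) follow from the chroma sampling formats.

```python
def worst_case_apdis_per_block(width: int, height: int, edge_spacing: int = 4) -> int:
    """Worst-case APDIs for one block under the one-edge-per-4-samples rule."""
    vertical_edges = width // edge_spacing     # each runs the full height
    horizontal_edges = height // edge_spacing  # each runs the full width
    return vertical_edges * height + horizontal_edges * width


# Chroma block size (width, height) per macroblock for each sampling format.
CHROMA_BLOCK_SIZE = {
    "4:2:0": (8, 8),
    "4:2:2": (8, 16),
    "4:4:4": (16, 16),
}


def worst_case_chroma_apdis(chroma_format: str) -> int:
    """Worst-case APDIs for the two chroma blocks of one macroblock."""
    width, height = CHROMA_BLOCK_SIZE[chroma_format]
    return 2 * worst_case_apdis_per_block(width, height)
```

Under these assumptions the sketch yields 64, 128 and 256 APDIs for the 4:2:0, 4:2:2 and 4:4:4 formats respectively, and 128 for a 16×16 luma block.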
To conclude, since each picture has PicSizeInMbs macroblocks, the worst-case (maximum) number of APDIs per picture is as follows:
max_num_alpha_point_deblocking_instances_pic(i)
= PicSizeInMbs*(128+64) = 192*PicSizeInMbs, for YUV 4:2:0 format;
= PicSizeInMbs*(128+128) = 256*PicSizeInMbs, for YUV 4:2:2 format;
= PicSizeInMbs*(128+256) = 384*PicSizeInMbs, for YUV 4:4:4 format; or
= 128*PicSizeInMbs, for a single color plane.
Equivalently,
max_num_alpha_point_deblocking_instances_pic(i)=128*chroma_format_multiplier*PicSizeInMbs
where chroma_format_multiplier depends on the AVC variables separate_colour_plane_flag and chroma_format_idc as shown in the following table.
chroma_format_multiplier | separate_colour_plane_flag | chroma_format_idc | Comment
1 | 1 | any value | separate colour plane
1 | 0 | 0 | monochrome
1.5 | 0 | 1 | 4:2:0 sampling
2 | 0 | 2 | 4:2:2 sampling
3 | 0 | 3 | 4:4:4 sampling
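The table lookup and the formula 128*chroma_format_multiplier*PicSizeInMbs can be sketched in Python as follows. The function names mirror the variable names in the text but are otherwise illustrative assumptions. Fractions are used so that the multiplier 1.5 is handled exactly.

```python
from fractions import Fraction

# chroma_format_multiplier when separate_colour_plane_flag is 0,
# keyed by chroma_format_idc, as listed in the table above.
_MULTIPLIER_BY_IDC = {
    0: Fraction(1),     # monochrome
    1: Fraction(3, 2),  # 4:2:0 sampling
    2: Fraction(2),     # 4:2:2 sampling
    3: Fraction(3),     # 4:4:4 sampling
}


def chroma_format_multiplier(separate_colour_plane_flag: int,
                             chroma_format_idc: int) -> Fraction:
    """Look up the multiplier from the two AVC variables."""
    if separate_colour_plane_flag == 1:
        return Fraction(1)  # separate colour plane, any chroma_format_idc
    return _MULTIPLIER_BY_IDC[chroma_format_idc]


def max_num_alpha_point_deblocking_instances_pic(
        pic_size_in_mbs: int,
        separate_colour_plane_flag: int,
        chroma_format_idc: int) -> int:
    """128 * chroma_format_multiplier * PicSizeInMbs."""
    multiplier = chroma_format_multiplier(
        separate_colour_plane_flag, chroma_format_idc)
    # 128 times any multiplier in the table is a whole number.
    return int(128 * multiplier * pic_size_in_mbs)
```

For a picture of 8160 macroblocks with 4:2:0 sampling, this gives 192*8160 = 1,566,720 APDIs.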

The preceding analysis assumes that a processing unit performs a single APDI. However, in certain embodiments in which a processing unit performs N APDIs simultaneously, where N>1, then the worst-case largest number of APDIs in a picture is reduced by a factor of N:
max_num_alpha_point_deblocking_instances_pic(i)
= PicSizeInMbs*(128+64)/N = 192/N*PicSizeInMbs, for YUV 4:2:0 format;
= PicSizeInMbs*(128+128)/N = 256/N*PicSizeInMbs, for YUV 4:2:2 format;
= PicSizeInMbs*(128+256)/N = 384/N*PicSizeInMbs, for YUV 4:4:4 format;
= 128/N*PicSizeInMbs, for a single color plane.
In such embodiments, num_alpha_point_deblocking_instances is also reduced by a factor of N.
Byte Representation for the Non-Zero Macroblocks Metric
Embodiments of this disclosure introduce the percentage of non-zero macroblocks that allows an efficient representation. In one embodiment, in order to satisfy a size of a single byte, the percentage of non-zero macroblocks is defined as follows:
percent_non_zero_macroblocks=(num_non_zero_macroblocks/max_num_non_zero_macroblocks)*255  (7)
with max_num_non_zero_macroblocks defined as
Σ_{i=1}^{num_pics_per_period} max_num_non_zero_macroblocks_pic(i)  (8)
where: num_pics_per_period=the number of pictures in the specified period; and max_num_non_zero_macroblocks_pic(i)=PicSizeInMbs for the ith picture within the specified period.
Byte Representation for the Intra-Coded Macroblocks Metric
Embodiments of this disclosure introduce the percentage of intra-coded macroblocks that allows an efficient representation. In one embodiment, in order to satisfy a size of a single byte, the percentage of intra-coded macroblocks is defined as follows:
percent_intra_coded_macroblocks=(num_intra_coded_macroblocks/max_num_intra_coded_macroblocks)*255  (9)
with max_num_intra_coded_macroblocks defined as:
Σ_{i=1}^{num_pics_per_period} max_num_intra_coded_macroblocks_pic(i)  (10)
where: num_pics_per_period=the number of pictures in the specified period; and max_num_intra_coded_macroblocks_pic(i)=PicSizeInMbs for the ith picture within the specified period.
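The two macroblock metrics share the same normalization, so a single Python sketch covers both. Equations (7) and (9) do not state a rounding rule; the sketch assumes the same Floor operation as Equations (1) and (4) so that the result fits in one byte. The function name and the example numbers are illustrative assumptions.

```python
from typing import Sequence


def percent_macroblocks(num_macroblocks: int,
                        pic_size_in_mbs_per_picture: Sequence[int]) -> int:
    """Byte-sized macroblock metric for a period.

    The denominator is the sum of PicSizeInMbs over the pictures in the
    period, as in Equations (8) and (10). Floor rounding is an assumption.
    """
    max_macroblocks = sum(pic_size_in_mbs_per_picture)
    if max_macroblocks <= 0:
        raise ValueError("period must contain at least one macroblock")
    if not 0 <= num_macroblocks <= max_macroblocks:
        raise ValueError("count exceeds the macroblocks in the period")
    return (num_macroblocks * 255) // max_macroblocks


# Example: two pictures of 8160 macroblocks each, 4080 of which are
# intra-coded. 4080 / 16320 = 0.25, and 0.25 * 255 = 63.75, floored to 63.
example_percent_intra = percent_macroblocks(4080, [8160, 8160])
```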
In an alternative embodiment, the logarithm (to base 2, or any other base) of the percentage metric can be used to emphasize the lower range of the metric.
The techniques disclosed in this patent document allow products, such as smartphones and tablets, to be much more power efficient while reducing the data costs, thus improving the user experience for mobile streaming applications.
While each process flow and/or signal sequence depicted in the figures and described above depicts a sequence of steps and/or signals, either in series or in tandem, unless explicitly stated or otherwise self-evident (such as that a signal cannot be received before being transmitted) no inference should be drawn from that sequence regarding specific order of performance, performance of steps or portions or transmission of signals thereof serially rather than concurrently or in an overlapping manner, or performance of the steps or transmission of signals depicted exclusively without the occurrence of intervening or intermediate steps or signals. Moreover, those skilled in the art will recognize that complete processes and signal sequences are not illustrated or described. Instead, for simplicity and clarity, only so much of the respective processes and signal sequences as is unique to this disclosure or necessary for an understanding of this disclosure is depicted and described.
Although this disclosure has been described with exemplary embodiments, various changes and modifications can be suggested to one skilled in the art. It is intended that this disclosure encompass such changes and modifications as fall within the scope of the appended claims.

Claims (17)

What is claimed is:
1. A decoder for video processing comprising:
a receiver configured to receive, from an encoder, a bitstream associated with a video;
a processor configured to:
parse the bitstream to determine a percentage of at least one of a number of six tap filterings or a number of alpha point deblocking instances, in a specified period;
determine a voltage and frequency to be used for decoding the video as a function of the percentage of the at least one of the number of six tap filterings or the number of alpha point deblocking instances, in the specified period; and
operate the decoder at the determined voltage and frequency to decode the video, wherein the percentage of six tap filtering, denoted as percent_six_tap_filterings, in the specified period is determined using:

(num_six_tap_filterings/max_num_six_tap_filterings)*255
wherein the max_num_six_tap_filterings is defined as:

max_num_six_tap_filterings=Σ_{i=1}^{num_pics_per_period} max_num_six_tap_filterings_pic(i)
where:
the num_pics_per_period=a number of pictures in the specified period; and
the max_num_six_tap_filterings_pic(i)=a maximum number of six-tap filterings in an ith picture within the specified period.
2. The decoder according to claim 1 wherein the maximum number of STFs, noted as max_num_six_tap_filterings_pic(i), is determined according to:

the max_num_six_tap_filterings_pic(i)=1664*PicSizeInMbs,
where the PicSizeInMbs is the picture size in macroblocks.
3. The decoder according to claim 1 wherein the maximum number of STFs, noted as max_num_six_tap_filterings_pic(i), is determined according to:

the max_num_six_tap_filterings_pic(i)=1664*x*PicSizeInMbs,
where the PicSizeInMbs is the picture size in macroblocks, and x=1/N, where N is the number of STFs performed by a single processing unit (with N>1), or x=α, where α>1 is a factor that accounts for repeated filterings.
4. The decoder according to claim 1, wherein the percentage of alpha point deblocking instances, noted as percent_alpha_point_deblocking_instances, in the specified period is determined using:

(num_alpha_point_deblocking_instances/max_num_alpha_point_deblocking_instances)*255,
with the max_num_alpha_point_deblocking_instances defined as:

Σ_{i=1}^{num_pics_per_period} max_num_alpha_point_deblocking_instances_pic(i)
where:
the num_pics_per_period=a number of pictures in the specified period; and
the max_num_alpha_point_deblocking_instances_pic(i)=a maximum number of alpha-point deblocking instances in the ith picture within the specified period.
5. The decoder according to claim 4, wherein the max_num_alpha_point_deblocking_instances_pic(i) is determined by either:
192*PicSizeInMbs, for YUV 4:2:0 format;
256*PicSizeInMbs, for YUV 4:2:2 format;
384*PicSizeInMbs, for YUV 4:4:4 format; or
128*PicSizeInMbs, for a single color plane, where the PicSizeInMbs is a number of macroblocks in a picture.
6. The decoder according to claim 4, wherein the max_num_alpha_point_deblocking_instances_pic(i) is determined by either:
192/N*PicSizeInMbs, for YUV 4:2:0 format;
256/N*PicSizeInMbs, for YUV 4:2:2 format;
384/N*PicSizeInMbs, for YUV 4:4:4 format; or
128/N*PicSizeInMbs, for a single color plane,
where N is a number of STFs performed by a single processing unit and N>1.
7. An encoder for video processing, the encoder comprising:
a transmitter configured to transmit a bitstream associated with a video to a decoder; and
a processor configured to:
encode a video to include at least one variable of a number of six tap filterings or a number of alpha point deblocking instances;
determine a percentage of the at least one of the number of six tap filterings or the number of alpha point deblocking instances; and
generate the bitstream containing the percentage of the at least one of the number of six tap filterings or the number of alpha point deblocking instances,
wherein the percentage of six tap filtering, noted as percent_six_tap_filterings, in the specified period is determined using:

(num_six_tap_filterings/max_num_six_tap_filterings)*255
with the max_num_six_tap_filterings defined as:

max_num_six_tap_filterings=Σ_{i=1}^{num_pics_per_period} max_num_six_tap_filterings_pic(i)
where:
the num_pics_per_period=a number of pictures in the specified period; and
the max_num_six_tap_filterings_pic(i)=a maximum number of six-tap filterings in an i-th picture within the specified period.
8. The encoder according to claim 7, wherein the maximum number of STFs, noted as max_num_six_tap_filterings_pic(i), is determined according to:

the max_num_six_tap_filterings_pic(i)=1664*x*PicSizeInMbs,
where the PicSizeInMbs is the picture size in macroblocks, and x=1/N, where N is the number of STFs performed by a single processing unit (with N>1), or x=α, where α>1 is a factor that accounts for repeated filterings.
9. The encoder according to claim 7, wherein the percentage of alpha point deblocking instances, noted as percent_alpha_point_deblocking_instances, in the specified period is determined using:

(num_alpha_point_deblocking_instances/max_num_alpha_point_deblocking_instances)*255,
with the max_num_alpha_point_deblocking_instances defined as:

Σ_{i=1}^{num_pics_per_period} max_num_alpha_point_deblocking_instances_pic(i)
where:
the num_pics_per_period=a number of pictures in the specified period; and
the max_num_alpha_point_deblocking_instances_pic(i)=a maximum number of alpha-point deblocking instances in the ith picture within the specified period.
10. The encoder according to claim 9, wherein the max_num_alpha_point_deblocking_instances_pic(i) is determined by either:
192/N*PicSizeInMbs, for YUV 4:2:0 format;
256/N*PicSizeInMbs, for YUV 4:2:2 format;
384/N*PicSizeInMbs, for YUV 4:4:4 format; or
128/N*PicSizeInMbs, for a single color plane, where N is a number of STFs performed by a single processing unit and N>1.
11. A method for video processing, the method comprising:
parsing, at a decoder, a bitstream associated with a video to determine a percentage of at least one of a number of six tap filterings or a number of alpha point deblocking instances, in a specified period;
determining, at the decoder, a voltage and frequency to be used for decoding the video proportional to the percentage of the at least one of the number of six tap filterings or the number of alpha point deblocking instances, in the specified period; and
operating the decoder at the determined voltage and frequency to decode the video,
wherein the percentage of six tap filtering, noted as percent_six_tap_filterings, in the specified period is determined using:

(num_six_tap_filterings/max_num_six_tap_filterings)*255
wherein the max_num_six_tap_filterings is defined as:

max_num_six_tap_filterings=Σ_{i=1}^{num_pics_per_period} max_num_six_tap_filterings_pic(i)
where:
num_pics_per_period=a number of pictures in the specified period; and
max_num_six_tap_filterings_pic(i)=a maximum number of six-tap filterings in an i-th picture within the specified period.
12. The method according to claim 11, wherein the percentage of either the number of six tap filterings, the number of alpha point deblocking instances, the number of non-zero macroblocks, or the number of intra-coded macroblocks, is of a size of one byte.
13. The method according to claim 11 wherein the maximum number of STFs, noted as max_num_six_tap_filterings_pic(i), in a picture is determined according to:

the max_num_six_tap_filterings_pic(i)=1664*PicSizeInMbs,
where the PicSizeInMbs is a number of macroblocks in a picture.
14. The method according to claim 11, wherein the maximum number of STFs, noted as max_num_six_tap_filterings_pic(i), is determined according to:

the max_num_six_tap_filterings_pic(i)=1664*x*PicSizeInMbs,
where the PicSizeInMbs is the picture size in macroblocks, and x=1/N, where N is the number of STFs performed by a single processing unit (with N>1), or x=α, where α>1 is a factor that accounts for repeated filterings.
15. The method according to claim 11, wherein the percentage of alpha point deblocking instances, noted as percent_alpha_point_deblocking_instances, in the specified period is determined using:

(num_alpha_point_deblocking_instances/max_num_alpha_point_deblocking_instances)*255,
with the max_num_alpha_point_deblocking_instances defined as:

Σ_{i=1}^{num_pics_per_period} max_num_alpha_point_deblocking_instances_pic(i)
where:
the num_pics_per_period=a number of pictures in the specified period; and
the max_num_alpha_point_deblocking_instances_pic(i)=a maximum number of alpha-point deblocking instances in the ith picture within the specified period.
16. The method according to claim 15, wherein the max_num_alpha_point_deblocking_instances_pic(i) is determined by either:
192*PicSizeInMbs, for YUV 4:2:0 format;
256*PicSizeInMbs, for YUV 4:2:2 format;
384*PicSizeInMbs, for YUV 4:4:4 format; or
128*PicSizeInMbs, for a single color plane, where the PicSizeInMbs is a number of macroblocks in a picture.
17. The method according to claim 15, wherein the max_num_alpha_point_deblocking_instances_pic(i) is determined by either:
192/N*PicSizeInMbs, for YUV 4:2:0 format;
256/N*PicSizeInMbs, for YUV 4:2:2 format;
384/N*PicSizeInMbs, for YUV 4:4:4 format; or
128/N*PicSizeInMbs, for a single color plane, where N is the number of STFs performed by a single processing unit and N>1.
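By way of non-limiting illustration only, the quantities recited in the claims above can be sketched in Python as follows. The per-macroblock constants (1664, 192, 256, 384, 128) are taken from the claims; every function name, the integer truncation, and the table of operating points (thresholds, frequencies, and voltages) are hypothetical values assumed for this sketch, since the claims only require that the voltage and frequency be determined from the percentage.

```python
from bisect import bisect_left

# Per-macroblock maxima recited in the claims.
MAX_SIX_TAP_FILTERINGS_PER_MB = 1664
MAX_ALPHA_POINT_DEBLOCKING_PER_MB = {
    "4:2:0": 192,
    "4:2:2": 256,
    "4:4:4": 384,
    "single_plane": 128,
}

# Hypothetical operating points: (highest metric served, MHz, volts).
OPERATING_POINTS = [
    (63, 200, 0.80),
    (127, 400, 0.90),
    (191, 600, 1.00),
    (255, 800, 1.10),
]


def max_six_tap_filterings_pic(pic_size_in_mbs, x=1):
    """1664 * x * PicSizeInMbs; x=1 gives the unscaled form."""
    return MAX_SIX_TAP_FILTERINGS_PER_MB * x * pic_size_in_mbs


def max_alpha_point_deblocking_instances_pic(pic_size_in_mbs, chroma_format):
    """Per-picture maximum for the given chroma format (unscaled form)."""
    return MAX_ALPHA_POINT_DEBLOCKING_PER_MB[chroma_format] * pic_size_in_mbs


def percent_six_tap_filterings(num_six_tap_filterings, pic_sizes_in_mbs, x=1):
    """(num / max) * 255, with max summed over the pictures in the period."""
    max_total = sum(max_six_tap_filterings_pic(s, x) for s in pic_sizes_in_mbs)
    # Truncation to an integer is an assumption of this sketch.
    return int(num_six_tap_filterings * 255 // max_total)


def select_operating_point(percent):
    """Pick the lowest hypothetical operating point that covers percent."""
    if not 0 <= percent <= 255:
        raise ValueError("percent must lie between 0 and 255")
    thresholds = [point[0] for point in OPERATING_POINTS]
    _, mhz, volts = OPERATING_POINTS[bisect_left(thresholds, percent)]
    return mhz, volts
```

In this sketch a decoder would parse the one-byte percentage from the bitstream, call select_operating_point on it, and then apply the returned frequency and voltage through whatever platform-specific mechanism is available; that mechanism is outside the scope of the sketch.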
US14/877,776 2014-10-14 2015-10-07 Method and apparatus for video processing with complexity information Expired - Fee Related US9866846B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/877,776 US9866846B2 (en) 2014-10-14 2015-10-07 Method and apparatus for video processing with complexity information
PCT/KR2015/010852 WO2016060478A1 (en) 2014-10-14 2015-10-14 Method and apparatus for video processing with complexity information

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201462063824P 2014-10-14 2014-10-14
US14/877,776 US9866846B2 (en) 2014-10-14 2015-10-07 Method and apparatus for video processing with complexity information

Publications (2)

Publication Number Publication Date
US20160105680A1 US20160105680A1 (en) 2016-04-14
US9866846B2 true US9866846B2 (en) 2018-01-09

Family

ID=55656360

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/877,776 Expired - Fee Related US9866846B2 (en) 2014-10-14 2015-10-07 Method and apparatus for video processing with complexity information

Country Status (2)

Country Link
US (1) US9866846B2 (en)
WO (1) WO2016060478A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022167210A1 (en) * 2021-02-03 2022-08-11 Interdigital Vc Holdings France, Sas Metadata for signaling information representative of an energy consumption of a decoding process

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11838553B2 (en) * 2021-08-09 2023-12-05 Qualcomm Incorporated Green metadata signaling

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007053118A1 (en) 2005-11-04 2007-05-10 National University Of Singapore A method and a system for determining predicted numbers of processor cycles required for respective segments of a media file for playback of the media file
US20070291858A1 (en) * 2006-06-16 2007-12-20 Via Technologies, Inc. Systems and Methods of Video Compression Deblocking
US20120042313A1 (en) 2010-08-13 2012-02-16 Weng-Hang Tam System having tunable performance, and associated method
US20120151065A1 (en) * 2010-12-13 2012-06-14 Google Inc. Resource allocation for video playback
US20120207216A1 (en) * 2009-10-22 2012-08-16 Zhejiang Uiniversity Video and image encoding/decoding system based on spatial domain prediction
US20120314771A1 (en) * 2009-08-21 2012-12-13 Sk Telecom Co., Ltd. Method and apparatus for interpolating reference picture and method and apparatus for encoding/decoding image using same

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007053118A1 (en) 2005-11-04 2007-05-10 National University Of Singapore A method and a system for determining predicted numbers of processor cycles required for respective segments of a media file for playback of the media file
US20090112931A1 (en) * 2005-11-04 2009-04-30 Ye Wang Method and a System for Determining Predicted Numbers of Processor Cycles Required for Respective Segments of a Media File for Playback of the Media File
US20070291858A1 (en) * 2006-06-16 2007-12-20 Via Technologies, Inc. Systems and Methods of Video Compression Deblocking
US20120314771A1 (en) * 2009-08-21 2012-12-13 Sk Telecom Co., Ltd. Method and apparatus for interpolating reference picture and method and apparatus for encoding/decoding image using same
US20120207216A1 (en) * 2009-10-22 2012-08-16 Zhejiang Uiniversity Video and image encoding/decoding system based on spatial domain prediction
US20120042313A1 (en) 2010-08-13 2012-02-16 Weng-Hang Tam System having tunable performance, and associated method
US20120151065A1 (en) * 2010-12-13 2012-06-14 Google Inc. Resource allocation for video playback

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
International Search Report and Written Opinion issued for PCT/KR2015/010852 dated Feb. 2, 2016, 5 pgs.

Also Published As

Publication number Publication date
US20160105680A1 (en) 2016-04-14
WO2016060478A1 (en) 2016-04-21

Legal Events

Date Code Title Description
AS Assignment

Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, REPUBLIC OF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FERNANDES, FELIX CARLOS;BUDAGAVI, MADHUKAR;SIGNING DATES FROM 20151006 TO 20151020;REEL/FRAME:037344/0721

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20220109