WO2012027894A1 - Video classification systems and methods - Google Patents

Video classification systems and methods Download PDF

Info

Publication number
WO2012027894A1
Authority
WO
WIPO (PCT)
Prior art keywords
macroblock
encoding
distortion
frame
video
Prior art date
Application number
PCT/CN2010/076569
Other languages
French (fr)
Inventor
Fang Shi
Biao Wang
Original Assignee
Intersil Americas Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intersil Americas Inc. filed Critical Intersil Americas Inc.
Priority to PCT/CN2010/076569 priority Critical patent/WO2012027894A1/en
Priority to CN201080062017XA priority patent/CN102771123A/en
Priority to US13/225,238 priority patent/US20120057640A1/en
Priority to US13/225,269 priority patent/US8824554B2/en
Priority to US13/225,222 priority patent/US20120057629A1/en
Priority to US13/225,202 priority patent/US20120057633A1/en
Publication of WO2012027894A1 publication Critical patent/WO2012027894A1/en
Priority to US14/472,313 priority patent/US9609348B2/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/192Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive
    • H04N19/194Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive involving only two passes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • H04N19/137Motion inside a coding unit, e.g. average field, frame or block difference
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/147Data rate or code amount at the encoder output according to rate distortion criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Abstract

Video encoder systems and methods are described that employ table-based content classification. One or more tables relate quantization parameters and P-points for a frame of video that typically comprises macroblocks. A deviation representative of a difference between original and decoded versions of a macroblock is determined, the deviation being further representative of a distribution frequency of the value of a distortion for a P-point. The P-point corresponds to a distortion value that is associated with a minimum rate difference between encoding modes for a macroblock. A motion complexity index is updated using a quantization parameter and non-zero coefficients of the encoded frame. An encoding mode for the macroblock can be retrieved from the tables using the motion complexity index to reference mode information maintained in the tables.

Description

VIDEO CLASSIFICATION SYSTEMS AND METHODS
Cross-Reference to Related Applications
[0001] The present Application is related to concurrently filed applications entitled "Rho-Domain Metrics," "Video Analytics for Security Systems and Methods" and "Systems And Methods for Video Content Analysis," which are expressly
incorporated by reference herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] Fig. 1 illustrates the relationship of distortion and rate difference between Intra and Inter modes for a given quantization parameter.
[0003] Fig. 2 is a flowchart illustrating a content classification based mode decision method.
[0004] Fig. 3 is a simplified block schematic illustrating a processing system employed in certain embodiments of the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0005] Embodiments of the present invention will now be described in detail with reference to the drawings, which are provided as illustrative examples so as to enable those skilled in the art to practice the invention. Notably, the figures and examples below are not meant to limit the scope of the present invention to a single embodiment, but other embodiments are possible by way of interchange of some or all of the described or illustrated elements. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to same or like parts. Where certain elements of these embodiments can be partially or fully implemented using known
components, only those portions of such known components that are necessary for an understanding of the present invention will be described, and detailed
descriptions of other portions of such known components will be omitted so as not to obscure the invention. In the present specification, an embodiment showing a singular component should not be considered limiting; rather, the invention is intended to encompass other embodiments including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein. Moreover, applicants do not intend for any term in the specification or claims to be ascribed an uncommon or special meaning unless explicitly set forth as such. Further, the present invention encompasses present and future known equivalents to the components referred to herein by way of illustration.
[0006] Video standards such as H.264/AVC employ mode decision as an encoding decision process to determine whether a macroblock ("MB") is encoded as an intra-prediction mode ("Intra Mode") or an inter-prediction mode ("Inter Mode"). Rate-distortion optimization techniques are commonly applied in various
implementations. When encoding an MB, the rate-distortion cost is calculated for both Intra Modes and Inter Modes. The minimum-cost mode is selected as the final encoding mode. Depending on the video standard, multiple Intra Modes and Inter Modes are applied. For example, in the H.264 standard, there are 4 Intra 16x16 Modes and 9 Intra 4x4 Modes for each MB, and SKIP, Inter 16x16, Inter 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4 Modes for each MB. Rate-distortion cost J is defined as
J = D + λ · R      (1)
where distortion D is defined as the difference between the reconstructed MB and the original MB, where rate R represents the bits used to encode the current MB, and where the coefficient λ is a weighting factor. In one example, the sum of absolute differences (SAD) can be used to quantify distortion.
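As an illustration only, the following C sketch shows how the cost of equation (1) could be evaluated for one candidate mode; the 16x16 block size, the function names and the use of SAD for the distortion term are assumptions made for this example rather than a prescribed implementation.

#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between the original and the reconstructed
 * 16x16 macroblock; used here as the distortion term D of equation (1). */
static int sad_16x16(const uint8_t *orig, const uint8_t *recon, int stride)
{
    int d = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            d += abs(orig[y * stride + x] - recon[y * stride + x]);
    return d;
}

/* Rate-distortion cost J = D + lambda * R for one candidate mode,
 * where R is the number of bits spent encoding the MB in that mode. */
static double rd_cost(int distortion, int rate_bits, double lambda)
{
    return (double)distortion + lambda * (double)rate_bits;
}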
Rate-Distortion Optimization
[0007] Rate-distortion optimization (RDO) techniques can provide a balance of encoding quality and compression ratio [1]. An accurate calculation of rate R in equation (1) is computationally costly and generally involves a dual-pass encoding process, which requires the use of hardware resources and introduces additional delays. Research has been conducted to optimize the calculation of R and to provide a fast rate-distortion balanced mode decision algorithm. However, estimation of the bit rate R per MB is generally very costly due to the tight pipeline architectures employed in hardware embodiments that provide real-time encoding and multiple-channel encoding.
[1] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, July 2003, pp. 560–576.
[0008] Accordingly, in certain embodiments, distortion D is used to determine the mode decision when R is omitted from equation (1). Mode optimization typically cannot be achieved by using D alone without considering the bit-rate aspect of encoding. For example, in low-complexity background cases, the Intra Mode SAD for a background MB can be smaller than the Inter Mode SAD value; therefore, Intra Mode is typically selected. However, Intra Mode encoding usually consumes many more bits than Inter Mode encoding and, consequently, encoding bits may be wasted and blocky background artifacts can be observed.
[0009] Certain embodiments employ a comparison of the rate-distortion costs J_intra and J_inter. Based on equation (1), this comparison can be taken as equivalent to the comparison of D_intra + λ·(ΔR) with D_inter shown in equation (2), where λ·(ΔR) (denoted as τ hereafter) is the rate difference weighting factor between Intra Mode and Inter Mode:

J_inter = D_inter
J_intra = D_intra + τ      (2)

Experimental results show there is a pseudo-tangent relationship between ΔR and distortion for a given quantization parameter ("QP"), as shown in Fig. 1 (QP = 26).
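For clarity, the algebra behind this simplification can be written out explicitly; this is only a restatement of equations (1) and (2) using the per-mode rates R_intra and R_inter:

\[
J_{\mathrm{intra}} \lessgtr J_{\mathrm{inter}}
\;\Longleftrightarrow\;
D_{\mathrm{intra}} + \lambda R_{\mathrm{intra}} \lessgtr D_{\mathrm{inter}} + \lambda R_{\mathrm{inter}}
\;\Longleftrightarrow\;
D_{\mathrm{intra}} + \underbrace{\lambda\,\Delta R}_{\tau} \lessgtr D_{\mathrm{inter}},
\qquad \Delta R = R_{\mathrm{intra}} - R_{\mathrm{inter}} .
\]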
[0010] Fig. 1 shows the relationship of ΔR and D for a given QP. In Fig. 1, SAD is used as the distortion and ΔR = R_intra − R_inter. For the purposes of this description, R_intra represents the number of bits used by the intra mode encoder to encode the current macroblock, and R_inter represents the number of bits used by the inter mode encoder to encode the current macroblock. A point P is defined as the point at which Diff_R (ΔR) crosses zero on the X axis. Points with D values less than P will consume more bits with Intra Mode encoding (ΔR = R_intra − R_inter > 0), while points with D values larger than P will consume fewer bits with Intra Mode, as shown in the drawing. Experimental results show there is a pseudo-tangent relationship between ΔR and distortion for a given QP. The location of the P-point is a function of QP and video motion complexity; P-points increase with increasing QP and motion complexity.
[0011] It will be appreciated that finding the P-point is a key step in the process. Once the P-point is located, then, based on the tangent curve and the distribution frequency of D values, the deviation τ can be estimated and the Intra Mode/Inter Mode decision can be reached quickly and with greater ease.
Rho-domain content classification
[0012] Certain embodiments use rho-domain ("ρ-domain") content classification. Certain embodiments of the invention provide an innovative ρ-domain metric Θ and systems and methods that apply the metric. In some embodiments, the definition of ρ in the ρ-domain can be taken to be the number of non-zero coefficients after transform and quantization in a video encoding process. Additionally, the term "NZ" will be used herein to represent ρ, where NZ can be understood as the number of non-zero coefficients after quantization of each macroblock in video standards such as the H.264 video standard. For the purposes of this description, a ρ-domain deviation metric Θ may be defined as a recursive weighted ratio between the theoretical NZ_QP curve and the actual NZ_QP curve. Normalized Θ typically fluctuates around 1.0. A value of Θ smaller than 1.0 can indicate that the actual encoded bit rate is larger than expected, implying that more complicated motion content has been encountered. In contrast, a value of Θ larger than 1.0 indicates that the actual encoded bit rate is smaller than expected, implying smoother motion content. Therefore, the ρ-domain deviation Θ can be used as an indicator to classify video content into high, medium, medium-low and low motion complexity categories. Based on this motion complexity classification, a fast mode decision algorithm can be employed.
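Purely as an illustration, the C fragment below sketches one way the deviation Θ and the resulting complexity class could be tracked per frame. The theoretical_nz() function, the 0.75/0.25 recursive weighting and the class thresholds are assumed values chosen for the example; the specification does not fix them here.

/* Illustrative sketch only: theoretical_nz() stands in for the theoretical
 * NZ_QP curve (expected number of non-zero coefficients at a given QP). */
double theoretical_nz(int qp);   /* hypothetical model of the NZ_QP curve */

typedef enum { MOTION_LOW, MOTION_MEDIUM_LOW, MOTION_MEDIUM, MOTION_HIGH } motion_class_t;

static double update_theta(double prev_theta, int actual_nz, int qp)
{
    /* Recursive weighted ratio between theoretical and actual NZ_QP curves;
     * the 0.75/0.25 weights are assumptions for this sketch. */
    double ratio = theoretical_nz(qp) / (double)(actual_nz > 0 ? actual_nz : 1);
    return 0.75 * prev_theta + 0.25 * ratio;
}

static motion_class_t classify_motion(double theta)
{
    if (theta < 0.9) return MOTION_HIGH;        /* actual bit rate above expectation */
    if (theta < 1.0) return MOTION_MEDIUM;
    if (theta < 1.1) return MOTION_MEDIUM_LOW;
    return MOTION_LOW;                          /* actual bit rate below expectation */
}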
Example of a content classification based mode decision algorithm
[0013] In the example of Fig. 2, a content classification based mode decision algorithm is illustrated. The algorithm may be embodied in a combination of hardware and software and may be deployed as instructions and data stored in a computer-readable medium. It will be appreciated that the instructions and data may be configured and/or adapted such that execution of the instructions by a processor causes the processor to perform the method illustrated in Fig. 2.
[0014] At step 200, offline-trained quantization parameter QP and P-point tables QP_P_Tn are built based on ρ-domain content classifications, where Tn (Tn = 1, 2, 3, ... 51) denotes different motion complexity classifications. If at step 202 it is determined that the current frame belongs to the first 5 frames of a video sequence, then step 204 is performed next; otherwise step 203 is performed next. At step 204, the motion complexity index Tn is initialized based on the initial QP and complexity information, and the P-point can be found from the QP_P_Tn tables. Step 206 can then be performed.
[0015] If at step 202 it is determined that the current frame does not belong to the first 5 frames of the video sequence then, at step 203, the NZ_QP deviation Θ is calculated based on the encoded frame NZ and QP information. At step 205, the motion complexity index Tn is then recalculated based on the deviation Θ. A table lookup from the QP_P_Tn tables may be performed to find P for the current frame, based on the weighted QP value of the previous frame and the content classification index Tn, before performing step 206.
[0016] At step 206, the deviation τ is calculated with respect to the distortion D based on the tangent relationship of τ and D, the distribution frequency of D, and the location of the P-point. A mathematical model φ can be established as a function of P-point, D and QP for each motion complexity class to represent the cost deviation τ for each MB. One example of a QP_P_Tn table is shown in Table 1 below:
QP_P_Tn:
static float MD_P_TABLE[][6] = {
    //{T1, T2, T3, P_point_T1, P_point_T2, P_point_T3}
    {0.8, 1.1, 2, 4, 6, 6},  //QP = 14
    {0.8, 1.1, 2, 4, 6, 6},  //QP = 15
    {0.8, 1.1, 2, 5, 7, 7},  //QP = 16
    {0.8, 1.1, 2, 5, 7, 7},  //QP = 17
    {0.8, 1.1, 2, 6, 8, 8},  //QP = 18
    {0.8, 1.1, 2, 6, 8, 8},  //QP = 19
    {0.8, 1.1, 2, 7, 9, 9},  //QP = 20
    {0.8, 1.1, 2, 8, 9, 9},  //QP = 21
};
//Listed in the table are relative values.
//From QP and the content classification index Tn, a P_point can be obtained from MD_P_TABLE.
Table 1: QP_P_Tn table
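As a usage illustration, a lookup against the table above could proceed as in the sketch below; the row indexing by (QP − 14) and the reading of columns 3..5 as the P-points for classes T1..T3 are assumptions that follow the table layout rather than explicit statements in the specification.

/* Illustrative lookup only, against the MD_P_TABLE shown in Table 1. */
static int lookup_p_point(int qp, int tn_class /* 1, 2 or 3 */)
{
    int n_rows = (int)(sizeof(MD_P_TABLE) / sizeof(MD_P_TABLE[0]));
    int row = qp - 14;                 /* first row assumed to correspond to QP = 14 */
    if (row < 0) row = 0;
    if (row >= n_rows) row = n_rows - 1;
    if (tn_class < 1) tn_class = 1;
    if (tn_class > 3) tn_class = 3;
    return (int)MD_P_TABLE[row][3 + (tn_class - 1)];   /* P_point_T1..P_point_T3 */
}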
[0017] At step 208, mode decisions for each MB of the current frame can be made. The Inter Mode RD cost J can be replaced by D as shown in equation (2), and the Intra Mode cost J can be replaced by D + τ, where τ is derived from the experimental model φ as described at step 206. The winning mode may be selected as the mode that yields the minimum mode cost J. The process is typically repeated until it is determined at step 210 that the encoding of the current frame is finished.
[0018] In certain embodiments, the mode-decision algorithm, the QP_P_Tn table and the deviation model φ are built offline from experimental results. The motion classification index Tn and its corresponding methods are described in a related, concurrently filed application titled "ρ-domain metrics Θ and its applications." The video classification based mode decision algorithms, systems and methods described herein can provide a very cost-efficient, fast and robust alternative to conventional systems, which tend to be computationally costly and usually involve dual-pass encoding mode decision algorithms. In certain embodiments of the present invention, a fast table-lookup method is used to obtain the P-point value. From the P-point, QP and content classification index Tn, the MB cost deviation τ can be obtained from a selected experimental model φ. Mode decisions can be made efficiently by inserting τ into equation (2).
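Putting these pieces together, a hedged sketch of the per-macroblock decision of step 208 might look like the fragment below. The shape of the stand-in model phi_model() (a linear ramp around the P-point scaled by QP) and its evaluation at the Intra distortion are assumptions made for the example; the actual model φ is built offline from experimental results.

enum { MODE_INTER = 0, MODE_INTRA = 1 };

/* Illustrative stand-in for the experimental model phi: maps distortion D,
 * the P-point and QP to the cost deviation tau. The linear form and the
 * QP-dependent scale are assumed, not taken from the specification. */
static double phi_model(int distortion, int p_point, int qp)
{
    double scale = 0.5 * (double)qp;                 /* assumed scale factor */
    return scale * (double)(p_point - distortion);   /* tau > 0 when D < P   */
}

/* Step 208, per MB: compare the Inter cost D_inter with the Intra cost
 * D_intra + tau, as in equation (2). */
static int choose_mode(int d_intra, int d_inter, int p_point, int qp)
{
    double tau = phi_model(d_intra, p_point, qp);
    return ((double)d_intra + tau < (double)d_inter) ? MODE_INTRA : MODE_INTER;
}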
System Description
[0019] Turning now to Fig. 3, certain embodiments of the invention employ a processing system that includes at least one computing system 30 deployed to perform certain of the steps described above. Computing system 30 may be a commercially available system that executes commercially available operating systems such as Microsoft Windows®, UNIX or a variant thereof, Linux, a real-time operating system and/or a proprietary operating system. The architecture of the computing system may be adapted, configured and/or designed for integration in the processing system, for embedding in one or more of an image capture system, a communications device and/or a graphics processing system. In one example, computing system 30 comprises a bus 302 and/or other mechanisms for
communicating between processors, whether those processors are integral to the computing system 30 (e.g. 304, 305) or located in different, perhaps physically separated computing systems 300. Typically, processor 304 and/or 305 comprises a CISC or RISC computing processor and/or one or more digital signal processors. In some embodiments, processor 304 and/or 305 may be embodied in a custom device and/or may perform as a configurable sequencer. Device drivers 303 may provide output signals used to control internal and external components and to communicate between processors 304 and 305.
[0020] Computing system 30 also typically comprises memory 306 that may include one or more of random access memory ("RAM"), static memory, cache, flash memory and any other suitable type of storage device that can be coupled to bus 302. Memory 306 can be used for storing instructions and data that can cause one or more of processors 304 and 305 to perform a desired process. Main memory 306 may be used for storing transient and/or temporary data such as variables and intermediate information generated and/or used during execution of the instructions by processor 304 or 305. Computing system 30 also typically comprises non-volatile storage such as read only memory ("ROM") 308, flash memory, memory cards or the like; non-volatile storage may be connected to the bus 302, but may equally be connected using a high-speed universal serial bus (USB), Firewire or other such bus that is coupled to bus 302. Non-volatile storage can be used for storing configuration, and other information, including instructions executed by processors 304 and/or 305. Non-volatile storage may also include mass storage device 310, such as a magnetic disk, optical disk, flash disk that may be directly or indirectly coupled to bus 302 and used for storing instructions to be executed by processors 304 and/or 305, as well as other information.
[0021] In some embodiments, computing system 30 may be communicatively coupled to a display system 312, such as an LCD flat panel display (including touch panel displays), an electroluminescent display, a plasma display, a cathode ray tube or another display device that can be configured and adapted to receive and display information to a user of computing system 30. Typically, device drivers 303 can include a display driver, graphics adapter and/or other modules that maintain a digital representation of a display and convert the digital representation to a signal for driving a display system 312. Display system 312 may also include logic and software to generate a display from a signal provided by system 300. In that regard, display 312 may be provided as a remote terminal or in a session on a different computing system 30. An input device 314 is generally provided locally or through a remote system and typically provides for alphanumeric input as well as cursor control 316 input, such as a mouse, a trackball, etc. It will be appreciated that input and output can be provided to a wireless device such as a PDA, a tablet computer or other system suitably equipped to display the images and provide user input.
[0022] According to one embodiment of the invention, portions of the described invention may be performed by computing system 30. Processor 304 executes one or more sequences of instructions. For example, such instructions may be stored in main memory 306, having been received from a computer-readable medium such as storage device 310. Execution of the sequences of instructions contained in main memory 306 causes processor 304 to perform process steps according to certain aspects of the invention. In certain embodiments, functionality may be provided by embedded computing systems that perform specific functions wherein the embedded systems employ a customized combination of hardware and software to perform a set of predefined tasks. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
[0023] The term "computer-readable medium" is used to define any medium that can store and provide instructions and other data to processor 304 and/or 305, particularly where the instructions are to be executed by processor 304 and/or 305 and/or another peripheral of the processing system. Such media can include non-volatile storage, volatile storage and transmission media. Non-volatile storage may be embodied on media such as optical or magnetic disks, including DVD, CD-ROM and BluRay. Storage may be provided locally and in physical proximity to processors 304 and 305, or remotely, typically by use of a network connection. Non-volatile storage may be removable from computing system 30, as in the example of BluRay, DVD or CD storage or memory cards or sticks that can be easily connected or disconnected from a computer using a standard interface, including USB, etc.
Thus, computer-readable media can include floppy disks, flexible disks, hard disks, magnetic tape, any other magnetic medium, CD-ROMs, DVDs, BluRay, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH/EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
[0024] Transmission media can be used to connect elements of the processing system and/or components of computing system 30. Such media can include twisted pair wiring, coaxial cables, copper wire and fiber optics. Transmission media can also include wireless media such as radio, acoustic and light waves. In
particular, radio frequency (RF), fiber optic and infrared (IR) data communications may be used.
[0025] Various forms of computer readable media may participate in providing instructions and data for execution by processor 304 and/or 305. For example, the instructions may initially be retrieved from a magnetic disk of a remote computer and transmitted over a network or modem to computing system 30. The instructions may optionally be stored in a different storage or a different part of storage prior to or during execution.
[0026] Computing system 30 may include a communication interface 318 that provides two-way data communication over a network 320 that can include a local network 322, a wide area network or some combination of the two. For example, an integrated services digital network (ISDN) may be used in combination with a local area network (LAN). In another example, a LAN may include a wireless link. Network link 320 typically provides data communication through one or more networks to other data devices. For example, network link 320 may provide a connection through local network 322 to a host computer 324 or to a wide area network such as the Internet 328. Local network 322 and Internet 328 may both use electrical, electromagnetic or optical signals that carry digital data streams.
[0027] Computing system 30 can use one or more networks to send messages and data, including program code and other information. In the Internet example, a server 330 might transmit a requested code for an application program through Internet 328 and may receive in response a downloaded application that provides or augments functional modules such as those described in the examples above. The received code may be executed by processor 304 and/or 305.
Additional Descriptions of Certain Aspects of the Invention
[0028] The foregoing descriptions of the invention are intended to be illustrative and not limiting. For example, those skilled in the art will appreciate that the invention can be practiced with various combinations of the functionalities and capabilities described above, and can include fewer or additional components than described above. Certain additional aspects and features of the invention are further set forth below, and can be obtained using the functionalities and components described in more detail above, as will be appreciated by those skilled in the art after being taught by the present disclosure.
[0029] Certain embodiments of the invention provide video encoder systems and methods. In some of these embodiments, the encoder systems employ content classification. Some of these embodiments comprise maintaining one or more tables relating quantization parameters and P-points for a frame of video. In some of these embodiments, the frame comprises one or more macroblocks. Some of these embodiments comprise calculating a deviation representative of a difference between original and decoded versions of a macroblock. Some of these
embodiments comprise calculating a deviation representative of a distribution frequency of the value of a distortion. Some of these embodiments comprise calculating a deviation representative of the location of a P-point. In some of these embodiments, the P-point corresponds to a distortion value that is associated with a minimum rate difference between encoding modes for a macroblock. Some of these embodiments comprise updating a motion complexity index using a quantization parameter and a number of non-zero coefficients of the encoded frame. Some of these embodiments comprise selecting an encoding mode for the macroblock using the motion complexity index to reference mode information maintained in the one or more tables.
[0030] In some of these embodiments, the selected mode yields a least cost encoding. In some of these embodiments, the deviation comprises a weighted difference of estimated distortion and measured distortion for a selected quantization parameter value. In some of these embodiments, the deviation is normalized. In some of these embodiments, calculating the deviation representative of the difference between original and decoded versions of a macroblock is based on a tangential relationship between the distortion and a rate difference between the encoding modes. In some of these embodiments, each P-point corresponds to a distortion value that is associated with no rate difference between encoding modes for the macroblock. In some of these embodiments, the motion complexity index is initiated during receipt of an initial number of frames in a video sequence. In some of these embodiments, there are at least 5 frames in the initial number of frames in the video sequence.
[0031] Some of these embodiments comprise modeling the cost deviation for each motion complexity class for each macroblock as a function of P-point, distortion and quantization parameter. Some of these embodiments comprise looking up a P-point for a current frame using a weighted quantization parameter value of a previous frame. In some of these embodiments, the encoding modes comprise an inter-prediction mode and an intra-prediction mode. In some of these embodiments, the encoding modes are defined by the H.264 video standard.
[0032] Certain embodiments of the invention provide a video encoder. Some of these embodiments comprise a plurality of tables relating quantization parameters and encoding modes for a video frame. Some of these embodiments comprise a content classifier that selects an encoding mode for a macroblock of the video frame from the plurality of tables using a deviation representative of difference between original and decoded versions of the macroblock. Some of these embodiments comprise a processor that maintains a motion complexity index using a quantization parameter and non-zero coefficients of the encoded frame. In some of these embodiments, the motion complexity index is operable to select an encoding mode based on the motion complexity of the frame. In some of these embodiments, the selected mode yields a least cost encoding for the frame. In some of these embodiments, the selected mode yields a least cost encoding for the macroblock. In some of these embodiments, each P-point corresponds to a distortion value that is associated with a minimum rate difference between encoding modes for a
macroblock.
[0033] Although the present invention has been described with reference to specific exemplary embodiments, it will be evident to one of ordinary skill in the art that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims

WHAT IS CLAIMED IS:
1. A method of content classification in a video encoder, comprising:
maintaining one or more tables relating quantization parameters and P-points for a frame of video, the frame comprising one or more macroblocks;
calculating a deviation representative of a difference between original and decoded versions of a macroblock, a distribution frequency of the value of a distortion and the location of a P-point;
updating a motion complexity index using a quantization parameter and a number of non-zero coefficients in the encoded frame; and
selecting an encoding mode for the macroblock using the motion complexity index to reference mode information maintained in the one or more tables, wherein the selected mode yields a least cost encoding, wherein each P-point corresponds to a distortion value that is associated with a minimum rate difference between encoding modes for a macroblock.
2. The method of claim 1, wherein the deviation comprises a weighted difference of estimated distortion and measured distortion for a selected quantization parameter value.
3. The method of claim 1 or 2, wherein the deviation is normalized.
4. The method of any of claims 1-3, wherein calculating the deviation representative of the difference between original and decoded versions of a macroblock is based on a tangential relationship between the distortion and a rate difference between the encoding modes.
5. The method of any of claims 1-4, wherein each P-point corresponds to a distortion value that is associated with no rate difference between encoding modes for the macroblock.
6. The method of any of claims 1-5, wherein the motion complexity index is initiated during receipt of an initial number of frames in a video sequence.
7. The method of claim 6, wherein there are at least 5 frames in the initial number of frames in the video sequence.
8. The method of any of claims 1-7, further comprising modeling cost of deviation for each motion complexity class for each macroblock as a function of P-point, distortion and quantization parameter.
9. The method of any of claims 1-8, further comprising looking up a P-point for a current frame using a weighted quantization parameter value of a previous frame.
10. The method of any of claims 1-9, wherein the encoding modes comprise an inter-prediction mode and an intra-prediction mode.
11. The method of any of claims 1-10, wherein the encoding modes are defined by the H.264 video standard.
12. A video encoder, comprising:
a plurality of tables relating quantization parameters and encoding modes for a video frame;
a content classifier that selects an encoding mode for a macroblock of the video frame from the plurality of tables using a deviation representative of difference between original and decoded versions of the macroblock; and
a processor that maintains a motion complexity index using a quantization parameter and non-zero coefficients of the encoded frame, the motion complexity index being operable to select an encoding mode based on the motion complexity of a frame, wherein the selected mode yields a least cost encoding, wherein each P-point corresponds to a distortion value that is associated with a minimum rate difference between encoding modes for a macroblock.
PCT/CN2010/076569 2010-09-02 2010-09-02 Video classification systems and methods WO2012027894A1 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
PCT/CN2010/076569 WO2012027894A1 (en) 2010-09-02 2010-09-02 Video classification systems and methods
CN201080062017XA CN102771123A (en) 2010-09-02 2010-09-02 Video classification systems and methods
US13/225,238 US20120057640A1 (en) 2010-09-02 2011-09-02 Video Analytics for Security Systems and Methods
US13/225,269 US8824554B2 (en) 2010-09-02 2011-09-02 Systems and methods for video content analysis
US13/225,222 US20120057629A1 (en) 2010-09-02 2011-09-02 Rho-domain Metrics
US13/225,202 US20120057633A1 (en) 2010-09-02 2011-09-02 Video Classification Systems and Methods
US14/472,313 US9609348B2 (en) 2010-09-02 2014-08-28 Systems and methods for video content analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2010/076569 WO2012027894A1 (en) 2010-09-02 2010-09-02 Video classification systems and methods

Publications (1)

Publication Number Publication Date
WO2012027894A1 true WO2012027894A1 (en) 2012-03-08

Family

ID=45772083

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/076569 WO2012027894A1 (en) 2010-09-02 2010-09-02 Video classification systems and methods

Country Status (2)

Country Link
CN (1) CN102771123A (en)
WO (1) WO2012027894A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080069211A1 (en) * 2006-09-14 2008-03-20 Kim Byung Gyu Apparatus and method for encoding moving picture
CN101179729A (en) * 2007-12-20 2008-05-14 清华大学 Interframe mode statistical classification based H.264 macroblock mode selecting method
US20100150233A1 (en) * 2008-12-15 2010-06-17 Seunghwan Kim Fast mode decision apparatus and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101112101A (en) * 2004-11-29 2008-01-23 高通股份有限公司 Rate control techniques for video encoding using parametric equations

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080069211A1 (en) * 2006-09-14 2008-03-20 Kim Byung Gyu Apparatus and method for encoding moving picture
CN101179729A (en) * 2007-12-20 2008-05-14 清华大学 Interframe mode statistical classification based H.264 macroblock mode selecting method
US20100150233A1 (en) * 2008-12-15 2010-06-17 Seunghwan Kim Fast mode decision apparatus and method

Also Published As

Publication number Publication date
CN102771123A (en) 2012-11-07

Similar Documents

Publication Publication Date Title
US20120057633A1 (en) Video Classification Systems and Methods
US9215466B2 (en) Joint frame rate and resolution adaptation
JP5590133B2 (en) Moving picture coding apparatus, moving picture coding method, moving picture coding computer program, moving picture decoding apparatus, moving picture decoding method, and moving picture decoding computer program
US20120307890A1 (en) Techniques for adaptive rounding offset in video encoding
JP2008507211A (en) H. based on intra prediction direction. H.264 spatial error concealment
US10574997B2 (en) Noise level control in video coding
WO2006096612A2 (en) System and method for motion estimation and mode decision for low-complexity h.264 decoder
WO2018095890A1 (en) Methods and apparatuses for encoding and decoding video based on perceptual metric classification
JP2019110530A (en) Method and device for encoding video data
CN107343202B (en) Feedback-free distributed video coding and decoding method based on additional code rate
Zhou et al. $\ell_ {2} $ Restoration of $\ell_ {\infty} $-Decoded Images Via Soft-Decision Estimation
Sun et al. Rate distortion modeling and adaptive rate control scheme for high efficiency video coding (HEVC)
Chen et al. Rate-distortion optimization of H. 264/AVC rate control with novel distortion prediction equation
US8442338B2 (en) Visually optimized quantization
Tang et al. Improved hierarchical quantisation parameter setting method for screen content coding in high efficiency video coding
WO2012027894A1 (en) Video classification systems and methods
WO2012027892A1 (en) Rho-domain metrics
Milani Fast H. 264/AVC FRExt intra coding using belief propagation
US10085023B2 (en) Systems and methods for quantization of video content
JP2019110529A (en) Method and device for encoding video data
Jung et al. Perceptual rate distortion optimisation for video coding using free‐energy principle
Jayaraman et al. Investigation of filtering of rain streaks affected video sequences under various quantisation parameter in HEVC encoder using an enhanced V‐BM4D algorithm
Jiang et al. Scalable hevc intra frame complexity control subject to quality and bitrate constraints
Tang et al. Distortion propagation based quantization parameter cascading method for screen content video coding
JP2007129662A (en) Image encoder

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201080062017.X

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10856581

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10856581

Country of ref document: EP

Kind code of ref document: A1