CN102771123A - Video classification systems and methods

Info

Publication number: CN102771123A
Application number: CN201080062017XA
Authority: CN
Inventors: F·施; 王标
Original assignee: Intersil Inc
Current assignee: Intersil Corp; Intersil Americas LLC
Priority date: 2010-09-02
Filing date: 2010-09-02
Publication date: 2012-11-07
Also published as: WO2012027894A1

Abstract

Video encoder systems and methods are described that employ table-based content classification. One or more tables relate quantization parameters and P-points for a frame of video that typically comprises macroblocks. A deviation representative of a difference between original and decoded versions of a macroblock is determined, the deviation being further representative of a distribution frequency of the value of a distortion for a P-point. The P-point corresponds to a distortion value that is associated with a minimum rate difference between encoding modes for a macroblock. A motion complexity index is updated using a quantization parameter and non-zero coefficients of the encoded frame. An encoding mode for the macroblock can be retrieved from the tables using the motion complexity index to reference mode information maintained in the tables.

Description

Video classification systems and methods

Cross-reference to related applications

This application is related to the concurrently filed patent applications entitled "Rho-Domain Metrics", "Video Analytics for Security Systems and Methods" and "Systems and Methods for Video Content Analysis", which are expressly incorporated herein by reference.

Brief description of the drawings

Fig. 1 illustrates the relationship between distortion and the rate difference between intra mode and inter mode for a given quantization parameter.

Fig. 2 is a flow chart illustrating a mode decision method based on content classification.

Fig. 3 is a simplified block diagram illustrating a processing system employed in certain embodiments of the invention.

Detailed description

Embodiments of the invention will now be described in detail with reference to the drawings, which are provided as illustrative examples so as to enable those skilled in the art to practice the invention. Notably, the figures and examples below are not meant to limit the scope of the invention to a single embodiment; other embodiments are possible by way of interchange of some or all of the described or illustrated elements. Wherever convenient, the same reference numerals are used throughout the drawings to refer to the same or like parts. Where certain elements of these embodiments can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the invention are described, and detailed descriptions of other portions are omitted so as not to obscure the invention. In this specification, an embodiment showing a singular component should not be considered limiting; rather, the invention is intended to encompass other embodiments including a plurality of the same component, and vice versa, unless explicitly stated otherwise herein. Moreover, the applicants do not intend for any term in the specification or claims to be ascribed an uncommon or special meaning unless explicitly set forth as such. Further, the invention encompasses present and future known equivalents to the components referred to herein by way of illustration.

Video standards such as H.264/AVC employ mode decision as an encoding decision process to determine whether a macroblock ("MB") is encoded in intra prediction mode ("intra mode") or inter prediction mode ("inter mode"). Rate distortion optimization techniques are commonly applied in various implementations. When an MB is encoded, the rate distortion cost is calculated for both intra mode and inter mode, and the mode with the minimum cost is chosen as the final encoding mode. Depending on the video standard, multiple intra modes and inter modes are used. For example, in the H.264 standard, each MB has four intra 16x16 modes and nine intra 4x4 modes, and each MB has the inter modes SKIP, 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4. The rate distortion cost J is defined as

J=D+λ*R, (1)

where the distortion D is defined as the difference between the reconstructed MB and the original MB, the rate R represents the number of bits used to encode the current MB, and the coefficient λ is a weighting factor. In one example, the sum of absolute differences (SAD) can be used to quantify the distortion.
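As a minimal sketch of equation (1), the following Python fragment computes SAD for a small block and picks the mode with the lowest cost J. The function names, the 2x2 sample blocks, the bit counts and the value of λ are illustrative assumptions, not values taken from this description.

```python
def sad(original, reconstructed):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(o - r)
        for row_o, row_r in zip(original, reconstructed)
        for o, r in zip(row_o, row_r)
    )


def rd_cost(distortion, rate_bits, lam):
    """Rate distortion cost J = D + lambda * R, as in equation (1)."""
    return distortion + lam * rate_bits


def pick_min_cost_mode(candidates, lam):
    """candidates maps a mode name to a (distortion, rate_bits) pair."""
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))


# Made-up sample data: a 2x2 block and hypothetical per-mode bit counts.
original = [[10, 12], [14, 16]]
reconstructed = [[11, 12], [13, 18]]
d = sad(original, reconstructed)
candidates = {"intra": (d, 40), "inter": (d + 6, 10)}
best = pick_min_cost_mode(candidates, lam=0.5)
```

In this made-up case intra mode has the smaller distortion but the larger bit count, so the weighted cost favors inter mode.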

Rate-distortion optimization

Rate distortion optimization (RDO) techniques can provide a balance between encoding quality and compression ratio (as described by T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra in "Overview of the H.264/AVC Video Coding Standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 560-576, July 2003). Accurate calculation of the rate R in equation (1) is computationally expensive and usually involves a dual-pass encoding process that requires additional hardware resources and introduces extra delay. Mode decision algorithms have been studied that optimize the calculation of R and provide a fast rate distortion trade-off. However, because of the compact pipeline architectures adopted in hardware implementations that provide real-time and multi-channel encoding, estimating the bit rate R of each MB is usually very costly.

Therefore, in certain embodiments, R is dropped from equation (1) and the distortion D is used to make the mode decision. However, mode optimization generally cannot be achieved using D alone, without considering the encoded bit rate. For example, in a low-complexity background, the SAD of intra mode for a background MB can be smaller than the SAD of inter mode, so intra mode is usually selected. But intra mode encoding typically consumes far more bits than inter mode encoding, so encoded bits may be wasted and blocky artifacts may be observed in the background.

Certain embodiments employ a comparison of the rate distortion costs J_intra and J_inter. Based on equation (1), this comparison is equivalent to comparing D_intra + λ*(ΔR) with D_inter, as shown in equation (2), where λ*(ΔR) (hereinafter denoted τ) is the weighted rate difference between intra mode and inter mode.

J_inter=D_inter

J_intra=D_intra+τ, (2)
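The equivalence stated above follows in one step from equation (1). The derivation below is a worked restatement, using R_intra and R_inter for the bits consumed by each mode and τ = λ*(ΔR); subtracting λ*R_inter from both costs leaves the comparison unchanged:

```latex
\begin{aligned}
J_{\text{intra}} < J_{\text{inter}}
&\iff D_{\text{intra}} + \lambda R_{\text{intra}} < D_{\text{inter}} + \lambda R_{\text{inter}} \\
&\iff D_{\text{intra}} + \lambda\,(R_{\text{intra}} - R_{\text{inter}}) < D_{\text{inter}} \\
&\iff D_{\text{intra}} + \tau < D_{\text{inter}}, \qquad \tau = \lambda\,\Delta R .
\end{aligned}
```

Only the difference ΔR is needed for the decision, not the two rates individually, which is why estimating τ can stand in for computing R for each mode.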

Experimental results show that, for a given quantization parameter ("QP"), a pseudo-tangent relationship exists between distortion and ΔR, as shown in Fig. 1 (QP=26).

Fig. 1 illustrates the relationship between ΔR and D for a given QP. In Fig. 1, SAD is used as the distortion and ΔR = R_intra - R_inter. For the purposes of this description, R_intra denotes the number of bits used by an intra mode encoder to encode the current macroblock, and R_inter denotes the number of bits used by an inter mode encoder to encode the current macroblock. Point P is defined as the point on the X axis at which Diff_R (ΔR) equals zero. As shown in the figure, for points with a D value smaller than P, intra mode encoding consumes more bits (ΔR = R_intra - R_inter > 0), while for points with a D value larger than P, intra mode encoding consumes fewer bits. The position of point P is a function of QP and video motion complexity: the P-point increases as QP and motion complexity increase.

Finding the P-point is a key step in this process. Once the P-point is located, the deviation τ can be estimated from the tangent curve and the distribution frequency of D values, and the intra/inter mode decision can be reached quickly and more easily.

Rho-domain content classification

Certain embodiments use Rho-domain ("ρ-domain") content classification. Certain embodiments of the invention provide an innovative ρ-domain metric θ and systems and methods that apply this metric. In certain embodiments, ρ in the ρ-domain can be regarded as the number of non-zero coefficients after transform and quantization in the video encoding process. The term "NZ" is also used herein to denote ρ, where NZ can be understood to mean the number of non-zero coefficients after quantization of each macroblock under a video standard such as H.264. For the purposes of this description, the ρ-domain deviation metric θ can be defined as the recursive weighted ratio between the theoretical NZ_QP curve and the actual NZ_QP curve. The normalized θ typically fluctuates around 1.0. A θ value less than 1.0 can indicate that the actual encoded bit rate is greater than expected, implying that more complex motion content has been encountered. Conversely, a θ value greater than 1.0 indicates that the actual encoded bit rate is less than expected, implying that smoother motion content has been encountered. The ρ-domain deviation θ can therefore be used as an indicator for classifying video content into high, medium, medium-low and low motion complexity categories. A fast mode decision algorithm can be employed based on the motion complexity classification.
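The following Python sketch illustrates one way θ could be tracked and mapped to a motion complexity category. The exponential form of the recursion, the weight, the threshold values and the function names are all assumptions made for illustration; this description does not specify them.

```python
def update_theta(theta_prev, nz_expected, nz_actual, weight=0.75):
    """Recursively weighted ratio of expected to actual non-zero coefficients.

    Assumption: an exponentially weighted update; the weight is hypothetical.
    """
    ratio = nz_expected / max(nz_actual, 1)  # guard against division by zero
    return weight * theta_prev + (1.0 - weight) * ratio


def classify_motion(theta, thresholds=(0.8, 1.0, 1.2)):
    """Map theta to a motion complexity category (thresholds are hypothetical)."""
    low_t, mid_t, high_t = thresholds
    if theta < low_t:
        return "high"        # far more non-zero coefficients than expected
    if theta < mid_t:
        return "medium"
    if theta < high_t:
        return "medium-low"
    return "low"             # far fewer non-zero coefficients than expected


# A frame that produced twice the expected number of non-zero coefficients.
theta = update_theta(1.0, nz_expected=1000, nz_actual=2000)
category = classify_motion(theta)
```

Smaller θ maps to higher complexity here, matching the statement that θ below 1.0 implies more complex motion content.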

Example of a mode decision algorithm based on content classification

The example of Fig. 2 illustrates a mode decision algorithm based on content classification. The algorithm can be implemented in a combination of hardware and software, and can be embodied as instructions and data stored in a computer-readable medium. It will be appreciated that the instructions and data can be configured and/or adapted such that execution of the instructions by a processor causes the processor to perform the method described in Fig. 2.

In step 200, a table QP_P_T_n of off-line trained quantization parameters QP and P-points is created based on ρ-domain content classification, where T_n (T_n = 1, 2, 3 ... 51) represents the different motion complexity categories. If it is determined in step 202 that the current frame belongs to the first 5 frames of the video sequence, step 204 is performed next; otherwise, step 203 is performed next. In step 204, the motion complexity index T_n is initialized based on the initial QP and complexity information, and the P-point can be found from the QP_P_T_n table. Step 206 can then be performed.

If it is determined in step 202 that the current frame does not belong to the first 5 frames of the video sequence, the NZ_QP deviation θ is calculated in step 203 based on the NZ and QP information of the encoded frame. In step 205, the motion complexity index T_n is recalculated based on the deviation θ. Before step 206 is performed, a table lookup can be carried out on the QP_P_T_n table, based on the weighted QP value of the previous frame and the content classification index T_n, to find the P-point of the current frame.

In step 206, the deviation τ is calculated with respect to the distortion D based on the tangent relationship between τ and D, the distribution frequency of D, and the position of the P-point. A mathematical model can be established, as a function of the P-point, D and QP for each motion complexity category, to represent the cost deviation τ of each MB. An example of QP_P_T_n is shown in Table 1, as follows:

// The values listed in the table are relative values.

// Based on QP and the content classification index Tn, the P-point can be obtained from MD_P_TABLE.

Table 1: QP_P_Tn table

In step 208, a mode decision can be made for each MB of the current frame. The inter mode rate distortion cost J can be replaced by D, as shown in equation (2), and the intra mode cost J can be replaced by D+τ, where τ is derived from the experimental model as described in step 206. The winning mode can be selected as the mode that produces the minimum mode cost J. This process is typically repeated until it is determined in step 210 that encoding of the current frame is complete.
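The per-macroblock decision of step 208 can be sketched as follows. The trained deviation model is not reproduced in this text, so `placeholder_tau` below is a purely hypothetical stand-in that only mimics the qualitative behavior described for Fig. 1 (positive below the P-point, zero at it, negative above it); the choice of the intra distortion as its input, the scale factor and all names are assumptions.

```python
import math


def placeholder_tau(d_intra, p_point, scale=50.0):
    """Hypothetical stand-in for the off-line trained deviation model.

    Assumes p_point > 0. Positive when D is below P, zero at D == P,
    negative when D is above P.
    """
    x = (p_point - d_intra) / p_point
    x = max(-1.0, min(1.0, x))                # keep tan() away from its poles
    return scale * math.tan(x * math.pi / 4)


def decide_mode(d_intra, d_inter, p_point, tau_model=placeholder_tau):
    """Apply equation (2): J_inter = D_inter, J_intra = D_intra + tau."""
    j_inter = d_inter
    j_intra = d_intra + tau_model(d_intra, p_point)
    return "intra" if j_intra < j_inter else "inter"


# Low-distortion background MB: intra SAD is smaller, but tau penalizes it.
background = decide_mode(d_intra=20, d_inter=30, p_point=100)
# High-distortion MB above the P-point: tau is negative and favors intra.
busy = decide_mode(d_intra=150, d_inter=160, p_point=100)
```

In the first case a decision on distortion alone would pick intra mode, while the τ term steers the choice to inter mode, which is the background-MB situation the description sets out to avoid.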

In certain embodiments, the mode decision algorithm, the QP_P_T_n table and the deviation model are built off-line from experimental results. The motion classification index T_n and the associated methods are described in the related concurrently filed application entitled "ρ-Domain Metric θ and Its Applications". Compared with conventional systems, which are computationally expensive and usually involve dual-pass encoding mode decision algorithms, the mode decision algorithms, systems and methods based on video classification described herein can provide a highly cost-efficient, fast and robust alternative. In certain embodiments of the invention, a fast table lookup method is used to obtain the P-point value. Based on the P-point, QP and content classification index T_n, the MB cost deviation τ can be obtained from the selected experimental model. The mode decision can then be made efficiently by inserting τ into equation (2).

System description

Turning now to Fig. 3, certain embodiments of the invention employ a processing system that includes at least one computing system 30 deployed to perform certain of the steps described above. Computing system 30 may be a commercially available system that executes a commercially available operating system such as Microsoft Windows, UNIX or a variant thereof, Linux, a real-time operating system and/or a proprietary operating system. The architecture of the computing system may be adapted, configured and/or designed for integration in the processing system, or for embedding in one or more of an image capture system, a communications device and/or a graphics processing system. In one example, computing system 30 comprises a bus 302 and/or other mechanisms for communicating between processors, whether those processors are integral to computing system 30 (e.g. 304, 305) or located in different, perhaps physically separated, computing systems 300. Typically, processor 304 and/or 305 comprises a CISC or RISC computing processor and/or one or more digital signal processors. In certain embodiments, processor 304 and/or 305 may be embodied in a custom device and/or may operate as a configurable sequencer. Device drivers 303 may provide output signals used to control internal and external components and to communicate between processors 304 and 305.

Computing system 30 also typically comprises memory 306, which may include one or more of random access memory ("RAM"), static memory, cache, flash memory and any other suitable type of storage device that can be coupled to bus 302. Memory 306 can be used for storing instructions and data that can cause one or more of processors 304 and 305 to perform a desired process. Main memory 306 may be used for storing transient and/or temporary data, such as variables and intermediate information generated and/or used during execution of the instructions by processor 304 or 305. Computing system 30 also typically comprises non-volatile storage such as read-only memory ("ROM") 308, flash memory, memory cards or the like; non-volatile storage may be connected to bus 302, but may equally be connected using a high-speed universal serial bus (USB), Firewire or other such bus that is coupled to bus 302. Non-volatile storage can be used for storing configuration and other information, including instructions executed by processors 304 and/or 305. Non-volatile storage may also include mass storage device 310, such as a magnetic disk, optical disk or flash disk, which may be directly or indirectly coupled to bus 302 and used for storing instructions to be executed by processors 304 and/or 305, as well as other information.

In certain embodiments, computing system 30 may be communicatively coupled to a display system 312, such as an LCD flat panel display, including touch panel displays, electroluminescent displays, plasma displays, cathode ray tubes or other display devices that can be configured and adapted to receive and display information to a user of computing system 30. Typically, device drivers 303 can include a display driver, graphics adapter and/or other modules that maintain a digital representation of a display and convert the digital representation to a signal for driving display system 312. Display system 312 may also include logic or software to generate a display from a signal provided by system 300. In that regard, display 312 may be provided as a remote terminal or in a session on a different computing system 30. An input device 314 is generally provided locally or through a remote system and typically provides for alphanumeric input as well as cursor control 316 input, such as a mouse, a trackball, etc. It will be appreciated that input and output can be provided to a wireless device such as a PDA, a tablet computer or another system suitably equipped to display images and provide user input.

According to one embodiment of the invention, certain portions of the described invention may be performed by computing system 30. Processor 304 executes one or more sequences of instructions. For example, such instructions may be stored in main memory 306 after being received from a computer-readable medium such as storage device 310. Execution of the sequences of instructions contained in main memory 306 causes processor 304 to perform process steps according to certain aspects of the invention. In certain embodiments, functionality may be provided by embedded computing systems that perform dedicated functions, where the embedded systems employ a customized combination of hardware and software to perform a set of predefined tasks. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.

The term "computer-readable medium" is used to define any medium that can store and provide instructions and other data to processor 304 and/or 305, particularly where the instructions are to be executed by processor 304 and/or 305 and/or other peripherals of the processing system. Such media can include non-volatile storage, volatile storage and transmission media. Non-volatile storage may be embodied on media such as optical or magnetic disks, including DVD, CD-ROM and Blu-ray discs. Storage may be provided locally and in physical proximity to processors 304 and 305, or remotely, typically by use of a network connection. Non-volatile storage may be removable from computing system 30, as in the example of Blu-ray, DVD or CD storage, or memory cards or sticks that can be easily connected to or disconnected from a computer using a standard interface, including USB. Thus, computer-readable media can include floppy disks, flexible disks, hard disks, magnetic tape, any other magnetic medium, CD-ROMs, DVDs, Blu-ray discs and other optical media, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH/EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.

Transmission media can be used to connect elements of the processing system and/or components of computing system 30. Such media can include twisted pair wiring, coaxial cables, copper wire and fiber optics. Transmission media can also include wireless media such as radio, acoustic and light waves. In particular, radio frequency (RF), fiber optic and infrared (IR) data communications may be used.

Various forms of computer-readable media may participate in providing instructions and data for execution by processor 304 and/or 305. For example, the instructions may initially be retrieved from a magnetic disk of a remote computer and transmitted over a network or modem to computing system 30. The instructions may optionally be stored in a different storage, or a different part of storage, prior to or during execution.

Computing system 30 may include a communication interface 318 that provides two-way data communication over a network 320, which can include a local area network 322, a wide area network or some combination of the two. For example, an integrated services digital network (ISDN) may be used in combination with a local area network (LAN). In another example, a LAN may include a wireless link. Network link 320 typically provides data communication through one or more networks to other data devices. For example, network link 320 may provide a connection through local area network 322 to a host computer 324 or to a wide area network such as the Internet 328. Local area network 322 and Internet 328 may both use electrical, electromagnetic or optical signals that carry digital data streams.

Computing system 30 can use one or more networks to send messages and data, including program code and other information. In the Internet example, a server 330 might transmit requested code for an application program through Internet 328, and a downloaded application that provides or augments functional modules such as those described in the examples above may be received in response. The received code may be executed by processor 304 and/or 305.

Additional descriptions of certain aspects of the invention

The foregoing descriptions of the invention are intended to be illustrative and not limiting. For example, those skilled in the art will appreciate that the invention can be practiced with various combinations of the functionalities and capabilities described above, and can include fewer or additional components than described above. Certain additional aspects and features of the invention are set forth below, and can be obtained using the functionalities and components described in more detail above, as will be appreciated by those skilled in the art after being taught by the present disclosure.

Certain embodiments of the invention provide video encoder systems and methods. In certain of these embodiments, the encoder system employs content classification. Certain of these embodiments comprise maintaining one or more tables relating quantization parameters and P-points for a frame of video. In certain of these embodiments, the frame comprises one or more macroblocks. Certain of these embodiments comprise calculating a deviation representative of a difference between original and decoded versions of a macroblock. Certain of these embodiments comprise calculating a deviation representative of a distribution frequency of distortion values. Certain of these embodiments comprise calculating a deviation representative of the position of a P-point. In certain of these embodiments, the P-point corresponds to a distortion value associated with a minimum rate difference between encoding modes for the macroblock. Certain of these embodiments comprise updating a motion complexity index using a quantization parameter and a number of non-zero coefficients of the encoded frame. Certain of these embodiments comprise using the motion complexity index to reference mode information maintained in the one or more tables in order to select an encoding mode for the macroblock.

In certain of these embodiments, the selected mode produces a lowest cost encoding. In certain of these embodiments, the deviation comprises a weighted difference between estimated distortion and measured distortion for a selected quantization parameter value. In certain of these embodiments, the deviation is normalized. In certain of these embodiments, the deviation representative of the difference between original and decoded versions of the macroblock is calculated based on a tangent relationship between distortion and the rate difference between the encoding modes. In certain of these embodiments, each P-point corresponds to a distortion value associated with no rate difference between encoding modes for the macroblock. In certain of these embodiments, the motion complexity index is initialized while an initial number of frames in a video sequence is received. In certain of these embodiments, the initial number of frames comprises at least 5 frames of the video sequence.

Certain of these embodiments comprise modeling the deviation cost of each macroblock, for each motion complexity category, as a function of the P-point, distortion and quantization parameter. Certain of these embodiments comprise looking up the P-point of the current frame using a weighted quantization parameter value of the previous frame. In certain of these embodiments, the encoding modes include an inter prediction mode and an intra prediction mode. In certain of these embodiments, the encoding modes are defined by the H.264 video standard.

Certain embodiments of the invention provide a video encoder. Certain of these embodiments comprise a plurality of tables relating encoding modes and quantization parameters for a frame of video. Certain of these embodiments comprise a content classifier that selects, from the plurality of tables, an encoding mode for a macroblock of the frame of video using a deviation representative of a difference between original and decoded versions of the macroblock. Certain of these embodiments comprise a processor that maintains a motion complexity index using non-zero coefficients and quantization parameters of the encoded frame. In certain of these embodiments, the motion complexity index can be used to select the encoding mode based on the motion complexity of the frame. In certain of these embodiments, the selected mode produces a lowest cost encoding for the frame. In certain of these embodiments, the selected mode produces a lowest cost encoding for the macroblock. In certain of these embodiments, each P-point corresponds to a distortion value associated with a minimum rate difference between encoding modes for the macroblock.

Although the invention has been described with reference to specific exemplary embodiments, it will be evident to one of ordinary skill in the art that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims

1. A content classification method in a video encoder, comprising:

maintaining one or more tables relating quantization parameters and P-points for a frame of video, the frame comprising one or more macroblocks;

calculating a deviation representative of a difference between original and decoded versions of a macroblock, a distribution frequency of distortion values, and the position of a P-point;

updating a motion complexity index using a quantization parameter and a number of non-zero coefficients in the encoded frame; and

using the motion complexity index to reference mode information maintained in the one or more tables in order to select an encoding mode for the macroblock, wherein the selected mode produces a lowest cost encoding, and wherein each P-point corresponds to a distortion value associated with a minimum rate difference between encoding modes for the macroblock.

2. The method of claim 1, wherein the deviation comprises a weighted difference between estimated distortion and measured distortion for a selected quantization parameter value.

3. The method of claim 1 or claim 2, wherein the deviation is normalized.

4. The method of any one of claims 1-3, wherein the deviation representative of the difference between original and decoded versions of the macroblock is calculated based on a tangent relationship between distortion and the rate difference between the encoding modes.

5. The method of any one of claims 1-4, wherein each P-point corresponds to a distortion value associated with no rate difference between encoding modes for the macroblock.

6. The method of any one of claims 1-5, wherein the motion complexity index is initialized while an initial number of frames in a video sequence is received.

7. The method of claim 6, wherein the initial number of frames comprises at least 5 frames of the video sequence.

8. The method of any one of claims 1-7, further comprising modeling the deviation cost of each macroblock, for each motion complexity category, as a function of the P-point, distortion and quantization parameter.

9. The method of any one of claims 1-8, further comprising looking up the P-point of the current frame using a weighted quantization parameter value of the previous frame.

10. The method of any one of claims 1-9, wherein the encoding modes include an inter prediction mode and an intra prediction mode.

11. The method of any one of claims 1-10, wherein the encoding modes are defined by the H.264 video standard.

12. A video encoder, comprising:

a plurality of tables relating encoding modes and quantization parameters for a frame of video;

a content classifier that selects, from the plurality of tables, an encoding mode for a macroblock of the frame of video using a deviation representative of a difference between original and decoded versions of the macroblock; and

a processor that maintains a motion complexity index using non-zero coefficients and quantization parameters of the encoded frame, the motion complexity index being usable to select the encoding mode based on the motion complexity of the frame, wherein the selected mode produces a lowest cost encoding, and wherein each P-point corresponds to a distortion value associated with a minimum rate difference between encoding modes for the macroblock.