CN1235413C - Wavelet video encoding and decoding method based on motion estimation - Google Patents

Wavelet video encoding and decoding method based on motion estimation

Info

Publication number
CN1235413C
CN1235413C, CN03149504A
Authority
CN
China
Prior art keywords
frame
image
carried out
coding
motion vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CN 03149504
Other languages
Chinese (zh)
Other versions
CN1471321A (en)
Inventor
耿静
陈小敬
庞潼川
周闰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Datang Microelectronics Technology Co Ltd
Original Assignee
Datang Microelectronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Datang Microelectronics Technology Co Ltd filed Critical Datang Microelectronics Technology Co Ltd
Priority to CN 03149504 priority Critical patent/CN1235413C/en
Publication of CN1471321A publication Critical patent/CN1471321A/en
Application granted granted Critical
Publication of CN1235413C publication Critical patent/CN1235413C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Abstract

The present invention relates to a wavelet video encoding and decoding method based on motion estimation. At the encoding end, the image data are first converted into YUV format; the first frame is treated as an I frame by default and subsequent video frames are treated as PB frames by default; motion estimation is performed on the subsequent frames, and their types are readjusted adaptively according to the motion estimation result; according to the resulting frame types, I frames are intra-coded and PB frames are inter-coded. Correspondingly, at the decoding end the first frame is treated as an I frame by default, and the coding type of each subsequent frame is obtained from the information in the encoded bit stream to determine whether it is an I frame or a PB frame; I frames are then intra-decoded and PB frames are inter-decoded; finally, the restored image data are converted into the required format. By judging the coding type adaptively, the present invention makes the coding method flexible and improves coding efficiency, and can achieve high-quality, reliable image compression at low or very low transmission rates.

Description

Wavelet video encoding and decoding method based on motion estimation
Technical field
The present invention relates to a video image encoding and decoding method, and in particular to a wavelet video encoding and decoding method based on motion estimation.
Background technology
Most current multimedia image compression standards are based on the traditional discrete cosine transform (DCT), including the JPEG standard for still images, the MPEG-1/2/4 series of standards for moving images, and the H.261/H.263 standards used for low bit-rate video communication.
H.263 is a video coding standard for low bit-rate real-time transmission approved in 1995 by the ITU-T (International Telecommunication Union - Telecommunication Standardization Sector). It was originally designed for low-bandwidth video applications below 64 kbps, such as video conferencing and video telephony. The input video frame formats of H.263 are QCIF (Quarter Common Intermediate Format, 176 × 144), CIF (Common Intermediate Format, 352 × 288) and so on. Each video frame is divided into macroblocks (MB), and each macroblock consists of 4 Y luminance blocks, 1 Cb chrominance block and 1 Cr chrominance block; the block size is 8 × 8. H.263 compresses a video frame in units of macroblocks.
H.263 uses the discrete cosine transform (DCT) to reduce spatial redundancy, and motion estimation and motion compensation to reduce temporal redundancy. H.263 has two coding modes: the Intra mode, i.e. intra-frame coding, which produces a key frame (I frame); and the Inter mode, i.e. inter-frame coding, which produces a non-key frame (P frame). A B frame is predicted from the two temporally nearest I or P frames before and after it, and is itself not used as a reference picture for any other frame.
Motion estimation means searching the reference frame for the image block most similar to the current frame block, i.e. the best matching block; the search result is expressed as a motion vector. There are two common matching measures: the sum of squared differences (SSD) and the sum of absolute differences (SAD). Each has its own advantages and limitations in particular environments; because SAD is faster to compute (it involves no multiplication), MPEG uses SAD for motion estimation. Motion compensation means reconstructing the current frame from the reference frame and the computed motion vector; the difference between the reconstructed frame and the current frame is compressed and encoded as the compensation residual of the current frame. The two work together to achieve compression.
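For illustration only, a minimal full-search block-matching routine using SAD could be sketched as follows in Python; the NumPy dependency, the 16 × 16 block size and the ±8 search range are assumptions, not values taken from the patent text.

```python
import numpy as np

def sad(block_a: np.ndarray, block_b: np.ndarray) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)).sum())

def full_search(cur: np.ndarray, ref: np.ndarray, y: int, x: int,
                block: int = 16, search: int = 8):
    """Full-search motion estimation for one block of the current frame.

    Returns (best_dy, best_dx, best_sad) for the block whose top-left corner
    is (y, x) in the current frame, searching +/- `search` pixels in the
    reference frame.
    """
    cur_block = cur[y:y + block, x:x + block]
    best = (0, 0, sad(cur_block, ref[y:y + block, x:x + block]))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + block > ref.shape[0] or rx + block > ref.shape[1]:
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur_block, ref[ry:ry + block, rx:rx + block])
            if cost < best[2]:
                best = (dy, dx, cost)
    return best
```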
Research on motion estimation algorithms proceeds along two lines: fast search algorithms and block matching criteria. The simplest search algorithm is the full search (FS), which is highly accurate but computationally very expensive. To speed up the search while preserving accuracy, many fast search algorithms have been proposed: the three-step search (TSS) and improved variants of it, the two-dimensional logarithmic search (LOGS), the cross search (CS), the four-step search (4SS), the prediction search algorithm (PSA), the diamond search (DS), and so on. The diamond search is one of the fast search algorithms with the best overall performance so far. The block matching criterion decides when the best matching block has been found and thus when the search stops. Traditional criteria include the mean absolute error (MAE), the cross-correlation function (CCF), the mean squared error (MSE) and the minimum maximum error (MME). Because these conventional criteria do not take the visual characteristics of the human eye into account, their results can differ considerably from human perception. The block matching criterion actually adopted by H.263 is SAD (sum of absolute differences), a substitute for MSE; SAD replaces the squaring of MSE with absolute values, significantly reducing the amount of computation and thereby speeding up the search.
Current video coding schemes based on motion estimation usually work as follows. First, each frame is format-converted into its Y, U and V components. The first frame is an I frame by default and is intra-coded; the images in the following sequence are forcibly designated as I, P or B frames and are intra- or inter-coded accordingly, where I frames and P frames may serve as reference frames for inter coding. Intra coding uses a blockwise DCT followed by quantization and entropy coding, and the coded data are put into the bit stream. Inter coding consists of motion estimation, motion compensation, motion vector coding and residual image coding. Motion estimation for inter coding is performed only on the Y component, and the resulting motion vectors are also applied to the U and V components. Motion estimation falls into two classes: P frames use unidirectional motion estimation and B frames use bidirectional motion estimation. Motion estimation divides the image into macroblocks and includes integer-pixel and half-pixel estimation; after motion estimation, macroblocks are classified as inter blocks or intra blocks. For an inter block, motion compensation is carried out from its reference frame and the current frame, and the residual is DCT-transformed, quantized and entropy-coded; an intra block is simply intra-coded directly. Motion vector coding uses differential Huffman coding.
At the decoding end, the first frame is an I frame by default and is intra-decoded; the images in the following sequence may be designated as I, P or B frames and are intra- or inter-decoded accordingly, where the restored I frames and P frames may serve as reference frames for inter decoding. Intra decoding decodes the input bit stream and obtains the restored frame data through inverse quantization and the blockwise inverse DCT. Inter decoding consists of motion vector decoding, residual image decoding and motion compensation; motion vector decoding uses Huffman decoding consistent with the encoding end. Inter decoding is performed macroblock by macroblock, and macroblocks are classified as inter blocks or intra blocks: an inter block undergoes inverse quantization and the blockwise inverse DCT and is then motion-compensated according to its reference frame and the decoded motion vector, while an intra block is simply intra-decoded directly. Finally, the restored image data are format-converted from YUV to RGB.
However, the compression efficiency of blockwise DCT compression is rather low: with acceptable restored image quality, the compression ratio is only about 1:20 to 1:40, and at high compression ratios blocking artifacts cause the image quality to deteriorate rapidly. H.261 and H.263 rely on the temporal and spatial redundancy of the video sequence to improve the compression ratio, but under low bit-rate conditions the frame rate of the video sequence drops well below 25 to 30 frames per second, so the temporal correlation of the sequence weakens and the coding efficiency of H.261 and H.263 inevitably decreases. Moreover, experience with the H.263 video telephony standard shows that the DCT is poorly suited to low bit-rate image transmission, and the blocking artifacts introduced by the DCT and by motion estimation are the main factors degrading image quality. The block structure of the DCT also leaves mosaic artifacts in the restored I frame image, reducing its restored quality and in turn affecting motion estimation when it is used as a reference frame.
In addition, existing schemes do not fully take into account how intense the motion in the image sequence is; the coding type of each image is forcibly specified, the coding mechanism is inflexible, and the restored quality of the image sequence suffers. The cost function of inter motion estimation is too simple to adjust the coding type of a frame adaptively according to the motion estimation result; at the same time, the Huffman coding of motion vectors is overly complex and the codeword lengths are redundant, so the coding efficiency is not high.
Summary of the invention
The technical problem to be solved by the present invention is to provide a wavelet video encoding and decoding method based on motion estimation that can achieve high-quality, reliable image compression at low or very low transmission rates.
The present invention provides a wavelet video coding method based on motion estimation: first it is determined that the image data are converted into YUV format; the first frame is an I frame by default, and subsequent video frames default to PB frames; motion estimation is performed on the subsequent video frames; according to the motion estimation result, the types of the subsequent video frames are readjusted adaptively; according to the frame type, I frames are intra-coded and PB frames are inter-coded.
Correspondingly, the present invention also provides a wavelet video decoding method based on motion estimation: first the first frame is an I frame by default, and the coding type of each subsequent video frame is obtained from the information in the encoded bit stream to determine whether it is an I frame or a PB frame; then I frames are intra-decoded and PB frames are inter-decoded; finally the restored image data are converted into the required data format.
By replacing the mechanical approach of traditional MPEG and H.263, which can only insert an I frame at a fixed interval, with an adaptive decision, the present invention makes the coding method more flexible, makes full use of the temporal correlation between a frame and the preceding frames, and improves coding efficiency. It can further exploit the global nature of the wavelet transform, improve the motion estimation of the PB frame mode, abandon the concept of intra blocks within a frame, and improve the Huffman code table as needed to further shorten the code length, so that the present invention achieves high-quality, reliable image compression at low or very low transmission rates.
Description of drawings
Fig. 1 is a flow chart of the wavelet video coding method based on motion estimation of the present invention;
Fig. 2 is a flow chart of an embodiment of performing motion estimation on video frames according to the present invention;
Fig. 3 is a schematic diagram of the LIST two-dimensional array used in a preferred embodiment of the present invention;
Fig. 4 is a schematic diagram of the SDL two-dimensional array used in a preferred embodiment of the present invention;
Fig. 5 is a schematic diagram of the SPIHT coding initialization used in a preferred embodiment of the present invention;
Fig. 6 is a flow chart of the wavelet video decoding method based on motion estimation of the present invention.
Embodiment
The present invention provides a wavelet video coding method based on motion estimation. As shown in Fig. 1, it is first determined that the image data are converted into YUV format (step 101); in general, data in RGB format are converted into YUV format, but if the input is already in YUV format this step can be omitted. The first frame is an I frame by default, and subsequent video frames default to PB frames (step 102). Motion estimation is performed on the subsequent video frames (step 103). According to the motion estimation result, the types of the subsequent video frames are readjusted adaptively (step 104). According to the frame type, I frames are intra-coded and PB frames are inter-coded (step 105), where I frames and P frames may serve as reference frames for inter coding.
To further improve the quality of the restored image sequence, one frame among the subsequent video frames may also be forcibly designated as an I frame at regular intervals and intra-coded.
The image data format conversion, i.e. the mutual conversion between RGB and YUV, is a floating-point operation, and performing floating-point operations on an entire image is quite time-consuming; when implemented in hardware, the floating-point operations can be replaced by integer operations.
The integer RGB-to-YUV conversion formulas adopted in the present invention are:
Y = [(9798R + 19235G + 3736B) / 2^15]
U = [(-4784R - 9437G + 14221B) / 2^15] + 128
V = [(20218R - 16941G - 3277B) / 2^15] + 128
The integer YUV-to-RGB conversion formulas are:
U -= 128
V -= 128
R = [(32768Y + 31326U + 20348V) / 2^15]
G = [(32768Y - 8912U - 21200V) / 2^15]
B = [(32768Y - 36244U - 55804V) / 2^15]
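For illustration only, the integer conversion above could be sketched in Python as follows; the NumPy dependency, the use of floor division for the bracket [.], and the absence of clipping to the 0~255 range are assumptions of this sketch.

```python
import numpy as np

SCALE = 1 << 15  # the formulas above use coefficients scaled by 2^15

def rgb_to_yuv_int(r, g, b):
    """Integer RGB -> YUV following the formulas above (floor division assumed for [.])."""
    r, g, b = (np.asarray(c, dtype=np.int32) for c in (r, g, b))
    y = (9798 * r + 19235 * g + 3736 * b) // SCALE
    u = (-4784 * r - 9437 * g + 14221 * b) // SCALE + 128
    v = (20218 * r - 16941 * g - 3277 * b) // SCALE + 128
    return y, u, v

def yuv_to_rgb_int(y, u, v):
    """Integer YUV -> RGB following the formulas above; no clipping to 0..255 is applied."""
    y, u, v = (np.asarray(c, dtype=np.int32) for c in (y, u, v))
    u = u - 128
    v = v - 128
    r = (32768 * y + 31326 * u + 20348 * v) // SCALE
    g = (32768 * y - 8912 * u - 21200 * v) // SCALE
    b = (32768 * y - 36244 * u - 55804 * v) // SCALE
    return r, g, b
```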
As shown in Fig. 2, when motion estimation is performed on a video frame, the current frame can be divided into macroblocks of size M × N and a SAD threshold is preset (step 201). Each macroblock is then searched within the range of the reference frame, and if the SAD value between the reference frame and the original frame is greater than the SAD threshold, motion estimation of that macroblock is considered to have failed (step 202). At the same time, the number of macroblocks of the current frame for which motion estimation has failed is accumulated, and a failure count threshold is preset (step 203). The accumulated number of failed macroblocks of the current frame is compared with the failure count threshold (step 204): if the accumulated number of failed macroblocks is greater than the failure count threshold, the current frame is adjusted to an I frame; otherwise the frame is judged to be a P frame and is inter-coded.
For example, adaptive 16 × 16 or 8 × 8 searches can be used for motion estimation. Let the SAD threshold be SAD_threshold, the failed-macroblock count threshold be distortion_num_threshold, the SAD value of the current macroblock be cur_SAD, the current number of failed macroblocks be big_distortion_num, and the total number of macroblocks in the current frame be block_num. The first cost function decides whether motion estimation of the current macroblock has failed, with the test cur_SAD > SAD_threshold; if the test is satisfied, big_distortion_num is incremented by 1. The second cost function gives the threshold distortion_num_threshold, computed as distortion_num_threshold = block_num / K, where K can range from 8 to 16. The third cost function decides whether motion estimation of the current frame has failed, with the test big_distortion_num > distortion_num_threshold.
Taking 16 × 16 macroblocks as an example, a threshold SAD_threshold = 2000~5000 may be set to judge whether motion estimation of a 16 × 16 macroblock has succeeded: after motion estimation of the macroblock, if the SAD value between the reference frame and the original frame is greater than SAD_threshold, motion estimation of that macroblock is considered to have failed. The number of failed macroblocks is accumulated, and the threshold distortion_num_threshold = (number of 16 × 16 macroblocks in the image) / 8~16 is set. If the accumulated number of failed macroblocks of a frame is greater than distortion_num_threshold, the frame is judged to be an I frame and is intra-coded; otherwise the frame is judged to be a P frame and is inter-coded.
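A minimal sketch of this adaptive frame-type decision, using the variable names above (for illustration only; the default values are merely picked from the example ranges in the text):

```python
def decide_frame_type(macroblock_sads, sad_threshold=3000, k=12):
    """Decide the I/P type of a frame from per-macroblock motion-estimation SAD values.

    macroblock_sads: list of best SAD values, one per macroblock of the frame.
    sad_threshold:   SAD_threshold in the text (example range 2000~5000).
    k:               divisor K in distortion_num_threshold = block_num / K (range 8~16).
    """
    block_num = len(macroblock_sads)
    # First cost function: count macroblocks whose motion estimation failed.
    big_distortion_num = sum(1 for cur_sad in macroblock_sads if cur_sad > sad_threshold)
    # Second cost function: threshold on the number of failed macroblocks.
    distortion_num_threshold = block_num / k
    # Third cost function: if too many macroblocks failed, re-code the frame as an I frame.
    return 'I' if big_distortion_num > distortion_num_threshold else 'P'
```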
To further improve the efficiency of video frame encoding and decoding, intra coding can use the two-dimensional discrete wavelet transform (DWT), followed by quantization, zerotree coding and arithmetic coding, and the coded data form the bit stream. Inter coding consists of motion estimation, motion compensation, motion vector coding and residual image coding: from the prediction obtained by motion estimation, motion compensation is carried out according to the reference frame and the current frame, the residual image undergoes the two-dimensional discrete wavelet transform, and quantization, zerotree coding and arithmetic coding are then applied; the coded data form the bit stream.
Specifically, the intra coding of an I frame comprises the following steps:
(1) performing the wavelet transform on the raw image data after format conversion;
(2) performing zerotree coding and arithmetic coding on the coefficients produced by the wavelet transform;
(3) inserting control information into the data bit stream generated after coding, and outputting it to the channel frame by frame.
The inter coding of a P frame comprises the following steps:
(1) according to the corresponding motion vector, translating the corresponding macroblock in the reference frame to form the motion-compensated image, coding the motion vector and forming the bit stream;
(2) subtracting the motion-compensated image from the current frame image to obtain the residual image;
(3) intra-coding the residual image.
The inter coding of a B frame comprises:
(1) according to the motion vectors of the two P frames, or I frame and P frame, before and after it, applying the corresponding motion compensation to obtain a motion vector increment within the range (-1, +1), coding the motion vector increment and forming the bit stream;
(2) subtracting the motion-compensated image from the current frame image to obtain the residual image;
(3) intra-coding the Y component of the residual image.
The intra coding of the residual image comprises the following steps:
(1) performing the wavelet transform on the residual image data;
(2) performing zerotree coding and arithmetic coding on the coefficients produced by the wavelet transform, and forming the bit stream;
(3) inserting control information into the data bit stream generated after coding, and outputting it to the channel frame by frame.
Taking the four-level two-dimensional Daubechies (5,3) wavelet as an example, the wavelet transform is divided into a number of one-dimensional integer wavelet row transforms and one-dimensional integer wavelet column transforms. The algorithm of the one-dimensional integer wavelet row and column transforms is as follows:
s_l^(0) = x_{2l}
d_l^(0) = x_{2l+1}
d_l^(1) = d_l^(0) + α · (s_l^(0) + s_{l+1}^(0))
s_l^(1) = s_l^(0) + β · (d_l^(1) + d_{l-1}^(1))
s_l = k · s_l^(1)
d_l = d_l^(1) / k
where s_l^(0) are the even samples of the one-dimensional image signal x, d_l^(0) are the odd samples of the one-dimensional image signal x, s_l are the transformed even samples (low-frequency coefficients) and d_l the transformed odd samples (high-frequency coefficients); α and β are the wavelet transform coefficients and k is the scaling coefficient (α = -0.5, β = 0.25, k = 1.2~1.5).
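For illustration only, one level of this lifting scheme applied to a one-dimensional signal could be sketched as follows; the symmetric boundary handling and the even-length input are assumptions, and the rounding used by a true integer wavelet transform is omitted for clarity.

```python
def lift_1d_forward(x, alpha=-0.5, beta=0.25, k=1.2):
    """One level of the 1-D lifting transform described above.

    Returns (s, d): low-frequency samples s_l and high-frequency samples d_l.
    Assumes an even-length input; boundaries use a simple symmetric extension.
    """
    s0 = x[0::2]                      # even samples s_l^(0)
    d0 = x[1::2]                      # odd samples  d_l^(0)
    n = len(d0)
    # Prediction step: d_l^(1) = d_l^(0) + alpha * (s_l^(0) + s_{l+1}^(0))
    d1 = [d0[l] + alpha * (s0[l] + s0[min(l + 1, n - 1)]) for l in range(n)]
    # Update step: s_l^(1) = s_l^(0) + beta * (d_l^(1) + d_{l-1}^(1))
    s1 = [s0[l] + beta * (d1[l] + d1[max(l - 1, 0)]) for l in range(n)]
    # Scaling step: s_l = k * s_l^(1), d_l = d_l^(1) / k
    return [k * v for v in s1], [v / k for v in d1]
```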
For the zerotree coding, SPIHT coding can preferably be used; at the same time, to further improve coding and decoding efficiency, the SPIHT coding can be further improved. For example, the original list structures in SPIHT coding are replaced with two-dimensional arrays; during the initialization of these arrays the sign of each wavelet coefficient is separated from its absolute value, and the highest bit plane of the wavelet coefficients is computed; the maximum descendant value of every coefficient that has descendants is computed; and the arrays representing the three states, the list of insignificant pixels LIP, the list of significant pixels LSP and the list of insignificant sets LIS, are initialized and represented by a single unified two-dimensional state table LIST. More preferably, to further improve the compression ratio, one more level can be added to the wavelet coefficient tree structure.
As shown in Fig. 3, the LIST two-dimensional array represents the states of two adjacent coefficients of the original coefficient matrix with one byte, so its size is half that of the original coefficient matrix; the LIS state is represented with two bits and the LIP/LSP state with two bits, and LSP is empty at initialization. A1A2B1B2 represents the state of the i-th coefficient and C1C2D1D2 the state of the (i+1)-th coefficient:
A1A2 (LIS state): 01 = type A LIS; 10 = type B LIS; 00 = ignored
B1B2 (LIP/LSP state): 01 = LIP; 10 = LSP; 00 = ignored
As shown in Fig. 4, one byte of the SDL two-dimensional array represents the state of a coefficient of the original coefficient matrix that has descendants, storing the highest bit-plane values of SD and SL, where SD denotes all descendants including the direct offspring and SL denotes the indirect descendants excluding the direct offspring. One byte can thus represent the highest SD and SL bit-plane values of one LIS coefficient, with the high four bits representing the SLmax bit-plane value and the low four bits representing the SDmax bit-plane value.
In the two-dimensional array SDL that stores the maximum descendant values of all coefficients having descendants, one byte can represent the maximum descendant values of two adjacent coefficients; a concrete implementation represents each maximum value by the four-bit number of its highest bit plane instead of the value itself. For example, if the maximum descendant values of two adjacent coefficients are max1 = 1000 and max2 = 500, then 2^9 < max1 < 2^10 and 2^8 < max2 < 2^9, so the highest bit plane of max1 is 9 (binary 1001) and that of max2 is 8 (binary 1000), and the byte 10001001 (binary) is stored in SDL. Coding starts from the highest bit plane.
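For illustration only, the packing of two highest bit planes into one SDL byte could be sketched as follows; the nibble order is inferred from the single example above and is therefore an assumption.

```python
def highest_bit_plane(value: int) -> int:
    """Index of the highest set bit of a positive value, e.g. 1000 -> 9, 500 -> 8."""
    return value.bit_length() - 1

def pack_sdl_byte(max1: int, max2: int) -> int:
    """Pack the highest bit planes of two adjacent maximum descendant values into one byte,
    following the nibble order of the example above (the second value's plane in the high nibble)."""
    return (highest_bit_plane(max2) << 4) | highest_bit_plane(max1)

# Example from the text: max1 = 1000, max2 = 500 -> binary 10001001.
assert pack_sdl_byte(1000, 500) == 0b10001001
```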
In a preferred version of the initialization procedure, in order to reduce the amount of LIP and LIS state data in the initialization of the array LIST and further improve the compression ratio, the lowest-frequency coefficients of the wavelet transform can be further divided spatially into four parts and used as the initialization data of LIP and LIS; that is, one more level is added to the wavelet coefficient tree structure. SPIHT is a zerotree-based wavelet coefficient coding method, and the essence of zerotree coding is to exploit the self-similarity between the wavelet transform coefficients of an image at different scales, so that the values of all descendants can be represented with very few zerotree root bits. This more preferred embodiment of the present invention simply continues to divide the lowest-frequency coefficients, reducing the initialization data to a quarter of the original and raising the position of the zerotree root by one level, so that a single zerotree represents more coefficients and the compression efficiency of the algorithm is improved. Although this simple division does not exploit the self-similarity between sub-bands, it does exploit the similarity among the lowest-frequency coefficients, and it increases the compression efficiency without increasing the complexity of the algorithm. For example, if a frame undergoes a four-level wavelet transform, the concrete initialization is as shown in Fig. 5.
If the wavelet transform described in the embodiments of the present invention is used for intra and inter encoding and decoding, it should be noted that the intra block (intra-block) of conventional motion estimation, a part that does not suit the properties of the wavelet, must be abandoned when motion prediction is carried out, and the concept of "intra" is extended to the whole frame. This is because in the conventional method, after integer-pixel motion estimation the encoder decides whether to use the INTER or the INTRA mode, using the formulas:
MB_mean = ( Σ_{i=1,j=1}^{16,16} original ) / N_c
A = Σ_{i=1,j=1}^{16,16} | original - MB_mean |
If A < (SAD_inter - 2*N_B), the INTRA mode is used for coding and the motion search need not continue; otherwise the INTER mode is used, and the half-pixel motion search then continues in the neighbourhood of V0. In the INTER mode the current macroblock is directly transformed with the discrete cosine transform (DCT). It can be seen that the reason INTRA macroblocks are singled out is mainly that the motion is too violent; given the characteristics of the DCT, in an INTRA macroblock the sum of the differences between each pixel value and the mean value of the macroblock is very small, which means the pixel values of the macroblock are very uniform and smooth, and thus well suited to the DCT. But the wavelet transform operates on the whole frame and cannot transform a local part independently. Therefore INTRA macroblock coding does not suit the characteristics of the wavelet transform.
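For illustration only, this conventional INTRA/INTER decision could be sketched as follows; the value N_B = 256 (the pixel count of a 16 × 16 macroblock) is an assumption, not a value stated in the text.

```python
import numpy as np

def choose_intra_or_inter(macroblock: np.ndarray, sad_inter: int, n_b: int = 256) -> str:
    """H.263-style mode decision described above: compare the deviation of the
    macroblock from its own mean (A) with the best inter SAD."""
    mb = macroblock.astype(np.int32)
    mb_mean = mb.sum() // mb.size                 # MB_mean
    a = int(np.abs(mb - mb_mean).sum())           # A = sum |original - MB_mean|
    return 'INTRA' if a < (sad_inter - 2 * n_b) else 'INTER'
```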
In a preferred embodiment of the present invention the concept of the macroblock is therefore abandoned for intra coding. That is to say, the coding of I frames and the coding of residual images abandon the macroblock concept, because the wavelet transform is a whole-frame operation. For moving images, however, blockwise motion estimation can still be used, so as to exploit the temporal correlation of the motion sequence and remove redundancy. At the same time, motion compensation can use the overlapped block motion compensation method (Overlapped Block Motion Compensation).
The present invention uses the motion estimation and compensation and the PB frame mode of ITU-T H.263 and MPEG-4, and in a preferred embodiment further combines them with the global nature of the wavelet transform, adapting the motion estimation and motion compensation algorithms of ITU-T H.263 and MPEG-4 and adaptively classifying the images into three types: I frames, P frames and B frames. The I frame serves as the reference frame and is intra compression coded. The specific practice of the PB frame is to jointly code two frames to be coded as one unit, in a sequence of the form IBBPBBPBBP...; the P frame image undergoes motion estimation, and according to the motion estimation result it is decided whether to apply intra or inter compression coding; if motion estimation fails, the frame is readjusted to an I frame and coded and decoded in intra mode. The B frame image, on the other hand, only undergoes half-pixel motion estimation in a very small range (generally [-1, +1]) on the basis of the motion vectors of the existing P frames before and after it.
The search window for motion estimation of P frame images can be chosen as [-32, +32] rather than the [-48, +48] of H.263, reducing the search time and the range of the motion vectors. Motion vector coding refers to the differencing and table-lookup Huffman coding in the standard, but the table used for coding is changed as needed, and the code lengths are much shorter than those of the standard. To shorten the codeword length of the coding, a series of measures are taken:
1) The half-pixel motion vectors are uniformly rewritten as positive. For example, if the integer-pixel motion vector is (5, 6) and the half-pixel motion vector is (-1, -1), the integer-pixel motion vector is rewritten as (4, 5) and the half-pixel motion vector as (+1, +1); in this way the half-pixel motion vector need not be put into the Huffman table and is coded separately with one bit.
2) Since the search window range is [-32, +32], the range of the motion vectors is [-8, +8]; considering the rewriting in measure 1), the range of the motion vectors becomes [-9, +8], but the boundary is rounded up so that the range is still constrained to [-8, +8]; the range of the Huffman output code table is then [-16, +16].
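For illustration only, measures 1) and 2) could be sketched per motion-vector component as follows; the exact boundary treatment of measure 2) is an assumption of this sketch.

```python
def normalize_mv_component(full_pel: int, half_pel: int):
    """Apply measures 1) and 2) above to one motion-vector component:
    rewrite a negative half-pel part as positive, then clamp the full-pel
    part to the stated [-8, +8] range by rounding up at the lower boundary."""
    if half_pel < 0:
        full_pel -= 1          # e.g. 5 full pel with -1 half pel becomes 4 full pel ...
        half_pel = +1          # ... with +1 half pel; the displacement of 4.5 is unchanged
    full_pel = max(-8, full_pel)   # boundary handling assumed from measure 2)
    return full_pel, half_pel

# Example from the text: (5, -1) -> (4, +1).
assert normalize_mv_component(5, -1) == (4, 1)
```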
The table below is the Huffman output code table after motion vector prediction. Compared with the H.263 code table, because it contains only integer-pixel motion vectors, the codeword length of the coding is significantly reduced, while the quality of the restored image is essentially unaffected.
Huffman output code table after motion vector prediction
MVD    Codeword
0      0
+1     10s
+2     110s
+3     1110s
+4     11110s
+5     111110s
+6     1111110s
+7     11111110s
+8     111111110s
+9     1111111110s
+10    11111111110s
+11    111111111110s
+12    1111111111110s
+13    11111111111110s
+14    111111111111110s
+15    1111111111111110s
+16    11111111111111110s
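For illustration only, a codeword of this table could be produced as follows; the unary-plus-terminator structure is read off the table above, and the convention that the sign bit s is 0 for a positive difference is an assumption of this sketch.

```python
def encode_mvd(mvd: int) -> str:
    """Encode one motion vector difference with the code table above:
    |mvd| ones, a terminating 0, then a sign bit ('0' for +, '1' for -);
    mvd == 0 is coded as a single '0' with no sign bit."""
    if mvd == 0:
        return '0'
    sign_bit = '0' if mvd > 0 else '1'
    return '1' * abs(mvd) + '0' + sign_bit

# Examples: +1 -> '100', -3 -> '11101' (cf. the rows '10s' and '1110s' of the table).
print(encode_mvd(1), encode_mvd(-3))
```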
In an embodiment of the present invention, the bit stream produced from the original input image of an I frame, or from the residual image data of a P frame, after the wavelet transform and SPIHT coding can be arithmetic coded with an adaptive binary QM coder. The QM coder originates from the Q coder of IBM Corporation and was developed from the early work of Langdon, Rissanen and others. Like other arithmetic coding methods, the QM coder can in principle be clearly divided into two parts: binary coding and the statistical model. The QM coder is a binary coding method, which means that for a single context it can only code the two symbols 0 and 1, for example the symbol stream 100011111001011011010...
Correspondingly, the present invention also provides a wavelet video decoding method based on motion estimation, which is essentially the inverse of the coding process. As shown in Fig. 6, the first frame is an I frame by default, and the coding type of each subsequent video frame is obtained from the information in the encoded bit stream to determine whether it is an I frame or a PB frame (step 601); then I frames are intra-decoded and PB frames are inter-decoded (step 602); finally the restored image data are converted into the required format (step 603), in general from YUV format to RGB format, but if only YUV data are needed no conversion is necessary.
Corresponding to the wavelet transform and zerotree coding process of the encoding side of the present invention, the intra decoding at the decoding side can apply arithmetic decoding, zerotree decoding and inverse quantization to the input bit stream, and then the two-dimensional inverse discrete wavelet transform, to obtain the restored data. Inter decoding consists of motion vector decoding, residual image decoding and motion compensation: for the residual image of inter decoding, arithmetic decoding, zerotree decoding and inverse quantization are applied, then the two-dimensional inverse discrete wavelet transform to obtain the restored data, and motion compensation is then carried out according to the reference frame to obtain the restored frame.
Specifically, the intra decoding of an I frame comprises the following steps:
(1) separating the control information from the decoded bit stream;
(2) performing arithmetic decoding on the bit stream, and sending the result to the zerotree decoding part;
(3) performing zerotree decoding on the received bit stream in order;
(4) performing the inverse wavelet transform on the decoded data, and format-converting the restored image data.
The inter decoding of a P frame comprises the following steps:
(1) separating the control information from the decoded bit stream, performing motion vector decoding, and then arithmetic-decoding the zerotree-coded bit stream;
(2) constructing the predicted image of the current frame from the motion vectors and the decompressed previous frame;
(3) performing zerotree decoding on the bit stream to obtain the wavelet transform coefficients of the residual image;
(4) performing the inverse wavelet transform on the decoded wavelet coefficients to recover the residual image;
(5) adding the predicted image and the residual image to decode the current frame image.
The inter decoding of a B frame comprises the following steps:
(1) separating the control information from the decoded bit stream, decoding the motion vector increments, and constructing the predicted image of the current frame from the motion vectors of the two P frames before and after it and the motion vector increments;
(2) performing arithmetic decoding and zerotree decoding on the bit stream to obtain the wavelet transform coefficients of the residual image;
(3) performing the inverse wavelet transform on the decoded data to recover the residual image;
(4) adding the predicted image and the residual image to decode the current frame image.
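For illustration only, the final reconstruction step shared by P frame and B frame decoding could be sketched as follows; the NumPy arrays and the clipping to the 8-bit sample range are assumptions of this sketch.

```python
import numpy as np

def reconstruct_frame(predicted: np.ndarray, residual: np.ndarray) -> np.ndarray:
    """Add the motion-compensated prediction and the decoded residual image,
    then clip to the valid 8-bit sample range."""
    return np.clip(predicted.astype(np.int32) + residual.astype(np.int32), 0, 255).astype(np.uint8)
```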
Taking the four-level two-dimensional Daubechies (5,3) inverse wavelet transform as an example, the four-level two-dimensional inverse wavelet transform applied to the decoded data is divided into a number of one-dimensional integer wavelet row inverse transforms and one-dimensional integer wavelet column inverse transforms. The algorithm of the one-dimensional integer wavelet row and column inverse transforms is as follows:
d_l^(1) = k · d_l
s_l^(1) = s_l / k
s_l^(0) = s_l^(1) - β · (d_l^(1) + d_{l-1}^(1))
d_l^(0) = d_l^(1) - α · (s_l^(0) + s_{l+1}^(0))
x_{2l+1} = d_l^(0)
x_{2l} = s_l^(0)
In the formulas, d_l are the high-frequency samples of the signal and s_l the low-frequency samples; d_l^(1) are the high-frequency samples after descaling and s_l^(1) the low-frequency samples after descaling; α and β are the wavelet transform coefficients; s_l^(0) and d_l^(0) are the even and odd samples of the restored signal after the inverse transform (α = -0.5, β = 0.25, k = 1.2~1.5).
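For illustration only, the inverse of the one-dimensional lifting sketch given earlier could look as follows, under the same boundary-handling assumption.

```python
def lift_1d_inverse(s, d, alpha=-0.5, beta=0.25, k=1.2):
    """Invert lift_1d_forward: rebuild the interleaved signal x from (s, d)."""
    n = len(d)
    d1 = [k * v for v in d]                      # d_l^(1) = k * d_l
    s1 = [v / k for v in s]                      # s_l^(1) = s_l / k
    # Undo the update step: s_l^(0) = s_l^(1) - beta * (d_l^(1) + d_{l-1}^(1))
    s0 = [s1[l] - beta * (d1[l] + d1[max(l - 1, 0)]) for l in range(n)]
    # Undo the prediction step: d_l^(0) = d_l^(1) - alpha * (s_l^(0) + s_{l+1}^(0))
    d0 = [d1[l] - alpha * (s0[l] + s0[min(l + 1, n - 1)]) for l in range(n)]
    x = [0.0] * (2 * n)
    x[0::2], x[1::2] = s0, d0                    # x_{2l} = s_l^(0), x_{2l+1} = d_l^(0)
    return x
```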
Correspondingly, as a more preferred embodiment, the zerotree decoding can also add one level to the wavelet coefficient tree structure at initialization.

Claims (17)

1. A wavelet video coding method based on motion estimation, characterized in that it comprises:
(1) determining that the image data format is YUV format;
(2) taking the first frame as an I frame by default, and taking subsequent video frames as P frames or B frames by default;
(3) performing motion estimation on said subsequent video frames;
(4) readjusting the types of said subsequent video frames adaptively according to the result of the motion estimation;
(5) according to the video frame type, intra-coding the I frames and inter-coding the P frames or B frames.
2. The method of claim 1, characterized in that in said step (2), one frame of said subsequent video frames is forcibly designated as an I frame at regular intervals.
3. The method of claim 1, characterized in that said step (3) comprises:
(3-1) dividing the current frame into macroblocks of size M × N, and presetting a SAD threshold;
(3-2) searching for each macroblock within the range of the reference frame, and regarding motion estimation of the macroblock as having failed when its SAD value between the reference frame and the original frame is greater than the SAD threshold;
(3-3) accumulating the number of macroblocks of said current frame for which motion estimation has failed, and presetting a failure count threshold;
(3-4) comparing the accumulated number of failed macroblocks of said current frame with said failure count threshold.
4. The method of claim 1, characterized in that in said step (4), if the accumulated number of failed macroblocks of said current frame is greater than said failure count threshold, said current frame is adjusted to an I frame.
5, the method for claim 1 is characterized in that the intraframe coding of the described I frame of step (5) comprises:
Raw image data after the transform format is carried out wavelet transformation;
The coefficient that produces behind the wavelet transformation is carried out zerotree image and arithmetic coding;
Insert control information in the data code flow that behind coding, generates, and output to channel frame by frame.
6, the method for claim 1 is characterized in that the described P frame of step (5) interframe encode comprises:
According to corresponding motion vector, the respective macroblock in the translation reference frame forms motion compensated image;
Current frame image and motion compensated image are subtracted each other, obtain the surplus error image;
Motion vector is carried out Huffman encoding, the surplus error image is carried out intraframe coding,
B frame interframe encode comprises:
Motion vector according to its former and later two P frames or I frame, P frame adopts corresponding motion compensation process, obtains the motion vector increment in (1 ,+1) scope, and the motion vector increment is carried out Huffman encoding;
Current frame image and motion compensated image are subtracted each other, obtain the surplus error image;
Y component to the surplus error image carries out intraframe coding.
7, method as claimed in claim 6 is characterized in that the intraframe coding of described surplus error image comprises:
Surplus error image data are carried out wavelet transformation;
The coefficient that produces behind the wavelet transformation is carried out zerotree image and arithmetic coding;
Insert control information in the data code flow that behind coding, generates, and output to channel frame by frame.
8,, it is characterized in that described wavelet transformation adopts a coefficient of dilatation k to be optimized k=1.2~1.5 as claim 5 or 7 described methods.
9,, it is characterized in that described zerotree image increases one deck with the wavelet coefficient tree structure when initialization as claim 5 or 7 described methods; Calculate maximum descendants's value of all coefficients that the offspring is arranged, and it is stored among the two-dimensional array SDL, represent the state that offspring's coefficient is arranged in the former coefficient matrix, replace numerical value itself with peaked highest order plane with a byte.
10, method as claimed in claim 3 is characterized in that the search window of described estimation is selected for use [32 ,+32].
11, method as claimed in claim 6 is characterized in that described motion vector encoder, is write the motion vector unification of half-pix as positive number.
12, method as claimed in claim 6 is characterized in that described motion vector encoder, is that the mode that the border takes to round up is retrained.
13. A wavelet video decoding method based on motion estimation, characterized in that it comprises:
(1) taking the first frame as an I frame by default, and obtaining the coding type of each subsequent video frame from the information in the encoded bit stream to determine whether it is an I frame, a P frame or a B frame;
(2) intra-decoding the I frames, and inter-decoding the P frames and B frames;
(3) converting the restored image data into the required format.
14. The method of claim 13, characterized in that the intra decoding of the I frame in step (2) comprises:
separating the control information from the decoded bit stream and performing motion vector decoding;
performing arithmetic decoding on the bit stream, and sending the decoded bit stream to the zerotree decoding part;
performing zerotree decoding on the received bit stream in order;
performing the inverse wavelet transform on the decoded data, and format-converting the restored image data.
15. The method of claim 13, characterized in that the inter decoding of the P frame in step (2) comprises:
separating the control information from the decoded bit stream, performing motion vector decoding, and then arithmetic-decoding the zerotree-coded bit stream;
constructing the predicted image of the current frame from the motion vectors and the decompressed previous frame;
performing zerotree decoding on said bit stream to form the wavelet coefficients of the residual image;
performing the inverse wavelet transform on the decoded wavelet coefficients to recover the residual image;
adding the predicted image and the residual image to decode the current frame image,
and the inter decoding of the B frame comprises:
separating the control information from the decoded bit stream, decoding the motion vector increments, and constructing the predicted image of the current frame from the motion vectors of the two P frames before and after it and the motion vector increments;
performing arithmetic decoding and zerotree decoding on the bit stream to obtain the wavelet transform coefficients of the residual image;
performing the inverse wavelet transform on the decoded wavelet coefficients to recover the residual image;
adding the predicted image and the residual image to decode the current frame image.
16. The method of claim 14 or 15, characterized in that said inverse wavelet transform is optimized with a scaling coefficient k, where k = 1.2~1.5.
17. The method of claim 14 or 15, characterized in that said zerotree decoding adds one level to the wavelet coefficient tree structure at initialization.
CN 03149504 2003-07-14 2003-07-14 Wavelet video encoding and decoding method based on motion estimation Expired - Lifetime CN1235413C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 03149504 CN1235413C (en) 2003-07-14 2003-07-14 Wavelet video encoding and decoding method based on motion estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 03149504 CN1235413C (en) 2003-07-14 2003-07-14 Wavelet video encoding and decoding method based on motion estimation

Publications (2)

Publication Number Publication Date
CN1471321A CN1471321A (en) 2004-01-28
CN1235413C true CN1235413C (en) 2006-01-04

Family

ID=34156327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 03149504 Expired - Lifetime CN1235413C (en) 2003-07-14 2003-07-14 Wavelet video encoding and decoding method based on motion estimation

Country Status (1)

Country Link
CN (1) CN1235413C (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100384254C (en) * 2005-03-10 2008-04-23 复旦大学 Adaptive residual error frame operation method based on video image complexity
KR100723507B1 (en) * 2005-10-12 2007-05-30 삼성전자주식회사 Adaptive quantization controller of moving picture encoder using I-frame motion prediction and method thereof
CN101621685B (en) * 2008-07-04 2011-06-15 株式会社日立制作所 Coder and coding method
CN101990091B (en) * 2009-08-05 2012-10-03 宏碁股份有限公司 Video image transmitting method, system, video coding device and video decoding device
CN102256137B (en) * 2011-07-13 2013-07-17 西安电子科技大学 Context-prediction-based polar light image lossless coding method
CN105306787A (en) * 2015-10-26 2016-02-03 努比亚技术有限公司 Image processing method and device
CN108737825B (en) * 2017-04-13 2023-05-02 腾讯科技(深圳)有限公司 Video data encoding method, apparatus, computer device and storage medium
CN111510683A (en) * 2020-04-26 2020-08-07 西安工业大学 Image transmission device and image data processing method thereof

Also Published As

Publication number Publication date
CN1471321A (en) 2004-01-28

Similar Documents

Publication Publication Date Title
JP3796217B2 (en) Optimal scanning method of transform coefficient for encoding / decoding still and moving images
CA2295689C (en) Apparatus and method for object based rate control in a coding system
CN1151685C (en) Appts. and method for optimizing rate control in coding system
CN1225914C (en) Video encoder capable of differentially encoding image of speaker during visual call and method for compressing video signal
CN1233160C (en) Method for adaptive encoding and decoding sports image and device thereof
US20060153294A1 (en) Inter-layer coefficient coding for scalable video coding
CN1627824A (en) Bitstream-controlled post-processing filtering
CN1893666A (en) Video encoding and decoding methods and apparatuses
JP4685849B2 (en) Scalable video coding and decoding method and apparatus
CN1301370A (en) Method and apparatus for reducing breathing artifacts in compressed video
US20080165036A1 (en) System and Method for Video Coding Having Reduced Computational Intensity
CN101036388A (en) Method and apparatus for predecoding hybrid bitstream
CN1870754A (en) Encoding and decoding apparatus and method for reducing blocking phenomenon and computer-readable recording medium
CN1949877A (en) Adaptive quantization controller and method thereof
CN101031086A (en) Video-information encoding method and video-information decoding method
WO2010027637A2 (en) Skip modes for inter-layer residual video coding and decoding
US20050036549A1 (en) Method and apparatus for selection of scanning mode in dual pass encoding
JP2007028579A (en) Method for video data stream integration and compensation
CN1902939A (en) Encoding method, decoding method, and encoding apparatus for a digital picture sequence
US20120008686A1 (en) Motion compensation using vector quantized interpolation filters
CN1835590A (en) Method and system for distributing video encoder processing
KR100843080B1 (en) Video transcoding method and apparatus thereof
CA2815642A1 (en) Video coding using vector quantized deblocking filters
CN1235413C (en) Method for coding and recoding ripple video frequency based on motion estimation
CN1857002A (en) Rate-distortion video data partitioning using convex hull search

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CX01 Expiry of patent term

Granted publication date: 20060104
