CN100499788C - Video encoding devices

Info

Publication number: CN100499788C
Application number: CNB2004800063908A
Authority: CN
Inventors: Pamela Cosman; Athanasios Leontaris; Vijay Chellappa
Original assignee: University of California
Current assignee: University of California
Priority date: 2003-01-09
Filing date: 2004-01-09
Publication date: 2009-06-10
Anticipated expiration: 2024-01-09
Also published as: CN1759610A

Abstract

A dual, and possibly multiple, frame approach is used by the invention. Embodiments of the invention include making a decision to use a long term reference frame, which is a frame other than an immediate past reference frame, to conduct INTER coding, or to conduct INTRA frame coding. Other embodiments include use of long and short term reference blocks, and make a decision between two types of INTER coding blocks and INTRA coding. In accordance with embodiments of the invention, a long term frame is a high quality frame. The high quality frame may be used as a reference frame under particular conditions.

Description

Video encoder

Technical field

The field of the invention is image and video coding. Other fields of the invention include the digital communication and digital storage of images and video.

Background technology

Digital communication and storage of image data is a difficult task because of the sheer amount of digital data needed to accurately describe a single frame of an image. In video, the amount of data quickly becomes very large. Image coding seeks to make the communication and/or storage of image data manageable by compressing the image data, i.e., by reducing the amount of data necessary to represent an image. Communication resources, for example, have limited bandwidth. This is especially true in wireless communication media. There are tradeoffs in image data coding. For example, reducing the size of the data should not degrade image quality beyond an acceptable level. In addition, computational cost and speed must be managed, especially in devices where computational and power resources must be conserved. Modern examples of video coding/compression methods include MPEG-4 and H.264. The latter, in particular, was specifically designed for video transmission over packet networks.

Many devices have access to more than one communication medium. Devices such as laptop computers, personal digital assistants (PDAs), workstations, or video conferencing systems may have access to multiple networks. For example, one device may have access to several different types of wired and wireless networks.

Many current video compression algorithms rely on motion compensation to achieve sufficient compression. The basic idea of motion compensation is as follows. A macroblock is a block of image data, for example a square region of 16×16 pixels in an image. A macroblock in the current frame to be encoded is compared against some set of macroblocks in a reference frame to find the one that is most similar. The reference frame is usually the previous frame in the sequence. Similarity is usually measured by the sum of the absolute values of the pixel differences or by the squared difference between pixels. The location of the best-match block can be specified by an offset vector, called a motion vector, which describes the horizontal and vertical displacement between the current macroblock to be encoded and the best-match macroblock in the reference frame. The current macroblock can be represented using only this motion vector. On receiving the motion vector, the decoder can take the reference block from the reference frame and paste it into the position of the current block. If the reference block is sufficiently similar to the current block, this direct substitution provides adequate quality. If they are not close enough, the encoder can optionally send along additional information that describes how to modify the reference block to make it more similar to the current block.
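By way of illustration only, the block-matching search described above can be sketched as follows. This is a minimal sketch, not the method of any particular standard: it assumes frames are stored as lists of rows of integer pixel values, uses the sum of absolute differences (SAD) as the similarity measure, and performs an exhaustive search; the function names `sad`, `get_block`, and `best_match` are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute pixel differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_match(current, reference, top, left, size, search_range):
    """Exhaustively search the reference frame around (top, left).

    Returns ((dy, dx), cost): the motion vector of the best match and its SAD.
    """
    height, width = len(reference), len(reference[0])
    target = get_block(current, top, left, size)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the frame
            cost = sad(target, get_block(reference, y, x, size))
            if best is None or cost < best[1]:
                best = ((dy, dx), cost)
    return best
```

A SAD of zero means the reference block can be pasted in directly; a larger SAD suggests that the encoder would also need to send correction information.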

In either case, this is referred to as INTER coding. When the encoder does not find a good match for the current macroblock, it may choose to encode the current macroblock by itself, without reference to any past block. This is referred to as INTRA coding. The choice between INTER and INTRA coding is the basic approach available in the video coding standards MPEG, MPEG-2, MPEG-4 [T. Sikora, "The MPEG-4 Video Standard Verification Model," IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 1, pp. 19-31, February 1997], H.263 [G. Cote, B. Erol, M. Gallant and F. Kossentini, "H.263+: Video Coding at Low Bit Rates," IEEE Trans. Circ. and Systems for Video Techn., vol. 8, no. 7, pp. 849-865, Nov. 1998], and the latest and current state of the art, H.264.

INTER coding tends to require fewer bits than INTRA coding, but it can propagate errors. Because INTRA coding does not reference a previous frame, it cannot propagate errors present in a previous frame. Choosing between INTER and INTRA coding involves meeting the competing goals of using fewer bits and being robust to errors.

Making an intelligent choice between INTER and INTRA coding is a problem of interest in the art. One paper on this subject is R. Zhang, S. L. Regunathan, and K. Rose, "Video Coding with Optimal Inter/Intra-Mode Switching for Packet Loss Resilience," IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 966-76, June 2000. That paper presents a distortion estimation method called ROPE (recursive optimal per-pixel estimate), which takes two factors into account when estimating distortion: the channel error probability and the concealability of the block being encoded. The choice between INTER and INTRA coding for a given block (such as a macroblock) is then made by balancing the competing goals of reducing distortion (as estimated by ROPE) and using only a small number of bits for encoding (in particular, staying within a target rate constraint). Rate constraints favor INTER coding, error constraints favor INTRA coding, and the ability to conceal errors favors INTER coding.
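The balancing of distortion against bits can be illustrated with a small sketch. It assumes, as one common way of expressing such a tradeoff, a Lagrangian cost D + λR; the text above does not specify the exact cost function, and the distortion values here are hypothetical stand-ins for ROPE estimates rather than outputs of the ROPE recursion itself.

```python
def choose_mode(candidates, lam):
    """Pick the coding mode with the smallest cost D + lam * R.

    candidates maps a mode name to (estimated_distortion, bits).
    lam is the (assumed) Lagrange multiplier weighting rate against distortion.
    """
    def cost(mode):
        distortion, bits = candidates[mode]
        return distortion + lam * bits
    return min(candidates, key=cost)


# Hypothetical numbers: INTRA costs more bits but has lower expected distortion.
modes = {"INTRA": (40.0, 300), "INTER": (90.0, 60)}
print(choose_mode(modes, lam=0.1))  # small weight on rate
print(choose_mode(modes, lam=1.0))  # large weight on rate
```

With these illustrative numbers, the small weight on rate selects INTRA (cost 70 against 96) and the large weight selects INTER (cost 150 against 340), mirroring the observation that rate constraints favor INTER coding.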

When the connection used to transmit video data undergoes changes in quality, the resulting video decoding can produce very poor results. When a reference frame provides a poor-quality reference, the decoded result deteriorates quickly. One technique that has been proposed to address this problem is to retain multiple frames. This, however, can make the coding burden and complexity very high.

There are many examples of multiple reference frame methods [see, for example, N. Vasconcelos and A. Lippman, "Library-based Image Coding," IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. v, pp. V/489-V/492, 1994; T. Wiegand, X. Zhang, and B. Girod, "Long-term Memory Motion-Compensated Prediction," IEEE Trans. Circ. and Systems for Video Techn., vol. 9, no. 1, pp. 70-84, Feb. 1999]. In one example, when encoding a block in frame N of a video, the encoder might look for the best possible matching block in frames N-1, N-2, N-3, and N-4. That is, the 4 immediately preceding frames can be searched for a match. The encoder can then tell the decoder which reference frame provided the best match. For example, 2 bits can be allocated to describe which of the 4 frames provided the best match, followed by the usual motion vector giving the offset between the current block to be encoded and the position of the best-match block in the specified reference frame.
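As a rough illustration of the signalling described above, the following sketch computes how many bits a fixed-length code needs to name one of several reference frames and picks the frame with the lowest matching cost. The fixed-length code is an assumption made for the example, and the function names and cost values are hypothetical.

```python
def ref_index_bits(num_reference_frames):
    """Bits needed by a fixed-length code to name one of the reference frames."""
    if num_reference_frames < 1:
        raise ValueError("need at least one reference frame")
    return (num_reference_frames - 1).bit_length()


def best_reference_index(costs):
    """Return the index of the reference frame with the lowest matching cost."""
    return min(range(len(costs)), key=lambda i: costs[i])


# Hypothetical matching costs for frames N-1, N-2, N-3 and N-4.
costs = [120, 95, 140, 97]
print(best_reference_index(costs), ref_index_bits(len(costs)))
```

For four candidate frames the index costs 2 bits, as in the example in the text; for a dual-frame buffer it would cost 1 bit.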

In T. Fukuhara, K. Asai, and T. Murakami, "Very Low Bit-Rate Video Coding with Block Partitioning and Adaptive Selection of Two Time-Differential Frame Memories," IEEE Trans. Circ. and Systems for Video Techn., vol. 7, no. 1, pp. 212-220, Feb. 1997, only two time-differential frames are used, so only a relatively small increase in computational complexity is needed. This dual frame buffer is a special case of the multi-frame buffer in which there are only two reference frames. For example, there can be one short-term reference frame (the immediate past frame) and one long-term reference frame (a frame from the more distant past). In the paper by Fukuhara et al., one frame is the previous frame, as in many hybrid codecs, and the second frame contains a frame from the more distant past that is periodically updated according to a predefined rule. Multiple reference frames have been shown to yield significant gains in reconstructed PSNR (peak signal-to-noise ratio) at the cost of increased computational burden and memory complexity. Motion estimation is the main performance bottleneck in a hybrid video coding system and can account for more than 80-90% of the total encoding time. Thus, adding even one extra frame buffer doubles the encoding time. The same is true of memory requirements, which also grow linearly with the number of reference frames and can therefore become prohibitive.

An Always Best Connected (ABC) approach is used when a device has access to multiple connections. A device such as a laptop computer or PDA may have access to several different types of wireless or wired networks operating at different rates. For example, the device may be able to communicate using an Ethernet connection (10 Mbps), wireless LAN (11 Mbps), HDR (400-500 Kbps), 1xRTT (64 Kbps), and GPRS (16 Kbps). At any given time, the device operates using the best connection it can access at that moment, assuming the user has not selected some other network in the user profile. The best connection is often the one with the highest data rate, but other factors (for example, error rate and delay) are also taken into account. If the best connection becomes unavailable, or deteriorates to the point where it is no longer the best connection, the device is expected to switch seamlessly to another connection, the new best connection. This is often a connection at a lower rate. The device is also expected to probe all connections periodically to see which are available. If a high-rate connection becomes unavailable and later becomes available again, the device is expected to discover this availability and switch back to using that network.
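The selection behaviour can be sketched as follows. This is a simplified illustration that ranks connections by data rate only, although the text notes that error rate, delay, and the user profile may also matter. The rates are those listed above (with the lower bound taken for HDR); the function name and data layout are hypothetical.

```python
# Nominal rates in Kbps, taken from the example in the text.
CONNECTIONS = [
    {"name": "Ethernet", "rate_kbps": 10000},
    {"name": "WLAN", "rate_kbps": 11000},
    {"name": "HDR", "rate_kbps": 400},
    {"name": "1xRTT", "rate_kbps": 64},
    {"name": "GPRS", "rate_kbps": 16},
]


def best_connection(connections, available):
    """Return the name of the highest-rate connection that is currently available.

    available is the set of connection names found by the periodic probe.
    Returns None when nothing is reachable.
    """
    candidates = [c for c in connections if c["name"] in available]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c["rate_kbps"])["name"]


# The device drops from WLAN to GPRS when only GPRS remains reachable.
print(best_connection(CONNECTIONS, {"WLAN", "GPRS"}))
print(best_connection(CONNECTIONS, {"GPRS"}))
```

Re-running the selection after each periodic probe gives both the downward switch and the switch back when a faster network reappears.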

Summary of the invention

The invention uses a dual-frame, and possibly multi-frame, approach. Embodiments of the invention include making a decision whether to use a long-term reference frame, which is a frame other than the immediate past reference frame, to conduct INTER coding, or to conduct INTRA coding. Other embodiments use long-term and short-term reference blocks and make a decision between two types of INTER coding blocks and INTRA coding.

According to embodiments of the invention, the long-term frame is a high-quality frame. The high-quality frame can be used as a reference frame under particular conditions.

The invention provides a video encoder comprising: a short-term reference block buffer for storing at least one short-term reference block; at least one long-term reference block buffer for storing at least one long-term reference block; and encoding means for encoding vectors to describe at least one image block relative to at least one reference block, wherein, when it is predicted that the connection used by the video encoder will change to a lower quality, the encoding means selects between the at least one short-term reference block in the short-term reference block buffer and the at least one long-term reference block in the long-term reference block buffer based on one or more factors examined at encoding time, the factors including one or more of the following: the encoder's expectation of the distortion at the decoder; the number of frame buffers in the encoder; the size of the frame buffers in the encoder; any feedback from the decoder; a history of changing data channel quality; and a history of changing image-region quality, so that the selection between the at least one long-term reference block and the at least one short-term reference block is made selectively for each of at least one block being encoded.

The invention also provides a method for video encoding, the method comprising the steps of: normally encoding and storing normal-quality reference frames; during high-quality channel conditions, also encoding and storing a high-quality reference frame; and encoding all or part of a frame being encoded using the high-quality reference frame.

The invention also provides a method for decoding video, the method comprising the steps of: receiving encoded video; and, regardless of whether the encoded video was encoded using a high-quality long-term frame, selectively choosing to use a stored high-quality long-term frame when decoding a current frame, in order to improve the current frame being decoded.

The invention also provides a video encoder comprising: a plurality of frame buffers, at least two of which store video frame information from non-consecutive frames; encoding means for encoding video frames with reference to information stored in one or more of the plurality of frame buffers; and control logic for updating the plurality of frame buffers by selecting one of jump updating, continuous updating, general updating, or any aperiodic updating according to freely selectable update parameters.

Description of drawings

Figs. 1A and 1B are block diagrams illustrating an exemplary system according to an embodiment of the invention;

Fig. 2 is a schematic diagram illustrating an exemplary long-term reference frame coding method of the invention in a rate-switching scenario;

Figs. 3A and 3B schematically illustrate two schedulers; the scheduler of Fig. 3B allocates extra bandwidth to users to produce high-quality reference frames;

Fig. 4 schematically illustrates a long-term reference frame coding method with feedback according to an embodiment of the invention; and

Fig. 5 schematically illustrates a long-term reference frame coding method with feedback according to an embodiment of the invention.

Detailed description of embodiments

The invention is directed to methods, devices, and systems for encoding image data such as video. The invention is also directed to methods, devices, and systems for decoding the data. The invention is useful in many types of systems that make use of communication media. For example, the invention can be used in peer-to-peer communication and in server-client communication, and can also be used, for example, to encode image data for storage. One particular exemplary embodiment of the invention is a video conferencing system and method. Embodiments of the invention may be particularly well suited to situations in which the rate of a video connection changes by a significant amount.

In describing embodiments of the invention, the discussion will focus on the handling of frames. This will be the common implementation of the invention. As in the standards, however, processing and comparison are conducted on a block basis. For the purposes of the invention, the block size is generally arbitrary. As in the standards mentioned above, a common implementation will use macroblocks. As used herein, a frame can be divided into a plurality of blocks, such as macroblocks. The frame size is not particularly limited, although embodiments of the invention may use the frame sizes of the standards.

The invention can be applied as a modification of many of the standards mentioned above. The invention can make use of the motion compensation vectors calculated in any of those standards, and can also use other methods for vector calculation. With the invention, the known standards are improved, while the invention makes use of the general standard frameworks and known hardware implementations.

The invention uses a dual-frame, and possibly multi-frame, approach. Embodiments of the invention include making a decision whether to use a long-term reference frame, which is a frame other than the immediate past reference frame, to conduct INTER coding, or to conduct INTRA coding. Other embodiments use long-term and short-term reference blocks and make a decision between two types of INTER coding blocks and INTRA coding. The recursive optimal per-pixel estimate (ROPE) is used to compute the moments for INTRA coding and for short-term reference blocks, and a modified recursive optimal per-pixel estimate is used to compute the moments for long-term reference blocks, in which the elements of the previous block are treated as random variables.

According to embodiments of the invention, the long-term frame is a high-quality frame. The high-quality frame can be used as a reference frame under particular conditions. For example, the conditions can include a change in the quality of the connection over which the video is received. The high-quality frame can be, for example, a past frame from a time when a higher-quality connection was available. In another exemplary embodiment, one or more high-quality frames are retained in anticipation of a change in communication quality. This can be the result of an advance warning or a prediction of poor quality, or it can be done periodically, for example to ensure that a high-quality frame is available. For example, a high-quality reference frame can occasionally be encoded at the expense of preceding and subsequent frames, by starving those frames of the bit rate they would typically be allocated. In additional embodiments, the frame is divided into static and dynamic parts. For example, a background part and a foreground part can be identified. The high-quality reference frame for the background part can, because of its static nature, be used for an extended period of time, while the foreground part is motion compensated using the most recent or most recently received frame.

Embodiments of the invention include dual-frame or multi-frame retention embodiments and single reference frame retention embodiments. For example, a dual-frame embodiment compares each block (in the current frame to be encoded) with both a past high-quality reference frame and a recent (short-term) reference frame, and performs motion compensation using whichever frame is determined to produce the better result. The past high-quality frame is kept in a long-term reference frame buffer and the recent reference frame is kept in a recent reference frame buffer. When a recent reference frame is of higher quality than the past reference frame, it is retained in the long-term reference frame buffer as the past high-quality reference frame, and the previous past high-quality reference frame is discarded. In an exemplary single-frame embodiment, a single reference frame buffer is maintained. The high-quality long-term reference frame is kept in this single reference frame buffer until it becomes obsolete, for example when the similarity between the high-quality long-term reference frame and the current frame to be encoded drops below a certain threshold, at which point a decision is made to begin using the immediate past frame as the reference frame.
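The buffer update rule of the dual-frame embodiment can be sketched as below. This is an illustrative sketch only: the class name `DualFrameBuffer` is hypothetical, frames are represented by an identifier, and quality is assumed to be a single number (for example a PSNR in dB), which the text does not prescribe.

```python
class DualFrameBuffer:
    """Keeps the most recent frame and the best-quality frame seen so far."""

    def __init__(self):
        self.short_term = None  # (frame_id, quality) of the most recent frame
        self.long_term = None   # (frame_id, quality) of the retained high-quality frame

    def update(self, frame_id, quality):
        """Store a newly reconstructed frame in the buffers."""
        self.short_term = (frame_id, quality)
        # Replace the long-term frame only when the new frame is of higher quality;
        # the previous long-term frame is then discarded.
        if self.long_term is None or quality > self.long_term[1]:
            self.long_term = (frame_id, quality)


# Hypothetical qualities: the connection degrades after frame 2.
buffers = DualFrameBuffer()
for frame_id, quality in [(1, 38.0), (2, 40.0), (3, 30.0), (4, 29.0)]:
    buffers.update(frame_id, quality)
print(buffers.short_term, buffers.long_term)
```

After the quality drop, the short-term buffer follows the latest frame while the long-term buffer still holds frame 2, the last high-quality frame.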

The encoder complexity in the dual-frame or multi-frame embodiments of the invention is lower than in the prior art because the high-quality frame is retained as a benchmark. Thus, for example, as long as the long-term high-quality frame satisfies a certain metric, extensive comparisons are not needed. In addition, the tests comparing the long-term frame with the recent frame can be performed aperiodically or periodically. Single-frame embodiments of the invention provide a low-complexity encoder and require no additional encoder complexity.

A preferred embodiment method of the invention performs long-term motion compensation using data from a past higher-rate connection. Consider a situation in which a video coding device transmits over, for example, an Always Best Connected (ABC) network, and the quality of the current best connection deteriorates so that the user must switch to a lower-rate connection. Consider that frames 1 through N are of high quality because they were transmitted on a high-rate channel, and that starting with frame N+1 the user has a low-rate connection. With a standard video coder such as MPEG-2 or H.263, the video quality will be lower for as long as the low-rate connection persists. Frame N+1 (which must be transmitted at the lower rate) can be INTER coded from frame N (which is of high quality), and so the quality of frame N+1 will benefit from the fact that block-matching motion compensation is done using a high-quality reference frame. Because the rate has been cut, however, frame N+1 will on average be of lower quality than frame N.

When frame N+2 is encoded, INTER coding is now done using frame N+1, so the quality drops further. One would expect the quality to fall fairly quickly until it reaches some new steady state corresponding to the lower-rate, lower-quality transmission. In this scenario, embodiments of the invention use a dual-frame motion compensation approach in which the long-term frame buffer contains, for example, frame N, i.e., the last frame from the high-quality, high-rate connection before the good connection failed and was replaced by the poorer connection. Thus, for encoding a frame M, where M > N+1, there are two frames available for motion compensation: the short-term frame M-1 (the low-quality immediate past frame) and the long-term frame N (a high-quality frame from the more distant past).
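A per-block choice between the two available references might look like the following sketch. It assumes the candidate blocks from frame M-1 and frame N have already been found by motion search, compares them by sum of absolute differences (SAD), and breaks ties in favor of the short-term frame; these choices and the function names are assumptions made for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute pixel differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def choose_reference_block(current_block, short_term_block, long_term_block):
    """Return ('short' or 'long', cost) for the better-matching reference block."""
    cost_short = sad(current_block, short_term_block)
    cost_long = sad(current_block, long_term_block)
    if cost_long < cost_short:
        return ("long", cost_long)
    return ("short", cost_short)


# Hypothetical 2x2 blocks: the long-term (high-quality) block is the closer match.
current = [[10, 10], [10, 10]]
from_frame_m_minus_1 = [[14, 6], [13, 8]]
from_frame_n = [[10, 11], [10, 10]]
print(choose_reference_block(current, from_frame_m_minus_1, from_frame_n))
```

In a static scene encoded over a degraded link, many blocks would be expected to select the long-term frame N in this way.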

Consider the case in which the scene to be encoded is very static, as in many video conferencing applications. When the device switches down from, for example, a 10 Mbps wireless LAN connection to a 16 Kbps GPRS connection, the background in the scene may remain quite stationary. Long-term motion compensation from the past high-quality frame can provide a very accurate match for the current frame. On the other hand, because 16 Kbps is a very low rate and produces low-quality video, using only the short-term past frame for motion compensation may give poor results. There are other reasons why a past high-quality frame can provide an advantage. Another situation in which a past high-quality frame provides an advantage can occur whether the connection is constant or changing. For example, consider a video conferencing application in which a person or object leaves the scene and later returns. Using only the immediate past frame as a coding reference cannot provide high-quality information. In fact, in the first frame in which they return to the scene there is no reference information for them at all, whether of high or low quality. If a high-quality past frame has been retained, high-quality reference information will be available even when they first return to the scene. This is therefore a situation in which a high-quality reference frame from the long-term past can be useful even when the scene is not static. The same advantage can arise in video conferencing when a person turns away so that their face is not visible for some frames and then turns back. If only the immediate past frame is used as reference information, there will initially be no reference for their face when they turn back.

Variations and additional embodiments that use a past higher-rate connection include cases in which there is, for example, an advance warning or prediction of a change in connection quality. Consider a case in which the device has some early warning, or can predict, that the current best connection is about to fail and that a switch to a lower-rate connection will be needed. Recognizing that the decoder could make good use of a particularly high-quality frame for the long-term frame buffer, the transmitter can encode a frame with especially high quality before the switch. That is, by starving preceding and subsequent frames of the bit rate they would normally be allocated, the encoder can produce a particularly high-quality frame, for example by using finer quantization. Upon switching to the lower-rate connection, the decoder can use this particular frame as the long-term frame buffer in dual-frame motion compensation.

In another variation, high-quality frames are encoded aperiodically or periodically. One reason for doing so is to guard against changes in connection quality. This can also be useful, however, in situations where changes in connection quality are not a concern. Consider having a single user use (for example) fewer bits for 9 frames so that extra bits can be given to the 10th frame (thereby producing a higher-quality long-term reference frame for use over the next 10 frames). This can be used to produce higher-quality video overall with no increase in required bandwidth. Additional embodiments include encoding a high-quality frame during periods of low traffic or when additional bit rate is available. This can also be controlled, for example, by a server that allocates bit rate for high-quality frames. For example, if many users are sharing bandwidth, a scheduler implemented by the server can rotate a portion of the bandwidth among the users (clients). Each client is periodically allocated a burst of extra bandwidth. This extra bandwidth is used to encode a high-quality frame for use as a reference frame. Another embodiment involves a client or peer user using its own bandwidth allocation to create a high-quality reference frame, compensating by starving some preceding or subsequent frames of the bits they would otherwise be allocated.
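The arithmetic of starving nine frames to boost the tenth can be made concrete with a short sketch. The bit budgets and the function name `starve_allocation` are hypothetical; the only property taken from the text is that the total bits over the period do not increase.

```python
def starve_allocation(bits_per_frame, period, bits_taken_per_frame):
    """Per-frame bit budgets for one period, with the last frame boosted.

    Each of the first period - 1 frames gives up bits_taken_per_frame bits,
    and the last frame (the high-quality reference) receives all of them.
    """
    regular = bits_per_frame - bits_taken_per_frame
    boosted = bits_per_frame + bits_taken_per_frame * (period - 1)
    return [regular] * (period - 1) + [boosted]


# Hypothetical budget: 1000 bits per frame, 100 bits taken from each of 9 frames.
budgets = starve_allocation(1000, 10, 100)
print(budgets, sum(budgets))
```

Here nine frames receive 900 bits each and the tenth receives 1900 bits, for the same total of 10000 bits per period.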

Embodiments of the invention can also identify static and dynamic portions of the data and adapt the coding based on this identification. Suppose that the video to be encoded and transmitted consists of a static or nearly static background portion and a moving foreground portion. This can occur, for example, in videophone and video conferencing applications. If the current frame to be encoded can be segmented into a background region and a foreground region, the background (assumed static) can be motion compensated using a high-quality frame from the long-term past, and the foreground (assumed non-static) can be motion compensated using the most recent frame. There are also situations in which a dynamic portion of the data, such as the foreground, benefits from reference to a high-quality long-term frame. For example, a video in which an object or person leaves and later returns is a case in which dynamic information benefits from one or more long-term high-quality frames.
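One simple way to picture the background/foreground assignment is sketched below. The text does not say how the segmentation is performed; this sketch assumes, purely for illustration, that a block is treated as static when the mean absolute difference between co-located blocks of consecutive frames is at or below a threshold. Blocks are flat lists of pixel values, and all names are hypothetical.

```python
def mean_abs_diff(block_a, block_b):
    """Mean absolute pixel difference between two equally sized flat blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b)) / len(block_a)


def assign_references(previous_blocks, current_blocks, threshold):
    """Label each block 'long_term' (static background) or 'short_term' (moving)."""
    labels = []
    for prev, cur in zip(previous_blocks, current_blocks):
        static = mean_abs_diff(prev, cur) <= threshold
        labels.append("long_term" if static else "short_term")
    return labels


# Hypothetical data: the first block barely changes, the second changes a lot.
previous = [[10, 10, 10, 10], [50, 60, 70, 80]]
current = [[10, 11, 10, 9], [90, 20, 75, 40]]
print(assign_references(previous, current, threshold=2.0))
```

A practical encoder could still test the long-term frame for moving blocks, since the text notes that returning objects may also benefit from it.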

Additional embodiments of the invention, which can be combined with any of the foregoing embodiments or the embodiments listed below, use single-frame long-term prediction. Instead of using the dual-frame concept, a single frame is used, namely the high-quality long-term past frame, which is used for motion compensation until it becomes obsolete (that is, until it is so different from the current frame that it is preferable to use the immediate past, lower-quality frame for prediction instead). Unlike the dual frame buffer concept, this requires no additional complexity at the encoder.

Additional embodiments of the invention relate to decoder concealment. When a video signal is compressed and transmitted over an unreliable channel, some strategy for handling errors must be adopted. One strategy is error concealment, a family of post-processing methods that the decoder can employ. When part of a received frame is corrupted, the decoder's post-processing attempts to conceal this from the viewer. Various alternatives exist: spatial-domain interpolation, frequency-domain estimation, and temporal concealment, which involves locating a suitable block in a reference frame. In embodiments of the invention, the encoder is assumed to encode according to standard procedures, i.e., using the previous frame as the reference frame. The decoder, however, can make use of an additional reference frame (a past long-term high-quality frame) as well as the previous frame. Rather than concealing only when there is a loss, the decoder can apply concealment when it receives a macroblock of very poor quality, replacing that macroblock with a higher-quality block by using the past high-quality reference frame. This could be called improvement rather than concealment, but it can be regarded as a variant of temporal concealment. In essence, when the quality is poor enough, the decoder can choose to treat a block as lost even though it was in fact received, and can apply a loss concealment method in which the high-quality past frame is used to replace the lost block.
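The decoder-side improvement described above can be sketched as follows. The sketch makes several assumptions not stated in the text: each received block carries a numeric quality score, lost blocks are represented by `None`, and the replacement is the co-located block of the long-term frame rather than a motion-searched one. The function name `conceal_or_improve` is hypothetical.

```python
def conceal_or_improve(blocks, qualities, long_term_blocks, min_quality):
    """Replace lost or very poor blocks with co-located long-term blocks.

    blocks: received blocks, with None marking a lost block.
    qualities: per-block quality scores (ignored for lost blocks).
    long_term_blocks: co-located blocks from the stored high-quality frame.
    """
    output = []
    for block, quality, reference in zip(blocks, qualities, long_term_blocks):
        if block is None or quality < min_quality:
            output.append(reference)  # treat as lost and conceal from long-term frame
        else:
            output.append(block)
    return output


# Hypothetical frame of three blocks: one good, one lost, one received but poor.
received = ["cur0", None, "cur2"]
scores = [35.0, 0.0, 22.0]
long_term = ["ref0", "ref1", "ref2"]
print(conceal_or_improve(received, scores, long_term, min_quality=28.0))
```

The third block is replaced even though it arrived intact, which is the "improvement" case described in the text.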

Embodiments of the invention also include various combinations of the coding decisions and frameworks described above. Some particular preferred embodiments will now be discussed in detail, and various additional inventive features will be apparent to those skilled in the art.

Figs. 1A and 1B show an exemplary system for implementing the invention. The system of Figs. 1A and 1B can be used with any of the embodiments of the invention described above. Fig. 1A shows the encoder. Fig. 1B shows the decoder. In the encoder, a long-term frame estimate memory 10 stores estimates of the long-term frame that are used for mode selection. These estimates are determined by the modified ROPE algorithm. Another memory, forming additional storage 12, can be used to store frame estimates that are older than those stored in the long-term memory 10. The additional storage 12 is employed in embodiments of the invention that use feedback. Short-term frame estimates are stored in a short-term frame estimate memory 14.

Rate/coder control logic 16 uses the frame estimates to calculate distortion and perform rate-distortion optimization. The rate/coder control logic also controls the selection of the quantization parameter (QP), motion compensation, and re-decoding control. A switch 18, controlled by the rate/coder control logic, sends either image pixels (INTRA coding) or image pixel differences (INTER coding) to a discrete cosine transformer (DCT) 20. Given the QP selected by the rate/coder control logic 16, a quantizer 22 quantizes the image DCT coefficients output by the DCT 20 and outputs quantization indices (coefficients). Given the quantization indices, an inverse quantizer 24 reconstructs the image DCT coefficients using the QP. An inverse DCT 26 receives the DCT coefficients (for example, an 8×8 block of DCT coefficients) and converts them into image pixels or pixel difference values.
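The roles of the quantizer 22 and the inverse quantizer 24 can be illustrated with a deliberately simplified sketch. It assumes a uniform quantizer with step size 2 × QP and truncation toward zero; the text does not specify the quantizer design, and real codecs add details such as dead zones and separate handling of the DC coefficient. The function names are hypothetical.

```python
def quantize(coefficients, qp):
    """Map DCT coefficients to quantization indices (assumed step size 2 * qp)."""
    step = 2 * qp
    return [int(c / step) for c in coefficients]  # int() truncates toward zero


def dequantize(indices, qp):
    """Reconstruct approximate DCT coefficients from quantization indices."""
    step = 2 * qp
    return [i * step for i in indices]


# Hypothetical coefficients: a smaller QP (finer quantization) preserves more detail.
coefficients = [100, -37, 8, 0]
for qp in (5, 10):
    indices = quantize(coefficients, qp)
    print(qp, indices, dequantize(indices, qp))
```

Because the decoder applies the same inverse quantizer with the same QP, the encoder's reconstruction matches what the decoder will see, which is why both sides can use it as a prediction reference.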

Under the situation of INTER coding, switch 28 adds and is used for the prediction that the motion compensation of autokinesis compensator 30 obtains.Switch 28 keeps off-state under the situation of INTRA coding.Motion compensator 30 carries out estimation as prediction to present frame by using from the short-term frame of being stored of short-term frame memory 32 with from the long-term frame of being stored of long-term frame memory 34.The motion vector that ground of equal value, motion compensator can priority of use obtain in the last stage carries out motion compensation.Motion compensator also rewrites decoding for and frame that its reference frame has cushioned available to its feedback in memory 36, the frame behind the extra recompile in the past of memory 36 storage wherein, the frame behind this in the past extra recompile are used for for it being fed back current available frame decode again (at feedback embodiment of the present invention).Then the estimation of being stored in the extra storage 12 equals recompile, promptly equals to be stored in the frame in 36.It is illustrated separately, and this is because it is used for decoding again in 36 as model selection here.

(Figure 1B) locates at decoder, and inverse quantizer 38 receives quantification index and utilizes QP reconstruct DCT coefficient.This inverse quantizer is in the inverse quantizer 24 that all is equivalent to aspect each in the encoder.Oppositely DCT 40 receives the DCT coefficient and converts them to image pixel or difference pixel value.Oppositely DCT is equivalent to the reverse DCT 26 in the encoder.Switch 42 adds prediction or does not add prediction, and this switch 42 is equivalent to the switch 28 of encoder.Short-term frame memory 44 storages reconstructed frame formerly, 46 pairs of motion vectors that receive of motion vector decoder are decoded, motion compensator 48 carries out motion compensation by using by 46 motion vectors that provide and by the reference that short-term frame memory 44 and long-term frame memory 50 provide, the long-term frame after these long-term frame memory 50 storage reconstruct.

Preferred additional particular methods of the invention will now be discussed; these methods can be used in the system of Figures 1A and 1B or in other systems. A method for coding video that uses high-quality frames is employed in the system of Figures 1A and 1B. The video data is divided into frames. A frame can be divided into regions, such as background and foreground, which is useful if the encoder has intelligence about the video data. The video data can also be divided into units such as blocks or macroblocks, which simplifies the coding computations. A frame can also be treated as a whole. The encoder determines the expected distortion at the decoder. This can be based on the channel error probability and the history of past coding modes. The encoder uses a high-quality past frame to improve the frame being encoded.

The encoder can also recognize changes in the data and store changes in the content of the image or image region being encoded. Feedback can be used to store updated versions of the expected distortion at the decoder. The encoder then determines which frame buffer to use as the reference buffer for each frame or region (such as a macroblock) of the transmitted image being encoded. This determination uses the encoder's expected distortion at the decoder, the number of frame buffers, the size of the frame buffers, any feedback from the decoder, the history of changes in data channel quality, and the history of changes in image or region quality. The determination is made so as to maximize compression ratio, video quality, or some metric balancing the two.

Switching between INTER and INTRA coding is a decision made in embodiments of the invention. The recursive optimal per-pixel estimate (ROPE) is used to provide the mode decision in a hybrid video coder operating over a packet erasure channel.

Assume that the video bitstream is transmitted over a packet erasure channel. Each frame is divided into Groups of Blocks (GOBs). Each GOB contains a single horizontal slice of macroblocks (MBs) and is transmitted as a single packet. Because of resynchronization markers, each packet can be received and decoded independently. Thus the loss of a single packet wipes out one slice of MBs but leaves the rest of the frame intact.

Let p be the probability of packet erasure, which is also the erasure probability of each single pixel. When the decoder detects an erasure, it applies an error concealment method. The decoder replaces the lost macroblock with a macroblock from the previous frame, using as the motion vector the median of the motion vectors (MVs) of the three nearest macroblocks in the GOB above the lost GOB. If the GOB above has also been lost (or the nearest MBs are all intra-coded and therefore have no motion vectors), the all-zero (0, 0) MV is used, and the co-located macroblock from the previous frame replaces the lost macroblock.
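The concealment rule described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `concealment_mv` is hypothetical, the median is taken component-wise, and an intra-coded neighbour without a motion vector is treated as (0, 0), which reduces to the all-zero vector when all three neighbours are intra-coded. The text does not specify these details.

```python
def concealment_mv(above_mvs):
    """Return the motion vector used to conceal a lost macroblock.

    above_mvs: list of three (dx, dy) tuples for the nearest MBs in the GOB
    above, with None for an intra-coded MB, or None if that GOB was lost.
    """
    if above_mvs is None:
        # GOB above is lost: fall back to the co-located block.
        return (0, 0)
    # Assumption: a neighbour with no motion vector counts as (0, 0).
    filled = [mv if mv is not None else (0, 0) for mv in above_mvs]
    xs = sorted(mv[0] for mv in filled)
    ys = sorted(mv[1] for mv in filled)
    mid = len(filled) // 2  # index of the median for three neighbours
    return (xs[mid], ys[mid])
```

The returned vector points into the previous frame; with (0, 0) the co-located macroblock is copied.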

Let f_{n} denote frame n of the original video signal, which is compressed and reconstructed at the encoder as \hat{f}_{n}. The decoded reconstruction of frame n at the receiver (possibly after error concealment) is denoted \tilde{f}_{n}. The encoder does not know \tilde{f}_{n} and treats it as a random variable.

Let f_{n}^{i} denote the original value of pixel i in frame n, and let \hat{f}_{n}^{i} denote its encoder reconstruction. The reconstructed value at the decoder (possibly after error concealment) is denoted \tilde{f}_{n}^{i}. The expected distortion of pixel i is:

d_{n}^{i} = E\{(f_{n}^{i} - \tilde{f}_{n}^{i})^{2}\} = (f_{n}^{i})^{2} - 2 f_{n}^{i} E\{\tilde{f}_{n}^{i}\} + E\{(\tilde{f}_{n}^{i})^{2}\} \qquad (1)

Calculating d_{n}^{i} requires the first and second moments of the random variable \tilde{f}_{n}^{i} of the estimated image sequence. To calculate these moments, ROPE proposes recursive functions in which the cases of an intra-coded MB and an inter-coded MB must be treated separately.

For an intra-coded MB, with probability 1 - p, corresponding to correct receipt of the packet,

\tilde{f}_{n}^{i} = \hat{f}_{n}^{i}

If the packet is lost but the previous GOB is correct, concealment based on the median motion vector causes the decoder to associate pixel i in the current frame with pixel k in the previous frame. Thus, with probability p(1 - p),

\tilde{f}_{n}^{i} = \tilde{f}_{n-1}^{k}

Finally, if the packets of both the current and the previous GOB are lost, then

\tilde{f}_{n}^{i} = \tilde{f}_{n-1}^{i}

(which occurs with probability p^{2}). The two moments of a pixel in an intra-coded MB are then:

E\{\tilde{f}_{n}^{i}\} = (1 - p) \hat{f}_{n}^{i} + p (1 - p) E\{\tilde{f}_{n-1}^{k}\} + p^{2} E\{\tilde{f}_{n-1}^{i}\} \qquad (2)

E\{(\tilde{f}_{n}^{i})^{2}\} = (1 - p) (\hat{f}_{n}^{i})^{2} + p (1 - p) E\{(\tilde{f}_{n-1}^{k})^{2}\} + p^{2} E\{(\tilde{f}_{n-1}^{i})^{2}\} \qquad (3)

For an inter-coded MB, suppose that its true motion vector is such that pixel i is predicted from pixel j in the previous frame. The encoder prediction of this pixel is then \hat{f}_{n-1}^{j}. The prediction error e_{n}^{i} is compressed, and the quantized residue is \hat{e}_{n}^{i}.

The encoder reconstruction is:

\hat{f}_{n}^{i} = \hat{f}_{n-1}^{j} + \hat{e}_{n}^{i} \qquad (4)

The encoder transmits \hat{e}_{n}^{i} and the motion vector of the MB. If the packet is received correctly, the decoder knows \hat{e}_{n}^{i} and the MV, but it must still use its own reconstruction of pixel j in the previous frame, \tilde{f}_{n-1}^{j}, which may differ from the encoder value \hat{f}_{n-1}^{j}. The decoder reconstruction of pixel i is therefore given by:

\tilde{f}_{n}^{i} = \tilde{f}_{n-1}^{j} + \hat{e}_{n}^{i} \qquad (5)

The encoder once again models \tilde{f}_{n-1}^{j} as a random variable. The derivation of the moments for the last two cases is the same as for an intra-coded MB, but it differs for the first case, in which there is no transmission error (probability 1 - p). The first and second moments of \tilde{f}_{n}^{i} for a pixel in an inter-coded MB are then given by:

E\{\tilde{f}_{n}^{i}\} = (1 - p) (\hat{e}_{n}^{i} + E\{\tilde{f}_{n-1}^{j}\}) + p (1 - p) E\{\tilde{f}_{n-1}^{k}\} + p^{2} E\{\tilde{f}_{n-1}^{i}\} \qquad (6)

E\{(\tilde{f}_{n}^{i})^{2}\} = (1 - p) ((\hat{e}_{n}^{i})^{2} + 2 \hat{e}_{n}^{i} E\{\tilde{f}_{n-1}^{j}\} + E\{(\tilde{f}_{n-1}^{j})^{2}\}) + p (1 - p) E\{(\tilde{f}_{n-1}^{k})^{2}\} + p^{2} E\{(\tilde{f}_{n-1}^{i})^{2}\} \qquad (7)

These recursions are carried out at the encoder in order to calculate the expected distortion at the decoder. The encoder can use the result in its coding decisions to choose the coding mode optimally for each MB.
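The recursions of formulas 2, 3, 6 and 7 and the distortion of formula 1 can be sketched per pixel as follows. This is a minimal illustration, not the codec implementation: the function names are hypothetical, and each moment pair is assumed to be passed as a `(first, second)` tuple for the relevant pixel of frame n-1.

```python
def intra_moments(p, f_hat, prev_k, prev_i):
    """Formulas 2 and 3: moments of a decoder pixel in an intra-coded MB.

    p      : packet erasure probability
    f_hat  : encoder reconstruction of the pixel
    prev_k : (first, second) moments of pixel k in frame n-1 (median-MV concealment)
    prev_i : (first, second) moments of co-located pixel i in frame n-1
    """
    m1 = (1 - p) * f_hat + p * (1 - p) * prev_k[0] + p * p * prev_i[0]
    m2 = (1 - p) * f_hat ** 2 + p * (1 - p) * prev_k[1] + p * p * prev_i[1]
    return m1, m2


def inter_moments(p, e_hat, prev_j, prev_k, prev_i):
    """Formulas 6 and 7: moments of a decoder pixel in an inter-coded MB.

    e_hat  : quantized prediction residue
    prev_j : (first, second) moments of the prediction pixel j in frame n-1
    """
    m1 = (1 - p) * (e_hat + prev_j[0]) + p * (1 - p) * prev_k[0] + p * p * prev_i[0]
    m2 = ((1 - p) * (e_hat ** 2 + 2 * e_hat * prev_j[0] + prev_j[1])
          + p * (1 - p) * prev_k[1] + p * p * prev_i[1])
    return m1, m2


def expected_distortion(f, m1, m2):
    """Formula 1: expected squared error between original pixel f and the decoder pixel."""
    return f * f - 2 * f * m1 + m2
```

With p = 0 the moments collapse to the encoder reconstruction, so the expected distortion reduces to the plain quantization error (f - f_hat)^2.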

ROPE accounts for the expected distortion due to both compression and transmission errors in order to perform optimal mode switching. For a given bit rate and packet loss rate, the encoder switches between intra coding and inter coding on a macroblock basis in an optimal fashion. The goal is to minimize the overall distortion D subject to a bit rate constraint R. Using a Lagrange multiplier λ, the ROPE algorithm minimizes the overall cost J = D + λR. The contributions of the individual MBs to this cost are additive, so it can be minimized on a macroblock basis. The coding mode of each MB is therefore chosen by minimizing

\min_{\text{mode}} J_{MB} = \min_{\text{mode}} (D_{MB} + \lambda R_{MB}) \qquad (8)

where the distortion D_{MB} of the MB is the sum of the distortion contributions of its pixels. Rate control is achieved by modifying λ as in J. Choi and D. Park, "A Stable Feedback Control of the Buffer State Using the Controlled Lagrange Multiplier Method", IEEE Trans. Image Proc., vol. 3, pp. 546-58, September 1994.
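The per-macroblock minimization of formula 8 can be illustrated with a short sketch. The function name `choose_mode`, the dictionary layout, and the numeric values in the usage note are illustrative assumptions; in the method described here, the distortion would come from the ROPE estimate and the rate from actually coding the macroblock in each mode.

```python
def choose_mode(candidates, lam):
    """Pick the coding mode with the smallest Lagrangian cost J = D + lam * R.

    candidates: dict mapping a mode name to (expected_distortion, rate_in_bits)
    lam       : Lagrange multiplier (adjusted externally by rate control)
    """
    return min(candidates, key=lambda mode: candidates[mode][0] + lam * candidates[mode][1])
```

For example, with `{"intra": (100.0, 400), "inter": (180.0, 120)}`, a small multiplier such as 0.1 favours the lower-distortion intra mode, while a multiplier of 1.0 favours the cheaper inter mode.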

Embodiments of the invention use optimal mode switching within a rate-distortion framework together with a dual frame buffer. The basic operation of the dual frame buffer is as follows. While frame n is being encoded, the encoder keeps two reference frames in memory. The short-term reference frame is frame n-1. The long-term reference frame is, for example, frame n-k, where k may be variable but is always greater than 1. How the long-term reference frame is chosen is now described.

In one method, referred to as jump updating, the long-term reference frame varies from as recent as frame n-2 to as old as frame n-N-1. When frame n is being encoded, if the long-term reference frame is frame n-N-1, then when the encoder moves on to encode frame n+1, the short-term reference frame advances by 1 to frame n and the long-term reference frame jumps forward by N to frame n-1. The long-term reference frame then remains static for N frames before jumping forward again. N is called the jump update parameter.

Another method is to update the long-term frame buffer continuously, so that it always contains the frame at a fixed temporal distance from the current frame. The buffer thus always contains frame n-D for each frame n. D is called the continuous update parameter.

Note that jump updating and continuous updating can both be regarded as special cases of a more general (N, D) update strategy, in which the long-term reference frame jumps forward by N to become the frame at distance D behind the current frame to be encoded, then remains static for N frames, and then jumps forward again. With general (N, D) updating, frame k can have as its LT frame a frame as recent as frame k-D or as old as frame k-N-D+1. In jump updating as defined above, N can be chosen freely for each sequence and D=2 (meaning that, when an update occurs, the LT frame jumps forward by N to become frame n-2). In continuous updating, D can be chosen freely for each sequence and N is fixed at 1. The most general update strategy would have no fixed N or D; instead, the long-term frame buffer would be updated irregularly, whenever needed, to whichever frame is most useful. In the case considered here, (N, D) remains fixed while a sequence is being encoded. How the choice among coding modes is made is now described.
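The (N, D) update strategy can be sketched as a function that maps the index of the frame being encoded to the index of its long-term reference. This is a sketch under stated assumptions: the name `long_term_index` and the `phase` parameter (the frame index at which a jump is assumed to occur) are hypothetical and not part of the described method.

```python
def long_term_index(n, N, D, phase=0):
    """Index of the long-term reference frame used when encoding frame n.

    N     : number of frames the long-term frame stays static between jumps
    D     : distance to the long-term frame immediately after a jump
    phase : assumed frame index at which a jump takes place (illustrative)
    """
    # Frames elapsed since the last jump; the long-term frame ages by this amount.
    since_jump = (n - phase) % N
    return n - D - since_jump
```

The result always lies between n-D and n-N-D+1, as stated above. With N=2 and D=2 (jump updating), frames 2 to 7 map to long-term frames 0, 0, 2, 2, 4, 4. With N=1 the function reduces to continuous updating, n-D.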

Each macroblock can be coded in one of three coding modes: intra coding, inter coding using the short-term buffer (inter-ST coding), and inter coding using the long-term buffer (inter-LT coding). The choice among these three modes is made using a modified ROPE algorithm. Once the coding mode has been selected, the syntax for encoding the bitstream is almost identical to the normal case of a single frame buffer. The only modification is that, if inter coding is chosen, a single bit is sent to indicate whether the short-term or the long-term frame is used. How the choice among coding modes is made is now described. As before, f_{n}, \hat{f}_{n} and \tilde{f}_{n} denote the original frame n, the encoder reconstruction of the compressed frame, and the decoder version of the frame, respectively. Suppose that the long-term frame buffer was updated m frames ago. It then contains \hat{f}_{n-m} at the transmitter and \tilde{f}_{n-m} at the receiver. The expected distortion of pixel i in frame n is given by formula 1.

To compute the moments in formula 1, the recursion steps for pixels of intra-coded and inter-ST-coded MBs are identical to the corresponding steps in the conventional ROPE algorithm. For a pixel in an inter-LT-coded MB, suppose that the true motion vector of the MB is such that pixel i in frame n is predicted from pixel j in frame n-m (where m > 1). The encoder prediction of this pixel is \hat{f}_{n-m}^{j}. The prediction error e_{n}^{i} is compressed, and the quantized residue is denoted \hat{e}_{n}^{i}. The encoder reconstruction of the pixel is:

\hat{f}_{n}^{i} = \hat{e}_{n}^{i} + \hat{f}_{n-m}^{j} \qquad (9)

Because the receiver has no access to \hat{f}_{n-m}^{j}, it uses

\tilde{f}_{n}^{i} = \hat{e}_{n}^{i} + \tilde{f}_{n-m}^{j} \qquad (10)

When an MB is lost, the median motion vector of the three nearest MBs is calculated and used to associate pixel i in the current frame with pixel k in the previous frame. Using the same arguments as in the original ROPE algorithm, the first and second moments of \tilde{f}_{n}^{i} for a pixel in an inter-LT-coded MB are:

E\{\tilde{f}_{n}^{i}\} = (1 - p) (\hat{e}_{n}^{i} + E\{\tilde{f}_{n-m}^{j}\}) + p (1 - p) E\{\tilde{f}_{n-1}^{k}\} + p^{2} E\{\tilde{f}_{n-1}^{i}\} \qquad (11)

E\{(\tilde{f}_{n}^{i})^{2}\} = (1 - p) ((\hat{e}_{n}^{i})^{2} + 2 \hat{e}_{n}^{i} E\{\tilde{f}_{n-m}^{j}\} + E\{(\tilde{f}_{n-m}^{j})^{2}\}) + p (1 - p) E\{(\tilde{f}_{n-1}^{k})^{2}\} + p^{2} E\{(\tilde{f}_{n-1}^{i})^{2}\} \qquad (12)

Note that error concealment is still performed using the previous frame n-1 rather than the long-term frame. This concealment is carried out regardless of whether the three MBs above are inter-ST-coded, inter-LT-coded, or some combination of the two, so their motion vectors may be highly uncorrelated. If the GOB above has also been lost, the co-located block from the previous frame is used to conceal the MB.
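The inter-LT recursion of formulas 11 and 12 differs from the inter-ST case only in where the prediction pixel comes from: the correctly received branch reads the long-term frame n-m, while both concealment branches still read frame n-1. A minimal sketch follows; the function name and the nested-dictionary layout `moments[frame][pixel] = (first, second)` are assumptions made for illustration.

```python
def inter_lt_moments(p, e_hat, moments, n, m, i, j, k):
    """Formulas 11 and 12 for pixel i of frame n in an inter-LT-coded MB.

    moments : dict of dicts, moments[frame][pixel] = (first, second) moment
    m       : age of the long-term frame, so the prediction pixel j lives in frame n-m
    k       : pixel of frame n-1 selected by median-MV concealment
    """
    lt1, lt2 = moments[n - m][j]   # long-term prediction pixel (packet received)
    k1, k2 = moments[n - 1][k]     # concealment from the previous frame (packet lost)
    i1, i2 = moments[n - 1][i]     # co-located concealment (GOB above lost as well)
    m1 = (1 - p) * (e_hat + lt1) + p * (1 - p) * k1 + p * p * i1
    m2 = ((1 - p) * (e_hat ** 2 + 2 * e_hat * lt1 + lt2)
          + p * (1 - p) * k2 + p * p * i2)
    return m1, m2
```

Keeping the moments of both the previous frame and the long-term frame is what requires the additional estimate memory on the encoder side.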

The presence of uncorrelated neighboring motion vectors adversely affects the efficiency of motion vector coding. There is a bit rate loss because motion vectors are poorly predicted from their neighbors. In addition, compression efficiency is reduced because one bit is spent on every inter-coded MB to specify the frame buffer. Nevertheless, experimental results show that the rate-distortion optimization accounts for these extra bits and still achieves superior compression performance.

Another modification to conventional ROPE extends it to obtain the benefit of half-pixel (half-pel) motion vectors (or other fractional-pixel motion vectors) while avoiding fully accurate half-pixel or other fractional modeling, which would incur too high a penalty in ROPE. Assume that error concealment (EC) is still performed using only the integer part of the motion vectors, so that formulas 2 and 3 for intra-coded MBs are unchanged. Returning to formulas 6 and 7 for inter-coded MBs, observe that the terms \hat{e}_{n}^{i} and (\hat{e}_{n}^{i})^{2} remain unchanged. However, the calculation of E\{\tilde{f}_{n-1}^{j}\} and E\{(\tilde{f}_{n-1}^{j})^{2}\} becomes critical. The pixel coordinate j now points to a position in an interpolated grid that covers four times the area of the original image.

For this calculation, three types of pixel are distinguished on the half-pixel grid: pixels that coincide with actual (original) pixel positions, called integer-index pixels, which need no interpolation; pixels that lie (horizontally or vertically) midway between two integer-index pixels; and pixels that lie at the diagonal center of four integer-index pixels. Bilinear interpolation is used, so an interpolated value is simply the average of the two or four neighboring integer-index pixels. For integer-index pixels, the recursion equations are identical to those of the conventional ROPE algorithm, and the estimate is optimal.

For a horizontally or vertically interpolated pixel, suppose that pixel j in the interpolated pixel domain corresponds to the interpolation of pixels k₁ and k₂ in the original pixel domain. The first moment is computationally tractable:

E\{\tilde{f}_{n-1}^{j}\} = \frac{1}{2} [1 + E\{\tilde{f}_{n-1}^{k_{1}}\} + E\{\tilde{f}_{n-1}^{k_{2}}\}] \qquad (13)

The expression for the second moment, however, is:

E\{(\tilde{f}_{n-1}^{j})^{2}\} = \frac{1}{4} [1 + E\{(\tilde{f}_{n-1}^{k_{1}})^{2}\} + E\{(\tilde{f}_{n-1}^{k_{2}})^{2}\} + 2 E\{\tilde{f}_{n-1}^{k_{1}}\} + 2 E\{\tilde{f}_{n-1}^{k_{2}}\} + 2 E\{\tilde{f}_{n-1}^{k_{1}} \tilde{f}_{n-1}^{k_{2}}\}] \qquad (14)

The last term requires computing a correlation matrix whose horizontal and vertical dimensions equal the number of pixels in the image. This can be done for small images or with abundant computational resources, but for images of typical size and typical computational resources it is computationally infeasible. It is preferable to approximate it using the cosine (Cauchy-Schwarz) inequality:

E\{(\tilde{f}_{n-1}^{j})^{2}\} \leq \frac{1}{4} [1 + E\{(\tilde{f}_{n-1}^{k_{1}})^{2}\} + E\{(\tilde{f}_{n-1}^{k_{2}})^{2}\} + 2 E\{\tilde{f}_{n-1}^{k_{1}}\} + 2 E\{\tilde{f}_{n-1}^{k_{2}}\} + 2 \sqrt{E\{(\tilde{f}_{n-1}^{k_{1}})^{2}\} E\{(\tilde{f}_{n-1}^{k_{2}})^{2}\}}] \qquad (15)

For a diagonally interpolated pixel, suppose that pixel j on the interpolated pixel grid is the result of interpolating pixels k₁, k₂, k₃ and k₄ in the original pixel domain. The first moment can be calculated exactly as follows:

E\{\tilde{f}_{n-1}^{j}\} = \frac{1}{4} [2 + E\{\tilde{f}_{n-1}^{k_{1}}\} + E\{\tilde{f}_{n-1}^{k_{2}}\} + E\{\tilde{f}_{n-1}^{k_{3}}\} + E\{\tilde{f}_{n-1}^{k_{4}}\}] \qquad (16)

The exact expression for the second moment is as follows:

E\{(\tilde{f}_{n-1}^{j})^{2}\} = \frac{1}{16} [4 + E\{(\tilde{f}_{n-1}^{k_{1}})^{2}\} + E\{(\tilde{f}_{n-1}^{k_{2}})^{2}\} + E\{(\tilde{f}_{n-1}^{k_{3}})^{2}\} + E\{(\tilde{f}_{n-1}^{k_{4}})^{2}\} + 4 E\{\tilde{f}_{n-1}^{k_{1}}\} + 4 E\{\tilde{f}_{n-1}^{k_{2}}\} + 4 E\{\tilde{f}_{n-1}^{k_{3}}\} + 4 E\{\tilde{f}_{n-1}^{k_{4}}\} + 2 E\{\tilde{f}_{n-1}^{k_{1}} \tilde{f}_{n-1}^{k_{2}}\} + 2 E\{\tilde{f}_{n-1}^{k_{1}} \tilde{f}_{n-1}^{k_{3}}\} + 2 E\{\tilde{f}_{n-1}^{k_{1}} \tilde{f}_{n-1}^{k_{4}}\} + 2 E\{\tilde{f}_{n-1}^{k_{2}} \tilde{f}_{n-1}^{k_{3}}\} + 2 E\{\tilde{f}_{n-1}^{k_{2}} \tilde{f}_{n-1}^{k_{4}}\} + 2 E\{\tilde{f}_{n-1}^{k_{3}} \tilde{f}_{n-1}^{k_{4}}\}] \qquad (17)

Applying the same approximation as in the horizontal/vertical case gives:

E\{(\tilde{f}_{n-1}^{j})^{2}\} \leq \frac{1}{16} [4 + E\{(\tilde{f}_{n-1}^{k_{1}})^{2}\} + E\{(\tilde{f}_{n-1}^{k_{2}})^{2}\} + E\{(\tilde{f}_{n-1}^{k_{3}})^{2}\} + E\{(\tilde{f}_{n-1}^{k_{4}})^{2}\} + 4 E\{\tilde{f}_{n-1}^{k_{1}}\} + 4 E\{\tilde{f}_{n-1}^{k_{2}}\} + 4 E\{\tilde{f}_{n-1}^{k_{3}}\} + 4 E\{\tilde{f}_{n-1}^{k_{4}}\} + 2 \sqrt{E\{(\tilde{f}_{n-1}^{k_{1}})^{2}\} E\{(\tilde{f}_{n-1}^{k_{2}})^{2}\}} + 2 \sqrt{E\{(\tilde{f}_{n-1}^{k_{1}})^{2}\} E\{(\tilde{f}_{n-1}^{k_{3}})^{2}\}} + 2 \sqrt{E\{(\tilde{f}_{n-1}^{k_{1}})^{2}\} E\{(\tilde{f}_{n-1}^{k_{4}})^{2}\}} + 2 \sqrt{E\{(\tilde{f}_{n-1}^{k_{2}})^{2}\} E\{(\tilde{f}_{n-1}^{k_{3}})^{2}\}} + 2 \sqrt{E\{(\tilde{f}_{n-1}^{k_{2}})^{2}\} E\{(\tilde{f}_{n-1}^{k_{4}})^{2}\}} + 2 \sqrt{E\{(\tilde{f}_{n-1}^{k_{3}})^{2}\} E\{(\tilde{f}_{n-1}^{k_{4}})^{2}\}}] \qquad (18)

This upper bound is used to approximate the second moment.
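The first moment and the second-moment upper bound for interpolated pixels (formulas 13, 15, 16 and 18) share one pattern, which the sketch below makes explicit. The function name `interp_moments_bound` is hypothetical; the rounding constant is 1 for two neighbours and 2 for four, as in the formulas above.

```python
from itertools import combinations
from math import sqrt


def interp_moments_bound(neigh):
    """First moment and second-moment upper bound of an interpolated pixel.

    neigh: list of two or four (first, second) moment pairs for the
           integer-index neighbours k1, k2 (and k3, k4) in frame n-1.
    """
    c = len(neigh)
    if c not in (2, 4):
        raise ValueError("expected two or four neighbouring pixels")
    r = c // 2                                   # rounding term: 1 or 2
    s1 = sum(m1 for m1, _ in neigh)              # sum of first moments
    s2 = sum(m2 for _, m2 in neigh)              # sum of second moments
    # Each cross term E{a*b} is replaced by its bound sqrt(E{a^2} * E{b^2}).
    cross = sum(sqrt(a[1] * b[1]) for a, b in combinations(neigh, 2))
    first = (r + s1) / c
    second_bound = (r * r + s2 + 2 * r * s1 + 2 * cross) / (c * c)
    return first, second_bound
```

When all neighbours are deterministic and equal, the bound is tight: for two neighbours of value 10 the function returns 10.5 and 110.25, which is exactly 10.5 squared.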

One embodiment of the invention is a modified H.263+ video codec, which has been verified experimentally. An existing H.263+ video codec was modified in two ways. First, in the case of single-frame (SF) motion compensation, the ROPE algorithm is used for the mode switching decision; the resulting bitstream fully conforms to the H.263+ standard. Second, the H.263+ video codec was modified to use an additional (long-term) frame buffer. As in the article by Fukuhara et al. cited above, this frame buffer is updated periodically according to the update parameter N. For both the single-frame and dual-frame cases, performance was measured with integer and with half-pixel motion vectors. Half-pixel vectors were modeled using the approximation given above.

Experiments show that adding the long-term frame buffer improves the compression efficiency of the encoder and makes the bitstream more robust to packet loss. For some sequences, however, the effect is very small, and it depends on the update parameter N. A fixed N is not optimal for all sequences. Optimization involves choosing a specific update parameter N for a particular sequence.

A multi-frame coding embodiment of the invention is applied to rate-switching networks. Long-term or high-quality reference frames can be used. For example, wireless networks often experience significant transitions in network capacity. An example of an event that produces such a transition is a network handoff when using a service such as the Always Best Connected (ABC) approach. In this situation, dual-frame coding with a long-term or high-quality frame can significantly improve the quality of the frames transmitted immediately after the network handoff and can smooth the steep and abrupt transition in network capacity.

In a preferred embodiment, the long-term past frame is designated to be the last frame coded at the high bandwidth just before the network switches to the low-bandwidth mode. This is illustrated in Fig. 2, in which a transition from a 10 Mbps connection to a 10-20 Kbps connection occurs. The long-term frame n-D from the high-rate connection is used in coding the current frame n. For each macroblock (MB) in a predicted frame, a search is carried out not only over one (or more) immediately preceding frames but also over one (or more) long-term past frames, and the better-matching block is chosen.

Consider now a practical modified embodiment in which the method of Fig. 2 is applied to an ABC network operating with MPEG-4 coding. A video encoder operating under a service such as ABC must be very robust to bandwidth changes of several orders of magnitude. Assume that the ABC network provides timely packet delivery with minimal loss. To cope with the drastic changes in bandwidth, assume that the quantization parameter of each frame can vary over its full range (1-31), in contrast to a standard-compliant encoder, which limits the change in the quantization parameter to 25% of its previous value.

The multi-frame bandwidth switching embodiment was simulated by modifying a standard MPEG-4 coder. The MPEG-4 coder uses the rate control method described in A. Vetro, H. Sun, Y. Wang, "MPEG-4 Rate Control for Multiple Video Objects", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, No. 1, pp. 186-99, February 1999. Each frame is treated as a single object by the MPEG-4 encoder. Additional memory is allocated for the long-term frame. An extra bit is transmitted with each inter-coded MB to inform the decoder which frame it references. The intra refresh period was set to 100. Reducing the intra refresh period enhances the performance of the dual-frame encoder. However, frequent intra refresh leads to a bit rate higher than what a GPRS (General Packet Radio Service) system can support.

Different types of video sequences, with static and dynamic backgrounds and with scene changes, were tested. The test sequences were in QCIF (quarter common intermediate format) at a frame rate of 10 frames per second. To investigate the effect of switching to different low-bandwidth networks, the simulations switched from 1 Mbps to low-bandwidth networks ranging from 10 Kbps (GPRS) to 150 Kbps (1xRTT CDMA). Various sequences were encoded with the dual frame buffer encoder and, for comparison, with a conventional MPEG-4 encoder. It was found that retaining the high-quality frame as the long-term past frame of the dual-frame encoder yields better video quality, as quantified by the PSNR of the decoded sequence, for up to several hundred frames after the switch to the lower-bandwidth connection. The technique incurs a small cost in memory for holding two reference frames (at both the encoder and the decoder) and a small cost in encoder complexity for searching the second reference frame for the best-matching block.

In some instances, an impending bandwidth switch can be anticipated, for example through an early warning or a prediction. This provides an opportunity to encode the long-term frame while the higher bandwidth is still available.

In another embodiment, a long-term frame is taken periodically or occasionally (even when there is no change in bandwidth or channel/network conditions), and additional bandwidth is allocated to produce a high-quality frame for use as the long-term frame. This is a proactive method of obtaining a high-quality long-term reference frame occasionally or periodically (for example, every N frames).

Additional bits are allocated to produce the high-quality reference frame so that it has better quality than the normal quality at which frames are being coded. This long-term high-quality reference frame can then serve as a source of high-quality matching blocks over an extended period. In one example, the extended period may exceed N frames if conditions change.

The additional bits can be allocated by a scheduler. For example, suppose that multiple users share system resources that are allocated by a scheduler. The system considered is a shared wireless medium such as an HDR (high data rate) system with a rate of 400 Kbps. Scheduler S₁ divides the available bandwidth B of the network equally among k users, so each user has bandwidth B/k. Scheduler S₂ reserves some portion b of the total bandwidth and divides the remainder among all users. The reserved portion b is allocated to each user in turn, cycling periodically among the users. Each user has bandwidth (B-b)/k during k-1 time slots and bandwidth b+(B-b)/k during one time slot. Figs. 3A and 3B represent the two schedulers, with the horizontal axis showing time and the height showing bandwidth. The allocation is uniform for scheduler S₁ in Fig. 3A, but in Fig. 3B it is much higher for user i, because scheduler S₂ provides extra bandwidth. The extra bandwidth allocated to user i is shown as a thick black bar. This extra bandwidth is allocated to the other k-1 users in the system at different times. At any point in time, the average bandwidth over the users remains the same. The extra periodic bandwidth allocated by S₂ is used to create a high-quality frame, which serves as the long-term frame in a multi-frame (for example, dual-frame) motion compensation scheme.
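The two schedulers can be compared with a short sketch. The function names are hypothetical, and the round-robin order in which S₂ hands out the reserved portion (user `slot % k` in time slot `slot`) is an assumption; the text only states that the reserved bandwidth cycles among the users.

```python
def s1_bandwidth(B, k):
    """Scheduler S1: every user receives an equal share of B in every slot."""
    return [B / k] * k


def s2_bandwidth(B, b, k, slot):
    """Scheduler S2: reserve b, split the rest equally, give b to one user per slot."""
    share = (B - b) / k
    boosted = slot % k  # assumed round-robin order
    return [share + b if user == boosted else share for user in range(k)]
```

With B = 400, b = 100 and k = 4, each user receives 75 in three slots out of four and 175 in the remaining one, so its long-run average is 100, the same as under S₁. The boosted slot is where a high-quality long-term frame could be coded.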

Another embodiment does not use a scheduler. Instead, a user with a fixed average bandwidth is allowed to spend extra bits on one frame (and fewer bits on the frames before and after it), provided that a certain amount of extra delay in the video can be tolerated. The extra bits spent on this frame simply translate into more time for transmitting it. There is a trade-off between compression performance and delay. In the scheduler embodiment, no extra delay is incurred, because the scheduler periodically provides extra bandwidth for the high-quality frames. The overall average bandwidth can nevertheless be constrained to equal the bandwidth of the uniformly allocating S₁ system.

The modified MPEG-4 encoder of a particular embodiment has three selectable coding modes: intra coding, inter coding using the short-term reference frame, and inter coding using the long-term reference frame. For each macroblock (MB), the choice among the three modes is made by first choosing between intra coding and inter coding, as follows. The distortion d_min between the current MB to be encoded and the best-matching MB in the short-term or long-term frame is calculated, as is the standard deviation σ of the current MB. If σ < d_min - 512, intra coding is chosen; otherwise inter coding is chosen. For inter coding, the choice between the short-term frame and the long-term high-quality frame is based on the distortion between the motion-compensated MB and the current MB to be encoded: the reference frame that yields the lower distortion is chosen.
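The two-stage decision can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, the macroblock is passed as a flat list of pixel values, the population standard deviation is used for σ, and a tie between the two inter distortions goes to the short-term frame. The text does not specify the distortion measure or its scaling relative to σ, so the caller is assumed to supply distortions on the scale for which the threshold of 512 is meaningful.

```python
from statistics import pstdev


def choose_mb_mode(cur_mb, d_st, d_lt, threshold=512):
    """Choose intra, inter-ST or inter-LT coding for one macroblock.

    cur_mb : flat list of pixel values of the current macroblock
    d_st   : distortion of the best match in the short-term frame
    d_lt   : distortion of the best match in the long-term frame
    """
    sigma = pstdev(cur_mb)
    d_min = min(d_st, d_lt)
    if sigma < d_min - threshold:
        return "intra"
    # Inter coding: pick the reference frame with the lower distortion.
    return "inter-ST" if d_st <= d_lt else "inter-LT"
```

A flat macroblock (σ = 0) whose best match has a distortion above 512 is therefore intra-coded, while the same macroblock with a good match in the long-term frame is inter-LT-coded.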

Another embodiment comprises the feedback of use from decoder.As example, encoder can receive the feedback of indication to the affirmation of the grouping that receives from decoder.If i is the index of present frame.Use has the feedback of fixed delay d, and encoder can have good understanding to the frame of (i-d) individual reconstruct of decoder.To use term decode again (re-decode) describe encoder and use feedback information that past frame is decoded to make it be equal to the processing of the decoder version of this frame.Because which GOB encoder is known and is received and which has been lost by intact, so the accurately operation that comprises error concealing of analog decoder of encoder.Through the frame of decoding again is the frame that is equal to decoder version at the encoder place, however use term estimate to be described in the encoder place for the still unavailable frame of its feedback information, thereby force encoder to estimate decoder version.Utilize feedback information, the formula that is used for the MB of interior coding, inter-ST coding, inter-LT coding above the use still carries out the estimation of intermediate frame (intermediate frame) pixel value.Yet, now about the required past decoder frame of these formula information can the frame through decoding again of ACK/NACK reinitializes (reinitialize) by using.Encoder can recomputate the pixel estimation more reliably and follow the tracks of latent fault for last d frame then.In actual prediction surplus (residual) or interior coefficient (intra coefficients) input ROPE algorithm for estimating, wherein the ROPE that obtains of reference frame or same recursive calculation estimates or through the frame of decoding again.

Illustrate an example among Fig. 4.Here, jump undated parameter and feedback delay are respectively N=2 and d=5.Jump undated parameter N=2 means that frame 0 will be the long term reference that is used for frame 2 and frame 3, and frame 2 will be the long term reference of frame 4 and frame 5, and frame 4 will be used for frame 6 and frame 7.

Because d=5, frame 2 will have been re-decoded when encoding of frame 7 begins, and this newly re-decoded frame is immediately used to update the estimates of frames 3, 4, 5 and 6. For encoding frame 7, the long-term frame is frame 4 and the short-term frame is frame 6, and the encoder uses the new estimates of these two frames to calculate the expected distortion of frame 7 due to packet loss.
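The bookkeeping of the Fig. 4 example can be sketched as follows. The index formula N*floor(i/N) - D is not stated in the text; it is an assumption inferred from the frame numbers given in the examples, with jump updating treated as the case D=N. The function names are hypothetical.

```python
def long_term_index(i, N, D):
    """Index of the LT reference for frame i (formula inferred from the examples)."""
    return N * (i // N) - D


def feedback_schedule(i, N, d, D=None):
    """Frames involved when encoding of frame i begins with feedback delay d."""
    if D is None:
        D = N  # assumption: jump updating corresponds to D equal to N
    redecoded = i - d  # most recent frame known exactly through feedback
    return {
        "redecoded": redecoded,
        "reestimated": list(range(redecoded + 1, i)),  # frames still only estimated
        "lt": long_term_index(i, N, D),
        "st": i - 1,
    }


fig4 = feedback_schedule(7, N=2, d=5)
lt_map = {i: long_term_index(i, 2, 2) for i in range(2, 8)}
```

For frame 7 this gives frame 2 as the re-decoded frame, frames 3 to 6 as the re-estimated frames, frame 4 as the LT frame and frame 6 as the ST frame, matching the description of Fig. 4.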

An alternative approach is to advance the long-term frame buffer so that it contains the most recent frame that is known exactly, namely frame (i-d). In this case, feedback improves the estimate of the ST frame and reduces the estimation error of the LT frame to zero. This ensures that the long-term frame buffers of the encoder and the decoder always contain identical reconstructions. With a delay d, one can use a general (N, D) update strategy with D=d and N>1, or a continuous update strategy with D=d and N=1. An example with N=2 and d=5 is shown in Fig. 5. In Fig. 5, frame 12 is being encoded. Its LT frame is frame 7, which has already been re-decoded. Re-decoding frame 7, however, requires the re-decoded versions of frames 1 and 6, which are its LT and ST frames, respectively. The estimates of frames 8, 9, 10 and 11 can now be obtained. Frame 8 requires re-decoded frame 7 and re-decoded frame 3. Frame 9 requires estimated frame 8 (ST) and re-decoded frame 3 (LT). Frame 10 requires estimated frame 9 and re-decoded frame 5. Similarly, frame 11 requires estimated frame 10 and re-decoded frame 5.
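The dependencies listed for Fig. 5 can be reproduced with a short sketch. As before, the LT index formula N*floor(k/N) - D with D=d is an assumption inferred from the frame numbers in the example, and the function name is hypothetical.

```python
def estimate_dependencies(i, N, d):
    """For each frame that is still only estimated when frame i is encoded,
    list its ST and LT references and whether each is re-decoded or estimated."""
    D = d  # the LT buffer is advanced to the most recent exactly known frame
    last_known = i - d  # frames up to this index have been re-decoded
    deps = {}
    for k in range(last_known + 1, i):
        st = k - 1
        lt = N * (k // N) - D  # inferred LT index, an assumption
        deps[k] = {
            "st": (st, "re-decoded" if st <= last_known else "estimated"),
            "lt": (lt, "re-decoded" if lt <= last_known else "estimated"),
        }
    return deps


fig5 = estimate_dependencies(12, N=2, d=5)
for frame, refs in fig5.items():
    print(frame, refs)
```

In this configuration every LT reference is a re-decoded frame, which is why the estimation error of the LT frame drops to zero; only the ST references of frames 9, 10 and 11 remain estimates.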

By synchronizing the long-term frame buffers at the transmitter and the receiver, the drift error caused by the accumulation of packet losses can be eliminated completely. If inter-LT coded macroblocks arrive, they are reconstructed identically at the encoder and the decoder. Normally, this is guaranteed only by transmitting intra-coded macroblocks. Here, however, the feedback signal allows the long-term frame buffer to serve as an additional error robustness factor without a large sacrifice in compression efficiency.

Instead of using feedback only to improve the distortion estimate and thereby the mode selection, this information can now also be used to re-decode the LT frame at the encoder, which improves the estimate and provides a more realistic reference frame. Experiments show that this approach performs well under a variety of conditions.

While specific embodiments of the present invention have been shown and described, it should be understood that other modifications, substitutions and alternatives will be apparent to one of ordinary skill in the art. Such modifications, substitutions and alternatives can be made without departing from the spirit and scope of the invention, which should be determined from the appended claims.

Various features of the invention are set forth in the appended claims.

Claims

1. A video encoder, comprising:

a short-term reference block buffer configured to store at least one short-term reference block;

at least one long-term reference block buffer configured to store at least one long-term reference block; and

encoding means for encoding a vector to describe at least one image block with respect to at least one reference block, said encoding means being configured, when a connection used by said video encoder is predicted to change to a lower quality, to select, for each block being encoded, between said at least one short-term reference block in said short-term reference block buffer and said at least one long-term reference block in said long-term reference block buffer, based on one or more factors examined during encoding, said factors comprising one or more of the following: the encoder's expectation of the distortion at the decoder; the number of frame buffers in the encoder; the size of the frame buffers in the encoder; feedback from the decoder; a history of changing data channel quality; and a history of changing image region quality.

2. The encoder of claim 1, wherein said encoding means is configured to selectively choose said at least one long-term reference block to encode background data and to selectively choose said short-term reference block to encode foreground data.

3. The encoder of claim 2, wherein said short-term reference block comprises an immediate past reference block.

4. The encoder of claim 3, wherein said at least one long-term reference block comprises at least one block just prior to said immediate past reference block.

5. The encoder of claim 1, wherein said one or more factors examined during encoding are also used to determine when to update said at least one long-term reference block buffer.

6. The encoder of claim 1, wherein said encoder comprises a plurality of long-term reference block buffers.

7. The encoder of claim 6, wherein said short-term reference block comprises an immediate past reference block.

8. The encoder of claim 1, wherein said encoding means is configured to perform sub-pixel accuracy encoding for said at least one long-term reference block by determining the following on a sub-pixel grid:

original pixel positions, comprising pixels that coincide with actual pixel positions;

horizontally or vertically interpolated pixel positions, comprising pixels that lie between two original pixel positions; and

diagonally interpolated pixel positions.

9. The encoder of claim 8, wherein:

the first moments of the horizontally or vertically interpolated pixel positions and of the diagonally interpolated pixel positions are computed directly; and

the second moments of the horizontally or vertically interpolated pixel positions and of the diagonally interpolated pixel positions are estimated.

10. The encoder of claim 1, wherein said encoding means is configured to select between two types of inter coding and intra coding, the two types of inter coding comprising coding that uses said at least one short-term reference block (ST) and coding that uses said at least one long-term reference block (LT), and wherein:

said encoding means is configured to compute the moments for intra-coded and ST blocks by using recursive optimal per-pixel estimates in which the elements of previous blocks are treated as random variables; and

said encoding means is configured to compute the moments for LT blocks by using modified recursive optimal per-pixel estimates in which the elements of previous blocks are treated as random variables.

11. The encoder of claim 10, wherein said LT block is updated, and said encoding means is configured to receive decoder feedback and to use the feedback to determine when to update said LT block.

12. The encoder of claim 11, wherein said encoding means is configured to use the feedback to synchronize said long-term reference block buffer.

13. The encoder of claim 1, wherein said at least one long-term reference block comprises a long-term reference frame, and said encoding means encodes frames on a block-by-block basis.