CN101313582A - Encoder assisted frame rate up conversion using various motion models - Google Patents

Encoder assisted frame rate up conversion using various motion models

Info

Publication number
CN101313582A
CN101313582A · CN200680043307A
Authority
CN
China
Prior art keywords
frame
video
motion
modeling information
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200680043307
Other languages
Chinese (zh)
Inventor
Fang Shi
Seyfullah Halit Oguz
Sumeet Singh Sethi
Vijayalakshmi R. Raveendran
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Publication of CN101313582A publication Critical patent/CN101313582A/en
Pending legal-status Critical Current

Abstract

An Encoder Assisted Frame Rate Up Conversion (EA-FRUC) system that utilizes various motion models, such as affine models, in addition to video coding and pre-processing operations at the video encoder, to exploit the FRUC processing that will occur in the decoder, in order to improve the modeling of moving objects, compression efficiency, and reconstructed video quality. Furthermore, objects are identified in a way that reduces the amount of information that must be encoded for the decoder device to render the objects.

Description

Encoder assisted frame rate up conversion using various motion models
Cross-reference to related applications
The present application claims priority to (a) Provisional Application No. 60/721,375, entitled "A METHOD OF ENCODER ASSISTED FRAME RATE UP-CONVERSION WITH DIFFERENT MOTION MODELS," filed September 27, 2005, and (b) Provisional Application No. 60/721,376, entitled "A METHOD AND APPARATUS FOR ENCODER ASSISTED FRAME RATE UP-CONVERSION," filed September 27, 2005, both of which are expressly incorporated herein by reference.
Technical field
The present invention is directed to a method and apparatus for encoding video data.
Background
A variety of video formats supporting different frame rates exist today. The most popular formats, listed in order of the frames per second (fps) they support, are: 24 (original film), 25 (PAL), 30 (typically interlaced video), and 60 (high definition (HD), e.g., 720p). Although these frame rates are suitable for most applications, in order to reach the low bandwidth required for mobile handset video communication, the frame rate is sometimes reduced to as low as 15, 10, 7.5, or 3 fps. Although these low rates allow low-end devices with limited computing power to display some video, the resulting video quality suffers from "jerkiness" (i.e., a slide-show effect) rather than smooth motion. Moreover, frame dropping often fails to track the amount of motion in the video correctly. For example, fewer frames should be dropped during "high-motion" video content, such as portions of sporting events, while more frames can be dropped during "low-motion" content, such as segments of talk shows. Video compression is content dependent, and it is desirable to be able to analyze and incorporate the motion and texture characteristics of the sequence to be encoded, so as to improve video compression efficiency.
Frame rate up conversion (FRUC) is the process of using video interpolation at the video decoder to increase the frame rate of the reconstructed video. In FRUC, interpolated frames are created using received frames as references. Currently, systems implementing FRUC frame interpolation (hereinafter "interpolated frames") include methods based on motion compensated interpolation and on the processing of transmitted motion vectors. FRUC is also used in converting between various video formats. For example, in telecine and inverse telecine applications (film-to-videotape transfer techniques that correct for the respective rate differences between film and video), progressive video (24 frames/second) is converted to NTSC interlaced video (29.97 frames/second).
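As a rough sketch of motion compensated interpolation (illustrative only — not the specific interpolation defined by this patent), each block of the intermediate frame can be formed by averaging motion-aligned samples from the previous and next decoded frames:

```python
def mc_interpolate_block(prev, nxt, mv, top, left, size):
    """Bidirectionally interpolate one block of an in-between frame.

    prev, nxt : 2-D lists of luma samples (previous/next decoded frames)
    mv        : (dy, dx) motion of the block from prev to nxt
    The block sits halfway along the motion trajectory, so each pixel is
    the average of the two motion-aligned samples.  Border handling and
    sub-pixel motion are omitted for brevity.
    """
    dy, dx = mv
    out = []
    for r in range(size):
        row = []
        for c in range(size):
            # Sample halfway back into prev and halfway forward into nxt.
            pr = prev[top + r - dy // 2][left + c - dx // 2]
            nx = nxt[top + r + dy // 2][left + c + dx // 2]
            row.append((pr + nx) // 2)
        out.append(row)
    return out
```

With zero motion this reduces to a plain average of the two reference frames, which is the degenerate case a real FRUC system would refine with motion analysis.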
Another FRUC method uses weighted-adaptive motion compensated interpolation (WAMCI) to reduce the block artifacts caused by the deficiencies of motion estimation and block-based processing. This method forms the interpolation as a weighted sum of multiple motion compensated interpolation (MCI) images. Block artifacts at block borders are further reduced by applying a technique similar to overlapped block motion compensation (OBMC). Specifically, to reduce blurring during the processing of overlapped regions, the method uses motion analysis to determine the type of block motion and applies OBMC adaptively. Experimental results show that the proposed method achieves improved results, with significantly reduced block artifacts.
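The weighted-sum idea behind WAMCI can be sketched as follows; the weights here are placeholder inputs, whereas the actual scheme adapts them per block from motion analysis:

```python
def wamci_frame(mci_images, weights):
    """Combine several candidate motion-compensated interpolation (MCI)
    images (2-D lists of equal size) into one frame as a weighted sum.
    `weights` should sum to 1; choosing them adaptively per region is
    the "weighted adaptive" part this sketch leaves out."""
    h, w = len(mci_images[0]), len(mci_images[0][0])
    out = [[0.0] * w for _ in range(h)]
    for img, wt in zip(mci_images, weights):
        for r in range(h):
            for c in range(w):
                out[r][c] += wt * img[r][c]
    return out
```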
Another FRUC method uses vector reliability analysis to reduce artifacts caused by the use of any motion vectors that are inaccurately transmitted from the encoder. In this method, motion estimation is used to construct motion vectors, and these are compared with the transmitted motion vectors to determine the most desirable approach for frame interpolation. In conventional up-conversion algorithms that use motion estimation, the motion estimation process is performed using two adjacent decoded frames to construct the motion vectors that will allow a frame to be interpolated. These algorithms, however, attempt to improve transmission-bandwidth utilization without regard for the computation required by the motion estimation operation. By comparison, in up-conversion algorithms that use transmitted motion vectors, the quality of the interpolated frame depends to a large extent on the motion vectors derived by the encoder. Using a combination of the two approaches, the transmitted motion vectors are first analyzed to determine whether they can be used to construct the interpolated frame. The interpolation method is then selected adaptively from the following three methods: local motion compensated interpolation, global motion compensated interpolation, and frame-repeat interpolation.
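A toy version of the adaptive selection among the three interpolation methods might look like this; the SAD-threshold rule is an assumption for illustration, not the criterion used by the cited method:

```python
def select_interpolation_mode(sad_local, sad_global, sad_threshold):
    """Pick an interpolation method in the spirit of the combined
    scheme described above: trust whichever motion model leaves the
    smaller residual (measured here as a SAD score), and fall back to
    frame repetition when neither model explains the motion well."""
    best = min(sad_local, sad_global)
    if best > sad_threshold:
        return "frame_repeat"
    return "local_mc" if sad_local <= sad_global else "global_mc"
```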
Although FRUC techniques are generally implemented as post-processing functions in the video decoder, the video encoder is typically not involved in this operation. However, in an approach referred to as encoder-assisted FRUC (EA-FRUC), the encoder can determine whether the transmission of certain information related to motion vectors or reference frames (e.g., residual data) can be eliminated, while still allowing the decoder to autonomously regenerate major portions of the frames without the eliminated vectors or residual data. For example, a bidirectional predictive video coding method has been introduced as an improvement to B-frame coding in MPEG-2. In that method, the use of an error criterion is proposed so that true motion vectors can be used in motion compensated predictive coding. The distortion measure is based on the sum of absolute differences (SAD), but this distortion measure is known to be insufficient to provide a true measure of distortion, particularly where the amount of motion between two frames must be quantified. Additionally, the variation in thresholds is classified using fixed thresholds, when (optimally) these thresholds should preferably be variable, depending on the content being classified.
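The SAD distortion measure mentioned above is simple to state in code; its weakness — that it collapses all pixel differences into one number regardless of their structure — is also visible here:

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks
    (2-D lists).  Two very different error patterns (one large edge
    mismatch vs. many small uniform offsets) can yield the same SAD,
    which is why a fixed SAD threshold classifies motion poorly."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))
```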
FRUC video compression techniques, including those utilizing encoder-supplied enhancement information, use block-based motion prediction with a translational motion model to model the motion of objects within a video frame. Block-based motion prediction exploits the temporal correlation structure inherent in the video signal. The translational motion modeling used by block-based motion prediction can reduce or eliminate temporal redundancy in the video signal for subjects that keep a rigid shape while undergoing translational motion in a plane more or less parallel to the plane of the lens of the video capture device. The translational motion model uses two parameters per encoded block.
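A two-parameter translational block prediction can be sketched as follows; every pixel of the block shares the single displacement (dy, dx), which is exactly what limits the model to rigid, translating subjects:

```python
def translate_block(ref, top, left, size, mv):
    """Predict a block with the two-parameter translational model: the
    entire block is fetched from the reference frame displaced by one
    motion vector (dy, dx).  Rotation, scaling, and deformation cannot
    be expressed, since every pixel shares the same two parameters."""
    dy, dx = mv
    return [row[left + dx:left + dx + size]
            for row in ref[top + dy:top + dy + size]]
```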
In motion compensated prediction and transform-based hybrid video compression, the video frame is partitioned in accordance with the use of the translational motion model, with partitions produced by a conventional encoder so as to locate objective subjects that keep a rigid shape while undergoing translational motion. For example, a video sequence of a person speaking to a camera while a car passes by may be partitioned into the following objects: a still image representing the fixed background of the sequence, a video object representing the talking head, an audio object representing the sound associated with that person, and another video object representing the sports car as a sprite with a rectangular support area. The position of the sprite over the still image may move over time.
Unfortunately, translational-model motion prediction cannot accurately predict or describe every block, and the motion of objects often requires more than two parameters. Independently moving objects, combined with camera motion and focal length changes, produce complicated motion vector fields that must be approximated efficiently to be usable for motion prediction. As a result, the residual signal (also referred to as the prediction error) has considerable power, and video frames containing such motion are therefore inefficient to compress. When block-based motion prediction is used to interpolate video frames containing these objects, the subjective and objective quality of the interpolated frames are both reduced, because the translational motion model limits how block motion within a frame can be described. Moreover, when the video sequence is partitioned according to translational-model motion prediction, the effectiveness of interpolation algorithms in handling objects undergoing arbitrary motion and deformation is limited.
What is needed is a method of providing high-quality interpolated frames at the decoder device that properly models moving objects, reduces the bandwidth potentially required to transmit the information used to perform the interpolation, and also reduces the computation potentially required to generate these frames, so as to be well suited to multimedia mobile devices that rely on low-power processing.
Summary of the invention
Certain aspects disclosed herein provide an encoder assisted frame rate up conversion (EA-FRUC) system that utilizes various motion models, in addition to video coding and pre-processing operations at the video encoder, to exploit the FRUC processing that will occur in the decoder, in order to improve the modeling of moving objects, compression efficiency, and the quality of the reconstructed video.
In one aspect, a method of processing multimedia data is disclosed. The method includes: partitioning at least one of a first and a second video frame into a plurality of partitions; determining modeling information for at least one object in at least one of the partitions, the modeling information being associated with the first and second video frames; generating an interpolated frame based on the modeling information; and generating encoded information based on the interpolated frame, where the encoded information is used to generate a video frame temporally co-located with the interpolated frame.
In another aspect, an apparatus for processing multimedia data is disclosed. The apparatus includes: means for partitioning at least one of a first and a second video frame into a plurality of partitions; means for determining modeling information for at least one object in at least one of the plurality of partitions, the modeling information being associated with the first and second video frames; means for generating an interpolated frame based on the modeling information; and means for generating encoded information based on the interpolated frame, where the encoded information is used to generate a video frame temporally co-located with the interpolated frame.
In another aspect, an apparatus for processing multimedia data is disclosed. The apparatus includes: a partitioning module configured to partition at least one of a first and a second video frame into a plurality of partitions; a modeling module configured to determine modeling information for at least one object in at least one of the plurality of partitions, the modeling information being associated with the first and second video frames; a frame generation module configured to generate an interpolated frame based on the modeling information; an encoding module configured to generate encoded information based on the interpolated frame; and a transmission module configured to transmit the encoded information to a decoder.
In yet another aspect, a machine-readable medium comprising instructions for processing multimedia data is disclosed. The instructions, when executed, cause a machine to: partition at least one of a first and a second video frame into a plurality of partitions; determine modeling information for at least one object in at least one of the plurality of partitions, the modeling information being associated with the first and second video frames; generate an interpolated frame based on the modeling information; and generate encoded information based on the interpolated frame, where the encoded information is used to generate a video frame temporally co-located with the interpolated frame.
In another aspect, a processor for processing multimedia data is disclosed. The processor is configured to: partition at least one of a first and a second video frame into a plurality of partitions; determine modeling information for at least one object in at least one of the plurality of partitions, the modeling information being associated with the first and second video frames; generate an interpolated frame based on the modeling information; and generate encoded information based on the interpolated frame, where the encoded information is used to generate a video frame temporally co-located with the interpolated frame.
Other objects, features, and advantages will become apparent to those skilled in the art from the following detailed description. It is to be understood, however, that the detailed description and specific examples, while indicating exemplary aspects, are given by way of illustration and not limitation. Many changes and modifications may be made within the scope of the following description without departing from its spirit, and the description should be understood to include all such modifications.
Description of drawings
Figure 1A illustrates an example of a communication system for transmitting streaming video that implements, according to one aspect, an encoder assisted frame rate up conversion (EA-FRUC) system using various motion models.
Figure 1B illustrates an example of an EA-FRUC device configured, according to one aspect, to use various motion models for transmitting streaming video.
Fig. 2 is a flowchart illustrating the operation of the EA-FRUC system of Figure 1A configured to use various motion models.
Fig. 3 is a flowchart illustrating the encoding of video data for up conversion using object-based modeling information and decoder information.
Fig. 4 is a flowchart illustrating the determination of modeling information for objects in a video frame according to one aspect of the invention.
Fig. 5 is a flowchart illustrating the determination of motion vector information for objects in a video frame using an affine model.
Fig. 6 is a flowchart illustrating, according to certain aspects of the invention, the use of a decoder device configured to decode motion models within a translational motion model framework to decode an up-converted video data bitstream encoded using object-based modeling information and decoder information.
Embodiment
As described herein, in one aspect of an encoder-assisted FRUC (EA-FRUC) system, the encoder has access to the source frames and to prior knowledge of the FRUC algorithm used on the decoder. The encoder is further configured to use various motion models (in addition to the translational motion model) to accurately model moving objects in the source frames. Using the resulting interpolated frames, the encoder transmits additional information to assist the decoder in performing FRUC and to improve the decisions made during interpolation. Exploiting its knowledge of the FRUC that will be performed in the decoder, the EA-FRUC system utilizes various motion models, video coding, and pre-processing operations at the video encoder to improve compression efficiency (and thereby transmission-bandwidth utilization) and the quality of the reconstructed video (including the representation of reconstructed moving objects). Specifically, various motion model information from the encoder (for example, affine motion modeling) can supplement or replace the information normally transmitted by the encoder to the decoder, so that motion modeling information can be used for encoder-assisted FRUC.
In one aspect, the information provided by the encoder includes, for example, parameters of the spatial (e.g., refinement, mode decisions, neighborhood characteristics) and temporal (e.g., motion vector decisions) characteristics of the image to be interpolated at the decoder, as well as differential information with respect to normal predictive (B or P) frame coding of the interpolated frame generated by the FRUC process. The information provided by the encoder further includes various motion models, selected so as to accurately and effectively represent the moving objects of the original video stream.
Several motion estimation techniques beyond translational motion can also be used for video compression. Additional motion types include: rotational motion; zoom-in and zoom-out motion; deformation, in which changes in the structure and form of scene objects violate the rigid-body assumption; affine motion; global motion; and object-based motion. The affine motion model supports multiple motion types, including translational motion, rotational motion, shearing, deformation, and the object scaling used in zoom-in and zoom-out scenarios. Compared with the translational model, the affine motion model is more general because it incorporates these other motion types. To account for rotation, scaling, and shearing, the affine motion model uses six parameters per encoded block. It therefore adapts better to the actual dynamic motion of objects in a scene.
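The six-parameter affine mapping can be written down directly; parameters (a, b, c, d) carry rotation, scaling, and shear, while (e, f) carry translation, so the two-parameter translational model is the special case a = d = 1, b = c = 0:

```python
def affine_map(point, params):
    """Apply the six-parameter affine motion model to one pixel
    coordinate.  params = (a, b, c, d, e, f) maps (x, y) to
    (a*x + b*y + e, c*x + d*y + f): rotation, scaling, and shear live
    in (a, b, c, d); pure translation lives in (e, f)."""
    x, y = point
    a, b, c, d, e, f = params
    return (a * x + b * y + e, c * x + d * y + f)
```

Setting a = d = 1 and b = c = 0 recovers the translational model, which shows in code why the affine model strictly generalizes it.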
Object-based motion estimation techniques are used for video frames containing scenes with multiple objects undergoing different motion types. In such cases, a single motion model cannot effectively capture the different dynamics; instead, a number of models can be used, with an individual model customized for each object in the scene.
Certain aspects of the encoder device discussed herein assess the characteristics of the decoder device that will decode the data encoded by the encoder device, and optimize the encoding of the video data so as to improve compression efficiency, performance, and the rendering of objects at the decoder device when frames are interpolated. For example, the decoder device may improve FRUC or error concealment. In one aspect, the video frame is partitioned into a collection of regions, typically of non-uniform size and non-uniform shape, based on behavior, temporal-change dynamics, or uniquely identifiable objects. According to certain aspects, the encoder device analyzes the video data (in segments of variable duration) to locate global motion. Where global motion is located, the relevant model parameters are estimated and signaled using various motion models (e.g., affine motion models). An affine motion model can then be established that describes the translation, rotation, scaling, and deformation transforms of each object or partition. The partition information, together with the associated models, can then be used to generate a prediction signal that reduces the power of the residual signal. The partition map is transmitted to the decoder device together with the associated models (including type and parameter information). The residual signal can be compressed and transmitted separately to the decoder device to allow a higher-quality reconstruction. In certain aspects, the decoder device can then analyze the encoded data using information about the motion models encoded within a modified translational motion model framework.
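One way to estimate the six affine parameters for a partition — offered as a sketch, since the patent does not prescribe a particular estimator — is to solve for them from point correspondences between the two frames. Three exact non-collinear matches give an exact fit; a real encoder would fit many motion vectors per partition in a least-squares sense:

```python
def fit_affine(src, dst):
    """Recover the six affine parameters (a, b, c, d, e, f) from three
    non-collinear point correspondences src[i] -> dst[i], by solving
    two 3x3 linear systems (one for x', one for y') via Cramer's rule."""
    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    A = [[x, y, 1.0] for x, y in src]
    d = det3(A)  # non-zero iff the three points are non-collinear

    def solve(rhs):
        cols = []
        for j in range(3):
            m = [row[:] for row in A]
            for i in range(3):
                m[i][j] = rhs[i]
            cols.append(det3(m) / d)
        return cols

    a, b, e = solve([x for x, _ in dst])
    c, dd, f = solve([y for _, y in dst])
    return (a, b, c, dd, e, f)
```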
Certain aspects describe a process for identifying objects that significantly reduces the amount of information that must be encoded to render the objects on the decoder device. In some of these aspects, image segmentation, graph-based techniques, or scene composition information is used to identify one background object and any number of foreground objects. The background object is then classified. Once this two-step object-based scene analysis has been performed for a sub-segment of the video sequence or for the entire video sequence, the evolution of each object and its dynamic behavior can be accurately described by a suitable motion-deformation model. For example, for an object undergoing uniform translational motion, its entire trajectory can be described simply by a motion vector (normalized with respect to a nominal inter-frame duration). This information, combined with the visual data of a single snapshot of the object, can be used to render the object correctly on the decoder device until the object moves out of the scene, or until some of its motion or perceptual attributes change. A change in one of the object's motion or perceptual attributes can be used to identify the object's minimal non-uniform temporal sampling pattern. In a similar manner, the possibly quite complex motion trajectories and appearance attributes of previously identified objects in the scene can be determined.
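The single-snapshot-plus-one-vector description of a uniformly translating object reduces to simple linear extrapolation, sketched below; the starting position and the normalized motion vector are assumed inputs for illustration:

```python
def object_position(p0, mv_norm, frame_index):
    """Predict where an object undergoing uniform translational motion
    sits at a given frame, from one snapshot position p0 and a single
    motion vector normalized to the nominal inter-frame duration.
    This is the sense in which one vector plus one snapshot suffice to
    render the object until its motion or appearance changes."""
    x0, y0 = p0
    vx, vy = mv_norm
    return (x0 + vx * frame_index, y0 + vy * frame_index)
```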
In the following description, specific details are given to provide a thorough understanding of the described aspects. It will be understood by those skilled in the art, however, that the aspects may be practiced without these specific details. For example, electrical components may be shown in block diagrams so as not to obscure the aspects with unnecessary detail. In other instances, such components, other structures, and techniques may be shown in detail to further explain the aspects.
It is also noted that the aspects may be described as a process depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently, and the process can be repeated. In addition, the order of the operations may be rearranged. A process terminates when its operations are completed. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on. When a process corresponds to a function, its termination corresponds to a return of the function to the calling function or the main function.
Figure 1A illustrates an example of a communication system for transmitting streaming video that implements, according to one aspect, an encoder assisted frame rate up conversion (EA-FRUC) system using various motion models. The system 100 includes an encoder device 105 and a decoder device 110.
The encoder device 105 includes a frame generator 115, a modeler 120, a partitioner 160, a multimedia encoder 125, a memory component 130, a processor 135, and a receiver/transmitter 140. The processor 135 generally controls the overall operation of the exemplary encoder device 105.
The partitioner component 160 divides the video frame into different blocks so that motion models can be associated with sub-regions of the video frame. The analysis of motion-deformation information can be successfully used to segment the initial scene/frame, and can be used to determine the minimal temporal sampling of the frames that need to be compressed and transmitted, as opposed to the frames that can be successfully interpolated from the data of the transmitted frames. In certain aspects, the (minimal) number of sampling instances is based on the motion-deformation dynamics and on when they undergo change. Appropriate frame interpolation can therefore be performed based on an appropriate segmentation of the motion-deformation dynamics.
The modeler component 120 is configured to determine motion models and to associate those motion models with the objects found in the video frames that make up a scene.
The frame generator component 115 generates interpolated frames using data from the original video stream and information about the decoder that will decode the data transmitted by the encoder device 105. Systems and methods for generating interpolated frames are discussed in U.S. Patent Publication No. 2006/0165176, entitled "Method and apparatus for encoder assisted-frame rate up conversion (EA-FRUC) for video compression," which is incorporated herein by reference in its entirety.
The multimedia encoder 125 can include sub-components, including a transformer/quantizer component that transforms and/or quantizes video (or audio or closed-caption text) data from the spatial domain to another domain, such as the frequency domain in the case of the DCT (discrete cosine transform). The multimedia encoder can also include an entropy encoder component. The entropy encoder component can use context-adaptive variable length coding (CAVLC). Encoded data can include quantized data, transformed data, compressed data, or any combination thereof. The memory component 130 is used to store information such as raw video data to be encoded, encoded video data to be transmitted, header information, header directories, or intermediate data being operated on by the various encoder components.
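The transformer/quantizer sub-component's operation can be sketched with a plain 2-D DCT-II followed by uniform quantization — a generic textbook version for illustration, not the specific transform pipeline of this encoder:

```python
import math

def dct2(block):
    """2-D DCT-II of an NxN block: the spatial-to-frequency transform a
    transformer/quantizer sub-component applies before quantization."""
    n = len(block)
    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def quantize(coeffs, step):
    """Uniform quantization of the transform coefficients."""
    return [[round(c / step) for c in row] for row in coeffs]
```

A constant block quantizes to a single DC coefficient with all AC terms zero, which is the energy-compaction property that makes the transform useful for compression.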
In this example, the receiver/transmitter component 140 contains circuitry and/or logic for receiving data to be encoded from an external source 145. The external source 145 can be, for example, external memory, the Internet, or a live video and/or audio feed, and receiving the data can involve wired and/or wireless communication. The transmitter 140 also contains circuitry and/or logic (e.g., a transmitter) to transmit (Tx) encoded data over a network 150. The network 150 can be part of a wired system such as telephone, cable, and fiber optic, or of a wireless system. In the case of wireless communication systems, the network 150 can comprise, for example, part of a code division multiple access (CDMA or CDMA2000) communication system; alternatively, the system can be a frequency division multiple access (FDMA) system, an orthogonal frequency division multiple access (OFDMA) system, a time division multiple access (TDMA) system such as GSM/GPRS (General Packet Radio Service)/EDGE (Enhanced Data GSM Environment) or TETRA (Terrestrial Trunked Radio) mobile telephone technology for the service industry, a wideband code division multiple access (WCDMA) system, a high data rate (1xEV-DO or 1xEV-DO Gold Multicast) system, or in general any wireless communication system employing a combination of such techniques. The transmitted data can include multiple bitstreams, such as video, audio, and/or closed captions.
It should be noted that one or more elements of the encoder device 105 shown in Figure 1 may be omitted, rearranged, and/or combined. For example, the processor component 135 may be external to the encoder device 105.
The decoder device 110 contains components similar to those of the encoder device 105, including a multimedia decoder 165, a memory component 170, a receiver 175, and a processor 180. The decoder device 110 receives encoded multimedia data transmitted over the network 150 or from an external storage 185. The receiver 175 contains circuitry and/or logic for receiving (Rx) encoded data in conjunction with the network 150, as well as logic for receiving encoded data from the external storage 185. The external storage 185 can be, for example, external RAM or ROM, or a remote server.
The multimedia decoder 165 contains circuitry and/or logic for decoding the received encoded multimedia bitstream. Sub-components of the multimedia decoder 165 can include a dequantization component, an inverse transform component, and various error recovery components. The error recovery components can include lower-level error detection and correction components (such as Reed-Solomon coding and/or Turbo coding), as well as upper-level error recovery and/or concealment components used to replace and/or conceal data that cannot be corrected by the lower-layer methods.
The decoded multimedia data can be displayed with a display component 190, stored in the external storage 185, or stored in the internal memory component 170. The display component 190 can be an integrated part of the decoder device 110, containing parts such as video and/or audio display hardware and logic, including a display screen and/or speakers. The display component 190 can also be an external peripheral device. In this example, the receiver 175 also contains logic for communicating the decoded multimedia data to the external storage component 185 or to the display component 190.
It should be noted that one or more elements of decoder device 110 shown in FIG. 1 can be omitted, rearranged, and/or combined. For example, processor 180 can be external to decoder device 110.
FIG. 1B illustrates an example of an EA-FRUC device 155 configured to use various motion models, according to an aspect for transmitting streaming video. EA-FRUC device 155 comprises a module 161 for partitioning first and second video frames, a module 121 for determining modeling information, a module 116 for generating an interpolated frame, and a module 126 for generating encoded information.
In one aspect, means for partitioning at least one of first and second video frames into a plurality of partitions comprises the module 161 for partitioning the first and second video frames. In one aspect, means for determining modeling information for at least one object in at least one of the plurality of partitions comprises the module 121 for determining modeling information. In one aspect, means for generating an interpolated frame based on the modeling information comprises the module 116 for generating an interpolated frame. In one aspect, means for generating encoded information based on the interpolated frame comprises the module 126 for generating encoded information.
FIG. 2 is a flowchart illustrating the operation of the EA-FRUC system of FIG. 1A configured to use various motion models. First, at step 201, video data is encoded for upconversion by using object-based modeling information and information about decoder device 110, as will be discussed in further detail with reference to FIG. 3. Then, at step 202, the encoded information is transmitted to decoder device 110. In certain aspects, the encoded information is transmitted from transmitter component 140 of encoder device 105 to receiver 175 of decoder device 110. After the encoded information is received, the process completes at step 203, where decoder device 110 decodes the encoded information, thereby regenerating a compressed version of the original video data by using the encoded object-based modeling information. Step 203 will be discussed in further detail with reference to FIG. 6.
FIG. 3 is a flowchart illustrating the encoding of video data for upconversion by using object-based modeling information and decoder information. First, in step 301, modeling information is determined for the objects in the video frames, as discussed in further detail with reference to FIG. 4. Then, in step 302, information about the decoding system expected to decode the encoded video data is used to further facilitate upconversion of the encoded video. Finally, in step 303, an encoded video bitstream is generated, as discussed in U.S. Patent Publication No. 2006/0002465, entitled "Method and Apparatus for Using Frame Rate Up Conversion Techniques in Scalable Video Coding," which is expressly incorporated herein by reference in its entirety.
FIG. 4 is a flowchart illustrating the determination of modeling information for objects in video frames according to an aspect of the invention. In the illustrated aspect, moving objects are identified by using certain advantageous techniques disclosed herein for identifying objects undergoing arbitrary motion and deformation. In other aspects, as known in the art, objects can be identified by hybrid video compression schemes based on transform coding, which apply motion-compensated prediction uniformly across each video frame. Furthermore, in the illustrated aspect, the affine model used covers a portion of a video frame, which is commonly referred to as an object-based affine model or local GMC. In this case, encoder device 105 performs object segmentation to determine the objects in motion, and then updates the affine model by estimation using the affine model itself and the object descriptors. For example, a binary bitmap can indicate the boundary of a described object in the video frame. In aspects in which the affine model covers the entire video frame, global motion compensation (GMC) is used. For the GMC case, the six parameters of the affine motion model describe the motion of the frame and are transmitted to decoder device 110, with no other motion information embedded in the bitstream. In still other aspects, motion models other than affine motion models can be used.
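To make the six-parameter model mentioned above concrete, the sketch below (illustrative Python, not part of the patent; the parameter names are assumptions for exposition) maps pixel coordinates through an affine transform:

```python
import numpy as np

def apply_affine(points, params):
    """Map (x, y) pixel coordinates through a six-parameter affine motion
    model: x' = a*x + b*y + c, y' = d*x + e*y + f. Such a model can
    describe translation, rotation, shear, and zoom of a frame region."""
    a, b, c, d, e, f = params
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return np.stack([a * x + b * y + c, d * x + e * y + f], axis=1)

# Pure translation by (2, 3): a = e = 1, b = d = 0, c = 2, f = 3.
moved = apply_affine([[0, 0], [4, 5]], (1.0, 0.0, 2.0, 0.0, 1.0, 3.0))
```

With other parameter choices the same six numbers encode rotation, shear, or zoom, which is why a single model class suffices for the GMC case described above.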
First, in step 401, the video frame is partitioned into blocks. In certain aspects, the blocks have a fixed size and shape. In other aspects, the frame can be partitioned into blocks of non-uniform size and/or non-uniform shape, based on one or a combination of factors including significant motion/deformation behavior, the dynamics of temporal change within a region, and uniquely identifiable objects.
Then, in step 402, one background object is identified, and zero or more foreground objects are identified. In certain aspects, the identification can be performed by using image segmentation. Image segmentation includes analyzing pixel-domain attributes such as luminance and color values in conjunction with thresholding, and analyzing certain statistics of these attributes (such as the mean, variance, standard deviation, minimum-maximum, median, and others) in conjunction with region-based methods. In other aspects, the identification can be performed by using texture models, such as Markov random fields or fractal models. In other aspects, the identification can be performed by applying edge/contour detection (including the watershed transform) and shape analysis to gradient images. In other aspects, the identification can be performed by using continuity-preserving relaxation-based segmentation methods, commonly referred to as active contour models. In other aspects, the identification can be performed by using temporal information, such as motion fields. In certain aspects, image segmentation can be performed by using a combination of some or all of the above image segmentation methods in a unified framework.
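The simplest of the pixel-domain cues above, luminance thresholding, can be sketched as follows (illustrative Python; the threshold value and the 0/1 labels are assumptions for exposition, not taken from the patent):

```python
import numpy as np

def segment_by_luma(frame, threshold):
    """Label each pixel foreground (1) or background (0) by thresholding its
    luminance value. Region statistics (mean, variance, ...) of each label
    could then drive a region-based refinement, as described above."""
    return (np.asarray(frame) > threshold).astype(np.uint8)

# A 3x3 luminance patch with a bright region in the upper right.
luma = np.array([[10,  20, 200],
                 [15, 210, 220],
                 [12,  18,  25]])
mask = segment_by_luma(luma, threshold=128)
```

In practice this crude cue would be combined with the texture, edge, and motion-field cues listed above before object boundaries are declared.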
In certain other aspects, objects can be identified by using graph-based techniques, for example by using local and global, semantic, and statistical (intensity/texture) grouping cues. In still other aspects, the objects enumerated above can be identified by using scene composition information obtained from authoring tools. In certain aspects, the background object and any foreground objects can be identified by using a combination of some or all of the above identification methods in a unified framework.
Then, in step 403, the background object can be classified. In certain aspects, the background object can be classified as a still image, in which case a single transmission of the background object suffices for future frame interpolation and/or decoding/reconstruction tasks at decoder device 110. In other aspects, the background object can be classified as a still (or nearly still) image undergoing global motion, such as a pan, scroll, rotation, zoom-in, or zoom-out. In this case, encoder device 105 appropriately selects a few sampled states of the background image to transmit together with a description of the global motion model. This transmission can suffice for frame interpolation and/or decoding/reconstruction tasks at decoder device 110. In still other aspects, the background object may not fall into either of the above two classes, in which case more densely time-sampled states of the background image may be transmitted by encoder device 105 to support successful frame interpolation and/or decoding/reconstruction at decoder device 110.
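The three-way classification of step 403 can be sketched from the background's block motion vectors (illustrative Python; the tolerance values and class names are assumptions for exposition):

```python
import numpy as np

def classify_background(mvs, still_tol=0.1, coherence_tol=0.5):
    """Classify a background object from its block motion vectors:
    'still' when motion is negligible (one transmission suffices),
    'global' when the field is coherent, e.g. a pan (a few samples plus a
    global motion model suffice), otherwise 'complex' (denser temporal
    sampling is needed)."""
    mvs = np.asarray(mvs, dtype=float)
    if np.abs(mvs).max() < still_tol:
        return "still"
    if np.abs(mvs - mvs.mean(axis=0)).max() < coherence_tol:
        return "global"
    return "complex"
```

The encoder's transmission policy then follows directly from the class label, as the paragraph above describes.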
Then, in step 404, the motion vector information of the motion objects identified from the video data is processed. The motion vector information can be processed by using the systems and methods disclosed in U.S. Patent Publication No. 2006/0018382, entitled "Method and Apparatus for Motion Vector Processing," which is expressly incorporated herein by reference in its entirety. In step 405, estimated affine models are associated with the motion objects. The affine models can be estimated based at least on the degradation in performance of a blockwise planar approximation of the motion vector field. As discussed in further detail with reference to FIG. 5, in step 406 each affine model is further specified by using motion vector erosion information associated with each identified motion object, and in step 407 it is further specified by using motion-based object segmentation. These further specifications are used to update each affine model separately in step 408, and the process finally completes when object descriptors are generated for the affine models in step 409.
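One common way to estimate an affine model from a blockwise motion vector field, consistent with step 405, is a least-squares fit. The sketch below is illustrative Python with hypothetical names; the patent does not prescribe this particular estimator:

```python
import numpy as np

def fit_affine(centers, mvs):
    """Least-squares fit of the six affine parameters (a, b, c, d, e, f),
    with x' = a*x + b*y + c and y' = d*x + e*y + f, from block-centre
    coordinates `centers` (N x 2) and their motion vectors `mvs` (N x 2)."""
    centers = np.asarray(centers, dtype=float)
    dst = centers + np.asarray(mvs, dtype=float)
    # Design matrix rows are [x, y, 1]; solve separately for x' and y'.
    A = np.column_stack([centers, np.ones(len(centers))])
    (a, b, c), _, _, _ = np.linalg.lstsq(A, dst[:, 0], rcond=None)
    (d, e, f), _, _, _ = np.linalg.lstsq(A, dst[:, 1], rcond=None)
    return a, b, c, d, e, f

# A purely translational field should recover a = e = 1, b = d = 0, c, f = shift.
params = fit_affine([[0, 0], [16, 0], [0, 16], [16, 16]], [[2, 3]] * 4)
```

The residual of such a fit is one natural measure of how well the planar approximation holds over a candidate object region.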
FIG. 5 is a flowchart illustrating the determination of motion vector erosion information for an object in a video frame by using an affine model. First, in step 501, encoder device 105 determines the affine model to be associated with a motion object. Then, in step 502, encoder device 105 moves to the first macroblock of the object map of the video frame. For each macroblock of the object map (step 503), encoder device 105 decides at decision state 504 whether the macroblock matches the affine model determined in step 501. If the macroblock does match the affine model, then in step 505 the affine-model-based object map is updated by using the matching macroblock, and in step 506 encoder device 105 proceeds to the next macroblock by returning to step 503. If, however, the macroblock does not match the affine model, encoder device 105 proceeds directly to the next macroblock in step 506 by returning to step 503. The process then completes.
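The per-macroblock loop of FIG. 5 can be sketched as follows (illustrative Python; the matching criterion, a tolerance on the difference between the measured motion vector and the vector the affine model predicts at the block centre, is an assumption for exposition):

```python
def erode_object_map(blocks, params, tol=0.5):
    """For each macroblock, given as ((x, y) centre, (mvx, mvy) measured
    motion vector), keep it in the affine model's object map only when the
    model's predicted displacement at the centre agrees with the measured
    motion vector; non-matching blocks are skipped, as in steps 504-506."""
    a, b, c, d, e, f = params
    object_map = []
    for (x, y), (mvx, mvy) in blocks:
        pred_mvx = a * x + b * y + c - x
        pred_mvy = d * x + e * y + f - y
        if abs(pred_mvx - mvx) <= tol and abs(pred_mvy - mvy) <= tol:
            object_map.append((x, y))
    return object_map

# Translation-by-(2, 3) model: blocks moving (2, 3) match, others do not.
kept = erode_object_map([((0, 0), (2, 3)), ((16, 16), (9, 0))],
                        (1, 0, 2, 0, 1, 3))
```

The surviving blocks form the updated object map from which the model can be re-estimated.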
Although block-based motion compensation using a translational model is widely deployed in decoder devices (in the software or hardware of the device), in order to implement EA-FRUC using different motion models in a decoder device, the motion information from encoder device 105 is described within a translational block-based motion vector framework. In certain aspects, the process of describing the different motion models within the translational block-based motion framework of decoder device 110 can be performed recursively, so as to build motion vectors for larger block sizes from the block motion vectors of sub-block sizes.
By using the information about the motion models encoded in the video bitstream, decoder device 110 generates motion vectors for a selected motion object by using a portion of the pixels of the object of the original video that is to be displayed. In certain aspects, the selected pixels can be uniformly distributed within the block. In other aspects, the pixels can be selected randomly from within the block.
In certain aspects, the multiple motion vectors of a block are then merged to produce a single motion vector representative of the block, and this motion vector can further undergo post-processing (for example, vector smoothing), as described above. In other aspects, the motion vectors of the selected pixels or objects can be used as seed motion vectors for a motion estimation module, so as to produce a motion vector representative of the block of interest.
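The merge of several sub-block motion vectors into one representative block vector can be sketched with a component-wise median (one plausible smoothing choice; the patent leaves the merge operator open):

```python
import numpy as np

def merge_block_mvs(sub_mvs):
    """Collapse the motion vectors of a block's sub-blocks into a single
    representative vector. A component-wise median is robust to one or two
    outlier sub-block vectors, a simple form of the vector smoothing
    mentioned above."""
    return tuple(np.median(np.asarray(sub_mvs, dtype=float), axis=0))

# Three coherent sub-block vectors and one outlier.
mv = merge_block_mvs([(2, 3), (2, 3), (2, 3), (40, -40)])
```

Applied recursively, the same merge builds motion vectors for larger block sizes from sub-block vectors, matching the recursive description above.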
FIG. 6 is a flowchart illustrating the decoding, for upconversion, of an encoded video data bitstream that uses object-based modeling information and decoder information, by a decoder device configured to decode motion models within a translational motion model framework, according to certain aspects of the invention.
In step 601, decoder device 110 receives decoded information for a video bitstream comprising two reference frames. Then, at decision state 602, decoder device 110 decides whether the bitstream includes an encoder-enhanced interpolated frame. If an encoder-enhanced interpolated frame is included, then in step 603 the decoder device uses not only the reference frames but also the interpolated frame (which includes the encoder-enhanced information related to the various motion models) to generate a video frame temporally coincident with the interpolated frame. In other words, the decoder device uses the encoder-enhanced interpolated frame and its associated reference frames to generate a video frame that replaces the interpolated frame. If, however, decoder device 110 decides in step 602 that no encoder-enhanced interpolated frame information is embedded in the bitstream, then in step 604 decoder device 110 uses the reference frames to generate a bidirectional frame (B frame).
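The decision of FIG. 6 reduces to a single branch; the sketch below (illustrative Python, with hypothetical callables standing in for the two interpolation paths) makes it explicit:

```python
def reconstruct_intermediate(has_ea_frame, ea_interpolate, b_interpolate):
    """Use the encoder-assisted interpolated frame when the bitstream
    carries one (step 603); otherwise fall back to ordinary bidirectional
    (B-frame) interpolation from the two reference frames (step 604)."""
    if has_ea_frame:
        return ea_interpolate()
    return b_interpolate()

# No encoder assistance embedded: the decoder synthesizes a plain B frame.
frame = reconstruct_intermediate(False,
                                 lambda: "ea-frame",
                                 lambda: "b-frame")
```

The point of EA-FRUC is that the first branch, when available, yields a better intermediate frame at little extra decoder cost.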
Those skilled in the art will understand that information and signals can be represented by using any of a variety of different processes and techniques. For example, the data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description can be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, and algorithm steps described in connection with the examples disclosed herein can be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the overall system. Skilled artisans can implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed methods.
The various illustrative logical blocks, modules, and circuits described in connection with the examples disclosed herein can be implemented or performed with a general-purpose processor, a DSP, an ASIC, a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor can be a microprocessor, but in the alternative, the processor can be any conventional processor, controller, microcontroller, or state machine. A processor can also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the examples disclosed herein can be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium can be integral to the processor. The processor and the storage medium can reside in an ASIC. The ASIC can reside in a wireless modem. In the alternative, the processor and the storage medium can reside as discrete components in a wireless modem.
The previous description of the disclosed examples is provided to enable any person skilled in the art to make or use the disclosed methods and apparatus. Various modifications to these examples will be readily apparent to those skilled in the art, and the principles defined herein can be applied to other examples without departing from the spirit or scope of the disclosed methods and apparatus.

Claims (20)

1. A method of processing multimedia data, comprising:
partitioning at least one of first and second video frames into a plurality of partitions;
determining modeling information for at least one object in at least one of the partitions, the modeling information being associated with the first video frame and the second video frame;
generating an interpolated frame based on the modeling information; and
generating encoded information based on the interpolated frame, wherein the encoded information is used to generate a video frame temporally co-located with the interpolated frame.
2. The method of claim 1, wherein determining modeling information for at least one object in one of the partitions comprises:
determining a block-based motion field estimate;
identifying at least one object based on the block-based field estimate; and
determining an affine model for the at least one object.
3. The method of claim 1, further comprising identifying a boundary of the at least one object using color characteristics.
4. The method of claim 1, further comprising identifying a boundary of the at least one object using texture characteristics.
5. The method of claim 1, further comprising identifying a boundary of the at least one object using pixel-domain attributes.
6. The method of claim 1, further comprising:
determining motion vector erosion information associated with one of the partitions, wherein the transmitted encoded information comprises the motion vector erosion information.
7. The method of claim 1, wherein the modeling information comprises an affine model.
8. The method of claim 7, wherein the affine model comprises at least one of a translation, a rotation, a shear, and a zoom motion.
9. The method of claim 1, wherein the modeling information comprises a global motion model.
10. An apparatus for processing multimedia data, comprising:
means for partitioning at least one of first and second video frames into a plurality of partitions;
means for determining modeling information for at least one object in at least one of the plurality of partitions, the modeling information being associated with the first video frame and the second video frame;
means for generating an interpolated frame based on the modeling information; and
means for generating encoded information based on the interpolated frame, wherein the encoded information is used to generate a video frame temporally co-located with the interpolated frame.
11. The apparatus of claim 10, wherein the means for determining comprises:
means for determining a block-based motion field estimate;
means for identifying at least one object based on the block-based field estimate; and
means for determining an affine model for the at least one object.
12. The apparatus of claim 10, further comprising means for identifying a boundary of the at least one object using color characteristics.
13. The apparatus of claim 10, further comprising means for identifying a boundary of the at least one object using texture characteristics.
14. The apparatus of claim 10, further comprising means for identifying a boundary of the at least one object using pixel-domain attributes.
15. The apparatus of claim 10, further comprising:
means for determining motion vector erosion information associated with one of the partitions, wherein the transmitted encoded information comprises the motion vector erosion information.
16. The apparatus of claim 10, wherein the modeling information comprises an affine model.
17. The apparatus of claim 16, wherein the affine model comprises at least one of a translation, a rotation, a shear, and a zoom motion.
18. An apparatus for processing multimedia data, comprising:
a partitioning module configured to partition at least one of first and second video frames into a plurality of partitions;
a modeling module configured to determine modeling information for at least one object in at least one of the plurality of partitions, the modeling information being associated with the first video frame and the second video frame;
a frame generation module configured to generate an interpolated frame based on the modeling information;
an encoding module configured to generate encoded information based on the interpolated frame; and
a transmission module configured to transmit the encoded information to a decoder.
19. A machine-readable medium comprising instructions for processing multimedia data, wherein the instructions, when executed, cause a machine to:
partition at least one of first and second video frames into a plurality of partitions;
determine modeling information for at least one object in at least one of the plurality of partitions, the modeling information being associated with the first video frame and the second video frame;
generate an interpolated frame based on the modeling information; and
generate encoded information based on the interpolated frame, wherein the encoded information is used to generate a video frame temporally co-located with the interpolated frame.
20. A processor for processing multimedia data, the processor being configured to:
partition at least one of first and second video frames into a plurality of partitions;
determine modeling information for at least one object in at least one of the plurality of partitions, the modeling information being associated with the first video frame and the second video frame;
generate an interpolated frame based on the modeling information; and
generate encoded information based on the interpolated frame, wherein the encoded information is used to generate a video frame temporally co-located with the interpolated frame.
CN 200680043307 2005-09-27 2006-09-27 Encoder assisted frame rate up conversion using various motion models Pending CN101313582A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US72137505P 2005-09-27 2005-09-27
US60/721,375 2005-09-27
US60/721,376 2005-09-27

Publications (1)

Publication Number Publication Date
CN101313582A true CN101313582A (en) 2008-11-26

Family

ID=40101114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200680043307 Pending CN101313582A (en) 2005-09-27 2006-09-27 Encoder assisted frame rate up conversion using various motion models

Country Status (1)

Country Link
CN (1) CN101313582A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102685364A (en) * 2011-02-11 2012-09-19 微软公司 Updating a low frame rate image by using a high frame rate image stream
CN103765888A (en) * 2011-09-06 2014-04-30 英特尔公司 Analytics assisted encoding
US10523953B2 (en) 2012-10-01 2019-12-31 Microsoft Technology Licensing, Llc Frame packing and unpacking higher-resolution chroma sampling formats
CN104982036A (en) * 2012-10-22 2015-10-14 微软技术许可有限责任公司 Band separation filtering / inverse filtering for frame packing / unpacking higher-resolution chroma sampling formats
CN104982036B (en) * 2012-10-22 2018-05-25 微软技术许可有限责任公司 The method and computing device unpacked for frame packaging and frame
CN104869399A (en) * 2014-02-24 2015-08-26 联想(北京)有限公司 Information processing method and electronic equipment.
CN107925752A (en) * 2015-07-31 2018-04-17 港大科桥有限公司 For anamorphose and View synthesis based on the variable support of more coatings and the expression of order core
CN107925752B (en) * 2015-07-31 2021-11-12 港大科桥有限公司 Multi-overlay layer variable support and order kernel based representation for image warping and view synthesis
CN109792526A (en) * 2016-09-30 2019-05-21 高通股份有限公司 The improvement of decoding mode is converted in frame per second
CN109792526B (en) * 2016-09-30 2022-05-24 高通股份有限公司 Improvements in frame rate up-conversion decoding schemes
US10368080B2 (en) 2016-10-21 2019-07-30 Microsoft Technology Licensing, Llc Selective upsampling or refresh of chroma sample values
CN111641829A (en) * 2020-05-16 2020-09-08 Oppo广东移动通信有限公司 Video processing method, device, system, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
KR100957322B1 (en) Encoder assisted frame rate up conversion using various motion models
US8239766B2 (en) Multimedia coding techniques for transitional effects
Chen et al. An overview of coding tools in AV1: The first video codec from the alliance for open media
CN101313582A (en) Encoder assisted frame rate up conversion using various motion models
AU2012211249B2 (en) Encoding of video stream based on scene type
CN101189882B (en) Method and apparatus for encoder assisted-frame rate up conversion (EA-FRUC) for video compression
US8054890B2 (en) Method for encoding and decoding video signal
US6600786B1 (en) Method and apparatus for efficient video processing
Metkar et al. Motion estimation techniques for digital video coding
US20060062299A1 (en) Method and device for encoding/decoding video signals using temporal and spatial correlations between macroblocks
CN101142824A (en) Apparatus and method for encoding multi-view video using camera parameters, apparatus and method for generating multi-view video using camera parameters, and recoding medium storing program for implem
US10165274B2 (en) Encoding of video stream based on scene type
US20080095239A1 (en) Method for video frame rate conversion
CN1914925A (en) Image compression for transmission over mobile networks
JP2006524460A (en) Content analysis of encoded video data
US10432946B2 (en) De-juddering techniques for coded video
JPH09331536A (en) Error correction decoder and error correction decoding method
CN114466192A (en) Image/video super-resolution
KR20020047031A (en) Method and apparatus for efficient video processing
US20070140335A1 (en) Method of encoding video signals
Schäfer et al. Improving image compression—Is it worth the effort?
Lin et al. Improved HEVC video compression algorithm using low-complexity frame rate up conversion
WO2023059689A1 (en) Systems and methods for predictive coding
US20060072675A1 (en) Method for encoding and decoding video signals
Bou-Shehri Video compression via object-based adaptive quantization

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20081126