Embodiment
The invention provides methods and apparatus for determining motion vectors efficiently and accurately, such that little or no discrepancy exists between the expected image motion due to eye tracking and the displayed image motion in a digital video. This is accomplished by using a recursive hierarchical approach to determining motion vectors that includes a temporal vector partitioning scheme.
Generally, for motion compensation approaches to work well, including the recursive hierarchical approach described here, two basic assumptions are made about the nature of object motion: 1) moving objects have inertia, and 2) moving objects are large. The inertia assumption implies that a motion vector changes only gradually with respect to a temporal vector sampling interval (that is, the frame rate in the digital video). The large-object assumption implies that a motion vector changes only gradually with respect to a spatial vector sampling interval; in other words, the vector field is smooth and has only a few boundary motion discontinuities.
The goal of the recursive hierarchical method is to find a motion vector by applying a source correlation window to a first image frame and a target correlation window to the subsequent image frame, and placing the target correlation window so that the best match between the target correlation window and the source correlation window is obtained, that is, so that the contents of the two windows are as similar as possible. At the same time, the number of calculations needed to perform the matching between the source correlation window and the target correlation window must be as low as possible, while still searching the entire vector space limit. To accomplish these goals, the recursive hierarchical method uses multiple resolution levels of the image frames. A best motion vector is first determined for the lowest resolution level by projecting the previous best motion vector at the highest resolution level down to the lowest resolution level and testing it together with one or more update vectors. This best motion vector is then propagated up to a higher resolution level, where some adjustments are made and a new best motion vector is determined. The new best motion vector is in turn propagated up to yet another higher resolution level, where further adjustments are made and another new best motion vector is determined. This process is repeated until the highest, original, resolution level has been reached and a best motion vector has been determined for that level.
Fig. 1 shows one implementation of a recursive hierarchical process (100). It is assumed that multiple resolution levels of the image frames have already been generated. As shown in Fig. 1, the recursive hierarchical process (100) for determining a motion vector starts by projecting a motion vector from a previous image frame down to the lowest resolution level (step 102). A set of update vectors is then generated and tested to find the best motion vector at this lowest resolution level (step 104). In one embodiment, this test is performed by comparing pixels at corresponding positions in a source correlation window centered on the origin of the motion vector and a target correlation window centered on the end point of each respective update vector. The comparison can, for example, be performed by subtracting the luma value of each pixel in the source window from that of the corresponding pixel in the respective target window. In this case, the best match is defined by the minimum sum of absolute differences (SAD) over the pairs of source and target correlation windows, and the best motion vector is the vector associated with that pair.
After the minimum SAD has been found, the best vector is selected (step 106). The process (100) then checks whether there are any higher resolution levels (step 108). If there is a higher resolution level, the process propagates the best vector up to the next higher resolution level (step 110) and repeats steps 104 through 108. If there is no higher resolution level, the process proceeds to step 112, where the best vector is selected as the motion vector and is used for motion compensation, which completes the processing of the current frame.
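The control flow of steps 102 through 112 can be sketched as follows. This is only an illustrative sketch, not the implementation described in this specification: the function name `recursive_hierarchical_search`, the `cost` callback (which would typically be a SAD computation), the factor of two between adjacent levels, and the four +/-1 pixel update offsets are all assumptions made for the example.

```python
# Hypothetical sketch of the coarse-to-fine loop of Fig. 1 (steps 102-112).
# Level 0 is the lowest resolution; each level doubles the resolution (assumed).

UPDATE_OFFSETS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # assumed +/-1 pixel updates


def recursive_hierarchical_search(old_vector, origin, num_levels, cost):
    """Return the best full-resolution vector for the patch at `origin`.

    `cost(level, position, vector)` is a caller-supplied matching cost
    (for example a SAD between correlation windows); lower is better.
    """
    scale = 2 ** (num_levels - 1)
    # Step 102: project the previous frame's vector down to the lowest level.
    best = (round(old_vector[0] / scale), round(old_vector[1] / scale))
    for level in range(num_levels):
        scale = 2 ** (num_levels - 1 - level)
        position = (origin[0] // scale, origin[1] // scale)
        # Step 104: generate update vectors around the current best vector.
        candidates = [best] + [(best[0] + dx, best[1] + dy)
                               for dx, dy in UPDATE_OFFSETS]
        # Step 106: keep the candidate with the minimum cost.
        best = min(candidates, key=lambda v: cost(level, position, v))
        # Steps 108/110: if a higher level exists, propagate the vector up.
        if level < num_levels - 1:
            best = (best[0] * 2, best[1] * 2)
    # Step 112: the surviving vector is used for motion compensation.
    return best


# Toy cost for demonstration only: distance to a known true motion of (5, -2).
def toy_cost(level, position, vector):
    scale = 2 ** (2 - level)
    return abs(vector[0] * scale - 5) + abs(vector[1] * scale + 2)


result = recursive_hierarchical_search((0, 0), (8, 8), 3, toy_cost)
```

Passing the cost in as a callback keeps the loop independent of how the correlation windows are actually compared.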
The advantage of this approach is that an update of one pixel at a lower level is equivalent to an update of two or more pixels at the next higher level, depending on the difference in resolution between the two levels. If, for example, there are three resolution levels, namely 1:1, 1:2 and 1:4, and updates of +/-1 pixel are made at each level, the convergence delay is potentially reduced by a factor of four. Expressed differently, the resolution hierarchy is effectively used to accelerate the convergence of the temporal recursion. This results in significant improvements, especially for frames containing small objects moving at high velocities.
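The arithmetic behind this statement can be illustrated with a few lines of code. The helper name `full_resolution_step` is hypothetical and serves only to show how an update at a reduced level maps to the original resolution.

```python
def full_resolution_step(update_pixels, scale):
    """Displacement at 1:1 that corresponds to `update_pixels` at a 1:`scale` level."""
    return update_pixels * scale


# One +/-1 pixel update at each of the three example levels.
for scale in (4, 2, 1):
    step = full_resolution_step(1, scale)
    print(f"1:{scale} level: a +/-1 pixel update spans +/-{step} pixel(s) at 1:1")
```

A single update at the 1:4 level therefore covers the same displacement as four successive one-pixel updates at the original resolution.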
The invention will now be described in detail with reference to Figs. 1-4, using as an example a recursive hierarchical scheme with three resolution levels, 1:1, 1:2 and 1:4, and an image patch grid of 4×4 pixels. It should be noted that the vectors shown in Figs. 2-4 are merely illustrative of this example, and that the number of resolution levels and the number and/or types of vectors at each resolution level can be varied depending on factors such as computational cost, quality and processing speed.
Fig. 4 shows an image patch grid (400) that is divided into image patches (405) of 4×4 pixels, where each pixel is represented as a circle (410). The dark pixels (415) indicate the locations at which a motion vector is computed for each 4×4 image patch. As can be seen in Fig. 4, one motion vector is computed for each 4×4 image patch, and the location of the motion vector origin within each 4×4 image patch is the same. Fig. 3 shows the same pixel grid (400) at half the resolution of the original pixel grid of Fig. 4. Fig. 2 shows the same pixel grid (400) at the lowest resolution, which in the present example is half the resolution of Fig. 3, or one quarter of the resolution of Fig. 4.
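The specification assumes that the resolution levels already exist and does not state how they are produced. Purely as an illustration, the sketch below builds the three levels of Figs. 2-4 by averaging 2×2 pixel blocks; the averaging filter and the names `downsample_by_two` and `build_pyramid` are assumptions, and frame dimensions are assumed to be divisible by four.

```python
def downsample_by_two(frame):
    """Halve the resolution by averaging each 2x2 block of luma values (assumed filter)."""
    height, width = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
              + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) // 4
             for x in range(width)]
            for y in range(height)]


def build_pyramid(frame, levels=3):
    """Return the frame at 1:4, 1:2 and 1:1 resolution, lowest resolution first."""
    pyramid = [frame]
    for _ in range(levels - 1):
        pyramid.append(downsample_by_two(pyramid[-1]))
    return pyramid[::-1]


# An 8x8 frame yields 2x2 (1:4), 4x4 (1:2) and 8x8 (1:1) grids.
example_frame = [[10 * y + x for x in range(8)] for y in range(8)]
example_pyramid = build_pyramid(example_frame)
```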
As shown in Figs. 1 and 2, the recursive hierarchical process for determining a motion vector starts by projecting a motion vector (205) from a previous image down to the lowest resolution level (step 102), which in the present example is 1:4 of the original resolution, as illustrated in Fig. 2. In one embodiment, the old motion vector (205) is filtered before it is projected, primarily to handle cases in which the neighborhood contains an object-background boundary that causes a discontinuity in the vectors. This process is also referred to as temporal vector partitioning and is described in further detail below. The filtered output is a new base vector at the 1:1 level, which is subsequently projected down to the 1:4 level. In the first frame of a sequence, that is, when there is no previous image, the process (100) starts with a zero vector as the old motion vector. In one embodiment, the zero vector is also used when there is a scene break in the video, that is, when there is no continuity between two frames.
Fig. 5A shows one embodiment of a temporal vector partitioning process (500). As described above, the purpose of the temporal vector partitioning process (500) is to provide a better estimate of the old motion vector (205) that is projected to the lowest resolution level, as shown in Fig. 2. Therefore, rather than simply projecting a single motion vector, a neighborhood (550) containing multiple vectors is examined, as shown in Fig. 5B. Furthermore, it is assumed that the neighborhood (550) of vectors contains an object boundary. The temporal vector partitioning process (500) attempts to separate the motion vectors associated with the object from the motion vectors associated with the background before a best motion vector is selected, which further improves the selection process.
As shown in Fig. 5A, the process (500) starts by obtaining a set of neighborhood vectors (550) from a previous image frame. The set (550) shown in Fig. 5B includes nine vectors, each pointing to an image patch (560). In the present example, nine adjacent image patches (560) are used to define the neighborhood vectors, but only five of them (V1 through V5, arranged in an X-shaped pattern) are used for the computations in the implementation described here. The reader should realize, however, that any number of vectors can be selected and that the neighborhood can have many different shapes. The set of five neighborhood vectors and the square arrangement of image patches are used here for exemplary purposes only.
The process then partitions the set of neighborhood vectors (550) into two clusters (step 504). In one implementation, the partitioning is performed by determining which two vectors are farthest apart from each other and using these two vectors as the seeds for the two clusters. After the two seed vectors have been determined, each remaining vector is assigned to one of the two clusters according to which cluster seed it is closest to.
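The partitioning of step 504 can be sketched as follows. The sketch assumes vectors are `(x, y)` tuples and uses squared Euclidean distance as the distance measure, which the specification does not prescribe; the names `squared_distance` and `partition_vectors` are hypothetical.

```python
from itertools import combinations


def squared_distance(a, b):
    """Squared Euclidean distance between two (x, y) vectors (assumed metric)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def partition_vectors(vectors):
    """Split neighborhood vectors into two clusters seeded by the farthest pair."""
    if len(vectors) < 2:
        return list(vectors), []
    # The two vectors that are farthest apart become the cluster seeds.
    seed_a, seed_b = max(combinations(vectors, 2),
                         key=lambda pair: squared_distance(*pair))
    cluster_a, cluster_b = [], []
    for vector in vectors:
        # Each vector joins the cluster whose seed it is closest to.
        if squared_distance(vector, seed_a) <= squared_distance(vector, seed_b):
            cluster_a.append(vector)
        else:
            cluster_b.append(vector)
    return cluster_a, cluster_b


# Five example neighborhood vectors: three share one motion, two share another.
neighborhood = [(4, 0), (4, 1), (5, 0), (0, 0), (0, 1)]
object_like, background_like = partition_vectors(neighborhood)
```

If all vectors are identical, the second cluster is empty, so a caller would need to handle that case before choosing a representative for it.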
Next, the process determines a representative vector for each cluster (step 506). The purpose of determining representative vectors is to find the existing vector that best represents each cluster. In one implementation, the representative vector is determined to be the vector in each cluster that has the minimum distance to all the other vectors in that cluster. The minimum distance can be calculated, for example, by determining the distance between each vector in a cluster and every other vector in that cluster and adding these distances together. The vector with the minimum total distance is selected as the representative vector.
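Step 506 can be sketched in a few lines. The sketch assumes Euclidean distance between `(x, y)` vectors and a non-empty cluster; the function name `representative_vector` is hypothetical.

```python
import math


def representative_vector(cluster):
    """Return the cluster member with the smallest total distance to the others."""
    return min(cluster,
               key=lambda v: sum(math.dist(v, other) for other in cluster))


# (4, 0) has total distance 2.0; the other two members have about 2.41 each.
example_representative = representative_vector([(4, 0), (4, 1), (5, 0)])
```

Because the representative is always an existing member of the cluster, no averaged vector that mixes object and background motion is ever produced.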
When the two representative vectors have been found, the process determines which representative vector provides the best match when the image patch is moved by the distance and in the direction defined by the respective representative vector (step 508). This can be done, for example, by using two correlation windows, one centered on the origin of the vector and the other centered on the end point of the vector, and determining the minimum sum of absolute differences (SAD) for the pixels in the two correlation windows. How this is done is described in detail below, but for the purposes of Fig. 5A the important result is that a best match is found for one of the two representative vectors. The process then selects the representative vector with the best match as the candidate vector (step 510). The selected vector is subsequently projected down to the lowest resolution level, and the process ends. The best matching vector represents the object vector, and the other vector represents the background vector.
The partitioning described above helps to resolve object/background vector discontinuities around smaller boundary details, for example the hood of a moving car. The partitioning also works equally well on neighborhoods that do not contain any object boundary, because most of the vectors will then fall into one cluster, while the other cluster will contain only one or a few "outlier" vectors.
Returning to Figs. 1 and 2, after the filtered vector has been projected down to the lowest resolution level, a set of update vectors (210a-210f) is generated and tested to find the minimum SAD among vectors that differ by +/-1 pixel or +/-2 pixels from the old filtered projected motion vector. Fig. 2 shows six update vectors (210a-210f): two for +/-1 pixel in the horizontal direction, two for +/-2 pixels in the horizontal direction, and two for +/-1 pixel in the vertical direction, since horizontal movement is generally greater than vertical movement. However, those skilled in the art will appreciate that any number of update vectors can be generated and tested in any horizontal and/or vertical direction relative to the projected vector (205). In one implementation, a predicted camera vector is also projected down to the 1:4 level. The camera vector is described in further detail below.
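The six update vectors of Fig. 2 can be generated as in the following sketch. The offsets mirror the example in the text; the names `UPDATE_OFFSETS_1_4` and `generate_update_vectors`, and the order of the offsets, are assumptions for illustration.

```python
# Offsets from the example of Fig. 2: +/-1 and +/-2 pixels horizontally,
# +/-1 pixel vertically. The ordering is arbitrary.
UPDATE_OFFSETS_1_4 = ((1, 0), (-1, 0), (2, 0), (-2, 0), (0, 1), (0, -1))


def generate_update_vectors(projected_vector, offsets=UPDATE_OFFSETS_1_4):
    """Return the update vectors obtained by adding each offset to the projected vector."""
    px, py = projected_vector
    return [(px + dx, py + dy) for dx, dy in offsets]


example_updates = generate_update_vectors((3, -1))
```

Passing a different `offsets` tuple gives the variation in number and direction of update vectors that the text allows.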
In one implementation, the SAD is computed by letting the candidate vectors for an image patch, which all originate from the same image patch location in the source frame, point to different pixel locations in the target frame. For each candidate vector, a rectangular window is centered in the target frame on the pixel pointed to by that candidate vector. A corresponding rectangular window is centered in the source frame on the pixel from which the candidate vectors originate. A pairwise absolute difference is then calculated for the corresponding luma pixels in the two windows, that is, for the pixels that have the same relative position within the two windows. The sum of all the absolute differences is the SAD value. The SAD decreases as the window matching improves and is ideally zero when the pixels are identical. In practice, of course, because of noise and other factors, the best vector will have a non-zero SAD, but it will have the minimum SAD of the vectors in the set of candidate vectors.
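The SAD computation described above can be sketched as follows. Frames are assumed to be lists of rows of luma values; the square window of half-width `radius`, the clamping of coordinates at the frame border, and the names `sad` and `best_vector` are assumptions made for this example.

```python
def sad(source, target, origin, vector, radius=1):
    """Sum of absolute luma differences between two correlation windows.

    The source window is centered on `origin` in `source`; the target window
    is centered on `origin + vector` in `target`. Coordinates outside the
    frame are clamped to the nearest border pixel (assumed border policy).
    """
    height, width = len(source), len(source[0])
    ox, oy = origin
    vx, vy = vector

    def clamp(value, upper):
        return max(0, min(value, upper - 1))

    total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            s = source[clamp(oy + dy, height)][clamp(ox + dx, width)]
            t = target[clamp(oy + vy + dy, height)][clamp(ox + vx + dx, width)]
            total += abs(s - t)
    return total


def best_vector(source, target, origin, candidates, radius=1):
    """Return the candidate vector with the minimum SAD."""
    return min(candidates, key=lambda v: sad(source, target, origin, v, radius))


# Synthetic example: the target frame is the source shifted right by two pixels.
source_frame = [[x * x + 3 * y for x in range(12)] for y in range(12)]
target_frame = [[row[x - 2] if x >= 2 else 0 for x in range(12)]
                for row in source_frame]
chosen = best_vector(source_frame, target_frame, (5, 5),
                     [(0, 0), (1, 0), (2, 0), (0, 1)])
```

In this noise-free example the true displacement gives a SAD of exactly zero; with real video the minimum would be non-zero, as the text notes.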
After the minimum SAD has been found, the best vector, that is, the vector with the minimum SAD (210f), is selected and stored in memory (step 106). The process then checks whether there are any higher resolution levels (step 108). As described above, in the present example there are two higher resolution levels, so the process propagates the best vector (210f) and projects it up to the 1:2 resolution level shown in Fig. 3 (step 110). After the best vector has been projected up to the 1:2 level, a set of update vectors (305a-305d) is generated around this best vector (210f) (step 104). At this level, a second set of update vectors (310a-310d) is also generated around the old 1:1 filtered vector (205) projected down to the 1:2 resolution level. A new best vector (305a) is found by computing the minimum SAD among all the update vectors, just as at the 1:4 resolution level. This best update vector is then selected and stored in memory (step 106).
The process then checks again whether there are any higher resolution levels (step 108). At this point, one higher resolution level remains in the resolution pyramid, so the process returns once more to step 104, where the best vector (305a) from the 1:2 resolution level in Fig. 3 is filtered and projected up to the highest, 1:1, resolution level shown in Fig. 4. Again, a set of update vectors (405a-405d) is generated around the projected and filtered best vector (305a) (step 104). At this level, a second set of update vectors (410a-410d) is also generated around the old 1:1 filtered vector. A third set of update vectors (420a-420d) is generated around a camera vector (415) as well.
The camera vector describes the global movement of the contents of the frame, as opposed to the local vectors that are computed completely independently at each image patch location, and can therefore be used to help find a better true motion vector. In several commonly occurring scenarios, the motion vector that results from camera movement at each location in a frame can easily be predicted with a simple model. For example, when a camera lens pans across a distant landscape, all the motion vectors will be identical and equal to the velocity of the camera. Another scenario is when a camera lens zooms in on an object on a flat surface, such as a picture on a wall. All the motion vectors then have a radial direction and increase from zero at the image center to a maximum at the image periphery.
In one implementation, the process attempts to fit a mathematical model to the computed motion vectors using a least squares method. A good fit between the camera motion vectors and the mathematical model indicates that one of the scenarios discussed above is likely to be present, and the vector predicted by the camera model can then be used as an additional candidate vector in the next recursive hierarchical vector estimation step. The advantage of taking the camera vector into account is that the recursive portion of the recursive hierarchical search is a local search method, which may converge to a false local minimum instead of the true minimum. The camera predicted vector candidate can potentially help to avoid the detection of false local minima and direct the process toward the true minimum.
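The specification does not give the mathematical model itself. As one possible illustration, the sketch below fits a simple pan-plus-zoom model, v(p) = pan + zoom · (p − center), to a set of motion vectors by closed-form least squares. The model form, the closed-form solution, and the names `fit_camera_model` and `predict_camera_vector` are assumptions for this example.

```python
def fit_camera_model(positions, vectors, center):
    """Least-squares fit of v(p) = pan + zoom * (p - center) (assumed model).

    Returns (pan, zoom, residual), where residual is the sum of squared errors;
    a small residual suggests a pure pan and/or zoom is present.
    """
    n = len(positions)
    offsets = [(x - center[0], y - center[1]) for x, y in positions]
    q_mean = (sum(q[0] for q in offsets) / n, sum(q[1] for q in offsets) / n)
    v_mean = (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)
    numerator = sum((q[0] - q_mean[0]) * (v[0] - v_mean[0])
                    + (q[1] - q_mean[1]) * (v[1] - v_mean[1])
                    for q, v in zip(offsets, vectors))
    denominator = sum((q[0] - q_mean[0]) ** 2 + (q[1] - q_mean[1]) ** 2
                      for q in offsets)
    zoom = numerator / denominator if denominator else 0.0
    pan = (v_mean[0] - zoom * q_mean[0], v_mean[1] - zoom * q_mean[1])
    residual = sum((v[0] - pan[0] - zoom * q[0]) ** 2
                   + (v[1] - pan[1] - zoom * q[1]) ** 2
                   for q, v in zip(offsets, vectors))
    return pan, zoom, residual


def predict_camera_vector(pan, zoom, position, center):
    """Camera-model prediction of the motion vector at `position`."""
    return (pan[0] + zoom * (position[0] - center[0]),
            pan[1] + zoom * (position[1] - center[1]))


# Vectors synthesized from pan = (2, 0) and zoom = 0.5 about center (4, 4).
patch_positions = [(0, 0), (8, 0), (0, 8), (8, 8)]
patch_vectors = [(0, -2), (4, -2), (0, 2), (4, 2)]
fitted_pan, fitted_zoom, fit_error = fit_camera_model(
    patch_positions, patch_vectors, (4, 4))
```

A pure pan corresponds to a zoom of zero with identical vectors everywhere, and a pure zoom to a pan of zero with radial vectors that grow toward the periphery, matching the two scenarios described in the text.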
As at the 1:4 and 1:2 resolution levels, a new best vector (405d) is then found (step 106) and stored in memory. The process then checks once more whether there are any higher resolution levels (step 108). This time there is no higher resolution level, so the process proceeds to step 112, where the best vector is selected and used for motion compensation, which completes the processing of the current frame.
The above process is performed for all the 4×4 pixel image patches in the frame. Based on the determined motion vectors, frames are interpolated between the source frame and the target frame, so that there is little or no discrepancy between the expected image motion due to eye tracking and the displayed image motion.
As can be seen from the above discussion, the invention provides a smooth and accurate vector field while using only a fairly small number of calculations. Furthermore, the convergence delay is reduced because of the multiple resolution levels. Fewer resolution levels can be used than in conventional approaches, and vector errors at lower levels are not amplified by the resolution changes at higher resolution levels, because the projected vectors are tested at each resolution. Performing temporal vector partitioning when filtering the motion vectors determined for a previous image can help to resolve object-background vector discontinuities around smaller boundary details, for example the hood of a moving car or similar types of detail. At the same time, temporal vector partitioning does not adversely affect image areas that contain no object boundaries. In that scenario, the outlier vector or vectors (that is, the incorrect vectors) will be separated from the good vectors, so the procedure is still beneficial.
The invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or combinations of them. Apparatus of the invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method steps of the invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output. The invention can also be implemented in one or more computer programs that are executable on a programmable system including at least one programmable processor, at least one input device and at least one output device, the processor being coupled to a data storage system to receive data and instructions from it and to transmit data and instructions to it. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; in any case, the language can be a compiled or interpreted language. Suitable processors include, for example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Generally, a computer will include one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including, for example, semiconductor memory devices such as EPROM, EEPROM and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).
Fig. 6 shows a computer system (600) that can be used to implement the invention. The computer system (600) is only one example of a graphics system in which the invention can be used. The computer system (600) includes a central processing unit (CPU) (610), a random access memory (RAM) (620), a read-only memory (ROM) (625), one or more external devices (630), a graphics controller (660), primary storage devices (640 and 650) and a digital display unit (670). As is well known in the art, ROM transfers data and instructions to the CPU (610) in one direction only, while RAM (620) is typically used to transfer data and instructions in both directions. The CPU (610) can generally include any number of processors. The primary storage devices (640 and 650) can include any suitable computer-readable media. A secondary storage medium (680), which is typically a mass storage device, is also coupled bidirectionally to the CPU (610) and provides additional data storage capacity. The mass storage device (680) is a computer-readable medium that can be used to store programs including computer code, data and the like. The mass storage device (680) is typically a storage medium, such as a hard disk or a tape, that is slower than the primary storage devices (640, 650). The mass storage device (680) can take the form of a magnetic or paper tape reader or some other well-known device. It will be appreciated that the information retained in the mass storage device (680) can, in appropriate cases, be incorporated in standard fashion as part of the RAM (620) as virtual memory.
The CPU (610) is also coupled to one or more input/output devices (690), which can include, but are not limited to, devices such as video monitors, trackballs, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, or other well-known input devices such as, of course, other computers. Finally, the CPU (610) can optionally be coupled to a computer or telecommunications network, for example the Internet or an intranet, using a network connection as shown generally at (695). With such a network connection, it is contemplated that the CPU (610) can receive information from the network, or output information to the network, in the course of performing the method steps described above. Such information, which is often represented as a sequence of instructions to be executed by the CPU (610), can be received from and output to the network, for example in the form of a computer data signal embodied in a carrier wave. The devices and materials described above are well known to those skilled in the computer hardware and software arts.
The graphics controller (660) generates image data and a corresponding reference signal and provides both to the digital display unit (670). The image data can be generated, for example, from pixel data received from the CPU (610) or from an external encoder (not shown). In one embodiment, the image data is provided in RGB format and the reference signal includes the VSYNC and HSYNC signals well known in the art. It should be understood, however, that the invention can be implemented with data and/or reference signals in other formats. For example, the image data can include video signal data with a corresponding time reference signal.
A number of implementations of the invention have been described. Nevertheless, it will be understood that various modifications can be made without departing from the spirit and scope of the invention. For example, in addition to the hierarchical and temporal vectors in the intermediate layers, the camera model vector projected down can also be used as a candidate vector for the SAD computation. Accordingly, other embodiments are within the scope of the following claims.