A video signal is three-dimensional in nature, in that it is treated in terms of its horizontal, vertical and temporal components, and can be expressed as a continuous function f_3(x, y, t). Assume that its moving object undergoes only a uniform-velocity rigid-body translation v = (v_x, v_y); the Fourier transform of the progressive video f_3() can then be expressed as:

F_3(f_x, f_y, f_t) = F_2(f_x, f_y) · δ(f_x·v_x + f_y·v_y + f_t)    Eq(1)
wherein F_2(f_x, f_y) is the Fourier transform of the two-dimensional video signal f_2(x, y), and δ(f_x·v_x + f_y·v_y + f_t) represents an oblique plane in the three-dimensional frequency space defined by the equation f_x·v_x + f_y·v_y + f_t = 0; the baseband therefore exists only on a two-dimensional frequency plane. Eq(1) is disclosed in a paper by R.A.F. Belfor et al., "Motion Compensated Subsampling of HDTV", SPIE Vol. 1605, Visual Communications and Image Processing '91, pp. 274-284 (1991). A spatio-temporal bandwidth can be predicted from the position of the baseband spectrum. That is, for a given temporal bandwidth f_t^w, the relation among the temporal bandwidth f_t^w, the spatial bandwidths f_x^w and f_y^w, and the velocity components v_x and v_y can be derived from Eq(1) as follows:

f_t^w = f_x^w·v_x + f_y^w·v_y    Eq(2)
wherein f_x^w and f_y^w are the corresponding spatial bandwidth components in the x and y directions. It can be seen from Eq(2) that the temporal bandwidth is directly proportional to the speed of the moving object; and that, for a fixed temporal bandwidth, the spatial bandwidth becomes inversely proportional to the speed of the moving object.
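By way of illustration only, the proportionality expressed by Eq(2) can be checked numerically. The sketch below evaluates f_t^w for the speeds used in Figs. 1A to 1C; the units (cycles/pixel for spatial bandwidth, pixels/frame period for velocity) are illustrative assumptions, not taken from the text.

```python
# Numeric check of Eq(2): the temporal bandwidth grows linearly
# with the speed of the moving object.

def temporal_bandwidth(fx_w, fy_w, vx, vy):
    """Eq(2): f_t^w = f_x^w * v_x + f_y^w * v_y."""
    return fx_w * vx + fy_w * vy

# Fix the spatial bandwidth at the Nyquist limit (0.5 cycles/pixel).
for v in (1, 2, 3):  # pixels per frame period, as in Figs. 1A-1C
    print(v, temporal_bandwidth(0.5, 0.0, v, 0.0))
# Doubling the speed doubles the temporal bandwidth; conversely, for a
# fixed temporal bandwidth the admissible spatial bandwidth halves.
```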
Since the video signal to be filtered is sampled at spatial and temporal sampling frequencies, the sampled video signal can be represented as three-dimensional sampled data, i.e., pixels. The sampling of the continuous function f_3() can therefore be expressed as the continuous function f_3(x, y, t) multiplied by a three-dimensional array of δ functions. The spectral distribution of the pixels is then given by the convolution of the Fourier transform of f_3() with a δ-function array. Accordingly, owing to the nature of the δ function, the spectrum of the pixels is repeated at intervals of the sampling frequencies.
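A tiny numeric illustration of the spectral replication just described may be helpful: sampling makes frequencies that differ by the sampling frequency indistinguishable, which is the time-domain face of the spectrum repeating at intervals of the sampling frequency. The sinusoid frequencies below are illustrative choices of ours.

```python
import math

# Sampling a sinusoid at rate fs: a component at f and one at f + fs
# produce identical samples, i.e., the spectrum is periodic in fs.

fs = 1.0  # temporal sampling frequency, normalized to 1 as in Figs. 1A-1C

def samples(f, n=8):
    return [round(math.cos(2 * math.pi * f * k / fs), 9) for k in range(n)]

# A 0.2 cycles/sample sinusoid and a 1.2 cycles/sample sinusoid
# give identical sample sequences:
print(samples(0.2) == samples(0.2 + fs))  # True
```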
Referring first to Figs. 1A, 1B and 1C, there are shown the baseband spectrum distributions as a function of the speed of the moving object, for v_x = 1 pixel/frame period, v_x = 2 pixels/frame period and v_x = 3 pixels/frame period, respectively, wherein the solid lines represent replicas of the baseband; the temporal sampling frequency is normalized to 1; and the spatial (x-axis direction) and temporal frequencies are denoted f_x and f_t, respectively.
The motion of a pixel A in the moving object causes the spectrum to tilt away from the spatial frequency axis, as shown in Fig. 1A. As shown in Figs. 1A, 1B and 1C, the tilt angle θ increases with the speed. The reason for the tilt can be readily understood from Eq(2) by examining the temporal frequency at a pixel of the video signal: since the distribution of the spectrum in the spatio-temporal frequency domain is related to the product of the spatial frequency and the speed of the moving object, a higher speed of the moving object gives a higher temporal frequency. It should be emphasized that the spectrum is sheared, not rotated.
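The shear described above can be sketched numerically. For horizontal motion, the spectral support of Eq(1) is the line f_x·v_x + f_t = 0, whose slope magnitude is v_x; the tilt-angle formula below is our own reading of that geometry, not a formula stated in the text.

```python
import math

# Tilt of the baseband support away from the f_x axis, per Figs. 1A-1C:
# on the line f_t = -f_x * v_x, the slope magnitude is |v_x|, so the
# tilt angle is theta = atan(|v_x|).

def tilt_angle_deg(vx):
    return math.degrees(math.atan(abs(vx)))

for vx in (1, 2, 3):  # pixels/frame period
    print(vx, round(tilt_angle_deg(vx), 1))
# The angle grows with speed (45.0, 63.4, 71.6 degrees), but the
# support remains a plane: the spectrum is sheared, not rotated.
```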
Referring to Fig. 2, there is shown the result of low-pass filtering in the temporal domain with a fixed temporal cut-off frequency f_t^c. To carry out the temporal filtering, the following two assumptions are made: first, that the baseband spectra have no spatially overlapping parts; and second, for simplicity, that there exists only a simple horizontal motion at a uniform velocity (represented along f_x). In Fig. 2, the filtering result includes, e.g., the spatial high-frequency components B of temporally overlapping adjacent spectra. That is, the spatial high-frequency components affect the temporal low-frequency components of the adjacent replicas. In other words, interference between the spatial high-frequency components of the adjacent replicas and the low-frequency components appears in the displayed image.
As can be seen from Eq(1) and Eq(2), the relation between the spatial frequency f_s (comprising the vertical and horizontal components) and the temporal frequency f_t can be expressed as follows:

f_t = f_s·|v|    Eq(3)

wherein the spatial frequency f_s is defined on the f_x-f_y plane. As seen from Eq(3), it should be appreciated that, when the temporal cut-off frequency is fixed in order to limit the temporal bandwidth, the spatial cut-off frequency becomes inversely proportional to the absolute value of the speed of the moving object.
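The inverse proportionality just stated can be sketched as follows; the symbol names (ft_c for the temporal cut-off, f_s_c for the resulting spatial cut-off) are our own illustrative choices.

```python
# For a fixed temporal cut-off ft_c, the admissible spatial cut-off
# f_s_c = ft_c / |v| shrinks as the speed of the moving object grows.

def spatial_cutoff(ft_c, vx, vy):
    speed = (vx * vx + vy * vy) ** 0.5
    if speed == 0.0:
        return float("inf")  # a static scene imposes no spatial limit
    return ft_c / speed

for v in (1.0, 2.0, 4.0):
    print(v, spatial_cutoff(0.25, v, 0.0))
# 0.25, 0.125, 0.0625: doubling the speed halves the spatial cut-off.
```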
Assume that h() is the impulse response of a low-pass temporal filter and, for simplicity, that there exists only a simple horizontal motion (in the x-axis direction); the temporally band-limited video signal g(x, t') can then be expressed as follows:

g(x, t') = ∫ h(τ)·f(x, t−τ) dτ    Eq(4)

wherein a linear-phase filter is used to reduce the group-delay effect of the filter response. From the assumptions of the uniform-velocity rigid-body translation v = (v_x, v_y) and the simple horizontal motion, the filtering input function can be expressed as follows:
f(x, t−τ) = f(x + v_x·τ, t)    Eq(5)
From Eq(5), the displacement of a moving pixel along the time axis can be represented by its trajectory in the spatial domain at any point on the time axis. Accordingly, Eq(4) can be rewritten as:

g(x, t') = ∫ h(τ)·f(x + v_x·τ, t) dτ    Eq(6)
On the other hand, in the case of an actual video signal, the assumption of the uniform-velocity rigid-body translation does not always hold. Moreover, even if there is no moving object, each pixel value of the video signal varies owing to changes in the light source and in the characteristics of the video signal generating equipment, e.g., a camera. In these cases, Eq(5) holds only over a short time interval, and can be rewritten as follows:

f(x, t−(k+1)Δt) = f(x + v_x(t−kΔt)·Δt, t−kΔt)    Eq(7)

wherein Δt represents a short time interval, e.g., a frame period, and k is an integer. According to Eq(7), Eq(6) can be rewritten as:
g(x, t') = Σ_k ∫_{kΔt}^{(k+1)Δt} h(τ)·f(x + v(x, t−kΔt)·(τ−kΔt), t−kΔt) dτ    Eq(8)
From Eq(8), it will be appreciated that the temporal filtering of Eq(4) can be accomplished through a spatio-temporal filtering with the filtering input function f().

Eq(8) is a continuous description of the motion-adaptive spatio-temporal filtering. A similar result also holds in the discrete case: the integration is replaced with a summation, and dτ is represented by Δτ together with the indices j and l. Eq(8) is then given by:

g(x, n) = Σ_{j=−N}^{N} Σ_{l=0}^{L−1} h(j, l)·f(x + v(x, n−j)·l·Δτ, n−j)    Eq(9)

wherein n is a frame index; the velocity and the filtering position are replaced by the vectors v and x; the filter impulse response h(), comprising (2N+1)×L filter coefficients, is predetermined together with the temporal cut-off frequency and the predetermined numbers N and L (N and L being positive integers); and, if the inter-pixel distance is denoted Δx, Δτ is selected so as to satisfy |v()·Δτ| ≤ |Δx| (if Δτ cannot satisfy this condition, spatial overlap results).
Therefore, as can be seen from Eq(9), the temporal band limitation can be achieved through a spatio-temporal filtering, i.e., a low-pass filtering, of the filtering input function over the spatial and temporal domains.
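A minimal one-dimensional sketch of the discrete filtering of Eq(9) follows. The uniform averaging coefficients, the single displacement value per frame, and the rounding to the nearest pixel are all simplifications of ours for brevity; the patent's filter coefficients are designed separately and off-grid positions are interpolated.

```python
# 1-D sketch of Eq(9): the filtered value at pixel x of frame n is a
# weighted sum of (2N+1)*L filtering input data taken along the motion
# trajectory through the neighboring frames.

def motion_adaptive_filter(frames, x, n, disp, N=1, L=4):
    """frames: list of 1-D pixel lists; disp[j]: displacement (pixels
    per frame) in frame j; returns the filtered datum g(x, n)."""
    taps = (2 * N + 1) * L
    h = [1.0 / taps] * taps          # stand-in low-pass coefficients
    g = 0.0
    for j in range(-N, N + 1):
        frame = frames[n - j]
        for l in range(L):
            # position x + v(x, n-j) * l * dtau, rounded to a pixel here
            # (bilinear interpolation would be used off-grid)
            pos = int(round(x + disp[n - j] * l / L))
            pos = max(0, min(len(frame) - 1, pos))
            g += h[(j + N) * L + l] * frame[pos]
    return g

# A flat (constant) image is left unchanged by the averaging filter:
flat = [[5.0] * 16 for _ in range(3)]
print(motion_adaptive_filter(flat, 8, 1, disp=[2.0, 2.0, 2.0]))  # 5.0
```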
On the other hand, if ΔT is the frame interval, then L·Δτ equals ΔT and v()·ΔT equals D(), wherein D() represents the displacement of a pixel between two adjacent frames. Eq(9) can then be revised as follows:

g(x, n) = Σ_{j=−N}^{N} Σ_{l=0}^{L−1} h(j, l)·f(x + D(x, n−j)·l/L, n−j)    Eq(10)

wherein L is selected so as to satisfy |D()| ≤ |Δx|·L (this condition is equivalent to the previously described condition |v()·Δτ| ≤ |Δx|; therefore, if L cannot satisfy it, spatial overlap results). Eq(10) is one realization of Eq(9). The temporal band limitation is achieved through a spatio-temporal filtering, i.e., a low-pass filtering, of the filtering input function f(), which comprises a multiplicity of, e.g., (2N+1), sets of filtering input data, wherein each set comprises a predetermined number (e.g., L) of filtering input data obtained from the pixel values of a corresponding frame of the video signal. In Eq(10), the position (x + D(x, n−j)·l/L) of a filtering input datum in the (n−j)th frame of the video signal may not coincide with an exact pixel position. In such a case, the filtering input datum can be determined from the neighboring pixels located around that position by using, e.g., a bilinear interpolation method which determines a weighted sum of the neighboring pixel values as the filtering input datum. That is, the filtering input function is derived along the trajectory of the moving object in the spatio-temporal domain. Specifically, a set of input data included in the filtering input function f() can be determined from the pixel values of a corresponding frame by using a motion vector which represents the displacement of the moving object between that frame of the video signal and its preceding frame, as will be described below in conjunction with Fig. 3.
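The bilinear interpolation mentioned above can be sketched as follows; the two-by-two test image and the coordinate convention (fractional column x, fractional row y) are illustrative assumptions of ours.

```python
# When a trajectory position does not fall on an exact pixel, the
# filtering input datum is a weighted sum of the four surrounding
# pixel values (bilinear interpolation).

def bilinear(frame, x, y):
    """frame: 2-D list indexed [row][col]; (x, y): fractional
    column/row position with x0+1 and y0+1 inside the frame."""
    x0, y0 = int(x), int(y)
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * frame[y0][x0]
            + dx * (1 - dy) * frame[y0][x0 + 1]
            + (1 - dx) * dy * frame[y0 + 1][x0]
            + dx * dy * frame[y0 + 1][x0 + 1])

img = [[0.0, 4.0],
       [8.0, 12.0]]
print(bilinear(img, 0.5, 0.5))  # 6.0, the average of the four neighbours
```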
On the other hand, the filter impulse response comprises a multiplicity of, i.e., (2N+1)×L, filter coefficients which serve to limit the bandwidth of the video signal to a predetermined bandwidth; these filter coefficients can be predetermined according to the desired temporal cut-off frequency and the predetermined numbers N and L. For example, when the temporal cut-off frequency is f_t^c, the filter impulse response is designed with a cut-off frequency of f_t^c/L.
In effect, as can be seen from Eq(10), the filtered data g(), i.e., the band-limited data, is obtained by convolving each set of filtering input data with its corresponding filter coefficients and then summing the convolved sets of filtering input data.
Referring to Fig. 3, there is shown an explanatory diagram of the filtering input function of the motion-adaptive spatio-temporal filtering method of the present invention. For simplicity, each frame, e.g., F_{c-1}, F_c and F_{c+1}, is represented by a line, and N and L of Eq(10) are assumed to be 1 and 4, respectively. In other words, to obtain the filtered data of a target pixel in a target frame F_c, three filtering input frames are used in the filtering process, i.e., the target frame F_c, which contains the target pixel on which the filtering operation is to be performed, and its two neighboring frames F_{c-1} and F_{c+1}, wherein c−1, c and c+1 denote frame indices; and, on each filtering input frame, four filtering input data are determined according to the motion vector of the pixel located at the target pixel position in that frame. The positions of the target pixel in the frames F_{c-1}, F_c and F_{c+1} are denoted x_10, x_20 and x_30, respectively, and the vertical axis is the time axis.
To derive the filtered data of the target pixel at x_20 in the target frame F_c, a multiplicity of (i.e., three) sets of filtering input data are determined, each set comprising a predetermined number (e.g., 4) of filtering input data lying on the corresponding motion trajectory of the target pixel in the corresponding filtering input frame. Specifically, according to the motion vectors D(x_10, c−1), D(x_20, c) and D(x_30, c+1) of the pixels at the target pixel positions in the frames F_{c-1}, F_c and F_{c+1}, three sets of filtering input data located at (x_10, x_11, x_12, x_13), (x_20, x_21, x_22, x_23) and (x_30, x_31, x_32, x_33), respectively, are determined on the pixel trajectories.
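The four input positions per frame in Fig. 3 can be generated as sketched below: starting from the target-pixel position in a frame, the positions step along the motion trajectory by D/L per step (L = 4 here). The names mirror the figure; the numeric values of x_10 and D are illustrative assumptions of ours.

```python
# Sample positions along the motion trajectory within one frame
# interval, per the position term x + D * l / L of Eq(10).

def trajectory_positions(x_start, D, L=4):
    return [x_start + D * l / L for l in range(L)]

x10, D = 0.0, 2.0   # position in frame F_{c-1}, displacement to F_c
print(trajectory_positions(x10, D))  # [0.0, 0.5, 1.0, 1.5]
```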
As shown in Fig. 3, it can be readily appreciated that the filtering input data are equivalent to the target pixel values in temporally interpolated, or up-sampled, frames of the video signal. For example, the filtering input datum at x_11 in the frame F_{c-1} (at the time t = −ΔT, the target frame F_c being at t = 0) is equivalent to the pixel value at x_10 at the time t = −3ΔT/4. This can be expressed as follows:

f(x_11, −ΔT) = f(x_10, −3ΔT/4)    Eq(11)
The equivalence between the spatial domain and the temporal domain is shown by the phantom lines in Fig. 3.
Referring to Figs. 4A to 4D, there is illustrated the result of the low-pass temporal filtering of a video signal in the spatio-temporal domain by using the motion-adaptive spatio-temporal filtering method. Fig. 4A shows the baseband spectrum of the original video signal. As described above, the process of obtaining the sets of filtering input data is equivalent to temporal up-sampling, or interpolation, as shown in Fig. 4B. If the desired cut-off frequency of the temporal low-pass filtering is f_t^c, the cut-off frequency of the filter of the present invention is f_t^c/L, as shown in Fig. 4C. The final spectrum of the filtering result is illustrated in Fig. 4D; it is the spectrum of Fig. 4C as it appears once the interpolated frames are discarded (note that no filtering results are provided for the inserted frames). Compared with the temporal band limitation of Fig. 2, it should be readily appreciated that the spatio-temporal band limitation of the present invention is not subject to the temporal aliasing effects.
As can be seen from Eq(10) and Figs. 3 and 4A to 4D, it will be understood that the filtering operation is carried out along the trajectory of the moving object in the spatio-temporal domain, whereby a temporal band limitation is obtained. Therefore, the inventive filter can effectively eliminate the temporal aliasing which may appear between the repeated spectra as the speed of the moving object increases, thereby greatly reducing the visible artifacts appearing in the moving areas of an image.
Referring to Fig. 5, there is shown an image encoding apparatus employing the motion-adaptive spatio-temporal filtering device in accordance with a preferred embodiment of the present invention. The image encoding apparatus comprises a filter circuit 100 for carrying out the motion-adaptive spatio-temporal filtering in accordance with the present invention; and a video encoding circuit 60 for eliminating redundancies in the filtered video signal so as to compress it to a more manageable size for transmission. The video signal is generated from a video signal source, e.g., a video camera (not shown), and fed to the filter circuit 100.
The filter circuit 100 carries out the motion-adaptive spatio-temporal filtering operation according to Eq(10), as described above. The filter circuit 100 comprises a frame buffer 10, a motion estimator 20, a motion vector buffer 30, a filtering input determiner 40 and a filtering calculator 50. The frame buffer 10 stores a current frame being input to the filter circuit 100 and a plurality of, e.g., (2N+1), preceding frames, i.e., the filtering input frames to be used in the filtering. Specifically, assuming N = 1, the frame buffer 10 stores the current frame F_{c+2} and the three filtering input frames F_{c-1}, F_c and F_{c+1}, wherein c+2, c+1, c and c−1 are frame indices. The motion estimator 20 receives two successive frames of the video signal, i.e., the current frame F_{c+2} input directly from the video signal source and its preceding frame F_{c+1} stored in the frame buffer 10, and extracts a motion vector associated with each of the pixels contained in the current frame F_{c+2}. To extract the motion vectors, various motion estimation methods well known in the art can be employed (see, e.g., MPEG Video Simulation Model 3, International Organisation for Standardisation, Coded Representation of Picture and Audio Information, 1990, ISO-IEC/JTC1/SC2/WG8, MPEG 90/041).
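A minimal full-search block-matching sketch of the kind of motion estimation the motion estimator 20 may perform on two successive frames follows; the one-dimensional signals, block size and search range are illustrative assumptions of ours, not parameters taken from the text.

```python
# Full-search block matching: find the displacement d that minimizes
# the sum of absolute differences (SAD) between a block of the current
# frame and a shifted block of the previous frame.

def block_match_1d(cur, prev, x, block=4, search=3):
    """Return the displacement d minimizing the SAD between
    cur[x:x+block] and prev[x+d:x+d+block]."""
    best_d, best_sad = 0, float("inf")
    for d in range(-search, search + 1):
        if x + d < 0 or x + d + block > len(prev):
            continue
        sad = sum(abs(cur[x + i] - prev[x + d + i]) for i in range(block))
        if sad < best_sad:
            best_sad, best_d = sad, d
    return best_d

prev = [0, 0, 1, 2, 3, 4, 0, 0, 0, 0]
cur  = [0, 0, 0, 0, 1, 2, 3, 4, 0, 0]  # pattern shifted right by 2
print(block_match_1d(cur, prev, 4))  # -2: the match lies 2 pixels back
```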
The extracted motion vectors are coupled to the motion vector buffer 30 and stored therein. In accordance with the present invention, the motion vector buffer 30 stores the motion vectors of the frames F_{c+2}, F_{c+1}, F_c and F_{c-1}.
The filtering input frames stored in the frame buffer 10 and the motion vectors associated with the filtering input frames stored in the motion vector buffer 30 are coupled to the filtering input determiner 40. The filtering input determiner 40 determines the sets of, e.g., 3, filtering input data constituting the filtering input function f() of Eq(10). As described above, if a filtering input datum is determined to lie on a position which does not fall on an exact pixel position, the filtering input determiner 40 provides the filtering input datum by calculating a weighted sum of its four neighboring pixels. The filtering input data are coupled to the filtering calculator 50.
In the filtering calculator 50, the filtered data g() represented by Eq(10) is calculated from the filtering input data fed from the filtering input determiner 40.
The filter impulse response comprising a plurality of, e.g., (2N+1)×L, filter coefficients is determined according to the desired temporal cut-off frequency f_t^c, N and L, wherein f_t^c, N and L are predetermined, by taking the characteristics of the video signal into consideration, so as to satisfy the conditions described above in conjunction with Eq(10). The filter coefficients can be predetermined before the filtering and stored in the filtering calculator 50. As described above, the filter circuit 100 carries out the motion-adaptive spatio-temporal filtering operation, thereby deriving a temporally band-limited video signal.
The filtered video signal output from the filtering calculator 50 is coupled to the video encoding circuit 60, wherein the video signal is compressed by using various methods known in the art (see, e.g., MPEG Video Simulation Model 3, International Organisation for Standardisation, Coded Representation of Picture and Audio Information, 1990, ISO-IEC/JTC1/SC2/WG8, MPEG 90/041). The encoded video signal is then coupled to a transmitter for transmission.
While the present invention has been shown and described with reference to the particular embodiments, it will be apparent to those skilled in the art that many changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.