CN101853510A - Movement perception model extraction method based on time-space domain - Google Patents

Movement perception model extraction method based on time-space domain

Info

Publication number
CN101853510A
Authority
CN
China
Prior art keywords
motion
motion vector
pixel
entropy
time
Prior art date
Legal status
Pending
Application number
CN 201010152494
Other languages
Chinese (zh)
Inventor
石旭利
潘琤雯
张兆扬
魏小文
Current Assignee
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN 201010152494
Publication of CN101853510A
Status: Pending


Abstract

The invention relates to a method for extracting a motion perception model based on the spatio-temporal domain. The method comprises the steps of: inputting video coding frames; establishing a motion model; screening motion perception objects; establishing a spatial-domain segmentation model; and finally obtaining the final spatio-temporal motion perception model by combining the motion model and the segmentation model through an edge decision method. By taking the regional consistency of moving objects into account and incorporating spatio-temporal video image segmentation, the invention improves the extraction of moving video objects and establishes a motion perception model based on the spatio-temporal domain.

Description

Movement perception model extraction method based on the time-space domain
Technical field
The present invention relates to a method for extracting a motion perception model based on the spatio-temporal domain, in which several data processing methods are fused to extract the moving video objects that attract human attention. In particular, a spatial-domain video image segmentation method is incorporated on the basis of motion vector analysis, which greatly improves the motion perception model.
Background art
The establishment of motion perception models has become a research focus in video processing. Video is a combination of images over continuous time, and the motion produced between consecutive images gives the extraction of moving video objects practical significance. The moving objects in a video are what people attend to most when watching, and they are the focus of many researchers; establishing a good motion perception model is therefore important.
The detection and segmentation of video objects are the prerequisite and basis for establishing a motion perception model: video object detection finds the moving foreground that differs from the background in a video image, and video object segmentation separates the detected moving foreground completely from the background. As a classical problem in video processing, it has already been studied extensively. Segmentation algorithms can be divided into compressed-domain and uncompressed-domain methods according to whether the video data are in compressed format; into automatic and semi-automatic methods according to whether manual interaction is required during segmentation; and into temporal segmentation and joint spatio-temporal segmentation according to the information used in the segmentation process.
Although many segmentation algorithms with different procedures have been proposed, the basic segmentation strategy is roughly the same. The general steps are the analysis of the video data, the determination of the regions to be segmented, and the application of different segmentation methods. Because the content of video objects is inherently complex and the current state of artificial intelligence means that computers still lack the human ability to observe, recognize and understand images, no general and effective segmentation method exists yet. The research trend for video segmentation algorithms is to seek better methods that combine temporal and spatio-temporal information.
In view of this, the present method incorporates spatial-domain video information and the corresponding processing methods on the basis of extracting a motion model from compressed-domain video information. This fusion of spatial-domain and temporal-domain video processing gives the finally extracted motion perception model a better result.
Summary of the invention
The object of the present invention is to overcome the defects of the prior art and provide a method for extracting a motion perception model based on the spatio-temporal domain, which improves the extraction of motion perception objects and yields a more satisfactory motion perception model. The model can be used to improve video coding algorithms, raising the bit rate assigned to the motion perception part by reducing the number of bits spent on the non-perceived part.
To achieve the above object, the concept of the invention is as follows (see Fig. 1): first, a motion map is built from the pre-processed motion vectors, and the moving objects attended to by the human eye are screened out according to several saliency parameters to obtain the motion model; at the same time, image segmentation is performed on the luminance information of the video image to obtain the spatial-domain segmentation model; the two models are then combined to obtain the motion perception model based on the spatio-temporal domain. The motion vector pre-processing and the edge decision in Fig. 1 are as follows:
(1) Perceived motion object screening: several moving objects may appear simultaneously in the motion vector map computed from the motion vector entropy, but not all of them attract human attention. A saliency coefficient is defined from the position of each moving object and the proportion of pixels it occupies, and the most attended moving object is screened out according to the magnitude of this coefficient.
(2) Edge decision method: both the motion model obtained from the motion vectors and the spatial-domain segmentation model take connected regions as their basic unit; the motion model indicates the position of the region occupied by the moving object, and the segmentation model indicates the connected patches formed by regions of different texture. The proportion of each texture region covered by the motion region is therefore used to decide which texture regions are contained in the moving object to be extracted: the region classes of the smoothed patch image covered by the salient connected motion region are found, and the number N_all(i) of image pixels occupied by region i in the patch image and the number N_object(i) of pixels of region i covered by the salient connected region are counted. Obj_Reg(i) is defined to indicate the segmented regions occupied by the moving object; see formula (1).
Obj_Reg(i) = { 0, if N_all(i) > ε_1 · N_object(i) and N_object(i) < ε_2 · Σ_i N_object(i);  1, otherwise }   (1)
where ε_1 and ε_2 are two thresholds that decide whether a region belongs to the moving object; ε_1 is set to 3 and ε_2 to 0.5. The intersection of the regions found in this way in the motion model and in the segmentation model gives the final motion perception model.
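For illustration only (not part of the patent text), the decision rule of formula (1) could be sketched in Python as follows; the function and array names are assumptions:

```python
import numpy as np

def object_regions(seg_labels, motion_mask, eps1=3.0, eps2=0.5):
    """Decide which segmented regions belong to the salient moving object (formula (1)).

    seg_labels  : 2-D int array, spatial-segmentation label i of each pixel
    motion_mask : 2-D bool array, True where the salient connected motion region lies
    Returns a bool array keep[k] = True when Obj_Reg(i) = 1 for the k-th label.
    """
    labels = np.unique(seg_labels)
    n_all = np.array([np.sum(seg_labels == i) for i in labels])                  # N_all(i)
    n_obj = np.array([np.sum((seg_labels == i) & motion_mask) for i in labels])  # N_object(i)
    total_obj = n_obj.sum()
    # Obj_Reg(i) = 0 when the region lies mostly outside the motion area and
    # contributes little to it; otherwise 1.
    drop = (n_all > eps1 * n_obj) & (n_obj < eps2 * total_obj)
    return ~drop
```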
The motion vectors are first pre-processed and their direction information is obtained; all subsequent processing related to the motion vector entropy is based on the motion vector direction. The direction values are fed into two branches: one computes the spatial correlation entropy and the other the temporal correlation entropy, and the two entropies are combined according to a fixed rule to obtain the motion vector map. Finally, an adaptive threshold selection method divides the map into motion-attended and non-attended regions and establishes the corresponding motion model.
Information entropy measures the amount of information: the more ordered a system is, the lower its information entropy, and the more disordered, the higher. Camera motion can give the background of a video motion vectors as well, so to describe the motion in a video image correctly, the information entropy of the motion vectors is introduced. It can be divided into a temporal entropy and a spatial entropy: the temporal entropy is the degree of consistency of the motion vectors at the same position across frames, and the spatial entropy is the degree of consistency of neighbouring motion vectors within the current frame. The motion model is extracted by analyzing the temporal and spatial entropy of the motion vector directions, in the following steps:
(1) Motion vector pre-processing: the original motion vector data are first filtered with a 3 × 3 mean-filter mask to smooth isolated noise in the motion vector map, de-noising it spatially. The motion vector maps of each frame and its two neighbouring frames are then averaged arithmetically, which de-noises in time and compensates for zero motion vectors in the current frame.
(2) Motion vector direction: the direction information is obtained from the vector components. Let the pre-processed motion vector of macroblock (i, j) in frame n be PV(i, j) = (x_{n,i,j}, y_{n,i,j}); the motion vector direction is then obtained as
θ_{n,i,j} = arctan(y_{n,i,j} / x_{n,i,j})   (2)
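As a hedged sketch (assuming per-pixel motion-vector fields of three consecutive frames, with scipy's uniform_filter standing in for the 3 × 3 mean mask), steps (1) and (2) might look like this:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def preprocess_and_direction(mv_prev, mv_cur, mv_next):
    """mv_* : (H, W, 2) motion-vector fields (x and y components) of three consecutive frames."""
    # 3 x 3 mean-filter mask to suppress isolated spatial noise in the current field
    smoothed = np.stack([uniform_filter(mv_cur[..., c], size=3) for c in range(2)], axis=-1)
    # arithmetic mean with the previous and next frame to de-noise along time
    mv = (mv_prev + smoothed + mv_next) / 3.0
    # direction of each vector, formula (2); arctan2 is used here to keep the quadrant
    theta = np.arctan2(mv[..., 1], mv[..., 0])
    return mv, theta
```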
(3) Spatial correlation entropy: since the spatial correlation entropy depends on each macroblock and the values around it, it is obtained from the spatial correlation probability distribution, as shown in formulas (3) and (4), where (i, j) is the position of the current motion vector, Cs(·) is the spatial information content of the motion vector, P_s is the probability assignment function corresponding to the histogram SH, m is the histogram size, and w denotes the N × N window. This information content is computed for every motion vector of every frame.
P_s(n) = SH_{i,j}^{w}(n) / Σ_{l=1}^{m} SH_{i,j}^{w}(l)   (3)
Cs(i, j) = −Σ_{n=1}^{m} P_s(n) log(P_s(n))   (4)
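A minimal sketch of formulas (3) and (4); quantising the directions into m histogram bins and the border handling are assumptions:

```python
import numpy as np

def spatial_entropy(theta, m=8, win=3):
    """Spatial correlation entropy Cs(i, j) of the direction field theta (H x W, radians)."""
    H, W = theta.shape
    bins = np.clip(((theta + np.pi) / (2 * np.pi) * m).astype(int), 0, m - 1)
    Cs = np.zeros((H, W))
    r = win // 2
    for i in range(H):
        for j in range(W):
            patch = bins[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            hist = np.bincount(patch.ravel(), minlength=m).astype(float)   # SH
            p = hist / hist.sum()                                          # P_s, formula (3)
            nz = p > 0
            Cs[i, j] = -np.sum(p[nz] * np.log(p[nz]))                      # formula (4)
    return Cs
```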
(4) Temporal correlation entropy: the temporal correlation entropy of a motion vector depends on the values of the macroblocks at the same position in the current frame and in the L/2 preceding and L/2 following frames. It is computed with formulas (5) and (6), where (i, j) is the position of the current motion vector, Ct(·) is the temporal information content of the motion vector, P_t is the probability assignment function corresponding to the histogram TH, m is the histogram size, and L is the number of related frames along the time axis. This information content is computed for every motion vector of every frame.
P_t(n) = TH_{i,j}^{L}(n) / Σ_{l=1}^{m} TH_{i,j}^{L}(l)   (5)
Ct(i, j) = −Σ_{n=1}^{m} P_t(n) log(P_t(n))   (6)
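The temporal entropy of formulas (5) and (6) can be sketched analogously, with the directions at the same position stacked along the time axis (again an illustrative sketch, not reference code from the patent):

```python
import numpy as np

def temporal_entropy(theta_stack, m=8):
    """Temporal correlation entropy Ct(i, j).

    theta_stack : (L + 1, H, W) directions at the same position in the current frame
                  and in its L/2 preceding and L/2 following frames.
    """
    _, H, W = theta_stack.shape
    bins = np.clip(((theta_stack + np.pi) / (2 * np.pi) * m).astype(int), 0, m - 1)
    Ct = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            hist = np.bincount(bins[:, i, j], minlength=m).astype(float)   # TH
            p = hist / hist.sum()                                          # P_t, formula (5)
            nz = p > 0
            Ct[i, j] = -np.sum(p[nz] * np.log(p[nz]))                      # formula (6)
    return Ct
```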
(5) Combination into the motion vector map: the spatial and temporal entropies are first normalized to [0, 1], giving the spatial and temporal motion vector entropy matrices C_s and C_t. C_t(i, j) and C_s(i, j) correspond to an arbitrary point of the matrices; the two entropies are compared at each point: if the spatial value is greater than the temporal value, the point is assigned the spatial value, otherwise the temporal value, as in formula (7), where C_ts(i, j) is the point of the combined motion vector entropy map.
C_ts(i, j) = { C_t(i, j), if C_t(i, j) > C_s(i, j);  C_s(i, j), if C_s(i, j) ≥ C_t(i, j) }   (7)
(6) Adaptive entropy selection: in the entropy map obtained above, the object is still blurred and affected by noise, so a threshold is set to separate the object from the background in the entropy map. A threshold selected adaptively on the basis of maximum information is adopted to obtain the motion vector segmentation map.
The spatial-domain segmentation model of the video image is established as shown in Fig. 3; its key steps are as follows:
(1) Mean-shift image smoothing: the Mean-shift algorithm smooths the image and removes texture detail. Unlike general blurring algorithms, it preserves edge information while smoothing the image texture. The luminance component of the video image is smoothed with the Mean-shift algorithm, with a Gaussian function chosen as its kernel:
G(x) = e^{−‖x‖²}   (8)
(2) Region growing to obtain the patch image: an arbitrary pixel of the Mean-shift-smoothed image is chosen as the seed, points with similar characteristics are searched outward from it, and they are classified into one class. After the growth of one region is finished, an unclassified pixel is chosen again and the above step is repeated until all pixels have been merged.
(3) Merging of segmented regions: the segmented regions containing fewer pixels than the threshold T are found in the segmentation map. Because the shapes of these regions are arbitrary, they often have more than four adjacent regions, so the method only considers the neighbouring regions adjoining the leftmost, rightmost, topmost and bottommost pixels of the current region as merging targets. The four adjacent regions are first located from the position of the region to be merged; the luminance differences between these four regions and the current region are then computed, and the region with the smallest luminance difference is taken as the final merging target, whose label and mean value are assigned to the region to be merged. The regions to be merged in the image are processed cyclically in this way until the pixel count of every region exceeds the threshold T.
Based on the above, the method of the present invention proceeds in the following steps:
(1) Obtain the motion model from the motion vectors, as follows:
1. The motion vectors produced during encoding are processed with a 3 × 3 mean-filter mask;
2. Let the motion vector of macroblock (i, j) in frame n be PV(i, j) = (x_{n,i,j}, y_{n,i,j}); the motion vector of this macroblock is taken as the motion vector of every pixel inside it, and the direction of this vector is
θ_{n,i,j} = arctan(y_{n,i,j} / x_{n,i,j})
3. Compute the probability histogram distribution of the motion vector directions of the current pixel and its eight surrounding points,
P_s(n) = SH_{i,j}^{w}(n) / Σ_{l=1}^{m} SH_{i,j}^{w}(l),
where SH(·) is the histogram formed by the direction values θ_{n,i,j} of the motion vectors of the current pixel and its eight surrounding points, m is the histogram size and w denotes the N × N search window; from the resulting probability distribution, compute the spatial correlation entropy of the motion vector of each pixel,
Cs(i, j) = −Σ_{n=1}^{m} P_s(n) log(P_s(n)),
where Cs(·) is the spatial information entropy of the motion vector and P_s is the probability distribution function corresponding to the histogram SH(·);
4. Compute the probability histogram distribution of the motion vector directions of the current pixel and of the pixels at the same position in the three preceding and three following frames,
P_t(n) = TH_{i,j}^{L}(n) / Σ_{l=1}^{m} TH_{i,j}^{L}(l),
where TH(·) is the histogram formed by the motion vector direction values θ_{n,i,j} of the current pixel and of the pixels at the corresponding positions in the three preceding and three following frames, P_t is the probability distribution function corresponding to the histogram TH(·), m is the histogram size, and L is the number of related frames along the time axis; the temporal correlation entropy of the direction of each pixel's motion vector is then computed as
Ct(i, j) = −Σ_{n=1}^{m} P_t(n) log(P_t(n)),
where Ct(·) is the temporal information entropy of the motion vector;
5. The spatial and temporal entropy maps obtained above are normalized to [0, 1], giving the spatial and temporal motion vector entropy maps C_s and C_t. C_t(i, j) and C_s(i, j) correspond to an arbitrary point of the matrices, and the two entropies are compared at each point: if the spatial value is greater than the temporal value, the point takes the spatial value, otherwise it takes the temporal value. The temporal and spatial information are finally combined with the following formula to obtain the final spatio-temporal information entropy, where C_ts(i, j) is a point of the combined motion vector entropy map:
C_ts(i, j) = { C_t(i, j), if C_t(i, j) > C_s(i, j);  C_s(i, j), if C_s(i, j) ≥ C_t(i, j) }
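Illustratively, the normalisation and point-wise combination of the two entropy maps amount to a per-pixel maximum (a sketch; the small constant only avoids division by zero):

```python
import numpy as np

def combine_entropy(cs, ct):
    """Combined spatio-temporal entropy map C_ts as the point-wise maximum of C_s and C_t."""
    cs = (cs - cs.min()) / (cs.ptp() + 1e-12)   # normalise spatial entropy to [0, 1]
    ct = (ct - ct.min()) / (ct.ptp() + 1e-12)   # normalise temporal entropy to [0, 1]
    return np.maximum(cs, ct)
```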
6. Let the minimum spatio-temporal information entropy in the frame image be Min[f(x, y)], represented by information level 0, and the maximum be Max[f(x, y)], represented by information level l−1, so that R = {0, 1, ..., l−1} is the set of information levels; define N_p (p ∈ R) as the number of pixels whose information level is p, i.e. the number of pixels with the same information entropy. For the threshold t ∈ R, the spatio-temporal information entropy corresponding to one of the levels 0 to l−1 has to be found and used as the threshold, and the entropy map is divided adaptively according to t: the information entropy of the levels below the threshold is
E_B(t) = −Σ_{p=0}^{t} P_B(p) log P_B(p), with P_B(p) = N_p / Σ_{q=0}^{t} N_q,
the information entropy of the levels above the threshold is
E_A(t) = −Σ_{p=t+1}^{l−1} P_A(p) log P_A(p), with P_A(p) = N_p / Σ_{q=t+1}^{l−1} N_q,
and the threshold is chosen as
t = argmax_{t ∈ R} [E_A(t) + E_B(t)].
After the threshold t has been obtained, the moving region is extracted by the following rule: when the spatio-temporal information entropy of a pixel is greater than the threshold t, the pixel lies in the moving region; otherwise it lies in the non-moving region.
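The published text does not spell out the exact normalisation inside E_A and E_B, so the sketch below assumes the standard Kapur-style maximum-entropy criterion for the adaptive threshold:

```python
import numpy as np

def max_entropy_threshold(cts, levels=256):
    """Adaptive threshold on the combined entropy map by maximum information (Kapur-style sketch).

    cts : (H, W) spatio-temporal entropy map normalised to [0, 1].
    Returns a boolean moving-region mask: True where the entropy exceeds the threshold.
    """
    q = np.clip((cts * (levels - 1)).astype(int), 0, levels - 1)    # information levels 0..l-1
    n_p = np.bincount(q.ravel(), minlength=levels).astype(float)    # N_p
    p = n_p / n_p.sum()
    best_t, best_score = 0, -np.inf
    for t in range(levels - 1):
        pb, pa = p[:t + 1], p[t + 1:]
        wb, wa = pb.sum(), pa.sum()
        if wb == 0 or wa == 0:
            continue
        eb = -np.sum((pb[pb > 0] / wb) * np.log(pb[pb > 0] / wb))   # entropy below the threshold, E_B
        ea = -np.sum((pa[pa > 0] / wa) * np.log(pa[pa > 0] / wa))   # entropy above the threshold, E_A
        if ea + eb > best_score:                                    # t = argmax(E_A + E_B)
            best_score, best_t = ea + eb, t
    return cts > best_t / (levels - 1)
```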
(2) Screen the motion perception objects, as follows:
1. A motion object saliency coefficient is defined and used to screen out the moving object that attracts the most attention. It is defined as follows:
α(j) = α_Location(j) · α_Motion(j);
α_Location(j) = 1 / √((x_center(j) − x_pic_center)² + (y_center(j) − y_pic_center)²);
α_Motion(j) = mean‖MV(j)‖ / mean‖MV_all‖;
where α(j) is the motion saliency coefficient of the j-th moving object obtained after the calculation; α_Location(j) is the reciprocal of the distance from the centre of the connected region of the j-th moving object to the centre of the picture, (x_center(j), y_center(j)) are the coordinates of the centre of the connected region of the j-th moving object and (x_pic_center, y_pic_center) are the coordinates of the centre of the image; α_Motion(j) is the motion saliency coefficient of the j-th moving object, whose numerator is the mean magnitude of the motion vectors of the macroblocks occupied by the j-th moving object and whose denominator is the mean magnitude of the motion vectors of all macroblocks in the current frame image;
2. After the saliency coefficient of each moving object has been obtained, the coefficients are sorted and the moving object with the largest saliency coefficient is found; this moving object is determined to be the object to be accurately segmented in the subsequent steps;
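A sketch of the screening step; the object descriptors passed in (centre coordinates and mean motion-vector magnitude) are assumed inputs produced by the connected-region analysis:

```python
import numpy as np

def select_salient_object(objects, img_shape, mv_mean_all):
    """Return the index of the object with the largest saliency alpha(j).

    objects     : list of dicts with 'center' = (x, y) of the connected region and
                  'mv_mean' = mean magnitude of its macroblock motion vectors
    img_shape   : (H, W) of the frame
    mv_mean_all : mean motion-vector magnitude over all macroblocks of the frame
    """
    H, W = img_shape
    cx, cy = W / 2.0, H / 2.0
    best_idx, best_alpha = -1, -np.inf
    for j, obj in enumerate(objects):
        x, y = obj['center']
        dist = max(np.hypot(x - cx, y - cy), 1e-6)
        a_loc = 1.0 / dist                     # alpha_Location(j)
        a_mot = obj['mv_mean'] / mv_mean_all   # alpha_Motion(j)
        alpha = a_loc * a_mot                  # alpha(j)
        if alpha > best_alpha:
            best_alpha, best_idx = alpha, j
    return best_idx
```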
(3) Obtain the video segmentation model from the spatial-domain luminance information, as follows:
1. Extract the luminance component Y(n, i, j) of the video image and smooth it with the mean-shift algorithm. Let x be the point to be smoothed; its mean-shift vector m_h(x) is computed by the formula below. When ‖m_h(x) − x‖ < ε, the mean-shift computation of the current point ends, the resulting offset value is assigned to the current pixel, and the offset computation of the next point is carried out until all pixels have been processed;
m_h(x) = Σ_{i=1}^{n} G((x_i − x)/h) w(x_i) x_i / Σ_{i=1}^{n} G((x_i − x)/h) w(x_i)
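A simplified sketch of smoothing one pixel with mean shift on the luminance plane; the spatial window, range bandwidth and iteration cap are assumed values not fixed by the patent:

```python
import numpy as np

def mean_shift_pixel(Y, i, j, hs=4, hr=8.0, eps=0.5, max_iter=10):
    """Return the converged luminance value assigned to pixel (i, j) of the luminance image Y."""
    H, W = Y.shape
    y = float(Y[i, j])
    i0, i1 = max(0, i - hs), min(H, i + hs + 1)
    j0, j1 = max(0, j - hs), min(W, j + hs + 1)
    patch = Y[i0:i1, j0:j1].astype(float)
    for _ in range(max_iter):
        g = np.exp(-((patch - y) / hr) ** 2)      # Gaussian kernel G(x) = exp(-||x||^2), weights w = 1
        new_y = np.sum(g * patch) / np.sum(g)     # mean-shift value m_h(x)
        if abs(new_y - y) < eps:                  # ||m_h(x) - x|| < epsilon: stop shifting
            y = new_y
            break
        y = new_y
    return y
```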
2. The image processed by the mean-shift algorithm is segmented by region growing to obtain the initial segmentation map of the video image; every segmented region in this map is a connected region containing a certain number of pixels.
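Region growing over the smoothed luminance can be sketched as a flood fill with a similarity tolerance (the tolerance value is an assumption):

```python
from collections import deque
import numpy as np

def region_growing(Y, tol=8.0):
    """Label connected regions of similar luminance; returns an int label map."""
    H, W = Y.shape
    labels = np.full((H, W), -1, dtype=int)
    current = 0
    for si in range(H):
        for sj in range(W):
            if labels[si, sj] != -1:
                continue
            seed_val = float(Y[si, sj])
            labels[si, sj] = current
            queue = deque([(si, sj)])
            while queue:                                    # grow outward from the seed
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W and labels[ni, nj] == -1 \
                            and abs(float(Y[ni, nj]) - seed_val) < tol:
                        labels[ni, nj] = current
                        queue.append((ni, nj))
            current += 1
    return labels
```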
3. The initial segmentation map is processed with the region merging algorithm to further improve the segmentation: the segmented regions containing fewer pixels than the threshold T are found; according to experimental results, T is set to 50. Four coordinates (x_l, y_l), (x_r, y_r), (x_u, y_u) and (x_d, y_d) are kept, storing the coordinates of the leftmost, rightmost, topmost and bottommost pixels of the current region. After the starting coordinates of the four target merging regions have been determined, the luminance differences between the four target regions L_Region(1), L_Region(2), L_Region(3) and L_Region(4) and the interfering region to be merged are computed, and the label and mean value of the region with the smallest luminance difference, taken as the final merging target, are assigned to the region to be merged.
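A sketch of the merging rule that, as in the patent, only probes the regions adjoining the four extreme border pixels of a small region; the outer iteration strategy is an assumption:

```python
import numpy as np

def merge_small_regions(labels, Y, T=50):
    """Merge regions with fewer than T pixels into the most similar of four neighbours."""
    labels = labels.copy()
    changed = True
    while changed:
        changed = False
        for lab in np.unique(labels):
            ys, xs = np.nonzero(labels == lab)
            if len(xs) == 0 or len(xs) >= T:
                continue
            # neighbours just outside the leftmost, rightmost, topmost and bottommost pixels
            probes = [(ys[np.argmin(xs)], xs[np.argmin(xs)] - 1),
                      (ys[np.argmax(xs)], xs[np.argmax(xs)] + 1),
                      (ys[np.argmin(ys)] - 1, xs[np.argmin(ys)]),
                      (ys[np.argmax(ys)] + 1, xs[np.argmax(ys)])]
            cur_mean = Y[labels == lab].mean()
            best_lab, best_diff = None, np.inf
            for ni, nj in probes:
                if 0 <= ni < labels.shape[0] and 0 <= nj < labels.shape[1] and labels[ni, nj] != lab:
                    cand = labels[ni, nj]
                    diff = abs(Y[labels == cand].mean() - cur_mean)
                    if diff < best_diff:
                        best_diff, best_lab = diff, cand
            if best_lab is not None:                 # take the label of the closest-luminance region
                labels[labels == lab] = best_lab
                changed = True
    return labels
```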
(4) Extract the final motion perception model; the edge criterion obtains the final motion perception model as follows:
1. Locate, by region growing, the position of the moving object with the strongest saliency obtained in the screening step;
2. Find the region occupied by this moving object in the spatial-domain segmentation image, and count the number N_all(i) of image pixels occupied by region i in the patch image and the number N_object(i) of pixels of region i covered by the salient connected region;
3. From the smoothed patch map, the map of the most salient moving object region and the two statistics N_all(i) and N_object(i) obtained above, extract the motion perception object and establish the motion perception model. Obj_Reg(i) is defined to mark the segmented regions occupied by the moving object:
Obj_Reg(i) = { 0, if N_all(i) > ε_1 · N_object(i) and N_object(i) < ε_2 · Σ_i N_object(i);  1, otherwise }
where ε_1 and ε_2 are two thresholds that decide whether a region belongs to the moving object; ε_1 is set to 3 and ε_2 to 0.5.
According to the above inventive concept, the present invention adopts the following technical solution:
A method for extracting a motion perception model based on the spatio-temporal domain, characterized in that the concrete steps are as follows:
(1) Input video coding frames;
(2) Establish the motion model: compute the spatial and temporal entropy of the motion vectors to obtain the temporal and spatial motion models, and combine the two to obtain the initial motion model;
(3) Screen the perceived motion objects: extract the moving object that attracts the most attention by jointly analyzing the positions and occupied pixel counts of the moving objects in the motion model;
(4) Establish the spatial-domain segmentation model: obtain the spatial-domain segmentation image by applying Mean-shift and region growing to the luminance information of the video image, and build the spatial-domain segmentation model from it;
(5) Combine the motion model and the spatial-domain segmentation model with the edge decision method to obtain the final spatio-temporal motion perception model.
The establishment of the motion model in step (2), the screening of the perceived objects in step (3), the establishment of the spatial-domain segmentation model in step (4) and the edge decision of step (5) are carried out as described above.
Compared with previous motion perception models, this method can extract the moving object accurately. By combining the spatial-domain video segmentation result with the motion vector entropy, the final motion perception model represents well the objects that people attend to when watching a video.
Description of drawings
Fig. 1 is a block diagram of the principle of the spatio-temporal motion perception model extraction method of the present invention.
Fig. 2 is a block diagram of establishing the motion model in Fig. 1.
Fig. 3 is a block diagram of establishing the spatial-domain segmentation model in Fig. 1.
Fig. 4 shows the motion vector map and the motion model of a frame obtained by feeding the mother-daughter sequence into the JM10.2 reference model.
Fig. 5 shows the Mean-shift smoothed image and the spatial-domain segmentation model of a frame obtained by feeding the mother-daughter sequence into the JM10.2 reference model.
Fig. 6 shows the spatial-domain segmentation model and the motion perception model of a frame obtained by feeding the mother-daughter sequence into the JM10.2 reference model.
Fig. 7 is a block diagram of the concrete operating procedure of the method adopted by the present invention.
Embodiment
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings:
Embodiment one: the spatio-temporal motion perception model extraction method of the present invention follows the flow chart shown in Fig. 1 and is implemented by a program on a PC test platform with an Athlon X2 2.0 GHz CPU and 1024 MB of memory; Fig. 6 shows the motion perception model of a frame obtained by feeding the mother-daughter sequence into the JM10.2 reference model.
Referring to Fig. 1, the spatio-temporal motion perception model extraction method of the present invention extracts the initial motion model by analyzing the motion vectors produced in the encoding process, while the video image segmentation model is obtained from the spatial-domain luminance information. On the basis of these two models, the final motion perception model is obtained with the edge decision principle. The motion perception model obtained in this way combines spatial-domain and temporal-domain characteristics and better matches what the human eye perceives when watching a video.
Referring to Fig. 7, its concrete operating steps are:
(1) Input video frames;
(2) Establish the motion model: compute the spatial and temporal entropy of the motion vectors to obtain the temporal and spatial motion models, and combine the two to obtain the initial motion model;
(3) Screen the motion perception objects by analyzing the positions and occupied pixel counts of the moving objects in the motion model;
(4) Establish the spatial-domain segmentation model: obtain the spatial-domain segmentation image by applying Mean-shift and region growing to the luminance information of the video image;
(5) Fuse the motion model and the spatial-domain segmentation model with the edge criterion to obtain the final motion perception model.
Embodiment two: this embodiment is essentially the same as embodiment one, with the following particular features. The motion model of step (2) is established as follows:
1. The motion vectors produced during encoding are filtered with a 3 × 3 mean-filter mask;
2. Let the motion vector of macroblock (i, j) in frame n be denoted PV(i, j) = (x_{n,i,j}, y_{n,i,j}); the motion vector of every pixel in this macroblock is PV(i, j), and its direction of motion is θ_{n,i,j} = arctan(y_{n,i,j} / x_{n,i,j});
3. Compute the probability histogram distribution of the motion vector directions of the current pixel and its eight surrounding points,
P_s(n) = SH_{i,j}^{w}(n) / Σ_{l=1}^{m} SH_{i,j}^{w}(l),
where SH(·) is the histogram formed by the direction values θ_{n,i,j} of the motion vectors of the current pixel and its eight surrounding points, m is the histogram size and w denotes the N × N search window; from the resulting probability distribution, compute the spatial correlation entropy of the motion vector of each pixel,
Cs(i, j) = −Σ_{n=1}^{m} P_s(n) log(P_s(n)),
where Cs(·) is the spatial information entropy of the motion vector and P_s is the probability distribution function corresponding to the histogram SH(·);
4. Compute the probability histogram distribution of the motion vector directions of the current pixel and of the pixels at the same position in the three preceding and three following frames,
P_t(n) = TH_{i,j}^{L}(n) / Σ_{l=1}^{m} TH_{i,j}^{L}(l),
where TH(·) is the histogram formed by the motion vector direction values θ_{n,i,j} of the current pixel and of the pixels at the corresponding positions in the three preceding and three following frames, P_t is the probability distribution function corresponding to the histogram TH(·), m is the histogram size, and L is the number of related frames along the time axis; the temporal correlation entropy of the motion vector of each pixel is then computed as
Ct(i, j) = −Σ_{n=1}^{m} P_t(n) log(P_t(n)),
where Ct(·) is the temporal information entropy of the motion vector;
5. The spatial and temporal entropy maps of the motion vector directions are normalized to [0, 1], giving the spatial and temporal motion vector entropy maps C_s and C_t. C_t(i, j) and C_s(i, j) correspond to an arbitrary point of the matrices, and the two entropies are compared at each point: if the spatial value is greater than the temporal value, the current point takes the spatial value, otherwise it takes the temporal value, as in the formula below; combining the temporal and spatial information in this way gives the final spatio-temporal information entropy, where C_ts(i, j) is a point of the combined motion vector entropy map:
C_ts(i, j) = { C_t(i, j), if C_t(i, j) > C_s(i, j);  C_s(i, j), if C_s(i, j) ≥ C_t(i, j) }
6. Let the minimum spatio-temporal information entropy in the frame image be Min[f(x, y)], represented by information level 0; let the maximum be Max[f(x, y)], represented by information level l−1; R = {0, 1, ..., l−1} is the set of information levels. Define N_p (p ∈ R) as the number of pixels whose information level is p, i.e. the number of pixels with the same information entropy. For the threshold t ∈ R, the spatio-temporal information entropy corresponding to one of the levels 0 to l−1 has to be found and used as the threshold, and the information entropy is divided adaptively according to t: the information entropy of the levels below the threshold is E_B(t) = −Σ_{p=0}^{t} P_B(p) log P_B(p) with P_B(p) = N_p / Σ_{q=0}^{t} N_q; the information entropy of the levels above the threshold is E_A(t) = −Σ_{p=t+1}^{l−1} P_A(p) log P_A(p) with P_A(p) = N_p / Σ_{q=t+1}^{l−1} N_q; and the threshold is t = argmax_{t ∈ R} [E_A(t) + E_B(t)]. After the threshold t has been obtained, the moving region of the image can be found: when the spatio-temporal information entropy of a pixel is greater than the threshold t, the pixel lies in the moving region; otherwise it lies in the non-moving region.
The perceived motion objects of step (3) are screened as follows:
1. A motion object saliency coefficient α(j) is defined and used to screen out the moving object, to be segmented finally, that attracts the most attention. It is defined as follows:
α(j) = α_Location(j) · α_Motion(j);
α_Location(j) = 1 / √((x_center(j) − x_pic_center)² + (y_center(j) − y_pic_center)²);
α_Motion(j) = mean‖MV(j)‖ / mean‖MV_all‖;
where α(j) is the motion saliency coefficient of the j-th moving object obtained after the calculation; α_Location(j) is the reciprocal of the distance from the centre of the connected region of the j-th moving object to the centre of the picture, (x_center(j), y_center(j)) are the coordinates of the centre of the connected region of the j-th moving object and (x_pic_center, y_pic_center) are the coordinates of the centre of the image; α_Motion(j) is the motion saliency coefficient of the j-th moving object, whose numerator is the mean magnitude of the motion vectors of the macroblocks occupied by the j-th moving object and whose denominator is the mean magnitude of the motion vectors of all macroblocks in the current frame image;
2. After the saliency coefficient of each moving object has been obtained, the coefficients are sorted and the moving object corresponding to the largest saliency coefficient is found; this moving object is determined to be the object to be accurately segmented subsequently;
The spatial-domain segmentation model of step (4) is established as follows:
1. Extract the luminance component Y(n, i, j) of the video image and smooth Y(n, i, j) with the mean-shift algorithm. Let x be the point currently to be smoothed; its mean-shift vector m_h(x) is computed by the formula below. When ‖m_h(x) − x‖ < ε, the mean-shift computation of the current point ends, the offset value is assigned to the current pixel, and the offset computation of the next point is carried out until the mean-shift algorithm has been applied to all pixels;
m_h(x) = Σ_{i=1}^{n} G((x_i − x)/h) w(x_i) x_i / Σ_{i=1}^{n} G((x_i − x)/h) w(x_i)
2. The smoothed image produced by the mean-shift algorithm is segmented by region growing to obtain the initial segmentation map of the video image; every segmented region in this map is a connected region containing a certain number of pixels, and all segmented regions together make up the entire image;
3. The initial segmentation map is processed with the region merging algorithm to further improve the segmentation: the regions of the segmentation map whose pixel count is below the threshold T are found; according to experimental results, T is set to 50. Four coordinates (x_l, y_l), (x_r, y_r), (x_u, y_u) and (x_d, y_d) store the coordinates of the leftmost, rightmost, topmost and bottommost pixels of the current region. They are initialised to the first pixel (x_1, y_1) of the small region; each pixel of the interfering region to be merged is then traversed, its coordinates (x_i, y_i) are recorded, and the four coordinates are updated according to the formula below. After all pixels of the interfering region have been traversed, the four coordinates correspond to the target merging regions in the four directions.
x_l = x_i, y_l = y_i if x_i < x_l;  x_r = x_i, y_r = y_i if x_i > x_r;  x_u = x_i, y_u = y_i if y_i < y_u;  x_d = x_i, y_d = y_i if y_i > y_d
After the starting coordinates of the four target merging regions have been determined, the luminance differences between the four target regions L_Region(1), L_Region(2), L_Region(3) and L_Region(4) and the region to be merged are computed; the label and mean value of the region with the smallest luminance difference are finally assigned to the region to be merged.
The edge criterion of step (5) obtains the motion perception model as follows:
1. Locate, by region growing, the position of the moving object obtained in the screening step;
2. Find the region occupied by this moving object in the spatial-domain segmentation image, and count the number N_all(i) of image pixels occupied by region i in the patch image and the number N_object(i) of pixels of region i covered by the salient connected region;
3. From the smoothed patch map, the map of the most salient moving object region and the two statistics N_all(i) and N_object(i), extract the salient moving object accurately and obtain the final motion perception model. Obj_Reg(i) is defined to mark the segmented regions occupied by the moving object, as follows:
Obj_Reg(i) = { 0, if N_all(i) > ε_1 · N_object(i) and N_object(i) < ε_2 · Σ_i N_object(i);  1, otherwise }
where ε_1 and ε_2 are two thresholds that decide whether a region belongs to the moving object; ε_1 is set to 3 and ε_2 to 0.5;
An example is given below with a 352 × 288 CIF input video, where standard test sequences are encoded with the H.264 encoder of the JM10.2 reference software. The H.264 encoder is configured as follows: Baseline profile, IPPP structure with one I frame inserted every 15 frames, 1 reference frame, bandwidth set to 256 kbps, frame rate set to 30 fps, and initial quantization parameter set to 32.
The typical standard test sequence mother-daughter is used as the input video. Fig. 4 shows the motion model obtained by analyzing the motion vectors; it can be seen from the figure that this motion model only reflects the rough position of the moving objects and cannot extract them completely. Fig. 5 shows the segmentation model established from the spatial-domain luminance analysis; this model preserves the object edges well and segments the objects completely. Combining the two models therefore yields the motion perception model of Fig. 6, which completely segments the moving object that attracts the most attention of the human eye.

Claims (5)

1. A method for extracting a motion perception model based on the spatio-temporal domain, characterized in that the concrete steps are as follows:
(1) Input video coding frames;
(2) Establish the motion model: compute the spatial and temporal entropy of the motion vectors to obtain the temporal and spatial motion models, and combine the two to obtain the initial motion model;
(3) Screen the perceived motion objects: extract the moving object that attracts the most attention by jointly analyzing the positions and occupied pixel counts of the moving objects in the motion model;
(4) Establish the spatial-domain segmentation model: obtain the spatial-domain segmentation image by applying Mean-shift and region growing to the luminance information of the video image, and build the spatial-domain segmentation model from it;
(5) Combine the motion model and the spatial-domain segmentation model with the edge decision method to obtain the final spatio-temporal motion perception model.
2. The method for extracting a motion perception model based on the spatio-temporal domain according to claim 1, characterized in that establishing the motion model in said step (2) is realized by the following steps:
1. The motion vectors produced during encoding are processed with a 3 × 3 mean-filter mask;
2. Let the motion vector of macroblock (i, j) in frame n be PV(i, j) = (x_{n,i,j}, y_{n,i,j}); the motion vector of this macroblock is taken as the motion vector of every pixel in the macroblock, where x_{n,i,j} is the x component and y_{n,i,j} the y component of the motion vector, and the direction of motion of this vector is expressed as
θ_{n,i,j} = arctan(y_{n,i,j} / x_{n,i,j})
3. Compute the probability histogram distribution of the motion vector directions of the current pixel and its eight surrounding points,
P_s(n) = SH_{i,j}^{w}(n) / Σ_{l=1}^{m} SH_{i,j}^{w}(l),
where SH(·) is the histogram formed by the direction values θ_{n,i,j} of the motion vectors of the current pixel and its eight surrounding points, m is the histogram size, w denotes the N × N search window, n indexes the histogram entry of the current motion vector and l enumerates the entries in the summation; from the resulting probability distribution, compute the spatial correlation entropy of the motion vector of each pixel,
Cs(i, j) = −Σ_{n=1}^{m} P_s(n) log(P_s(n)),
where Cs(·) is the spatial information entropy of the motion vector and P_s is the probability distribution function corresponding to the histogram SH(·);
4. Compute the probability histogram distribution of the motion vector directions of the current pixel and of the pixels at the same position in the three preceding and three following frames,
P_t(n) = TH_{i,j}^{L}(n) / Σ_{l=1}^{m} TH_{i,j}^{L}(l),
where TH(·) is the histogram formed by the motion vector direction values θ_{n,i,j} of the current pixel and of the pixels at the corresponding positions in the three preceding and three following frames, P_t is the probability distribution function corresponding to the histogram TH(·), m is the histogram size, L is the number of related frames along the time axis, n indexes the histogram entry of the current motion vector and l enumerates the entries in the summation; the temporal correlation entropy of the motion vector of each pixel is then computed as
Ct(i, j) = −Σ_{n=1}^{m} P_t(n) log(P_t(n)),
where Ct(·) is the temporal information entropy of the motion vector;
5. Normalize the spatial and temporal entropy maps of a frame image to [0, 1] to obtain the spatial and temporal motion vector entropy maps C_s and C_t; C_t(i, j) and C_s(i, j) correspond to an arbitrary point of the matrices, and the two entropies are compared at each point: if the spatial value is greater than the temporal value, the spatial entropy is taken as the final decision, otherwise the temporal entropy is taken; the temporal and spatial information are combined according to formula (1) below to obtain the final spatio-temporal information entropy, where C_ts(i, j) is a point of the combined motion vector entropy map:
C_ts(i, j) = { C_t(i, j), if C_t(i, j) > C_s(i, j);  C_s(i, j), if C_s(i, j) ≥ C_t(i, j) }   (1)
6. In a frame image, let the minimum spatio-temporal information entropy be Min[C(i, j)], represented by information level 0, and the maximum be Max[C(i, j)], represented by information level l−1; R = {0, 1, ..., l−1} is the set of information levels; define N_p (p ∈ R) as the number of pixels whose information level is p, i.e. the number of pixels with the same information entropy, the levels being enumerated in the summations below; for the threshold t ∈ R, the spatio-temporal information entropy corresponding to one of the levels 0 to l−1 has to be found and used as the threshold t, and an adaptive division is made according to t: the information entropy of the levels below the threshold is
E_B(t) = −Σ_{p=0}^{t} P_B(p) log P_B(p), with P_B(p) = N_p / Σ_{q=0}^{t} N_q,
the information entropy of the levels above the threshold is
E_A(t) = −Σ_{p=t+1}^{l−1} P_A(p) log P_A(p), with P_A(p) = N_p / Σ_{q=t+1}^{l−1} N_q,
and the threshold is
t = argmax_{t ∈ R} [E_A(t) + E_B(t)],
where argmax denotes the value of t at which the sum of the above-threshold information entropy E_A and the below-threshold information entropy E_B reaches its maximum; after the information entropy threshold t has been found, the moving region can be divided by the value of t: when the spatio-temporal information entropy of a pixel is greater than the threshold t, the pixel lies in the moving region, otherwise it lies in the non-moving region.
3. perceive motion object screening technique according to claim 1, it is characterized in that the screening perceive motion is to liking in the described step (3): define a remarkable coefficient of motion object, as following formula (2)~formula (5), utilize these coefficients to filter out the motion object that human eye is the most paid close attention to; After obtaining the conspicuousness coefficient of each motion object, we sort to these coefficients, find out the motion object of conspicuousness coefficient maximum; Determine the motion object that this motion object is a subsequent treatment;
α(j)=α Location(j)*α NRate(j)*α Motion(j) (2)
&alpha; Location ( j ) = 1 ( x center ( j ) - x pic _ center ) 2 + ( y center ( j ) - y pic _ center ) 2 - - - ( 3 )
&alpha; NRate ( j ) = &alpha; num ( j ) N all - - - ( 4 )
&alpha; Motion ( j ) = | | MV ( j ) | | &OverBar; | | MV all | | &OverBar; - - - ( 5 )
α in the formula (j) is j motion motion of objects conspicuousness coefficient by obtaining after calculating, α Location(j) be the inverse of j motion object centers, (x to the distance of picture centre Center(j), y Center(j)) be the center position coordinates in the UNICOM zone of j motion object.(x Pic_center, y Pic_center) be the coordinate of the central point of image; α Num(j) be the number of the shared image slices vegetarian refreshments in j motion object UNICOM zone, N AllIt is the sum of pixel in the two field picture.α NRate(j) be the ratio that j motion object pixel number accounts for total pixel number; α Motion(j) be j the remarkable coefficient of motion motion of objects, molecule
Figure FSA00000091733100034
Be the mean value of mould of the motion vector of j the shared macro block of motion object, denominator
Figure FSA00000091733100035
Mean value for the motion vector mould of all macro blocks in current this two field picture.
4. The method for extracting a motion perception model based on the spatio-temporal domain according to claim 1, characterized in that the spatial-domain segmentation model in said step (4) is established by the following steps:
1. Extract the luminance component Y(n, i, j) of the video image and smooth each luminance component with the mean-shift algorithm; let x be the point to be smoothed, and compute its mean-shift vector m_h(x) by formula (6) below; when ‖m_h(x) − x‖ < ε, the mean-shift computation of the current point ends and the resulting offset value is assigned to the current pixel; the offset computation of the next point then proceeds until the mean-shift algorithm has been applied to all pixels; ε is the threshold that decides whether to continue shifting and is determined experimentally, and m_h(x) is the offset vector obtained in each mean computation;
m_h(x) = Σ_{i=1}^{n} G((x_i − x)/h) w(x_i) x_i / Σ_{i=1}^{n} G((x_i − x)/h) w(x_i)   (6)
In the above formula G(·) is the kernel function; a Gaussian function is set as the kernel in this experiment because it is cheap to compute, and w(x_i) is the weight of each sample point, set to 1 here;
2. The image processed by the mean-shift algorithm is segmented by region growing to obtain the initial segmentation map of the video image; every segmented region in this map is a connected region containing a certain number of pixels, and the set of all the independent segmented regions forms the whole video image;
3. The initial segmentation map is processed with the region merging algorithm to further improve the segmentation: the segmented regions containing fewer pixels than the threshold T are found in the segmentation map; according to experimental results, T is set to 50; four coordinates (x_l, y_l), (x_r, y_r), (x_u, y_u) and (x_d, y_d) store the coordinates of the leftmost, rightmost, topmost and bottommost pixels of the current region, and are initialised to the first pixel (x_1, y_1) of the sub-region; each pixel of the interfering sub-region to be merged is then traversed, its coordinates (x_i, y_i) are recorded, and the four coordinates are updated according to formula (7) below; after all pixels of the interfering sub-region have been traversed, the four coordinates correspond to the target merging regions in the four directions:
x_l = x_i, y_l = y_i if x_i < x_l;  x_r = x_i, y_r = y_i if x_i > x_r;  x_u = x_i, y_u = y_i if y_i < y_u;  x_d = x_i, y_d = y_i if y_i > y_d   (7)
After the starting coordinates of the four target merging regions have been determined, the luminance differences between the four target regions L_Region(1), L_Region(2), L_Region(3) and L_Region(4) and the interfering region to be merged are computed; the label and mean value of the region with the smallest luminance difference are finally assigned to the region to be merged.
5. The method for extracting a motion perception model based on the spatio-temporal domain according to claim 1, characterized in that the spatio-temporal motion perception model in said step (5) is further realized by the following steps:
1. Locate, by region growing, the position of the moving object with the strongest saliency obtained in step (3);
2. Find the region occupied by this moving object in the spatial-domain segmentation image, and count the number N_all(i) of image pixels occupied by region i in the patch image and the number N_object(i) of pixels of region i covered by the salient connected region;
3. From the smoothed patch map, the map of the most salient moving object region and the two statistics N_all(i) and N_object(i) obtained above, extract the salient moving object accurately and complete the final segmentation of the moving object; Obj_Reg(i) is defined to indicate the segmented regions occupied by the motion perception object, as in formula (8):
Obj_Reg(i) = { 0, if N_all(i) > ε_1 · N_object(i) and N_object(i) < ε_2 · Σ_i N_object(i);  1, otherwise }   (8)
where ε_1 and ε_2 are two thresholds that decide whether a region belongs to the moving object; ε_1 is set to 3 and ε_2 to 0.5; once the motion perception map has been obtained, the establishment of the whole motion perception model is complete.
CN 201010152494 2010-04-20 2010-04-20 Movement perception model extraction method based on time-space domain Pending CN101853510A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010152494 CN101853510A (en) 2010-04-20 2010-04-20 Movement perception model extraction method based on time-space domain

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010152494 CN101853510A (en) 2010-04-20 2010-04-20 Movement perception model extraction method based on time-space domain

Publications (1)

Publication Number Publication Date
CN101853510A true CN101853510A (en) 2010-10-06

Family

ID=42804975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010152494 Pending CN101853510A (en) 2010-04-20 2010-04-20 Movement perception model extraction method based on time-space domain

Country Status (1)

Country Link
CN (1) CN101853510A (en)



Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463183A (en) * 2013-09-13 2015-03-25 株式会社理光 Cluster center selecting method and system
CN104463183B (en) * 2013-09-13 2017-10-10 株式会社理光 Cluster centre choosing method and system
CN106340031A (en) * 2015-07-07 2017-01-18 香港生产力促进局 Method and device for detecting moving object
CN105451023A (en) * 2015-11-20 2016-03-30 南京杰迈视讯科技有限公司 Motion sensing video storage system and method
CN105451023B (en) * 2015-11-20 2018-10-02 南京杰迈视讯科技有限公司 A kind of Video Storage System and method of motion perception
CN105868450A (en) * 2016-03-25 2016-08-17 郑州轻工业学院 A method for acquiring the air gap multi-substance domain heat conductivity coefficient in motor rotor decentration
CN105868450B (en) * 2016-03-25 2018-09-07 郑州轻工业学院 The acquisition methods of the more substance domain thermal coefficients of air gap when a kind of rotor bias
CN109145914A (en) * 2018-07-23 2019-01-04 辽宁工程技术大学 A kind of conspicuousness object detection method based on supercomplex Fourier transformation and mean shift
CN109064481A (en) * 2018-07-24 2018-12-21 中山新诺科技股份有限公司 A kind of machine vision localization method
CN109064481B (en) * 2018-07-24 2021-09-17 中山新诺科技股份有限公司 Machine vision positioning method
CN110072103A (en) * 2019-03-15 2019-07-30 西安电子科技大学 Video Fast Compression method, HD video system, 4K video system based on ROI
CN112582063A (en) * 2019-09-30 2021-03-30 长沙昱旻信息科技有限公司 BMI prediction method, device, system, computer storage medium, and electronic apparatus


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20101006