CN102917223B - Dynamic background video object extraction based on enhanced diamond search and three-frame background alignment


Info

Publication number: CN102917223B
Application number: CN201210398149.0A
Authority: CN (China)
Prior art keywords: frame, block, point, sigma, macro block
Legal status: Active
Other versions: CN102917223A (in Chinese)
Inventors: 祝世平, 郭智超
Current assignee: Shenzhen Xiaolajiao Technology Co ltd
Original assignee: Beihang University
Application filed by Beihang University; priority to CN201210398149.0A
Publication of application CN102917223A; application granted; publication of grant CN102917223B
Abstract

The invention discloses a video object extraction method for dynamic backgrounds based on enhanced diamond motion estimation and three-frame background alignment. The method comprises the following steps: first, the reference frame K and frames K-1 and K+1 are divided into 8×8 macro blocks, and all macro blocks are screened according to a macro-block pre-judgment rule; block matching is performed on the retained macro blocks using an enhanced diamond motion estimation method to obtain the motion vector fields of frame K-1 and frame K+1 relative to frame K, and the global motion parameters are computed by the least squares method; motion compensation is then applied to frames K-1 and K+1 so that their backgrounds align with that of frame K, yielding the reconstructed frames K-1' and K+1'; edge information is extracted from the reconstructed frames K-1' and K+1' and the reference frame K with the Prewitt operator, the frame difference of each edge map relative to the reference-frame edges is computed, and binarization is performed with the maximum-variance threshold; finally, post-processing such as morphology and median filtering is applied, achieving fast and effective segmentation of the video object under a dynamic background.

Description

Dynamic background video object extraction based on enhanced diamond search and three-frame background alignment
Technical field:
The present invention relates to a processing method in video segmentation, in particular to a video object extraction method for dynamic backgrounds based on enhanced diamond motion estimation and three-frame background alignment.
Background technology:
For the extraction of moving objects from dynamic video sequences, the global motion produced by the camera makes segmentation methods designed for static backgrounds, such as frame differencing or background subtraction, unsuitable: they cannot extract the moving object accurately. For segmentation under a dynamic background, the influence of the camera-induced global motion must therefore be eliminated first. Global motion estimation and compensation convert the problem into one of segmentation under a static background, after which the many mature static-background methods can achieve accurate and effective segmentation under a dynamic background.
Global motion estimation refers to estimating the motion of the background region caused by camera movement, i.e. solving for the parameters of the corresponding mathematical motion model. Global motion compensation then applies, according to the estimated global motion parameters, a mapping transformation that aligns the background of the current frame with that of the previous frame. After accurate compensation, methods such as frame differencing or background subtraction can eliminate the background region and highlight the foreground regions of interest that exhibit local motion (see Yang Wenming. Video object segmentation with spatio-temporal fusion [D]. Zhejiang: Zhejiang University, 2006).
Considerable research has been devoted to the motion segmentation problem under dynamic backgrounds. One approach uses an improved watershed algorithm to divide the motion-compensated video frames into different grey-level regions, obtains the motion information of the sequence by optical flow computation, and finally combines the motion information with the segmented regions under a certain criterion to obtain the object template, achieving accurate localisation of the video object (see Zhang Qingli. A video object segmentation algorithm for moving backgrounds. Journal of Shanghai University (Natural Science Edition), 2005, 11(2): 111-115). Another establishes a four-parameter affine model to describe the global motion, estimates its parameters by block matching, detects the moving target with the Horn-Schunck algorithm, and tracks information such as the target centroid with a Kalman filter, achieving detection and tracking of moving objects in dynamic scenes (see Shi Jiadong. Moving object detection and tracking in dynamic scenes. Journal of Beijing Institute of Technology, 2009, 29(10): 858-876). A third adopts non-parametric kernel density estimation: a matching-weighted global motion estimation and compensation algorithm first removes the influence of background motion in the dynamic scene, then the probability density of each pixel belonging to foreground and background is estimated and combined with morphological processing, achieving accurate and effective segmentation of moving objects under dynamic backgrounds (see Ma Zhiqiang. A new motion segmentation algorithm for dynamic scenes. Computer Engineering and Science, 2012, 34(4): 43-46).
To solve the segmentation problem under dynamic backgrounds, the present method implements global motion estimation and compensation using macro-block pre-judgment, block matching, a six-parameter camera affine model and the least squares method, and achieves dynamic background segmentation via three-frame background alignment combined with edge information. Experiments show that the method extracts video objects from dynamic-background video sequences with significantly improved accuracy.
Summary of the invention:
The technical problems to be solved by the present invention are: how to reduce the running time of block matching, and how to achieve accurate extraction of video objects under a dynamic background.
The technical solution adopted by the present invention is a video object extraction method for dynamic backgrounds based on enhanced diamond motion estimation and three-frame background alignment, comprising the following steps:
(1) The reference frame K and frames K-1 and K+1 are divided into 8×8 macro blocks, which are pre-judged according to their texture information; all macro blocks in frames K-1 and K+1 are screened;
(2) The SAD criterion and the enhanced diamond search strategy are applied to the screened macro blocks for block matching: with frame K-1 as the current frame and frame K as the reference frame, the motion vector field of frame K-1 relative to frame K is obtained; with frame K+1 as the current frame and frame K as the reference frame, the motion vector field of frame K+1 relative to frame K is obtained; the global motion parameters are then computed by least squares, yielding the six-parameter camera model;
(3) Motion compensation is applied to frame K-1 to align its background with that of frame K, yielding the reconstructed frame K-1'; motion compensation is likewise applied to frame K+1, yielding the reconstructed frame K+1';
(4) The Prewitt operator is applied to extract edge information from the reconstructed frames and the reference frame; the frame difference of each edge map relative to the reference-frame edges is computed, and binarization is performed with the maximum-variance threshold;
(5) Post-processing with AND operations, morphology and median filtering is applied, achieving fast and effective segmentation of the video object under a dynamic background.
The pre-judgment and screening of the 8×8 macro blocks of the current frames K-1 and K+1 in step (1) proceeds as follows:
When the least squares method is later applied to compute the global motion parameters, many macro blocks with large errors are simply discarded. If those blocks can be rejected before the least squares computation, the computation speed improves significantly and the workload is reduced. The key factor determining a macro block's error, and hence the accuracy of the computation, is its texture information, i.e. its gradient information. The pre-judgment and screening method proposed here therefore starts from the gradient information of each macro block and screens or retains blocks according to a set threshold: a block whose information content falls below the threshold is screened out and does not participate in the subsequent block matching; a block whose information content exceeds the threshold is retained as a valid feature block for the subsequent motion estimation.
The main steps are as follows:
The first step: each frame is divided into 8×8 sub-blocks. Experiments show that 16×16 sub-blocks make the computation excessive, while 4×4 sub-blocks make block matching insufficiently accurate; the 8×8 form is therefore adopted.
The second step: the Sobel operator is applied to obtain the gradient map of each frame, and the gradient information serves as the basis for rejecting macro blocks:
$$|\nabla f(x,y)| = \operatorname{mag}(\nabla f(x,y)) = \sqrt{G_x^2 + G_y^2}$$
where $|\nabla f(x,y)|$ is the gradient magnitude at the point and $G_x$, $G_y$ are the partial derivatives in the x and y directions.
The third step: the gradient amount of each macro block is computed. For an 8×8 sub-block the gradient information amount is:
$$|\nabla f(x,y)|_{8\times 8} = \sum_{i=1}^{8}\sum_{j=1}^{8} |\nabla f(x,y)|$$
The fourth step: the pre-judgment threshold is determined. In general 40% of all macro blocks are retained: the gradient amounts of all blocks are sorted, and the optimal threshold T for retaining 40% of them is determined accordingly.
The fifth step: the screening is completed. If a block's gradient information amount is greater than T, the block is retained as a valid feature block for the subsequent motion estimation; if it is less than T, the block is screened out and does not participate in the subsequent block matching.
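The five screening steps above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patent's implementation: the frame is a plain list-of-lists grey image, the Sobel magnitude is computed only at interior pixels, and the 40% retention ratio follows the text.

```python
# Macro-block pre-judgment sketch: per-pixel Sobel gradient magnitude,
# summed over each 8x8 block; blocks above the 40%-retention threshold T
# are kept as valid feature blocks.

def sobel_magnitude(frame):
    """Approximate |grad f| = sqrt(Gx^2 + Gy^2) at interior pixels."""
    h, w = len(frame), len(frame[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (frame[y-1][x+1] + 2*frame[y][x+1] + frame[y+1][x+1]
                  - frame[y-1][x-1] - 2*frame[y][x-1] - frame[y+1][x-1])
            gy = (frame[y+1][x-1] + 2*frame[y+1][x] + frame[y+1][x+1]
                  - frame[y-1][x-1] - 2*frame[y-1][x] - frame[y-1][x+1])
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag

def screen_blocks(frame, block=8, keep_ratio=0.4):
    """Return (block_row, block_col) indices of blocks whose summed
    gradient reaches the threshold T chosen to retain ~keep_ratio."""
    mag = sobel_magnitude(frame)
    h, w = len(frame), len(frame[0])
    sums = {}
    for by in range(h // block):
        for bx in range(w // block):
            sums[(by, bx)] = sum(mag[by*block + i][bx*block + j]
                                 for i in range(block) for j in range(block))
    ranked = sorted(sums.values(), reverse=True)
    t = ranked[max(0, int(len(ranked) * keep_ratio) - 1)]  # threshold T
    return [pos for pos, s in sums.items() if s >= t]
```

With a 16×16 test image whose top-left 8×8 block is textured, only that block survives the screening.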
In step (2), frames K-1 and K+1 are taken in turn as the current frame with frame K as the reference frame; block matching is performed on the screened macro blocks using the SAD criterion and the enhanced diamond search strategy, and the least squares method is applied to the motion vector field obtained from block matching to derive the six-parameter camera model. The concrete steps are as follows:
(i) The SAD block matching criterion
This part adopts the SAD block matching criterion, which not only finds the optimal match point but is also computationally cheap and fast:
$$\mathrm{SAD}(i,j) = \sum_{m=1}^{M}\sum_{n=1}^{N} |f_k(m,n) - f_{k-1}(m+i, n+j)|$$
where (i, j) is the displacement, $f_k$ and $f_{k-1}$ are the grey values of the current and previous frames respectively, and M×N is the macro block size. If SAD(i, j) reaches its minimum at some point, that point is the optimal match point sought.
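The SAD criterion can be sketched directly from the formula. This is a minimal illustration, assuming list-of-lists grey frames and an exhaustive search window for clarity; the patent itself uses the enhanced diamond strategy rather than full search.

```python
# SAD(i, j) = sum_{m,n} |f_k(m, n) - f_{k-1}(m + i, n + j)| for an
# 8x8 block; best_match scans a small displacement window and returns
# the displacement with the minimum SAD (the optimal match point).

def sad(cur, ref, bx, by, i, j, size=8):
    """SAD between the block of `cur` at (bx, by) and the block of
    `ref` displaced by (i, j)."""
    total = 0
    for m in range(size):
        for n in range(size):
            total += abs(cur[by + m][bx + n] - ref[by + m + j][bx + n + i])
    return total

def best_match(cur, ref, bx, by, search=2, size=8):
    """Exhaustive search over [-search, search]^2 (illustration only)."""
    best = None
    for j in range(-search, search + 1):
        for i in range(-search, search + 1):
            err = sad(cur, ref, bx, by, i, j, size)
            if best is None or err < best[0]:
                best = (err, i, j)
    return best  # (minimum SAD, i, j)
```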
(ii) The enhanced diamond search strategy
Fig. 2(a) shows the error surface of integer-pixel motion estimation. Because the search range is large and the video content complex, this error surface is non-monotonic, so integer-pixel motion estimation is easily trapped in local minima. By contrast, fractional pixels are obtained by interpolating integer pixels, and within a fractional-pixel search window the correlation between search points is far higher than between integer-pixel search points. The fractional-pixel error surface of most video sequences has the distribution of Fig. 2(b): when the search point is near the global minimum, the matching error decreases monotonically. Many fast fractional-pixel motion vector search methods therefore adopt the predicted motion vector (FMVP: fractional predicted MV) as the initial search point. If the initial point of the fractional motion vector search can be predicted accurately, the best MV near the FMVP can be found earlier and the search stopped in time.
Three templates are commonly used for motion vector search: the diamond template, the square template and the hexagon template. The diamond template is the simplest and is adopted by many video encoders, as in Fig. 3(a). The square template adds four diagonal points to the diamond template, increasing both computational complexity and accuracy, as in Fig. 3(b). The hexagon template suits larger search ranges; since the fractional motion vector search range is limited to within two integer pixels, it makes the search needlessly complex and is less suitable here, as in Fig. 3(c).
Based on the above analysis, an enhanced diamond template search strategy based on the predicted vector is proposed. Because the FMVP matches the best MV with high probability, the method does not consider the initial search centre (0, 0) but takes the FMVP directly as the initial search point; it adopts an enhanced diamond template (EDSP: extended diamond search pattern) that incorporates the higher accuracy of the square template by adding the diagonal search points to the diamond template; and it performs no diamond-template iteration, stopping the search within the [-2, 2] range of the FMVP and omitting the few searches outside [-2, 2] that contribute little to coding efficiency, thereby further reducing the number of search points and the computation.
Fig. 4 illustrates the enhanced diamond template search strategy based on the predicted vector; the flow is as follows:
The first step: the fractional motion vector of the current block is predicted from its neighbouring blocks, giving the FMVP (Pred_x, Pred_y), which is taken directly as the initial search point.
The second step: the matching errors of the four diamond search points around (Pred_x, Pred_y) are compared with that of (Pred_x, Pred_y) itself. If the minimum matching error RMS lies at (Pred_x, Pred_y), the fractional motion vector search stops; otherwise the third step is carried out.
The third step: as in Fig. 4(a), if the best and second-best match points are opposite each other, the best match point's MV is taken as the final fractional motion vector; as in Fig. 4(b), if they are adjacent, the matching error of the adjacent square-template point is computed, and if the minimum RMS is still at the diamond best match point, its MV is taken as the final fractional motion vector, otherwise the next step is carried out.
The fourth step: centred on the square-template point from the third step, the surrounding points are searched with the diamond template, and the point with the minimum RMS is taken as the final fractional motion vector.
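The four-step flow above can be condensed into a small sketch. This is our interpretation under stated assumptions, not the patent's code: `err(mv)` stands for any matching-error callable (e.g. SAD), `fmvp` is the predicted vector (Pred_x, Pred_y), and the tie-breaking order and the exact construction of the "square point between" two adjacent diamond points are assumptions.

```python
# Hedged sketch of the predictive-vector enhanced diamond search (EDSP):
# start at the FMVP, probe the 4 diamond neighbours, then handle the
# opposite / adjacent cases, with one optional diamond refinement.

DIAMOND = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def edsp_search(err, fmvp):
    px, py = fmvp
    e0 = err((px, py))
    ring = sorted(((err((px + dx, py + dy)), (dx, dy)) for dx, dy in DIAMOND),
                  key=lambda c: c[0])
    # Step 2: minimum at the start point -> stop immediately.
    if e0 <= ring[0][0]:
        return (px, py)
    (e1, v1), (e2, v2) = ring[0], ring[1]
    # Step 3, opposite case: accept the best diamond point directly.
    if (v1[0] + v2[0], v1[1] + v2[1]) == (0, 0):
        return (px + v1[0], py + v1[1])
    # Step 3, adjacent case: probe the square (diagonal) point between them.
    diag = (px + v1[0] + v2[0], py + v1[1] + v2[1])
    ed = err(diag)
    if ed >= e1:
        return (px + v1[0], py + v1[1])
    # Step 4: one diamond refinement centred on the better square point.
    cands = [(ed, diag)] + [(err((diag[0] + dx, diag[1] + dy)),
                             (diag[0] + dx, diag[1] + dy))
                            for dx, dy in DIAMOND]
    return min(cands, key=lambda c: c[0])[1]
```

Because no diamond iteration is performed, the result always stays within a small neighbourhood of the FMVP, matching the [-2, 2] restriction described above.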
(iii) Obtaining the six-parameter camera model by least squares
Sub-blocks on both sides of the current frames K-1 and K+1 obtained in step (i) serve as feature blocks. After the motion vectors obtained through steps (i) and (ii) are substituted into the six-parameter camera model below, the parameters $m_0$, $m_1$, $m_2$, $n_0$, $n_1$, $n_2$ are estimated by least squares. The six-parameter affine transformation model, which can model translation, rotation and scaling, is defined as follows:
$$x' = m_0 + m_1 x + m_2 y, \qquad y' = n_0 + n_1 x + n_2 y$$
where $m_0$ and $n_0$ represent the translation magnitudes of a pixel in the x and y directions respectively, and $m_1$, $n_1$, $m_2$, $n_2$ describe scaling and rotation.
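The least squares fit of the six-parameter model separates into two independent three-parameter problems, one for x' and one for y'. The sketch below illustrates this with a generic normal-equation solver; the patent does not prescribe a particular solver, and the point list stands in for the block-centre correspondences produced by block matching.

```python
# Fit x' = m0 + m1*x + m2*y and y' = n0 + n1*x + n2*y to point
# correspondences by least squares via the normal equations
# (A^T A) p = A^T b with design rows [1, x, y].

def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(3):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[r][3] / m[r][r] for r in range(3)]

def fit_affine(points, mapped):
    """points: [(x, y)], mapped: [(x', y')];
    returns ((m0, m1, m2), (n0, n1, n2))."""
    ata = [[0.0] * 3 for _ in range(3)]
    atbx, atby = [0.0] * 3, [0.0] * 3
    for (x, y), (xp, yp) in zip(points, mapped):
        row = (1.0, x, y)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atbx[i] += row[i] * xp
            atby[i] += row[i] * yp
    return solve3(ata, atbx), solve3(ata, atby)
```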
Step (3) obtains the reconstructed frames K-1' and K+1' from the current frames K-1 and K+1 by motion compensation; the details are as follows:
For each point of the current frames K-1 and K+1, its corresponding position in the reference frame K is computed according to the camera model obtained above and the grey value is assigned there, achieving global motion compensation for frames K-1 and K+1. The backgrounds of the compensated reconstructed frames K-1' and K+1' are thereby aligned with that of the reference frame K, enabling the subsequent video segmentation under a dynamic background based on enhanced diamond motion estimation and three-frame background alignment, combining edge information with the adaptive maximum-variance threshold.
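The per-pixel compensation step can be sketched as a forward warp through the fitted model. Nearest-neighbour rounding and the unfilled-pixel default of 0 are our assumptions; the patent only states that each point is mapped and assigned.

```python
# Global motion compensation sketch: map every pixel of the current
# frame through x' = m0 + m1*x + m2*y, y' = n0 + n1*x + n2*y and write
# its grey value at the aligned position in the reconstructed frame.

def compensate(frame, m, n):
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xp = int(round(m[0] + m[1] * x + m[2] * y))
            yp = int(round(n[0] + n[1] * x + n[2] * y))
            if 0 <= xp < w and 0 <= yp < h:  # drop points leaving the frame
                out[yp][xp] = frame[y][x]
    return out
```

For a pure translation, e.g. m = (1, 1, 0) and n = (0, 0, 1), the frame simply shifts one pixel to the right.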
In step (4), the Prewitt operator extracts the edge information, each edge map is differenced against the reference frame K edges, and binarization is performed with the maximum-variance threshold. The concrete steps are as follows:
(i) Prewitt edge extraction and differencing against the reference frame K edges
Many kinds of edge detection operators exist; here the Prewitt edge detection operator is selected to extract the edge features of the reconstructed frames K-1' and K+1' and the reference frame K.
The Prewitt operator is implemented by mask convolution:
$$f_s(x,y) = |f(x,y) * G_x| + |f(x,y) * G_y|$$
where:
$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}, \qquad G_y = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix}$$
Applying the Prewitt operator to the reconstructed frames K-1' and K+1' and to frame K yields the edge maps $f_{k-1'}(x,y)$, $f_{k+1'}(x,y)$ and $f_k(x,y)$.
Image difference operations are performed between the edges of reconstructed frame K-1' and frame K, and between the edges of reconstructed frame K+1' and frame K, giving the frame differences $d_1$ and $d_2$:
$$d_1 = |f_{k-1'}(x,y) - f_k(x,y)|, \qquad d_2 = |f_{k+1'}(x,y) - f_k(x,y)|$$
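The Prewitt convolution and the edge frame difference above can be sketched directly from the masks. An illustrative sketch on list-of-lists grey images; border pixels are left at zero, which is an assumption of this sketch rather than something the text specifies.

```python
# f_s(x, y) = |f * Gx| + |f * Gy| with the 3x3 Prewitt masks, followed
# by the pixel-wise edge frame difference d = |edge(a) - edge(b)|.

PREWITT_GX = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_GY = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]

def prewitt(frame):
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(PREWITT_GX[i][j] * frame[y - 1 + i][x - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(PREWITT_GY[i][j] * frame[y - 1 + i][x - 1 + j]
                     for i in range(3) for j in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out

def edge_frame_diff(a, b):
    """d = |edge(a) - edge(b)|, computed pixel by pixel."""
    ea, eb = prewitt(a), prewitt(b)
    return [[abs(p - q) for p, q in zip(ra, rb)] for ra, rb in zip(ea, eb)]
```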
(ii) Binarization with the maximum-variance threshold
The maximum-variance threshold is an adaptive thresholding method: it divides the image histogram into two groups at an optimal threshold, the threshold being decided where the variance between the two groups is maximal. This method is adopted here to binarize the edge image difference results.
Let the grey values of an image occupy levels 0 to m-1 and let $n_i$ be the number of pixels with grey value i; then the total pixel count is
$$N = \sum_{i=0}^{m-1} n_i$$
and the probability of each value is
$$p_i = n_i / N$$
Let the optimal threshold be T, dividing the pixels into two groups $C_0 = \{0, \dots, T-1\}$ and $C_1 = \{T, \dots, m-1\}$. The probabilities and mean values of $C_0$ and $C_1$ are given by:
the probability of $C_0$: $w_0 = \sum_{i=0}^{T-1} p_i = w(T)$
the probability of $C_1$: $w_1 = \sum_{i=T}^{m-1} p_i = 1 - w_0$
the mean of $C_0$: $\mu_0 = \sum_{i=0}^{T-1} i p_i / w_0 = \mu(T)/w(T)$
the mean of $C_1$: $\mu_1 = \sum_{i=T}^{m-1} i p_i / w_1 = (\mu - \mu(T))/(1 - w(T))$
where $\mu = \sum_{i=0}^{m-1} i p_i$ and $\mu(T) = \sum_{i=0}^{T-1} i p_i$.
The mean grey value over all samples is then $\mu = w_0 \mu_0 + w_1 \mu_1$.
The variance between the two groups is:
$$\delta^2(T) = w_0(\mu_0 - \mu)^2 + w_1(\mu_1 - \mu)^2 = w_0 w_1 (\mu_1 - \mu_0)^2 = \frac{[\mu\, w(T) - \mu(T)]^2}{w(T)[1 - w(T)]}$$
The T in 1 to m-1 that maximises this expression is the optimal threshold.
Binarization of the edge detection results is then carried out with the obtained optimal threshold T.
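The maximum-variance (Otsu) threshold derived above can be sketched compactly using the final closed form $\delta^2(T) = [\mu\, w(T) - \mu(T)]^2 / (w(T)[1 - w(T)])$. A minimal sketch, assuming a flat list of integer grey values in 0 to 255:

```python
# Scan all candidate T, accumulating w(T) and mu(T) incrementally, and
# return the T maximising the between-class variance delta^2(T).

def otsu_threshold(pixels, levels=256):
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = float(len(pixels))
    p = [h / total for h in hist]
    mu = sum(i * pi for i, pi in enumerate(p))   # global mean
    best_t, best_var, w, mu_t = 1, -1.0, 0.0, 0.0
    for t in range(1, levels):
        w += p[t - 1]                 # w0 = w(T)
        mu_t += (t - 1) * p[t - 1]    # mu(T)
        if 0.0 < w < 1.0:
            var = (mu * w - mu_t) ** 2 / (w * (1.0 - w))
            if var > best_var:
                best_var, best_t = var, t
    return best_t  # grey values < T fall in C0, >= T in C1
```

On a strongly bimodal histogram the returned T separates the two modes, which is the behaviour exploited here to binarize the edge-difference images.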
Compared with the prior art, the advantage of the present invention is that the macro-block pre-judgment performed before block matching effectively reduces the block matching time, and, by aligning the backgrounds of three consecutive frames through motion estimation and motion compensation and post-processing the three frames, the video object under a dynamic background can be segmented accurately.
Description of the drawings:
Fig. 1 is the flow chart of the video object extraction method of the present invention for dynamic backgrounds based on enhanced diamond motion estimation and three-frame background alignment;
Fig. 2 is a schematic diagram of the motion estimation error surfaces used by the method;
Fig. 3 is a schematic diagram of the commonly used search templates;
Fig. 4 is a schematic diagram of the enhanced diamond template search based on the predicted vector;
Fig. 5 shows the video object extraction results for the 139th frame of the Coastguard video sequence after compensation by the present method, where (a) is the 138th frame of the Coastguard sequence; (b) the 139th frame; (c) the 140th frame; (d) the preprocessed result of the 138th frame; (e) the preprocessed result of the 139th frame; (f) the preprocessed result of the 140th frame; (g) the Prewitt edge detection result of the reconstructed 138th frame; (h) the Prewitt edge detection result of the 139th frame; (i) the Prewitt edge detection result of the reconstructed 140th frame; (j) the binary video object plane extracted from the 139th frame by the present method after motion estimation, compensation and three-frame background alignment; (k) the video object plane so extracted.
Embodiment:
The present invention is described in further detail below with reference to the drawings and specific embodiments.
The present invention is a video object extraction method for dynamic backgrounds based on enhanced diamond motion estimation and three-frame background alignment, comprising the following steps (as shown in Fig. 1):
Step 1. Greyscale transformation and morphological preprocessing.
The YUV-format video sequence first undergoes a greyscale transformation: since the Y component carries the grey-level information, the Y component is extracted from the video sequence. Because noise interference inevitably appears in video, morphological opening-closing by reconstruction is applied to every frame to remove noise and smooth away fine edges, simplifying the image. The preprocessed results are shown in Fig. 5(d), (e) and (f).
Step 2. The reference frame K and frames K-1 and K+1 are divided into 8×8 macro blocks, which are pre-judged according to their texture information; all macro blocks in frames K-1 and K+1 are screened.
When the least squares method is later applied to compute the global motion parameters, many macro blocks with large errors are simply discarded. If those blocks can be rejected before the least squares computation, the computation speed improves significantly and the workload is reduced. The key factor determining a macro block's error, and hence the accuracy of the computation, is its texture information, i.e. its gradient information. The pre-judgment and screening method therefore starts from the gradient information of each macro block and screens or retains blocks according to a set threshold: a block whose information content falls below the threshold is screened out and does not participate in the subsequent block matching; a block whose information content exceeds the threshold is retained as a valid feature block for the subsequent motion estimation.
The main steps are as follows:
The first step: each frame is divided into 8×8 sub-blocks. Experiments show that 16×16 sub-blocks make the computation excessive, while 4×4 sub-blocks make block matching insufficiently accurate; the 8×8 form is therefore adopted.
The second step: the Sobel operator is applied to obtain the gradient map of each frame, and the gradient information serves as the basis for rejecting macro blocks:
$$|\nabla f(x,y)| = \operatorname{mag}(\nabla f(x,y)) = \sqrt{G_x^2 + G_y^2}$$
where $|\nabla f(x,y)|$ is the gradient magnitude at the point and $G_x$, $G_y$ are the partial derivatives.
The third step: the gradient amount of each macro block is computed. For an 8×8 sub-block the gradient information amount is:
$$|\nabla f(x,y)|_{8\times 8} = \sum_{i=1}^{8}\sum_{j=1}^{8} |\nabla f(x,y)|$$
The fourth step: the pre-judgment threshold is determined. In general 40% of all macro blocks are retained: the gradient amounts of all blocks are sorted, and the optimal threshold T for retaining 40% of them is determined accordingly.
The fifth step: the screening is completed. If a block's gradient information amount is greater than T, the block is retained as a valid feature block for the subsequent motion estimation; if it is less than T, the block is screened out and does not participate in the subsequent block matching.
Step 3. adopts SAD criterion to the macro block after above-mentioned screening, enhancement mode diamond search strategy carries out Block-matching, using K-1 frame as present frame, K frame as with reference to frame, obtain the motion vector field of K-1 frame relative to K frame; Using K+1 frame as present frame, K frame as with reference to frame, obtain the motion vector field of K+1 frame relative to K frame, and calculate globe motion parameter by least square method, obtain video camera six parameter model.
Block matching criterion conventional at present has: mean absolute error MAD (Mean Absolute Difference), least mean-square error MSE (Mean Square Error), minimum absolute difference SAD (Sum ofAbsolute).
This part adopts SAD block matching criterion, and this criterion can not only find optimal match point, and amount of calculation is little, consuming time short.
SAD ( i , j ) = &Sigma; m = 1 M &Sigma; n = 1 N | f k ( m , n ) - f k - 1 ( m + i , n + j ) |
Wherein (i, j) is displacement, f kand f k-1be respectively the gray value of present frame and previous frame, MxN is the size of macro block, if a bit locate SAD (i, j) at certain to reach minimum, then this point is the Optimum Matching point that will look for.
Fig. 2 (a) is depicted as the error surface of Integer Pel estimation, and because hunting zone is large, video content is complicated, the error surface of Integer Pel estimation is non-monotonic.Therefore, Integer Pel estimation is easily absorbed in local minimum.Otherwise fraction pixel is obtained by Integer Pel interpolation, in fractional pixel search window, the correlation of Searching point is far above the correlation of Integer Pel Searching point.The fraction pixel error surface of major part video sequence all has the distribution character of Fig. 2 (b), namely when Searching point is near global minima point, and matching error monotonic decreasing.Therefore, many rapid fraction pixel motion vector searching methods have employed motion vectors (FMVP:fractional predicted mv) as initial search point.If can the initial point of accurately predicting Reusable Fractional Motion Vector search, then earlier can search the best MV near FMVP, in time stop fractional-pel motion estimating searching.
Motion-vector search commonly uses three kinds of templates: rhombus template, square templates and hexagon template.Wherein, rhombus template is the simplest, is adopted, as Fig. 3 (a) by many video encoders; Square templates adds 4 points on diagonal in rhombus template, and computation complexity and Search Results accuracy increase, as Fig. 3 (b); Hexagon is applicable to the larger occasion in hunting zone, and because Reusable Fractional Motion Vector hunting zone is only limitted between two Integer Pel, make search too complicated, therefore hexagon template is not too applicable to Reusable Fractional Motion Vector search, as Fig. 3 (c).
Based on above analysis, a kind of enhancement mode rhombus template search strategy based on predictive vector is proposed.Because motion vectors FMVP and best MV has higher matching rate, this method does not consider initial search center (0,0), and directly using FMVP as initial search point; Adopt enhancement mode rhombus template (EDSP:extended diamond search pattern), in conjunction with the advantage that square templates accuracy is higher, the basis of rhombus template increases the Searching point on diagonal; Do not carry out the iteration of rhombus template, and search is stopped in [-2,2] scope of FMVP, omit [-2,2] extraneous minority improves little Reusable Fractional Motion Vector search to code efficiency, to reduce search point, thus reduces amount of calculation further.
Fig. 4 is the schematic diagram of the enhanced diamond template search strategy based on the predicted vector; the method flow is as follows:
Step 1: predict the fractional motion vector of the current block from its adjacent blocks to obtain the FMVP, i.e. (Pred_x, Pred_y), and take it directly as the initial search point;
Step 2: compare the matching errors of the four diamond search points around the start point (Pred_x, Pred_y) with that of (Pred_x, Pred_y) itself. If the minimum matching error RMS lies at (Pred_x, Pred_y), stop the fractional motion-vector search; otherwise proceed to Step 3;
Step 3: as in Fig. 4(a), if the best match point and the second-best match point are opposite each other, select the best match point's MV as the final fractional motion vector. As in Fig. 4(b), if the best and second-best match points are adjacent, compute the matching errors of the adjacent square-template points; if the minimum RMS is still at the diamond best match point, select its MV as the final fractional motion vector, otherwise proceed to the next step;
Step 4: with the square-template search point from Step 3 as the centre, search the points around it with the diamond template, and select the point with the minimum RMS as the final fractional motion vector.
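As an illustration only, the four steps above can be sketched in Python. The function names (edsp_search, cost_fn) and the generic cost-function interface are assumptions made for this sketch, not part of the patent; cost_fn stands in for the SAD matching error.

```python
# Illustrative sketch of the enhanced diamond search (EDSP) flow above.
DIAMOND = [(0, -1), (-1, 0), (1, 0), (0, 1)]        # small diamond offsets
SQUARE_DIAG = [(-1, -1), (1, -1), (-1, 1), (1, 1)]  # extra diagonal points

def edsp_search(cost_fn, pred, bound=2):
    """Search near the predicted start point `pred` (the FMVP).

    cost_fn(p) -> matching error (e.g. SAD) of candidate vector p.
    The search is confined to [-bound, bound] around `pred`, mirroring
    the [-2, 2] restriction around the FMVP described above.
    """
    px, py = pred
    in_range = lambda p: abs(p[0] - px) <= bound and abs(p[1] - py) <= bound

    # Step 2: the start point versus its four diamond neighbours.
    cands = [pred] + [(px + dx, py + dy) for dx, dy in DIAMOND]
    cands = sorted((p for p in cands if in_range(p)), key=cost_fn)
    best, second = cands[0], cands[1]
    if best == pred:
        return pred
    # Step 3: opposite points -> accept; adjacent -> also try the diagonals.
    if abs(best[0] - second[0]) + abs(best[1] - second[1]) != 1:
        return best
    diag = [p for p in ((px + dx, py + dy) for dx, dy in SQUARE_DIAG)
            if in_range(p)]
    best2 = min(diag + [best], key=cost_fn)
    if best2 == best:
        return best
    # Step 4: one final diamond refinement around the square-template point.
    around = [p for p in ((best2[0] + dx, best2[1] + dy) for dx, dy in DIAMOND)
              if in_range(p)]
    return min(around + [best2], key=cost_fn)
```

With a cost surface whose minimum lies one pel from the prediction, the sketch walks exactly the early-termination path of Steps 2-3.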
The macro blocks retained after screening in the K-1 and K+1 frames are block-matched against reference frame K according to the above SAD criterion and enhanced diamond search strategy, yielding the motion vector field of frame K-1 relative to reference frame K and that of frame K+1 relative to reference frame K.
Step 4. Estimate the camera motion by the least squares method.
Sub-blocks on both sides of the current frames K-1 and K+1 obtained in Step 2 are selected as feature blocks; the motion vectors obtained by block matching are substituted into the six-parameter camera model (given below), and the parameters m0, m1, m2, n0, n1, n2 are estimated by least squares. The six-parameter affine model, which can model translation, rotation and zoom, is defined as:
x' = m0 + m1·x + m2·y
y' = n0 + n1·x + n2·y
where m0 and n0 denote the translation of a pixel in the x and y directions respectively, and the four parameters m1, n1, m2, n2 describe zoom and rotation.
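The least-squares fit of the six parameters reduces to two independent linear solves, one for x' and one for y'. A minimal sketch follows, assuming NumPy is available; the function name estimate_affine and its array interface are illustrative, not from the patent.

```python
# Sketch: estimate (m0, m1, m2, n0, n1, n2) by least squares from the
# block centres and their matched positions found by block matching.
import numpy as np

def estimate_affine(points, mapped):
    """points: (N, 2) (x, y) feature-block positions in the current frame.
    mapped: (N, 2) matched positions (x', y') in the reference frame.
    Returns (m0, m1, m2, n0, n1, n2) of
        x' = m0 + m1*x + m2*y,   y' = n0 + n1*x + n2*y.
    """
    pts = np.asarray(points, dtype=float)
    dst = np.asarray(mapped, dtype=float)
    # Design matrix [1, x, y]; the x' and y' rows are solved independently.
    A = np.column_stack([np.ones(len(pts)), pts[:, 0], pts[:, 1]])
    mx, *_ = np.linalg.lstsq(A, dst[:, 0], rcond=None)
    my, *_ = np.linalg.lstsq(A, dst[:, 1], rcond=None)
    return (*mx, *my)
```

With exact (noise-free) correspondences the fit recovers the generating parameters; with noisy motion vectors it returns the least-squares compromise, which is the point of using it here.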
Step 5. Obtain the reconstructed frames K-1' and K+1' of the current frames K-1 and K+1 by motion compensation.
For each point in the current frames K-1 and K+1, its corresponding position in reference frame K is computed from the camera model obtained above and assigned the point's value, realising global motion compensation for frames K-1 and K+1. The backgrounds of the compensated reconstructed frames K-1' and K+1' are thereby aligned with that of reference frame K, enabling the following video segmentation under a dynamic background that combines edge information and an adaptive threshold, based on enhanced diamond motion estimation and three-frame background alignment.
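The forward mapping described in Step 5 can be sketched as below. This is a minimal illustration over 2-D lists of grey values; nearest-neighbour rounding is an assumption of this sketch, since the patent does not fix the sampling method, and the function name compensate is likewise illustrative.

```python
# Sketch: map each pixel of the current frame through the six-parameter
# model into the reference frame's coordinate system, producing a
# background-aligned reconstructed frame.
def compensate(frame, params):
    """frame: 2-D list of grey values; params: (m0, m1, m2, n0, n1, n2).
    Returns the reconstructed frame aligned with the reference background."""
    m0, m1, m2, n0, n1, n2 = params
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xr = int(round(m0 + m1 * x + m2 * y))  # position in reference frame
            yr = int(round(n0 + n1 * x + n2 * y))
            if 0 <= xr < w and 0 <= yr < h:
                out[yr][xr] = frame[y][x]
    return out
```

A pure translation (m0, 1, 0, 0, 0, 1) simply shifts the frame by m0 pixels in x, which is the degenerate case of the model.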
Step 6. Extract edge information with the Prewitt operator, difference each result against the edge of reference frame K, and binarise with the maximum-variance threshold.
(i) Extract edge information with the Prewitt operator and difference it with the edge of reference frame K
Many edge-detection operators exist; the Prewitt edge-detection operator is selected here to extract the edge features of the reconstructed frames K-1' and K+1' and of reference frame K.
The Prewitt operator can be realised by mask convolution:
f_s(x, y) = |f(x, y) ∗ G_x| + |f(x, y) ∗ G_y|
where the masks are:
G_x = | -1 0 1 |      G_y = |  1  1  1 |
      | -1 0 1 |            |  0  0  0 |
      | -1 0 1 |            | -1 -1 -1 |
Applying the Prewitt operator to the reconstructed frames K-1', K+1' and to frame K yields the edge images f_{K-1'}(x, y), f_{K+1'}(x, y) and f_K(x, y); the results can be seen in Fig. 5(g), (h), (i).
The edge of reconstructed frame K-1' is differenced with that of frame K, and the edge of reconstructed frame K+1' with that of frame K, giving the frame differences d_1 and d_2, where:
d_1 = |f_{K-1'}(x, y) - f_K(x, y)|,  d_2 = |f_{K+1'}(x, y) - f_K(x, y)|
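The Prewitt extraction and edge differencing can be sketched as follows, in plain Python over 2-D lists; the helper names (prewitt, frame_diff) are illustrative, and the border pixels are left at zero for brevity.

```python
# Sketch of f_s(x,y) = |f * Gx| + |f * Gy| with the Prewitt masks,
# and of the absolute edge difference d = |f_a - f_b|.
GX = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]   # horizontal Prewitt mask
GY = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]   # vertical Prewitt mask

def prewitt(img):
    """Edge magnitude |f*Gx| + |f*Gy| over the valid interior."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(GX[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(GY[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out

def frame_diff(a, b):
    """Absolute difference of two edge maps, d = |f_a - f_b|."""
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]
```

On a vertical step image the response is concentrated on the two columns flanking the step, as expected of the Gx mask.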
(ii) Binarise with the maximum-variance threshold
The maximum-variance threshold is an adaptive thresholding method: it splits the image histogram into two groups at an optimal threshold, chosen where the variance between the two groups is maximal. This method is therefore adopted here to binarise the edge-difference results.
Suppose the grey values of an image are the levels 0 to m-1 and the number of pixels with grey value i is n_i; then the total pixel count is:
N = Σ_{i=0}^{m-1} n_i
and the probability of each grey value is p_i = n_i / N.
Let the optimal threshold be T, dividing the pixels into two groups C_0 = {0 ~ T-1} and C_1 = {T ~ m-1}; the probabilities and means of C_0 and C_1 are given by:
probability of C_0:  w_0 = Σ_{i=0}^{T-1} p_i = w(T)
probability of C_1:  w_1 = Σ_{i=T}^{m-1} p_i = 1 - w_0
mean of C_0:  μ_0 = Σ_{i=0}^{T-1} i·p_i / w_0 = μ(T) / w(T)
mean of C_1:  μ_1 = Σ_{i=T}^{m-1} i·p_i / w_1 = (μ - μ(T)) / (1 - w(T))
where:  μ = Σ_{i=0}^{m-1} i·p_i,   μ(T) = Σ_{i=0}^{T-1} i·p_i
The mean grey value over all samples is then μ = w_0·μ_0 + w_1·μ_1.
The variance between the two groups is:
δ²(T) = w_0(μ_0 - μ)² + w_1(μ_1 - μ)² = w_0·w_1(μ_1 - μ_0)² = [μ·w(T) - μ(T)]² / (w(T)·[1 - w(T)])
The T in 1 ~ m-1 that maximises this expression is the optimal threshold.
The frame differences d_1 and d_2 are binarised with the obtained optimal threshold T; the binarisation results are OtusBuf1 and OtusBuf2 respectively.
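The maximum-variance (Otsu) threshold above can be sketched compactly from a histogram; the names otsu_threshold and binarize are illustrative, and the sketch uses the closed form δ²(T) = [μ·w(T) − μ(T)]² / (w(T)·[1 − w(T)]) derived above.

```python
# Sketch of the maximum-between-class-variance threshold over levels 0..m-1.
def otsu_threshold(hist):
    """hist[i] = pixel count of grey level i. Returns the T maximising
    the between-class variance delta^2(T)."""
    total = sum(hist)
    mu = sum(i * h for i, h in enumerate(hist)) / total  # global mean
    best_t, best_var = 1, -1.0
    w = 0.0      # w(T): cumulative probability of class C0
    mu_t = 0.0   # mu(T): cumulative first moment of class C0
    for t in range(1, len(hist)):
        w += hist[t - 1] / total
        mu_t += (t - 1) * hist[t - 1] / total
        if w <= 0.0 or w >= 1.0:
            continue  # degenerate split, variance undefined
        var = (mu * w - mu_t) ** 2 / (w * (1.0 - w))
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(img, t):
    """255 where the pixel is >= t, else 0."""
    return [[255 if p >= t else 0 for p in row] for row in img]
```

For a strongly bimodal histogram the maximising T falls in the empty valley between the two modes.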
Step 7. AND operation and post-processing.
The above binarisation results are combined by an AND operation:
DifferBuf(i) = 255 if OtusBuf1(i) = 255 and OtusBuf2(i) = 255; otherwise DifferBuf(i) = 0
where DifferBuf(i) denotes the result of the AND operation, and OtusBuf1(i) and OtusBuf2(i) denote the binarisation results of the frame differences d_1 and d_2 respectively.
Since noise interference is unavoidable in video sequences, post-processing is still required after the AND operation to remove isolated small regions and small gaps; the post-processing result is shown in Fig. 5(j). To this end, median filtering is first applied to remove interfering noise, and then morphological operations, chiefly erosion and dilation, which remove noise while also smoothing the image. Erosion mainly eliminates boundary points, shrinking the boundary inward, while dilation merges all background points in contact with an object into that object, expanding the boundary outward.
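The post-processing chain (AND of the two masks, 3×3 median filtering, then binary erosion and dilation) can be sketched as follows over 0/255 masks; the 3×3 neighbourhood and the helper names are assumptions of this sketch.

```python
# Sketch of the post-processing chain on binary (0/255) masks.
def mask_and(a, b):
    """Logical AND of two binarised masks."""
    return [[255 if pa == 255 and pb == 255 else 0
             for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def _neigh(img, y, x):
    """3x3 neighbourhood of (y, x), clipped at the image borders."""
    h, w = len(img), len(img[0])
    return [img[j][i] for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]

def median3(img):
    """3x3 median filter; removes isolated speckle noise."""
    return [[sorted(_neigh(img, y, x))[len(_neigh(img, y, x)) // 2]
             for x in range(len(img[0]))] for y in range(len(img))]

def erode(img):
    """Keep a point only if all neighbours are foreground (shrinks borders)."""
    return [[255 if all(v == 255 for v in _neigh(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]

def dilate(img):
    """Set a point if any neighbour is foreground (expands borders)."""
    return [[255 if any(v == 255 for v in _neigh(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]
```

An isolated foreground pixel is wiped out by the median filter, while dilation followed by erosion (closing) preserves a genuine region, which is exactly the small-region/small-gap behaviour the text describes.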

Claims (1)

1. A video object extraction method under a dynamic background based on enhanced diamond motion estimation and three-frame background alignment, characterised by comprising the following steps:
(1) dividing reference frame K and frames K-1 and K+1 into 8 × 8 macro blocks, pre-judging by texture information, and screening all macro blocks of frames K-1 and K+1; the concrete steps are as follows:
Step 1: divide each frame into 8 × 8 sub-blocks;
Step 2: obtain the gradient map of each frame with the Sobel operator, the gradient information serving as the basis for rejecting macro blocks:
|∇f(x, y)| = mag(∇f(x, y)) = √(G_x² + G_y²)
where |∇f(x, y)| denotes the gradient information at (x, y), and G_x, G_y denote the partial derivatives in the x and y directions respectively;
Step 3: compute the gradient amount of each macro block; for an 8 × 8 sub-block, the gradient information amount is:
|∇f(x, y)|_{8×8} = Σ_{i=1}^{8} Σ_{j=1}^{8} |∇f(x, y)|
Step 4: determine the pre-judgment threshold so as to retain 40% of all macro blocks: sort the gradient amounts of all macro blocks and determine the optimal threshold T under which 40% of the macro blocks are retained;
Step 5: complete the macro-block screening: if a block's gradient information amount > T, retain it as a valid feature block that participates in the following motion-estimation computation; if its gradient information amount < T, screen the block out so that it does not participate in block matching in the following steps;
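The five screening steps above can be sketched as follows; the Sobel kernels are standard, while the helper names (sobel_mag, screen_blocks) and the ranking-based way of realising the 40% threshold are illustrative assumptions.

```python
# Sketch of the macro-block pre-judgment: compute a Sobel gradient map,
# sum it per 8x8 block, and keep roughly the top 40% of blocks.
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]

def sobel_mag(img):
    """|grad f(x, y)| = sqrt(Gx^2 + Gy^2) over the valid interior."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out

def screen_blocks(img, block=8, keep=0.4):
    """Return the (by, bx) indices of the blocks with the largest gradient
    sums; roughly the top `keep` fraction survives the screening."""
    grad = sobel_mag(img)
    nby, nbx = len(img) // block, len(img[0]) // block
    sums = {(by, bx): sum(grad[by * block + j][bx * block + i]
                          for j in range(block) for i in range(block))
            for by in range(nby) for bx in range(nbx)}
    ranked = sorted(sums, key=sums.get, reverse=True)
    return set(ranked[:max(1, int(keep * len(ranked)))])
```

On a frame whose top-left 8×8 block carries a grey ramp and whose remainder is flat, only that textured block survives a 40% cut over four blocks.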
(2) applying the SAD criterion and the enhanced diamond search strategy to the screened macro blocks for block matching: with frame K-1 as the current frame and frame K as the reference frame, obtain the motion vector field of frame K-1 relative to frame K; with frame K+1 as the current frame and frame K as the reference frame, obtain the motion vector field of frame K+1 relative to frame K; then compute the global motion parameters by the least squares method to obtain the six-parameter camera model; the concrete steps are as follows:
(i) the block-matching criterion SAD
The specific formula is:
SAD(i, j) = Σ_{m=1}^{M} Σ_{n=1}^{N} |f_k(m, n) - f_{k-1}(m + i, n + j)|
where (i, j) is the displacement, f_k and f_{k-1} are the grey values of the current and previous frames respectively, and M × N is the macro-block size; if SAD(i, j) reaches a minimum at some point, that point is the best match point sought;
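A minimal sketch of the SAD criterion follows, paired with an exhaustive-search baseline for clarity (the enhanced diamond strategy of (ii) replaces the exhaustive scan in the actual method); the function names are illustrative.

```python
# Sketch of SAD block matching: the displacement (i, j) with the smallest
# sum of absolute differences is the best match.
def sad(cur, prev, bx, by, i, j, size=8):
    """SAD between the size x size block of `cur` at (bx, by) and the
    block of `prev` displaced by (i, j); out-of-range candidates -> None."""
    h, w = len(prev), len(prev[0])
    if not (0 <= bx + i and bx + i + size <= w and
            0 <= by + j and by + j + size <= h):
        return None
    return sum(abs(cur[by + n][bx + m] - prev[by + j + n][bx + i + m])
               for n in range(size) for m in range(size))

def best_match(cur, prev, bx, by, radius=4, size=8):
    """Exhaustive search over displacements within +/- radius."""
    cands = {}
    for j in range(-radius, radius + 1):
        for i in range(-radius, radius + 1):
            s = sad(cur, prev, bx, by, i, j, size)
            if s is not None:
                cands[(i, j)] = s
    return min(cands, key=cands.get)
```

On a frame that is a pure two-pixel horizontal shift of the previous one, the SAD is zero exactly at (2, 0) and the search recovers that vector.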
(ii) the enhanced diamond search strategy
The flow of the enhanced diamond template search strategy based on the predicted vector is as follows:
Step 1: predict the fractional motion vector of the current block from its adjacent blocks to obtain the FMVP, i.e. (Pred_x, Pred_y), and take it directly as the initial search point;
Step 2: compare the matching errors of the four diamond search points around the start point (Pred_x, Pred_y) with that of (Pred_x, Pred_y) itself; if the minimum matching error RMS lies at (Pred_x, Pred_y), stop the fractional motion-vector search, otherwise proceed to Step 3;
Step 3: if the best match point and the second-best match point are opposite each other, select the best match point's MV as the final fractional motion vector; if they are adjacent, compute the matching errors of the adjacent square-template points; if the minimum RMS is still at the diamond best match point, select its MV as the final fractional motion vector, otherwise proceed to the next step;
Step 4: with the square-template search point from Step 3 as the centre, search the points around it with the diamond template, and select the point with the minimum RMS as the final fractional motion vector;
(iii) obtaining the six-parameter camera model by the least squares method
Sub-blocks on both sides of the current frames K-1 and K+1 obtained in step (i) are selected as feature blocks; the motion vectors obtained through steps (i) and (ii) are substituted into the six-parameter camera model, and the parameters m0, m1, m2, n0, n1, n2 are estimated by least squares; the six-parameter affine model, which can model translation, rotation and zoom, is defined as:
x' = m0 + m1·x + m2·y
y' = n0 + n1·x + n2·y
where m0 and n0 denote the translation of a pixel in the x and y directions respectively, and the four parameters m1, n1, m2, n2 describe zoom and rotation;
(3) performing motion compensation on frame K-1 to align its background with that of frame K, obtaining reconstructed frame K-1', and performing motion compensation on frame K+1 to align its background with that of frame K, obtaining reconstructed frame K+1'; the particular content is as follows:
For each point in the current frames K-1 and K+1, its corresponding position in reference frame K is computed from the camera model obtained above and assigned the point's value, realising global motion compensation for frames K-1 and K+1; the backgrounds of the compensated reconstructed frames K-1' and K+1' are thereby aligned with that of reference frame K, enabling the following video segmentation under a dynamic background that combines edge information and the adaptive maximum-variance threshold, based on enhanced diamond motion estimation and three-frame background alignment;
(4) extracting edge information with the Prewitt operator, computing the frame differences of each result relative to the edge of reference frame K, and binarising with the maximum-variance threshold; the concrete steps are as follows:
(i) extracting edge information with the Prewitt operator and differencing it with the edge of reference frame K
Many edge-detection operators exist; the Prewitt edge-detection operator is selected here to extract the edge features of the reconstructed frames K-1' and K+1' and of reference frame K;
The Prewitt operator can be realised by mask convolution:
f_s(x, y) = |f(x, y) ∗ G_x| + |f(x, y) ∗ G_y|
where the masks are:
G_x = | -1 0 1 |      G_y = |  1  1  1 |
      | -1 0 1 |            |  0  0  0 |
      | -1 0 1 |            | -1 -1 -1 |
Applying the Prewitt operator to the reconstructed frames K-1', K+1' and to frame K yields the edge images f_{K-1'}(x, y), f_{K+1'}(x, y) and f_K(x, y);
The edge of reconstructed frame K-1' is differenced with that of frame K, and the edge of reconstructed frame K+1' with that of frame K, giving the frame differences d_1 and d_2, where:
d_1 = |f_{K-1'}(x, y) - f_K(x, y)|,  d_2 = |f_{K+1'}(x, y) - f_K(x, y)|
(ii) binarising with the maximum-variance threshold
The maximum-variance threshold is an adaptive thresholding method: it splits the image histogram into two groups at an optimal threshold, chosen where the variance between the two groups is maximal; this method is therefore adopted here to binarise the edge-difference results;
Suppose the grey values of an image are the levels 0 to m-1 and the number of pixels with grey value i is n_i; then the total pixel count is:
N = Σ_{i=0}^{m-1} n_i
and the probability of each grey value is p_i = n_i / N;
Let the optimal threshold be T, dividing the pixels into two groups C_0 = {0 ~ T-1} and C_1 = {T ~ m-1}; the probabilities and means of C_0 and C_1 are given by:
probability of C_0:  w_0 = Σ_{i=0}^{T-1} p_i = w(T)
probability of C_1:  w_1 = Σ_{i=T}^{m-1} p_i = 1 - w_0
mean of C_0:  μ_0 = Σ_{i=0}^{T-1} i·p_i / w_0 = μ(T) / w(T)
mean of C_1:  μ_1 = Σ_{i=T}^{m-1} i·p_i / w_1 = (μ - μ(T)) / (1 - w(T))
where:  μ = Σ_{i=0}^{m-1} i·p_i,   μ(T) = Σ_{i=0}^{T-1} i·p_i
The mean grey value over all samples is then μ = w_0·μ_0 + w_1·μ_1;
The variance between the two groups is:
δ²(T) = w_0(μ_0 - μ)² + w_1(μ_1 - μ)² = w_0·w_1(μ_1 - μ_0)² = [μ·w(T) - μ(T)]² / (w(T)·[1 - w(T)])
The T in 1 ~ m-1 that maximises this expression is the optimal threshold;
The edge-difference results are binarised according to the obtained optimal threshold T;
(5) performing post-processing with the AND operation, morphology and median filtering to achieve fast and effective segmentation of the video object under a dynamic background.
CN201210398149.0A 2012-10-18 2012-10-18 Dynamic background video object extraction based on enhancement type diamond search and three-frame background alignment Active CN102917223B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210398149.0A CN102917223B (en) 2012-10-18 2012-10-18 Dynamic background video object extraction based on enhancement type diamond search and three-frame background alignment

Publications (2)

Publication Number Publication Date
CN102917223A CN102917223A (en) 2013-02-06
CN102917223B true CN102917223B (en) 2015-06-24

Family

ID=47615433

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210398149.0A Active CN102917223B (en) 2012-10-18 2012-10-18 Dynamic background video object extraction based on enhancement type diamond search and three-frame background alignment

Country Status (1)

Country Link
CN (1) CN102917223B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111369592B (en) * 2020-03-13 2023-07-25 浙江工业大学 Newton interpolation-based rapid global motion estimation method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101159859A (en) * 2007-11-29 2008-04-09 北京中星微电子有限公司 Motion detection method, device and an intelligent monitoring system
CN101719979A (en) * 2009-11-27 2010-06-02 北京航空航天大学 Video object segmentation method based on time domain fixed-interval memory compensation
CN102270346A (en) * 2011-07-27 2011-12-07 宁波大学 Method for extracting target object from interactive video
CN102420985A (en) * 2011-11-29 2012-04-18 宁波大学 Multi-view video object extraction method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8335350B2 (en) * 2011-02-24 2012-12-18 Eastman Kodak Company Extracting motion information from digital video sequences


Similar Documents

Publication Publication Date Title
CN102917220B (en) Dynamic background video object extraction based on hexagon search and three-frame background alignment
CN102917217B (en) Movable background video object extraction method based on pentagonal search and three-frame background alignment
CN110232330B (en) Pedestrian re-identification method based on video detection
CN108198201A (en) A kind of multi-object tracking method, terminal device and storage medium
CN103871076A (en) Moving object extraction method based on optical flow method and superpixel division
CN103077531B (en) Based on the gray scale Automatic Target Tracking method of marginal information
US10249046B2 (en) Method and apparatus for object tracking and segmentation via background tracking
CN102063727A (en) Covariance matching-based active contour tracking method
CN105957036A (en) Video motion blur removing method strengthening character prior
CN108200432A (en) A kind of target following technology based on video compress domain
Wang et al. Unstructured road detection using hybrid features
CN115375733A (en) Snow vehicle sled three-dimensional sliding track extraction method based on videos and point cloud data
CN103051893B (en) Dynamic background video object extraction based on pentagonal search and five-frame background alignment
CN102917222B (en) Mobile background video object extraction method based on self-adaptive hexagonal search and five-frame background alignment
CN102970527B (en) Video object extraction method based on hexagon search under five-frame-background aligned dynamic background
CN102917224B (en) Mobile background video object extraction method based on novel crossed diamond search and five-frame background alignment
CN105913084A (en) Intensive track and DHOG-based ultrasonic heartbeat video image classifying method
CN102917223B (en) Dynamic background video object extraction based on enhancement type diamond search and three-frame background alignment
CN102917218B (en) Movable background video object extraction method based on self-adaptive hexagonal search and three-frame background alignment
CN111105430A (en) Variation level set image segmentation method based on Landmark simplex constraint
CN102917221B (en) Based on the dynamic background video object extraction of the search of novel cross rhombic and three frame background alignment
CN102917219B (en) Based on the dynamic background video object extraction of enhancement mode diamond search and five frame background alignment
Mei et al. An Algorithm for Automatic Extraction of Moving Object in the Image Guidance
Guo et al. Research on the detection and tracking technology of moving object in video images
Chavan et al. A novel deep learning based osopeel approach for video inpainting

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20170117

Address after: 518000 Guangdong city of Shenzhen province Nanshan District Shahe Street Xueyuan Road No. 1001 Nanshan Chi Park A7 building 4 floor

Patentee after: SHENZHEN XIAOLAJIAO TECHNOLOGY Co.,Ltd.

Address before: 100191 Haidian District, Xueyuan Road, No. 37,

Patentee before: BEIHANG University

TR01 Transfer of patent right

Effective date of registration: 20220624

Address after: 518000 4th floor, building A7, Nanshan Zhiyuan, No. 1001, Xueyuan Avenue, Nanshan District, Shenzhen, Guangdong Province

Patentee after: Shenzhen Skylark Software Technology Co.,Ltd.

Address before: 518000, 4, A7 building, Nanshan Zhiyuan 1001, Shahe Road, Nanshan District, Shenzhen, Guangdong.

Patentee before: SHENZHEN XIAOLAJIAO TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20240206

Address after: 518000 4th Floor, Building A7, Nanshan Zhiyuan, 1001 Xueyuan Avenue, Taoyuan Street, Nanshan District, Shenzhen City, Guangdong Province

Patentee after: SHENZHEN XIAOLAJIAO TECHNOLOGY Co.,Ltd.

Country or region after: China

Address before: 518000 4th floor, building A7, Nanshan Zhiyuan, No. 1001, Xueyuan Avenue, Nanshan District, Shenzhen, Guangdong Province

Patentee before: Shenzhen Skylark Software Technology Co.,Ltd.

Country or region before: China