CN106815854B - On-line video foreground and background separation method based on regular error modeling - Google Patents
- Publication number: CN106815854B (application CN201611252353.6A)
- Authority
- CN
- China
- Prior art keywords
- video
- foreground
- background
- model
- background separation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Abstract
An online video foreground and background separation method based on regularized error modeling includes: 1. acquiring video data of a monitoring system in real time; 2. embedding a transformation-operator optimization variable into the model to track real-time changes of the video background environment; 3. constructing a regularized error model based on the real-time change characteristics of the video foreground target; 4. combining steps 2 and 3 into a complete statistical model, and obtaining the complete surveillance-video foreground and background separation model by the maximum a posteriori estimation method; 5. down-sampling the video data to accelerate the computation of the separation model of step 4, realizing real-time solution of the model; 6. outputting the foreground and background in real time according to the separation result of step 5. The method separates the foreground and background of online surveillance video with high speed and high precision, and is of important practical significance for detection, tracking, recognition, and analysis of surveillance-video targets.
Description
Technical Field
The invention relates to a video processing method for surveillance video, and in particular to an online video foreground and background separation method based on regularized error modeling.
Background
Foreground and background separation of surveillance video has important application value in real life, for example in target tracking and urban traffic monitoring. However, monitoring equipment is now deployed in every corner of the world, and the daily volume of surveillance video data is extremely large and structurally complex. Separating foreground and background in real time while guaranteeing high precision and high efficiency therefore remains a major challenge.
In the field of image processing, a number of techniques for separating the foreground and background of a video are available. Common techniques include direct separation methods based on statistical assumptions, subspace learning methods, and online separation methods.
The direct separation method based on statistical hypotheses assumes a statistical distribution for the frame data of the video and then separates foreground from background using some statistic, such as a median or mean model or a histogram model. In addition, the MoG and MoGG methods adopt a more refined statistical assumption, fitting each frame with a mixture distribution (for example, a Gaussian mixture), which yields a better separation effect. However, these methods ignore the structural information of the video, such as spatial continuity of the foreground and temporal similarity of the background. In contrast, subspace learning methods encode the video structure finely: by assuming that the video background has a low-rank structure, structural information such as spatial continuity of the foreground and temporal similarity of the background is integrated into the model, giving a near-ideal separation effect.
Although the subspace learning method has achieved remarkable results, a gap remains to practical application. Video data grows rapidly at every moment, so the separation technique must be highly efficient while guaranteeing high precision; on the other hand, the constant stream of new monitoring data demands a real-time online separation technique. Although some online separation methods exist, they cannot meet the double requirements of high precision and high efficiency.
In view of the deficiencies of the prior art, it is necessary to provide a technology for real-time online foreground and background separation of a continuously-appearing monitoring video with high precision and high efficiency, and particularly, the technology should be capable of effectively adapting to the dynamic foreground target type and the dynamic background environmental change existing in the video.
Disclosure of Invention
The invention aims to provide an online video foreground and background separation method based on regular error modeling, which can more fully and accurately utilize structural information of a video to perform statistical modeling, thereby achieving a separation effect with higher precision and ensuring high efficiency of processing.
In order to achieve the purpose, the technical scheme adopted by the invention comprises the following steps:
step S1: acquiring video data of a monitoring system on line;
step S2: constructing a model on the basis of the assumption of a low-rank structure of a video background, embedding an adaptive transformation factor variable into the model, and coding the dynamic change of the video background to realize the adaptive modeling of the real video dynamic background;
step S3: performing parametric distribution modeling based on the random variation of the video foreground target, so that the model adapts to the dynamic changes of the video foreground at different times and in different scenes, and further encoding the regularized noise information of the preceding frames and embedding it into the foreground model, realizing adaptive modeling of the real dynamic video foreground;
step S4: combining the step S2 with the step S3 to construct a complete statistical model for monitoring the separation of the foreground and the background of the video;
step S5: down-sampling the video data of step S1, and applying the foreground and background separation statistical model of step S4 to the sampled data to accelerate the solution;
step S6: performing TV (total variation) continuity modeling on the foreground object obtained in step S5;
step S7: and outputting the finally detected foreground object and background scene of the video according to the results obtained in the steps S5 and S6.
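Steps S1 to S7 above can be sketched end to end in a few lines. This is a hypothetical skeleton under simplifying assumptions (no affine alignment, no MoG weighting, no TV smoothing); all function and variable names are illustrative and not from the patent:

```python
import numpy as np

def separate_frame(x_t, U_prev):
    """One online pass of the pipeline sketch: downsample (S5), fit the
    low-rank background coefficient on the samples (S2/S4), and return
    the residual as foreground (S3) plus the background (S7)."""
    rng = np.random.default_rng(0)
    idx = rng.choice(x_t.size, x_t.size // 100, replace=False)  # S5: keep 1% of pixels
    v, *_ = np.linalg.lstsq(U_prev[idx], x_t[idx], rcond=None)  # coefficients from samples
    background = U_prev @ v                                     # S2: low-rank background
    foreground = x_t - background                               # S3/S4: residual
    return foreground, background

d, r = 2000, 3
rng = np.random.default_rng(1)
U = rng.standard_normal((d, r))         # background basis
x = U @ rng.standard_normal(r)          # a pure-background frame
fg, bg = separate_frame(x, U)
```

On a frame that lies exactly in the background subspace, the recovered foreground is (numerically) zero, which is the sanity check for the decomposition.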
Step S2 builds the model as follows: because the backgrounds of the frames of the video data are similar, this similarity is encoded by the following low-rank expression of the video image:
x_t = U_t v_t + ε_t    (1)

where x_t ∈ R^d denotes the t-th frame image of the surveillance video, U_t ∈ R^{d×r} is the current expression basis of the video background with r ≪ d (the subspaces spanned by these bases form a low-rank subspace of the original image space), v_t ∈ R^r is the combination coefficient, U_t v_t denotes the low-rank mapping of x_t in the subspace spanned by U_t, and ε_t denotes the residual;
An adaptive transformation factor variable is embedded into the model, i.e., model (1) is improved to:

x_t ∘ τ_t = U_t v_t + ε_t    (2)

where τ_t is the affine transformation operator variable of the image x_t, expressing video background transformations of rotation, translation, skew, and scale.
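The role of τ_t in Eq. (2) can be illustrated with a toy 1-D example. Here τ is a pure translation implemented with `np.roll` (the patent's τ is a general affine operator); the point is that aligning the frame before the low-rank fit shrinks the residual. All names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 400, 2
U = rng.standard_normal((d, r))          # background basis
x_aligned = U @ rng.standard_normal(r)   # background exactly in the subspace
x_shifted = np.roll(x_aligned, 5)        # camera jitter: background moved 5 pixels

def residual_norm(x, U):
    """Norm of the residual after projecting x onto the subspace of U."""
    v, *_ = np.linalg.lstsq(U, x, rcond=None)
    return np.linalg.norm(x - U @ v)

before = residual_norm(x_shifted, U)               # misaligned: large residual
after = residual_norm(np.roll(x_shifted, -5), U)   # x ∘ tau: realigned, tiny residual
```

Without the alignment the jittered background leaks into the residual (and would be misread as foreground); with it, the residual collapses back to numerical zero.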
The parametric distribution modeling of step S3 encodes the residual variable ε_t in model (2) as a mixed Gaussian (MoG) distribution, so that it adapts to the dynamic change of the video foreground, i.e. the video background residual, at different times and in different scenes; the corresponding model is:

(x_t ∘ τ_t)^i = u_t^i v_t + ε_t^i,  ε_t^i ~ Σ_{k=1}^K π_t^k N(0, (σ_t^k)^2),  z_t^i ~ Multi(π_t)    (3)

wherein x_t^i is the i-th pixel value of x_t, u_t^i is the i-th row of U_t, z_t^{ik} is a hidden variable indicating that the i-th pixel value of the t-th frame belongs to the k-th mixture component of the mixed Gaussian distribution, satisfying Σ_k z_t^{ik} = 1; Multi denotes the multinomial distribution, and (σ_t^k)^2 is the variance of the k-th mixture component;
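The posterior over the hidden variable z in the MoG model of Eq. (3) is computed as in a standard zero-mean mixture E-step. A minimal sketch follows, assuming two components (a small-variance "background noise" component and a large-variance "foreground" one); the component count and parameter values are illustrative, not fixed by the patent:

```python
import numpy as np

def responsibilities(eps, pi, sigma2):
    """gamma[i, k]: posterior probability that residual eps[i] was drawn
    from zero-mean Gaussian component k with weight pi[k], variance sigma2[k]."""
    logp = (np.log(pi) - 0.5 * np.log(2 * np.pi * sigma2)
            - eps[:, None] ** 2 / (2 * sigma2))    # log pi_k N(eps_i; 0, sigma2_k)
    logp -= logp.max(axis=1, keepdims=True)        # stabilize before exponentiating
    g = np.exp(logp)
    return g / g.sum(axis=1, keepdims=True)

eps = np.array([0.01, -0.02, 4.0, -5.0])           # two quiet, two big residuals
gamma = responsibilities(eps, pi=np.array([0.5, 0.5]),
                         sigma2=np.array([0.1, 10.0]))
```

Small residuals are assigned to the low-variance component and large residuals to the high-variance one, which is exactly how the mixture separates background noise from foreground pixels.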
To encode the regularized noise information of the preceding frames and realize adaptive modeling of the real dynamic video foreground, conjugate prior assumptions are respectively made on the noise distribution variables in model (3): an inverse-Gamma prior on each variance (σ_t^k)^2 and a Dirichlet prior on the mixing weights π_t, where Inv-Gamma denotes the inverse Gamma distribution and Dir denotes the Dirichlet distribution; the prior hyperparameters aggregate the membership degrees of the previous frame, that is, the degree to which the i-th pixel value of frame t−1 belongs to the k-th mixture component of the mixed Gaussian distribution, and the meanings of the remaining symbols are the same as in formula (3).
Step S4 constructs the complete statistical model from steps S2 and S3:

where P(v_t) denotes a Gaussian distribution with sufficiently large variance, π_t = (π_t^1, …, π_t^K) collects the mixing coefficients of the mixed Gaussian distribution of the t-th frame residual ε_t, Σ_t = ((σ_t^1)^2, …, (σ_t^K)^2) collects the variances of the mixture components, and τ_t denotes the adaptive transformation factor as defined by formula (6);
According to the maximum a posteriori estimation principle, with U_t = U_{t-1} fixed, the video foreground and background separation model derived from the statistical model can be converted into an optimization problem, which simplifies to:
where D_KL(·‖·) denotes the KL divergence and R(π_t, Σ_t) is a noise regularization term built from the KL divergence between the current mixture parameters and those of the previous frame: π_t are the mixing coefficients of the mixed Gaussian distribution of the t-th frame residual ε_t, Σ_t the variances of its mixture components, π_{t-1}, Σ_{t-1} the corresponding mixing-coefficient and variance vectors of frame t−1 as defined by formula (6), and C is a constant independent of π_t and Σ_t.
In step S5, before modeling and solving the t-th frame image, downsampling is performed to increase the solving speed: with Ω = {k_1, k_2, …, k_m | 1 ≤ k_j ≤ d, j = 1, 2, …, m} denoting the subscript set, the downsampled frame consists of the entries of x_t indexed by Ω.
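The downsampling step can be sketched directly: draw a random index set Ω of m = d/100 pixel positions and keep only those entries of the frame. Names (`omega`, `x_hat`) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 10000
x_t = rng.standard_normal(d)                           # one flattened frame
m = d // 100                                           # keep 1% of the pixels
omega = np.sort(rng.choice(d, size=m, replace=False))  # Omega = {k_1 < ... < k_m}
x_hat = x_t[omega]                                     # the downsampled frame
```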
In step S5, a first-order approximation of τ_t in (7) is taken with respect to Δτ_t, so that the subproblem degenerates into a weighted least-squares problem; solving the following model yields the updated result:

where π_t are the mixing coefficients of the mixed Gaussian distribution of the t-th frame residual ε_t, Σ_t the variances of its mixture components, τ_t the adaptive transformation factor variable, J is the Jacobian matrix of x ∘ τ with respect to τ, and u_i is the i-th row of the basis matrix U.
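A weighted least-squares step of this shape has a closed-form solution via the normal equations. The sketch below uses synthetic placeholders for the Jacobian, residual, and per-pixel MoG weights (the patent's exact weighting formula is given only in its drawings):

```python
import numpy as np

rng = np.random.default_rng(4)
d, p = 200, 6                        # pixel count, affine-parameter count
J = rng.standard_normal((d, p))      # Jacobian of the warped frame w.r.t. tau
e = rng.standard_normal(d)           # current residual
w = rng.uniform(0.5, 2.0, size=d)    # per-pixel weights (placeholder for MoG weights)

# Solve min_{dtau} sum_i w_i (e_i - J_i dtau)^2 via the normal equations.
A = J.T @ (w[:, None] * J)
b = J.T @ (w * e)
dtau = np.linalg.solve(A, b)         # Delta tau; then tau <- tau + Delta tau
```

At the optimum the weighted residual is orthogonal to the columns of J, which is the stationarity condition of the least-squares problem.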
Step S5 adopts the EM algorithm to update the parameters π_t, Σ_t, v_t of the online foreground and background separation model; the superscript s in the formulas below denotes the s-th iteration. The specific process comprises:
s7.2: and (3) giving an iteration format and a termination condition of the M steps in the EM algorithm:
the iteration format is:
wherein:
the iteration termination condition is as follows:
s7.3: setting an iteration initial value:
An initial subspace decomposition is obtained by applying the PCA method to the initial frame data, after which the parameters π_{t,0}, Σ_{t,0}, v_{t,0} are initialized with the MoG method;
S7.4: and (4) performing iterative operation of the expressions (8) to (13) until the termination condition expression (14) is met.
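Within each EM sweep, the coefficient v_t is refit by weighted least squares, with pixel i weighted by its total precision Σ_k γ_ik/σ_k² under the current responsibilities. The patent's exact update formulas (8) to (13) appear only as images, so this is a plausible sketch of the standard MoG low-rank update, not a transcription:

```python
import numpy as np

rng = np.random.default_rng(5)
d, r, K = 300, 3, 2
U = rng.standard_normal((d, r))
x = U @ rng.standard_normal(r) + 0.01 * rng.standard_normal(d)  # frame = background + noise
gamma = rng.dirichlet(np.ones(K), size=d)   # responsibilities (each row sums to 1)
sigma2 = np.array([0.05, 5.0])              # current component variances

w = (gamma / sigma2).sum(axis=1)            # per-pixel precision weights
A = U.T @ (w[:, None] * U)                  # weighted normal equations
v = np.linalg.solve(A, U.T @ (w * x))       # updated coefficient v_t
```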
In step S5, after the parameters π_t, Σ_t, v_t, τ_t have been updated for the t-th frame data, the background basis U_{t-1} is fine-tuned with the following model to obtain the updated U_t:
The model (15) has the following solution:
U_t is updated according to formulas (16)-(18), and the foreground is output.
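The patent's closed-form basis updates (16) to (18) are given only as images; as a hedged substitute, the fine-tuning idea can be sketched with one weighted gradient step on the squared residual, U ← U + η (w ⊙ (x − Uv)) vᵀ, which nudges the basis toward the new frame. Step size and weights are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
d, r = 300, 3
U = rng.standard_normal((d, r))          # previous basis U_{t-1}
v = rng.standard_normal(r)
x = U @ v + rng.standard_normal(d)       # frame imperfectly explained by U v
w = np.ones(d)                           # uniform pixel weights for the sketch

def loss(U):
    """Weighted squared residual of the low-rank fit."""
    return 0.5 * np.sum(w * (x - U @ v) ** 2)

eta = 1e-3                               # small step size
U_new = U + eta * (w * (x - U @ v))[:, None] @ v[None, :]   # rank-one correction
```

A single small step strictly decreases the fit loss, which is all the online update needs per frame.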
In step S6, a TV-norm model is established using the background continuity characteristics of the surveillance video as follows:

where ‖·‖_TV denotes the TV norm, F̂ is the output result of equation (19), and λ is set according to the maximum variance; the above optimization problem is converted into:
s.t. F = Z_i, i = 1, 2    (21)

where Z_1, Z_2 ∈ R^{m×n} and S_i(·) is defined as follows:
solving the TV norm model of formula (21) by using an alternative direction multiplier method ADMM:
S9.1: Give the augmented Lagrangian function of problem (21):
S9.2: establishing an iteration format and a termination condition of an alternating direction multiplier method:
the iteration format is:
where ρ is a positive number greater than 1; here ρ = 1.5;
the iteration termination condition is as follows:
s9.3: solving the steps (22) and (23) to give an iterative specific formula;
S9.4: Perform iterative operations (22) to (24) until the termination condition (25) is satisfied.
In step S7, the foreground FG_t is finally output, and the background is BG_t = x_t − FG_t, where FG_t and BG_t respectively denote the foreground and background of the t-th frame data.
By performing targeted analysis and encoding of the foreground and background of surveillance video data, the invention realizes an online surveillance-video foreground and background separation model and method with high speed and high precision, which is of important practical significance for detection, tracking, recognition, and analysis of surveillance-video targets.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 shows frame data of a part of videos in the Li Datasets data set.
Fig. 3 illustrates the effect of downsampling in step S5: the first row shows original images from the Li Datasets, and the second row the result of downsampling to 1%.
Fig. 4 shows the video separation effect of the present invention: the first column shows original images from the Li Datasets data set, the second column the ground-truth labels of the pre-marked foreground, the third column the foreground separated in step S5, and the fourth column the foreground separated in step S7.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Example 1: the Li Datasets data set (https://fling.seas.upenn.edu/~xiaowz/dynamic/wordpress/demolor/) is adopted as the computer-simulation experimental object of the invention (see the first column of Fig. 4). See the table below for the foreground and background separation speed of the invention on each video in the Li Datasets, where FPS denotes the number of frames processed per second:
the data set contains 9 surveillance video data sets, including static background video, background video changing with light conditions, dynamic background video, and where part of the data has a real label of the pre-marked foreground (see the second column of fig. 4), and the frame data of part of the video is shown in fig. 2. The present invention extracts 200 frames of data from each of various videos to perform experiments. The process is shown in figure 1:
step S1, Li Datasets video data are obtained;
step S2: construct a low-rank decomposition model for the t-th frame data following the basic principle of foreground and background separation, fitting the noise with a Gaussian mixture of three components; the specific expression is:
x_t = U_t v_t + ε_t    (26)

where x_t ∈ R^d denotes the t-th frame image of the surveillance video, U_t ∈ R^{d×r} is the current expression basis of the video background with r ≪ d (the subspaces spanned by these bases form a low-rank subspace of the original image space), v_t ∈ R^r is the combination coefficient, U_t v_t denotes the low-rank mapping of x_t in the subspace spanned by U_t, and ε_t denotes the residual;
An adaptive transformation factor variable is embedded into the model, i.e., model (26) is refined to:

x_t ∘ τ_t = U_t v_t + ε_t    (27)

where τ_t is the affine transformation operator variable of the image x_t, expressing video background transformations of rotation, translation, skew, and scale.
Step S3: according to the mixed Gaussian distribution hypothesis on the noise, we have

(x_t ∘ τ_t)^i = u_t^i v_t + ε_t^i,  ε_t^i ~ Σ_{k=1}^3 π_t^k N(0, (σ_t^k)^2),  z_t^i ~ Multi(π_t)    (28)

wherein x_t^i is the i-th pixel value of x_t, u_t^i is the i-th row of U_t, z_t^{ik} is a hidden variable indicating that the i-th pixel value of frame t belongs to the k-th mixture component of the mixed Gaussian distribution, satisfying Σ_k z_t^{ik} = 1; Multi denotes the multinomial distribution, and (σ_t^k)^2 is the variance of the k-th mixture component;
According to the similarity between the backgrounds of consecutive frames of the surveillance video, the following prior distributions can be assumed: an inverse-Gamma prior on each variance (σ_t^k)^2 and a Dirichlet prior on the mixing weights π_t, where Inv-Gamma denotes the inverse Gamma distribution and Dir denotes the Dirichlet distribution; the prior hyperparameters aggregate the membership degrees of the previous frame, that is, the degree to which the i-th pixel value of frame t−1 belongs to the k-th mixture component of the mixed Gaussian distribution, and the meanings of the remaining symbols are the same as in (28).
Step S4: combining step S2 and step S3 with the maximum a posteriori estimation method yields the following separation optimization problem for the foreground and background of the surveillance video, which simplifies to:

where π_t are the mixing coefficients of the mixed Gaussian distribution of the t-th frame residual ε_t, Σ_t the variances of its mixture components, π_{t-1}, Σ_{t-1} the corresponding mixing-coefficient and variance vectors of frame t−1 as defined in formula (29), and c is a constant independent of π_t and Σ_t.
Step S5 (see Fig. 3): based on the video data input in step S1, apply the maximum a posteriori model of step S4 to construct the video foreground and background separation solving algorithm:
A. Apply the PCA algorithm to the first 50 frames of data to obtain the initial subspace U and coefficients v, and then initialize the parameters with the following MoG algorithm (Algorithm 1):
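Step A above can be sketched via a truncated SVD, which is the usual way to realize PCA initialization of a subspace: the top-r left singular vectors of the first-50-frame matrix become the initial background basis. Sizes and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
d, n, r = 500, 50, 3
B = rng.standard_normal((d, r)) @ rng.standard_normal((r, n))  # rank-r "video" matrix
X = B + 0.01 * rng.standard_normal((d, n))                     # first 50 frames + small noise

Uf, s, Vt = np.linalg.svd(X, full_matrices=False)
U0 = Uf[:, :r]                    # initial orthonormal background basis
V0 = np.diag(s[:r]) @ Vt[:r]      # initial coefficients, so that X ≈ U0 V0
```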
B. Downsample the t-th frame data x_t to 1%, obtaining the sampled frame, where Ω denotes the subscript set, i.e.

Ω = {k_1, k_2, …, k_m | 1 ≤ k_j ≤ d, j = 1, 2, …, m}
C. Fix U = U_{t-1} and update π_t, Σ_t, v_t with the EM algorithm; the iteration format (the superscript s denotes the iteration count) is as follows:
e-step:
m-step:
D. and (3) iteration termination conditions:
E. After updating π_t, Σ_t, v_t as above, update U_t: only part of the elements of U need fine-tuning, and the specific optimization model is:
The model (30) has the following explicit solution:
Step S6: on the foreground object obtained in step S5, establish a TV-norm model using the background continuity characteristics of the surveillance video as follows:

where ‖·‖_TV denotes the TV norm, F̂ is the foreground output in step S5, and λ is set according to the maximum variance. The above optimization problem can be converted into:
s.t. F = Z_i, i = 1, 2

where Z_1, Z_2 ∈ R^{m×n} and S_i(·) is defined as follows:
the process of solving uses the following iterative format:
where ρ is a positive number greater than 1, here taken to be 1.5.
A. The iterate (32) is obtained by solving the following problem:
the problem (35) has the following optimal solution:
B. The following problem is then solved:
The problem (36) can be decomposed into the following two subproblems (i = 1, 2) and solved separately:
which is equivalent to

where v_1, v_2, …, v_n denote the column vectors of Z_i and t_1, t_2, …, t_n the column vectors of T_i; the problem (37) can then be decomposed into the following n subproblems and solved separately:
the problem (38) can be solved using the following 1D TV denoising algorithm:
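The patent's specific 1-D TV denoising routine appears only as an image, so a generic ADMM solver for min_f ½‖f − y‖² + λ‖Df‖₁ (D the first-difference operator) is sketched instead, with soft-thresholding for the z-step; it is a standard stand-in, not necessarily the algorithm the patent uses:

```python
import numpy as np

def soft(a, t):
    """Elementwise soft-thresholding, the prox of t*|.|_1."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def tv_denoise_1d(y, lam=1.0, rho=1.0, iters=300):
    """ADMM for min_f 0.5||f - y||^2 + lam*||Df||_1 with D = first difference."""
    n = y.size
    D = np.diff(np.eye(n), axis=0)                 # (n-1) x n difference matrix
    Ainv = np.linalg.inv(np.eye(n) + rho * D.T @ D)
    z = np.zeros(n - 1)
    u = np.zeros(n - 1)                            # scaled dual variable
    for _ in range(iters):
        f = Ainv @ (y + rho * D.T @ (z - u))       # quadratic f-step
        z = soft(D @ f + u, lam / rho)             # shrinkage z-step
        u += D @ f - z                             # dual ascent
    return f

rng = np.random.default_rng(8)
step = np.concatenate([np.zeros(50), np.ones(50)])  # piecewise-constant truth
y = step + 0.3 * rng.standard_normal(100)           # noisy observation
f = tv_denoise_1d(y, lam=1.0)

def tv(x):
    return np.abs(np.diff(x)).sum()                 # total variation of a signal
```

On a noisy step signal the denoised output has far lower total variation than the input, which is the behavior the foreground-smoothing step relies on.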
step S7: final output FGt(solution of formula (31) in step S6), background BGt=xt-FGt(see the fourth column of FIG. 4).
Claims (3)
1. An online video foreground and background separation method based on regular error modeling is characterized by comprising the following steps:
step S1: acquiring video data of a monitoring system on line;
step S2: constructing a model on the basis of the assumption of a low-rank structure of a video background, embedding an adaptive transformation factor variable into the model, and coding the dynamic change of the video background to realize the adaptive modeling of the real video dynamic background;
step S3: performing parametric distribution modeling based on the random variation of the video foreground target, so that the model adapts to the dynamic changes of the video foreground at different times and in different scenes, and further encoding the regularized noise information of the preceding frames and embedding it into the foreground model, realizing adaptive modeling of the real dynamic video foreground;
step S4: combining the step S2 with the step S3 to construct a complete statistical model for monitoring the separation of the foreground and the background of the video;
step S5: down-sampling the video data of step S1, applying the foreground and background separation statistical model of step S4 to the sampled data to construct and accelerate the video foreground and background separation solving algorithm, and updating on the sampled data with the MoG initialization algorithm and the EM algorithm to finally obtain the foreground target;
step S6: on top of the foreground object obtained in step S5, TV continuity modeling is performed thereon;
step S7: and outputting the finally detected foreground object and background scene of the video according to the results obtained in the steps S5 and S6.
2. The online video foreground-background separation method based on regular error modeling according to claim 1, wherein: the step S2 builds a model: due to the similarity of the background corresponding to each frame of image of the video data, the similarity is encoded by performing the following low-rank expression on the video image:
xt=Utvt+εt(1)
wherein x_t ∈ R^d denotes the t-th frame image of the surveillance video, and U_t ∈ R^{d×r} is the current expression basis of the video background with r ≪ d; the subspaces spanned by these bases form a low-rank subspace of the original image space; v_t ∈ R^r is the combination coefficient, U_t v_t denotes the low-rank mapping of x_t in subspace U_t, and ε_t denotes the residual;
embedding an adaptive transform factor variable in the model, i.e. improving the model (1) to:
x_t ∘ τ_t = U_t v_t + ε_t    (2)
wherein τ_t is the affine transformation operator variable of the image x_t, expressing video background transformations of rotation, translation, skew, and scale scaling.
3. The online video foreground-background separation method based on regular error modeling according to claim 2, characterized in that: in step S7, the foreground FG_t is finally output and the background is BG_t = x_t − FG_t, where FG_t and BG_t respectively denote the foreground and background of the t-th frame image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611252353.6A CN106815854B (en) | 2016-12-30 | 2016-12-30 | On-line video foreground and background separation method based on regular error modeling |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611252353.6A CN106815854B (en) | 2016-12-30 | 2016-12-30 | On-line video foreground and background separation method based on regular error modeling |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106815854A CN106815854A (en) | 2017-06-09 |
CN106815854B true CN106815854B (en) | 2020-05-15 |
Family
ID=59109330
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611252353.6A Active CN106815854B (en) | 2016-12-30 | 2016-12-30 | On-line video foreground and background separation method based on regular error modeling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106815854B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107346547B (en) * | 2017-07-04 | 2020-09-04 | 易视腾科技股份有限公司 | Monocular platform-based real-time foreground extraction method and device |
CN108846804B (en) * | 2018-04-23 | 2022-04-01 | 杭州电子科技大学 | Deblurring method based on line graph and column graph model |
CN108933703B (en) * | 2018-08-14 | 2020-06-02 | 西安交通大学 | Environment self-adaptive perception wireless communication channel estimation method based on error modeling |
CN109150775B (en) * | 2018-08-14 | 2020-03-17 | 西安交通大学 | Robust online channel state estimation method for dynamic change of self-adaptive noise environment |
CN110018529B (en) * | 2019-02-22 | 2021-08-17 | 南方科技大学 | Rainfall measurement method, rainfall measurement device, computer equipment and storage medium |
CN112734791B (en) * | 2021-01-18 | 2022-11-29 | 烟台南山学院 | On-line video foreground and background separation method based on regular error modeling |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101371273A (en) * | 2005-12-30 | 2009-02-18 | 意大利电信股份公司 | Video sequence partition |
US8565525B2 (en) * | 2005-12-30 | 2013-10-22 | Telecom Italia S.P.A. | Edge comparison in segmentation of video sequences |
CN100583126C (en) * | 2008-01-14 | 2010-01-20 | 浙江大学 | A video foreground extracting method under conditions of view angle variety based on fast image registration |
CN105761251A (en) * | 2016-02-02 | 2016-07-13 | 天津大学 | Separation method of foreground and background of video based on low rank and structure sparseness |
CN106204477B (en) * | 2016-07-06 | 2019-05-31 | 天津大学 | Video frequency sequence background restoration methods based on online low-rank background modeling |
- 2016-12-30: application CN201611252353.6A granted as CN106815854B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN106815854A (en) | 2017-06-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||