CN106815854B - On-line video foreground and background separation method based on regular error modeling - Google Patents
- Publication number: CN106815854B (application CN201611252353.6A)
- Authority
- CN
- China
- Prior art keywords
- video
- foreground
- background
- model
- background separation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Abstract
An online video foreground and background separation method based on regularized error modeling includes: 1. acquiring video data of a monitoring system in real time; 2. embedding a transformation-operator optimization variable into the model to track real-time changes of the video background environment; 3. constructing a regularized error model based on the real-time change characteristics of the video foreground target; 4. combining steps 2 and 3 into a complete statistical model, and obtaining the complete surveillance-video foreground and background separation model by the maximum a posteriori estimation method; 5. down-sampling the video data to accelerate the computation of the separation model of step 4, realizing real-time solution of the model; 6. outputting the foreground and background in real time according to the separation result of step 5. The method separates the foreground and background of online surveillance video with high speed and high precision, and is of important practical significance for detection, tracking, recognition, and analysis of surveillance-video targets.
Description
Technical Field
The invention relates to a video processing method for surveillance video, and in particular to an online video foreground and background separation method based on regularized error modeling.
Background
Foreground and background separation of surveillance video has important application value in real life, for example in target tracking and urban traffic monitoring. However, monitoring equipment is now deployed in every corner of the world, and the daily volume of surveillance video data is extremely large and structurally complex. Separating foreground and background in real time while guaranteeing high precision and high efficiency therefore remains a major challenge.
In the field of image processing, a number of techniques for separating the foreground and background of a video are available. Common techniques include direct separation methods based on statistical assumptions, subspace learning methods, and online separation methods.
The direct separation method based on statistical hypotheses assumes a statistical distribution for the frame data of the video and then separates foreground from background using some statistic, such as a median or mean model or a histogram model. In addition, the MoG and MoGG methods adopt a more refined statistical assumption, fitting each frame with a mixture distribution (for example, a Gaussian mixture), which yields a better separation effect. However, these methods ignore the structural information of the video, such as spatial continuity of the foreground and temporal similarity of the background. In contrast, subspace learning methods encode the video structure finely: by assuming that the video background has a low-rank structure, structural information such as spatial continuity of the foreground and temporal similarity of the background is integrated into the model, giving a near-ideal separation effect.
Although the subspace learning method has achieved remarkable results, a gap remains to practical application. Video data grows rapidly at every moment, so the separation technique must be highly efficient while guaranteeing high precision; on the other hand, the constant stream of new monitoring data demands a real-time online separation technique. Although some online separation methods exist, they cannot meet the double requirements of high precision and high efficiency.
In view of the deficiencies of the prior art, it is necessary to provide a technology for real-time online foreground and background separation of a continuously-appearing monitoring video with high precision and high efficiency, and particularly, the technology should be capable of effectively adapting to the dynamic foreground target type and the dynamic background environmental change existing in the video.
Disclosure of Invention
The invention aims to provide an online video foreground and background separation method based on regular error modeling, which can more fully and accurately utilize structural information of a video to perform statistical modeling, thereby achieving a separation effect with higher precision and ensuring high efficiency of processing.
In order to achieve the purpose, the technical scheme adopted by the invention comprises the following steps:
step S1: acquiring video data of a monitoring system on line;
step S2: constructing a model on the basis of the assumption of a low-rank structure of a video background, embedding an adaptive transformation factor variable into the model, and coding the dynamic change of the video background to realize the adaptive modeling of the real video dynamic background;
step S3: performing parametric distribution modeling based on the random variation of the video foreground target, so that the model adapts to the dynamic changes of the video foreground at different times and in different scenes, and further encoding the regularized noise information of the preceding frames and embedding it into the foreground model, realizing adaptive modeling of the real dynamic video foreground;
step S4: combining the step S2 with the step S3 to construct a complete statistical model for monitoring the separation of the foreground and the background of the video;
step S5: down-sampling the video data of step S1, and applying the foreground and background separation statistical model of step S4 to the sampled data to accelerate the solution;
step S6: performing TV (total variation) continuity modeling on the foreground object obtained in step S5;
step S7: and outputting the finally detected foreground object and background scene of the video according to the results obtained in the steps S5 and S6.
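Steps S1 to S7 above can be sketched end to end in a few lines. This is a hypothetical skeleton under simplifying assumptions (no affine alignment, no MoG weighting, no TV smoothing); all function and variable names are illustrative and not from the patent:

```python
import numpy as np

def separate_frame(x_t, U_prev):
    """One online pass of the pipeline sketch: downsample (S5), fit the
    low-rank background coefficient on the samples (S2/S4), and return
    the residual as foreground (S3) plus the background (S7)."""
    rng = np.random.default_rng(0)
    idx = rng.choice(x_t.size, x_t.size // 100, replace=False)  # S5: keep 1% of pixels
    v, *_ = np.linalg.lstsq(U_prev[idx], x_t[idx], rcond=None)  # coefficients from samples
    background = U_prev @ v                                     # S2: low-rank background
    foreground = x_t - background                               # S3/S4: residual
    return foreground, background

d, r = 2000, 3
rng = np.random.default_rng(1)
U = rng.standard_normal((d, r))         # background basis
x = U @ rng.standard_normal(r)          # a pure-background frame
fg, bg = separate_frame(x, U)
```

On a frame that lies exactly in the background subspace, the recovered foreground is (numerically) zero, which is the sanity check for the decomposition.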
Step S2 builds the model as follows: because the backgrounds of the frames of the video data are similar, this similarity is encoded by the following low-rank expression of the video image:
x_t = U_t v_t + ε_t    (1)

where x_t ∈ R^d denotes the t-th frame image of the surveillance video, U_t ∈ R^{d×r} is the current expression basis of the video background with r ≪ d (the subspaces spanned by these bases form a low-rank subspace of the original image space), v_t ∈ R^r is the combination coefficient, U_t v_t denotes the low-rank mapping of x_t in the subspace spanned by U_t, and ε_t denotes the residual;
An adaptive transformation factor variable is embedded into the model, i.e., model (1) is improved to:

x_t ∘ τ_t = U_t v_t + ε_t    (2)

where τ_t is the affine transformation operator variable of the image x_t, expressing video background transformations of rotation, translation, skew, and scale.
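The role of τ_t in Eq. (2) can be illustrated with a toy 1-D example. Here τ is a pure translation implemented with `np.roll` (the patent's τ is a general affine operator); the point is that aligning the frame before the low-rank fit shrinks the residual. All names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 400, 2
U = rng.standard_normal((d, r))          # background basis
x_aligned = U @ rng.standard_normal(r)   # background exactly in the subspace
x_shifted = np.roll(x_aligned, 5)        # camera jitter: background moved 5 pixels

def residual_norm(x, U):
    """Norm of the residual after projecting x onto the subspace of U."""
    v, *_ = np.linalg.lstsq(U, x, rcond=None)
    return np.linalg.norm(x - U @ v)

before = residual_norm(x_shifted, U)               # misaligned: large residual
after = residual_norm(np.roll(x_shifted, -5), U)   # x ∘ tau: realigned, tiny residual
```

Without the alignment the jittered background leaks into the residual (and would be misread as foreground); with it, the residual collapses back to numerical zero.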
The parametric distribution modeling of step S3 encodes the residual variable ε_t in model (2) as a mixed Gaussian (MoG) distribution, so that it adapts to the dynamic change of the video foreground, i.e. the video background residual, at different times and in different scenes; the corresponding model is:

(x_t ∘ τ_t)^i = u_t^i v_t + ε_t^i,  ε_t^i ~ Σ_{k=1}^K π_t^k N(0, (σ_t^k)^2),  z_t^i ~ Multi(π_t)    (3)

wherein x_t^i is the i-th pixel value of x_t, u_t^i is the i-th row of U_t, z_t^{ik} is a hidden variable indicating that the i-th pixel value of the t-th frame belongs to the k-th mixture component of the mixed Gaussian distribution, satisfying Σ_k z_t^{ik} = 1; Multi denotes the multinomial distribution, and (σ_t^k)^2 is the variance of the k-th mixture component;
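The posterior over the hidden variable z in the MoG model of Eq. (3) is computed as in a standard zero-mean mixture E-step. A minimal sketch follows, assuming two components (a small-variance "background noise" component and a large-variance "foreground" one); the component count and parameter values are illustrative, not fixed by the patent:

```python
import numpy as np

def responsibilities(eps, pi, sigma2):
    """gamma[i, k]: posterior probability that residual eps[i] was drawn
    from zero-mean Gaussian component k with weight pi[k], variance sigma2[k]."""
    logp = (np.log(pi) - 0.5 * np.log(2 * np.pi * sigma2)
            - eps[:, None] ** 2 / (2 * sigma2))    # log pi_k N(eps_i; 0, sigma2_k)
    logp -= logp.max(axis=1, keepdims=True)        # stabilize before exponentiating
    g = np.exp(logp)
    return g / g.sum(axis=1, keepdims=True)

eps = np.array([0.01, -0.02, 4.0, -5.0])           # two quiet, two big residuals
gamma = responsibilities(eps, pi=np.array([0.5, 0.5]),
                         sigma2=np.array([0.1, 10.0]))
```

Small residuals are assigned to the low-variance component and large residuals to the high-variance one, which is exactly how the mixture separates background noise from foreground pixels.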
To encode the regularized noise information of the preceding frames and realize adaptive modeling of the real dynamic video foreground, conjugate prior assumptions are respectively made on the noise distribution variables in model (3): an inverse-Gamma prior on each variance (σ_t^k)^2 and a Dirichlet prior on the mixing weights π_t, where Inv-Gamma denotes the inverse Gamma distribution and Dir denotes the Dirichlet distribution; the prior hyperparameters aggregate the membership degrees of the previous frame, that is, the degree to which the i-th pixel value of frame t−1 belongs to the k-th mixture component of the mixed Gaussian distribution, and the meanings of the remaining symbols are the same as in formula (3).
Step S4 constructs the complete statistical model from steps S2 and S3:

where P(v_t) denotes a Gaussian distribution with sufficiently large variance, π_t = (π_t^1, …, π_t^K) collects the mixing coefficients of the mixed Gaussian distribution of the t-th frame residual ε_t, Σ_t = ((σ_t^1)^2, …, (σ_t^K)^2) collects the variances of the mixture components, and τ_t denotes the adaptive transformation factor as defined by formula (6);
According to the maximum a posteriori estimation principle, with U_t = U_{t-1} fixed, the video foreground and background separation model derived from the statistical model can be converted into an optimization problem, which simplifies to:
where D_KL(·‖·) denotes the KL divergence and R(π_t, Σ_t) is a noise regularization term built from the KL divergence between the current mixture parameters and those of the previous frame: π_t are the mixing coefficients of the mixed Gaussian distribution of the t-th frame residual ε_t, Σ_t the variances of its mixture components, π_{t-1}, Σ_{t-1} the corresponding mixing-coefficient and variance vectors of frame t−1 as defined by formula (6), and C is a constant independent of π_t and Σ_t.
In step S5, before modeling and solving the t-th frame image, downsampling is performed to increase the solving speed: with Ω = {k_1, k_2, …, k_m | 1 ≤ k_j ≤ d, j = 1, 2, …, m} denoting the subscript set, the downsampled frame consists of the entries of x_t indexed by Ω.
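The downsampling step can be sketched directly: draw a random index set Ω of m = d/100 pixel positions and keep only those entries of the frame. Names (`omega`, `x_hat`) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 10000
x_t = rng.standard_normal(d)                           # one flattened frame
m = d // 100                                           # keep 1% of the pixels
omega = np.sort(rng.choice(d, size=m, replace=False))  # Omega = {k_1 < ... < k_m}
x_hat = x_t[omega]                                     # the downsampled frame
```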
In step S5, a first-order approximation of τ_t in (7) is taken with respect to Δτ_t, so that the subproblem degenerates into a weighted least-squares problem; solving the following model yields the updated result:

where π_t are the mixing coefficients of the mixed Gaussian distribution of the t-th frame residual ε_t, Σ_t the variances of its mixture components, τ_t the adaptive transformation factor variable, J is the Jacobian matrix of x ∘ τ with respect to τ, and u_i is the i-th row of the basis matrix U.
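A weighted least-squares step of this shape has a closed-form solution via the normal equations. The sketch below uses synthetic placeholders for the Jacobian, residual, and per-pixel MoG weights (the patent's exact weighting formula is given only in its drawings):

```python
import numpy as np

rng = np.random.default_rng(4)
d, p = 200, 6                        # pixel count, affine-parameter count
J = rng.standard_normal((d, p))      # Jacobian of the warped frame w.r.t. tau
e = rng.standard_normal(d)           # current residual
w = rng.uniform(0.5, 2.0, size=d)    # per-pixel weights (placeholder for MoG weights)

# Solve min_{dtau} sum_i w_i (e_i - J_i dtau)^2 via the normal equations.
A = J.T @ (w[:, None] * J)
b = J.T @ (w * e)
dtau = np.linalg.solve(A, b)         # Delta tau; then tau <- tau + Delta tau
```

At the optimum the weighted residual is orthogonal to the columns of J, which is the stationarity condition of the least-squares problem.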
Step S5 adopts the EM algorithm to update the parameters π_t, Σ_t, v_t of the online foreground and background separation model; the superscript s in the formulas below denotes the s-th iteration. The specific process comprises:
s7.2: and (3) giving an iteration format and a termination condition of the M steps in the EM algorithm:
the iteration format is:
wherein:
the iteration termination condition is as follows:
s7.3: setting an iteration initial value:
An initial subspace decomposition is obtained by applying the PCA method to the initial frame data, after which the parameters π_{t,0}, Σ_{t,0}, v_{t,0} are initialized with the MoG method;
S7.4: and (4) performing iterative operation of the expressions (8) to (13) until the termination condition expression (14) is met.
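Within each EM sweep, the coefficient v_t is refit by weighted least squares, with pixel i weighted by its total precision Σ_k γ_ik/σ_k² under the current responsibilities. The patent's exact update formulas (8) to (13) appear only as images, so this is a plausible sketch of the standard MoG low-rank update, not a transcription:

```python
import numpy as np

rng = np.random.default_rng(5)
d, r, K = 300, 3, 2
U = rng.standard_normal((d, r))
x = U @ rng.standard_normal(r) + 0.01 * rng.standard_normal(d)  # frame = background + noise
gamma = rng.dirichlet(np.ones(K), size=d)   # responsibilities (each row sums to 1)
sigma2 = np.array([0.05, 5.0])              # current component variances

w = (gamma / sigma2).sum(axis=1)            # per-pixel precision weights
A = U.T @ (w[:, None] * U)                  # weighted normal equations
v = np.linalg.solve(A, U.T @ (w * x))       # updated coefficient v_t
```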
In step S5, after the parameters π_t, Σ_t, v_t, τ_t have been updated for the t-th frame data, the background basis U_{t-1} is fine-tuned with the following model to obtain the updated U_t:
The model (15) has the following solution:
U_t is updated according to formulas (16)-(18), and the foreground is output.
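The patent's closed-form basis updates (16) to (18) are given only as images; as a hedged substitute, the fine-tuning idea can be sketched with one weighted gradient step on the squared residual, U ← U + η (w ⊙ (x − Uv)) vᵀ, which nudges the basis toward the new frame. Step size and weights are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
d, r = 300, 3
U = rng.standard_normal((d, r))          # previous basis U_{t-1}
v = rng.standard_normal(r)
x = U @ v + rng.standard_normal(d)       # frame imperfectly explained by U v
w = np.ones(d)                           # uniform pixel weights for the sketch

def loss(U):
    """Weighted squared residual of the low-rank fit."""
    return 0.5 * np.sum(w * (x - U @ v) ** 2)

eta = 1e-3                               # small step size
U_new = U + eta * (w * (x - U @ v))[:, None] @ v[None, :]   # rank-one correction
```

A single small step strictly decreases the fit loss, which is all the online update needs per frame.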
In step S6, a TV-norm model is established using the background continuity characteristics of the surveillance video as follows:

where ‖·‖_TV denotes the TV norm, F̂ is the output result of equation (19), and λ is set according to the maximum variance; the above optimization problem is converted into:
s.t. F = Z_i, i = 1, 2    (21)

where Z_1, Z_2 ∈ R^{m×n} and S_i(·) is defined as follows:
solving the TV norm model of formula (21) by using an alternative direction multiplier method ADMM:
S9.1: Give the augmented Lagrangian function of problem (21):
S9.2: establishing an iteration format and a termination condition of an alternating direction multiplier method:
the iteration format is:
where ρ is a positive number greater than 1; here ρ = 1.5;
the iteration termination condition is as follows:
s9.3: solving the steps (22) and (23) to give an iterative specific formula;
S9.4: Perform iterative operations (22) to (24) until the termination condition (25) is satisfied.
In step S7, the foreground FG_t is finally output, and the background is BG_t = x_t − FG_t, where FG_t and BG_t respectively denote the foreground and background of the t-th frame data.
By performing targeted analysis and encoding of the foreground and background of surveillance video data, the invention realizes an online surveillance-video foreground and background separation model and method with high speed and high precision, which is of important practical significance for detection, tracking, recognition, and analysis of surveillance-video targets.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 shows frame data of a part of videos in the Li Datasets data set.
Fig. 3 illustrates the effect of downsampling in step S5: the first row shows original images from the Li Datasets, and the second row the result of downsampling to 1%.
Fig. 4 shows the video separation effect of the present invention: the first column shows original images from the Li Datasets data set, the second column the ground-truth labels of the pre-marked foreground, the third column the foreground separated in step S5, and the fourth column the foreground separated in step S7.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Example 1: the Li Datasets data set (https://fling.seas.upenn.edu/~xiaowz/dynamic/wordpress/demolor/) is adopted as the computer-simulation experimental object of the invention (see the first column of Fig. 4). See the table below for the foreground and background separation speed of the invention on each video in the Li Datasets, where FPS denotes the number of frames processed per second:
the data set contains 9 surveillance video data sets, including static background video, background video changing with light conditions, dynamic background video, and where part of the data has a real label of the pre-marked foreground (see the second column of fig. 4), and the frame data of part of the video is shown in fig. 2. The present invention extracts 200 frames of data from each of various videos to perform experiments. The process is shown in figure 1:
step S1, Li Datasets video data are obtained;
step S2: construct a low-rank decomposition model for the t-th frame data following the basic principle of foreground and background separation, fitting the noise with a Gaussian mixture of three components; the specific expression is:
x_t = U_t v_t + ε_t    (26)

where x_t ∈ R^d denotes the t-th frame image of the surveillance video, U_t ∈ R^{d×r} is the current expression basis of the video background with r ≪ d (the subspaces spanned by these bases form a low-rank subspace of the original image space), v_t ∈ R^r is the combination coefficient, U_t v_t denotes the low-rank mapping of x_t in the subspace spanned by U_t, and ε_t denotes the residual;
An adaptive transformation factor variable is embedded into the model, i.e., model (26) is refined to:

x_t ∘ τ_t = U_t v_t + ε_t    (27)

where τ_t is the affine transformation operator variable of the image x_t, expressing video background transformations of rotation, translation, skew, and scale.
Step S3: according to the mixed Gaussian distribution hypothesis on the noise, we have

(x_t ∘ τ_t)^i = u_t^i v_t + ε_t^i,  ε_t^i ~ Σ_{k=1}^3 π_t^k N(0, (σ_t^k)^2),  z_t^i ~ Multi(π_t)    (28)

wherein x_t^i is the i-th pixel value of x_t, u_t^i is the i-th row of U_t, z_t^{ik} is a hidden variable indicating that the i-th pixel value of frame t belongs to the k-th mixture component of the mixed Gaussian distribution, satisfying Σ_k z_t^{ik} = 1; Multi denotes the multinomial distribution, and (σ_t^k)^2 is the variance of the k-th mixture component;
According to the similarity between the backgrounds of consecutive frames of the surveillance video, the following prior distributions can be assumed: an inverse-Gamma prior on each variance (σ_t^k)^2 and a Dirichlet prior on the mixing weights π_t, where Inv-Gamma denotes the inverse Gamma distribution and Dir denotes the Dirichlet distribution; the prior hyperparameters aggregate the membership degrees of the previous frame, that is, the degree to which the i-th pixel value of frame t−1 belongs to the k-th mixture component of the mixed Gaussian distribution, and the meanings of the remaining symbols are the same as in (28).
Step S4: combining step S2 and step S3 with the maximum a posteriori estimation method yields the following separation optimization problem for the foreground and background of the surveillance video, which simplifies to:

where π_t are the mixing coefficients of the mixed Gaussian distribution of the t-th frame residual ε_t, Σ_t the variances of its mixture components, π_{t-1}, Σ_{t-1} the corresponding mixing-coefficient and variance vectors of frame t−1 as defined in formula (29), and c is a constant independent of π_t and Σ_t.
Step S5 (see Fig. 3): based on the video data input in step S1, apply the maximum a posteriori model of step S4 to construct the video foreground and background separation solving algorithm:
A. Apply the PCA algorithm to the first 50 frames of data to obtain the initial subspace U and coefficients v, and then initialize the parameters with the following MoG algorithm (Algorithm 1):
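Step A above can be sketched via a truncated SVD, which is the usual way to realize PCA initialization of a subspace: the top-r left singular vectors of the first-50-frame matrix become the initial background basis. Sizes and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
d, n, r = 500, 50, 3
B = rng.standard_normal((d, r)) @ rng.standard_normal((r, n))  # rank-r "video" matrix
X = B + 0.01 * rng.standard_normal((d, n))                     # first 50 frames + small noise

Uf, s, Vt = np.linalg.svd(X, full_matrices=False)
U0 = Uf[:, :r]                    # initial orthonormal background basis
V0 = np.diag(s[:r]) @ Vt[:r]      # initial coefficients, so that X ≈ U0 V0
```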
B. Downsample the t-th frame data x_t to 1%, obtaining the sampled frame, where Ω denotes the subscript set, i.e.

Ω = {k_1, k_2, …, k_m | 1 ≤ k_j ≤ d, j = 1, 2, …, m}
C. Fix U = U_{t-1} and update π_t, Σ_t, v_t with the EM algorithm; the iteration format (the superscript s denotes the iteration count) is as follows:
e-step:
m-step:
D. and (3) iteration termination conditions:
E. After updating π_t, Σ_t, v_t as above, update U_t: only part of the elements of U need fine-tuning, and the specific optimization model is:
The model (30) has the following explicit solution:
Step S6: on the foreground object obtained in step S5, establish a TV-norm model using the background continuity characteristics of the surveillance video as follows:

where ‖·‖_TV denotes the TV norm, F̂ is the foreground output in step S5, and λ is set according to the maximum variance. The above optimization problem can be converted into:
s.t. F = Z_i, i = 1, 2

where Z_1, Z_2 ∈ R^{m×n} and S_i(·) is defined as follows:
the process of solving uses the following iterative format:
where ρ is a positive number greater than 1, here taken to be 1.5.
A. The iterate (32) is obtained by solving the following problem:
the problem (35) has the following optimal solution:
B. The following problem is then solved:
The problem (36) can be decomposed into the following two subproblems (i = 1, 2) and solved separately:
which is equivalent to

where v_1, v_2, …, v_n denote the column vectors of Z_i and t_1, t_2, …, t_n the column vectors of T_i; the problem (37) can then be decomposed into the following n subproblems and solved separately:
the problem (38) can be solved using the following 1D TV denoising algorithm:
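The patent's specific 1-D TV denoising routine appears only as an image, so a generic ADMM solver for min_f ½‖f − y‖² + λ‖Df‖₁ (D the first-difference operator) is sketched instead, with soft-thresholding for the z-step; it is a standard stand-in, not necessarily the algorithm the patent uses:

```python
import numpy as np

def soft(a, t):
    """Elementwise soft-thresholding, the prox of t*|.|_1."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def tv_denoise_1d(y, lam=1.0, rho=1.0, iters=300):
    """ADMM for min_f 0.5||f - y||^2 + lam*||Df||_1 with D = first difference."""
    n = y.size
    D = np.diff(np.eye(n), axis=0)                 # (n-1) x n difference matrix
    Ainv = np.linalg.inv(np.eye(n) + rho * D.T @ D)
    z = np.zeros(n - 1)
    u = np.zeros(n - 1)                            # scaled dual variable
    for _ in range(iters):
        f = Ainv @ (y + rho * D.T @ (z - u))       # quadratic f-step
        z = soft(D @ f + u, lam / rho)             # shrinkage z-step
        u += D @ f - z                             # dual ascent
    return f

rng = np.random.default_rng(8)
step = np.concatenate([np.zeros(50), np.ones(50)])  # piecewise-constant truth
y = step + 0.3 * rng.standard_normal(100)           # noisy observation
f = tv_denoise_1d(y, lam=1.0)

def tv(x):
    return np.abs(np.diff(x)).sum()                 # total variation of a signal
```

On a noisy step signal the denoised output has far lower total variation than the input, which is the behavior the foreground-smoothing step relies on.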
step S7: final output FGt(solution of formula (31) in step S6), background BGt=xt-FGt(see the fourth column of FIG. 4).
Claims (3)
1. An online video foreground and background separation method based on regular error modeling is characterized by comprising the following steps:
step S1: acquiring video data of a monitoring system on line;
step S2: constructing a model on the basis of the assumption of a low-rank structure of a video background, embedding an adaptive transformation factor variable into the model, and coding the dynamic change of the video background to realize the adaptive modeling of the real video dynamic background;
step S3: performing parametric distribution modeling based on the random variation of the video foreground target, so that the model adapts to the dynamic changes of the video foreground at different times and in different scenes, and further encoding the regularized noise information of the preceding frames and embedding it into the foreground model, realizing adaptive modeling of the real dynamic video foreground;
step S4: combining the step S2 with the step S3 to construct a complete statistical model for monitoring the separation of the foreground and the background of the video;
step S5: down-sampling the video data of step S1, applying the foreground and background separation statistical model of step S4 to the sampled data to construct and accelerate the video foreground and background separation solving algorithm, and updating on the sampled data with the MoG initialization algorithm and the EM algorithm to finally obtain the foreground target;
step S6: on top of the foreground object obtained in step S5, TV continuity modeling is performed thereon;
step S7: and outputting the finally detected foreground object and background scene of the video according to the results obtained in the steps S5 and S6.
2. The online video foreground-background separation method based on regular error modeling according to claim 1, wherein: the step S2 builds a model: due to the similarity of the background corresponding to each frame of image of the video data, the similarity is encoded by performing the following low-rank expression on the video image:
xt=Utvt+εt(1)
wherein x_t ∈ R^d denotes the t-th frame image of the surveillance video, and U_t ∈ R^{d×r} is the current expression basis of the video background with r ≪ d; the subspaces spanned by these bases form a low-rank subspace of the original image space; v_t ∈ R^r is the combination coefficient, U_t v_t denotes the low-rank mapping of x_t in subspace U_t, and ε_t denotes the residual;
embedding an adaptive transform factor variable in the model, i.e. improving the model (1) to:
x_t ∘ τ_t = U_t v_t + ε_t    (2)
wherein τ_t is the affine transformation operator variable of the image x_t, expressing video background transformations of rotation, translation, skew, and scale scaling.
3. The online video foreground-background separation method based on regular error modeling according to claim 2, characterized in that: in step S7, the foreground FG_t is finally output and the background is BG_t = x_t − FG_t, where FG_t and BG_t respectively denote the foreground and background of the t-th frame image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611252353.6A CN106815854B (en) | 2016-12-30 | 2016-12-30 | On-line video foreground and background separation method based on regular error modeling |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611252353.6A CN106815854B (en) | 2016-12-30 | 2016-12-30 | On-line video foreground and background separation method based on regular error modeling |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106815854A CN106815854A (en) | 2017-06-09 |
CN106815854B true CN106815854B (en) | 2020-05-15 |
Family
ID=59109330
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611252353.6A Active CN106815854B (en) | 2016-12-30 | 2016-12-30 | On-line video foreground and background separation method based on regular error modeling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106815854B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107346547B (en) * | 2017-07-04 | 2020-09-04 | 易视腾科技股份有限公司 | Monocular platform-based real-time foreground extraction method and device |
CN108846804B (en) * | 2018-04-23 | 2022-04-01 | 杭州电子科技大学 | Deblurring method based on line graph and column graph model |
CN108933703B (en) * | 2018-08-14 | 2020-06-02 | 西安交通大学 | Environment self-adaptive perception wireless communication channel estimation method based on error modeling |
CN109150775B (en) * | 2018-08-14 | 2020-03-17 | 西安交通大学 | Robust online channel state estimation method for dynamic change of self-adaptive noise environment |
CN110018529B (en) * | 2019-02-22 | 2021-08-17 | 南方科技大学 | Rainfall measurement method, rainfall measurement device, computer equipment and storage medium |
CN112734791B (en) * | 2021-01-18 | 2022-11-29 | 烟台南山学院 | On-line video foreground and background separation method based on regular error modeling |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101371273A (en) * | 2005-12-30 | 2009-02-18 | 意大利电信股份公司 | Video sequence partition |
US8565525B2 (en) * | 2005-12-30 | 2013-10-22 | Telecom Italia S.P.A. | Edge comparison in segmentation of video sequences |
CN100583126C (en) * | 2008-01-14 | 2010-01-20 | 浙江大学 | A video foreground extracting method under conditions of view angle variety based on fast image registration |
CN105761251A (en) * | 2016-02-02 | 2016-07-13 | 天津大学 | Separation method of foreground and background of video based on low rank and structure sparseness |
CN106204477B (en) * | 2016-07-06 | 2019-05-31 | 天津大学 | Video frequency sequence background restoration methods based on online low-rank background modeling |
- 2016-12-30: application CN201611252353.6A granted as CN106815854B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN106815854A (en) | 2017-06-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||