CN102542571B - Moving target detecting method and device - Google Patents


Info

Publication number
CN102542571B
CN102542571B (application CN201010608739.2A)
Authority
CN
China
Prior art keywords
video block
vector
constructing
feature
dimensional gabor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201010608739.2A
Other languages
Chinese (zh)
Other versions
CN102542571A (en)
Inventor
黄光彬
陈健平
刘之富
黄凯峰
陈俊贤
林倞
胡赟
梁小丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Group Guangdong Co Ltd
Original Assignee
China Mobile Group Guangdong Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Group Guangdong Co Ltd filed Critical China Mobile Group Guangdong Co Ltd
Priority to CN201010608739.2A priority Critical patent/CN102542571B/en
Publication of CN102542571A publication Critical patent/CN102542571A/en
Application granted granted Critical
Publication of CN102542571B publication Critical patent/CN102542571B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a moving target detecting method and device. The method comprises the following steps of: extracting a plurality of video blocks from continuous multiframe images, wherein each video block comprises pixel blocks at the same positions in the continuous multiframe images; extracting the structural features of the video blocks; extracting the texture features of the video blocks; combining the structural feature and the texture feature of each video block to form a feature vector of the video block; combining the feature vectors of the plurality of video blocks to form a background feature matrix; constructing an equation for linearly expressing the feature vectors of the video blocks to be detected by using the feature matrix, and solving the minimum residual error solution of the equation to obtain a solution vector; and judging the video blocks to be detected as moving targets when a residual error corresponding to the solution vector is larger than a first threshold value or the sparsity of the solution vector is smaller than a second threshold value. By using the moving target detecting method and device, the accuracy in detecting the moving targets can be improved.

Description

Moving target detection method and device
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a moving target detection method and device.
Background
The moving object detection algorithm is the most basic and important part in video intelligent analysis, and mainly refers to a technology for automatically detecting a continuous moving object in a video frame sequence from a background by defining a corresponding mathematical model and a segmentation algorithm.
When detecting a moving target, the content of the image needs to be understood. The main approach is to extract structural features and texture features of the image and use them to represent the image. Structural-feature methods mainly extract the basic outline and structural information of an image; commonly used methods include Gabor filters, Primal Sketch filters, and the like. By arranging filters of different directions, sizes and shapes, the overall structure of the image is described as completely as possible.
Texture-feature methods mainly extract texture regions of an image that exhibit regular variation; common methods describe them with statistical information such as color histograms, Local Binary Pattern (LBP) features, and histograms of oriented gradients (HOG).
One moving object detection algorithm of the prior art is: the method comprises the steps of extracting pixel blocks on a single image frame by utilizing the spatial correlation of a background, carrying out feature description on the extracted pixel blocks, and judging a moving target and the background according to the difference of the feature description, wherein the pixel blocks with a large difference value of the features are a moving target area, and the pixel blocks with a small difference value of the features are a background area.
In the process of implementing the invention, the inventor finds that the prior art has at least the following problems:
for pixel-block-based background modeling algorithms, real scenes contain many complex background conditions, such as sudden illumination changes and rain or fog, so a complete background image cannot be obtained from single-frame pixel-block analysis. In addition, because of illumination shadows, specular reflections and other changes in the scene, the accuracy of moving object detection methods based on such background modeling algorithms is low.
Disclosure of Invention
The invention aims to provide a moving target detection method and a moving target detection device, which solve the problem of low accuracy of moving target detection in the prior art.
In order to solve the above problems, the present invention provides the following technical solutions:
a moving object detection method, comprising:
extracting a plurality of video blocks from a continuous multi-frame image, wherein each video block consists of pixel blocks at the same position in the continuous multi-frame image;
extracting the structural features of the video block;
extracting texture features of the video block;
combining the structural features and the texture features of each video block into a feature vector of the video block;
combining the feature vectors of the plurality of video blocks into a feature matrix of the background;
constructing an equation for linearly expressing the characteristic vector of the video block to be detected by using the characteristic matrix, and solving a minimum residual solution of the equation to obtain a solution vector;
and when the residual error corresponding to the solution vector is larger than a first threshold value, or the sparsity of the solution vector is smaller than a second threshold value, determining the video block to be detected as a moving target.
In the above moving object detection method, the extracting structural features of the video block includes:
constructing two-dimensional Gabor filters in multiple directions;
vertically moving or keeping each two-dimensional Gabor filter still to obtain a plurality of two-dimensional Gabor filters;
solving the linear response of the video block to the plurality of two-dimensional Gabor filters to obtain a linear response vector;
constructing the linear response vector as a structural feature of the video block.
The above moving object detection method, wherein the constructing the linear response vector as the structural feature of the video block comprises:
selecting a preset number of elements from the linear response vector in descending order of magnitude, setting the selected elements to 1 and the remaining elements to 0, and constructing the updated linear response vector as the structural feature of the video block.
In the above moving object detecting method, the predetermined number is 3.
In the above moving object detection method, the extracting texture features of the video block includes:
constructing two-dimensional Gabor filters in multiple directions;
horizontally moving or keeping each two-dimensional Gabor filter still to obtain a plurality of two-dimensional Gabor filters;
obtaining convolution response of the video block to the plurality of two-dimensional Gabor filters to obtain a convolution response vector;
constructing a statistical histogram of the convolution response vector as a texture feature of the video block.
The above moving object detection method, wherein the constructing the statistical histogram of the convolution response vector as the texture feature of the video block comprises:
and carrying out normalization processing on the statistical histogram to obtain a mean histogram, and constructing the mean histogram as the texture feature of the video block.
In the above moving object detection method, the minimized residual solution of the equation is obtained by minimizing the 1-norm.
In the above moving object detection method, the sparsity of the solution vector is calculated according to the following formula:
wherein ŵ is the solution vector, k is the dimension of ŵ, w̃ is the vector obtained after removing the zero elements from ŵ, ||w̃||_1 is the 1-norm of w̃, and ||ŵ||_1 is the 1-norm of ŵ.
A moving object detecting apparatus comprising:
the video block extraction module is used for extracting a plurality of video blocks from continuous multi-frame images, and each video block consists of pixel blocks at the same position in the continuous multi-frame images;
the structural feature extraction module is used for extracting the structural features of the video block;
the texture feature extraction module is used for extracting the texture features of the video block;
the characteristic vector constructing module is used for combining the structural characteristic and the texture characteristic of each video block into a characteristic vector of the video block;
the characteristic matrix constructing module is used for combining the characteristic vectors of the video blocks into a characteristic matrix of the background;
the solving module is used for constructing an equation for linearly expressing the characteristic vector of the video block to be detected by using the characteristic matrix and solving the minimized residual solution of the equation to obtain a solution vector;
and the judging module is used for judging the video block to be detected as a moving target when the residual error corresponding to the solution vector is larger than a first threshold value or the sparsity of the solution vector is smaller than a second threshold value.
The moving object detection device described above, wherein the structural feature extraction module is further configured to:
constructing two-dimensional Gabor filters in multiple directions;
vertically moving or keeping each two-dimensional Gabor filter still to obtain a plurality of two-dimensional Gabor filters;
solving the linear response of the video block to the plurality of two-dimensional Gabor filters to obtain a linear response vector;
constructing the linear response vector as a structural feature of the video block.
The moving object detection device described above, wherein the texture feature extraction module is further configured to:
constructing two-dimensional Gabor filters in multiple directions;
horizontally moving or keeping each two-dimensional Gabor filter still to obtain a plurality of two-dimensional Gabor filters;
obtaining convolution response of the video block to the plurality of two-dimensional Gabor filters to obtain a convolution response vector;
constructing a statistical histogram of the convolution response vector as a texture feature of the video block.
The moving object detecting device described above, wherein the solving module is further configured to: the minimized residual solution of the equation is solved by minimizing the 1-norm.
Compared with prior-art algorithms that perform background modeling on pixel blocks, the invention combines the spatial and temporal correlation of images (the background of an image frame sequence has a certain stability over a period of time) and performs background modeling after extracting video block primitives from a multi-frame image sequence (the pixel blocks at the same position in the sequence form one video block primitive). This avoids the problem that complex scene changes such as rain, fog and lighting cause large background changes in a single frame, and therefore the accuracy of moving target detection can be improved.
Drawings
FIG. 1 is a flow chart of a moving object detection method according to an embodiment of the invention;
FIG. 2 is a diagram illustrating a comparison of video blocks and pixel blocks according to an embodiment of the present invention;
fig. 3 is a structural diagram of a moving object detecting device according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a scheme for modeling the background after extracting the video block elements of a multi-frame image sequence by combining the spatial correlation and the time correlation of the image (the background of the image frame sequence under a period of time sequence has certain stability), which can improve the accuracy of moving target detection in complex scenes such as rainy and foggy weather, sudden change of illumination, shadow reflection and the like, thereby providing better pretreatment for subsequent intelligent analysis and application.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
Referring to fig. 1, the moving object detection method of the embodiment of the present invention mainly includes the following steps:
step 101: extracting a plurality of video blocks from a continuous multi-frame image, wherein each video block consists of pixel blocks at the same position in the continuous multi-frame image;
A sequence of image frames is extracted from the video. Each new input image frame I_n (n = 1, 2, ...), of size W × H, is divided into N = (W × H)/(h × h) pixel blocks P_{i,n}, where i is the pixel block index and h is the height and width of a pixel block. Referring to fig. 2, each pixel block P_{i,n} is combined with the pixel blocks at the same position in the previous t−1 image frames into one video block primitive, defined as B_i = {B_{i,1}, B_{i,2}, ..., B_{i,t}}. Subsequent image feature extraction and moving object detection are performed on the video block primitives B_i.
The above is only one implementation of video block primitive extraction. In a specific implementation, if the calculation capability allows, a video block may be extracted for each pixel point in the current image frame, or a video block may be extracted for every several pixel points. The video block is composed of t pixel blocks at the same position, and the central point of the pixel block is the pixel point. For the edge pixel point, a part of the pixel block corresponding to the edge pixel point is located outside the image, and at this time, the existing method can be adopted for processing, for example, the value of the pixel point located outside the image is directly set to zero, or the value of the pixel point of the edge region of the image is copied to the pixel point located outside the image.
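As an illustrative sketch of the grid-based video block primitive extraction of step 101 (function and variable names here are ours, not the patent's):

```python
import numpy as np

def extract_video_blocks(frames, h):
    """frames: list of t grayscale frames, each an (H, W) array.
    Returns video block primitives of shape (N, h, h, t), where
    N = (H // h) * (W // h) -- one primitive per grid position."""
    t = len(frames)
    H, W = frames[0].shape
    primitives = []
    for y in range(0, H - h + 1, h):
        for x in range(0, W - h + 1, h):
            # pixel blocks at the same position across all t frames
            primitives.append(
                np.stack([f[y:y + h, x:x + h] for f in frames], axis=-1))
    return np.array(primitives)
```

The per-pixel variant described above would simply slide the window by one pixel (with border padding) instead of stepping by h.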
Step 102: extracting the structural features of the video block;
various methods can be adopted for extracting the structural features, and the method for extracting the structural features of the video block by adopting a space-time Gabor filter is introduced as follows, which specifically comprises the following steps:
constructing two-dimensional Gabor filters in multiple directions;
vertically moving or keeping each two-dimensional Gabor filter still to obtain a plurality of two-dimensional Gabor filters;
solving the linear response of the video block to the plurality of two-dimensional Gabor filters to obtain a linear response vector;
constructing the linear response vector as a structural feature of the video block.
Examples are as follows:
Assume the size of video block B is 15 × 5. Spatio-temporal Gabor filtering yields the following subspace of Gabor basis functions:

Ω_s = { B : B = Σ_{i=1}^{N} α_i G_i + ε }

where ε is the residual vector and G_i (i = 1, 2, ..., N) are spatio-temporal Gabor filters, constructed as follows:
firstly, constructing two-dimensional Gabor filters with the size of 13 × 13 in 8 directions;
then, each filter is shifted by 0, ±2, ±4, ±6 or ±8 pixels in the direction perpendicular to its orientation, giving N = 8 × 9 = 72 Gabor basis functions.
The linear response vector of video block B to the spatio-temporal Gabor filter bank is α = (α_i, i = 1, 2, ..., 72)^T. Experiments show that each video block B can be represented reasonably well by at most 3 Gabor basis functions; therefore, for the subsequent sparse representation model, the linear response vector is updated by setting its 3 largest elements to 1 and all remaining elements to 0.
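A minimal sketch of this structural feature extraction: a small bank of two-dimensional Gabor kernels, linear responses of the block to each kernel, and the top-3 binarization described above. The kernel parameters and helper names are illustrative assumptions, not the patent's:

```python
import numpy as np

def gabor_kernel(size, theta, sigma=4.0, lam=8.0):
    """A minimal real-valued 2-D Gabor kernel (illustrative parameters)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return (np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
            * np.cos(2 * np.pi * xr / lam))

def structural_feature(block, kernels, keep=3):
    """Linear responses of a block slice to each basis filter; the `keep`
    largest-magnitude responses are set to 1, the rest to 0."""
    responses = np.array([np.sum(block * k) for k in kernels])
    top = np.argsort(np.abs(responses))[-keep:]
    feat = np.zeros(len(kernels))
    feat[top] = 1.0
    return feat
```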
step 103: extracting texture features of the video block;
various methods can be adopted for extracting texture features, and the method for extracting texture features of video blocks by adopting a Gabor response statistical histogram specifically comprises the following steps:
constructing two-dimensional Gabor filters in multiple directions;
horizontally moving or keeping each two-dimensional Gabor filter still to obtain a plurality of two-dimensional Gabor filters;
obtaining convolution response of the video block to the plurality of two-dimensional Gabor filters to obtain a convolution response vector;
constructing a statistical histogram of the convolution response vector as a texture feature of the video block.
Examples are as follows:
Still assuming the size of video block B is 15 × 5, the subspace represented by the statistical histogram of Gabor responses for video block B is:

Ω_t = { B : H_i(&lt;B, F_i&gt;) = H_i* + ε, i = 1, 2, ..., N }
where ε is the residual vector, and the F_i (i = 1, 2, ..., N) are obtained by taking two-dimensional Gabor filters of size 7 × 7 in 8 orientations and shifting each by 3 pixels in 8 directions or leaving it unshifted, giving N = 8 × 9 = 72 two-dimensional Gabor filters.
&lt;B, F_i&gt; denotes the normalized result of the convolution response of video block B to F_i, H_i(&lt;B, F_i&gt;) is the histogram of the filter responses over the pixels, and H_i* is the mean histogram.
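The texture descriptor can be sketched as follows: convolve the block slice with a filter bank and build a normalized histogram of the responses. The naive convolution and all parameter values below are illustrative assumptions:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive 'valid'-mode 2-D convolution (adequate for small sizes)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * kernel)
    return out

def texture_feature(frame_slice, kernels, bins=8):
    """Normalized statistical histogram of the responses of a block slice
    to a bank of filters, used as the texture descriptor."""
    responses = np.concatenate(
        [conv2d_valid(frame_slice, k).ravel() for k in kernels])
    hist, _ = np.histogram(responses, bins=bins)
    return hist / hist.sum()  # normalization -> a probability histogram
```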
Step 104: combining the structural features and the texture features of each video block into a feature vector of the video block;
step 105: combining the feature vectors of the plurality of video blocks into a feature matrix of the background;
at initialization, since it is not known which video blocks are background and which video blocks are moving objects, the feature vectors of all video blocks are combined into a feature matrix (initial background model) of the background. After the moving object detection of the subsequent steps is performed, the feature vector of the video block determined as the moving object can be removed from the feature matrix, so that the dynamic update of the background model is realized.
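Steps 104 and 105 amount to simple concatenation and stacking; a sketch (the helper name is ours):

```python
import numpy as np

def build_background_matrix(struct_feats, texture_feats):
    """Concatenate each video block's structural and texture features into
    one feature vector, then stack these vectors as the columns of the
    background feature matrix A. At initialization every block contributes."""
    columns = [np.concatenate([s, t])
               for s, t in zip(struct_feats, texture_feats)]
    return np.column_stack(columns)
```

Columns later judged to be moving targets can be dropped with np.delete(A, j, axis=1), realizing the dynamic background update described above.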
Step 106: constructing an equation for linearly expressing the characteristic vector of the video block to be detected by using the characteristic matrix, and solving a minimum residual solution of the equation to obtain a solution vector;
the feature vector of the video block to be detected is P, the feature matrix of the background is A, and the following linear expression equation is constructed:
P=Aw+e
wherein w is the linear expression vector, e ∈ R^k is the residual vector (a sparse vector), and k is the dimension of w, i.e. the number of feature vectors contained in the feature matrix.
It should be noted that solving the above equation exactly is an NP-hard problem, so an optimal solution cannot be obtained directly; instead, a predetermined algorithm can produce several satisfactory solutions, from which the one with the minimum residual is selected as the solution vector.
In recent years, sparse representation models have been widely used in signal and image analysis. Such a model describes images and signals as linear combinations of a small number of elements from a basis or dictionary. Minimizing the 1-norm (l1-norm) is the most effective way to solve for a sparse representation: in the general case, a sparse solution of a system of linear equations can be obtained robustly and accurately by solving the convex optimization problem of minimizing the l1-norm.
For example, "Stochastic methods for l published in the International machine learning Association (ICML) in 20091Regularized Loss Minimization' proposes a solution l1An efficient method of minimizing problems. For the efficiency of the algorithm, it is desirable to describe each image block with as few features as possible and relatively accurately and completely. The prior moving object detection algorithm needs to store a large amount of feature vectors to establish a background model, and then the moving object is extracted by comparing feature vector difference values among multiple frames. And obtaining sparse representation of the background model by adopting a sparse expression model and passing through1The minimized residual error is accurately segmented, and the problems of time efficiency, accuracy and the like can be well improved.
Specifically, the above equation is solved by minimizing the l0-norm of the residual to obtain the w with the sparsest residual:

ŵ = argmin_w ||P − Aw||_0

If e is sufficiently sparse, this l0 minimization is equivalent to solving the l1 minimization

ŵ = argmin_w ||P − Aw||_1

which is a convex optimization problem that can be solved efficiently by linear programming, and whose solution is unique. The residual corresponding to the solution is e = P − Aŵ.
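A dedicated l1 solver (e.g. linear programming) would normally be used here; as a self-contained illustration, iteratively reweighted least squares approximates the l1-minimized-residual solution. This is a stand-in sketch, not the patent's algorithm:

```python
import numpy as np

def min_l1_residual(A, p, iters=100, eps=1e-8):
    """Approximate w = argmin_w ||p - A w||_1 by iteratively reweighted
    least squares: weights ~ 1/|r| turn the weighted 2-norm of the
    residual r = p - A w into an approximate 1-norm."""
    w = np.linalg.lstsq(A, p, rcond=None)[0]
    for _ in range(iters):
        r = p - A @ w
        sw = 1.0 / np.sqrt(np.maximum(np.abs(r), eps))  # sqrt of weights
        w = np.linalg.lstsq(A * sw[:, None], sw * p, rcond=None)[0]
    return w
```

Because the l1 objective is robust to sparse gross errors, a background block corrupted by a few outlier observations is still reconstructed almost exactly from the background matrix.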
step 107: and when the residual error corresponding to the solution vector is larger than a first threshold value, or the sparsity of the solution vector is smaller than a second threshold value, determining the video block to be detected as a moving target.
After the solution vector is obtained, the residual corresponding to it can be calculated. If the residual is greater than the first threshold, the video block P is a moving object; if the residual is less than the first threshold, the video block P lies in the same space as the background model A and is judged to be background.
Experiments show that even for the video blocks that differ most from the background, the residual can be small. Therefore, to further improve the accuracy of moving object detection, the invention also provides the following judgment method:
First, the sparsity of each of the several satisfactory solutions obtained in step 106 is computed:
wherein ŵ is the solution vector, k is the dimension of ŵ, w̃ is the vector obtained after removing the zero elements from ŵ, ||w̃||_1 is the 1-norm of w̃, and ||ŵ||_1 is the 1-norm of ŵ.
Then, the maximum of the sparsity values corresponding to the several satisfactory solutions is taken and defined as the maximum sparsity. If the maximum sparsity is greater than or equal to the second threshold, the video block is judged to be background; if it is smaller than the second threshold, the video block is judged to be a moving target.
Of course, only one sparsity, for example, the sparsity corresponding to the minimum residual solution, may be calculated, and the moving object may be determined by using the above method.
One method of characterizing the sparsity of a vector is given above, and other methods known in the art may be used to characterize the sparsity of a vector.
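A sketch of the step-107 decision rule follows. Because the patent's exact sparsity formula (in terms of k, ||w̃||_1 and ||ŵ||_1) is not reproduced here, the measure below — the fraction of coefficients that are negligible — is only an illustrative stand-in:

```python
import numpy as np

def is_moving_target(A, p, w, res_thresh, sparsity_thresh, tol=1e-3):
    """Foreground when the reconstruction residual is large OR the
    solution is not concentrated on a few background atoms. The sparsity
    measure here is an illustrative choice, not the patent's formula."""
    residual = np.abs(p - A @ w).sum()
    significant = np.abs(w) > tol * np.abs(w).max()
    sparsity = 1.0 - significant.sum() / w.size  # high when few atoms matter
    return residual > res_thresh or sparsity < sparsity_thresh
```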
In addition, after the moving object is detected, the embodiment of the present invention may further remove an unreasonable moving object through two-dimensional calibration information, which is specifically as follows:
the two-dimensional calibration mainly comprises the steps of calibrating the height and the position of a person in a scene and calibrating the surface in a video, a moving target can move in a continuous time sequence only by depending on a supporting plane or the ground according to prior knowledge, a parameter model of a camera can be obtained by depending on the calibration and the height of the person, and the size of a possible moving target pixel block in the scene can be estimated by surface calibration, so that an unreasonable moving target video block can be well eliminated.
Further, the embodiment of the invention can also process the moving target which does not move for a long time, namely, the following multi-level background modeling is carried out:
video block B for each region of videoiPerforming a moving target pixel block count NfgN each time it is judged to be foreground (here we define a moving object as foreground in a scene)fg=Nfg+1, if Nfg>Tadpt(TadptA preset threshold), adding the video block of the area into the second layer background sequence to consider that the foreground is temporarily blended into the background, but still saving the record of the video block as a moving object and using the record as the background for updating the background model.
In addition, when dynamically updating the background model, the embodiment of the present invention further provides the following optimization processing manner:
video block B for each pixel pointiGlobally preserving a feature base space AiAdding the detected background video block into the pixel point video block B in the updating stage of the background modeliCharacteristic base space A ofiIn (1). Due to the characteristic base space A thus storediThe dimensionality increases with increasing number of processed video frames. Therefore, the embodiment of the invention adopts a method of frame-by-frame clustering: i.e. after gamma frame, to feature base space AiPerforming a K-means clustering algorithm to reduce AiFinally, a dynamically updated background model is generated. γ can be set according to the time efficiency requirements of the algorithm.
The following describes an apparatus for carrying out the above method.
Referring to fig. 3, the moving object detection apparatus according to the embodiment of the present invention mainly includes: the video block extraction module 10, the structural feature extraction module 20, the texture feature extraction module 30, the feature vector construction module 40, the feature matrix construction module 50, the solving module 60, and the determination module 70, wherein:
the video block extraction module 10 is configured to extract a plurality of video blocks from a continuous multi-frame image, where each video block is composed of a pixel block at the same position in the continuous multi-frame image.
The structural feature extraction module 20 is configured to extract structural features of the video block. The structural feature extraction may be performed by various methods, and preferably, the structural feature extraction module 20 performs the structural feature extraction as follows:
constructing two-dimensional Gabor filters in multiple directions;
vertically moving or keeping each two-dimensional Gabor filter still to obtain a plurality of two-dimensional Gabor filters;
solving the linear response of the video block to the plurality of two-dimensional Gabor filters to obtain a linear response vector;
constructing the linear response vector as a structural feature of the video block.
And the texture feature extraction module 30 is configured to extract texture features of the video block. Various methods can be adopted for texture feature extraction, and preferably, the texture feature extraction module 30 performs texture feature extraction as follows:
constructing two-dimensional Gabor filters in multiple directions;
horizontally moving or keeping each two-dimensional Gabor filter still to obtain a plurality of two-dimensional Gabor filters;
obtaining convolution response of the video block to the plurality of two-dimensional Gabor filters to obtain a convolution response vector;
constructing a statistical histogram of the convolution response vector as a texture feature of the video block.
And a feature vector construction module 40, configured to combine the structural features and the texture features of each video block into a feature vector of the video block.
And an eigen matrix construction module 50, configured to combine eigenvectors of the plurality of video blocks into an eigen matrix of the background.
And the solving module 60 is used for constructing an equation for linearly expressing the characteristic vector of the video block to be detected by using the characteristic matrix, and solving a minimum residual solution of the equation to obtain a solution vector. Preferably, the solving module 60 solves the minimized residual solution of the equation by minimizing a 1-norm.
And a determining module 70, configured to determine the video block to be detected as a moving target when a residual corresponding to the solution vector is greater than a first threshold, or the sparsity of the solution vector is smaller than a second threshold.
In summary, compared with the existing algorithm that uses pixel blocks for background modeling, the embodiment of the present invention combines the spatial correlation and the temporal correlation of images, and combines the image appearance information and the motion information to perform modeling and calculation after performing video block primitive extraction on a multi-frame image sequence. Therefore, the situation that the background changes in a single-frame image due to changes of complex scenes such as rain and fog weather, light changes and the like can be effectively dealt with, and the accuracy of moving target detection can be improved.
Aiming at the defect that structural features in the prior art lack temporal persistence, the embodiment of the invention adds an offset variable to the original Gabor features, so that the appearance and motion changes of the background can be captured by describing video block primitives. Exploiting the regularity of texture, texture can be understood as a regular superposition of structural information, and texture regions can be effectively described by computing statistics of the structural filter responses.
The embodiment of the invention adopts a sparse representation model for background matching and computes the l1-minimized residual to detect the moving object, further improving the accuracy and efficiency of moving object detection.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Those skilled in the art should understand that the technical solutions of the present invention may be modified or substituted with equivalents without departing from their spirit and scope, and such modifications and substitutions are covered by the scope of the claims of the present invention.

Claims (10)

1. A moving object detection method, comprising:
extracting a plurality of video blocks from a continuous multi-frame image, wherein each video block consists of pixel blocks at the same position in the continuous multi-frame image;
extracting the structural features of the video block;
extracting texture features of the video block;
combining the structural features and the texture features of each video block into a feature vector of the video block;
combining the feature vectors of the plurality of video blocks into a feature matrix of the background;
constructing an equation for linearly expressing the feature vector of the video block to be detected by using the feature matrix, and solving a minimized residual solution of the equation by a method of minimizing a 1-norm to obtain a solution vector;
when the residual error corresponding to the solution vector is larger than a first threshold value, or the sparsity of the solution vector is smaller than a second threshold value, determining the video block to be detected as a moving target;
the method calculates the sparsity of the solution vector according to the following formula (rendered as an image in the original publication and not reproduced in this text):
wherein x is the solution vector, k is the dimension of x, x̃ is the vector obtained after removing the zero elements of x, ||x̃||₁ is the 1-norm of x̃, and ||x||₁ is the 1-norm of x.
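The video-block extraction step of claim 1 can be sketched as follows. This is an illustrative sketch only; NumPy, the block size of 8, and the non-overlapping tiling are assumptions, since the claim fixes none of them.

```python
import numpy as np

def extract_video_blocks(frames, block=8):
    """Split T consecutive frames into spatio-temporal video blocks.

    frames: array of shape (T, H, W).
    Each video block stacks the pixel blocks at one spatial position
    across all T frames, giving shape (T, block, block).
    """
    T, H, W = frames.shape
    blocks = {}
    for y in range(0, H - block + 1, block):
        for x in range(0, W - block + 1, block):
            blocks[(y, x)] = frames[:, y:y + block, x:x + block]
    return blocks
```

For a 5-frame, 16×16 sequence this yields four video blocks of shape (5, 8, 8), one per spatial position.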
2. The moving object detection method of claim 1, wherein said extracting structural features of video blocks comprises:
constructing two-dimensional Gabor filters in multiple directions;
vertically moving or keeping each two-dimensional Gabor filter still to obtain a plurality of two-dimensional Gabor filters;
solving the linear response of the video block to the plurality of two-dimensional Gabor filters to obtain a linear response vector;
constructing the linear response vector as a structural feature of the video block.
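As an illustrative sketch of claim 2, the structural feature can be formed from the linear responses of a patch to a bank of vertically shifted (or unshifted) Gabor filters. NumPy and all parameter choices here (4 directions, shifts 0–2, σ, frequency, filter size) are assumptions, not values fixed by the claim.

```python
import numpy as np

def gabor_kernel(theta, size=8, sigma=2.0, freq=0.25):
    """Real part of a 2D Gabor filter oriented at angle theta."""
    half = size // 2
    ys, xs = np.mgrid[-half:size - half, -half:size - half]
    xr = xs * np.cos(theta) + ys * np.sin(theta)
    return np.exp(-(xs**2 + ys**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def structural_feature(patch, n_dir=4, shifts=(0, 1, 2)):
    """Linear responses of a patch to vertically shifted Gabor filters."""
    resp = []
    for i in range(n_dir):
        base = gabor_kernel(np.pi * i / n_dir, size=patch.shape[0])
        for s in shifts:                       # shift 0 keeps the filter still
            resp.append(float(np.sum(patch * np.roll(base, s, axis=0))))
    return np.array(resp)
```

Each (direction, shift) pair contributes one inner-product response, so 4 directions with 3 shifts give a 12-dimensional linear response vector.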
3. The moving object detection method of claim 2 wherein said constructing the linear response vector as a structural feature of the video block comprises:
and selecting a preset number of elements from the linear response vector according to the sequence from big to small, updating the selected preset number of elements to 1, updating the rest elements to 0, and constructing the linear response vector obtained by updating as the structural characteristic of the video block.
4. A moving object detecting method according to claim 3, characterized in that:
the predetermined number is 3.
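The top-k binarization of claims 3 and 4 can be sketched in a few lines; NumPy is an assumption of this sketch, and k defaults to the value 3 given in claim 4.

```python
import numpy as np

def binarize_top_k(v, k=3):
    """Set the k largest elements of v to 1 and all others to 0 (claim 4: k = 3)."""
    out = np.zeros_like(v, dtype=float)
    out[np.argsort(v)[-k:]] = 1.0              # indices of the k largest responses
    return out
```

For example, `binarize_top_k([0.1, 5.0, 3.0, 0.2, 4.0])` keeps the responses 5.0, 4.0, and 3.0.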
5. The moving object detection method of claim 1, wherein said extracting texture features of video blocks comprises:
constructing two-dimensional Gabor filters in multiple directions;
horizontally moving or keeping each two-dimensional Gabor filter still to obtain a plurality of two-dimensional Gabor filters;
obtaining convolution response of the video block to the plurality of two-dimensional Gabor filters to obtain a convolution response vector;
constructing a statistical histogram of the convolution response vector as a texture feature of the video block.
6. The moving object detection method of claim 5, wherein said constructing the statistical histogram of the convolution response vector as the texture feature of the video block comprises:
normalizing the statistical histogram to obtain a mean histogram, and constructing the mean histogram as the texture feature of the video block.
7. A moving object detecting apparatus, comprising:
the video block extraction module is used for extracting a plurality of video blocks from continuous multi-frame images, and each video block consists of pixel blocks at the same position in the continuous multi-frame images;
the structural feature extraction module is used for extracting the structural features of the video block;
the texture feature extraction module is used for extracting the texture features of the video block;
the characteristic vector constructing module is used for combining the structural characteristic and the texture characteristic of each video block into a characteristic vector of the video block;
the characteristic matrix constructing module is used for combining the characteristic vectors of the video blocks into a characteristic matrix of the background;
the solving module is used for constructing an equation for linearly expressing the characteristic vector of the video block to be detected by using the characteristic matrix, and solving a minimized residual solution of the equation by a method of minimizing 1-norm to obtain a solution vector;
the judging module is used for judging the video block to be detected as a moving target when the residual error corresponding to the solution vector is larger than a first threshold value or the sparsity of the solution vector is smaller than a second threshold value;
the device calculates the sparsity of the solution vector according to the following formula (rendered as an image in the original publication and not reproduced in this text):
wherein x is the solution vector, k is the dimension of x, x̃ is the vector obtained after removing the zero elements of x, ||x̃||₁ is the 1-norm of x̃, and ||x||₁ is the 1-norm of x.
8. The moving object detecting device according to claim 7, wherein the structural feature extracting module is further configured to:
constructing two-dimensional Gabor filters in multiple directions;
vertically moving or keeping each two-dimensional Gabor filter still to obtain a plurality of two-dimensional Gabor filters;
solving the linear response of the video block to the plurality of two-dimensional Gabor filters to obtain a linear response vector;
constructing the linear response vector as a structural feature of the video block.
9. The moving object detection device of claim 7, wherein the texture feature extraction module is further configured to:
constructing two-dimensional Gabor filters in multiple directions;
horizontally moving or keeping each two-dimensional Gabor filter still to obtain a plurality of two-dimensional Gabor filters;
obtaining convolution response of the video block to the plurality of two-dimensional Gabor filters to obtain a convolution response vector;
constructing a statistical histogram of the convolution response vector as a texture feature of the video block.
10. The moving object detecting device of claim 7, wherein the solving module is further configured to:
the minimized residual solution of the equation is solved by minimizing the 1-norm.
CN201010608739.2A 2010-12-17 2010-12-17 Moving target detecting method and device Active CN102542571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010608739.2A CN102542571B (en) 2010-12-17 2010-12-17 Moving target detecting method and device


Publications (2)

Publication Number Publication Date
CN102542571A CN102542571A (en) 2012-07-04
CN102542571B true CN102542571B (en) 2014-11-05

Family

ID=46349391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010608739.2A Active CN102542571B (en) 2010-12-17 2010-12-17 Moving target detecting method and device

Country Status (1)

Country Link
CN (1) CN102542571B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102903126B (en) * 2012-08-08 2015-11-04 公安部第三研究所 The system and method for a kind of video image texture feature extraction and structural description
CN103258202B (en) * 2013-05-02 2016-06-29 电子科技大学 A kind of texture characteristic extracting method of robust
CN103458155B (en) * 2013-08-01 2016-10-19 北京邮电大学 Video scene change detection method and system and Quality of experience detection method and system
TWI489101B (en) * 2013-12-02 2015-06-21 Ind Tech Res Inst Apparatus and method for combining 3d and 2d measurement
CN103679216B (en) * 2014-01-06 2017-01-18 重庆大学 Method for detecting infrared small/weak targets based on space-time sparse representation
GB2525587A (en) * 2014-04-14 2015-11-04 Quantum Vision Technologies Ltd Monocular camera cognitive imaging system for a vehicle
CN105243670B (en) * 2015-10-23 2018-04-06 北京航空航天大学 A kind of sparse and accurate extracting method of video foreground object of low-rank Combined expression
WO2017077261A1 (en) 2015-11-05 2017-05-11 Quantum Vision Technologies Ltd A monocular camera cognitive imaging system for a vehicle
CN106033615B (en) * 2016-05-16 2017-09-15 北京旷视科技有限公司 Destination object motion direction detecting method and device
JP6419393B2 (en) * 2016-07-11 2018-11-07 三菱電機株式会社 Moving image processing apparatus, moving image processing method, and moving image processing program
CN109583266A (en) * 2017-09-28 2019-04-05 杭州海康威视数字技术股份有限公司 A kind of object detection method, device, computer equipment and storage medium
CN108355340B (en) * 2018-02-06 2019-06-18 浙江大学 A kind of method of counting of bouncing the ball based on video information

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101645171A (en) * 2009-09-15 2010-02-10 湖北莲花山计算机视觉和信息科学研究院 Background modeling method (method of segmenting video moving object) based on space-time video block and online sub-space learning


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Spatio-temporal patches for night background modeling by subspace learning; Y. Zhao, et al.; Pattern Recognition, 2008. ICPR 2008. 19th International Conference on; 2008-12-11; 1-5 *
Spatiotemporal saliency in dynamic scenes; V. Mahadevan, et al.; Pattern Analysis and Machine Intelligence, IEEE Transactions on; 2010-01-31; Vol. 32, No. 1; 171-177 *
Video primitive tracking and learning based on a generative model (基于生成式模型的视频基元追踪学习); Zhao Youdong, et al.; Chinese Journal of Computers (《计算机学报》); 2010-10-31; Vol. 33, No. 10; 1835-1844 *

Also Published As

Publication number Publication date
CN102542571A (en) 2012-07-04

Similar Documents

Publication Publication Date Title
CN102542571B (en) Moving target detecting method and device
US11615262B2 (en) Window grouping and tracking for fast object detection
CN109685045B (en) Moving target video tracking method and system
US7756296B2 (en) Method for tracking objects in videos using forward and backward tracking
Le et al. Deeply Supervised 3D Recurrent FCN for Salient Object Detection in Videos.
CN107016691B (en) Moving target detecting method based on super-pixel feature
CN111768432A (en) Moving target segmentation method and system based on twin deep neural network
CN109325484B (en) Flower image classification method based on background prior significance
CN108038435B (en) Feature extraction and target tracking method based on convolutional neural network
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
CN106778768A (en) Image scene classification method based on multi-feature fusion
CN106157330B (en) Visual tracking method based on target joint appearance model
CN111583279A (en) Super-pixel image segmentation method based on PCBA
CN109215053A (en) Moving vehicle detection method containing halted state in a kind of unmanned plane video
Hafiz et al. Foreground segmentation-based human detection with shadow removal
CN107085725B (en) Method for clustering image areas through LLC based on self-adaptive codebook
Schulz et al. Object-class segmentation using deep convolutional neural networks
Tezuka et al. A precise and stable foreground segmentation using fine-to-coarse approach in transform domain
CN108573217B (en) Compression tracking method combined with local structured information
CN116523959A (en) Moving object detection method and system based on artificial intelligence
Ghosh et al. Robust simultaneous registration and segmentation with sparse error reconstruction
CN110570450A (en) Target tracking method based on cascade context-aware framework
CN107564029B (en) Moving target detection method based on Gaussian extreme value filtering and group sparse RPCA
CN106446764B (en) Video object detection method based on improved fuzzy color aggregated vector
Marie et al. Dynamic background subtraction using moments

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant