CN107871315B

CN107871315B - Video image motion detection method and device

Info

Publication number: CN107871315B
Application number: CN201710934534.5A
Authority: CN
Inventors: 徐斌; 李毅; 肖岗; 刘佳瑶; 龚少麟; 张川江
Original assignee: CETC 28 Research Institute
Current assignee: CETC 28 Research Institute
Priority date: 2017-10-09
Filing date: 2017-10-09
Publication date: 2020-08-14
Anticipated expiration: 2037-10-09
Also published as: CN107871315A

Abstract

The invention provides a video image target detection method comprising the following steps: performing background modeling on the images input from a video through a two-layer Gaussian mixture model to obtain the video image background, wherein the input of the second-layer Gaussian mixture model is the modeling result of the first-layer Gaussian mixture model; performing frame-by-frame difference processing between the video image background and the input video images to obtain the video image foreground; and sequentially applying Otsu-threshold binarization, elimination of connected regions based on morphological erosion and dilation operations, and elimination of small regions based on pixel count to the video image foreground, to form the foreground target of the input video image. In addition, the invention also provides a video image target detection device and a video image target detection system.

Description

Video image motion detection method and device

Technical Field

The invention relates to a method and a device for detecting a moving target of a video image, and belongs to the field of artificial intelligence.

Background

With the accelerating pace of urbanization in China, urban safety management has become a problem that cannot be ignored, and intelligent video surveillance systems are being applied ever more widely to urban safety management at an ever larger scale. As one of the core technologies of video surveillance, moving object detection in video images has received considerable attention, and further improving its stability and accuracy in practice has become a research focus for many experts and scholars in China and abroad. Traditional moving object detection methods include inter-frame differencing, optical flow, and the Gaussian mixture model. As a classical adaptive background modeling method, the Gaussian mixture model adapts well to complex backgrounds and performs relatively well in practice, so it has long attracted the attention of researchers.

In prior art 1, Chinese patent ZL201410090199.1 discloses a moving object detection method based on Gaussian mixture modeling and edge detection. The method reads the current image frame from a video captured by a camera; initializes the background with a Gaussian mixture model, continuously updates the background, and at the same time separates and binarizes the moving target; extracts the moving target using Canny edge detection; performs an OR operation on the extracted moving targets and fills the holes; eliminates shadows; performs the necessary post-processing to obtain the final result; and repeats this processing until all image frames have been processed. By ORing the moving target extracted by the Gaussian mixture model with the moving target extracted by the Canny operator, the method addresses the severe loss of the extracted moving target that occurs with conventional methods when the target's color is similar to that of the background. Although this patent handles moving objects whose color is similar to the background, it is limited by the performance of a single-layer Gaussian mixture model: its ability to model the video image background is insufficient, and part of the foreground may be misjudged as background.

In prior art 2, Chinese patent ZL200710304222.2 discloses a moving object detection method based on an extended Gaussian mixture model. A primary model construction module constructs probability density functions of the shadow, the background and the foreground based on the extended Gaussian mixture model; a secondary model construction module constructs probability density functions of the moving target and the non-moving target based on these three types of models; a classification module performs classification using the MAP-MRF (Maximum a Posteriori-Markov Random Field) method; and feedback information from tracking is applied to model the foreground more accurately. By fusing the Gaussian mixture model with spatial information, the method can overcome false foreground detections caused by background motion; by fusing background modeling, foreground detection and shadow removal in one probabilistic framework, it can overcome the adverse effects of shadows, thereby improving moving object detection. Although the background modeling in prior art 2 works well, the algorithm based on the Markov random field has high computational complexity, and it is difficult to meet the real-time requirements of practical video surveillance.

The present invention provides a method and an apparatus for detecting a moving object in a video image, which are directed to the above-mentioned defects in the prior art.

Disclosure of Invention

The invention aims to overcome the defects in the prior art and provides a method and a device for detecting a moving target in a video image. In addition, the invention also aims to provide a stable video image target detection method.

Another object of the present invention is to implement a two-layer gaussian modeling algorithm.

It is another object of the invention to implement a post-processing algorithm for processing images.

In order to achieve the above objects, the present invention provides a method for detecting a moving object in a video image, including: S1, performing background modeling on the images input from a video through a two-layer Gaussian mixture model to obtain the video image background, wherein the input of the second-layer Gaussian mixture model is the modeling result of the first-layer Gaussian mixture model; S2, performing frame-by-frame difference processing between the video image background and the input video images to obtain the video image foreground; and S3, sequentially applying Otsu-threshold binarization, elimination of connected regions based on morphological erosion and dilation operations, and elimination of small regions based on pixel count to the video image foreground, to form the foreground target of the input video image.

In addition, the present invention also provides a video image target detection apparatus, including: a background modeling unit for performing background modeling on the images input from a video through a two-layer Gaussian mixture model to obtain the video image background, wherein the input of the second-layer Gaussian mixture model is the modeling result of the first-layer Gaussian mixture model; a background elimination unit for performing frame-by-frame difference processing between the video image background output by the background modeling unit and the input video images to obtain the video image foreground; and a post-processing unit for sequentially applying Otsu-threshold binarization, elimination of connected regions based on morphological erosion and dilation operations, and elimination of small regions based on pixel count to the video image foreground output by the background elimination unit, to form the foreground target of the input video image.

In addition, the present invention also provides a video image target detection device, including: one or more processors; and a non-transitory computer-readable storage medium storing one or more instructions, wherein the processor, when executing the instructions, is configured to: perform background modeling on the images input from a video through a two-layer Gaussian mixture model to obtain the video image background, wherein the input of the second-layer Gaussian mixture model is the modeling result of the first-layer Gaussian mixture model; perform frame-by-frame difference processing between the video image background and the input video images to obtain the video image foreground; and sequentially apply Otsu-threshold binarization, elimination of connected regions based on morphological erosion and dilation operations, and elimination of small regions based on pixel count to the video image foreground, to form the foreground target of the input video image.

The beneficial effects of the invention are as follows. First, the two-layer Gaussian mixture modeling provided by the invention achieves a relatively good balance between computation speed and background modeling accuracy, and its results are stable. Second, the algorithm avoids interference from small targets through several image post-processing operations, including binarization and morphological operations. Third, the algorithm can effectively model the background of continuous video images and can therefore effectively detect moving targets.

Drawings

FIG. 1 is a schematic diagram of video image moving object detection based on a two-layer Gaussian mixture model;

FIG. 2 is a flow chart of a video image moving object detection method based on a two-layer Gaussian mixture model;

FIG. 3 shows a block diagram of the components of a video image object detection apparatus;

FIG. 4 shows a block diagram of the components of the post-processing unit;

FIG. 5 is a block diagram showing the components of another video image object detecting apparatus;

FIG. 6 shows the result after processing by the video image detection method.

Detailed Description

In image processing, a so-called video is actually a series of images with time-series characteristics. For a typical surveillance video, since the camera is relatively stable, the image sequence contains a relatively stable scene, i.e. the background, which does not change. Objects that change against the background, i.e. moving objects, need to be detected because they carry valuable information, the so-called foreground information. Both the background and the foreground are images, but the background is generally unchanging, whereas the foreground, as the moving target, changes in real time.

Fig. 1 is a schematic diagram of detecting a moving target in a video image based on a two-layer Gaussian mixture model according to the present invention: an input video image is processed by the two-layer Gaussian mixture model 101 to obtain the video background, background elimination 102 is then performed, and the result of the background elimination undergoes image post-processing 103 to obtain the foreground image of the moving target.

Fig. 2 is a flowchart of a method for detecting a moving target of a video image based on a two-layer gaussian mixture model, which mainly includes three steps of background modeling of the two-layer gaussian mixture model S1, background subtraction elimination S2 and image post-processing S3.

1) Background modeling S1: images are input frame by frame from the input surveillance video, and the background is modeled by a two-layer Gaussian mixture model, in which the input of the second-layer Gaussian mixture model is the result of the first-layer modeling. The background of the video image is finally formed through the two-layer modeling.

The modeling of a single Gaussian mixture model comprises the following steps:

i. Initialize the background model μ, with initial background mean $\mu_0$, initial standard deviation $\sigma_0$ and initial difference threshold $T$ (set to 20), where $I_{x,y}$ is the pixel value at pixel point (x, y):

$$\mu(x,y) = I_{x,y}$$

$$\sigma(x,y) = T$$

where $T$ is expressed as an image pixel value, i.e. a dimensionless gray level, and can be set manually according to the environment.

ii. Check whether pixel $I_{x,y}$ belongs to the foreground or the background by judging whether it lies within a certain range of the mean $\mu(x,y)$, where $\lambda$ is a threshold parameter:

if $|I_{x,y} - \mu(x,y)| < \lambda \cdot \sigma(x,y)$, then $I_{x,y}$ is background;

otherwise, $I_{x,y}$ is foreground.

iii. Learn and update the background with the following update formula, where $\alpha$ is the learning rate, which can generally be set to 1e-4:

$$\mu(x,y) = (1-\alpha)\,\mu(x,y) + \alpha\, I_{x,y}$$

iv. Repeat steps ii and iii until the algorithm stops, i.e. until the stopping condition is satisfied [stopping-condition formula not reproduced in the source text]. The constant in that condition is also a small quantity, which may be taken as 1e-5.
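The initialization, classification and update steps of one layer, and the cascading of two layers, can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the class and function names are hypothetical, `lam=2.5` is an assumed value for the threshold parameter λ, the running-average update is applied to every pixel exactly as the formula is written, the second layer is assumed to take the first layer's background mean image as its input, and, because the stopping-condition formula is not reproduced in the text, the sketch simply processes all frames it is given.

```python
import numpy as np


class SingleLayerBackground:
    """Per-pixel background model with mean mu and standard deviation sigma."""

    def __init__(self, first_frame, T=20.0, lam=2.5, alpha=1e-4):
        # Initialization: mu(x, y) = I(x, y), sigma(x, y) = T.
        self.mu = np.asarray(first_frame, dtype=np.float64).copy()
        self.sigma = np.full(self.mu.shape, float(T))
        self.lam = lam      # threshold parameter lambda (assumed value)
        self.alpha = alpha  # learning rate, 1e-4 in the text

    def update(self, frame):
        """Classify the frame's pixels, then update the background mean."""
        frame = np.asarray(frame, dtype=np.float64)
        # Background test: |I - mu| < lambda * sigma.
        is_background = np.abs(frame - self.mu) < self.lam * self.sigma
        # Update: mu = (1 - alpha) * mu + alpha * I (applied to all pixels here).
        self.mu = (1.0 - self.alpha) * self.mu + self.alpha * frame
        return is_background


def two_layer_background(frames, T=20.0, lam=2.5, alpha=1e-4):
    """Cascade two layers; layer 2 is fed layer 1's background (an assumption)."""
    layer1 = SingleLayerBackground(frames[0], T, lam, alpha)
    layer2 = SingleLayerBackground(layer1.mu, T, lam, alpha)
    for frame in frames[1:]:
        layer1.update(frame)
        layer2.update(layer1.mu)
    return layer2.mu
```

For a static scene the returned background stays at the scene's gray level, and a pixel that jumps by more than λ·σ (50 gray levels with the values above) is flagged as foreground by `update`.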

2) Background elimination S2: the video background established in the first step is subtracted from the video image frame by frame to eliminate the background. The result of this difference processing is the foreground of the video image, which is also the moving target of the video image sequence.

The difference processing formula is as follows:

$$D_{x,y} = I_{x,y} - \mu(x,y)$$

where $I_{x,y}$ is the original image and $\mu(x,y)$ is the computed background.
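A minimal Python sketch of the difference step is given below. The function name is hypothetical. The conversion to floating point is an implementation choice, not something stated in the text: it keeps the sign of the difference and avoids the wrap-around that would occur when subtracting unsigned 8-bit arrays directly.

```python
import numpy as np


def subtract_background(frame, background):
    """Return D = I - mu as a signed floating-point array."""
    frame = np.asarray(frame, dtype=np.float64)
    background = np.asarray(background, dtype=np.float64)
    return frame - background


# Example: one pixel darker than the background, one brighter.
frame = np.array([[10, 200]], dtype=np.uint8)
background = np.array([[50, 50]], dtype=np.uint8)
difference = subtract_background(frame, background)
```

If an 8-bit image is needed for the subsequent thresholding, one option (again an assumption) is to take the absolute value of the difference and clip it to the range 0-255.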

3) Image post-processing S3: image post-processing operations are applied to the difference image to finally form the foreground target of the image. The post-processing operations are carried out in sequence and specifically comprise: Otsu-threshold binarization, elimination of connected regions based on morphological erosion and dilation operations, and elimination of small regions based on pixel count.

The binarization processing based on the Otsu threshold is as follows:

The Otsu method assumes that the image histogram has a bimodal distribution, and sets a threshold separating the foreground and background of the image G such that the between-class variance of the foreground and background pixels is maximized. Mathematically, the Otsu threshold t should satisfy the following optimality expression:

$$t = \arg\max_{t} \left[ \omega_0 (\mu_0 - \mu)^2 + \omega_1 (\mu_1 - \mu)^2 \right]$$

where $\omega_0 = N_0/N$, $\omega_1 = N_1/N$, and $\mu = \sum_i i\, p_i$.

Here $N_0$, $N_1$ and $N$ denote the foreground, background and total pixel counts, respectively; $p_i$ denotes the frequency of gray level $i$; and $\mu_0$, $\mu_1$ and $\mu$ denote the mean gray levels of the foreground, background and full-image pixels, respectively. For an RGB image, t takes a value in the range 0-255. Once t has been obtained, the segmented image $R^{seg}$ can be obtained by thresholding as follows:

$$R^{seg}(x,y) = \begin{cases} 1, & G(x,y) > t \\ 0, & \text{otherwise} \end{cases}$$

In the segmented image $R^{seg}$, pixels representing the foreground are all labeled 1, while background pixels are labeled 0.
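Otsu thresholding as described above can be sketched in Python with NumPy alone. This is an illustrative sketch, assuming an 8-bit single-channel input image and assuming that the foreground is the class above the threshold; the function names are hypothetical. It searches every candidate gray level and keeps the one with the largest between-class variance.

```python
import numpy as np


def otsu_threshold(image):
    """Return the gray level t that maximizes the between-class variance."""
    hist = np.bincount(image.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()              # p_i: frequency of gray level i
    levels = np.arange(256)
    mu_total = np.sum(levels * p)      # mean gray level of the full image
    best_t, best_var = 0, -1.0
    for t in range(255):
        w_low = p[: t + 1].sum()       # weight of the class with value <= t
        w_high = 1.0 - w_low           # weight of the class with value > t
        if w_low <= 0.0 or w_high <= 0.0:
            continue                   # one class is empty, skip this t
        mu_low = np.sum(levels[: t + 1] * p[: t + 1]) / w_low
        mu_high = (mu_total - w_low * mu_low) / w_high
        var_between = (w_low * (mu_low - mu_total) ** 2
                       + w_high * (mu_high - mu_total) ** 2)
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(image, t):
    """Label pixels above t as 1 (foreground) and the rest as 0 (background)."""
    return (image > t).astype(np.uint8)


# Example: a two-valued image separates cleanly into 0s and 1s.
image = np.array([[10, 10, 200], [200, 10, 200]], dtype=np.uint8)
t = otsu_threshold(image)
mask = binarize(image, t)
```

The exhaustive search over 255 candidate levels is cheap for 8-bit images; a cumulative-sum formulation would be faster but less directly comparable to the expression in the text.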

The elimination of connected regions based on morphological erosion and dilation operations is as follows:

Given two images A and B, if A is the object to be processed, i.e. the data after Otsu-threshold binarization, and B is used to process A, then B is called the structuring element, and is also figuratively called a brush. Structuring elements are usually relatively small images. The data after Otsu-threshold binarization first undergoes an erosion operation and then a dilation operation, i.e. a morphological opening.

The erosion operation is defined as follows:

The result of eroding X with S is the set of all points x such that S, after being translated by x, is still contained in X. In other words, the set obtained by eroding X with S is the set of origin positions of S for which S is completely contained in X.

The dilation operation is defined as follows:

Dilation can be viewed as the dual operation of erosion. It is defined as follows: translate the structuring element B by a to obtain $B_a$, and record the point a if $B_a$ hits X, i.e. if $B_a$ intersects X. The set of all points a satisfying this condition is called the result of dilating X by B.
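The erosion and dilation definitions above, and their combination into an opening, can be illustrated with the following NumPy sketch. It assumes a binary mask of 0s and 1s stored as unsigned 8-bit integers, a structuring element of odd size whose origin is its center, and zero padding outside the image; the function names are hypothetical.

```python
import numpy as np


def _shifted_views(mask, selem):
    """Yield the mask shifted by every offset where the structuring element is set."""
    kh, kw = selem.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(mask, ((ph, ph), (pw, pw)), constant_values=0)
    h, w = mask.shape
    for dy in range(kh):
        for dx in range(kw):
            if selem[dy, dx]:
                yield padded[dy:dy + h, dx:dx + w]


def erode(mask, selem):
    """Keep a point only if the translated structuring element lies inside the mask."""
    out = np.ones_like(mask)
    for view in _shifted_views(mask, selem):
        out &= view
    return out


def dilate(mask, selem):
    """Mark a point if the translated structuring element hits the mask."""
    out = np.zeros_like(mask)
    for view in _shifted_views(mask, selem):
        out |= view
    return out


def opening(mask, selem):
    """Erosion followed by dilation."""
    return dilate(erode(mask, selem), selem)


# Example: a 3x3 block survives the opening, an isolated pixel does not.
mask = np.zeros((7, 7), dtype=np.uint8)
mask[1:4, 1:4] = 1
mask[5, 5] = 1
opened = opening(mask, np.ones((3, 3), dtype=np.uint8))
```

With a 3x3 structuring element, any foreground structure narrower than three pixels is removed, while larger blobs are restored to roughly their original shape by the dilation.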

Small region elimination based on pixel count is then applied to the data from which connected regions have been eliminated by the morphological erosion and dilation operations. The small region elimination based on pixel count is as follows:

Let the connected regions in the obtained image G be $\{A_1, A_2, \ldots, A_N\}$, and let the corresponding numbers of pixels in these connected regions be $\{n_1, n_2, \ldots, n_N\}$. If $n_i$ is smaller than a manually set threshold, set here to 30, the region is deleted and judged to be a non-target; if $n_i$ is larger than the threshold, the region is a target, i.e. foreground.
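Small region elimination can be sketched in Python with a breadth-first search over connected foreground pixels. The sketch below makes several assumptions that the text does not specify: regions are 4-connected, a region whose size equals the threshold exactly is kept, and the function name and parameter name `min_pixels` are hypothetical.

```python
from collections import deque

import numpy as np


def remove_small_regions(mask, min_pixels=30):
    """Zero out connected foreground regions with fewer than min_pixels pixels."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    seen = np.zeros((h, w), dtype=bool)
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or seen[sy, sx]:
                continue
            # Collect one connected region A_i with a breadth-first search.
            region = []
            queue = deque([(sy, sx)])
            seen[sy, sx] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny, nx] and not seen[ny, nx]):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            # Keep the region only if its pixel count n_i reaches the threshold.
            if len(region) >= min_pixels:
                ys, xs = zip(*region)
                out[list(ys), list(xs)] = 1
    return out


# Example: a 6x6 block (36 pixels) is kept, a single stray pixel is removed.
mask = np.zeros((10, 10), dtype=np.uint8)
mask[0:6, 0:6] = 1
mask[8, 8] = 1
cleaned = remove_small_regions(mask, min_pixels=30)
```

The pure-Python loop is meant for clarity; for full-resolution video frames a library connected-component labeling routine would be the more practical choice.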

Fig. 3 shows a video image object detection apparatus 300 according to the present invention, which includes: a background modeling unit 301, configured to perform background modeling on the images input from a video through a two-layer Gaussian mixture model to obtain the video image background, wherein the input of the second-layer Gaussian mixture model is the modeling result of the first-layer Gaussian mixture model; a background elimination unit 302, configured to perform frame-by-frame difference processing between the video image background output by the background modeling unit and the input video images to obtain the video image foreground; and a post-processing unit 303, configured to sequentially apply Otsu-threshold binarization, elimination of connected regions based on morphological erosion and dilation operations, and elimination of small regions based on pixel count to the video image foreground output by the background elimination unit, to form the foreground target of the input video image.

Wherein the modeling method of each layer in the two layers of Gaussian mixture models in the background modeling unit is as described above for the modeling method in the video image target detection method of FIG. 1.

As shown in Fig. 4, the post-processing unit 303 includes an Otsu-threshold binarization module 304, a module 305 for eliminating connected regions based on morphological erosion and dilation operations, and a small region elimination module 306 based on pixel count. The image foreground is processed by the three modules in sequence to obtain the foreground target of the input video image.

Fig. 5 shows another video image object detection apparatus 400 according to the present invention, which includes: one or more processors 401;

a non-transitory computer-readable storage medium 403 storing one or more instructions 402, wherein the computer-readable storage medium 403 may also store the data to be processed, which may alternatively be stored in other storage media, and wherein the processor, when executing the one or more instructions, is configured to: perform background modeling on the images input from a video through a two-layer Gaussian mixture model to obtain the video image background, wherein the input of the second-layer Gaussian mixture model is the modeling result of the first-layer Gaussian mixture model; perform frame-by-frame difference processing between the video image background and the input video images to obtain the video image foreground; and sequentially apply Otsu-threshold binarization, elimination of connected regions based on morphological erosion and dilation operations, and elimination of small regions based on pixel count to the video image foreground, to form the foreground target of the input video image.

Fig. 6 shows, taking an indoor scene as an example, the result obtained by the method and apparatus of the present invention. Urban video and indoor video are processed in the same way and differ only in location.

In a word, the modeling form based on the two-layer Gaussian mixture model provided by the invention can effectively avoid the influence of interference points in the detection of the moving target of the video image, and achieves the effect of relative balance in the aspects of calculation speed and the accuracy of background modeling. Various post-processing operations of the image, such as binarization, morphological operations, etc., can avoid the interference of small targets. Generally, the method for detecting the moving target in the video image can realize automatic analysis and study and judgment of the urban monitoring video, and can play an effective auxiliary support role in processing urban problems such as illegal parking, road occupation and the like.

Claims

1. A video image object detection method comprises the following steps:

S1, performing background modeling on the images input from a video through a two-layer Gaussian mixture model to obtain the video image background, wherein the input of the second-layer Gaussian mixture model is the modeling result of the first-layer Gaussian mixture model;

S2, performing frame-by-frame difference processing between the video image background and the input video images to obtain the video image foreground;

S3, sequentially applying Otsu-threshold binarization, elimination of connected regions based on morphological erosion and dilation operations, and elimination of small regions based on pixel count to the video image foreground, to form the foreground target of the input video image;

each layer of the two-layer Gaussian mixture model is modeled as follows:

S11, initializing an image background model,

wherein the initial background mean is $\mu_0$, the initial standard deviation is $\sigma_0$, the initial difference threshold is $T$, and $I_{x,y}$ is the pixel value at pixel point (x, y);

S12, checking whether the pixel $I_{x,y}$ belongs to the foreground or the background by judging whether it lies within a certain range of the mean $\mu(x,y)$: if $|I_{x,y} - \mu(x,y)| < \lambda \cdot \sigma(x,y)$, $I_{x,y}$ is background; otherwise, $I_{x,y}$ is foreground, where $\lambda$ is a threshold parameter;

S13, learning and updating the background by $\mu(x,y) = (1-\alpha)\,\mu(x,y) + \alpha\, I_{x,y}$, and repeating step S12 until $\mu(x,y)$ satisfies the stopping condition [stopping-condition formula not reproduced in the source text],

where $\alpha$ is the learning rate.

2. The method of claim 1, wherein the learning rate α is 1 e-4.

3. The method as claimed in claim 1, wherein the binarization processing based on the Otsu threshold is: assuming that the image histogram has a bimodal distribution, and setting an Otsu threshold that separates the foreground and the background of the image such that the between-class variance of the foreground and background pixels is maximized.

4. The method of claim 3, wherein the Otsu threshold t satisfies the following expression,

$$t = \arg\max_{t} \left[ \omega_0 (\mu_0 - \mu)^2 + \omega_1 (\mu_1 - \mu)^2 \right]$$

wherein $\omega_0 = N_0/N$, $\omega_1 = N_1/N$; $N_0$, $N_1$ and $N$ denote the numbers of foreground, background and total pixels, respectively; $p_i$ denotes the frequency of gray level $i$; and $\mu_0$, $\mu_1$ and $\mu$ denote the mean gray levels of the foreground, background and full-image pixels, respectively.

5. The method of claim 1, wherein the elimination of connected regions based on morphological erosion and dilation operations is: setting a structuring element B, first performing an erosion operation on the result of the Otsu-threshold binarization, and then performing a dilation operation.

6. The method of claim 1, wherein the small region elimination based on pixel count is: denoting the connected regions as $\{A_1, A_2, \ldots, A_N\}$ and the corresponding numbers of pixels in the connected regions as $\{n_1, n_2, \ldots, n_N\}$; if $n_i$ is smaller than a manually set threshold, the region is deleted and judged to be a non-target; if $n_i$ is larger than the threshold, the region is a target, i.e. foreground.

7. The method of claim 6, wherein the threshold is 30.

8. The method of claim 1, wherein T takes the value of 20.

9. A video image object detection apparatus comprising:

the background modeling unit is used for carrying out background modeling on the image input by the video through two layers of Gaussian mixture models to obtain the video image background of the video input image, wherein the input of the secondary Gaussian mixture model is the result of the modeling of the primary Gaussian mixture model;

the background elimination unit is used for carrying out difference processing on the video image background output by the background modeling unit and the video input image frame by frame to obtain a video image foreground of the video image;

the post-processing unit is used for sequentially applying Otsu-threshold binarization, elimination of connected regions based on morphological erosion and dilation operations, and elimination of small regions based on pixel count to the video image foreground output by the background elimination unit, to form the foreground target of the input video image;

wherein each layer of the two layers of Gaussian mixture models in the background modeling unit is modeled as:

initializing the image background model,

checking whether the pixel $I_{x,y}$ belongs to the foreground or the background by judging whether it lies within a certain range of the mean $\mu(x,y)$: if $|I_{x,y} - \mu(x,y)| < \lambda \cdot \sigma(x,y)$, $I_{x,y}$ is background; otherwise, $I_{x,y}$ is foreground, where $\lambda$ is a threshold parameter;

learning and updating the background by $\mu(x,y) = (1-\alpha)\,\mu(x,y) + \alpha\, I_{x,y}$, and repeating the step S12 until $\mu(x,y)$ satisfies the stopping condition [stopping-condition formula not reproduced in the source text],

where $\alpha$ is the learning rate.

10. The device according to claim 9, wherein the post-processing unit comprises a binarization processing module based on Otsu threshold, a module for eliminating connected regions based on morphological erosion and dilation operation, and a small region elimination module based on pixel value size, and the image foreground is processed by three modules in sequence to obtain a foreground target of the input video image.

11. The apparatus of claim 10, wherein the binarization processing module based on the Otsu threshold assumes that the image histogram has a bimodal distribution and sets the Otsu threshold separating the foreground and background of the image such that the between-class variance of the foreground and background pixels is maximized.

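12. The apparatus of claim 11, wherein the Otsu threshold t satisfies the following expression,

$$t = \arg\max_{t} \left[ \omega_0 (\mu_0 - \mu)^2 + \omega_1 (\mu_1 - \mu)^2 \right]$$

wherein $\omega_0 = N_0/N$, $\omega_1 = N_1/N$,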

13. The apparatus of claim 10, wherein the module for eliminating connected regions based on morphological erosion and dilation operations and the small region elimination module based on pixel count are configured to: set a structuring element B, first perform an erosion operation on the result of the Otsu-threshold binarization, and then perform a dilation operation.

14. The apparatus of claim 10, wherein the small region elimination based on pixel count is: denoting the connected regions as $\{A_1, A_2, \ldots, A_N\}$ and the corresponding numbers of pixels in the connected regions as $\{n_1, n_2, \ldots, n_N\}$; if $n_i$ is smaller than a manually set threshold, the region is deleted and judged to be a non-target; if $n_i$ is larger than the threshold, the region is a target, i.e. foreground.

15. A video image object detection apparatus comprising:

one or more processors;

a non-transitory computer readable storage medium storing one or more instructions that when executed by the processor are configured to:

performing background modeling on an image input by a video through two layers of Gaussian mixture models to obtain a video image background of the video input image, wherein the input of the second Gaussian mixture model is a result of the modeling of the first Gaussian mixture model;

performing frame-by-frame difference processing on the video image background and the video input image to obtain a video image foreground of the video image;

and sequentially applying Otsu-threshold binarization, elimination of connected regions based on morphological erosion and dilation operations, and elimination of small regions based on pixel count to the video image foreground, to form the foreground target of the input video image.