CN113409353A - Motion foreground detection method and device, terminal equipment and storage medium - Google Patents

Motion foreground detection method and device, terminal equipment and storage medium

Info

Publication number
CN113409353A
CN113409353A · Application CN202110626840.9A
Authority
CN
China
Prior art keywords
pixel point
model
video frame
current video
gaussian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110626840.9A
Other languages
Chinese (zh)
Other versions
CN113409353B (en)
Inventor
朋兴磊
符顺
许楚萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Lianji Technology Co ltd
Original Assignee
Hangzhou Lianji Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Lianji Technology Co ltd filed Critical Hangzhou Lianji Technology Co ltd
Priority to CN202110626840.9A
Publication of CN113409353A
Application granted
Publication of CN113409353B
Legal status: Active

Classifications

    • G06T 7/215 Motion-based segmentation (Image analysis; Analysis of motion)
    • G06F 18/22 Matching criteria, e.g. proximity measures (Pattern recognition)
    • G06F 18/24 Classification techniques (Pattern recognition)
    • G06T 5/40 Image enhancement or restoration using histogram techniques
    • G06T 5/73 Deblurring; Sharpening (Image enhancement or restoration)
    • G06T 5/80 Geometric correction (Image enhancement or restoration)
    • G06T 7/223 Analysis of motion using block-matching (Image analysis)
    • G06T 7/269 Analysis of motion using gradient-based methods (Image analysis)
    • G06T 2207/10016 Video; Image sequence (Image acquisition modality)
    • G06T 2207/20201 Motion blur correction (Special algorithmic details)


Abstract

The invention discloses a motion foreground detection method and device, a terminal device, and a storage medium. The method comprises the following steps: acquiring a current video frame to be detected; when the current video frame is the first frame, initializing a pre-established first Gaussian mixture model with the pixel value of each pixel point of the current video frame; otherwise, performing motion vector estimation on the current video frame against the background frame to obtain a global motion estimate for the current video frame; performing position correction on the first Gaussian mixture model of each pixel point in the current video frame according to the global motion estimate; matching each pixel point of the current video frame with its position-corrected first Gaussian mixture model; and classifying each pixel point in the current video frame as a foreground pixel point or a background pixel point according to the matching result. The method addresses the missed detection of moving foregrounds that afflicts current motion foreground detection methods in dynamic scenes.

Description

Motion foreground detection method and device, terminal equipment and storage medium
Technical Field
The invention relates to the technical field of computer image processing, in particular to a motion foreground detection method, a motion foreground detection device, terminal equipment and a storage medium.
Background
In the field of computer digital image processing, image-based moving foreground detection refers to distinguishing moving foreground objects from the background in a video sequence. According to the motion state of the captured picture, motion foreground detection algorithms can be divided into foreground detection under a static background and foreground detection under a dynamic background. A static background corresponds to a scene with a fixed camera and a relatively stable background. A dynamic background covers two situations: the camera is fixed but the captured picture contains a moving background, such as swaying leaves or water ripples; or the camera itself is moving and the captured picture moves along with it.
For motion foreground detection in a dynamic scene, a current method proceeds as follows: the image is down-sampled and filtered through a Gaussian pyramid, turning the dynamic scene into a relatively stable static scene, and mixture-of-Gaussians background modeling is performed on the pyramid layer whose pixel motion satisfies a preset condition, thereby realizing motion foreground detection in the dynamic scene. However, the down-sampling and filtering operations lose a large amount of image detail, causing some moving foregrounds to be missed.
Disclosure of Invention
The embodiments of the invention aim to provide a motion foreground detection method and device, a terminal device, and a storage medium, so as to solve the problem of missed detection of moving foregrounds in current motion foreground detection methods under dynamic scenes.
To achieve the above object, a first embodiment of the present invention provides a motion foreground detection method, comprising:
acquiring a current video frame to be detected;
when the current video frame is the first frame, initializing a pre-established first Gaussian mixture model by using the pixel value of each pixel point of the current video frame;
when the current video frame is not the first frame, performing motion vector estimation on the current video frame according to the background frame to obtain the global motion estimate of the current video frame, wherein the background frame is calculated from the first Gaussian mixture models of all pixel points of the current video frame;
performing position correction on the first Gaussian mixture model corresponding to each pixel point in the current video frame according to the global motion estimate;
matching each pixel point of the current video frame with its position-corrected first Gaussian mixture model; and
classifying each pixel point in the current video frame as a foreground pixel point or a background pixel point according to the matching result.
Preferably, the method further comprises:
for each pixel point classified as a foreground pixel point, matching the pixel point with the first Gaussian mixture models corresponding to the other pixel points within a preset search range around it, to obtain a neighborhood matching result; and
finally determining, according to the neighborhood matching result, whether the pixel point is a foreground pixel point or a background pixel point.
Preferably, performing motion vector estimation on the current video frame according to the background frame to obtain the global motion estimate of the current video frame specifically comprises:
dividing the current video frame into a plurality of image blocks, and taking one of the image blocks as a reference block;
searching a corresponding preset search area of the background frame for an image block similar to the reference block, to serve as a matching block;
obtaining the local motion vector of the reference block from the relative displacement between the reference block and the matching block; and
compiling the local motion vectors of all image blocks into a histogram, and taking the local motion vector at the histogram peak as the global motion estimate of the current video frame.
Preferably, performing position correction on the first Gaussian mixture model corresponding to each pixel point in the current video frame according to the global motion estimate specifically comprises:
shifting the first Gaussian mixture models corresponding to all pixel points of the current video frame according to the global motion estimate;
discarding the first Gaussian mixture models corresponding to pixel points of the picture area shifted beyond the preset imaging range; and
establishing corresponding first Gaussian mixture models for pixel points of the picture area newly entering the preset imaging range.
Preferably, the first Gaussian mixture model comprises a plurality of sub-Gaussian models; then,
matching each pixel point of the current video frame with its first Gaussian mixture model, or with its position-corrected first Gaussian mixture model, comprises:
calculating a first distance between the ith pixel point of the current video frame and the mean of the jth sub-Gaussian model in the first Gaussian mixture model;
when the square of the first distance is less than λ1 times the variance of the jth sub-Gaussian model, where λ1 > 0, judging that the ith pixel point matches the jth sub-Gaussian model;
otherwise, judging whether the jth sub-Gaussian model is the last sub-Gaussian model;
if yes, judging that the ith pixel point does not match its first Gaussian mixture model;
if not, continuing to calculate the first distance between the ith pixel point and the mean of the (j+1)th sub-Gaussian model.
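The matching rule above can be illustrated with a small sketch (Python; names are illustrative, a scalar grayscale pixel is assumed, and since the patent only requires λ1 > 0, the value `lam1 = 2.5` below is an assumed choice):

```python
def match_pixel(pixel, means, variances, lam1=2.5):
    """Return the index of the first matching sub-Gaussian model, or -1.

    The jth sub-Gaussian matches when the squared first distance
    (pixel - mean_j)^2 is less than lam1 times its variance; models are
    tested in order, moving on to the (j+1)th on failure.
    """
    for j, (mu, var) in enumerate(zip(means, variances)):
        if (pixel - mu) ** 2 < lam1 * var:
            return j
    return -1  # no sub-Gaussian matched: the mixture as a whole does not match
```

For example, `match_pixel(100, [98, 150, 200], [225, 225, 225])` matches the first component, while a pixel far from every mean returns -1 and becomes a candidate foreground pixel point.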
Preferably, after acquiring the current video frame to be detected, the method further comprises:
setting an interval frame number for update frames, wherein an update frame indicates that the model parameters of the first Gaussian mixture model are to be updated in a preset update mode; then,
after judging that the ith pixel point matches the jth sub-Gaussian model, the method further comprises:
judging whether the current video frame is an update frame;
if the current video frame is an update frame, updating the model parameters of the first Gaussian mixture model in the preset update mode; and
sorting the weights of all sub-Gaussian models in the updated first Gaussian mixture model in descending order, accumulating the weights from largest to smallest until the sum exceeds a preset threshold, and taking those sub-Gaussian models as the background model.
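A minimal sketch of this background-model selection, assuming the weights are already normalized; the function name and the threshold value 0.7 are illustrative, not from the patent:

```python
def select_background_components(weights, threshold=0.7):
    """Sort sub-Gaussian weights in descending order and accumulate them
    from largest to smallest; the components used up to the point where the
    running sum first exceeds `threshold` form the background model."""
    order = sorted(range(len(weights)), key=lambda k: weights[k], reverse=True)
    chosen, total = [], 0.0
    for k in order:
        chosen.append(k)
        total += weights[k]
        if total > threshold:
            break
    return chosen  # indices of background sub-Gaussian models
```

With weights (0.5, 0.3, 0.2) the first two components cross the 0.7 threshold and are kept as background.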
Preferably, updating the model parameters of the first Gaussian mixture model in the preset update mode comprises:
replacing the mean of the sub-Gaussian model with the minimum weight in the first Gaussian mixture model corresponding to the ith pixel point with the pixel value of the ith pixel point, and re-initializing the variance and weight of that sub-Gaussian model; and
normalizing the weights of all sub-Gaussian models to update the first Gaussian mixture model of the ith pixel point.
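Read as the standard mixture-of-Gaussians update, this step can be sketched as follows (hypothetical names; the re-initialization values `init_var` and `init_weight` are assumptions, since the patent does not fix them):

```python
def refresh_weakest_component(pixel, means, variances, weights,
                              init_var=225.0, init_weight=0.05):
    """Replace the minimum-weight sub-Gaussian with one centred on the
    current pixel value, re-initialise its variance and weight, then
    normalise all weights so they again sum to one."""
    k = weights.index(min(weights))
    means[k] = float(pixel)
    variances[k] = init_var
    weights[k] = init_weight
    s = sum(weights)
    weights[:] = [w / s for w in weights]
    return means, variances, weights
```

The normalization keeps the mixture a valid probability model after the weakest component is recycled.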
Preferably, for each pixel point classified as a foreground pixel point, matching the pixel point with the first Gaussian mixture models corresponding to the other pixel points within the preset search range around it specifically comprises:
calculating a second distance between the kth pixel point, classified as a foreground pixel point, and the mean of any sub-Gaussian model in the first Gaussian mixture model corresponding to the hth pixel point within the preset search range around the kth pixel point;
if the square of the second distance is less than λ2 times the variance of that sub-Gaussian model, where λ2 > 0, judging that the kth pixel point matches that sub-Gaussian model of the first Gaussian mixture model corresponding to the hth pixel point;
when the matched sub-Gaussian model is a background model, judging that the kth pixel point is a background pixel point; and
when the matched sub-Gaussian model is not a background model, judging that the kth pixel point is a foreground pixel point.
A second embodiment of the present invention provides a motion foreground detecting apparatus, including:
the video frame acquisition module is used for acquiring a current video frame to be detected;
the model initialization module is used for initializing a pre-established first Gaussian mixture model by using the pixel value of each pixel point of the current video frame when the current video frame is a first frame picture;
the motion estimation module is used for performing motion vector estimation on the current video frame according to the background frame when the current video frame is not the first frame picture, to obtain the global motion estimate of the current video frame, wherein the background frame is calculated from the first Gaussian mixture models of all pixel points of the current video frame;
the position correction module is used for performing position correction on the first Gaussian mixture model corresponding to each pixel point in the current video frame according to the global motion estimate;
the matching module is used for matching each pixel point of the current video frame with its position-corrected first Gaussian mixture model; and
the first foreground judging module is used for classifying the pixel points in the current video frame as foreground pixel points and background pixel points according to the matching result.
A third embodiment of the present invention correspondingly provides a terminal device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor implements the motion foreground detection method according to any one of the first embodiments when executing the computer program.
A fourth embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, a device on which the computer-readable storage medium is located is controlled to execute the motion foreground detection method according to any one of the first embodiments.
Compared with the prior art, the motion foreground detection method and device, terminal device, and storage medium provided by the embodiments of the invention combine the Gaussian mixture model with motion estimation, thereby avoiding the missed detections that existing foreground detection methods cause by blurring small moving objects through filtering.
Drawings
Fig. 1 is a schematic flow chart of a motion foreground detection method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a motion foreground detection method according to another embodiment of the present invention;
FIG. 3 shows two search templates of the diamond search method, wherein the left side is a large diamond template (LDSP) with 9 search positions, and the right side is a small diamond template (SDSP) with 5 search positions;
FIG. 4 is a schematic diagram of a first Gaussian mixture model for position correction according to an embodiment of the present invention;
fig. 5 is a schematic flowchart illustrating a process of determining a foreground or a background after obtaining a first gaussian mixture model according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a motion foreground detecting apparatus according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a schematic flow chart of the motion foreground detection method provided in an embodiment of the present invention, the method comprises steps S1 to S6:
S1, acquiring a current video frame to be detected;
S2, when the current video frame is the first frame, initializing a pre-established first Gaussian mixture model by using the pixel value of each pixel point of the current video frame;
S3, when the current video frame is not the first frame, performing motion vector estimation on the current video frame according to the background frame to obtain the global motion estimate of the current video frame, wherein the background frame is calculated from the first Gaussian mixture models of all pixel points of the current video frame;
S4, performing position correction on the first Gaussian mixture model corresponding to each pixel point in the current video frame according to the global motion estimate;
S5, matching each pixel point of the current video frame with its position-corrected first Gaussian mixture model;
and S6, classifying each pixel point in the current video frame as a foreground pixel point or a background pixel point according to the matching result.
Specifically, a video frame to be detected in a dynamic scene is acquired. Preferably, the acquired current video frame is first pre-processed, for example by down-sampling, graying, and filtering. After pre-processing, a first Gaussian mixture model is established for each pixel point in the current video frame. Each first Gaussian mixture model consists of K sub-Gaussian models, and each sub-Gaussian model has three parameters: weight, mean, and variance. Once the first Gaussian mixture models of a video frame are established, the corresponding background frame can be calculated: the weighted mean of the first Gaussian mixture models corresponding to all pixel points of the video frame yields the background frame of that video frame.
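The background-frame computation described above reduces to a per-pixel weighted mean of the component means; a sketch assuming grayscale frames and (H, W, K) parameter arrays (the array layout is an assumption):

```python
import numpy as np

def background_frame(means, weights):
    """Background frame as the weighted mean of each pixel's K sub-Gaussian
    means; `means` and `weights` have shape (H, W, K), with the weights of
    each pixel summing to 1."""
    return (means * weights).sum(axis=-1)
```

For a pixel with component means (100, 200, 300) and weights (0.5, 0.3, 0.2), the background value is 170.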
When the current video frame is the first frame, the first Gaussian mixture model is initialized from the current video frame. Specifically, each pixel point is initialized as follows: the means of all sub-Gaussian models corresponding to the pixel point are set to the current pixel value; the variance is initialized to a large initial value, typically 15 × 15 = 225; and the weights are distributed uniformly, each sub-Gaussian model receiving weight 1/K. Preferably, K is 3 to 6, with larger values of K for more complex scenes. Each initialized pixel point is then matched directly against its first Gaussian mixture model.
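The initialization just described can be sketched as follows (grayscale frame assumed; K = 5 is one value inside the preferred 3 to 6 range):

```python
import numpy as np

def init_mixture(frame, K=5, init_var=225.0):
    """Per-pixel mixture from the first frame: every component mean is the
    pixel value, the variance is 15 * 15 = 225, and each weight is 1/K."""
    h, w = frame.shape
    means = np.repeat(frame[..., None].astype(np.float64), K, axis=-1)
    variances = np.full((h, w, K), init_var)
    weights = np.full((h, w, K), 1.0 / K)
    return means, variances, weights
```
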
When the current video frame is not the first frame, the background frame is used as a reference frame and compared with the current video frame, and motion vector estimation is performed on the current video frame to obtain its motion vector. The first Gaussian mixture models are then shifted according to this motion vector, realizing the position correction: after the shift, the first Gaussian mixture models and the pixel points of the current video frame correspond almost one-to-one in spatial position, which avoids the missed detections caused by filtering methods blurring small moving objects. Each pixel point is then matched against its shifted first Gaussian mixture model.
Foreground judgment is performed according to the matching results, both for the first frame and for subsequent frames. If a pixel point does not match its first Gaussian mixture model, it is preliminarily judged to be a foreground pixel point; otherwise it is preliminarily judged to be a background pixel point. By combining the Gaussian mixture model with motion estimation, the embodiments of the invention avoid the missed detections that existing foreground detection methods cause by blurring small moving objects through filtering.
In an optional embodiment, the method further comprises:
s11, for each pixel point classified as a foreground pixel point, matching any one pixel point with a first Gaussian mixture model corresponding to other pixel points in a search range preset by the pixel point to obtain a neighborhood matching result;
and S12, finally judging whether any pixel belongs to the foreground pixel or the background pixel according to the neighborhood matching result.
A pixel point preliminarily judged to be a foreground pixel point is matched with the first Gaussian mixture models corresponding to the other pixel points within the preset search range around it, yielding a neighborhood matching result. Foreground judgment is then performed again: if the pixel point matches the first Gaussian mixture model of another pixel point in its neighborhood, it is finally judged to be a background pixel point; otherwise, it is finally judged to be a foreground pixel point. The neighborhood model search adapts its range to the local motion estimate, compensating both for large local motion changes and for the limited precision of the motion estimation. Preferably, the search range of the neighborhood model is half the local motion estimate of the region to which the current pixel point belongs.
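A sketch of the neighborhood re-check for one provisional foreground pixel point, with hypothetical names; each neighboring sub-Gaussian is passed as a (mean, variance, is_background) triple, and `lam2 = 2.5` is an assumed value for λ2 > 0:

```python
def recheck_foreground(pixel, neighbour_models, lam2=2.5):
    """Final classification of a provisional foreground pixel point: if it
    matches a neighbour's background sub-Gaussian it becomes background; if
    it matches a non-background sub-Gaussian, or nothing, it stays foreground."""
    for mu, var, is_bg in neighbour_models:
        if (pixel - mu) ** 2 < lam2 * var:
            return 'background' if is_bg else 'foreground'
    return 'foreground'
```

This re-check is what keeps slightly shifted background edges from being reported as moving foreground.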
Referring to fig. 2, a schematic flow chart of a motion foreground detection method according to another embodiment of the present invention: as fig. 2 shows, to improve the adaptability of the model, some model parameters are updated during model matching and judgment, and the subsequent steps then proceed with the updated models.
The embodiment of the invention provides a motion foreground detection method: the Gaussian mixture model is initialized from the first video frame in a dynamic scene and matched directly; for subsequent frames, the motion vector is obtained with the background frame as reference, the Gaussian mixture model is position-corrected, and the frame is matched against the corrected model; foreground judgment is then performed on the matching result. To avoid background edges being detected as foreground, a neighborhood model search is added for pixel points preliminarily judged to be foreground, further deciding whether they are truly foreground. This effectively solves the problem of missed detection of moving foregrounds in current motion foreground detection methods under dynamic scenes.
In an optional embodiment, performing motion vector estimation on the current video frame according to the background frame to obtain the global motion estimate of the current video frame specifically comprises:
dividing the current video frame into a plurality of image blocks, and taking one of the image blocks as a reference block;
searching the corresponding preset search area of the background frame for an image block similar to the reference block, to serve as a matching block;
obtaining the local motion vector of the reference block from the relative displacement between the reference block and the matching block; and
compiling the local motion vectors of all image blocks into a histogram, and taking the local motion vector at the histogram peak as the global motion estimate of the current video frame.
Specifically, the motion estimation may use a gray projection method, a block matching method, a phase correlation method, an optical flow method, and the like. Preferably, the embodiment of the present invention uses a block matching method for local motion vector estimation and finally obtains the global motion vector by histogram statistics. Block matching finds the most closely related regions of the current frame and the background frame and establishes the correspondence between them. It is implemented as follows:
any frame of picture is divided into a plurality of image blocks, and one of the image blocks is taken as a reference block. Preferably, the current frame image is divided into a plurality of m × n image blocks, one of the image blocks is taken as a reference block, and the position of the center thereof is recorded.
An image block similar to the reference block is then sought, as the matching block, in a search area of size (Δm + m) × (Δn + n) at the corresponding position of the background frame. Many search methods exist, including the full search method, the three-step search method, and the diamond search method. Preferably, the diamond search method is used, with two search templates: a large diamond template (LDSP) with 9 search positions (fig. 3, left) and a small diamond template (SDSP) with 5 search positions (fig. 3, right). The detailed search proceeds as follows, with reference to fig. 3:
a) Initialize a large diamond template LDSP centred on the origin of the search window, and compute the error between the image block at each of the 9 search positions and the background frame. The error can be measured by the sum of absolute differences (SAD), the normalized cross-correlation function (NCCF), the root mean square error (RMSE), the maximum matched pixel count (MPC), structural similarity (SSIM), and so on. For example, using the root mean square error as the measure: if the computed minimum block distortion (MBD) point is located at the centre position, go to step c; otherwise go to step b. The root mean square error is calculated as

RMSE = sqrt( (1/N) · Σ_{i=1..N} (x_obs,i − x_model,i)² )

where x_obs,i is the ith pixel value of the block to be matched, x_model,i is the ith pixel value of the reference block, and N is the total number of pixel points in the reference block.
b) Construct a new LDSP centred on the most recently found MBD point and calculate the matching errors at its 9 search points to find a new MBD point. If the new MBD point lies at the centre of the template, go to step c; otherwise repeat step b.
c) Construct a small diamond template SDSP centred on the previously obtained MBD point, perform matching calculation and comparison at its 5 search points, and take the resulting MBD point as the final matching block.
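Steps a) to c) can be sketched as follows (Python with NumPy, grayscale blocks; names such as `diamond_search` are illustrative, and RMSE is used as the block-error measure, one of the options listed in step a):

```python
import numpy as np

LDSP = [(0, 0), (0, -2), (0, 2), (-2, 0), (2, 0), (-1, -1), (-1, 1), (1, -1), (1, 1)]
SDSP = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]

def block_rmse(a, b):
    return np.sqrt(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def diamond_search(ref_block, background, top, left, max_iter=50):
    """Diamond search for `ref_block` (taken from the current frame at
    (top, left)) inside `background`; returns the local motion vector (dy, dx)."""
    m, n = ref_block.shape
    H, W = background.shape

    def cost(dy, dx):
        y, x = top + dy, left + dx
        if y < 0 or x < 0 or y + m > H or x + n > W:
            return np.inf  # candidate block falls outside the search window
        return block_rmse(ref_block, background[y:y + m, x:x + n])

    cy = cx = 0
    for _ in range(max_iter):                      # steps a and b: walk the LDSP
        best = min(LDSP, key=lambda d: cost(cy + d[0], cx + d[1]))
        if best == (0, 0):                         # MBD at the centre: go to step c
            break
        cy, cx = cy + best[0], cx + best[1]
    best = min(SDSP, key=lambda d: cost(cy + d[0], cx + d[1]))  # step c: SDSP refine
    return cy + best[0], cx + best[1]
```

A block copied from the background at a known offset is recovered as that offset, which is exactly the local motion vector of the next paragraph.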
Once the matching block is obtained, the local motion vector of the reference block follows from the relative displacement between the reference block and the matching block: that displacement is, by definition, the motion vector of the reference block, i.e. the local motion vector.
After the local motion vectors of the different image blocks are obtained, they are compiled into a histogram, and the local motion vector at the histogram peak is taken as the global motion estimate of the current video frame.
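The histogram step amounts to taking the mode of the local motion vectors; a one-function sketch (the name is illustrative):

```python
from collections import Counter

def global_motion_estimate(local_vectors):
    """Return the most frequent (dy, dx) local motion vector, i.e. the
    highest histogram bin described in the text."""
    return Counter(local_vectors).most_common(1)[0][0]
```
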
In an optional embodiment, performing position correction on the first Gaussian mixture model corresponding to each pixel point in the current video frame according to the global motion estimate specifically comprises:
shifting the first Gaussian mixture models corresponding to all pixel points of the current video frame according to the global motion estimate;
discarding the first Gaussian mixture models corresponding to pixel points of the picture area shifted beyond the preset imaging range; and
establishing corresponding first Gaussian mixture models for pixel points of the picture area newly entering the preset imaging range.
Specifically, fig. 4 is a schematic diagram of the position correction of the first Gaussian mixture model according to an embodiment of the present invention. As fig. 4 shows, if the global motion estimate is denoted (x, y), with x the lateral shift and y the longitudinal shift, the positions of all first Gaussian mixture models are shifted in the same direction; for example, the original model in region A of the left image moves to region A' of the right image. Because the imaging range is fixed, part of the picture area is shifted out of it after the displacement (the dark gray areas at the top and right of the left image), and the models corresponding to that area are discarded; for the area newly entering the imaging range (the light gray areas at the bottom and left of the right image), the corresponding first Gaussian mixture models are initialized with the pixels at the corresponding positions of the current frame.
That is, the models are shifted according to the global motion estimator, the first Gaussian mixture models corresponding to the picture area beyond the preset imaging range are discarded, and first Gaussian mixture models are established for the picture area newly entering the preset imaging range. The newly added area is initialized with the pixels of the corresponding area of the current frame picture, in the same way as the Gaussian mixture model initialization of the first frame picture, which is not described herein again.
After part of the original first Gaussian mixture models are discarded and part are supplemented, the corrected first Gaussian mixture models are obtained. The pixel positions of the current video frame picture then correspond one-to-one to the positions of the shifted first Gaussian mixture models, so that foreground or background judgment can be performed.
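The shift–discard–reinitialize procedure can be sketched as below. The grid layout, the `init_fn` helper that builds a fresh model from a pixel value, and the use of a NumPy object array are all illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def correct_model_positions(models, frame, x, y, init_fn):
    """Shift the per-pixel model grid by the global motion (x, y).

    models : H x W object array holding one mixture model per pixel
    frame  : current frame, used to initialise models for newly exposed pixels
    init_fn: hypothetical helper building a fresh model from a pixel value
    """
    h, w = models.shape
    shifted = np.empty_like(models)
    for r in range(h):
        for c in range(w):
            src_r, src_c = r - y, c - x  # where this pixel's model came from
            if 0 <= src_r < h and 0 <= src_c < w:
                shifted[r, c] = models[src_r, src_c]  # reuse the shifted model
            else:
                # Area newly entering the imaging range: initialise from the
                # current frame.  Models whose destination falls outside the
                # grid are never copied, i.e. they are implicitly discarded.
                shifted[r, c] = init_fn(frame[r, c])
    return shifted
```

Because model positions and pixel positions stay in one-to-one correspondence after the shift, the subsequent matching step can index both grids with the same coordinates.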
As an improvement of the above solution, the first gaussian mixture model includes a plurality of sub-gaussian models, then,
the matching of each pixel point of the current video frame with the first Gaussian mixture model thereof, or the matching of each pixel point of the current video frame with the first Gaussian mixture model after the position correction thereof, includes:
calculating a first distance between the ith pixel point of the current video frame and the mean value of the jth sub-Gaussian model in the first mixed Gaussian model;
when the square of the first distance is less than λ1 times the variance of the jth sub-Gaussian model, judging that the ith pixel point is matched with the jth sub-Gaussian model; wherein λ1 > 0;
Otherwise, judging whether the jth sub-Gaussian model is the last sub-Gaussian model;
if yes, judging that the ith pixel point is not matched with its corresponding first Gaussian mixture model;
if not, continuously calculating a first distance between the ith pixel point and the mean value of the j +1 th sub-Gaussian model.
Specifically, the first Gaussian mixture model comprises a plurality of sub-Gaussian models; then,
after the position correction is performed on the first gaussian mixture model, a foreground or background judgment can be performed, so the method further includes:
and calculating a first distance between any pixel point of the current video frame and the mean value of any sub-Gaussian model in the corresponding first mixed Gaussian model. For example, the squared distance between the current pixel value pixel and the mean value μ of the sub-gaussian model is calculated.
When the square of the first distance is smaller than λ1 times the variance δ² of any sub-Gaussian model corresponding to the current pixel point, expressed by the formula (pixel − μ)² < λ1·δ², judging that the current pixel point is matched with that sub-Gaussian model; wherein λ1 > 0.
Otherwise, i.e. when (pixel − μ)² ≥ λ1·δ², judging whether that sub-Gaussian model corresponding to the current pixel point is the last sub-Gaussian model.
If so, namely all the sub-Gaussian models are not matched with the current pixel point, judging that the current pixel point is not matched with the first mixed Gaussian model corresponding to the current pixel point;
if not, continuously calculating a first distance between the current pixel point and the mean value of the next sub-Gaussian model, namely traversing all sub-Gaussian models in the first mixed Gaussian model corresponding to the current pixel point.
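The traversal above amounts to a short loop. The dict layout (`'mu'`, `'var'` keys) and the default λ1 value are illustrative assumptions; the patent only requires λ1 > 0.

```python
def match_mixture(pixel, mixture, lam1=2.5 ** 2):
    """Traverse sub-Gaussians in order; return the index of the first one
    whose squared distance to the pixel is below lam1 * variance, or -1 if
    no sub-model matches (candidate foreground pixel).

    mixture: list of dicts with 'mu' and 'var' keys (assumed layout).
    """
    for j, g in enumerate(mixture):
        if (pixel - g['mu']) ** 2 < lam1 * g['var']:
            return j   # matched the j-th sub-Gaussian
    return -1          # traversed all sub-models without a match
```

A return of -1 corresponds to the "first Gaussian mixture model not matched" branch, after which the pixel is preliminarily treated as foreground.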
As an improvement of the above scheme, after the acquiring the current video frame to be detected, the method further includes:
setting an interval frame number of update frames, wherein the update frames are used for indicating that the model parameters of the first Gaussian mixture model are to be updated according to a preset update mode; then,
after the ith pixel point is judged to be matched with the jth sub-Gaussian mixture model, the method further comprises the following steps:
judging whether the current video frame is an update frame;
if the current video frame is an update frame, updating the model parameters of the first Gaussian mixture model according to the preset update mode;
and sorting the weights of all sub-Gaussian models in the updated first mixed Gaussian model in a descending order, calculating the sum of the weights from large to small, and taking the sub-Gaussian model with the sum of the weights larger than a preset threshold value as a background model.
Specifically, it is set that the model parameters are updated every 10 frames, and every 10th video frame is marked with an update-frame marker.
further, the updating the model parameters of the first gaussian mixture model according to the preset updating method includes:
and judging whether the current video frame is an update frame, namely judging whether the current video frame has a mark, if so, judging that the current video frame is the update frame, and if so, carrying out model update on a model corresponding to the current video frame.
And if the current video frame is an update frame, updating the model parameters of the first Gaussian mixture model according to a preset update mode, and normalizing the updated weights of all sub-Gaussian models. The preset updating mode is as follows:
ωt = (1 − α)·ωt−1 + α·Mk

μt = (1 − α)·μt−1 + α·xt

σt² = (1 − α)·σt−1² + α·(xt − μt)²

wherein ωt, μt and σt² respectively represent the weight, mean and variance of any sub-Gaussian model at time t, α represents the learning rate of the sub-Gaussian model, and Mk is set to 1 when the sub-Gaussian model is matched successfully and to 0 when matching fails. xt represents the current pixel point matched with the sub-Gaussian model at time t. After the parameters are updated, the weights of all the sub-models need to be normalized so that their sum is 1. The normalization formula is as follows:

ωi = ωi / Σj ωj
and sorting the weights of all sub-Gaussian models in the updated first mixed Gaussian model in a descending order, calculating the sum of the weights from large to small, and taking the sub-Gaussian model with the sum of the weights larger than a preset threshold T as a background model.
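The update equations and the background-model selection can be sketched as below. The learning rate α, the threshold T, and the dict layout (`'w'`, `'mu'`, `'var'` keys) are illustrative assumptions.

```python
def update_matched(g, pixel, alpha=0.05, matched=True):
    """Preset update mode: w_t = (1-a)*w_{t-1} + a*M_k; for a matched
    sub-model the mean and variance are also blended toward the pixel."""
    m = 1.0 if matched else 0.0
    g['w'] = (1 - alpha) * g['w'] + alpha * m
    if matched:
        g['mu'] = (1 - alpha) * g['mu'] + alpha * pixel
        g['var'] = (1 - alpha) * g['var'] + alpha * (pixel - g['mu']) ** 2

def normalize_and_pick_background(mixture, T=0.7):
    """Normalize the weights to sum to 1, then take sub-models in descending
    weight order until the cumulative weight first exceeds T; those
    sub-models form the background model."""
    total = sum(g['w'] for g in mixture)
    for g in mixture:
        g['w'] /= total
    background, acc = [], 0.0
    for g in sorted(mixture, key=lambda g: g['w'], reverse=True):
        background.append(g)
        acc += g['w']
        if acc > T:
            break
    return background
```

Running `normalize_and_pick_background` only on update frames, as the patent proposes, keeps the background set fixed between updates and saves per-pixel sorting work.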
As an improvement of the above scheme, after the ith pixel point is determined to be the foreground pixel point, the method further includes:
replacing the pixel value of the ith pixel point with the mean value of a sub-Gaussian model with the minimum weight in the first mixed Gaussian model corresponding to the ith pixel point, and initializing the variance and the weight of the sub-Gaussian model;
and normalizing the weights of all the sub-Gaussian models to update the first mixed Gaussian model of the ith pixel.
Specifically, the pixel value of the current pixel point is replaced by the mean value of the sub-gaussian model with the minimum weight in the first mixed gaussian model corresponding to the current pixel point, and the variance and the weight of the sub-gaussian model are initialized. Preferably, the weight of the sub-gaussian model is initialized to a small value, e.g., 0.001.
In order to ensure that the sum of the weights is 1, the weights of all the sub-Gaussian models are normalized to obtain a new sub-Gaussian model. In the embodiment of the invention, in the foreground/background detection process, the mixed Gaussian model parameters are not updated in the updating frame, so that the running time of the algorithm is reduced, and the real-time performance of the algorithm is improved.
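The replacement step can be sketched as follows. Note a hedge: the text above reads as replacing the pixel value with the weakest sub-model's mean, while in the classical mixture-of-Gaussians formulation the weakest sub-model is instead re-centred on the current pixel value; the sketch follows the classical reading, and the initial variance and the 0.001 initial weight are assumptions (the 0.001 value follows the text).

```python
def replace_weakest(mixture, pixel, init_var=15.0 ** 2, init_w=0.001):
    """When no sub-model matches, re-centre the least-weighted sub-Gaussian
    on the current pixel, reset its variance and weight, and renormalize.

    Classical-reading sketch; init_var and the dict layout are assumptions.
    """
    weakest = min(mixture, key=lambda g: g['w'])
    weakest['mu'] = float(pixel)
    weakest['var'] = init_var       # initialize the variance
    weakest['w'] = init_w           # initialize the weight to a small value
    total = sum(g['w'] for g in mixture)
    for g in mixture:
        g['w'] /= total             # keep the weights summing to 1
```

Giving the new sub-model a tiny weight means a transient foreground value barely perturbs the background set, but a persistent new value gradually gains weight through later updates.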
To understand more clearly how foreground or background judgment is performed after the first Gaussian mixture model is obtained, refer to fig. 5, which is a schematic flow chart of foreground or background judgment provided by the embodiment of the present invention. As can be seen from fig. 5, after the first Gaussian mixture model is obtained, the distance between the current pixel value and the model at the corresponding position is calculated, and whether they match is judged according to the distance. If they match, whether the current frame is an update frame is judged; when the current video frame is an update frame, the model parameters are updated first, and then whether the matched model is a background model is judged: if so, the current pixel point is a background pixel point, and if not, the current pixel point is preliminarily judged to be a foreground pixel point. When the current video frame is not an update frame, the same background-model judgment is performed directly. If the current pixel value does not match, the matching condition of the next model is judged in turn, and when none of the models match, the current pixel value replaces the mean value of the model with the minimum weight. For a pixel point preliminarily determined as a foreground pixel point, in order to compensate for the accuracy problem of motion estimation, a neighborhood model search is further required, as in steps S11-S12.
As an improvement of the above scheme, for each pixel point classified as a foreground pixel point, matching any one pixel point with a first gaussian mixture model corresponding to other pixel points within a peripheral preset range of the pixel point, specifically including:
calculating a second distance between the kth pixel point classified as the foreground pixel point and the mean value of any sub-Gaussian model in the first mixed Gaussian model corresponding to the h pixel point in the peripheral preset range;
if the square of the second distance is smaller than λ2 times the variance of any sub-Gaussian model corresponding to the h-th pixel point, judging that the kth pixel point is matched with that sub-Gaussian model corresponding to the h-th pixel point; wherein λ2 > 0;
When any matched sub-Gaussian model is a background model, judging that the kth pixel point is a background pixel point;
and when any matched sub-Gaussian model is not the background model, judging that the kth pixel point is a foreground pixel point.
Specifically, for the pixel point preliminarily determined as the foreground pixel point in step S6, a second distance between the current pixel point and a mean value of any sub-gaussian model in the first mixed gaussian model corresponding to other peripheral pixel points is calculated;
if the square of the second distance is smaller than λ2 times the variance of any sub-Gaussian model corresponding to the other peripheral pixel points, formulated as (pixel − μ)² < λ2·δ², then judging that the current pixel point is matched with that sub-Gaussian model corresponding to the other peripheral pixel points; wherein λ2 > 0. If the current pixel point is successfully matched with the model of any peripheral pixel point, the neighborhood model search is exited, whether the matched model belongs to the background model is judged according to its weight, and thereby whether the pixel point is a background pixel point or a foreground pixel point is judged. During the neighborhood model search, only model matching and foreground/background judgment are performed, and the model parameters are not updated.
When any matched sub-Gaussian model is a background model, judging that the current pixel point is a background pixel point; and when any one matched sub-Gaussian model is not the background model, judging that the current pixel point is the foreground pixel point. Similarly, the background model is determined in the following manner: the sub-Gaussian models are sorted in descending order according to the weight, the sum of the weights is calculated from large to small, and the sub-Gaussian model corresponding to the weight sum larger than a preset threshold value T is used as a background model.
In addition, if (pixel − μ)² ≥ λ2·δ², judging whether any sub-Gaussian model corresponding to the other pixel points is the last sub-Gaussian model;
if yes, judging that the current pixel point is not matched with the first Gaussian mixture models corresponding to other pixel points, and the current pixel point is a foreground pixel point.
If not, continuously calculating a second distance between the current pixel point and the mean value of the next sub-Gaussian model.
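The neighborhood search can be sketched as below. The `is_background` predicate, the λ2 default, the fixed `radius` (which, per the text, could instead widen with the local motion estimator), and the dict layout are all illustrative assumptions.

```python
def neighborhood_search(r, c, models, frame, is_background,
                        lam2=9.0, radius=1):
    """Re-test a provisionally-foreground pixel against the mixtures of its
    neighbours and classify it by the first matching sub-model.

    is_background(g): assumed predicate telling whether sub-model g belongs
    to its mixture's background set.  No model parameters are updated here.
    """
    pixel = frame[r][c]
    h, w = len(frame), len(frame[0])
    for nr in range(max(0, r - radius), min(h, r + radius + 1)):
        for nc in range(max(0, c - radius), min(w, c + radius + 1)):
            if (nr, nc) == (r, c):
                continue  # the pixel's own model already failed to match
            for g in models[nr][nc]:
                if (pixel - g['mu']) ** 2 < lam2 * g['var']:
                    # exit the search on the first successful match
                    return 'background' if is_background(g) else 'foreground'
    return 'foreground'   # no neighbour model matched
```

This is what rescues background-edge pixels that the shifted model grid misplaces by a pixel or two: a neighbouring background model still explains their value.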
In the embodiment of the invention, the dynamic scene is searched and processed by utilizing the motion estimation and the neighborhood model, the condition of missing detection caused by fuzzification of a small moving object caused by the existing filtering method can be avoided, meanwhile, the neighborhood model searching range can be adaptively changed according to the local motion estimation quantity, the neighborhood searching range is correspondingly larger for the area with violent motion, and the problem of foreground detection precision caused by the motion estimation can be solved by utilizing the neighborhood model searching.
Referring to fig. 6, which is a schematic structural diagram of the motion foreground detecting apparatus provided in this embodiment of the present invention, the apparatus includes:
the video frame acquisition module 11 is used for acquiring a current video frame to be detected;
the model initialization module 12 is configured to initialize a pre-established first gaussian mixture model by using a pixel value of each pixel point of a current video frame when the current video frame is a first frame picture;
a motion estimation module 13, configured to perform motion vector estimation on the current video frame according to the background frame when the current video frame is not the first frame picture, to obtain a global motion estimator of the current video frame; the background frame is obtained by calculation according to first Gaussian mixture models of all pixel points of the current video frame;
the position correction module 14 is configured to perform position correction on the first gaussian mixture model corresponding to each pixel point in the current video frame according to the global motion estimator;
the matching module 15 is further configured to match each pixel point of the current video frame with the first gaussian mixture model after the position of the pixel point is corrected;
and the first foreground judging module 16 is configured to classify each pixel point in the current video frame into a foreground pixel point and a background pixel point according to the first matching result.
Preferably, the apparatus further comprises:
the neighborhood searching module is used for matching, for each pixel point classified as a foreground pixel point, that pixel point with the first Gaussian mixture models corresponding to other pixel points in the preset search range of the pixel point to obtain a neighborhood matching result;
and the second foreground judgment module is used for finally judging that any pixel belongs to a foreground pixel or a background pixel according to the neighborhood matching result.
Preferably, the motion estimation module 13 specifically includes:
the blocking unit is used for dividing the current video frame into a plurality of image blocks, one of which is taken as a reference block;
a matching block searching unit, configured to search an image block similar to the reference block in a corresponding preset search area of the background frame, as a matching block;
the local motion estimation unit is used for obtaining a local motion vector corresponding to the reference block according to the relative displacement of the reference block and the matching block;
and the global motion estimation unit is used for counting the local motion vectors of all the image blocks by adopting a histogram statistical method, and taking the local motion vector represented by the highest point in the histogram as the global motion estimator of the current video frame.
Preferably, the position correction module 14 specifically includes:
and the offset unit is used for offsetting the current video frame according to the global motion estimator, discarding the first mixed Gaussian model corresponding to the pixel point of the picture region exceeding the preset imaging range, and establishing the corresponding first mixed Gaussian model according to the pixel point of the picture region newly entering the preset imaging range.
Preferably, the first gaussian mixture model includes a plurality of sub-gaussian models, then the apparatus further includes:
the first distance calculation module is used for calculating a first distance between the ith pixel point of the current video frame and the mean value of the jth sub-Gaussian model in the first mixed Gaussian model;
a first judging module, configured to judge that the ith pixel point is matched with the jth sub-Gaussian model when the square of the first distance is less than λ1 times the variance of the jth sub-Gaussian model; wherein λ1 > 0;
a second judging module, configured to judge, when the ith pixel point is not matched with the jth sub-Gaussian model, whether the jth sub-Gaussian model is the last sub-Gaussian model;
a third judging module, configured to judge, if the jth sub-Gaussian model is the last sub-Gaussian model, that the ith pixel point is not matched with its corresponding first Gaussian mixture model;
and a second distance calculation module, configured to continue calculating, if the jth sub-Gaussian model is not the last sub-Gaussian model, the first distance between the ith pixel point and the mean value of the (j+1)th sub-Gaussian model.
Preferably, the apparatus further comprises:
the marking module is used for setting the interval frame number of the update frames, and the update frames are used for indicating that the model parameters of the first Gaussian mixture model are to be updated according to a preset update mode; then,
the device further comprises:
the fourth judging module is used for judging whether the current video frame is an updating frame;
the updating module is used for updating the model parameters of the first Gaussian mixture model according to the preset updating mode if the current video frame is an updating frame;
and the sorting module is used for sorting the weights of all sub-Gaussian models in the updated first mixed Gaussian model in a descending order, calculating the sum of the weights from large to small, and taking the sub-Gaussian model with the sum of the weights larger than a preset threshold value as a background model.
Preferably, the update module includes:
a replacing unit, configured to replace the pixel value of the ith pixel point with a mean value of a sub-gaussian model with a minimum weight in the first mixed gaussian model corresponding to the ith pixel point, and initialize a variance and a weight of the sub-gaussian model;
and the normalization processing unit is used for performing normalization processing on the weights of all the sub-Gaussian models so as to update the first Gaussian mixture model of the ith pixel.
Preferably, the first foreground judging module 16 specifically includes:
the distance calculation unit is used for calculating a second distance between the kth pixel point classified as the foreground pixel point and the mean value of any sub-Gaussian model in the first mixed Gaussian model corresponding to the h pixel point in the search range preset at the periphery of the kth pixel point;
a first judging unit, configured to judge that the kth pixel point is matched with any sub-Gaussian model corresponding to the h-th pixel point if the square of the second distance is smaller than λ2 times the variance of that sub-Gaussian model; wherein λ2 > 0;
The second judging unit is used for judging that the kth pixel point is a background pixel point when any matched sub-Gaussian model is a background model;
and the third judging unit is used for judging that the kth pixel point is a foreground pixel point when any one matched sub-Gaussian model is not a background model.
The motion foreground detection device provided in the embodiment of the present invention can implement all the processes of the motion foreground detection method described in any one of the embodiments, and the functions and implemented technical effects of each module and unit in the device are respectively the same as those of the motion foreground detection method described in the embodiment, and are not described herein again.
Referring to fig. 7, which is a schematic diagram of a terminal device provided in the embodiment of the present invention, the terminal device includes a processor 10, a memory 20, and a computer program stored in the memory 20 and configured to be executed by the processor 10, and when the processor 10 executes the computer program, the motion foreground detection method according to any of the above embodiments is implemented.
Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 20 and executed by the processor 10 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing certain functions, which are used to describe the execution of the computer program in the terminal device. For example, the computer program may be divided into a model establishing module, a first matching module, a motion estimation module, a position correction module, and a foreground judging module; for the specific functions of each module, refer to the motion foreground detection apparatus described in the above embodiment.
the terminal device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The terminal device may include, but is not limited to, a processor, a memory. It will be understood by those skilled in the art that the schematic diagram 7 is merely an example of a terminal device, and is not intended to limit the terminal device, and may include more or less components than those shown, or some components may be combined, or different components, for example, the terminal device may further include an input-output device, a network access device, a bus, etc.
The Processor 10 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor 10 may be any conventional processor or the like; the processor 10 is the control center of the terminal device and connects the various parts of the whole terminal device through various interfaces and lines.
The memory 20 may be used to store the computer programs and/or modules, and the processor 10 implements various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory 20 and calling data stored in the memory 20. The memory 20 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the device, and the like. In addition, the memory 20 may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid state storage device.
Wherein, the modules integrated in the terminal device, if implemented in the form of software functional units and sold or used as stand-alone products, can be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and can implement the steps of the method embodiments when executed by a processor. The computer program includes computer program code, and the computer program code may be in source code form, object code form, an executable file or some intermediate form. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-only Memory (ROM), a Random Access Memory (RAM), electrical carrier signals, telecommunications signals, software distribution media, and the like. It should be noted that the content contained in the computer readable medium may be suitably increased or decreased as required by legislation and patent practice in jurisdictions; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.
The embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, a device where the computer-readable storage medium is located is controlled to execute the motion foreground detection method according to any one of the above embodiments.
In summary, the method, the device, the terminal device and the storage medium for detecting the motion foreground provided by the embodiments of the present invention combine the gaussian mixture model, the motion estimation and the neighborhood model search, so that the condition of missing detection caused by blurring of a small moving object due to filtering in the existing foreground detection method is avoided, the neighborhood model search range can adaptively change according to the local motion estimator, for an area with severe motion, the neighborhood search range is correspondingly larger, the neighborhood model search compensates the precision problem of the motion estimation, and the problem that the background edge part is detected as the foreground in a motion scene is effectively solved. Meanwhile, the invention also reduces the calculation complexity of the motion foreground detection algorithm in a dynamic scene by reducing the updating frequency of the model parameters, and improves the real-time performance of the algorithm, namely, the model parameters of all mixed Gaussian models are not updated, but only the models corresponding to the video frames with a certain frame number interval are updated, and only model matching and foreground/background judgment are carried out in the process of neighborhood model searching, and the model parameters are not updated.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (11)

1. A method for motion foreground detection, comprising:
acquiring a current video frame to be detected;
when the current video frame is a first frame picture, initializing a pre-established first Gaussian mixture model by using the pixel value of each pixel point of the current video frame;
when the current video frame is not the first frame picture, carrying out motion vector estimation on the current video frame according to the background frame to obtain the global motion estimation quantity of the current video frame; the background frame is obtained by calculation according to first Gaussian mixture models of all pixel points of the current video frame;
performing position correction on the first Gaussian mixture model corresponding to each pixel point in the current video frame according to the global motion estimator;
matching each pixel point of the current video frame with the first Gaussian mixture model after the position of the pixel point is corrected;
and classifying each pixel point in the current video frame into a foreground pixel point and a background pixel point according to the matching result.
2. The motion foreground detection method of claim 1 further comprising:
for each pixel point classified as a foreground pixel point, matching any one pixel point with a first Gaussian mixture model corresponding to other pixel points in a search range preset by the pixel point to obtain a neighborhood matching result;
and finally judging that any pixel belongs to the foreground pixel or the background pixel according to the neighborhood matching result.
3. The method according to claim 1, wherein the performing motion vector estimation on the current video frame according to the background frame to obtain the global motion estimator of the current video frame specifically comprises:
dividing a current video frame into a plurality of image blocks, and taking one of the image blocks as a reference block;
searching an image block similar to the reference block in a corresponding preset search area of the background frame to serve as a matching block;
obtaining a local motion vector of the reference block according to the relative displacement of the reference block and the matching block;
and counting the local motion vectors of all the image blocks by adopting a histogram statistical method, and taking the local motion vector represented by the highest point in the histogram as the global motion estimator of the current video frame.
4. The method according to claim 1, wherein the performing the position correction on the first gaussian mixture model corresponding to each pixel point in the current video frame according to the global motion estimator specifically comprises:
shifting the first Gaussian mixture models corresponding to all pixel points of the current video frame according to the global motion estimator;
discarding the first Gaussian mixture model corresponding to the pixel point of the picture area beyond the preset imaging range;
and establishing a corresponding first Gaussian mixture model for the pixel points of the picture area newly entering the preset imaging range.
5. The motion foreground detection method of any one of claims 2-4 wherein the first Gaussian mixture model includes a plurality of sub-Gaussian models,
the matching of each pixel point of the current video frame with the first Gaussian mixture model thereof, or the matching of each pixel point of the current video frame with the first Gaussian mixture model after the position correction thereof, includes:
calculating a first distance between the ith pixel point of the current video frame and the mean of the jth sub-Gaussian model in the first Gaussian mixture model;
when the square of the first distance is less than λ1 times the variance of the jth sub-Gaussian model, judging that the ith pixel point matches the jth sub-Gaussian model, wherein λ1 > 0;
otherwise, judging whether the jth sub-Gaussian model is the last sub-Gaussian model;
if yes, judging that the ith pixel point does not match its corresponding first Gaussian mixture model;
if not, continuing to calculate a first distance between the ith pixel point and the mean of the (j+1)th sub-Gaussian model.
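A minimal sketch of claim 5's matching loop. Each mixture is represented as a list of `(mean, variance, weight)` tuples; the default `lam1 = 2.5**2` follows common Gaussian-mixture background-subtraction practice and is an assumption, since the claim only requires λ1 > 0:

```python
def match_pixel(value, model, lam1=2.5**2):
    """Sketch of claim 5: try the sub-Gaussians in order; the pixel matches
    the first component whose squared distance to the mean is below
    lam1 * variance. Returns the index of the matched sub-Gaussian, or
    None when the pixel matches no component of its mixture.
    """
    for j, (mean, var, _weight) in enumerate(model):
        if (value - mean) ** 2 < lam1 * var:
            return j  # matched the jth sub-Gaussian
    return None  # no match: candidate foreground pixel
```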
6. The motion foreground detection method of claim 5, further comprising, before the obtaining of the current video frame to be detected:
setting an interval frame number for update frames, wherein an update frame indicates that the model parameters of the first Gaussian mixture model are to be updated according to a preset update mode; then,
after the ith pixel point is judged to be matched with the jth sub-Gaussian model, the method further comprises the following steps:
judging whether the current video frame is an update frame;
if the current video frame is an update frame, updating the model parameters of the first Gaussian mixture model according to the preset update mode;
and sorting the weights of all sub-Gaussian models in the updated first Gaussian mixture model in descending order, accumulating the weights from largest to smallest, and taking the leading sub-Gaussian models whose cumulative weight first exceeds a preset threshold as the background model.
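The background-model selection at the end of claim 6 can be sketched as follows; the threshold value is an assumption, as the claim only calls it a preset threshold:

```python
def background_components(model, threshold=0.7):
    """Sketch of claim 6: sort sub-Gaussians by weight (descending),
    accumulate the weights, and keep as background the leading components
    whose running weight sum first exceeds `threshold`. Returns the
    indices (into `model`) of the background sub-Gaussians.
    Each model entry is a (mean, variance, weight) tuple.
    """
    order = sorted(range(len(model)), key=lambda j: model[j][2], reverse=True)
    background, total = [], 0.0
    for j in order:
        background.append(j)
        total += model[j][2]
        if total > threshold:
            break  # remaining low-weight components are foreground models
    return background
```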
7. The motion foreground detection method according to claim 6, wherein the updating of the model parameters of the first Gaussian mixture model according to the preset update mode comprises:
replacing the mean of the sub-Gaussian model with the smallest weight in the first Gaussian mixture model corresponding to the ith pixel point with the pixel value of the ith pixel point, and initializing the variance and the weight of that sub-Gaussian model;
and normalizing the weights of all the sub-Gaussian models to update the first Gaussian mixture model of the ith pixel point.
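Claim 7's update can be sketched as below. The initial variance and weight are illustrative values (the claim does not specify them), and each mixture entry is again a `(mean, variance, weight)` tuple:

```python
def replace_weakest(model, pixel, init_var=15.0, init_weight=0.05):
    """Sketch of claim 7: overwrite the smallest-weight sub-Gaussian with a
    component centred on the current pixel value, re-initialise its variance
    and weight, then normalise all weights so they sum to 1.
    """
    j = min(range(len(model)), key=lambda i: model[i][2])  # weakest component
    model = list(model)
    model[j] = (float(pixel), init_var, init_weight)
    total = sum(w for _, _, w in model)
    return [(m, v, w / total) for m, v, w in model]  # normalised mixture
```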
8. The method according to claim 2, wherein, for each pixel point classified as a foreground pixel point, the matching of that pixel point with the first Gaussian mixture models corresponding to the other pixel points within the preset search range around it comprises:
calculating a second distance between the kth pixel point classified as a foreground pixel point and the mean of any sub-Gaussian model in the first Gaussian mixture model corresponding to the hth pixel point within the preset search range around the kth pixel point;
if the square of the second distance is less than λ2 times the variance of that sub-Gaussian model, judging that the kth pixel point matches that sub-Gaussian model of the first Gaussian mixture model corresponding to the hth pixel point, wherein λ2 > 0;
when the matched sub-Gaussian model is a background model, judging that the kth pixel point is a background pixel point;
and when the matched sub-Gaussian model is not a background model, judging that the kth pixel point is a foreground pixel point.
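The neighbourhood re-check of claim 8 can be sketched as follows. `radius` and `lam2` are assumptions (the claim only requires a preset search range and λ2 > 0); `models[y][x]` is the mixture of pixel (y, x) as `(mean, variance, weight)` tuples and `background_idx[y][x]` the set of its background-component indices:

```python
def recheck_foreground(pos, frame, models, background_idx, radius=1, lam2=2.5**2):
    """Sketch of claim 8: a pixel first labelled foreground is re-tested
    against the mixtures of neighbouring pixels. Matching a neighbour's
    background sub-Gaussian reclassifies it as background (returns True);
    matching a non-background component keeps it foreground (False).
    """
    h, w = len(frame), len(frame[0])
    y, x = pos
    value = frame[y][x]
    for ny in range(max(0, y - radius), min(h, y + radius + 1)):
        for nx in range(max(0, x - radius), min(w, x + radius + 1)):
            if (ny, nx) == (y, x):
                continue  # only the surrounding pixels are consulted
            for j, (mean, var, _wt) in enumerate(models[ny][nx]):
                if (value - mean) ** 2 < lam2 * var:
                    return j in background_idx[ny][nx]  # True = background
    return False  # no neighbour matched: remains a foreground pixel
```

This step absorbs false foreground caused by small residual camera motion, since a displaced background pixel often still fits a neighbour's background model.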
9. A motion foreground detection apparatus, comprising:
the video frame acquisition module is used for acquiring a current video frame to be detected;
the model initialization module is used for initializing a pre-established first Gaussian mixture model by using the pixel value of each pixel point of the current video frame when the current video frame is a first frame picture;
the motion estimation module is used for estimating a motion vector of the current video frame according to the background frame when the current video frame is not the first frame picture to obtain a global motion estimator of the current video frame; the background frame is obtained by calculation according to first Gaussian mixture models of all pixel points of the current video frame;
the position correction module is used for correcting the position of the first Gaussian mixture model corresponding to each pixel point in the current video frame according to the global motion estimator;
the matching module is used for matching each pixel point of the current video frame with its first Gaussian mixture model, or with the position-corrected first Gaussian mixture model;
and the first foreground judging module is used for classifying all the pixel points in the current video frame into foreground pixel points and background pixel points according to the first matching result.
10. A terminal device, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the motion foreground detection method according to any one of claims 1 to 8 when executing the computer program.
11. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the motion foreground detection method according to any one of claims 1 to 8.
CN202110626840.9A 2021-06-04 2021-06-04 Motion foreground detection method and device, terminal equipment and storage medium Active CN113409353B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110626840.9A CN113409353B (en) 2021-06-04 2021-06-04 Motion foreground detection method and device, terminal equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113409353A true CN113409353A (en) 2021-09-17
CN113409353B CN113409353B (en) 2023-08-01

Family

ID=77676456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110626840.9A Active CN113409353B (en) 2021-06-04 2021-06-04 Motion foreground detection method and device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113409353B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998887A (en) * 2022-08-08 2022-09-02 山东精惠计量检测有限公司 Intelligent identification method for electric energy meter
CN116095347A (en) * 2023-03-09 2023-05-09 中节能(临沂)环保能源有限公司 Construction engineering safety construction method and system based on video analysis

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6363117B1 (en) * 1998-12-31 2002-03-26 Sony Corporation Video compression using fast block motion estimation
US20070206865A1 (en) * 2006-03-02 2007-09-06 Honeywell International Inc. Block-based Gaussian Mixture Model video motion detection
CN101127912A (en) * 2007-09-14 2008-02-20 浙江大学 Video coding method for dynamic background frames
CN101964113A (en) * 2010-10-02 2011-02-02 上海交通大学 Method for detecting moving target in illuminance abrupt variation scene
CN103325112A (en) * 2013-06-07 2013-09-25 中国民航大学 Quick detecting method for moving objects in dynamic scene
CN105354791A (en) * 2015-08-21 2016-02-24 华南农业大学 Improved adaptive Gaussian mixture foreground detection method
CN106504273A (en) * 2016-10-28 2017-03-15 天津大学 A kind of innovatory algorithm based on GMM moving object detections
US20170094194A1 (en) * 2015-09-28 2017-03-30 Gopro, Inc. Automatic composition of video with dynamic background and composite frames selected based on frame and foreground object criteria
CN107483953A (en) * 2017-10-10 2017-12-15 司马大大(北京)智能系统有限公司 Inter frame motion estimation method, apparatus and electronic equipment
CN107507221A (en) * 2017-07-28 2017-12-22 天津大学 With reference to frame difference method and the moving object detection and tracking method of mixed Gauss model
CN109544592A (en) * 2018-10-26 2019-03-29 天津理工大学 For the mobile moving object detection algorithm of camera
CN111383250A (en) * 2020-03-20 2020-07-07 内蒙古工业大学 Moving target detection method and device based on improved Gaussian mixture model
CN112101148A (en) * 2020-08-28 2020-12-18 普联国际有限公司 Moving target detection method and device, storage medium and terminal equipment
CN112184759A (en) * 2020-09-18 2021-01-05 深圳市国鑫恒运信息安全有限公司 Moving target detection and tracking method and system based on video
CN112802054A (en) * 2021-02-04 2021-05-14 重庆大学 Mixed Gaussian model foreground detection method fusing image segmentation


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SHAN ZHU et al.: "A New Diamond Search Algorithm for Fast Block-Matching Motion Estimation", IEEE Transactions on Image Processing *
CUI Xuechao: "Moving object detection algorithm based on background subtraction and hybrid difference", Electronic Science and Technology *
WANG Pengyue: "Research on object detection algorithms based on global motion estimation and compensation", China Masters' Theses Full-text Database (Information Science and Technology) *
HAN Bo: "Research and application of moving object detection methods based on background modeling", China Masters' Theses Full-text Database (Information Science and Technology) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998887A (en) * 2022-08-08 2022-09-02 山东精惠计量检测有限公司 Intelligent identification method for electric energy meter
CN114998887B (en) * 2022-08-08 2022-10-11 山东精惠计量检测有限公司 Intelligent identification method for electric energy meter
CN116095347A (en) * 2023-03-09 2023-05-09 中节能(临沂)环保能源有限公司 Construction engineering safety construction method and system based on video analysis
CN116095347B (en) * 2023-03-09 2023-07-11 中节能(临沂)环保能源有限公司 Construction engineering safety construction method and system based on video analysis

Also Published As

Publication number Publication date
CN113409353B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
US9947077B2 (en) Video object tracking in traffic monitoring
CN106960446B (en) Unmanned ship application-oriented water surface target detection and tracking integrated method
CN110060276B (en) Object tracking method, tracking processing method, corresponding device and electronic equipment
CN111091590B (en) Image processing method, device, storage medium and electronic equipment
CN108229475B (en) Vehicle tracking method, system, computer device and readable storage medium
KR20180084085A (en) METHOD, APPARATUS AND ELECTRONIC DEVICE
CN110378837B (en) Target detection method and device based on fish-eye camera and storage medium
CN112184759A (en) Moving target detection and tracking method and system based on video
CN113409353B (en) Motion prospect detection method, motion prospect detection device, terminal equipment and storage medium
CN111079613B (en) Gesture recognition method and device, electronic equipment and storage medium
CN110599516A (en) Moving target detection method and device, storage medium and terminal equipment
CN110942436B (en) Image deblurring method based on image quality evaluation
CN113191180A (en) Target tracking method and device, electronic equipment and storage medium
CN111401196A (en) Method, computer device and computer readable storage medium for self-adaptive face clustering in limited space
WO2022206680A1 (en) Image processing method and apparatus, computer device, and storage medium
CN114049499A (en) Target object detection method, apparatus and storage medium for continuous contour
CN109102013A (en) A kind of improvement FREAK Feature Points Matching digital image stabilization method suitable for tunnel environment characteristic
CN112116567A (en) No-reference image quality evaluation method and device and storage medium
CN113112542A (en) Visual positioning method and device, electronic equipment and storage medium
CN112101148B (en) Moving object detection method and device, storage medium and terminal equipment
CN111539975B (en) Method, device, equipment and storage medium for detecting moving object
WO2021223127A1 (en) Global motion estimation-based time-domain filtering method and device, and storage medium
CN110866484B (en) Driver face detection method, computer device and computer readable storage medium
CN110910429A (en) Moving target detection method and device, storage medium and terminal equipment
WO2022206679A1 (en) Image processing method and apparatus, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant