CN113409353B - Motion foreground detection method and device, terminal device, and storage medium - Google Patents

Motion foreground detection method and device, terminal device, and storage medium

Info

Publication number
CN113409353B
CN113409353B (application CN202110626840.9A)
Authority
CN
China
Prior art keywords
pixel point
gaussian model
video frame
current video
pixel
Prior art date
Legal status
Active
Application number
CN202110626840.9A
Other languages
Chinese (zh)
Other versions
CN113409353A (en)
Inventor
朋兴磊
符顺
许楚萍
Current Assignee
Hangzhou Lianji Technology Co ltd
Original Assignee
Hangzhou Lianji Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Lianji Technology Co ltd
Priority to CN202110626840.9A
Publication of CN113409353A
Application granted
Publication of CN113409353B
Status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/40Image enhancement or restoration by the use of histogram techniques
    • G06T5/73
    • G06T5/80
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/223Analysis of motion using block-matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/269Analysis of motion using gradient-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20201Motion blur correction

Abstract

The invention discloses a motion foreground detection method and device, a terminal device, and a storage medium, wherein the motion foreground detection method comprises the following steps: acquiring a current video frame to be detected; when the current video frame is the first frame picture, initializing a pre-established first mixed Gaussian model with the pixel value of each pixel point of the current video frame; otherwise, performing motion vector estimation on the current video frame against the background frame to obtain a global motion estimate of the current video frame; correcting the position of the first mixed Gaussian model of each pixel point in the current video frame according to the global motion estimate; matching each pixel point of the current video frame against its position-corrected first mixed Gaussian model; and classifying each pixel point in the current video frame as a foreground or background pixel point according to the matching result. The method addresses the missed detections of moving foregrounds that current motion foreground detection methods suffer in dynamic scenes.

Description

Motion foreground detection method and device, terminal device, and storage medium
Technical Field
The present invention relates to the field of computer image processing, and in particular to a motion foreground detection method, a motion foreground detection device, a terminal device, and a storage medium.
Background
In the field of computer digital image processing, image-based motion foreground detection refers to distinguishing moving foreground objects from the background in a video sequence. Depending on the motion state of the captured picture, motion foreground detection algorithms can be divided into foreground detection against a static background and foreground detection against a dynamic background. The static background corresponds to a scene with a fixed camera and a relatively stable background. The dynamic background covers two cases: in one, the camera is fixed but the captured picture contains a moving background, such as shaking leaves or rippling water; in the other, the camera itself is moving, and the captured picture moves along with it.
For motion foreground detection in dynamic scenes, the current method proceeds as follows: the image is downsampled and filtered through a Gaussian pyramid, turning the dynamic scene into a relatively stable static one, and mixed Gaussian background modeling is performed on the data of the pyramid layer whose pixel motion satisfies a preset condition, thereby achieving motion foreground detection in dynamic scenes. However, the downsampling and filtering operations lose a large amount of image detail, causing some moving foregrounds to be missed.
Disclosure of Invention
The embodiments of the present invention aim to provide a motion foreground detection method, a motion foreground detection device, a terminal device, and a storage medium, so as to solve the problem of missed motion-foreground detections in current motion foreground detection methods under dynamic scenes.
To achieve the above object, a first embodiment of the present invention provides a motion foreground detection method, including:
acquiring a current video frame to be detected;
when the current video frame is a first frame picture, initializing a pre-established first mixed Gaussian model by using a pixel value of each pixel point of the current video frame;
when the current video frame is not the first frame picture, performing motion vector estimation on the current video frame according to the background frame to obtain a global motion estimate of the current video frame; the background frame is calculated from the first mixed Gaussian models of all pixel points of the current video frame;
performing position correction on the first mixed Gaussian model corresponding to each pixel point in the current video frame according to the global motion estimate;
matching each pixel point of the current video frame with the first mixed Gaussian model after the position correction;
and classifying each pixel point in the current video frame into a foreground pixel point and a background pixel point according to the matching result.
Preferably, the method further comprises:
for each pixel point classified as a foreground pixel point, matching any one pixel point with a first mixed Gaussian model corresponding to other pixel points in a preset search range around the pixel point to obtain a neighborhood matching result;
and finally judging whether any pixel belongs to a foreground pixel or a background pixel according to the neighborhood matching result.
Preferably, the performing motion vector estimation on the current video frame according to the background frame to obtain the global motion estimate of the current video frame specifically includes:
dividing a current video frame into a plurality of image blocks, and taking one of the image blocks as a reference block;
searching out an image block similar to the reference block in a corresponding preset searching area of the background frame to be used as a matching block;
obtaining a local motion vector of the reference block according to the relative displacement of the reference block and the matching block;
and counting the local motion vectors of all the image blocks by a histogram statistical method, and taking the local motion vector at the highest point of the histogram as the global motion estimate of the current video frame.
Preferably, the performing position correction on the first mixed Gaussian model corresponding to each pixel point in the current video frame according to the global motion estimate specifically includes:
shifting the first mixed Gaussian model corresponding to each pixel point of the current video frame according to the global motion estimate;
discarding the first mixed Gaussian model corresponding to the pixel points of the picture area exceeding the preset imaging range;
and establishing a corresponding first mixed Gaussian model for the pixel points which newly enter the picture area of the preset imaging range.
Preferably, the first hybrid gaussian model comprises a plurality of sub-gaussian models, then,
the matching each pixel of the current video frame with the first mixed gaussian model thereof or the matching each pixel of the current video frame with the first mixed gaussian model after the position correction thereof comprises the following steps:
calculating a first distance between an ith pixel point of a current video frame and a mean value of a jth sub-Gaussian model in a first mixed Gaussian model of the ith pixel point;
when the square of the first distance is smaller than λ1 times the variance of the jth sub-Gaussian model, determining that the ith pixel point matches the jth sub-Gaussian model; wherein λ1 > 0;
Otherwise, judging whether the j-th sub-Gaussian model is the last sub-Gaussian model or not;
if yes, judging that the first mixed Gaussian model corresponding to the ith pixel point is not matched;
If not, continuing to calculate a first distance between the ith pixel point and the mean value of the j+1th sub-Gaussian model.
Preferably, after the obtaining the current video frame to be detected, the method further includes:
setting an update-frame interval (in frames), wherein an update frame indicates that the model parameters of the first mixed Gaussian model are to be updated according to a preset update mode; then,
after determining that the ith pixel point matches the jth sub-Gaussian model, the method further comprises:
judging whether the current video frame is an update frame or not;
if the current video frame is an update frame, updating the model parameters of the first mixed Gaussian model according to the preset updating mode;
and sorting the weights of all the sub-Gaussian models in the updated first mixed Gaussian model according to descending order, calculating the sum of the weights from large to small, and taking the sub-Gaussian model corresponding to the sum of the weights larger than a preset threshold value as a background model.
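The background-model selection step just described (sort sub-model weights in descending order, accumulate until the sum exceeds a preset threshold) can be sketched as follows in Python/NumPy; the threshold value 0.7 is an assumed example, not a value specified by the patent:

```python
import numpy as np

def select_background_models(weights, threshold=0.7):
    """Pick the sub-Gaussian models whose cumulative (descending) weight
    first exceeds `threshold`; those sub-models form the background model."""
    order = np.argsort(weights)[::-1]           # indices, largest weight first
    cumulative = np.cumsum(weights[order])
    # keep models up to and including the first one that pushes the sum past the threshold
    count = int(np.searchsorted(cumulative, threshold) + 1)
    return order[:count]
```

For example, with weights (0.5, 0.3, 0.2) and threshold 0.7, the first two sub-models are selected, since 0.5 + 0.3 already exceeds the threshold.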
Preferably, the updating the model parameters of the first mixed gaussian model according to the preset updating mode includes:
replacing the pixel value of the ith pixel point with the mean value of the sub-Gaussian model with the minimum weight in the first mixed Gaussian model corresponding to the ith pixel point, and initializing the variance and the weight of the sub-Gaussian model;
And carrying out normalization processing on the weights of all the sub-Gaussian models to update the first mixed Gaussian model of the ith pixel.
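A minimal sketch of this update rule, assuming Python/NumPy and illustrative values for the reinitialized variance (15², as used elsewhere in the description) and weight (0.05, an assumption):

```python
import numpy as np

def replace_weakest_model(means, variances, weights, pixel_value,
                          init_variance=15.0 ** 2, init_weight=0.05):
    """Replace the lowest-weight sub-Gaussian with a new one centred on the
    current pixel value, then renormalise the weights so the mixture stays valid."""
    k = np.argmin(weights)          # sub-model with the minimum weight
    means[k] = pixel_value
    variances[k] = init_variance
    weights[k] = init_weight
    weights /= weights.sum()        # normalisation step from the description
    return means, variances, weights
```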
Preferably, for each pixel classified as a foreground pixel, matching any one pixel with a first mixed gaussian model corresponding to other pixels in a preset search range around the pixel, including:
calculating a second distance between the kth pixel point classified as the foreground pixel point and the mean value of any sub-Gaussian model in the first mixed Gaussian model corresponding to the h pixel point in a preset search range around the kth pixel point;
if the square of the second distance is smaller than λ2 times the variance of any sub-Gaussian model in the first mixed Gaussian model corresponding to the hth pixel point, determining that the kth pixel point matches that sub-Gaussian model; wherein λ2 > 0;
When any matched sub Gaussian model is a background model, judging a kth pixel point as a background pixel point;
and when any matched sub-Gaussian model is not a background model, judging the kth pixel point as a foreground pixel point.
A second embodiment of the present invention provides a motion foreground detection apparatus, including:
The video frame acquisition module is used for acquiring a current video frame to be detected;
the model initialization module is used for initializing a pre-established first mixed Gaussian model by using the pixel value of each pixel point of the current video frame when the current video frame is a first frame picture;
the motion estimation module is used for performing motion vector estimation on the current video frame according to the background frame when the current video frame is not the first frame picture, so as to obtain a global motion estimate of the current video frame; the background frame is calculated from the first mixed Gaussian models of all pixel points of the current video frame;
the position correction module is used for performing position correction on the first mixed Gaussian model corresponding to each pixel point in the current video frame according to the global motion estimate;
the matching module is used for matching each pixel point of the current video frame with its position-corrected first mixed Gaussian model;
and the first foreground judging module is used for classifying each pixel point in the current video frame into a foreground pixel point and a background pixel point according to the matching result.
A third embodiment of the present invention correspondingly provides a terminal device, comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the motion foreground detection method according to any one of the first embodiments above when executing the computer program.
A fourth embodiment of the present invention provides a computer readable storage medium, where the computer readable storage medium includes a stored computer program, where the computer program when executed controls a device where the computer readable storage medium is located to execute the motion foreground detection method according to any one of the first embodiments.
Compared with the prior art, the motion foreground detection method and device, terminal device, and storage medium provided by the embodiments of the invention combine the mixed Gaussian model with motion estimation, thereby avoiding the missed detections that existing foreground detection methods suffer when filtering blurs small moving objects.
Drawings
FIG. 1 is a schematic flow chart of a motion foreground detection method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a motion foreground detection method according to another embodiment of the present invention;
FIG. 3 shows the two search templates of the diamond search method according to an embodiment of the present invention: a large diamond search pattern (LDSP) with 9 search positions on the left, and a small diamond search pattern (SDSP) with 5 search positions on the right;
FIG. 4 is a schematic diagram of a first hybrid Gaussian model for position correction according to an embodiment of the invention;
FIG. 5 is a schematic flow chart of foreground or background judgment after obtaining a first mixed Gaussian model according to an embodiment of the invention;
FIG. 6 is a schematic diagram of a motion foreground detection apparatus according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a flow chart of a motion foreground detection method according to the embodiment of the present invention is shown, where the method includes steps S1 to S6:
s1, acquiring a current video frame to be detected;
s2, when the current video frame is a first frame picture, initializing a pre-established first mixed Gaussian model by using a pixel value of each pixel point of the current video frame;
S3, when the current video frame is not the first frame picture, performing motion vector estimation on the current video frame according to the background frame to obtain a global motion estimate of the current video frame; the background frame is calculated from the first mixed Gaussian models of all pixel points of the current video frame;
S4, performing position correction on the first mixed Gaussian model corresponding to each pixel point in the current video frame according to the global motion estimate;
s5, matching each pixel point of the current video frame with the first mixed Gaussian model after the position correction;
s6, classifying each pixel point in the current video frame into a foreground pixel point and a background pixel point according to the matching result.
Specifically, a video frame to be detected in a dynamic scene is acquired. Preferably, the acquired current video frame is preprocessed, e.g. downsampled, converted to grayscale, and filtered. After preprocessing, a first mixed Gaussian model is built for each pixel point of the current video frame. Each first mixed Gaussian model consists of K sub-Gaussian models, and each sub-Gaussian model has three parameters: weight, mean, and variance. Once the first mixed Gaussian models of a video frame have been established, the corresponding background frame can be calculated, specifically: the weighted average of the first mixed Gaussian models over all pixel points of the video frame yields the background frame corresponding to that frame.
When the current video frame is the first frame picture, the first mixed Gaussian models are initialized with the current video frame; the initialization of each pixel point is as follows: the mean of every sub-Gaussian model of the pixel point is set to the current pixel value; the variance is initialized to a large value, typically 15 × 15; and the weights are distributed evenly, 1/K per sub-model. Preferably K is 3-6; the more complex the scene, the larger the value of K. Each initialized pixel point is then matched directly against its first mixed Gaussian model.
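The initialization just described (mean = current pixel value, variance = 15 × 15, uniform weights 1/K) might be sketched in NumPy as follows; the (K, H, W) array layout is an illustrative choice, not prescribed by the patent:

```python
import numpy as np

def init_mixture(frame, K=3, init_variance=15.0 ** 2):
    """Per-pixel mixture-of-Gaussians initialisation: every sub-model mean
    is the current pixel value, the variance is a large constant, and the
    weights are uniform (1/K). `frame` is a grayscale image of shape (H, W)."""
    h, w = frame.shape
    means = np.repeat(frame[None, :, :].astype(np.float64), K, axis=0)
    variances = np.full((K, h, w), init_variance)
    weights = np.full((K, h, w), 1.0 / K)
    return means, variances, weights
```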
When the current video frame is not the first frame picture, the background frame is taken as the reference frame and compared with the current video frame, and motion vector estimation yields the motion vector of the current video frame. The first mixed Gaussian models are then shifted according to this motion vector, realizing the position correction: after the shift, the first mixed Gaussian models correspond almost one-to-one in spatial position with the pixel points of the current video frame, which avoids the missed detections caused by filtering methods that blur small moving objects. Each pixel point is then matched against the shifted first mixed Gaussian model.
Foreground judgment is performed on the matching results, whether the video frame is the first frame picture or not. If a pixel point does not match its first mixed Gaussian model, it is preliminarily judged to be a foreground pixel point; otherwise it is preliminarily judged to be a background pixel point. In the embodiment of the invention, combining the mixed Gaussian model with motion estimation avoids the missed detections that existing foreground detection methods suffer when filtering blurs small moving objects.
In an alternative embodiment, the method further comprises:
s11, for each pixel point classified as a foreground pixel point, matching any one pixel point with a first mixed Gaussian model corresponding to other pixel points in a preset search range around the pixel point to obtain a neighborhood matching result;
and S12, finally judging whether the pixel point is a foreground or background pixel point according to the neighborhood matching result.
For a pixel point preliminarily judged to be a foreground pixel point, it is matched against the first mixed Gaussian models corresponding to the other pixel points within a preset search range around it, yielding a neighborhood matching result. Foreground judgment is then performed again: if the pixel point matches a first mixed Gaussian model of another pixel point in the neighborhood, it is finally judged to be a background pixel point; otherwise it is finally judged to be a foreground pixel point. Through this neighborhood model search, the range can adapt to the local motion estimate, which copes with large local motion changes and with the limited precision of motion estimation. Preferably, the search range of the neighborhood model is half the local motion estimate of the region to which the current pixel belongs.
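A simplified sketch of this neighborhood search (Python/NumPy, with an assumed λ2 value of 2.5² = 6.25, and omitting the background-model check of the full method for brevity) could look like:

```python
import numpy as np

def neighborhood_rescue(pixel, x, y, means, variances, radius, lam=6.25):
    """Second-pass check for a pixel provisionally labelled foreground: if it
    matches any sub-Gaussian of a neighbouring pixel's mixture (squared
    distance < lam * variance), relabel it as background. `means` and
    `variances` have shape (K, H, W); `lam` plays the role of λ2."""
    K, H, W = means.shape
    for ny in range(max(0, y - radius), min(H, y + radius + 1)):
        for nx in range(max(0, x - radius), min(W, x + radius + 1)):
            if nx == x and ny == y:
                continue                      # only the surrounding pixels
            for k in range(K):
                if (pixel - means[k, ny, nx]) ** 2 < lam * variances[k, ny, nx]:
                    return "background"
    return "foreground"
```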
Referring to fig. 2, a flow chart of a motion foreground detection method according to another embodiment of the present invention is shown in fig. 2, where in the execution flow chart, in order to improve the adaptability of a model, when the model is matched and judged, parameters of a part of the model are updated, and then a subsequent step is performed according to the updated model.
The embodiment of the invention provides a motion foreground detection method: the mixed Gaussian model is initialized with the first video frame of a dynamic scene and model matching is performed directly; for non-first video frames, the background frame serves as the reference to obtain the frame's motion vector, the mixed Gaussian model is position-corrected, and the frame is matched against the corrected model; foreground judgment is then performed on the matching result. To avoid background edges being detected as foreground, a neighborhood model search is further applied to the pixel points preliminarily judged as foreground, further deciding whether they really belong to the foreground, which effectively solves the problem of missed motion-foreground detections in current motion foreground detection methods under dynamic scenes.
In an optional embodiment, the performing motion vector estimation on the current video frame according to the background frame to obtain the global motion estimate of the current video frame specifically includes:
Dividing a current video frame into a plurality of image blocks, and taking one of the image blocks as a reference block;
searching out an image block similar to the reference block in a corresponding preset searching area of the background frame to be used as a matching block;
obtaining a local motion vector corresponding to the reference block according to the relative displacement of the reference block and the matching block;
and counting the local motion vectors of all the image blocks by a histogram statistical method, and taking the local motion vector at the highest point of the histogram as the global motion estimate of the current video frame.
Specifically, the motion estimation method may use: gray projection, block matching, phase correlation, optical flow, etc. Preferably, the embodiment of the invention adopts a block matching method to estimate the local motion vector, and finally obtains the global motion vector through a histogram statistical method. The block matching method is to find out the most relevant parts in the image by comparing the current frame with the background frame and establish the connection between them. The implementation process of the block matching method is as follows:
any frame picture is divided into a plurality of image blocks, and one of the image blocks is taken as a reference block. Preferably, the current frame image is divided into a plurality of image blocks of m×n size, one of the image blocks is taken as a reference block, and the position of the center thereof is recorded.
A search window of (Δm+m) × (Δn+n) is taken in the corresponding search area of the background frame, and an image block similar to the reference block is searched out as the matching block. Many search methods exist, including the full search method, the three-step search method, the diamond search method, and so on. Preferably, the diamond search method is adopted, with two search templates: a large diamond search pattern (LDSP) with 9 search positions (left side of FIG. 3) and a small diamond search pattern (SDSP) with 5 search positions (right side of FIG. 3). With reference to FIG. 3, the search proceeds as follows:
a) Initialize the large diamond search pattern LDSP centered at the origin of the search window, and compute the error between the reference block and the image block of the background frame at each of the 9 search positions. The error may be expressed as the Sum of Absolute Differences (SAD), normalized correlation function (NCFF), Root Mean Square Error (RMSE), maximum number of Matched Pixels (MPC), Structural Similarity (SSIM), etc. For example, with root mean square error as the measure: if the calculated minimum block distortion (MBD, Minimum Block Distortion) point is located at the center, go to step c; otherwise go to step b. The root mean square error is calculated as RMSE = sqrt((1/N) Σᵢ (gᵢ − fᵢ)²), where gᵢ denotes the i-th pixel value of the block to be matched, fᵢ denotes the i-th pixel value of the reference block, and N is the total number of pixel points in the reference block.
b) Construct a new LDSP centered at the previously found MBD point, calculate the matching errors at its 9 search points, and find the new MBD point; if the new MBD point lies at the center of the template, go to step c, otherwise repeat step b;
c) Construct a small diamond search pattern SDSP centered at the previously found MBD point, perform the matching calculation and comparison at its 5 search points, and find the MBD point; its position gives the final matching block.
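Steps a) to c) above can be sketched as follows. This is an illustrative SAD-based implementation of the diamond search, not the patent's reference code; the LDSP/SDSP offsets are the standard diamond-search patterns of FIG. 3:

```python
import numpy as np

# standard diamond search patterns: (dy, dx) offsets, centre listed first
LDSP = [(0, 0), (-2, 0), (2, 0), (0, -2), (0, 2), (-1, -1), (-1, 1), (1, -1), (1, 1)]
SDSP = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def _best(ref, bg, cy, cx, pattern):
    """Best (error, dy, dx) over the in-bounds points of a search pattern;
    ties favour the centre because it is listed first."""
    h, w = ref.shape
    H, W = bg.shape
    candidates = [
        (sad(ref, bg[cy + dy:cy + dy + h, cx + dx:cx + dx + w]), dy, dx)
        for dy, dx in pattern
        if 0 <= cy + dy and cy + dy + h <= H and 0 <= cx + dx and cx + dx + w <= W
    ]
    return min(candidates, key=lambda t: t[0])

def diamond_search(ref_block, background, top, left):
    """Walk the large pattern until the best point is its centre (steps a-b),
    then refine once with the small pattern (step c). Returns the (dy, dx)
    displacement of the matching block, i.e. the local motion vector."""
    cy, cx = top, left
    while True:
        _, dy, dx = _best(ref_block, background, cy, cx, LDSP)
        if (dy, dx) == (0, 0):
            break
        cy, cx = cy + dy, cx + dx
    _, dy, dx = _best(ref_block, background, cy, cx, SDSP)
    return (cy + dy - top, cx + dx - left)
```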
After the matching block is obtained, the local motion vector of the reference block is obtained from the relative displacement between the reference block and the matching block, since that relative displacement is exactly the motion vector of the reference block, i.e. the local motion vector.
After the local motion vectors of the different image blocks are obtained, the local motion vectors of all image blocks are counted by a histogram statistical method, and the local motion vector at the highest point of the histogram is taken as the global motion estimate of the current video frame.
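The histogram vote over local motion vectors amounts to taking the mode of the per-block displacements; a minimal sketch:

```python
from collections import Counter

def global_motion(local_vectors):
    """Return the most frequent (dy, dx) local motion vector — the highest
    histogram bin — as the global motion estimate of the frame."""
    return Counter(local_vectors).most_common(1)[0][0]
```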
In an optional embodiment, the performing position correction on the first mixed Gaussian model corresponding to each pixel point in the current video frame according to the global motion estimate specifically includes:
shifting the first mixed Gaussian model corresponding to each pixel point of the current video frame according to the global motion estimate;
discarding the first mixed Gaussian model corresponding to the pixel points of the picture area exceeding the preset imaging range;
and establishing a corresponding first mixed Gaussian model for the pixel points which newly enter the picture area of the preset imaging range.
Specifically, referring to FIG. 4, a schematic diagram of the position correction of the first mixed Gaussian model according to the embodiment of the present invention is shown. As can be seen from FIG. 4, assuming the global motion estimate is denoted by (x, y), with x the lateral offset and y the longitudinal offset, the positions of all first mixed Gaussian models are offset in the same direction; e.g. the model in the original region a in the left graph is offset to the region a' in the right graph. Because the imaging range is fixed, after the offset a part of the image area is shifted out of the imaging range (the dark gray areas at the top and right of the left graph), and the models corresponding to that area are discarded; the region newly entering the imaging range (the light gray region at the bottom and left of the right graph) has its corresponding first mixed Gaussian models initialized with the pixels at the corresponding positions of the current frame.
That is, the shift is performed according to the global motion estimate, the first mixed Gaussian models corresponding to the picture area that leaves the preset imaging range are discarded, and corresponding first mixed Gaussian models are established for the picture area newly entering the preset imaging range; i.e., the newly entered area is initialized with the pixels of the corresponding area of the current frame picture, with the same initialization as for the first frame picture, not repeated here.
After the original first mixed Gaussian model is partially discarded and partially supplemented, the corrected first mixed Gaussian model can be obtained from the adjusted model. The pixel positions of the current video frame correspond one-to-one to the positions of the shifted first mixed Gaussian models, so that the foreground or background judgment can be carried out.
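As a concrete illustration, the shift-discard-initialize procedure described above might be sketched as follows in NumPy. This is a minimal sketch, not the patent's implementation; the array layout ((H, W, K) per-pixel mixtures) and the initialization constants init_var and init_w are assumptions:

```python
import numpy as np

def shift_models(means, variances, weights, frame, dx, dy,
                 init_var=15.0 ** 2, init_w=1.0):
    """Shift the per-pixel mixture models by the global motion (dx, dy).

    means/variances/weights: (H, W, K) arrays, one K-component mixture
    per pixel. frame: (H, W) current grayscale frame, used to initialize
    models for pixels that newly enter the imaging range.
    """
    H, W, K = means.shape
    out_m = np.empty_like(means)
    out_v = np.empty_like(variances)
    out_w = np.empty_like(weights)
    # Initialize every position from the current frame first; positions that
    # stay inside the imaging range are then overwritten by shifted models,
    # so only the newly entered picture area keeps the fresh models.
    out_m[:] = frame[..., None]        # components centred on the pixel value
    out_v[:] = init_var
    out_w[:] = 0.0
    out_w[..., 0] = init_w             # a single dominant component at start
    # Overlap between the shifted model grid and the fixed imaging range;
    # models shifted outside the range are simply never copied (discarded).
    ys_src = slice(max(0, -dy), min(H, H - dy))
    xs_src = slice(max(0, -dx), min(W, W - dx))
    ys_dst = slice(max(0, dy), min(H, H + dy))
    xs_dst = slice(max(0, dx), min(W, W + dx))
    out_m[ys_dst, xs_dst] = means[ys_src, xs_src]
    out_v[ys_dst, xs_dst] = variances[ys_src, xs_src]
    out_w[ys_dst, xs_dst] = weights[ys_src, xs_src]
    return out_m, out_v, out_w
```

After the call, positions that received a shifted model carry the old mixture, while positions in the newly entered picture area carry a freshly initialized one, matching the correction step described above.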
As an improvement of the above solution, the first mixed Gaussian model includes a plurality of sub-Gaussian models; then,
the matching each pixel of the current video frame with the first mixed gaussian model thereof or the matching each pixel of the current video frame with the first mixed gaussian model after the position correction thereof comprises the following steps:
Calculating a first distance between an ith pixel point of a current video frame and a mean value of a jth sub-Gaussian model in a first mixed Gaussian model of the ith pixel point;
when the square of the first distance is smaller than λ1 times the variance of the jth sub-Gaussian model, judging that the ith pixel point is matched with the jth sub-Gaussian model; wherein λ1 > 0;
Otherwise, judging whether the j-th sub-Gaussian model is the last sub-Gaussian model or not;
if yes, judging that the ith pixel point is not matched with its corresponding first mixed Gaussian model;
if not, continuing to calculate a first distance between the ith pixel point and the mean value of the j+1th sub-Gaussian model.
Specifically, the first mixed Gaussian model comprises a plurality of sub-Gaussian models; then,
after the position correction is performed on the first mixed Gaussian model, the foreground or background determination may be performed, so the method further comprises:
And calculating a first distance between any pixel point of the current video frame and the mean value of any sub-Gaussian model in the corresponding first mixed Gaussian model. For example, the squared distance (pixel − μ)² between the current pixel value pixel and the mean μ of the sub-Gaussian model is calculated.
When the square of the first distance is smaller than λ1 times the variance σ² of any sub-Gaussian model corresponding to the current pixel point, expressed as (pixel − μ)² < λ1·σ², the current pixel point is judged to be matched with that sub-Gaussian model; wherein λ1 > 0.
Otherwise, i.e. when (pixel − μ)² ≥ λ1·σ², it is judged whether that sub-Gaussian model corresponding to the current pixel point is the last sub-Gaussian model.
If yes, namely none of the sub-Gaussian models is matched with the current pixel point, it is judged that the current pixel point is not matched with the first mixed Gaussian model corresponding to the current pixel point;
if not, the first distance between the current pixel point and the mean value of the next sub-Gaussian model continues to be calculated, namely all sub-Gaussian models in the first mixed Gaussian model corresponding to the current pixel point are traversed.
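The traversal just described — test each sub-Gaussian in turn and stop at the first one whose squared distance is below λ1 times its variance — can be sketched as a short Python function; the function name and the default value of lam1 are illustrative assumptions:

```python
def match_mixture(pixel, means, variances, lam1=6.25):
    """Return the index of the first sub-Gaussian matching `pixel`
    (d^2 < lam1 * sigma^2), or -1 if no sub-Gaussian matches."""
    for j, (mu, var) in enumerate(zip(means, variances)):
        if (pixel - mu) ** 2 < lam1 * var:
            return j
    return -1
```

With lam1 = 6.25 the test corresponds to accepting pixels within 2.5 standard deviations of a component's mean.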
As an improvement of the above solution, after the obtaining of the current video frame to be detected, the method further includes:
setting an update frame interval frame number, wherein the update frame is used for indicating that the model parameters of the first mixed Gaussian model are to be updated according to a preset update mode; then,
after the ith pixel point is judged to be matched with the jth sub-Gaussian model, the method further comprises the following steps:
judging whether the current video frame is an update frame or not;
if the current video frame is an update frame, updating the model parameters of the first mixed Gaussian model according to the preset updating mode;
And sorting the weights of all the sub-Gaussian models in the updated first mixed Gaussian model according to descending order, calculating the sum of the weights from large to small, and taking the sub-Gaussian model corresponding to the sum of the weights larger than a preset threshold value as a background model.
Specifically, the model parameters are set to be updated every 10 frames, and the video frame at every 10-frame interval is marked with an update frame marker.
further, the updating the model parameters of the first mixed gaussian model according to the preset updating mode includes:
judging whether the current video frame is an update frame, namely judging whether the current video frame carries the marker; if so, the current video frame is an update frame, and model updating needs to be carried out on the models corresponding to the current video frame.
If the current video frame is an update frame, updating the model parameters of the first mixed Gaussian model according to a preset update mode, and carrying out normalization processing on the weights of all updated sub Gaussian models. The preset updating mode is as follows:
ω_t = (1 − α)·ω_{t−1} + α·M_t
μ_t = (1 − α)·μ_{t−1} + α·X_t
σ²_t = (1 − α)·σ²_{t−1} + α·(X_t − μ_t)²
wherein ω_t, μ_t and σ²_t respectively represent the weight, the mean value and the variance of any sub-Gaussian model at the moment t, α represents the learning rate of the sub-Gaussian model, and M_t is set to 1 when the sub-Gaussian model is matched successfully and to 0 when the matching fails. X_t indicates the current pixel point matched with the sub-Gaussian model at the moment t. After updating the parameters, a normalization operation needs to be carried out on the weights of all the sub-models to ensure that the sum of the weights is 1. The normalized calculation formula is as follows:
ω_i ← ω_i / Σ_j ω_j
and sorting the weights of all the sub-Gaussian models in the updated first mixed Gaussian model according to descending order, calculating the sum of the weights from large to small, and taking the sub-Gaussian model corresponding to the sum of the weights larger than a preset threshold value T as a background model.
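Combining the parameter update, the weight normalization and the background-model selection for a single pixel, a minimal sketch might look as follows. The learning rate alpha and the threshold T are illustrative assumptions, and the update follows the standard mixture-of-Gaussians form rather than values fixed by the patent:

```python
import numpy as np

def update_and_select_background(weights, means, variances, pixel,
                                 matched, alpha=0.01, T=0.7):
    """Update one pixel's mixture in place (only the matched component's mean
    and variance move), normalize the weights, then return the index set of
    components whose cumulative descending weight first exceeds T."""
    for j in range(len(weights)):
        M = 1.0 if j == matched else 0.0
        weights[j] = (1 - alpha) * weights[j] + alpha * M
        if j == matched:
            means[j] = (1 - alpha) * means[j] + alpha * pixel
            variances[j] = (1 - alpha) * variances[j] \
                + alpha * (pixel - means[j]) ** 2
    weights /= weights.sum()               # keep the weights summing to 1
    background, acc = set(), 0.0
    for j in np.argsort(weights)[::-1]:    # descending weight order
        background.add(int(j))
        acc += weights[j]
        if acc > T:
            break
    return background
```

The returned index set plays the role of the background model: a matched component inside it marks the pixel as background, otherwise as foreground.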
As an improvement of the above-described aspect, after determining that the i-th pixel point is the foreground pixel point, the method further includes:
replacing the mean value of the sub-Gaussian model with the minimum weight in the first mixed Gaussian model corresponding to the ith pixel point with the pixel value of the ith pixel point, and initializing the variance and the weight of the sub-Gaussian model;
and carrying out normalization processing on the weights of all the sub-Gaussian models to update the first mixed Gaussian model of the ith pixel.
Specifically, the mean value of the sub-Gaussian model with the minimum weight in the first mixed Gaussian model corresponding to the current pixel point is replaced with the pixel value of the current pixel point, and the variance and the weight of that sub-Gaussian model are initialized. Preferably, the weight of the sub-Gaussian model is initialized to a small value, for example 0.001.
In order to ensure that the sum of the weights is 1, normalization processing is carried out on the weights of all the sub-Gaussian models to obtain the new mixed Gaussian model. In the embodiment of the invention, in the foreground/background detection process, the running time of the algorithm is reduced and the real-time performance of the algorithm is improved by updating the parameters of the mixed Gaussian model only in update frames.
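When no sub-Gaussian matches the current pixel, the replacement step just described (re-center the lowest-weight component on the current pixel value, reset its variance and weight, then normalize) might be sketched as follows; init_var is an illustrative assumption, while init_w = 0.001 follows the small value suggested above:

```python
import numpy as np

def replace_weakest(weights, means, variances, pixel,
                    init_var=15.0 ** 2, init_w=0.001):
    """Replace the lowest-weight sub-Gaussian with a new component centred on
    the current pixel value, then re-normalize the weights in place."""
    j = int(np.argmin(weights))
    means[j] = pixel
    variances[j] = init_var
    weights[j] = init_w
    weights /= weights.sum()
    return j
```

Starting the new component with a large variance and a tiny weight lets it prove itself over subsequent frames without immediately disturbing the background model.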
To better understand how the foreground or background determination is made after the first mixed Gaussian model is obtained, refer to fig. 5, a schematic flow chart of the foreground or background determination provided in this embodiment of the present invention. As can be seen from fig. 5, after the first mixed Gaussian model is obtained, the distance between the current pixel value and the model at the corresponding position is calculated, and whether the current pixel value matches that model is judged according to the distance. If they match, it is then judged whether the current frame is an update frame. When the current video frame is an update frame, the model parameters are updated, and it is then judged whether the matched model is a background model: if so, the current pixel point is a background pixel point; if not, the current pixel point is a foreground pixel point. When the current video frame is not an update frame, it is likewise judged whether the matched model is a background model: if so, the current pixel point is determined to be a background pixel point; if not, the current pixel point is preliminarily determined to be a foreground pixel point. If a model does not match, the matching of the next model is judged in turn, and when none of the models match, the mean of the model with the minimum weight is replaced with the current pixel value. For the pixel points preliminarily determined to be foreground pixel points, in order to compensate for the precision problem of motion estimation, a neighborhood model search is further required, as shown in steps S11-S12.
As an improvement of the above solution, for each pixel classified as a foreground pixel, matching any one pixel with a first mixed gaussian model corresponding to other pixels within a preset range around the pixel, specifically includes:
calculating a second distance between a kth pixel point classified as a foreground pixel point and a mean value of any sub-Gaussian model in the first mixed Gaussian model corresponding to the h pixel point in a preset peripheral range;
if the square of the second distance is smaller than λ2 times the variance of any sub-Gaussian model corresponding to the h-th pixel point, the kth pixel point is judged to be matched with that sub-Gaussian model corresponding to the h-th pixel point; wherein λ2 > 0;
When any matched sub Gaussian model is a background model, judging a kth pixel point as a background pixel point;
and when any matched sub-Gaussian model is not a background model, judging the kth pixel point as a foreground pixel point.
Specifically, for the pixel point preliminarily determined as the foreground pixel point in step S6, calculating a second distance between the current pixel point and the mean value of any one sub-gaussian model in the first mixed gaussian model corresponding to other peripheral pixel points;
If the square of the second distance is smaller than λ2 times the variance of any sub-Gaussian model corresponding to the other peripheral pixel points, expressed as (pixel − μ)² < λ2·σ², the current pixel point is judged to be matched with that sub-Gaussian model; wherein λ2 > 0. If the current pixel point is successfully matched with a model of any peripheral pixel point, the neighborhood model search is exited; whether the successfully matched model belongs to the background model is judged according to its weight, and the pixel point is thereby judged to be a background pixel point or a foreground pixel point. During the neighborhood model search, only model matching and foreground/background judgment are carried out, and the model parameters are not updated.
When any matched sub Gaussian model is a background model, judging the current pixel point as a background pixel point; and when any matched sub-Gaussian model is not a background model, judging the current pixel point as a foreground pixel point. Likewise, the background model is determined in the following manner: the sub-Gaussian models are firstly ordered according to the descending order of the weights, then the sum of the weights is calculated from large to small, and the sub-Gaussian model corresponding to the weight sum being larger than a preset threshold T is used as a background model.
In addition, if (pixel − μ)² ≥ λ2·σ², it is judged whether that sub-Gaussian model corresponding to the other pixel points is the last sub-Gaussian model;
If yes, the current pixel point is not matched with the first mixed Gaussian models corresponding to the other pixel points, and the current pixel point is a foreground pixel point.
If not, continuing to calculate a second distance between the current pixel point and the mean value of the next sub-Gaussian model.
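The neighborhood model search described above might be sketched as follows. The search radius, the lam2 default, and the boolean `backgrounds` array (marking which components belong to each pixel's background model) are illustrative assumptions:

```python
def neighborhood_search(pixel, r, c, means, variances, backgrounds,
                        radius=2, lam2=6.25):
    """Match a provisional foreground pixel at (r, c) against the mixtures of
    its neighbours. Returns True if the first matching component belongs to a
    background model (pixel is reclassified as background), False otherwise.
    Only matching and judgment are done here; no model parameters change."""
    H, W, K = means.shape
    for nr in range(max(0, r - radius), min(H, r + radius + 1)):
        for nc in range(max(0, c - radius), min(W, c + radius + 1)):
            if nr == r and nc == c:
                continue               # the pixel's own model already failed
            for j in range(K):
                if (pixel - means[nr, nc, j]) ** 2 < lam2 * variances[nr, nc, j]:
                    return bool(backgrounds[nr, nc, j])
    return False                       # no neighbour matched: stays foreground
```

A larger radius can be used where the local motion estimation amount is large, matching the adaptive search range described in the text.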
In the embodiment of the invention, the dynamic scene is processed by utilizing motion estimation and neighborhood model search, which avoids the missed detection caused by small moving objects being blurred by existing filtering methods. Meanwhile, the neighborhood model search range can adaptively change according to the local motion estimation amount: for a region with intense motion, the neighborhood search range is correspondingly larger, and the neighborhood model search can compensate for the foreground detection precision problem caused by motion estimation.
Referring to fig. 6, a schematic structural diagram of a motion foreground detection apparatus according to this embodiment of the present invention is provided, where the apparatus includes:
a video frame acquisition module 11, configured to acquire a current video frame to be detected;
a model initialization module 12, configured to initialize a first hybrid gaussian model that is built in advance using a pixel value of each pixel point of a current video frame when the current video frame is a first frame picture;
the motion estimation module 13 is configured to perform motion vector estimation on the current video frame according to the background frame when the current video frame is not the first frame, so as to obtain a global motion estimation value of the current video frame; the background frame is obtained by calculation according to a first mixed Gaussian model of all pixel points of the current video frame;
A position correction module 14, configured to perform position correction on a first mixed gaussian model corresponding to each pixel point in the current video frame according to the global motion estimator;
the matching module 15 is configured to match each pixel point of the current video frame with the first mixed Gaussian model after the position correction;
the first foreground judging module 16 is configured to classify each pixel point in the current video frame into a foreground pixel point and a background pixel point according to the matching result.
Preferably, the apparatus further comprises:
the neighborhood searching module is used for matching, for each pixel point classified as a foreground pixel point, any one pixel point with the first mixed Gaussian models corresponding to other pixel points within a preset search range around the pixel point, to obtain a neighborhood matching result;
and the second foreground judging module is used for finally judging that any pixel belongs to a foreground pixel or a background pixel according to the neighborhood matching result.
Preferably, the motion estimation module 13 specifically includes:
the block dividing unit is used for dividing the current video frame into a plurality of image blocks, and taking one of the image blocks as a reference block;
a matching block searching unit, configured to search out an image block similar to the reference block in a corresponding preset searching area of the background frame, as a matching block;
A local motion estimation unit, configured to obtain a local motion vector corresponding to the reference block according to the relative displacement between the reference block and the matching block;
and the global motion estimation unit is used for counting the local motion vectors of all the image blocks by adopting a histogram counting method, and taking the local motion vector represented by the highest point in the histogram as the global motion estimation amount of the current video frame.
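A minimal sketch of this block-matching global motion estimation (SAD matching plus a histogram vote over the local motion vectors) might look as follows; the block size, the search radius and the SAD criterion are illustrative assumptions, not values fixed by the patent:

```python
import numpy as np
from collections import Counter

def global_motion(frame, background, block=16, search=4):
    """For each block of `frame`, find the minimum-SAD match in `background`
    within +/-`search` pixels, collect the local motion vectors in a
    histogram, and return the most frequent vector (dx, dy)."""
    H, W = frame.shape
    votes = Counter()
    for r in range(0, H - block + 1, block):
        for c in range(0, W - block + 1, block):
            ref = frame[r:r + block, c:c + block].astype(np.int64)
            best_sad, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    r2, c2 = r + dy, c + dx
                    if r2 < 0 or c2 < 0 or r2 + block > H or c2 + block > W:
                        continue       # candidate block outside the frame
                    cand = background[r2:r2 + block,
                                      c2:c2 + block].astype(np.int64)
                    sad = int(np.abs(ref - cand).sum())
                    if best_sad is None or sad < best_sad:
                        best_sad, best_mv = sad, (dx, dy)
            votes[best_mv] += 1        # one histogram vote per block
    return votes.most_common(1)[0][0]  # histogram peak = global motion
```

Taking the histogram peak rather than, say, the mean of the local vectors makes the estimate robust to blocks that contain moving foreground objects.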
Preferably, the position correction module 14 specifically includes:
and the offset unit is used for offsetting the current video frame according to the global motion estimation amount, discarding the first mixed Gaussian model corresponding to the pixel points of the picture area exceeding the preset imaging range, and establishing the corresponding first mixed Gaussian model according to the pixel points of the picture area newly entering the preset imaging range.
Preferably, the first mixed gaussian model comprises a plurality of sub-gaussian models, and the apparatus further comprises:
the first distance calculation module is used for calculating a first distance between an ith pixel point of the current video frame and the mean value of a jth sub-Gaussian model in the first mixed Gaussian model of the ith pixel point;
a first judging module, configured to judge, when the square of the first distance is smaller than λ1 times the variance of the jth sub-Gaussian model, that the ith pixel point is matched with the jth sub-Gaussian model; wherein λ1 > 0;
The second judging module is used for judging whether the j-th sub-Gaussian model is the last sub-Gaussian model or not;
the third judging module is used for judging, if yes, that the ith pixel point is not matched with its corresponding first mixed Gaussian model;
and the second distance calculation module is used for continuously calculating the first distance between the ith pixel point and the mean value of the j+1th sub Gaussian model if not.
Preferably, the apparatus further comprises:
the marking module is used for setting an update frame interval frame number, and the update frame is used for indicating that the model parameters of the first mixed Gaussian model are to be updated according to a preset update mode; then,
the apparatus further comprises:
a fourth judging module, configured to judge whether the current video frame is an update frame;
the updating module is used for updating the model parameters of the first mixed Gaussian model according to the preset updating mode if the current video frame is an updating frame;
the sorting module is used for sorting the weights of all the sub-Gaussian models in the updated first mixed Gaussian model according to descending order, calculating the sum of the weights from large to small, and taking the sub-Gaussian model corresponding to the weight sum larger than a preset threshold value as a background model.
Preferably, the updating module includes:
the replacing unit is used for replacing the mean value of the sub-Gaussian model with the minimum weight in the first mixed Gaussian model corresponding to the ith pixel point with the pixel value of the ith pixel point, and initializing the variance and the weight of the sub-Gaussian model;
and the normalization processing unit is used for carrying out normalization processing on the weights of all the sub-Gaussian models so as to update the first mixed Gaussian model of the ith pixel.
Preferably, the first foreground determining module 16 specifically includes:
the distance calculation unit is used for calculating a second distance between a kth pixel point classified as a foreground pixel point and the mean value of any sub-Gaussian model in the first mixed Gaussian model corresponding to an h-th pixel point within a preset search range around the kth pixel point;
a first judging unit, configured to judge, if the square of the second distance is smaller than λ2 times the variance of any sub-Gaussian model corresponding to the h-th pixel point, that the kth pixel point is matched with that sub-Gaussian model; wherein λ2 > 0;
The second judging unit is used for judging the kth pixel point as a background pixel point when any matched sub-Gaussian model is a background model;
And the third judging unit is used for judging the kth pixel point as the foreground pixel point when any matched sub-Gaussian model is not the background model.
The motion foreground detection device provided by the embodiment of the invention can realize all the processes of the motion foreground detection method described in any embodiment, and the functions and the realized technical effects of each module and unit in the device are respectively the same as those of the motion foreground detection method described in the embodiment, and are not repeated here.
Referring to fig. 7, a schematic diagram of a terminal device according to this embodiment of the present invention is provided, where the terminal device includes a processor 10, a memory 20, and a computer program stored in the memory 20 and configured to be executed by the processor 10, and the processor 10 implements the motion foreground detection method according to any one of the foregoing embodiments when executing the computer program.
By way of example, the computer program may be partitioned into one or more modules/units that are stored in the memory 20 and executed by the processor 10 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specified functions, the instruction segments describing the execution of the computer program in the terminal device. For example, the computer program may be divided into a model building module, a first matching module, a motion estimation module, a position correction module and a foreground judging module, where the specific functions of each module are as described in the foregoing embodiments of the motion foreground detection apparatus.
The terminal device can be a computing device such as a desktop computer, a notebook computer, a palm computer, or a cloud server. The terminal device may include, but is not limited to, a processor and a memory. It will be appreciated by those skilled in the art that fig. 7 is merely an example of a terminal device and is not limiting; the terminal device may include more or fewer components than shown, combine certain components, or have different components; for example, the terminal device may further include an input-output device, a network access device, a bus, etc.
The processor 10 may be a central processing unit (Central Processing Unit, CPU), but may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general purpose processor may be a microprocessor, or the processor 10 may be any conventional processor or the like. The processor 10 is the control center of the terminal device, and various interfaces and lines are used to connect the various parts of the overall terminal device.
The memory 20 may be used to store the computer programs and/or modules, and the processor 10 implements the various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory 20 and invoking data stored in the memory 20. The memory 20 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system and application programs required for at least one function (such as a sound playing function, an image playing function, etc.); the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the device. In addition, the memory 20 may include high-speed random access memory, and may also include nonvolatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid state storage device.
The modules integrated in the terminal device may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as stand-alone products. Based on such understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the steps of each method embodiment may be implemented. The computer program comprises computer program code, and the computer program code may be in a source code form, an object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, computer readable media do not include electrical carrier signals and telecommunication signals according to legislation and patent practice.
The embodiment of the invention also provides a computer readable storage medium, which comprises a stored computer program, wherein the computer program is used for controlling equipment where the computer readable storage medium is located to execute the motion prospect detection method of any embodiment.
In summary, the motion foreground detection method, apparatus, terminal device and storage medium provided by the embodiments of the invention combine the mixed Gaussian model, motion estimation and neighborhood model search, which avoids the missed detection caused in existing foreground detection methods by small moving objects being blurred by filtering. The neighborhood model search range can adaptively change according to the local motion estimation amount: the neighborhood search range of a region with intense motion is correspondingly larger, so the neighborhood model search compensates for the precision problem of motion estimation and effectively solves the problem that the background edge part is detected as foreground in a motion scene. Meanwhile, the invention reduces the computational complexity of the motion foreground detection algorithm in dynamic scenes and improves its real-time performance by reducing the model parameter update frequency: instead of updating the model parameters of all the mixed Gaussian models in every frame, only the models corresponding to video frames at a certain frame-number interval are updated, and during the neighborhood model search only model matching and foreground/background judgment are performed, without updating the model parameters.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.

Claims (10)

1. A motion foreground detection method, comprising:
acquiring a current video frame to be detected;
when the current video frame is a first frame picture, initializing a pre-established first mixed Gaussian model by using a pixel value of each pixel point of the current video frame;
when the current video frame is not the first frame picture, estimating motion vectors of the current video frame according to the background frame to obtain a global motion estimation value of the current video frame; the background frame is obtained by calculating the weighted average value of a first mixed Gaussian model of all pixel points of the current video frame;
performing position correction on a first mixed Gaussian model corresponding to each pixel point in the current video frame according to the global motion estimator;
matching each pixel point of the current video frame with the first mixed Gaussian model after the position correction;
classifying each pixel point in the current video frame into a foreground pixel point and a background pixel point according to the matching result;
The motion vector estimation is carried out on the current video frame according to the background frame to obtain the global motion estimation quantity of the current video frame, and the method specifically comprises the following steps:
dividing a current video frame into a plurality of image blocks, and taking one of the image blocks as a reference block;
searching out an image block similar to the reference block in a corresponding preset searching area of the background frame to be used as a matching block;
obtaining a local motion vector of the reference block according to the relative displacement of the reference block and the matching block;
and counting the local motion vectors of all the image blocks by adopting a histogram statistical method, and taking the local motion vector represented by the highest point in the histogram as the global motion estimation amount of the current video frame.
2. The motion foreground detection method of claim 1, wherein the method further comprises:
for each pixel point classified as a foreground pixel point, matching any one pixel point with a first mixed Gaussian model corresponding to other pixel points in a preset search range around the pixel point to obtain a neighborhood matching result;
and finally judging whether any pixel belongs to a foreground pixel or a background pixel according to the neighborhood matching result.
3. The motion foreground detection method according to claim 1, wherein the performing, according to the global motion estimator, a position correction on the first hybrid gaussian model corresponding to each pixel in the current video frame specifically includes:
Shifting the first mixed Gaussian model corresponding to all pixel points of the current video frame according to the global motion estimation;
discarding the first mixed Gaussian model corresponding to the pixel points of the picture area exceeding the preset imaging range;
and establishing a corresponding first mixed Gaussian model for the pixel points which newly enter the picture area of the preset imaging range.
4. A method for motion foreground detection according to any one of claims 2-3, wherein the first hybrid Gaussian model comprises a plurality of sub-Gaussian models,
the matching each pixel of the current video frame with the first mixed gaussian model thereof or the matching each pixel of the current video frame with the first mixed gaussian model after the position correction thereof comprises the following steps:
calculating a first distance between an ith pixel point of a current video frame and a mean value of a jth sub-Gaussian model in a first mixed Gaussian model of the ith pixel point;
when the square of the first distance is smaller than λ1 times the variance of the jth sub-Gaussian model, judging that the ith pixel point is matched with the jth sub-Gaussian model; wherein λ1 > 0;
Otherwise, judging whether the j-th sub-Gaussian model is the last sub-Gaussian model or not;
If yes, judging that the ith pixel point is not matched with the corresponding first mixed Gaussian model;
if not, continuing to calculate a first distance between the ith pixel point and the mean value of the j+1th sub-Gaussian model.
5. The motion foreground detection method according to claim 4, further comprising, before the obtaining of the current video frame to be detected:
setting an update-frame interval, wherein an update frame indicates that the model parameters of the first mixed Gaussian model are to be updated according to a preset update mode; then,
after the ith pixel point is judged to be matched with the jth sub-Gaussian model, the method further comprises the following steps:
judging whether the current video frame is an update frame or not;
if the current video frame is an update frame, updating the model parameters of the first mixed Gaussian model according to the preset updating mode;
and sorting the weights of all the sub-Gaussian models in the updated first mixed Gaussian model in descending order, accumulating the weights from largest to smallest, and taking the sub-Gaussian models whose accumulated weight sum first exceeds a preset threshold as the background model.
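The background-model selection step of claim 5 can be sketched as below: sort the weights in descending order and keep the leading sub-Gaussians until their cumulative weight exceeds the threshold. The threshold value 0.7 is an assumed typical setting, not one specified in the claims.

```python
def select_background_models(weights, threshold=0.7):
    """Sort sub-Gaussian weights in descending order and accumulate them;
    the leading models whose cumulative weight first exceeds the threshold
    form the background model. Returns their indices."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    background, cumulative = [], 0.0
    for i in order:
        background.append(i)
        cumulative += weights[i]
        if cumulative > threshold:   # accumulated sum now exceeds the preset threshold
            break
    return background
```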
6. The motion foreground detection method according to claim 5, wherein the updating of the model parameters of the first mixed Gaussian model according to the preset update mode specifically comprises:
replacing the mean value of the sub-Gaussian model having the minimum weight in the first mixed Gaussian model corresponding to the ith pixel point with the pixel value of the ith pixel point, and initializing the variance and the weight of that sub-Gaussian model;
and normalizing the weights of all the sub-Gaussian models to update the first mixed Gaussian model of the ith pixel point.
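The update mode of claim 6 can be sketched as follows: the lowest-weight sub-Gaussian is re-centred on the current pixel value, its variance and weight are reinitialised, and all weights are renormalised. The initial variance and weight values here are illustrative defaults, not values from the claims.

```python
def replace_weakest_gaussian(pixel_value, means, variances, weights,
                             init_var=36.0, init_weight=0.05):
    """Replace the minimum-weight sub-Gaussian with a new one whose mean is
    the current pixel value, reinitialise its variance and weight, then
    renormalise all weights so they sum to 1 (claim 6's update mode)."""
    m = weights.index(min(weights))            # index of the weakest sub-Gaussian
    means[m] = pixel_value                     # new mean = current pixel value
    variances[m] = init_var                    # reinitialised variance
    weights[m] = init_weight                   # reinitialised weight
    total = sum(weights)
    weights[:] = [w / total for w in weights]  # normalisation step
    return means, variances, weights
```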
7. The motion foreground detection method according to claim 2, wherein for each pixel point classified as a foreground pixel point, the matching of the pixel point with the first mixed Gaussian models corresponding to other pixel points within a preset search range around it specifically comprises:
calculating a second distance between the kth pixel point classified as a foreground pixel point and the mean value of any sub-Gaussian model in the first mixed Gaussian model corresponding to the hth pixel point within a preset search range around the kth pixel point;
if the square of the second distance is smaller than λ2 times the variance of that sub-Gaussian model, determining that the kth pixel point matches that sub-Gaussian model, wherein λ2 > 0;
when the matched sub-Gaussian model is a background model, determining the kth pixel point as a background pixel point;
and when the matched sub-Gaussian model is not a background model, determining the kth pixel point as a foreground pixel point.
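The neighborhood re-check of claim 7 can be sketched as below: a pixel first labelled foreground is tested against the sub-Gaussians of neighbouring pixels, and relabelled background if it matches one flagged as a background model. The `(mean, variance, is_background)` tuple layout and λ2 default are assumptions for illustration.

```python
def reclassify_foreground(pixel_value, neighbor_models, lambda2=2.5):
    """Re-check a foreground-classified pixel against neighbouring pixels'
    sub-Gaussians within the search range. neighbor_models is a list of
    (mean, variance, is_background) tuples drawn from those neighbours."""
    for mu, var, is_background in neighbor_models:
        if (pixel_value - mu) ** 2 < lambda2 * var:   # second-distance match test
            return "background" if is_background else "foreground"
    return "foreground"                               # no neighbouring model matched
```

This step suppresses false foreground caused by small residual camera motion: a pixel that fits a neighbour's background model is reclassified as background.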
8. A motion foreground detection apparatus, comprising:
the video frame acquisition module is used for acquiring a current video frame to be detected;
the model initialization module is used for initializing a pre-established first mixed Gaussian model by using the pixel value of each pixel point of the current video frame when the current video frame is a first frame picture;
the motion estimation module is used for estimating motion vectors of the current video frame according to the background frame when the current video frame is not the first frame picture, so as to obtain a global motion estimation value of the current video frame; the background frame is obtained by calculating the weighted average value of a first mixed Gaussian model of all pixel points of the current video frame;
the position correction module is used for performing position correction, according to the global motion estimate, on the first mixed Gaussian model corresponding to each pixel point in the current video frame;
the matching module is used for matching each pixel point of the current video frame with its first mixed Gaussian model, or with the first mixed Gaussian model after the position correction;
the first foreground judging module is used for classifying each pixel point in the current video frame as a foreground pixel point or a background pixel point according to the matching result;
the motion estimation module specifically comprises:
the block dividing unit is used for dividing the current video frame into a plurality of image blocks, and taking one of the image blocks as a reference block;
a matching block searching unit, configured to search out an image block similar to the reference block in a corresponding preset searching area of the background frame, as a matching block;
a local motion estimation unit, configured to obtain a local motion vector corresponding to the reference block according to the relative displacement between the reference block and the matching block;
and the global motion estimation unit is used for counting the local motion vectors of all the image blocks by a histogram statistics method, and taking the local motion vector represented by the peak of the histogram as the global motion estimate of the current video frame.
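The histogram step performed by the global motion estimation unit can be sketched as taking the mode of the per-block local motion vectors. Block matching itself is omitted; vectors are represented here as `(dx, dy)` tuples, which is an assumption for illustration.

```python
from collections import Counter

def global_motion_estimate(local_vectors):
    """Histogram the per-block local motion vectors and return the most
    frequent vector (the histogram peak) as the frame's global motion
    estimate, as in the global motion estimation unit."""
    histogram = Counter(local_vectors)              # vector -> occurrence count
    vector, _count = histogram.most_common(1)[0]    # peak of the histogram
    return vector
```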
9. A terminal device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the motion foreground detection method according to any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the motion foreground detection method according to any one of claims 1 to 7.
CN202110626840.9A 2021-06-04 2021-06-04 Motion prospect detection method, motion prospect detection device, terminal equipment and storage medium Active CN113409353B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110626840.9A CN113409353B (en) 2021-06-04 2021-06-04 Motion prospect detection method, motion prospect detection device, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110626840.9A CN113409353B (en) 2021-06-04 2021-06-04 Motion prospect detection method, motion prospect detection device, terminal equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113409353A CN113409353A (en) 2021-09-17
CN113409353B true CN113409353B (en) 2023-08-01

Family

ID=77676456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110626840.9A Active CN113409353B (en) 2021-06-04 2021-06-04 Motion prospect detection method, motion prospect detection device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113409353B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998887B (en) * 2022-08-08 2022-10-11 山东精惠计量检测有限公司 Intelligent identification method for electric energy meter
CN116095347B (en) * 2023-03-09 2023-07-11 中节能(临沂)环保能源有限公司 Construction engineering safety construction method and system based on video analysis

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6363117B1 (en) * 1998-12-31 2002-03-26 Sony Corporation Video compression using fast block motion estimation
US7664329B2 (en) * 2006-03-02 2010-02-16 Honeywell International Inc. Block-based Gaussian mixture model video motion detection
CN101127912B (en) * 2007-09-14 2010-11-17 浙江大学 Video coding method for dynamic background frames
CN101964113A (en) * 2010-10-02 2011-02-02 上海交通大学 Method for detecting moving target in illuminance abrupt variation scene
CN103325112B (en) * 2013-06-07 2016-03-23 中国民航大学 Moving target method for quick in dynamic scene
CN105354791B (en) * 2015-08-21 2019-01-11 华南农业大学 A kind of improved ADAPTIVE MIXED Gauss foreground detection method
US10609307B2 (en) * 2015-09-28 2020-03-31 Gopro, Inc. Automatic composition of composite images or videos from frames captured with moving camera
CN106504273B (en) * 2016-10-28 2020-05-15 天津大学 Improved method based on GMM moving object detection
CN107507221A (en) * 2017-07-28 2017-12-22 天津大学 With reference to frame difference method and the moving object detection and tracking method of mixed Gauss model
CN107483953B (en) * 2017-10-10 2019-11-29 司马大大(北京)智能系统有限公司 Inter frame motion estimation method, apparatus and electronic equipment
CN109544592B (en) * 2018-10-26 2023-01-17 天津理工大学 Moving object detection algorithm for camera movement
CN111383250A (en) * 2020-03-20 2020-07-07 内蒙古工业大学 Moving target detection method and device based on improved Gaussian mixture model
CN112101148B (en) * 2020-08-28 2024-05-03 普联国际有限公司 Moving object detection method and device, storage medium and terminal equipment
CN112184759A (en) * 2020-09-18 2021-01-05 深圳市国鑫恒运信息安全有限公司 Moving target detection and tracking method and system based on video
CN112802054B (en) * 2021-02-04 2023-09-01 重庆大学 Mixed Gaussian model foreground detection method based on fusion image segmentation

Also Published As

Publication number Publication date
CN113409353A (en) 2021-09-17

Similar Documents

Publication Publication Date Title
US9947077B2 (en) Video object tracking in traffic monitoring
CN108229475B (en) Vehicle tracking method, system, computer device and readable storage medium
CN113409353B (en) Motion prospect detection method, motion prospect detection device, terminal equipment and storage medium
CN109389555B (en) Panoramic image splicing method and device
CN109598744B (en) Video tracking method, device, equipment and storage medium
CN109685045B (en) Moving target video tracking method and system
CN108564579B (en) Concrete crack detection method and detection device based on time-space correlation
CN112184759A (en) Moving target detection and tracking method and system based on video
CN108566513A (en) A kind of image pickup method of unmanned plane to moving target
CN110378250B (en) Training method and device for neural network for scene cognition and terminal equipment
CN106412441B (en) A kind of video stabilization control method and terminal
CN113269682B (en) Non-uniform motion blur video restoration method combined with interframe information
CN113191180A (en) Target tracking method and device, electronic equipment and storage medium
CN110599516A (en) Moving target detection method and device, storage medium and terminal equipment
CN113112542A (en) Visual positioning method and device, electronic equipment and storage medium
CN114049499A (en) Target object detection method, apparatus and storage medium for continuous contour
CN109978908B (en) Single-target rapid tracking and positioning method suitable for large-scale deformation
CN111539975B (en) Method, device, equipment and storage medium for detecting moving object
CN110710194B (en) Exposure method and device, camera module and electronic equipment
CN113628259A (en) Image registration processing method and device
WO2021223127A1 (en) Global motion estimation-based time-domain filtering method and device, and storage medium
CN116489516A (en) Specific object tracking shooting method and system
CN112101148B (en) Moving object detection method and device, storage medium and terminal equipment
CN116188535A (en) Video tracking method, device, equipment and storage medium based on optical flow estimation
CN113497886B (en) Video processing method, terminal device and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant