CN103810691A - Video-based automatic teller machine monitoring scene detection method and apparatus - Google Patents

Video-based automatic teller machine monitoring scene detection method and apparatus

Info

Publication number
CN103810691A
Authority
CN
China
Prior art keywords
image
pixel
monitoring
value
monitoring image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210444071.1A
Other languages
Chinese (zh)
Other versions
CN103810691B (en)
Inventor
任烨
童俊艳
蔡巍伟
浦世亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201210444071.1A priority Critical patent/CN103810691B/en
Publication of CN103810691A publication Critical patent/CN103810691A/en
Application granted granted Critical
Publication of CN103810691B publication Critical patent/CN103810691B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The invention discloses a video-based automatic teller machine (ATM) monitoring scene detection method and apparatus. The method comprises the following steps: a background model of an ATM monitoring scene is established, wherein a background image and the predetermined parameters corresponding to each pixel in the background image are determined; and after modeling is complete, each time a frame of monitoring image X is obtained, the following steps are carried out: a binary foreground image of the monitoring image X is generated according to the background model; edge texture information of the monitoring image X and of the background image is obtained, and the edge similarity between the monitoring image X and the background image is determined according to the obtained edge texture information; and whether a person is present in the monitoring image X is determined according to the generated binary foreground image and the determined edge similarity.

Description

Video-based automatic teller machine (ATM) monitoring scene detection method and device
Technical field
The present invention relates to video technology, and in particular to a video-based automatic teller machine (ATM) monitoring scene detection method and device.
Background art
In the prior art, physical sensors, typically infrared emitters, are used to detect whether a person is present in an ATM monitoring scene. However, infrared sensing is susceptible to interference from foreign objects: once an interfering foreign object appears within the detection range, a persistent false alarm of a person being present is raised, which reduces the accuracy of the detection result.
Summary of the invention
In view of this, the present invention provides a video-based ATM monitoring scene detection method and device, which can improve the accuracy of detection results.
To achieve the above object, the technical solution of the present invention is realized as follows:
A video-based ATM monitoring scene detection method, comprising:
establishing a background model of the ATM monitoring scene, including determining a background image and the preset parameters corresponding to each pixel in the background image;
after modeling is complete, each time a frame of monitoring image X is obtained, performing the following processing:
generating a binary foreground image of monitoring image X according to the background model;
obtaining edge texture information of monitoring image X and of the background image, and determining the edge similarity between monitoring image X and the background image according to the obtained edge texture information;
determining whether a person is present in monitoring image X according to the generated binary foreground image and the determined edge similarity.
A video-based ATM monitoring scene detection device, comprising:
a modeling module, configured to establish a background model of the ATM monitoring scene, including determining a background image and the preset parameters corresponding to each pixel in the background image, and to send the established background model to a detection module;
the detection module, configured to, after modeling is complete, each time a frame of monitoring image X is obtained, perform the following processing: generating a binary foreground image of monitoring image X according to the background model; obtaining edge texture information of monitoring image X and of the background image, and determining the edge similarity between monitoring image X and the background image according to the obtained edge texture information; and determining whether a person is present in monitoring image X according to the generated binary foreground image and the determined edge similarity.
As can be seen, the solution of the present invention combines the luminance foreground and the edge texture information of the image to determine whether a person is present in the ATM monitoring scene, thereby improving the accuracy of the detection result. Moreover, the solution is applicable to a variety of ATM monitoring scenes, has broad applicability, and is easy to popularize.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of the video-based ATM monitoring scene detection method of the present invention.
Fig. 2 is a schematic diagram of the conventional Sobel operators.
Detailed description of the embodiments
To address the problems in the prior art, the present invention proposes a video-based ATM monitoring scene detection solution that can improve the accuracy of detection results.
The monitoring images in the solution of the present invention are captured by an ATM surveillance camera, which must cover the activity area of people depositing or withdrawing money.
To make the technical solution of the present invention clearer and easier to understand, the solution is described in further detail below with reference to the accompanying drawings and embodiments.
Fig. 1 is a flowchart of an embodiment of the video-based ATM monitoring scene detection method of the present invention. As shown in Fig. 1, the method comprises:
Step 11: establish a background model of the ATM monitoring scene, including determining a background image and the preset parameters corresponding to each pixel in the background image.
Since the environment of an ATM monitoring scene is relatively simple, single-Gaussian background modeling, which is suitable for backgrounds with a unimodal distribution, can be adopted.
The solution of the present invention models only the gray-scale value of each pixel; the preset parameters corresponding to each pixel include a mean μ and a variance σ.
A specific implementation of this step may comprise:
A. Obtain one frame of monitoring image and take it as the background image;
for each pixel in this background image, take the gray-scale value of the pixel as the mean corresponding to the pixel, and take the variance of the gray-scale value of the pixel as the variance corresponding to the pixel.
B. Determine whether the number of monitoring images obtained equals M, where M is a positive integer greater than 1. If so, take the most recently obtained background image as the final background image and complete the modeling; if not, obtain a new frame of monitoring image and perform step C.
C. Determine the updated background image B_new(x, y):
B_new(x, y) = (1 - ρ)·B_old(x, y) + ρ·I(x, y);    (1)
where ρ denotes the update rate, whose value equals 1/N; N denotes the number of monitoring images obtained; I(x, y) denotes the most recently obtained monitoring image; and B_old(x, y) denotes the background image before the update.
For each pixel in B_new(x, y), take the gray-scale value of the pixel as the mean corresponding to the pixel, and take (1 - ρ)·σ_old + ρ·d as the variance σ_new corresponding to the pixel:
σ_new = (1 - ρ)·σ_old + ρ·d;    (2)
where σ_old denotes the variance corresponding to the pixel at the same coordinate position in B_old(x, y), and d denotes the difference between the gray-scale value of the pixel at the same coordinate position in I(x, y) and the mean corresponding to the pixel at the same coordinate position in B_old(x, y).
Afterwards, repeat step B.
The specific value of M can be set according to actual requirements, for example 100.
An example:
Suppose the value of M is 100. For ease of description, number these 100 frames of monitoring images, in order of acquisition time from earliest to latest, as monitoring image 1 to monitoring image 100.
First, establish the initial background model from monitoring image 1: take monitoring image 1 as the background image, and determine the mean and variance corresponding to each pixel in this background image.
Then, according to formulas (1) and (2), use monitoring image 2 to update the most recently obtained background model, including determining the updated background image and the mean and variance corresponding to each pixel in the updated background image, where the value of ρ equals 1/2.
Next, according to formulas (1) and (2), use monitoring image 3 to update the most recently obtained background model in the same way, where the value of ρ equals 1/3.
The processing of monitoring images 4 to 99 proceeds in the same manner and is not repeated here.
Finally, according to formulas (1) and (2), use monitoring image 100 to update the most recently obtained background model, where the value of ρ equals 1/100, and take the resulting background image, together with the mean and variance corresponding to each pixel, as the final background model.
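For illustration, the following is a minimal sketch of this modeling procedure in Python with NumPy; the function name build_background_model, the zero initial variance, and taking d as an absolute difference are assumptions of this sketch, not details specified by the patent.

```python
import numpy as np

def build_background_model(frames):
    """Single-Gaussian background modeling over M gray-scale frames,
    following steps A-C: the first frame initializes the background,
    and each subsequent frame updates it with rate rho = 1/N,
    per formulas (1) and (2)."""
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    mean = frames[0].copy()        # step A: first frame is the background
    var = np.zeros_like(mean)      # initial per-pixel variance (assumed 0)
    for n, frame in enumerate(frames[1:], start=2):
        rho = 1.0 / n              # update rate rho = 1/N for the N-th frame
        d = np.abs(frame - mean)   # difference to current mean (assumed absolute)
        mean = (1.0 - rho) * mean + rho * frame   # formula (1)
        var = (1.0 - rho) * var + rho * d         # formula (2)
    return mean, var               # final background model
```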
Step 12: after modeling is complete, each time a frame of monitoring image X is obtained, perform the following processing: generate the binary foreground image of monitoring image X according to the background model; obtain the edge texture information of monitoring image X and of the background image, and determine the edge similarity between monitoring image X and the background image according to the obtained edge texture information; determine whether a person is present in monitoring image X according to the generated binary foreground image and the determined edge similarity.
For ease of description, in the solution of the present invention, monitoring image X denotes any monitoring image on which person-presence detection is to be performed.
In practical applications, since the ATM monitoring scene may change, the background model established in step 11 can be updated continuously to ensure the accuracy of subsequent person-presence detection. Specifically, each time after determining whether a person is present in monitoring image X according to the generated binary foreground image and the determined edge similarity, monitoring image X can be used to update the existing background model.
Correspondingly, for any monitoring image X, step 12 can be implemented as follows: generate the binary foreground image of monitoring image X according to the most recently obtained background model (i.e. the background model updated with the monitoring image obtained most recently before monitoring image X); obtain the edge texture information of monitoring image X and of the most recently obtained background image, and determine the edge similarity between them according to the obtained edge texture information; and determine whether a person is present in monitoring image X according to the generated binary foreground image and the determined edge similarity.
The above implementations are described in detail below.
One) Updating the existing background model with monitoring image X
A specific implementation may comprise:
Determine the updated background image B_new(x, y):
B_new(x, y) = (1 - ρ)·B_old(x, y) + ρ·I(x, y);    (1)
where I(x, y) denotes monitoring image X, i.e. the most recently obtained monitoring image, and B_old(x, y) denotes the background image before the update.
For each pixel in B_new(x, y), take the gray-scale value of the pixel as the mean corresponding to the pixel, and take (1 - ρ)·σ_old + ρ·d as the variance corresponding to the pixel; where σ_old denotes the variance corresponding to the pixel at the same coordinate position in B_old(x, y), and d denotes the difference between the gray-scale value of the pixel at the same coordinate position in I(x, y) and the mean corresponding to the pixel at the same coordinate position in B_old(x, y).
Here ρ denotes the update rate, and its value can be set as follows:
1) When it is determined that no person is present in I(x, y), i.e. no person is present in monitoring image X, the value of ρ is set to 0.01, so that the background model is continuously updated to adapt to slow scene changes such as illumination changes;
2) When it is determined that a person is present in I(x, y), the value of ρ is set to 0, i.e. the background model update is suspended while a person is present in the ATM monitoring scene;
3) When it is determined that a person has been present in the ATM monitoring scene throughout the time period from T - t to T and the ATM monitoring scene has remained static throughout that period, the value of ρ is set to 1, where T denotes the moment at which I(x, y) is obtained and t > 0.
Rule 3) prevents a sudden change in the ATM monitoring scene (for example, the scene being rearranged) from causing a person to be reported indefinitely: when a person is present in the ATM monitoring scene and the scene has remained static for longer than a predetermined threshold, e.g. 2 minutes (i.e. the value of t is 2 minutes), ρ is set to 1, which resets the background by taking the current image I(x, y) as the background image.
Whether the ATM monitoring scene has remained static throughout the time period from T - t to T can be determined as follows:
For any two frames I_1(x, y) and I_2(x, y) obtained within the time period from T - t to T, perform the following processing:
Compute Dif(x, y) = I_1(x, y) - I_2(x, y);    (3)
where Dif(x, y) denotes the frame difference image, and I_1(x, y) is obtained earlier than I_2(x, y).
For each pixel in Dif(x, y), determine whether the gray-scale value of the pixel is greater than a predetermined threshold T1; if so, set the value of the pixel to 1, otherwise set it to 0, thereby obtaining the frame-difference binary image Dif_Fg(x, y) of Dif(x, y).
Count the number Dif_Num of pixels whose value is 1 in Dif_Fg(x, y), and determine whether Dif_Num is less than a predetermined threshold T2; if so, determine that the scene is static between I_1(x, y) and I_2(x, y).
If the scene is static between every two frames obtained within the time period from T - t to T, it can be determined that the ATM monitoring scene has remained static throughout that period.
The specific values of T1 and T2 can both be set according to actual requirements; for example, the value of T1 can be 10 and the value of T2 can be 50.
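A minimal sketch of the static check and of the update-rate selection described above; the function names, and taking the frame difference of formula (3) as an absolute value, are assumptions of this sketch:

```python
import numpy as np

def is_static_pair(frame1, frame2, t1=10, t2=50):
    """Frame-difference static check per formula (3): mark pixels whose
    (absolute) difference exceeds T1 in the binary image Dif_Fg, and
    call the pair static when fewer than T2 pixels are marked."""
    dif = np.abs(frame1.astype(np.int32) - frame2.astype(np.int32))
    dif_fg = dif > t1                  # frame-difference binary image Dif_Fg
    return int(dif_fg.sum()) < t2      # Dif_Num < T2

def choose_rho(person_in_frame, person_throughout_period, static_throughout_period):
    """Update-rate selection following rules 1)-3) above."""
    if person_throughout_period and static_throughout_period:
        return 1.0    # rule 3): reset the background to the current image
    if person_in_frame:
        return 0.0    # rule 2): suspend updating while a person is present
    return 0.01       # rule 1): slow adaptation to illumination changes
```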
Two) Generating the binary foreground image of monitoring image X according to the most recently obtained background model
For each pixel in monitoring image X, the following processing can be performed:
Compute the difference d between the gray-scale value of the pixel and the mean corresponding to the pixel at the same coordinate position in the most recently obtained background image.
Compute d²/σ², where σ denotes the variance corresponding to the pixel at the same coordinate position in the most recently obtained background image.
Determine whether the computed d²/σ² is greater than a predetermined threshold T0; if so, set the value of the pixel to 1, otherwise set it to 0, thereby generating the binary foreground image of monitoring image X.
The specific value of T0 can be set according to actual requirements, for example 9.
After generating the binary foreground image of monitoring image X, dilation and erosion operations can also be applied to it in sequence to remove isolated points caused by noise, thereby ensuring the accuracy of the subsequent person-presence detection.
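A minimal sketch of this foreground test with the morphological cleanup, assuming SciPy's binary morphology routines; the eps guard against zero variance is an implementation detail of this sketch, not part of the patent:

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def foreground_mask(frame, mean, var, t0=9.0, eps=1e-6):
    """Binary foreground per the single-Gaussian test d²/σ² > T0,
    followed by dilation then erosion to remove isolated noise points."""
    d = frame.astype(np.float64) - mean
    fg = (d * d) / (var + eps) > t0    # per-pixel foreground test
    fg = binary_dilation(fg)           # dilation: close small gaps
    fg = binary_erosion(fg)            # erosion: remove isolated points
    return fg.astype(np.uint8)
```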
Three) Obtaining the edge texture information of monitoring image X and of the most recently obtained background image, and determining the edge similarity between them according to the obtained edge texture information
A specific implementation may comprise:
1) Obtain the horizontal and vertical edge images of monitoring image X, and the horizontal and vertical edge images of the most recently obtained background image.
In practical applications, the Sobel operators can be used to obtain the horizontal and vertical edge images of monitoring image X and of the most recently obtained background image; how to do so is prior art.
Fig. 2 is a schematic diagram of the conventional Sobel operators. As shown in Fig. 2, the Sobel operator on the left can be used to obtain the horizontal edge images of monitoring image X and of the most recently obtained background image, and the Sobel operator on the right can be used to obtain their vertical edge images.
2) According to the horizontal and vertical edge images of monitoring image X, compute for each pixel in monitoring image X its gradient magnitude I_gxy:
I_gxy = |I_gx| + |I_gy|;    (4)
where I_gx denotes the horizontal gradient value of the pixel, I_gy denotes the vertical gradient value of the pixel, and |·| denotes taking the absolute value.
According to the horizontal and vertical edge images of the most recently obtained background image, compute for each pixel in that background image its gradient magnitude B_gxy:
B_gxy = |B_gx| + |B_gy|;    (5)
where B_gx denotes the horizontal gradient value of the pixel and B_gy denotes the vertical gradient value of the pixel.
3) Compute the edge similarity ESIM between monitoring image X and the most recently obtained background image:
ESIM = Σ(2·I_gxy·B_gxy) / Σ(I_gxy² + B_gxy²);    (6)
where the sums run over x from 1 to E and y from 1 to F; E denotes the number of pixels in the horizontal direction of monitoring image X, and F denotes the number of pixels in the vertical direction of monitoring image X.
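A minimal sketch of the edge-similarity computation, assuming SciPy's Sobel filter as the edge operator of Fig. 2 (the function names are illustrative):

```python
import numpy as np
from scipy.ndimage import sobel

def edge_similarity(frame, background):
    """Edge similarity ESIM per formulas (4)-(6): ESIM is close to 1
    when the gradient-magnitude maps of image and background agree,
    and drops when their edges differ (e.g. a person occludes the scene)."""
    def grad_mag(img):
        img = img.astype(np.float64)
        gx = sobel(img, axis=1)            # horizontal gradient
        gy = sobel(img, axis=0)            # vertical gradient
        return np.abs(gx) + np.abs(gy)     # formulas (4) and (5)

    i_g = grad_mag(frame)
    b_g = grad_mag(background)
    num = np.sum(2.0 * i_g * b_g)
    den = np.sum(i_g ** 2 + b_g ** 2)
    return num / den if den > 0 else 1.0   # formula (6)
```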
Four) Determining whether a person is present in monitoring image X according to the generated binary foreground image and the determined edge similarity
A specific implementation may comprise:
1) Count the number Fg_num of pixels whose value is 1 in the binary foreground image of monitoring image X.
2) Determine whether the following condition is met:
Flag = (Fg_num / Area > T3) ∩ (ESIM < T4);    (7)
where Area denotes the product of the number of pixels in the horizontal direction and the number of pixels in the vertical direction of monitoring image X, T3 and T4 both denote predetermined thresholds, and ∩ denotes logical AND.
If the above condition is met, the value of Flag is 1 and it is determined that a person is present in monitoring image X; otherwise, no person is present.
The specific values of T3 and T4 can both be set according to actual requirements; for example, the value of T3 can be 0.6 and the value of T4 can be 0.8.
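A minimal sketch of this decision rule (the function name person_present is illustrative):

```python
def person_present(fg_mask, esim, t3=0.6, t4=0.8):
    """Decision rule of formula (7): report a person when the foreground
    covers more than T3 of the image AND the edge similarity to the
    background is below T4."""
    area = fg_mask.shape[0] * fg_mask.shape[1]   # E * F pixels
    fg_num = int(fg_mask.sum())                  # pixels valued 1
    return (fg_num / area > t3) and (esim < t4)
```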
Since the luminance foreground is rather sensitive to interference such as illumination changes, relying on the luminance foreground alone for person-presence detection may cause misjudgments. The luminance foreground and the edge texture information are therefore combined to determine whether a person is present in monitoring image X, which improves the accuracy of the detection result.
This completes the description of the method embodiment of the present invention.
Based on the above description, the present invention further discloses a video-based ATM monitoring scene detection device, comprising:
a modeling module, configured to establish a background model of the ATM monitoring scene, including determining a background image and the preset parameters corresponding to each pixel in the background image, and to send the established background model to a detection module;
the detection module, configured to, after modeling is complete, each time a frame of monitoring image X is obtained, perform the following processing: generating the binary foreground image of monitoring image X according to the background model; obtaining the edge texture information of monitoring image X and of the background image, and determining the edge similarity between monitoring image X and the background image according to the obtained edge texture information; and determining whether a person is present in monitoring image X according to the generated binary foreground image and the determined edge similarity.
The modeling module may comprise:
a first processing unit, configured to obtain M frames of monitoring images in sequence, where M is a positive integer greater than 1, and to send each obtained frame of monitoring image to a second processing unit;
the second processing unit, configured to take the first received frame of monitoring image as the background image, and, for each pixel in this background image, to take the gray-scale value of the pixel as the mean corresponding to the pixel and the variance of the gray-scale value of the pixel as the variance corresponding to the pixel;
and afterwards, each time a frame of monitoring image is received, to perform the following processing:
Determine the updated background image B_new(x, y):
B_new(x, y) = (1 - ρ)·B_old(x, y) + ρ·I(x, y);    (1)
where ρ denotes the update rate, whose value equals 1/N; N denotes the number of monitoring images received; I(x, y) denotes the most recently received monitoring image; and B_old(x, y) denotes the background image before the update.
For each pixel in B_new(x, y), take the gray-scale value of the pixel as the mean corresponding to the pixel, and take (1 - ρ)·σ_old + ρ·d as the variance corresponding to the pixel; where σ_old denotes the variance corresponding to the pixel at the same coordinate position in B_old(x, y), and d denotes the difference between the gray-scale value of the pixel at the same coordinate position in I(x, y) and the mean corresponding to the pixel at the same coordinate position in B_old(x, y).
The detection module may comprise:
a third processing unit, configured to obtain each frame of monitoring image in sequence and to send each obtained frame of monitoring image to a fourth processing unit;
the fourth processing unit, configured to, each time a frame of monitoring image X is received, perform the following processing: generating the binary foreground image of monitoring image X according to the background model; obtaining the edge texture information of monitoring image X and of the background image, and determining the edge similarity between monitoring image X and the background image according to the obtained edge texture information; and determining whether a person is present in monitoring image X according to the generated binary foreground image and the determined edge similarity.
The detection module may further comprise:
a fifth processing unit, configured to, each time after the fourth processing unit determines whether a person is present in monitoring image X, update the existing background model with monitoring image X.
Correspondingly, the fourth processing unit generates the binary foreground image of monitoring image X according to the most recently obtained background model, obtains the edge texture information of monitoring image X and of the most recently obtained background image, and determines the edge similarity between them according to the obtained edge texture information.
Specifically, the fifth processing unit determines the updated background image B_new(x, y):
B_new(x, y) = (1 - ρ)·B_old(x, y) + ρ·I(x, y);    (1)
where I(x, y) denotes monitoring image X and B_old(x, y) denotes the background image before the update.
For each pixel in B_new(x, y), it takes the gray-scale value of the pixel as the mean corresponding to the pixel, and takes (1 - ρ)·σ_old + ρ·d as the variance corresponding to the pixel; where σ_old denotes the variance corresponding to the pixel at the same coordinate position in B_old(x, y), and d denotes the difference between the gray-scale value of the pixel at the same coordinate position in I(x, y) and the mean corresponding to the pixel at the same coordinate position in B_old(x, y).
Here ρ denotes the update rate:
when it is determined that no person is present in I(x, y), the value of ρ is set to 0.01;
when it is determined that a person is present in I(x, y), the value of ρ is set to 0;
when it is determined that a person has been present in the ATM monitoring scene throughout the time period from T - t to T and the ATM monitoring scene has remained static throughout that period, the value of ρ is set to 1, where T denotes the moment at which I(x, y) is obtained and t > 0.
For any two frames I_1(x, y) and I_2(x, y) obtained within the time period from T - t to T, the fifth processing unit performs the following processing:
Compute Dif(x, y) = I_1(x, y) - I_2(x, y);    (3)
where Dif(x, y) denotes the frame difference image, and I_1(x, y) is obtained earlier than I_2(x, y).
For each pixel in Dif(x, y), determine whether the gray-scale value of the pixel is greater than a predetermined threshold T1; if so, set the value of the pixel to 1, otherwise set it to 0, thereby obtaining the frame-difference binary image Dif_Fg(x, y) of Dif(x, y).
Count the number Dif_Num of pixels whose value is 1 in Dif_Fg(x, y), and determine whether Dif_Num is less than a predetermined threshold T2; if so, determine that the scene is static between I_1(x, y) and I_2(x, y).
If the scene is static between every two frames obtained within the time period from T - t to T, it is determined that the ATM monitoring scene has remained static throughout that period.
The fourth processing unit may specifically comprise:
a foreground detection sub-unit, configured to generate the binary foreground image of monitoring image X according to the most recently obtained background model and to send the generated binary foreground image to an analysis sub-unit;
an edge similarity determination sub-unit, configured to obtain the edge texture information of monitoring image X and of the most recently obtained background image, to determine the edge similarity between monitoring image X and the most recently obtained background image according to the obtained edge texture information, and to send the determined edge similarity to the analysis sub-unit;
the analysis sub-unit, configured to determine whether a person is present in monitoring image X according to the received binary foreground image and edge similarity.
The foreground detection sub-unit performs the following processing for each pixel in monitoring image X:
Compute the difference d between the gray-scale value of the pixel and the mean corresponding to the pixel at the same coordinate position in the most recently obtained background image.
Compute d²/σ², where σ denotes the variance corresponding to the pixel at the same coordinate position in the most recently obtained background image.
Determine whether the computed d²/σ² is greater than a predetermined threshold T0; if so, set the value of the pixel to 1, otherwise set it to 0.
The foreground detection sub-unit may be further configured to, after generating the binary foreground image of monitoring image X, apply dilation and erosion operations to the binary foreground image in sequence, and send the binary foreground image after the dilation and erosion operations to the analysis sub-unit.
The edge similarity determination sub-unit obtains the horizontal and vertical edge images of monitoring image X, and the horizontal and vertical edge images of the most recently obtained background image.
According to the horizontal and vertical edge images of monitoring image X, it computes for each pixel in monitoring image X its gradient magnitude I_gxy:
I_gxy = |I_gx| + |I_gy|;    (4)
where I_gx denotes the horizontal gradient value of the pixel, I_gy denotes the vertical gradient value of the pixel, and |·| denotes taking the absolute value.
According to the horizontal and vertical edge images of the most recently obtained background image, it computes for each pixel in that background image its gradient magnitude B_gxy:
B_gxy = |B_gx| + |B_gy|;    (5)
where B_gx denotes the horizontal gradient value of the pixel and B_gy denotes the vertical gradient value of the pixel.
It then computes the edge similarity ESIM between monitoring image X and the most recently obtained background image:
ESIM = Σ(2·I_gxy·B_gxy) / Σ(I_gxy² + B_gxy²);    (6)
where the sums run over x from 1 to E and y from 1 to F; E denotes the number of pixels in the horizontal direction of monitoring image X, and F denotes the number of pixels in the vertical direction of monitoring image X.
The analysis sub-unit counts the number Fg_num of pixels whose value is 1 in the binary foreground image of monitoring image X, and determines whether the following condition is met:
Flag = (Fg_num / Area > T3) ∩ (ESIM < T4);    (7)
where Area denotes the product of the number of pixels in the horizontal direction and the number of pixels in the vertical direction of monitoring image X, and T3 and T4 both denote predetermined thresholds.
If the above condition is met, it is determined that a person is present in monitoring image X; otherwise, no person is present.
For the specific working procedure of the above device embodiment, reference may be made to the corresponding description in the foregoing method embodiment, which is not repeated here.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (20)

1. A video-based automatic teller machine (ATM) monitoring scene detection method, characterized by comprising:
establishing a background model of the ATM monitoring scene, including determining a background image and the preset parameters corresponding to each pixel in the background image;
after modeling is complete, each time a frame of monitoring image X is obtained, performing the following processing:
generating a binary foreground image of monitoring image X according to the background model;
obtaining edge texture information of monitoring image X and of the background image, and determining the edge similarity between monitoring image X and the background image according to the obtained edge texture information;
determining whether a person is present in monitoring image X according to the generated binary foreground image and the determined edge similarity.
2. The method according to claim 1, characterized in that determining the background image and the preset parameters corresponding to each pixel in the background image comprises:
A. obtaining one frame of monitoring image and taking it as the background image;
for each pixel in this background image, taking the gray-scale value of the pixel as the mean corresponding to the pixel, and taking the variance of the gray-scale value of the pixel as the variance corresponding to the pixel;
B. determining whether the number of monitoring images obtained equals M, where M is a positive integer greater than 1; if so, taking the most recently obtained background image as the final background image and completing the modeling; if not, obtaining a new frame of monitoring image and performing step C;
C. determining the updated background image B_new(x, y): B_new(x, y) = (1 - ρ)·B_old(x, y) + ρ·I(x, y), where ρ denotes the update rate, whose value equals 1/N, N denotes the number of monitoring images obtained, I(x, y) denotes the most recently obtained monitoring image, and B_old(x, y) denotes the background image before the update;
for each pixel in B_new(x, y), taking the gray-scale value of the pixel as the mean corresponding to the pixel, and taking (1 - ρ)·σ_old + ρ·d as the variance corresponding to the pixel, where σ_old denotes the variance corresponding to the pixel at the same coordinate position in B_old(x, y), and d denotes the difference between the gray-scale value of the pixel at the same coordinate position in I(x, y) and the mean corresponding to the pixel at the same coordinate position in B_old(x, y); and repeating step B.
3. The method according to claim 1, characterized in that:
after determining whether a person is present in monitoring image X, the method further comprises: updating the existing background model with monitoring image X;
generating the binary foreground image of monitoring image X according to the background model comprises: generating the binary foreground image of monitoring image X according to the most recently obtained background model;
obtaining the edge texture information of monitoring image X and of the background image, and determining the edge similarity between monitoring image X and the background image according to the obtained edge texture information comprises: obtaining the edge texture information of monitoring image X and of the most recently obtained background image, and determining the edge similarity between monitoring image X and the most recently obtained background image according to the obtained edge texture information.
4. The method according to claim 3, characterized in that updating the existing background model with monitoring image X comprises:
determining the updated background image B_new(x, y): B_new(x, y) = (1 - ρ)·B_old(x, y) + ρ·I(x, y), where I(x, y) denotes monitoring image X and B_old(x, y) denotes the background image before the update;
for each pixel in B_new(x, y), taking the gray-scale value of the pixel as the mean corresponding to the pixel, and taking (1 - ρ)·σ_old + ρ·d as the variance corresponding to the pixel, where σ_old denotes the variance corresponding to the pixel at the same coordinate position in B_old(x, y), and d denotes the difference between the gray-scale value of the pixel at the same coordinate position in I(x, y) and the mean corresponding to the pixel at the same coordinate position in B_old(x, y);
where ρ denotes the update rate;
when it is determined that no person is present in I(x, y), the value of ρ is set to 0.01;
when it is determined that a person is present in I(x, y), the value of ρ is set to 0;
when it is determined that a person has been present in the ATM monitoring scene throughout the time period from T - t to T and the ATM monitoring scene has remained static throughout that period, the value of ρ is set to 1, where T denotes the moment at which I(x, y) is obtained and t > 0.
5. The method according to claim 4, characterized in that determining that the ATM monitoring scene has remained static throughout the time period from T - t to T comprises:
for any two frames I_1(x, y) and I_2(x, y) obtained within the time period from T - t to T, performing the following processing:
computing Dif(x, y) = I_1(x, y) - I_2(x, y), where Dif(x, y) denotes the frame difference image and I_1(x, y) is obtained earlier than I_2(x, y);
for each pixel in Dif(x, y), determining whether the gray-scale value of the pixel is greater than a predetermined threshold T1; if so, setting the value of the pixel to 1, otherwise setting it to 0, thereby obtaining the frame-difference binary image Dif_Fg(x, y) of Dif(x, y);
counting the number Dif_Num of pixels whose value is 1 in Dif_Fg(x, y), and determining whether Dif_Num is less than a predetermined threshold T2; if so, determining that the scene is static between I_1(x, y) and I_2(x, y);
if the scene is static between every two frames obtained within the time period from T - t to T, determining that the ATM monitoring scene has remained static throughout that period.
6. The method according to claim 3, 4 or 5, characterized in that generating the binary foreground image of monitoring image X according to the most recently obtained background model comprises:
for each pixel in monitoring image X, performing the following processing:
computing the difference d between the gray-scale value of the pixel and the mean corresponding to the pixel at the same coordinate position in the most recently obtained background image;
computing d²/σ², where σ denotes the variance corresponding to the pixel at the same coordinate position in the most recently obtained background image;
determining whether the computed d²/σ² is greater than a predetermined threshold T0; if so, setting the value of the pixel to 1, otherwise setting it to 0.
7. The method according to claim 3, 4 or 5, characterized in that, after generating the binary foreground image of monitoring image X, the method further comprises:
applying dilation and erosion operations to the binary foreground image of monitoring image X in sequence.
8. The method according to claim 6, characterized in that obtaining the edge texture information of monitoring image X and of the most recently obtained background image, and determining the edge similarity between monitoring image X and the most recently obtained background image according to the obtained edge texture information comprises:
obtaining the horizontal and vertical edge images of monitoring image X, and the horizontal and vertical edge images of the most recently obtained background image;
according to the horizontal and vertical edge images of monitoring image X, computing for each pixel in monitoring image X its gradient magnitude I_gxy: I_gxy = |I_gx| + |I_gy|, where I_gx denotes the horizontal gradient value of the pixel, I_gy denotes the vertical gradient value of the pixel, and |·| denotes taking the absolute value;
according to the horizontal and vertical edge images of the most recently obtained background image, computing for each pixel in that background image its gradient magnitude B_gxy: B_gxy = |B_gx| + |B_gy|, where B_gx denotes the horizontal gradient value of the pixel and B_gy denotes the vertical gradient value of the pixel;
computing the edge similarity ESIM between monitoring image X and the most recently obtained background image: ESIM = Σ(2·I_gxy·B_gxy) / Σ(I_gxy² + B_gxy²), where the sums run over x from 1 to E and y from 1 to F, E denotes the number of pixels in the horizontal direction of monitoring image X, and F denotes the number of pixels in the vertical direction of monitoring image X.
9. The method according to claim 8, characterized in that determining whether a person is present in monitoring image X according to the generated binary foreground image and the determined edge similarity comprises:
counting the number Fg_num of pixels whose value is 1 in the binary foreground image of monitoring image X;
determining whether the following condition is met: Flag = (Fg_num / Area > T3) ∩ (ESIM < T4), where Area denotes the product of the number of pixels in the horizontal direction and the number of pixels in the vertical direction of monitoring image X, and T3 and T4 both denote predetermined thresholds;
if the above condition is met, determining that a person is present in monitoring image X; otherwise, no person is present.
10. A video-based automatic teller machine (ATM) monitoring scene detection device, characterized by comprising:
a modeling module, configured to establish a background model of the ATM monitoring scene, including determining a background image and the preset parameters corresponding to each pixel in the background image, and to send the established background model to a detection module;
the detection module, configured to, after modeling is complete, each time a frame of monitoring image X is obtained, perform the following processing: generating a binary foreground image of monitoring image X according to the background model; obtaining edge texture information of monitoring image X and of the background image, and determining the edge similarity between monitoring image X and the background image according to the obtained edge texture information; and determining whether a person is present in monitoring image X according to the generated binary foreground image and the determined edge similarity.
11. The device according to claim 10, characterized in that the modeling module comprises:
a first processing unit, configured to obtain M frames of monitoring images in sequence, where M is a positive integer greater than 1, and to send each obtained frame of monitoring image to a second processing unit;
the second processing unit, configured to take the first received frame of monitoring image as the background image, and, for each pixel in this background image, to take the gray-scale value of the pixel as the mean corresponding to the pixel and the variance of the gray-scale value of the pixel as the variance corresponding to the pixel;
and afterwards, each time a frame of monitoring image is received, to perform the following processing:
determining the updated background image B_new(x, y): B_new(x, y) = (1 - ρ)·B_old(x, y) + ρ·I(x, y), where ρ denotes the update rate, whose value equals 1/N, N denotes the number of monitoring images received, I(x, y) denotes the most recently received monitoring image, and B_old(x, y) denotes the background image before the update;
for each pixel in B_new(x, y), taking the gray-scale value of the pixel as the mean corresponding to the pixel, and taking (1 - ρ)·σ_old + ρ·d as the variance corresponding to the pixel, where σ_old denotes the variance corresponding to the pixel at the same coordinate position in B_old(x, y), and d denotes the difference between the gray-scale value of the pixel at the same coordinate position in I(x, y) and the mean corresponding to the pixel at the same coordinate position in B_old(x, y).
12. The device according to claim 10, characterized in that the detection module comprises:
a third processing unit, configured to obtain each frame of monitoring image in sequence and to send each obtained frame of monitoring image to a fourth processing unit;
the fourth processing unit, configured to, each time a frame of monitoring image X is received, perform the following processing: generating the binary foreground image of monitoring image X according to the background model; obtaining the edge texture information of monitoring image X and of the background image, and determining the edge similarity between monitoring image X and the background image according to the obtained edge texture information; and determining whether a person is present in monitoring image X according to the generated binary foreground image and the determined edge similarity.
13. The device according to claim 12, characterized in that the detection module further comprises:
a fifth processing unit, configured to, each time after the fourth processing unit determines whether a person is present in monitoring image X, update the existing background model with monitoring image X;
wherein the fourth processing unit generates the binary foreground image of monitoring image X according to the most recently obtained background model, obtains the edge texture information of monitoring image X and of the most recently obtained background image, and determines the edge similarity between monitoring image X and the most recently obtained background image according to the obtained edge texture information.
14. The device according to claim 13, characterized in that:
the fifth processing unit determines the updated background image B_new(x, y):
B_new(x, y) = (1 - ρ)·B_old(x, y) + ρ·I(x, y), where I(x, y) denotes monitoring image X and B_old(x, y) denotes the background image before the update;
for each pixel in B_new(x, y), takes the gray-scale value of the pixel as the mean corresponding to the pixel, and takes (1 - ρ)·σ_old + ρ·d as the variance corresponding to the pixel, where σ_old denotes the variance corresponding to the pixel at the same coordinate position in B_old(x, y), and d denotes the difference between the gray-scale value of the pixel at the same coordinate position in I(x, y) and the mean corresponding to the pixel at the same coordinate position in B_old(x, y);
where ρ denotes the update rate;
when it is determined that no person is present in I(x, y), the value of ρ is set to 0.01;
when it is determined that a person is present in I(x, y), the value of ρ is set to 0;
when it is determined that a person has been present in the ATM monitoring scene throughout the time period from T - t to T and the ATM monitoring scene has remained static throughout that period, the value of ρ is set to 1, where T denotes the moment at which I(x, y) is obtained and t > 0.
15. The device according to claim 14, characterized in that:
for any two frames I_1(x, y) and I_2(x, y) obtained within the time period from T - t to T, the fifth processing unit performs the following processing:
computing Dif(x, y) = I_1(x, y) - I_2(x, y), where Dif(x, y) denotes the frame difference image and I_1(x, y) is obtained earlier than I_2(x, y);
for each pixel in Dif(x, y), determining whether the gray-scale value of the pixel is greater than a predetermined threshold T1; if so, setting the value of the pixel to 1, otherwise setting it to 0, thereby obtaining the frame-difference binary image Dif_Fg(x, y) of Dif(x, y);
counting the number Dif_Num of pixels whose value is 1 in Dif_Fg(x, y), and determining whether Dif_Num is less than a predetermined threshold T2; if so, determining that the scene is static between I_1(x, y) and I_2(x, y);
if the scene is static between every two frames obtained within the time period from T - t to T, determining that the ATM monitoring scene has remained static throughout that period.
16. The device according to claim 13, 14 or 15, characterized in that the fourth processing unit comprises:
a foreground detection sub-unit, configured to generate the binary foreground image of monitoring image X according to the most recently obtained background model and to send the generated binary foreground image to an analysis sub-unit;
an edge similarity determination sub-unit, configured to obtain the edge texture information of monitoring image X and of the most recently obtained background image, to determine the edge similarity between monitoring image X and the most recently obtained background image according to the obtained edge texture information, and to send the determined edge similarity to the analysis sub-unit;
the analysis sub-unit, configured to determine whether a person is present in monitoring image X according to the received binary foreground image and edge similarity.
17. The device according to claim 16, characterized in that:
the foreground detection sub-unit performs the following processing for each pixel in monitoring image X:
computing the difference d between the gray-scale value of the pixel and the mean corresponding to the pixel at the same coordinate position in the most recently obtained background image;
computing d²/σ², where σ denotes the variance corresponding to the pixel at the same coordinate position in the most recently obtained background image;
determining whether the computed d²/σ² is greater than a predetermined threshold T0; if so, setting the value of the pixel to 1, otherwise setting it to 0.
18. The device according to claim 16, characterized in that:
the foreground detection sub-unit is further configured to, after generating the binary foreground image of monitoring image X, apply dilation and erosion operations to the binary foreground image in sequence, and send the binary foreground image after the dilation and erosion operations to the analysis sub-unit.
19. The device according to claim 17, characterized in that:
the edge similarity determination sub-unit obtains the horizontal and vertical edge images of monitoring image X, and the horizontal and vertical edge images of the most recently obtained background image;
according to the horizontal and vertical edge images of monitoring image X, computes for each pixel in monitoring image X its gradient magnitude I_gxy: I_gxy = |I_gx| + |I_gy|, where I_gx denotes the horizontal gradient value of the pixel, I_gy denotes the vertical gradient value of the pixel, and |·| denotes taking the absolute value;
according to the horizontal and vertical edge images of the most recently obtained background image, computes for each pixel in that background image its gradient magnitude B_gxy: B_gxy = |B_gx| + |B_gy|, where B_gx denotes the horizontal gradient value of the pixel and B_gy denotes the vertical gradient value of the pixel;
and computes the edge similarity ESIM between monitoring image X and the most recently obtained background image: ESIM = Σ(2·I_gxy·B_gxy) / Σ(I_gxy² + B_gxy²), where the sums run over x from 1 to E and y from 1 to F, E denotes the number of pixels in the horizontal direction of monitoring image X, and F denotes the number of pixels in the vertical direction of monitoring image X.
20. The device according to claim 19, characterized in that:
the analysis sub-unit counts the number Fg_num of pixels whose value is 1 in the binary foreground image of monitoring image X;
determines whether the following condition is met: Flag = (Fg_num / Area > T3) ∩ (ESIM < T4), where Area denotes the product of the number of pixels in the horizontal direction and the number of pixels in the vertical direction of monitoring image X, and T3 and T4 both denote predetermined thresholds;
and, if the above condition is met, determines that a person is present in monitoring image X; otherwise, no person is present.
CN201210444071.1A 2012-11-08 2012-11-08 Video-based automatic teller machine monitoring scene detection method and apparatus Active CN103810691B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210444071.1A CN103810691B (en) 2012-11-08 2012-11-08 Video-based automatic teller machine monitoring scene detection method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210444071.1A CN103810691B (en) 2012-11-08 2012-11-08 Video-based automatic teller machine monitoring scene detection method and apparatus

Publications (2)

Publication Number Publication Date
CN103810691A (en) 2014-05-21
CN103810691B CN103810691B (en) 2017-02-22

Family

ID=50707412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210444071.1A Active CN103810691B (en) 2012-11-08 2012-11-08 Video-based automatic teller machine monitoring scene detection method and apparatus

Country Status (1)

Country Link
CN (1) CN103810691B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104657997A (en) * 2015-02-28 2015-05-27 北京格灵深瞳信息技术有限公司 Lens shifting detection methods and devices
CN107588857A (en) * 2016-07-06 2018-01-16 众智光电科技股份有限公司 Infrared ray position sensing apparatus
CN108090916A (en) * 2017-12-21 2018-05-29 百度在线网络技术(北京)有限公司 For tracking the method and apparatus of the targeted graphical in video

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000276602A (en) * 1999-03-23 2000-10-06 Nec Corp Device and method for detecting object and recording medium recording object detection program
WO2003009232A2 (en) * 2001-07-16 2003-01-30 Hewlett-Packard Company Method and apparatus for sub-pixel edge detection
CN101276499A (en) * 2008-04-18 2008-10-01 浙江工业大学 Intelligent monitoring apparatus of ATM equipment based on all-directional computer vision
CN101404060A (en) * 2008-11-10 2009-04-08 北京航空航天大学 Human face recognition method based on visible light and near-infrared Gabor information amalgamation
CN101950448A (en) * 2010-05-31 2011-01-19 北京智安邦科技有限公司 Detection method and system for masquerade and peep behaviors before ATM (Automatic Teller Machine)
CN102236902A (en) * 2011-06-21 2011-11-09 杭州海康威视软件有限公司 Method and device for detecting targets

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000276602A (en) * 1999-03-23 2000-10-06 Nec Corp Device and method for detecting object and recording medium recording object detection program
WO2003009232A2 (en) * 2001-07-16 2003-01-30 Hewlett-Packard Company Method and apparatus for sub-pixel edge detection
CN101276499A (en) * 2008-04-18 2008-10-01 浙江工业大学 Intelligent monitoring apparatus of ATM equipment based on all-directional computer vision
CN101404060A (en) * 2008-11-10 2009-04-08 北京航空航天大学 Human face recognition method based on visible light and near-infrared Gabor information amalgamation
CN101950448A (en) * 2010-05-31 2011-01-19 北京智安邦科技有限公司 Detection method and system for masquerade and peep behaviors before ATM (Automatic Teller Machine)
CN102236902A (en) * 2011-06-21 2011-11-09 杭州海康威视软件有限公司 Method and device for detecting targets

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
AMIT SATPATHY等: "Difference of Gaussian Edge-Texture Based Background Modeling for Dynamic Traffic Conditions", 《LECTURE NOTES IN COMPUTER SCIENCE》 *
TASKEED JABID等: "An Edge-Texture based Moving Object Detection for Video Content Based Application", 《PROCEEDINGS OF 14TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT 2011) 》 *
ZHUANG Xiaoli et al.: "Image quality assessment method based on gradient-magnitude structural similarity", Computer Applications and Software (计算机应用与软件) *
LI Bin et al.: "Texture-based moving object detection", Computer Engineering and Applications (计算机工程与应用) *
LI Haibo: "Human detection and tracking in video surveillance", China Masters' Theses Full-text Database, Information Science and Technology series (monthly) (中国优秀硕士学位论文全文数据库信息科技辑) *
YANG Tao et al.: "A foreground detection algorithm based on a multi-layer background model", Journal of Image and Graphics (中国图象图形学报) *
TANG Yiping et al.: "Application of dynamic image understanding technology in intelligent ATM monitoring", Computer Measurement & Control (计算机测量与控制) *
HUANG Xinjuan et al.: "Moving object detection method based on an adaptive Gaussian mixture background model", Journal of Computer Applications (计算机应用) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104657997A (en) * 2015-02-28 2015-05-27 北京格灵深瞳信息技术有限公司 Lens shifting detection methods and devices
CN104657997B (en) * 2015-02-28 2018-01-09 北京格灵深瞳信息技术有限公司 A kind of lens shift detection method and device
CN107588857A (en) * 2016-07-06 2018-01-16 众智光电科技股份有限公司 Infrared ray position sensing apparatus
CN108090916A (en) * 2017-12-21 2018-05-29 百度在线网络技术(北京)有限公司 For tracking the method and apparatus of the targeted graphical in video
CN108090916B (en) * 2017-12-21 2019-05-07 百度在线网络技术(北京)有限公司 Method and apparatus for tracking the targeted graphical in video

Also Published As

Publication number Publication date
CN103810691B (en) 2017-02-22

Similar Documents

Publication Publication Date Title
CN106327520B (en) Moving target detection method and system
EP2858008B1 (en) Target detecting method and system
US7982774B2 (en) Image processing apparatus and image processing method
CN102201121A (en) System and method for detecting article in video scene
CN110232359B (en) Retentate detection method, device, equipment and computer storage medium
CN102867175B (en) Stereoscopic vision-based ATM (automatic teller machine) machine behavior analysis method
CN103886598A (en) Tunnel smoke detecting device and method based on video image processing
US12002195B2 (en) Computer vision-based anomaly detection method, device and electronic apparatus
CN102855459A (en) Method and system for detecting and verifying specific foreground objects
CN105574891A (en) Method and system for detecting moving object in image
CN102955940A (en) System and method for detecting power transmission line object
CN103945089A (en) Dynamic target detection method based on brightness flicker correction and IP camera
CN110114801B (en) Image foreground detection device and method and electronic equipment
CN106408563B (en) A kind of snow noise detection method based on the coefficient of variation
CN110795975B (en) Face false detection optimization method and device
CN108596032B (en) Detection method, device, equipment and medium for fighting behavior in video
CN108629254A (en) A kind of detection method and device of moving target
JP2010015469A (en) Still area detection method, and apparatus, program and recording medium therefor
CN103810691A (en) Video-based automatic teller machine monitoring scene detection method and apparatus
EP3376438A1 (en) A system and method for detecting change using ontology based saliency
CN105447863A (en) Residue detection method based on improved VIBE
CN104483712A (en) Method, device and system for detecting invasion of foreign objects in power transmission line
CN115272917A (en) Wire galloping early warning method, device, equipment and medium based on power transmission line
CN111062941A (en) Point light source lamp point fault detection device and method
CN112927178A (en) Occlusion detection method, occlusion detection device, electronic device, and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant