CN107274432A - Intelligent video monitoring method for public scenes based on visual saliency and deep autoencoding - Google Patents
Intelligent video monitoring method for public scenes based on visual saliency and deep autoencoding
- Publication number
- CN107274432A CN107274432A CN201710434834.7A CN201710434834A CN107274432A CN 107274432 A CN107274432 A CN 107274432A CN 201710434834 A CN201710434834 A CN 201710434834A CN 107274432 A CN107274432 A CN 107274432A
- Authority
- CN
- China
- Prior art keywords
- network
- input
- encoder
- depth
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/254—Analysis of motion involving subtraction of images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20024—Filtering details
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The present invention relates to an intelligent video monitoring method for public scenes based on visual saliency and deep autoencoding, which includes: decomposing the video into single frames, extracting motion information using visual saliency, and then computing the optical flow of moving objects between adjacent frames. The subsequent detection is divided into a training process and a test process. During training, the optical flow of the training samples is used as the input of the autoencoder, and the whole autoencoder network is trained by minimizing a loss function. During testing, the optical flow of the training and test samples is used as input, the encoder of the trained autoencoder network is extracted, and the features of the input are extracted by dimensionality reduction. The reduced-dimension results are then visualized, and a hypersphere is used to represent the visualization range of the training samples. When a test sample is input, it is visualized in the same way: if the visualization of the sample falls within the hypersphere, the sample is judged normal; conversely, if it falls outside the hypersphere, the sample is judged abnormal, thereby achieving intelligent video monitoring.
Description
Technical field
The present invention relates to image processing technology, and in particular to an intelligent video monitoring method for public scenes based on visual saliency and deep autoencoding.
Background technology
In recent years, monitoring equipment has been deployed in all kinds of industries, and public scenes such as modern airports, stations, and hospitals are covered by thousands upon thousands of surveillance devices. Because the volume of video data is enormous, relying on security personnel alone to analyze it, filter out the normal behavior in normal scenes, and notice abnormal behavior in time is a huge workload, and as the amount to be analyzed grows, the attention and working efficiency of the personnel decline markedly. In order to free people from large amounts of analysis and interpretation, studying an intelligent video monitoring method is of great significance.
An intelligent monitoring system mainly involves three parts. The first is the extraction of motion information in the video, i.e., extracting the moving targets in the video; since the monitoring camera is fixed, this part mainly extracts the motion information of foreground targets. The second is feature extraction, a major challenge in intelligent monitoring systems: extracting features that are distinctive and robust. The third is abnormal behavior detection, which is divided into rule-based detection, e.g., detecting whether a target violates some predefined rules, and statistics-based detection, i.e., finding patterns of behavior in a large number of samples and using pattern recognition methods and models to identify abnormal behavior.
Most existing techniques for the second part use pattern recognition methods for identification, but the precision of such methods is lower than that of deep learning methods. The present invention therefore uses the more precise deep autoencoder network from deep learning to identify abnormal behavior.
Summary of the invention
In view of this, the main object of the present invention is to provide an intelligent video monitoring method for public scenes based on visual saliency and deep autoencoding with high detection precision and strong robustness. It substantially improves detection precision and, at the same time, can handle abnormal behavior recognition in a variety of scenes with very strong robustness.
To achieve the above object, the technical solution proposed by the present invention is an intelligent video monitoring method for public scenes based on visual saliency and deep autoencoding, implemented in the following steps:
Step 1: read the video of a public scene, decompose the video into single frames, and then compute the visual saliency map of each frame using a band-pass filter built from a difference of Gaussians, thereby extracting motion information.
Step 2: on the basis of the per-frame saliency maps, compute the optical flow between adjacent frames so as to extract the motion information of foreground targets and obtain motion features.
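The patent does not name a specific optical-flow algorithm for Step 2. As a rough, assumed illustration of motion estimation between two adjacent frames, the sketch below recovers the dominant displacement of a moving region by exhaustive block matching; a real implementation would compute a dense per-pixel flow (e.g., a pyramidal method).

```python
import numpy as np

def estimate_shift(prev, curr, max_disp=5):
    """Estimate the dominant (dy, dx) motion between two grayscale
    frames by exhaustive block matching -- a simplified stand-in for
    the per-pixel optical flow computed between adjacent frames."""
    best, best_err = (0, 0), np.inf
    h, w = prev.shape
    # Compare the central region of `prev` against shifted windows of `curr`.
    core = prev[max_disp:h - max_disp, max_disp:w - max_disp]
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            win = curr[max_disp + dy:h - max_disp + dy,
                       max_disp + dx:w - max_disp + dx]
            err = np.mean((core - win) ** 2)
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best

# Synthetic demo: a bright square moving 2 px right and 1 px down.
prev = np.zeros((40, 40))
prev[10:20, 10:20] = 1.0
curr = np.roll(np.roll(prev, 1, axis=0), 2, axis=1)
flow = estimate_shift(prev, curr)
```

The recovered displacement (1 pixel down, 2 pixels right) carries both the magnitude and direction of the motion velocity mentioned in the summary.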
Step 3: the anomaly recognition algorithm comprises a training process and a test process. During training, compute the visual saliency maps of the training samples and extract their motion features; convert the resulting optical-flow features into column vectors used as the input of a deep autoencoder network; and, using the dimensionality reduction of the encoder and the reconstruction of the decoder in the deep autoencoder network, train the deep autoencoder network by minimizing a loss function that reconstructs the input.
Step 4: after the input has been reconstructed and the deep autoencoder network trained by minimizing the loss function, extract the encoder part of the trained network as the network used in the test process. After computing the saliency maps and motion features of the training and test samples, use the optical-flow feature of each sample as the input of the encoder in the deep autoencoder network; through the dimensionality-reduction operation of the encoder network, a low-dimensional vector is extracted that best represents the input.
Step 5: visualize the output of the encoder network in three-dimensional coordinates during testing, and use a hypersphere to represent the distribution of the training samples after dimensionality reduction.
Step 6: for anomaly recognition of an input test sample, if the visualization of the test sample falls within the range of the hypersphere, the test sample is judged to be a normal sequence; conversely, if it falls outside the hypersphere, the test sample is judged to be an abnormal sequence. This achieves abnormal behavior recognition and intelligent video monitoring of public scenes.
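Steps 5 and 6 judge a sample by whether its 3-D embedding lies inside a hypersphere enclosing the training embeddings. The patent does not specify how the hypersphere is fitted; the sketch below assumes a simple construction (center at the mean of the training embeddings, radius equal to the largest training distance from that center) with random vectors standing in for encoder outputs.

```python
import numpy as np

def fit_hypersphere(train_emb):
    """Fit an enclosing hypersphere to 3-D training embeddings.
    Assumed construction (not specified in the patent): center at
    the mean, radius = largest training distance from the center."""
    center = train_emb.mean(axis=0)
    radius = np.linalg.norm(train_emb - center, axis=1).max()
    return center, radius

def is_normal(sample, center, radius):
    """A test sample is judged normal iff its embedding falls inside."""
    return np.linalg.norm(sample - center) <= radius

rng = np.random.default_rng(0)
train_emb = rng.normal(0.0, 1.0, size=(200, 3))    # stand-in encoder outputs
center, radius = fit_hypersphere(train_emb)
ok = is_normal(np.zeros(3), center, radius)        # near the training bulk
bad = is_normal(np.full(3, 50.0), center, radius)  # far outside the sphere
```

Because the decision is a single distance comparison, the abnormality judgment is fast, matching the speed advantage claimed later in the description.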
The method for computing the visual saliency map in Step 1 is as follows:
Step i) For one frame, the saliency of each point in the image is defined as:
S(x, y) = ||I_μ − I_whc(x, y)||
where I_μ is the mean color of all pixels of the input image in Lab space, I_whc(x, y) is the value of each pixel in Lab space after Gaussian blurring of the image, and S(x, y), the saliency of each pixel, is the Euclidean distance between the two.
Step ii) First apply a Gaussian blur to the image; the two-dimensional Gaussian distribution function is:
G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
where x and y are the horizontal and vertical coordinates of the points around the kernel center, σ is the variance of the Gaussian distribution function, and G(x, y) gives the blurring weight of each pixel.
For a color image, the convolution with the Gaussian kernel is applied to each of the R, G, and B channels separately, and the per-channel results are merged to form the blurred image. The blurred image and the original image are then both converted to Lab space.
Step iii) Compute the Lab value I_whc(x, y) of each pixel of the blurred image and the mean I_μ of the original image's pixels in Lab space, and compute the Euclidean distance between the two, which yields the visual saliency map of the original image.
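The saliency computation above can be sketched in NumPy. This is a minimal sketch under assumptions: it works directly in RGB rather than converting to Lab (the Lab conversion is assumed, not shown), and the 2-D Gaussian blur is applied as two separable 1-D convolutions with zero padding at the borders.

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius=2):
    # Discrete 1-D Gaussian; the 2-D blur is applied separably.
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(channel, k):
    # Convolve rows then columns ('same' size, zero-padded borders).
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, channel)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def saliency_map(img, sigma=1.0):
    """Per-pixel saliency S(x, y) = ||I_mean - I_blur(x, y)||.
    Computed in RGB for brevity; the patent computes it in Lab space."""
    k = gaussian_kernel_1d(sigma)
    blurred = np.stack([blur(img[..., c], k) for c in range(3)], axis=-1)
    mean_color = img.reshape(-1, 3).mean(axis=0)
    return np.linalg.norm(mean_color - blurred, axis=-1)

# A frame with a bright patch on a dark background: the patch is salient.
img = np.zeros((32, 32, 3))
img[12:20, 12:20] = 1.0
sal = saliency_map(img)
```

Pixels inside the bright patch end up far from the mean color and so receive a higher saliency value than the uniform background.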
The detailed process of training the deep autoencoder network in Step 3 is:
Step i) The training samples contain only normal samples. During training, compute the optical-flow features of adjacent frames of the training samples and convert them into column vectors used as the input of the deep autoencoder network. An autoencoder is a fully connected network with an input layer, a hidden layer, and an output layer, whose output is made as close as possible to its input. The whole network consists of an encoder (the left half) and a decoder (the right half). The encoder performs dimensionality reduction, extracting the feature information that best represents the input; the decoder takes the encoder's output as its input and reconstructs the original input of the whole network with as small an error as possible. A deep autoencoder network adds several hidden layers to the encoder and decoder of a plain autoencoder.
Step ii) Take the optical flow as the input X = {x1, x2, ..., xn}. The activation function of the network is the ReLU function f(x) = max(0, x), where x is the input (independent variable) of the activation function and f(x) is its output (dependent variable). The first half of the network, the encoder, outputs Z = f(wX + b), where w is the weight of the encoder network, b is its bias, and Z, the output of the encoder network, is the result of reducing the dimensionality of X and can represent the feature information of X. The second half of the network, the decoder, outputs Y = f(w′Z + b′), where w′ is the weight of the decoder network and b′ is its bias; Y is the reconstruction of X. The whole network can be written as Y = f(w′ f(wX + b) + b′).
Step iii) The loss function is the mean squared error MSE = ||X − Y||² = ||X − f(w′ f(wX + b) + b′)||². Minimizing the loss function to reconstruct the input is exactly the training process of the deep autoencoder network: the mean squared error is minimized, and the output then becomes a reconstruction of the input.
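The formulas of steps ii) and iii) can be sketched as a minimal NumPy implementation. This sketch makes several assumptions not fixed by the text: one layer per half (the deep variant simply stacks more hidden layers), synthetic non-negative vectors standing in for the optical-flow columns, and plain gradient descent with an assumed learning rate.

```python
import numpy as np

relu = lambda a: np.maximum(0.0, a)  # the activation f(x) = max(0, x)

rng = np.random.default_rng(1)
d, h, m = 16, 3, 64                 # input dim, bottleneck size, batch size
X = rng.uniform(0.0, 1.0, (d, m))   # stand-in optical-flow column vectors

# Encoder Z = f(wX + b) and decoder Y = f(w'Z + b'), as in step ii).
w  = rng.normal(0, 0.3, (h, d)); b  = np.zeros((h, 1))
w2 = rng.normal(0, 0.3, (d, h)); b2 = np.zeros((d, 1))

losses, lr = [], 0.05
for _ in range(300):
    p1 = w @ X + b;   Z = relu(p1)   # encoder: dimensionality reduction
    p2 = w2 @ Z + b2; Y = relu(p2)   # decoder: reconstruction of X
    losses.append(np.mean((X - Y) ** 2))  # MSE = ||X - Y||^2
    # Backpropagate the MSE through both ReLU layers.
    dY  = 2 * (Y - X) / X.size
    dp2 = dY * (p2 > 0)
    dw2 = dp2 @ Z.T; db2 = dp2.sum(axis=1, keepdims=True)
    dp1 = (w2.T @ dp2) * (p1 > 0)
    dw  = dp1 @ X.T;  db  = dp1.sum(axis=1, keepdims=True)
    w  -= lr * dw;  b  -= lr * db
    w2 -= lr * dw2; b2 -= lr * db2
```

After training, the reconstruction error has dropped, i.e., the output Y has moved toward a reconstruction of the input X, which is the training criterion of step iii).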
The process in Step 4 of extracting the encoder part of the trained deep autoencoder network as the network used in the test process is:
Step i) First, images are preprocessed as in the training process: the column vectors converted from the optical-flow features of the training and test samples serve as the network input.
Step ii) Unlike the network used in training, the test process extracts the encoder from the trained deep autoencoder network obtained during training and uses it as the test-process network, exploiting the dimensionality-reduction effect of the encoder network to compress the input down to 3 neurons. From the properties of the encoder, these three neurons can contain the full information of the input.
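The test-time network keeps only the encoder half. A sketch of that step, with random stand-in weights where the trained encoder parameters would be loaded (an assumption made only to illustrate the shapes involved):

```python
import numpy as np

relu = lambda a: np.maximum(0.0, a)

# Stand-in weights for a trained encoder: in practice w and b are taken
# from the deep autoencoder after training; random values here only
# illustrate the compression to a 3-neuron bottleneck.
rng = np.random.default_rng(2)
d = 16                       # length of an optical-flow column vector
w = rng.normal(0, 0.3, (3, d))
b = np.zeros((3, 1))

def encode(x):
    """Test-time network: only the encoder half, compressing the input
    column vector down to 3 neurons for 3-D visualization."""
    return relu(w @ x + b)

sample = rng.uniform(0.0, 1.0, (d, 1))
emb = encode(sample)         # 3-D point to compare against the hypersphere
```

The resulting 3-D embedding is what Step 5 plots and what the hypersphere test of Step 6 classifies.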
In summary, the intelligent video monitoring method for public scenes based on visual saliency and deep autoencoding of the present invention includes: decomposing the video of a public scene into single frames; extracting motion information from the decomposed frames using visual saliency; computing the optical flow of moving objects between adjacent frames, including the magnitude and direction of the motion velocity; and then dividing detection into a training process and a test process. During training, the optical flow of the training samples is used as the input of the deep autoencoder, and the whole deep autoencoder network is trained by minimizing the loss function. During testing, the optical flow of the training and test samples is used as input, the encoder of the trained deep autoencoder network is extracted, and the features of the input are extracted by its dimensionality-reduction effect; by the property of the encoder network, the reduced-dimension features can represent the full information of the input. The reduced-dimension results are then visualized, and a hypersphere is used to represent the visualization range of the training samples. When a test sample is input, it is visualized in the same way: if the visualization of the sample falls within the hypersphere, the sample is judged normal; conversely, if it falls outside the hypersphere, the sample is judged abnormal, thereby achieving intelligent video monitoring.
Compared with the prior art, the advantages of the present invention are:
(1) For abnormal behavior recognition, the present invention first extracts motion information using visual saliency and optical flow, then extracts features with the deep autoencoder from deep learning, and trains and detects with it. Because the deep autoencoder reconstructs the input by minimizing a loss function, the dimensionality-reduction effect of the encoder extracts low-dimensional features that represent the input information, so the extracted features are highly robust; precisely because of this robustness, abnormal behavior can be recognized very efficiently and the precision of the algorithm is improved. Because a hypersphere represents the normal range, an abnormality judgment only requires checking the range of the visualization result, so the judgment is fast.
(2) The present invention has high detection precision and strong robustness, and can be widely applied to the safety protection of public scenes such as community security, hospitals, and banks. By using optical flow and the deep autoencoder network from deep learning, low-dimensional features representing the full information of the object are extracted; the judgment is accurate and robust, and because a hypersphere represents the normal range, an abnormality judgment only requires checking the range of the visualization result, so the judgment is fast.
Brief description of the drawings
Fig. 1 is a flowchart of the implementation of the present invention.
Detailed description of the embodiments
To make the object, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawing and specific embodiments.
The intelligent video monitoring method for public scenes based on visual saliency and deep autoencoding of the present invention includes: decomposing the video of a public scene into single frames; extracting motion information from the decomposed frames using visual saliency; and computing the optical flow of moving objects between adjacent frames, including the magnitude and direction of the motion velocity. Detection is then divided into a training process and a test process. During training, the optical flow of the training samples is used as the input of the deep autoencoder, and the whole deep autoencoder network is trained by minimizing the loss function. During testing, the optical flow of the training and test samples is used as input, the encoder of the trained deep autoencoder network is extracted, and the features of the input are extracted by its dimensionality-reduction effect; by the property of the encoder network, the reduced-dimension features can represent the full information of the input. The reduced-dimension results are then visualized, and a hypersphere is used to represent the visualization range of the training samples. When a test sample is input, it is visualized in the same way: if the visualization of the sample falls within the hypersphere, the sample is judged normal; conversely, if it falls outside the hypersphere, the sample is judged abnormal, thereby achieving intelligent video monitoring.
As shown in Fig. 1, the present invention is implemented in the following steps:
Step 1): read the video of a public scene, decompose the video into single frames, and then compute the visual saliency map of each frame using a band-pass filter built from a difference of Gaussians, thereby extracting motion information.
Step 2): on the basis of the per-frame saliency maps, compute the optical flow between adjacent frames so as to extract the motion information of foreground targets and obtain motion features.
Step 3): the anomaly recognition algorithm comprises a training process and a test process. During training, compute the visual saliency maps of the training samples and extract their motion features; convert the optical-flow feature of each frame into a column vector used as the input of the deep autoencoder network; and, using the dimensionality reduction of the encoder and the reconstruction of the decoder in the deep autoencoder network, train the deep autoencoder network by minimizing a loss function that reconstructs the input.
Step 4): after the input has been reconstructed and the deep autoencoder network trained by minimizing the loss function, extract the encoder part of the trained network as the network used in the test process. After computing the saliency maps and motion features of the training and test samples, use the optical-flow feature of each frame sample as the input of the encoder in the deep autoencoder network; through the dimensionality-reduction operation of the encoder network, a low-dimensional vector is extracted that best represents the input.
Step 5): visualize the output of the encoder network in three-dimensional coordinates during testing, and use a hypersphere to represent the distribution of the training samples after dimensionality reduction.
Step 6): for anomaly recognition of an input test sample, if the visualization of the test sample falls within the range of the hypersphere, the test sample is judged to be a normal sequence; conversely, if it falls outside the hypersphere, the test sample is judged to be an abnormal sequence. This achieves abnormal behavior recognition and intelligent video monitoring of public scenes.
The method for computing the visual saliency map in Step 1) is as follows:
Step i) For one frame, the saliency of each point in the image is defined as:
S(x, y) = ||I_μ − I_whc(x, y)||
where I_μ is the mean color of all pixels of the input image in Lab space, I_whc(x, y) is the value of each pixel in Lab space after Gaussian blurring of the image, and S(x, y), the saliency of each pixel, is the Euclidean distance between the two.
Step ii) First apply a Gaussian blur to the image; the two-dimensional Gaussian distribution function is:
G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
where x and y are the horizontal and vertical coordinates of the points around the kernel center, σ is the variance of the Gaussian distribution function, and G(x, y) gives the blurring weight of each pixel.
The convolution with the Gaussian kernel is applied to each of the R, G, and B channels separately, and the per-channel results are merged to form the blurred image. The blurred image and the original image are then both converted to Lab space.
Step iii) Compute the Lab value I_whc(x, y) of each pixel of the blurred image and the mean I_μ of the original image's pixels in Lab space, and compute the Euclidean distance between the two, which yields the visual saliency map of the original image.
The principle of training the deep autoencoder network in Step 3) is as follows:
Step i) The training samples contain only normal samples. During training, compute the optical-flow features of adjacent frames of the training samples and convert them into column vectors used as the input of the deep autoencoder network. An autoencoder is a fully connected network with an input layer, a hidden layer, and an output layer, whose output is made as close as possible to its input. The whole network consists of an encoder and a decoder. The encoder performs dimensionality reduction, extracting the feature information that best represents the input; the decoder takes the encoder's output as its input and reconstructs the original input of the whole network with as small an error as possible. A deep autoencoder network adds several hidden layers to the encoder and decoder of a plain autoencoder.
Step ii) Take the optical flow as the input X = {x1, x2, ..., xn}. The activation function of the network is the ReLU function f(x) = max(0, x), where x is the input (independent variable) of the activation function and f(x) is its output (dependent variable). The first half of the network, the encoder, outputs Z = f(wX + b), where w is the weight of the encoder network, b is its bias, and Z, the output of the encoder network, is the result of reducing the dimensionality of X and can represent the feature information of X. The second half of the network, the decoder, outputs Y = f(w′Z + b′), where w′ is the weight of the decoder network and b′ is its bias; Y is the reconstruction of X. The whole network can be written as Y = f(w′ f(wX + b) + b′).
Step iii) The loss function is the mean squared error MSE = ||X − Y||² = ||X − f(w′ f(wX + b) + b′)||². Minimizing the loss function to reconstruct the input is exactly the training process of the deep autoencoder network: the mean squared error is minimized, and the output then becomes a reconstruction of the input.
The detailed process in Step 4) of extracting the encoder part of the trained deep autoencoder network as the network used in the test process is:
Step i) First, images are preprocessed as in the training process: the column vectors converted from the optical-flow features of the training and test samples serve as the network input.
Step ii) Unlike the network used in training, the test process extracts the encoder from the trained deep autoencoder network obtained during training and uses it as the test-process network, exploiting the dimensionality-reduction effect of the encoder network to compress the input down to 3 neurons; these 3 neurons can contain the full information of the input.
In summary, the above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (4)
1. An intelligent video monitoring method for public scenes based on visual saliency and deep autoencoding, characterized in that it is implemented in the following steps:
Step 1: read the video of a public scene, decompose the video into single frames, and then compute the visual saliency map of each frame using a band-pass filter built from a difference of Gaussians, thereby extracting motion information;
Step 2: on the basis of the per-frame saliency maps, compute the optical flow between adjacent frames so as to extract the motion information of foreground targets and obtain motion features;
Step 3: the anomaly recognition algorithm comprises a training process and a test process; during training, compute the visual saliency maps of the training samples and extract their motion features, convert the resulting optical-flow features into column vectors used as the input of a deep autoencoder network, and, using the dimensionality reduction of the encoder and the reconstruction of the decoder in the deep autoencoder network, train the deep autoencoder network by minimizing a loss function that reconstructs the input;
Step 4: after the input has been reconstructed and the deep autoencoder network trained by minimizing the loss function, extract the encoder part of the trained deep autoencoder network as the network used in the test process; after computing the saliency maps and motion features of the training and test samples, use the optical-flow feature of each sample as the input of the encoder in the autoencoder network; through the dimensionality-reduction operation of said encoder, a low-dimensional vector is extracted that best represents the input;
Step 5: visualize the output of the encoder network in three-dimensional coordinates during testing, and use a hypersphere to represent the distribution of the training samples after dimensionality reduction;
Step 6: for anomaly recognition of an input test sample, if the visualization of the test sample falls within the range of the hypersphere, the test sample is judged to be a normal sequence; conversely, if it falls outside the hypersphere, the test sample is judged to be an abnormal sequence, thereby achieving abnormal behavior recognition and intelligent video monitoring of public scenes.
2. The intelligent video monitoring method for public scenes based on visual saliency and deep autoencoding according to claim 1, characterized in that in Step 1 the method for computing the visual saliency map of each frame is as follows:
Step i) For one frame, the saliency of each point in the image is:
S(x, y) = ||I_μ − I_whc(x, y)||
where I_μ is the mean color of all pixels of the input image in Lab space, I_whc(x, y) is the value of each pixel in Lab space after Gaussian blurring of the image, and S(x, y), the saliency of each pixel, is the Euclidean distance between the two;
Step ii) First apply a Gaussian blur to the image; the two-dimensional Gaussian distribution function is:
G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
where x and y are the horizontal and vertical coordinates of the points around the kernel center, σ is the variance of the Gaussian distribution function, and G(x, y) gives the blurring weight of each pixel;
the convolution with the Gaussian kernel is applied to each of the R, G, and B channels separately, and the per-channel results are merged to form the blurred image; the blurred image and the original image are then both converted to Lab space;
Step iii) Compute the Lab value I_whc(x, y) of each pixel of the blurred image and the mean I_μ of the original image's pixels in Lab space, and compute the Euclidean distance between the two, which yields the visual saliency map of the original image.
3. The intelligent video monitoring method for public scenes based on visual saliency and deep autoencoding according to claim 1, characterized in that in Step 3 the process of training the deep autoencoder network is:
Step i) The training samples contain only normal samples; during training, compute the optical-flow features of adjacent frames of the training samples and convert them into column vectors used as the input of the deep autoencoder network; an autoencoder is a fully connected network with an input layer, a hidden layer, and an output layer, whose output is made as close as possible to its input; the whole network consists of an encoder (the left half) and a decoder (the right half); the encoder performs dimensionality reduction, extracting the feature information that best represents the input; the decoder takes the encoder's output as its input and reconstructs the original input of the whole network with as small an error as possible; a deep autoencoder network adds several hidden layers to the encoder and decoder of a plain autoencoder;
Step ii) Take the optical flow as the input X = {x1, x2, ..., xn}; the activation function of the network is the ReLU function f(x) = max(0, x), where x is the input (independent variable) of the activation function and f(x) is its output (dependent variable); the first half of the network, the encoder, outputs Z = f(wX + b), where w is the weight of the encoder network, b is its bias, and Z, the output of the encoder network, is the result of reducing the dimensionality of X and can represent the feature information of X; the second half of the network, the decoder, outputs Y = f(w′Z + b′), where w′ is the weight of the decoder network and b′ is its bias; Y is the reconstruction of X, and the whole network can be written as Y = f(w′ f(wX + b) + b′);
Step iii) The loss function is the mean squared error MSE = ||X − Y||² = ||X − f(w′ f(wX + b) + b′)||²; minimizing the loss function to reconstruct the input is exactly the training process of the deep autoencoder network: the mean squared error is minimized, and the output then becomes a reconstruction of the input.
4. The common-scene intelligent video monitoring method based on visual saliency and deep autoencoding according to claim 1, characterized in that: in step 4, the detailed process of extracting the encoder part of the trained deep autoencoder network as the test-stage network is:
Step i) Image preprocessing is the same as in the training process: the optical-flow features of the training samples and test samples are converted into column vectors and used as the network input;
Step ii) Unlike the network used in the training process, the test process extracts the encoder from the trained deep autoencoder network and uses its dimensionality-reduction effect as the test-stage network, compressing the input to 3 neurons, which contain the full information of the input.
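At test time, only the encoder half of the trained network is kept. A hedged sketch, assuming `w` and `b` stand in for the encoder weights and bias recovered from training (random values here, for illustration only):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

rng = np.random.default_rng(2)
n_in = 8   # illustrative input dimension

# Stand-ins for the trained encoder parameters; in practice these would be
# copied from the trained deep autoencoder network.
w = rng.standard_normal((3, n_in))   # 3 bottleneck neurons, per the claim
b = np.zeros((3, 1))

def encode(x_col):
    """Test-stage network: the encoder half only, Z = f(wX + b)."""
    return relu(w @ x_col + b)

# Optical-flow feature of a test sample, as a column vector
test_sample = rng.standard_normal((n_in, 1))
code = encode(test_sample)   # compressed to 3 neurons
```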
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710434834.7A CN107274432B (en) | 2017-06-10 | 2017-06-10 | A kind of intelligent video monitoring method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107274432A true CN107274432A (en) | 2017-10-20 |
CN107274432B CN107274432B (en) | 2019-07-26 |
Family
ID=60066526
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710434834.7A Active CN107274432B (en) | 2017-06-10 | 2017-06-10 | A kind of intelligent video monitoring method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107274432B (en) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106778921A (en) * | 2017-02-15 | 2017-05-31 | 张烜 | Person re-identification method based on a deep learning encoding model |
Non-Patent Citations (4)
Title |
---|
PEI XU ET AL.: "Dynamic Background Learning through Deep Auto-encoder Networks", PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA * |
TRAVIS PORTZ ET AL.: "Optical flow in the presence of spatially-varying motion blur", IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION * |
LING NANPING: "Visual Saliency Detection Methods and Their Applications", Wanfang Data Knowledge Service Platform * |
YIN HONGPENG ET AL.: "A Survey of Vision-Based Object Detection and Tracking", Acta Automatica Sinica * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107833208A (en) * | 2017-10-27 | 2018-03-23 | 哈尔滨工业大学 | A kind of hyperspectral abnormity detection method based on changeable weight depth own coding |
CN107833208B (en) * | 2017-10-27 | 2021-07-13 | 哈尔滨工业大学 | Hyperspectral anomaly detection method based on dynamic weight depth self-encoding |
CN108830882B (en) * | 2018-05-25 | 2022-05-17 | 中国科学技术大学 | Video abnormal behavior real-time detection method |
CN108830882A (en) * | 2018-05-25 | 2018-11-16 | 中国科学技术大学 | Video abnormal behaviour real-time detection method |
CN110009866A (en) * | 2019-04-03 | 2019-07-12 | 天津中航亿达科技有限公司 | A kind of method of video detection temperature anomaly |
CN111107107A (en) * | 2019-12-31 | 2020-05-05 | 奇安信科技集团股份有限公司 | Network behavior detection method and device, computer equipment and storage medium |
CN111107107B (en) * | 2019-12-31 | 2022-03-29 | 奇安信科技集团股份有限公司 | Network behavior detection method and device, computer equipment and storage medium |
CN113222883A (en) * | 2020-01-21 | 2021-08-06 | 四零四科技股份有限公司 | Apparatus and method for handling exception detection |
CN113222883B (en) * | 2020-01-21 | 2024-02-23 | 四零四科技股份有限公司 | Device and method for processing abnormality detection |
CN113222926A (en) * | 2021-05-06 | 2021-08-06 | 西安电子科技大学 | Zipper abnormity detection method based on depth support vector data description model |
CN113222926B (en) * | 2021-05-06 | 2023-04-18 | 西安电子科技大学 | Zipper abnormity detection method based on depth support vector data description model |
CN113592390A (en) * | 2021-07-12 | 2021-11-02 | 嘉兴恒创电力集团有限公司博创物资分公司 | Warehousing digital twin method and system based on multi-sensor fusion |
CN115714731A (en) * | 2022-09-27 | 2023-02-24 | 中国人民解放军63921部队 | Deep space measurement and control link abnormity detection method based on deep learning self-encoder |
Also Published As
Publication number | Publication date |
---|---|
CN107274432B (en) | 2019-07-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107274432B (en) | A kind of intelligent video monitoring method | |
CN105956560B (en) | A kind of model recognizing method based on the multiple dimensioned depth convolution feature of pondization | |
CN108805002B (en) | Monitoring video abnormal event detection method based on deep learning and dynamic clustering | |
CN108549846B (en) | Pedestrian detection and statistics method combining motion characteristics and head-shoulder structure | |
CN105046195B (en) | Human bodys' response method based on asymmetric generalized gaussian model | |
CN107330364A (en) | A kind of people counting method and system based on cGAN networks | |
CN107506740A (en) | A kind of Human bodys' response method based on Three dimensional convolution neutral net and transfer learning model | |
CN109271886A (en) | A kind of the human body behavior analysis method and system of examination of education monitor video | |
CN104281853A (en) | Behavior identification method based on 3D convolution neural network | |
CN107272655A (en) | Batch process fault monitoring method based on multistage ICA SVDD | |
CN104036243B (en) | A kind of Activity recognition method based on Optic flow information | |
CN110232404A (en) | A kind of recognition methods of industrial products surface blemish and device based on machine learning | |
CN109389057A (en) | A kind of object detecting method based on multiple dimensioned high-level semantics converged network | |
CN107330360A (en) | A kind of pedestrian's clothing colour recognition, pedestrian retrieval method and device | |
CN104298974A (en) | Human body behavior recognition method based on depth video sequence | |
CN109740609A (en) | A kind of gauge detection method and device | |
CN109145841A (en) | A kind of detection method and device of the anomalous event based on video monitoring | |
CN107092884A (en) | Rapid coarse-fine cascade pedestrian detection method | |
CN106971158A (en) | A kind of pedestrian detection method based on CoLBP symbiosis feature Yu GSS features | |
CN106910204A (en) | A kind of method and system to the automatic Tracking Recognition of sea ship | |
CN107392142A (en) | A kind of true and false face identification method and its device | |
CN108229300A (en) | Video classification methods, device, computer readable storage medium and electronic equipment | |
Yu et al. | Robust median filtering forensics by CNN-based multiple residuals learning | |
CN105631405B (en) | Traffic video intelligent recognition background modeling method based on Multilevel Block | |
CN104680189B (en) | Based on the bad image detecting method for improving bag of words |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||