CN107274432B - An intelligent video monitoring method - Google Patents

An intelligent video monitoring method

Info

Publication number
CN107274432B
CN107274432B CN201710434834.7A
Authority
CN
China
Prior art keywords
network
input
encoder
training
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710434834.7A
Other languages
Chinese (zh)
Other versions
CN107274432A (en)
Inventor
王田
乔美娜
陈阳
陶飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN201710434834.7A
Publication of CN107274432A
Application granted
Publication of CN107274432B
Legal status: Active (current)
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/254 Analysis of motion involving subtraction of images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20024 Filtering details
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention relates to a public-scene intelligent video monitoring method based on visual saliency and deep autoencoding. The method comprises: decomposing a video into single frames and extracting motion information using visual saliency; computing the optical flow of moving objects between adjacent frames; and then performing detection in two processes, training and testing. During training, the optical flow of the training samples is used as the input of a deep autoencoder, and the whole autoencoder network is trained by minimizing a loss function. In the test phase, the optical flows of the training and test samples respectively serve as inputs to the encoder extracted from the trained autoencoder network, which reduces the dimensionality of the input to extract its features. The dimensionality-reduced results are then visualized, and the visualization range of the training samples is represented by a hypersphere. When a test sample is input, it is visualized in the same way: if the visualized result of the sample falls within the hypersphere, the sample is judged normal; conversely, if it falls outside the hypersphere, the sample is judged abnormal. Intelligent video monitoring is thereby achieved.

Description

An intelligent video monitoring method
Technical field
The present invention relates to image processing technology, and more particularly to a public-scene intelligent video monitoring method based on visual saliency and deep autoencoding.
Background art
In recent years, monitoring equipment has been deployed across all industries, and public scenes such as modern airports, stations, and hospitals are covered by thousands upon thousands of cameras. Because the volume of video data is enormous, relying solely on security personnel to analyze it, filter out the normal behavior in normal scenes, and notice abnormal behavior in time represents a huge workload, and as the amount of material to analyze grows, the attention and working efficiency of personnel decline markedly. To free people from this large burden of analysis and interpretation, studying an intelligent video monitoring method is of great significance.
An intelligent monitoring system mainly involves three parts. The first is the extraction of motion information from video, i.e., extracting the moving targets in the video; since the cameras of a monitoring system are fixed, this part mainly amounts to extracting the motion information of foreground targets. The second is feature extraction, a major challenge in intelligent monitoring: the extracted features must be distinctive and robust. The third is abnormal-behavior detection, which divides into rule-based detection, e.g., detecting whether a target violates certain predefined rules, and statistics-based detection, which discovers behavior patterns in a large number of samples and uses pattern-recognition methods and models to identify abnormal behavior. Most existing techniques of the second kind use pattern-recognition methods, but their precision is lower than that of deep-learning methods; the present invention therefore performs abnormal-behavior identification with the higher-precision deep autoencoder network from deep learning.
Summary of the invention
In view of this, the main purpose of the present invention is to provide a public-scene intelligent video monitoring method based on visual saliency and deep autoencoding that has high detection accuracy and strong robustness; it substantially improves detection accuracy and can at the same time cope with abnormal-behavior identification in a variety of scenes with very strong robustness.
In order to achieve the above object, the technical solution proposed by the present invention is a public-scene intelligent video monitoring method based on visual saliency and deep autoencoding, realized in the following steps:
Step 1: read the video of a public scene, decompose the video into single frames, and then compute the visual saliency map of each frame with a band-pass filter built from a difference of Gaussians, thereby extracting motion information;
Step 2: on the basis of each frame's saliency map, compute the optical flow between adjacent frames so as to extract the motion information of foreground targets and obtain motion features (a sketch of this computation follows the step list below);
Step 3: the anomaly-identification algorithm comprises two processes, training and testing. In the training process, compute the visual saliency maps of the training samples and extract motion features, convert the resulting optical-flow features into column vectors as the input of a deep autoencoder network, and, using the dimensionality reduction of the encoder and the reconstruction of the decoder in the deep autoencoder network, reconstruct the input by minimizing a loss function, thereby training the deep autoencoder network;
Step 4: after the input has been reconstructed by minimizing the loss function and the deep autoencoder network has been trained, extract the encoder part of the trained deep autoencoder network as the network used in the test process. After separately computing the saliency maps and motion features of the training samples and test samples, use the optical-flow feature of each sample as the input of the encoder of the deep autoencoder network; through the dimensionality-reduction operation of the encoder network, a low-dimensional vector is extracted that best represents the input;
Step 5: visualize the results of the encoder network on the test data in three-dimensional coordinates, and represent the distribution of the dimensionality-reduced training samples with a hypersphere;
Step 6: for the anomaly identification of an input test sample, if the visualized result of the test sample falls within the range of the hypersphere, the test sample is judged to be a normal sequence; conversely, if it falls outside the hypersphere, the test sample is judged to be an abnormal sequence. The identification of abnormal behavior, and hence intelligent monitoring of public-scene video, is thereby achieved.
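As an illustration of step 2, the sketch below computes dense optical flow between adjacent frames and keeps only the salient regions. The patent does not name a particular optical-flow algorithm, so the Farneback method, and the masking of the flow by the saliency map, are assumptions made here for illustration only:

```python
import cv2
import numpy as np

def motion_features(prev_frame, next_frame, saliency_mask):
    """Dense optical flow between two adjacent frames, restricted to
    salient regions. Farneback flow is an assumption; the patent only
    requires the optical flow (magnitude and direction of the motion
    velocity) of moving objects between adjacent frames."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    # Keep flow only where the saliency map marks foreground motion.
    flow = flow * saliency_mask.astype(np.float32)[..., None]
    # Magnitude and direction of the motion, flattened into the column
    # vector that the autoencoder takes as input.
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    return np.concatenate([mag.ravel(), ang.ravel()]).reshape(-1, 1)
```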
The method for computing the visual saliency map in step 1 is as follows:
Step i): for a frame image, the saliency of each point in the image is defined as:
S(x, y) = ||I_μ - I_whc(x, y)||
where I_μ is the mean color of the input image's pixels in Lab space, I_whc(x, y) is the value of each pixel in Lab space after the image has been Gaussian-blurred, and S(x, y), the saliency of each pixel, is the Euclidean distance between the two;
Step ii): first apply a Gaussian blur to the image. The two-dimensional Gaussian distribution function is:
G(x, y) = (1 / (2πσ²)) exp(-(x² + y²) / (2σ²))
where x and y are the horizontal and vertical offsets of the eight points surrounding the central point, σ is the standard deviation of the Gaussian distribution function, and G(x, y) gives the degree of blurring of each pixel;
For a color image, the Gaussian kernel is convolved with the original image separately in the R, G, and B channels, and the per-channel results are merged to give the Gaussian-blurred image; the blurred image and the original image are each converted to Lab space.
Step iii): compute the Lab value I_whc(x, y) of each pixel of the Gaussian-blurred image and the mean I_μ of the original image's pixel colors in Lab space, and compute the Euclidean distance between the two to obtain the visual saliency map of the original image.
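A minimal sketch of this saliency computation, assuming OpenCV for the Gaussian blur and Lab conversion; the 5x5 kernel size is an illustrative choice, since the patent does not fix one:

```python
import cv2
import numpy as np

def saliency_map(image_bgr):
    """Visual saliency per steps i)-iii): S(x, y) = ||I_mu - I_whc(x, y)||.
    image_bgr is an 8-bit color frame in OpenCV's BGR layout; the result
    is a float32 map normalized to [0, 1]."""
    # Step ii): Gaussian-blur the image (each channel is blurred), then
    # convert both the blurred image and the original to Lab space.
    blurred = cv2.GaussianBlur(image_bgr, (5, 5), 0)
    lab_blur = cv2.cvtColor(blurred, cv2.COLOR_BGR2LAB).astype(np.float32)
    lab_orig = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    # Step iii): mean Lab color of the original image ...
    mean_lab = lab_orig.reshape(-1, 3).mean(axis=0)
    # ... and the Euclidean distance from it at every blurred pixel.
    sal = np.linalg.norm(lab_blur - mean_lab, axis=2)
    return sal / (sal.max() + 1e-8)
```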
The detailed process of training the deep autoencoder network in step 3 is as follows:
Step i): the training samples contain only normal samples. During training, the optical-flow features of adjacent frames of the training samples are computed and converted into column vectors as the input of the deep autoencoder network. An autoencoder is a fully connected input layer - hidden layer - output layer network that makes its output as close as possible to its input; the whole network consists of the encoder in its left half and the decoder in its right half. The encoder performs dimensionality reduction on the data, extracting the feature information that best represents the input; the decoder takes the encoder's output as its input and reconstructs the original input of the whole network with as small an error as possible. The deep autoencoder network adds several hidden layers to the encoder network and the decoder network on top of the plain autoencoder network;
Step ii): with the optical flow as the input X = {x1, x2, ..., xn}, the network's activation function is the ReLU function f(x) = max(0, x), where x is the input (the independent variable) and f(x) the output (the dependent variable) of the activation function. The first half of the network, the encoder, outputs Z = f(wX + b), where w is the weight and b the bias of the encoder network; Z, the output of the encoder network, is the dimensionality-reduced result of X and represents its feature information. The second half of the network, the decoder, outputs Y = f(w'Z + b'), where w' is the weight and b' the bias of the decoder network; Y is the reconstruction of X. The entire network is expressed by the formula Y = f(w'(f(wX + b)) + b').
Step iii): the loss function is the mean squared error MSE = ||X - Y||² = ||X - f(w'(f(wX + b)) + b')||². Minimizing the loss function so as to reconstruct the input is precisely the training process of the deep autoencoder network; when the mean squared error is minimal, the output is the reconstruction of the input.
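The following is a minimal PyTorch sketch of such a deep autoencoder trained with the MSE reconstruction loss. The 3-unit bottleneck follows step ii) of the test process below; the input dimension, hidden-layer widths, optimizer, and training schedule are illustrative assumptions, not values fixed by the patent:

```python
import torch
import torch.nn as nn

class DeepAutoencoder(nn.Module):
    """Fully connected deep autoencoder with ReLU activations and a
    3-neuron bottleneck, as described in steps i)-iii)."""
    def __init__(self, input_dim=1024):           # input_dim is an assumption
        super().__init__()
        self.encoder = nn.Sequential(              # several hidden layers
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 3), nn.ReLU())           # Z = f(wX + b), 3-dim code
        self.decoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.ReLU())  # Y = f(w'Z + b')

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train(model, flow_vectors, epochs=50, lr=1e-3):
    """Minimize MSE = ||X - Y||^2 over normal training samples only."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(flow_vectors), flow_vectors)
        loss.backward()
        opt.step()
    return model
```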
The process in step 4 of extracting the encoder part of the trained deep autoencoder network as the network used in the test process is as follows:
Step i): first, image preprocessing is the same as in the training process: the column vectors converted from the optical-flow features of the training samples and test samples serve as the input of the network;
Step ii): unlike the network used in the training process, the test process extracts the encoder of the trained deep autoencoder network obtained from training and uses it as the test-process network. The dimensionality-reduction effect of the encoder network compresses the input to 3 neurons; by the nature of the encoder, these three neurons can contain all the information of the input.
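Continuing the sketch above (same assumed model and feature names), the test-phase network is just the trained encoder, which compresses each optical-flow vector to its 3-neuron code:

```python
@torch.no_grad()
def embed(model, flow_vectors):
    """Test-phase network: only the encoder of the trained deep
    autoencoder, compressing each input to its 3-neuron code."""
    model.eval()
    return model.encoder(flow_vectors)   # shape: (n_samples, 3)
```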
In conclusion, the public-scene intelligent video monitoring method based on visual saliency and deep autoencoding of the present invention comprises: decomposing the public-scene video into single frames; extracting motion information from the decomposed frames using visual saliency; computing the optical flow of moving objects between adjacent frames, including the magnitude and direction of the motion velocity; and then performing detection in two processes, training and testing. In training, the optical flow of the training samples serves as the input of the deep autoencoder, and the entire deep autoencoder network is trained by minimizing the loss function. In the test phase, the optical flows of the training and test samples respectively serve as inputs; the encoder of the trained deep autoencoder network is extracted, and the features of the input are extracted through the dimensionality-reduction effect of the encoder network. By the nature of the encoder network, the dimensionality-reduced features can represent all the information of the input. The dimensionality-reduced results are then visualized, the visualization range of the training samples is represented with a hypersphere, and an input test sample is visualized in the same way: if the visualized result of the sample falls within the hypersphere, the sample is judged normal; conversely, if it falls outside the hypersphere, the sample is judged abnormal. Intelligent video monitoring is thereby achieved. A sketch of the hypersphere decision follows.
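The patent does not specify how the hypersphere is fitted to the visualized training codes, so the minimal rule sketched below, centroid as center plus maximum training distance as radius, is an assumption for illustration; the abnormality decision itself (inside = normal, outside = abnormal) follows the text:

```python
import torch

def fit_hypersphere(train_codes):
    """Center = centroid of the 3-D training codes; radius = largest
    distance from the center to any training code (assumed fitting rule)."""
    center = train_codes.mean(dim=0)
    radius = (train_codes - center).norm(dim=1).max()
    return center, radius

def is_abnormal(test_code, center, radius):
    """A test sample is abnormal iff its code falls outside the hypersphere."""
    return (test_code - center).norm() > radius
```

In use, fit_hypersphere runs once over the embedded training samples; each incoming test sample is then classified with is_abnormal.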
The advantages of the present invention over the prior art are:
(1) For abnormal-behavior identification, the present invention first extracts motion information using visual saliency and the optical-flow method, then extracts features with the deep autoencoder of deep learning, and performs training and detection with them. Because the deep autoencoder can reconstruct the input by minimizing the loss function, the dimensionality-reduction effect of the encoder extracts low-dimensional features that can represent the input information, so the extracted features are highly robust; and precisely because of this robustness, abnormal behavior can be identified very efficiently and the algorithm's accuracy is improved. Since the normal range is represented by a hypersphere, an abnormality judgment only requires checking the range of the visualized result, so judgment is fast.
(2) The present invention features high detection accuracy and strong robustness and can be widely applied to the security protection of public scenes such as community safety, hospitals, and banks. By using the optical-flow method together with the deep autoencoder network of deep learning, low-dimensional features representing all the information of an object are extracted, making the judgment accurate and robust; since the normal range is represented by a hypersphere, an abnormality judgment only requires checking the range of the visualized result, so judgment is fast.
Brief description of the drawings
Fig. 1 is the implementation flow chart of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The public-scene intelligent video monitoring method based on visual saliency and deep autoencoding of the present invention comprises: decomposing the public-scene video into single frames; extracting motion information from the decomposed frames using visual saliency; computing the optical flow of moving objects between adjacent frames, including the magnitude and direction of the motion velocity; and then performing detection in two processes, training and testing. In training, the optical flow of the training samples serves as the input of the deep autoencoder, and the entire deep autoencoder network is trained by minimizing the loss function. In the test phase, the optical flows of the training and test samples respectively serve as inputs; the encoder of the trained deep autoencoder network is extracted, and the features of the input are extracted through the dimensionality-reduction effect of the encoder network. By the nature of the encoder network, the dimensionality-reduced features can represent all the information of the input. The dimensionality-reduced results are then visualized, the visualization range of the training samples is represented with a hypersphere, and an input test sample is visualized in the same way: if the visualized result of the sample falls within the hypersphere, the sample is judged normal; conversely, if it falls outside the hypersphere, the sample is judged abnormal. Intelligent video monitoring is thereby achieved.
As shown in Fig. 1, the present invention is implemented in the following steps:
Step 1): read the video of a public scene, decompose the video into single frames, and then compute the visual saliency map of each frame with a band-pass filter built from a difference of Gaussians, thereby extracting motion information;
Step 2): on the basis of each frame's saliency map, compute the optical flow between adjacent frames so as to extract the motion information of foreground targets and obtain motion features;
Step 3): the anomaly-identification algorithm comprises two processes, training and testing. In the training process, compute the visual saliency maps of the training samples and extract motion features, convert the optical-flow feature of each frame image into a column vector as the input of the deep autoencoder network, and, using the dimensionality reduction of the encoder and the reconstruction of the decoder in the deep autoencoder network, reconstruct the input by minimizing a loss function, thereby training the deep autoencoder network;
Step 4): after the input has been reconstructed by minimizing the loss function and the deep autoencoder network has been trained, extract the encoder part of the trained deep autoencoder network as the network used in the test process. After separately computing the saliency maps and motion features of the training samples and test samples, use the optical-flow feature of each image-frame sample as the input of the encoder of the deep autoencoder network; through the dimensionality-reduction operation of the encoder network, a low-dimensional vector is extracted that best represents the input;
Step 5): visualize the results of the encoder network on the test data in three-dimensional coordinates, and represent the distribution of the dimensionality-reduced training samples with a hypersphere;
Step 6): for the anomaly identification of an input test sample, if the visualized result of the test sample falls within the range of the hypersphere, the test sample is judged to be a normal sequence; conversely, if it falls outside the hypersphere, the test sample is judged to be an abnormal sequence. The identification of abnormal behavior, and hence intelligent monitoring of public-scene video, is thereby achieved.
The calculation method of the visual saliency map in step 1) is as follows:
Step i): for a frame image, the saliency of each point in the image is defined as:
S(x, y) = ||I_μ - I_whc(x, y)||
where I_μ is the mean color of the input image's pixels in Lab space, I_whc(x, y) is the value of each pixel in Lab space after the image has been Gaussian-blurred, and S(x, y), the saliency of each pixel, is the Euclidean distance between the two;
Step ii): first apply a Gaussian blur to the image. The two-dimensional Gaussian distribution function is:
G(x, y) = (1 / (2πσ²)) exp(-(x² + y²) / (2σ²))
where x and y are the horizontal and vertical offsets of the eight points surrounding the central point, σ is the standard deviation of the Gaussian distribution function, and G(x, y) gives the degree of blurring of each pixel;
The Gaussian kernel is convolved with the original image separately in the R, G, and B channels, and the per-channel results are merged to give the Gaussian-blurred image; the blurred image and the original image are each converted to Lab space.
Step iii): compute the Lab value I_whc(x, y) of each pixel of the Gaussian-blurred image and the mean I_μ of the original image's pixel colors in Lab space, and compute the Euclidean distance between the two to obtain the visual saliency map of the original image.
The principle of training the deep autoencoder network in step 3) is as follows:
Step i): the training samples contain only normal samples. During training, the optical-flow features of adjacent frames of the training samples are computed and converted into column vectors as the input of the deep autoencoder network. An autoencoder is a fully connected input layer - hidden layer - output layer network that makes its output as close as possible to its input; the whole network consists of an encoder and a decoder. The encoder performs dimensionality reduction on the data, extracting the feature information that best represents the input; the decoder takes the encoder's output as its input and reconstructs the original input of the whole network with as small an error as possible. The deep autoencoder network adds several hidden layers to the encoder network and the decoder network on top of the plain autoencoder network;
Step ii): with the optical flow as the input X = {x1, x2, ..., xn}, the network's activation function is the ReLU function f(x) = max(0, x), where x is the input (the independent variable) and f(x) the output (the dependent variable) of the activation function. The first half of the network, the encoder, outputs Z = f(wX + b), where w is the weight and b the bias of the encoder network; Z, the output of the encoder network, is the dimensionality-reduced result of X and represents its feature information. The second half of the network, the decoder, outputs Y = f(w'Z + b'), where w' is the weight and b' the bias of the decoder network; Y is the reconstruction of X. The entire network is expressed by the formula Y = f(w'(f(wX + b)) + b').
Step iii): the loss function is the mean squared error MSE = ||X - Y||² = ||X - f(w'(f(wX + b)) + b')||². Minimizing the loss function so as to reconstruct the input is precisely the training process of the deep autoencoder network; when the mean squared error is minimal, the output is the reconstruction of the input.
The detailed process in step 4) of extracting the encoder part of the trained deep autoencoder network as the network in the test process is as follows:
Step i): first, image preprocessing is the same as in the training process: the column vectors converted from the optical-flow features of the training samples and test samples serve as the input of the network;
Step ii): unlike the network used in the training process, the test process extracts the encoder of the trained deep autoencoder network obtained from training and uses it as the test-process network. The dimensionality-reduction effect of the encoder network compresses the input to 3 neurons; these 3 neurons can contain all the information of the input.
In conclusion the above is merely preferred embodiments of the present invention, being not intended to limit the scope of the present invention. All within the spirits and principles of the present invention, any modification, equivalent replacement, improvement and so on should be included in of the invention Within protection scope.

Claims (4)

1. A public-scene intelligent video monitoring method based on visual saliency and deep autoencoding, characterized in that it is realized in the following steps:
Step 1: read the video of a public scene, decompose the video into single frames, and then compute the visual saliency map of each frame with a band-pass filter built from a difference of Gaussians, thereby extracting motion information;
Step 2: on the basis of each frame's saliency map, compute the optical flow between adjacent frames so as to extract the motion information of foreground targets and obtain motion features;
Step 3: the anomaly-identification algorithm comprises two processes, training and testing. In the training process, compute the visual saliency maps of the training samples and extract motion features, convert the resulting optical-flow features into column vectors as the input of a deep autoencoder network, and, using the dimensionality reduction of the encoder and the reconstruction of the decoder in the deep autoencoder network, reconstruct the input by minimizing a loss function, thereby training the deep autoencoder network;
Step 4: after the input has been reconstructed by minimizing the loss function and the deep autoencoder network has been trained, extract the encoder part of the trained deep autoencoder network as the network used in the test process. After separately computing the saliency maps and motion features of the training samples and test samples, use the optical-flow feature of each sample as the input of the encoder of the autoencoder network; through the dimensionality-reduction operation of the encoder in the autoencoder network, a low-dimensional vector is extracted that best represents the input;
Step 5: visualize the results of the encoder network on the test data in three-dimensional coordinates, and represent the distribution of the dimensionality-reduced training samples with a hypersphere;
Step 6: for the anomaly identification of an input test sample, if the visualized result of the test sample falls within the range of the hypersphere, the test sample is judged to be a normal sequence; conversely, if it falls outside the hypersphere, the test sample is judged to be an abnormal sequence. The identification of abnormal behavior, and hence intelligent monitoring of public-scene video, is thereby achieved.
2. The public-scene intelligent video monitoring method based on visual saliency and deep autoencoding according to claim 1, characterized in that in step 1 the method for computing the visual saliency map of each frame is as follows:
Step i): for a frame image, the saliency of each point in the image is:
S(x, y) = ||I_μ - I_whc(x, y)||
where I_μ is the mean color of the input image's pixels in Lab space, I_whc(x, y) is the value of each pixel in Lab space after the image has been Gaussian-blurred, and S(x, y), the saliency of each pixel, is the Euclidean distance between the two;
Step ii): first apply a Gaussian blur to the image. The two-dimensional Gaussian distribution function is:
G(x, y) = (1 / (2πσ²)) exp(-(x² + y²) / (2σ²))
where x and y are the horizontal and vertical offsets of the eight points surrounding the central point, σ is the standard deviation of the Gaussian distribution function, and G(x, y) gives the degree of blurring of each pixel;
The Gaussian kernel is convolved with the original image separately in the R, G, and B channels, and the per-channel results are merged to give the Gaussian-blurred image; the blurred image and the original image are each converted to Lab space;
Step iii): compute the Lab value I_whc(x, y) of each pixel of the Gaussian-blurred image and the mean I_μ of the original image's pixel colors in Lab space, and compute the Euclidean distance between the two to obtain the visual saliency map of the original image.
3. The public-scene intelligent video monitoring method based on visual saliency and deep autoencoding according to claim 1, characterized in that in step 3 the specific implementation process of training the deep autoencoder network is:
Step i): the training samples contain only normal samples. During training, the optical-flow features of adjacent frames of the training samples are computed and converted into column vectors as the input of the deep autoencoder network. An autoencoder is a fully connected input layer - hidden layer - output layer network that makes its output as close as possible to its input; the whole network consists of the encoder in its left half and the decoder in its right half. The encoder performs dimensionality reduction on the data, extracting the feature information that best represents the input; the decoder takes the encoder's output as its input and reconstructs the original input of the whole network with as small an error as possible. The deep autoencoder network adds several hidden layers to the encoder network and the decoder network on top of the plain autoencoder network;
Step ii): with the optical flow as the input X = {x1, x2, ..., xn}, the network's activation function is the ReLU function f(x) = max(0, x), where x is the input (the independent variable) and f(x) the output (the dependent variable) of the activation function. The first half of the network, the encoder, outputs Z = f(wX + b), where w is the weight and b the bias of the encoder network; Z, the output of the encoder network, is the dimensionality-reduced result of X and represents its feature information. The second half of the network, the decoder, outputs Y = f(w'Z + b'), where w' is the weight and b' the bias of the decoder network; Y is the reconstruction of X. The entire network is expressed by the formula Y = f(w'(f(wX + b)) + b');
Step iii): the loss function is the mean squared error MSE = ||X - Y||² = ||X - f(w'(f(wX + b)) + b')||². Minimizing the loss function so as to reconstruct the input is precisely the training process of the deep autoencoder network; when the mean squared error is minimal, the output is the reconstruction of the input.
4. The public-scene intelligent video monitoring method based on visual saliency and deep autoencoding according to claim 1, characterized in that in step 4 the detailed process of extracting the encoder part of the trained deep autoencoder network as the network in the test process is:
Step i): first, image preprocessing is the same as in the training process: the column vectors converted from the optical-flow features of the training samples and test samples serve as the input of the network;
Step ii): unlike the network used in the training process, the test process extracts the encoder of the trained deep autoencoder network obtained from training and uses it as the test-process network. The dimensionality-reduction effect of the encoder network compresses the input to 3 neurons; these 3 neurons can contain all the information of the input.
CN201710434834.7A 2017-06-10 2017-06-10 An intelligent video monitoring method Active CN107274432B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710434834.7A CN107274432B (en) 2017-06-10 2017-06-10 An intelligent video monitoring method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710434834.7A CN107274432B (en) 2017-06-10 2017-06-10 An intelligent video monitoring method

Publications (2)

Publication Number Publication Date
CN107274432A CN107274432A (en) 2017-10-20
CN107274432B 2019-07-26

Family

ID=60066526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710434834.7A Active CN107274432B (en) 2017-06-10 2017-06-10 An intelligent video monitoring method

Country Status (1)

Country Link
CN (1) CN107274432B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107833208B (en) * 2017-10-27 2021-07-13 哈尔滨工业大学 Hyperspectral anomaly detection method based on dynamic weight depth self-encoding
CN108830882B (en) * 2018-05-25 2022-05-17 中国科学技术大学 Video abnormal behavior real-time detection method
CN110009866A (en) * 2019-04-03 2019-07-12 天津中航亿达科技有限公司 A method for detecting temperature anomalies in video
CN111107107B (en) * 2019-12-31 2022-03-29 奇安信科技集团股份有限公司 Network behavior detection method and device, computer equipment and storage medium
US11748629B2 (en) * 2020-01-21 2023-09-05 Moxa Inc. Device and method of handling anomaly detection
CN113222926B (en) * 2021-05-06 2023-04-18 西安电子科技大学 Zipper abnormity detection method based on depth support vector data description model
CN113592390A (en) * 2021-07-12 2021-11-02 嘉兴恒创电力集团有限公司博创物资分公司 Warehousing digital twin method and system based on multi-sensor fusion
CN115714731B (en) * 2022-09-27 2023-06-27 中国人民解放军63921部队 Deep space measurement and control link abnormality detection method based on deep learning self-encoder

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778921A (en) * 2017-02-15 2017-05-31 张烜 Person re-identification method based on a deep-learning encoding model

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778921A (en) * 2017-02-15 2017-05-31 张烜 Person re-identification method based on a deep-learning encoding model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Dynamic Background Learning through Deep Auto-encoder Networks";Pei Xu etc.;《Proceedings of the 22nd ACM international conference on Multimedia》;20141107;第107-116页
"Optical flow in the presence of spatially-varying motion blur";Travis Portz etc.;《IEEE Conference on Computer Vision and Pattern Recognition》;20120726;第1752-1759页
"基于视觉的目标检测与跟踪综述";尹宏鹏等;《自动化学报》;20161031;第42卷(第10期);第1467-1481页
"视觉显著性检测方法及其应用研究";凌南平;《万方数据知识服务平台》;20141106;第4.1-4.3节

Also Published As

Publication number Publication date
CN107274432A (en) 2017-10-20

Similar Documents

Publication Publication Date Title
CN107274432B (en) An intelligent video monitoring method
CN108846365B (en) Detection method and device for fighting behavior in video, storage medium and processor
CN107330364B (en) A people counting method and system based on a cGAN network
CN104361327B (en) A pedestrian detection method and system
CN106960202B (en) Smiling face identification method based on visible light and infrared image fusion
CN108805002B (en) Monitoring video abnormal event detection method based on deep learning and dynamic clustering
CN105956560B (en) A vehicle model recognition method based on pooled multi-scale deep convolutional features
CN103839065B (en) Extraction method for dynamic crowd gathering characteristics
Mancas et al. Abnormal motion selection in crowds using bottom-up saliency
CN106910204B (en) A method and system for automatic tracking and recognition of ships at sea
CN104732236B (en) An intelligent crowd abnormal-behavior detection method based on hierarchical processing
CN110176016B (en) Virtual fitting method based on human body contour segmentation and skeleton recognition
He et al. A robust method for wheatear detection using UAV in natural scenes
CN107330360A (en) A pedestrian clothing color recognition and pedestrian retrieval method and device
CN104298974A (en) Human body behavior recognition method based on depth video sequences
CN109740609A (en) A gauge detection method and device
CN113312973B (en) Gesture recognition key point feature extraction method and system
CN103902989A (en) Human body motion video recognition method based on non-negative matrix factorization
CN106971158A (en) A pedestrian detection method based on CoLBP co-occurrence features and GSS features
CN110490055A (en) A weakly supervised activity recognition and localization method and device based on triple re-encoding
Yu et al. Robust median filtering forensics by CNN-based multiple residuals learning
CN109117774A (en) A multi-angle video anomaly detection method based on sparse coding
CN110084201A (en) A human motion recognition method using convolutional neural networks with specific-target tracking in monitoring scenes
CN105631405B (en) Traffic video intelligent recognition background modeling method based on multi-level blocks
CN113837154B (en) Open set filtering system and method based on multitask assistance

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant