CN108805002A - Surveillance video anomalous event detection method based on deep learning and dynamic clustering - Google Patents
Surveillance video anomalous event detection method based on deep learning and dynamic clustering
- Publication number: CN108805002A
- Application number: CN201810320572.6A
- Authority: CN (China)
- Prior art keywords: vector, image, video, sampling, block
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V20/47—Detecting features for summarising video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/44—Event detection
Abstract
The present invention relates to a surveillance video anomalous event detection method based on deep learning and dynamic clustering. In the feature extraction stage, the deep learning network PCANet learns the corresponding network filters from training video, converting low-level pixel optical-flow features into high-level semantic motion features through the deep network; at the same time, by screening the moving regions in the video, spatio-temporal sampling blocks that contain only background information are discarded. In the feature modeling stage, the feature vector space is modeled with a nonparametric model based on two-layer clustering, and a method of merging vectors toward each other is used in the vector merging stage. Finally, the vectors in the dictionary set are clustered into a series of event clusters with the K-means algorithm, and anomalous events are judged according to the Euclidean distance between a test vector and the event cluster center vectors. The present invention effectively avoids the feature vector drift caused by additive updates and improves the anomalous event detection rate.
Description
Technical field
The present invention relates to a surveillance video anomalous event detection method, and more particularly to a surveillance video anomalous event detection method based on deep learning and dynamic clustering.
Background technology
With the development of computer science and technology, technologies such as image processing, computer vision and machine learning can break through the limitations of traditional video surveillance systems, enabling intelligent video analysis, active detection of anomalous events and real-time early warning in video surveillance systems. This is of great value for video surveillance applications in the field of public safety.
Anomalous event detection in surveillance video is generally divided into four basic steps: image preprocessing, elementary event representation, construction of the anomaly detection model, and anomalous event judgment. Elementary event representation is mainly divided into representations based on low-level visual features and representations based on high-level semantic features. The usual approach based on low-level visual features is to divide the video volume into small video blocks in an overlapping, non-overlapping or spatio-temporal interest point manner, regard each video block as an elementary event, and extract low-level visual features from the block to represent it. The low-level visual features in common use include optical flow, gradient and texture. Event representation based on high-level semantic features mainly requires complex pattern processing of the data, such as spatio-temporal target trajectories and social force methods. Common anomalous event detection models include: classification-based models, nearest-neighbor-based models, clustering-based models, statistics-based models, and information-theory-based models.
Although anomalous event detection methods for surveillance video are varied, most of them model motion features with parametric models, which require many model parameters to be set manually, and the empirical parameter values usually have to be reset whenever the video scene changes. In the paper 《Online anomaly detection in videos by clustering dynamic exemplars》 [J Feng, C Zhang, P Hao], for anomalous events that are newly emerging or have a very low probability of occurrence in video, the authors propose a clustering-based nonparametric model of the feature vectors: MHOF features are first extracted from the input video stream, these features are then merged sequentially into a fixed-size dictionary set, and the merged dictionary set is clustered with the K-means algorithm; in the anomalous event judgment stage, the algorithm judges anomalies by the distance between a feature vector and the cluster codebooks.

The above algorithm performs well in detecting anomalous events, but the following problems remain:

1. The algorithm describes the motion in video with MHOF features. Although hand-crafted features such as HOF and HOG describe motion fairly well, their applicability differs across video scenes, and changing the scene often requires changing the feature at the same time, so the algorithm is poorly suited to multi-scene anomalous event detection.

2. In the merging of dictionary set vectors, the algorithm uses simple weighted addition. After a large number of vector updates, the values of the feature vectors in the dictionary set drift away from their original values, which affects the final detection.

3. Low-frequency anomalous events are detected by counting the occurrence frequency of each vector in the dictionary set and computing the frequency ratio of the corresponding codebook. However, the feature extraction stage densely samples the entire image, so when the video scene is sparse, most of the sampled feature vectors are background information. The frequency counts of the vectors representing background information in the dictionary set then become very large, the corresponding codebook frequency ratios become too high, and the frequencies of the other motion events fall below the judgment threshold, causing false detections.
Invention content
In view of the above problems, the invention discloses a surveillance video anomalous event detection method based on deep learning and dynamic clustering. The method automatically extracts deep features from video sampling blocks with PCANet, screens the sampling blocks for moving regions, and performs cluster modeling of the feature set with a two-layer clustering model based on vector merging.
The technical scheme adopted by the present invention to solve this technical problem comprises the following steps:
Step S101: Image preprocessing. Read the surveillance video stream as input, convert it to grayscale, and perform noise reduction with Gaussian filtering.

Step S102: Overlap sampling. For the input video stream, first compute the optical flow value of every pixel in every frame and replace each gray value with the pixel's optical flow value; then perform fixed-size overlap sampling on I and output a series of N × N video sampling image blocks.

Step S103: Motion region screening. For all the sampled video image blocks, first use the bimodal histogram method to obtain the division threshold that separates moving pixels from background pixels in the image, then judge each sampled image block according to this threshold, keep the blocks that contain motion events, and discard the blocks that contain only background information.

Step S104: Deep feature extraction. After obtaining the sampled image blocks that contain only motion information, input these blocks into a 3-layer PCANet for parameter training; once the deep network is trained, input the image blocks into the trained network, which outputs a corresponding deep feature for every sampled image block.

Step S105: Dynamic clustering modeling. For the set of deep feature vectors, input the feature vectors one by one into a fixed-size dictionary set; if the number of elements exceeds the upper bound, merge the two closest feature vectors so that the total stays constant. After this maintenance, cluster the dictionary set with the K-means algorithm and output the corresponding event cluster codebooks.

Step S106: After the model is constructed, input the test video, sample every frame of the test video and perform motion region judgment, then input the sampled images into the trained PCANet to obtain the corresponding deep features, and finally compare each feature vector with the event cluster codebooks; if its distance to every codebook exceeds the respective threshold, it is judged to be an anomalous event.
Beneficial effects of the present invention:

1. The present invention extracts deep features from the sampling blocks with a deep learning network. Compared with traditional hand-crafted features, deep features are more robust to the video scene, and there is no need to spend time on feature selection experiments for a particular scene to decide which feature should describe its motion.

2. In the model construction stage, when maintaining the fixed-size dictionary set, the present invention replaces simple weighted addition with a method that merges two vectors toward each other, which effectively avoids the feature vector drift caused by addition and improves the anomalous event detection rate.

3. The present invention adds a motion region screening step that discards useless background information before feature extraction, so that only sampling blocks with significant motion are processed subsequently. This not only increases the detection speed of the algorithm but also improves the anomalous event detection rate in sparse scenes.
Description of the drawings
Fig. 1 is the flow chart of anomalous event detection under surveillance video according to the present invention;
Fig. 2 is the schematic diagram of anomalous event detection under surveillance video according to the present invention;
Fig. 3 is the flow chart of overlap sampling;
Fig. 4 is the flow chart of motion region screening;
Fig. 5 is the flow chart of deep feature extraction;
Fig. 6 is the flow chart of dynamic clustering modeling;
Fig. 7 is the flow chart of anomalous event detection;
Fig. 8 is the position diagram of adjacent sampling blocks;
Fig. 9 shows the final results of the present invention.
Specific implementation mode
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings. As shown in Figs. 1-9, the specific steps are as follows:
Step S101: Image preprocessing.

Input the video stream Iin, convert Iin to grayscale and perform noise reduction with Gaussian filtering. The specific Gaussian filtering operation is as follows: convolve each pixel in a video frame with a 3 × 3 Gaussian convolution kernel, replace the value of the pixel at the center of the convolution with the weighted average gray value of the pixels in the neighborhood determined by the kernel, and output the processed video stream I.
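The preprocessing step above can be sketched as follows. This is a minimal numpy illustration of a 3 × 3 Gaussian smoothing pass, not the patent's implementation; the function names and the choice of sigma and reflection padding are assumptions for the sketch.

```python
import numpy as np

def gaussian_kernel_3x3(sigma=1.0):
    """3x3 Gaussian kernel, normalized so the weights sum to 1."""
    ax = np.array([-1.0, 0.0, 1.0])
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def denoise(gray, sigma=1.0):
    """Replace each pixel by the Gaussian-weighted mean of its 3x3
    neighbourhood (edges handled by reflection padding, an assumption)."""
    k = gaussian_kernel_3x3(sigma)
    padded = np.pad(gray.astype(np.float64), 1, mode="reflect")
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * k)
    return out
```

A constant image passes through unchanged, which is a quick sanity check that the kernel is properly normalized.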
Step S102: Overlap sampling.

Input the processed video stream I. First compute the optical flow value of every pixel of every frame in I and replace each gray value with the pixel's optical flow value, then perform fixed-size overlap sampling on I and output the set Cell of video sampling image blocks of identical, fixed size. Referring to Fig. 3, the detailed process is as follows:
Step S301: Fit the previous video frame. Input the former of two adjacent image frames in I. For the former of the two adjacent video frames, the neighborhood of every pixel in the frame is approximated with a quadratic polynomial

f1(x) = x^T A1 x + b1^T x + c1

where A1 is a symmetric matrix, b1 is a vector and c1 is a scalar; their values are obtained by weighted least-squares fitting. Output the fitted polynomial f1(x) of the frame.
Step S302: Fit the next video frame. Input the latter of the two adjacent image frames in I and approximate the latter frame in the same way:

f2(x) = x^T A2 x + b2^T x + c2

The polynomial parameters are again obtained by weighted least squares; output the fitted polynomial f2(x) of the frame.
Step S303: Joint solution of the two expressions. Input the fitted polynomials f1(x) and f2(x) of the two adjacent frames. Since the two polynomials represent two adjacent, consecutive frames of the video, they are related by the motion between the frames. Let the displacement of a pixel between the two frames be d; then f2(x) = f1(x - d), which gives

A2 = A1
b2 = b1 - 2 A1 d
c2 = d^T A1 d - b1^T d + c1

Defining the displacement d as a function of x, the corresponding A and Δb are defined as

A(x) = (A1(x) + A2(x)) / 2
Δb(x) = -(1/2)(b2(x) - b1(x))

and the displacement of pixel x is obtained as

d(x) = A^-1(x) Δb(x)

Output the displacement d(x) of every pixel of the previous frame.
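The displacement solve in step S303 can be checked with a small numeric example. The sketch below assumes a pure translation between the two polynomial fits (so A2 = A1 and b2 = b1 - 2·A1·d hold exactly); the concrete matrix values are illustrative only.

```python
import numpy as np

A1 = np.array([[2.0, 0.5], [0.5, 1.5]])   # symmetric matrix from the frame-1 fit
b1 = np.array([1.0, -2.0])
d_true = np.array([0.3, -0.7])            # ground-truth displacement (illustrative)

# Under a pure translation the frame-2 fit satisfies A2 = A1, b2 = b1 - 2*A1*d.
A2 = A1.copy()
b2 = b1 - 2.0 * A1 @ d_true

A = 0.5 * (A1 + A2)            # pointwise average of the two fits
db = -0.5 * (b2 - b1)          # Δb
d = np.linalg.solve(A, db)     # d(x) = A^-1(x) Δb(x)
```

Solving the linear system recovers the displacement d exactly in this idealized case; in practice A and Δb are averaged over a neighborhood before the solve.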
Step S304: Replace pixel gray values. Input the video stream I and the displacement d(x) corresponding to every frame. After obtaining the optical flow value of every pixel of every frame in I, replace the original gray value of each pixel with its optical flow value, and output the resulting video stream Iout.
Step S305: Overlap sampling. Input the replaced video stream Iout. Starting from the first pixel of the first frame, sample repeatedly with window size N × N and overlap ratio θ, and output the set Cell of video sampling image blocks of identical, fixed size. N is the sample size in the spatial dimension and depends on the image size; normally N = 24 and θ = 0.5, i.e. with these parameters a sample is taken every 12 pixels in the spatial dimension.
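The overlap sampling of step S305 amounts to a sliding window with stride N·(1 - θ). A minimal numpy sketch, with function names chosen for illustration:

```python
import numpy as np

def overlap_sample(frame, n=24, overlap=0.5):
    """Slide an n-by-n window with stride n*(1-overlap) over one frame
    and collect the blocks (stride 12 for n=24, overlap=0.5)."""
    stride = int(n * (1.0 - overlap))
    h, w = frame.shape
    blocks = []
    for y in range(0, h - n + 1, stride):
        for x in range(0, w - n + 1, stride):
            blocks.append(frame[y:y + n, x:x + n])
    return blocks
```

For a 48 × 48 frame with the default parameters, the window starts at offsets 0, 12 and 24 in each dimension, yielding 9 blocks.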
Step S103: Motion region screening.

After step S102, this stage takes as input the set Cell of video sampling image blocks of identical, fixed size. Because the sampling is a global overlapping sampling, some blocks contain only background information and no motion information at all, so at this stage the sampling blocks are screened: blocks that contain only background information are discarded, and the set Cellout of blocks containing motion information is output. Referring to Fig. 4, the detailed process is as follows:
Step S401: Set the division threshold. Input the sampled image block set Cell. Perform bimodal histogram statistics on the optical flow vector values of all pixels in all blocks of the set: starting from 0, take one bin every δ, count the optical flow values of all pixels into the corresponding bins according to their magnitude, and obtain the statistical histogram; normally δ = 0.025.

After the counting is finished, first scan the histogram from small to large to find the position of the first crest, then scan from large to small to find the position of the second crest, and finally find the position of the trough between the two crests. The median of the bin corresponding to the trough is taken as the division threshold ξ, which is output.
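The bimodal threshold selection above can be sketched as follows. This is an illustrative numpy implementation that assumes the histogram is cleanly bimodal (one crest at each end); the peak test and tie-breaking rules are assumptions, not the patent's exact procedure.

```python
import numpy as np

def bimodal_threshold(flow_mags, delta=0.025):
    """Histogram the flow magnitudes in bins of width delta, locate the two
    crests (scanning up, then down), and return the centre of the trough bin
    between them as the division threshold."""
    nbins = int(np.ceil(flow_mags.max() / delta)) + 1
    hist, edges = np.histogram(flow_mags, bins=nbins, range=(0.0, nbins * delta))

    def is_peak(i):
        left = hist[i - 1] if i > 0 else -1
        right = hist[i + 1] if i < len(hist) - 1 else -1
        return hist[i] > left and hist[i] > right

    p1 = next(i for i in range(len(hist)) if is_peak(i))            # first crest
    p2 = next(i for i in range(len(hist) - 1, -1, -1) if is_peak(i))  # second crest
    trough = p1 + int(np.argmin(hist[p1:p2 + 1]))                   # lowest bin between them
    return 0.5 * (edges[trough] + edges[trough + 1])
```

On synthetic data with a large background mode near 0.02 and a small motion mode near 0.5, the returned threshold falls between the two modes.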
Step S402: Judge the moving region of each sampling block. Input the division threshold ξ and the sampled image block set Cell. With the division threshold available, screen each sampling block: if the optical flow magnitude of a pixel in the block is greater than ξ, the pixel is considered to represent a moving region and is defined as an active pixel. If the proportion of active pixels in the whole block exceeds P, the block is considered to represent a moving region; otherwise it is regarded as a background block and discarded. Normally P = 20%. Finally output the set Cellout of sampling blocks containing motion information.
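The per-block screening rule of step S402 reduces to a simple active-pixel ratio test. A minimal sketch under the parameters stated above (the function name is illustrative):

```python
import numpy as np

def is_motion_block(block_mags, xi, p=0.20):
    """A block is kept when more than a fraction p of its pixels are 'active',
    i.e. have optical flow magnitude above the division threshold xi."""
    active = np.count_nonzero(block_mags > xi)
    return active / block_mags.size > p
```

For a 24 × 24 block, the block is kept when more than about 115 of its 576 pixels exceed ξ.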
Step S104: Deep feature extraction.

After the processing of step S103, the remaining sampling block images all contain motion events. This stage takes as input the set Cellout of sampling blocks containing motion information. A 3-layer deep learning network PCANet is first trained with these sampled images; the trained deep network is then used to extract the deep features of the corresponding sampled images. The trained network model Net and the feature set v of the sampling block set are output. Referring to Fig. 5, the detailed process is as follows:
Step S501: First-layer network learning. Input the sampled image block set Cellout. The first layer of the deep network has L1 filters that filter the input images. For a sampled image of size N × N, first perform dense sampling with patch size k1 × k2 (normally k1 = k2 = 5) and rearrange each patch into a column vector xi; for all video sampling blocks this yields a sample vector matrix X.

Then perform principal component analysis on the matrix X, take the eigenvectors corresponding to the L1 largest eigenvalues as filters, and rearrange them into matrices of size k1 × k2. Each input image is filtered with each filter, so every input sampled image is converted into L1 filtered images; normally L1 = 4. Output the filtered images Il corresponding to the sampled images.
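One PCANet layer's filter learning, as described in step S501, can be sketched with numpy. This is an illustrative simplification: per-patch mean removal and the eigendecomposition of X·X^T follow the standard PCANet recipe, but details such as boundary handling are assumptions.

```python
import numpy as np

def learn_pca_filters(images, k=5, n_filters=4):
    """Learn one PCANet layer: densely sample k-by-k patches from every image,
    remove each patch's mean, and keep the leading principal components
    (eigenvectors of X @ X.T) as k-by-k filters."""
    patches = []
    for img in images:
        h, w = img.shape
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                p = img[i:i + k, j:j + k].ravel()
                patches.append(p - p.mean())          # per-patch mean removal
    X = np.array(patches).T                           # (k*k) x num_patches
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)        # ascending eigenvalues
    top = eigvecs[:, -n_filters:][:, ::-1]            # largest first
    return [top[:, f].reshape(k, k) for f in range(n_filters)]
```

Because they are eigenvectors of a symmetric matrix, the learned filters are orthonormal when flattened, which is a convenient property to verify.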
Step S502: Second-layer network learning. Input the first-layer filtered images Il. The second layer of the network has L2 filters, normally L2 = 4. As in step S501, first perform dense sampling with patch size k1 × k2 on all images in the second layer and arrange the patches into a sample vector matrix X; then perform principal component analysis on this matrix, take the eigenvectors corresponding to the L2 largest eigenvalues as filters, and use them to filter the images.

Since each input optical flow image produces L1 filtered images after the first layer, an image that has passed through the first two layers of the deep network is output as L1 × L2 filtered images Ol, together with the trained deep network Net, where each Ol contains L2 filtered images.
Step S503: Deep feature output. Input the second-layer filtered images Ol. The third layer is the output layer of the network. First binarize the filtered images output by the second layer so that the result contains only ones and zeros. Each image group Ol can then be converted into an integer matrix Tl:

Tl = Σ (l2 = 1 … L2) 2^(l2-1) H(I_l2)

where H(*) is a unit step function: H(x) = 1 if x > 0, and 0 otherwise.

By this processing, every pixel is encoded into an integer in [0, 16). After obtaining the integer matrix Tl, perform histogram statistics on the matrix to obtain a 16-dimensional statistical vector.

For all L1 image groups Ol, L1 statistical vectors are obtained; cascading these statistical vectors gives the output deep feature vector of dimension 2^L2 × L1 = 64.
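The binary hashing and histogram stage of step S503 can be sketched directly from the formula for Tl. The function name and input layout (a list of L1 groups of L2 response maps) are assumptions for the sketch.

```python
import numpy as np

def pcanet_output_features(filter_maps):
    """filter_maps: list of L1 groups, each a list of L2 response maps of one
    image. Binarise each map with a unit step H(*), pack the L2 bits into an
    integer image with weights 2^(l2-1), histogram the integers, and
    concatenate the histograms into one feature vector."""
    features = []
    for group in filter_maps:
        l2 = len(group)
        t = np.zeros_like(group[0], dtype=np.int64)
        for bit, fmap in enumerate(group):
            t += (fmap > 0).astype(np.int64) << bit   # H(*) then weight 2^bit
        hist = np.bincount(t.ravel(), minlength=2 ** l2)
        features.append(hist)
    return np.concatenate(features)                   # length L1 * 2^L2
```

With L1 = L2 = 4 the feature has 4 × 16 = 64 dimensions, and each 16-bin segment sums to the number of pixels in one map.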
Step S105: Dynamic clustering modeling.

The deep features of all sampled images have been obtained from step S104. This stage takes as input the set v of deep feature vectors of the sampled image blocks, models it with a two-layer clustering model, and outputs the event cluster codebooks c of the deep feature set together with the maximum intra-class distance d of each codebook. Referring to Fig. 6, the detailed process is as follows:
Step S601: Dictionary set initialization. First define an empty dictionary set of fixed size N, then add the deep feature vectors of all sampling blocks to this dictionary set one by one, keeping a count ω(v) for each vector v in the set; normally N = 200.
Step S602: Add feature vectors one by one. Input the deep feature vector set v and add the feature vectors in v to the dictionary set in turn. During addition, for each newly added feature vector: if the number of vectors in the dictionary set after the addition is ≤ N, it is added directly and its count is initialized to ω(v) = 1; if the number is N + 1, the vectors in the dictionary set must be merged so that the total stays at N.
Step S603: Vector merging. Input the dictionary set to be merged. When a merge is needed, the two vectors va = [x1a, x2a, …, xna] and vb = [x1b, x2b, …, xnb] with the smallest Euclidean distance in the dictionary set are chosen and merged. In the merging process, the vector with the smaller ω(*) value is merged into the vector with the larger ω(*) value; assume here that ω(va) ≥ ω(vb), so vb is merged into va.

For every dimension of the vectors to be merged, the values of the two vectors in that dimension are compared and the merge is carried out according to their magnitudes. If the new vector is v = [x1, x2, …, xn], then

xi = (1 - α) xia + α × sign(xia, xib) × xib

and the count value ω(v) of the new merged vector is

ω(v) = ω(va) + ω(vb)

After the merge the total remains N, and the dictionary set is output.
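The dictionary maintenance of step S603 can be sketched as follows. Note the hedge: the patent's per-dimension rule uses a sign(·,·) term and a coefficient α that the translation leaves underspecified, so the sketch substitutes a count-weighted average of the two closest vectors, a plain simplification that still folds the lower-count vector into the higher-count one and adds the counts.

```python
import numpy as np

def merge_closest(dictionary, counts):
    """Merge the two closest vectors in the dictionary set: the lower-count
    vector is absorbed into the higher-count one (count-weighted blend, a
    simplification of the patent's per-dimension rule), and the counts add."""
    n = len(dictionary)
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(dictionary[i] - dictionary[j])
            if best is None or dist < best[0]:
                best = (dist, i, j)
    _, i, j = best
    if counts[i] < counts[j]:
        i, j = j, i                                   # i keeps, j is absorbed
    alpha = counts[j] / (counts[i] + counts[j])
    dictionary[i] = (1 - alpha) * dictionary[i] + alpha * dictionary[j]
    counts[i] += counts[j]
    del dictionary[j]
    del counts[j]
    return dictionary, counts
```

After the merge the set is one vector smaller, keeping the total at the bound N.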
Step S604: Codebook clustering. Input the dictionary set after maintenance is complete. After all deep feature vectors have been added to the dictionary set in turn, only N vectors remain after merging. These N vectors are then clustered with the K-means algorithm into k event cluster codebooks, each class representing one kind of motion event in the video. The cluster center of each event class and the maximum distance d between vectors within the class are recorded and output; here k = 16.
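The codebook construction of step S604 can be sketched with a bare-bones K-means. This is an illustrative numpy implementation (deterministic first-k initialization, fixed iteration count are assumptions), returning each cluster centre together with the maximum within-cluster distance d used later as the anomaly threshold.

```python
import numpy as np

def build_codebook(vectors, k=2, iters=20):
    """Cluster the dictionary vectors with K-means; record each event cluster's
    centre and the maximum within-cluster distance d."""
    centers = vectors[:k].astype(float).copy()        # simple deterministic init
    labels = np.zeros(len(vectors), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = vectors[labels == c].mean(axis=0)
    d_max = np.array([
        np.linalg.norm(vectors[labels == c] - centers[c], axis=1).max()
        if np.any(labels == c) else 0.0
        for c in range(k)
    ])
    return centers, d_max, labels
```

On two well-separated blobs, the centres converge to the blob means and d_max stays small, matching the role of d as a tight per-cluster radius.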
Step S106: Anomalous event detection.

The training data set input to the algorithm has been converted into the corresponding model by step S105, generating the event cluster codebooks, each of which represents one kind of motion event in the training video. At this stage the algorithm performs anomalous event detection on the input test video and outputs the video stream with detection marks. Referring to Fig. 7, the detailed process is as follows:
Step S701: Compute motion event occurrence probabilities. In step S105, K-means clustering yields the center vector of each event cluster codebook and the maximum intra-class distance of the cluster. For each center vector ci, the ω(*) value of the event cluster is defined as the sum of the ω(*) values of all vectors belonging to that class.

After obtaining the count value ω(*) of each event cluster, the count value is converted into the corresponding occurrence probability p(ci), which indicates how often the motion event corresponding to that event cluster codebook occurs in the training video.
Step S702: Test video feature extraction. After the probabilities have been computed, the input test video is first preprocessed according to step S101; it is then sampled according to step S102 to obtain a series of sampling blocks; motion region screening is then carried out according to the method of step S103, discarding the blocks that contain only background information so that only the blocks containing motion events undergo anomaly judgment. After screening, the sampling block images containing motion information are input into the trained PCANet network, which generates the corresponding deep feature vectors; the test feature vectors are output.
Step S703: Anomalous event detection. Input the test feature vectors. Once the deep feature vector of a test sampling block has been obtained, it is judged for anomaly. Any test feature vector v is compared one by one with the center vectors ci of all event clusters: if the Euclidean distance between v and some center vector ci is smaller than the corresponding maximum intra-class distance di, the motion corresponding to the sampling block is considered normal and the process goes to step S705; if the Euclidean distances between v and all ci exceed the respective di, the block is judged anomalous and the process goes to step S704.
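The anomaly rule of step S703 is a one-line distance test against the codebook. A minimal sketch (names illustrative):

```python
import numpy as np

def is_anomalous(v, centers, d_max):
    """A test vector is anomalous when it is farther than d_i from every
    event cluster centre c_i."""
    dists = np.linalg.norm(centers - v, axis=1)
    return bool(np.all(dists > d_max))
```

A vector inside any cluster's radius is normal; only a vector outside all radii is flagged.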
Step S704: Secondary detection. Input the sampling blocks judged anomalous. For those video image sampling blocks determined to be anomalous, a secondary detection is carried out to eliminate the interference of noise with the detection. For each anomalous sampling block, the sampling blocks adjacent to it in the spatial and temporal dimensions are examined (see Fig. 8); only if there are at least M anomalous sampling blocks around it is it confirmed as anomalous, otherwise the block is reclassified as normal. Normally M = 2.
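The secondary check of step S704 is a neighbor vote over the spatio-temporal neighborhood of Fig. 8. A minimal sketch, with the neighbor flags assumed to be gathered elsewhere:

```python
def confirm_anomaly(neighbor_flags, m=2):
    """Keep the anomaly only if at least m spatio-temporal neighbours of the
    block were also flagged anomalous; otherwise reclassify it as normal."""
    return sum(neighbor_flags) >= m
```

An isolated anomalous block with fewer than M anomalous neighbors is treated as noise and reverted to normal.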
Step S705: Online update. Input the test feature vectors. After the anomaly judgment is finished, the deep feature vectors of the test sampling blocks must be used to update the event cluster codebooks, so that the codebooks can gradually learn the newly emerging motion events in the detected video. For this purpose, the event cluster codebooks are updated again with the test vectors by the method of step S105.
Claims (6)
1. A surveillance video anomalous event detection method based on deep learning and dynamic clustering, which automatically extracts deep features from video sampling blocks with PCANet, screens the sampling blocks for moving regions, and performs cluster modeling of the feature set with a two-layer clustering model based on vector merging, characterized by comprising the following steps:

Step 1: Image preprocessing; read the surveillance video stream as input, convert it to grayscale and perform noise reduction with Gaussian filtering;

Step 2: Overlap sampling; for the input video stream, first compute the optical flow value of every pixel in every frame and replace each gray value with the pixel's optical flow value; then perform fixed-size overlap sampling on I and output a series of N × N video sampling image blocks;

Step 3: Motion region screening; for all the sampled video image blocks, first use the bimodal histogram method to obtain the division threshold separating moving pixels from background pixels in the image, then judge each sampled image block according to this threshold, keep the blocks containing motion events, and discard the blocks containing only background information;

Step 4: Deep feature extraction; after obtaining the sampled image blocks containing only motion information, input these video sampling image blocks into a 3-layer PCANet for parameter training; once the deep network is trained, input the image blocks into the trained network, which outputs a corresponding deep feature for every sampled image block;

Step 5: Dynamic clustering modeling; for the set of deep feature vectors, input the feature vectors one by one into a fixed-size dictionary set; if the number of elements exceeds the upper bound, merge the two closest feature vectors so that the total stays constant; after this maintenance, cluster the dictionary set with the K-means algorithm and output the corresponding event cluster codebooks;

Step 6: After the model is constructed, input the test video, sample every frame of the test video and perform motion region judgment, then input the sampled images into the trained PCANet to obtain the corresponding deep features, and finally compare each feature vector with the event cluster codebooks; if its distance to every codebook exceeds the respective threshold, it is judged to be an anomalous event.
2. the monitor video accident detection method according to claim 1 based on deep learning and dynamic clustering,
It is characterized in that the overlap sampling described in step 2, it is specific as follows:
Step 2-1:It is fitted previous frame video image;The former frame in two adjacent images frame in I is inputted, for adjacent continuous
Former frame in two video frame carrys out approximate express to each pixel neighborhood of a point in frame using a multinomial
Wherein A is symmetrical matrix, and b is vector, and c is scalar, and value can be fitted by weighted least-squares method and be acquired, defeated
Go out the polynomial fitting f to the frame image1(x);
Step 2-2:It is fitted latter frame video image;The a later frame in two adjacent images frame in I is inputted, in consecutive frame
A later frame carries out approximate expression with same method
And polynomial parameters are acquired by weighted least-squares method, export the polynomial fitting f of the frame image2(x);
Step 2-3:Front and back expression formula association solves;Input the polynomial fitting f of adjacent two field pictures1(x) and f2(x), due to two
A polynomial repressentation is two continuous frames image adjacent in video image, so there is motion relevances between them, if
The displacement of pixel is d between two frames, then has
Wherein
A2=A1
b2=b1-2A1d
c2=dTA1d-b1 Td+c1
Displacement d is defined as the function about x again, corresponding A and b are defined as
The displacement that pixel x can be obtained is
D (x)=A-1(x)Δb(x)
Export the displacement d (x) of each pixel in previous frame image;
Step 2-4:Pixel gray value replacement. Input the video stream I and the displacement d(x) corresponding to each frame image. After the optical flow value of every pixel of every frame in the video stream I is obtained, for each pixel the original gray value is replaced with the optical flow value of that pixel; output the corresponding replaced video stream Iout;
Step 2-5:Overlap sampling. Input the replaced video stream Iout. Starting from the first pixel of the first frame image, repeated sampling of size N × N with overlap rate θ is carried out in turn, and the set Cell of video sampling image blocks of identical and fixed size is output; N is the sample size in the spatial dimension, whose value depends on the image size and normally takes N = 24, and the overlap rate θ = 0.5, i.e. with the above parameters the spatial dimension is sampled once every 12 pixels during the sampling process.
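The overlap sampling of step 2-5 reduces, per frame, to cutting the flow image into N × N blocks with stride N(1 - θ) = 12. A minimal numpy sketch; the function name and the assumption that the per-frame optical-flow magnitudes are already available as a 2-D array are mine, not the patent's:

```python
import numpy as np

def overlap_sample(flow_mag, n=24, theta=0.5):
    """Cut a per-pixel optical-flow image into n x n blocks with overlap
    rate theta, i.e. stride n * (1 - theta) (12 pixels for n=24, theta=0.5)."""
    stride = int(n * (1 - theta))
    h, w = flow_mag.shape
    blocks = []
    for y in range(0, h - n + 1, stride):
        for x in range(0, w - n + 1, stride):
            blocks.append(flow_mag[y:y + n, x:x + n])
    return blocks
```

With N = 24 and θ = 0.5, a 48 × 48 frame yields a 3 × 3 grid of nine blocks.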
3. The monitoring video abnormal event detection method based on deep learning and dynamic clustering according to claim 1, characterized in that the moving region screening described in step 3 is specifically as follows:
Step 3-1:Set the division threshold. Input the sampled image block set Cell. A bimodal histogram statistic is made over the optical flow vector values of all pixel points in all sampling blocks of the set: starting from 0, with every δ forming one interval, the optical flow values of all pixel points are counted into the corresponding interval according to their size, giving the corresponding statistical histogram, with δ = 0.025.
After the counting statistics are finished and the corresponding statistical histogram is obtained, the histogram is first scanned from small to large to find the position of the first wave crest, then scanned from large to small to find the position of the second wave crest, and finally the position of the wave trough between the two crests is found; taking the median of the statistical interval corresponding to the trough as the division threshold ξ, output the division threshold ξ;
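The crest/trough scan of step 3-1 can be sketched as follows; padding the histogram with zero bins so that boundary bins can count as crests is my own device, not part of the claim:

```python
import numpy as np

def bimodal_threshold(flow_values, delta=0.025):
    """Step 3-1 sketch: bin the flow magnitudes in intervals of width delta,
    find the first crest scanning upward and the second crest scanning
    downward, then return the midpoint of the trough bin between them."""
    edges = np.arange(0.0, np.max(flow_values) + 2 * delta, delta)
    hist, edges = np.histogram(flow_values, bins=edges)
    padded = np.concatenate([[0], hist, [0]])      # let edge bins be crests
    crests = [i - 1 for i in range(1, len(padded) - 1)
              if padded[i] >= padded[i - 1] and padded[i] > padded[i + 1]]
    p1, p2 = crests[0], crests[-1]                 # first and second wave crest
    trough = p1 + int(np.argmin(hist[p1:p2 + 1]))  # valley between the crests
    return 0.5 * (edges[trough] + edges[trough + 1])
```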
Step 3-2:Sampling block moving region judgment. Input the division threshold ξ and the sampled image block set Cell. After the division threshold is obtained, each sampling block is screened next: if the optical flow vector magnitude of a pixel in a block is greater than the threshold ξ, that pixel is considered to represent a moving region and is defined as an active pixel; if the proportion of active pixels in the entire sampling block exceeds P, the sampling block is considered to represent a moving region, otherwise it is regarded as a background sampling block and rejected, taking P = 20%; finally output the sampling block set Cellout containing motion information.
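Step 3-2 then keeps only the blocks whose active-pixel ratio exceeds P. A sketch under the same assumptions as above (each block is an array of per-pixel flow magnitudes; names are illustrative):

```python
import numpy as np

def screen_motion_blocks(blocks, xi, p=0.20):
    """Keep only blocks whose fraction of active pixels (flow magnitude > xi)
    exceeds p; the rest are treated as background and discarded (step 3-2)."""
    kept = []
    for block in blocks:
        active_ratio = np.mean(block > xi)
        if active_ratio > p:
            kept.append(block)
    return kept
```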
4. The monitoring video abnormal event detection method based on deep learning and dynamic clustering according to claim 1, characterized in that the depth feature extraction described in step 4 is specifically as follows:
Step 4-1:First-layer network learning. Input the sampled image block set Cellout. The first layer of the depth network is equipped with L1 filters to filter the input images. For a sampled image of size N × N, dense sampling of size k1 × k2 is first carried out on it, taking k1 = k2 = 5, and each sample is rearranged into a column vector xi; then, over all video sampling blocks, a sample vector matrix X is obtained.
Principal component analysis is then carried out on the matrix X, and the eigenvectors corresponding to the first L1 largest eigenvalues are taken as filters, each rearranged into a matrix of size k1 × k2. For each filter, the input image is filtered with it, so each input sampled image is converted into L1 filtered images Il = I * Wl^1, l = 1, 2, ..., L1, with normally L1 = 4; output the filtered images Il corresponding to the sampled images;
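The PCA filter learning of step 4-1 can be sketched as follows. Whether the patch mean is removed before the eigen-decomposition is not specified in the claim, so the centering step here is an assumption:

```python
import numpy as np

def learn_pca_filters(patches, l1=4, k=5):
    """Step 4-1 sketch: `patches` is an (M, k*k) matrix of vectorized dense
    samples; the eigenvectors belonging to the L1 largest eigenvalues of the
    patch covariance become the k x k filters."""
    x = patches - patches.mean(axis=0)        # center the patch vectors
    cov = x.T @ x
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    top = eigvecs[:, -l1:][:, ::-1]           # L1 leading eigenvectors
    return [top[:, i].reshape(k, k) for i in range(l1)]
```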
Step 4-2:Second-layer network learning. Input the first-layer filtered images Il. The second layer of the network is equipped with L2 filters, taking L2 = 4. In the second layer, dense sampling of size k1 × k2 together with vectorization is first carried out on all images to obtain the sample vector matrix X; then principal component analysis is carried out on this matrix, the eigenvectors corresponding to the first L2 largest eigenvalues are chosen as filters, and the images are filtered with them.
Since each input optical flow image yields L1 filtered images after the first layer, one image, after passing through the first two layers of the depth network, is output as L1 × L2 filtered images Ol = {Il * Wj^2, j = 1, 2, ..., L2}, l = 1, 2, ..., L1, together with the trained depth network Net, where each Ol contains the corresponding L2 filtered images;
Step 4-3:Depth feature output. Input the second-layer filtered images Ol, l = 1, 2, ..., L1. The third layer is the output layer of the network. For the filtered images output by the second layer, binarization is first carried out so that the result contains only positive integers and zero; each image collection Ol can be converted into an integer matrix Tl:

Tl = Σ_{j=1..L2} 2^(j-1) H(Il * Wj^2)

where H(*) is a unit-step-like function, H(x) = 1 for x > 0 and H(x) = 0 otherwise.
Through the above processing, each pixel is encoded into an integer in [0, 16). After the integer matrix Tl is obtained, histogram statistics are carried out on the matrix, giving a 16-dimensional histogram statistical vector.
For all L1 image collections Ol together, L1 statistical vectors are obtained; cascading these statistical vectors, a depth feature vector of dimension 2^L2 × L1 = 64 is output.
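The third-layer hashing and histogram stage of step 4-3 follows directly from the formula for Tl. A sketch; the list-of-arrays input layout is my assumption:

```python
import numpy as np

def pcanet_hash_features(responses, l2=4):
    """Step 4-3 sketch: `responses` is a list of L1 groups, each an array of
    shape (L2, H, W) of second-layer filter responses. Each group is
    binarized, packed into integers in [0, 2**l2) with weights 2**(j-1), and
    summarized by a 2**l2-bin histogram; the histograms are concatenated."""
    feats = []
    for group in responses:           # one group per first-layer filter
        weights = 2 ** np.arange(l2)  # 1, 2, 4, 8 for l2 = 4
        t = np.tensordot(weights, (group > 0).astype(int), axes=1)  # matrix Tl
        hist, _ = np.histogram(t, bins=2 ** l2, range=(0, 2 ** l2))
        feats.append(hist)
    return np.concatenate(feats)      # dimension L1 * 2**l2 (= 64 here)
```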
5. The monitoring video abnormal event detection method based on deep learning and dynamic clustering according to claim 1, characterized in that the dynamic clustering modeling described in step 5 is specifically as follows:
Step 5-1:Dictionary set initialization. An empty dictionary set whose size is fixed at N is first defined; the depth feature vectors of all sampling blocks are then added to this dictionary set one by one, and a count value ω(v) is maintained for each vector v in the dictionary set, with normally N = 200;
Step 5-2:Add feature vectors one by one. Input the depth feature vector set v; the feature vectors in v are added to the dictionary set in sequence. During addition, for each newly joined feature vector: if the number of vectors in the dictionary set after the addition is ≤ N, it is added directly, with the count value of the newly added vector being ω(v) = 1; if the number equals N + 1, the vectors in the dictionary set need to be merged so that the number of vectors in the dictionary set remains N;
Step 5-3:Vector merging. Input the dictionary set to be merged. When vector merging is required, the two vectors with the smallest Euclidean distance in the dictionary set, va = [x1a, x2a, ..., xna] and vb = [x1b, x2b, ..., xnb], are chosen to be merged. In the merging process, the vector with the smaller ω(*) value is merged into the vector with the larger ω(*) value; assuming here that ω(va) ≥ ω(vb), vector vb is merged into va.
For each dimension of the vectors to be merged, the values of the two vectors in that dimension are compared and merged according to their relative magnitudes; if the new vector is v = [x1, x2, ..., xn], then

xi = (1 - α)xia + α × sign(xia, xib) × xib

and in the merging process the count value ω(v) of the new merged vector is

ω(v) = ω(va) + ω(vb)

The dictionary set, whose total remains N after merging, is output;
Step 5-4:Code book clustering. Input the maintained dictionary set. After all depth feature vectors have been added to the dictionary set in sequence, only N vectors finally remain after merging. These N vectors are then clustered with the K-means algorithm into k event cluster code books, each class representing one kind of motion event in the video; the cluster center of each event class and the maximum distance d of the vectors within the class are recorded and output, taking k = 16.
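Steps 5-2 and 5-3 amount to maintaining a fixed-size dictionary with counts. The sketch below uses a plain weighted average as the merge rule, since the claim's sign(xia, xib) term and the value of α are not fully specified; the function name and α = 0.5 are my assumptions:

```python
import numpy as np

def add_to_dictionary(vectors, counts, new_vec, n_max=200, alpha=0.5):
    """Steps 5-2/5-3 sketch: append `new_vec`; if the dictionary would exceed
    `n_max`, merge the two closest vectors (Euclidean distance), folding the
    smaller-count vector into the larger-count one and summing their counts."""
    vectors.append(np.asarray(new_vec, dtype=float))
    counts.append(1)
    if len(vectors) <= n_max:
        return vectors, counts
    # find the closest pair of vectors
    best, best_d = (0, 1), np.inf
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            d = np.linalg.norm(vectors[i] - vectors[j])
            if d < best_d:
                best_d, best = d, (i, j)
    i, j = best
    if counts[i] < counts[j]:       # merge the smaller count into the larger
        i, j = j, i
    vectors[i] = (1 - alpha) * vectors[i] + alpha * vectors[j]  # simplified rule
    counts[i] = counts[i] + counts[j]
    del vectors[j], counts[j]
    return vectors, counts
```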
6. The monitoring video abnormal event detection method based on deep learning and dynamic clustering according to claim 1, characterized in that the abnormal event detection described in step 6 is specifically as follows:
Step 6-1:Calculate the motion event occurrence probability. In step 5, through K-means clustering, the center vector of each event cluster code book and the maximum intra-class distance of that event cluster can be obtained. For each center vector ci, the ω(*) value of the event cluster is defined as the sum of the ω(*) values of all vectors belonging to that class.
After the count value ω(*) of each event cluster is obtained, the count values are converted into the corresponding occurrence probabilities

p(ci) = ω(ci) / Σ_j ω(cj)

indicating with what probability the motion event corresponding to the event cluster code book occurs in the training video;
Step 6-2:Test video feature extraction. After the probabilities have been calculated, for the input test video, image preprocessing is first carried out according to step 1; sampling is then carried out according to step 2, obtaining a series of sampling blocks; moving region screening is further carried out according to step 3, weeding out those sampling blocks that contain only background information, so that abnormality judgment is made only on the sampling blocks containing motion events. After the screening is finished, the images of those sampling blocks containing motion information are input into the trained PCANet network, the corresponding depth feature vectors are generated with the trained PCANet network, and the corresponding test feature vectors are output;
Step 6-3:Abnormal event detection. Input the test feature vectors. After the depth feature vector of a test sampling block is obtained, abnormality judgment is then carried out on it: any test feature vector v is compared one by one with the center vectors ci of all event clusters. If the Euclidean distance between vector v and some center vector ci is smaller than the corresponding maximum intra-class distance di, the motion corresponding to the sampling block is considered normal, and go to step 6-5; if the Euclidean distances between vector v and all ci are greater than the respective di, it is determined to be abnormal, and go to step 6-4;
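The per-vector decision rule of step 6-3 is a nearest-codebook test; a minimal sketch (names are illustrative):

```python
import numpy as np

def is_abnormal(v, centers, max_dists):
    """Step 6-3 sketch: a test feature vector is abnormal only if its
    Euclidean distance to every event-cluster center exceeds that cluster's
    maximum intra-class distance d_i."""
    v = np.asarray(v, dtype=float)
    for c, d in zip(centers, max_dists):
        if np.linalg.norm(v - np.asarray(c, dtype=float)) < d:
            return False      # close enough to one learned event: normal
    return True
```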
Step 6-4:Secondary detection. Input the sampling blocks judged to be abnormal. Secondary detection is carried out on those video image sampling blocks that have been determined to be abnormal, in order to eliminate the interference of noise on the detection: for each abnormal sampling block, its adjacent sampling blocks in the spatial and temporal dimensions are examined, and only if it is simultaneously surrounded by M or more abnormal sampling blocks is it confirmed to be abnormal; otherwise the sampling block is re-classified as normal, with normally M = 2;
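The spatio-temporal vote of step 6-4 can be sketched on a grid of per-block abnormal flags. The (time, row, column) layout and the 3 × 3 × 3 neighbourhood are my assumptions about what "adjacent in space and time" means:

```python
import numpy as np

def secondary_detection(abnormal, m=2):
    """Step 6-4 sketch: `abnormal` is a boolean 3-D array (time, rows, cols)
    of first-pass abnormal flags; a block stays abnormal only if at least m
    of its spatio-temporal neighbours are also abnormal."""
    out = np.zeros_like(abnormal)
    t_n, r_n, c_n = abnormal.shape
    for t in range(t_n):
        for r in range(r_n):
            for c in range(c_n):
                if not abnormal[t, r, c]:
                    continue
                neigh = abnormal[max(t - 1, 0):t + 2,
                                 max(r - 1, 0):r + 2,
                                 max(c - 1, 0):c + 2]
                if neigh.sum() - 1 >= m:   # exclude the block itself
                    out[t, r, c] = True
    return out
```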
Step 6-5:Online updating. Input the test feature vectors. After the judgment ends, the depth feature vector of the test sampling block needs to be used to update the event cluster code books, so that along with detection the code books can gradually learn the newly appearing motion events in the video; for this purpose the test vectors are again used to update the event cluster code books with the method of step 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810320572.6A CN108805002B (en) | 2018-04-11 | 2018-04-11 | Monitoring video abnormal event detection method based on deep learning and dynamic clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108805002A true CN108805002A (en) | 2018-11-13 |
CN108805002B CN108805002B (en) | 2022-03-01 |
Family
ID=64094844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810320572.6A Active CN108805002B (en) | 2018-04-11 | 2018-04-11 | Monitoring video abnormal event detection method based on deep learning and dynamic clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108805002B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109460744A (en) * | 2018-11-26 | 2019-03-12 | 南京邮电大学 | A kind of video monitoring system based on deep learning |
CN110210530A (en) * | 2019-05-15 | 2019-09-06 | 杭州智尚云科信息技术有限公司 | Intelligent control method, device, equipment, system and storage medium based on machine vision |
CN110362713A (en) * | 2019-07-12 | 2019-10-22 | 四川长虹电子系统有限公司 | Video monitoring method for early warning and system based on Spark Streaming |
CN111614627A (en) * | 2020-04-27 | 2020-09-01 | 中国舰船研究设计中心 | SDN-oriented cross-plane cooperation DDOS detection and defense method and system |
CN111814644A (en) * | 2020-07-01 | 2020-10-23 | 重庆邮电大学 | Video abnormal event detection method based on disturbance visual interpretation |
CN112367292A (en) * | 2020-10-10 | 2021-02-12 | 浙江大学 | Encrypted flow anomaly detection method based on deep dictionary learning |
CN112866654A (en) * | 2021-03-11 | 2021-05-28 | 福建环宇通信息科技股份公司 | Intelligent video monitoring system |
CN113270200A (en) * | 2021-05-24 | 2021-08-17 | 平安科技(深圳)有限公司 | Abnormal patient identification method based on artificial intelligence and related equipment |
CN113706837A (en) * | 2021-07-09 | 2021-11-26 | 上海汽车集团股份有限公司 | Engine abnormal state detection method and device |
CN113836976A (en) * | 2020-06-23 | 2021-12-24 | 江苏翼视智能科技有限公司 | Method for detecting global abnormal event in surveillance video |
CN114092851A (en) * | 2021-10-12 | 2022-02-25 | 甘肃欧美亚信息科技有限公司 | Monitoring video abnormal event detection method based on time sequence action detection |
CN114205726A (en) * | 2021-09-01 | 2022-03-18 | 珠海市杰理科技股份有限公司 | Testing method and device of finished product earphone and earphone manufacturing system |
CN115345527A (en) * | 2022-10-18 | 2022-11-15 | 成都西交智汇大数据科技有限公司 | Chemical experiment abnormal operation detection method, device, equipment and readable storage medium |
CN115492493A (en) * | 2022-07-28 | 2022-12-20 | 重庆长安汽车股份有限公司 | Tail gate control method, device, equipment and medium |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006012174A (en) * | 2004-06-28 | 2006-01-12 | Mitsubishi Electric Research Laboratories Inc | Method for detecting abnormal event in video |
US20090016610A1 (en) * | 2007-07-09 | 2009-01-15 | Honeywell International Inc. | Methods of Using Motion-Texture Analysis to Perform Activity Recognition and Detect Abnormal Patterns of Activities |
CN101872418A (en) * | 2010-05-28 | 2010-10-27 | 电子科技大学 | Detection method based on group environment abnormal behavior |
US20120314064A1 (en) * | 2011-06-13 | 2012-12-13 | Sony Corporation | Abnormal behavior detecting apparatus and method thereof, and video monitoring system |
CN103390278A (en) * | 2013-07-23 | 2013-11-13 | 中国科学技术大学 | Detecting system for video aberrant behavior |
CN104123544A (en) * | 2014-07-23 | 2014-10-29 | 通号通信信息集团有限公司 | Video analysis based abnormal behavior detection method and system |
CN105354542A (en) * | 2015-10-27 | 2016-02-24 | 杭州电子科技大学 | Method for detecting abnormal video event in crowded scene |
CN105608446A (en) * | 2016-02-02 | 2016-05-25 | 北京大学深圳研究生院 | Video stream abnormal event detection method and apparatus |
CN105787472A (en) * | 2016-03-28 | 2016-07-20 | 电子科技大学 | Abnormal behavior detection method based on time-space Laplacian Eigenmaps learning |
CN105913002A (en) * | 2016-04-07 | 2016-08-31 | 杭州电子科技大学 | On-line adaptive abnormal event detection method under video scene |
CN106228149A (en) * | 2016-08-04 | 2016-12-14 | 杭州电子科技大学 | A kind of video anomaly detection method |
CN106384092A (en) * | 2016-09-11 | 2017-02-08 | 杭州电子科技大学 | Online low-rank abnormal video event detection method for monitoring scene |
CN106980829A (en) * | 2017-03-17 | 2017-07-25 | 苏州大学 | Abnormal behaviour automatic testing method of fighting based on video analysis |
CN107590427A (en) * | 2017-05-25 | 2018-01-16 | 杭州电子科技大学 | Monitor video accident detection method based on space-time interest points noise reduction |
CN107729799A (en) * | 2017-06-13 | 2018-02-23 | 银江股份有限公司 | Crowd's abnormal behaviour vision-based detection and analyzing and alarming system based on depth convolutional neural networks |
Non-Patent Citations (5)
Title |
---|
GNANAVEL, VK et al.: "Abnormal Event Detection in Crowded Video Scenes", 《ADVANCES IN INTELLIGENT SYSTEMS AND COMPUTING》 *
NAJLA BOUARADA et al.: "Abnormal Events Detection Based on Trajectory Clustering", 《IEEE》 *
WANG JUN et al.: "Abnormal behavior detection based on deep learning features", 《Journal of Hunan University (Natural Sciences)》 *
GAI JIE et al.: "Global abnormal event detection method combining multiple attributes in video", 《Journal of Hangzhou Dianzi University (Natural Sciences)》 *
CHENG YANYUN et al.: "Local abnormal behavior detection based on video image block models", 《Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)》 *
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109460744A (en) * | 2018-11-26 | 2019-03-12 | 南京邮电大学 | A kind of video monitoring system based on deep learning |
CN109460744B (en) * | 2018-11-26 | 2021-08-27 | 南京邮电大学 | Video monitoring system based on deep learning |
CN110210530A (en) * | 2019-05-15 | 2019-09-06 | 杭州智尚云科信息技术有限公司 | Intelligent control method, device, equipment, system and storage medium based on machine vision |
CN110362713A (en) * | 2019-07-12 | 2019-10-22 | 四川长虹电子系统有限公司 | Video monitoring method for early warning and system based on Spark Streaming |
CN110362713B (en) * | 2019-07-12 | 2023-06-06 | 四川长虹云数信息技术有限公司 | Video monitoring and early warning method and system based on Spark Streaming |
CN111614627A (en) * | 2020-04-27 | 2020-09-01 | 中国舰船研究设计中心 | SDN-oriented cross-plane cooperation DDOS detection and defense method and system |
CN111614627B (en) * | 2020-04-27 | 2022-03-25 | 中国舰船研究设计中心 | SDN-oriented cross-plane cooperation DDOS detection and defense method and system |
CN113836976A (en) * | 2020-06-23 | 2021-12-24 | 江苏翼视智能科技有限公司 | Method for detecting global abnormal event in surveillance video |
CN111814644A (en) * | 2020-07-01 | 2020-10-23 | 重庆邮电大学 | Video abnormal event detection method based on disturbance visual interpretation |
CN111814644B (en) * | 2020-07-01 | 2022-05-03 | 重庆邮电大学 | Video abnormal event detection method based on disturbance visual interpretation |
CN112367292A (en) * | 2020-10-10 | 2021-02-12 | 浙江大学 | Encrypted flow anomaly detection method based on deep dictionary learning |
CN112866654B (en) * | 2021-03-11 | 2023-02-28 | 福建环宇通信息科技股份公司 | Intelligent video monitoring system |
CN112866654A (en) * | 2021-03-11 | 2021-05-28 | 福建环宇通信息科技股份公司 | Intelligent video monitoring system |
CN113270200A (en) * | 2021-05-24 | 2021-08-17 | 平安科技(深圳)有限公司 | Abnormal patient identification method based on artificial intelligence and related equipment |
CN113270200B (en) * | 2021-05-24 | 2022-12-27 | 平安科技(深圳)有限公司 | Abnormal patient identification method based on artificial intelligence and related equipment |
CN113706837A (en) * | 2021-07-09 | 2021-11-26 | 上海汽车集团股份有限公司 | Engine abnormal state detection method and device |
CN113706837B (en) * | 2021-07-09 | 2022-12-06 | 上海汽车集团股份有限公司 | Engine abnormal state detection method and device |
CN114205726A (en) * | 2021-09-01 | 2022-03-18 | 珠海市杰理科技股份有限公司 | Testing method and device of finished product earphone and earphone manufacturing system |
CN114205726B (en) * | 2021-09-01 | 2024-04-12 | 珠海市杰理科技股份有限公司 | Method and device for testing finished earphone and earphone manufacturing system |
CN114092851A (en) * | 2021-10-12 | 2022-02-25 | 甘肃欧美亚信息科技有限公司 | Monitoring video abnormal event detection method based on time sequence action detection |
CN115492493A (en) * | 2022-07-28 | 2022-12-20 | 重庆长安汽车股份有限公司 | Tail gate control method, device, equipment and medium |
CN115345527A (en) * | 2022-10-18 | 2022-11-15 | 成都西交智汇大数据科技有限公司 | Chemical experiment abnormal operation detection method, device, equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108805002B (en) | 2022-03-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108805002A (en) | Monitor video accident detection method based on deep learning and dynamic clustering | |
Liznerski et al. | Explainable deep one-class classification | |
CN108764085B (en) | Crowd counting method based on generation of confrontation network | |
Medel et al. | Anomaly detection in video using predictive convolutional long short-term memory networks | |
CN110363131B (en) | Abnormal behavior detection method, system and medium based on human skeleton | |
CN109670528B (en) | Data expansion method facing pedestrian re-identification task and based on paired sample random occlusion strategy | |
CN105139004B (en) | Facial expression recognizing method based on video sequence | |
CN107506692A (en) | A kind of dense population based on deep learning counts and personnel's distribution estimation method | |
CN110188637A (en) | A kind of Activity recognition technical method based on deep learning | |
CN110213222A (en) | Network inbreak detection method based on machine learning | |
CN109543727B (en) | Semi-supervised anomaly detection method based on competitive reconstruction learning | |
Wang et al. | STNet: Scale tree network with multi-level auxiliator for crowd counting | |
Li et al. | A supervised clustering and classification algorithm for mining data with mixed variables | |
CN110826702A (en) | Abnormal event detection method for multitask deep network | |
CN111145145B (en) | Image surface defect detection method based on MobileNet | |
CN111079539A (en) | Video abnormal behavior detection method based on abnormal tracking | |
CN113850284B (en) | Multi-operation detection method based on multi-scale feature fusion and multi-branch prediction | |
CN116434069A (en) | Remote sensing image change detection method based on local-global transducer network | |
Hongmeng et al. | A detection method for deepfake hard compressed videos based on super-resolution reconstruction using CNN | |
Tambe et al. | Towards designing an automated classification of lymphoma subtypes using deep neural networks | |
CN113344110A (en) | Fuzzy image classification method based on super-resolution reconstruction | |
CN113761359A (en) | Data packet recommendation method and device, electronic equipment and storage medium | |
CN117237994B (en) | Method, device and system for counting personnel and detecting behaviors in oil and gas operation area | |
Chexia et al. | A Generalized Model for Crowd Violence Detection Focusing on Human Contour and Dynamic Features | |
Zhu et al. | Permutation-invariant tabular data synthesis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |