CN107832688B - Traffic mode and abnormal behavior detection method for traffic intersection video monitoring - Google Patents
- Publication number
- CN107832688B CN107832688B CN201711030491.4A CN201711030491A CN107832688B CN 107832688 B CN107832688 B CN 107832688B CN 201711030491 A CN201711030491 A CN 201711030491A CN 107832688 B CN107832688 B CN 107832688B
- Authority
- CN
- China
- Prior art keywords
- video
- distribution
- document
- topic
- traffic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items, of sport video content
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- H04N21/44008—Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
- H04N21/47205—End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
Abstract
The invention discloses a method for detecting traffic patterns and abnormal behaviors, comprising the following steps: A1, dividing a video of duration T seconds into short clips of Ts seconds each, yielding T/Ts video documents; A2, calculating the optical flow vectors between every pair of adjacent video frames of each video document; A3, quantizing the obtained optical flow vectors to obtain the video words of each pair of video frames; A4, counting the video words of each video document to obtain the document-word count matrix of the video document set; A5, extracting topics with a BNBP-PFA topic model to obtain the topic-word distribution and the document-topic distribution; A6, treating the obtained topic-word distribution and document-topic distribution as new words and new documents respectively, and obtaining the topics of a second-layer topic model with another BNBP-PFA topic model; and A7, detecting abnormal behaviors in the video frames based on the log-likelihood values of the two topic model layers.
Description
Technical Field
The invention relates to a method for detecting traffic patterns and abnormal behaviors, and in particular to such a method based on video surveillance of traffic intersections.
Background
With the development of machine vision and data mining technologies, it has become possible to automatically discover useful information in video surveillance data. Discovering regular traffic patterns or abnormal traffic behaviors from video of crowded pedestrian and vehicle flows at traffic intersections is a problem of substantial research value and application prospect that has not been fully solved. Such problems pose several challenges: 1) in complex and crowded traffic scenes, existing tracking-based computer vision methods often perform poorly; 2) a traffic pattern is independent of the low-level features in the video and reflects high-level semantic information, which concerns high-level vision in machine vision, i.e., the visual understanding problem. A large semantic gap often exists between low-level visual features and high-level visual semantics, so methods based on low-level visual feature detection, namely target detection and target tracking, cannot capture the high-level semantic information of the whole video. Specifically, for traffic pattern detection and abnormal traffic behavior detection at intersections, the dense pedestrian and vehicle flows make the scene susceptible to noise, illumination, weather changes, and complex background information, and methods based on target detection and target trajectory clustering often perform poorly.
To overcome the drawbacks of the above methods, another class of methods that acquires "events" or "activities" in a video scene directly from low-level motion information, such as optical flow, has become popular. These methods avoid tracking individual moving targets; they mainly exploit the rich local motion information between adjacent video frames, i.e., position and motion information from low-level features, and then use a dimension-reduction model (such as a topic model) to extract effective high-level semantic information from the high-dimensional features. Common topic models such as PLSA (Probabilistic Latent Semantic Analysis), LDA (Latent Dirichlet Allocation), and HDP (Hierarchical Dirichlet Process) were initially used for topic discovery in text corpora and later applied to image and video analysis tasks. Wang et al. [1] proposed modeling "events" and "behavior patterns" in complex and crowded video scenes with an unsupervised learning framework based on hierarchical Bayesian models. Song et al. [2] proposed a traffic pattern mining method with a two-layer LDA structure, which can find both simple and complex traffic patterns in traffic intersection video and detect abnormal traffic behaviors. Document [3] uses an FSTM (Fully Sparse Topic Model) to detect anomalies in traffic video. Document [4] proposes learning typical activity and traffic state information in traffic video using HDP and HDP-HMM, respectively, and classifying the traffic states with a Gaussian process. MCTM (Markov Clustering Topic Model) models the visual words, i.e., the low-level features of traffic video frames, with an LDA model, and models the temporal relationship between adjacent video frames with Markov chains; it can cluster the traffic visual features hierarchically into local and global traffic patterns.
WS-JTM (Weakly Supervised Joint Topic Model) is a weakly supervised joint topic model; building on LDA, it fully exploits the class labels of different video documents to mine typical and abnormal traffic patterns. Another class of methods is not based on topic models: it mainly uses modeling techniques such as matrix factorization and sparse dictionary learning to perform topic modeling on the low-level visual features and thereby obtain typical and abnormal traffic patterns.
Among these two classes of traffic pattern mining methods, those based on probabilistic topic models have difficulty obtaining sparse traffic patterns, i.e., the topic distributions of the video documents are not sparse. In addition, the learning and inference procedures of probabilistic topic models are complicated, making the resulting algorithms intricate and computationally expensive. Methods based on non-probabilistic topic models are widely used for finding typical and abnormal traffic patterns in traffic video because they can fully exploit the sparsity of the visual information, but the number of topics, i.e., traffic patterns, must be specified in advance, which lacks flexibility. To address these problems, the invention provides a traffic pattern analysis method with a two-layer structure: the first layer uses a BNBP-PFA (Beta Negative Binomial Process - Poisson Factor Analysis) topic model to extract topics, namely simple traffic patterns, and obtains the distribution of each video document over these topics; the second BNBP-PFA layer obtains the second-layer topics, namely the complex traffic patterns in the video, on the basis of the topics obtained by the first layer. Compared with the two-layer LDA model of document [2], the proposed method does not need to pre-specify the number of topics in each layer, thanks to the BNBP-PFA topic model. Compared with an LDA model, the BNBP is suited to sparse count data and is particularly suitable for the motion feature data of video; compared with the HDP topic model, the BNBP has a better structural form and computational flexibility.
Disclosure of the invention
The invention mainly aims to overcome the shortcomings of existing traffic pattern and abnormal behavior detection methods and provides a novel detection method based on a two-layer BNBP-PFA topic model. The method uses the two BNBP-PFA layers to detect simple and complex traffic patterns in traffic video simultaneously; compared with existing methods, it identifies more patterns, achieves higher accuracy, and learns the number of patterns automatically, obtaining a better detection effect. On this basis, an abnormal behavior detection method based on the log-likelihood values of the two-layer BNBP-PFA topic model is provided, whose detection effect is also better than that of existing methods.
The method provided by the invention involves the extraction of video optical flow features, the generation of video documents, the detection of simple and complex traffic patterns based on a two-layer BNBP-PFA topic model, and the detection of abnormal behaviors in traffic video. To solve these technical problems, the invention provides a method for detecting traffic patterns and abnormal behaviors from traffic intersection video surveillance, comprising the following steps:
A1. dividing, in temporal order, a long video of duration T seconds into short clips of Ts seconds each, with each clip treated as a video document, yielding a total of N = T/Ts video documents;
A2. for each video document, calculating the optical flow vectors between every pair of adjacent video frames of the video document;
A3. quantizing the optical flow vectors obtained in A2 to obtain the video words of each pair of video frames of each video document;
A4. counting the video words of each video document based on a bag-of-words model to obtain the document-word count matrix M of the video document set formed by the whole long video;
A5. performing topic extraction on the video documents obtained in A4 using a BNBP-PFA topic model to obtain the topic-word distribution and the document-topic distribution, where the obtained topics are the simple traffic patterns in the video;
A6. taking the topics obtained in A5 as new words and the document-topic distribution obtained in A5 as new documents, performing topic extraction with a BNBP-PFA topic model to obtain the topic-word distribution of the second-layer topic model, where the obtained topics are the complex traffic patterns in the video;
A7. detecting abnormal behaviors in the video frames based on the log-likelihood values of the two topic model layers, on the basis of the two-layer BNBP-PFA topic model obtained in A5 and A6.
The process of calculating the optical flow vectors of each pair of adjacent frames in step A2 specifically includes:
A21. for two adjacent consecutive video frames Ix and Iy, calculating the optical flow vector (vx(i,j), vy(i,j)) at each pixel (i,j);
A22. obtaining the intensity and direction information (M(i,j), D(i,j)) of the optical flow at each pixel according to the formulas M(i,j) = sqrt(vx(i,j)^2 + vy(i,j)^2) and D(i,j) = arctan(vy(i,j)/vx(i,j));
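The A22 formulas can be sketched in a few lines of NumPy; the two-argument arctan2 is an assumed choice here so the direction is defined in all quadrants (the patent's arctan form leaves this unspecified):

```python
import numpy as np

def flow_intensity_direction(vx, vy):
    """Per-pixel optical-flow intensity M and direction D (step A22):
    M(i,j) = sqrt(vx^2 + vy^2), D(i,j) = arctan(vy/vx), computed here
    with arctan2 for full-quadrant angles (an illustrative choice)."""
    M = np.sqrt(vx ** 2 + vy ** 2)
    D = np.arctan2(vy, vx)
    return M, D

vx = np.array([[3.0, 0.0]])
vy = np.array([[4.0, 1.0]])
M, D = flow_intensity_direction(vx, vy)  # M[0,0] = 5.0, D[0,1] = pi/2
```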
the process of quantizing the optical flow vector to obtain the video word in the step a3 specifically includes:
A31. dividing the position information of the optical flow: partitioning each video frame of size Nx × Ny into pixel blocks of size N1 × N1, obtaining [Nx/N1] × [Ny/N1] blocks in total (the symbol [·] denotes rounding to an integer; N1, Nx, Ny are all positive integers), and using the coordinate of the center point of each pixel block as the coordinate of the block;
A32. quantizing the optical flow intensity and direction: each pixel block contains N1^2 pixels, and the average optical flow of these N1^2 pixels is taken as the optical flow vector of the block. When the optical flow intensity of the block exceeds a preset threshold Th, the block is judged to be a motion block; otherwise it is a background block. The optical flow direction of each motion block is quantized into S directions;
A33. with the above quantization, the vocabulary size of the video document set is [Nx/N1] × [Ny/N1] × S.
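The block-and-direction quantization of steps A31-A33 can be sketched as follows; the function and parameter names are illustrative, not from the patent:

```python
import numpy as np

def quantize_to_words(vx, vy, block=8, thresh=0.05, S=4):
    """Map a dense flow field to visual-word ids (step A3, sketch).
    Each block x block region's mean flow is thresholded; a moving
    block emits word_id = block_index * S + direction_bin."""
    ny, nx = vx.shape
    by, bx = ny // block, nx // block
    words = []
    for r in range(by):
        for c in range(bx):
            mvx = vx[r*block:(r+1)*block, c*block:(c+1)*block].mean()
            mvy = vy[r*block:(r+1)*block, c*block:(c+1)*block].mean()
            if np.hypot(mvx, mvy) <= thresh:
                continue  # background block: no word emitted
            ang = np.arctan2(mvy, mvx) % (2 * np.pi)
            d = int(ang // (2 * np.pi / S)) % S
            words.append((r * bx + c) * S + d)
    return words

vx = np.zeros((16, 16)); vx[:8, :8] = 1.0  # one block moving to the right
vy = np.zeros((16, 16))
words = quantize_to_words(vx, vy)          # only block 0, direction bin 0
```

The vocabulary size is then the number of blocks times S, matching step A33.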
The step A5 specifically includes:
assume the document-word count matrix obtained in A4 is M ∈ R^(P×N), i.e., the count matrix contains the P features of the N documents. According to the BNBP-PFA topic modeling process m_ij = Σ_{k=1}^{K} m_ijk, m_ijk ~ Pois(φ_ik θ_kj), φ_k ~ Dir(α_φ, …, α_φ), θ_kj ~ Gamma(r_k, p_k/(1−p_k)), r_k ~ Gamma(c0 r0, 1/c0), p_k ~ Beta(cε, c(1−ε)), a topic distribution matrix Φ ∈ R^(P×K) of K topics and a composition matrix Θ ∈ R^(K×N) of the K topics over the N documents can be obtained, where the topic distribution matrix represents the distribution of the K topics over the P features.
The step A6 specifically includes:
treating the topics φ_ik obtained in step A5 as the words of step A6, and treating the document-topic distribution θ_kj of A5 as the documents of A6, considered to be composed of the topics of A5. According to the BNBP-PFA topic modeling process θ_kj = Σ_{k'=1}^{K'} θ_kjk', θ_kjk' ~ Pois(φ'_kk' θ'_k'j), φ'_k' ~ Dir(α'_φ', …, α'_φ'), θ'_k'j ~ Gamma(r'_k', p'_k'/(1−p'_k')), r'_k' ~ Gamma(c'0 r'0, 1/c'0), p'_k' ~ Beta(c'ε', c'(1−ε')), the K' topic-word distributions φ'_kk' and the document-topic distributions θ'_k'j of the second layer can be obtained.
The step A7 specifically includes:
A71. on the visual-word count matrix M of the documents in the whole video document set, randomly selecting 80% of the video documents to form the training set X, with the remaining 20% forming the test set Y = M − X;
A72. on the test set Y, computing, for each video frame y_pi, its normalized likelihood value F1 under the first-layer BNBP-PFA topic model according to the formula F1 = (1/N1) Σ_p y_pi log(Σ_k φ_pk θ_ki), where y_pi = Y(p,i), φ_pk is the topic distribution obtained by the first-layer BNBP-PFA topic model, θ_ki is the distribution coefficient of video frame y_pi on topic φ_pk, and N1 is the length of video frame y_pi, i.e., the number of visual words it contains;
A73. computing, for each video frame y_pi, the normalized likelihood value F2 of its distribution coefficients θ_ki under the second-layer BNBP-PFA topic model according to the formula F2 = (1/N2) Σ_k θ_ki log(Σ_k' φ'_kk' θ'_k'i), where θ_ki is the distribution coefficient of video frame y_pi on topic φ_pk, φ'_kk' is the topic distribution obtained by the second-layer BNBP-PFA topic model, θ'_k'i is the distribution coefficient of θ_ki on topic φ'_kk', and N2 is the length of the vector θ_ki, i.e., the number of topics φ_pk it contains;
A74. computing the weighted likelihood value of video frame y_pi under the two-layer BNBP-PFA topic model as F = η·F1 + (1−η)·F2, with parameter η ∈ (0, 1);
A75. comparing the likelihood value F computed in A74 with a given threshold Th1: if F < Th1, video frame y_pi contains abnormal behavior; otherwise it does not.
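Steps A72-A75 can be sketched as follows. The normalization of the likelihood is an assumption (the patent names F1 and F2 but the formula here is reproduced from the Poisson factorization likelihood), and the values of η and Th1 are placeholders:

```python
import numpy as np

def normalized_loglik(y, Phi, theta):
    """Per-frame normalized log-likelihood under one PFA layer (sketch):
    (1/N1) * sum_p y_p * log(sum_k phi_pk * theta_k), with N1 the number
    of visual words in the frame. Exact normalization is assumed."""
    rate = Phi @ theta
    return float((y * np.log(rate)).sum() / y.sum())

def is_abnormal(F1, F2, eta=0.5, th1=-5.0):
    """Steps A74/A75: F = eta*F1 + (1-eta)*F2; flag the frame as abnormal
    when F < Th1. The eta and Th1 values here are placeholders."""
    F = eta * F1 + (1 - eta) * F2
    return F < th1

Phi = np.array([[0.5], [0.5]])   # toy layer: P=2 words, K=1 topic
theta = np.array([2.0])
y = np.array([1.0, 1.0])         # a frame with N1 = 2 visual words
F1 = normalized_loglik(y, Phi, theta)
```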
The embodiment provided by the invention has the following beneficial effects:
the invention applies the topic model to understanding and analyzing the traffic video scene, and provides a method for detecting not only simple traffic modes and complex traffic modes in the traffic video scene, but also abnormal traffic behaviors. Compared with the existing method, the method of the invention has more discovered traffic modes and higher quality. In addition, because a non-parametric topic model is adopted, the method of the invention does not need to specify the number of topics in advance, which is very useful when processing some complex and unknown traffic video data. The method provided by the invention can be applied to the excavation of traffic modes and the detection of abnormal traffic behaviors in traffic videos, and has important significance for the development of the fields of intelligent traffic, traffic video monitoring and the like.
Drawings
FIG. 1 is a flow chart of a traffic pattern and abnormal behavior detection method for video surveillance at a traffic intersection according to an embodiment of the present invention;
FIG. 2 is a sample frame of a data set used in the present embodiment;
FIG. 3 is a diagram illustrating a first-level topic, i.e., a simple traffic pattern, found in a video data set by the method of the present invention according to this embodiment;
fig. 4 shows the 15 topics found in the video data set by the LDA method in this embodiment;
fig. 5 shows the 15 topics found in the video data set by the FTM method in this embodiment;
fig. 6 shows the 15 topics found in the video data set by the HDP method in this embodiment;
FIG. 7 is a diagram illustrating a second-level topic, i.e., a complex traffic pattern, found in a video data set according to the method of the present invention;
FIG. 8 shows 4 abnormal traffic behaviors detected on a video data set by the method of the present invention in this embodiment;
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
Fig. 1 is a flowchart of a traffic mode and abnormal behavior detection method for video monitoring at a traffic intersection according to an embodiment of the present invention. As shown in fig. 1, the workflow of the detection method for traffic patterns and abnormal behaviors in the embodiment includes the following steps:
A1: dividing, in temporal order, a long video of duration T seconds into short clips of Ts seconds each, with each clip treated as a video document, yielding N = T/Ts video documents.
In this step, the traffic intersection video data published in the QMUL Junction Dataset 2 (http://www.eecs.qmul.ac.uk/~tmh/downloads.html) is downloaded as the video data of this embodiment. The data set contains a 52-minute video of a busy urban traffic intersection, with a frame rate of 25 Hz and a frame size of 360 × 288 pixels. A sample frame of the data set is shown in fig. 2; it contains the 6 motion patterns listed in Table 1. The data set contains a total of 4 abnormal behaviors. The total length of the video data is 3120 seconds; taking video documents of length 12 seconds (300 video frames each), a total of 260 video documents are obtained, and the process then proceeds to step A2.
Table 1. Possible motion patterns contained in the QMUL Junction Dataset 2 data set of fig. 2 and their descriptions (the four directions are up, down, left, right)
Mode id | a | b | c | d | e | f
Description | Upward left turn | Upward | Downward | Downward right turn | Sidewalk, rightward | Sidewalk, leftward
A2: for each video document, the optical flow vectors between each pair of adjacent video frames are calculated.
In this step, for the 300 video frames contained in each video document, starting from the second frame in temporal order, the optical flow vector (vx(i,j), vy(i,j)) at each pixel (i,j) between frame Ix and the adjacent previous frame Iy is computed, where the standard Lucas-Kanade optical flow method is used. Then the intensity and direction matrices (M(i,j), D(i,j)) of the optical flow at each pixel (i,j) are calculated according to the formulas M(i,j) = sqrt(vx(i,j)^2 + vy(i,j)^2) and D(i,j) = arctan(vy(i,j)/vx(i,j)).
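The Lucas-Kanade computation can be sketched as a per-pixel least-squares problem over a small window of image gradients; this is a didactic, unoptimized version (in practice an optimized library implementation would be used):

```python
import numpy as np

def lucas_kanade(I1, I2, win=5):
    """Dense Lucas-Kanade sketch (step A2): per pixel, solve the 2x2
    normal equations A^T A v = -A^T it over a win x win window, where A
    stacks the spatial gradients of I1 and it is the temporal difference."""
    Ix = np.gradient(I1, axis=1)
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1
    h = win // 2
    vx = np.zeros_like(I1)
    vy = np.zeros_like(I1)
    H, W = I1.shape
    for r in range(h, H - h):
        for c in range(h, W - h):
            ix = Ix[r-h:r+h+1, c-h:c+h+1].ravel()
            iy = Iy[r-h:r+h+1, c-h:c+h+1].ravel()
            it = It[r-h:r+h+1, c-h:c+h+1].ravel()
            A = np.stack([ix, iy], axis=1)
            ATA = A.T @ A
            if np.linalg.det(ATA) > 1e-6:   # skip ill-conditioned windows
                vx[r, c], vy[r, c] = np.linalg.solve(ATA, -A.T @ it)
    return vx, vy

# a smooth pattern translated 1 pixel to the right between frames
r, c = np.mgrid[0:40, 0:40]
I1 = np.sin(c / 3.0) + np.cos(r / 4.0)
I2 = np.sin((c - 1) / 3.0) + np.cos(r / 4.0)
vx, vy = lucas_kanade(I1, I2)   # interior vx should be close to +1
```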
A3: quantizing the optical flow vectors (M(i,j), D(i,j)) obtained in A2 to obtain the video words of each pair of video frames of each video document.
In this step, the position information, intensity and direction of the optical flow are quantized respectively, and the method specifically includes three substeps:
1) dividing each video frame of size 360 × 288 pixels into 8 × 8 pixel blocks, obtaining 1620 blocks in total, and using the coordinate of the center point of each block as the coordinate of the block;
2) quantizing the optical flow intensity and direction: the average optical flow of the 64 pixels contained in each block is taken as the optical flow vector of the block. When the optical flow intensity of the block exceeds the preset threshold Th = 0.05, the block is judged to be a motion block; otherwise it is a background block. The optical flow direction of each motion block is quantized into 4 directions: up, down, left, and right;
3) with this quantization, the vocabulary size of the video document set is 6480.
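The vocabulary-size arithmetic of this embodiment can be checked directly:

```python
# A 360 x 288 frame divided into 8 x 8 blocks gives (360/8) * (288/8)
# = 45 * 36 = 1620 pixel blocks; with 4 quantized flow directions per
# block, the vocabulary holds 1620 * 4 = 6480 visual words.
blocks = (360 // 8) * (288 // 8)
vocabulary = blocks * 4
```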
A4. counting the video words of each video document based on a bag-of-words model to obtain the document-word count matrix M of size 260 × 6480 for the video document set formed by the whole video data set;
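The bag-of-words counting of step A4 can be sketched as follows (function and variable names are illustrative, not from the patent):

```python
import numpy as np

def document_word_counts(documents, vocab_size):
    """Build the document-word count matrix M (step A4): row d holds the
    occurrence count of each visual-word id in video document d."""
    M = np.zeros((len(documents), vocab_size), dtype=np.int64)
    for d, words in enumerate(documents):
        ids, counts = np.unique(np.asarray(words), return_counts=True)
        M[d, ids] = counts
    return M

docs = [[0, 2, 2, 5], [1, 1, 1]]   # toy word-id lists; real ids run to 6479
M = document_word_counts(docs, vocab_size=6)
```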
A5. performing topic extraction on the video document matrix M of size 260 × 6480 obtained in A4 using a BNBP-PFA topic model to obtain the topic-word distribution Φ and the document-topic distribution Θ; the obtained topics are the simple traffic patterns in the video.
The step A5 specifically includes:
Assume the document count matrix obtained in A4 is M ∈ R^(6480×260), i.e., the count matrix contains the 6480 features of the 260 documents. Following the BNBP-PFA topic modeling process m_ij = Σ_{k=1}^{K} m_ijk, m_ijk ~ Pois(φ_ik θ_kj), φ_k ~ Dir(α_φ, …, α_φ), θ_kj ~ Gamma(r_k, p_k/(1−p_k)), r_k ~ Gamma(c0 r0, 1/c0), p_k ~ Beta(cε, c(1−ε)), the probability distributions φ_k, θ_kj and the related parameters r_k and p_k can be obtained with the commonly used Markov chain Monte Carlo inference algorithm. The specific inference algorithm is as follows:
1) denote ζ_ijk = φ_ik θ_kj / Σ_{k=1}^{K} φ_ik θ_kj; the upper bound of the number of topics K follows from the beta process B(c, c(1−ε)), where P = 6480, N = 260, c = 1, γ = 1, α_φ = 1, ε = 0.05;
2) sample m_ijk according to the following formula (1):
[m_ij1, …, m_ijK] ~ Mult(m_ij; ζ_ij1, …, ζ_ijK)   (1)
3) using the relationship between the Poisson and multinomial distributions, and the relation m_·jk = Σ_{i=1}^{P} m_ijk, we have p([m_1jk, …, m_Pjk] | −) = Mult(m_·jk; φ_k); then φ_k can be sampled according to the following formula (2):
p(φ_k | −) ~ Dir(α_φ + m_1·k, …, α_φ + m_P·k)   (2)
4) after marginalizing φ_k and θ_kj, m_·jk ~ NB(r_k, p_k) and p_k ~ Beta(cε, c(1−ε)); then p_k can be sampled according to the following formula (3):
p(p_k | −) ~ Beta(cε + m_··k, c(1−ε) + N r_k)   (3)
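Sampling steps (1)-(3) can be sketched as one Gibbs sweep; this is a didactic simplification in which r_k is held fixed and Θ is not resampled, so it is not the patent's full inference procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_sweep(M, Phi, Theta, alpha=1.0, c=1.0, eps=0.05):
    """One illustrative sweep of the layer-1 sampler: allocate counts
    m_ijk multinomially with zeta proportional to phi_ik*theta_kj, then
    resample phi_k (Dirichlet) and p_k (Beta). r_k is fixed at 1 here."""
    P, N = M.shape
    K = Phi.shape[1]
    Mk = np.zeros((P, N, K))
    # step (1): [m_ij1..m_ijK] ~ Mult(m_ij; zeta_ij1..zeta_ijK)
    for i in range(P):
        for j in range(N):
            if M[i, j] == 0:
                continue
            zeta = Phi[i] * Theta[:, j]
            Mk[i, j] = rng.multinomial(M[i, j], zeta / zeta.sum())
    # step (2): phi_k | - ~ Dir(alpha + m_1.k, ..., alpha + m_P.k)
    Phi_new = np.stack([rng.dirichlet(alpha + Mk[:, :, k].sum(axis=1))
                        for k in range(K)], axis=1)
    # step (3): p_k | - ~ Beta(c*eps + m_..k, c*(1-eps) + N*r_k), r_k = 1
    r_k = np.ones(K)
    p_k = rng.beta(c * eps + Mk.sum(axis=(0, 1)), c * (1 - eps) + N * r_k)
    return Phi_new, p_k

P, N, K = 4, 3, 2                                 # toy sizes, not 6480 x 260
M = rng.poisson(2.0, size=(P, N))
Phi0 = rng.dirichlet(np.ones(P), size=K).T        # columns sum to 1
Theta0 = rng.gamma(1.0, 1.0, size=(K, N))
Phi1, p_k = gibbs_sweep(M, Phi0, Theta0)
```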
A topic distribution matrix Φ ∈ R^(P×K) of K = 15 topics and a composition matrix Θ ∈ R^(K×N) of the K = 15 topics over the N = 260 documents can thus be obtained, where the topic distribution matrix represents the distribution of the K = 15 topics over the P = 6480 features. The obtained topics, i.e., the simple traffic patterns in the video, are shown in fig. 3. For comparison with the experimental results of the method of the invention, figs. 4-6 show the K = 15 representative topics obtained on the QMUL Junction Dataset 2 data set by the LDA method [4], the FTM method [3], and the HDP method [1], respectively. As the results in figs. 3-6 show, all three comparison methods correctly detect the valid motion patterns, except that the HDP method detects only 5 patterns, of which 4 are correct. From this analysis, the HDP method gives the worst result of the four models, the method of the invention gives the best, and the performance of the LDA and FTM methods differs little. Moreover, from the experimental results of figs. 3-6, the topic quality produced by the four methods decreases in the order: the method of the invention, the FTM method, the LDA method, the HDP method.
A6. taking the topic distribution Φ obtained in A5 as the new words and the document-topic distribution Θ obtained in A5 as the new documents, performing topic extraction with a BNBP-PFA topic model to obtain the topic-word distribution Φ' of the second-layer topic model; the obtained topics are the complex traffic patterns in the video.
The step A6 specifically includes:
the 15 subjects φ obtained in the step A5ikIs regarded as a word in A6, document in A6-distribution of words θkjIt is considered to be composed of the subject matter in a5. According to the formulaθkjk′~Pois(φ′kk′θ′k′j),φ′k′~Dir(α′φ′,…,α′φ′),θ′k′j~Gamma(r′k′,p′k′/(1-p′k′)),r′k′~Gamma(c′0r′0,1/c′0),p′k′The BNBP-PFA topic modeling process of Beta (c ', c' (1-)) can obtain K '═ 3 topics, the word distribution phi'kk′And 260 documents-distribution of topics θ'k′j. The probability distribution φ 'of the BNBP-PFA topic model can be obtained by using the commonly used Markov chain Monte Carlo inference algorithm as follows'k′、θ′k′jAnd a related parameter r'k′And p'k′. The specific reasoning algorithm is as follows:
1) Denote the input count matrix θ_kj ∈ R^(K×N) and set the upper bound of the topic number K′, with hyperparameters c′ = 1, γ′ = 1, α′ = 1, ε′ = 0.05, where K = 15 and N = 260;
2) Sample θ_kjk′ according to the following formula (5):

[θ_kj1, …, θ_kjK′] ~ Mult(θ_kj; ζ′_kj1, …, ζ′_kjK′)   (5)

where ζ′_kjk′ = φ′_kk′ θ′_k′j / Σ_k′ φ′_kk′ θ′_k′j;
3) Using the relationship between the Poisson and multinomial distributions and the relation θ_·jk′ = Σ_k θ_kjk′, it is known that p([θ_1jk′, …, θ_Kjk′] | −) = Mult(θ_·jk′; φ′_k′); then φ′_k′ can be obtained by sampling according to the following formula (6):

p(φ′_k′ | −) ~ Dir(α′_φ′ + θ_1·k′, …, α′_φ′ + θ_K·k′)   (6)
4) After marginalizing out φ′_k′ and θ′_k′j, θ_·jk′ ~ NB(r′_k′, p′_k′) and p′_k′ ~ Beta(c′ε′, c′(1 − ε′)); then p′_k′ can be obtained by sampling according to the following formula (7):

p(p′_k′ | −) ~ Beta(c′ε′ + θ_··k′, c′(1 − ε′) + N′ r′_k′)   (7)
5) Since θ_·jk′ ~ NB(r′_k′, p′_k′), r′_k′ can be obtained by sampling according to the following formula (8):

p(r′_k′ | −) ~ Gamma(c′_0 r′_0 + Σ_j l_jk′, 1/(c′_0 − N′ ln(1 − p′_k′)))   (8)

where l_jk′ ~ CRT(θ_·jk′, r′_k′) is the auxiliary Chinese-restaurant-table count.
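By way of illustration only, the sampling steps 1)-5) above can be sketched as a generic single-layer BNBP-PFA Gibbs sweep in Python; applying the same routine to the topic-allocation counts of the first layer yields the second layer, as in A6. The CRT-based update for r and the fixed truncation level are standard choices for this model family assumed here; this is a sketch, not the patent's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def crt(n, r):
    """Sample a Chinese-restaurant-table count l ~ CRT(n, r)."""
    n = int(n)
    if n == 0:
        return 0
    probs = r / (r + np.arange(n))
    return int(rng.binomial(1, probs).sum())

def gibbs_bnbp_pfa(X, K, n_iter=200, a_phi=1.0, c=1.0, eps=0.05, c0=1.0, r0=1.0):
    """BNBP-PFA Gibbs sampling over a (P, N) count matrix X.

    Returns the topic-word matrix Phi (P, K; columns sum to 1) and the
    document-topic weight matrix Theta (K, N).  Mirrors steps 1)-5):
    multinomial count allocation, Dirichlet update for phi, Beta update
    for p, CRT-based Gamma update for r, then a Gamma update for theta.
    """
    P, N = X.shape
    Phi = rng.dirichlet(np.full(P, a_phi), size=K).T   # (P, K)
    Theta = rng.gamma(1.0, 1.0, size=(K, N))           # (K, N)
    r = np.full(K, r0)
    p = np.full(K, 0.5)
    for _ in range(n_iter):
        A = np.zeros((P, K))   # m_{i.k}: counts per (word, topic)
        B = np.zeros((K, N))   # m_{.jk}: counts per (topic, document)
        for i in range(P):
            for j in range(N):
                x = int(X[i, j])
                if x == 0:
                    continue
                z = Phi[i] * Theta[:, j]
                cnt = rng.multinomial(x, z / z.sum())   # formula (1)/(5)
                A[i] += cnt
                B[:, j] += cnt
        for k in range(K):                              # formula (2)/(6)
            Phi[:, k] = rng.dirichlet(a_phi + A[:, k])
        mk = B.sum(axis=1)                              # m_{..k}
        p = rng.beta(c * eps + mk, c * (1 - eps) + N * r)   # formula (3)/(7)
        for k in range(K):                              # formula (8), via CRT
            l = sum(crt(B[k, j], r[k]) for j in range(N))
            r[k] = rng.gamma(c0 * r0 + l, 1.0 / (c0 - N * np.log(1.0 - p[k])))
        Theta = rng.gamma(r[:, None] + B, p[:, None])   # posterior of theta
    return Phi, Theta
```

In the two-layer setting, this routine would first be run on the 6480 × 260 word-count matrix and then on the resulting 15 × 260 topic-allocation matrix.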
A topic distribution matrix Φ′ ∈ R^(K×K′) of the K′ = 3 topics and a composition matrix Θ′ ∈ R^(K′×N) of the K′ = 3 topics over the N = 260 documents can thus be obtained, where the topic distribution matrix represents the distribution of the K′ = 3 topics over the K = 15 sub-topics (i.e., the simple traffic patterns). The topics obtained in A6 are the complex traffic patterns in the video, shown in Fig. 7. Compared with the method of document [2], which can only obtain complex traffic patterns in the two broad directions left-right and up-down, the method of the present invention obtains more detailed complex traffic patterns. Table 2 below gives the topic composition and traffic-flow status description of the 3 traffic patterns obtained by the method of the invention on the QMUL Junction Dataset 2 dataset.
TABLE 2 Topic composition and traffic-flow status description of the 3 traffic patterns on the QMUL Junction Dataset 2 dataset
A7. On the basis of the two-layer BNBP-PFA topic model obtained in A5 and A6, abnormal behaviors in the video frames are detected from the log-likelihood function values of the two-layer topic model.

The step A7 specifically includes:

A71. On the document-visual-word count matrix M_260×6480 of the entire video document set, 80% of the video documents are randomly selected to form a training video document set X, and the remaining 20% form a test set Y = M − X;
A72. On the test set Y, according to the formula F_1 = (1/N_1) Σ_p y_pi · log(Σ_k φ_pk θ_ki), where y_pi = Y(p, i), φ_pk is the topic distribution obtained by the first-layer BNBP-PFA topic model, θ_ki is the distribution coefficient of video frame y_pi on topic φ_pk, and N_1 = 6480 is the length of video frame y_pi, i.e., the number of visual words it contains, the normalized likelihood function value F_1 of each video frame y_pi on the first-layer BNBP-PFA topic model is calculated;

A73. According to the formula F_2 = (1/N_2) Σ_k θ_ki · log(Σ_k′ φ′_kk′ θ′_k′i), where θ_ki is the distribution coefficient of video frame y_pi on topic φ_pk, φ′_kk′ is the topic distribution obtained from the second-layer BNBP-PFA topic model, θ′_k′i is the distribution coefficient of θ_ki on topic φ′_kk′, and N_2 = 15 is the length of the vector θ_ki, i.e., the number of topics φ_pk it contains, the normalized likelihood function value F_2 of the distribution coefficients θ_ki of each video frame y_pi on the second-layer BNBP-PFA topic model is calculated;
A74. The weighted likelihood function value F = η·F_1 + (1 − η)·F_2 of video frame y_pi on the two-layer BNBP-PFA topic model is computed, where the parameter η = 0.5;

A75. The likelihood function value F calculated in A74 is compared with a given threshold Th_1 = 0.1; if F < Th_1, the video frame y_pi contains abnormal behavior, otherwise it does not.
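The scoring in A72-A75 can be illustrated with a minimal Python sketch. Exponentiating the per-word average log-likelihood (so that the score lies in (0, 1] and is comparable to the fixed threshold Th_1 = 0.1) is an assumption about how the normalized likelihood value is formed; `Phi1`, `theta1`, `Phi2`, `theta2` stand for the fitted first- and second-layer distributions.

```python
import numpy as np

def normalized_likelihood(x, Phi, theta):
    """Per-word geometric-mean likelihood of count vector x under a topic
    model with topic matrix Phi (P, K) and document weights theta (K,)."""
    rates = Phi @ theta            # expected intensity of each word
    probs = rates / rates.sum()    # word distribution for this document
    n = x.sum()
    if n == 0:
        return 1.0
    return float(np.exp((x * np.log(probs + 1e-12)).sum() / n))

def detect_anomaly(x, Phi1, theta1, Phi2, theta2, eta=0.5, th=0.1):
    """Weighted two-layer score F = eta*F1 + (1-eta)*F2 (A74);
    the document is flagged as abnormal when F < th (A75)."""
    F1 = normalized_likelihood(x, Phi1, theta1)        # first layer (A72)
    F2 = normalized_likelihood(theta1, Phi2, theta2)   # second layer (A73)
    F = eta * F1 + (1 - eta) * F2
    return F, F < th
```

A document whose word counts fit the learned topics well yields F well above the threshold, while a document concentrated on low-probability words drives F toward 0.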
Each video document in the test dataset is processed in this way, yielding the video frames that contain abnormal traffic behaviors. Fig. 8 shows the 4 abnormal traffic behaviors detected by the method of the present invention on the QMUL Junction Dataset 2 dataset: (1) a pedestrian crosses the road without using the zebra crossing, (2) a pedestrian on the sidewalk crosses the road against a red light, (3) a vehicle changes lanes in the middle of the intersection, and (4) a vehicle passes between two other vehicles. The objects and locations where the abnormal behaviors occur are marked with red boxes in Fig. 8.
To quantitatively evaluate the performance of the proposed abnormal-traffic-behavior detection method, a comparative experiment was performed against the MCTM method and the LDA method of document [6]. For convenience of comparison, the abnormal-traffic-behavior detection data of the MCTM and LDA methods on the QMUL Junction Dataset 2 are cited directly from the results in document [6]. Since the abnormal behavior patterns (3) and (4) shown in Fig. 8 appear rarely in the QMUL Junction Dataset 2 dataset, and the MCTM method only roughly detects the 2 pedestrian-related abnormal traffic patterns, i.e., abnormal behaviors (1) and (2) in Fig. 8, only the data of these 2 abnormal behaviors are used for the comparative experiment. Table 3 below shows the results of the anomaly-detection method of the present invention and of the MCTM and LDA methods on the QMUL Junction Dataset 2 dataset.
TABLE 3 comparison of test results of anomaly detection Performance on QMEL Junction Dataset 2 data set by various methods
From the experimental results in Table 3, on the QMUL Junction Dataset 2 dataset, the abnormal-traffic-behavior detection method provided by the present invention obtains the best results for both abnormal behaviors, pedestrian road-crossing and pedestrian red-light running: the overall TPR (true positive rate) reaches the maximum and the overall FPR (false positive rate) the minimum. In conclusion, the method of the present invention achieves better abnormal-behavior detection on the QMUL Junction Dataset 2 dataset than the MCTM and LDA methods.
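For reference, the TPR and FPR reported in Table 3 follow the usual binary-detection definitions, which can be computed as follows (an illustration, not the patent's evaluation code):

```python
def tpr_fpr(pred, truth):
    """True-positive rate and false-positive rate for binary labels,
    where 1 marks a frame flagged (pred) or annotated (truth) as abnormal."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    pos = sum(1 for t in truth if t)
    neg = len(truth) - pos
    return (tp / pos if pos else 0.0), (fp / neg if neg else 0.0)
```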
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements shall also fall within the protection scope of the present invention.
Reference to the literature
[1] X. Wang, X. Ma, E. Grimson, "Unsupervised activity perception by hierarchical Bayesian models," in IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1-8.
[2] L. Song, F. Jiang, Z. Shi, A. Katsaggelos, "Understanding dynamic scenes by hierarchical motion pattern mining," IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6, 2011.
[3] K. Than and T. B. Ho, "Fully sparse topic models," in Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I, 2012.
[4] W. Liao, B. Rosenhahn, M. Y. Yang, "Video event recognition by combining HDP and Gaussian process," in IEEE International Conference on Computer Vision Workshop, 2015, pp. 166-174.
[5] D. M. Blei, A. Y. Ng, M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, 2003, 3:993-1022.
[6] T. Hospedales, S. Gong, T. Xiang, "A Markov clustering topic model for mining behaviour in video," in Proc. Int'l Conf. Computer Vision, pp. 1165-1172, 2009.
Claims (6)
1. A method for detecting traffic patterns and abnormal behaviors in traffic intersection video surveillance, characterized by comprising the following steps:
A1, dividing a traffic intersection video of duration T seconds, in time order, into short video clips of T_s seconds each, with each video clip serving as a video document, resulting in a total of N = T/T_s video documents;
A2, calculating an optical flow vector for each adjacent pair of video frames of each video document obtained in A1;
A3, quantizing the optical flow vectors obtained in A2 to obtain the video words of each pair of video frames of each video document;
A4, counting the count vector of the video words of each video document based on a bag-of-words model, obtaining a document-word count matrix M of the video document set composed of the whole long video;
A5, performing topic extraction on the video documents obtained in A4 with the BNBP-PFA topic model to obtain the topic-word distribution and the document-topic distribution, the topics obtained being the simple traffic patterns in the video;
A6, taking the topics obtained in A5 as new words and the document-topic distribution obtained in A5 as new documents, performing topic extraction with the BNBP-PFA topic model to obtain the topic-word and document-topic distributions of a second-layer topic model, the topics obtained being the complex traffic patterns in the video;
A7, on the basis of the two-layer BNBP-PFA topic model obtained in A5 and A6, calculating the log-likelihood function values of the video frames from the topic-word and document-topic distributions of the first-layer and second-layer topic models, and detecting the abnormal behaviors in the video frames.
2. The method for detecting traffic patterns and abnormal behaviors at traffic intersections by video surveillance according to claim 1, wherein the step a2 specifically comprises:
1) for two consecutive video frames I_1, I_2, calculating an optical flow information vector (v_1(a, b), v_2(a, b)) for each pixel (a, b);
3. The method for detecting traffic patterns and abnormal behaviors in traffic intersection video surveillance as claimed in claim 2, wherein the process of quantizing the optical flow vectors to obtain the video words in step A3 specifically comprises:
1) partitioning the position information of the optical flow: a video frame of size N_x × N_y is partitioned into pixel blocks of N_1 × N_1 pixels, obtaining [N_x/N_1] × [N_y/N_1] pixel blocks in total, where the symbol [·] denotes taking the integer part, N_1, N_x, N_y are all positive integers, and the coordinates of the center point of each pixel block are used as the coordinates of the block;
2) quantification of optical flow intensity and direction: each pixel block contains N_1^2 pixel points, and the average optical flow value of these N_1^2 pixel points is taken as the optical flow vector of the pixel block; when the optical flow intensity value of a block exceeds a preset threshold Th, the optical flow direction of the block is quantized into one of S directions, where Th ∈ (0, 1);
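Outside the claim language, the position and direction quantization of steps 1)-2) can be sketched in Python; the block size, the number of directions S, and the threshold below are illustrative placeholders rather than the claimed values:

```python
import numpy as np

def flow_to_words(vx, vy, block=10, S=8, th=0.2):
    """Quantize a dense optical-flow field (vx, vy), each of shape (Ny, Nx),
    into visual words.  Each block x block patch is averaged; patches whose
    mean flow magnitude exceeds th emit one word encoding (block, direction)."""
    Ny, Nx = vx.shape
    ny, nx = Ny // block, Nx // block
    words = []
    for by in range(ny):
        for bx in range(nx):
            sl = (slice(by * block, (by + 1) * block),
                  slice(bx * block, (bx + 1) * block))
            mx, my = vx[sl].mean(), vy[sl].mean()
            if np.hypot(mx, my) <= th:
                continue                           # static block: no word
            ang = np.arctan2(my, mx) % (2.0 * np.pi)
            d = int(ang / (2.0 * np.pi / S)) % S   # direction bin 0..S-1
            words.append((by * nx + bx) * S + d)   # word id
    return words
```

The word ids produced this way index the P = (number of blocks) × S visual-word vocabulary that the count matrix M is built over.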
4. The method for detecting traffic patterns and abnormal behaviors at traffic intersections by video surveillance according to claim 3, wherein the step A5 specifically comprises:
1) Let the document count matrix obtained in A4 be M_ij ∈ R^(P×N), containing P features of N documents. According to the BNBP-PFA topic modeling process m_ijk ~ Pois(φ_ik θ_kj), φ_ik ~ Dir(α_φ, …, α_φ), θ_kj ~ Gamma(r_k, p_k/(1 − p_k)), r_k ~ Gamma(c_0 r_0, 1/c_0), p_k ~ Beta(cε, c(1 − ε)), a distribution matrix Φ ∈ R^(P×K) of the K topics, i.e., the topic-word distribution φ_ik, and a distribution matrix Θ ∈ R^(K×N) of the K topics over the N documents, i.e., the document-topic distribution θ_kj, can be obtained, where the topic distribution matrix represents the distribution of the K topics over the P features;
2) The probability distributions φ_ik and θ_kj of the BNBP-PFA topic model in step A5, together with the related parameters r_k and p_k, can be obtained with the commonly used Markov chain Monte Carlo inference algorithm; the specific inference algorithm is as follows:
a) Denote the input count matrix M_ij ∈ R^(P×N) and set the upper bound of the topic number K, with hyperparameters c = 1, γ = 1, α = 1, ε = 0.05, where P is the number of features and N is the number of documents;
b) Sample m_ijk according to the following formula (1):

[m_ij1, …, m_ijK] ~ Mult(m_ij; ζ_ij1, …, ζ_ijK)   (1)

where ζ_ijk = φ_ik θ_kj / Σ_k φ_ik θ_kj;
c) Using the relationship between the Poisson and multinomial distributions and the relation m_·jk = Σ_i m_ijk, it is known that p([m_1jk, …, m_Pjk] | −) = Mult(m_·jk; φ_ik); then φ_ik can be obtained by sampling according to the following formula (2):

p(φ_ik | −) ~ Dir(α_φ + m_1·k, …, α_φ + m_P·k)   (2)

d) After marginalizing out φ_ik and θ_kj, m_·jk ~ NB(r_k, p_k) and p_k ~ Beta(cε, c(1 − ε)); then p_k can be obtained by sampling according to the following formula (3):

p(p_k | −) ~ Beta(cε + m_··k, c(1 − ε) + N r_k)   (3)
5. The method for detecting traffic patterns and abnormal behaviors at traffic intersections by video surveillance according to claim 4, wherein the step A6 specifically comprises:
1) The topics φ_ik obtained in step A5 are regarded as the words in step A6, and the document-topic distribution θ_kj is regarded as composed of the topics from A5. According to the BNBP-PFA topic modeling process θ_kjk′ ~ Pois(φ′_kk′ θ′_k′j), φ′_k′ ~ Dir(α′_φ′, …, α′_φ′), θ′_k′j ~ Gamma(r′_k′, p′_k′/(1 − p′_k′)), r′_k′ ~ Gamma(c′_0 r′_0, 1/c′_0), p′_k′ ~ Beta(c′ε′, c′(1 − ε′)), K′ topic-word distributions φ′_kk′ and the document-topic distribution θ′_k′j can be obtained;
2) The probability distributions φ′_kk′ and θ′_k′j of the BNBP-PFA topic model in step A6, together with the related parameters r′_k′ and p′_k′, can be obtained with the commonly used Markov chain Monte Carlo inference algorithm; the specific inference algorithm is as follows:
a) Denote the input count matrix θ_kj ∈ R^(K×N) and set the upper bound of the topic number K′, with hyperparameters c′ = 1, γ′ = 1, α′ = 1, ε′ = 0.05, where K, the number of first-layer topics, is the number of features and N is the number of video documents;
b) Sample θ_kjk′ according to the following formula (5):

[θ_kj1, …, θ_kjK′] ~ Mult(θ_kj; ζ′_kj1, …, ζ′_kjK′)   (5)

where ζ′_kjk′ = φ′_kk′ θ′_k′j / Σ_k′ φ′_kk′ θ′_k′j;
c) Using the relationship between the Poisson and multinomial distributions and the relation θ_·jk′ = Σ_k θ_kjk′, it is known that p([θ_1jk′, …, θ_Kjk′] | −) = Mult(θ_·jk′; φ′_k′); then φ′_k′ can be obtained by sampling according to the following formula (6):

p(φ′_k′ | −) ~ Dir(α′_φ′ + θ_1·k′, …, α′_φ′ + θ_K·k′)   (6)
d) After marginalizing out φ′_k′ and θ′_k′j, θ_·jk′ ~ NB(r′_k′, p′_k′) and p′_k′ ~ Beta(c′ε′, c′(1 − ε′)); then p′_k′ can be obtained by sampling according to the following formula (7):

p(p′_k′ | −) ~ Beta(c′ε′ + θ_··k′, c′(1 − ε′) + N′ r′_k′)   (7)
e) Since θ_·jk′ ~ NB(r′_k′, p′_k′), r′_k′ can be obtained by sampling according to the following formula (8):

p(r′_k′ | −) ~ Gamma(c′_0 r′_0 + Σ_j l_jk′, 1/(c′_0 − N′ ln(1 − p′_k′)))   (8)

where l_jk′ ~ CRT(θ_·jk′, r′_k′) is the auxiliary Chinese-restaurant-table count;
6. The method for detecting traffic patterns and abnormal behaviors at traffic intersections by video surveillance according to claim 5, wherein the step A7 specifically comprises:
A71. On the document-visual-word count matrix M of the entire video document set, randomly selecting 80% of the video documents to form a training video document set X, with the remaining 20% forming a test set Y = M − X;
A72. On the test set Y, according to the formula F_1 = (1/N_1) Σ_p y_pi · log(Σ_k φ_pk θ_ki), where y_pi = Y(p, i), φ_pk is the topic distribution obtained by the first-layer BNBP-PFA topic model, θ_ki is the distribution coefficient of video frame y_pi on topic φ_pk, and N_1 is the length of video frame y_pi, i.e., the number of visual words it contains, calculating the normalized likelihood function value F_1 of each video frame y_pi on the first-layer BNBP-PFA topic model;
A73. According to the formula F_2 = (1/N_2) Σ_k θ_ki · log(Σ_k′ φ′_kk′ θ′_k′i), where θ_ki is the distribution coefficient of video frame y_pi on topic φ_pk, φ′_kk′ is the topic distribution obtained from the second-layer BNBP-PFA topic model, θ′_k′i is the distribution coefficient of θ_ki on topic φ′_kk′, and N_2 is the length of the vector θ_ki, i.e., the number of topics φ_pk it contains, calculating the normalized likelihood function value F_2 of the distribution coefficients θ_ki of each video frame y_pi on the second-layer BNBP-PFA topic model;
A74. Computing the weighted likelihood function value F = η·F_1 + (1 − η)·F_2 of video frame y_pi on the two-layer BNBP-PFA topic model, with parameter η ∈ (0, 1);
A75. Comparing the likelihood function value F calculated in A74 with a given threshold Th_1, where Th_1 ∈ (0, 1); if F < Th_1, the video frame y_pi contains abnormal behavior, otherwise it does not.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711030491.4A CN107832688B (en) | 2017-10-27 | 2017-10-27 | Traffic mode and abnormal behavior detection method for traffic intersection video monitoring |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107832688A CN107832688A (en) | 2018-03-23 |
CN107832688B true CN107832688B (en) | 2020-08-11 |
Family
ID=61649930
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711030491.4A Expired - Fee Related CN107832688B (en) | 2017-10-27 | 2017-10-27 | Traffic mode and abnormal behavior detection method for traffic intersection video monitoring |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107832688B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109034062B (en) * | 2018-07-26 | 2023-05-16 | 南京邮电大学 | Weak supervision abnormal behavior detection method based on time sequence consistency |
CN110070003B (en) * | 2019-04-01 | 2021-07-30 | 浙江大华技术股份有限公司 | Abnormal behavior detection and optical flow autocorrelation determination method and related device |
CN111104792B (en) * | 2019-12-13 | 2023-05-23 | 浙江工业大学 | Traffic track data semantic analysis and visualization method based on topic model |
CN112164223B (en) * | 2020-02-27 | 2022-04-29 | 浙江恒隆智慧科技集团有限公司 | Intelligent traffic information processing method and device based on cloud platform |
CN114218610B (en) * | 2021-11-24 | 2023-02-14 | 南京信息职业技术学院 | Multi-dense block detection and extraction method based on Possion distribution |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2226012B1 (en) * | 2007-12-20 | 2012-06-20 | Gifu University | Image processing apparatus, image processing program, storage medium, and ultrasonic diagnostic apparatus |
CN103136540A (en) * | 2013-03-19 | 2013-06-05 | 中国科学院自动化研究所 | Behavior recognition method based on concealed structure reasoning |
CN104820824A (en) * | 2015-04-23 | 2015-08-05 | 南京邮电大学 | Local abnormal behavior detection method based on optical flow and space-time gradient |
CN106548153A (en) * | 2016-10-27 | 2017-03-29 | 杭州电子科技大学 | Video abnormality detection method based on graph structure under multi-scale transform |
CN107220607A (en) * | 2017-05-22 | 2017-09-29 | 西安电子科技大学 | Movement locus Activity recognition method based on 3D stationary wavelets |
Non-Patent Citations (4)
Title |
---|
Houkui Zhou, "Topic Evolution based on the Probabilistic," Frontiers of Computer Science, 2017. *
Mingyuan Zhou, "Nonparametric Bayesian negative binomial factor analysis," arxiv.org, 2016-04-15, full text. *
X. Wang, "Unsupervised activity perception by hierarchical Bayesian models," IEEE Conference on Computer Vision and Pattern Recognition, 2007-12-31, full text. *
Liu Jingyu, "Research on video-based passenger flow detection and analysis algorithms and their application in transportation hub stations," China Masters' Theses Full-text Database, Information Science and Technology, 2016-06-15, full text. *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
GR01 | Patent grant | | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20200811; Termination date: 20211027 |