CN108108688A - Physical conflict behavior detection method based on low-dimensional spatio-temporal feature extraction and topic modeling - Google Patents
- Publication number: CN108108688A (application CN201711366304.XA)
- Authority: CN (China)
- Prior art keywords: video, word, pixel, topic, document
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V20/40 — Scenes; scene-specific elements in video content
- G06V20/44 — Event detection
- G06V20/46 — Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V40/20 — Movements or behaviour, e.g. gesture recognition
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners; connectivity analysis
- G06F18/23213 — Non-hierarchical clustering using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering
- G06F18/28 — Determining representative reference patterns, e.g. by averaging or distorting; generating dictionaries
- G06T7/254 — Analysis of motion involving subtraction of images
- G06T7/269 — Analysis of motion using gradient-based methods
- G06T2207/10016 — Video; image sequence
- G06T2207/20224 — Image subtraction
- G06T2207/30232 — Surveillance
Abstract
The invention belongs to the technical field of video surveillance and relates to a physical conflict behavior detection method based on low-dimensional spatio-temporal feature extraction and topic modeling. The detection proceeds by first defining a vocabulary, then quantizing the pixel positions of objects, describing the size of foreground targets in the scene, and determining the motion of foreground pixels; after these steps the vocabulary and the corpus are built, and physical conflict behavior is judged by the computation described above. The method combines low-dimensional data feature representation with model-based analysis of complex scenes: using the changes of human body position during an action, it learns a global motion model independent of body parts, analyzes that model, and compares the detection results against the model parameters to judge the human motion state. Compared with the prior art, the method has an ingenious design concept, a scientific detection principle, a simple detection procedure, and high detection accuracy, and has great market prospects.
Description
Technical field:
The invention belongs to the technical field of video surveillance and relates to a physical conflict behavior detection method, in particular to a physical conflict behavior detection method based on low-dimensional spatio-temporal feature extraction and topic modeling.
Background technology:
In recent years, with the increase of safety incidents of all kinds, the rise of public safety awareness, and the accompanying spread of artificial-intelligence ideas and the steady maturing of artificial-intelligence technology, intelligent surveillance has attracted more and more attention. Traditional surveillance systems manage the safety of public places mainly through manual monitoring, and so lack real-time response and initiative. In many cases, unattended video surveillance serves only as a video backup and fails in its supervisory duty. Moreover, with the widespread deployment of surveillance cameras, the traditional manual monitoring mode can no longer meet modern surveillance needs. To solve this problem, much effort has been devoted to finding solutions that replace manual operation. At present, with the continuous development of video-surveillance technology and information science, notable progress has been made in fields such as video surveillance, human-computer interaction, and video search, and automatic monitoring is becoming a research topic with broad application prospects. Abnormal-behavior detection is an important part of automatic monitoring. Compared with ordinary human action recognition, which concentrates on identifying a person's conventional actions, abnormal behavior is usually highly sudden and short-lived, which makes its behavioral features difficult to capture.
In recent years researchers have proposed various methods for detecting abnormal behavior. Early work focused on describing human behavior with simple geometric models, such as two-dimensional silhouette models and three-dimensional cylinder models. Beyond static geometric models, researchers have also modeled behavior with features that describe human motion, such as shape, angle, position, speed, direction of motion, and trajectory, and have applied subspace methods, including principal component analysis and independent component analysis, to reduce the dimensionality of and screen the extracted features before behavior analysis. Existing inventions for abnormal-behavior detection fail to grasp the intrinsic characteristics of abnormal behavior, so existing detection models cannot fully reflect its essence, and the detection accuracy obtained with them falls short of the ideal. It is therefore desirable to design a physical conflict behavior detection method based on low-dimensional spatio-temporal feature extraction and topic modeling whose computation is exact and whose detection results are accurate.
Summary of the invention:
The object of the invention is to overcome the defects of the prior art by designing a physical conflict behavior detection method based on low-dimensional spatio-temporal feature extraction and topic modeling, whose computation is simple and accurate and which can detect physical conflict behavior quickly and precisely and give timely warning.
To achieve this object, the physical conflict behavior detection method based on low-dimensional spatio-temporal feature extraction and topic modeling of the present invention comprises the following processing steps:
S1. Definition of the vocabulary
First, a semantic understanding that accords with human cognition is extracted from the raw surveillance-video data, and the algorithm designed in the invention automatically analyzes and understands the video data; the analysis is divided into foreground-target extraction, target-feature representation, and behavior-analysis classification. For abnormal human-behavior detection in video surveillance the method is based on the LDMA model: the pixel positions of every object in the video are described, and for each pixel a feature vector is extracted that contains the pixel's position, its speed and direction of motion, and the size of the target object it belongs to. These ultimately form the vocabulary and documents of visual information, and an effective vocabulary is defined as the dictionary in which every pixel covered by the surveillance video can be looked up;
S2. Quantizing the pixel positions of objects
In video obtained from surveillance, a behavior is essentially characterized by the position of the person performing it; the invention therefore takes position information into account when building the vocabulary. The pixel positions of objects in the video are quantized into non-overlapping 10*10 cells, so that a video frame of size M × N yields M/10 × N/10 cells;
S3. Describing the size of foreground targets in the scene
To represent foreground targets in the video accurately, the invention associates each foreground pixel with the kind of foreground target the pixel belongs to. In video data obtained from surveillance, the observed foreground boxes can be divided into two classes by size: small foreground boxes, mainly single pedestrians, and large foreground boxes, mainly vehicles or groups of pedestrians. The invention therefore classifies the sizes of the foreground boxes by K-means clustering, taking the cluster number k = 2 in K-means, thereby obtaining the foreground target each pixel belongs to, and finally describes the target size in the scene with the cluster labels 1 and 2, where 1 denotes a small target and 2 a large target;
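The k = 2 clustering of step S3 reduces to a one-dimensional 2-means on box areas. A library K-means (e.g. scikit-learn's `KMeans`) would serve; the self-contained sketch below, with made-up areas and an extreme-point initialization, is only an illustration of the idea.

```python
import numpy as np

# Sketch of step S3: split foreground-box areas into "small" (label 1)
# and "large" (label 2) with a 1-D 2-means. Initialization and data are
# illustrative assumptions, not from the patent.

def two_means_labels(areas, iters=20):
    areas = np.asarray(areas, dtype=float)
    c = np.array([areas.min(), areas.max()])      # init centroids at extremes
    for _ in range(iters):
        labels = np.abs(areas[:, None] - c[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                c[k] = areas[labels == k].mean()  # recompute centroids
    order = np.argsort(c)                         # ensure 1 = small, 2 = large
    remap = {order[0]: 1, order[1]: 2}
    return np.array([remap[l] for l in labels])

# Pedestrian-sized boxes vs vehicle/group-sized boxes (areas in pixels):
print(two_means_labels([800, 950, 1100, 9000, 12000]))  # -> [1 1 1 2 2]
```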
S4. Determining the motion of foreground pixels
For a surveillance scene the analysis concerns foreground targets, so background subtraction must first be performed to obtain the foreground pixels. For each foreground pixel obtained, the optical flow of the pixel is computed with the Lucas-Kanade optical-flow algorithm, and a threshold on the magnitude of the flow vector separates static foreground pixels (the label static) from dynamic pixels. The dynamic pixels are then quantized into motion states described by four motion-direction description words. A detected foreground pixel therefore has five possible motion-description words, the four motion directions plus static, and these determine the pixel's motion state;
S5. Defining video sequences and pixels
The video sequence of the surveillance scene, denoted V, is divided into several clips v_1, ..., v_M, where v_m is the m-th clip of the segmentation. The video sequence V is regarded as the current corpus, and each clip corresponds to a document (document) in the corpus. Within a clip v_m, each pixel is defined as a word (word), and each word corresponds to a topic (topic). As time t passes, each word's topic in v_m either transfers to another topic or self-transitions; by the property of MCMC (Markov Chain Monte Carlo), this process reaches a stationary distribution after a period of time has passed;
S6. Building the vocabulary
According to the steps above, for a video frame of size M × N the position of each pixel has M/10 × N/10 possible expressions, the motion form has 5 descriptions, and the target size has two statements, large and small, so the words could be expressed in M/10 × N/10 × 5 × 2 forms; that is, a given foreground pixel has that many possible descriptions. At any one moment, however, a pixel's motion information and the target it belongs to are independent, i.e., for a video clip, the different topics formed as time t passes should each be obtained independently. Each position (location) can therefore be represented by the joint feature (motion, size): the motion and size features are concatenated rather than crossed, giving the word set of each cell, denoted V_c. This means that when a video clip is built, a pixel simultaneously contributes two feature words to its position, one for its motion and one for the size of the target it belongs to, so the final vocabulary can be expressed in M/10 × N/10 × (5+2) forms. The feature word of a pixel can thus be defined as w_{c,a}, where c is the cell position and a is the joint motion/size feature;
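The vocabulary bookkeeping in S6 can be sketched numerically, assuming a 320x240 frame. The counts follow the M/10 × N/10 × (5+2) formula in the text; the function and constant names are illustrative.

```python
# Sketch of step S6: compose a pixel's visual word w_{c,a} from its cell
# position c and an attribute a. With motion (5 values) and size (2 values)
# concatenated rather than crossed, each cell contributes 5 + 2 = 7 words.

M, N, CELL = 320, 240, 10
MOTION_WORDS = 5                    # 4 motion directions + static
SIZE_WORDS = 2                      # small / large target

cells = (M // CELL) * (N // CELL)
vocab_size = cells * (MOTION_WORDS + SIZE_WORDS)   # M/10 * N/10 * (5+2)

def word_id(cell, attr):
    """attr in [0, 5) -> motion word; attr in [5, 7) -> size word."""
    return cell * (MOTION_WORDS + SIZE_WORDS) + attr

print(vocab_size)       # 768 cells * 7 words each = 5376
print(word_id(66, 4))   # one w_{c,a}: cell c = 66, motion attribute a = 4
```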
S7. Building the corpus
The surveillance video is divided into several short clips, and each clip serves as a document; the pixels changing with time t within a clip are expressed as the words occurring in the document, together with the topic content that this series of words represents. Taking the vocabulary generated from the pixels as the basis, let the total word count of the corpus be N, and among all N words let the frequency of occurrence of each word v_i be n_i, so that

\sum_{i=1}^{V} n_i = N

The probability of each corpus configuration is then the multinomial

P(\vec{n}) = \binom{N}{n_1 \cdots n_V} \prod_{k=1}^{V} p_k^{n_k}

where P(\vec{n}) is the probability of the occurrence counts with which each word appears in the corpus. Then, for each specific topic \vec{z}, the vocabulary of the corpus is generated from that topic with probability p(\vec{w} \mid \vec{z}), and the final generation probability of the corpus is the sum, over every topic \vec{z}, of the vocabulary probabilities generated on it:

p(\vec{w}) = \sum_{\vec{z}} p(\vec{w} \mid \vec{z}) \, p(\vec{z})

In the corpus, \mathcal{W} obeys a multinomial distribution with parameter \vec{p}, and the topic obeys a probability distribution p(\vec{p}); this distribution becomes the prior distribution of the parameter \vec{p}, and for the prior the conjugate distribution of the multinomial, the Dirichlet distribution, is chosen. By the distribution law of the Dirichlet, the generation probability of the text corpus is computed as

p(\mathcal{W} \mid \vec{\alpha}) = \int p(\mathcal{W} \mid \vec{p}) \, \mathrm{Dir}(\vec{p} \mid \vec{\alpha}) \, d\vec{p} = \frac{\Delta(\vec{n} + \vec{\alpha})}{\Delta(\vec{\alpha})}

where \vec{\alpha} denotes the parameter of the Dirichlet prior and \Delta(\cdot) its normalization constant. The text corpus is assembled into the corpus from the document set: a video sequence is regarded as a document (document), a document is a mixture of several topics (topics), each topic is a probability distribution over the vocabulary, and each word represented by each pixel in the video sequence is generated by one fixed topic. This process is exactly the process of document modeling, a bag-of-words model. Suppose there are V topic-words, denoted v_1, ..., v_V, and each topic corresponds to a probability distribution \vec{\varphi}_k over the word vector. For a corpus C = (d_1, d_2, ..., d_M) containing M documents, every document d_m has a specific doc-topic distribution; that is, the topic-vector probability distribution corresponding to document d_m is \vec{\theta}_m. The generation probability of each word in the m-th document d_m is then

p(w \mid d_m) = \sum_{z=1}^{K} p(w \mid z) \, p(z \mid d_m)

and the generation probability of the whole document is

p(\vec{w} \mid d_m) = \prod_{i=1}^{n_m} \sum_{z=1}^{K} p(w_i \mid z) \, p(z \mid d_m)

Since documents are mutually independent, the generation probability of the whole corpus can be written out from the formula above; this generates the topic model, whose locally optimal solution is then computed with the EM algorithm;
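The per-word generation probability of step S7, p(w | d_m) = sum over z of p(w | z) p(z | d_m), can be checked numerically. Both distributions below (2 topics, 3 words) are made up for the demonstration; they are not parameters from the patent.

```python
import numpy as np

# Toy numeric check of the topic-mixture factorization in step S7:
# p(w | d_m) = sum_z p(w | z) p(z | d_m).

phi = np.array([[0.7, 0.2, 0.1],    # p(word | topic 0)
                [0.1, 0.3, 0.6]])   # p(word | topic 1)
theta_m = np.array([0.4, 0.6])      # p(topic | document m)

p_w_given_d = theta_m @ phi         # mixture over topics
print(np.round(p_w_given_d, 2))     # a proper distribution over the 3 words

doc = [0, 2, 2, 1]                  # word ids observed in document m
print(float(np.prod(p_w_given_d[doc])))  # generation probability of the doc
```

Because each p(w | d_m) is a convex combination of rows of phi, the result sums to 1, as a word distribution must.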
S8. Judging physical conflict behavior
The physical conflict behavior detection method based on low-dimensional spatio-temporal feature extraction and topic modeling combines low-dimensional data feature representation with model-based analysis of complex scenes, and analyzes the video sequence accordingly. From the human-body positions detected in the video, using the changes of body-position information during an action, a global motion model independent of body parts is learned; by analyzing the global motion model and comparing the detection results with the parameters in the model, the human motion state is judged. In the invention each behavior corresponds to a topic distribution; with a trained model, if physical conflict occurs in a tested video clip, the behavior concentrates its distribution in one topic, and from that topic it is determined that the behavior belongs to the state of physical conflict.
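One plausible reading of the S8 decision rule is: flag a test clip when its inferred topic distribution concentrates on the conflict topic. The topic index and the 0.6 concentration threshold below are assumptions for the demonstration, not values from the patent.

```python
import numpy as np

# Sketch of the step S8 decision: a clip is flagged as physical conflict
# when its topic distribution concentrates on the (assumed) conflict topic.

CONFLICT_TOPIC, THRESH = 2, 0.6     # both values are illustrative assumptions

def is_conflict(theta):
    """theta: inferred p(topic | video clip)."""
    theta = np.asarray(theta)
    return bool(theta.argmax() == CONFLICT_TOPIC and theta.max() > THRESH)

print(is_conflict([0.1, 0.1, 0.8]))   # True: mass concentrated on topic 2
print(is_conflict([0.3, 0.4, 0.3]))   # False: no concentrated topic
```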
Compared with the prior art, the present invention has the following advantages: the spectral features of the image are used to extract the contour of the main moving region accurately, so that the contour edges of moving targets can be seen clearly for behavior-feature analysis; the method applies not only to physical conflict behaviors such as fighting but equally to the detection of other behaviors, such as rapid movement. Its design concept is ingenious, its detection principle scientific, its detection procedure simple and its accuracy high, its application environment friendly, and it has great market prospects.
Description of the drawings:
Fig. 1 shows the foreground-detection results for different video frames in the video stream of the present invention.
Fig. 2 is the process flow diagram of the physical conflict behavior detection method based on low-dimensional spatio-temporal feature extraction and topic modeling of the present invention.
Specific embodiment:
The present invention is further described below by way of example and in conjunction with the accompanying drawings.
Embodiment:
To achieve the above object, the physical conflict behavior detection method based on low-dimensional spatio-temporal feature extraction and topic modeling described in this embodiment comprises the following processing steps:
S1. Definition of the vocabulary
First, a semantic understanding that accords with human cognition is extracted from the raw surveillance-video data, and the algorithm designed in this embodiment automatically analyzes and understands the video data; the analysis is divided into foreground-target extraction, target-feature representation, and behavior-analysis classification. For abnormal human-behavior detection in video surveillance the method is based on the LDMA model: the pixel positions of every object in the video are described, and for each pixel a feature vector is extracted that contains the pixel's position, its speed and direction of motion, and the size of the target object it belongs to. These ultimately form the vocabulary and documents of visual information, and an effective vocabulary is defined as the dictionary in which every pixel covered by the surveillance video can be looked up;
S2. Quantizing the pixel positions of objects
In video obtained from surveillance, a behavior is essentially characterized by the position of the person performing it; this embodiment therefore takes position information into account when building the vocabulary. The pixel positions of objects in the video are quantized into non-overlapping 10*10 cells, so that a video frame of size M × N yields M/10 × N/10 cells;
S3. Describing the size of foreground targets in the scene
To represent foreground targets in the video accurately, this embodiment associates each foreground pixel with the kind of foreground target the pixel belongs to. In video data obtained from surveillance, the observed foreground boxes can be divided into two classes by size: small foreground boxes, mainly single pedestrians, and large foreground boxes, mainly vehicles or groups of pedestrians. This embodiment therefore classifies the sizes of the foreground boxes by K-means clustering, taking the cluster number k = 2 in K-means, thereby obtaining the foreground target each pixel belongs to, and finally describes the target size in the scene with the cluster labels 1 and 2, where 1 denotes a small target and 2 a large target;
S4. Determining the motion of foreground pixels
For a surveillance scene the analysis concerns foreground targets, so background subtraction must first be performed to obtain the foreground pixels. For each foreground pixel obtained, the optical flow of the pixel is computed with the Lucas-Kanade optical-flow algorithm, and a threshold on the magnitude of the flow vector separates static foreground pixels (the label static) from dynamic pixels. The dynamic pixels are then quantized into motion states described by four motion-direction description words. A detected foreground pixel therefore has five possible motion-description words, the four motion directions plus static, and these determine the pixel's motion state;
S5. Defining video sequences and pixels
The video sequence of the surveillance scene, denoted V, is divided into several clips v_1, ..., v_M, where v_m is the m-th clip of the segmentation. The video sequence V is regarded as the current corpus, and each clip corresponds to a document (document) in the corpus. Within a clip v_m, each pixel is defined as a word (word), and each word corresponds to a topic (topic). As time t passes, each word's topic in v_m either transfers to another topic or self-transitions; by the property of MCMC (Markov Chain Monte Carlo), this process reaches a stationary distribution after a period of time has passed;
S6. Building the vocabulary
According to the steps above, for a video frame of size M × N the position of each pixel has M/10 × N/10 possible expressions, the motion form has 5 descriptions, and the target size has two statements, large and small, so the words could be expressed in M/10 × N/10 × 5 × 2 forms; that is, a given foreground pixel has that many possible descriptions. At any one moment, however, a pixel's motion information and the target it belongs to are independent, i.e., for a video clip, the different topics formed as time t passes should each be obtained independently. Each position (location) can therefore be represented by the joint feature (motion, size): the motion and size features are concatenated rather than crossed, giving the word set of each cell, denoted V_c. This means that when a video clip is built, a pixel simultaneously contributes two feature words to its position, one for its motion and one for the size of the target it belongs to, so the final vocabulary can be expressed in M/10 × N/10 × (5+2) forms. The feature word of a pixel can thus be defined as w_{c,a}, where c is the cell position and a is the joint motion/size feature;
S7. Building the corpus
The surveillance video is divided into several short clips, and each clip serves as a document; the pixels changing with time t within a clip are expressed as the words occurring in the document, together with the topic content that this series of words represents. Taking the vocabulary generated from the pixels as the basis, let the total word count of the corpus be N, and among all N words let the frequency of occurrence of each word v_i be n_i, so that

\sum_{i=1}^{V} n_i = N

The probability of each corpus configuration is then the multinomial

P(\vec{n}) = \binom{N}{n_1 \cdots n_V} \prod_{k=1}^{V} p_k^{n_k}

where P(\vec{n}) is the probability of the occurrence counts with which each word appears in the corpus. Then, for each specific topic \vec{z}, the vocabulary of the corpus is generated from that topic with probability p(\vec{w} \mid \vec{z}), and the final generation probability of the corpus is the sum, over every topic \vec{z}, of the vocabulary probabilities generated on it:

p(\vec{w}) = \sum_{\vec{z}} p(\vec{w} \mid \vec{z}) \, p(\vec{z})

In the corpus, \mathcal{W} obeys a multinomial distribution with parameter \vec{p}, and the topic obeys a probability distribution p(\vec{p}); this distribution becomes the prior distribution of the parameter \vec{p}, and for the prior the conjugate distribution of the multinomial, the Dirichlet distribution, is chosen. By the distribution law of the Dirichlet, the generation probability of the text corpus is computed as

p(\mathcal{W} \mid \vec{\alpha}) = \int p(\mathcal{W} \mid \vec{p}) \, \mathrm{Dir}(\vec{p} \mid \vec{\alpha}) \, d\vec{p} = \frac{\Delta(\vec{n} + \vec{\alpha})}{\Delta(\vec{\alpha})}

where \vec{\alpha} denotes the parameter of the Dirichlet prior and \Delta(\cdot) its normalization constant. The text corpus is assembled into the corpus from the document set: a video sequence is regarded as a document (document), a document is a mixture of several topics (topics), each topic is a probability distribution over the vocabulary, and each word represented by each pixel in the video sequence is generated by one fixed topic. This process is exactly the process of document modeling, a bag-of-words model. Suppose there are V topic-words, denoted v_1, ..., v_V, and each topic corresponds to a probability distribution \vec{\varphi}_k over the word vector. For a corpus C = (d_1, d_2, ..., d_M) containing M documents, every document d_m has a specific doc-topic distribution; that is, the topic-vector probability distribution corresponding to document d_m is \vec{\theta}_m. The generation probability of each word in the m-th document d_m is then

p(w \mid d_m) = \sum_{z=1}^{K} p(w \mid z) \, p(z \mid d_m)

and the generation probability of the whole document is

p(\vec{w} \mid d_m) = \prod_{i=1}^{n_m} \sum_{z=1}^{K} p(w_i \mid z) \, p(z \mid d_m)

Since documents are mutually independent, the generation probability of the whole corpus can be written out from the formula above; this generates the topic model, whose locally optimal solution is then computed with the EM algorithm;
S8. Judging physical conflict behavior
The physical conflict behavior detection method based on low-dimensional spatio-temporal feature extraction and topic modeling combines low-dimensional data feature representation with model-based analysis of complex scenes, and analyzes the video sequence accordingly. From the human-body positions detected in the video, using the changes of body-position information during an action, a global motion model independent of body parts is learned; by analyzing the global motion model and comparing the detection results with the parameters in the model, the human motion state is judged. In this embodiment each behavior corresponds to a topic distribution; with a trained model, if physical conflict occurs in a tested video clip, the behavior concentrates its distribution in one topic, and from that topic it is determined that the behavior belongs to the state of physical conflict.
Claims (1)
1. a kind of limbs conflict behavior detection method based on the extraction of low-dimensional space-time characteristic with theme modeling, it is characterised in that specific
Detection method carries out in accordance with the following steps:
The definition of S1, word sheet
First go out to meet the semantic understanding of human cognitive from original monitor video extracting data, be designed by the algorithm of the present invention
It automatically analyzes and understands video data, analytic process is divided into the extraction of foreground target, target signature represents and behavioural analysis is sorted out, should
Method is based on LDMA models for human body unusual checking in video monitoring, and the location of pixels of each object in video is carried out
Description, to each pixel decimation feature vector, position of this feature vector comprising each pixel, the speed of movement and direction, person in servitude
Belong to the size of target object, ultimately form visual information word sheet and document, and define an effective word sheet, supervised as covering
The dictionary that pixel in control video can inquire about;
S2, the location of pixels for quantifying object
In the video obtained in video monitoring, behavior is substantially characterized by the position of behavior hair survivor, and therefore, the present invention will
Location information is considered in the structure of word sheet, the location of pixels of object in video is quantized into the cell member of nonoverlapping 10*10
In, for the object video of M × N, therefore M/10 × N/10 cell tuple can be obtained;
S3, description scene in foreground target size
In order to accurately represent foreground target in object video, which kind of prospect each foreground pixel and the pixel are belonged to by the present invention
Target connects, in the video data obtained in video monitoring, the prospect frame observed based on they be sized to divide
For two classes, one kind is small prospect frame, mainly pedestrian, and one kind is big prospect frame, mainly includes vehicle or a group pedestrian;
Therefore, the present invention clusters to classify the size of prospect frame using K-means, so as to obtain the foreground target that each pixel is subordinate to,
The cluster numbers k=2 in K-means is taken, final to describe the size of the target in scene using cluster label 1 and 2, i.e., 1 is small
Target, 2 be big target;
S4, the motion conditions for determining foreground pixel
For a surveillance scene, the analysis concerns foreground targets, so background subtraction must first be performed to obtain the foreground pixels. For each foreground pixel obtained, its optical flow is computed according to the Lucas-Kanade optical flow algorithm, and a threshold set on the magnitude of the flow vector distinguishes static foreground pixels (the static label) from dynamic ones. Dynamic pixels are then quantized into motion states described by four kinds of motion-description words (motion direction, trajectory, position and speed); each detected foreground pixel therefore has five possible motion-description words, these four plus static, which determine the motion of the foreground pixel;
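The static/dynamic split by a flow-magnitude threshold follows the text directly; for the dynamic case, the sketch below bins the flow vector into four compass directions, which is one common concrete quantization (as in Markov-clustering topic models) and is used here as an illustrative assumption:

```python
import math

def motion_word(u, v, thresh=1.0):
    """Classify a foreground pixel's optical-flow vector (u, v) as
    'static' when its magnitude is below the threshold (step S4),
    otherwise as one of four coarse motion directions. The four-way
    direction binning is an assumed quantization, not the patent's."""
    if math.hypot(u, v) < thresh:
        return "static"
    ang = math.atan2(v, u)  # angle in (-pi, pi]
    if -math.pi / 4 <= ang < math.pi / 4:
        return "right"
    if math.pi / 4 <= ang < 3 * math.pi / 4:
        return "up"
    if -3 * math.pi / 4 <= ang < -math.pi / 4:
        return "down"
    return "left"
```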
S5, defining video sequences and pixels
The video sequence of the scene under video surveillance is denoted V, and V is divided into several video segments {v_1, v_2, ..., v_m, ...}, where v_m is the m-th video segment of the division. The video sequence V is regarded as the current corpus, and each video segment v_m then corresponds to a document in that corpus. Within a video segment v_m, each pixel is defined as a word, and each word corresponds to a topic. Then, as time t varies, the topic of each word in v_m either transfers to another topic or transfers to itself; by the properties of MCMC (Markov chain Monte Carlo), this process reaches a stationary distribution after a period of time has passed;
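The convergence to a stationary distribution asserted above can be illustrated by iterating a small topic-transition matrix; the matrix values and topic count are illustrative only:

```python
import numpy as np

# Illustrative 3-topic transition matrix (each row sums to 1):
# entry P[i, j] is the probability that a word's topic i transfers
# to topic j in the next time step (self-transfer on the diagonal).
P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])

pi = np.array([1.0, 0.0, 0.0])  # start fully concentrated in topic 0
for _ in range(200):
    pi = pi @ P                 # repeated transitions of the chain

# after many steps pi is (numerically) stationary: pi @ P == pi
```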
S6, establishing the vocabulary
According to the above steps, for a video of size M × N the position of each pixel has M/10 × N/10 possible expressions, the motion form has 5 descriptions, and the target size has 2 statements (large target and small target), so the obtainable words take M/10 × N/10 × 5 × 2 forms; that is, a given foreground pixel admits 5 × 2 = 10 possible describing modes at its location. At any given moment, however, the motion information of a pixel and the target it is subordinate to are independent, i.e., for a video segment, the different topics formed as time t varies should each be obtained independently. Therefore, each position (location) is represented by the joint feature (motion, size), i.e., the motion and size features are concatenated, and the resulting word set of each cell is denoted V_c. This means that when a video segment is built, a pixel simultaneously supplies two kinds of feature words to its position, its motion and the size of the target it is subordinate to, so the final vocabulary can be expressed in the form M/10 × N/10 × (5 + 2). The feature word of a pixel can therefore be defined as w_{c,a}, where c is the cell position and a is the joint feature of motion form and size;
S7, building the corpus
The surveillance video is divided into several short video segments, each segment serving as a document; the pixels varying with time t in a segment are expressed as the words occurring in the document, and a series of such words expresses the topic content of the document. Taking the words generated by each pixel as the basis, let the total word count of the corpus be N; among all N words, let the occurrence frequency of each word v_i be n_i, so that

Σ_i n_i = N.

The probability of the corpus is then the multinomial probability

P(n) = (N! / (n_1! n_2! ··· n_V!)) Π_i p_i^{n_i},

where P(n) is the probability of the frequency counts with which each word occurs in the corpus, and p_i is the generation probability of word v_i. Then, for each specific topic z_k, with φ_k the probability distribution over the vocabulary generated by that topic, the final generation probability of the corpus is the cumulative sum, over every topic z_k, of the vocabulary probabilities generated under it:

p(w) = Σ_k p(w | z_k) p(z_k).

In the corpus, the words W obey a multinomial distribution with parameter p, and the topics obey a probability distribution; this distribution becomes the prior distribution of the parameter, and as prior distribution the conjugate of the multinomial, the Dirichlet distribution, is selected. According to the law of the Dirichlet distribution, the generation probability of the text corpus is computed as

p(W | α) = ∫ p(W | p) p(p | α) dp,

where α denotes the parameter of the Dirichlet prior distribution. The text corpus is composed of a collection of documents.
The video sequence is regarded as a document, a document is a mixture of multiple topics, and each topic is a probability distribution over the vocabulary; each word that each pixel represents in the video sequence is generated by one fixed topic. This process is exactly the process of document modeling, a bag-of-words model: if there are K topics, denoted z_1, ..., z_K, each topic z_k corresponds to a probability distribution φ_k over the term vector. For a corpus C = (d_1, d_2, ..., d_M) containing M documents, every document d_m has its own specific topic-vector probability distribution θ_m. The generation probability of each word in the m-th document d_m is then

p(w | d_m) = Σ_k p(w | z_k) p(z_k | d_m),

and the generation probability of the entire document is

p(d_m) = Π_{w ∈ d_m} p(w | d_m).

Since the documents are mutually independent, the generation probability of the entire corpus is written as the product of the above over all documents; this generates the topic model, and a locally optimal solution is then obtained with the EM algorithm.
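The EM fit to a local optimum can be sketched with a minimal pLSA-style implementation on a document-word count matrix; the code, variable names, and the two toy "video documents" are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def plsa_em(counts, K=2, iters=100, seed=0):
    """Minimal pLSA fitted by EM: factorize the D x W document-word
    count matrix into p(z|d) (per-document topic mix) and p(w|z)
    (per-topic word distribution), converging to a local optimum."""
    rng = np.random.default_rng(seed)
    D, W = counts.shape
    p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((K, W)); p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(iters):
        # E-step: responsibilities p(z | d, w)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]   # D x K x W
        resp = joint / joint.sum(1, keepdims=True).clip(1e-12)
        # M-step: re-estimate both factors from expected counts
        exp_counts = counts[:, None, :] * resp          # D x K x W
        p_w_z = exp_counts.sum(0); p_w_z /= p_w_z.sum(1, keepdims=True)
        p_z_d = exp_counts.sum(2); p_z_d /= p_z_d.sum(1, keepdims=True)
    return p_z_d, p_w_z

# two toy "video documents" over a 4-word vocabulary
counts = np.array([[10.0, 8.0, 0.0, 0.0],
                   [0.0, 0.0, 9.0, 11.0]])
p_z_d, p_w_z = plsa_em(counts)
```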
S8, judging limb conflict behavior
The limb conflict behavior detection method based on low-dimensional spatio-temporal feature extraction and topic modeling combines a low-dimensional representation of the data with model-based analysis of the complex scene, and analyzes the video sequence on this basis. Human body positions are detected in the video, and from the variation of body-position information within an action, a global motion model independent of body parts is learned; the detection results are compared with the parameters in the global motion model to judge the human motion state. In the present invention, each behavior corresponds to a topic distribution; under the trained model, if limb conflict occurs in a tested video segment, this behavior will be concentrated in one topic, and according to that topic the behavior is determined to belong to the state of limb conflict.
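The final decision rule, that a segment is flagged when its inferred topic distribution concentrates on the conflict-associated topic, can be sketched as follows; the concentration threshold and the way the conflict topic is identified are illustrative assumptions:

```python
def is_limb_conflict(topic_dist, conflict_topic, conc=0.6):
    """Flag a tested video segment as limb conflict when its inferred
    topic distribution concentrates on the topic associated with
    conflict behavior (step S8). Threshold 0.6 is an assumed value."""
    k = max(range(len(topic_dist)), key=lambda i: topic_dist[i])
    return k == conflict_topic and topic_dist[k] >= conc

# a segment dominated by the conflict topic vs. a diffuse segment
flag_a = is_limb_conflict([0.1, 0.8, 0.1], conflict_topic=1)
flag_b = is_limb_conflict([0.4, 0.3, 0.3], conflict_topic=1)
```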
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711366304.XA CN108108688B (en) | 2017-12-18 | 2017-12-18 | Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108108688A true CN108108688A (en) | 2018-06-01 |
CN108108688B CN108108688B (en) | 2021-11-23 |
Family
ID=62209950
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711366304.XA Active CN108108688B (en) | 2017-12-18 | 2017-12-18 | Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108108688B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109242826A (en) * | 2018-08-07 | 2019-01-18 | 高龑 | Mobile device stick-shaped object root counting method and system based on target detection |
CN110659363A (en) * | 2019-07-30 | 2020-01-07 | 浙江工业大学 | Web service mixed evolution clustering method based on membrane computing |
CN111160170A (en) * | 2019-12-19 | 2020-05-15 | 青岛联合创智科技有限公司 | Self-learning human behavior identification and anomaly detection method |
CN111707375A (en) * | 2020-06-10 | 2020-09-25 | 青岛联合创智科技有限公司 | Electronic class card with intelligent temperature measurement attendance and abnormal behavior detection functions |
CN113705274A (en) * | 2020-05-20 | 2021-11-26 | 杭州海康威视数字技术股份有限公司 | Climbing behavior detection method and device, electronic equipment and storage medium |
CN117372969A (en) * | 2023-12-08 | 2024-01-09 | 暗物智能科技(广州)有限公司 | Monitoring scene-oriented abnormal event detection method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103268495A (en) * | 2013-05-31 | 2013-08-28 | 公安部第三研究所 | Human body behavioral modeling identification method based on priori knowledge cluster in computer system |
CN103530603A (en) * | 2013-09-24 | 2014-01-22 | 杭州电子科技大学 | Video abnormality detection method based on causal loop diagram model |
CN103995915A (en) * | 2014-03-21 | 2014-08-20 | 中山大学 | Crowd evacuation simulation system based on composite potential energy field |
CN104268546A (en) * | 2014-05-28 | 2015-01-07 | 苏州大学 | Dynamic scene classification method based on topic model |
Non-Patent Citations (5)
Title |
---|
TIMOTHY HOSPEDALES ET AL.: "A Markov Clustering Topic Model for Mining Behaviour in Video", 2009 IEEE 12th International Conference on Computer Vision * |
Hu Yuan et al.: "Pedestrian abnormal behavior recognition based on trajectory analysis", Computer Engineering and Science * |
Zhao Chunhui et al.: "Analysis of Moving Targets in Video Images", 30 June 2011, National Defense Industry Press * |
Zhao Liang et al.: "Application of topic models in video abnormal behavior detection", Computer Science * |
Huang Xianping: "Research on topic semantic feature extraction and behavior analysis of crowd motion", China Doctoral Dissertations Full-text Database, Information Science and Technology * |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Address after: 266200 Household No. 8, Qingda Third Road, Laoshan District, Qingdao City, Shandong Province; Applicant after: QINGDAO LIANHE CHUANGZHI TECHNOLOGY Co.,Ltd. Address before: Room 1204, No. 40, Hong Kong Middle Road, Shinan District, Qingdao, Shandong 266200; Applicant before: QINGDAO LIANHE CHUANGZHI TECHNOLOGY Co.,Ltd. |
| GR01 | Patent grant | |