CN108108688B - Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling - Google Patents

Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling

Info

Publication number
CN108108688B
CN108108688B (application number CN201711366304.XA)
Authority
CN
China
Prior art keywords
video
pixel
foreground
motion
corpus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711366304.XA
Other languages
Chinese (zh)
Other versions
CN108108688A (en)
Inventor
纪刚
周粉粉
周萌萌
安帅
商胜楠
于腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Lianhe Chuangzhi Technology Co ltd
Original Assignee
Qingdao Lianhe Chuangzhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Lianhe Chuangzhi Technology Co ltd filed Critical Qingdao Lianhe Chuangzhi Technology Co ltd
Priority to CN201711366304.XA priority Critical patent/CN108108688B/en
Publication of CN108108688A publication Critical patent/CN108108688A/en
Application granted granted Critical
Publication of CN108108688B publication Critical patent/CN108108688B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/28 Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/254 Analysis of motion involving subtraction of images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/269 Analysis of motion using gradient-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20224 Image subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30232 Surveillance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/44 Event detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of video monitoring and relates to a limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling. The method comprises defining a word book, quantizing the pixel positions of objects, describing the size of foreground targets in the scene, determining the motion of foreground pixels, completing the construction of the word book and of a corpus through these steps, and judging limb conflict behavior from the resulting computation. The method combines low-dimensional data feature representation with model-based complex-scene analysis: it learns an overall motion model, independent of specific body parts, from the change of human body position information during motion, compares the detected result with the parameters of the model by analyzing the overall motion model, and thereby judges the motion state of the human body. The method has an ingenious design concept, a scientific detection principle, a simple detection mode, high detection accuracy, and a broad market prospect.

Description

Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling
Technical field:
The invention belongs to the technical field of video monitoring, relates to a limb conflict behavior detection method, and in particular relates to a limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling.
Background art:
In recent years, with the increase of safety emergencies of all kinds, the safety awareness of the public has improved; at the same time, with the spread of the artificial intelligence concept and the continuous maturing of artificial intelligence technology, intelligent monitoring has attracted more and more attention. The traditional monitoring system mainly realizes the safety management of public places through manual monitoring and lacks real-time capability and initiative. In many cases video surveillance does not actually play a supervisory role, because unattended cameras serve only as video backup. In addition, with the popularization and wide deployment of monitoring cameras, the traditional manual monitoring mode can no longer meet the requirements of modern monitoring, and efforts are being made to find solutions that replace manual operation. At present, with the continuous development of video monitoring technology and information science, the fields of video monitoring, human-computer interaction and video search have developed greatly, and automatic monitoring has gradually become a research subject with broad application prospects. Abnormal behavior detection is an important part of automatic monitoring; compared with general human behavior recognition, which focuses on recognizing routine actions, abnormal behaviors are highly sudden, short in duration, and their behavior characteristics are difficult to acquire.
In recent years researchers have proposed different methods for detecting abnormal behaviors. Early research on abnormal behavior detection mainly focused on describing human body behavior with simple geometric models, such as models based on two-dimensional contours or three-dimensional cylinders. Besides static geometric models, researchers have tried to describe and distinguish behaviors with features of human motion such as shape, angle, position, motion speed, motion direction and motion trajectory, and to reduce the dimension of and screen the extracted features with subspace methods including principal component analysis and independent component analysis, so as to perform behavior analysis. Existing inventions aimed at abnormal behavior detection share the inherent limitation that the abnormal behaviors cannot be truly understood, so the existing abnormal behavior detection models cannot fully reflect the essence of abnormal behaviors, and the detection precision obtained with them does not reach the ideal effect.
Summary of the invention:
The purpose of the invention is to overcome the defects in the prior art and to design a limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling which has a simple calculation mode and high calculation precision, can detect limb conflict behavior quickly and accurately, and can give an early warning in time.
In order to achieve the above object, the method for detecting the limb conflict behavior based on the low-dimensional space-time feature extraction and the topic modeling specifically comprises the following steps:
S1, definition of the word book
First, semantic understanding that accords with human cognition is extracted from the original surveillance video data, and the video data are automatically analyzed and understood through algorithm design; the analysis process is divided into extraction of the foreground target, target feature representation, and behavior analysis and classification. The method detects abnormal human behaviors in video surveillance based on the LDMA model: the pixel position of each object in the video is described, and a feature vector is extracted for each pixel, comprising the position, moving speed and moving direction of the pixel and the size of the target the pixel belongs to. A visual-information word book and documents are finally formed, and an effective word book is defined, serving as a queryable dictionary covering the pixels in the surveillance video;
S2, quantizing the pixel positions of objects
In video obtained by video monitoring, a behavior is fundamentally characterized by the position of its actor; therefore the invention takes the position information into account when constructing the word book. The pixel positions of objects in the video are quantized into non-overlapping 10 × 10 cell elements, so that for an M × N video frame, M/10 × N/10 cell elements are obtained;
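As an illustrative sketch only, assuming an M × N frame indexed by (x, y) pixel coordinates and the 10-pixel cell size described above, the cell of a pixel and the total number of cells can be computed as:

    def cell_index(x, y, cell_size=10):
        """Map a pixel coordinate (x, y) to its (row, column) cell index."""
        return y // cell_size, x // cell_size

    def num_cells(M, N, cell_size=10):
        """Number of non-overlapping cells in an M x N frame: (M/10) * (N/10)."""
        return (M // cell_size) * (N // cell_size)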
S3, describing the size of foreground targets in the scene
In order to represent the foreground targets in the video accurately, each foreground pixel is associated with the foreground target it belongs to. In video data obtained by video monitoring, the observed foreground boxes can be divided into two types according to their size: small foreground boxes, which are mainly pedestrians, and large foreground boxes, which mainly contain vehicles or groups of pedestrians. The method therefore uses K-means clustering to classify the sizes of the foreground boxes and obtain the foreground target to which each pixel belongs; the number of clusters K in the K-means is set to 2, and cluster labels 1 and 2 are finally used to describe the target sizes in the scene, i.e., 1 is a small target and 2 is a large target;
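An illustrative sketch of this size-clustering step, assuming scikit-learn is available and that foreground bounding boxes (width, height) have already been produced by the foreground detector; the function and variable names are illustrative, not taken from the patent:

    import numpy as np
    from sklearn.cluster import KMeans

    def label_foreground_sizes(boxes):
        """boxes: iterable of (w, h) foreground bounding boxes.
        Returns one size label per box: 1 = small target, 2 = large target."""
        areas = np.array([[w * h] for (w, h) in boxes], dtype=float)
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(areas)
        # Map clusters so that label 1 is the cluster with the smaller mean area.
        order = np.argsort(km.cluster_centers_.ravel())
        remap = {order[0]: 1, order[1]: 2}
        return np.array([remap[c] for c in km.labels_])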
S4, determining the motion of foreground pixels
For a scene in video monitoring, the analyzed content is aimed at the foreground target, so background subtraction is first performed to obtain the foreground pixels. The optical flow of each foreground pixel is then solved with the Lucas-Kanade optical flow algorithm, and static foreground pixels (static label) and dynamic pixels are distinguished by setting a threshold on the magnitude of the optical flow vector. The dynamic pixels are then quantized into motion states described by 4 motion descriptors, namely motion direction, trajectory, position and speed, so that, together with the static label, there are 5 possible motion descriptors that determine the motion of a detected foreground pixel;
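One possible sketch of this step with OpenCV, assuming MOG2 background subtraction and sparse Lucas-Kanade flow evaluated at the detected foreground pixels; the flow-magnitude threshold and the four-way direction quantization are assumptions, since the text does not fix numeric values for the motion descriptors:

    import cv2
    import numpy as np

    bg = cv2.createBackgroundSubtractorMOG2()  # background subtraction

    def motion_labels(prev_gray, gray, flow_threshold=1.0):
        """Return (points, labels): 0 = static pixel, 1..4 = quantized motion direction."""
        fg_mask = bg.apply(gray)
        ys, xs = np.nonzero(fg_mask > 0)
        pts = np.stack([xs, ys], axis=1).astype(np.float32).reshape(-1, 1, 2)
        if len(pts) == 0:
            return pts, np.array([], dtype=int)
        nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        flow = (nxt - pts).reshape(-1, 2)
        mag = np.linalg.norm(flow, axis=1)
        ang = np.arctan2(flow[:, 1], flow[:, 0])
        direction = ((ang + np.pi) / (np.pi / 2)).astype(int) % 4 + 1  # four direction bins
        labels = np.where(mag < flow_threshold, 0, direction)
        return pts, labels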
S5, defining the video sequence and pixel points
Denote by V the video sequence recorded in the monitored scene, and divide V into a number of video segments (v_1, v_2, ..., v_M), where v_m is the m-th segmented video segment. The set of segments is regarded as the current corpus W, so each segment v_m corresponds to a document in the corpus. Within a video segment v_m, the pixel points are defined as words, and each word corresponds to a topic; as time t changes within v_m, the topic of each word either transitions to another topic or transitions to itself, and from the properties of MCMC (Markov Chain Monte Carlo) it is known that this process reaches a stationary distribution after a period of time;
S6, establishing the word book
According to the above steps, the position of each pixel of an M × N video frame has M/10 × N/10 possible expressions, the motion form has 5 descriptions, and the large and small targets give two expressions, so a word can take M/10 × N/10 × 5 × 2 forms; that is, a given foreground pixel has that many possible descriptions. At a given moment the motion information of a pixel and the target it belongs to are independent of each other, i.e., for a video segment the different topics formed as time t changes should be acquired independently and separately; therefore each location (cell) is represented by the joint feature (motion, size). The motion and size features are concatenated and used as the word set of each cell element, denoted V_c; that is, when a video segment is constructed, a pixel contributes two feature words to its position, one for its motion and one for the size of the target it belongs to, so the final word can be represented in M/10 × N/10 × (5 + 2) forms. Thus the feature word of a pixel is defined as w_{c,a}, where c is the cell position and a is the joint motion-and-size feature;
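A sketch of one way to encode such feature words as integer vocabulary indices, consistent with the M/10 × N/10 × (5 + 2) form above; the particular encoding is an assumption made for illustration:

    def word_id(cell_row, cell_col, feature, N, cell_size=10, n_features=7):
        """Encode a (cell, feature) pair as a single vocabulary index w_{c,a}.

        feature: 0..4 for the five motion labels (static plus four dynamic),
                 5..6 for the two size labels (small / large target),
        so each foreground pixel contributes two words to its cell, one for
        motion and one for size, giving an (M/10 x N/10) x (5 + 2) vocabulary."""
        cells_per_row = N // cell_size
        cell = cell_row * cells_per_row + cell_col
        return cell * n_features + feature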
S7, establishment of the corpus
The surveillance video is divided into a number of short video segments, each video segment is taken as a document, and the pixel points in a segment, changing with time t, are expressed as the words appearing in that document and the topic content expressed by this series of words. Then, taking the word book generated for each pixel as the basis, let the total word count of the corpus be N; if, among all N words, each distinct word v_i occurs with frequency n_i, then the word-count vector is n = (n_1, n_2, ..., n_V) with

∑_i n_i = N,

and the probability of the whole corpus is:

p(W) = ∏_i p(v_i)^{n_i},

where p(v_i) refers to the probability with which each word occurs in the corpus;
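In log form, the corpus probability above can be evaluated directly from the word counts and word probabilities; a minimal sketch:

    import numpy as np

    def corpus_log_prob(word_counts, word_probs):
        """log p(W) = sum_i n_i * log p(v_i) for the corpus model above."""
        word_counts = np.asarray(word_counts, dtype=float)
        word_probs = np.asarray(word_probs, dtype=float)
        return float(np.sum(word_counts * np.log(word_probs)))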
Then, for each specific topic z and the probabilities p(w | z) with which that topic generates the words of the corpus, the probability of generating the final corpus is the accumulated sum, over every topic z, of the word probabilities generated above:

p(W) = ∑_z p(z) ∏_i p(w_i | z);
In the corpus W, the word-count vector n obeys the multinomial distribution, and the topics obey a probability distribution governed by the parameter vector p; as the prior distribution of the parameter p, the conjugate distribution of the multinomial distribution is selected, namely the Dirichlet distribution Dir(p | α). According to the rules of the Dirichlet distribution, the probability of generating the text corpus is calculated as:

p(W | α) = ∫ p(W | p) Dir(p | α) dp,

where α represents the parameter of the Dirichlet prior distribution, and the text corpus is the corpus composed of the documents (the video segments).
Figure GDA0003306343670000053
Regarding a video sequence as a document (document), the document is formed by mixing a plurality of topics (topics), each Topic is probability distribution on words, each word represented by each pixel in the video sequence is generated by a fixed Topic, and the process is a document modeling process, namely a bag-of-words model: if there are T topoc-words, it is recorded as
Figure GDA0003306343670000054
Probability distribution of one word vector for each topic
Figure GDA0003306343670000055
For corpus C ═ d containing M documents1,d2,···,dM) Each document d in (1)mThere will be a specific doc-topic
Figure GDA0003306343670000056
That is, each document corresponds to a topic vector probability distribution of
Figure GDA0003306343670000057
Then the mth document dmThe generation probability of each word in (1) is:
Figure GDA0003306343670000058
the generation probability of the whole document is:
Figure GDA0003306343670000059
because the documents are mutually independent, the generation probability of the whole corpus is written according to the formula to generate a Topic-Model, and then the local optimal solution is solved by using an EM algorithm;
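A sketch of fitting such a topic model to the document-word count matrix built from the video segments; scikit-learn's LatentDirichletAllocation (batch variational EM) is used here as a stand-in for the EM-based topic model described above rather than as the patent's own implementation, and the number of topics and prior values are assumptions:

    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation

    def fit_topic_model(doc_word_counts, n_topics=20, alpha=0.5, eta=0.1):
        """doc_word_counts: (num_segments x vocabulary_size) word-count matrix,
        one row per video segment (document). Returns the fitted model and the
        per-document topic distributions theta (each row sums to 1)."""
        lda = LatentDirichletAllocation(
            n_components=n_topics,
            doc_topic_prior=alpha,    # Dirichlet prior on the doc-topic distribution
            topic_word_prior=eta,     # Dirichlet prior on the topic-word distribution
            learning_method="batch",  # batch variational EM
            random_state=0,
        )
        doc_topic = lda.fit_transform(doc_word_counts)
        theta = doc_topic / doc_topic.sum(axis=1, keepdims=True)
        return lda, theta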
S8, judgment of limb conflict behavior
The limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling combines low-dimensional data feature representation with model-based complex-scene analysis to analyze the video sequence: the position of the human body is detected in the video, an overall motion model independent of specific body parts is learned from the change of human body position information during motion, the detected result is compared with the parameters of the model by analyzing the overall motion model, and the motion state of the human body is thereby judged.
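The comparison against the trained model is not spelled out numerically in the text; one plausible reading, shown purely as an assumption, is to flag a tested segment whose inferred topic distribution places enough probability mass on topics that were associated with limb-conflict segments during training:

    import numpy as np

    def is_limb_conflict(theta_test, conflict_topics, threshold=0.5):
        """theta_test: topic distribution of one tested video segment.
        conflict_topics: indices of topics dominated by limb-conflict training
        segments (an assumed convention; the text does not fix this rule)."""
        return float(np.sum(theta_test[conflict_topics])) >= threshold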
Compared with the prior art, the invention has the following beneficial effects: the spectral characteristics of the image are used to accurately extract the contour of the motion region, so that the contour edge of the moving target can be seen clearly for behavior-feature analysis; the method is suitable not only for limb conflict behaviors such as fighting, but also for detecting other behaviors such as rapid movement; the design concept is ingenious, the detection principle scientific, the detection mode simple, the detection accuracy high and the application environment friendly, so the method has a broad market prospect.
Description of the drawings:
Fig. 1 is a diagram illustrating the foreground detection effect on different video frames of a video stream according to the present invention.
FIG. 2 is a process flow diagram of a method for detecting a limb conflict behavior based on low-dimensional spatiotemporal feature extraction and topic modeling according to the present invention.
Detailed description of embodiments:
the invention is further illustrated by the following examples in conjunction with the accompanying drawings.
Example:
In order to achieve the above object, the method for detecting limb conflict behavior based on low-dimensional spatio-temporal feature extraction and topic modeling specifically includes the following steps:
S1, definition of the word book
First, semantic understanding that accords with human cognition is extracted from the original surveillance video data, and the video data are automatically analyzed and understood through the algorithm design of this embodiment; the analysis process is divided into extraction of the foreground target, target feature representation, and behavior analysis and classification. The method detects abnormal human behaviors in video surveillance based on the LDMA model: the pixel position of each object in the video is described, and a feature vector is extracted for each pixel, comprising the position, moving speed and moving direction of the pixel and the size of the target the pixel belongs to. A visual-information word book and documents are finally formed, and an effective word book is defined, serving as a queryable dictionary covering the pixels in the surveillance video;
S2, quantizing the pixel positions of objects
In video obtained by video monitoring, a behavior is fundamentally characterized by the position of its actor; therefore, in this embodiment, the position information is taken into account when constructing the word book. The pixel positions of objects in the video are quantized into non-overlapping 10 × 10 cell elements, so that for an M × N video frame, M/10 × N/10 cell elements are obtained;
S3, describing the size of foreground targets in the scene
In order to represent the foreground targets in the video accurately, each foreground pixel is associated with the foreground target it belongs to. In video data obtained by video monitoring, the observed foreground boxes can be divided into two types according to their size: small foreground boxes, which are mainly pedestrians, and large foreground boxes, which mainly contain vehicles or groups of pedestrians. In this embodiment, therefore, the sizes of the foreground boxes are classified by K-means clustering to obtain the foreground target to which each pixel belongs; the number of clusters K in the K-means is set to 2, and cluster labels 1 and 2 are finally used to describe the target sizes in the scene, i.e., 1 is a small target and 2 is a large target;
S4, determining the motion of foreground pixels
For a scene in video monitoring, the analyzed content is aimed at the foreground target, so background subtraction is first performed to obtain the foreground pixels. The optical flow of each foreground pixel is then solved with the Lucas-Kanade optical flow algorithm, and static foreground pixels (static label) and dynamic pixels are distinguished by setting a threshold on the magnitude of the optical flow vector. The dynamic pixels are then quantized into motion states described by 4 motion descriptors, namely motion direction, trajectory, position and speed, so that, together with the static label, there are 5 possible motion descriptors that determine the motion of a detected foreground pixel;
S5, defining the video sequence and pixel points
Denote by V the video sequence recorded in the monitored scene, and divide V into a number of video segments (v_1, v_2, ..., v_M), where v_m is the m-th segmented video segment. The set of segments is regarded as the current corpus W, so each segment v_m corresponds to a document in the corpus. Within a video segment v_m, the pixel points are defined as words, and each word corresponds to a topic; as time t changes within v_m, the topic of each word either transitions to another topic or transitions to itself, and from the properties of MCMC (Markov Chain Monte Carlo) it is known that this process reaches a stationary distribution after a period of time;
S6, establishing the word book
According to the above steps, the position of each pixel of an M × N video frame has M/10 × N/10 possible expressions, the motion form has 5 descriptions, and the large and small targets give two expressions, so a word can take M/10 × N/10 × 5 × 2 forms; that is, a given foreground pixel has that many possible descriptions. At a given moment the motion information of a pixel and the target it belongs to are independent of each other, i.e., for a video segment the different topics formed as time t changes should be acquired independently and separately; therefore each location (cell) is represented by the joint feature (motion, size). The motion and size features are concatenated and used as the word set of each cell element, denoted V_c; that is, when a video segment is constructed, a pixel contributes two feature words to its position, one for its motion and one for the size of the target it belongs to, so the final word can be represented in M/10 × N/10 × (5 + 2) forms. Thus the feature word of a pixel is defined as w_{c,a}, where c is the cell position and a is the joint motion-and-size feature;
S7, establishment of the corpus
The surveillance video is divided into a number of short video segments, each video segment is taken as a document, and the pixel points in a segment, changing with time t, are expressed as the words appearing in that document and the topic content expressed by this series of words. Then, taking the word book generated for each pixel as the basis, let the total word count of the corpus be N; if, among all N words, each distinct word v_i occurs with frequency n_i, then the word-count vector is n = (n_1, n_2, ..., n_V) with

∑_i n_i = N,

and the probability of the whole corpus is:

p(W) = ∏_i p(v_i)^{n_i},

where p(v_i) refers to the probability with which each word occurs in the corpus.
Then, for each specific topic z and the probabilities p(w | z) with which that topic generates the words of the corpus, the probability of generating the final corpus is the accumulated sum, over every topic z, of the word probabilities generated above:

p(W) = ∑_z p(z) ∏_i p(w_i | z).

In the corpus W, the word-count vector n obeys the multinomial distribution, and the topics obey a probability distribution governed by the parameter vector p; as the prior distribution of the parameter p, the conjugate distribution of the multinomial distribution is selected, namely the Dirichlet distribution Dir(p | α). According to the rules of the Dirichlet distribution, the probability of generating the text corpus is calculated as:

p(W | α) = ∫ p(W | p) Dir(p | α) dp,

where α represents the parameter of the Dirichlet prior distribution, and the text corpus is the corpus composed of the documents (the video segments).
A video sequence is regarded as a document, and the document is formed as a mixture of several topics, each topic being a probability distribution over words; every word represented by a pixel in the video sequence is generated by some fixed topic. This process is the document modeling process, i.e., the bag-of-words model. If there are T topic-word distributions, they are recorded as φ_1, ..., φ_T, each topic being a probability distribution over the word vocabulary. For a corpus C = (d_1, d_2, ..., d_M) containing M documents, each document d_m has a specific doc-topic distribution θ_m, i.e., each document corresponds to a probability distribution over the topic vector. Then the generation probability of each word w in the m-th document d_m is:

p(w | d_m) = ∑_{z=1}^{T} p(w | z) p(z | d_m) = ∑_{z=1}^{T} φ_{z,w} θ_{m,z},

and the generation probability of the whole document is:

p(d_m) = ∏_{i=1}^{N_m} ∑_{z=1}^{T} φ_{z,w_i} θ_{m,z},

where N_m is the number of words in d_m. Because the documents are mutually independent, the generation probability of the whole corpus is written as the product of the document probabilities according to the formula above; this produces the topic model, and the EM algorithm is then used to solve for a locally optimal solution;
S8, judgment of limb conflict behavior
The limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling combines low-dimensional data feature representation with model-based complex-scene analysis to analyze the video sequence: the position of the human body is detected in the video, an overall motion model independent of specific body parts is learned from the change of human body position information during motion, the detected result is compared with the parameters of the model by analyzing the overall motion model, and the motion state of the human body is thereby judged.

Claims (1)

1. A limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling is characterized by comprising the following steps:
S1, definition of the word book
First, semantic understanding that accords with human cognition is extracted from the original surveillance video data, and the video data are automatically analyzed and understood through algorithm design; the analysis process is divided into extraction of the foreground target, target feature representation, and behavior analysis and classification. The method detects abnormal human behaviors in video surveillance based on the LDMA model: the pixel position of each object in the video is described, and a feature vector is extracted for each pixel, comprising the position, moving speed and moving direction of the pixel and the size of the target the pixel belongs to. A visual-information word book and documents are finally formed, and an effective word book is defined, serving as a queryable dictionary covering the pixels in the surveillance video;
S2, quantizing the pixel positions of objects
In video obtained by video monitoring, a behavior is characterized by the position of its actor, so the position information is taken into account in the construction of the word book: the pixel positions of objects in the video are quantized into non-overlapping 10 × 10 cell elements, and for an M × N video frame, M/10 × N/10 cell elements are obtained;
S3, describing the size of foreground targets in the scene
In order to represent the foreground targets in the video accurately, each foreground pixel is associated with the foreground target it belongs to. In video data obtained by video monitoring, the observed foreground boxes can be divided into two types according to their size: small foreground boxes, which are pedestrians, and large foreground boxes, which contain vehicles or groups of pedestrians. K-means clustering is therefore used to classify the sizes of the foreground boxes and obtain the foreground target to which each pixel belongs; the number of clusters K in the K-means is set to 2, and cluster labels 1 and 2 are finally used to describe the target sizes in the scene, i.e., 1 is a small target and 2 is a large target;
S4, determining the motion of foreground pixels
For a scene in video monitoring, the analyzed content is aimed at the foreground target, so background subtraction is first performed to obtain the foreground pixels. The optical flow of each foreground pixel is then solved with the Lucas-Kanade optical flow algorithm, and static foreground pixels and dynamic pixels are distinguished by setting a threshold on the magnitude of the optical flow vector. The dynamic pixels are then quantized into motion states described by 4 motion descriptors, namely motion direction, trajectory, position and speed, so that, together with the static label, there are 5 possible motion descriptors that determine the motion of a detected foreground pixel;
S5, defining the video sequence and pixel points
Denote by V the video sequence recorded in the monitored scene, and divide V into a number of video segments (v_1, v_2, ..., v_M), where v_m is the m-th segmented video segment. The set of segments is regarded as the current corpus W, so each segment v_m corresponds to a document in the corpus. Within a video segment v_m, the pixel points are defined as words, and each word corresponds to a topic; as time t changes within v_m, the topic of each word either transitions to another topic or transitions to itself, and from the properties of Markov Chain Monte Carlo it is known that this process reaches a stationary distribution after a period of time;
S6, establishing the word book
According to the above steps, the position of each pixel of an M × N video frame has M/10 × N/10 possible expressions, the motion form has 5 descriptions, and the large and small targets give two expressions, so a word can take M/10 × N/10 × 5 × 2 forms; that is, a given foreground pixel has that many possible descriptions. At a given moment the motion information of a pixel and the target it belongs to are independent of each other, i.e., for a video segment the different topics formed as time t changes are acquired independently and separately; therefore each position is represented by the joint feature (motion, size). The motion and size features are concatenated and used as the word set of each cell element, denoted V_c; that is, when a video segment is constructed, a pixel contributes two feature words to its position, one for its motion and one for the size of the target it belongs to, so the final word can be represented in M/10 × N/10 × (5 + 2) forms. Thus the feature word of a pixel is defined as w_{c,a}, where c is the cell position and a is the joint motion-and-size feature;
S7, establishment of the corpus
The surveillance video is divided into a number of short video segments, each video segment is taken as a document, and the pixel points in a segment, changing with time t, are expressed as the words appearing in that document and the topic content expressed by this series of words. Then, taking the word book generated for each pixel as the basis, let the total word count of the corpus be N; if, among all N words, each distinct word v_i occurs with frequency n_i, then the word-count vector is n = (n_1, n_2, ..., n_V) with

∑_i n_i = N,

and the probability of the whole corpus is:

p(W) = ∏_i p(v_i)^{n_i},

where p(v_i) refers to the probability with which each word occurs in the corpus.
Then, for each specific topic z and the probabilities p(w | z) with which that topic generates the words of the corpus, the probability of generating the final corpus is the accumulated sum, over every topic z, of the word probabilities generated above:

p(W) = ∑_z p(z) ∏_i p(w_i | z).

In the corpus W, the word-count vector n obeys the multinomial distribution, and the topics obey a probability distribution governed by the parameter vector p; as the prior distribution of the parameter p, the conjugate distribution of the multinomial distribution is selected, namely the Dirichlet distribution Dir(p | α). According to the rules of the Dirichlet distribution, the probability of generating the text corpus is calculated as:

p(W | α) = ∫ p(W | p) Dir(p | α) dp,

where α represents the parameter of the Dirichlet prior distribution, and the text corpus is the corpus composed of the documents (the video segments).
A video sequence is regarded as a document, and the document is formed as a mixture of several topics, each topic being a probability distribution over words; every word represented by a pixel in the video sequence is generated by some fixed topic. This process is the document modeling process, i.e., the bag-of-words model. If there are T topic-word distributions, they are recorded as φ_1, ..., φ_T, each topic being a probability distribution over the word vocabulary. For a corpus C = (d_1, d_2, ..., d_M) containing M documents, each document d_m has a specific document-topic distribution θ_m, i.e., each document corresponds to a probability distribution over the topic vector. Then the generation probability of each word w in the m-th document d_m is:

p(w | d_m) = ∑_{z=1}^{T} p(w | z) p(z | d_m) = ∑_{z=1}^{T} φ_{z,w} θ_{m,z},

and the generation probability of the whole document is:

p(d_m) = ∏_{i=1}^{N_m} ∑_{z=1}^{T} φ_{z,w_i} θ_{m,z},

where N_m is the number of words in d_m. Because the documents are mutually independent, the generation probability of the whole corpus is written as the product of the document probabilities according to the formula above; this produces the topic model, and the EM algorithm is then used to solve for a locally optimal solution;
S8, judgment of limb conflict behavior
The limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling combines low-dimensional data feature representation with model-based complex-scene analysis to analyze the video sequence: the position of the human body is detected in the video, an overall motion model independent of specific body parts is learned from the change of human body position information during actions, the detected result is compared with the parameters of the model by analyzing the overall motion model, and the motion state of the human body is thereby judged; each behavior corresponds to a topic distribution, and under the trained model, if a limb conflict occurs in a tested video segment, the corresponding behaviors are concentrated in particular topics, and the behavior is determined to belong to limb conflict according to those topics.
CN201711366304.XA 2017-12-18 2017-12-18 Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling Active CN108108688B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711366304.XA CN108108688B (en) 2017-12-18 2017-12-18 Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711366304.XA CN108108688B (en) 2017-12-18 2017-12-18 Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling

Publications (2)

Publication Number Publication Date
CN108108688A CN108108688A (en) 2018-06-01
CN108108688B true CN108108688B (en) 2021-11-23

Family

ID=62209950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711366304.XA Active CN108108688B (en) 2017-12-18 2017-12-18 Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling

Country Status (1)

Country Link
CN (1) CN108108688B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109242826B (en) * 2018-08-07 2022-02-22 高龑 Mobile equipment end stick-shaped object root counting method and system based on target detection
CN110659363B (en) * 2019-07-30 2021-11-23 浙江工业大学 Web service mixed evolution clustering method based on membrane computing
CN111160170B (en) * 2019-12-19 2023-04-21 青岛联合创智科技有限公司 Self-learning human behavior recognition and anomaly detection method
CN113705274B (en) * 2020-05-20 2023-09-05 杭州海康威视数字技术股份有限公司 Climbing behavior detection method and device, electronic equipment and storage medium
CN111707375B (en) * 2020-06-10 2021-07-09 青岛联合创智科技有限公司 Electronic class card with intelligent temperature measurement attendance and abnormal behavior detection functions
CN117372969B (en) * 2023-12-08 2024-05-10 暗物智能科技(广州)有限公司 Monitoring scene-oriented abnormal event detection method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530603A (en) * 2013-09-24 2014-01-22 杭州电子科技大学 Video abnormality detection method based on causal loop diagram model
CN103995915A (en) * 2014-03-21 2014-08-20 中山大学 Crowd evacuation simulation system based on composite potential energy field
CN104268546A (en) * 2014-05-28 2015-01-07 苏州大学 Dynamic scene classification method based on topic model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268495B (en) * 2013-05-31 2016-08-17 公安部第三研究所 Human body behavior modeling recognition methods based on priori knowledge cluster in computer system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530603A (en) * 2013-09-24 2014-01-22 杭州电子科技大学 Video abnormality detection method based on causal loop diagram model
CN103995915A (en) * 2014-03-21 2014-08-20 中山大学 Crowd evacuation simulation system based on composite potential energy field
CN104268546A (en) * 2014-05-28 2015-01-07 苏州大学 Dynamic scene classification method based on topic model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Markov Clustering Topic Model for Mining Behaviour in Video; Timothy Hospedales et al.; 2009 IEEE 12th International Conference on Computer Vision; 2009-12-31; pp. 1165-1172 *
Application of topic models to video abnormal behavior detection; 赵靓 et al.; Computer Science; 2012-09-30; Vol. 39, No. 9; full text *
Research on topic semantic feature extraction and behavior analysis of crowd motion; 黄鲜萍; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2016-07-15; No. 07; pp. 13-14, 44-70 *
Pedestrian abnormal behavior recognition based on trajectory analysis; 胡瑗 et al.; Computer Engineering and Science; 2017-11-30; Vol. 39, No. 11; full text *

Also Published As

Publication number Publication date
CN108108688A (en) 2018-06-01

Similar Documents

Publication Publication Date Title
CN108108688B (en) Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling
Yan et al. Abnormal event detection from videos using a two-stream recurrent variational autoencoder
Adithya et al. Artificial neural network based method for Indian sign language recognition
Castellano et al. Crowd detection in aerial images using spatial graphs and fully-convolutional neural networks
CN105787458A (en) Infrared behavior identification method based on adaptive fusion of artificial design feature and depth learning feature
CN104281853A (en) Behavior identification method based on 3D convolution neural network
CN111738218B (en) Human body abnormal behavior recognition system and method
Reshna et al. Spotting and recognition of hand gesture for Indian sign language recognition system with skin segmentation and SVM
Rabiee et al. Crowd behavior representation: an attribute-based approach
Lu et al. Multi-object detection method based on YOLO and ResNet hybrid networks
Intwala et al. Indian sign language converter using convolutional neural networks
CN103500456A (en) Object tracking method and equipment based on dynamic Bayes model network
Castellano et al. Crowd counting from unmanned aerial vehicles with fully-convolutional neural networks
Qin et al. Application of video scene semantic recognition technology in smart video
Ramzan et al. Automatic Unusual Activities Recognition Using Deep Learning in Academia.
CN115798055B (en) Violent behavior detection method based on cornersort tracking algorithm
Andrade et al. Characterisation of optical flow anomalies in pedestrian traffic
Xia et al. Anomaly detection in traffic surveillance with sparse topic model
Alsaadi et al. An automated mammals detection based on SSD-mobile net
Hao et al. Human behavior analysis based on attention mechanism and LSTM neural network
Patel et al. Vision Based Real-time Recognition of Hand Gestures for Indian Sign Language using Histogram of Oriented Gradients Features.
CN112487920B (en) Convolution neural network-based crossing behavior identification method
Sahoo et al. An Improved VGG-19 Network Induced Enhanced Feature Pooling for Precise Moving Object Detection in Complex Video Scenes
Nabi et al. Abnormal event recognition in crowd environments
Katti et al. Character and Word Level Gesture Recognition of Indian Sign Language

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 266200 Household No. 8, Qingda Third Road, Laoshan District, Qingdao City, Shandong Province

Applicant after: QINGDAO LIANHE CHUANGZHI TECHNOLOGY Co.,Ltd.

Address before: Room 1204, No. 40, Hong Kong Middle Road, Shinan District, Qingdao, Shandong 266200

Applicant before: QINGDAO LIANHE CHUANGZHI TECHNOLOGY Co.,Ltd.

GR01 Patent grant
GR01 Patent grant