CN108108688B - Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling - Google Patents

Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling

Info

Publication number
CN108108688B
CN108108688B (application number CN201711366304.XA)
Authority
CN
China
Prior art keywords
video
pixel
foreground
motion
corpus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711366304.XA
Other languages
Chinese (zh)
Other versions
CN108108688A (en)
Inventor
纪刚
周粉粉
周萌萌
安帅
商胜楠
于腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Lianhe Chuangzhi Technology Co ltd
Original Assignee
Qingdao Lianhe Chuangzhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Lianhe Chuangzhi Technology Co ltd filed Critical Qingdao Lianhe Chuangzhi Technology Co ltd
Priority to CN201711366304.XA priority Critical patent/CN108108688B/en
Publication of CN108108688A publication Critical patent/CN108108688A/en
Application granted granted Critical
Publication of CN108108688B publication Critical patent/CN108108688B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/28 Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/254 Analysis of motion involving subtraction of images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/269 Analysis of motion using gradient-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20224 Image subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30232 Surveillance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/44 Event detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of video monitoring and relates to a limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling. The method comprises defining a word book, quantizing the pixel positions of objects, describing the size of foreground targets in the scene, determining the motion of foreground pixels, completing the construction of the word book and of a corpus through these steps, and judging limb conflict behavior from the resulting computation. The method combines low-dimensional data feature representation with model-based complex-scene analysis: it learns an overall motion model, independent of specific body parts, from the change of human body position information during motion, compares the detected result with the parameters of the model by analyzing the overall motion model, and thereby judges the motion state of the human body. The method has an ingenious design concept, a scientific detection principle, a simple detection mode, high detection accuracy, and a broad market prospect.

Description

Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling
Technical field:
The invention belongs to the technical field of video monitoring, relates to a limb conflict behavior detection method, and in particular relates to a limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling.
Background art:
In recent years, with the increase of safety emergencies of all kinds, the safety awareness of the public has improved; at the same time, with the spread of the artificial intelligence concept and the continuous maturing of artificial intelligence technology, intelligent monitoring has attracted more and more attention. The traditional monitoring system mainly realizes the safety management of public places through manual monitoring and lacks real-time capability and initiative. In many cases video surveillance does not actually play a supervisory role, because unattended cameras serve only as video backup. In addition, with the popularization and wide deployment of monitoring cameras, the traditional manual monitoring mode can no longer meet the requirements of modern monitoring, and efforts are being made to find solutions that replace manual operation. At present, with the continuous development of video monitoring technology and information science, the fields of video monitoring, human-computer interaction and video search have developed greatly, and automatic monitoring has gradually become a research subject with broad application prospects. Abnormal behavior detection is an important part of automatic monitoring; compared with general human behavior recognition, which focuses on recognizing routine actions, abnormal behaviors are highly sudden, short in duration, and their behavior characteristics are difficult to acquire.
In recent years researchers have proposed different methods for detecting abnormal behaviors. Early research on abnormal behavior detection mainly focused on describing human body behavior with simple geometric models, such as models based on two-dimensional contours or three-dimensional cylinders. Besides static geometric models, researchers have tried to describe and distinguish behaviors with features of human motion such as shape, angle, position, motion speed, motion direction and motion trajectory, and to reduce the dimension of and screen the extracted features with subspace methods including principal component analysis and independent component analysis, so as to perform behavior analysis. Existing inventions aimed at abnormal behavior detection share the inherent limitation that the abnormal behaviors cannot be truly understood, so the existing abnormal behavior detection models cannot fully reflect the essence of abnormal behaviors, and the detection precision obtained with them does not reach the ideal effect.
Summary of the invention:
The purpose of the invention is to overcome the defects in the prior art and to design a limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling which has a simple calculation mode and high calculation precision, can detect limb conflict behavior quickly and accurately, and can give an early warning in time.
In order to achieve the above object, the method for detecting the limb conflict behavior based on the low-dimensional space-time feature extraction and the topic modeling specifically comprises the following steps:
S1, definition of the word book
First, semantic understanding that accords with human cognition is extracted from the original surveillance video data, and the video data are automatically analyzed and understood through algorithm design; the analysis process is divided into extraction of the foreground target, target feature representation, and behavior analysis and classification. The method detects abnormal human behaviors in video surveillance based on the LDMA model: the pixel position of each object in the video is described, and a feature vector is extracted for each pixel, comprising the position, moving speed and moving direction of the pixel and the size of the target the pixel belongs to. A visual-information word book and documents are finally formed, and an effective word book is defined, serving as a queryable dictionary covering the pixels in the surveillance video;
S2, quantizing the pixel positions of objects
In video obtained by video monitoring, a behavior is fundamentally characterized by the position of its actor; therefore the invention takes the position information into account when constructing the word book. The pixel positions of objects in the video are quantized into non-overlapping 10 × 10 cell elements, so that for an M × N video frame, M/10 × N/10 cell elements are obtained;
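As an illustrative sketch only, assuming an M × N frame indexed by (x, y) pixel coordinates and the 10-pixel cell size described above, the cell of a pixel and the total number of cells can be computed as:

    def cell_index(x, y, cell_size=10):
        """Map a pixel coordinate (x, y) to its (row, column) cell index."""
        return y // cell_size, x // cell_size

    def num_cells(M, N, cell_size=10):
        """Number of non-overlapping cells in an M x N frame: (M/10) * (N/10)."""
        return (M // cell_size) * (N // cell_size)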
S3, describing the size of foreground targets in the scene
In order to represent the foreground targets in the video accurately, each foreground pixel is associated with the foreground target it belongs to. In video data obtained by video monitoring, the observed foreground boxes can be divided into two types according to their size: small foreground boxes, which are mainly pedestrians, and large foreground boxes, which mainly contain vehicles or groups of pedestrians. The method therefore uses K-means clustering to classify the sizes of the foreground boxes and obtain the foreground target to which each pixel belongs; the number of clusters K in the K-means is set to 2, and cluster labels 1 and 2 are finally used to describe the target sizes in the scene, i.e., 1 is a small target and 2 is a large target;
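An illustrative sketch of this size-clustering step, assuming scikit-learn is available and that foreground bounding boxes (width, height) have already been produced by the foreground detector; the function and variable names are illustrative, not taken from the patent:

    import numpy as np
    from sklearn.cluster import KMeans

    def label_foreground_sizes(boxes):
        """boxes: iterable of (w, h) foreground bounding boxes.
        Returns one size label per box: 1 = small target, 2 = large target."""
        areas = np.array([[w * h] for (w, h) in boxes], dtype=float)
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(areas)
        # Map clusters so that label 1 is the cluster with the smaller mean area.
        order = np.argsort(km.cluster_centers_.ravel())
        remap = {order[0]: 1, order[1]: 2}
        return np.array([remap[c] for c in km.labels_])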
S4, determining the motion of foreground pixels
For a scene in video monitoring, the analyzed content is aimed at the foreground target, so background subtraction is first performed to obtain the foreground pixels. The optical flow of each foreground pixel is then solved with the Lucas-Kanade optical flow algorithm, and static foreground pixels (static label) and dynamic pixels are distinguished by setting a threshold on the magnitude of the optical flow vector. The dynamic pixels are then quantized into motion states described by 4 motion descriptors, namely motion direction, trajectory, position and speed, so that, together with the static label, there are 5 possible motion descriptors that determine the motion of a detected foreground pixel;
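One possible sketch of this step with OpenCV, assuming MOG2 background subtraction and sparse Lucas-Kanade flow evaluated at the detected foreground pixels; the flow-magnitude threshold and the four-way direction quantization are assumptions, since the text does not fix numeric values for the motion descriptors:

    import cv2
    import numpy as np

    bg = cv2.createBackgroundSubtractorMOG2()  # background subtraction

    def motion_labels(prev_gray, gray, flow_threshold=1.0):
        """Return (points, labels): 0 = static pixel, 1..4 = quantized motion direction."""
        fg_mask = bg.apply(gray)
        ys, xs = np.nonzero(fg_mask > 0)
        pts = np.stack([xs, ys], axis=1).astype(np.float32).reshape(-1, 1, 2)
        if len(pts) == 0:
            return pts, np.array([], dtype=int)
        nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        flow = (nxt - pts).reshape(-1, 2)
        mag = np.linalg.norm(flow, axis=1)
        ang = np.arctan2(flow[:, 1], flow[:, 0])
        direction = ((ang + np.pi) / (np.pi / 2)).astype(int) % 4 + 1  # four direction bins
        labels = np.where(mag < flow_threshold, 0, direction)
        return pts, labels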
S5, defining the video sequence and pixel points
Denote by V the video sequence recorded in the monitored scene, and divide V into a number of video segments (v_1, v_2, ..., v_M), where v_m is the m-th segmented video segment. The set of segments is regarded as the current corpus W, so each segment v_m corresponds to a document in the corpus. Within a video segment v_m, the pixel points are defined as words, and each word corresponds to a topic; as time t changes within v_m, the topic of each word either transitions to another topic or transitions to itself, and from the properties of MCMC (Markov Chain Monte Carlo) it is known that this process reaches a stationary distribution after a period of time;
S6, establishing the word book
According to the above steps, the position of each pixel of an M × N video frame has M/10 × N/10 possible expressions, the motion form has 5 descriptions, and the large and small targets give two expressions, so a word can take M/10 × N/10 × 5 × 2 forms; that is, a given foreground pixel has that many possible descriptions. At a given moment the motion information of a pixel and the target it belongs to are independent of each other, i.e., for a video segment the different topics formed as time t changes should be acquired independently and separately; therefore each location (cell) is represented by the joint feature (motion, size). The motion and size features are concatenated and used as the word set of each cell element, denoted V_c; that is, when a video segment is constructed, a pixel contributes two feature words to its position, one for its motion and one for the size of the target it belongs to, so the final word can be represented in M/10 × N/10 × (5 + 2) forms. Thus the feature word of a pixel is defined as w_{c,a}, where c is the cell position and a is the joint motion-and-size feature;
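A sketch of one way to encode such feature words as integer vocabulary indices, consistent with the M/10 × N/10 × (5 + 2) form above; the particular encoding is an assumption made for illustration:

    def word_id(cell_row, cell_col, feature, N, cell_size=10, n_features=7):
        """Encode a (cell, feature) pair as a single vocabulary index w_{c,a}.

        feature: 0..4 for the five motion labels (static plus four dynamic),
                 5..6 for the two size labels (small / large target),
        so each foreground pixel contributes two words to its cell, one for
        motion and one for size, giving an (M/10 x N/10) x (5 + 2) vocabulary."""
        cells_per_row = N // cell_size
        cell = cell_row * cells_per_row + cell_col
        return cell * n_features + feature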
S7, establishment of the corpus
The surveillance video is divided into a number of short video segments, each video segment is taken as a document, and the pixel points in a segment, changing with time t, are expressed as the words appearing in that document and the topic content expressed by this series of words. Then, taking the word book generated for each pixel as the basis, let the total word count of the corpus be N; if, among all N words, each distinct word v_i occurs with frequency n_i, then the word-count vector is n = (n_1, n_2, ..., n_V) with

∑_i n_i = N,

and the probability of the whole corpus is:

p(W) = ∏_i p(v_i)^{n_i},

where p(v_i) refers to the probability with which each word occurs in the corpus;
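In log form, the corpus probability above can be evaluated directly from the word counts and word probabilities; a minimal sketch:

    import numpy as np

    def corpus_log_prob(word_counts, word_probs):
        """log p(W) = sum_i n_i * log p(v_i) for the corpus model above."""
        word_counts = np.asarray(word_counts, dtype=float)
        word_probs = np.asarray(word_probs, dtype=float)
        return float(np.sum(word_counts * np.log(word_probs)))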
Then, for each specific topic z and the probabilities p(w | z) with which that topic generates the words of the corpus, the probability of generating the final corpus is the accumulated sum, over every topic z, of the word probabilities generated above:

p(W) = ∑_z p(z) ∏_i p(w_i | z);
In the corpus W, the word-count vector n obeys the multinomial distribution, and the topics obey a probability distribution governed by the parameter vector p; as the prior distribution of the parameter p, the conjugate distribution of the multinomial distribution is selected, namely the Dirichlet distribution Dir(p | α). According to the rules of the Dirichlet distribution, the probability of generating the text corpus is calculated as:

p(W | α) = ∫ p(W | p) Dir(p | α) dp,

where α represents the parameter of the Dirichlet prior distribution, and the text corpus is the corpus composed of the documents (the video segments).
Figure GDA0003306343670000053
Regarding a video sequence as a document (document), the document is formed by mixing a plurality of topics (topics), each Topic is probability distribution on words, each word represented by each pixel in the video sequence is generated by a fixed Topic, and the process is a document modeling process, namely a bag-of-words model: if there are T topoc-words, it is recorded as
Figure GDA0003306343670000054
Probability distribution of one word vector for each topic
Figure GDA0003306343670000055
For corpus C ═ d containing M documents1,d2,···,dM) Each document d in (1)mThere will be a specific doc-topic
Figure GDA0003306343670000056
That is, each document corresponds to a topic vector probability distribution of
Figure GDA0003306343670000057
Then the mth document dmThe generation probability of each word in (1) is:
Figure GDA0003306343670000058
the generation probability of the whole document is:
Figure GDA0003306343670000059
because the documents are mutually independent, the generation probability of the whole corpus is written according to the formula to generate a Topic-Model, and then the local optimal solution is solved by using an EM algorithm;
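A sketch of fitting such a topic model to the document-word count matrix built from the video segments; scikit-learn's LatentDirichletAllocation (batch variational EM) is used here as a stand-in for the EM-based topic model described above rather than as the patent's own implementation, and the number of topics and prior values are assumptions:

    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation

    def fit_topic_model(doc_word_counts, n_topics=20, alpha=0.5, eta=0.1):
        """doc_word_counts: (num_segments x vocabulary_size) word-count matrix,
        one row per video segment (document). Returns the fitted model and the
        per-document topic distributions theta (each row sums to 1)."""
        lda = LatentDirichletAllocation(
            n_components=n_topics,
            doc_topic_prior=alpha,    # Dirichlet prior on the doc-topic distribution
            topic_word_prior=eta,     # Dirichlet prior on the topic-word distribution
            learning_method="batch",  # batch variational EM
            random_state=0,
        )
        doc_topic = lda.fit_transform(doc_word_counts)
        theta = doc_topic / doc_topic.sum(axis=1, keepdims=True)
        return lda, theta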
S8, judgment of limb conflict behavior
The limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling combines low-dimensional data feature representation with model-based complex-scene analysis to analyze the video sequence: the position of the human body is detected in the video, an overall motion model independent of specific body parts is learned from the change of human body position information during motion, the detected result is compared with the parameters of the model by analyzing the overall motion model, and the motion state of the human body is thereby judged.
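The comparison against the trained model is not spelled out numerically in the text; one plausible reading, shown purely as an assumption, is to flag a tested segment whose inferred topic distribution places enough probability mass on topics that were associated with limb-conflict segments during training:

    import numpy as np

    def is_limb_conflict(theta_test, conflict_topics, threshold=0.5):
        """theta_test: topic distribution of one tested video segment.
        conflict_topics: indices of topics dominated by limb-conflict training
        segments (an assumed convention; the text does not fix this rule)."""
        return float(np.sum(theta_test[conflict_topics])) >= threshold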
Compared with the prior art, the invention has the following beneficial effects: the spectral characteristics of the image are used to accurately extract the contour of the motion region, so that the contour edge of the moving target can be seen clearly for behavior-feature analysis; the method is suitable not only for limb conflict behaviors such as fighting, but also for detecting other behaviors such as rapid movement; the design concept is ingenious, the detection principle scientific, the detection mode simple, the detection accuracy high and the application environment friendly, so the method has a broad market prospect.
Description of the drawings:
Fig. 1 is a diagram illustrating the foreground detection effect on different video frames of a video stream according to the present invention.
FIG. 2 is a process flow diagram of a method for detecting a limb conflict behavior based on low-dimensional spatiotemporal feature extraction and topic modeling according to the present invention.
Detailed description of embodiments:
the invention is further illustrated by the following examples in conjunction with the accompanying drawings.
Example:
In order to achieve the above object, the method for detecting limb conflict behavior based on low-dimensional spatio-temporal feature extraction and topic modeling specifically includes the following steps:
S1, definition of the word book
First, semantic understanding that accords with human cognition is extracted from the original surveillance video data, and the video data are automatically analyzed and understood through the algorithm design of this embodiment; the analysis process is divided into extraction of the foreground target, target feature representation, and behavior analysis and classification. The method detects abnormal human behaviors in video surveillance based on the LDMA model: the pixel position of each object in the video is described, and a feature vector is extracted for each pixel, comprising the position, moving speed and moving direction of the pixel and the size of the target the pixel belongs to. A visual-information word book and documents are finally formed, and an effective word book is defined, serving as a queryable dictionary covering the pixels in the surveillance video;
S2, quantizing the pixel positions of objects
In video obtained by video monitoring, a behavior is fundamentally characterized by the position of its actor; therefore, in this embodiment, the position information is taken into account when constructing the word book. The pixel positions of objects in the video are quantized into non-overlapping 10 × 10 cell elements, so that for an M × N video frame, M/10 × N/10 cell elements are obtained;
S3, describing the size of foreground targets in the scene
In order to represent the foreground targets in the video accurately, each foreground pixel is associated with the foreground target it belongs to. In video data obtained by video monitoring, the observed foreground boxes can be divided into two types according to their size: small foreground boxes, which are mainly pedestrians, and large foreground boxes, which mainly contain vehicles or groups of pedestrians. In this embodiment, therefore, the sizes of the foreground boxes are classified by K-means clustering to obtain the foreground target to which each pixel belongs; the number of clusters K in the K-means is set to 2, and cluster labels 1 and 2 are finally used to describe the target sizes in the scene, i.e., 1 is a small target and 2 is a large target;
S4, determining the motion of foreground pixels
For a scene in video monitoring, the analyzed content is aimed at the foreground target, so background subtraction is first performed to obtain the foreground pixels. The optical flow of each foreground pixel is then solved with the Lucas-Kanade optical flow algorithm, and static foreground pixels (static label) and dynamic pixels are distinguished by setting a threshold on the magnitude of the optical flow vector. The dynamic pixels are then quantized into motion states described by 4 motion descriptors, namely motion direction, trajectory, position and speed, so that, together with the static label, there are 5 possible motion descriptors that determine the motion of a detected foreground pixel;
S5, defining the video sequence and pixel points
Denote by V the video sequence recorded in the monitored scene, and divide V into a number of video segments (v_1, v_2, ..., v_M), where v_m is the m-th segmented video segment. The set of segments is regarded as the current corpus W, so each segment v_m corresponds to a document in the corpus. Within a video segment v_m, the pixel points are defined as words, and each word corresponds to a topic; as time t changes within v_m, the topic of each word either transitions to another topic or transitions to itself, and from the properties of MCMC (Markov Chain Monte Carlo) it is known that this process reaches a stationary distribution after a period of time;
S6, establishing the word book
According to the above steps, the position of each pixel of an M × N video frame has M/10 × N/10 possible expressions, the motion form has 5 descriptions, and the large and small targets give two expressions, so a word can take M/10 × N/10 × 5 × 2 forms; that is, a given foreground pixel has that many possible descriptions. At a given moment the motion information of a pixel and the target it belongs to are independent of each other, i.e., for a video segment the different topics formed as time t changes should be acquired independently and separately; therefore each location (cell) is represented by the joint feature (motion, size). The motion and size features are concatenated and used as the word set of each cell element, denoted V_c; that is, when a video segment is constructed, a pixel contributes two feature words to its position, one for its motion and one for the size of the target it belongs to, so the final word can be represented in M/10 × N/10 × (5 + 2) forms. Thus the feature word of a pixel is defined as w_{c,a}, where c is the cell position and a is the joint motion-and-size feature;
S7, establishment of the corpus
The surveillance video is divided into a number of short video segments, each video segment is taken as a document, and the pixel points in a segment, changing with time t, are expressed as the words appearing in that document and the topic content expressed by this series of words. Then, taking the word book generated for each pixel as the basis, let the total word count of the corpus be N; if, among all N words, each distinct word v_i occurs with frequency n_i, then the word-count vector is n = (n_1, n_2, ..., n_V) with

∑_i n_i = N,

and the probability of the whole corpus is:

p(W) = ∏_i p(v_i)^{n_i},

where p(v_i) refers to the probability with which each word occurs in the corpus.
Then, for each specific topic z and the probabilities p(w | z) with which that topic generates the words of the corpus, the probability of generating the final corpus is the accumulated sum, over every topic z, of the word probabilities generated above:

p(W) = ∑_z p(z) ∏_i p(w_i | z).

In the corpus W, the word-count vector n obeys the multinomial distribution, and the topics obey a probability distribution governed by the parameter vector p; as the prior distribution of the parameter p, the conjugate distribution of the multinomial distribution is selected, namely the Dirichlet distribution Dir(p | α). According to the rules of the Dirichlet distribution, the probability of generating the text corpus is calculated as:

p(W | α) = ∫ p(W | p) Dir(p | α) dp,

where α represents the parameter of the Dirichlet prior distribution, and the text corpus is the corpus composed of the documents (the video segments).
A video sequence is regarded as a document, and the document is formed as a mixture of several topics, each topic being a probability distribution over words; every word represented by a pixel in the video sequence is generated by some fixed topic. This process is the document modeling process, i.e., the bag-of-words model. If there are T topic-word distributions, they are recorded as φ_1, ..., φ_T, each topic being a probability distribution over the word vocabulary. For a corpus C = (d_1, d_2, ..., d_M) containing M documents, each document d_m has a specific doc-topic distribution θ_m, i.e., each document corresponds to a probability distribution over the topic vector. Then the generation probability of each word w in the m-th document d_m is:

p(w | d_m) = ∑_{z=1}^{T} p(w | z) p(z | d_m) = ∑_{z=1}^{T} φ_{z,w} θ_{m,z},

and the generation probability of the whole document is:

p(d_m) = ∏_{i=1}^{N_m} ∑_{z=1}^{T} φ_{z,w_i} θ_{m,z},

where N_m is the number of words in d_m. Because the documents are mutually independent, the generation probability of the whole corpus is written as the product of the document probabilities according to the formula above; this produces the topic model, and the EM algorithm is then used to solve for a locally optimal solution;
S8, judgment of limb conflict behavior
The limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling combines low-dimensional data feature representation with model-based complex-scene analysis to analyze the video sequence: the position of the human body is detected in the video, an overall motion model independent of specific body parts is learned from the change of human body position information during motion, the detected result is compared with the parameters of the model by analyzing the overall motion model, and the motion state of the human body is thereby judged.

Claims (1)

1. A limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling is characterized by comprising the following steps:
S1, definition of the word book
First, semantic understanding that accords with human cognition is extracted from the original surveillance video data, and the video data are automatically analyzed and understood through algorithm design; the analysis process is divided into extraction of the foreground target, target feature representation, and behavior analysis and classification. The method detects abnormal human behaviors in video surveillance based on the LDMA model: the pixel position of each object in the video is described, and a feature vector is extracted for each pixel, comprising the position, moving speed and moving direction of the pixel and the size of the target the pixel belongs to. A visual-information word book and documents are finally formed, and an effective word book is defined, serving as a queryable dictionary covering the pixels in the surveillance video;
S2, quantizing the pixel positions of objects
In video obtained by video monitoring, a behavior is characterized by the position of its actor, so the position information is taken into account in the construction of the word book: the pixel positions of objects in the video are quantized into non-overlapping 10 × 10 cell elements, and for an M × N video frame, M/10 × N/10 cell elements are obtained;
S3, describing the size of foreground targets in the scene
In order to represent the foreground targets in the video accurately, each foreground pixel is associated with the foreground target it belongs to. In video data obtained by video monitoring, the observed foreground boxes can be divided into two types according to their size: small foreground boxes, which are pedestrians, and large foreground boxes, which contain vehicles or groups of pedestrians. K-means clustering is therefore used to classify the sizes of the foreground boxes and obtain the foreground target to which each pixel belongs; the number of clusters K in the K-means is set to 2, and cluster labels 1 and 2 are finally used to describe the target sizes in the scene, i.e., 1 is a small target and 2 is a large target;
S4, determining the motion of foreground pixels
For a scene in video monitoring, the analyzed content is aimed at the foreground target, so background subtraction is first performed to obtain the foreground pixels. The optical flow of each foreground pixel is then solved with the Lucas-Kanade optical flow algorithm, and static foreground pixels and dynamic pixels are distinguished by setting a threshold on the magnitude of the optical flow vector. The dynamic pixels are then quantized into motion states described by 4 motion descriptors, namely motion direction, trajectory, position and speed, so that, together with the static label, there are 5 possible motion descriptors that determine the motion of a detected foreground pixel;
S5, defining the video sequence and pixel points
Denote by V the video sequence recorded in the monitored scene, and divide V into a number of video segments (v_1, v_2, ..., v_M), where v_m is the m-th segmented video segment. The set of segments is regarded as the current corpus W, so each segment v_m corresponds to a document in the corpus. Within a video segment v_m, the pixel points are defined as words, and each word corresponds to a topic; as time t changes within v_m, the topic of each word either transitions to another topic or transitions to itself, and from the properties of Markov Chain Monte Carlo it is known that this process reaches a stationary distribution after a period of time;
S6, establishing the word book
According to the above steps, the position of each pixel of an M × N video frame has M/10 × N/10 possible expressions, the motion form has 5 descriptions, and the large and small targets give two expressions, so a word can take M/10 × N/10 × 5 × 2 forms; that is, a given foreground pixel has that many possible descriptions. At a given moment the motion information of a pixel and the target it belongs to are independent of each other, i.e., for a video segment the different topics formed as time t changes are acquired independently and separately; therefore each position is represented by the joint feature (motion, size). The motion and size features are concatenated and used as the word set of each cell element, denoted V_c; that is, when a video segment is constructed, a pixel contributes two feature words to its position, one for its motion and one for the size of the target it belongs to, so the final word can be represented in M/10 × N/10 × (5 + 2) forms. Thus the feature word of a pixel is defined as w_{c,a}, where c is the cell position and a is the joint motion-and-size feature;
S7, establishment of the corpus
The surveillance video is divided into a number of short video segments, each video segment is taken as a document, and the pixel points in a segment, changing with time t, are expressed as the words appearing in that document and the topic content expressed by this series of words. Then, taking the word book generated for each pixel as the basis, let the total word count of the corpus be N; if, among all N words, each distinct word v_i occurs with frequency n_i, then the word-count vector is n = (n_1, n_2, ..., n_V) with

∑_i n_i = N,

and the probability of the whole corpus is:

p(W) = ∏_i p(v_i)^{n_i},

where p(v_i) refers to the probability with which each word occurs in the corpus.
Then, for each specific topic z and the probabilities p(w | z) with which that topic generates the words of the corpus, the probability of generating the final corpus is the accumulated sum, over every topic z, of the word probabilities generated above:

p(W) = ∑_z p(z) ∏_i p(w_i | z).

In the corpus W, the word-count vector n obeys the multinomial distribution, and the topics obey a probability distribution governed by the parameter vector p; as the prior distribution of the parameter p, the conjugate distribution of the multinomial distribution is selected, namely the Dirichlet distribution Dir(p | α). According to the rules of the Dirichlet distribution, the probability of generating the text corpus is calculated as:

p(W | α) = ∫ p(W | p) Dir(p | α) dp,

where α represents the parameter of the Dirichlet prior distribution, and the text corpus is the corpus composed of the documents (the video segments).
A video sequence is regarded as a document, and the document is formed as a mixture of several topics, each topic being a probability distribution over words; every word represented by a pixel in the video sequence is generated by some fixed topic. This process is the document modeling process, i.e., the bag-of-words model. If there are T topic-word distributions, they are recorded as φ_1, ..., φ_T, each topic being a probability distribution over the word vocabulary. For a corpus C = (d_1, d_2, ..., d_M) containing M documents, each document d_m has a specific document-topic distribution θ_m, i.e., each document corresponds to a probability distribution over the topic vector. Then the generation probability of each word w in the m-th document d_m is:

p(w | d_m) = ∑_{z=1}^{T} p(w | z) p(z | d_m) = ∑_{z=1}^{T} φ_{z,w} θ_{m,z},

and the generation probability of the whole document is:

p(d_m) = ∏_{i=1}^{N_m} ∑_{z=1}^{T} φ_{z,w_i} θ_{m,z},

where N_m is the number of words in d_m. Because the documents are mutually independent, the generation probability of the whole corpus is written as the product of the document probabilities according to the formula above; this produces the topic model, and the EM algorithm is then used to solve for a locally optimal solution;
S8, judgment of limb conflict behavior
The limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling combines low-dimensional data feature representation with model-based complex-scene analysis to analyze the video sequence: the position of the human body is detected in the video, an overall motion model independent of specific body parts is learned from the change of human body position information during actions, the detected result is compared with the parameters of the model by analyzing the overall motion model, and the motion state of the human body is thereby judged; each behavior corresponds to a topic distribution, and under the trained model, if a limb conflict occurs in a tested video segment, the corresponding behaviors are concentrated in particular topics, and the behavior is determined to belong to limb conflict according to those topics.
CN201711366304.XA 2017-12-18 2017-12-18 Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling Active CN108108688B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711366304.XA CN108108688B (en) 2017-12-18 2017-12-18 Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711366304.XA CN108108688B (en) 2017-12-18 2017-12-18 Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling

Publications (2)

Publication Number Publication Date
CN108108688A CN108108688A (en) 2018-06-01
CN108108688B true CN108108688B (en) 2021-11-23

Family

ID=62209950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711366304.XA Active CN108108688B (en) 2017-12-18 2017-12-18 Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling

Country Status (1)

Country Link
CN (1) CN108108688B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109242826B (en) * 2018-08-07 2022-02-22 高龑 Mobile equipment end stick-shaped object root counting method and system based on target detection
CN110659363B (en) * 2019-07-30 2021-11-23 浙江工业大学 Web service mixed evolution clustering method based on membrane computing
CN111160170B (en) * 2019-12-19 2023-04-21 青岛联合创智科技有限公司 Self-learning human behavior recognition and anomaly detection method
CN113705274B (en) * 2020-05-20 2023-09-05 杭州海康威视数字技术股份有限公司 Climbing behavior detection method and device, electronic equipment and storage medium
CN111707375B (en) * 2020-06-10 2021-07-09 青岛联合创智科技有限公司 Electronic class card with intelligent temperature measurement attendance and abnormal behavior detection functions
CN117372969B (en) * 2023-12-08 2024-05-10 暗物智能科技(广州)有限公司 Monitoring scene-oriented abnormal event detection method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530603A (en) * 2013-09-24 2014-01-22 杭州电子科技大学 Video abnormality detection method based on causal loop diagram model
CN103995915A (en) * 2014-03-21 2014-08-20 中山大学 Crowd evacuation simulation system based on composite potential energy field
CN104268546A (en) * 2014-05-28 2015-01-07 苏州大学 Dynamic scene classification method based on topic model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268495B (en) * 2013-05-31 2016-08-17 公安部第三研究所 Human body behavior modeling recognition methods based on priori knowledge cluster in computer system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530603A (en) * 2013-09-24 2014-01-22 杭州电子科技大学 Video abnormality detection method based on causal loop diagram model
CN103995915A (en) * 2014-03-21 2014-08-20 中山大学 Crowd evacuation simulation system based on composite potential energy field
CN104268546A (en) * 2014-05-28 2015-01-07 苏州大学 Dynamic scene classification method based on topic model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Markov Clustering Topic Model for Mining Behaviour in Video; Timothy Hospedales et al.; 2009 IEEE 12th International Conference on Computer Vision; 2009-12-31; pp. 1165-1172 *
Application of topic models to video abnormal behavior detection; 赵靓 et al.; Computer Science; 2012-09-30; Vol. 39, No. 9; full text *
Research on topic semantic feature extraction and behavior analysis of crowd motion; 黄鲜萍; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2016-07-15; No. 07; pp. 13-14, 44-70 *
Pedestrian abnormal behavior recognition based on trajectory analysis; 胡瑗 et al.; Computer Engineering and Science; 2017-11-30; Vol. 39, No. 11; full text *

Also Published As

Publication number Publication date
CN108108688A (en) 2018-06-01

Similar Documents

Publication Publication Date Title
CN108108688B (en) Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling
Yan et al. Abnormal event detection from videos using a two-stream recurrent variational autoencoder
Adithya et al. Artificial neural network based method for Indian sign language recognition
Castellano et al. Crowd detection in aerial images using spatial graphs and fully-convolutional neural networks
CN105787458A (en) Infrared behavior identification method based on adaptive fusion of artificial design feature and depth learning feature
CN104281853A (en) Behavior identification method based on 3D convolution neural network
CN111738218B (en) Human body abnormal behavior recognition system and method
Reshna et al. Spotting and recognition of hand gesture for Indian sign language recognition system with skin segmentation and SVM
Rabiee et al. Crowd behavior representation: an attribute-based approach
Lu et al. Multi-object detection method based on YOLO and ResNet hybrid networks
Intwala et al. Indian sign language converter using convolutional neural networks
CN103500456A (en) Object tracking method and equipment based on dynamic Bayes model network
Castellano et al. Crowd counting from unmanned aerial vehicles with fully-convolutional neural networks
Qin et al. Application of video scene semantic recognition technology in smart video
Ramzan et al. Automatic Unusual Activities Recognition Using Deep Learning in Academia.
CN115798055B (en) Violent behavior detection method based on cornersort tracking algorithm
Andrade et al. Characterisation of optical flow anomalies in pedestrian traffic
Xia et al. Anomaly detection in traffic surveillance with sparse topic model
Alsaadi et al. An automated mammals detection based on SSD-mobile net
Hao et al. Human behavior analysis based on attention mechanism and LSTM neural network
Patel et al. Vision Based Real-time Recognition of Hand Gestures for Indian Sign Language using Histogram of Oriented Gradients Features.
CN112487920B (en) Convolution neural network-based crossing behavior identification method
Sahoo et al. An Improved VGG-19 Network Induced Enhanced Feature Pooling for Precise Moving Object Detection in Complex Video Scenes
Nabi et al. Abnormal event recognition in crowd environments
Katti et al. Character and Word Level Gesture Recognition of Indian Sign Language

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 266200 Household No. 8, Qingda Third Road, Laoshan District, Qingdao City, Shandong Province

Applicant after: QINGDAO LIANHE CHUANGZHI TECHNOLOGY Co.,Ltd.

Address before: Room 1204, No. 40, Hong Kong Middle Road, Shinan District, Qingdao, Shandong 266200

Applicant before: QINGDAO LIANHE CHUANGZHI TECHNOLOGY Co.,Ltd.

GR01 Patent grant
GR01 Patent grant