CN108108688B - Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling - Google Patents
- Publication number
- CN108108688B (grant), CN201711366304.XA / CN201711366304A (application)
- Authority
- CN
- China
- Prior art keywords
- video
- pixel
- foreground
- motion
- corpus
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000001514 detection method Methods 0.000 title claims abstract description 23
- 238000000605 extraction Methods 0.000 title claims abstract description 15
- 238000012544 monitoring process Methods 0.000 claims abstract description 36
- 238000000034 method Methods 0.000 claims abstract description 35
- 238000004458 analytical method Methods 0.000 claims abstract description 12
- 230000008859 change Effects 0.000 claims abstract description 10
- 238000013461 design Methods 0.000 claims abstract description 6
- 230000006399 behavior Effects 0.000 claims description 35
- 230000014509 gene expression Effects 0.000 claims description 15
- 206010000117 Abnormal behaviour Diseases 0.000 claims description 12
- 230000008569 process Effects 0.000 claims description 10
- 230000003287 optical effect Effects 0.000 claims description 9
- 230000003068 static effect Effects 0.000 claims description 6
- 230000019771 cognition Effects 0.000 claims description 3
- 238000010276 construction Methods 0.000 claims description 3
- 238000003064 k means clustering Methods 0.000 claims description 3
- 230000007704 transition Effects 0.000 claims description 3
- 230000000007 visual effect Effects 0.000 claims description 3
- 238000004364 calculation method Methods 0.000 abstract description 3
- 238000013473 artificial intelligence Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000004069 differentiation Effects 0.000 description 1
- 238000012880 independent component analysis Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000035515 penetration Effects 0.000 description 1
- 238000012847 principal component analysis method Methods 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 238000012216 screening Methods 0.000 description 1
- 230000003595 spectral effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/28—Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/254—Analysis of motion involving subtraction of images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/269—Analysis of motion using gradient-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20224—Image subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30232—Surveillance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/44—Event detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Probability & Statistics with Applications (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The invention belongs to the technical field of video monitoring and relates to a limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling. The method comprises the steps of defining a word book, quantizing the pixel positions of objects, describing the size of foreground targets in the scene, and determining the motion of foreground pixels; through these steps the complete word book and the corpus are established, and limb conflict behavior is judged from the resulting model. The method combines low-dimensional data feature representation with model-based complex scene analysis: it learns an overall motion model independent of body parts from the change of human body position information during motion, compares the detected results with the parameters in the model by analyzing this overall motion model, and thereby judges the motion state of the human body. The method has an ingenious design concept, a scientific detection principle, a simple detection mode, high detection accuracy, and a broad market prospect.
Description
Technical field:
the invention belongs to the technical field of video monitoring, relates to a limb conflict behavior detection method, and particularly relates to a limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling.
Background art:
In recent years, with the increase of safety emergencies of various kinds, public safety awareness has grown; at the same time, with the spread of artificial intelligence concepts and the continuous maturation of artificial intelligence technology, intelligent monitoring has attracted more and more attention. Traditional monitoring systems realize the safety management of public places mainly through manual monitoring and therefore lack real-time performance and initiative. In many cases video surveillance does not actually play a supervisory role, because unattended cameras serve only as video backup. In addition, with the popularization and wide deployment of monitoring cameras, the traditional manual monitoring mode can no longer meet modern monitoring requirements, and solutions that replace manual operation are being actively sought. At present, with the continuous development of video monitoring technology and information science, fields such as video monitoring, human-computer interaction and video search have developed greatly, and automatic monitoring has gradually become a research subject with broad application prospects. Abnormal behavior detection is an important part of automatic monitoring; compared with general human behavior recognition, which focuses on recognizing people's routine actions, abnormal behaviors are highly sudden, short in duration, and their behavior features are difficult to acquire.
In recent years researchers have proposed different methods for detecting abnormal behaviors. Early work focused on describing human behavior with simple geometric models, such as models based on two-dimensional contours or three-dimensional cylinders. Besides static geometric models, researchers have tried to describe and distinguish behaviors using features of human motion such as shape, angle, position, motion speed, motion direction and motion trajectory, applying subspace methods including principal component analysis and independent component analysis to reduce the dimensionality of, and screen, the extracted features before behavior analysis. Existing inventions for abnormal behavior detection share the inherent limitation that abnormal behaviors are not truly understood, so existing abnormal behavior detection models cannot fully reflect the essence of abnormal behavior, and the detection precision obtained with them falls short of the ideal.
Summary of the invention:
The invention aims to overcome the defects of the prior art and to design a limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling that has a simple calculation mode and high calculation precision, can detect limb conflict behavior quickly and accurately, and can give timely early warning.
In order to achieve the above object, the limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling specifically comprises the following steps:
S1. Definition of the word book
First, semantic understanding that accords with human cognition is extracted from the original monitoring video data, and the video data are automatically analyzed and understood through algorithm design. The analysis process is divided into extraction of foreground targets, target feature representation, and behavior analysis and classification. The method detects abnormal human behaviors in video monitoring based on an LDMA model: the pixel position of each object in the video is described, and a feature vector is extracted for each pixel, comprising the position, moving speed and moving direction of the pixel and the size of the foreground target to which it belongs. A visual-information word book and documents are finally formed, and an effective word book is defined to serve as a queryable dictionary covering the pixels in the monitoring video;
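As a concrete illustration of the per-pixel feature vector named in S1, the following minimal Python sketch collects the fields the step lists; the field names and types are illustrative assumptions, not taken from the patent:

```python
# Illustrative container for the S1 per-pixel feature vector (position,
# moving speed, moving direction, size of the owning foreground target).
# Field names and types are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class PixelFeature:
    x: int             # pixel column in the frame
    y: int             # pixel row in the frame
    speed: float       # optical-flow magnitude at the pixel
    direction: float   # optical-flow angle in radians
    target_size: int   # 1 = small foreground target, 2 = large target
```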
S2. Quantizing the pixel positions of objects
In video obtained by video monitoring, behavior is basically characterized by the position of the behavior generator; the invention therefore takes position information into account in the construction of the word book. The pixel positions of objects in the video are quantized into non-overlapping 10 × 10 cells, so that for an M × N video frame M/10 × N/10 cell tuples are obtained;
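The quantization in S2 can be sketched as below; the row-major ordering of the cells is our assumption (the patent fixes only the 10 × 10 cell size):

```python
# Sketch of the S2 position quantization, assuming row-major cell ordering
# (the ordering is our choice; the patent only fixes the 10x10 cell size).

def cell_index(x: int, y: int, width: int, height: int, cell: int = 10) -> int:
    """Map a pixel (x, y) in a width x height frame to its 10x10 cell index."""
    cols = width // cell               # number of cells per row: M/10
    cx, cy = x // cell, y // cell      # cell coordinates of the pixel
    return cy * cols + cx              # flat index in [0, M/10 * N/10)

# Example: a 320x240 frame yields 32 * 24 = 768 cells.
assert cell_index(15, 27, 320, 240) == 2 * 32 + 1
```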
S3. Describing the size of foreground targets in the scene
In order to accurately represent the foreground objects in the video, each foreground pixel is associated with the foreground target to which it belongs. In video data obtained by video monitoring, the observed foreground boxes can be divided into two types by size: small foreground boxes, which are mainly pedestrians, and large foreground boxes, which mainly contain vehicles or groups of pedestrians. The invention therefore classifies the foreground-box sizes with K-means clustering to obtain the foreground target to which each pixel belongs; the cluster number K of the K-means is taken as 2, and cluster labels 1 and 2 finally describe the target sizes in the scene, 1 denoting a small target and 2 a large target;
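A minimal sketch of the S3 size clustering, using scikit-learn's KMeans as a stand-in implementation (the library choice and the toy box areas are our assumptions; the patent specifies only K-means with K = 2 and the labels 1/2):

```python
# Sketch of the S3 size clustering: k-means with k=2 over foreground-box
# areas, labelling each box (and hence its pixels) small (1) or large (2).
import numpy as np
from sklearn.cluster import KMeans

areas = np.array([[120], [150], [3400], [90], [4100], [130]])  # box areas, px^2
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(areas)

# Relabel so that 1 = small target, 2 = large target, as in the patent.
small_cluster = int(np.argmin(km.cluster_centers_.ravel()))
size_labels = np.where(km.labels_ == small_cluster, 1, 2)
print(size_labels)  # e.g. [1 1 2 1 2 1]
```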
S4. Determining the motion condition of foreground pixels
For a scene in video monitoring, the analyzed content targets the foreground, so background subtraction is first performed to obtain the foreground pixels. The optical-flow information of each foreground pixel is then solved with the Lucas-Kanade optical flow algorithm, and static foreground pixels (a static label) and dynamic pixels are distinguished by setting a threshold on the magnitude of the optical-flow vector. The dynamic pixels are further quantized into motion states described by 4 motion descriptors (motion direction, trajectory, position and speed), so that together with the static label 5 possible motion descriptors determine the motion condition of a detected foreground pixel;
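The S4 pixel-motion labelling might look like the sketch below, using OpenCV's pyramidal Lucas-Kanade tracker; the threshold value and the quantization of dynamic pixels into 4 direction bins are our assumptions, chosen to be consistent with the 5 motion descriptors (4 dynamic states plus the static label):

```python
# Sketch of S4: Lucas-Kanade flow at foreground pixels, a magnitude threshold
# separating static (label 0) from dynamic pixels, and dynamic pixels
# quantized into 4 direction bins (labels 1-4). Threshold and binning are
# assumptions. prev_gray/next_gray are uint8 grayscale frames; fg_points is
# an (N, 2) array of foreground pixel coordinates.
import cv2
import numpy as np

def motion_labels(prev_gray, next_gray, fg_points, thresh=1.0):
    pts = fg_points.reshape(-1, 1, 2).astype(np.float32)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    flow = (nxt - pts).reshape(-1, 2)
    mag = np.hypot(flow[:, 0], flow[:, 1])
    ang = np.arctan2(flow[:, 1], flow[:, 0])               # [-pi, pi]
    bins = ((ang + np.pi) // (np.pi / 2)).astype(int) % 4  # 4 direction bins
    labels = np.where(mag < thresh, 0, bins + 1)           # 0 = static, 1..4 = moving
    labels[status.ravel() == 0] = 0                        # untracked -> static
    return labels
```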
S5. Defining the video sequence and pixel points
Let the video sequence recorded under the monitored scene be $V$, and divide $V$ into a number of video segments $\{v_1, v_2, \cdots, v_M\}$, where $v_m$ is the m-th segmented video segment. The set of video segments is regarded as the current corpus $W$, so that each $v_m$ corresponds to a document in the corpus. Within a video segment $v_m$, pixel points are defined as words, and each word corresponds to a topic; as time t changes, within $v_m$ the topic of each word transitions to another topic or to itself, and from the properties of MCMC (Markov Chain Monte Carlo) this process reaches a stationary distribution after a period of time;
S6. Establishment of the word book
According to the above steps, the position of each pixel of an M × N video frame has M/10 × N/10 expressions, the motion form has 5 descriptions, and large and small targets give two expressions, so the words can be expressed in $\frac{M}{10} \times \frac{N}{10} \times 5 \times 2$ forms; that is, a given foreground pixel has that many possible descriptions. At any moment, the motion information of each pixel and the target it belongs to are independent; that is, for a video segment, the different topics formed as time t changes should be acquired independently and separately. Therefore each location can be represented by the joint feature (motion, size): the motion and size features are concatenated and used as the word set of each cell, denoted $V_c$. That is, when a video segment is constructed, one pixel contributes two feature words, motion and the size of the target it belongs to, at its position, so the final words can be represented in the form $\frac{M}{10} \times \frac{N}{10} \times (5+2)$. The feature word of a pixel can thus be defined as $w_{c,a}$, where c is the cell position and a is the joint feature of motion form and size;
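Under the final M/10 × N/10 × (5 + 2) form of S6, each pixel emits one motion word and one size word for its cell; the index layout in the following sketch is our own choice:

```python
# Sketch of the S6 word book in its final M/10 x N/10 x (5+2) form: each
# foreground pixel emits two words for its cell, one of 5 motion labels and
# one of 2 size labels. The flat index layout is an assumption.
N_MOTION, N_SIZE = 5, 2
FEATS = N_MOTION + N_SIZE  # 7 feature slots per cell

def words_for_pixel(cell: int, motion: int, size: int) -> tuple[int, int]:
    """Return the two word ids w_{c,a} for one pixel.

    cell   -- flat cell index from cell_index() above
    motion -- 0..4 (0 = static, 1..4 = direction bins)
    size   -- 1 or 2 (small / large target)
    """
    motion_word = cell * FEATS + motion               # slots 0..4
    size_word = cell * FEATS + N_MOTION + (size - 1)  # slots 5..6
    return motion_word, size_word
```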
S7. Establishment of the corpus
The monitoring video is divided into a number of short video segments, each video segment is taken as a document, and the pixel points changing with time t in a video segment are expressed as the words appearing in the document together with the topic content expressed by that series of words. Then, taking the word book generated by the pixels as the basis, let the total word frequency in the corpus be N; if among all N word occurrences each word $v_i$ occurs $n_i$ times, then
$$\sum_i n_i = N,$$
and the probability of each word in the corpus is
$$p(v_i) = \frac{n_i}{N}.$$
Then, for each specific topic $z_k$ and the probability $p(w \mid z_k)$ of generating a word of the corpus from that topic, the probability of generating the final corpus is the cumulative sum, over the topics $z_k$, of the word-generation probabilities:
$$p(w) = \sum_k p(w \mid z_k)\, p(z_k).$$
In the corpus W the words obey a multinomial distribution, and the topics $z$ obey a probability distribution $p(z \mid \vec\theta)$ with parameter $\vec\theta$. For the prior distribution of $\vec\theta$, the conjugate distribution of the multinomial distribution, namely the Dirichlet distribution, is selected. According to the Dirichlet distribution, the probability of generating the text corpus is calculated as
$$p(W \mid \vec\alpha) = \prod_{m=1}^{M} \int p(\vec\theta_m \mid \vec\alpha) \prod_{n} p(w_{m,n} \mid \vec\theta_m)\, d\vec\theta_m,$$
where $\vec\alpha$ is the parameter of the Dirichlet prior distribution, and the text corpus is the corpus composed of the documents;
A video sequence is regarded as a document, a document is formed by mixing several topics, and each topic is a probability distribution over the vocabulary; every word represented by a pixel in the video sequence is generated by some fixed topic. This is the document modeling process, namely the bag-of-words model. Suppose there are T topics, recorded as $z_1, \cdots, z_T$, with a word-vector probability distribution $\vec\varphi_t = p(w \mid z_t)$ for each topic. For a corpus $C = (d_1, d_2, \cdots, d_M)$ containing M documents, each document $d_m$ has a specific doc-topic distribution, i.e., each document corresponds to a topic-vector probability distribution $\vec\theta_m = p(z \mid d_m)$. The generation probability of each word w in the m-th document $d_m$ is then
$$p(w \mid d_m) = \sum_{t=1}^{T} p(w \mid z_t)\, p(z_t \mid d_m),$$
and the generation probability of the whole document is
$$p(d_m) = \prod_{w \in d_m} p(w \mid d_m).$$
Because the documents are mutually independent, the generation probability of the whole corpus is written accordingly as $p(C) = \prod_{m=1}^{M} p(d_m)$, which yields the topic model; a locally optimal solution is then found with the EM algorithm;
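The S7 model can be fitted as sketched below, with scikit-learn's variational-EM LatentDirichletAllocation standing in for the EM solution described above; the topic count and the toy document-word counts are assumptions:

```python
# Sketch of S7: treat each video segment as a document of cell/feature words
# and fit an LDA topic model. scikit-learn's variational-EM LDA is a stand-in
# for the EM solution the patent describes; T=10 topics is arbitrary.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

V = 768 * 7                                   # vocabulary: cells x (5+2) slots
docs = np.random.randint(0, 3, size=(50, V))  # toy doc-word count matrix

lda = LatentDirichletAllocation(n_components=10, max_iter=30, random_state=0)
theta = lda.fit_transform(docs)  # doc-topic distributions, one row per segment
phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)  # topic-word
```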
S8. Judgment of limb conflict behavior
The limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling combines low-dimensional data feature representation with model-based complex scene analysis to analyze the video sequence: the position of the human body is detected in the video, an overall motion model independent of body parts is learned from the change of human body position information during motion, and by analyzing this overall motion model the detected result is compared with the parameters in the model, so as to judge the motion state of the human body.
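The patent does not fully specify how the comparison with the model parameters is computed; one plausible reading, sketched below, thresholds the average per-word log-likelihood of a test segment under the trained model (the threshold value is an assumption):

```python
# Sketch of S8 scoring using the lda model and docs fitted above. How the
# detection compares to model parameters is not fully specified in the
# patent; a per-word log-likelihood threshold is one plausible reading.
def is_conflict(segment_counts, lda, threshold=-6.0):
    seg = segment_counts.reshape(1, -1)
    per_word_ll = lda.score(seg) / max(seg.sum(), 1)  # avg log-likelihood/word
    return per_word_ll < threshold                    # unusually low -> abnormal

print(is_conflict(docs[0], lda))
```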
Compared with the prior art, the invention has the following beneficial effects: the spectral characteristics of the image are used to accurately extract the contour of the motion region, so that the contour edge of the moving target can be seen clearly for behavior feature analysis; the method is suitable not only for limb conflict behaviors such as fighting, but also for the detection of other behaviors such as rapid movement; and its ingenious design concept, scientific detection principle, simple detection mode, high detection accuracy and friendly application environment give it a broad market prospect.
Description of the drawings:
FIG. 1 is a diagram illustrating the foreground detection effect of different video frame images in a video stream according to the present invention.
FIG. 2 is a process flow diagram of a method for detecting a limb conflict behavior based on low-dimensional spatiotemporal feature extraction and topic modeling according to the present invention.
Detailed description of embodiments:
The invention is further illustrated by the following example in conjunction with the accompanying drawings.
Example:
In order to achieve the above object, the limb conflict behavior detection method based on low-dimensional spatio-temporal feature extraction and topic modeling specifically includes the following steps:
S1. Definition of the word book
First, semantic understanding that accords with human cognition is extracted from the original monitoring video data, and the video data are automatically analyzed and understood through the algorithm design of this embodiment. The analysis process is divided into extraction of foreground targets, target feature representation, and behavior analysis and classification. The method detects abnormal human behaviors in video monitoring based on an LDMA model: the pixel position of each object in the video is described, and a feature vector is extracted for each pixel, comprising the position, moving speed and moving direction of the pixel and the size of the foreground target to which it belongs. A visual-information word book and documents are finally formed, and an effective word book is defined to serve as a queryable dictionary covering the pixels in the monitoring video;
S2. Quantizing the pixel positions of objects
In video obtained by video monitoring, behavior is basically characterized by the position of the behavior generator; this embodiment therefore takes position information into account in the construction of the word book. The pixel positions of objects in the video are quantized into non-overlapping 10 × 10 cells, so that for an M × N video frame M/10 × N/10 cell tuples are obtained;
S3. Describing the size of foreground targets in the scene
In order to accurately represent the foreground objects in the video, each foreground pixel is associated with the foreground target to which it belongs. In video data obtained by video monitoring, the observed foreground boxes can be divided into two types by size: small foreground boxes, which are mainly pedestrians, and large foreground boxes, which mainly contain vehicles or groups of pedestrians. This embodiment therefore classifies the foreground-box sizes with K-means clustering to obtain the foreground target to which each pixel belongs; the cluster number K of the K-means is taken as 2, and cluster labels 1 and 2 finally describe the target sizes in the scene, 1 denoting a small target and 2 a large target;
S4. Determining the motion condition of foreground pixels
For a scene in video monitoring, the analyzed content targets the foreground, so background subtraction is first performed to obtain the foreground pixels. The optical-flow information of each foreground pixel is then solved with the Lucas-Kanade optical flow algorithm, and static foreground pixels (a static label) and dynamic pixels are distinguished by setting a threshold on the magnitude of the optical-flow vector. The dynamic pixels are further quantized into motion states described by 4 motion descriptors (motion direction, trajectory, position and speed), so that together with the static label 5 possible motion descriptors determine the motion condition of a detected foreground pixel;
S5. Defining the video sequence and pixel points
Let the video sequence recorded under the monitored scene be $V$, and divide $V$ into a number of video segments $\{v_1, v_2, \cdots, v_M\}$, where $v_m$ is the m-th segmented video segment. The set of video segments is regarded as the current corpus $W$, so that each $v_m$ corresponds to a document in the corpus. Within a video segment $v_m$, pixel points are defined as words, and each word corresponds to a topic; as time t changes, within $v_m$ the topic of each word transitions to another topic or to itself, and from the properties of MCMC (Markov Chain Monte Carlo) this process reaches a stationary distribution after a period of time;
S6. Establishment of the word book
According to the above steps, the position of each pixel of an M × N video frame has M/10 × N/10 expressions, the motion form has 5 descriptions, and large and small targets give two expressions, so the words can be expressed in $\frac{M}{10} \times \frac{N}{10} \times 5 \times 2$ forms; that is, a given foreground pixel has that many possible descriptions. At any moment, the motion information of each pixel and the target it belongs to are independent; that is, for a video segment, the different topics formed as time t changes should be acquired independently and separately. Therefore each location can be represented by the joint feature (motion, size): the motion and size features are concatenated and used as the word set of each cell, denoted $V_c$. That is, when a video segment is constructed, one pixel contributes two feature words, motion and the size of the target it belongs to, at its position, so the final words can be represented in the form $\frac{M}{10} \times \frac{N}{10} \times (5+2)$. The feature word of a pixel can thus be defined as $w_{c,a}$, where c is the cell position and a is the joint feature of motion form and size;
S7. Establishment of the corpus
The monitoring video is divided into a number of short video segments, each video segment is taken as a document, and the pixel points changing with time t in a video segment are expressed as the words appearing in the document together with the topic content expressed by that series of words. Then, taking the word book generated by the pixels as the basis, let the total word frequency in the corpus be N; if among all N word occurrences each word $v_i$ occurs $n_i$ times, then
$$\sum_i n_i = N,$$
and the probability of each word in the corpus is
$$p(v_i) = \frac{n_i}{N}.$$
Then, for each specific topic $z_k$ and the probability $p(w \mid z_k)$ of generating a word of the corpus from that topic, the probability of generating the final corpus is the cumulative sum, over the topics $z_k$, of the word-generation probabilities:
$$p(w) = \sum_k p(w \mid z_k)\, p(z_k).$$
In the corpus W the words obey a multinomial distribution, and the topics $z$ obey a probability distribution $p(z \mid \vec\theta)$ with parameter $\vec\theta$. For the prior distribution of $\vec\theta$, the conjugate distribution of the multinomial distribution, namely the Dirichlet distribution, is selected. According to the Dirichlet distribution, the probability of generating the text corpus is calculated as
$$p(W \mid \vec\alpha) = \prod_{m=1}^{M} \int p(\vec\theta_m \mid \vec\alpha) \prod_{n} p(w_{m,n} \mid \vec\theta_m)\, d\vec\theta_m,$$
where $\vec\alpha$ is the parameter of the Dirichlet prior distribution, and the text corpus is the corpus composed of the documents;
A video sequence is regarded as a document, a document is formed by mixing several topics, and each topic is a probability distribution over the vocabulary; every word represented by a pixel in the video sequence is generated by some fixed topic. This is the document modeling process, namely the bag-of-words model. Suppose there are T topics, recorded as $z_1, \cdots, z_T$, with a word-vector probability distribution $\vec\varphi_t = p(w \mid z_t)$ for each topic. For a corpus $C = (d_1, d_2, \cdots, d_M)$ containing M documents, each document $d_m$ has a specific doc-topic distribution, i.e., each document corresponds to a topic-vector probability distribution $\vec\theta_m = p(z \mid d_m)$. The generation probability of each word w in the m-th document $d_m$ is then
$$p(w \mid d_m) = \sum_{t=1}^{T} p(w \mid z_t)\, p(z_t \mid d_m),$$
and the generation probability of the whole document is
$$p(d_m) = \prod_{w \in d_m} p(w \mid d_m).$$
Because the documents are mutually independent, the generation probability of the whole corpus is written accordingly as $p(C) = \prod_{m=1}^{M} p(d_m)$, which yields the topic model; a locally optimal solution is then found with the EM algorithm;
S8. Judgment of limb conflict behavior
The limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling combines low-dimensional data feature representation with model-based complex scene analysis to analyze the video sequence: the position of the human body is detected in the video, an overall motion model independent of body parts is learned from the change of human body position information during motion, and by analyzing this overall motion model the detected result is compared with the parameters in the model, so as to judge the motion state of the human body.
Claims (1)
1. A limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling is characterized by comprising the following steps:
S1. Definition of the word book
First, semantic understanding that accords with human cognition is extracted from the original monitoring video data, and the video data are automatically analyzed and understood through algorithm design. The analysis process is divided into extraction of foreground targets, target feature representation, and behavior analysis and classification. The method detects abnormal human behaviors in video monitoring based on an LDMA model: the pixel position of each object in the video is described, and a feature vector is extracted for each pixel, comprising the position, moving speed and moving direction of the pixel and the size of the foreground target to which it belongs. A visual-information word book and documents are finally formed, and an effective word book is defined to serve as a queryable dictionary covering the pixels in the monitoring video;
S2. Quantizing the pixel positions of objects
In video obtained by video monitoring, behavior is basically characterized by the position of the behavior generator; position information is therefore taken into account in the construction of the word book: the pixel positions of objects in the video are quantized into non-overlapping 10 × 10 cells, so that for an M × N video frame M/10 × N/10 cell tuples are obtained;
S3. Describing the size of foreground targets in the scene
In order to accurately represent the foreground objects in the video, each foreground pixel is associated with the foreground target to which it belongs. In video data obtained by video monitoring, the observed foreground boxes can be divided into two types by size: small foreground boxes, which are pedestrians, and large foreground boxes, which contain vehicles or groups of pedestrians. The foreground-box sizes are therefore classified with K-means clustering to obtain the foreground target to which each pixel belongs; the cluster number K of the K-means is taken as 2, and cluster labels 1 and 2 finally describe the target sizes in the scene, 1 denoting a small target and 2 a large target;
S4. Determining the motion condition of foreground pixels
For a scene in video monitoring, the analyzed content targets the foreground, so background subtraction is first performed to obtain the foreground pixels. The optical-flow information of each foreground pixel is then solved with the Lucas-Kanade optical flow algorithm, and static foreground pixels and dynamic pixels are distinguished by setting a threshold on the magnitude of the optical-flow vector. The dynamic pixels are further quantized into motion states described by 4 motion descriptors (motion direction, trajectory, position and speed), so that together with the static label 5 possible motion descriptors determine the motion condition of a detected foreground pixel;
S5. Defining the video sequence and pixel points
Let the video sequence recorded under the monitored scene be $V$, and divide $V$ into a number of video segments $\{v_1, v_2, \cdots, v_M\}$, where $v_m$ is the m-th segmented video segment. The set of video segments is regarded as the current corpus $W$, so that each $v_m$ corresponds to a document in the corpus. Within a video segment $v_m$, pixel points are defined as words, and each word corresponds to a topic; as time t changes, within $v_m$ the topic of each word transitions to another topic or to itself, and from the properties of Markov Chain Monte Carlo this process reaches a stationary distribution after a period of time;
S6. Establishment of the word book
According to the above steps, the position of each pixel of an M × N video frame has M/10 × N/10 expressions, the motion form has 5 descriptions, and large and small targets give two expressions, so the words can be expressed in $\frac{M}{10} \times \frac{N}{10} \times 5 \times 2$ forms; that is, a given foreground pixel has that many possible descriptions. At any moment, the motion information of each pixel and the target it belongs to are independent; that is, for a video segment, the different topics formed as time t changes are acquired independently and separately. Therefore each position can be represented by the joint feature (motion, size): the motion and size features are concatenated and used as the word set of each cell, denoted $V_c$. That is, when a video segment is constructed, one pixel contributes two feature words, motion and the size of the target it belongs to, at its position, so the final words can be represented in the form $\frac{M}{10} \times \frac{N}{10} \times (5+2)$. The feature word of a pixel is thus defined as $w_{c,a}$, where c is the cell position and a is the joint feature of motion form and size;
S7. Establishment of the corpus
The monitoring video is divided into a number of short video segments, each video segment is taken as a document, and the pixel points changing with time t in a video segment are expressed as the words appearing in the document together with the topic content expressed by that series of words. Then, taking the word book generated by the pixels as the basis, let the total word frequency in the corpus be N; if among all N word occurrences each word $v_i$ occurs $n_i$ times, then
$$\sum_i n_i = N,$$
and the probability of each word in the corpus is
$$p(v_i) = \frac{n_i}{N}.$$
Then, for each specific topic $z_k$ and the probability $p(w \mid z_k)$ of generating a word of the corpus from that topic, the probability of generating the final corpus is the cumulative sum, over the topics $z_k$, of the word-generation probabilities:
$$p(w) = \sum_k p(w \mid z_k)\, p(z_k).$$
The topics $z$ obey a probability distribution $p(z \mid \vec\theta)$ with parameter $\vec\theta$; for the prior distribution of $\vec\theta$, the conjugate distribution of the multinomial distribution, namely the Dirichlet distribution, is selected. According to the Dirichlet distribution, the probability of generating the text corpus is calculated as
$$p(W \mid \vec\alpha) = \prod_{m=1}^{M} \int p(\vec\theta_m \mid \vec\alpha) \prod_{n} p(w_{m,n} \mid \vec\theta_m)\, d\vec\theta_m,$$
where $\vec\alpha$ is the parameter of the Dirichlet prior distribution, and the text corpus is the corpus composed of the documents;
The video sequence is regarded as a document, the document is formed by mixing several topics, and each topic is a probability distribution over the vocabulary; every word represented by a pixel in the video sequence is generated by some fixed topic. This is the document modeling process, namely the bag-of-words model. Suppose there are T topics, recorded as $z_1, \cdots, z_T$, with a word-vector probability distribution $\vec\varphi_t = p(w \mid z_t)$ for each topic. For a corpus $C = (d_1, d_2, \cdots, d_M)$ containing M documents, each document $d_m$ has a specific document-topic distribution, i.e., each document corresponds to a topic-vector probability distribution $\vec\theta_m = p(z \mid d_m)$. The generation probability of each word w in the m-th document $d_m$ is then
$$p(w \mid d_m) = \sum_{t=1}^{T} p(w \mid z_t)\, p(z_t \mid d_m),$$
and the generation probability of the whole document is
$$p(d_m) = \prod_{w \in d_m} p(w \mid d_m).$$
Because the documents are mutually independent, the generation probability of the whole corpus is written according to the preceding formula as $p(C) = \prod_{m=1}^{M} p(d_m)$, which yields the topic model; a locally optimal solution is then found with the EM algorithm;
S8. Judgment of limb conflict behavior
The limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling combines low-dimensional data feature representation with model-based complex scene analysis to analyze the video sequence: the position of the human body is detected in the video, an overall motion model independent of body parts is learned from the change of human body position information during motion, and by analyzing this overall motion model the detected result is compared with the parameters in the model, so as to judge the motion state of the human body. Each behavior corresponds to a topic distribution; under the trained model, if a limb conflict occurs in a tested video segment, the corresponding behavior is concentrated in particular topics of the distribution, and from those topics the behavior is determined to belong to a limb conflict situation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711366304.XA CN108108688B (en) | 2017-12-18 | 2017-12-18 | Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711366304.XA CN108108688B (en) | 2017-12-18 | 2017-12-18 | Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108108688A CN108108688A (en) | 2018-06-01 |
CN108108688B true CN108108688B (en) | 2021-11-23 |
Family
ID=62209950
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711366304.XA Active CN108108688B (en) | 2017-12-18 | 2017-12-18 | Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108108688B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109242826B (en) * | 2018-08-07 | 2022-02-22 | 高龑 | Mobile equipment end stick-shaped object root counting method and system based on target detection |
CN110659363B (en) * | 2019-07-30 | 2021-11-23 | 浙江工业大学 | Web service mixed evolution clustering method based on membrane computing |
CN111160170B (en) * | 2019-12-19 | 2023-04-21 | 青岛联合创智科技有限公司 | Self-learning human behavior recognition and anomaly detection method |
CN113705274B (en) * | 2020-05-20 | 2023-09-05 | 杭州海康威视数字技术股份有限公司 | Climbing behavior detection method and device, electronic equipment and storage medium |
CN111707375B (en) * | 2020-06-10 | 2021-07-09 | 青岛联合创智科技有限公司 | Electronic class card with intelligent temperature measurement attendance and abnormal behavior detection functions |
CN117372969B (en) * | 2023-12-08 | 2024-05-10 | 暗物智能科技(广州)有限公司 | Monitoring scene-oriented abnormal event detection method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103530603A (en) * | 2013-09-24 | 2014-01-22 | 杭州电子科技大学 | Video abnormality detection method based on causal loop diagram model |
CN103995915A (en) * | 2014-03-21 | 2014-08-20 | 中山大学 | Crowd evacuation simulation system based on composite potential energy field |
CN104268546A (en) * | 2014-05-28 | 2015-01-07 | 苏州大学 | Dynamic scene classification method based on topic model |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103268495B (en) * | 2013-05-31 | 2016-08-17 | 公安部第三研究所 | Human body behavior modeling recognition methods based on priori knowledge cluster in computer system |
-
2017
- 2017-12-18 CN CN201711366304.XA patent/CN108108688B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103530603A (en) * | 2013-09-24 | 2014-01-22 | 杭州电子科技大学 | Video abnormality detection method based on causal loop diagram model |
CN103995915A (en) * | 2014-03-21 | 2014-08-20 | 中山大学 | Crowd evacuation simulation system based on composite potential energy field |
CN104268546A (en) * | 2014-05-28 | 2015-01-07 | 苏州大学 | Dynamic scene classification method based on topic model |
Non-Patent Citations (4)
Title |
---|
A Markov Clustering Topic Model for Mining Behaviour in Video; Timothy Hospedales et al.; 2009 IEEE 12th International Conference on Computer Vision; 2009; pp. 1165-1172 *
Application of topic models in video abnormal behavior detection; Zhao Liang et al.; Computer Science; Vol. 39, No. 9; September 2012 *
Research on topic semantic feature extraction and behavior analysis of crowd motion; Huang Xianping; China Doctoral Dissertations Full-text Database, Information Science and Technology; No. 07; 15 July 2016; pp. 13-14, 44-70 *
Pedestrian abnormal behavior recognition based on trajectory analysis; Hu Yuan et al.; Computer Engineering and Science; Vol. 39, No. 11; November 2017 *
Also Published As
Publication number | Publication date |
---|---|
CN108108688A (en) | 2018-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108108688B (en) | Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling | |
Yan et al. | Abnormal event detection from videos using a two-stream recurrent variational autoencoder | |
Adithya et al. | Artificial neural network based method for Indian sign language recognition | |
CN105787458A (en) | Infrared behavior identification method based on adaptive fusion of artificial design feature and depth learning feature | |
Nair et al. | A review on Indian sign language recognition | |
CN104281853A (en) | Behavior identification method based on 3D convolution neural network | |
EP2473969A2 (en) | Detecting anomalous trajectories in a video surveillance system | |
CN111738218B (en) | Human body abnormal behavior recognition system and method | |
Reshna et al. | Spotting and recognition of hand gesture for Indian sign language recognition system with skin segmentation and SVM | |
Rabiee et al. | Crowd behavior representation: an attribute-based approach | |
Lu et al. | Multi-object detection method based on YOLO and ResNet hybrid networks | |
Intwala et al. | Indian sign language converter using convolutional neural networks | |
Qin et al. | Application of video scene semantic recognition technology in smart video | |
Bulzomi et al. | End-to-end neuromorphic lip-reading | |
CN103500456A (en) | Object tracking method and equipment based on dynamic Bayes model network | |
Sahoo et al. | An Improved VGG-19 Network Induced Enhanced Feature Pooling For Precise Moving Object Detection In Complex Video Scenes | |
Castellano et al. | Crowd counting from unmanned aerial vehicles with fully-convolutional neural networks | |
Dorrani | Traffic Scene Analysis and Classification using Deep Learning | |
Ramzan et al. | Automatic Unusual Activities Recognition Using Deep Learning in Academia. | |
CN115798055B (en) | Violent behavior detection method based on cornersort tracking algorithm | |
Xia et al. | Anomaly detection in traffic surveillance with sparse topic model | |
Hao et al. | Human behavior analysis based on attention mechanism and LSTM neural network | |
Patel et al. | Vision Based Real-time Recognition of Hand Gestures for Indian Sign Language using Histogram of Oriented Gradients Features. | |
CN112487920B (en) | Convolution neural network-based crossing behavior identification method | |
Katti et al. | Character and word level gesture recognition of Indian Sign language |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
CB02 | Change of applicant information |
Address after: 266200 Household No. 8, Qingda Third Road, Laoshan District, Qingdao City, Shandong Province Applicant after: QINGDAO LIANHE CHUANGZHI TECHNOLOGY Co.,Ltd. Address before: Room 1204, No. 40, Hong Kong Middle Road, Shinan District, Qingdao, Shandong 266200 Applicant before: QINGDAO LIANHE CHUANGZHI TECHNOLOGY Co.,Ltd. |
|
GR01 | Patent grant | ||
GR01 | Patent grant |