CN112381072B - A human abnormal behavior detection method based on spatiotemporal information and human-object interaction - Google Patents

A human abnormal behavior detection method based on spatiotemporal information and human-object interaction

Info

Publication number
CN112381072B
Authority
CN
China
Prior art keywords
human
objects
information
people
behavior
Prior art date
Legal status
Active
Application number
CN202110030865.2A
Other languages
Chinese (zh)
Other versions
CN112381072A (en)
Inventor
龚勋
马冰
刘璐
Current Assignee
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Priority date
Filing date
Publication date
Application filed by Southwest Jiaotong University filed Critical Southwest Jiaotong University
Priority to CN202110030865.2A priority Critical patent/CN112381072B/en
Publication of CN112381072A publication Critical patent/CN112381072A/en
Application granted granted Critical
Publication of CN112381072B publication Critical patent/CN112381072B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Alarm Systems (AREA)

Abstract

The invention discloses a human abnormal behavior detection method based on spatiotemporal information and human-object interaction. The steps are as follows: S1, data collection and labeling; S2, extracting the position information of people and objects; S3, extracting the motion information of people and objects; S4, modeling the interaction relationship between human and object features; S5, behavior classification and fusion; S6, optimization of detection results. For the detection of abnormal actions such as falling to the ground, climbing, and physical conflict, as well as of persistent abnormal states, human-object interaction is used to assist in judging abnormal behavior, and changes in the center of gravity are combined to detect the persistent state of an abnormal behavior. In addition to detecting abnormal behaviors, the invention can also detect normal actions such as walking, standing, and sitting.

Description

A human abnormal behavior detection method based on spatiotemporal information and human-object interaction

Technical Field

The invention relates to the technical fields of computer vision and deep learning, and in particular to a method for detecting abnormal human behavior based on spatiotemporal information and human-object interaction.

Background Art

Human abnormal behavior detection has important applications in security and intelligent monitoring; it greatly relieves the pressure of manual monitoring and improves detection efficiency. Some existing solutions rely on hand-crafted motion features for discrimination and achieve low accuracy in real-world scenes, while some current deep learning-based methods can only detect a single kind of abnormal behavior and cannot automatically judge the multiple abnormal behaviors that occur under real conditions. Abnormal actions such as climbing and falling to the ground are also special cases: it is necessary not only to detect, in real time, the abnormal action the person is performing, but also to continuously discriminate the state of that action afterwards. For example, a person may keep lying still after falling, or keep walking on a table or other auxiliary object after climbing. These situations challenge existing detection techniques, which cannot detect the persistent state of abnormal actions, so new technical methods are needed.

Summary of the Invention

In order to solve the problems existing in the prior art, the present invention provides a method for detecting abnormal human behavior based on spatiotemporal information and human-object interaction, which solves the problems mentioned in the above background art.

To achieve the above purpose, the present invention provides the following technical solution: a method for detecting abnormal human behavior based on spatiotemporal information and human-object interaction, with the following steps: S1, data collection and labeling; S2, extracting the position information of people and objects; S3, extracting the motion information of people and objects; S4, modeling the interaction relationship between human and object features; S5, behavior classification and fusion; S6, optimization of detection results. The abnormal behavior refers to behavior beyond the normal range; it is scene-dependent and represents behavior that is not acceptable in the given scene.

Preferably, the data collection and labeling in step S1 include: collecting normal actions and abnormal actions from video surveillance, cropping the video data, generating initial spatial positions of people and objects through the SSD object detection network, and finally using a simple annotation tool to manually correct the generated position information and fix inaccurately detected object positions, so as to obtain accurate position information. Normal actions refer to actions that are acceptable in the monitored scene, including walking, sitting, and standing; abnormal actions represent actions that are not acceptable in that scene, including falling to the ground, climbing, and physical conflict.

Preferably, the simple annotation tool is used to correct the position information of the boxes. It reads and displays a picture together with its corresponding person and object boxes; the user can judge whether a displayed box is accurate and redraw a new box with the mouse, and the new data overwrites the old data.

Preferably, extracting the position information of people and objects in step S2 includes: fine-tuning the SSD object detection network pre-trained on the MS COCO dataset on the collected dataset, so as to accurately detect the positions of people and objects.

Preferably, the fine-tuning means that, starting from the model pre-trained on the MS COCO dataset, only the last two layers of the network are retrained on the training data, while the parameters of the remaining layers are kept unchanged.

Preferably, extracting the motion information of people and objects in step S3 includes: adopting a 3D-ShuffleNet network as the backbone for spatiotemporal motion information, taking the current frame and the previous 15 frames to form an input clip, performing feature extraction on the 16 input frames, and finally obtaining a single-frame spatiotemporal information feature map.

Preferably, the modeling of the interaction relationship between human and object features in step S4 includes: applying the position information of people and objects obtained in step S2 to the feature map extracted in step S3 to obtain spatiotemporal feature information; cropping out the features of people and objects individually and modeling their interaction with the formula R(Pi) = Fα{Gβ(Pi, Oj), Oj ∈ O}, where R(Pi) represents the correlation between the spatiotemporal features of the i-th person and the features of all objects, Pi represents the spatiotemporal motion features of the i-th person, Oj represents the features of the j-th object, O represents the set of object features in the current frame, Gβ represents the human-object relationship model, and Fα represents the result of integrating multiple human-object relationship models.

Preferably, the behavior classification and fusion in step S5 include: performing behavior classification separately on the human motion information and on the human-object interaction model, and fusing the two classification results to obtain a preliminary detection result. The fusion formula is C = (1 - θ) * S1 + θ * S2, where C is the action classification result obtained by fusing the classification scores S1 and S2, S1 is the classification score obtained from the human motion information, S2 is the classification score obtained from the human-object interaction modeling, and θ is a learnable hyperparameter indicating the relative importance of the two results. If θ < 0.5, the behavior has little correlation with objects and the model pays more attention to the classification result of the human motion information; otherwise, the model pays more attention to the classification result of the human-object interaction modeling.

Preferably, the optimization of the detection results in step S6 includes: judging, from the preliminary detection result of the previous frame, whether a falling action has been detected. If no falling action is detected, the preliminary detection result of the previous frame is taken as the final result and the behavior category is output. If a falling action is detected, the center of gravity of the human body is calculated from the position box and the velocity change between adjacent frames is computed to obtain Vi, where Vi denotes the velocity change between adjacent frames. Vi is then compared with the threshold μ: if Vi is smaller than μ, the person is still in the fallen state, and this result overrides the detected result; if Vi is greater than or equal to μ, the person is no longer in the fallen state, and the result detected by the model is taken as the final result and the behavior category is output.

The beneficial effects of the present invention are as follows: with the method of the present invention, the object detection module can accurately locate the specific spatial positions of the actor and of objects, and the model finally gives the actor's behavior category. The human body box and behavior category are then drawn onto the original picture (the object boxes are not included), and the abnormal behavior category is recorded. The present invention mainly relies on human-object interaction modeling and analysis, behavior classification fusion, and optimization of the results based on a center-of-gravity velocity model: human-object interaction is used to assist in judging abnormal behavior, and changes in the center of gravity are combined to detect the persistent state of an abnormal behavior. Meanwhile, in addition to detecting abnormal behaviors, the present invention can also detect normal actions such as walking, standing, and sitting.
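As an illustration of this output step only, the following is a minimal sketch of drawing person boxes and behavior labels onto a frame with OpenCV; the detection tuple format, file names, and the abnormal category names are assumptions and are not specified by the patent.

```python
import cv2

# Hypothetical per-person detections: (x1, y1, x2, y2, behavior_label).
detections = [(120, 80, 260, 400, "walking"), (400, 220, 560, 380, "fall")]
ABNORMAL = {"fall", "climb", "fight"}  # assumed abnormal category names

frame = cv2.imread("frame_0001.jpg")
abnormal_log = []
for (x1, y1, x2, y2, label) in detections:
    color = (0, 0, 255) if label in ABNORMAL else (0, 255, 0)  # red for abnormal
    cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)          # person box only, no object boxes
    cv2.putText(frame, label, (x1, max(y1 - 8, 12)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
    if label in ABNORMAL:
        abnormal_log.append(label)                               # record the abnormal behavior category
cv2.imwrite("frame_0001_vis.jpg", frame)
```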

Brief Description of the Drawings

Fig. 1 is the network model diagram of the present invention;

Fig. 2 is the flow chart of data collection and labeling of the present invention;

Fig. 3 is the flow chart of detection result optimization of the present invention;

Fig. 4 is the flow chart of model training and operation of the present invention.

Detailed Description of the Embodiments

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.

Referring to Figs. 1-4, the model training and running process is shown in Fig. 3. The present invention provides a technical solution: a human abnormal behavior detection method based on spatiotemporal information and human-object interaction, with the following steps: (1) data collection and labeling; (2) extracting the position information of people and objects; (3) extracting the motion information of people and objects; (4) modeling the interaction relationship between human and object features; (5) behavior classification and fusion; (6) optimization of detection results.

(1) Data collection and labeling

The present invention collects normal actions and abnormal actions in real video surveillance scenes. To facilitate data annotation, the video data of the real scenes are cropped, and then the SSD object detection network is used to generate the initial spatial positions of people and objects; the network model diagram is shown in Fig. 1. Finally, the simple annotation tool of the invention is used to manually correct the generated position information and fix inaccurately detected object positions. The specific process is shown in Fig. 2.
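By way of illustration only, the sketch below shows how pre-labeling with a pretrained detector might look; the torchvision model constructor (which takes `pretrained=True` in older releases instead of `weights="DEFAULT"`), the score threshold, and the output file format are assumptions, not part of the patent.

```python
import json
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Pretrained SSD used only to produce rough initial boxes for later manual correction.
model = torchvision.models.detection.ssd300_vgg16(weights="DEFAULT").eval()

def prelabel(image_path, score_thr=0.5):
    img = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        pred = model([to_tensor(img)])[0]
    keep = pred["scores"] >= score_thr
    return {
        "image": image_path,
        "boxes": pred["boxes"][keep].tolist(),    # [x1, y1, x2, y2]
        "labels": pred["labels"][keep].tolist(),  # COCO class ids (1 = person)
    }

with open("initial_boxes.json", "w") as f:
    json.dump([prelabel("frame_0001.jpg")], f, indent=2)
```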

Description of the simple annotation tool: this tool is mainly used to correct the position information of the boxes. It reads and displays a picture and its corresponding person and object boxes; the user can judge whether the position of a displayed box is accurate and redraw a new box with the mouse, and the new data overwrites the old data.
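A minimal sketch of such a correction loop, assuming OpenCV's interactive ROI selector as the drawing mechanism; the patent does not specify how the tool is implemented.

```python
import cv2

def correct_boxes(image_path, boxes):
    """Show each stored box; let the user redraw it with the mouse if it is wrong.
    boxes: list of [x1, y1, x2, y2]; the returned list overwrites the old data."""
    img = cv2.imread(image_path)
    corrected = []
    for (x1, y1, x2, y2) in boxes:
        vis = img.copy()
        cv2.rectangle(vis, (x1, y1), (x2, y2), (0, 255, 0), 2)
        # selectROI lets the user drag a new rectangle and press Enter to confirm;
        # confirming an empty selection keeps the original box.
        x, y, w, h = cv2.selectROI("correct box", vis, showCrosshair=False)
        corrected.append([x, y, x + w, y + h] if w > 0 and h > 0 else [x1, y1, x2, y2])
    cv2.destroyAllWindows()
    return corrected
```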

(2) Extracting the position information of people and objects

The present invention fine-tunes the SSD (Single Shot MultiBox Detector) object detection network pre-trained on the MS COCO dataset on the collected dataset, so as to adapt to the target characteristics of the surveillance scene and accurately detect the positions of people and objects.

Fine-tuning method: starting from the model pre-trained on the MS COCO dataset, only the last two layers of the network are retrained on the training data, and the parameters of the remaining layers remain unchanged.
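A sketch of this fine-tuning, assuming that "the last two layers" correspond to the SSD detection head; that layer choice, the torchvision model, and the optimizer settings are assumptions rather than details given in the text.

```python
import torch
import torchvision

# Pretrained SSD; freeze everything except the detection head, then retrain the head
# on the collected surveillance data.
model = torchvision.models.detection.ssd300_vgg16(weights="DEFAULT")

for p in model.parameters():        # freeze the backbone and middle layers
    p.requires_grad = False
for p in model.head.parameters():   # retrain only the classification/regression head
    p.requires_grad = True

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)

def train_step(images, targets):
    """One step on a batch: images is a list of CHW tensors, targets a list of
    dicts with 'boxes' and 'labels' from the corrected annotations."""
    model.train()
    losses = model(images, targets)      # dict of classification/box-regression losses
    loss = sum(losses.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```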

(3) Extracting the motion information of people and objects

In order to balance running speed and detection accuracy, the present invention proposes to use 3D-ShuffleNet as the backbone network for spatiotemporal motion information. The specific process is as follows:

1) Data sampling: the present invention uses 16 frames of data as input. The specific sampling process is to take the current frame and the previous 15 frames to form an input clip as the input data;

2) Spatiotemporal downsampling is used to extract features from the 16 input frames; by downsampling the features, a single-frame spatiotemporal information feature map is finally obtained, which makes it convenient to combine with the object detection module.
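The sketch below illustrates the 16-frame clip construction and the reduction to a single-frame feature map. The two 3D convolutions are only a stand-in for the 3D-ShuffleNet backbone, and the channel counts, strides, and input resolution are assumptions.

```python
import torch
import torch.nn as nn

def make_clip(frames, t):
    """Build a 16-frame input clip ending at frame t (current frame + previous 15).
    frames: tensor of shape (T, C, H, W) holding the decoded video frames."""
    start = max(t - 15, 0)
    clip = frames[start:t + 1]
    if clip.shape[0] < 16:                        # pad by repeating the first frame
        pad = clip[:1].repeat(16 - clip.shape[0], 1, 1, 1)
        clip = torch.cat([pad, clip], dim=0)
    return clip.permute(1, 0, 2, 3).unsqueeze(0)  # (1, C, 16, H, W)

# Placeholder for the 3D-ShuffleNet backbone: the temporal strides collapse the
# 16 frames into a single temporal slice (assumed layer configuration).
backbone = nn.Sequential(
    nn.Conv3d(3, 64, kernel_size=3, stride=(2, 2, 2), padding=1),
    nn.ReLU(inplace=True),
    nn.Conv3d(64, 256, kernel_size=3, stride=(8, 2, 2), padding=1),
)

frames = torch.randn(100, 3, 224, 224)            # dummy decoded video
feat = backbone(make_clip(frames, t=40))          # (1, 256, 1, 56, 56)
feat_map = feat.squeeze(2)                        # single-frame feature map (1, 256, 56, 56)
```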

(4) Modeling the interaction relationship between human and object features

The main process of this module includes the following steps:

1) Apply the position information obtained in step (2) to the feature map extracted in step (3) to obtain spatiotemporal feature information;

2) Crop out the features of people and objects individually and perform interactive modeling and analysis with the following formula:

R(Pi) = Fα{Gβ(Pi, Oj), Oj ∈ O}

where Pi represents the spatiotemporal motion features of the i-th person, Oj represents the features of the j-th object, O represents the set of object features in the current frame, Gβ represents the human-object relationship model, and Fα represents the result of integrating multiple human-object relationship models; both Gβ and Fα are implemented with convolutional neural networks.
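One way such a relation module could be realized is sketched below; the use of roi_align, the 7×7 crop size, the spatial scale, the 1×1 convolutions for Gβ and Fα, and the mean aggregation over objects are all assumptions, since the patent does not give the exact layer configuration.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class RelationModule(nn.Module):
    """R(P_i) = F_alpha{ G_beta(P_i, O_j), O_j in O }, with small conv nets for G and F."""
    def __init__(self, channels=256):
        super().__init__()
        self.g_beta = nn.Sequential(                  # pairwise person-object relation
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.ReLU(inplace=True))
        self.f_alpha = nn.Sequential(                 # integrates all pairwise relations
            nn.Conv2d(channels, channels, kernel_size=1), nn.ReLU(inplace=True))

    def forward(self, feat_map, person_boxes, object_boxes):
        # Crop person/object features from the single-frame spatiotemporal feature map.
        persons = roi_align(feat_map, [person_boxes], output_size=7, spatial_scale=0.25)
        objects = roi_align(feat_map, [object_boxes], output_size=7, spatial_scale=0.25)
        relations = []
        for p in persons:                             # one relation feature per person
            pair = [self.g_beta(torch.cat([p, o], dim=0).unsqueeze(0)) for o in objects]
            pooled = torch.stack(pair, dim=0).mean(dim=0)   # aggregate over all objects
            relations.append(self.f_alpha(pooled))
        return torch.cat(relations, dim=0)            # (num_persons, C, 7, 7)
```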

(5) Behavior classification and fusion

This module mainly consists of three steps:

1) Perform behavior classification on the human motion information obtained in (3);

2) Perform behavior classification on the relation model established in (4);

3) Fuse the two classification results using the following formula:

C = (1 - θ) * S1 + θ * S2

where S1 is the classification score obtained in 1), S2 is the classification score obtained from the relation modeling in 2), and θ is a learnable hyperparameter indicating the relative importance of the two results. If the behavior has little relation to objects, θ is small and the model pays more attention to the classification result of 1); otherwise, the classification result of 2) is more important.
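For illustration, a small sketch of this fusion with θ as a learnable parameter kept inside (0, 1) via a sigmoid; the parameterization is an assumption, since the text only states that θ is learnable.

```python
import torch
import torch.nn as nn

class ScoreFusion(nn.Module):
    """C = (1 - theta) * S1 + theta * S2 with a learnable theta in (0, 1)."""
    def __init__(self):
        super().__init__()
        self.logit = nn.Parameter(torch.zeros(1))   # sigmoid(0) = 0.5 at initialization

    def forward(self, s1, s2):
        theta = torch.sigmoid(self.logit)           # keeps theta inside (0, 1)
        return (1 - theta) * s1 + theta * s2

fusion = ScoreFusion()
s1 = torch.tensor([[0.1, 0.7, 0.2]])   # scores from the human motion information branch
s2 = torch.tensor([[0.3, 0.3, 0.4]])   # scores from the human-object interaction branch
print(fusion(s1, s2))                   # fused action classification scores
```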

(6) Optimization of detection results

This step is mainly used to optimize the detection result of the falling abnormal behavior. The main reason is that, after falling to the ground, the human body may show little motion information, and a purely deep-learning-based approach cannot distinguish this from normal behavior. Therefore, after a falling action is detected, the change in the velocity of the human body's center of gravity is calculated to assist in judging whether the person is still in the fallen state. The optimization flow is shown in Fig. 4.

The optimization process of the current detection result is as follows:

1) Based on the preliminary detection result of the previous frame, judge whether a falling action has been detected. If no falling action is detected, take the preliminary detection result of the previous frame as the final result and output the behavior category; if a falling action is detected, proceed to the second step;

2) Calculate the center of gravity of the human body from the position box and compute the velocity change between adjacent frames to obtain Vi;

3) Compare Vi with the threshold μ. If Vi is smaller than μ, the person is still in the fallen state, and this result overrides the detected result; if Vi is greater than or equal to μ, the person is no longer in the fallen state (for example, has stood up from the ground), and the result detected by the model is taken as the final behavior category.
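As a sketch of this rule, assuming the center of gravity is approximated by the center of the person's bounding box and μ is an empirically chosen pixel-per-frame threshold; neither choice is fixed by the text.

```python
def box_center(box):
    """Approximate the body's center of gravity by the bounding-box center."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def refine_fall_state(prev_label, prev_box, cur_box, cur_label, mu=5.0):
    """Keep the 'fall' label while the center of gravity barely moves between frames."""
    if prev_label != "fall":
        return cur_label                      # no fall detected previously: keep the model output
    (px, py), (cx, cy) = box_center(prev_box), box_center(cur_box)
    v_i = ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5   # velocity change between adjacent frames
    return "fall" if v_i < mu else cur_label  # below mu: still fallen, override the result

# Example: a small displacement means the fallen state persists and overrides the model output.
print(refine_fall_state("fall", (100, 300, 220, 360), (102, 301, 223, 361), "standing"))
```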

Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of the technical features. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (6)

1. A human abnormal behavior detection method based on spatiotemporal information and human-object interaction, characterized in that the steps are as follows: S1, data collection and labeling; S2, extracting the position information of people and objects; S3, extracting the motion information of people and objects; S4, modeling the interaction relationship between human and object features; S5, behavior classification and fusion; S6, optimization of detection results; the data collection and labeling in step S1 comprise: collecting normal actions and abnormal actions from video surveillance, cropping the video data, generating initial spatial positions of people and objects through the SSD object detection network, and finally using a simple annotation tool to manually correct the generated position information and fix inaccurately detected object positions to obtain accurate position information; the modeling of the interaction relationship between human and object features in step S4 comprises: applying the position information of people and objects obtained in step S2 to the feature map extracted in step S3 to obtain spatiotemporal feature information, cropping out the features of people and objects individually, and performing interactive modeling with the formula R(Pi) = Fα{Gβ(Pi, Oj), Oj ∈ O}, where R(Pi) represents the correlation between the spatiotemporal features of the i-th person and the features of all objects, Pi represents the spatiotemporal motion features of the i-th person, Oj represents the features of the j-th object, O represents the set of object features in the current frame, Gβ represents the human-object relationship model, and Fα represents the result of integrating multiple human-object relationship models; the optimization of the detection results in step S6 comprises: judging, from the preliminary detection result of the previous frame, whether a falling action has been detected; if no falling action is detected, taking the preliminary detection result of the previous frame as the final result and outputting the behavior category; if a falling action is detected, calculating the center of gravity of the human body from the position box, computing the velocity change between adjacent frames to obtain Vi, and comparing Vi with the threshold μ; if Vi is smaller than μ, the person is still in the fallen state and this result overrides the detected result; if Vi is greater than or equal to μ, the person is no longer in the fallen state, and the result detected by the model is taken as the final result and the behavior category is output.
2. The human abnormal behavior detection method based on spatiotemporal information and human-object interaction according to claim 1, characterized in that: the simple annotation tool is used to correct the position information of the boxes; it reads and displays a picture and its corresponding person and object boxes, allows judging whether the position of a displayed box is accurate, and allows redrawing a new box with the mouse, the new data overwriting the old data.

3. The human abnormal behavior detection method based on spatiotemporal information and human-object interaction according to claim 1, characterized in that: extracting the position information of people and objects in step S2 comprises fine-tuning the SSD object detection network pre-trained on the MS COCO dataset on the collected dataset to accurately detect the positions of people and objects.

4. The human abnormal behavior detection method based on spatiotemporal information and human-object interaction according to claim 3, characterized in that: the fine-tuning means that, starting from the model pre-trained on the MS COCO dataset, only the last two layers of the network are retrained on the training data, while the parameters of the remaining layers remain unchanged.

5. The human abnormal behavior detection method based on spatiotemporal information and human-object interaction according to claim 1, characterized in that: extracting the motion information of people and objects in step S3 comprises adopting a 3D-ShuffleNet network as the backbone for spatiotemporal motion information, taking the current frame and the previous 15 frames to form an input clip as the input data, performing feature extraction on the 16 input frames, and finally obtaining a single-frame spatiotemporal information feature map.

6. The human abnormal behavior detection method based on spatiotemporal information and human-object interaction according to claim 1, characterized in that: the behavior classification and fusion in step S5 comprise performing behavior classification separately on the human motion information and on the human-object interaction model, and fusing the two classification results to obtain a preliminary detection result, with the fusion formula C = (1 - θ) * S1 + θ * S2, where C represents the action classification result obtained by fusing the classification scores S1 and S2, S1 represents the classification score obtained from the human motion information, S2 represents the classification score obtained from the human-object interaction modeling, and θ is a learnable hyperparameter indicating the relative importance of the results; if θ < 0.5, the behavior has little correlation with objects and the model pays more attention to the classification result of the human motion information; otherwise, the model pays more attention to the classification result of the human-object interaction modeling.
CN202110030865.2A 2021-01-11 2021-01-11 A human abnormal behavior detection method based on spatiotemporal information and human-object interaction Active CN112381072B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110030865.2A CN112381072B (en) 2021-01-11 2021-01-11 A human abnormal behavior detection method based on spatiotemporal information and human-object interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110030865.2A CN112381072B (en) 2021-01-11 2021-01-11 A human abnormal behavior detection method based on spatiotemporal information and human-object interaction

Publications (2)

Publication Number Publication Date
CN112381072A CN112381072A (en) 2021-02-19
CN112381072B true CN112381072B (en) 2021-05-25

Family

ID=74590054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110030865.2A Active CN112381072B (en) 2021-01-11 2021-01-11 A human abnormal behavior detection method based on spatiotemporal information and human-object interaction

Country Status (1)

Country Link
CN (1) CN112381072B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926837B (en) * 2022-05-26 2023-08-04 东南大学 Emotion recognition method based on human-object space-time interaction behavior

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104510475A (en) * 2014-12-15 2015-04-15 中国科学院计算技术研究所 Human body falling-down detection method and system
CN110321780A (en) * 2019-04-30 2019-10-11 苏州大学 Exception based on spatiotemporal motion characteristic falls down behavioral value method
CN110569773A (en) * 2019-08-30 2019-12-13 江南大学 A two-stream network action recognition method based on spatio-temporal saliency action attention
CN111310689A (en) * 2020-02-25 2020-06-19 陕西科技大学 Method for recognizing human body behaviors in potential information fusion home security system
CN111325073A (en) * 2018-12-17 2020-06-23 上海交通大学 Monitoring video abnormal behavior detection method based on motion information clustering
CN111738218A (en) * 2020-07-27 2020-10-02 成都睿沿科技有限公司 Human body abnormal behavior recognition system and method
CN111898514A (en) * 2020-07-24 2020-11-06 燕山大学 A multi-target visual supervision method based on target detection and action recognition
CN112149616A (en) * 2020-10-13 2020-12-29 西安电子科技大学 A method of character interaction behavior recognition based on dynamic information

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102236783A (en) * 2010-04-29 2011-11-09 索尼公司 Method and equipment for detecting abnormal actions and method and equipment for generating detector
CA3041148C (en) * 2017-01-06 2023-08-15 Sportlogiq Inc. Systems and methods for behaviour understanding from trajectories
US11450145B2 (en) * 2017-04-12 2022-09-20 Disney Enterprise, Inc. System and method for monitoring procedure compliance
US10572723B2 (en) * 2017-12-07 2020-02-25 Futurewei Technologies, Inc. Activity detection by joint human and object detection and tracking
CN110555404A (en) * 2019-08-29 2019-12-10 西北工业大学 Flying wing unmanned aerial vehicle ground station interaction device and method based on human body posture recognition
CN111339668B (en) * 2020-02-28 2022-05-10 西南交通大学 Crowd evacuation visualization method based on emotion cognition
CN111582122B (en) * 2020-04-29 2021-03-16 成都信息工程大学 System and method for intelligently analyzing behaviors of multi-dimensional pedestrians in surveillance video
CN111709306B (en) * 2020-05-22 2023-06-09 江南大学 Two-stream Network Behavior Recognition Method Based on Multi-level Spatial-Temporal Feature Fusion Enhancement
CN111797705A (en) * 2020-06-11 2020-10-20 同济大学 An Action Recognition Method Based on Character Relationship Modeling
CN111767888A (en) * 2020-07-08 2020-10-13 北京澎思科技有限公司 Object state detection method, computer device, storage medium, and electronic device
CN112052795B (en) * 2020-09-07 2022-10-18 北京理工大学 Video behavior identification method based on multi-scale space-time feature aggregation

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104510475A (en) * 2014-12-15 2015-04-15 中国科学院计算技术研究所 Human body falling-down detection method and system
CN111325073A (en) * 2018-12-17 2020-06-23 上海交通大学 Monitoring video abnormal behavior detection method based on motion information clustering
CN110321780A (en) * 2019-04-30 2019-10-11 苏州大学 Exception based on spatiotemporal motion characteristic falls down behavioral value method
CN110569773A (en) * 2019-08-30 2019-12-13 江南大学 A two-stream network action recognition method based on spatio-temporal saliency action attention
CN111310689A (en) * 2020-02-25 2020-06-19 陕西科技大学 Method for recognizing human body behaviors in potential information fusion home security system
CN111898514A (en) * 2020-07-24 2020-11-06 燕山大学 A multi-target visual supervision method based on target detection and action recognition
CN111738218A (en) * 2020-07-27 2020-10-02 成都睿沿科技有限公司 Human body abnormal behavior recognition system and method
CN112149616A (en) * 2020-10-13 2020-12-29 西安电子科技大学 A method of character interaction behavior recognition based on dynamic information

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection; Chen Gao et al.; arXiv:1808.10437v1; 2018-08-30; pp. 1-13, introduction, section 3, fig. 3 *
Human behavior recognition algorithm based on a spatiotemporal interactive attention model; Pan Na et al.; Laser & Optoelectronics Progress; Sept. 2020; vol. 57, no. 18; pp. 181506-1 to 181506-9 *
A survey of deep learning-based behavior detection methods; Gao Chenqiang et al.; Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition); Dec. 2020; vol. 32, no. 6; pp. 991-1002 *

Also Published As

Publication number Publication date
CN112381072A (en) 2021-02-19

Similar Documents

Publication Publication Date Title
Vishnu et al. Human fall detection in surveillance videos using fall motion vector modeling
CN107451553B (en) It is a kind of based on hypergraph transformation video in incident of violence detection method
WO2019232894A1 (en) Complex scene-based human body key point detection system and method
WO2020042419A1 (en) Gait-based identity recognition method and apparatus, and electronic device
CN111797771B (en) A method and system for weakly supervised video behavior detection based on iterative learning
CN112766159A (en) Cross-database micro-expression identification method based on multi-feature fusion
CN108416258B (en) A Multi-body Tracking Method Based on Human Body Part Model
CN111191667B (en) Crowd counting method based on multiscale generation countermeasure network
CN103093199B (en) Based on the Given Face tracking of ONLINE RECOGNITION
CN109299690B (en) A method that can improve the accuracy of video real-time face recognition
CN109829382B (en) Abnormal target early warning tracking system and method based on intelligent behavior characteristic analysis
CN110688980B (en) Human body posture classification method based on computer vision
CN107729876A (en) Fall detection method in old man room based on computer vision
CN112801000B (en) A fall detection method and system for the elderly at home based on multi-feature fusion
CN115761908A (en) Mobile terminal child visual attention abnormity screening method based on multi-mode data learning
CN103106394A (en) Human body action recognition method in video surveillance
CN105930770A (en) Human motion identification method based on Gaussian process latent variable model
CN110909672A (en) Smoking action recognition method based on double-current convolutional neural network and SVM
CN110477907B (en) Modeling method for intelligently assisting in recognizing epileptic seizures
CN111652035A (en) A method and system for pedestrian re-identification based on ST-SSCA-Net
CN115393830A (en) A fatigue driving detection method based on deep learning and facial features
CN114358194A (en) A method for detecting abnormal limb behavior in autism spectrum disorder based on posture tracking
CN109886102B (en) Fall-down behavior time-space domain detection method based on depth image
CN105976397A (en) Target tracking method based on half nonnegative optimization integration learning
Pervaiz et al. Artificial neural network for human object interaction system over aerial images

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant