CN112381072B - A human abnormal behavior detection method based on spatiotemporal information and human-object interaction - Google Patents

A human abnormal behavior detection method based on spatiotemporal information and human-object interaction

Info

Publication number
CN112381072B
Authority
CN
China
Prior art keywords
human
objects
information
people
behavior
Prior art date
Legal status
Active
Application number
CN202110030865.2A
Other languages
Chinese (zh)
Other versions
CN112381072A (en)
Inventor
龚勋
马冰
刘璐
Current Assignee
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Priority date
Filing date
Publication date
Application filed by Southwest Jiaotong University filed Critical Southwest Jiaotong University
Priority to CN202110030865.2A priority Critical patent/CN112381072B/en
Publication of CN112381072A publication Critical patent/CN112381072A/en
Application granted granted Critical
Publication of CN112381072B publication Critical patent/CN112381072B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Alarm Systems (AREA)

Abstract

The invention discloses a human abnormal behavior detection method based on spatiotemporal information and human-object interaction. The steps are as follows: S1, data collection and labeling; S2, extracting the position information of people and objects; S3, extracting the motion information of people and objects; S4, modeling the interaction relationship between human and object features; S5, behavior classification and fusion; S6, optimization of detection results. For the detection of abnormal actions such as falling to the ground, climbing, and physical conflict, as well as of persistent abnormal states, human-object interaction is used to assist in judging abnormal behavior, and changes in the center of gravity are combined to detect the persistent state of an abnormal behavior. In addition to detecting abnormal behaviors, the invention can also detect normal actions such as walking, standing, and sitting.

Description

A human abnormal behavior detection method based on spatiotemporal information and human-object interaction

Technical Field

The invention relates to the technical fields of computer vision and deep learning, and in particular to a method for detecting abnormal human behavior based on spatiotemporal information and human-object interaction.

Background Art

Human abnormal behavior detection has important applications in security and intelligent monitoring; it greatly relieves the pressure of manual monitoring and improves detection efficiency. Some existing solutions rely on hand-crafted motion features for discrimination and achieve low accuracy in real-world scenes, while some current deep learning-based methods can only detect a single kind of abnormal behavior and cannot automatically judge the multiple abnormal behaviors that occur under real conditions. Abnormal actions such as climbing and falling to the ground are also special cases: it is necessary not only to detect, in real time, the abnormal action the person is performing, but also to continuously discriminate the state of that action afterwards. For example, a person may keep lying still after falling, or keep walking on a table or other auxiliary object after climbing. These situations challenge existing detection techniques, which cannot detect the persistent state of abnormal actions, so new technical methods are needed.

Summary of the Invention

In order to solve the problems existing in the prior art, the present invention provides a method for detecting abnormal human behavior based on spatiotemporal information and human-object interaction, which solves the problems mentioned in the above background art.

To achieve the above purpose, the present invention provides the following technical solution: a method for detecting abnormal human behavior based on spatiotemporal information and human-object interaction, with the following steps: S1, data collection and labeling; S2, extracting the position information of people and objects; S3, extracting the motion information of people and objects; S4, modeling the interaction relationship between human and object features; S5, behavior classification and fusion; S6, optimization of detection results. The abnormal behavior refers to behavior beyond the normal range; it is scene-dependent and represents behavior that is not acceptable in the given scene.

Preferably, the data collection and labeling in step S1 include: collecting normal actions and abnormal actions from video surveillance, cropping the video data, generating initial spatial positions of people and objects through the SSD object detection network, and finally using a simple annotation tool to manually correct the generated position information and fix inaccurately detected object positions, so as to obtain accurate position information. Normal actions refer to actions that are acceptable in the monitored scene, including walking, sitting, and standing; abnormal actions represent actions that are not acceptable in that scene, including falling to the ground, climbing, and physical conflict.

Preferably, the simple annotation tool is used to correct the position information of the boxes. It reads and displays a picture together with its corresponding person and object boxes; the user can judge whether a displayed box is accurate and redraw a new box with the mouse, and the new data overwrites the old data.

Preferably, extracting the position information of people and objects in step S2 includes: fine-tuning the SSD object detection network pre-trained on the MS COCO dataset on the collected dataset, so as to accurately detect the positions of people and objects.

Preferably, the fine-tuning means that, starting from the model pre-trained on the MS COCO dataset, only the last two layers of the network are retrained on the training data, while the parameters of the remaining layers are kept unchanged.

Preferably, extracting the motion information of people and objects in step S3 includes: adopting a 3D-ShuffleNet network as the backbone for spatiotemporal motion information, taking the current frame and the previous 15 frames to form an input clip, performing feature extraction on the 16 input frames, and finally obtaining a single-frame spatiotemporal information feature map.

Preferably, the modeling of the interaction relationship between human and object features in step S4 includes: applying the position information of people and objects obtained in step S2 to the feature map extracted in step S3 to obtain spatiotemporal feature information; cropping out the features of people and objects individually and modeling their interaction with the formula R(Pi) = Fα{Gβ(Pi, Oj), Oj ∈ O}, where R(Pi) represents the correlation between the spatiotemporal features of the i-th person and the features of all objects, Pi represents the spatiotemporal motion features of the i-th person, Oj represents the features of the j-th object, O represents the set of object features in the current frame, Gβ represents the human-object relationship model, and Fα represents the result of integrating multiple human-object relationship models.

Preferably, the behavior classification and fusion in step S5 include: performing behavior classification separately on the human motion information and on the human-object interaction model, and fusing the two classification results to obtain a preliminary detection result. The fusion formula is C = (1 - θ) * S1 + θ * S2, where C is the action classification result obtained by fusing the classification scores S1 and S2, S1 is the classification score obtained from the human motion information, S2 is the classification score obtained from the human-object interaction modeling, and θ is a learnable hyperparameter indicating the relative importance of the two results. If θ < 0.5, the behavior has little correlation with objects and the model pays more attention to the classification result of the human motion information; otherwise, the model pays more attention to the classification result of the human-object interaction modeling.

Preferably, the optimization of the detection results in step S6 includes: judging, from the preliminary detection result of the previous frame, whether a falling action has been detected. If no falling action is detected, the preliminary detection result of the previous frame is taken as the final result and the behavior category is output. If a falling action is detected, the center of gravity of the human body is calculated from the position box and the velocity change between adjacent frames is computed to obtain Vi, where Vi denotes the velocity change between adjacent frames. Vi is then compared with the threshold μ: if Vi is smaller than μ, the person is still in the fallen state, and this result overrides the detected result; if Vi is greater than or equal to μ, the person is no longer in the fallen state, and the result detected by the model is taken as the final result and the behavior category is output.

The beneficial effects of the present invention are as follows: with the method of the present invention, the object detection module can accurately locate the specific spatial positions of the actor and of objects, and the model finally gives the actor's behavior category. The human body box and behavior category are then drawn onto the original picture (the object boxes are not included), and the abnormal behavior category is recorded. The present invention mainly relies on human-object interaction modeling and analysis, behavior classification fusion, and optimization of the results based on a center-of-gravity velocity model: human-object interaction is used to assist in judging abnormal behavior, and changes in the center of gravity are combined to detect the persistent state of an abnormal behavior. Meanwhile, in addition to detecting abnormal behaviors, the present invention can also detect normal actions such as walking, standing, and sitting.
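As an illustration of this output step only, the following is a minimal sketch of drawing person boxes and behavior labels onto a frame with OpenCV; the detection tuple format, file names, and the abnormal category names are assumptions and are not specified by the patent.

```python
import cv2

# Hypothetical per-person detections: (x1, y1, x2, y2, behavior_label).
detections = [(120, 80, 260, 400, "walking"), (400, 220, 560, 380, "fall")]
ABNORMAL = {"fall", "climb", "fight"}  # assumed abnormal category names

frame = cv2.imread("frame_0001.jpg")
abnormal_log = []
for (x1, y1, x2, y2, label) in detections:
    color = (0, 0, 255) if label in ABNORMAL else (0, 255, 0)  # red for abnormal
    cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)          # person box only, no object boxes
    cv2.putText(frame, label, (x1, max(y1 - 8, 12)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
    if label in ABNORMAL:
        abnormal_log.append(label)                               # record the abnormal behavior category
cv2.imwrite("frame_0001_vis.jpg", frame)
```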

Brief Description of the Drawings

Fig. 1 is the network model diagram of the present invention;

Fig. 2 is the flow chart of data collection and labeling of the present invention;

Fig. 3 is the flow chart of detection result optimization of the present invention;

Fig. 4 is the flow chart of model training and operation of the present invention.

Detailed Description of the Embodiments

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.

Referring to Figs. 1-4, the model training and running process is shown in Fig. 3. The present invention provides a technical solution: a human abnormal behavior detection method based on spatiotemporal information and human-object interaction, with the following steps: (1) data collection and labeling; (2) extracting the position information of people and objects; (3) extracting the motion information of people and objects; (4) modeling the interaction relationship between human and object features; (5) behavior classification and fusion; (6) optimization of detection results.

(1) Data collection and labeling

The present invention collects normal actions and abnormal actions in real video surveillance scenes. To facilitate data annotation, the video data of the real scenes are cropped, and then the SSD object detection network is used to generate the initial spatial positions of people and objects; the network model diagram is shown in Fig. 1. Finally, the simple annotation tool of the invention is used to manually correct the generated position information and fix inaccurately detected object positions. The specific process is shown in Fig. 2.
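By way of illustration only, the sketch below shows how pre-labeling with a pretrained detector might look; the torchvision model constructor (which takes `pretrained=True` in older releases instead of `weights="DEFAULT"`), the score threshold, and the output file format are assumptions, not part of the patent.

```python
import json
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Pretrained SSD used only to produce rough initial boxes for later manual correction.
model = torchvision.models.detection.ssd300_vgg16(weights="DEFAULT").eval()

def prelabel(image_path, score_thr=0.5):
    img = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        pred = model([to_tensor(img)])[0]
    keep = pred["scores"] >= score_thr
    return {
        "image": image_path,
        "boxes": pred["boxes"][keep].tolist(),    # [x1, y1, x2, y2]
        "labels": pred["labels"][keep].tolist(),  # COCO class ids (1 = person)
    }

with open("initial_boxes.json", "w") as f:
    json.dump([prelabel("frame_0001.jpg")], f, indent=2)
```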

Description of the simple annotation tool: this tool is mainly used to correct the position information of the boxes. It reads and displays a picture and its corresponding person and object boxes; the user can judge whether the position of a displayed box is accurate and redraw a new box with the mouse, and the new data overwrites the old data.
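A minimal sketch of such a correction loop, assuming OpenCV's interactive ROI selector as the drawing mechanism; the patent does not specify how the tool is implemented.

```python
import cv2

def correct_boxes(image_path, boxes):
    """Show each stored box; let the user redraw it with the mouse if it is wrong.
    boxes: list of [x1, y1, x2, y2]; the returned list overwrites the old data."""
    img = cv2.imread(image_path)
    corrected = []
    for (x1, y1, x2, y2) in boxes:
        vis = img.copy()
        cv2.rectangle(vis, (x1, y1), (x2, y2), (0, 255, 0), 2)
        # selectROI lets the user drag a new rectangle and press Enter to confirm;
        # confirming an empty selection keeps the original box.
        x, y, w, h = cv2.selectROI("correct box", vis, showCrosshair=False)
        corrected.append([x, y, x + w, y + h] if w > 0 and h > 0 else [x1, y1, x2, y2])
    cv2.destroyAllWindows()
    return corrected
```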

(2) Extracting the position information of people and objects

The present invention fine-tunes the SSD (Single Shot MultiBox Detector) object detection network pre-trained on the MS COCO dataset on the collected dataset, so as to adapt to the target characteristics of the surveillance scene and accurately detect the positions of people and objects.

Fine-tuning method: starting from the model pre-trained on the MS COCO dataset, only the last two layers of the network are retrained on the training data, and the parameters of the remaining layers remain unchanged.
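A sketch of this fine-tuning, assuming that "the last two layers" correspond to the SSD detection head; that layer choice, the torchvision model, and the optimizer settings are assumptions rather than details given in the text.

```python
import torch
import torchvision

# Pretrained SSD; freeze everything except the detection head, then retrain the head
# on the collected surveillance data.
model = torchvision.models.detection.ssd300_vgg16(weights="DEFAULT")

for p in model.parameters():        # freeze the backbone and middle layers
    p.requires_grad = False
for p in model.head.parameters():   # retrain only the classification/regression head
    p.requires_grad = True

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)

def train_step(images, targets):
    """One step on a batch: images is a list of CHW tensors, targets a list of
    dicts with 'boxes' and 'labels' from the corrected annotations."""
    model.train()
    losses = model(images, targets)      # dict of classification/box-regression losses
    loss = sum(losses.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```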

(3) Extracting the motion information of people and objects

In order to balance running speed and detection accuracy, the present invention proposes to use 3D-ShuffleNet as the backbone network for spatiotemporal motion information. The specific process is as follows:

1) Data sampling: the present invention uses 16 frames of data as input. The specific sampling process is to take the current frame and the previous 15 frames to form an input clip as the input data;

2) Spatiotemporal downsampling is used to extract features from the 16 input frames; by downsampling the features, a single-frame spatiotemporal information feature map is finally obtained, which makes it convenient to combine with the object detection module.
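The sketch below illustrates the 16-frame clip construction and the reduction to a single-frame feature map. The two 3D convolutions are only a stand-in for the 3D-ShuffleNet backbone, and the channel counts, strides, and input resolution are assumptions.

```python
import torch
import torch.nn as nn

def make_clip(frames, t):
    """Build a 16-frame input clip ending at frame t (current frame + previous 15).
    frames: tensor of shape (T, C, H, W) holding the decoded video frames."""
    start = max(t - 15, 0)
    clip = frames[start:t + 1]
    if clip.shape[0] < 16:                        # pad by repeating the first frame
        pad = clip[:1].repeat(16 - clip.shape[0], 1, 1, 1)
        clip = torch.cat([pad, clip], dim=0)
    return clip.permute(1, 0, 2, 3).unsqueeze(0)  # (1, C, 16, H, W)

# Placeholder for the 3D-ShuffleNet backbone: the temporal strides collapse the
# 16 frames into a single temporal slice (assumed layer configuration).
backbone = nn.Sequential(
    nn.Conv3d(3, 64, kernel_size=3, stride=(2, 2, 2), padding=1),
    nn.ReLU(inplace=True),
    nn.Conv3d(64, 256, kernel_size=3, stride=(8, 2, 2), padding=1),
)

frames = torch.randn(100, 3, 224, 224)            # dummy decoded video
feat = backbone(make_clip(frames, t=40))          # (1, 256, 1, 56, 56)
feat_map = feat.squeeze(2)                        # single-frame feature map (1, 256, 56, 56)
```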

(4) Modeling the interaction relationship between human and object features

The main process of this module includes the following steps:

1) Apply the position information obtained in step (2) to the feature map extracted in step (3) to obtain spatiotemporal feature information;

2) Crop out the features of people and objects individually and perform interactive modeling and analysis with the following formula:

R(Pi) = Fα{Gβ(Pi, Oj), Oj ∈ O}

where Pi represents the spatiotemporal motion features of the i-th person, Oj represents the features of the j-th object, O represents the set of object features in the current frame, Gβ represents the human-object relationship model, and Fα represents the result of integrating multiple human-object relationship models; both Gβ and Fα are implemented with convolutional neural networks.
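One way such a relation module could be realized is sketched below; the use of roi_align, the 7×7 crop size, the spatial scale, the 1×1 convolutions for Gβ and Fα, and the mean aggregation over objects are all assumptions, since the patent does not give the exact layer configuration.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class RelationModule(nn.Module):
    """R(P_i) = F_alpha{ G_beta(P_i, O_j), O_j in O }, with small conv nets for G and F."""
    def __init__(self, channels=256):
        super().__init__()
        self.g_beta = nn.Sequential(                  # pairwise person-object relation
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.ReLU(inplace=True))
        self.f_alpha = nn.Sequential(                 # integrates all pairwise relations
            nn.Conv2d(channels, channels, kernel_size=1), nn.ReLU(inplace=True))

    def forward(self, feat_map, person_boxes, object_boxes):
        # Crop person/object features from the single-frame spatiotemporal feature map.
        persons = roi_align(feat_map, [person_boxes], output_size=7, spatial_scale=0.25)
        objects = roi_align(feat_map, [object_boxes], output_size=7, spatial_scale=0.25)
        relations = []
        for p in persons:                             # one relation feature per person
            pair = [self.g_beta(torch.cat([p, o], dim=0).unsqueeze(0)) for o in objects]
            pooled = torch.stack(pair, dim=0).mean(dim=0)   # aggregate over all objects
            relations.append(self.f_alpha(pooled))
        return torch.cat(relations, dim=0)            # (num_persons, C, 7, 7)
```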

(5) Behavior classification and fusion

This module mainly consists of three steps:

1) Perform behavior classification on the human motion information obtained in (3);

2) Perform behavior classification on the relation model established in (4);

3) Fuse the two classification results using the following formula:

C = (1 - θ) * S1 + θ * S2

where S1 is the classification score obtained in 1), S2 is the classification score obtained from the relation modeling in 2), and θ is a learnable hyperparameter indicating the relative importance of the two results. If the behavior has little relation to objects, θ is small and the model pays more attention to the classification result of 1); otherwise, the classification result of 2) is more important.
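For illustration, a small sketch of this fusion with θ as a learnable parameter kept inside (0, 1) via a sigmoid; the parameterization is an assumption, since the text only states that θ is learnable.

```python
import torch
import torch.nn as nn

class ScoreFusion(nn.Module):
    """C = (1 - theta) * S1 + theta * S2 with a learnable theta in (0, 1)."""
    def __init__(self):
        super().__init__()
        self.logit = nn.Parameter(torch.zeros(1))   # sigmoid(0) = 0.5 at initialization

    def forward(self, s1, s2):
        theta = torch.sigmoid(self.logit)           # keeps theta inside (0, 1)
        return (1 - theta) * s1 + theta * s2

fusion = ScoreFusion()
s1 = torch.tensor([[0.1, 0.7, 0.2]])   # scores from the human motion information branch
s2 = torch.tensor([[0.3, 0.3, 0.4]])   # scores from the human-object interaction branch
print(fusion(s1, s2))                   # fused action classification scores
```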

(6) Optimization of detection results

This step is mainly used to optimize the detection result of the falling abnormal behavior. The main reason is that, after falling to the ground, the human body may show little motion information, and a purely deep-learning-based approach cannot distinguish this from normal behavior. Therefore, after a falling action is detected, the change in the velocity of the human body's center of gravity is calculated to assist in judging whether the person is still in the fallen state. The optimization flow is shown in Fig. 4.

The optimization process of the current detection result is as follows:

1) Based on the preliminary detection result of the previous frame, judge whether a falling action has been detected. If no falling action is detected, take the preliminary detection result of the previous frame as the final result and output the behavior category; if a falling action is detected, proceed to the second step;

2) Calculate the center of gravity of the human body from the position box and compute the velocity change between adjacent frames to obtain Vi;

3) Compare Vi with the threshold μ. If Vi is smaller than μ, the person is still in the fallen state, and this result overrides the detected result; if Vi is greater than or equal to μ, the person is no longer in the fallen state (for example, has stood up from the ground), and the result detected by the model is taken as the final behavior category.
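As a sketch of this rule, assuming the center of gravity is approximated by the center of the person's bounding box and μ is an empirically chosen pixel-per-frame threshold; neither choice is fixed by the text.

```python
def box_center(box):
    """Approximate the body's center of gravity by the bounding-box center."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def refine_fall_state(prev_label, prev_box, cur_box, cur_label, mu=5.0):
    """Keep the 'fall' label while the center of gravity barely moves between frames."""
    if prev_label != "fall":
        return cur_label                      # no fall detected previously: keep the model output
    (px, py), (cx, cy) = box_center(prev_box), box_center(cur_box)
    v_i = ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5   # velocity change between adjacent frames
    return "fall" if v_i < mu else cur_label  # below mu: still fallen, override the result

# Example: a small displacement means the fallen state persists and overrides the model output.
print(refine_fall_state("fall", (100, 300, 220, 360), (102, 301, 223, 361), "standing"))
```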

Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of the technical features. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (6)

1. A human abnormal behavior detection method based on spatiotemporal information and human-object interaction, characterized in that the steps are as follows: S1, data collection and labeling; S2, extracting the position information of people and objects; S3, extracting the motion information of people and objects; S4, modeling the interaction relationship between human and object features; S5, behavior classification and fusion; S6, optimization of detection results; the data collection and labeling in step S1 comprise: collecting normal actions and abnormal actions from video surveillance, cropping the video data, generating initial spatial positions of people and objects through the SSD object detection network, and finally using a simple annotation tool to manually correct the generated position information and fix inaccurately detected object positions to obtain accurate position information; the modeling of the interaction relationship between human and object features in step S4 comprises: applying the position information of people and objects obtained in step S2 to the feature map extracted in step S3 to obtain spatiotemporal feature information, cropping out the features of people and objects individually, and performing interactive modeling with the formula R(Pi) = Fα{Gβ(Pi, Oj), Oj ∈ O}, where R(Pi) represents the correlation between the spatiotemporal features of the i-th person and the features of all objects, Pi represents the spatiotemporal motion features of the i-th person, Oj represents the features of the j-th object, O represents the set of object features in the current frame, Gβ represents the human-object relationship model, and Fα represents the result of integrating multiple human-object relationship models; the optimization of the detection results in step S6 comprises: judging, from the preliminary detection result of the previous frame, whether a falling action has been detected; if no falling action is detected, taking the preliminary detection result of the previous frame as the final result and outputting the behavior category; if a falling action is detected, calculating the center of gravity of the human body from the position box, computing the velocity change between adjacent frames to obtain Vi, and comparing Vi with the threshold μ; if Vi is smaller than μ, the person is still in the fallen state and this result overrides the detected result; if Vi is greater than or equal to μ, the person is no longer in the fallen state, and the result detected by the model is taken as the final result and the behavior category is output.
2. The human abnormal behavior detection method based on spatiotemporal information and human-object interaction according to claim 1, characterized in that: the simple annotation tool is used to correct the position information of the boxes; it reads and displays a picture and its corresponding person and object boxes, allows judging whether the position of a displayed box is accurate, and allows redrawing a new box with the mouse, the new data overwriting the old data.

3. The human abnormal behavior detection method based on spatiotemporal information and human-object interaction according to claim 1, characterized in that: extracting the position information of people and objects in step S2 comprises fine-tuning the SSD object detection network pre-trained on the MS COCO dataset on the collected dataset to accurately detect the positions of people and objects.

4. The human abnormal behavior detection method based on spatiotemporal information and human-object interaction according to claim 3, characterized in that: the fine-tuning means that, starting from the model pre-trained on the MS COCO dataset, only the last two layers of the network are retrained on the training data, while the parameters of the remaining layers remain unchanged.

5. The human abnormal behavior detection method based on spatiotemporal information and human-object interaction according to claim 1, characterized in that: extracting the motion information of people and objects in step S3 comprises adopting a 3D-ShuffleNet network as the backbone for spatiotemporal motion information, taking the current frame and the previous 15 frames to form an input clip as the input data, performing feature extraction on the 16 input frames, and finally obtaining a single-frame spatiotemporal information feature map.

6. The human abnormal behavior detection method based on spatiotemporal information and human-object interaction according to claim 1, characterized in that: the behavior classification and fusion in step S5 comprise performing behavior classification separately on the human motion information and on the human-object interaction model, and fusing the two classification results to obtain a preliminary detection result, with the fusion formula C = (1 - θ) * S1 + θ * S2, where C represents the action classification result obtained by fusing the classification scores S1 and S2, S1 represents the classification score obtained from the human motion information, S2 represents the classification score obtained from the human-object interaction modeling, and θ is a learnable hyperparameter indicating the relative importance of the results; if θ < 0.5, the behavior has little correlation with objects and the model pays more attention to the classification result of the human motion information; otherwise, the model pays more attention to the classification result of the human-object interaction modeling.
CN202110030865.2A 2021-01-11 2021-01-11 A human abnormal behavior detection method based on spatiotemporal information and human-object interaction Active CN112381072B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110030865.2A CN112381072B (en) 2021-01-11 2021-01-11 A human abnormal behavior detection method based on spatiotemporal information and human-object interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110030865.2A CN112381072B (en) 2021-01-11 2021-01-11 A human abnormal behavior detection method based on spatiotemporal information and human-object interaction

Publications (2)

Publication Number Publication Date
CN112381072A CN112381072A (en) 2021-02-19
CN112381072B true CN112381072B (en) 2021-05-25

Family

ID=74590054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110030865.2A Active CN112381072B (en) 2021-01-11 2021-01-11 A human abnormal behavior detection method based on spatiotemporal information and human-object interaction

Country Status (1)

Country Link
CN (1) CN112381072B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926837B (en) * 2022-05-26 2023-08-04 东南大学 Emotion recognition method based on human-object space-time interaction behavior

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104510475A (en) * 2014-12-15 2015-04-15 中国科学院计算技术研究所 Human body falling-down detection method and system
CN110321780A (en) * 2019-04-30 2019-10-11 苏州大学 Exception based on spatiotemporal motion characteristic falls down behavioral value method
CN110569773A (en) * 2019-08-30 2019-12-13 江南大学 A two-stream network action recognition method based on spatio-temporal saliency action attention
CN111310689A (en) * 2020-02-25 2020-06-19 陕西科技大学 Method for recognizing human body behaviors in potential information fusion home security system
CN111325073A (en) * 2018-12-17 2020-06-23 上海交通大学 Monitoring video abnormal behavior detection method based on motion information clustering
CN111738218A (en) * 2020-07-27 2020-10-02 成都睿沿科技有限公司 Human body abnormal behavior recognition system and method
CN111898514A (en) * 2020-07-24 2020-11-06 燕山大学 A multi-target visual supervision method based on target detection and action recognition
CN112149616A (en) * 2020-10-13 2020-12-29 西安电子科技大学 A method of character interaction behavior recognition based on dynamic information

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102236783A (en) * 2010-04-29 2011-11-09 索尼公司 Method and equipment for detecting abnormal actions and method and equipment for generating detector
CA3041148C (en) * 2017-01-06 2023-08-15 Sportlogiq Inc. Systems and methods for behaviour understanding from trajectories
US11450145B2 (en) * 2017-04-12 2022-09-20 Disney Enterprise, Inc. System and method for monitoring procedure compliance
US10572723B2 (en) * 2017-12-07 2020-02-25 Futurewei Technologies, Inc. Activity detection by joint human and object detection and tracking
CN110555404A (en) * 2019-08-29 2019-12-10 西北工业大学 Flying wing unmanned aerial vehicle ground station interaction device and method based on human body posture recognition
CN111339668B (en) * 2020-02-28 2022-05-10 西南交通大学 Crowd evacuation visualization method based on emotion cognition
CN111582122B (en) * 2020-04-29 2021-03-16 成都信息工程大学 System and method for intelligently analyzing behaviors of multi-dimensional pedestrians in surveillance video
CN111709306B (en) * 2020-05-22 2023-06-09 江南大学 Two-stream Network Behavior Recognition Method Based on Multi-level Spatial-Temporal Feature Fusion Enhancement
CN111797705A (en) * 2020-06-11 2020-10-20 同济大学 An Action Recognition Method Based on Character Relationship Modeling
CN111767888A (en) * 2020-07-08 2020-10-13 北京澎思科技有限公司 Object state detection method, computer device, storage medium, and electronic device
CN112052795B (en) * 2020-09-07 2022-10-18 北京理工大学 Video behavior identification method based on multi-scale space-time feature aggregation

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104510475A (en) * 2014-12-15 2015-04-15 中国科学院计算技术研究所 Human body falling-down detection method and system
CN111325073A (en) * 2018-12-17 2020-06-23 上海交通大学 Monitoring video abnormal behavior detection method based on motion information clustering
CN110321780A (en) * 2019-04-30 2019-10-11 苏州大学 Exception based on spatiotemporal motion characteristic falls down behavioral value method
CN110569773A (en) * 2019-08-30 2019-12-13 江南大学 A two-stream network action recognition method based on spatio-temporal saliency action attention
CN111310689A (en) * 2020-02-25 2020-06-19 陕西科技大学 Method for recognizing human body behaviors in potential information fusion home security system
CN111898514A (en) * 2020-07-24 2020-11-06 燕山大学 A multi-target visual supervision method based on target detection and action recognition
CN111738218A (en) * 2020-07-27 2020-10-02 成都睿沿科技有限公司 Human body abnormal behavior recognition system and method
CN112149616A (en) * 2020-10-13 2020-12-29 西安电子科技大学 A method of character interaction behavior recognition based on dynamic information

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection; Chen Gao et al.; arXiv:1808.10437v1; 2018-08-30; pp. 1-13, introduction, section 3, fig. 3 *
Human behavior recognition algorithm based on a spatiotemporal interactive attention model; Pan Na et al.; Laser & Optoelectronics Progress; Sept. 2020; vol. 57, no. 18; pp. 181506-1 to 181506-9 *
A survey of deep learning-based behavior detection methods; Gao Chenqiang et al.; Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition); Dec. 2020; vol. 32, no. 6; pp. 991-1002 *

Also Published As

Publication number Publication date
CN112381072A (en) 2021-02-19

Similar Documents

Publication Publication Date Title
Vishnu et al. Human fall detection in surveillance videos using fall motion vector modeling
CN107451553B (en) It is a kind of based on hypergraph transformation video in incident of violence detection method
WO2019232894A1 (en) Complex scene-based human body key point detection system and method
WO2020042419A1 (en) Gait-based identity recognition method and apparatus, and electronic device
CN111797771B (en) A method and system for weakly supervised video behavior detection based on iterative learning
CN112766159A (en) Cross-database micro-expression identification method based on multi-feature fusion
CN108416258B (en) A Multi-body Tracking Method Based on Human Body Part Model
CN111191667B (en) Crowd counting method based on multiscale generation countermeasure network
CN103093199B (en) Based on the Given Face tracking of ONLINE RECOGNITION
CN109299690B (en) A method that can improve the accuracy of video real-time face recognition
CN109829382B (en) Abnormal target early warning tracking system and method based on intelligent behavior characteristic analysis
CN110688980B (en) Human body posture classification method based on computer vision
CN107729876A (en) Fall detection method in old man room based on computer vision
CN112801000B (en) A fall detection method and system for the elderly at home based on multi-feature fusion
CN115761908A (en) Mobile terminal child visual attention abnormity screening method based on multi-mode data learning
CN103106394A (en) Human body action recognition method in video surveillance
CN105930770A (en) Human motion identification method based on Gaussian process latent variable model
CN110909672A (en) Smoking action recognition method based on double-current convolutional neural network and SVM
CN110477907B (en) Modeling method for intelligently assisting in recognizing epileptic seizures
CN111652035A (en) A method and system for pedestrian re-identification based on ST-SSCA-Net
CN115393830A (en) A fatigue driving detection method based on deep learning and facial features
CN114358194A (en) A method for detecting abnormal limb behavior in autism spectrum disorder based on posture tracking
CN109886102B (en) Fall-down behavior time-space domain detection method based on depth image
CN105976397A (en) Target tracking method based on half nonnegative optimization integration learning
Pervaiz et al. Artificial neural network for human object interaction system over aerial images

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant