CN111310689B - Method for recognizing human body behaviors in potential information fusion home security system - Google Patents

Method for recognizing human body behaviors in potential information fusion home security system

Info

Publication number
CN111310689B
CN111310689B (application CN202010116795.8A)
Authority
CN
China
Prior art keywords
human
human body
features
behavior
joint point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010116795.8A
Other languages
Chinese (zh)
Other versions
CN111310689A (en)
Inventor
李颀
姜莎莎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi University of Science and Technology
Original Assignee
Shaanxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi University of Science and Technology filed Critical Shaanxi University of Science and Technology
Priority to CN202010116795.8A priority Critical patent/CN111310689B/en
Publication of CN111310689A publication Critical patent/CN111310689A/en
Application granted granted Critical
Publication of CN111310689B publication Critical patent/CN111310689B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G06V40/166 - Detection; Localisation; Normalisation using acquisition arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

A method for human behavior recognition in a home security system based on latent information fusion. Taking the tracked time series of human motion as the research object, the correlations between posture features and behaviors, between interactive-object features and behaviors, and between behaviors themselves are treated as latent information. Constraints are introduced into the extraction of spatiotemporal posture features and of interactive-object features so that the influence of this latent information on human behavior recognition in the home security system is fully exploited, increasing inter-class differences while reducing intra-class differences and improving the accuracy and generalization of the recognition method. The mutual information of each joint point with respect to the behavior category is used as a constraint: all mutual-information values are sorted, the joint-point group with the largest mutual information that can represent a specific behavior is retained, and behavior recognition is performed by fusing the filtered joint-point group with the interactive-object features, improving the real-time performance and accuracy of recognition.

Description

Human behavior recognition method in a home security system based on latent information fusion

Technical Field

The present invention relates to the field of computer vision technology, and in particular to a method for human behavior recognition in a home security system based on latent information fusion.

Background Art

At present, many people have begun to install video surveillance systems at home to protect their property and personal safety. However, these systems are installed inside the home and cannot prevent incidents before they happen. Moreover, traditional digital surveillance relies mainly on human operators watching and analyzing the monitoring images, which is not only inefficient but also fails to meet increasingly high security requirements in terms of real-time performance and effectiveness.

Summary of the Invention

To overcome the above deficiencies of the prior art, the purpose of the present invention is to provide a method for human behavior recognition in a home security system based on latent information fusion, in which a computer automatically analyzes human behavior in the monitoring images in place of family members. When an abnormal phenomenon is detected, the family can be alerted immediately to the situation at the front door; this not only enables reliable 24×7 monitoring but also greatly improves real-time performance and effectiveness.

To achieve the above object, the present invention adopts the following technical solution:

A method for human behavior recognition in a home security system based on latent information fusion, comprising the following steps:

Step 1: capture images with a camera;

Step 2: detect human targets in the captured images using an illumination-adaptive method based on background subtraction, then track the detected human targets with the Staple method to obtain a time series of human motion;

Step 3: recognize the detected face and judge whether it belongs to a family member; if so, no further operation is performed on the motion time series obtained in Step 2; otherwise, human behavior recognition is performed;

Step 4: extract the spatiotemporal posture features of the human body from the motion time series obtained in Step 2;

Step 5: extract the features of interactive objects using a cue-enhanced deep convolutional neural network;

Step 6: fuse the global spatiotemporal posture features and the local interactive-object features extracted in Steps 4 and 5;

Step 7: input the fused feature vector into an SVM classifier for behavior recognition.

In Step 2, human bodies entering the detection range are detected with an illumination-adaptive method based on background subtraction, and the background model is built with the ViBe algorithm. The number of pixels of the human target detected in the previous frame is denoted Y, and the number of foreground pixels detected in the current frame is denoted L. At the instant of a sudden illumination change a large white area appears and the background is wrongly detected as foreground, so that L > Y. A threshold (the number of pixels of the human target detected in the previous frame) is therefore set during foreground detection to judge the extent of the foreground: if the foreground exceeds this threshold, a sudden illumination change has occurred; otherwise it has not. When a sudden illumination change occurs, the background model is compensated using the brightness change of the pixels between two adjacent frames. The compensation formula is:

Δ_t(x,y) = |V_t(x,y) - V_{t-1}(x,y)|

where:

V_t = (1/n) Σ_{(x,y)} [I_t(x,y)_max(R,G,B) + I_t(x,y)_min(R,G,B)] / 2

where V_t denotes the global average brightness of the image I_t(x,y), n is the total number of pixels in the image, n = 1280 × 480 = 614400 pixels, and I_t(x,y)_max(R,G,B) and I_t(x,y)_min(R,G,B) denote the maximum and minimum of the R, G, B components at pixel (x,y), respectively;

After a human target is detected, it is tracked with the Staple method: during tracking, a translation filter and a color filter are used to locate the target, a scale filter is then used to obtain the target size, and the time series of human motion is finally obtained.

In Step 4, spatiotemporal posture features are extracted from the obtained time series of human motion. The specific process is as follows:

1) Compute the mutual information of each joint point. The mutual information measures how strongly each joint point responds to a particular behavior, and the joint-point group with the largest mutual information that can represent the specific behavior is finally retained. The mutual information of each joint point is computed as:

I(f_j, Y) = H(f_j) - H(f_j | Y)

where H(f_j) denotes the information entropy of the j-th joint point, j = 1, 2, ..., 20, f^j = (f_1^j, f_2^j, ..., f_N^j) denotes the dynamic process of the j-th joint point over time, N denotes the number of frames of the human motion time series, and Y is the category of human behavior. In the home security scenario the categories to be recognized are mainly water delivery, express delivery, takeaway delivery, friend, cleaning staff and other people, so Y = 1, 2, 3, 4, 5, 6. The entropy is computed as:

H(f_j) = -Σ_{i=1}^{N} p(f_i^j) · log p(f_i^j)

where p(f_j) is the probability density function and i denotes the frame index of the time series, i = 1, 2, ..., N;

2) Extract spatiotemporal posture features from the joint points filtered as above, where the features in the spatial dimension are:

F_spatial = {T, θ, D, ψ, A}

where K denotes the joint points of the human posture, K = 1, 2, ..., 20, N denotes the number of frames of the human motion time series, the hip joint is selected as the body centroid, T denotes the joint-coordinate trajectory feature matrix, θ denotes the direction matrix of each filtered joint point relative to the body centroid, D denotes the spatial distance matrix of any two joint points, ψ denotes the direction matrix of the vector formed by any two joints relative to the upward vector of the centroid, and A denotes the matrix of the three interior angles formed by any three joint points;

The features in the temporal dimension are:

F_temporal = {ΔT, Δθ, ΔD, Δψ, ΔA}

where ΔT is the trajectory displacement matrix of the joint points, Δθ is the change in direction of the same joint point as it moves, ΔD is the matrix of the distance between any two joint points over time, Δψ is the change in direction of the vector formed by any two joint points relative to the upward centroid vector, and ΔA is the matrix of changes in the interior angles formed by any three joint points.

The extracted spatiotemporal posture features are expressed as:

F_pose = F_spatial + F_temporal.

In Step 5, the detected human body is used as a cue and the valid objects interacting with the person are used as high-level cues. A convolutional neural network is used to extract the features of the objects interacting with the person, and the positional relationship between those objects and the person is implicitly integrated into the convolutional neural network, so as to extract the features of the valid interactive objects;

A loss function is used during training, and the parameters are adjusted during loss backpropagation. The hybrid loss function is computed as:

L(M, D) = L_main(M, D) + α·L_hint(M, D)

where L_main(M, D) denotes the loss function of interactive-object feature extraction, L_hint(M, D) denotes the loss function of the distance-hint task, M denotes the network model, D = {(x_i, y_i)}_{i=1}^{N} is the training set of N sample images, x = {x_i}_{i=1}^{N} denotes the N images, y = {y_i}_{i=1}^{N} denotes the associated category labels, and α takes a value between 0 and 1.

In Step 6, since the spatiotemporal posture features and the interactive-object features respond to different human behaviors to different degrees, the two kinds of features are fused with weights, as follows:

F = w_1·F_pose + w_2·F_object

where w_1 is the weighting coefficient of the spatiotemporal posture features, w_2 is the weighting coefficient of the interactive-object features, w_1 + w_2 = 1, F_pose denotes the spatiotemporal posture features, and F_object denotes the interactive-object features.

In Step 7, the fused feature vector is input into the SVM classifier for classification to obtain the final recognition result.

Beneficial Effects of the Present Invention:

The present invention uses machine vision technology to detect human behavior in a home security environment. Face recognition is performed before behavior detection to judge whether the person is a family member; if so, behavior detection is skipped, otherwise it is performed. Interactive-object detection is fused into behavior detection, which improves recognition accuracy. Because behavior is detected before the doorbell is pressed, problems can be prevented before they occur, and real-time performance and effectiveness are improved. Combining interactive objects with the intrinsic features of human motion for behavior detection has important research value for recognizing human behavior through interactive objects in different scenarios.

Brief Description of the Drawings

FIG. 1 is a flow chart of the behavior recognition method provided by an embodiment of the present invention.

FIG. 2 is a flow chart of human target detection provided by an embodiment of the present invention.

FIG. 3 is a flow chart of spatiotemporal posture feature extraction provided by an embodiment of the present invention.

Detailed Description

The present invention is described in further detail below with reference to the accompanying drawings.

Existing home security systems suffer from intrusion by outsiders, and the real-time performance and effectiveness of traditional digital surveillance cannot meet security needs. The present invention applies machine vision technology to the security scenario: no operator is needed to analyze the monitoring images, face recognition is applied to family members, and human behavior recognition is applied to everyone else. Human behavior is detected before the person knocks on the door and the family is notified in time, preventing problems before they happen and improving real-time performance and accuracy.

The application principle of the present invention is further described below with reference to the accompanying drawings:

FIG. 1 is a schematic diagram of the overall flow of the method. The method for human behavior recognition in a home security system based on latent information fusion according to the present invention proceeds in the following steps:

Step 1: capture images within the detection range with a camera installed at the front door.

Step 2, as shown in FIG. 2: real-time human target detection is first performed on the captured images. When background subtraction is used for human target detection, building and updating the background model is the key step. Because the home security system requires high real-time performance and is subject to sudden changes in illumination, the ViBe background modeling method is used to build the model; this algorithm is very fast, computationally light, and somewhat robust to noise. The number of pixels of the human target detected in the previous frame is denoted Y, and the number of pixels detected as foreground in the current frame is denoted L. At the instant of a sudden illumination change a large white area appears and the background is wrongly detected as foreground, so that L > Y. A threshold (the number of pixels of the human target detected in the previous frame) is therefore set during foreground detection to judge the extent of the foreground: if the foreground exceeds this threshold, a sudden illumination change has occurred; otherwise it has not. When a sudden illumination change occurs, the background model is compensated using the brightness change of the pixels between two adjacent frames. The compensation formula is:

Δ_t(x,y) = |V_t(x,y) - V_{t-1}(x,y)|

where:

V_t = (1/n) Σ_{(x,y)} [I_t(x,y)_max(R,G,B) + I_t(x,y)_min(R,G,B)] / 2

where V_t denotes the global average brightness of the image I_t(x,y), n is the total number of pixels in the image, n = 1280 × 480 = 614400, and I_t(x,y)_max(R,G,B) and I_t(x,y)_min(R,G,B) denote the maximum and minimum of the R, G, B components at pixel (x,y), respectively.
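
A minimal Python sketch of this illumination-adaptive foreground check is given below. It assumes per-frame foreground masks from a ViBe-style background model are already available; the function names and the way the compensation offset is applied to the stored background samples are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def global_average_brightness(frame_rgb):
    # Per-pixel lightness from the max and min of the R, G, B channels,
    # averaged over all n pixels of the frame (V_t above).
    cmax = frame_rgb.max(axis=2).astype(np.float32)
    cmin = frame_rgb.min(axis=2).astype(np.float32)
    return float(((cmax + cmin) / 2.0).mean())

def sudden_illumination_change(prev_target_pixels, curr_foreground_pixels):
    # Y: pixel count of the human target in the previous frame (the threshold).
    # L: foreground pixel count in the current frame; L > Y signals a sudden change.
    return curr_foreground_pixels > prev_target_pixels

def compensate_background(background_model, prev_frame, curr_frame):
    # delta_t = |V_t - V_{t-1}| between two adjacent frames.
    delta = abs(global_average_brightness(curr_frame)
                - global_average_brightness(prev_frame))
    # Assumption: the offset is added to the background samples and clipped to [0, 255].
    return np.clip(background_model.astype(np.float32) + delta, 0, 255).astype(np.uint8)
```

In practice Y and L would be taken from the connected components of the ViBe foreground mask described above.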

In practice, boundary regions or background objects with strong reflectance cannot be completely cancelled by background subtraction; they appear as point-like, small blob-like and line-like noise, which must be distinguished from real moving targets during detection. Morphological filtering is therefore applied to the binary image. For multiple targets, the binary image after morphological filtering generally contains several regions, and since a multi-target region usually consists of several sub-regions that are not connected to each other, it is necessary to detect the connectivity of each region, distinguish the regions by labelling them, frame each target in the original image according to these labels, and thereby compute the position of each target in every frame.

Further, the Staple method is used to build a correlation-filter model and a color-histogram template from the first frame. For each new frame, the translation filter and the color filter are first used to locate the target; the scale filter then extracts candidate boxes of different scales centred on that position, the scale with the largest response is taken as the final target scale, giving the position and size of the target, and the correlation-filter and color models are then updated.

The score of the translation filter and the score of the color histogram are combined by a weighted sum:

f(x) = γ_tmpl·f_tmpl(x) + γ_hist·f_hist(x)

where x = T(x_t, p; θ_{t-1}), T is the feature extraction function, x_t denotes the t-th frame, p denotes a rectangular box in a frame, θ denotes the model parameters, and θ_{t-1} are the target model parameters built from the first t-1 frames. To combine gradient features with color features while remaining real-time, the scoring function is formed linearly, where γ_tmpl is the score coefficient of the filter template, γ_hist is the histogram score coefficient, and γ_tmpl + γ_hist = 1.
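
The linear score fusion can be sketched in a few lines of Python. The two response maps are assumed to be produced elsewhere by the correlation filter and the color-histogram model; only the weighted combination described above is shown, and the default weight of 0.7 is an illustrative value.

```python
import numpy as np

def staple_response(template_response, histogram_response, gamma_tmpl=0.7):
    # f(x) = gamma_tmpl * f_tmpl(x) + gamma_hist * f_hist(x), coefficients summing to 1
    gamma_hist = 1.0 - gamma_tmpl
    fused = gamma_tmpl * template_response + gamma_hist * histogram_response
    # The new target position is the location of the maximum fused response.
    position = np.unravel_index(np.argmax(fused), fused.shape)
    return position, fused
```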

Step 3: face recognition is performed on the detected person to judge whether he or she is a family member. If so, the recognition result is displayed on the human-machine interface; otherwise, behavior recognition is performed.

Step 4, as shown in FIG. 3: taking the time series of human motion as the research object, the spatiotemporal posture features are extracted. The specific process is as follows:

1) In the home security scenario the person walks toward the camera, so the imaged human body changes in apparent size and the joint-point coordinates may differ considerably. To eliminate this difference, the joint-point coordinates are first normalized.

Suppose the original coordinates of a joint point are (x_0, y_0) and the normalized coordinates are (x, y). The normalization formula is:

x = (2x_0 - w)/d,  y = (2y_0 - h)/d

where d = max{w, h}, w and h are the width and height of the image, and after normalization x, y ∈ (-1, 1).
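
A short Python sketch of this normalization is given below; centring on the image centre is an assumption consistent with d = max{w, h} and the stated output range (-1, 1), not a formula quoted from the original document.

```python
def normalize_joint(x0, y0, w, h):
    # Map pixel coordinates (x0, y0) of a joint into (-1, 1) using the
    # longer image side d = max(w, h) as the scale.
    d = max(w, h)
    x = (2.0 * x0 - w) / d   # assumed centring on the image centre
    y = (2.0 * y0 - h) / d
    return x, y
```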

2) After the normalized posture coordinates are obtained, the temporal and spatial features of the posture sequence are extracted. The spatial features describe the positions of the joint points and their relative positions within the same frame, while the temporal features describe the changes in joint positions caused by changes of posture.

Since each joint point of the human body responds to a particular behavior to a different degree, taking all joint points into account without distinction would let the weakly responding ones introduce noise and degrade recognition. Some noisy points must therefore be selectively discarded: the mutual information of each joint point with respect to the behavior category is computed, and the joint-point group with the largest mutual information that can represent the specific behavior is retained.

Suppose the behavior time series has N frames in total. The dynamic process of the j-th joint point (j = 1, 2, ..., 20) over time can be expressed as:

f^j = (f_1^j, f_2^j, ..., f_N^j)

The mutual information of each joint point with respect to the human behavior category is:

I(f_j, Y) = H(f_j) - H(f_j | Y)

where H(f_j) denotes the information entropy of the j-th joint point and Y is the category of human behavior. In the home security scenario the categories to be recognized are mainly water delivery, express delivery, takeaway delivery, friend, cleaning staff and other people, so Y = 1, 2, 3, 4, 5, 6. The above expression measures how strongly each joint point responds to a particular behavior category. The entropy is computed as:

H(f_j) = -Σ_{i=1}^{N} p(f_i^j) · log p(f_i^j)

H(f_j | Y) = -Σ_Y Σ_{i=1}^{N} p(Y, f_i^j) · log p(f_i^j | Y)

where p(f_j) is the probability density function, p(Y, f_i^j) is the joint probability density function, p(f_i^j | Y) is the conditional probability density function, and i denotes the frame index of the time series, i = 1, 2, ..., N.

After the mutual information of each joint point with respect to the behavior categories is computed, the values are sorted in descending order and the joint-point group with the largest mutual information that can represent a specific behavior is selected.
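
The following Python sketch illustrates this joint-selection step. It estimates the entropies from histograms of quantized joint trajectories; the discretization, the 1-D trajectory summary, and the number of retained joints are illustrative assumptions rather than values taken from the patent.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mutual_information(traj, labels, bins=16):
    # traj: (N,) 1-D summary of one joint over N frames; labels: (N,) behavior category Y in {1..6}
    digitized = np.digitize(traj, np.histogram_bin_edges(traj, bins=bins))
    p_f = np.bincount(digitized) / len(digitized)
    h_f = entropy(p_f)                              # H(f_j)
    h_f_given_y = 0.0                               # H(f_j | Y) = sum_Y p(Y) H(f_j | Y=y)
    for y in np.unique(labels):
        mask = labels == y
        p_f_y = np.bincount(digitized[mask], minlength=p_f.size) / mask.sum()
        h_f_given_y += mask.mean() * entropy(p_f_y)
    return h_f - h_f_given_y                        # I(f_j, Y) = H(f_j) - H(f_j | Y)

def select_joints(joint_trajs, labels, keep=10):
    # joint_trajs: (num_joints, N); keep the joints with the largest mutual information
    mi = np.array([mutual_information(t, labels) for t in joint_trajs])
    return np.argsort(mi)[::-1][:keep]
```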

The rule for selecting the joint-point group with the largest mutual information is as follows. For human behavior recognition in the home security scenario, a normal person is characterized mainly by the information of the arm, hand and leg joint points, whereas a person with intrusive behavior is characterized by the information of all joint points. The constraint applied when selecting from the sorted mutual-information matrix is therefore:

The matrix composed of the mutual information of the joint points obtained for each behavior is:

I_N = [I(f_1, Y_N), I(f_2, Y_N), ..., I(f_K, Y_N)]

where N denotes the N-th class of behavior (N = 1, 2, 3, 4, 5, 6) and K denotes the K-th joint point (K = 1, 2, ..., 20). The mutual information obtained for each joint point is sorted; since the arms, hands and legs are the joint points of main interest, the joint-point group composed of these three parts, able to represent a specific behavior, is selected from the sorted mutual-information group.

The posture matrix filtered by the response of the joint points to the behavior is:

R = (r_ij)_{N×K}

where r_ij = (x_ij, y_ij), i ∈ {1, 2, ..., N}, j ∈ {1, 2, ..., K}, N = 6, K ranges from 4 to 14, and K here is the largest index of the retained joint-point group.

After the filtered posture matrix is obtained, features in the spatial dimension and features in the temporal dimension are extracted. The features in the spatial dimension are:

F_spatial = {T, θ, D, ψ, A}

where the hip joint point (x_i0, y_i0) is selected as the body centroid, T = (t_ij)_{N×K} denotes the joint-coordinate trajectory feature matrix with t_ij = (x_ij - x_i0, y_ij - y_i0), θ denotes the direction matrix of each filtered joint point relative to the body centroid, D denotes the spatial distance matrix of any two joint points, ψ denotes the direction matrix of the vector formed by any two joints relative to the upward vector of the centroid, and A denotes the matrix of the three interior angles formed by any three joint points (the element-wise expressions for θ, D, ψ and A appear as equation images in the original).

The features in the temporal dimension are:

F_temporal = {ΔT, Δθ, ΔD, Δψ, ΔA}

where ΔT = (x_{i+s,j} - x_ij, y_{i+s,j} - y_ij)_{(N-s)×2K} is the trajectory displacement matrix of the joint points, Δθ is the change in direction of the same joint point as it moves, ΔD is the matrix of the distance between any two joint points over time, Δψ is the change in direction of the vector formed by any two joint points relative to the upward centroid vector, and ΔA is the matrix of changes in the interior angles formed by any three joint points (the element-wise expressions for Δθ, ΔD, Δψ and ΔA appear as equation images in the original).

The spatiotemporal posture features obtained from the temporal and spatial features are expressed as:

F_pose = F_spatial + F_temporal
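
The sketch below illustrates, in Python, how a few of these descriptors (the centroid-relative trajectory T, the pairwise distances D, and their frame differences ΔT and ΔD) can be assembled from the filtered joint coordinates; the direction and angle matrices follow the same pattern. The array shapes and the frame step s = 1 are illustrative assumptions.

```python
import numpy as np

def spatial_temporal_features(joints, hip_index=0, s=1):
    # joints: (N, K, 2) normalized (x, y) coordinates of the K retained joints over N frames
    hip = joints[:, hip_index:hip_index + 1, :]          # body centroid per frame
    T = joints - hip                                     # trajectory relative to the centroid
    diff = joints[:, :, None, :] - joints[:, None, :, :]
    D = np.linalg.norm(diff, axis=-1)                    # (N, K, K) pairwise joint distances
    dT = joints[s:] - joints[:-s]                        # joint displacement over s frames
    dD = D[s:] - D[:-s]                                  # change of pairwise distances
    F_spatial = np.concatenate([T.reshape(len(joints), -1),
                                D.reshape(len(joints), -1)], axis=1)
    F_temporal = np.concatenate([dT.reshape(len(dT), -1),
                                 dD.reshape(len(dD), -1)], axis=1)
    return F_spatial, F_temporal
```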

Further, the detected human body is used as a cue and the valid objects interacting with the person are used as high-level cues. A convolutional neural network extracts the features of the objects interacting with the person, and the positional relationship between those objects and the person is implicitly integrated into the network, so as to extract the features of the valid interactive objects.

Two tasks are performed jointly in the present invention: the main task of interactive-object recognition and the auxiliary task of distance-hint enhancement. The auxiliary task regularizes the network and enhances its expressive power; its influence on the main task is exerted through all convolutional layers preceding the shared fully connected layers. To learn the weights of these layers jointly, a hybrid loss function combining the losses of the two tasks is used. Specifically, let M denote the network model, D = {(x_i, y_i)}_{i=1}^{N} the training set of N sample images, x = {x_i}_{i=1}^{N} the N images, and y = {y_i}_{i=1}^{N} the associated category labels, with α taking a value between 0 and 1. The hybrid loss is:

L(M, D) = L_main(M, D) + α·L_hint(M, D)

(The explicit expressions for L_main(M, D) and L_hint(M, D) are given as equation images in the original.)

M_main(·) and M_hint(·) denote the outputs of the main task and of the hint task, respectively. The model parameters are trained and fine-tuned by stochastic gradient descent, which is used to optimize L(M, D); after the gradient is computed, the weights ω are updated with the standard stochastic-gradient-descent rule (given as an equation image in the original).
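
A minimal PyTorch-style sketch of training with this hybrid loss is shown below. The choice of cross-entropy for the main task, an L1 regression loss for the distance-hint task, and the optimizer settings are assumptions for illustration; the patent only specifies the weighted sum L = L_main + α·L_hint and optimization by stochastic gradient descent.

```python
import torch.nn as nn

def train_step(model, images, labels, distance_targets, optimizer, alpha=0.5):
    # model is assumed to return (main_logits, hint_prediction) from shared conv layers
    main_logits, hint_pred = model(images)
    loss_main = nn.functional.cross_entropy(main_logits, labels)    # interactive-object recognition
    loss_hint = nn.functional.l1_loss(hint_pred, distance_targets)  # distance-hint auxiliary task
    loss = loss_main + alpha * loss_hint                            # L = L_main + alpha * L_hint
    optimizer.zero_grad()
    loss.backward()        # gradients flow into the shared convolutional layers
    optimizer.step()       # SGD weight update
    return loss.item()
```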

Further, since the posture features and the interactive-object features respond to different human behaviors to different degrees, the two kinds of features are fused with weights, as follows:

F = w_1·F_pose + w_2·F_object

where w_1 is the weighting coefficient of the spatiotemporal posture features, w_2 is the weighting coefficient of the interactive-object features, and w_1 + w_2 = 1. F_pose is the extracted spatiotemporal posture feature and F_object is the extracted interactive-object feature.

Further, the fused features are classified. The system mainly recognizes behaviors such as delivering water, delivering express parcels, delivering takeaway food, friends, cleaning staff and other people, so a multi-class support vector machine is needed. It is built by designing a binary classifier between every pair of classes and combining the binary classifiers into a multi-class classifier; each binary classifier uses the method described above. With 6 classes, each classifier takes one class as positive samples and another class as negative samples, and so on, giving 15 classifiers in total. During classification, each of the 15 classifiers votes for one of its two classes, and the class with the most votes is taken as the recognition result.
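
The weighted fusion and the one-vs-one voting can be sketched with scikit-learn, whose SVC implements exactly this pairwise scheme (6 classes → 15 binary classifiers). The weights w_1 = 0.6, w_2 = 0.4 and the RBF kernel are illustrative assumptions, and the fused feature vectors are assumed to have equal dimensions.

```python
import numpy as np
from sklearn.svm import SVC

def fuse_features(f_pose, f_object, w1=0.6):
    # F = w1 * F_pose + w2 * F_object with w1 + w2 = 1
    return w1 * f_pose + (1.0 - w1) * f_object

# Example usage (X_pose, X_object: (num_samples, dim) features; y: labels in {1..6}):
# clf = SVC(kernel="rbf", decision_function_shape="ovo")  # 15 pairwise classifiers for 6 classes
# clf.fit(fuse_features(X_pose, X_object), y)
# prediction = clf.predict(fuse_features(x_pose_new, x_object_new))
```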

Step 5: since image processing is carried out on a cloud server, the recognition results cannot be seen directly by every user, so a human-machine interaction module receives and displays them. When the family is away and someone is active at the front door, the recognition result is sent to the family as a text message.

Claims (2)

1. A method for human behavior recognition in a home security system based on latent information fusion, characterized by comprising the following steps:

Step 1: capture images with a camera;

Step 2: detect human targets in the captured images using an illumination-adaptive method based on background subtraction, then track the detected human targets with the Staple method to obtain a time series of human motion;

Step 3: recognize the detected face and judge whether it belongs to a family member; if so, no further operation is performed on the motion time series obtained in Step 2; otherwise, human behavior recognition is performed;

Step 4: extract the spatiotemporal posture features of the human body from the motion time series obtained in Step 2;

Step 5: extract the features of interactive objects using a cue-enhanced deep convolutional neural network;

Step 6: fuse the global spatiotemporal posture features and the local interactive-object features extracted in Steps 4 and 5;

Step 7: input the fused feature vector into an SVM classifier for behavior recognition;

wherein in Step 2, human bodies entering the detection range are detected with an illumination-adaptive method based on background subtraction and the background model is built with the ViBe algorithm; the number of pixels of the human target detected in the previous frame is denoted Y and the number of foreground pixels detected in the current frame is denoted L; at the instant of a sudden illumination change a large white area appears and the background is wrongly detected as foreground, so that L > Y; a threshold (the number of pixels of the human target detected in the previous frame) is therefore set during foreground detection to judge the extent of the foreground: if the foreground exceeds this threshold a sudden illumination change has occurred, otherwise it has not; when a sudden illumination change occurs, the background model is compensated using the brightness change of the pixels between two adjacent frames, with the compensation formula:

Δ_t(x,y) = |V_t(x,y) - V_{t-1}(x,y)|

where:

V_t = (1/n) Σ_{(x,y)} [I_t(x,y)_max(R,G,B) + I_t(x,y)_min(R,G,B)] / 2

V_t denotes the global average brightness of the image I_t(x,y), n is the total number of pixels in the image, n = 1280 × 480 = 614400 pixels, and I_t(x,y)_max(R,G,B) and I_t(x,y)_min(R,G,B) denote the maximum and minimum of the R, G, B components at pixel (x,y), respectively;

after a human target is detected, it is tracked with the Staple method: during tracking, a translation filter and a color filter are used to locate the target, a scale filter is then used to obtain the target size, and the time series of human motion is finally obtained;

wherein in Step 4, spatiotemporal posture features are extracted from the obtained time series of human motion, the specific process comprising:

1) computing the mutual information of each joint point, using the mutual information to judge how strongly each joint point responds to a particular behavior, and finally retaining the joint-point group with the largest mutual information that can represent the specific behavior, the mutual information of each joint point being computed as:

I(f_j, Y) = H(f_j) - H(f_j | Y)

where H(f_j) denotes the information entropy of the j-th joint point, j = 1, 2, ..., 20, f^j = (f_1^j, f_2^j, ..., f_N^j) denotes the dynamic process of the j-th joint point over time, N denotes the number of frames of the human motion time series, and Y is the category of human behavior; in the home security scenario the categories to be recognized are mainly water delivery, express delivery, takeaway delivery, friend, cleaning staff and other people, so Y = 1, 2, 3, 4, 5, 6; the entropy is computed as:

H(f_j) = -Σ_{i=1}^{N} p(f_i^j) · log p(f_i^j)

where p(f_j) is the probability density function and i denotes the frame index of the time series, i = 1, 2, ..., N;

2) extracting spatiotemporal posture features from the joint points filtered as above, where the features in the spatial dimension are:

F_spatial = {T, θ, D, ψ, A}

where K denotes the joint points of the human posture, K = 1, 2, ..., 20, N denotes the number of frames of the human motion time series, the hip joint is selected as the body centroid, T denotes the joint-coordinate trajectory feature matrix, θ denotes the direction matrix of each filtered joint point relative to the body centroid, D denotes the spatial distance matrix of any two joint points, ψ denotes the direction matrix of the vector formed by any two joints relative to the upward vector of the centroid, and A denotes the matrix of the three interior angles formed by any three joint points;

the features in the temporal dimension are:

F_temporal = {ΔT, Δθ, ΔD, Δψ, ΔA}

where ΔT is the trajectory displacement matrix of the joint points, Δθ is the change in direction of the same joint point as it moves, ΔD is the matrix of the distance between any two joint points over time, Δψ is the change in direction of the vector formed by any two joint points relative to the upward centroid vector, and ΔA is the matrix of changes in the interior angles formed by any three joint points;

the extracted spatiotemporal posture features are expressed as:

F_pose = F_spatial + F_temporal;

wherein in Step 5, the detected human body is used as a cue and the valid objects interacting with the person are used as high-level cues; a convolutional neural network is used to extract the features of the objects interacting with the person, and the positional relationship between those objects and the person is implicitly integrated into the convolutional neural network so as to extract the features of the valid interactive objects;

a loss function is used during training and the parameters are adjusted during loss backpropagation, the hybrid loss function being computed as:

L(M, D) = L_main(M, D) + α·L_hint(M, D)

where L_main(M, D) denotes the loss function of interactive-object feature extraction, L_hint(M, D) denotes the loss function of the distance-hint task, M denotes the network model, D = {(x_i, y_i)}_{i=1}^{N} is the training set of N sample images, x = {x_i}_{i=1}^{N} denotes the N images, y = {y_i}_{i=1}^{N} denotes the associated category labels, and α takes a value between 0 and 1;

wherein in Step 6, since the spatiotemporal posture features and the interactive-object features respond to different human behaviors to different degrees, the two kinds of features are fused with weights as follows:

F = w_1·F_pose + w_2·F_object

where w_1 is the weighting coefficient of the spatiotemporal posture features, w_2 is the weighting coefficient of the interactive-object features, w_1 + w_2 = 1, F_pose denotes the spatiotemporal posture features, and F_object denotes the interactive-object features.
2. The method for human behavior recognition in a home security system based on latent information fusion according to claim 1, characterized in that in Step 7, the fused feature vector is input into the SVM classifier for classification to obtain the final recognition result.
CN202010116795.8A 2020-02-25 2020-02-25 Method for recognizing human body behaviors in potential information fusion home security system Active CN111310689B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010116795.8A CN111310689B (en) 2020-02-25 2020-02-25 Method for recognizing human body behaviors in potential information fusion home security system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010116795.8A CN111310689B (en) 2020-02-25 2020-02-25 Method for recognizing human body behaviors in potential information fusion home security system

Publications (2)

Publication Number Publication Date
CN111310689A CN111310689A (en) 2020-06-19
CN111310689B true CN111310689B (en) 2023-04-07

Family

ID=71149293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010116795.8A Active CN111310689B (en) 2020-02-25 2020-02-25 Method for recognizing human body behaviors in potential information fusion home security system

Country Status (1)

Country Link
CN (1) CN111310689B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381072B (en) * 2021-01-11 2021-05-25 西南交通大学 A human abnormal behavior detection method based on spatiotemporal information and human-object interaction
CN113487596A (en) * 2021-07-26 2021-10-08 盛景智能科技(嘉兴)有限公司 Working strength determination method and device and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018095082A1 (en) * 2016-11-28 2018-05-31 江苏东大金智信息系统有限公司 Rapid detection method for moving target in video monitoring
WO2018130016A1 (en) * 2017-01-10 2018-07-19 哈尔滨工业大学深圳研究生院 Parking detection method and device based on monitoring video
CN110096950A (en) * 2019-03-20 2019-08-06 西北大学 A kind of multiple features fusion Activity recognition method based on key frame
CN110378281A (en) * 2019-07-17 2019-10-25 青岛科技大学 Group Activity recognition method based on pseudo- 3D convolutional neural networks
CN110555387A (en) * 2019-08-02 2019-12-10 华侨大学 Behavior identification method based on local joint point track space-time volume in skeleton sequence
CN110826453A (en) * 2019-10-30 2020-02-21 西安工程大学 Behavior identification method by extracting coordinates of human body joint points

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016168869A1 (en) * 2015-04-16 2016-10-20 California Institute Of Technology Systems and methods for behavior detection using 3d tracking and machine learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018095082A1 (en) * 2016-11-28 2018-05-31 江苏东大金智信息系统有限公司 Rapid detection method for moving target in video monitoring
WO2018130016A1 (en) * 2017-01-10 2018-07-19 哈尔滨工业大学深圳研究生院 Parking detection method and device based on monitoring video
CN110096950A (en) * 2019-03-20 2019-08-06 西北大学 A kind of multiple features fusion Activity recognition method based on key frame
CN110378281A (en) * 2019-07-17 2019-10-25 青岛科技大学 Group Activity recognition method based on pseudo- 3D convolutional neural networks
CN110555387A (en) * 2019-08-02 2019-12-10 华侨大学 Behavior identification method based on local joint point track space-time volume in skeleton sequence
CN110826453A (en) * 2019-10-30 2020-02-21 西安工程大学 Behavior identification method by extracting coordinates of human body joint points

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Action behavior recognition based on DTW constraints; Li Haitao; Computer Simulation (No. 11); full text *
Human behavior recognition method based on posture spatiotemporal features; Zheng Xiao et al.; Journal of Computer-Aided Design & Computer Graphics (No. 09); full text *

Also Published As

Publication number Publication date
CN111310689A (en) 2020-06-19

Similar Documents

Publication Publication Date Title
Shahzad et al. A smart surveillance system for pedestrian tracking and counting using template matching
CN110244322B (en) Multi-source sensor-based environmental perception system and method for pavement construction robot
CN107622258B (en) A Fast Pedestrian Detection Method Combining Static Underlying Features and Motion Information
Hsiao et al. Occlusion reasoning for object detectionunder arbitrary viewpoint
CN101226597B (en) A nighttime pedestrian recognition method and system based on thermal infrared gait
Asif et al. Privacy preserving human fall detection using video data
CN104778453B (en) A kind of night pedestrian detection method based on infrared pedestrian's brightness statistics feature
CN101587485B (en) Face information automatic login method based on face recognition technology
CN109145742A (en) A kind of pedestrian recognition method and system
CN111611905A (en) A target recognition method based on visible light and infrared fusion
KR101653278B1 (en) Face tracking system using colar-based face detection method
CN111144207B (en) Human body detection and tracking method based on multi-mode information perception
CN111881749A (en) Bidirectional pedestrian flow statistical method based on RGB-D multi-modal data
CN112989889B (en) Gait recognition method based on gesture guidance
CN105160297A (en) Masked man event automatic detection method based on skin color characteristics
CN110688980B (en) Human body posture classification method based on computer vision
Zaidi et al. Video anomaly detection and classification for human activity recognition
CN111310689B (en) Method for recognizing human body behaviors in potential information fusion home security system
Nosheen et al. Efficient Vehicle Detection and Tracking using Blob Detection and Kernelized Filter
Abd et al. Human fall down recognition using coordinates key points skeleton
CN107085729A (en) A Correction Method of Person Detection Result Based on Bayesian Inference
Chen et al. Multiview social behavior analysis in work environments
CN111160115B (en) A Video Pedestrian Re-Identification Method Based on Siamese Two-Stream 3D Convolutional Neural Network
Martinez-Gonzalez et al. Real time face detection using neural networks
CN113658223B (en) A method and system for multi-pedestrian detection and tracking based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant