CN113989495A - Vision-based pedestrian calling behavior identification method - Google Patents
- Publication number: CN113989495A (application CN202111362421.5A)
- Authority
- CN
- China
- Prior art keywords
- network
- pedestrian
- graph
- random forest
- key point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F18/24323 — Tree-organised classifiers (G: Physics; G06: Computing, calculating or counting; G06F: Electric digital data processing; G06F18/00: Pattern recognition; G06F18/24: Classification techniques)
- G06N3/045 — Combinations of networks (G06N: Computing arrangements based on specific computational models; G06N3/02: Neural networks; G06N3/04: Architecture)
- G06N3/08 — Learning methods (G06N3/02: Neural networks)
Abstract
Description
Technical Field
The invention belongs to the field of vehicle intelligence, and in particular relates to a method by which an autonomous taxi recognizes the behavioral intentions of pedestrians.
Background Art
Recognizing pedestrian behavior from a vehicle in traffic scenes falls within the scope of vehicle intelligence. Accurate and effective recognition of pedestrians' taxi-hailing intention helps an autonomous taxi quickly locate pedestrians who want a ride, which is of great significance for improving pedestrians' travel efficiency, raising the utilization of autonomous taxis, and avoiding traffic congestion.
Pedestrian hailing-behavior recognition uses computer vision to analyze pedestrians in traffic scenes and find those with the intention to hail a taxi. Traffic scenes are highly complex: the number and variety of traffic participants (pedestrians, vehicles, cyclists, etc.) far exceed those of other application scenarios, which increases the difficulty of behavior recognition. Compared with other pedestrian behaviors (walking, running, cycling, etc.), hailing is markedly random and transient. First, any pedestrian in the scene may at any moment become a person with hailing intention. Second, hailing is clearly instantaneous: a driver can judge whether a person intends to hail from a single image, without considering the frames immediately before and after it. For these two reasons, traditional behavior-recognition algorithms based on 3D CNNs (3D Convolutional Neural Networks) and LSTMs (Long Short-Term Memory networks) are not suitable for inferring the transient hailing intention. Pedestrian gestures are key information for expressing intent, but most current gesture-recognition algorithms target indoor scenes, and vision-based gesture recognition demands a high-resolution hand contour in the image, which the onboard cameras of intelligent vehicles cannot deliver in complex traffic scenes.
Summary of the Invention
To solve the above problems of the prior art, the present invention designs a vision-based pedestrian hailing-behavior recognition method with strong environmental adaptability and high recognition accuracy, which processes images captured by an onboard camera and accurately identifies, in real time, pedestrians in the image who intend to hail a taxi, thereby helping autonomous taxis find passengers more efficiently.
To achieve the above object, the technical solution of the present invention is as follows. A vision-based pedestrian hailing-behavior recognition method comprises the following steps:
A. Image Preprocessing
An object-detection algorithm and a human-keypoint-extraction algorithm preprocess the image, yielding pedestrian detection boxes D and, for each box, the corresponding pedestrian's keypoint parameters K. In hailing-intention reasoning, a person's facial attention is a key clue to whether they intend to hail: in real scenes, a pedestrian hailing a taxi pays a high degree of attention to it. Facial attention is inferred in two ways. First, the facial keypoints from human-keypoint detection are used: taking the difference h_p between the abscissas of the left-ear and right-ear keypoints as a baseline and σ as a magnification factor, a square box S with side length σh_p is formed as the face region. When the horizontal distance h_f between the left-ear keypoint and the nose keypoint exceeds h_p, the pedestrian's face is turned sideways relative to the taxi, i.e. the pedestrian pays little attention to the vehicle. When h_f is less than h_p, the face region S is fed into a facial-attention deep network to compute the pedestrian's facial-attention probability. The facial-attention deep network comprises a front network and a rear network: the front network is a feature-extraction network that uses ResNet50 as the backbone to extract facial features; the rear network is a feature-connection network of fully connected layers that joins the features extracted by the front network into a global feature and outputs the facial-attention probability ρ_f.
B. Intention Reasoning
Pedestrian intention reasoning combines a random-forest algorithm with a graph convolutional network, in the following steps:
B1. The random-forest algorithm infers the relationship between the connection angles of human keypoints and pedestrian intention. The input to the random forest is the connection angles of human keypoints; to prevent overfitting, only angles strongly related to hailing are selected as input, namely the connection angles whose vertices are the neck, left-shoulder, right-shoulder, left-elbow, and right-elbow keypoints. The output of the random forest is the probability ρ_r that the pedestrian intends to hail a taxi.
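As an illustration of the angle features described above, the connection angle at a vertex keypoint can be computed from three 2D keypoint coordinates. This is a minimal sketch; the function name and the example coordinates are illustrative, not taken from the patent.

```python
import math

def joint_angle(vertex, a, b):
    """Angle in degrees at `vertex`, formed by the rays vertex->a and vertex->b."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos_t = max(-1.0, min(1.0, dot / norm))  # clamp against float rounding
    return math.degrees(math.acos(cos_t))

# e.g. the angle at a shoulder between the neck and the elbow (coordinates illustrative)
theta = joint_angle((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))  # a right angle
```

Feeding a fixed set of such angles (one per selected vertex triple) to the forest gives it a pose descriptor that is invariant to the pedestrian's position and scale in the image.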
B2. The graph convolutional network infers the relationship between human-keypoint positions and pedestrian intention. Its input is the human-body graph model G(v, e), where v denotes the nodes of the graph, i.e. the human keypoints, whose node features are the keypoint coordinates, and e denotes the edges, i.e. the connections between nodes. Since the detection boxes D produced by object detection vary in size, a coordinate transformation converts the image coordinates of the keypoints into relative coordinates with the neck keypoint as origin, reducing the influence of box size on intention reasoning:

x_i^new = u_i − u_1,  y_i^new = v_i − v_1

where x_i^new and y_i^new are the transformed abscissa and ordinate of the i-th keypoint; u_i and v_i are its abscissa and ordinate before transformation; and u_1 and v_1 are the abscissa and ordinate of the neck keypoint.
The graph convolution process is:

H^(l+1) = σ( D^(−1/2) A D^(−1/2) H^(l) W^(l) ),  Z = readout( H^(z) W^(z) )

where A is the adjacency matrix of the human-body graph model; D is its degree matrix; H^(l) is the output feature of the l-th graph-convolution layer and H^(l+1) that of the (l+1)-th layer; W^(l) is the parameter matrix of the l-th layer; σ is the activation function; Z is the output of the graph convolutional network, i.e. the probability ρ_g that the pedestrian intends to hail; H^(z) is the feature matrix of the last graph-convolution layer and W^(z) its parameter matrix; readout(·) is a graph-readout network of fully connected layers that aggregates and joins all node features of the human-body graph model.
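A single propagation step of this form can be sketched in NumPy. This is an illustrative sketch assuming the common renormalized variant with self-loops added to A; the patent does not spell out its exact normalization.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^(-1/2) (A+I) D^(-1/2) H W)."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops (assumption)
    d_inv_sqrt = A_hat.sum(axis=1) ** -0.5   # D^(-1/2) as a vector
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)   # ReLU activation

# Toy 2-node graph: one edge, identity node features, 3-D output features
A = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.eye(2)
W = np.ones((2, 3))
out = gcn_layer(A, H, W)
```

Stacking several such layers and then flattening the node features into the fully connected readout reproduces the pipeline described in the text.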
B3. Algorithm Fusion
The random forest and the graph convolutional network yield, respectively, the probabilities p_r and p_g that the pedestrian intends to hail a taxi. To obtain more stable and accurate intention reasoning, a set of logically interpretable fusion rules combines the two:

p = (p_g + p_r) / 2,              if p_g > 0.5 and p_r > 0.5, or p_g < 0.5 and p_r < 0.5
p = p_f · p_g + (1 − p_f) · p_r,  if p_g > 0.5 and p_r < 0.5
p = p_f · p_r + (1 − p_f) · p_g,  if p_g < 0.5 and p_r > 0.5

where p is the fused probability that the pedestrian intends to hail. When p_g > 0.5 and p_r > 0.5, or p_g < 0.5 and p_r < 0.5, the two algorithms agree, and the fused probability is their average. When p_g > 0.5 and p_r < 0.5, the algorithms disagree: the graph convolutional network infers that the pedestrian intends to hail, while the random forest infers the opposite. To obtain a more accurate result, the facial-attention probability p_f serves as a dynamic weight for a weighted average of p_g and p_r: when p_f > 0.5, the pedestrian has a high probability of hailing, so the graph convolutional network's output receives the higher weight and the random forest's output the lower; when p_f < 0.5, the weights are reversed. When p_g < 0.5 and p_r > 0.5 — the other case of disagreement, in which the graph convolutional network infers no hailing intention while the random forest infers hailing intention — a p_f > 0.5 means the random forest's result is more likely correct, so its output receives the higher weight and the graph convolutional network's the lower; conversely, when p_f < 0.5, the graph convolutional network's output receives the higher weight and the random forest's the lower.
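The fusion rules described above can be written directly as a small function. This is a sketch under the assumption that the agreeing case takes a simple average of the two probabilities; the function name is illustrative.

```python
def fuse(p_g, p_r, p_f):
    """Fuse the GCN output p_g and random-forest output p_r into one hailing
    probability, using the facial-attention probability p_f as a dynamic
    weight when the two classifiers disagree."""
    if (p_g > 0.5) == (p_r > 0.5):        # both say hailing, or both say not
        return 0.5 * (p_g + p_r)          # agreeing case: average (assumption)
    if p_g > 0.5:                         # GCN: hailing, forest: not hailing
        return p_f * p_g + (1.0 - p_f) * p_r
    return p_f * p_r + (1.0 - p_f) * p_g  # forest: hailing, GCN: not hailing

p1 = fuse(0.8, 0.7, 0.9)  # classifiers agree: simple average
p2 = fuse(0.8, 0.2, 0.9)  # attentive face: the "hailing" vote dominates
```

Note how a high p_f always pulls the fused result toward whichever classifier voted "hailing", matching the intuition that a pedestrian looking at the taxi is more likely to be hailing it.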
Compared with the prior art, the beneficial effects of the present invention are as follows:
1. The present invention uses computer vision to identify pedestrians exhibiting hailing behavior from images accurately and efficiently, enabling autonomous taxis to find passengers more effectively and improving both the utilization of autonomous taxis and the travel efficiency of passengers.
2. The present invention uses a spatial reasoning network to infer pedestrian hailing behavior, reducing the dependence on temporal information. Compared with traditional behavior-recognition algorithms, it omits the temporal-feature-extraction stage, simplifying the network and improving the real-time performance of behavior reasoning.
3. The present invention adopts a set of logically interpretable fusion rules to combine the random forest and the graph convolutional network. This interpretability improves the environmental adaptability of the algorithm and the accuracy of behavior recognition, so that the fused algorithm reasons about pedestrians' hailing intention more stably and accurately.
Brief Description of the Drawings
Figure 1 is a schematic flow chart of the present invention.
Figure 2 is a schematic diagram of the human keypoints extracted by OpenPose.
Figure 3 is a schematic diagram of the facial-attention deep network.
Figure 4 is a schematic diagram of the random forest.
Figure 5 is a schematic diagram of the graph convolutional network.
Detailed Description of the Embodiments
The present invention is further described below with reference to the accompanying drawings. As shown in Figure 1, a vision-based pedestrian hailing-behavior recognition method comprises the following steps:
A. Image Preprocessing
YOLOv5 is used as the object-detection method and OpenPose as the human-keypoint-extraction algorithm to preprocess the image, yielding pedestrian detection boxes D and, for each box, the corresponding pedestrian's keypoint parameters K. The keypoint parameters, and the correspondence between keypoint indices and body parts, are shown in Figure 2.
The detection boxes provided by object detection improve the accuracy of human-keypoint extraction. In hailing-intention reasoning, a person's facial attention is a key clue to whether they intend to hail: in real scenes, a pedestrian hailing a taxi pays a high degree of attention to it. The present invention infers facial attention in two ways. First, the facial keypoints from human-keypoint detection are used: taking the difference h_p between the abscissas of keypoint 16 and keypoint 17 as a baseline and σ = 1.2 as the magnification factor, a square box S with side length σh_p is formed as the face region. When the horizontal distance h_f between keypoint 16 and keypoint 0 exceeds h_p, the pedestrian's face is turned sideways relative to the taxi, i.e. the pedestrian pays little attention to the vehicle, and the facial-attention probability is set to ρ_f = 0.1. When h_f is less than h_p, it is difficult to judge from this test alone whether the pedestrian has noticed the vehicle, so the face region S is fed into the facial-attention deep network to compute the facial-attention probability. The network, shown schematically in Figure 3, has two parts: the front part is a feature-extraction network that uses ResNet50 as the backbone to extract facial features; the rear part is a feature-connection network of fully connected layers that joins the extracted features into a global feature and outputs the facial-attention probability ρ_f.
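The geometric test described above (ear-to-ear span h_p, ear-to-nose span h_f, magnification σ = 1.2) can be sketched as follows. Centring the square crop on the nose keypoint is an assumption — the text does not state the square's anchor point — and the dictionary-of-indices input format is illustrative.

```python
def face_region(kps, sigma=1.2):
    """Given keypoints as {index: (x, y)} with 16/17 the ears and 0 the nose
    (indices as in the description above), return a square face crop
    (x0, y0, x1, y1), or None when the face is in profile (h_f > h_p)."""
    h_p = abs(kps[16][0] - kps[17][0])   # horizontal ear-to-ear span
    h_f = abs(kps[16][0] - kps[0][0])    # horizontal ear-to-nose span
    if h_f > h_p:
        return None                      # sideways face: rho_f is fixed at 0.1
    half = 0.5 * sigma * h_p             # square of side sigma * h_p
    cx, cy = kps[0]                      # centre on the nose (assumption)
    return (cx - half, cy - half, cx + half, cy + half)
```

A `None` return corresponds to the hard-coded ρ_f = 0.1 branch; otherwise the returned crop would be resized and passed to the ResNet50-based attention network.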
B. Intention Reasoning
Step A yields the pedestrian detection boxes D, the human keypoints K of the pedestrian in each box, and the corresponding facial-attention probability ρ_f. The present invention combines a random-forest algorithm with a graph convolutional network for pedestrian intention reasoning.
B1. The random forest infers the relationship between the connection angles of human keypoints and pedestrian intention, so its input is those connection angles. To prevent overfitting, the present invention selects keypoint angles strongly related to hailing as input, with keypoints 1, 2, 3, 5, and 6 as vertices; the output of the random forest is the probability ρ_r that the pedestrian intends to hail. The input connection angles are: with keypoint 1 as vertex, ∠3-1-8, ∠6-1-11, ∠4-1-8, ∠7-1-11, ∠6-1-8, and ∠6-1-7; with keypoint 2 as vertex, ∠1-2-3 and ∠1-2-4; with keypoint 5 as vertex, ∠1-5-6 and ∠1-5-7; with keypoint 3 as vertex, ∠2-3-4, ∠4-3-8, and ∠1-3-4; with keypoint 6 as vertex, ∠5-6-7, ∠7-6-11, and ∠1-6-7.
The random forest, shown schematically in Figure 4, consists of N independent decision trees, where N = 55. Different data sets train different decision trees, yielding corresponding models with trained parameters. Each decision tree is a distinct classifier that makes an independent decision from the input data. Decision aggregation uses majority voting: the output is the ratio of the number of trees whose decision is "hailing intention" to the total number of trees, i.e. the probability ρ_r that the pedestrian intends to hail.
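The majority-vote aggregation described above reduces to counting votes. This is a minimal sketch in which the 55 trained trees are abstracted away as boolean decisions; the function name is illustrative.

```python
def forest_probability(tree_votes):
    """rho_r: fraction of decision trees whose decision is 'hailing intention'.

    `tree_votes` is a sequence of booleans, one per tree (N = 55 in the
    embodiment above)."""
    if not tree_votes:
        raise ValueError("the forest must contain at least one tree")
    return sum(bool(v) for v in tree_votes) / len(tree_votes)

# e.g. 33 of 55 trees vote 'hailing' -> rho_r = 0.6
rho_r = forest_probability([True] * 33 + [False] * 22)
```

This vote fraction is exactly what libraries such as scikit-learn expose as the class probability of a random-forest classifier.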
B2. The graph convolutional network infers the relationship between human-keypoint positions and pedestrian intention, so its input is the human-body graph model G(v, e), where v denotes the nodes of the graph, i.e. the human keypoints, whose node features are the keypoint coordinates, and e denotes the edges, i.e. the connections between nodes. Since the detection boxes D produced by object detection vary in size, a coordinate transformation converts the image coordinates of the keypoints into relative coordinates with keypoint 1 as origin, reducing the influence of box size on intention reasoning:

x_i^new = u_i − u_1,  y_i^new = v_i − v_1

where x_i^new and y_i^new are the transformed abscissa and ordinate of the i-th keypoint; u_i and v_i are its abscissa and ordinate before transformation; and u_1 and v_1 are the abscissa and ordinate of keypoint 1.
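The neck-centred transformation above is a one-line shift per keypoint; a minimal sketch (list-of-pairs input format is illustrative):

```python
def to_neck_coords(keypoints):
    """Convert image coordinates to coordinates relative to keypoint 1 (the
    neck), so the node features no longer depend on where the detection box
    sits in the image. `keypoints` is a list of (u, v) pairs indexed as in
    Figure 2."""
    u1, v1 = keypoints[1]
    return [(u - u1, v - v1) for (u, v) in keypoints]

rel = to_neck_coords([(5.0, 5.0), (2.0, 3.0), (4.0, 7.0)])
# keypoint 1 maps to the origin; every other point shifts by the same offset
```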
The graph convolutional network, shown schematically in Figure 5, takes the human-body graph model as input. Each node passes its features along the edges to its neighbors and aggregates the features passed from neighboring nodes, realizing the transfer and aggregation of node features along the edges. To strengthen the model's expressive power, the ReLU activation function applies a nonlinear mapping to the node features after each graph-convolution layer. Finally, a graph-readout network of fully connected layers aggregates and joins all node features to produce the final classification result.
The graph convolution process can be summarized as:

H^(l+1) = σ( D^(−1/2) A D^(−1/2) H^(l) W^(l) ),  Z = readout( H^(z) W^(z) )

where A is the adjacency matrix of the human-body graph model; D is its degree matrix; H^(l) is the output feature of the l-th graph-convolution layer and H^(l+1) that of the (l+1)-th layer; W^(l) is the parameter matrix of the l-th layer; σ is the ReLU activation function; Z is the output of the graph convolutional network, i.e. the probability ρ_g that the pedestrian intends to hail; H^(z) is the feature matrix of the last graph-convolution layer and W^(z) its parameter matrix; readout(·) is a graph-readout network of fully connected layers that aggregates and joins all node features of the human-body graph model.
B3. Algorithm Fusion
The random forest and the graph convolutional network yield, respectively, the probabilities p_r and p_g that the pedestrian intends to hail a taxi. To obtain more stable and accurate intention reasoning, the present invention proposes a set of logically interpretable fusion rules combining the two:

p = (p_g + p_r) / 2,              if p_g > 0.5 and p_r > 0.5, or p_g < 0.5 and p_r < 0.5
p = p_f · p_g + (1 − p_f) · p_r,  if p_g > 0.5 and p_r < 0.5
p = p_f · p_r + (1 − p_f) · p_g,  if p_g < 0.5 and p_r > 0.5

where p is the fused probability that the pedestrian intends to hail. When p_g > 0.5 and p_r > 0.5, or p_g < 0.5 and p_r < 0.5, the two algorithms agree, and the fused probability is their average. When p_g > 0.5 and p_r < 0.5, the algorithms disagree: the graph convolutional network infers that the pedestrian intends to hail, while the random forest infers the opposite. To obtain a more accurate result, the facial-attention probability p_f serves as a dynamic weight for a weighted average of p_g and p_r: when p_f > 0.5, the pedestrian has a high probability of hailing, so the graph convolutional network's output receives the higher weight and the random forest's output the lower; when p_f < 0.5, the weights are reversed. When p_g < 0.5 and p_r > 0.5 — the other case of disagreement, in which the graph convolutional network infers no hailing intention while the random forest infers hailing intention — a p_f > 0.5 means the random forest's result is more likely correct, so its output receives the higher weight and the graph convolutional network's the lower; conversely, when p_f < 0.5, the graph convolutional network's output receives the higher weight and the random forest's the lower.
The present invention is not limited to this embodiment; any equivalent concept or modification within the technical scope disclosed by the present invention falls within its protection scope.
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111362421.5A CN113989495B (en) | 2021-11-17 | 2021-11-17 | Pedestrian calling behavior recognition method based on vision |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113989495A true CN113989495A (en) | 2022-01-28 |
CN113989495B CN113989495B (en) | 2024-04-26 |
Family
ID=79749065
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111362421.5A Active CN113989495B (en) | 2021-11-17 | 2021-11-17 | Pedestrian calling behavior recognition method based on vision |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113989495B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114926823A (en) * | 2022-05-07 | 2022-08-19 | 西南交通大学 | WGCN-based vehicle driving behavior prediction method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109117701A (en) * | 2018-06-05 | 2019-01-01 | 东南大学 | Pedestrian's intension recognizing method based on picture scroll product |
KR20200121206A (en) * | 2019-04-15 | 2020-10-23 | 계명대학교 산학협력단 | Teacher-student framework for light weighted ensemble classifier combined with deep network and random forest and the classification method based on thereof |
CN112052802A (en) * | 2020-09-09 | 2020-12-08 | 上海工程技术大学 | Front vehicle behavior identification method based on machine vision |
CN113255543A (en) * | 2021-06-02 | 2021-08-13 | 西安电子科技大学 | Facial Expression Recognition Method Based on Graph Convolutional Network |
Non-Patent Citations (1)
Title |
---|
Du Qiliang; Huang Liguang; Tian Lianfang; Huang Dizhen; Jin Shoujie; Li Miao: "Recognition of abnormal passenger behavior on escalators based on video surveillance", Journal of South China University of Technology (Natural Science Edition), no. 08, 15 August 2020 (2020-08-15) *
Also Published As
Publication number | Publication date |
---|---|
CN113989495B (en) | 2024-04-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||