CN113989495A - Vision-based pedestrian calling behavior identification method

Info

Publication number: CN113989495A (application CN202111362421.5A); also published as CN113989495B (granted)
Authority: CN (China)
Prior art keywords: network, pedestrian, graph, random forest, key point
Other languages: Chinese (zh)
Inventors: 连静 (Lian Jing), 王政皓 (Wang Zhenghao), 李琳辉 (Li Linhui)
Assignee (current and original): Dalian University of Technology
Filing date: 2021-11-17; publication date: 2022-01-28; grant date: 2024-04-26
Legal status: Granted; Active

Classifications

    • G06F18/24323: Pattern recognition; classification techniques relating to the number of classes; tree-organised classifiers (G Physics; G06 Computing; G06F Electric digital data processing)
    • G06N3/045: Computing arrangements based on biological models; neural networks; architectures; combinations of networks (G Physics; G06 Computing; G06N Computing arrangements based on specific computational models)
    • G06N3/08: Computing arrangements based on biological models; neural networks; learning methods (G Physics; G06 Computing; G06N Computing arrangements based on specific computational models)


Abstract

The invention discloses a vision-based pedestrian car-hailing behavior recognition method comprising two steps: image preprocessing and intention inference. The method uses computer vision to identify pedestrians exhibiting car-hailing behavior in an image accurately and efficiently, enabling autonomous taxis to find passengers more quickly and improving both the utilization of autonomous taxis and the travel efficiency of passengers. A spatial reasoning network infers pedestrian car-hailing behavior, reducing the dependence on temporal information; compared with traditional behavior recognition algorithms, it omits the temporal feature extraction stage, which simplifies the network and improves the real-time performance of behavior inference. A set of logically interpretable fusion rules combines a random forest with a graph convolutional network; this interpretability improves the environmental adaptability and recognition accuracy of the algorithm, and the fused algorithm infers pedestrian car-hailing intention more stably and accurately.

Description

A Vision-Based Pedestrian Car-Hailing Behavior Recognition Method

Technical Field

The invention belongs to the field of vehicle intelligence and in particular relates to a method by which an autonomous taxi recognizes the behavioral intention of pedestrians.

Background Art

Recognizing pedestrian behavior from a vehicle in a traffic scene falls within the scope of vehicle intelligence. Accurately and effectively identifying pedestrians' car-hailing intention helps an autonomous taxi quickly find pedestrians who want a ride, which is important for improving pedestrian travel efficiency, raising the utilization of autonomous taxis, and avoiding traffic congestion.

Pedestrian car-hailing behavior recognition uses computer vision to analyze pedestrians in a traffic scene and find those who intend to hail a taxi. Traffic scenes are highly complex: the number and variety of traffic participants (pedestrians, vehicles, cyclists, and so on) far exceed those of other application scenarios, which makes behavior recognition harder. Compared with other pedestrian behaviors (walking, running, cycling, etc.), hailing a car is markedly random and instantaneous. First, any pedestrian in the scene may become a person with car-hailing intention at any moment. Second, the behavior is momentary: a driver can judge whether a person is hailing a car from a single image, without considering the frames immediately before and after it. Because of these two characteristics, traditional behavior recognition algorithms based on 3D CNNs (3D Convolutional Neural Networks) and LSTMs (Long Short-Term Memory networks) are unsuitable for inferring such instantaneous car-hailing intention. Pedestrian gestures are the key information for expressing intention, but most current gesture recognition algorithms target indoor scenes, and vision-based gesture recognition requires high-resolution hand contours in the image, which the onboard cameras of intelligent vehicles cannot deliver in complex traffic scenes.

Summary of the Invention

To solve the above problems in the prior art, the present invention provides a vision-based pedestrian car-hailing behavior recognition method with strong environmental adaptability and high recognition accuracy, which processes images captured by an onboard camera and accurately identifies pedestrians with car-hailing intention in real time, thereby helping autonomous taxis find passengers more efficiently.

To achieve the above object, the technical solution of the present invention is as follows. A vision-based pedestrian car-hailing behavior recognition method comprises the following steps:

A. Image Preprocessing

A target detection algorithm and a human keypoint extraction algorithm preprocess the image, yielding pedestrian detection boxes D and, for each box, the corresponding pedestrian keypoint parameters K. In car-hailing inference, facial attention is a key clue to whether a pedestrian intends to hail a car: in real scenes, a pedestrian hailing a taxi pays close attention to it. Facial attention is inferred in two stages. First, the facial keypoints from human keypoint detection are used: taking the difference h_p between the abscissas of the left-ear and right-ear keypoints as a reference and σ as a magnification factor, a square box S of side σh_p is formed as the face region. When the lateral distance h_f between the left-ear keypoint and the nose keypoint is greater than h_p, the pedestrian's face is turned sideways relative to the taxi, i.e., the pedestrian pays little attention to the vehicle. When h_f is less than h_p, the face region S is fed into a facial attention deep network to compute the pedestrian's facial attention probability. The facial attention deep network comprises a front network and a rear network: the front network is a feature extraction network that uses ResNet50 as the backbone to extract facial features; the rear network is a feature connection network composed of fully connected layers that combines the extracted facial features into a global feature and outputs the facial attention probability ρ_f.

B. Intention Inference

Pedestrian intention is inferred by combining a random forest algorithm with a graph convolutional network, as follows:

B1. A random forest infers the relationship between the connection angles of human keypoints and pedestrian intention. The input to the random forest is the keypoint connection angles; to prevent overfitting, only angles strongly related to car-hailing are selected, namely the connection angles whose vertices are the neck, left-shoulder, right-shoulder, left-elbow, and right-elbow keypoints. The output of the random forest is the probability ρ_r that the pedestrian has car-hailing intention.

B2. A graph convolutional network infers the relationship between keypoint positions and pedestrian intention. Its input is the human body graph model G(v, e), where v denotes the nodes (the human keypoints, with the keypoint coordinates as node features) and e denotes the edges (the connections between nodes). Because the size of the detection box D obtained by target detection is not fixed, and to reduce its influence on intention inference, a coordinate transformation converts the image coordinates of the keypoints into coordinates relative to the neck keypoint as origin:

$$x_i^{\mathrm{new}} = u_i - u_1,\qquad y_i^{\mathrm{new}} = v_i - v_1$$

where x_i^new and y_i^new are the transformed abscissa and ordinate of the i-th keypoint; u_i and v_i are its abscissa and ordinate before the transformation; and u_1 and v_1 are the abscissa and ordinate of the neck keypoint.

The graph convolution proceeds as follows:

$$H^{(l+1)} = \sigma\!\left(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}H^{(l)}W^{(l)}\right)$$

$$Z = \mathrm{readout}\!\left(H^{(z)}W^{(z)}\right)$$

where $\hat{A} = A + I$ and A is the adjacency matrix of the human body graph model; $\hat{D}$ is the degree matrix of the graph; H^(l) and H^(l+1) are the output features of the l-th and (l+1)-th graph convolution layers; W^(l) is the parameter matrix of the l-th layer; σ(·) is the activation function; Z is the output of the graph convolutional network, i.e., the probability ρ_g that the pedestrian has car-hailing intention; H^(z) is the feature matrix of the last graph convolution layer; W^(z) is its parameter matrix; and readout(·) is a graph readout network composed of fully connected layers that aggregates and connects all node features of the human body graph model.

B3. Algorithm Fusion

The random forest and the graph convolutional network yield two probabilities that the pedestrian has car-hailing intention: the random forest output ρ_r and the graph convolutional network output ρ_g. To obtain a more stable and accurate intention inference, a set of logically interpretable fusion rules combines the two:

$$p = \begin{cases} \dfrac{\rho_g + \rho_r}{2}, & (\rho_g > 0.5 \text{ and } \rho_r > 0.5) \text{ or } (\rho_g < 0.5 \text{ and } \rho_r < 0.5) \\[4pt] \rho_f\,\rho_g + (1-\rho_f)\,\rho_r, & \rho_g > 0.5 \text{ and } \rho_r < 0.5 \\[4pt] \rho_f\,\rho_r + (1-\rho_f)\,\rho_g, & \rho_g < 0.5 \text{ and } \rho_r > 0.5 \end{cases}$$

where p is the fused probability that the pedestrian has car-hailing intention. When ρ_g > 0.5 and ρ_r > 0.5, or ρ_g < 0.5 and ρ_r < 0.5, the two algorithms agree, and the fused probability p is the mean (ρ_g + ρ_r)/2. When ρ_g > 0.5 and ρ_r < 0.5 they disagree: the graph convolutional network infers that the pedestrian is hailing a car while the random forest infers the opposite. To obtain a more accurate result, the facial attention probability ρ_f serves as a dynamic weight in a weighted average of ρ_g and ρ_r: when ρ_f > 0.5 the pedestrian is more likely to be hailing, so the graph convolutional network's output receives the higher weight and the random forest's output the lower one; when ρ_f < 0.5 the weights are reversed. When ρ_g < 0.5 and ρ_r > 0.5, the disagreement is the other way around: the graph convolutional network infers no car-hailing intention while the random forest infers intention. When ρ_f > 0.5 the random forest's result is more likely correct, so its output receives the higher weight and the graph convolutional network's output the lower one; when ρ_f < 0.5 the weights are again reversed.

Compared with the prior art, the present invention has the following beneficial effects:

1. The invention uses computer vision to identify pedestrians exhibiting car-hailing behavior in an image accurately and efficiently, enabling autonomous taxis to find passengers more quickly, which improves both the utilization of autonomous taxis and the travel efficiency of passengers.

2. The invention uses a spatial reasoning network to infer pedestrian car-hailing behavior, reducing the dependence on temporal information. Compared with traditional behavior recognition algorithms, it omits the temporal feature extraction stage, which simplifies the network and improves the real-time performance of behavior inference.

3. The invention uses a set of logically interpretable fusion rules to combine the random forest and the graph convolutional network. This interpretability improves the environmental adaptability of the algorithm and the accuracy of behavior recognition, and the fused algorithm infers pedestrian car-hailing intention more stably and accurately.

Brief Description of the Drawings

Figure 1 is a schematic flowchart of the present invention.

Figure 2 is a schematic diagram of the human keypoints extracted by OpenPose.

Figure 3 is a schematic diagram of the facial attention deep network.

Figure 4 is a schematic diagram of the random forest.

Figure 5 is a schematic diagram of the graph convolutional network.

Detailed Description of the Embodiments

The present invention is further described below with reference to the accompanying drawings. As shown in Figure 1, a vision-based pedestrian car-hailing behavior recognition method comprises the following steps:

A. Image Preprocessing

Yolov5 is used as the target detection method and OpenPose as the human keypoint extraction algorithm to preprocess the image, yielding pedestrian detection boxes D and, for each box, the corresponding pedestrian keypoint parameters K. The keypoints are shown in Figure 2; the correspondence between keypoint indices and body parts is:

0: nose; 1: neck; 2: right shoulder; 3: right elbow; 4: right wrist; 5: left shoulder; 6: left elbow; 7: left wrist; 8: right hip; 9: right knee; 10: right ankle; 11: left hip; 12: left knee; 13: left ankle; 14: right eye; 15: left eye; 16: right ear; 17: left ear (the standard 18-keypoint OpenPose COCO layout, consistent with the indices used below).
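To make this preprocessing stage concrete, the sketch below runs pedestrian detection through the public YOLOv5 torch.hub entry point and leaves the OpenPose keypoint call as a stub, since its Python bindings vary across builds; the helper names and the model variant are our assumptions, not the patent's specification.

```python
import torch

# YOLOv5 loaded via torch.hub (the 'yolov5s' variant is an assumption;
# the patent only names "Yolov5" without specifying a size).
detector = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

def detect_pedestrians(image):
    """Return the pedestrian detection boxes D as (x1, y1, x2, y2) tuples."""
    results = detector(image)
    boxes = []
    for *xyxy, conf, cls in results.xyxy[0].tolist():
        if int(cls) == 0:  # COCO class 0 is 'person'
            boxes.append(tuple(xyxy))
    return boxes

def extract_keypoints(image, box):
    """Stub for OpenPose: should return the 18 (u, v) keypoints K for the
    pedestrian inside `box`. Replace with a real OpenPose call in practice."""
    raise NotImplementedError
```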

The detection boxes provided by target detection improve the accuracy of human keypoint extraction. In car-hailing intention inference, facial attention is a key clue to whether a pedestrian intends to hail a car: in real scenes, a pedestrian hailing a taxi pays close attention to it. The present invention infers facial attention mainly in two stages. First, the facial keypoints from human keypoint detection are used: taking the difference h_p between the abscissas of keypoint 16 and keypoint 17 as a reference and σ = 1.2 as the magnification factor, a square box S of side σh_p is formed as the face region. When the lateral distance h_f between keypoint 16 and keypoint 0 is greater than h_p, the pedestrian's face is turned sideways relative to the taxi, i.e., the pedestrian pays little attention to the vehicle, and the facial attention probability is set to ρ_f = 0.1. When h_f is less than h_p, it is difficult to judge attention by this rule alone, so the face region S is fed into the facial attention deep network to compute the facial attention probability. The network, shown in Figure 3, has two parts: the front part is a feature extraction network that uses ResNet50 as the backbone to extract facial features; the rear part is a feature connection network composed of fully connected layers that combines the extracted features into a global feature and outputs the facial attention probability ρ_f.
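A minimal sketch of this face-attention branch follows, assuming the COCO-18 indices above (0: nose, 16/17: ears) and σ = 1.2; the placement of the square box around the ear midpoint, the 224x224 input size, and the width of the fully connected head are our assumptions, since the patent fixes none of them.

```python
import torch
import torch.nn as nn
from torchvision import models

SIGMA = 1.2  # magnification factor from the embodiment

class FaceAttentionNet(nn.Module):
    """ResNet50 backbone plus fully connected head (head widths assumed)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)  # load trained weights in practice
        backbone.fc = nn.Identity()               # keep the 2048-d global feature
        self.backbone = backbone
        self.head = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(),
                                  nn.Linear(256, 1), nn.Sigmoid())
    def forward(self, x):
        return self.head(self.backbone(x))

def face_attention_prob(keypoints, image, net):
    """keypoints: dict index -> (u, v); image: HxWx3 numpy array; returns rho_f."""
    u16, v16 = keypoints[16]     # right ear
    u17, v17 = keypoints[17]     # left ear
    u0, _ = keypoints[0]         # nose
    h_p = abs(u16 - u17)         # ear-to-ear horizontal distance
    h_f = abs(u17 - u0)          # ear-to-nose horizontal distance
    if h_f > h_p:                # face seen from the side
        return 0.1               # fixed low attention probability (embodiment)
    side = SIGMA * h_p           # square face region of side sigma * h_p,
    cx, cy = (u16 + u17) / 2, (v16 + v17) / 2   # centered between the ears (assumed)
    x0, y0 = int(cx - side / 2), int(cy - side / 2)
    crop = image[y0:y0 + int(side), x0:x0 + int(side)]
    tensor = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255
    tensor = nn.functional.interpolate(tensor, size=(224, 224))
    return net(tensor).item()
```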

B. Intention Inference

Step A yields the pedestrian detection boxes D, the human keypoints K of each detected pedestrian, and the corresponding facial attention probabilities ρ_f. The present invention combines a random forest algorithm with a graph convolutional network to infer pedestrian intention.

B1. The random forest infers the relationship between the connection angles of human keypoints and pedestrian intention. Its input is therefore the keypoint connection angles; to prevent overfitting, only angles strongly related to car-hailing are selected, namely those with keypoints 1, 2, 3, 5, and 6 as vertices. The output of the random forest is the probability ρ_r that the pedestrian has car-hailing intention. The input angles are: with keypoint 1 as vertex, ∠3-1-8, ∠6-1-11, ∠4-1-8, ∠7-1-11, ∠6-1-8, ∠6-1-7; with keypoint 2 as vertex, ∠1-2-3, ∠1-2-4; with keypoint 5 as vertex, ∠1-5-6, ∠1-5-7; with keypoint 3 as vertex, ∠2-3-4, ∠4-3-8, ∠1-3-4; with keypoint 6 as vertex, ∠5-6-7, ∠7-6-11, ∠1-6-7.

The random forest, shown in Figure 4, consists of N = 55 independent decision trees; different data sets are used to train different trees, giving models with distinct trained parameters. Each decision tree is a separate classifier that makes an independent decision on the input data. Decisions are aggregated by majority vote: the output is the ratio of the number of trees deciding "car-hailing intention" to the total number of trees, i.e., the probability ρ_r that the pedestrian has car-hailing intention.
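Under the same COCO-18 indexing, the angle features and the N = 55 forest might be set up as below with scikit-learn; `predict_proba` returns the fraction of trees voting for the "hailing" class, which matches the majority-vote ratio ρ_r described above (training data is assumed).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Angle triples (a, vertex, b) taken from the list above, COCO-18 indices.
ANGLE_TRIPLES = [
    (3, 1, 8), (6, 1, 11), (4, 1, 8), (7, 1, 11), (6, 1, 8), (6, 1, 7),
    (1, 2, 3), (1, 2, 4),
    (1, 5, 6), (1, 5, 7),
    (2, 3, 4), (4, 3, 8), (1, 3, 4),
    (5, 6, 7), (7, 6, 11), (1, 6, 7),
]

def joint_angle(kp, a, vertex, b):
    """Angle at `vertex` formed by keypoints a-vertex-b, in radians."""
    va = np.asarray(kp[a], dtype=float) - np.asarray(kp[vertex], dtype=float)
    vb = np.asarray(kp[b], dtype=float) - np.asarray(kp[vertex], dtype=float)
    cos = np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-8)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def angle_features(kp):
    return [joint_angle(kp, a, v, b) for a, v, b in ANGLE_TRIPLES]

forest = RandomForestClassifier(n_estimators=55)   # N = 55 trees, as above
# forest.fit(X_train, y_train)                     # labels: hailing / not hailing
# rho_r = forest.predict_proba([angle_features(kp)])[0, 1]
```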

B2. The graph convolutional network infers the relationship between keypoint positions and pedestrian intention. Its input is therefore the human body graph model G(v, e), where v denotes the nodes (the human keypoints, with the keypoint coordinates as node features) and e denotes the edges (the connections between nodes). Because the size of the detection box D obtained by target detection is not fixed, and to reduce its influence on intention inference, a coordinate transformation converts the image coordinates of the keypoints into coordinates relative to keypoint 1 as origin:

$$x_i^{\mathrm{new}} = u_i - u_1,\qquad y_i^{\mathrm{new}} = v_i - v_1$$

where x_i^new and y_i^new are the transformed abscissa and ordinate of the i-th keypoint; u_i and v_i are its abscissa and ordinate before the transformation; and u_1 and v_1 are the abscissa and ordinate of keypoint 1 (the neck).
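In code, the normalization is a single subtraction of the neck coordinate (keypoint 1), e.g.:

```python
import numpy as np

def neck_relative(kp):
    """Shift all keypoint coordinates so the neck (keypoint 1) is the origin."""
    pts = np.asarray(kp, dtype=float)   # shape (18, 2): rows are (u_i, v_i)
    return pts - pts[1]                 # x_new = u_i - u_1, y_new = v_i - v_1
```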

The graph convolutional network is shown in Figure 5. The human body graph model is fed into the network; each node passes its features to adjacent nodes along the edges between them, and each node likewise aggregates the features passed from its neighbors, so node features propagate and aggregate along the edges. To strengthen the expressive power of the model, the ReLU activation function applies a nonlinear mapping to the node features after each graph convolution layer. Finally, a graph readout network composed of fully connected layers aggregates and connects the features of all nodes to produce the final classification result.

The graph convolution process can be summarized as:

$$H^{(l+1)} = \sigma\!\left(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}H^{(l)}W^{(l)}\right)$$

$$Z = \mathrm{readout}\!\left(H^{(z)}W^{(z)}\right)$$

where $\hat{A} = A + I$ and A is the adjacency matrix of the human body graph model; $\hat{D}$ is the degree matrix of the graph; H^(l) and H^(l+1) are the output features of the l-th and (l+1)-th graph convolution layers; W^(l) is the parameter matrix of the l-th layer; σ(·) is the ReLU activation function; Z is the output of the graph convolutional network, i.e., the probability ρ_g that the pedestrian has car-hailing intention; H^(z) is the feature matrix of the last graph convolution layer; W^(z) is its parameter matrix; and readout(·) is a graph readout network composed of fully connected layers that aggregates and connects all node features of the human body graph model.
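A compact PyTorch rendering of this propagation rule is sketched below; the number of layers and the hidden widths are our assumptions (the patent fixes neither), while the normalization and the fully connected readout follow the formulas above.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # parameter matrix W^(l)
    def forward(self, H, A):
        A_hat = A + torch.eye(A.size(0))                 # A_hat = A + I (self-loops)
        d = A_hat.sum(dim=1)                             # node degrees of A_hat
        D_inv_sqrt = torch.diag(d.pow(-0.5))             # D_hat^(-1/2)
        return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ self.W(H))

class IntentGCN(nn.Module):
    """Two GCN layers plus a fully connected readout over all 18 nodes
    (depth and widths are assumptions)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.g1 = GCNLayer(2, hidden)    # node feature: the (x, y) coordinate
        self.g2 = GCNLayer(hidden, hidden)
        self.readout = nn.Sequential(nn.Linear(18 * hidden, 64), nn.ReLU(),
                                     nn.Linear(64, 1), nn.Sigmoid())
    def forward(self, X, A):
        H = self.g2(self.g1(X, A), A)    # X: (18, 2) node features; A: (18, 18)
        return self.readout(H.flatten()) # output Z, i.e. rho_g
```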

B3. Algorithm Fusion

The random forest and the graph convolutional network yield the probabilities ρ_r and ρ_g that the pedestrian has car-hailing intention. To obtain a more stable and accurate intention inference, the present invention proposes a set of logically interpretable fusion rules that combine the two:

$$p = \begin{cases} \dfrac{\rho_g + \rho_r}{2}, & (\rho_g > 0.5 \text{ and } \rho_r > 0.5) \text{ or } (\rho_g < 0.5 \text{ and } \rho_r < 0.5) \\[4pt] \rho_f\,\rho_g + (1-\rho_f)\,\rho_r, & \rho_g > 0.5 \text{ and } \rho_r < 0.5 \\[4pt] \rho_f\,\rho_r + (1-\rho_f)\,\rho_g, & \rho_g < 0.5 \text{ and } \rho_r > 0.5 \end{cases}$$

where p is the fused probability that the pedestrian has car-hailing intention. When ρ_g > 0.5 and ρ_r > 0.5, or ρ_g < 0.5 and ρ_r < 0.5, the two algorithms agree, and the fused probability p is the mean (ρ_g + ρ_r)/2. When ρ_g > 0.5 and ρ_r < 0.5 they disagree: the graph convolutional network infers that the pedestrian is hailing a car while the random forest infers the opposite. To obtain a more accurate result, the facial attention probability ρ_f serves as a dynamic weight in a weighted average of ρ_g and ρ_r: when ρ_f > 0.5 the pedestrian is more likely to be hailing, so the graph convolutional network's output receives the higher weight and the random forest's output the lower one; when ρ_f < 0.5 the weights are reversed. When ρ_g < 0.5 and ρ_r > 0.5, the disagreement is the other way around: the graph convolutional network infers no car-hailing intention while the random forest infers intention. When ρ_f > 0.5 the random forest's result is more likely correct, so its output receives the higher weight and the graph convolutional network's output the lower one; when ρ_f < 0.5 the weights are again reversed.
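The fusion rule translates directly into a branch on the two decisions, with ρ_f acting as the dynamic weight; a sketch:

```python
def fuse(rho_g, rho_r, rho_f):
    """Fused car-hailing probability p from the rule above."""
    if (rho_g > 0.5) == (rho_r > 0.5):           # the two classifiers agree
        return (rho_g + rho_r) / 2
    if rho_g > 0.5:                               # GCN says hailing, forest says not
        return rho_f * rho_g + (1 - rho_f) * rho_r
    return rho_f * rho_r + (1 - rho_f) * rho_g    # forest says hailing, GCN says not

# e.g. fuse(0.8, 0.3, 0.9) = 0.9 * 0.8 + 0.1 * 0.3 = 0.75: the attentive face
# backs the GCN's positive decision, so the fused probability stays high.
```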

The present invention is not limited to this embodiment; any equivalent concept or modification within the technical scope disclosed by the present invention falls within its protection scope.

Claims (1)

1. A vision-based pedestrian car-hailing behavior recognition method, characterized by comprising the following steps:

A. Image preprocessing

A target detection algorithm and a human keypoint extraction algorithm preprocess the image, yielding pedestrian detection boxes D and, for each box, the corresponding pedestrian keypoint parameters K. In car-hailing inference, facial attention is a key clue to whether a pedestrian intends to hail a car; in real scenes, a pedestrian hailing a taxi pays close attention to it. Facial attention is inferred in two stages. First, the facial keypoints from human keypoint detection are used: taking the difference h_p between the abscissas of the left-ear and right-ear keypoints as a reference and σ as a magnification factor, a square box S of side σh_p is formed as the face region. When the lateral distance h_f between the left-ear keypoint and the nose keypoint is greater than h_p, the pedestrian's face is turned sideways relative to the taxi, i.e., the pedestrian pays little attention to the vehicle. When h_f is less than h_p, the face region S is fed into a facial attention deep network to compute the pedestrian's facial attention probability. The facial attention deep network comprises a front network and a rear network: the front network is a feature extraction network using ResNet50 as the backbone to extract facial features; the rear network is a feature connection network composed of fully connected layers that combines the extracted features into a global feature and outputs the facial attention probability ρ_f;

B. Intention inference

Pedestrian intention is inferred by combining a random forest algorithm with a graph convolutional network, as follows:

B1. A random forest infers the relationship between keypoint connection angles and pedestrian intention. The input to the random forest is the keypoint connection angles; to prevent overfitting, only angles strongly related to car-hailing are selected, namely the connection angles whose vertices are the neck, left-shoulder, right-shoulder, left-elbow, and right-elbow keypoints. The output of the random forest is the probability ρ_r that the pedestrian has car-hailing intention;

B2. A graph convolutional network infers the relationship between keypoint positions and pedestrian intention. Its input is the human body graph model G(v, e), where v denotes the nodes (the human keypoints, with coordinates as node features) and e the edges (connections between nodes). Because the size of the detection box D obtained by target detection is not fixed, and to reduce its influence on intention inference, a coordinate transformation converts the image coordinates of the keypoints into coordinates relative to the neck keypoint as origin:

$$x_i^{\mathrm{new}} = u_i - u_1,\qquad y_i^{\mathrm{new}} = v_i - v_1$$

where x_i^new and y_i^new are the transformed abscissa and ordinate of the i-th keypoint; u_i and v_i are its abscissa and ordinate before the transformation; and u_1 and v_1 are the abscissa and ordinate of the neck keypoint;

The graph convolution proceeds as:

$$H^{(l+1)} = \sigma\!\left(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}H^{(l)}W^{(l)}\right)$$

$$Z = \mathrm{readout}\!\left(H^{(z)}W^{(z)}\right)$$

where $\hat{A} = A + I$ and A is the adjacency matrix of the human body graph model; $\hat{D}$ is the degree matrix of the graph; H^(l) and H^(l+1) are the output features of the l-th and (l+1)-th graph convolution layers; W^(l) is the parameter matrix of the l-th layer; σ(·) is the activation function; Z is the output of the graph convolutional network, i.e., the probability ρ_g that the pedestrian has car-hailing intention; H^(z) is the feature matrix of the last graph convolution layer; W^(z) is its parameter matrix; and readout(·) is a graph readout network composed of fully connected layers that aggregates and connects all node features of the human body graph model;

B3. Algorithm fusion

The random forest and the graph convolutional network yield the probabilities ρ_r and ρ_g that the pedestrian has car-hailing intention. To obtain a more stable and accurate intention inference, a set of logically interpretable fusion rules combines the two:

$$p = \begin{cases} \dfrac{\rho_g + \rho_r}{2}, & (\rho_g > 0.5 \text{ and } \rho_r > 0.5) \text{ or } (\rho_g < 0.5 \text{ and } \rho_r < 0.5) \\[4pt] \rho_f\,\rho_g + (1-\rho_f)\,\rho_r, & \rho_g > 0.5 \text{ and } \rho_r < 0.5 \\[4pt] \rho_f\,\rho_r + (1-\rho_f)\,\rho_g, & \rho_g < 0.5 \text{ and } \rho_r > 0.5 \end{cases}$$

where p is the fused probability that the pedestrian has car-hailing intention. When ρ_g > 0.5 and ρ_r > 0.5, or ρ_g < 0.5 and ρ_r < 0.5, the two algorithms agree and the fused probability is the mean (ρ_g + ρ_r)/2. When ρ_g > 0.5 and ρ_r < 0.5 they disagree: the graph convolutional network infers car-hailing intention while the random forest does not; the facial attention probability ρ_f serves as a dynamic weight in a weighted average of ρ_g and ρ_r, so that when ρ_f > 0.5 the graph convolutional network's output receives the higher weight and the random forest's output the lower one, and when ρ_f < 0.5 the weights are reversed. When ρ_g < 0.5 and ρ_r > 0.5, the disagreement is the other way around; when ρ_f > 0.5 the random forest's result is more likely correct, so its output receives the higher weight and the graph convolutional network's the lower one, and when ρ_f < 0.5 the weights are again reversed.

Family

ID: 79749065; one family application, CN202111362421.5A (CN, granted, active).

Cited By (1)

* Cited by examiner, † Cited by third party

CN114926823A * (西南交通大学 / Southwest Jiaotong University), priority 2022-05-07, published 2022-08-19: WGCN-based vehicle driving behavior prediction method


Patent Citations (4)

* Cited by examiner, † Cited by third party

CN109117701A * (东南大学 / Southeast University), priority 2018-06-05, published 2019-01-01: Pedestrian intention recognition method based on graph convolution
KR20200121206A * (계명대학교 산학협력단 / Keimyung University Industry-Academic Cooperation Foundation), priority 2019-04-15, published 2020-10-23: Teacher-student framework for a lightweight ensemble classifier combining a deep network and a random forest, and classification method based thereon
CN112052802A * (上海工程技术大学 / Shanghai University of Engineering Science), priority 2020-09-09, published 2020-12-08: Front vehicle behavior recognition method based on machine vision
CN113255543A * (西安电子科技大学 / Xidian University), priority 2021-06-02, published 2021-08-13: Facial expression recognition method based on graph convolutional network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party

DU Qiliang; HUANG Liguang; TIAN Lianfang; HUANG Dizhen; JIN Shoujie; LI Miao: "Recognition of abnormal escalator passenger behavior based on video surveillance" (基于视频监控的手扶电梯乘客异常行为识别), Journal of South China University of Technology (Natural Science Edition), no. 08, 15 August 2020 *




Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant