CN113989495A - Vision-based pedestrian calling behavior identification method - Google Patents
- Publication number
- CN113989495A (application CN202111362421.5A)
- Authority
- CN
- China
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a vision-based pedestrian taxi-hailing behavior identification method comprising the following steps: image preprocessing and intention inference. The invention uses computer vision to identify, accurately and efficiently, the pedestrians in an image who are hailing a taxi, so that an autonomous taxi can find passengers more efficiently, improving both the utilization of the autonomous taxi and the travel efficiency of passengers. The invention uses a spatial reasoning network to infer pedestrian taxi-hailing behavior, which reduces the dependence on temporal information; compared with conventional behavior recognition algorithms, the temporal feature extraction stage is removed, which simplifies the network and improves the real-time performance of behavior inference. The invention further adopts a set of logically interpretable fusion rules to fuse a random forest with a graph convolutional network; the logical interpretability improves the environmental adaptability and recognition accuracy of the algorithm, and the fused algorithm yields more stable and accurate inference of the pedestrian's taxi-hailing intention.
Description
Technical Field
The invention belongs to the field of intelligent vehicles, and in particular relates to a method by which an autonomous taxi identifies the behavioral intention of pedestrians.
Background
Recognizing pedestrian behavior from a vehicle in a traffic scene falls within the scope of vehicle intelligence. Accurately and effectively identifying the taxi-hailing intention of pedestrians helps an autonomous taxi quickly find the pedestrians on the road who intend to hail it, which is important for improving the travel efficiency of pedestrians, improving the utilization of autonomous taxis, and avoiding traffic congestion.
Pedestrian taxi-hailing behavior identification means analyzing the pedestrians in a traffic scene with computer vision methods and finding those with the intention to hail a taxi. Traffic scenes are highly complex: the number and variety of traffic participants (pedestrians, vehicles, riders, etc.) are much higher than in other application scenarios, which increases the difficulty of behavior recognition. Compared with other pedestrian behaviors (walking, running, riding, etc.), hailing behavior is markedly random and transient. First, any pedestrian in the current scene may, at any moment, become a person intending to hail a taxi. Second, hailing has a clearly instantaneous character: a human driver can judge from a single image whether a person intends to hail, without considering the continuous frames before and after it. Because of these two characteristics, conventional behavior recognition algorithms based on 3D convolutional neural networks (3D-CNN) and LSTM (Long Short-Term Memory) networks are not suited to inferring the transient taxi-hailing intention. Pedestrian gestures are key information for expressing intention, but most existing gesture recognition algorithms target indoor scenes; vision-based gesture recognition requires high-resolution hand contours in the image, and the on-board camera of an intelligent vehicle cannot produce such high-quality images in complex traffic scenes.
Disclosure of Invention
To solve the above problems in the prior art, the invention aims to design a vision-based pedestrian taxi-hailing behavior identification method with strong environmental adaptability and high recognition accuracy, which processes images captured by an on-board camera to accurately identify, in real time, the pedestrians in the image who intend to hail a taxi, thereby helping an autonomous taxi find passengers more efficiently.
To achieve this purpose, the technical scheme of the invention is as follows: a vision-based pedestrian taxi-hailing behavior identification method comprising the following steps:
A. Image preprocessing
Image preprocessing is performed with a target detection algorithm and a human-body keypoint extraction algorithm, yielding a detection box D for each pedestrian and the keypoint parameters K of the pedestrian in each detection box. In taxi-hailing behavior inference, facial attention is a key clue for judging whether a person intends to hail: in real scenes, a pedestrian who is hailing pays high attention to the taxi. Facial attention is inferred in two ways. First, the facial keypoints found by human-body keypoint detection are used: taking the difference h_p between the abscissae of the left-ear and right-ear keypoints as a reference and σ as an amplification factor, a square box S of side σ·h_p is formed as the face region. When the lateral distance h_f between the left-ear keypoint and the nose keypoint is greater than h_p, the pedestrian's face is turned sideways relative to the taxi, i.e., the pedestrian pays little attention to the taxi. When h_f is less than h_p, the face region S is input into a facial-attention depth network to compute the pedestrian's facial attention probability. The facial-attention depth network comprises a front network and a rear network: the front network is a feature extraction network that uses ResNet50 as the backbone to extract facial features; the rear network is a feature-connection network composed of fully connected layers, which connects the features extracted by the front network into a global feature and outputs the facial attention probability ρ_f.
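The face-orientation pre-check above can be sketched as a small function. This is an illustrative sketch only: the centring of the square region S between the ears and the low-attention fallback value are assumptions, and `attention_net` stands in for the ResNet50-based facial-attention depth network, which is not reproduced here.

```python
# Sketch of the facial-attention pre-check of step A.
# Keypoints are (x, y) pixel coordinates; `attention_net` stands in for the
# ResNet50-based facial-attention depth network (not shown).

def face_region_and_attention(left_ear, right_ear, nose, sigma=1.2,
                              low_attention=0.1, attention_net=None):
    """Return the facial attention probability rho_f for one pedestrian."""
    h_p = abs(left_ear[0] - right_ear[0])   # ear-to-ear horizontal spread
    h_f = abs(left_ear[0] - nose[0])        # left-ear-to-nose lateral distance
    if h_f > h_p:
        # Face turned sideways -> low attention; skip the deep network.
        return low_attention
    # Square face region S of side sigma * h_p, here centred between the ears
    # (the centring is an assumption of this sketch).
    cx = (left_ear[0] + right_ear[0]) / 2.0
    cy = (left_ear[1] + right_ear[1]) / 2.0
    half = sigma * h_p / 2.0
    region = (cx - half, cy - half, cx + half, cy + half)
    if attention_net is None:
        return low_attention  # no network supplied in this sketch
    return attention_net(region)
```
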
B. Intention inference
A random forest algorithm is combined with a graph convolutional network to infer the pedestrian's intention, as follows:
B1. A random forest algorithm infers the relation between the connection angles of human-body keypoints and the pedestrian's intention. The input of the random forest is therefore the connection angles of human-body keypoints; to prevent overfitting, several keypoint angles strongly related to taxi-hailing are selected as input, namely the connection angles whose vertices are the neck, left-shoulder, right-shoulder, left-elbow, and right-elbow keypoints. The output of the random forest is the probability ρ_r that the pedestrian intends to hail a taxi.
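As an illustration of step B1, the angle-to-probability step could be realized with scikit-learn's `RandomForestClassifier`. Everything below is invented for the example: the training data are synthetic, and the five-angle feature layout does not reproduce the patent's actual angle set; only the 55-tree forest size follows the detailed description.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic training data: each row is a vector of keypoint connection
# angles (degrees); label 1 = hailing intention, 0 = no intention.
rng = np.random.default_rng(0)
X_hail = rng.normal(150, 10, size=(40, 5))   # invented raised-arm angle pattern
X_walk = rng.normal(60, 10, size=(40, 5))    # invented walking angle pattern
X = np.vstack([X_hail, X_walk])
y = np.array([1] * 40 + [0] * 40)

# 55 trees, matching N = 55 in the detailed description.
forest = RandomForestClassifier(n_estimators=55, random_state=0).fit(X, y)

# rho_r: fraction of trees voting "hailing" for one new angle vector.
rho_r = forest.predict_proba([[150, 148, 155, 151, 149]])[0][1]
```
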
B2. A graph convolutional network infers the relation between the positions of human-body keypoints and the pedestrian's intention. Its input is a human-body graph model G(v, e), where v denotes the nodes of the graph model, i.e., the human-body keypoints, whose features are the keypoint coordinates, and e denotes the edges, i.e., the connections between nodes. Because the size of the detection box D obtained by target detection is not fixed, and to reduce the influence of the box size on intention inference, the image coordinates of the keypoints are converted into relative coordinates with the neck keypoint as the origin:

x_i^new = u_i − u_1,  y_i^new = v_i − v_1

where x_i^new and y_i^new are the transformed abscissa and ordinate of the i-th keypoint; u_i and v_i are its abscissa and ordinate before the transformation; and u_1 and v_1 are the abscissa and ordinate of the neck keypoint.
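The coordinate conversion amounts to subtracting the neck keypoint's coordinates from every keypoint. A minimal sketch (the list layout, with index 1 as the neck, follows the OpenPose-style indexing used in the detailed description):

```python
# Neck-relative coordinate transform of step B2: keypoint i at image
# coordinates (u_i, v_i) becomes (u_i - u_1, v_i - v_1), where index 1
# is the neck keypoint.

def to_neck_relative(keypoints):
    """keypoints: list of (u, v) tuples, index 1 being the neck."""
    u1, v1 = keypoints[1]
    return [(u - u1, v - v1) for (u, v) in keypoints]
```
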
The graph convolutional network propagates as:

H^(l+1) = σ( D̂^(−1/2) Â D̂^(−1/2) H^(l) W^(l) ),  Z = Readout( H^(z) W^(z) )

where Â = A + I and A is the adjacency matrix of the human-body graph model; D̂ is the degree matrix of the human-body graph model; H^(l) is the output feature of the l-th graph-convolution layer and H^(l+1) that of the (l+1)-th layer; W^(l) is the parameter matrix of the l-th graph-convolution layer; σ(·) is the activation function; Z is the output of the graph convolutional network, i.e., the probability ρ_g that the pedestrian intends to hail a taxi; H^(z) is the feature matrix of the last graph-convolution layer; W^(z) is the parameter matrix of the last graph-convolution layer; and Readout(·) is a graph readout network composed of fully connected layers, which aggregates and connects all node features of the human-body graph model.
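One propagation layer of such a graph convolutional network can be sketched in NumPy. The toy 3-node graph, node features, and weights below are invented for the example; a real model stacks several layers and ends with the readout network, which is not shown.

```python
import numpy as np

# One layer of the standard GCN propagation rule with ReLU:
# H' = ReLU( D^-1/2 (A + I) D^-1/2 H W )

def gcn_layer(A, H, W):
    A_hat = A + np.eye(A.shape[0])                       # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node chain
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])            # node features
W = np.eye(2)                                                  # toy weights
H1 = gcn_layer(A, H, W)   # features after one round of transfer/aggregation
```
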
B3. Algorithm fusion
The random forest and the graph convolutional network respectively yield the probabilities ρ_r and ρ_g that the pedestrian intends to hail a taxi. To obtain more stable and accurate intention inference, a set of logically interpretable fusion rules fuses the two:

p = (ρ_g + ρ_r)/2, if ρ_g > 0.5 and ρ_r > 0.5, or ρ_g < 0.5 and ρ_r < 0.5;
p = ρ_f·ρ_g + (1 − ρ_f)·ρ_r, if ρ_g > 0.5 and ρ_r < 0.5;
p = (1 − ρ_f)·ρ_g + ρ_f·ρ_r, if ρ_g < 0.5 and ρ_r > 0.5;

where p is the fused probability that the pedestrian intends to hail a taxi. When ρ_g > 0.5 and ρ_r > 0.5, or ρ_g < 0.5 and ρ_r < 0.5, the two algorithms agree and the fused probability p is their mean. When ρ_g > 0.5 and ρ_r < 0.5, the two algorithms disagree: the graph convolutional network infers that the pedestrian intends to hail while the random forest infers that he does not. To obtain a more accurate result, the facial attention probability ρ_f is used as a dynamic weight for a weighted average of ρ_g and ρ_r: when ρ_f > 0.5, the pedestrian is more likely to be hailing, so the output of the graph convolutional network receives the higher weight and that of the random forest the lower; when ρ_f < 0.5, the output of the random forest receives the higher weight and that of the graph convolutional network the lower. When ρ_g < 0.5 and ρ_r > 0.5, the other disagreement occurs: the graph convolutional network infers no hailing intention while the random forest infers hailing intention. When ρ_f > 0.5, the inference of the random forest is more likely to be correct, so its output receives the higher weight and that of the graph convolutional network the lower; conversely, when ρ_f < 0.5, the output of the graph convolutional network receives the higher weight and that of the random forest the lower.
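The dynamic-weight fusion described above can be sketched as a small function. This is a sketch under two natural readings of the text: the agreement case averages the two outputs, and the disagreement cases use ρ_f and 1 − ρ_f as the weights.

```python
# Fusion rule of step B3: combine the GCN output rho_g and the random-forest
# output rho_r, using the facial attention probability rho_f as a dynamic
# weight when the two classifiers disagree.

def fuse(rho_g, rho_r, rho_f):
    if (rho_g > 0.5) == (rho_r > 0.5):
        return (rho_g + rho_r) / 2.0               # the two classifiers agree
    if rho_g > 0.5:
        # GCN says hailing, forest says not: high facial attention
        # favours the GCN output.
        return rho_f * rho_g + (1.0 - rho_f) * rho_r
    # Forest says hailing, GCN says not: high facial attention
    # favours the forest output.
    return (1.0 - rho_f) * rho_g + rho_f * rho_r
```
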
Compared with the prior art, the invention has the following beneficial effects:
1. The invention uses computer vision to identify, accurately and efficiently, the pedestrians in an image who are hailing a taxi, so that an autonomous taxi finds passengers more efficiently, improving both the utilization of the autonomous taxi and the travel efficiency of passengers.
2. The invention uses a spatial reasoning network to infer pedestrian taxi-hailing behavior, reducing the dependence on temporal information; compared with conventional behavior recognition algorithms, the temporal feature extraction stage is removed, which simplifies the network and improves the real-time performance of behavior inference.
3. The invention adopts a set of logically interpretable fusion rules to fuse the random forest with the graph convolutional network; the logical interpretability improves the environmental adaptability and recognition accuracy of the algorithm, and the fused algorithm yields more stable and accurate inference of the pedestrian's taxi-hailing intention.
Drawings
FIG. 1 is a schematic flow diagram of the present invention.
FIG. 2 is a schematic diagram of human key points extracted by OpenPose.
Fig. 3 is a schematic diagram of a facial depth of attention network.
Fig. 4 is a schematic diagram of a random forest.
Fig. 5 is a schematic diagram of a graph convolution network.
Detailed Description
The present invention is further described with reference to the accompanying drawings. As shown in FIG. 1, a vision-based pedestrian taxi-hailing behavior identification method comprises the following steps:
A. Image preprocessing
Image preprocessing is implemented with YOLOv5 as the target detection method and OpenPose as the human-body keypoint extraction algorithm, yielding pedestrian detection boxes D and the keypoint parameters K of the pedestrian in each detection box. The keypoints, and the correspondence between keypoint indices and body parts, are shown in FIG. 2.
the detection frame provided by the target detection can improve the accuracy of extracting the key points of the human body. Process of reasoning on car calling intentionIn the real scene, in the process of calling the taxi by the pedestrian, the pedestrian has high attention to the taxi. The present invention mainly proceeds from two aspects, firstly, the facial key points detected in the human body key point detection are used for reasoning, and the difference h between the abscissa of the key point 16 and the abscissa of the key point 17 is used for reasoningpOn the basis of the amplification factor of 1.2, a side length of σ h is formedpAs a face area, when the lateral distance h between the key point 16 and the key point 0 is larger than the width of the key pointfGreater than hpMeaning that the face of the pedestrian faces the taxi at an angle to the side, i.e. the pedestrian has less attention to the taxi, the face attention probability ρ is setfWhen h is 0.1fLess than hpTherefore, the face region S is input to a face attention depth network to calculate the face attention probability, the face attention depth network is mainly composed of two parts, the front part is a feature extraction network, facial features are extracted by using the Resnet50 as a reference network, the rear part is a feature connection network composed of full connection layers, the features extracted in the front part are connected to obtain global features, and the global features are output as the face attention probability ρf。
B. Intention inference
Step A yields the target detection box D of each pedestrian, the human-body keypoints K of the pedestrian in the box, and the corresponding facial attention probability ρ_f. The invention then combines the random forest algorithm and the graph convolutional network to infer the pedestrian's intention.
B1. The random forest mainly infers the relation between the connection angles of human-body keypoints and the pedestrian's intention. Its input is therefore the connection angles of human-body keypoints; to prevent overfitting, keypoint angles strongly related to taxi-hailing are selected as input, namely the connection angles whose vertices are keypoints 1, 2, 3, 5, and 6. The output of the random forest is the probability ρ_r that the pedestrian intends to hail a taxi. The input connection angles are: ∠318, ∠6111, ∠0418, ∠17111, ∠2618, and ∠3617 with keypoint 1 as the vertex; ∠4123 and ∠5124 with keypoint 2 as the vertex; ∠156 and ∠157 with keypoint 5 as the vertex; ∠234, ∠438, and ∠134 with keypoint 3 as the vertex; and ∠567, ∠7611, and ∠167 with keypoint 6 as the vertex.
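A connection angle with a given keypoint as vertex can be computed from the two limb vectors meeting there. The computation below is an illustrative sketch: the patent only lists which angles are used, not how they are computed.

```python
import math

# Connection angle at a vertex keypoint: the angle between the two rays
# from the vertex to keypoints a and b (all points are (x, y) coordinates).

def connection_angle(a, vertex, b):
    """Return the angle in degrees at `vertex` between rays to a and b."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos = dot / (math.hypot(*v1) * math.hypot(*v2))
    cos = max(-1.0, min(1.0, cos))   # clamp against floating-point drift
    return math.degrees(math.acos(cos))
```
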
A schematic diagram of the random forest is shown in FIG. 4. The random forest consists of N independent decision trees with N = 55; different decision trees are trained on different data sets, each yielding a model with its own trained parameters. Each decision tree is a classifier that makes an independent decision on the input data. Decision aggregation uses majority voting: the ratio of the number of trees deciding "hailing intention" to the total number of trees is output as the probability ρ_r that the pedestrian intends to hail a taxi.
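The majority-vote aggregation reduces to a fraction of votes; a minimal sketch (the vote split in the test is invented):

```python
# Majority-vote aggregation over the decision trees: the output probability
# rho_r is the fraction of trees that vote "hailing intention" (1 vs 0).

def forest_vote(tree_votes):
    """tree_votes: iterable of 0/1 decisions, one per tree (N = 55 here)."""
    votes = list(tree_votes)
    return sum(votes) / len(votes)
```
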
B2. The graph convolutional network mainly infers the relation between the positions of human-body keypoints and the pedestrian's intention. Its input is therefore a human-body graph model G(v, e), where v denotes the nodes, i.e., the human-body keypoints, whose features are the keypoint coordinates, and e denotes the edges, i.e., the connections between nodes. Because the size of the detection box D obtained by target detection is not fixed, and to reduce the influence of the box size on intention inference, the image coordinates of the keypoints are converted into relative coordinates with keypoint 1 as the origin:

x_i^new = u_i − u_1,  y_i^new = v_i − v_1

where x_i^new and y_i^new are the transformed abscissa and ordinate of the i-th keypoint; u_i and v_i are its abscissa and ordinate before the transformation; and u_1 and v_1 are the abscissa and ordinate of keypoint 1.
The schematic diagram of the graph convolutional network is shown in FIG. 5. The human-body graph model is input into the network; each node passes its features to its neighboring nodes along the edges between them, and each node also aggregates the features passed from its neighbors, realizing the transfer and aggregation of node features along the edges. To strengthen the expressive power of the model, a ReLU activation function is applied after each graph-convolution layer to realize a nonlinear mapping of the node features. Finally, a graph readout network composed of fully connected layers aggregates and connects all node features to obtain the final classification result.
The graph convolutional network can be summarized as:

H^(l+1) = σ( D̂^(−1/2) Â D̂^(−1/2) H^(l) W^(l) ),  Z = Readout( H^(z) W^(z) )

where Â = A + I and A is the adjacency matrix of the human-body graph model; D̂ is the degree matrix of the human-body graph model; H^(l) is the output feature of the l-th graph-convolution layer and H^(l+1) that of the (l+1)-th layer; W^(l) is the parameter matrix of the l-th graph-convolution layer; σ(·) is the ReLU activation function; Z is the output of the graph convolutional network, i.e., the probability ρ_g that the pedestrian intends to hail a taxi; H^(z) is the feature matrix of the last graph-convolution layer; W^(z) is the parameter matrix of the last graph-convolution layer; and Readout(·) is a graph readout network composed of fully connected layers, which aggregates and connects all node features of the human-body graph model.
B3. Algorithm fusion
The random forest and the graph convolutional network respectively yield the probabilities ρ_r and ρ_g that the pedestrian intends to hail a taxi. To obtain more stable and accurate intention inference, the invention provides a set of logically interpretable fusion rules to fuse them:

p = (ρ_g + ρ_r)/2, if ρ_g > 0.5 and ρ_r > 0.5, or ρ_g < 0.5 and ρ_r < 0.5;
p = ρ_f·ρ_g + (1 − ρ_f)·ρ_r, if ρ_g > 0.5 and ρ_r < 0.5;
p = (1 − ρ_f)·ρ_g + ρ_f·ρ_r, if ρ_g < 0.5 and ρ_r > 0.5;

where p is the fused probability that the pedestrian intends to hail a taxi. When ρ_g > 0.5 and ρ_r > 0.5, or ρ_g < 0.5 and ρ_r < 0.5, the two algorithms agree and the fused probability p is their mean. When ρ_g > 0.5 and ρ_r < 0.5, the two algorithms disagree: the graph convolutional network infers that the pedestrian intends to hail while the random forest infers that he does not. To obtain a more accurate result, the facial attention probability ρ_f is used as a dynamic weight for a weighted average of ρ_g and ρ_r: when ρ_f > 0.5, the pedestrian is more likely to be hailing, so the output of the graph convolutional network receives the higher weight and that of the random forest the lower; when ρ_f < 0.5, the output of the random forest receives the higher weight and that of the graph convolutional network the lower. When ρ_g < 0.5 and ρ_r > 0.5, the other disagreement occurs: the graph convolutional network infers no hailing intention while the random forest infers hailing intention. When ρ_f > 0.5, the inference of the random forest is more likely to be correct, so its output receives the higher weight and that of the graph convolutional network the lower; conversely, when ρ_f < 0.5, the output of the graph convolutional network receives the higher weight and that of the random forest the lower.
The present invention is not limited to the described embodiment; any equivalent substitution or modification within the technical scope of the present invention falls within its protection scope.
Claims (1)
1. A vision-based pedestrian taxi-hailing behavior identification method, characterized in that it comprises the following steps:
A. Image preprocessing
Image preprocessing is performed with a target detection algorithm and a human-body keypoint extraction algorithm, yielding a detection box D for each pedestrian and the keypoint parameters K of the pedestrian in each detection box; in taxi-hailing behavior inference, facial attention is a key clue for judging whether a person intends to hail a taxi, and in real scenes a pedestrian who is hailing pays high attention to the taxi; facial attention is inferred in two ways: first, the facial keypoints found by human-body keypoint detection are used, taking the difference h_p between the abscissae of the left-ear and right-ear keypoints as a reference and σ as an amplification factor, a square box S of side σ·h_p is formed as the face region; when the lateral distance h_f between the left-ear keypoint and the nose keypoint is greater than h_p, the pedestrian's face is turned sideways relative to the taxi, i.e., the pedestrian pays little attention to the taxi; when h_f is less than h_p, the face region S is input into a facial-attention depth network to compute the pedestrian's facial attention probability; the facial-attention depth network comprises a front network and a rear network, the front network being a feature extraction network that uses ResNet50 as the backbone to extract facial features, and the rear network being a feature-connection network composed of fully connected layers, which connects the features extracted by the front network into a global feature and outputs the facial attention probability ρ_f;
B. Intention inference
A random forest algorithm is combined with a graph convolutional network to infer the pedestrian's intention, as follows:
B1. A random forest algorithm infers the relation between the connection angles of human-body keypoints and the pedestrian's intention; the input of the random forest is therefore the connection angles of human-body keypoints, and to prevent overfitting, several keypoint angles strongly related to taxi-hailing are selected as input, namely the connection angles whose vertices are the neck, left-shoulder, right-shoulder, left-elbow, and right-elbow keypoints; the output of the random forest is the probability ρ_r that the pedestrian intends to hail a taxi;
B2. A graph convolution network is adopted to reason about the relation between the positions of the human-body key points and the pedestrian's intention. The input of the graph convolution network is a human-body graph model G(v, e), where v are the nodes of the graph model, i.e. the human-body key points, whose node features are the key-point coordinates, and e are the edges of the graph model, i.e. the connections between nodes. Because the size of the detection frame D obtained by target detection is not fixed, the image coordinates of the key points are converted into coordinates relative to the neck key point as origin, which reduces the influence of the frame size on intention reasoning:

x_i^new = u_i − u_1,  y_i^new = v_i − v_1

where x_i^new and y_i^new are the transformed abscissa and ordinate of the i-th human-body key point; u_i and v_i are the abscissa and ordinate of the i-th key point before the transformation; and u_1 and v_1 are the abscissa and ordinate of the neck key point.
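The neck-relative conversion is a simple translation of every keypoint by the neck coordinates. In this sketch index 0 is assumed to be the neck (the index choice is illustrative):

```python
def to_neck_relative(points):
    """points: list of (u, v) image coordinates; points[0] is the neck key point.

    Returns the same keypoints expressed as offsets from the neck, so the
    result no longer depends on where the detection frame sits in the image.
    """
    u1, v1 = points[0]
    return [(u - u1, v - v1) for (u, v) in points]

pts = [(320, 140), (300, 180), (340, 180)]  # e.g. neck, left shoulder, right shoulder
rel = to_neck_relative(pts)
# The neck maps to the origin; all other keypoints become offsets from it.
```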
The graph convolution network propagates features as follows:

H^(l+1) = f( D̃^(−1/2) Ã D̃^(−1/2) H^(l) W^(l) ),  Z = Readout( H^(z) W^(z) )

where Ã is the adjacency matrix of the human-body graph model (with self-connections added); D̃ is the degree matrix of the human-body graph model; H^(l) is the output feature of the l-th graph convolution layer and H^(l+1) that of the (l+1)-th layer; W^(l) is the parameter matrix of the l-th layer; f(·) is the activation function; Z is the output of the graph convolution network, i.e. the probability ρ_g that the pedestrian has the intention to hail a car; H^(z) is the feature matrix of the last graph convolution layer; W^(z) is the parameter matrix of the last layer; and Readout(·) is a graph-readout network composed of fully connected layers that realizes the aggregation and concatenation of all node features of the human-body graph model.
B3. Algorithm fusion
The random forest and the graph convolution network respectively output the probabilities ρ_r and ρ_g that the pedestrian has the intention to hail a car. To obtain a more stable and accurate intention inference, a set of fusion rules with logical interpretability is proposed to fuse the two algorithms, where p denotes the fused probability that the pedestrian intends to hail a car.

When ρ_g > 0.5 and ρ_r > 0.5, or ρ_g < 0.5 and ρ_r < 0.5, the random forest and the graph convolution network reach the same inference result, and the fused probability is taken as the average p = (ρ_g + ρ_r)/2.

When ρ_g > 0.5 and ρ_r < 0.5, the two algorithms disagree: the graph convolution network indicates that the pedestrian intends to hail a car, while the random forest indicates the opposite. To obtain a more accurate result, the facial attention probability ρ_f is applied as a dynamic weight to ρ_g and ρ_r in a dynamically weighted average: when ρ_f > 0.5, the pedestrian has a higher probability of hailing a car, so the output of the graph convolution network is given the higher weight and the output of the random forest the lower weight; when ρ_f < 0.5, the output of the random forest is given the higher weight and the output of the graph convolution network the lower weight.

When ρ_g < 0.5 and ρ_r > 0.5, the other case of disagreement occurs: the graph convolution network indicates no car-hailing intention, while the random forest indicates that the intention exists. When ρ_f > 0.5, the inference of the random forest has the higher probability of being correct, so its output is given the higher weight and the output of the graph convolution network the lower weight; conversely, when ρ_f < 0.5, the output of the graph convolution network is given the higher weight and the output of the random forest the lower weight.
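One consistent instantiation of the fusion rules in B3: averaging when the two classifiers agree, and using ρ_f as a convex weight favouring whichever output it supports when they disagree. The exact weighting formula is an assumption; the patent text only fixes which side receives the higher weight:

```python
def fuse(rho_g, rho_r, rho_f):
    """Fuse GCN output rho_g and random-forest output rho_r using facial
    attention rho_f as a dynamic weight (illustrative weighting scheme)."""
    if (rho_g > 0.5) == (rho_r > 0.5):
        return (rho_g + rho_r) / 2.0        # consistent results: plain average
    if rho_g > 0.5:
        # GCN says "hailing", forest says no: rho_f > 0.5 favours the GCN.
        return rho_f * rho_g + (1.0 - rho_f) * rho_r
    # GCN says no, forest says "hailing": rho_f > 0.5 favours the forest.
    return rho_f * rho_r + (1.0 - rho_f) * rho_g

p_agree = fuse(0.8, 0.7, 0.9)   # both agree -> (0.8 + 0.7) / 2
p_split = fuse(0.9, 0.2, 0.8)   # high attention -> weight on the GCN output
```

Note that at ρ_f = 0.5 the disagreement cases reduce to the same plain average as the agreement case, which keeps the rule continuous.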
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111362421.5A CN113989495B (en) | 2021-11-17 | 2021-11-17 | Pedestrian calling behavior recognition method based on vision |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113989495A true CN113989495A (en) | 2022-01-28 |
CN113989495B CN113989495B (en) | 2024-04-26 |
Family
ID=79749065
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111362421.5A Active CN113989495B (en) | 2021-11-17 | 2021-11-17 | Pedestrian calling behavior recognition method based on vision |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113989495B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114926823A (en) * | 2022-05-07 | 2022-08-19 | 西南交通大学 | WGCN-based vehicle driving behavior prediction method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109117701A (en) * | 2018-06-05 | 2019-01-01 | 东南大学 | Pedestrian's intension recognizing method based on picture scroll product |
KR20200121206A (en) * | 2019-04-15 | 2020-10-23 | 계명대학교 산학협력단 | Teacher-student framework for light weighted ensemble classifier combined with deep network and random forest and the classification method based on thereof |
CN112052802A (en) * | 2020-09-09 | 2020-12-08 | 上海工程技术大学 | Front vehicle behavior identification method based on machine vision |
CN113255543A (en) * | 2021-06-02 | 2021-08-13 | 西安电子科技大学 | Facial expression recognition method based on graph convolution network |
Non-Patent Citations (1)
Title |
---|
Du Qiliang; Huang Liguang; Tian Lianfang; Huang Dizhen; Jin Shoujie; Li Miao: "Abnormal behavior recognition of escalator passengers based on video surveillance", Journal of South China University of Technology (Natural Science Edition), no. 08, 15 August 2020 (2020-08-15) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | A unified framework for concurrent pedestrian and cyclist detection | |
Kortylewski et al. | Combining compositional models and deep networks for robust object classification under occlusion | |
CN108875608B (en) | Motor vehicle traffic signal identification method based on deep learning | |
Silva et al. | Automatic detection of motorcyclists without helmet | |
CN109190444B (en) | Method for realizing video-based toll lane vehicle feature recognition system | |
CN104778453B (en) | A kind of night pedestrian detection method based on infrared pedestrian's brightness statistics feature | |
Kuang et al. | Feature selection based on tensor decomposition and object proposal for night-time multiclass vehicle detection | |
Yogameena et al. | Deep learning‐based helmet wear analysis of a motorcycle rider for intelligent surveillance system | |
CN107066933A (en) | A kind of road sign recognition methods and system | |
CN105590102A (en) | Front car face identification method based on deep learning | |
Lee et al. | Recognizing pedestrian’s unsafe behaviors in far-infrared imagery at night | |
CN108647700B (en) | Multitask vehicle part identification model, method and system based on deep learning | |
CN109886161B (en) | Road traffic identification recognition method based on likelihood clustering and convolutional neural network | |
JP2016062610A (en) | Feature model creation method and feature model creation device | |
Ming et al. | Vehicle detection using tail light segmentation | |
CN107292933B (en) | Vehicle color identification method based on BP neural network | |
CN108960074B (en) | Small-size pedestrian target detection method based on deep learning | |
CN107315998A (en) | Vehicle class division method and system based on lane line | |
CN109784216B (en) | Vehicle-mounted thermal imaging pedestrian detection Rois extraction method based on probability map | |
CN111832461A (en) | Non-motor vehicle riding personnel helmet wearing detection method based on video stream | |
CN112381101B (en) | Infrared road scene segmentation method based on category prototype regression | |
CN115280373A (en) | Managing occlusions in twin network tracking using structured dropping | |
Cai et al. | Vehicle Detection Based on Deep Dual‐Vehicle Deformable Part Models | |
CN113989495A (en) | Vision-based pedestrian calling behavior identification method | |
Zhou et al. | A novel object detection method in city aerial image based on deformable convolutional networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||