CN113989495A - Vision-based pedestrian calling behavior identification method

Vision-based pedestrian calling behavior identification method

Info

Publication number
CN113989495A
CN113989495A
Authority
CN
China
Prior art keywords
pedestrian
network
random forest
reasoning
intention
Prior art date
Legal status
Granted
Application number
CN202111362421.5A
Other languages
Chinese (zh)
Other versions
CN113989495B (en)
Inventor
连静
王政皓
李琳辉
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN202111362421.5A
Publication of CN113989495A
Application granted
Publication of CN113989495B
Status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/243 - Classification techniques relating to the number of classes
    • G06F 18/24323 - Tree-organised classifiers
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a vision-based pedestrian car-hailing behavior recognition method comprising the following steps: image preprocessing and intention inference. The invention uses computer vision to accurately and efficiently identify pedestrians exhibiting car-hailing behavior in an image, enabling autonomous taxis to find passengers more efficiently, which improves both the utilization of autonomous taxis and the travel efficiency of passengers. The invention uses a spatial reasoning network to infer pedestrian car-hailing behavior, reducing the dependence on temporal information; compared with conventional behavior recognition algorithms it omits the temporal feature extraction stage, which simplifies the network and improves the real-time performance of behavior inference. The invention adopts a set of logically interpretable fusion rules to fuse a random forest with a graph convolution network; the logical interpretability improves the environmental adaptability and recognition accuracy of the algorithm, and the fused algorithm yields more stable and accurate inference of pedestrian car-hailing intention.

Description

Vision-based pedestrian calling behavior identification method
Technical Field
The invention belongs to the field of vehicle intelligence, and in particular relates to a method by which an autonomous taxi recognizes pedestrian behavioral intention.
Background
Recognizing pedestrian behavior from a vehicle in a traffic scene falls within the scope of vehicle intelligence. Accurately and effectively recognizing pedestrian car-hailing intention helps an autonomous taxi quickly find pedestrians on the road who intend to hail a ride, which is important for improving pedestrian travel efficiency, improving the utilization of autonomous taxis, and avoiding traffic congestion.
Pedestrian car-hailing behavior recognition means using computer vision to analyze pedestrians in a traffic scene and find those with car-hailing intention. Traffic scenes are highly complex: the number and variety of traffic participants (pedestrians, vehicles, riders, etc.) are much higher than in other application scenarios, which increases the difficulty of behavior recognition. Compared with other pedestrian behaviors (walking, running, riding, etc.), car-hailing behavior is markedly random and transient. First, any pedestrian in the current scene may at any moment become a person with car-hailing intention. Second, car-hailing behavior is clearly instantaneous: a driver can judge from a single image whether a person intends to hail a ride, without needing the several frames before and after it. Because of these two features, conventional behavior recognition algorithms based on 3D convolutional neural networks (3D CNN) and LSTM (Long Short-Term Memory) networks are not suited to inferring the transient car-hailing intention. Pedestrian gestures are key information for expressing intention, but most existing gesture recognition algorithms target indoor scenes; vision-based gesture recognition demands high-resolution hand contours in the image, and the on-board camera of an intelligent vehicle cannot produce such high-quality images in complex traffic scenes.
Disclosure of Invention
To solve the problems in the prior art, the invention aims to provide a vision-based pedestrian car-hailing behavior recognition method with strong environmental adaptability and high recognition accuracy, which processes images acquired by an on-board camera to accurately identify, in real time, pedestrians in the image with car-hailing intention, thereby helping autonomous taxis find passengers more efficiently.
To achieve this purpose, the technical scheme of the invention is as follows: a vision-based pedestrian car-hailing behavior recognition method comprises the following steps:
A. Image preprocessing
Image preprocessing uses a target detection algorithm and a human body key point extraction algorithm to obtain pedestrian detection boxes D and the key point parameters K of the pedestrian in each box. In car-hailing behavior inference, the facial attention of a person is a key clue for judging whether that person has car-hailing intention: in a real scene, a pedestrian hailing a taxi pays high attention to the taxi. Facial attention is inferred in two stages. First, the facial key points found by human key point detection are used: taking the difference h_p between the abscissas of the left-ear and right-ear key points as a reference and σ as an amplification factor, a square box S of side length σh_p is formed as the face region. When the lateral distance h_f between the left-ear key point and the nose key point is greater than h_p, the pedestrian's face is turned sideways relative to the taxi, i.e. the pedestrian pays little attention to the taxi. When h_f is less than h_p, the face region S is input into a face attention depth network to compute the pedestrian's face attention probability. The face attention depth network consists of a front network and a rear network: the front network is a feature extraction network that uses ResNet50 as the backbone to extract facial features; the rear network is a feature connection network composed of fully connected layers that aggregates the extracted facial features into a global feature and outputs the face attention probability ρ_f.
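For concreteness, the screening logic of step A can be sketched in a few lines of Python. This is a minimal illustrative sketch, not the patented implementation: the key point names, the attention_net stub standing in for the ResNet50-based face attention depth network, and the centring of the box S between the ears are assumptions; the fallback value 0.1 for a sideways face is taken from the embodiment described below.

```python
def face_attention(kp, image, attention_net, sigma=1.2):
    """kp: dict of key point name -> (x, y) pixel coordinates;
    image: H x W x 3 array; returns the face attention probability rho_f."""
    l_ear, r_ear, nose = kp["left_ear"], kp["right_ear"], kp["nose"]
    h_p = abs(l_ear[0] - r_ear[0])          # reference: ear-to-ear abscissa gap
    h_f = abs(l_ear[0] - nose[0])           # left ear to nose lateral distance
    if h_f > h_p:
        return 0.1                          # face turned sideways: low attention
    # Square face region S of side sigma * h_p (sigma = amplification factor).
    cx = (l_ear[0] + r_ear[0]) / 2.0
    cy = (l_ear[1] + r_ear[1]) / 2.0
    half = sigma * h_p / 2.0
    y0, x0 = int(max(cy - half, 0)), int(max(cx - half, 0))
    S = image[y0:int(cy + half), x0:int(cx + half)]
    return float(attention_net(S))          # ResNet50 backbone + FC head -> rho_f
```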
B. Intention inference
Pedestrian intention inference combines a random forest algorithm with a graph convolution network; the specific steps are as follows:
B1. A random forest algorithm infers the relation between the connection angles of human body key points and pedestrian intention. The input of the random forest is the connection angles of human body key points; to prevent overfitting, only key point angles strongly related to pedestrian car-hailing are selected as input, namely the connection angles whose vertices are the neck, left shoulder, right shoulder, left elbow and right elbow key points. The output of the random forest is the probability ρ_r that the pedestrian has car-hailing intention.
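As a hedged illustration of step B1, the sketch below uses scikit-learn's RandomForestClassifier. The synthetic training arrays are placeholders for real labelled data, and the 16-dimensional feature vector matches the count of connection angles enumerated in the detailed description, though the feature layout itself is an assumption; the 55 trees follow the embodiment.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 180.0, size=(200, 16))   # rows: key point angle vectors
y_train = rng.integers(0, 2, size=200)              # 1 = hailing, 0 = not hailing

forest = RandomForestClassifier(n_estimators=55)    # N = 55 independent trees
forest.fit(X_train, y_train)

angles = rng.uniform(0.0, 180.0, size=(1, 16))      # one pedestrian's angles
rho_r = forest.predict_proba(angles)[0, 1]          # averaged per-tree probability
print(rho_r)                                        # of the "hailing" class
```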
B2. A graph convolution network infers the relation between the positions of human body key points and pedestrian intention. The input of the graph convolution network is a human body graph model G(v, e), where v is the set of nodes of the graph model, i.e. the human body key points, whose features are the key point coordinates, and e is the set of edges, i.e. the connections between nodes. Because the size of the detection box D obtained by target detection is not fixed, to reduce its influence on intention inference the image coordinates of the key points are converted into relative coordinates with the neck key point as the origin:
$$x_i^{new} = u_i - u_1, \qquad y_i^{new} = v_i - v_1$$
where x_i^new and y_i^new are the transformed abscissa and ordinate of the i-th human body key point; u_i and v_i are its abscissa and ordinate before transformation; and u_1 and v_1 are the abscissa and ordinate of the neck key point.
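As a one-line illustration (assuming, as in the embodiment below, that the key points arrive as an array whose row 1 holds the neck coordinates):

```python
import numpy as np

def to_neck_relative(K):
    """K: (18, 2) array of (u_i, v_i) key point image coordinates; row 1 = neck."""
    return K - K[1]          # x_i_new = u_i - u_1,  y_i_new = v_i - v_1
```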
The process of the graph convolution network is as follows:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}\,H^{(l)}\,W^{(l)}\right)$$

$$Z = \mathrm{Readout}\left(H^{(z)}W^{(z)}\right)$$
where Ã = A + I, A being the adjacency matrix of the human body graph model and I the identity matrix; D̃ is the degree matrix of Ã; H^(l) is the output feature of the l-th graph convolution layer and H^(l+1) that of the (l+1)-th layer; W^(l) is the parameter matrix of the l-th graph convolution layer; σ(·) is the activation function; Z is the output of the graph convolution network, i.e. the probability ρ_g that the pedestrian has car-hailing intention; H^(z) is the feature matrix of the last graph convolution layer; W^(z) is the parameter matrix of the last graph convolution layer; and Readout(·) is a graph readout network composed of fully connected layers, which aggregates and connects all node features of the human body graph model.
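A toy NumPy sketch of the propagation rule above follows. The skeleton edge list, feature dimensions, random weights and the readout stub are illustrative assumptions, not the trained network of the invention.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One layer: H' = ReLU(D~^(-1/2) (A + I) D~^(-1/2) H W)."""
    A_t = A + np.eye(A.shape[0])                       # adjacency with self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_t.sum(axis=1))        # diagonal of D~^(-1/2)
    A_hat = A_t * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ H @ W, 0.0)              # ReLU activation

rng = np.random.default_rng(0)
A = np.zeros((18, 18))                                 # 18 key point nodes
for i, j in [(0, 1), (1, 2), (1, 5), (2, 3), (3, 4), (5, 6), (6, 7)]:
    A[i, j] = A[j, i] = 1.0                            # a few skeleton edges
H = rng.normal(size=(18, 2))                           # neck-relative coordinates
H = gcn_layer(H, A, rng.normal(size=(2, 16)))          # one graph convolution
z = H.reshape(-1) @ rng.normal(size=18 * 16)           # readout: flatten + FC stub
rho_g = 1.0 / (1.0 + np.exp(-z))                       # probability of car-hailing
print(rho_g)
```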
B3. Algorithm fusion
The random forest and the graph convolution network respectively output the probability that the pedestrian has car-hailing intention: the random forest output ρ_r and the graph convolution network output ρ_g. To obtain more stable and accurate intention inference, a set of logically interpretable fusion rules is proposed to fuse the random forest with the graph convolution network; the fusion rules are as follows:
$$p = \begin{cases} \dfrac{\rho_g + \rho_r}{2}, & (\rho_g > 0.5 \text{ and } \rho_r > 0.5)\ \text{or}\ (\rho_g < 0.5 \text{ and } \rho_r < 0.5) \\ \rho_f\,\rho_g + (1-\rho_f)\,\rho_r, & \rho_g > 0.5 \text{ and } \rho_r < 0.5 \\ (1-\rho_f)\,\rho_g + \rho_f\,\rho_r, & \rho_g < 0.5 \text{ and } \rho_r > 0.5 \end{cases}$$
where p is the fused probability that the pedestrian has car-hailing intention. When ρ_g > 0.5 and ρ_r > 0.5, or ρ_g < 0.5 and ρ_r < 0.5, the random forest and the graph convolution network give the same inference result, and the fused probability is p = (ρ_g + ρ_r)/2.
When ρ_g > 0.5 and ρ_r < 0.5, the two algorithms disagree: the graph convolution network infers that the pedestrian has car-hailing intention while the random forest infers that it does not. To obtain a more accurate result, the face attention probability ρ_f is used as a dynamic weight to compute a weighted average of ρ_g and ρ_r: when ρ_f > 0.5 the pedestrian is more likely to be hailing, so the graph convolution network output is given the higher weight and the random forest output the lower weight; when ρ_f < 0.5 the random forest output is given the higher weight and the graph convolution network output the lower weight. When ρ_g < 0.5 and ρ_r > 0.5, the other case of disagreement arises: the graph convolution network infers no car-hailing intention while the random forest infers car-hailing intention. When ρ_f > 0.5 the random forest result is more likely to be correct, so its output is given the higher weight and the graph convolution network output the lower weight; conversely, when ρ_f < 0.5 the graph convolution network output is given the higher weight and the random forest output the lower weight.
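The fusion rule admits a direct transcription; the sketch below assumes the piecewise form reconstructed above.

```python
def fuse(rho_g, rho_r, rho_f):
    """Fuse graph convolution output rho_g and random forest output rho_r,
    using the face attention probability rho_f as a dynamic weight when
    the two inference results disagree."""
    if (rho_g > 0.5) == (rho_r > 0.5):                 # same inference result
        return (rho_g + rho_r) / 2.0
    if rho_g > 0.5:                                    # GCN: hailing, forest: not
        return rho_f * rho_g + (1.0 - rho_f) * rho_r
    return (1.0 - rho_f) * rho_g + rho_f * rho_r       # forest: hailing, GCN: not

# Example: the GCN is confident (0.8), the forest disagrees (0.3), the face is
# attentive (0.9), so the GCN dominates: 0.9*0.8 + 0.1*0.3 = 0.75.
print(fuse(0.8, 0.3, 0.9))
```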
Compared with the prior art, the invention has the following beneficial effects:
1. The invention uses computer vision to accurately and efficiently identify pedestrians exhibiting car-hailing behavior in an image, enabling autonomous taxis to find passengers more efficiently, which improves both the utilization of autonomous taxis and the travel efficiency of passengers.
2. The invention uses a spatial reasoning network to infer pedestrian car-hailing behavior, reducing the dependence on temporal information; compared with conventional behavior recognition algorithms it omits the temporal feature extraction stage, which simplifies the network and improves the real-time performance of behavior inference.
3. The invention adopts a set of logically interpretable fusion rules to fuse a random forest with a graph convolution network; the logical interpretability improves the environmental adaptability and recognition accuracy of the algorithm, and the fused algorithm yields more stable and accurate inference of pedestrian car-hailing intention.
Drawings
FIG. 1 is a schematic flow diagram of the present invention.
FIG. 2 is a schematic diagram of human key points extracted by OpenPose.
Fig. 3 is a schematic diagram of the face attention depth network.
Fig. 4 is a schematic diagram of a random forest.
Fig. 5 is a schematic diagram of a graph convolution network.
Detailed Description
The present invention is further described with reference to the accompanying drawings. As shown in Fig. 1, a vision-based pedestrian car-hailing behavior recognition method comprises the following steps:
A. Image preprocessing
Image preprocessing uses YOLOv5 as the target detection method and OpenPose as the human body key point extraction algorithm, yielding pedestrian detection boxes D and the key point parameters K of the pedestrian in each box. The key points are shown in Fig. 2; the correspondence between key point indices and human body parts is as follows:
[Table: correspondence between the 18 key point indices and human body parts; in particular 0 = nose, 1 = neck, 16 = left ear, 17 = right ear, and 2, 3, 5, 6 = the shoulder and elbow key points (see Fig. 2).]
the detection frame provided by the target detection can improve the accuracy of extracting the key points of the human body. Process of reasoning on car calling intentionIn the real scene, in the process of calling the taxi by the pedestrian, the pedestrian has high attention to the taxi. The present invention mainly proceeds from two aspects, firstly, the facial key points detected in the human body key point detection are used for reasoning, and the difference h between the abscissa of the key point 16 and the abscissa of the key point 17 is used for reasoningpOn the basis of the amplification factor of 1.2, a side length of σ h is formedpAs a face area, when the lateral distance h between the key point 16 and the key point 0 is larger than the width of the key pointfGreater than hpMeaning that the face of the pedestrian faces the taxi at an angle to the side, i.e. the pedestrian has less attention to the taxi, the face attention probability ρ is setfWhen h is 0.1fLess than hpTherefore, the face region S is input to a face attention depth network to calculate the face attention probability, the face attention depth network is mainly composed of two parts, the front part is a feature extraction network, facial features are extracted by using the Resnet50 as a reference network, the rear part is a feature connection network composed of full connection layers, the features extracted in the front part are connected to obtain global features, and the global features are output as the face attention probability ρf
B. Intention inference
Step A yields the pedestrian target detection box D, the human key points K of the pedestrian within the box, and the corresponding face attention probability ρ_f. The invention combines a random forest algorithm with a graph convolution network to infer pedestrian intention.
B1. The random forest infers the relation between the connection angles of human body key points and pedestrian intention. Its input is therefore the connection angles of human body key points; to prevent overfitting, only key point angles strongly related to pedestrian car-hailing are selected as input, namely the connection angles whose vertices are key points 1, 2, 3, 5 and 6, and its output is the probability ρ_r that the pedestrian has car-hailing intention. The input connection angles are ∠318, ∠6111, ∠0418, ∠17111, ∠2618 and ∠3617 with key point 1 as the vertex; ∠4123 and ∠5124 with key point 2 as the vertex; ∠156 and ∠157 with key point 5 as the vertex; ∠234, ∠438 and ∠134 with key point 3 as the vertex; and ∠567, ∠7611 and ∠167 with key point 6 as the vertex.
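The patent does not spell out how each ∠ is computed from the key point coordinates; a standard vector-angle computation such as the sketch below is one plausible reading (the example coordinates are made up).

```python
import numpy as np

def joint_angle(a, vertex, c):
    """Angle in degrees at `vertex` between the rays vertex->a and vertex->c."""
    v1 = np.asarray(a, float) - np.asarray(vertex, float)
    v2 = np.asarray(c, float) - np.asarray(vertex, float)
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Example: the angle at key point 5 between key points 1 and 6 (illustrative
# pixel positions, not real detections).
print(joint_angle((120.0, 80.0), (140.0, 90.0), (150.0, 130.0)))
```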
A schematic diagram of the random forest is shown in Fig. 4. The random forest consists of N independent decision trees, with N = 55; different decision trees are trained on different data sets, each yielding a model with its own trained parameters. Each decision tree is a distinct classifier that makes an independent decision on the input data. Decision aggregation uses majority voting: the output is the ratio of the number of trees deciding for car-hailing intention to the total number of trees, i.e. the probability ρ_r that the pedestrian has car-hailing intention.
B2. The graph convolution network infers the relation between the positions of human body key points and pedestrian intention. Its input is therefore a human body graph model G(v, e), where v is the set of nodes, i.e. the human body key points, whose features are the key point coordinates, and e is the set of edges, i.e. the connections between nodes. Because the size of the detection box D obtained by target detection is not fixed, to reduce its influence on intention inference the image coordinates of the key points are converted into relative coordinates with key point 1 as the origin:
$$x_i^{new} = u_i - u_1, \qquad y_i^{new} = v_i - v_1$$
where x_i^new and y_i^new are the transformed abscissa and ordinate of the i-th human body key point; u_i and v_i are its abscissa and ordinate before transformation; and u_1 and v_1 are the abscissa and ordinate of key point 1.
The schematic diagram of the graph convolution network is shown in Fig. 5. The human body graph model is input into the graph convolution network; each node passes its features to its adjacent nodes along the edges between them, and each node likewise aggregates the features passed from its neighbours, realizing feature propagation and aggregation along the edges. To strengthen the expressive power of the model, a ReLU activation function applies a nonlinear mapping to the node features after each graph convolution layer. Finally, a graph readout network composed of fully connected layers aggregates and connects all node features to obtain the final classification result.
The process of the graph convolution network can be summarized as:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}\,H^{(l)}\,W^{(l)}\right)$$

$$Z = \mathrm{Readout}\left(H^{(z)}W^{(z)}\right)$$
where Ã = A + I, A being the adjacency matrix of the human body graph model and I the identity matrix; D̃ is the degree matrix of Ã; H^(l) is the output feature of the l-th graph convolution layer and H^(l+1) that of the (l+1)-th layer; W^(l) is the parameter matrix of the l-th graph convolution layer; σ(·) is the ReLU activation function; Z is the output of the graph convolution network, i.e. the probability ρ_g that the pedestrian has car-hailing intention; H^(z) is the feature matrix of the last graph convolution layer; W^(z) is the parameter matrix of the last graph convolution layer; and Readout(·) is a graph readout network composed of fully connected layers, which aggregates and connects all node features of the human body graph model.
B3. Algorithm fusion
The random forest and the graph convolution network respectively yield the probabilities ρ_r and ρ_g that the pedestrian has car-hailing intention. To obtain more stable and accurate intention inference, the invention proposes a set of logically interpretable fusion rules to fuse the random forest with the graph convolution network; the fusion rules are as follows:
$$p = \begin{cases} \dfrac{\rho_g + \rho_r}{2}, & (\rho_g > 0.5 \text{ and } \rho_r > 0.5)\ \text{or}\ (\rho_g < 0.5 \text{ and } \rho_r < 0.5) \\ \rho_f\,\rho_g + (1-\rho_f)\,\rho_r, & \rho_g > 0.5 \text{ and } \rho_r < 0.5 \\ (1-\rho_f)\,\rho_g + \rho_f\,\rho_r, & \rho_g < 0.5 \text{ and } \rho_r > 0.5 \end{cases}$$
where p is the fused probability that the pedestrian has car-hailing intention. When ρ_g > 0.5 and ρ_r > 0.5, or ρ_g < 0.5 and ρ_r < 0.5, the random forest and the graph convolution network give the same inference result, and the fused probability is p = (ρ_g + ρ_r)/2.
When ρ_g > 0.5 and ρ_r < 0.5, the two algorithms disagree: the graph convolution network infers that the pedestrian has car-hailing intention while the random forest infers that it does not. To obtain a more accurate result, the face attention probability ρ_f is used as a dynamic weight to compute a weighted average of ρ_g and ρ_r: when ρ_f > 0.5 the pedestrian is more likely to be hailing, so the graph convolution network output is given the higher weight and the random forest output the lower weight; when ρ_f < 0.5 the random forest output is given the higher weight and the graph convolution network output the lower weight. When ρ_g < 0.5 and ρ_r > 0.5, the other case of disagreement arises: the graph convolution network infers no car-hailing intention while the random forest infers car-hailing intention. When ρ_f > 0.5 the random forest result is more likely to be correct, so its output is given the higher weight and the graph convolution network output the lower weight; conversely, when ρ_f < 0.5 the graph convolution network output is given the higher weight and the random forest output the lower weight.
The present invention is not limited to the above embodiment; any equivalent concept or modification within the technical scope of the present invention falls within its protection scope.

Claims (1)

1. A vision-based pedestrian car-hailing behavior recognition method, characterized in that it comprises the following steps:
A. Image preprocessing
Image preprocessing uses a target detection algorithm and a human body key point extraction algorithm to obtain pedestrian detection boxes D and the key point parameters K of the pedestrian in each box; in car-hailing behavior inference, the facial attention of a person is a key clue for judging whether that person has car-hailing intention, since in a real scene a pedestrian hailing a taxi pays high attention to the taxi; facial attention is inferred in two stages: first, the facial key points found by human key point detection are used, taking the difference h_p between the abscissas of the left-ear and right-ear key points as a reference and σ as an amplification factor to form a square box S of side length σh_p as the face region; when the lateral distance h_f between the left-ear key point and the nose key point is greater than h_p, the pedestrian's face is turned sideways relative to the taxi, i.e. the pedestrian pays little attention to the taxi; when h_f is less than h_p, the face region S is input into a face attention depth network to compute the pedestrian's face attention probability; the face attention depth network comprises a front network and a rear network, the front network being a feature extraction network that uses ResNet50 as the backbone to extract facial features, and the rear network being a feature connection network composed of fully connected layers that aggregates the extracted facial features into a global feature and outputs the face attention probability ρ_f;
B. Intention inference
Pedestrian intention inference combines a random forest algorithm with a graph convolution network; the specific steps are as follows:
B1. A random forest algorithm infers the relation between the connection angles of human body key points and pedestrian intention; the input of the random forest is the connection angles of human body key points; to prevent overfitting, only key point angles strongly related to pedestrian car-hailing are selected as input, namely the connection angles whose vertices are the neck, left shoulder, right shoulder, left elbow and right elbow key points; the output of the random forest is the probability ρ_r that the pedestrian has car-hailing intention;
B2. A graph convolution network infers the relation between the positions of human body key points and pedestrian intention; the input of the graph convolution network is a human body graph model G(v, e), where v is the set of nodes of the graph model, i.e. the human body key points, whose features are the key point coordinates, and e is the set of edges, i.e. the connections between nodes; because the size of the detection box D obtained by target detection is not fixed, to reduce its influence on intention inference the image coordinates of the key points are converted into relative coordinates with the neck key point as the origin:
$$x_i^{new} = u_i - u_1, \qquad y_i^{new} = v_i - v_1$$
where x_i^new and y_i^new are the transformed abscissa and ordinate of the i-th human body key point; u_i and v_i are its abscissa and ordinate before transformation; and u_1 and v_1 are the abscissa and ordinate of the neck key point;
the process of the graph convolution network is as follows:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}\,H^{(l)}\,W^{(l)}\right)$$

$$Z = \mathrm{Readout}\left(H^{(z)}W^{(z)}\right)$$
where Ã = A + I, A being the adjacency matrix of the human body graph model and I the identity matrix; D̃ is the degree matrix of Ã; H^(l) is the output feature of the l-th graph convolution layer and H^(l+1) that of the (l+1)-th layer; W^(l) is the parameter matrix of the l-th graph convolution layer; σ(·) is the activation function; Z is the output of the graph convolution network, i.e. the probability ρ_g that the pedestrian has car-hailing intention; H^(z) is the feature matrix of the last graph convolution layer; W^(z) is the parameter matrix of the last graph convolution layer; and Readout(·) is a graph readout network composed of fully connected layers, which aggregates and connects all node features of the human body graph model;
B3. Algorithm fusion
The random forest and the graph convolution network respectively output the probability that the pedestrian has car-hailing intention: the random forest output ρ_r and the graph convolution network output ρ_g; to obtain more stable and accurate intention inference, a set of logically interpretable fusion rules is proposed to fuse the random forest with the graph convolution network, the fusion rules being as follows:
$$p = \begin{cases} \dfrac{\rho_g + \rho_r}{2}, & (\rho_g > 0.5 \text{ and } \rho_r > 0.5)\ \text{or}\ (\rho_g < 0.5 \text{ and } \rho_r < 0.5) \\ \rho_f\,\rho_g + (1-\rho_f)\,\rho_r, & \rho_g > 0.5 \text{ and } \rho_r < 0.5 \\ (1-\rho_f)\,\rho_g + \rho_f\,\rho_r, & \rho_g < 0.5 \text{ and } \rho_r > 0.5 \end{cases}$$
where p is the fused probability that the pedestrian has car-hailing intention; when ρ_g > 0.5 and ρ_r > 0.5, or ρ_g < 0.5 and ρ_r < 0.5, the random forest and the graph convolution network give the same inference result, and the fused probability is p = (ρ_g + ρ_r)/2;
when ρ_g > 0.5 and ρ_r < 0.5, the two algorithms give different inference results: the graph convolution network infers that the pedestrian has car-hailing intention while the random forest infers that it does not; to obtain a more accurate result, the face attention probability ρ_f is used as a dynamic weight to compute a weighted average of ρ_g and ρ_r, i.e. when ρ_f > 0.5 the pedestrian is more likely to be hailing, so the graph convolution network output is given the higher weight and the random forest output the lower weight, and when ρ_f < 0.5 the random forest output is given the higher weight and the graph convolution network output the lower weight; when ρ_g < 0.5 and ρ_r > 0.5, the other case of disagreement arises: the graph convolution network infers no car-hailing intention while the random forest infers car-hailing intention; when ρ_f > 0.5 the random forest result is more likely to be correct, so its output is given the higher weight and the graph convolution network output the lower weight; conversely, when ρ_f < 0.5 the graph convolution network output is given the higher weight and the random forest output the lower weight.
CN202111362421.5A 2021-11-17 2021-11-17 Pedestrian calling behavior recognition method based on vision Active CN113989495B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111362421.5A CN113989495B (en) 2021-11-17 2021-11-17 Pedestrian calling behavior recognition method based on vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111362421.5A CN113989495B (en) 2021-11-17 2021-11-17 Pedestrian calling behavior recognition method based on vision

Publications (2)

Publication Number Publication Date
CN113989495A true CN113989495A (en) 2022-01-28
CN113989495B CN113989495B (en) 2024-04-26

Family

ID=79749065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111362421.5A Active CN113989495B (en) 2021-11-17 2021-11-17 Pedestrian calling behavior recognition method based on vision

Country Status (1)

Country Link
CN (1) CN113989495B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117701A (en) * 2018-06-05 2019-01-01 Southeast University Pedestrian intention recognition method based on graph convolution
KR20200121206A (en) * 2019-04-15 2020-10-23 Keimyung University Industry-Academic Cooperation Foundation Teacher-student framework for light weighted ensemble classifier combined with deep network and random forest and the classification method based on thereof
CN112052802A (en) * 2020-09-09 2020-12-08 Shanghai University of Engineering Science Front vehicle behavior identification method based on machine vision
CN113255543A (en) * 2021-06-02 2021-08-13 Xidian University Facial expression recognition method based on graph convolution network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杜启亮; 黄理广; 田联房; 黄迪臻; 靳守杰; 李淼: "Recognition of abnormal passenger behavior on escalators based on video surveillance" (in Chinese), Journal of South China University of Technology (Natural Science Edition), no. 08, 15 August 2020 (2020-08-15) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926823A (en) * 2022-05-07 2022-08-19 Southwest Jiaotong University WGCN-based vehicle driving behavior prediction method

Also Published As

Publication number Publication date
CN113989495B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
Li et al. A unified framework for concurrent pedestrian and cyclist detection
Kortylewski et al. Combining compositional models and deep networks for robust object classification under occlusion
CN108875608B (en) Motor vehicle traffic signal identification method based on deep learning
Silva et al. Automatic detection of motorcyclists without helmet
CN109190444B (en) Method for realizing video-based toll lane vehicle feature recognition system
CN104778453B (en) A kind of night pedestrian detection method based on infrared pedestrian&#39;s brightness statistics feature
Kuang et al. Feature selection based on tensor decomposition and object proposal for night-time multiclass vehicle detection
Yogameena et al. Deep learning‐based helmet wear analysis of a motorcycle rider for intelligent surveillance system
CN107066933A (en) A kind of road sign recognition methods and system
CN105590102A (en) Front car face identification method based on deep learning
Lee et al. Recognizing pedestrian’s unsafe behaviors in far-infrared imagery at night
CN108647700B (en) Multitask vehicle part identification model, method and system based on deep learning
CN109886161B (en) Road traffic identification recognition method based on likelihood clustering and convolutional neural network
JP2016062610A (en) Feature model creation method and feature model creation device
Ming et al. Vehicle detection using tail light segmentation
CN107292933B (en) Vehicle color identification method based on BP neural network
CN108960074B (en) Small-size pedestrian target detection method based on deep learning
CN107315998A (en) Vehicle class division method and system based on lane line
CN109784216B (en) Vehicle-mounted thermal imaging pedestrian detection Rois extraction method based on probability map
CN111832461A (en) Non-motor vehicle riding personnel helmet wearing detection method based on video stream
CN112381101B (en) Infrared road scene segmentation method based on category prototype regression
CN115280373A (en) Managing occlusions in twin network tracking using structured dropping
Cai et al. Vehicle Detection Based on Deep Dual‐Vehicle Deformable Part Models
CN113989495A (en) Vision-based pedestrian calling behavior identification method
Zhou et al. A novel object detection method in city aerial image based on deformable convolutional networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant