Pedestrian gesture recognition and interaction method based on a depthwise separable convolutional network
Technical Field
The invention relates to a pedestrian gesture recognition and interaction technology based on a depthwise separable convolutional network, and belongs to the technical field of advanced driver assistance for automobiles.
Background
Driving environment perception is an important function of advanced driver assistance systems (ADAS). Pedestrians, as an important component of public traffic scenes, have a significant impact on vehicle driving decisions. Currently, most research focuses on how to drive autonomous vehicles efficiently and safely, and research on interaction with pedestrians is lacking. Therefore, as an important part of driving environment perception, recognizing pedestrian gestures and interacting with pedestrians is urgently needed.
Currently, there are two main approaches to pedestrian gesture recognition. The first is based on traditional statistical learning and relies on complicated feature engineering to obtain pedestrian gesture information. The second uses deep learning: image information is extracted with a convolutional network, and a suitable loss function is designed on the output feature map to train the model, which finally achieves pedestrian gesture recognition. Although the traditional statistical learning method based on feature engineering requires little computation and is simple to implement, its recognition accuracy is poor because the feature engineering is too complex; although models based on deep convolutional networks achieve high recognition accuracy, most of them need a high-performance GPU to run in real time.
Chinese patent application publication No. CN107423679A proposes a pedestrian intention detection method and system. The method comprises: arranging a distance sensor to collect target form data in an observation area; acquiring trajectory information of each target based on the target's existing state information; and judging the action intention of each target according to its movement trajectory and spatial information. This method only predicts the walking trajectory of a pedestrian and does not achieve interaction between the pedestrian and the vehicle. In addition, Chinese patent application publication No. CN104915628A proposes a pedestrian intention detection model for an automated vehicle. The method comprises: acquiring the basic scene elements of the traffic scene around a pedestrian that relate to the pedestrian's movement intention; analyzing, based on the basic scene elements and three-dimensional (3D) distance information of the pedestrian over time, the relationship between changes in the pedestrian's state while walking and each surrounding basic scene element; establishing a context correlation model between the pedestrian and all surrounding basic scene elements using the obtained relationships; and predicting the pedestrian's next motion state using the established context correlation model and the current scene elements related to the pedestrian that are obtained in real time, so as to generate a prediction of the pedestrian's next motion. This method likewise involves no interaction between pedestrians and vehicles, must identify a large amount of additional scene information and 3D information, is computationally very expensive, and does not specify how to handle multiple pedestrians present at the same time.
Disclosure of Invention
The technical problem to be solved by the invention is as follows:
The invention provides a pedestrian gesture recognition and interaction method based on a depthwise separable convolutional network, and aims to solve the problems of high model computation cost, slow recognition speed, and poor pedestrian-vehicle interaction when an autonomous vehicle recognizes and interacts with pedestrian gestures.
The invention adopts the following technical scheme for solving the technical problems:
The invention provides a pedestrian gesture recognition and interaction method based on a depthwise separable convolutional network, comprising the following steps:
step one, collecting an image containing a pedestrian;
step two, inputting the image into a depthwise separable convolutional network to detect a pedestrian bounding box, inputting the image of the bounding box region into a gesture recognition network, and outputting a feature map of the pedestrian region;
step three, calculating the joint point coordinates and classifying them to obtain a gesture recognition result;
step four, sorting the gestures by priority;
and step five, obtaining the final interaction decision of the moving vehicle according to the gesture with the highest priority.
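The data flow of steps two to five can be sketched as a single function that wires three stages together. This is only an illustrative skeleton: the function names `process_frame`, `detect_pedestrians`, `recognize_gesture`, and `decide` are hypothetical and do not come from the invention, and the stages are passed in as callables so that no particular network or classifier is assumed.

```python
from typing import Any, Callable, List, Sequence


def process_frame(
    frame: Any,
    detect_pedestrians: Callable[[Any], Sequence[Any]],
    recognize_gesture: Callable[[Any, Any], str],
    decide: Callable[[List[str]], str],
) -> str:
    """Run one camera frame through detection, gesture recognition, and decision.

    All three callables are hypothetical placeholders for the stages
    described in steps two to five.
    """
    # Step two: detect one bounding box per pedestrian.
    boxes = detect_pedestrians(frame)
    # Steps two and three: recognize one gesture per bounding box.
    gestures = [recognize_gesture(frame, box) for box in boxes]
    # Steps four and five: rank the gestures and pick the vehicle response.
    return decide(gestures)
```

Passing the stages as arguments keeps the sketch independent of how each stage is implemented.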
In the pedestrian gesture recognition and interaction method based on the depthwise separable convolutional network described above, further, the depthwise separable convolution module used in step two specifically comprises:
step 2.1, depthwise convolution;
step 2.2, batch normalization;
step 2.3, ReLU activation;
step 2.4, pointwise convolution;
step 2.5, batch normalization;
and step 2.6, ReLU activation.
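The six sub-steps above can be illustrated with a minimal NumPy sketch of one separable convolution module. Several simplifications are assumptions made for illustration only: stride 1, zero padding, an odd kernel size, and a batch normalization that uses the statistics of the single input and omits the learned scale and shift. The function names are hypothetical.

```python
import numpy as np


def depthwise_conv(x, kernels):
    """Apply one k x k kernel per input channel (stride 1, zero padding).

    x: (H, W, C) input, kernels: (k, k, C) with odd k.
    """
    k = kernels.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    h, w, c = x.shape
    out = np.zeros((h, w, c), dtype=np.float64)
    for i in range(k):
        for j in range(k):
            # Each channel is only ever multiplied by its own kernel.
            out += xp[i:i + h, j:j + w, :] * kernels[i, j, :]
    return out


def pointwise_conv(x, weights):
    """1 x 1 convolution: mix channels at every position.

    x: (H, W, C_in), weights: (C_in, C_out).
    """
    return x @ weights


def batch_norm(x, eps=1e-5):
    """Simplified normalization per channel (no learned scale or shift)."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def relu(x):
    return np.maximum(x, 0.0)


def separable_block(x, dw_kernels, pw_weights):
    """Steps 2.1 to 2.6 in order."""
    x = relu(batch_norm(depthwise_conv(x, dw_kernels)))      # 2.1, 2.2, 2.3
    return relu(batch_norm(pointwise_conv(x, pw_weights)))   # 2.4, 2.5, 2.6
```

For example, an input of shape (8, 8, 3) with kernels of shape (3, 3, 3) and pointwise weights of shape (3, 16) yields an output of shape (8, 8, 16).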
In the pedestrian gesture recognition and interaction method based on the depthwise separable convolutional network described above, further, each feature point in the feature map in step two comprises the probabilities that each of 12 human body joint points is present at that feature point and the offset vector of each joint point at that point.
In the pedestrian gesture recognition and interaction method based on the depthwise separable convolutional network described above, further, a depthwise separable convolution structure is adopted to simplify the model used for joint point classification in step two.
In the pedestrian gesture recognition and interaction method based on the depthwise separable convolutional network described above, further, the joint point classification in step three specifically comprises:
step 3.1, calculating the joint point coordinates: using the joint point confidence feature maps obtained in step two together with the corresponding offset vector feature maps, finding the point with the highest confidence in each confidence feature map to determine the joint point type, and then obtaining the joint point position from the offset vector at that point, thereby obtaining the complete information of the human body joint points;
step 3.2, normalization: after the coordinates of the human body joint points are obtained, taking the midpoint of the line connecting the left shoulder and the right shoulder as the center, subtracting the coordinates of this center point from all joint points, and then performing normalization;
step 3.3, classification: classifying the normalized data with a support vector machine or a single fully connected layer to obtain the final pedestrian gesture recognition result.
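Steps 3.1 and 3.2 can be sketched in NumPy as follows. The text does not fix several details, so the following are assumptions labeled as such: the confidence maps have shape (S, S, J), the offsets have shape (S, S, J, 2) holding (dx, dy) pairs, the peak position is scaled by a `stride` parameter before the offset is added, the shoulders are joints 0 and 1, and the normalization divides by the shoulder distance. The function names are hypothetical.

```python
import numpy as np


def decode_joints(heatmaps, offsets, stride=1):
    """Step 3.1: locate each joint at its confidence peak plus the offset there.

    heatmaps: (S, S, J) confidences, offsets: (S, S, J, 2) as (dx, dy).
    Returns (J, 2) coordinates as (x, y) and (J,) peak confidences.
    The layout and the stride scaling are assumptions for illustration.
    """
    rows, cols, num_joints = heatmaps.shape
    coords = np.zeros((num_joints, 2))
    conf = np.zeros(num_joints)
    for j in range(num_joints):
        flat_index = np.argmax(heatmaps[:, :, j])
        row, col = np.unravel_index(flat_index, (rows, cols))
        conf[j] = heatmaps[row, col, j]
        dx, dy = offsets[row, col, j]
        coords[j] = (col * stride + dx, row * stride + dy)
    return coords, conf


def normalize_joints(coords, left_shoulder=0, right_shoulder=1):
    """Step 3.2: center on the shoulder midpoint, then rescale.

    Dividing by the shoulder distance is an assumed choice of normalization;
    the joint indices are hypothetical.
    """
    center = (coords[left_shoulder] + coords[right_shoulder]) / 2.0
    centered = coords - center
    scale = np.linalg.norm(coords[left_shoulder] - coords[right_shoulder])
    return centered / scale if scale > 0 else centered
```

After normalization the two shoulders are symmetric about the origin, so the resulting features do not depend on where the pedestrian stands in the image or, under the assumed scaling, on how large the pedestrian appears.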
In the pedestrian gesture recognition and interaction method based on the depthwise separable convolutional network described above, further, in step five, when several pedestrians around the vehicle are detected making different gestures at the same time, the action decision is made with the most conservative strategy according to the priorities of the pedestrians' gestures. When several pedestrians appear in front of the vehicle at the same time, the model must recognize all of their gestures simultaneously; after the gesture information of the pedestrians is obtained, the gestures are sorted by priority and the most conservative strategy is adopted in response. For example, if some pedestrians require the vehicle to decelerate and others require it to stop, the stop strategy is executed first. This maximizes the likelihood of traffic safety.
The model updates the state of the pedestrians in the field of view in a timely manner, and when no pedestrian is in the field of view or none of the pedestrians' gestures requires the vehicle to give way, the vehicle enters the normal driving state.
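The priority ordering and the most conservative strategy can be sketched with an ordered enumeration. The gesture labels, the `Action` names, and the choice to treat an unrecognized label as a stop request are all assumptions made for illustration; the text only states that stopping takes precedence over decelerating and that the vehicle drives normally when no pedestrian requires it to give way.

```python
from enum import IntEnum
from typing import Iterable


class Action(IntEnum):
    """Vehicle responses ordered from least to most conservative."""
    PROCEED = 0
    SLOW_DOWN = 1
    STOP = 2


# Hypothetical gesture labels mapped to the response each one requests.
GESTURE_TO_ACTION = {
    "none": Action.PROCEED,
    "go_ahead": Action.PROCEED,
    "slow_down": Action.SLOW_DOWN,
    "stop": Action.STOP,
}


def decide(gestures: Iterable[str]) -> Action:
    """Sort the requested actions by priority and return the most conservative.

    Unknown labels are treated as STOP (an assumed, cautious default).
    With no pedestrians in view the vehicle proceeds normally.
    """
    requested = sorted(
        (GESTURE_TO_ACTION.get(g, Action.STOP) for g in gestures),
        reverse=True,
    )
    return requested[0] if requested else Action.PROCEED
```

Calling `decide` again on every new frame corresponds to updating the pedestrian state in the field of view: once the stop or slow-down gestures disappear, the result falls back to `Action.PROCEED`.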
Compared with the prior art, the invention, by adopting the above technical scheme, has the following technical effects:
Because the method is built on a depthwise separable convolution model, the model size is reduced several-fold compared with a conventional deep learning model, no dedicated hardware or GPU is required, and the application cost is reduced. At the same time, recognition accuracy is maintained, which greatly broadens the range of application scenarios. The technical scheme provided by the invention enables real-time recognition of pedestrian gesture information on low-power mobile devices such as mobile phones. After the information is recognized, the vehicle and the pedestrian can interact effectively. In addition, when multiple pedestrians are in front of the vehicle, the model makes its decision using the most conservative strategy according to the priority of the pedestrian gestures, safeguarding traffic safety to the greatest extent.
Drawings
FIG. 1 is a schematic diagram of a depthwise separable convolutional network;
FIG. 2 is a schematic diagram of the method of the present invention.
Detailed Description
The technical scheme of the invention is explained in further detail below with reference to the accompanying drawings:
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The invention provides a pedestrian gesture recognition and interaction method based on a depthwise separable convolutional network. FIG. 2 is a schematic diagram of the method of the present invention. As shown in FIG. 2, the method comprises the following steps:
First, an image of the scene ahead is captured by a camera mounted at the front of the vehicle. The forward-looking camera used in the invention collects video at 1280 × 720 at 60 FPS. Each video frame is a color image containing three-channel RGB color information, represented as a tensor of dimensions (1280, 720, 3); each element of the tensor is an integer in the range [0, 255].
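A frame in this format can be represented as an 8-bit unsigned integer array. The sketch below builds a placeholder frame in the (1280, 720, 3) layout given in the text and converts it to floating point; scaling to [0, 1] is an assumed preprocessing step that the text does not specify, and `to_unit_range` is a hypothetical helper. Note that many image libraries order the axes as (height, width, channels), which would give (720, 1280, 3) for the same frame.

```python
import numpy as np

WIDTH, HEIGHT, CHANNELS = 1280, 720, 3

# Placeholder frame in the (1280, 720, 3) layout stated in the text.
# uint8 stores exactly the integer range [0, 255].
frame = np.zeros((WIDTH, HEIGHT, CHANNELS), dtype=np.uint8)


def to_unit_range(image):
    """Convert an 8-bit RGB frame to float32 values in [0, 1].

    This scaling is an assumed preprocessing step, not part of the text.
    """
    if image.dtype != np.uint8 or image.shape[-1] != CHANNELS:
        raise ValueError("expected an 8-bit image with 3 color channels")
    return image.astype(np.float32) / 255.0
```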
The image is then input into a depthwise separable convolutional neural network to detect pedestrian bounding boxes. The invention uses the depthwise separable convolution structure to split the conventional convolution into two steps, a depthwise convolution and a pointwise convolution; this split reduces the model size several-fold while preserving recognition performance. FIG. 1 is a schematic diagram of a depthwise separable convolutional network. As shown in FIG. 1, the structure divides the ordinary convolution operation into a depthwise convolution and a pointwise convolution. The depthwise convolution applies a separate convolution kernel to each input channel, that is, one kernel per input channel; the pointwise convolution is an ordinary convolution that uses a 1 × 1 kernel. A feature map is extracted by cascading several depthwise separable convolution modules, and the pedestrian bounding boxes are obtained from the feature map.
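The size reduction can be checked with a short parameter count. For a k × k kernel with C_in input channels and C_out output channels, a conventional convolution has k·k·C_in·C_out weights, while the separable form has k·k·C_in weights for the depthwise step plus C_in·C_out for the 1 × 1 step, giving a ratio of 1/C_out + 1/k². The layer sizes in the usage note are arbitrary illustrative values, not taken from the invention, and bias terms are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a conventional k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k step plus a 1 x 1 pointwise step."""
    depthwise = k * k * c_in      # one k x k kernel per input channel
    pointwise = c_in * c_out      # 1 x 1 kernels that mix channels
    return depthwise + pointwise


def reduction_ratio(k: int, c_in: int, c_out: int) -> float:
    """Separable weight count divided by the conventional weight count."""
    return separable_conv_params(k, c_in, c_out) / standard_conv_params(k, c_in, c_out)
```

With k = 3, C_in = 64, and C_out = 128, the conventional layer has 73,728 weights and the separable one has 8,768, roughly an 8.4-fold reduction for that layer.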
The resulting pedestrian region image is then input into the gesture recognition network. A feature extraction network for human body joint points is constructed by cascading several depthwise separable convolution modules. The feature map output by the pedestrian gesture recognition network comprises S × 36 features, where S represents the size of the output feature map and each feature point is a feature vector of 36 values. These 36 values contain the probabilities that each of the 12 human body joint points is present at the feature point, together with the offset vector of each joint point at that point. The coordinates of the pedestrian's body joint points are obtained by combining the probability feature maps with the offset vector maps.
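One reading consistent with the count of 36 is 12 presence probabilities plus 12 two-dimensional offset vectors (12 + 12 × 2 = 36). The sketch below splits a feature map under that reading. The channel ordering (probabilities first, then one (dx, dy) pair per joint) is an assumption, since the text does not state the layout, and `split_features` is a hypothetical helper that accepts any leading shape.

```python
import numpy as np

NUM_JOINTS = 12
FEATURES_PER_POINT = NUM_JOINTS + 2 * NUM_JOINTS  # 12 probabilities + 24 offset values = 36


def split_features(feature_map):
    """Split (..., 36) features into probabilities and 2-D offsets.

    Returns probabilities of shape (..., 12) and offsets of shape (..., 12, 2).
    The channel ordering is assumed for illustration.
    """
    if feature_map.shape[-1] != FEATURES_PER_POINT:
        raise ValueError("expected 36 values per feature point")
    probabilities = feature_map[..., :NUM_JOINTS]
    offsets = feature_map[..., NUM_JOINTS:].reshape(
        feature_map.shape[:-1] + (NUM_JOINTS, 2)
    )
    return probabilities, offsets
```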
After the coordinates of the human body joint points are obtained, the midpoint of the line connecting the left shoulder and the right shoulder is taken as the center, the coordinates of this center point are subtracted from all joint points, and normalization is performed. Finally, the normalized data are classified with a support vector machine or a single fully connected layer to obtain the final pedestrian gesture recognition result.
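The fully connected option for the final classification can be sketched as a single linear layer followed by a softmax. The input is assumed to be the already normalized (12, 2) joint coordinates flattened to 24 values; the weights and bias would come from training, and the gesture labels in the usage note are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    shifted = z - z.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def classify_gesture(normalized_joints, weights, bias, labels):
    """Classify normalized joints with one fully connected layer.

    normalized_joints: (12, 2), weights: (24, num_classes), bias: (num_classes,).
    Returns the predicted label and the class probabilities.
    """
    features = normalized_joints.reshape(-1)       # flatten to 24 values
    probabilities = softmax(features @ weights + bias)
    return labels[int(np.argmax(probabilities))], probabilities
```

For instance, with labels such as `["none", "slow_down", "stop"]`, the function returns the label whose score is highest together with the probability assigned to each class.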
In this step, the gesture recognition network uses a depthwise separable convolution structure to simplify the model, and the final gesture classification result is obtained with a support vector machine or a fully connected layer.
When several pedestrians appear in front of the vehicle at the same time, the model must recognize all of their gestures simultaneously; after the gesture information of the pedestrians is obtained, the gestures are sorted by priority and the most conservative strategy is adopted in response. For example, if some pedestrians require the vehicle to decelerate and others require it to stop, the stop strategy is executed first. This maximizes the likelihood of traffic safety.
When no pedestrian is in front of the vehicle, or no pedestrian gesture in the field of view makes a request of the vehicle, the vehicle enters the normal driving state.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.