CN111062311A - Pedestrian gesture recognition and interaction method based on depth-level separable convolutional network - Google Patents
Pedestrian gesture recognition and interaction method based on depth-level separable convolutional network
- Publication number
- CN111062311A
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- depth
- gesture recognition
- network
- point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/54—Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention relates to a pedestrian gesture recognition and interaction method based on a depth-level separable convolutional network, comprising the following steps: an image containing a pedestrian is captured by a forward-looking camera system mounted on a vehicle; the image is input into a depth-level separable convolutional network to detect a pedestrian bounding box; and the image of the bounding box region is input into a gesture recognition network for gesture recognition, which outputs a feature map of the pedestrian region. The gesture recognition network extracts features through depth-level separable convolution layers and predicts, at each point of the output feature map, information on 12 human body joint points and the corresponding 12 offset vectors. The pedestrian gesture is then recognized by classifying the joint points, and the vehicle makes its decision by applying the most conservative strategy to the recognized pedestrian gestures in combination with their priorities. Because the model is implemented with depth-level separable convolutions, its size is reduced several-fold, so that detection can run on low-power mobile terminals such as smartphones.
Description
Technical Field
The invention relates to a pedestrian gesture recognition and interaction technology based on a depth-level separable convolutional network, and belongs to the technical field of advanced driver assistance for automobiles.
Background
Driving environment perception is an important function of advanced driver assistance systems (ADAS). Pedestrians, as an important component of public traffic scenes, have a significant impact on vehicle driving decisions. Currently, most research focuses on how to make autonomous vehicles drive efficiently and safely, and there is little research on interaction with pedestrians. Therefore, as an important part of driving environment perception, recognizing pedestrian gestures and interacting with pedestrians is urgently needed.
Currently, there are two main approaches to pedestrian gesture recognition. The first is based on traditional statistical learning and relies on complicated feature engineering to obtain the pedestrian's gesture information. The second uses deep learning: image information is extracted by a convolutional network, and a suitable loss function is designed on the output feature map to train the model, which then recognizes the pedestrian's gesture. Although traditional statistical learning based on feature engineering requires little computation and is simple to implement, its recognition accuracy is poor because the feature engineering is overly complex; although models based on deep convolutional networks achieve high recognition accuracy, most of them need a high-performance GPU to run in real time.
Chinese patent application publication No. CN107423679A proposes a pedestrian intention detection method and system, the method comprising: arranging a distance sensor to collect target shape data in an observation area; acquiring trajectory information of each target based on the target's existing state information; and judging the action intention of each target according to its movement trajectory and spatial information. This method only predicts the pedestrian's walking trajectory and does not achieve interaction between pedestrian and vehicle. In addition, Chinese patent application publication No. CN104915628A proposes a pedestrian intention detection model for automated vehicles, the method comprising: acquiring the basic scene elements of the traffic scene around a pedestrian that are related to the pedestrian's movement intention; analyzing, based on the basic scene elements and the three-dimensional (3D) distance information of the pedestrian over time, the relationship between changes in the pedestrian's state while walking and each surrounding basic scene element; establishing a context correlation model between the pedestrian and all surrounding basic scene elements using the obtained relationships; and predicting the pedestrian's next motion state with the established context correlation model, based on the current scene elements related to the pedestrian that are obtained in real time, so as to generate a prediction of the pedestrian's next motion. This method likewise involves no interaction between pedestrians and vehicles, needs to identify a large amount of additional scene information and 3D information, is computationally very expensive, and does not indicate how to handle multiple pedestrians appearing at the same time.
Disclosure of Invention
The technical problem to be solved by the invention is as follows:
The invention provides a pedestrian gesture recognition and interaction method based on a depth-level separable convolutional network, and aims to solve the problems of heavy model computation, low recognition speed, and poor pedestrian-vehicle interactivity when an autonomous vehicle recognizes and responds to pedestrian gestures.
The invention adopts the following technical scheme for solving the technical problems:
The invention provides a pedestrian gesture recognition and interaction method based on a depth-level separable convolutional network, characterized by comprising the following steps:
step one, collecting an image containing a pedestrian;
step two, inputting the image into a depth-level separable convolutional network, detecting a pedestrian bounding box, inputting the image of the bounding box region into a gesture recognition network, and outputting a feature map of the pedestrian region;
step three, calculating the joint point coordinates and classifying them to obtain a gesture recognition result;
step four, sorting the gestures by priority;
and step five, obtaining the final interaction decision of the moving vehicle according to the gesture with the highest priority.
Further, in the pedestrian gesture recognition and interaction method based on the depth-level separable convolutional network described above, the depth-level separable convolutional neural network in step two specifically comprises:
step 2.1, depthwise convolution;
step 2.2, batch normalization;
step 2.3, ReLU activation;
step 2.4, pointwise convolution;
step 2.5, batch normalization;
and step 2.6, ReLU activation.
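The six sub-steps above can be sketched in plain Python as follows. This is only an illustrative sketch, not the patented implementation: the function names, the nested-list tensor layout `[channel][row][column]`, the stride-1 convolution without padding, and the batch normalization that uses statistics of the current input without learned scale or shift are all assumptions made for brevity.

```python
import math

def depthwise_conv(x, kernels):
    """x: [C][H][W]; kernels: [C][K][K]. One kernel per input channel,
    stride 1, no padding (assumed for this sketch)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0])
    out_h, out_w = h - k + 1, w - k + 1
    return [[[sum(x[ch][i + di][j + dj] * kernels[ch][di][dj]
                  for di in range(k) for dj in range(k))
              for j in range(out_w)]
             for i in range(out_h)]
            for ch in range(c)]

def pointwise_conv(x, weights):
    """x: [C_in][H][W]; weights: [C_out][C_in]. A 1 x 1 convolution that
    mixes channels at every spatial position."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w_row[ch] * x[ch][i][j] for ch in range(c_in))
              for j in range(w)]
             for i in range(h)]
            for w_row in weights]

def batch_norm(x, eps=1e-5):
    """Normalize each channel to zero mean and unit variance using the
    statistics of the input itself (simplification: no learned parameters)."""
    out = []
    for channel in x:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        out.append([[(v - mean) * scale for v in row] for row in channel])
    return out

def relu(x):
    """Element-wise max(0, v)."""
    return [[[max(0.0, v) for v in row] for row in channel] for channel in x]

def separable_block(x, dw_kernels, pw_weights):
    """Steps 2.1 to 2.6 in order."""
    x = relu(batch_norm(depthwise_conv(x, dw_kernels)))      # steps 2.1-2.3
    return relu(batch_norm(pointwise_conv(x, pw_weights)))   # steps 2.4-2.6
```

With a 2-channel 5 × 5 input, 3 × 3 depthwise kernels, and a 3 × 2 pointwise weight matrix, `separable_block` returns a 3-channel 3 × 3 output; cascading several such blocks gives the kind of feature extractor the method describes.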
Further, in the method described above, each feature point in the feature map in step two comprises the probabilities that each of the 12 human body joint points is present at that feature point, and the offset vector of each joint point at that point.
Further, in the method described above, the joint point classification in step two adopts a depth-level separable convolution structure to simplify the model.
Further, in the method described above, the specific steps of classifying the joint points in step three comprise:
step 3.1, calculating the joint point coordinates: combining the confidence feature maps of the human body joint point distribution contained in the feature points obtained in step two with the offset vector feature maps of the corresponding points, finding the point with the highest confidence in each feature map to determine the joint point type, and then obtaining the joint point position from the offset vector, thereby obtaining the complete human body joint point information;
step 3.2, normalization: after the human body joint point coordinates are obtained, taking the midpoint of the line connecting the left and right shoulders as the center, subtracting the coordinates of this center from all joint points, and then performing normalization;
step 3.3, classification: classifying the normalized data with a support vector machine or a single-layer fully connected network to obtain the final pedestrian gesture recognition result.
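Steps 3.2 and 3.3 can be illustrated with the following minimal sketch. Several details are assumptions, since the text does not specify them: the shoulder joint indices, the choice of shoulder width as the normalization scale, and the weights of the single fully connected layer (here hypothetical placeholders; a trained support vector machine could be used instead).

```python
import math

# Hypothetical joint indices; the source does not define the joint ordering.
LEFT_SHOULDER, RIGHT_SHOULDER = 0, 1

def normalize_joints(joints):
    """joints: list of (x, y) coordinates.
    Center on the midpoint of the shoulder line (step 3.2), then divide by
    the shoulder width (assumed normalization; the source leaves it open)."""
    (lx, ly), (rx, ry) = joints[LEFT_SHOULDER], joints[RIGHT_SHOULDER]
    cx, cy = (lx + rx) / 2.0, (ly + ry) / 2.0
    scale = math.hypot(lx - rx, ly - ry) or 1.0  # guard against zero width
    return [((x - cx) / scale, (y - cy) / scale) for x, y in joints]

def classify(normalized, weights, biases):
    """One fully connected layer (step 3.3): flatten the joints, compute one
    score per gesture class, and return the index of the highest score."""
    features = [v for point in normalized for v in point]
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Centering on the shoulder midpoint makes the features independent of where the pedestrian stands in the image; dividing by a body-related length additionally removes the dependence on distance from the camera, which is why it is chosen as the scale in this sketch.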
Further, in the method described above, in step five, when multiple pedestrians around the vehicle are detected making different gestures at the same time, the action decision is made with the most conservative strategy according to the priorities of the pedestrians' gestures. When multiple pedestrians appear in front of the vehicle at the same time, the model needs to recognize the gestures of all of them simultaneously; after the gesture information of the pedestrians is obtained, the gestures are sorted by priority, and the most conservative strategy is then adopted in response. For example, if some pedestrians ask the vehicle to decelerate and others ask it to stop, the stopping strategy is executed first. This maximizes the likelihood of traffic safety.
The model updates the state of the pedestrians in the field of view in a timely manner; when there is no pedestrian in the field of view, or none of the pedestrians' gestures requires the vehicle to give way, the vehicle enters the normal driving state.
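The priority-based decision described above can be sketched as follows. The gesture labels, their numeric priorities, and the action names are hypothetical; the source only states that a stop request takes precedence over a deceleration request and that the vehicle drives normally when no pedestrian asks it to give way.

```python
# Hypothetical gesture labels; a higher number means a more conservative response.
PRIORITY = {"no_request": 0, "slow_down": 1, "stop": 2}

# Hypothetical mapping from the winning gesture to a vehicle action.
ACTION = {"no_request": "drive_normally", "slow_down": "decelerate", "stop": "stop"}

def decide(gestures):
    """gestures: one recognized gesture label per pedestrian in view.
    Returns the action for the highest-priority (most conservative) gesture,
    or normal driving when no pedestrian is in view."""
    if not gestures:
        return "drive_normally"
    most_conservative = max(gestures, key=PRIORITY.__getitem__)
    return ACTION[most_conservative]
```

For instance, `decide(["slow_down", "stop", "no_request"])` selects the stop action, while `decide([])` and `decide(["no_request"])` both fall back to normal driving.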
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
Because the method is implemented with a depth-level separable convolution model, the model size is reduced several-fold compared with traditional deep learning models, no dedicated hardware or GPU is required, and the application cost is lowered. At the same time, recognition accuracy is maintained, which greatly broadens the range of application scenarios. The technical scheme provided by the invention can recognize pedestrian gesture information in real time on low-power mobile devices such as mobile phones. After the information is recognized, the vehicle and the pedestrian interact effectively. In addition, for scenes with multiple pedestrians in front of the vehicle, the model makes its decision with the most conservative strategy according to the priorities of the pedestrian gestures, ensuring traffic safety to the greatest extent.
Drawings
FIG. 1 is a schematic diagram of a depth-level separable convolutional network;
FIG. 2 is a schematic of the process of the present invention.
Detailed Description
The technical scheme of the invention is further explained in detail by combining the attached drawings:
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The invention provides a pedestrian gesture recognition and interaction method based on a depth-level separable convolutional network. As shown in FIG. 2, which is a schematic of the process of the present invention, the method comprises the following steps:
A front image is first captured by a camera mounted at the front of the vehicle. The forward-looking camera used in the invention collects video at 1280 × 720 @ 60 FPS. The video frames are color images containing three-channel RGB color information and are represented by a tensor of dimensions (1280, 720, 3); each element of the tensor is an integer in the range [0, 255].
The image is then input into a depth-level separable convolutional neural network to detect pedestrian bounding boxes. The invention uses the depth-level separable convolution structure to split the traditional convolution into two steps, a depthwise convolution and a pointwise convolution; this split reduces the model size several-fold while preserving the model's recognition performance. FIG. 1 is a schematic diagram of the depth-level separable convolutional network. As shown in FIG. 1, the structure divides the ordinary convolution operation into a depthwise convolution and a pointwise convolution. The depthwise convolution applies a different convolution kernel to each input channel, that is, one convolution kernel corresponds to one input channel; the pointwise convolution is an ordinary convolution, except that it uses a 1 × 1 convolution kernel. A feature map is extracted by cascading several depth-level separable convolution modules, and the pedestrian bounding boxes are obtained from the feature map.
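To make the size reduction concrete, the following short calculation compares the number of weights in an ordinary convolution layer with that of its separable counterpart. The kernel size and channel counts are illustrative values chosen for this example, not figures from the patent, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in an ordinary k x k convolution: every output channel has
    its own k x k kernel for every input channel."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in the split version of the same layer."""
    depthwise = k * k * c_in   # one k x k kernel per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Illustrative layer: 3 x 3 kernels, 64 input channels, 128 output channels.
k, c_in, c_out = 3, 64, 128
standard = standard_conv_params(k, c_in, c_out)     # 73728
separable = separable_conv_params(k, c_in, c_out)   # 576 + 8192 = 8768
ratio = separable / standard                        # equals 1/c_out + 1/k**2
print(standard, separable, round(ratio, 4))
```

For this layer the separable version needs roughly 12% of the weights. The ratio simplifies to 1/c_out + 1/k², so with 3 × 3 kernels and many output channels the saving approaches a factor of nine.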
The obtained pedestrian region image is then input into the gesture recognition network. A feature extraction network for human body joint points is constructed by cascading several depth-level separable convolution modules. The feature map output by the pedestrian gesture recognition network comprises S × 36 features, where S is the size of the output feature map and each feature point is a feature vector of 36 values. These 36 values contain the probabilities that each of the 12 human body joint points is present at the feature point, and the offset vector of each joint point at that point (12 probabilities plus 12 two-component offset vectors, 12 + 12 × 2 = 36). The coordinates of the pedestrian's body joint points are obtained by combining the probability feature maps with the offset vector maps.
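A minimal sketch of how joint coordinates might be decoded from such a feature map is given below. The layout of the 36 values (12 probabilities first, followed by 12 `(dx, dy)` pairs), the `stride` parameter that maps a grid cell back to image coordinates, and the function name are assumptions made for illustration; the source does not specify them.

```python
NUM_JOINTS = 12

def decode_joints(feature_map, stride=1.0):
    """feature_map: grid [row][col] of 36-value vectors.
    Assumed layout: vec[k] is the probability of joint k at that cell, and
    vec[12 + 2*k], vec[13 + 2*k] are the (dx, dy) offset of joint k.
    Returns one (x, y, confidence) tuple per joint."""
    joints = []
    for k in range(NUM_JOINTS):
        best = None
        # Find the cell where joint k has the highest confidence.
        for r, row in enumerate(feature_map):
            for c, vec in enumerate(row):
                if best is None or vec[k] > best[0]:
                    best = (vec[k], r, c)
        conf, r, c = best
        dx = feature_map[r][c][12 + 2 * k]
        dy = feature_map[r][c][13 + 2 * k]
        # Refine the coarse cell position with the predicted offset.
        joints.append((c * stride + dx, r * stride + dy, conf))
    return joints
```

The coarse grid cell gives only an approximate location; adding the offset vector recovers a position finer than the feature map resolution, which is why both maps are combined.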
After the human body joint point coordinates are obtained, the midpoint of the line connecting the left and right shoulders is taken as the center, the coordinates of this center are subtracted from all joint points, and normalization is performed. Finally, the normalized data are classified with a support vector machine or a single-layer fully connected network to obtain the final pedestrian gesture recognition result.
In this step, the gesture recognition network uses the depth-level separable convolution structure to simplify the model, and the final gesture classification result is obtained with a support vector machine or a fully connected layer.
When multiple pedestrians appear in front of the vehicle at the same time, the model needs to recognize the gestures of all of them simultaneously; after the gesture information of the pedestrians is obtained, the gestures are sorted by priority, and the most conservative strategy is then adopted in response. For example, if some pedestrians ask the vehicle to decelerate and others ask it to stop, the stopping strategy is executed first. This maximizes the likelihood of traffic safety.
When there is no pedestrian in front of the vehicle, or no pedestrian gesture in the field of view makes any additional request of the vehicle, the vehicle enters the normal driving state.
The foregoing is only a partial embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.
Claims (6)
1. A pedestrian gesture recognition and interaction method based on a depth-level separable convolutional network is characterized by comprising the following steps:
step one, collecting an image containing a pedestrian;
step two, inputting the image into a depth-level separable convolutional network, detecting a pedestrian bounding box, inputting the image of the bounding box region into a gesture recognition network, and outputting a feature map of the pedestrian region;
step three, calculating the joint point coordinates and classifying them to obtain a gesture recognition result;
step four, sorting the gestures by priority;
and step five, obtaining the final interaction decision of the moving vehicle according to the gesture with the highest priority.
2. The pedestrian gesture recognition and interaction method based on the depth-level separable convolutional network as claimed in claim 1, wherein the depth-level separable convolutional neural network in step two specifically comprises:
step 2.1, depthwise convolution;
step 2.2, batch normalization;
step 2.3, ReLU activation;
step 2.4, pointwise convolution;
step 2.5, batch normalization;
and step 2.6, ReLU activation.
3. The method as claimed in claim 1, wherein the feature points in the feature map in step two include the probabilities of 12 human body joint points existing at the feature points and the offset vector of each joint point at the feature point.
4. The pedestrian gesture recognition and interaction method based on the depth-level separable convolutional network as claimed in claim 1, wherein the joint point classification in step two adopts a depth-level separable convolution structure to simplify the model.
5. The pedestrian gesture recognition and interaction method based on the depth-level separable convolutional network as claimed in claim 4, wherein the specific steps of classifying the joint points in step three comprise:
step 3.1, calculating the joint point coordinates: combining the confidence feature maps of the human body joint point distribution contained in the feature points obtained in step two with the offset vector feature maps of the corresponding points, finding the point with the highest confidence in each feature map to determine the joint point type, and then obtaining the joint point position from the offset vector, thereby obtaining the complete human body joint point information;
step 3.2, normalization: after the human body joint point coordinates are obtained, taking the midpoint of the line connecting the left and right shoulders as the center, subtracting the coordinates of this center from all joint points, and then performing normalization;
step 3.3, classification: classifying the normalized data with a support vector machine or a single-layer fully connected network to obtain the final pedestrian gesture recognition result.
6. The method as claimed in claim 1, wherein in step five, when multiple pedestrians around the vehicle are detected making different gestures at the same time, the action decision is made with the most conservative strategy according to the priorities of the pedestrian gestures.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911281009.3A CN111062311B (en) | 2019-12-13 | 2019-12-13 | Pedestrian gesture recognition and interaction method based on depth-level separable convolution network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911281009.3A CN111062311B (en) | 2019-12-13 | 2019-12-13 | Pedestrian gesture recognition and interaction method based on depth-level separable convolution network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111062311A (en) | 2020-04-24 |
CN111062311B (en) | 2023-05-23 |
Family
ID=70301176
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911281009.3A Active CN111062311B (en) | 2019-12-13 | 2019-12-13 | Pedestrian gesture recognition and interaction method based on depth-level separable convolution network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111062311B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109117701A (en) * | 2018-06-05 | 2019-01-01 | 东南大学 | Pedestrian's intension recognizing method based on picture scroll product |
CN109613930A (en) * | 2018-12-21 | 2019-04-12 | 中国科学院自动化研究所南京人工智能芯片创新研究院 | Control method, device, unmanned vehicle and the storage medium of unmanned vehicle |
CN110096973A (en) * | 2019-04-16 | 2019-08-06 | 东南大学 | A kind of traffic police's gesture identification method separating convolutional network based on ORB algorithm and depth level |
CN110096968A (en) * | 2019-04-10 | 2019-08-06 | 西安电子科技大学 | A kind of ultrahigh speed static gesture identification method based on depth model optimization |
Non-Patent Citations (1)
Title |
---|
SHICHAO ZHANG et al.: "One For All: A Mutual Enhancement Method for", 《MDPI》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115546824A (en) * | 2022-04-18 | 2022-12-30 | 荣耀终端有限公司 | Taboo picture identification method, equipment and storage medium |
CN115546824B (en) * | 2022-04-18 | 2023-11-28 | 荣耀终端有限公司 | Taboo picture identification method, apparatus and storage medium |
CN117711014A (en) * | 2023-07-28 | 2024-03-15 | 荣耀终端有限公司 | Method and device for identifying space-apart gestures, electronic equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111062311B (en) | 2023-05-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110175576B (en) | Driving vehicle visual detection method combining laser point cloud data | |
Nguyen et al. | Learning framework for robust obstacle detection, recognition, and tracking | |
EP2601615B1 (en) | Gesture recognition system for tv control | |
Hoang et al. | Enhanced detection and recognition of road markings based on adaptive region of interest and deep learning | |
US20210125338A1 (en) | Method and apparatus for computer vision | |
US11024042B2 (en) | Moving object detection apparatus and moving object detection method | |
CN111666921A (en) | Vehicle control method, apparatus, computer device, and computer-readable storage medium | |
CN112487862B (en) | Garage pedestrian detection method based on improved EfficientDet model | |
CN110730966B (en) | System and method for pedestrian detection | |
CN110781964A (en) | Human body target detection method and system based on video image | |
CN113378641B (en) | Gesture recognition method based on deep neural network and attention mechanism | |
Dewangan et al. | Towards the design of vision-based intelligent vehicle system: methodologies and challenges | |
CN113297959B (en) | Target tracking method and system based on corner point attention twin network | |
CN103905824A (en) | Video semantic retrieval and compression synchronization camera system and method | |
CN111768438A (en) | Image processing method, device, equipment and computer readable storage medium | |
JP6130325B2 (en) | Road environment recognition device | |
US20230281961A1 (en) | System and method for 3d object detection using multi-resolution features recovery using panoptic segmentation information | |
CN111062311B (en) | Pedestrian gesture recognition and interaction method based on depth-level separable convolution network | |
CN115620393A (en) | Fine-grained pedestrian behavior recognition method and system oriented to automatic driving | |
CN113723170A (en) | Integrated hazard detection architecture system and method | |
CN112508839A (en) | Object detection system and object detection method thereof | |
CN113435232A (en) | Object detection method, device, equipment and storage medium | |
CN107463886A (en) | A kind of double method and systems for dodging identification and vehicle obstacle-avoidance | |
CN113449629B (en) | Lane line false and true identification device, method, equipment and medium based on driving video | |
Wang et al. | Road semantic segmentation and traffic object detection model based on encoder-decoder cnn architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||