CN112329689A - Abnormal driving behavior identification method based on graph convolution neural network under vehicle-mounted environment


Info

Publication number
CN112329689A
Authority
CN
China
Prior art keywords
neural network
time
space
network
abnormal driving
Prior art date
Legal status
Pending
Application number
CN202011280953.XA
Other languages
Chinese (zh)
Inventor
殷绪成
王顺
陈松路
杨春
Current Assignee
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date
Filing date
Publication date
Application filed by University of Science and Technology Beijing (USTB)
Priority to CN202011280953.XA
Publication of CN112329689A

Classifications

    • G06V 20/597: Recognising the driver's state or behaviour, e.g. attention or drowsiness (under G06V 20/59: context or environment of the image inside of a vehicle)
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F 18/253: Fusion techniques of extracted features
    • G06N 3/045: Combinations of networks
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08: Learning methods


Abstract

The invention provides a method for identifying abnormal driving behaviors based on a graph convolutional neural network in a vehicle-mounted environment, relating to the technical field of computer vision. The method can effectively identify subtle and similar human behaviors, improving the capability of recognizing abnormal driving behaviors. Human behavior is identified by combining an improved spatio-temporal graph convolutional network with a novel recurrent neural network: the improved network increases the number of joints in the spatial topology graph relative to the original spatio-temporal convolutional network and extracts spatio-temporal feature information from multi-frame skeleton sequence segments; a neural network incorporating long short-term memory (LSTM) then extracts temporal semantic information across the different skeleton sequence segments; and driving behavior is identified on the basis of all the extracted information. The technical scheme provided by the invention is suitable for human behavior recognition.

Description

Abnormal driving behavior identification method based on graph convolution neural network under vehicle-mounted environment
[Technical Field]
The invention relates to the technical field of computer vision, in particular to a method for identifying abnormal driving behaviors based on a graph convolution neural network in a vehicle-mounted environment.
[Background of the Invention]
Recognizing abnormal driving behavior is important for safe driving. The main factors affecting safe driving come not only from outside the vehicle but also from inside it, especially from the driver's behavior; in recent years, many traffic accidents have been caused by abnormal driving behaviors. Abnormal driving behavior recognition can be regarded as a branch of human behavior recognition. Mainstream deep learning methods based on RGB video models perform well in video action recognition. TSN (L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool, "Temporal segment networks: Towards good practices for deep action recognition," in European Conference on Computer Vision, Springer, 2016, pp. 20-36.) is a classical behavior recognition method of this kind, relying mainly on appearance and optical-flow information in the video. However, methods based on RGB video models are easily affected by the illumination intensity inside the vehicle, resulting in low recognition accuracy in the vehicle-mounted environment.
Recognition methods based on human skeleton joint data focus more on the human body itself and are insensitive to variations in appearance and illumination. Graph neural networks can achieve better results than networks based on the RGB model, because human skeletal data is more appropriately represented by a graph structure than as a pseudo-image. Recently, many studies have begun to use graph convolutional networks to extract motion information from human skeleton joints. STGCN (S. Yan, Y. Xiong, and D. Lin, "Spatial temporal graph convolutional networks for skeleton-based action recognition," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.) proposes a graph convolutional neural network that extracts motion information from multi-frame human skeleton data along both the spatial and temporal dimensions, but the spatial topology graph used in this method contains only 18 joints, so distinctive spatial semantic features cannot be extracted when identifying subtle behaviors. For example, "yawning" is manifested mainly in subtle changes of facial expression and correlates closely with key points near the mouth and eyes; likewise, behaviors such as drinking and smoking are difficult to distinguish. Furthermore, graph convolutional networks have difficulty understanding temporal correlations across longer frame spans in a video. AGC-LSTM (C. Si, W. Chen, W. Wang, L. Wang, and T. Tan, "An attention enhanced graph convolutional LSTM network for skeleton-based action recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 1227-1236.) uses a graph convolutional network to extract the spatial semantic information of single frames and then uses a recurrent neural network to extract the temporal semantic information contained across multiple frames.
However, these methods extract features in the spatial and temporal dimensions independently, and thus they cannot effectively represent the correlation between information in the spatial and temporal dimensions.
Therefore, how to effectively identify subtle similar behaviors remains a challenge, and it is necessary to research an abnormal driving behavior identification method based on a graph-convolution neural network in an on-vehicle environment to overcome the shortcomings of the prior art so as to solve or alleviate one or more of the above problems.
[Summary of the Invention]
In view of the above, the invention provides a method for identifying abnormal driving behaviors based on a graph convolutional neural network in a vehicle-mounted environment, which can effectively identify subtle and similar human behaviors and improves the capability of recognizing abnormal driving behaviors.
In one aspect, the invention provides a method for identifying abnormal driving behaviors based on a graph convolutional neural network in a vehicle-mounted environment, wherein human behavior is identified by combining an improved spatio-temporal convolutional network with a novel recurrent neural network: after the improved spatio-temporal convolutional network extracts temporal and spatial feature information from multi-frame skeleton sequence segments, the novel recurrent neural network extracts temporal semantic information across the different skeleton sequence segments, and driving behavior is identified on the basis of all the extracted information.
In accordance with the foregoing aspect and any possible implementation, there is further provided an implementation wherein the improved spatio-temporal convolutional network increases the number of joints relative to the original spatio-temporal convolutional network to improve the spatial topology graph, so that more spatial semantic information can be extracted to identify subtle behaviors.
In accordance with the foregoing aspect and any possible implementation, there is further provided an implementation wherein, for the neural network incorporating long short-term memory (LSTM), the ability to recognize similar behaviors is improved by strengthening the network's learning of temporal semantic features.
In accordance with the foregoing aspect and any possible implementation, there is further provided an implementation wherein the number of joints in the improved spatio-temporal convolutional network is 124. The number of joints is not necessarily 124 and may be set differently according to actual conditions.
The above aspect and any possible implementation further provide an implementation, wherein the 124 joints are specifically 12 upper body joints, 70 facial joints and 42 hand joints.
In accordance with the foregoing aspect and any possible implementation, there is further provided an implementation wherein the skeleton sequence segments are obtained by dividing a complete skeleton sequence into continuous segments of the same length, which are then input into the improved spatio-temporal convolutional network for feature extraction.
In accordance with the foregoing aspect and any possible implementation, there is further provided an implementation wherein adjacent skeleton sequence segments contain overlapping portions to improve the efficiency of data reuse.
In accordance with the foregoing aspect and any possible implementation, there is further provided an implementation wherein the skeleton sequence consists of the coordinate data of each human joint in each frame of the video.
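The segmentation with overlapping boundaries described above can be sketched as a minimal helper. The function name is illustrative, and the 40-frame, three-clip, 20-frame setting matches the experimental setup described later in the document:

```python
def split_overlapping(seq, num_segments=3, seg_len=20):
    """Split a frame sequence into equal-length segments whose
    boundaries overlap (hypothetical helper; defaults follow the
    40-frame / three-clip setup used in the experiments)."""
    total = len(seq)
    if num_segments == 1:
        return [seq[:seg_len]]
    # Evenly space segment start indices so that neighbours overlap.
    stride = (total - seg_len) / (num_segments - 1)
    return [seq[round(i * stride):round(i * stride) + seg_len]
            for i in range(num_segments)]

frames = list(range(40))           # stands in for a 40-frame skeleton sequence
clips = split_overlapping(frames)  # three 20-frame clips, 10 frames of overlap
```

With 40 frames and three 20-frame clips, each pair of adjacent clips shares 10 frames, which is what lets the graph convolution reuse data across clip boundaries.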
After the improved spatio-temporal convolutional network extracts the spatio-temporal feature information of the skeleton sequence segments, this information is input into the novel recurrent neural network to extract the temporal semantic information of the different skeleton sequence segments; the spatio-temporal feature information and the temporal semantic information are then fused to obtain feature fusion data, which is processed in turn by a fully connected layer and a softmax function to obtain the final behavior classification. The fully connected layer is a specific layer type in deep neural networks, characterized in that each node of the layer is connected to all nodes of the previous layer.
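A minimal numeric sketch of this final stage, with toy dimensions and hypothetical weights (not the invention's actual parameters): concatenate the two feature vectors, apply a fully connected layer, then softmax:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(st_feat, temporal_feat, weights, bias):
    """Fuse by concatenation, apply one fully connected layer, softmax."""
    fused = st_feat + temporal_feat            # feature fusion (concatenation)
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

# Toy example: 2-dim GCN feature, 1-dim LSTM feature, 2 classes.
probs = classify([1.0, 0.0], [0.5],
                 weights=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
                 bias=[0.0, 0.0])
```

The softmax output is a probability distribution over the behavior classes, so the predicted label is simply the index with the highest probability.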
In another aspect, the present invention provides a system for identifying abnormal driving behavior based on a graph convolutional neural network in a vehicle-mounted environment, wherein the system comprises:
a preprocessing module, used for dividing the complete skeleton sequence into continuous segments of the same length to obtain skeleton sequence segments;
a spatio-temporal convolutional network module, used for extracting the spatio-temporal feature information of the skeleton sequence segments;
a recurrent neural network module, used for extracting the temporal semantic information of different skeleton sequence segments from the spatio-temporal feature information;
a feature fusion module, used for fusing the spatio-temporal feature information and the temporal semantic information to obtain feature fusion data;
a recognition and judgment module, used for recognizing and classifying the driving behavior from the fused data;
wherein the spatio-temporal convolutional network module is implemented by a spatio-temporal convolutional network whose spatial topology graph is improved by increasing the number of joints;
and the recurrent neural network module is implemented by a neural network incorporating long short-term memory (LSTM).
Compared with the prior art, the invention can obtain the following technical effects: the improved spatial topology graph structure for the graph convolutional network enables better recognition of subtle abnormal driving behaviors; the mechanism of segmenting the sequence into clips and introducing LSTM further improves the deep neural network's ability to discriminate similar behaviors; and experiments on the collected abnormal driving behavior dataset and the Kinetics dataset show that the proposed method achieves significantly improved performance and has good generalization capability.
Of course, it is not necessary for any one product in which the invention is practiced to achieve all of the above-described technical effects simultaneously.
[Description of the Drawings]
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of an overall network structure of an end-to-end detection network provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of a spatial domain topology structure provided by an embodiment of the present invention for improving a graph convolution network;
FIG. 3 is a schematic representation of data using skeletal sequences provided by one embodiment of the present invention;
FIG. 4 is a schematic diagram of an abnormal driving behavior data set collected according to an embodiment of the present invention.
[Detailed Description of the Embodiments]
For better understanding of the technical solutions of the present invention, the following detailed descriptions of the embodiments of the present invention are provided with reference to the accompanying drawings.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Recognition methods based on human skeleton joint data focus more on the human body itself and are insensitive to variations in appearance and illumination. Graph neural networks can achieve better results than networks based on the RGB model, because human skeletal data is more appropriately represented by a graph structure than as a pseudo-image. Recently, many studies have begun to use graph convolutional networks to extract motion information from human skeleton joints. However, since the spatial topology graph used in these methods includes only 18 joints, distinctive spatial semantic features cannot be extracted when recognizing subtle behaviors. Based on this analysis, the invention designs a spatial topology graph containing 124 joints in total: 12 upper-body joints (including the shoulder, elbow, and wrist joints on both sides of the upper body), 70 facial joints (distributed over the contours of the eyebrows, eyes, nose, lips, and face), and 42 hand joints (21 per hand, distributed over the five fingers and the palm), so that more spatial semantic information can be extracted from a topology composed of more joints to identify subtle behaviors. In addition, conventional graph convolutional networks have difficulty discovering and mining temporal correlations across longer frame spans in a video. The proposed method combines the advantages of graph convolutional networks and the long short-term memory network (LSTM): it learns the temporal and spatial feature information of multi-frame skeleton sequence segments through the graph convolutional network and then further refines the temporal semantic representation of the different segments with the LSTM.
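As an illustration of the joint bookkeeping, the 124-joint layout can be sketched as below. The contiguous index ranges are an assumption for illustration; only the group sizes (12, 70, 42) come from the text:

```python
# Hypothetical contiguous index layout for the 124-joint spatial topology.
UPPER_BODY = list(range(0, 12))     # 12 upper-body joints
FACE = list(range(12, 82))          # 70 facial landmarks
HANDS = list(range(82, 124))        # 2 hands x 21 joints = 42 hand joints
NUM_JOINTS = len(UPPER_BODY) + len(FACE) + len(HANDS)
```

Any implementation would need a fixed convention like this so that the adjacency matrix rows and the pose-estimator output columns refer to the same joints.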
The proposed deep neural network combines a spatio-temporal graph convolutional network with a novel recurrent neural network to address the challenge of identifying subtle and similar abnormal driving behaviors. It consists mainly of two parts: a spatio-temporal graph convolutional network part (the GCN part) and a recurrent neural network part (the LSTM part). The network structure is shown in fig. 1. The input data is a skeleton sequence consisting of the human joint coordinates of each frame of the video; the skeletal data is shown in fig. 3. Before inputting data into the GCN part, the complete skeleton sequence is divided into several contiguous segments of the same length, and adjacent segments contain overlapping portions, which allows data to be reused efficiently during the graph convolution operation. The GCN part performs graph convolution on the segmented clips along both the temporal and spatial dimensions and extracts the spatio-temporal feature information of each clip's skeleton sequence. The feature vectors of all clips are then fed into a two-layer LSTM network. Next, the per-clip feature vectors extracted by the GCN part are concatenated with the output feature vector of the LSTM to form a fused feature, and the final behavior classification score is computed after the fused feature passes through the fully connected layer and the softmax function. The whole model is trained end to end with a cross-entropy loss function. The network structure is improved mainly in two respects, the spatial domain and the temporal domain. In the spatial domain, the spatial topology graph is improved on the basis of the open-source spatio-temporal graph convolutional network STGCN.
In the temporal domain, the segmented clips are connected through a long short-term memory network (LSTM), further improving the network's ability to learn temporal information.
(a) Improvement of the spatial topology graph of the GCN
The graph convolution part of the proposed deep neural network improves the spatial topology graph of the open-source spatio-temporal graph convolutional network (STGCN). We conclude that a topology with more joints can learn more spatial semantic information, which helps to differentiate subtle behaviors. The invention therefore designs a spatial topology graph containing 124 joints, so that more spatial semantic information can be extracted to identify subtle behaviors. The improved spatial topology is shown in fig. 2. Specifically, the 124 joints consist of 12 upper-body joints, 70 facial joints, and 42 hand joints. The face, hands, and limbs are connected through the nose and wrist joints to form the final spatial topology graph. In an actual driving scene the lower body is not visible in the image, so the joint data of the lower body is discarded. The topology in the temporal dimension is constructed in the same way as in previous methods: joints at the same position are connected across consecutive adjacent frames.
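The spatio-temporal graph construction described here (replicating the spatial graph in every frame and linking each joint to itself in the next frame) can be sketched as follows; the flat indexing scheme is an assumption for illustration:

```python
def build_spatiotemporal_edges(spatial_edges, num_joints, num_frames):
    """Replicate the spatial skeleton graph in every frame and connect
    each joint to the same joint in the next frame (temporal edges).
    Joint j in frame t gets the global index t * num_joints + j."""
    edges = []
    for t in range(num_frames):
        base = t * num_joints
        # Spatial edges within frame t.
        edges += [(base + i, base + j) for i, j in spatial_edges]
        # Temporal edges from frame t to frame t + 1.
        if t + 1 < num_frames:
            edges += [(base + j, base + num_joints + j)
                      for j in range(num_joints)]
    return edges

# Tiny demo: a 2-joint skeleton with one bone, over 3 frames.
demo_edges = build_spatiotemporal_edges([(0, 1)], num_joints=2, num_frames=3)
```

For a graph with E spatial edges, J joints, and T frames, this yields T*E spatial edges plus (T-1)*J temporal edges, which is exactly the structure a spatio-temporal graph convolution operates on.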
(b) Learning temporal correlations between different segments through the segmented-sequence and LSTM mechanisms
Conventional graph convolutional networks have difficulty exploring temporal correlations across longer frame spans in a video, whereas long short-term memory (LSTM) networks have proven advantageous for models based on sequence data. The proposed method combines the advantages of both: after learning the temporal and spatial feature information of the skeleton sequence segments with the graph convolutional network, it further refines the temporal semantic features of the different segments with the LSTM.
Example 1:
(1) data set used in this example:
the current public data sets for behavior recognition are mainly about recognition of common behaviors, and are not completely suitable for abnormal driving behavior recognition tasks. Thus, we have collected data sets that are specifically used to identify abnormal driving behavior. The abnormal driving behavior of the different tags in the data set is shown in fig. 4. Including approximately 4850 brief videos containing 5 abnormal behaviors. The data set includes five tags: drinking water, closing eyes for a long time, communicating with a mobile phone, smoking and yawning. The video in the dataset is infrared video because infrared video can reduce the impact of different lighting conditions. Each video lasts about 4 seconds and the frame rate is 10 frames per second. The resolution of each video is 720 p. The training set includes 4600 videos, with approximately 1000 videos in each category. The test set contains 250 videos, 50 in each category. We obtain skeletal data of these videos using the open-source pose estimation algorithm openpos, which can detect joints of the body torso, face, and hands. The coordinates of these joint points are comprised of the abscissa, the ordinate and the confidence score. The experiment evaluates the network identification performance by calculating the top-1 and top-5 classification accuracy. We compared some mainstream methods on the driving data set and verified the effectiveness of different modules in the network structure proposed by the present invention through ablation experiments.
In addition, we also performed experiments on the Kinetics dataset. The Kinetics human behavior dataset contains approximately 300,000 short videos cropped from the YouTube video website, covering up to 400 categories of human actions, including daily activities, sports, and some complex multi-person activities. Each clip lasts about 10 seconds. The videos have a resolution of 340x256 and a frame rate of 30 FPS, and the skeleton data of the Kinetics dataset was likewise obtained with the open-source pose estimation algorithm OpenPose. The joint coordinates in this dataset consist of the abscissa, the ordinate, and a confidence score, and the skeletal data contains only 18 joints of the human body. We used this dataset as a comparison to demonstrate the generalization performance of the proposed method and evaluated recognition performance by computing top-1 and top-5 classification accuracy.
(2) Description of the experiments
In the experiments, we used the PyTorch framework as the training tool and Python as the implementation language. The proposed deep neural network is implemented on top of the open-source MMSkeleton, and all experiments were carried out on a server running Linux with four NVIDIA TITAN Xp GPUs. We implemented our GCN graph convolution module on the basis of the open-source STGCN network, which contains 9 spatio-temporal graph convolution layers in total; each layer is composed of a spatial graph convolution, a temporal convolution, and a dropout layer, with the dropout rate set to 0.5 to avoid overfitting. Each spatio-temporal graph convolution layer is followed by a batch normalization layer and a ReLU layer. The LSTM part is a two-layer LSTM network whose intermediate hidden layer outputs a 512-dimensional vector. We use cross entropy as the loss function to backpropagate gradients, and the entire network is trained in an end-to-end fashion.
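The cross-entropy objective mentioned above can be written out explicitly; this is a generic, numerically stable formulation rather than the exact implementation used in the patent:

```python
import math

def softmax(logits):
    # Shift by the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Cross-entropy loss for one sample: negative log-probability
    of the target class under the softmax distribution."""
    return -math.log(softmax(logits)[target])

# With uniform logits over the 5 driving-behavior classes,
# the loss equals log(5), the entropy of a uniform 5-way guess.
loss = cross_entropy([0.0, 0.0, 0.0, 0.0, 0.0], 2)
```

Minimizing this loss over the training set is what drives both the GCN and LSTM parts, since gradients flow through the fused feature back into both branches.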
When running experiments on our collected abnormal driving dataset, we uniformly preprocessed each input skeleton sequence to 40 frames and then split it into three clips of 20 frames with overlapping boundaries for training and testing. We used the SGD optimization algorithm with the batch size set to 32, momentum set to 0.9, weight decay to 0.0001, and the initial learning rate to 0.1; training ends at epoch 20. For the experiments on the Kinetics dataset, we randomly selected 150 consecutive frames from the input skeleton sequence and divided them into three segments with overlapping boundaries. We again used the SGD optimization algorithm, with the batch size set to 256, momentum 0.9, weight decay 0.0001, and initial learning rate 0.1; training ends at epoch 50.
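For clarity, one SGD update with momentum and L2 weight decay, using the reported hyperparameters. This is a scalar sketch of the classic update rule; the actual optimizer operates on weight tensors:

```python
def sgd_step(w, grad, velocity, lr=0.1, momentum=0.9, weight_decay=0.0001):
    """One SGD-with-momentum update; weight decay enters the
    gradient as an L2 penalty term. Returns (new_weight, new_velocity)."""
    g = grad + weight_decay * w       # L2 weight decay
    v = momentum * velocity + g       # momentum accumulation
    return w - lr * v, v

# One update for a single weight starting from w = 1.0.
w, v = 1.0, 0.0
w, v = sgd_step(w, grad=0.5, velocity=v)
```

Repeating this step per batch, with the learning rate schedule and epoch counts given above, reproduces the training loop at the level of a single parameter.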
(3) Results of the experiment
Experiments on our collected dataset show that, compared with the original STGCN method, the top-1 classification accuracy of the proposed method improves from 75.6% to 90.4%, a significant gain. Ablation experiments show that replacing the default spatial topology graph with the improved one raises the recognition result by 11.3 percentage points over the original STGCN method, with the accuracy of every class improving; in particular, the accuracy on the subtle behavior "yawning" improves from 74% to 96%. This demonstrates the effectiveness of the improved spatial topology. After introducing the segmented sequence and the LSTM mechanism, the two similar actions of drinking water and smoking both reach an accuracy above 80%, confirming that the proposed clip segmentation and LSTM mechanism is effective. In addition, with the 40-frame sequence divided into 3 overlapping 20-frame clips, classifying using only the output of the graph convolutional network gives a top-1 accuracy of 80.4%, and using only the output of the LSTM part gives 83.2%; classifying with the fused feature formed by concatenating the outputs of the GCN part and the LSTM part gives the best result of 90.4%. We also tried splitting the input sequence into 2 and 4 clips, and the results were inferior to splitting into 3.
On the Kinetics dataset, the proposed method reaches 31.5% top-1 accuracy and 53.7% top-5 accuracy, outperforming the previous skeleton-based STGCN method, which achieves 30.7% top-1 and 52.8% top-5. We also tested splitting the sequence into 2 and 4 segments: with 2 segments the top-1 and top-5 results were 29.6% and 52.5%, and with 4 segments 29.9% and 51.9%. The results show that using 3 segments works best.
Experiments show that the improved spatial topological graph proposed by the present invention enables better recognition of fine-grained abnormal driving behaviors. Moreover, the mechanism of splitting the sequence into segments and introducing an LSTM further improves the deep neural network's ability to discriminate similar behaviors. On the video data set we collected for abnormal driving recognition, the proposed method achieves an excellent top-1 accuracy of 90.4%. Experiments on the Kinetics data set show that the proposed method also outperforms previous skeleton-based recognition methods, demonstrating that it has good generalization capability.
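The improved topology credited above can be sketched as an adjacency-matrix construction in the ST-GCN style. The patent specifies only the joint counts (12 upper-body, 70 facial, 42 hand joints, 124 in total); the index layout and the edge list below are hypothetical, since the exact connectivity is not published.

```python
import numpy as np

# Assumed index layout for the 124-joint graph described in the patent:
# indices 0-11 body, 12-81 face, 82-123 hands (21 per hand).
PARTS = {"body": 12, "face": 70, "hands": 42}
NUM_JOINTS = sum(PARTS.values())  # 124

def build_adjacency(edges, num_joints=NUM_JOINTS):
    """Symmetric adjacency with self-loops, as consumed by ST-GCN-style
    graph convolutions. The edge list is illustrative only."""
    A = np.eye(num_joints)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

# Hypothetical edges: chain the body joints, then attach the face and
# both hand sub-graphs to plausible body anchor joints.
example_edges = [(i, i + 1) for i in range(11)] + [(0, 12), (4, 82), (8, 103)]
A = build_adjacency(example_edges)
```

The point of the denser graph is that facial and hand landmarks, which carry the signal for fine behaviors such as yawning, become first-class nodes rather than being summarized by a few coarse joints.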
The present invention proposes a novel recurrent graph convolutional network that combines a spatio-temporal graph convolutional network with a recurrent neural network to address the challenge of recognizing fine-grained and similar abnormal driving behaviors. First, after processing the video data, the skeleton sequence is divided into several segments of equal length. Second, the GCN part extracts spatio-temporal feature information from the different skeleton sequence segments, using an improved spatial topological graph of 124 joints designed to replace the 18-joint spatial topological graph used in previous methods. The LSTM part then explores the deeper temporal features hidden between different segments. Finally, the abnormal driving behavior is classified using the fused feature formed by combining the outputs of the GCN part and the LSTM part. The advantages of the invention are mainly embodied in the following three points:
(a) For the abnormal driving behavior recognition task in a vehicle-mounted environment, a new, task-specific human body spatial topological graph is constructed, which helps the graph convolutional network extract richer spatial semantic information and recognize fine-grained abnormal driving behaviors.
(b) We introduce segmented skeleton sequences and a long short-term memory (LSTM) mechanism to improve the network's ability to learn temporal semantic features, which helps distinguish abnormal driving behaviors that resemble one another.
(c) A video data set of five abnormal driving behaviors performed by drivers in a vehicle-mounted environment is collected and used for the abnormal driving behavior recognition task. The abnormal driving behaviors of the different labels are shown in fig. 4. Experiments performed on both our collected data set and the public Kinetics data set demonstrate the effectiveness of the proposed method.
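The GCN-plus-LSTM fusion pipeline summarized above can be sketched end to end. The GCN branch and the LSTM recurrence are stood in by toy NumPy computations: the feature dimension (256), hidden size (128), and pooling choice are our assumptions, and the five output classes correspond to the five collected behaviors.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Per-segment spatio-temporal features from the GCN branch (mocked):
# 3 segments, each summarized as a 256-dim feature vector.
gcn_feats = rng.standard_normal((3, 256))

# The LSTM branch consumes the segment features in order and emits a
# final hidden state; a simple tanh recurrence stands in for the real
# LSTM here (hidden size 128 is an assumption).
W = rng.standard_normal((256 + 128, 128)) * 0.01
h = np.zeros(128)
for x in gcn_feats:
    h = np.tanh(np.concatenate([x, h]) @ W)

# Fuse: concatenate the pooled GCN feature with the LSTM state, then a
# fully connected layer + softmax over the 5 behavior classes.
fused = np.concatenate([gcn_feats.mean(axis=0), h])  # dim 256 + 128 = 384
W_fc = rng.standard_normal((384, 5)) * 0.01
probs = softmax(fused @ W_fc)
```

This mirrors the reported ablation: classifying from `gcn_feats` alone or `h` alone corresponds to the 80.4% and 83.2% single-branch results, while the concatenated `fused` vector corresponds to the best 90.4% configuration.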
The method for recognizing abnormal driving behavior based on a graph convolutional neural network in a vehicle-mounted environment has been described in detail above. The description of the embodiments is intended only to help in understanding the method of the present application and its core idea; meanwhile, for a person skilled in the art, the specific embodiments and the application scope may vary according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.
As used in the specification and claims, certain terms are used to refer to particular components. As one skilled in the art will appreciate, manufacturers may refer to a component by different names. This specification and the claims do not intend to distinguish between components that differ in name but not in function. In the following description and in the claims, the terms "include" and "comprise" are used in an open-ended fashion, and thus should be interpreted to mean "including, but not limited to". "Substantially" means within an acceptable error range; a person skilled in the art can solve the technical problem within a certain error range and substantially achieve the technical effect. The description that follows is of preferred embodiments of the present application, made for the purpose of illustrating its general principles and not of limiting its scope. The protection scope of the present application shall be subject to the definitions of the appended claims.
It is also noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a commodity or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such commodity or system. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the commodity or system that comprises the element.
It should be understood that the term "and/or" as used herein is merely one type of association that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
The foregoing description shows and describes several preferred embodiments of the present application. As noted above, it is to be understood that the application is not limited to the forms disclosed herein, should not be construed as excluding other embodiments, and is capable of use in various other combinations, modifications, and environments, and of changes within the scope of the inventive concept described herein, commensurate with the above teachings or with the skill or knowledge of the relevant art. Modifications and variations effected by those skilled in the art without departing from the spirit and scope of the application are to be protected by the claims appended hereto.

Claims (10)

1. A method for recognizing abnormal driving behavior based on a graph convolutional neural network in a vehicle-mounted environment, characterized by comprising the following steps:
S1, extracting spatio-temporal feature information of multi-frame skeleton sequence segments by adopting an improved spatio-temporal graph convolutional network;
S2, taking the spatio-temporal feature information as input, extracting temporal semantic information of the different skeleton sequence segments by using a novel recurrent neural network;
S3, fusing the spatio-temporal feature information and the temporal semantic information, and recognizing the driving behavior on the basis of the fused information;
wherein the improved spatio-temporal graph convolutional network improves the spatial topological graph by further increasing the number of joints on the basis of the existing spatio-temporal graph convolutional network, thereby improving the ability to recognize fine-grained behaviors.
2. The method for recognizing abnormal driving behavior based on a graph convolutional neural network in a vehicle-mounted environment according to claim 1, characterized in that the novel recurrent neural network is a neural network incorporating long short-term memory, which improves the recognition of similar behaviors by improving the network's ability to learn temporal semantic features.
3. The method for recognizing abnormal driving behavior based on a graph convolutional neural network in a vehicle-mounted environment according to claim 1, characterized in that the number of joints of the improved spatio-temporal graph convolutional network is 124.
4. The method for recognizing abnormal driving behavior based on a graph convolutional neural network in a vehicle-mounted environment according to claim 3, characterized in that the 124 joints comprise 12 upper body joints, 70 facial joints and 42 hand joints.
5. The method for recognizing abnormal driving behavior based on a graph convolutional neural network in a vehicle-mounted environment according to claim 1, characterized in that the skeleton sequence segments are formed by dividing a complete skeleton sequence into consecutive segments of equal length, which are then input into the improved spatio-temporal graph convolutional network for feature extraction.
6. The method for recognizing abnormal driving behavior based on a graph convolutional neural network in a vehicle-mounted environment according to claim 5, characterized in that adjacent skeleton sequence segments comprise overlapping portions, so as to improve data reuse.
7. The method for recognizing abnormal driving behavior based on a graph convolutional neural network in a vehicle-mounted environment according to claim 5, characterized in that the skeleton sequence is the coordinate data of each joint of the human body in each frame of the video.
8. The method for recognizing abnormal driving behavior based on a graph convolutional neural network in a vehicle-mounted environment according to claim 1, characterized in that after the improved spatio-temporal graph convolutional network extracts the spatio-temporal feature information of the skeleton sequence segments, the spatio-temporal feature information is input into the novel recurrent neural network to extract the temporal semantic information of the different skeleton sequence segments; the spatio-temporal feature information and the temporal semantic information are then fused to obtain feature fusion data; and the feature fusion data is processed by a fully connected layer and a softmax function in sequence to obtain the final behavior recognition classification information.
9. A system for recognizing abnormal driving behavior based on a graph convolutional neural network in a vehicle-mounted environment, characterized by comprising:
a spatio-temporal graph convolutional network module for extracting spatio-temporal feature information of skeleton sequence segments;
a recurrent neural network module for extracting temporal semantic information of the different skeleton sequence segments from the spatio-temporal feature information;
a feature fusion module for fusing the spatio-temporal feature information and the temporal semantic information to obtain feature fusion data;
a recognition and judgment module for recognizing and classifying the driving behavior on the basis of the fused data;
wherein the spatio-temporal graph convolutional network module is implemented by a spatio-temporal graph convolutional network that improves the spatial topological graph by increasing the number of joints.
10. The system of claim 9, characterized by further comprising:
a preprocessing module for dividing the complete skeleton sequence of the video frames into consecutive segments of equal length to obtain the skeleton sequence segments, and providing the obtained skeleton sequence segments to the spatio-temporal graph convolutional network module;
wherein the recurrent neural network module is implemented by a neural network incorporating long short-term memory.
CN202011280953.XA 2020-11-16 2020-11-16 Abnormal driving behavior identification method based on graph convolution neural network under vehicle-mounted environment Pending CN112329689A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011280953.XA CN112329689A (en) 2020-11-16 2020-11-16 Abnormal driving behavior identification method based on graph convolution neural network under vehicle-mounted environment


Publications (1)

Publication Number Publication Date
CN112329689A true CN112329689A (en) 2021-02-05

Family

ID=74319205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011280953.XA Pending CN112329689A (en) 2020-11-16 2020-11-16 Abnormal driving behavior identification method based on graph convolution neural network under vehicle-mounted environment

Country Status (1)

Country Link
CN (1) CN112329689A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115565159A (en) * 2022-09-28 2023-01-03 华中科技大学 Construction method and application of fatigue driving detection model

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376720A (en) * 2018-12-19 2019-02-22 杭州电子科技大学 Classification of motion method based on artis space-time simple cycle network and attention mechanism
CN109858390A (en) * 2019-01-10 2019-06-07 浙江大学 The Activity recognition method of human skeleton based on end-to-end space-time diagram learning neural network
CN110427834A (en) * 2019-07-10 2019-11-08 上海工程技术大学 A kind of Activity recognition system and method based on skeleton data
CN110717389A (en) * 2019-09-02 2020-01-21 东南大学 Driver fatigue detection method based on generation of countermeasure and long-short term memory network
US20200089977A1 (en) * 2018-09-17 2020-03-19 Honda Motor Co., Ltd. Driver behavior recognition and prediction
CN111339942A (en) * 2020-02-26 2020-06-26 山东大学 Method and system for recognizing skeleton action of graph convolution circulation network based on viewpoint adjustment
CN111783692A (en) * 2020-07-06 2020-10-16 广东工业大学 Action recognition method and device, electronic equipment and storage medium
CN111814719A (en) * 2020-07-17 2020-10-23 江南大学 Skeleton behavior identification method based on 3D space-time diagram convolution


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
SHYERN: ""Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition(ST-GCN)"", pages 1 - 10, Retrieved from the Internet <URL:《https://www.cnblogs.com/shyern/p/11262926.html》> *
WEI-YI PEI: ""Scene Video Text Tracking With Graph Matching"", 《IEEE ACCESS》, vol. 6, pages 19419 - 19426 *
WU ZHENG: ""Relational Network for Skeleton-Based Action Recognition"", 《2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO(ICME)》, pages 826 - 831 *
WANG ZHIHUA: ""Research on Human Action Recognition Based on Spatio-Temporal Graph Convolutional Neural Networks"", 《China Master's Theses Full-text Database, Information Science and Technology》, no. 2020, pages 138 - 1040 *
CHEN SONGLU: ""Research on License Plate and Vehicle Detection Methods in Vehicle-Mounted Images"", 《China Doctoral Dissertations Full-text Database, Engineering Science and Technology II》, no. 2021, pages 034 - 3 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115565159A (en) * 2022-09-28 2023-01-03 华中科技大学 Construction method and application of fatigue driving detection model
CN115565159B (en) * 2022-09-28 2023-03-28 华中科技大学 Construction method and application of fatigue driving detection model

Similar Documents

Publication Publication Date Title
Wang et al. Weakly supervised adversarial domain adaptation for semantic segmentation in urban scenes
CN109919031B (en) Human behavior recognition method based on deep neural network
Yang et al. Boosting coded dynamic features for facial action units and facial expression recognition
Yang et al. Boosting encoded dynamic features for facial expression recognition
Wang et al. Learning multi-granularity temporal characteristics for face anti-spoofing
CN107330396A (en) A kind of pedestrian&#39;s recognition methods again based on many attributes and many strategy fusion study
Qin et al. Distracted driver detection based on a CNN with decreasing filter size
CN112036276B (en) Artificial intelligent video question-answering method
CN108182409A (en) Biopsy method, device, equipment and storage medium
CN110097029B (en) Identity authentication method based on high way network multi-view gait recognition
Jiang et al. A cross-modal multi-granularity attention network for RGB-IR person re-identification
CN113239801B (en) Cross-domain action recognition method based on multi-scale feature learning and multi-level domain alignment
CN111694959A (en) Network public opinion multi-mode emotion recognition method and system based on facial expressions and text information
Chen et al. A pornographic images recognition model based on deep one-class classification with visual attention mechanism
CN112200176A (en) Method and system for detecting quality of face image and computer equipment
CN115861981A (en) Driver fatigue behavior detection method and system based on video attitude invariance
CN117149944A (en) Multi-mode situation emotion recognition method and system based on wide time range
CN112329689A (en) Abnormal driving behavior identification method based on graph convolution neural network under vehicle-mounted environment
CN114937298A (en) Micro-expression recognition method based on feature decoupling
CN109934852A (en) A kind of video presentation method based on object properties relational graph
Fang et al. Pedestrian attributes recognition in surveillance scenarios with hierarchical multi-task CNN models
Hatay et al. Learning to detect phone-related pedestrian distracted behaviors with synthetic data
WO2023185074A1 (en) Group behavior recognition method based on complementary spatio-temporal information modeling
Zhang et al. Generative visual dialogue system via adaptive reasoning and weighted likelihood estimation
Yao et al. Hierarchical pedestrian attribute recognition based on adaptive region localization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination