CN111242015B - Method for predicting driving dangerous scene based on motion profile semantic graph - Google Patents

Method for predicting driving dangerous scene based on motion profile semantic graph

Info

Publication number
CN111242015B
Authority
CN
China
Prior art keywords
driving
layer
motion profile
event
matrix
Prior art date
Legal status
Active
Application number
CN202010026768.1A
Other languages
Chinese (zh)
Other versions
CN111242015A (en)
Inventor
高珍
欧明锋
余荣杰
许靖宁
冯巾松
Current Assignee
Tongji University
Original Assignee
Tongji University
Priority date
Filing date
Publication date
Application filed by Tongji University
Priority to CN202010026768.1A
Publication of CN111242015A
Application granted
Publication of CN111242015B
Legal status: Active
Anticipated expiration

Classifications

    • G06V 20/40 - Scenes; scene-specific elements in video content
    • G06F 18/214 - Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/006 - Computing arrangements based on biological models; artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N 3/08 - Neural networks; learning methods
    • G06V 10/25 - Image preprocessing; determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis
    • Y02T 10/40 - Climate change mitigation technologies related to transportation; internal combustion engine [ICE] based vehicles; engine management systems

Abstract

The invention relates to a method for predicting a driving dangerous scene based on a motion profile semantic graph, which comprises the following steps. Step S1: acquire a driving video and segment a region of interest. Step S2: detect traffic objects with a target detection algorithm and generate a motion profile semantic graph. Step S3: collect statistics of the motion data, set an acceleration threshold, and divide the motion profile semantic graphs into high-risk events and normal events. Step S4: input the high-risk and normal events into a random forest classifier and rank the features by importance to obtain the important kinematic features. Step S5: construct a multi-modal deep neural network model. Step S6: obtain the motion profile semantic graph and important kinematic features of the driving video to be detected, input them into the multi-modal deep neural network model, predict whether the driving is at risk, and alarm the driver if it is. Compared with the prior art, the method improves the prediction accuracy of driving dangerous scenes and reduces the fluctuation of prediction precision.

Description

Method for predicting driving dangerous scene based on motion profile semantic graph
Technical Field
The invention relates to the field of automobile driver assistance, and in particular to a method for predicting a driving dangerous scene based on a motion profile semantic graph.
Background
Data fusion models based on deep learning are a new trend in traffic safety prediction, because video data and kinematic data each have their own limitations, and fusing the two types of data in a reasonable way to improve the accuracy of scene risk prediction is a hotspot of current research. There have been studies on high-risk driving scene recognition, but several problems remain. Some studies detect dangerous situations from abrupt changes in vehicle speed and direction combined with video frame differences; comparing frame differences with an autoencoder is better suited to corner cases, and the accuracy in general cases is only 71%, which is not ideal. Other work applies classical machine learning classifiers to kinematic data, including kNN, random forest, SVM, decision tree, Gaussian classifiers and AdaBoost, but the accuracy of the test results fluctuates and is strongly affected by the prediction horizon. It has also been proposed to create motion images from forward driving videos and to predict risk by computing time-to-collision (TTC) or capturing other information from the tracks.
Disclosure of Invention
The invention aims to overcome the defects of low accuracy and large fluctuation of prediction precision in the prior art, and provides a method for predicting a driving dangerous scene based on a motion profile semantic graph.
The aim of the invention can be achieved by the following technical scheme:
a method for predicting a driving danger scene based on a motion profile semantic graph comprises the following steps:
step S1: acquiring a driving video of a vehicle, and segmenting a region of interest (ROI) of the driving video;
step S2: detecting traffic objects with a target detection algorithm in the region of interest of the driving video and generating a motion profile semantic graph containing semantics;
step S3: counting motion data of the vehicle, setting an acceleration threshold according to a counting result, and dividing the motion profile semantic graph into high-risk events or normal events;
step S4: inputting the high-risk event or the normal event into a random forest classifier, and sequencing classification results according to feature importance to obtain important kinematic features;
step S5: constructing a multi-mode deep neural network model according to the motion profile semantic graph and the important kinematic features;
step S6: executing steps S1-S4 on the driving video to be detected to obtain its motion profile semantic graph and important kinematic features, inputting them into the multi-modal deep neural network model, predicting whether the driving is at risk, and giving an alarm to the driver if it is.
In step S1, the specific process of segmenting the region of interest of the driving video is as follows:
step S101: filtering irrelevant image textures in the driving video through a Gaussian filter, and extracting the outline of the road in the driving video through an edge detection algorithm, wherein the method specifically comprises the following steps:
step S1011: the color video frames of the driving video are converted into grayscale images as follows:
f = 0.299*R + 0.587*G + 0.114*B
where R, G and B are the matrices of the three RGB channels, respectively;
step S1012: the grayscale image is filtered with a Gaussian filter:
g(m,n) = Σ_(i,j) h(i,j)·f(m-i, n-j),  h(i,j) = (1/(2πσ²))·exp(-(i²+j²)/(2σ²))
where f(m,n) is the original grayscale value at position (m,n), g(m,n) is the Gaussian-filtered grayscale value, and h is the Gaussian kernel with standard deviation σ;
step S1013: the gradient strength and gradient direction of each pixel in the Gaussian-filtered grayscale image are calculated with the Sobel operator as follows:
S_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],  S_y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
G_x(m,n) = S_x * g(m,n),  G_y(m,n) = S_y * g(m,n)
G(m,n) = sqrt(G_x(m,n)² + G_y(m,n)²)
θ(m,n) = arctan(G_y(m,n)/G_x(m,n))
where G_x(m,n) is the transverse gradient strength, G_y(m,n) is the longitudinal gradient strength, S_x is the transverse Sobel operator, S_y is the longitudinal Sobel operator, * denotes two-dimensional convolution, G(m,n) is the gradient strength and θ(m,n) is the gradient direction;
step S1014: the gradient strength of the current pixel is compared with that of the two neighboring pixels along the positive and negative gradient directions; if the current pixel has the largest gradient strength of the three, it is kept as an edge point, otherwise it is suppressed, i.e. set to 0;
step S1015: a lower threshold v_min and an upper threshold v_max are set; pixels whose gradient strength is greater than v_max are detected as edges, and pixels below v_min are detected as non-edges. A pixel between the two thresholds is judged to be an edge if it is adjacent to a pixel already determined to be an edge, and a non-edge otherwise, so that a corresponding binary image is obtained (edge points have gray value 1, non-edge points 0).
Step S102: straight lines in the road contour are detected by the Hough line transform: an accumulator counts the number of points mapped to each line in Hough space, and a straight line is detected if it receives enough mapped points;
step S103: after the Hough line transform, more than two lines may be detected in the image. Since only two lines are needed to calculate the position of the vanishing point, the lines are divided into a left group and a right group, the average parameters of each group are calculated, two intersecting lines are obtained from the average parameters, and the coordinates (x_d, y_d) of their intersection, i.e. the vanishing point, are calculated;
step S104: the ordinate y_d of the vanishing point is taken as the upper boundary y_u of the ROI, the largest ordinate among the starting points of the two detected groups of lines is taken as the lower boundary y_l of the ROI, and the width of the ROI is the width of the driving video.
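As an illustration of steps S101-S104, the following Python sketch uses OpenCV to estimate the ROI boundaries from a single frame. It is a minimal sketch only: the Gaussian kernel size, Canny thresholds, Hough parameters and the slope-based left/right grouping are illustrative assumptions, and cv2.Canny internally performs the Sobel, non-maximum-suppression and double-threshold steps S1013-S1015.

    import cv2
    import numpy as np

    def estimate_roi_bounds(frame_bgr, v_min=50, v_max=150):
        """Return (y_u, y_l): upper/lower ROI boundaries estimated from the lane-line geometry."""
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)      # S1011: grayscale conversion
        blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)           # S1012: Gaussian filtering
        edges = cv2.Canny(blurred, v_min, v_max)                # S1013-S1015: binary edge image
        lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 50,      # S102: Hough line transform
                                minLineLength=60, maxLineGap=30)
        if lines is None:
            return None
        left, right = [], []
        for x1, y1, x2, y2 in lines.reshape(-1, 4):
            if x1 == x2:
                continue                                        # ignore vertical lines
            k = (y2 - y1) / (x2 - x1)
            b = y1 - k * x1
            (left if k < 0 else right).append((k, b, max(y1, y2)))
        if not left or not right:
            return None
        kl, bl, _ = np.mean(left, axis=0)                       # S103: average line per group
        kr, br, _ = np.mean(right, axis=0)
        x_d = (br - bl) / (kl - kr)                             # intersection = vanishing point
        y_u = int(kl * x_d + bl)                                # S104: upper ROI boundary y_u
        y_l = int(max(p[2] for p in left + right))              # lower ROI boundary y_l
        return y_u, y_l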
The specific process of generating the motion profile semantic graph in step S2 is as follows:
step S201: the region of interest of each frame of the driving video is averaged and converted into one row of pixels, specifically:
step S2011: the RGB pixel values within the rectangle spanning [y_l, y_u] vertically and [0, w] horizontally are obtained for each frame of the driving video, i.e. a (y_u - y_l, w, 3) three-dimensional integer matrix, where w is the video width;
step S2012: for each RGB channel within this rectangle, the average of the vertical pixels is taken as the pixel value of one point, i.e. the (y_u - y_l, w, 3) three-dimensional matrix is averaged over its first dimension and arranged into a 1×w row of pixels, i.e. a (1, w, 3) matrix;
step S202: the row of pixels obtained for each frame is spliced in time order to form an (fps×(t_b - t_a), w, 3) matrix, where fps is the number of video frames per second and [t_a, t_b] is the time interval considered; a color motion profile is generated from this pixel matrix;
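A minimal Python sketch of steps S201-S202 with OpenCV and NumPy is given below; it assumes the ROI boundaries y_u and y_l (y_u < y_l in pixel coordinates) have already been obtained in step S1, and that the time interval [t_a, t_b] is given in seconds. The file and frame-rate handling are illustrative, not prescribed by the method.

    import cv2
    import numpy as np

    def build_motion_profile(video_path, y_u, y_l, t_a=0.0, t_b=7.0):
        """Stack the column-averaged ROI of each frame into an (n_frames, w, 3) motion profile."""
        cap = cv2.VideoCapture(video_path)
        fps = cap.get(cv2.CAP_PROP_FPS)
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(t_a * fps))
        rows = []
        for _ in range(int((t_b - t_a) * fps)):
            ok, frame = cap.read()                  # frame shape: (h, w, 3)
            if not ok:
                break
            roi = frame[y_u:y_l, :, :]              # S2011: (y_l - y_u, w, 3) rectangle
            rows.append(roi.mean(axis=0))           # S2012: vertical average -> (w, 3) row
        cap.release()
        return np.stack(rows).astype(np.uint8)      # S202: (fps*(t_b - t_a), w, 3) profile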
step S203: the motion profile is identified with a real-time object detection framework; for each identified traffic object in the traffic environment it is judged whether the object lies in the region of interest, and if so, the traffic object is marked, in the form of a colored pixel line segment, at the corresponding row of the motion profile according to its transverse position in the frame picture of the driving video, forming the motion profile semantic graph; the specific process is as follows:
step S2031: for the video frame picture at time t_f, the YOLO real-time object detection framework is used to identify all traffic objects in the picture and obtain four pieces of information for each: position, size, class and confidence;
step S2032: the traffic objects whose confidence is greater than c_t and whose center coordinates lie in the ROI are screened out, where traffic objects include pedestrians and vehicles;
step S2033: the pixel line segment position corresponding to each traffic object in the video frame of the driving video is calculated, specifically:
x_1 = x_c - w_o/2
x_2 = x_c + w_o/2
where [x_1, x_2] is the pixel line segment position corresponding to the traffic object, x_c and w_o are the center coordinate and width of the object detected by YOLO, and w is the width of the video picture;
step S2034: in the pixel row of the motion profile corresponding to time t_f (i.e. the t_f-th row), the pixel line segments [x_1, x_2] of objects of different classes are given different colors: if the object is a vehicle, the pixels within [x_1, x_2] are set to red; if the object is a pedestrian, they are set to green;
finally, a motion profile semantic graph containing the semantic features of the moving objects is formed; the line segments of an object line up over time into a continuous track, and the width of the track reflects the relative longitudinal position of the traffic object with respect to the ego vehicle: the wider the track, the closer the traffic object and the higher the corresponding risk.
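The marking of step S203 can be sketched as follows; the detector interface is an assumption (detections are taken as tuples (x_c, w_o, cls, conf) in pixel units, standing in for the output of a YOLO-style model already restricted to objects whose center lies in the ROI), and the color table follows step S2034 in the BGR channel order used by OpenCV.

    import numpy as np

    # BGR colors per class, following step S2034 (red for vehicles, green for pedestrians).
    CLASS_COLOURS = {"car": (0, 0, 255), "person": (0, 255, 0)}

    def mark_detections(profile, t_f, detections, fps, w, c_t=0.5):
        """Paint colored pixel segments for the detections of frame t_f into the motion profile."""
        row = int(t_f * fps)                        # profile row corresponding to time t_f
        for x_c, w_o, cls, conf in detections:
            if conf <= c_t or cls not in CLASS_COLOURS:
                continue                            # S2032: confidence / class screening
            x1 = max(0, int(x_c - w_o / 2))         # S2033: segment endpoints [x_1, x_2]
            x2 = min(w, int(x_c + w_o / 2))
            profile[row, x1:x2] = CLASS_COLOURS[cls]   # S2034: color the segment
        return profile

In practice the same loop is run for every frame of the clip, so that the colored segments accumulate into the object tracks described above.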
The specific process of step S3 is as follows:
step S301: most vehicle kinematic characteristic variables follow a normal distribution, so outliers in the vehicle motion data are detected and filtered out with the 3σ rule of the normal distribution: each non-empty kinematic characteristic variable of a driving record is checked, and a value is treated as an outlier if it satisfies:
|x-μ|>3σ
wherein x is a kinematic parameter, μ is an average value of x, and σ is a standard deviation of x;
filling the missing value by a linear interpolation method, specifically:
d̂_i = d_(i-1) + (d_(i+1) - d_(i-1))·(t_i - t_(i-1))/(t_(i+1) - t_(i-1)), 1 ≤ i ≤ n
where d̂_i is the missing value, d_(i-1) is the last non-empty nearest-neighbor value before the missing value, d_(i+1) is the next non-empty nearest-neighbor value after it, n is the total number of records, and t_(i-1), t_i, t_(i+1) are the times corresponding to d_(i-1), d̂_i and d_(i+1);
step S302: the vehicle acceleration data a in the natural driving data are extracted, their distribution curve is plotted and inspected, and an acceleration threshold for obvious deceleration behavior is determined and recorded as TH_d;
step S303: the driving time-series data are scanned and the emergency-braking moments t_d satisfying the acceleration condition a ≤ TH_d are collected; for each moment t_d, the time slice from d_1 to d_2 seconds before it is taken to form a potential high-risk event slice e_c; combined with video checking, false alarms caused by data-acquisition errors are eliminated, and the n_conflict_candidate remaining high-risk event slices form the high-risk event preparation set E_conflict_candidate. To avoid event overlap, adjacent emergency-braking moments are required to satisfy t_d[i+1] - t_d[i] ≥ |d_1 - d_2|.
step S304: from the remaining driving time-series data, |d_1 - d_2| seconds is used as the time window and n_normal_candidate normal, non-conflict events are randomly sampled to form the normal event preparation set E_normal_candidate.
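The event slicing of steps S301-S304 can be sketched in Python with pandas; the column name 'accel', the sampling rate hz and the default parameter values are assumptions for illustration, and the exclusion of normal windows that overlap risky slices is omitted for brevity.

    import numpy as np
    import pandas as pd

    def extract_events(df, th_d=-0.3, d1=8.0, d2=1.0, n_normal=1000, hz=10):
        """Split a driving time series into high-risk and normal candidate event slices.

        `df` is assumed to have an integer RangeIndex sampled at `hz` Hz and an 'accel' column;
        th_d, d1 and d2 play the roles of TH_d and the [d_1, d_2]-second window of step S303.
        """
        # Step S301: 3-sigma outlier filtering, then linear interpolation of the gaps.
        num_cols = df.select_dtypes("number").columns
        df[num_cols] = df[num_cols].astype(float)
        for col in num_cols:
            mu, sigma = df[col].mean(), df[col].std()
            df.loc[(df[col] - mu).abs() > 3 * sigma, col] = np.nan
        df[num_cols] = df[num_cols].interpolate(method="linear")

        # Step S303: emergency-braking moments a <= TH_d; slice [t_d - d1, t_d - d2] seconds,
        # keeping adjacent braking moments at least |d1 - d2| seconds apart to avoid overlap.
        window = int(abs(d1 - d2) * hz)
        risky, last = [], -window
        for i in np.flatnonzero(df["accel"].to_numpy() <= th_d):
            if i - last >= window and i - int(d1 * hz) >= 0:
                risky.append(df.iloc[i - int(d1 * hz): i - int(d2 * hz)])
                last = i

        # Step S304: sample normal (non-conflict) windows of the same length.
        starts = np.random.choice(len(df) - window, size=n_normal, replace=False)
        normal = [df.iloc[s: s + window] for s in starts]
        return risky, normal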
The specific process of step S4 is as follows:
step S401: for each event, which contains m_l records with multiple kinematic features, n kinematic features {m_1, …, m_n} are extracted as the features of a sample, and the event class is used as the classification label value of the sample, generating a sample set;
step S402: n_s samples are selected from the sample set by sampling with replacement to form a training set, and this is repeated q times to generate q training sets {S_1, …, S_q};
step S403: each training set is used as the input of one decision tree to construct a random forest {T_1, …, T_q} containing q CART decision trees, where for each node of T_i, m_node features are randomly selected without repetition and used to split S_i, with the minimum Gini index as the criterion for the optimal split, so that q CART decision trees are trained;
step S404: sorting classification results according to feature importance to obtain important kinematic features, specifically:
step S4041: for each kinematic feature m_j in {m_1, …, m_n}, the average change I_j of node-splitting impurity over all decision trees, i.e. its importance, is calculated; the impurity of a node o is measured with the Gini index as follows:
GI_o = Σ_k Σ_(k′≠k) p_ok·p_ok′ = 1 - Σ_k p_ok²
where GI_o is the Gini impurity of node o, k denotes the class (high risk or normal), p_ok is the proportion of class k in node o, and p_ok′ is the proportion of the classes other than k;
step S4042: the importance I_ji of m_j in the i-th tree is calculated as:
I_ji = Σ_(o∈O) I_jio = Σ_(o∈O) (GI_jio - G_jiol - G_jior)
where O is the set of nodes of the i-th tree that split on kinematic feature m_j, GI_jio is the Gini index of node o of the i-th tree, and G_jiol, G_jior are the Gini indices of the new left and right nodes after node o branches;
step S4043: the importance I_j of m_j over all trees is calculated as:
I_j = (1/q)·Σ_(i=1,…,q) I_ji
where q is the number of CART decision trees;
step S4044: after the importance set {I_1, …, I_n} of all kinematic features is obtained, the importances are normalized as follows:
I_j′ = I_j / (I_1 + I_2 + … + I_n)
the normalized importances are ranked from large to small, and the n_important top-ranked kinematic features are obtained; these constitute the kinematic feature vector f_kinematic of an event;
step S405: each event in the normal event preparation set E_normal_candidate and the high-risk event preparation set E_conflict_candidate is represented by its n_important selected kinematic features, i.e. each event is represented as {id, m_1, …, m_n_important, label}, where id is the event number and label is the event type, forming the normal event set E_normal and the high-risk event set E_conflict.
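The Gini-based importance of steps S401-S4044 corresponds to the mean-decrease-in-impurity importance exposed by scikit-learn's RandomForestClassifier, so the feature selection can be sketched as follows; the parameter values and the DataFrame interface are illustrative assumptions rather than values fixed by the method.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    def rank_kinematic_features(samples, labels, q=1000, m_node=2, n_important=5):
        """Rank kinematic features by normalized Gini importance and return the top n_important.

        `samples` is a DataFrame of per-event kinematic features and `labels` the event classes
        (high risk / normal); q and m_node mirror the parameters of steps S402-S403.
        """
        forest = RandomForestClassifier(
            n_estimators=q,          # q CART trees grown on bootstrap samples (S402-S403)
            max_features=m_node,     # m_node features tried at each split
            criterion="gini",        # minimum Gini index as the splitting criterion
            bootstrap=True,
            random_state=0,
        )
        forest.fit(samples, labels)
        # feature_importances_ already holds the normalized mean decrease in Gini impurity,
        # i.e. the quantity computed in steps S4041-S4044.
        importance = pd.Series(forest.feature_importances_, index=samples.columns)
        return importance.sort_values(ascending=False).head(n_important)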
The multi-modal deep neural network model specifically comprises:
an input layer, which converts the motion profile semantic graph into a matrix m_1;
a Conv1 layer, which sets the parameters of the convolution layer, including the number, size, stride and activation function of the filters, and takes m_1 as input to obtain a matrix m_2;
a Pool1 layer, which sets the parameters of the pooling layer, including the filter size, type and stride, and max-pools m_2 to obtain a matrix m_3;
a Conv2 layer, which sets the parameters of the convolution layer and passes m_3 through a ReLU activation function to obtain a matrix m_4;
a Pool2 layer, which sets the parameters of the pooling layer and max-pools m_4 to obtain a matrix m_5;
a Conv3 layer, which sets the parameters of the convolution layer and passes m_5 through a ReLU activation function to obtain a matrix m_6;
a Conv4 layer, which sets the parameters of the convolution layer and passes m_6 through a ReLU activation function to obtain a matrix m_7;
a Conv5 layer, which sets the parameters of the convolution layer and passes m_7 through a ReLU activation function to obtain a matrix m_8;
a Pool5 layer, which sets the parameters of the pooling layer and max-pools m_8 to obtain a matrix m_9;
an FC6 flattening layer, which flattens the input matrix m_9 into a one-dimensional matrix m_10;
a Drop6 layer, which discards a proportion of the neurons of the input matrix m_10 with a certain Dropout probability to prevent overfitting, obtaining a matrix m_11;
an FC7 fully connected layer, which takes the matrix m_11 as input and outputs an r×1 one-dimensional matrix m_12;
an FC8 fully connected layer: m_12 and f_kinematic are merged, i.e. [f_kinematic, m_12] is taken as the input of the FC8 fully connected layer, which outputs a 2×1 matrix whose two values correspond to the predicted probabilities of the risky and risk-free classes; the predicted values are then processed with Softmax so that the probabilities of the two classes sum to 1.
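A compact PyTorch sketch of the two-branch fusion described above is given below. Only the layer sequence and the fusion of f_kinematic at FC8 follow the text; the channel counts, kernel sizes, Dropout rate and the use of LazyLinear for FC7 are illustrative assumptions.

    import torch
    import torch.nn as nn

    class MultiModalDCNN(nn.Module):
        """CNN over the motion profile semantic graph, fused with kinematic features at FC8."""

        def __init__(self, n_kinematic=5, r=5):
            super().__init__()
            self.features = nn.Sequential(                    # Conv1 .. Pool5
                nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
                nn.MaxPool2d(kernel_size=3, stride=2),        # Pool1
                nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
                nn.MaxPool2d(kernel_size=3, stride=2),        # Pool2
                nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),   # Conv3
                nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),   # Conv4
                nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),   # Conv5
                nn.MaxPool2d(kernel_size=3, stride=2),        # Pool5
            )
            self.fc = nn.Sequential(
                nn.Flatten(),                                 # FC6: flatten m_9 into m_10
                nn.Dropout(p=0.5),                            # Drop6
                nn.LazyLinear(r),                             # FC7: r x 1 image code m_12
            )
            self.fc8 = nn.Linear(r + n_kinematic, 2)          # FC8 on [f_kinematic, m_12]

        def forward(self, profile, f_kinematic):
            m12 = self.fc(self.features(profile))
            logits = self.fc8(torch.cat([f_kinematic, m12], dim=1))
            return torch.softmax(logits, dim=1)               # class probabilities summing to 1

For training, one would usually return the raw logits and apply nn.CrossEntropyLoss instead of the explicit Softmax.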
The specific process of step S5 is as follows:
step S501: the normal event set E_normal and the high-risk event set E_conflict obtained in step S4 are each divided into a training set Θ_train and a test set Θ_test in a ratio of 2:1;
step S502: the multi-modal deep neural network model is trained; after n_epoch epochs, when the loss value of the model has converged to a small value, training is stopped and the final multi-modal deep neural network model M_DCNN is saved;
step S503: for the test set Θ_test (containing normal events e_n and high-risk events e_c), the trained M_DCNN model is invoked for each event in the set to obtain its predicted classification value; the normal events and conflict events predicted by the model are counted, and a confusion matrix as shown in Table 1 is generated from the prediction results of the test set:
TABLE 1 Confusion matrix
                     Predicted high risk    Predicted normal
Actual high risk     TP                     FN
Actual normal        FP                     TN
The sensitivity I_sensitivity and specificity I_specificity of the model are calculated from the confusion matrix as follows:
I_sensitivity = TP/(TP+FN)
I_specificity = TN/(FP+TN)
and an ROC curve is generated from I_sensitivity and I_specificity to evaluate the prediction performance of the model.
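The sensitivity, specificity and ROC evaluation of step S503 can be computed with scikit-learn as in the sketch below; labels are assumed to be encoded as 1 for high-risk (conflict) events and 0 for normal events, and the 0.5 decision threshold is illustrative.

    import numpy as np
    from sklearn.metrics import confusion_matrix, roc_auc_score

    def evaluate(y_true, y_prob, threshold=0.5):
        """Sensitivity, specificity and AUC for the binary risk predictions of the test set."""
        y_pred = (np.asarray(y_prob) >= threshold).astype(int)
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
        sensitivity = tp / (tp + fn)              # I_sensitivity = TP / (TP + FN)
        specificity = tn / (fp + tn)              # I_specificity = TN / (FP + TN)
        auc = roc_auc_score(y_true, y_prob)       # area under the ROC curve
        return sensitivity, specificity, auc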
Compared with the prior art, the invention has the following beneficial effects:
1. the invention combines video data and kinematic data to perform risk prediction, and the model accuracy reaches 91.6 percent, which is far superior to other single-source data models.
2. According to the invention, a real-time object detection framework is used to detect moving objects in the frame pictures of the driving video, semantic information of the traffic object tracks is added to the motion profile generated from the video, the tracks of potential conflict objects such as motor vehicles, non-motorized vehicles and pedestrians are highlighted as colored line segments, and the interference of the tracks of static elements in the traffic environment on the prediction result is greatly reduced.
3. According to the invention, the random forest is used for screening important kinematic feature variables, so that the accuracy of the multi-mode deep neural network model is improved.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a schematic illustration of a road profile extracted by edge detection according to the present invention;
FIG. 3 is a schematic illustration of a forward driving video based region of interest of the present invention;
FIG. 4 is a schematic diagram of the conversion of a region of interest of a forward driving video to a motion profile;
FIG. 5 (a) is a motion profile semantic graph of a normal event after YOLO target recognition based on the present invention;
FIG. 5 (b) is a motion profile semantic graph of the present invention after YOLO-based object recognition and noise filtering.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples. The present embodiment is implemented on the premise of the technical scheme of the present invention, and a detailed implementation manner and a specific operation process are given, but the protection scope of the present invention is not limited to the following examples.
As shown in fig. 1, a method for predicting a driving dangerous scene based on a motion profile semantic graph includes the following steps:
step S1: acquiring a driving video of a vehicle, and segmenting a region of interest (ROI) of the driving video;
step S2: detecting traffic objects with a target detection algorithm in the region of interest of the driving video and generating a motion profile semantic graph containing semantics;
step S3: counting motion data of the vehicle, and setting an acceleration threshold according to a counting result to divide a motion profile semantic graph into high risk events or normal events;
step S4: inputting the high-risk event or the normal event into a random forest classifier, and sequencing classification results according to the feature importance to obtain important kinematic features;
step S5: constructing a multi-mode deep neural network model according to the motion profile semantic graph and the important kinematic features;
step S6: steps S1-S4 are executed on the driving video to be detected to obtain its motion profile semantic graph and important kinematic features, which are input into the multi-modal deep neural network model to predict whether the driving is at risk; if the driving is at risk, an alarm is given to the driver.
The specific process of segmenting the region of interest of the driving video in step S1 is as follows:
step S101: as shown in fig. 2, filtering irrelevant image textures in the driving video by a gaussian filter, and extracting the outline of the road in the driving video by an edge detection algorithm specifically includes:
step S1011: the color video frames of the driving video are converted into grayscale images as follows:
f = 0.299*R + 0.587*G + 0.114*B
where R, G and B are the matrices of the three RGB channels, respectively;
step S1012: the grayscale image is filtered with a Gaussian filter:
g(m,n) = Σ_(i,j) h(i,j)·f(m-i, n-j),  h(i,j) = (1/(2πσ²))·exp(-(i²+j²)/(2σ²))
where f(m,n) is the original grayscale value at position (m,n), g(m,n) is the Gaussian-filtered grayscale value, and h is the Gaussian kernel with standard deviation σ;
step S1013: the gradient strength and gradient direction of each pixel in the Gaussian-filtered grayscale image are calculated with the Sobel operator as follows:
S_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],  S_y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
G_x(m,n) = S_x * g(m,n),  G_y(m,n) = S_y * g(m,n)
G(m,n) = sqrt(G_x(m,n)² + G_y(m,n)²)
θ(m,n) = arctan(G_y(m,n)/G_x(m,n))
where G_x(m,n) is the transverse gradient strength, G_y(m,n) is the longitudinal gradient strength, S_x is the transverse Sobel operator, S_y is the longitudinal Sobel operator, * denotes two-dimensional convolution, G(m,n) is the gradient strength and θ(m,n) is the gradient direction;
step S1014: the gradient strength of the current pixel is compared with that of the two neighboring pixels along the positive and negative gradient directions; if the current pixel has the largest gradient strength of the three, it is kept as an edge point, otherwise it is suppressed, i.e. set to 0;
step S1015: a lower threshold v_min and an upper threshold v_max are set; pixels whose gradient strength is greater than v_max are detected as edges, and pixels below v_min are detected as non-edges. A pixel between the two thresholds is judged to be an edge if it is adjacent to a pixel already determined to be an edge, and a non-edge otherwise, so that a corresponding binary image is obtained (edge points have gray value 1, non-edge points 0).
Step S102: straight lines in the road contour are detected by the Hough line transform: an accumulator counts the number of points mapped to each line in Hough space, and a straight line is detected if it receives enough mapped points;
step S103: after the Hough line transform, more than two lines may be detected in the image. Since only two lines are needed to calculate the position of the vanishing point, the lines are divided into a left group and a right group, the average parameters of each group are calculated, two intersecting lines are obtained from the average parameters, and the coordinates (x_d, y_d) of their intersection, i.e. the vanishing point, are calculated;
step S104: the ordinate y_d of the vanishing point is taken as the upper boundary y_u of the ROI, the largest ordinate among the starting points of the two detected groups of lines is taken as the lower boundary y_l of the ROI, and the width of the ROI is the width of the driving video.
The specific process of generating the motion profile semantic graph in step S2 is as follows:
step S201: the region of interest of each frame of the driving video is averaged and converted into one row of pixels, specifically:
step S2011: the RGB pixel values within the rectangle spanning [y_l, y_u] vertically and [0, w] horizontally are obtained for each frame of the driving video, i.e. a (y_u - y_l, w, 3) three-dimensional integer matrix, where w is the video width;
step S2012: for each RGB channel within this rectangle, the average of the vertical pixels is taken as the pixel value of one point, i.e. the (y_u - y_l, w, 3) three-dimensional matrix is averaged over its first dimension and arranged into a 1×w row of pixels, i.e. a (1, w, 3) matrix;
step S202: the row of pixels obtained for each frame is spliced in time order to form an (fps×(t_b - t_a), w, 3) matrix, where fps is the number of video frames per second and [t_a, t_b] is the time interval considered; a color motion profile is generated from this pixel matrix;
step S203: as shown in FIG. 3, the motion profile is identified with a real-time object detection framework; for each identified traffic object in the traffic environment it is judged whether the object lies in the region of interest, and if so, the traffic object is marked, in the form of a colored pixel line segment, at the corresponding row of the motion profile according to its transverse position in the frame picture of the driving video, forming the motion profile semantic graph; the specific process is as follows:
step S2031: for the video frame picture at time t_f, the YOLO real-time object detection framework is used to identify all traffic objects in the picture and obtain four pieces of information for each: position, size, class and confidence;
step S2032: screening traffic objects with confidence coefficient greater than 0.5 and central coordinates in the ROI area, wherein the traffic objects comprise pedestrians and vehicles;
step S2033: the pixel line segment position corresponding to each traffic object in the video frame of the driving video is calculated, specifically:
x_1 = x_c - w_o/2
x_2 = x_c + w_o/2
where [x_1, x_2] is the pixel line segment position corresponding to the traffic object, x_c and w_o are the center coordinate and width of the object detected by YOLO, and w is the width of the video picture;
step S2034: as shown in FIG. 4, in the pixel row of the motion profile corresponding to time t_f (i.e. the t_f-th row), the pixel line segments [x_1, x_2] of objects of different classes are given different colors: if the object is a vehicle, the pixels within [x_1, x_2] are set to red; if the object is a pedestrian, they are set to green;
as shown in FIG. 5 (a) and FIG. 5 (b), a motion profile semantic graph containing the semantic features of the moving objects is finally formed; the line segments of an object line up over time into a continuous track, and the width of the track reflects the relative longitudinal position of the traffic object with respect to the ego vehicle: the wider the track, the closer the traffic object and the higher the corresponding risk.
The specific process of step S3 is as follows:
step S301: most vehicle kinematic characteristic variables follow a normal distribution, so outliers in the vehicle motion data are detected and filtered out with the 3σ rule of the normal distribution: each non-empty kinematic characteristic variable of a driving record is checked, and a value is treated as an outlier if it satisfies:
|x-μ|>3σ
wherein x is a kinematic parameter, μ is an average value of x, and σ is a standard deviation of x;
filling the missing value by a linear interpolation method, specifically:
d̂_i = d_(i-1) + (d_(i+1) - d_(i-1))·(t_i - t_(i-1))/(t_(i+1) - t_(i-1)), 1 ≤ i ≤ n
where d̂_i is the missing value, d_(i-1) is the last non-empty nearest-neighbor value before the missing value, d_(i+1) is the next non-empty nearest-neighbor value after it, n is the total number of records, and t_(i-1), t_i, t_(i+1) are the times corresponding to d_(i-1), d̂_i and d_(i+1);
step S302: the vehicle acceleration data a in the natural driving data are extracted, their distribution curve is plotted and inspected, and an acceleration threshold of -0.3 for obvious deceleration behavior is determined;
step S303: the driving time-series data are scanned and the emergency-braking moments t_d satisfying the acceleration condition a ≤ -0.3 are collected; for each moment t_d, the time slice from 8 seconds to 1 second before it is taken to form a potential high-risk event slice e_c; combined with video checking, false alarms caused by data-acquisition errors are eliminated, and the 179 remaining event slices form the high-risk event preparation set E_conflict_candidate. To avoid event overlap, adjacent emergency-braking moments are required to satisfy the condition t_d[i+1] - t_d[i] ≥ 7.
step S304: from the remaining driving time-series data, 7 seconds is used as the time window and 1055 normal, non-conflict events are randomly sampled to form the normal event preparation set E_normal_candidate.
The specific process of step S4 is as follows:
step S401: an event contains m_l records, each with multiple kinematic features; the 26 kinematic features listed in Table 2 are extracted as the features of the sample.
Table 2 Event sample field specification table (the 26 kinematic feature fields; the full table is provided as an image in the original publication)
The event class is taken as the classification label value of the sample to generate a sample set;
step S402: 616 samples (89 high-risk event samples and 527 normal event samples) are selected as a training set by sampling with replacement, and this is repeated 1000 times to generate 1000 training sets {S_1, …, S_1000};
step S403: each training set is used as the input of one decision tree to construct a random forest {T_1, …, T_1000} containing 1000 CART decision trees, where for each node of T_i, m_node = 2 features are randomly selected without repetition and used to split S_i, with the minimum Gini index as the criterion for the optimal split, so that 1000 CART decision trees are trained;
step S404: sorting classification results according to feature importance to obtain important kinematic features, specifically:
step S4041: for each of the 26 kinematic features m_j, the average change I_j of node-splitting impurity over all decision trees, i.e. its importance, is calculated; the impurity of a node o is measured with the Gini index as follows:
GI_o = Σ_k Σ_(k′≠k) p_ok·p_ok′ = 1 - Σ_k p_ok²
where GI_o is the Gini impurity of node o, k denotes the class (high risk or normal), p_ok is the proportion of class k in node o, and p_ok′ is the proportion of the classes other than k;
step S4042: the importance I_ji of m_j in the i-th tree is calculated as:
I_ji = Σ_(o∈O) I_jio = Σ_(o∈O) (GI_jio - G_jiol - G_jior)
where O is the set of nodes of the i-th tree that split on kinematic feature m_j, GI_jio is the Gini index of node o of the i-th tree, and G_jiol, G_jior are the Gini indices of the new left and right nodes after node o branches;
step S4043: the importance I_j of m_j over all trees is calculated as:
I_j = (1/q)·Σ_(i=1,…,q) I_ji
where q is the number of CART decision trees;
step S4044: after the importance set {I_1, …, I_n} of all kinematic features is obtained, the importances are normalized as follows:
I_j′ = I_j / (I_1 + I_2 + … + I_n)
the normalized importances are ranked from large to small, and the 5 features with the highest importance are shown in Table 3:
TABLE 3 Feature importance ranking table
ACCEL_MEAN: average acceleration from 8 seconds to 2 seconds before the risk moment
ACCEL_MAX: maximum acceleration from 8 seconds to 2 seconds before the risk moment
ACCEL_MIN: minimum acceleration from 8 seconds to 2 seconds before the risk moment
ACCEL_5S: acceleration 5 seconds before the risk moment
ACCEL_6S: acceleration 6 seconds before the risk moment
step S405: each event in the normal event preparation set E_normal_candidate and the high-risk event preparation set E_conflict_candidate is represented by the above 5 features, i.e. each event is represented as {id, ACCEL_MEAN, ACCEL_MAX, ACCEL_MIN, ACCEL_5S, ACCEL_6S, label}, where id is the event number and label is the event type, forming the normal event set E_normal and the high-risk event set E_conflict.
The multi-modal deep neural network model specifically comprises:
an input layer, which converts the motion profile semantic graph into a matrix m_1;
a Conv1 layer, which sets the parameters of the convolution layer, including the number, size, stride and activation function of the filters, and takes m_1 as input to obtain a matrix m_2;
a Pool1 layer, which sets the parameters of the pooling layer, including the filter size, type and stride, and max-pools m_2 to obtain a matrix m_3;
a Conv2 layer, which sets the parameters of the convolution layer and passes m_3 through a ReLU activation function to obtain a matrix m_4;
a Pool2 layer, which sets the parameters of the pooling layer and max-pools m_4 to obtain a matrix m_5;
a Conv3 layer, which sets the parameters of the convolution layer and passes m_5 through a ReLU activation function to obtain a matrix m_6;
a Conv4 layer, which sets the parameters of the convolution layer and passes m_6 through a ReLU activation function to obtain a matrix m_7;
a Conv5 layer, which sets the parameters of the convolution layer and passes m_7 through a ReLU activation function to obtain a matrix m_8;
a Pool5 layer, which sets the parameters of the pooling layer and max-pools m_8 to obtain a matrix m_9;
an FC6 flattening layer, which flattens the input matrix m_9 into a one-dimensional matrix m_10;
a Drop6 layer, which discards a proportion of the neurons of the input matrix m_10 with a certain Dropout probability to prevent overfitting, obtaining a matrix m_11;
an FC7 fully connected layer, which takes the matrix m_11 as input and outputs an r×1 one-dimensional matrix m_12;
an FC8 fully connected layer: m_12 and f_kinematic are merged, i.e. [f_kinematic, m_12] is taken as the input of the FC8 fully connected layer, which outputs a 2×1 matrix whose two values correspond to the predicted probabilities of the risky and risk-free classes; the predicted values are then processed with Softmax so that the probabilities of the two classes sum to 1. The matrix transformations in the multi-modal deep neural network model are specifically shown in Table 4:
Table 4 Multi-modal network architecture
Layer    Input              Output
Conv1    224×224×3          54×54×96
Pool1    54×54×96           28×28×96
Conv2    28×28×96           28×28×256
Pool2    28×28×256          13×13×256
Conv3    13×13×256          13×13×384
Conv4    13×13×384          13×13×384
Conv5    13×13×384          13×13×256
Pool5    13×13×256          6×6×256
FC6      6×6×256            4096×1
Drop6    4096×1             2048×1
FC7      2048×1             5×1
FC8      10×1 (5×1 + 5×1)   2×1
The specific process of step S5 is as follows:
step S501: the normal event set E_normal and the high-risk event set E_conflict obtained in step S4 are each divided into a training set Θ_train and a test set Θ_test in a ratio of 2:1;
step S502: the multi-modal deep neural network model is trained; after n_epoch epochs, when the loss value of the model has converged to a small value, training is stopped and the final multi-modal deep neural network model M_DCNN is saved;
step S503: for the test set Θ_test (containing normal events e_n and high-risk events e_c), the trained M_DCNN model is invoked for each event in the set to obtain its predicted classification value; the normal events and conflict events predicted by the model are counted, and a confusion matrix as shown in Table 5 is generated from the prediction results of the test set:
Table 5 Confusion matrix
                     Predicted high risk    Predicted normal
Actual high risk     TP                     FN
Actual normal        FP                     TN
The sensitivity I_sensitivity and specificity I_specificity of the model are calculated from the confusion matrix as follows:
I_sensitivity = TP/(TP+FN)
I_specificity = TN/(FP+TN)
and an ROC curve is generated from I_sensitivity and I_specificity to evaluate the prediction performance of the model. The AUC of the ROC curve for the multi-modal deep neural network model is 0.9, compared with 0.56 for the decision tree model, 0.75 for the random forest model, 0.69 for the Bayesian network model and 0.69 for the logistic regression model; the multi-modal deep neural network model therefore outperforms the other models in both accuracy and reliability.
Furthermore, the particular embodiments described herein may vary from one embodiment to another, and the above description is merely illustrative of the structure of the present invention. All such small variations and simple variations in construction, features and principles of the inventive concept are intended to be included within the scope of the present invention. Various modifications or additions to the described embodiments or similar methods may be made by those skilled in the art without departing from the structure of the invention or exceeding the scope of the invention as defined in the accompanying claims.

Claims (6)

1. The method for predicting the driving dangerous scene based on the motion profile semantic graph is characterized by comprising the following steps of:
step S1: acquiring a driving video of a vehicle, and dividing an interested region of the driving video;
step S2: detecting traffic objects with a target detection algorithm in the region of interest of the driving video and generating a motion profile semantic graph containing semantics;
step S3: counting motion data of the vehicle, setting an acceleration threshold according to a counting result, and dividing the motion profile semantic graph into high-risk events or normal events;
step S4: inputting the high-risk event or the normal event into a random forest classifier, and sequencing classification results according to feature importance to obtain important kinematic features;
step S5: constructing a multi-mode deep neural network model according to the motion profile semantic graph and the important kinematic features;
step S6: steps S1-S4 are executed on the driving video to be detected to obtain its motion profile semantic graph and important kinematic features, which are input into the multi-modal deep neural network model to predict whether the driving is at risk; if the driving is at risk, an alarm is given to the driver;
the specific process of generating the motion profile semantic graph in the step S2 is as follows:
step S201: averaging the region of interest of each frame of image of the driving video and converting it into one row of pixels;
step S202: splicing the rows of pixels of all frames together in time order to form a motion profile;
step S203: identifying the motion profile with a real-time object detection framework, judging whether each identified traffic object in the traffic environment is located in the region of interest, and if so, marking the traffic object, in the form of a colored pixel line segment, at the position of the corresponding row in the motion profile according to the transverse position of the traffic object in the frame picture of the driving video, forming the motion profile semantic graph;
the feature importance is determined by the Gini index of the corresponding features of the high-risk events or the normal events;
the specific process of step S4 is as follows:
step S401: for each event, which contains m_l records with multiple kinematic features, n kinematic features {m_1, …, m_n} are extracted as the features of a sample, and the event class is used as the classification label value of the sample, generating a sample set;
step S402: n_s samples are selected from the sample set by sampling with replacement to form a training set, and this is repeated q times to generate q training sets {S_1, …, S_q};
step S403: each training set is used as the input of one decision tree to construct a random forest {T_1, …, T_q} containing q CART decision trees, where for each node of T_i, m_node features are randomly selected without repetition and used to split S_i, with the minimum Gini index as the criterion for the optimal split, so that q CART decision trees are trained;
step S404: sorting classification results according to feature importance to obtain important kinematic features, specifically:
step S4041: for each kinematic feature m_j in {m_1, …, m_n}, the average change I_j of node-splitting impurity over all decision trees, i.e. its importance, is calculated; the impurity of a node o is measured with the Gini index as follows:
GI_o = Σ_k Σ_(k′≠k) p_ok·p_ok′ = 1 - Σ_k p_ok²
where GI_o is the Gini impurity of node o, k denotes the class (high risk or normal), p_ok is the proportion of class k in node o, and p_ok′ is the proportion of the classes other than k;
step S4042: the importance I_ji of m_j in the i-th tree is calculated as:
I_ji = Σ_(o∈O) I_jio = Σ_(o∈O) (GI_jio - G_jiol - G_jior)
where O is the set of nodes of the i-th tree that split on kinematic feature m_j, GI_jio is the Gini index of node o of the i-th tree, and G_jiol, G_jior are the Gini indices of the new left and right nodes after node o branches;
step S4043: the importance I_j of m_j over all trees is calculated as:
I_j = (1/q)·Σ_(i=1,…,q) I_ji
where q is the number of CART decision trees;
step S4044: after the importance set {I_1, …, I_n} of all kinematic features is obtained, the importances are normalized as follows:
I_j′ = I_j / (I_1 + I_2 + … + I_n)
the normalized importances are ranked from large to small, and the n_important top-ranked kinematic features are obtained; these constitute the kinematic feature vector f_kinematic of an event;
step S405: each event in the normal event preparation set E_normal_candidate and the high-risk event preparation set E_conflict_candidate is represented by its n_important selected kinematic features, i.e. each event is represented as {id, m_1, …, m_n_important, label}, where id is the event number and label is the event type, forming the normal event set E_normal and the high-risk event set E_conflict;
The multi-modal deep neural network model specifically comprises:
an input layer, which converts the motion profile semantic graph into a matrix m_1;
a Conv1 layer, which sets the parameters of the convolution layer, including the number, size, stride and activation function of the filters, and takes m_1 as input to obtain a matrix m_2;
a Pool1 layer, which sets the parameters of the pooling layer, including the filter size, type and stride, and max-pools m_2 to obtain a matrix m_3;
a Conv2 layer, which sets the parameters of the convolution layer and passes m_3 through a ReLU activation function to obtain a matrix m_4;
a Pool2 layer, which sets the parameters of the pooling layer and max-pools m_4 to obtain a matrix m_5;
a Conv3 layer, which sets the parameters of the convolution layer and passes m_5 through a ReLU activation function to obtain a matrix m_6;
a Conv4 layer, which sets the parameters of the convolution layer and passes m_6 through a ReLU activation function to obtain a matrix m_7;
a Conv5 layer, which sets the parameters of the convolution layer and passes m_7 through a ReLU activation function to obtain a matrix m_8;
a Pool5 layer, which sets the parameters of the pooling layer and max-pools m_8 to obtain a matrix m_9;
an FC6 flattening layer, which flattens the input matrix m_9 into a one-dimensional matrix m_10;
a Drop6 layer, which discards a proportion of the neurons of the input matrix m_10 with a certain Dropout probability to prevent overfitting, obtaining a matrix m_11;
an FC7 fully connected layer, which takes the matrix m_11 as input and outputs an r×1 one-dimensional matrix m_12;
an FC8 fully connected layer: m_12 and f_kinematic are merged, i.e. [f_kinematic, m_12] is taken as the input of the FC8 fully connected layer, which outputs a 2×1 matrix whose two values correspond to the predicted probabilities of the risky and risk-free classes; the predicted values are then processed with Softmax so that the probabilities of the two classes sum to 1.
2. The method for predicting a driving hazard scene based on motion profile semantic graphs according to claim 1, wherein the region of interest comprises an upper boundary and a lower boundary.
3. The method for predicting a driving hazard scene based on a motion profile semantic graph according to claim 2, wherein the specific process of segmenting the region of interest of the driving video in step S1 is as follows:
step S101: filtering irrelevant image textures in the driving video through a Gaussian filter, and extracting the outline of a road in the driving video through an edge detection algorithm;
step S102: detecting straight lines in the contour of the road by the Hough line transform;
step S103: and calculating cross lines for the detected groups of straight lines, obtaining cross points according to the cross lines to determine the upper boundary of the region of interest, and determining the lower boundary of the region of interest according to the starting points of the groups of straight lines.
4. The method for predicting driving risk scenes based on motion profile semantic graphs according to claim 1, wherein the traffic objects include pedestrians and vehicles.
5. The method for predicting driving hazard scenes based on the motion profile semantic graph according to claim 1, wherein the specific process of the step S3 is as follows:
step S301: detecting and filtering abnormal values of vehicle motion data through a normal distributed 3 sigma principle, and filling the missing values through a linear interpolation method;
step S302: acquiring corresponding acceleration distribution according to the filtered and filled vehicle motion data, judging vehicle avoidance behavior, and setting an acceleration threshold value of a dangerous driving event according to a judgment result;
step S303: extracting a potential dangerous driving event according to the acceleration threshold value;
step S304: and calibrating a normal event set and a conflict event set on the potential dangerous driving event according to the checking result of the driving video.
6. The method for predicting a driving hazard scene based on a motion profile semantic graph according to claim 1, wherein the multi-modal deep neural network model comprises a visual data processing layer, a kinematic data processing layer, a data fusion layer and a prediction layer.
CN202010026768.1A 2020-01-10 2020-01-10 Method for predicting driving dangerous scene based on motion profile semantic graph Active CN111242015B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010026768.1A CN111242015B (en) 2020-01-10 2020-01-10 Method for predicting driving dangerous scene based on motion profile semantic graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010026768.1A CN111242015B (en) 2020-01-10 2020-01-10 Method for predicting driving dangerous scene based on motion profile semantic graph

Publications (2)

Publication Number Publication Date
CN111242015A CN111242015A (en) 2020-06-05
CN111242015B true CN111242015B (en) 2023-05-02

Family

ID=70872597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010026768.1A Active CN111242015B (en) 2020-01-10 2020-01-10 Method for predicting driving dangerous scene based on motion profile semantic graph

Country Status (1)

Country Link
CN (1) CN111242015B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767850A (en) * 2020-06-29 2020-10-13 北京百度网讯科技有限公司 Method and device for monitoring emergency, electronic equipment and medium
CN111767851A (en) * 2020-06-29 2020-10-13 北京百度网讯科技有限公司 Method and device for monitoring emergency, electronic equipment and medium
CN111860425B (en) * 2020-07-30 2021-04-09 清华大学 Deep multi-mode cross-layer cross fusion method, terminal device and storage medium
CN111950478B (en) * 2020-08-17 2021-07-23 浙江东鼎电子股份有限公司 Method for detecting S-shaped driving behavior of automobile in weighing area of dynamic flat-plate scale
CN112115819B (en) * 2020-09-03 2022-09-20 同济大学 Driving danger scene identification method based on target detection and TET (transient enhanced test) expansion index
CN112084968B (en) * 2020-09-11 2023-05-26 清华大学 Semantic characterization method and system based on air monitoring video and electronic equipment
CN112396093B (en) * 2020-10-29 2022-10-14 中国汽车技术研究中心有限公司 Driving scene classification method, device and equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013013487A1 (en) * 2011-07-26 2013-01-31 华南理工大学 Device and method for monitoring driving behaviors of driver based on video detection
CN106611169A (en) * 2016-12-31 2017-05-03 中国科学技术大学 Dangerous driving behavior real-time detection method based on deep learning
CN108764034A (en) * 2018-04-18 2018-11-06 浙江零跑科技有限公司 A kind of driving behavior method for early warning of diverting attention based on driver's cabin near infrared camera
CN109543601A (en) * 2018-11-21 2019-03-29 电子科技大学 A kind of unmanned vehicle object detection method based on multi-modal deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10296796B2 (en) * 2016-04-06 2019-05-21 Nec Corporation Video capturing device for predicting special driving situations

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013013487A1 (en) * 2011-07-26 2013-01-31 华南理工大学 Device and method for monitoring driving behaviors of driver based on video detection
CN106611169A (en) * 2016-12-31 2017-05-03 中国科学技术大学 Dangerous driving behavior real-time detection method based on deep learning
CN108764034A (en) * 2018-04-18 2018-11-06 浙江零跑科技有限公司 A kind of driving behavior method for early warning of diverting attention based on driver's cabin near infrared camera
CN109543601A (en) * 2018-11-21 2019-03-29 电子科技大学 A kind of unmanned vehicle object detection method based on multi-modal deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Rongjie Yu等.Driving Style Analyses for Car-sharing Users Utilizing Low-frequency Trajectory Data.2019 5th International Conference on Transportation Information and Safety (ICTIS).2019,927-933. *
Gao Zhen et al. Road traffic accident risk prediction model in a continuous data environment. China Journal of Highway and Transport. 2018, (04), 284-291. *

Also Published As

Publication number Publication date
CN111242015A (en) 2020-06-05

Similar Documents

Publication Publication Date Title
CN111242015B (en) Method for predicting driving dangerous scene based on motion profile semantic graph
CN109977812B (en) Vehicle-mounted video target detection method based on deep learning
CN109829403B (en) Vehicle anti-collision early warning method and system based on deep learning
CN110210475B (en) License plate character image segmentation method based on non-binarization and edge detection
CN111860274B (en) Traffic police command gesture recognition method based on head orientation and upper half skeleton characteristics
CN113421269A (en) Real-time semantic segmentation method based on double-branch deep convolutional neural network
CN111626170B (en) Image recognition method for railway side slope falling stone intrusion detection
CN106845458B (en) Rapid traffic sign detection method based on nuclear overrun learning machine
CN104915642A (en) Method and apparatus for measurement of distance to vehicle ahead
CN116383685A (en) Vehicle lane change detection method based on space-time interaction diagram attention network
CN114596316A (en) Road image detail capturing method based on semantic segmentation
CN113807298B (en) Pedestrian crossing intention prediction method and device, electronic equipment and readable storage medium
CN113724286A (en) Method and device for detecting saliency target and computer-readable storage medium
Muthalagu et al. Object and Lane Detection Technique for Autonomous Car Using Machine Learning Approach
Sumi et al. Frame level difference (FLD) features to detect partially occluded pedestrian for ADAS
CN116630702A (en) Pavement adhesion coefficient prediction method based on semantic segmentation network
CN116434203A (en) Anger driving state identification method considering language factors of driver
CN115761697A (en) Auxiliary intelligent driving target detection method based on improved YOLOV4
Pargi et al. Classification of different vehicles in traffic using RGB and Depth images: A Fast RCNN Approach
CN114758326A (en) Real-time traffic post working behavior state detection system
CN111310607B (en) Highway safety risk identification method and system based on computer vision and artificial intelligence
Dorrani Traffic Scene Analysis and Classification using Deep Learning
CN113701642A (en) Method and system for calculating appearance size of vehicle body
CN110738113A (en) object detection method based on adjacent scale feature filtering and transferring
JP7280426B1 (en) Object detection device and object detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant