CN110807352A - In-vehicle and out-vehicle scene visual analysis method for dangerous driving behavior early warning

Info

Publication number: CN110807352A (application CN201910808682.1A)
Authority: CN (China)
Prior art keywords: loss function, scene, vehicle, cab, road
Legal status: Granted
Application number: CN201910808682.1A
Other languages: Chinese (zh)
Other versions: CN110807352B (en)
Inventors: 缪其恒, 苏志杰, 孙焱标, 王江明, 许炜
Current Assignee: Zhejiang Zero Run Technology Co Ltd
Original Assignee: Zhejiang Zero Run Technology Co Ltd
Priority date / filing date: 2019-08-29
Application filed by Zhejiang Zero Run Technology Co Ltd
Priority to CN201910808682.1A
Publication of CN110807352A: 2020-02-18
Application granted; publication of CN110807352B: 2023-08-25
Current legal status: Active

Classifications

    • G06V 20/56: Context or environment of the image exterior to a vehicle, by using sensors mounted on the vehicle (G: Physics; G06: Computing; G06V: Image or video recognition or understanding; G06V 20/00: Scenes, scene-specific elements; G06V 20/50: Context or environment of the image)
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches (G06F: Electric digital data processing; G06F 18/00: Pattern recognition; G06F 18/24: Classification techniques)
    • G06V 20/597: Recognising the driver's state or behaviour, e.g. attention or drowsiness (G06V 20/59: Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions)
    • Y02T 10/40: Engine management systems (Y02T: Climate change mitigation technologies related to transportation; Y02T 10/10: Internal combustion engine based vehicles)

Abstract

The invention discloses a visual analysis method for in-vehicle and out-of-vehicle scenes for dangerous driving behavior early warning, comprising the following steps: S1, data acquisition, synchronization and preprocessing; S2, semantic coding of the road scene; S3, semantic coding of the cab scene; S4, classification of time-series dangerous driving behaviors; and S5, forward-operation model deployment and output post-processing. In this technical scheme, synchronized forward-view and cab scene images are taken as input; the road scene and cab scene are feature-coded by convolutional neural networks, the coded features are concatenated and fed into a recurrent-neural-network-based time-series behavior classifier, and the dangerous driving behavior category is output for use by the corresponding downstream early-warning algorithm, yielding accurate judgment of dangerous driving behaviors.

Description

In-vehicle and out-vehicle scene visual analysis method for dangerous driving behavior early warning
Technical Field
The invention relates to the field of driver assistance systems, and in particular to a visual analysis method for in-vehicle and out-of-vehicle scenes for dangerous driving behavior early warning.
Background
Road traffic accident statistics show that more than half of traffic accidents are caused by dangerous driver states or erroneous vehicle operation, and most of these human-caused accidents trace back to driver fatigue or distraction. Active safety systems and driver behavior analysis systems therefore have important application value. Conventional driving assistance systems for passenger and commercial vehicles issue dangerous-driving warnings either from key road scene parameters, such as time to collision (TTC) and time to line crossing (TLC), or from key cab scene parameters, such as eye opening and face orientation.
However, both types of system have their own advantages and disadvantages: a system based on road scene analysis cannot accurately reflect the driver's concentration on, and fatigue during, vehicle operation, while a system based on cab scene analysis is susceptible to the camera's installation angle and field of view, may falsely trigger warnings during some normal driving behaviors, and may fail to issue warnings for some untrained dangerous behaviors. The two systems are therefore potentially complementary: jointly analyzing the scenes inside and outside the vehicle can effectively improve the accuracy and reliability of dangerous-driving early warning. At present, however, no method based on joint visual analysis of in-vehicle and out-of-vehicle scenes has been applied in a driving-assistance (early-warning) system.
At present, no driver assistance system deployed in mass-production vehicles jointly analyzes the scenes inside and outside the vehicle before issuing warnings for the corresponding dangerous driving behaviors (e.g., insufficient headway, lane departure, fatigued driving, distraction). Existing driver assistance systems mainly issue driving-behavior warnings based on: i) vehicle dynamics parameters and steering signals; ii) vision system perception results; or iii) millimeter-wave radar perception results. The vision systems used can be divided into two types according to the scene they analyze: i) cab vision systems, which recognize some fatigue and distraction states, such as dozing and yawning, mainly through analysis of the driver's facial image features; ii) forward-view systems, which identify specific vehicle driving states, such as lane departure and pre-crash, mainly through road scene image feature analysis.
The potential problems of using either vision system alone for dangerous-driving early warning are: i) a system based on road scene analysis needs specific signal inputs to recognize driver intent (e.g., a lane change signaled by the turn indicator), easily triggers false alarms, and cannot accurately reflect the driver's concentration and fatigue during vehicle operation; ii) a system based on cab scene analysis is susceptible to the camera's installation angle and field of view, may falsely trigger warnings during some normal driving behaviors, and may fail to issue warnings for some untrained dangerous behaviors.
Disclosure of Invention
The invention aims to solve the problem of inaccurate judgment of dangerous driving conditions caused by the low degree of fusion between the cab vision system and the forward-view system, and provides an in-vehicle and out-of-vehicle scene visual analysis method for dangerous driving behavior early warning.
To achieve this technical purpose, the invention provides the following technical scheme. The in-vehicle and out-of-vehicle scene visual analysis method for dangerous driving behavior early warning comprises the following steps:
S1, data acquisition, synchronization and preprocessing;
S2, semantic coding of the road scene;
S3, semantic coding of the cab scene;
S4, classification of time-series dangerous driving behaviors;
S5, forward-operation model deployment and output post-processing.
In this scheme, the road scene and cab scene image inputs are first acquired and synchronized, and after preprocessing operations such as format conversion, ROI selection and scaling, the images are fed into the subsequent deep convolutional neural network analysis modules. Next, from the forward-view traffic scene image input, a road scene deep convolutional neural network obtained by offline training produces a semantic feature description of the forward scene, whose activated regions are image areas such as road boundaries, vehicles and pedestrians; the coded road scene semantic features are output, concatenated with the cab scene semantic features, and fed into the time-series analysis module. Then, from the cab scene image input, a cab scene deep convolutional neural network obtained by offline training produces a semantic feature description of the cab scene, whose activated regions are image areas such as the driver's face and upper body; the coded cab scene semantic features are output, concatenated with the road scene semantic features, and fed into the time-series analysis module. Finally, based on the concatenated in-vehicle and out-of-vehicle scene features, time-series behaviors are classified with a recurrent neural network model or a support vector machine, according to the requirements of the different early-warning applications.
The step S1 includes the following steps:
S11, road scene image preprocessing: road scene image data is acquired by the forward-view camera and stored in an image cache pool; after convolutional neural network feature description, it is concatenated and fed into the recurrent-neural-network-based time-series behavior classifier;
S12, cab scene image preprocessing: cab scene image data is acquired by the cab camera and stored in an image cache pool; after convolutional neural network feature description, it is concatenated and fed into the recurrent-neural-network-based time-series behavior classifier.
The step S2 includes the following steps:
S21, road scene neural network topology: the input is a 320×180×3 road scene RGB image, and the backbone network comprises convolution, pooling, normalization, activation and deconvolution basic operations;
S22, road scene training data set: a traffic scene data set is collected and manually annotated to generate multi-task training labels;
S23, off-line training of the road scene model: comprehensively considering the application of the road scene neural network in a driving assistance system and the compatibility and portability of the network features, a road scene feature loss function L_traffic is designed.

The loss function L_traffic is computed as follows:

L_traffic = k_1·L_obj + k_2·L_road

L_obj = Σ_i [ α·L_ce(cls_i, g_cls,i) + β·L_1s(loc_i, g_loc,i) + Σ_k λ_k·L_ce(att_k,i, g_att,k,i) ]

L_road = Σ_p L_ce(y_p, g_p)  (summed over image pixels p)

L_ce(y, g) = −[ g·log y + (1 − g)·log(1 − y) ]

L_1s(y, g) = 0.5·(y − g)² if |y − g| < 1, |y − g| − 0.5 otherwise

where L_obj is the target loss function, L_road the road-surface semantic loss function, and k_1, k_2 their respective weight coefficients; L_ce(·,·) is the cross-entropy loss and L_1s(·,·) the smooth-L1 loss. The target loss L_obj comprises, per target, a classification loss with weight α, a position regression loss with weight β, and attribute classification losses with weights λ_k; the road semantic loss L_road is the sum of pixel-level cross-entropies over the image.
The step S3 includes the following steps:
S31, cab scene neural network topology: the input is a 320×180×1 cab scene infrared image, and the backbone network comprises convolution, pooling, normalization, activation and deconvolution basic operations;
S32, cab scene training data set: a cab scene data set is collected and manually annotated to generate multi-task training labels;
S33, off-line training of the cab scene model: comprehensively considering the application of the cab scene neural network in a driving assistance system and the compatibility and portability of the network features, a cab scene feature loss function L_driver is designed.

The loss function L_driver is computed as follows:

L_driver = μ_1·L_fd + μ_2·L_gd + μ_3·L_hp

L_fd = α_1·L_fd,cls + α_2·L_fd,box + α_3·L_fd,kp

L_hp = β_1·L_hp,cls + β_2·L_hp,ang + β_3·L_hp,cons

L_gd = γ_1·L_gd,cls + γ_2·L_gd,ang + γ_3·L_gd,cons

where L_fd is the face detection loss, L_gd the eyeball orientation loss and L_hp the face orientation loss, weighted by μ_1, μ_2 and μ_3 respectively. The face detection loss L_fd comprises a face classification loss L_fd,cls, a face region regression loss L_fd,box and a key-point regression loss L_fd,kp, weighted by α_1, α_2 and α_3. The face orientation loss L_hp comprises a face orientation classification loss L_hp,cls, a face orientation angle regression loss L_hp,ang, and an orientation-classification/angle consistency loss L_hp,cons, weighted by β_1, β_2 and β_3. The eyeball orientation loss L_gd comprises an eyeball orientation classification loss L_gd,cls, an eyeball orientation angle regression loss L_gd,ang, and an eyeball orientation-classification/angle consistency loss L_gd,cons, weighted by γ_1, γ_2 and γ_3.
The step S4 includes the following steps:
S41, topology of the long short-term memory (LSTM) network;
S42, LSTM network training data set;
S43, off-line training of the LSTM network: the convolutional feature layer network parameters are solidified, a driving behavior classification loss function L_behavior is constructed, and L_behavior is optimized by mini-batch stochastic gradient descent.
The loss function L_behavior is computed as follows:

L_behavior = −(1/(N·T))·Σ_{i=1..N} Σ_{j=1..T} g_b,ij·log B_i,j

where B_i,j is the predicted behavior class score, g_b,ij is the ground-truth behavior class, N is the number of independent clips, and T is the number of frames per clip.
In step S5, the models include a road scene model, a cab scene model, and a dangerous driving behavior classification model.
The invention has the beneficial effects that:
1. Compared with a visual analysis system based on a single scene input, jointly analyzing the driver's state and the vehicle's driving state makes dangerous-behavior early warning more reliable;
2. Using deep learning, the behavior categories produced by end-to-end neural networks trained on massive driving data generalize across driver groups and driving habits, so dangerous-behavior warning is more robust than in early-warning systems based on specific rules and numerical criteria;
3. The scene activation regions coincide with the regions of interest of each system when applied independently, so the method can be integrated into a neural-network-based vision system architecture without introducing extra feature computation, giving good portability and extensibility;
4. Potentially dangerous vehicle motion and potential fatigued-driving states are identified simultaneously, and vehicle driving-state warnings are associated with the corresponding driver state, so that the driver's state in the corresponding dangerous driving situation is recognized and unnecessary false alarms during normal driving are reduced.
Drawings
Fig. 1 is a flowchart of a method for visually analyzing scenes inside and outside a vehicle for early warning of dangerous driving behaviors according to the present invention.
Fig. 2 is a specific method flow of the in-vehicle and out-vehicle scene visual analysis method for early warning of dangerous driving behaviors in the present invention.
FIG. 3 is a schematic diagram of a deep neural network architecture suitable for use in the present invention.
Detailed Description
For a better understanding of the objects, technical solutions and advantages of the present invention, the invention is described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiment described here is only a preferred embodiment, intended to explain the invention rather than limit its scope; all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of the present invention.
Embodiment: as shown in fig. 1, the flow of the in-vehicle and out-of-vehicle scene visual analysis method for dangerous driving behavior early warning comprises the following steps:
S1, data acquisition, synchronization and preprocessing;
S2, semantic coding of the road scene;
S3, semantic coding of the cab scene;
S4, classification of time-series dangerous driving behaviors;
S5, forward-operation model deployment and output post-processing.
In this embodiment, the road scene and cab scene image inputs are first acquired and synchronized, and after preprocessing operations such as format conversion, ROI selection and scaling, the images are fed into the subsequent deep convolutional neural network analysis modules. Next, from the forward-view traffic scene image input, a road scene deep convolutional neural network obtained by offline training produces a semantic feature description of the forward scene, whose activated regions are image areas such as road boundaries, vehicles and pedestrians; the coded road scene semantic features are output, concatenated with the cab scene semantic features, and fed into the time-series analysis module. Then, from the cab scene image input, a cab scene deep convolutional neural network obtained by offline training produces a semantic feature description of the cab scene, whose activated regions are image areas such as the driver's face and upper body; the coded cab scene semantic features are output, concatenated with the road scene semantic features, and fed into the time-series analysis module. Finally, based on the concatenated in-vehicle and out-of-vehicle scene features, time-series behaviors are classified with a recurrent neural network model or a support vector machine, according to the requirements of the different early-warning applications.
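The cascade just described can be summarized, for reference, in the following minimal PyTorch-style sketch; the stand-in encoders, feature sizes and the five behavior classes are illustrative assumptions rather than the patent's reference implementation.

import torch
import torch.nn as nn

def make_encoder(in_ch: int, out_dim: int) -> nn.Module:
    # Tiny stand-in for the conv/pool/BN/ReLU scene backbones of steps S2/S3.
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, out_dim))

class DualSceneBehaviorNet(nn.Module):
    def __init__(self, feat: int = 256, hidden: int = 128, n_classes: int = 5):
        super().__init__()
        self.road_encoder = make_encoder(3, feat)   # RGB road scene encoder
        self.cab_encoder = make_encoder(1, feat)    # infrared cab scene encoder
        self.temporal = nn.LSTM(2 * feat, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, road_seq: torch.Tensor, cab_seq: torch.Tensor) -> torch.Tensor:
        # road_seq: (B, T, 3, 180, 320); cab_seq: (B, T, 1, 180, 320)
        B, T = road_seq.shape[:2]
        road_f = self.road_encoder(road_seq.flatten(0, 1)).view(B, T, -1)
        cab_f = self.cab_encoder(cab_seq.flatten(0, 1)).view(B, T, -1)
        fused = torch.cat([road_f, cab_f], dim=-1)  # per-frame concatenation of scene features
        h, _ = self.temporal(fused)                 # recurrent time-series analysis
        return self.classifier(h[:, -1])            # dangerous-driving-behavior logits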
Fig. 2 shows the specific flow of the in-vehicle and out-of-vehicle scene visual analysis method for dangerous driving behavior early warning.
the invention provides a vehicle interior and exterior scene vision joint analysis method for early warning of dangerous driving behaviors, which inputs road scene images (colors) acquired by a forward-looking camera and cab scene images (infrared) acquired by a cab camera, and after CNN (convolutional neural network) feature description, the road scene images and the cab scene images are cascaded and sent to a time sequence behavior classifier based on a recurrent neural network, and dangerous driving behavior categories are output. The horizontal field angle of the front-view camera is 50 degrees, the left and right centers of the front-view camera are arranged at the position of the front windshield with the height of 1-1.2 meters, and the direction of the front-view camera is horizontal and forward; the horizontal field of view angle of driver' S cabin camera is 50, adopts infrared light filling, installs in A post department towards driver region, S1, data acquisition, synchronization and preliminary treatment: and (3) offline adjusting and configuring acquisition parameters such as exposure, gain and the like, marking a system time stamp after acquiring image original data, sending the image original data into respective preprocessing cache queues according to the same sequence number after matching the time stamp, and inputting the image original data into a neural network operation unit after the following preprocessing operations.
S11, road scene image preprocessing: the frame at the head of the road scene image cache pool (YUV format, 1280×720) is read, converted to RGB, cropped to the predefined ROI, and scaled to the road scene network input size (320×180×3 by default, per the corresponding network input interface).
S12, cab scene image preprocessing: the frame at the head of the cab scene image cache pool (YUV format) is read, the Y-channel data is extracted, and the predefined ROI is cropped and scaled to the cab scene network input size (320×180×1 by default, per the corresponding network input interface).
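An OpenCV rendering of S11/S12 might look as follows; the ROI rectangles and the NV12 frame layout are placeholder assumptions, while the 320×180 target sizes follow the network inputs above.

import cv2
import numpy as np

def preprocess_road(yuv: np.ndarray, roi=(0, 120, 1280, 480)) -> np.ndarray:
    # yuv: raw 1280x720 frame, assumed NV12-style layout of shape (1080, 1280)
    rgb = cv2.cvtColor(yuv, cv2.COLOR_YUV2RGB_NV12)
    x, y, w, h = roi                                       # placeholder ROI
    return cv2.resize(rgb[y:y + h, x:x + w], (320, 180))   # 320x180x3 network input

def preprocess_cab(yuv: np.ndarray, roi=(0, 0, 1280, 720)) -> np.ndarray:
    y_plane = yuv[:720, :]                                 # Y channel = first H rows of a planar frame
    x, y, w, h = roi
    return cv2.resize(y_plane[y:y + h, x:x + w], (320, 180))  # 320x180x1 network input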
S2, road scene semantic coding: the road scene semantic coding neural network is shown in fig. 2; it takes the forward-view traffic scene image as input and outputs activated road scene semantic features, via offline data acquisition, neural network model training and online model inference.
S21, road scene neural network topology: the input is a 320×180×3 road scene RGB image; the backbone consists mainly of basic operations such as convolution (conv), pooling, normalization (BN), activation (ReLU) and deconvolution (deconv). The scene features comprise feature descriptions at the 1/4, 1/16 and 1/64 scales of the network input; the branches over which offline training computes the loss comprise a target detection branch, a road semantic segmentation branch and a target attribute classification branch.
S22, road scene training data set: a traffic scene data set is collected, with time-series discrete samples covering different times (day, night, etc.), weather (sunny, cloudy, rainy, etc.) and driving scenes (urban, highway, tunnel, etc.), and manually annotated to generate multi-task training labels, chiefly target detection labels (target-box form), lane boundary labels (semantic-layer form) and drivable area labels (semantic-layer form). The target-box form comprises the target category (0-other, 1-small vehicle, 2-large vehicle, 3-pedestrian, 4-non-motorized vehicle, 5-signal light, 6-signboard), position (X, Y, W, H) and other custom attributes (e.g., signboard category, vehicle 3D attributes).
S23, off-line training of the road scene model: comprehensively considering the application of the road scene neural network in a driving assistance system (including recognition of other traffic participants, traffic signal signage, and lane and road boundaries) and the compatibility and portability of the network features, a road scene feature loss function L_traffic is designed as follows:

L_traffic = k_1·L_obj + k_2·L_road

L_obj = Σ_i [ α·L_ce(cls_i, g_cls,i) + β·L_1s(loc_i, g_loc,i) + Σ_k λ_k·L_ce(att_k,i, g_att,k,i) ]

L_road = Σ_p L_ce(y_p, g_p)  (summed over image pixels p)

L_ce(y, g) = −[ g·log y + (1 − g)·log(1 − y) ]

L_1s(y, g) = 0.5·(y − g)² if |y − g| < 1, |y − g| − 0.5 otherwise

where L_obj is the target loss function, L_road the road-surface semantic loss function, and k_1, k_2 their respective weight coefficients; L_ce(·,·) is the cross-entropy loss and L_1s(·,·) the smooth-L1 loss. The target loss L_obj comprises, per target, a classification loss with weight α, a position regression loss with weight β, and attribute classification losses with weights λ_k; the road semantic loss L_road is the sum of pixel-level cross-entropies over the image.
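Assembled in code, the multi-task loss might read as below; head outputs are assumed to be probabilities (post-sigmoid/softmax), as the form of L_ce implies, and every weight is a free hyperparameter.

import torch
import torch.nn.functional as F

def bce(y: torch.Tensor, g: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # L_ce(y, g) = -[g*log(y) + (1 - g)*log(1 - y)], summed
    y = y.clamp(eps, 1 - eps)
    return -(g * y.log() + (1 - g) * (1 - y).log()).sum()

def road_loss(cls, g_cls, loc, g_loc, atts, g_atts, seg, g_seg,
              k1=1.0, k2=1.0, alpha=1.0, beta=1.0, lambdas=None):
    # L_obj: per-target classification + smooth-L1 localization + attribute terms
    l_obj = alpha * bce(cls, g_cls) + beta * F.smooth_l1_loss(loc, g_loc, reduction='sum')
    for lam, a, ga in zip(lambdas or [1.0] * len(atts), atts, g_atts):
        l_obj = l_obj + lam * bce(a, ga)
    l_road = bce(seg, g_seg)          # pixel-level cross-entropy sum over the image
    return k1 * l_obj + k2 * l_road   # L_traffic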
The training data set of step S22 is augmented with color and geometric transformations, etc.; the loss gradient is back-propagated using mini-batch gradient descent, and the corresponding neural network weight parameters are updated.
S3, cab scene semantic coding: fig. 3 shows a schematic of the applicable deep neural network architecture; the cab scene semantic coding network takes the cab scene image as input and outputs activated cab scene semantic features, via offline data acquisition, neural network model training and online model inference.
S31, cab scene neural network topology: the input is a 320×180×1 cab scene infrared image; the backbone, similar to the road scene network, consists of basic operations such as convolution (conv), pooling, normalization (BN), activation (ReLU) and deconvolution (deconv). The cab scene features mainly comprise feature descriptions at the 1/4, 1/8 and 1/16 scales of the network input.
S32, cab scene training data set: a cab scene data set is collected, with time-series discrete samples covering different times (day, night, etc.), weather (sunny, cloudy, rainy, etc.), cab camera installation positions (center, A-pillar, etc.), cab interior layouts (sedans, SUVs, etc.) and driver identities (drivers of different appearance, gender, etc.), and manually annotated to generate multi-task training labels, chiefly face region and key-point labels, face orientation labels and eyeball orientation labels. The face region label uses the same format as an ordinary target-box label; the face key-point label comprises 13 key points (image coordinates of 8 eye points, 1 nose-tip point and 4 mouth points); the face orientation label is the three-degree-of-freedom head rotation angle in the camera coordinate system; and the eyeball orientation label is a two-degree-of-freedom angle (i.e., the up-down and left-right rotation of the eyeball relative to the face plane).
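For illustration, one annotation record could be held in a structure like the following; the field names are hypothetical, while the contents follow the label definitions above.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class CabFrameLabel:
    face_box: Tuple[float, float, float, float]  # X, Y, W, H in image coordinates
    keypoints: List[Tuple[float, float]]         # 13 points: 8 eye, 1 nose tip, 4 mouth
    head_pose: Tuple[float, float, float]        # 3-DOF head rotation in the camera frame
    gaze: Tuple[float, float]                    # up-down and left-right eyeball angles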
S33, off-line training of the cab scene model: comprehensively considering the application of the cab scene neural network in a driving assistance system (including detection of fatigued and distracted driving behaviors) and the compatibility and portability of the network features, a cab scene feature loss function L_driver is designed as follows:

L_driver = μ_1·L_fd + μ_2·L_gd + μ_3·L_hp

L_fd = α_1·L_fd,cls + α_2·L_fd,box + α_3·L_fd,kp

L_hp = β_1·L_hp,cls + β_2·L_hp,ang + β_3·L_hp,cons

L_gd = γ_1·L_gd,cls + γ_2·L_gd,ang + γ_3·L_gd,cons

where L_fd is the face detection loss, L_gd the eyeball orientation loss and L_hp the face orientation loss, weighted by μ_1, μ_2 and μ_3 respectively. L_fd comprises a face classification loss L_fd,cls, a face region regression loss L_fd,box and a key-point regression loss L_fd,kp, weighted by α_1, α_2 and α_3; L_hp comprises a face orientation classification loss L_hp,cls, a face orientation angle regression loss L_hp,ang and an orientation-classification/angle consistency loss L_hp,cons, weighted by β_1, β_2 and β_3; L_gd comprises an eyeball orientation classification loss L_gd,cls, an eyeball orientation angle regression loss L_gd,ang and an eyeball orientation-classification/angle consistency loss L_gd,cons, weighted by γ_1, γ_2 and γ_3.
The training data set of step S32 is augmented with color and geometric transformations, etc.; the loss gradient is back-propagated using mini-batch gradient descent, and the corresponding neural network weight parameters are updated.
S4, time-series dangerous driving behavior classification: the coded in-vehicle and out-of-vehicle scene features are concatenated as the behavior feature description for a single time step; behavior features over a predefined clip length are classified with a long short-term memory network (LSTM, shown in fig. 3), outputting the predefined driving behavior categories (0-normal driving, 1-lane departure, 2-potential collision with lead vehicle, 3-fatigued driving, 4-inattentive driving).
S41, LSTM network topology: the number of time series recursion units is 12 (behavior corresponding to time series data of approximately 1 second at a processing speed of 12.5 frames/second), and the formula used is as follows:
ft=sigmoid(σf(xt,ht-1))
it=sigmoid(σi(xt,ht-1))
ot=sigmoid(σo(xt,ht-1))
ct=ft·ct-1+it·tanh(σc(xt,ht-1))
ht=ot·tanh(ct)
in the formula: x is the number oftAs an input vector, ftTo forget the gate vector, itTo update the gate vector, htIs a hidden layer vector, otTo output the gate vector, ctIs a tuple state vector.
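These gate equations transcribe directly into code; the cell below treats σ_f, σ_i, σ_o and σ_c as learned affine maps on the concatenated (x_t, h_{t−1}), with illustrative sizes.

import torch
import torch.nn as nn

class LSTMCellFromEquations(nn.Module):
    def __init__(self, x_dim: int, h_dim: int):
        super().__init__()
        self.sigma = nn.ModuleDict({g: nn.Linear(x_dim + h_dim, h_dim)
                                    for g in ('f', 'i', 'o', 'c')})

    def forward(self, x_t, h_prev, c_prev):
        z = torch.cat([x_t, h_prev], dim=-1)
        f_t = torch.sigmoid(self.sigma['f'](z))                     # forget gate
        i_t = torch.sigmoid(self.sigma['i'](z))                     # update gate
        o_t = torch.sigmoid(self.sigma['o'](z))                     # output gate
        c_t = f_t * c_prev + i_t * torch.tanh(self.sigma['c'](z))   # cell state
        h_t = o_t * torch.tanh(c_t)                                 # hidden state
        return h_t, c_t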
S42, LSTM network training data set: synchronized in-vehicle and out-of-vehicle image data exhibiting the predefined driving behaviors of step S4 is acquired using the acquisition and synchronization scheme of step S1, covering the scenes and conditions described for the data sets of steps S2 and S3; video clips are cut at a 12.5 fps frame rate (2 seconds per event), each clip corresponding to one behavior label.
S43, LSTM network offline training: the convolutional feature layer network parameters are solidified (i.e., no gradient is back-propagated into them), and a driving behavior classification loss function L_behavior is constructed and optimized by mini-batch stochastic gradient descent:

L_behavior = −(1/(N·T))·Σ_{i=1..N} Σ_{j=1..T} g_b,ij·log B_i,j

where B_i,j is the predicted behavior class score, g_b,ij is the ground-truth behavior class, N is the number of independent clips, and T is the number of frames per clip.
S5, forward-operation model deployment and model output post-processing: as described in steps S2, S3 and S4, the model comprises three branches, namely the road scene model, the cab scene model and the dangerous driving behavior classification model. The training-only branches of steps S2 and S3 serve solely to compute the loss function and back-propagate gradients; in forward operation only the corresponding scene feature layers are retained. After compression operations such as data quantization and sparsification are applied to the model parameters according to the characteristics of the front-end computing platform, the scene features are concatenated in the preset feature-channel order and passed to the long short-term memory module for online time-series driving behavior classification.
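Forward-only deployment can be sketched as below; the function signatures are assumptions, and quantization/sparsification are taken to have been applied to the encoders beforehand.

import torch

@torch.no_grad()
def classify_online(road_encoder, cab_encoder, lstm, head, road_img, cab_img, state=None):
    road_f = road_encoder(road_img)              # scene feature layers only, training branches stripped
    cab_f = cab_encoder(cab_img)
    feat = torch.cat([road_f, cab_f], dim=-1)    # preset feature-channel order
    out, state = lstm(feat.unsqueeze(1), state)  # one time step of the LSTM module
    return head(out[:, -1]), state               # behavior logits plus carried recurrent state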
The embodiments above do not limit the scope of the present invention; all equivalent changes made according to the shape, structure and principle of the present invention fall within its protection scope.

Claims (9)

1. An in-vehicle and out-of-vehicle scene visual analysis method for dangerous driving behavior early warning, characterized in that it comprises the following steps:
S1, data acquisition, synchronization and preprocessing;
S2, semantic coding of the road scene;
S3, semantic coding of the cab scene;
S4, classification of time-series dangerous driving behaviors;
S5, forward-operation model deployment and output post-processing.
2. The in-vehicle and out-vehicle scene visual analysis method for dangerous driving behavior early warning as claimed in claim 1, wherein:
the step S1 includes the following steps:
S11, road scene image preprocessing: road scene image data is acquired by the forward-view camera and stored in an image cache pool; after convolutional neural network feature description, it is concatenated and fed into the recurrent-neural-network-based time-series behavior classifier;
S12, cab scene image preprocessing: cab scene image data is acquired by the cab camera and stored in an image cache pool; after convolutional neural network feature description, it is concatenated and fed into the recurrent-neural-network-based time-series behavior classifier.
3. The in-vehicle and out-vehicle scene visual analysis method for dangerous driving behavior early warning as claimed in claim 1, wherein:
the step S2 includes the following steps:
S21, road scene neural network topology: the input is a 320×180×3 road scene RGB image, and the backbone network comprises convolution, pooling, normalization, activation and deconvolution basic operations;
S22, road scene training data set: a traffic scene data set is collected and manually annotated to generate multi-task training labels;
S23, off-line training of the road scene model: comprehensively considering the application of the road scene neural network in a driving assistance system and the compatibility and portability of the network features, a road scene feature loss function L_traffic is designed.
4. The in-vehicle and out-vehicle scene visual analysis method for dangerous driving behavior early warning as claimed in claim 3, wherein:
the step loss function LtrafficThe following calculation formula is used:
Ltraffic=k1Lobj+k2Lroad
Figure FDA0002184413460000011
Figure FDA0002184413460000021
Lce(y,g)=glogy+(1-g)log(1-y)
Figure FDA0002184413460000022
in the formula, LobjAs a function of the target loss, LroadFor the road surface semantic loss function, k1 is the target loss function LobjK2 is the road surface semantic loss function LroadWeight coefficient of (1), L1s(loci,gloc,i) And Lce(attki,gatt,ki) As a cross-entropy loss function, L1s(y, g) is smoothL1 loss function, target loss function LobjIncluding classification loss function, position regression loss function and attribute classification loss function of each target, α is weight coefficient of classification loss function, β is weight coefficient of position regression loss function, lambdajFor weighting coefficients of attribute classification loss function, road semantic loss function LroadResulting from the image pixel level cross entropy summation.
5. The in-vehicle and out-vehicle scene visual analysis method for dangerous driving behavior early warning as claimed in claim 1, wherein:
the step S3 includes the following steps:
S31, cab scene neural network topology: the input is a 320×180×1 cab scene infrared image, and the backbone network comprises convolution, pooling, normalization, activation and deconvolution basic operations;
S32, cab scene training data set: a cab scene data set is collected and manually annotated to generate multi-task training labels;
S33, off-line training of the cab scene model: comprehensively considering the application of the cab scene neural network in a driving assistance system and the compatibility and portability of the network features, a cab scene feature loss function L_driver is designed.
6. The in-vehicle and out-vehicle scene visual analysis method for dangerous driving behavior early warning as claimed in claim 5, wherein:
the loss function LdriverThe following calculation formula is used:
Ldriver=μ1Lfd2Lgd3Lhp
Figure FDA0002184413460000032
Figure FDA0002184413460000033
in the formula: l isfdDetecting a loss function for a face, LgdAs a function of eye orientation loss, LhpAs a function of facial orientation loss, μ1Detecting a loss function L for a facefdWeight coefficient of (d), mu2As a function of eye orientation loss LgdWeight coefficient of (d), mu3As a function of facial orientation loss LhpThe face detection loss function LfdIncluding face classification loss function, face region regression loss function and key point regression loss function, α1Weighting coefficients for the face classification loss function, α2Loss function which is a regression loss function for the face region, α3Face orientation regression loss function L as weight coefficient of the key point regression loss functionhpIncluding face orientation classification loss function, face orientation angle regression loss function, and orientation classification and angle consistency loss function, β1Weighting coefficients for face orientation classification loss functions, β2Weighting coefficients for the face orientation angle regression loss function, β3Eye orientation regression loss function L as weight coefficient of orientation classification and angle consistency loss functiongdComprises an eyeball orientation classification loss function, an eyeball orientation angle regression loss function, an eyeball orientation classification and angle consistency loss function, gamma1Weight coefficient, gamma, of a function of eye orientation classification loss2Weight coefficient, gamma, of the regression loss function for the eyeball orientation angle3Weighting coefficients of the eyeball orientation classification and angle consistency loss function.
7. The in-vehicle and out-vehicle scene visual analysis method for dangerous driving behavior early warning as claimed in claim 1, wherein:
the step S4 includes the following steps:
S41, topology of the long short-term memory (LSTM) network;
S42, LSTM network training data set;
S43, off-line training of the LSTM network: the convolutional feature layer network parameters are solidified, a driving behavior classification loss function L_behavior is constructed, and L_behavior is optimized by mini-batch stochastic gradient descent.
8. The in-vehicle and out-vehicle scene visual analysis method for dangerous driving behavior early warning as claimed in claim 7, wherein:
the loss function LbehaviorThe following calculation formula is used:
Figure FDA0002184413460000041
in the formula: b isi,jTo predict behavior classes, gb,ijThe behavior category true value is shown, N is the number of independent fragments, and T is the number of independent fragment frames.
9. The in-vehicle and out-vehicle scene visual analysis method for dangerous driving behavior early warning as claimed in claim 1, wherein:
in step S5, the models include a road scene model, a cab scene model, and a dangerous driving behavior classification model.
CN201910808682.1A 2019-08-29 2019-08-29 In-vehicle scene visual analysis method for dangerous driving behavior early warning Active CN110807352B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910808682.1A | 2019-08-29 | 2019-08-29 | In-vehicle scene visual analysis method for dangerous driving behavior early warning

Publications (2)

Publication Number | Publication Date
CN110807352A | 2020-02-18
CN110807352B (en) | 2023-08-25

Family

ID=69487468

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910808682.1A | In-vehicle scene visual analysis method for dangerous driving behavior early warning | 2019-08-29 | 2019-08-29

Country Status (1)

Country | Link
CN | CN110807352B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103818256A (en) * 2012-11-16 2014-05-28 西安众智惠泽光电科技有限公司 Automobile fatigue-driving real-time alert system
US20160150070A1 (en) * 2013-07-18 2016-05-26 Secure4Drive Communication Ltd. Method and device for assisting in safe driving of a vehicle
CN103770733A (en) * 2014-01-15 2014-05-07 中国人民解放军国防科学技术大学 Method and device for detecting safety driving states of driver
WO2018039646A1 (en) * 2016-08-26 2018-03-01 Netradyne Inc. Recording video of an operator and a surrounding visual field
CN110290945A (en) * 2016-08-26 2019-09-27 奈特雷代恩股份有限公司 Record the video of operator and around visual field
US20190213429A1 (en) * 2016-11-21 2019-07-11 Roberto Sicconi Method to analyze attention margin and to prevent inattentive and unsafe driving
CN108319909A (en) * 2018-01-29 2018-07-24 清华大学 A kind of driving behavior analysis method and system
CN108764034A (en) * 2018-04-18 2018-11-06 浙江零跑科技有限公司 A kind of driving behavior method for early warning of diverting attention based on driver's cabin near infrared camera

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112396235A (en) * 2020-11-23 2021-02-23 浙江天行健智能科技有限公司 Traffic accident occurrence time prediction modeling method based on eyeball motion tracking
CN112396235B (en) * 2020-11-23 2022-05-03 浙江天行健智能科技有限公司 Traffic accident occurrence time prediction modeling method based on eyeball motion tracking
CN113221613A (en) * 2020-12-14 2021-08-06 国网浙江宁海县供电有限公司 Power scene early warning method for generating scene graph auxiliary modeling context information
CN113221613B (en) * 2020-12-14 2022-06-28 国网浙江宁海县供电有限公司 Power scene early warning method for generating scene graph auxiliary modeling context information
CN113179389A (en) * 2021-04-15 2021-07-27 江苏濠汉信息技术有限公司 System and method for identifying crane jib of power transmission line dangerous vehicle
CN113255519A (en) * 2021-05-25 2021-08-13 江苏濠汉信息技术有限公司 Crane lifting arm identification system and multi-target tracking method for power transmission line dangerous vehicle
CN113537115A (en) * 2021-07-26 2021-10-22 东软睿驰汽车技术(沈阳)有限公司 Method and device for acquiring driving state of driver and electronic equipment

Also Published As

Publication number Publication date
CN110807352B (en) 2023-08-25


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
CB02: Change of applicant information
    Address after: 310051 1st and 6th floors, no.451 Internet of things street, Binjiang District, Hangzhou City, Zhejiang Province
    Applicant after: Zhejiang Zero run Technology Co.,Ltd.
    Address before: 310051 1st and 6th floors, no.451 Internet of things street, Binjiang District, Hangzhou City, Zhejiang Province
    Applicant before: ZHEJIANG LEAPMOTOR TECHNOLOGY Co.,Ltd.
GR01: Patent grant