CN113569817A - Driver distraction detection method based on image area positioning mechanism

Info

Publication number: CN113569817A
Application number: CN202111110059.2A
Authority: CN (China)
Prior art keywords: driver, image, state, neural network, behavior
Legal status: Granted
Other languages: Chinese (zh)
Other versions: CN113569817B (en)
Inventors: 赵磊, 孙浩然, 罗映, 徐楠, 刘建华, 闫法义, 贝太学, 李新海, 张宗喜
Original and current assignee: Shandong Jianzhu University
Application filed by Shandong Jianzhu University; priority to CN202111110059.2A
Publication of application CN113569817A; application granted; publication of grant CN113569817B
Legal status: Active

Classifications

    • G06F 18/2415: Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate (G: Physics; G06F: Electric digital data processing; G06F 18/00: Pattern recognition; G06F 18/20: Analysing; G06F 18/24: Classification techniques)
    • G06N 3/08: Learning methods (G06N: Computing arrangements based on specific computational models; G06N 3/00: Computing arrangements based on biological models; G06N 3/02: Neural networks)
    • Y02T 10/40: Engine management systems (Y02T: Climate change mitigation technologies related to transportation; Y02T 10/10: Internal combustion engine [ICE] based vehicles)


Abstract

A driver distraction detection method based on an image area positioning mechanism. The areas that require attention in each driver behavior state are first obtained from images by manual calibration. These areas are then combined with the class activation mapping of a neural network to establish a region-enhancement-driven model optimization function, and the network is trained with this function. During detection, the model therefore automatically locates the key areas in a driver image according to the driver's behavior characteristics. This solves the problem of automatically extracting key positions and features in image-based driver behavior detection and improves the detection accuracy of the model.

Description

Driver distraction detection method based on image area positioning mechanism
Technical Field
The invention relates to the technical field of driver state recognition, and in particular to a driver distraction detection method based on an image area positioning mechanism.
Background
With the development of science and technology, intelligent electronic devices such as smartphones, tablet computers and in-vehicle information systems have greatly increased the probability of driver distraction, creating safety hazards that lead to traffic accidents and endanger life and property. Statistically, nearly 1.25 million people die in traffic accidents each year, and nearly one fifth of accidents are caused by driver distraction. With progress in artificial intelligence, autonomous driving technology has developed rapidly; however, current conditional autonomous driving systems still require a driver who is ready to take over in time. The U.S. National Transportation Safety Board counted 37 crashes involving Uber autonomous test vehicles in the 18 months between 2018 and 2019. An accurate and effective driver distraction detection system is therefore of great significance for improving traffic safety.
Driver distraction detection methods can be divided into three categories: those based on the driver's physiological information, those based on driving operation information, and those based on visual information. When a driver's mental state changes, his or her physiological signals change as well; however, most physiological sensors must be worn on the driver's body and therefore affect the driving experience. Operation-based methods collect information from the steering wheel, accelerator and brake pedal, analyze the driving behavior in different states, and infer whether the driver is in a dangerous driving state; their recognition accuracy, however, is often affected by the driver's operating habits and skill and by traffic conditions. Vision-based methods can extract visual image information about the driver non-invasively and without external interference, which is why visual features are the most widely used information in driver distraction detection. Vision-based methods in turn fall into two categories. The first classifies the raw image directly to detect the driver's state and behavior, but is often disturbed by factors in the image other than the driver. The second uses a target detection or image segmentation model to extract key areas or features such as the hands, head and upper body from the driver image and feeds the extracted information into a recognition model; the localization of these areas or features, however, is limited by the accuracy of the algorithm and is prone to false detection.
Disclosure of Invention
In order to overcome the shortcomings of the above technologies, the invention provides a driver distraction detection method based on an image area positioning mechanism which, without increasing model complexity, solves the problem of automatically extracting key positions and features in image-based driver behavior detection and improves the detection accuracy of the model.
The technical scheme adopted by the invention to overcome the above technical problems is as follows:
A driver distraction detection method based on an image area positioning mechanism comprises the following steps:
a) acquiring visual images of different driver behaviors and, according to the behavior state of the driver in each visual image, determining the key areas that require attention in each state by automatic positioning and manual adjustment;
b) establishing probability heat maps of the key areas in the driver visual images with a Gaussian model, and building a driver behavior detection data set based on area positioning;
c) establishing a neural network model, constructing a cost function driven by class activation mapping and the key-area probability heat maps, and training the neural network with this cost function to obtain an optimized neural network model;
d) installing a camera in the vehicle, acquiring real-time side-view images of the driver, inputting them into the optimized neural network model, and extracting the model's output probability to obtain the driver's behavior state.
Further, in step a) a camera is installed in the vehicle, videos of the different driver behaviors are collected with the camera, the videos are converted into visual images frame by frame, and the images are stored as sample images.
Further, the behavior states of the driver in step a) are defined as: normal driving; using a smartphone or tablet computer; making a phone call; talking with the co-driver; drinking; and operating the center-console electronic equipment. When the driver is driving normally, the key areas that require attention are the driver's hands and upper arms in the visual image; when the driver is using a smartphone or tablet, the key area is the phone or tablet in the driver's hand; when the driver is making a call, the key areas are the driver's mouth and the phone; when the driver is talking with the co-driver, the key areas are the driver's mouth and face; when the driver is drinking, the key area is the container held by the driver; and when the driver is operating the center-console electronic equipment, the key areas are the driver's hand and the center-console equipment.
Further, step a) comprises:
a-1) finding the driver's limb movement areas while performing the different behaviors in the sample images, and establishing the key areas for the different behavior states in the driver images;
a-2) based on the established key areas, automatically acquiring the positions of the driver's upper-arm and head skeleton points in the sample image by a skeleton-point positioning method, and drawing a rectangular frame centered on each skeleton point to obtain the initial position of the key area in the image;
a-3) manually correcting the position and size of the rectangular frame to obtain the final key area of the image.
Further, step b) comprises the following steps:
b-1) based on the key area, establishing a two-dimensional Gaussian model $G(z)$ by the formula
$$G(z) = \alpha \exp\left(-\frac{1}{2}(z-\mu)^{T}\Sigma^{-1}(z-\mu)\right)$$
where $\alpha$ is the normalization factor, $\Sigma$ is the covariance matrix, $T$ denotes the transpose, $z$ is the variable of the two-dimensional Gaussian model, and $\mu = \left(\frac{x_{max}+x_{min}}{2}, \frac{y_{max}+y_{min}}{2}\right)$ is the key location in the driver behavior image; $x$ is the abscissa of the key area, $y$ is the ordinate of the key area, $x_{max}$ and $x_{min}$ are the maximum and minimum of $x$, and $y_{max}$ and $y_{min}$ are the maximum and minimum of $y$;
b-2) converting the two-dimensional Gaussian model $G(z)$ into a two-dimensional image to obtain the probability heat map of the key area in the driver visual image;
b-3) repeating steps b-1) and b-2) for every image sample, and storing the probability heat maps of the key areas of all driver visual images to obtain the driver behavior detection data set based on area positioning.
Further, step c) comprises the steps of:
c-1) establishing a ResNeXt neural network model, and adopting a global average pooling layer on the top layer of the neural network;
c-2) establishing a SoftMax classifier at the top layer of the global pooling layer to output a driver behavior prediction probability value;
c-3) calculating the class activation mapping of each driver behavior state class output by the top layer of the neural network by the formula
$$M_{c} = \sum_{k=1}^{n} w_{k}^{c} f_{k}$$
where $M_{c}$ is the class-$c$ heat map, $n$ is the number of neurons in the top layer, $w_{k}^{c}$ is the top-layer weight parameter, and $f_{k}$ is the mapping value of the layer preceding the global average pooling layer;
c-4) extracting the driver behavior prediction probability values and the class activation mapping of the neural network model, and calculating the region-enhanced optimization function $L_{area}$ by the formula
$$L_{area} = -\lambda \sum_{i,j}\left[\varphi(M_{\hat{c}}) \odot \hat{M}_{\hat{c}}\right]_{i,j}$$
where $\varphi$ is a nonlinear transformation function, $M_{c}$ is the class activation mapping of the $c$-th behavior state class of the neural network, $M_{\hat{c}}$ is the class activation mapping of the class $\hat{c}$ that coincides with the true behavior state class, $\hat{c}$ is the true behavior class, $\lambda$ is a coefficient, $\odot$ is the Hadamard product, and $\hat{M}_{\hat{c}}$ is the predefined class-$\hat{c}$ activation mapping;
c-5) calculating the cost function $L$ by the formula
$$L = L_{cls} + L_{area}$$
where $L_{cls} = -\sum_{i} y_{i}\log p_{i}$ is the cost function based on the driver state value, $p$ is the output value of the ResNeXt neural network model, and $y$ is the calibration value;
c-6) training the ResNeXt neural network model with the cost function $L$ until convergence, and establishing the hyper-parameters of the model through cross-validation.
Further, in step d) the camera is mounted on the vehicle roof above and to the right of the driver.
Further, step d) comprises the following steps:
d-1) reading the ResNeXt neural network model trained in c-6) as a detection model;
d-2) inputting each frame of image of the driver acquired by the camera into the detection model;
d-3) obtaining the prediction probability value in a SoftMax classifier at the top layer of the ResNeXt neural network model, and identifying the current behavior state of the driver.
The beneficial effects of the invention are as follows: the areas that require attention in each driver behavior state are obtained from images by manual calibration; these areas are combined with the class activation mapping of the neural network to establish a region-enhancement-driven model optimization function; and the network is trained with this function, so that during detection the model automatically locates the key areas in a driver image according to the driver's behavior characteristics. This solves the problem of automatically extracting key features and positions in visual-feature-based detection methods and improves the recognition accuracy of the model.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a flow chart of an embodiment of the present invention.
Detailed Description
The invention will be further explained with reference to fig. 1 and 2.
As shown in the drawings, a driver distraction detection method based on an image area positioning mechanism comprises the following steps:
a) Visual images of different driver behaviors are acquired and, according to the behavior state of the driver in each visual image, the key areas that require attention in each state are determined by automatic positioning and manual adjustment.
b) Probability heat maps of the key areas in the driver visual images are established with a Gaussian model, and a driver behavior detection data set based on area positioning is built.
c) A neural network model is established, a cost function driven by class activation mapping and the key-area probability heat maps is constructed, and the neural network is trained with this cost function to obtain an optimized neural network model.
d) A camera is installed in the vehicle, real-time side-view images of the driver are acquired and input into the optimized neural network model, and the model's output probability is extracted to obtain the driver's behavior state.
As shown in FIG. 2, the areas that require attention in each driver behavior state are obtained from images by manual calibration and combined with the class activation mapping of the neural network to establish a region-enhancement-driven model optimization function; the network is trained with this function, so that during detection the model automatically locates the key areas in a driver image according to the driver's behavior characteristics, solving the problem of automatically extracting key features and positions in visual-feature-based detection methods and improving the recognition accuracy of the model.
Specifically, in step a) a camera is installed in the vehicle, videos of the different driver behaviors are collected with the camera, the videos are converted into visual images frame by frame, and the images are stored as sample images.
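A minimal sketch of this frame-extraction step, assuming OpenCV as the video library (the patent does not name one); the paths and the stride parameter are illustrative:

```python
import cv2

def video_to_frames(video_path, out_dir, stride=1):
    """Convert a driver-behavior video into stored sample images, frame by frame."""
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()            # ok becomes False at the end of the video
        if not ok:
            break
        if idx % stride == 0:             # keep every `stride`-th frame
            cv2.imwrite(f"{out_dir}/frame_{saved:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```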
Specifically, the behavior states of the driver in step a) are defined as: normal driving; using a smartphone or tablet computer; making a phone call; talking with the co-driver; drinking; and operating the center-console electronic equipment. When the driver is driving normally, the key areas that require attention are the driver's hands and upper arms in the visual image; when the driver is using a smartphone or tablet, the key area is the phone or tablet in the driver's hand; when the driver is making a call, the key areas are the driver's mouth and the phone; when the driver is talking with the co-driver, the key areas are the driver's mouth and face; when the driver is drinking, the key area is the container held by the driver; and when the driver is operating the center-console electronic equipment, the key areas are the driver's hand and the center-console equipment.
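For illustration, the six behavior states and their key areas can be encoded as a small lookup table; the integer indices follow the C0 to C5 labels used in the experiments below, but the data structure itself is an assumption, not part of the patent:

```python
# C0-C5 behavior classes and the key areas defined above (names are illustrative).
BEHAVIOR_KEY_AREAS = {
    0: ("normal driving",                 ["hands", "upper arms"]),
    1: ("using smartphone or tablet",     ["device in the driver's hand"]),
    2: ("making a phone call",            ["mouth", "phone"]),
    3: ("talking with the co-driver",     ["mouth", "face"]),
    4: ("drinking",                       ["container held by the driver"]),
    5: ("operating center-console unit",  ["hand", "center-console equipment"]),
}
```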
Specifically, the step a) is as follows:
a-1) finding out the limb movement area of the driver in the process of executing different behaviors in the sample image, and establishing key areas based on different behavior states in the driver image.
a-2) Based on the established key areas, the positions of the driver's upper-arm and head skeleton points in the sample image are acquired automatically by a skeleton-point positioning method, and a rectangular frame centered on each skeleton point is drawn to obtain the initial position of the key area in the image. Let the size of the image be h × b. The initial rectangular frames are set as follows: in the normal driving image, height h and width b/4; in the image of the driver using a smartphone or tablet, height h and width b/5; in the image of the driver making a call, height h and width b/5; in the image of the conversation with the co-driver, height h/3 and width b/5; in the drinking image, height h and width b/5; in the image of operating the center-console electronic equipment, height h and width b/5.
a-3) The rectangular frame is manually corrected according to its position and size so that it finally contains the key area. Preferably, the overlap between the corrected rectangular frame and the key area is at least 90%, and the rectangular frame occupies no more than 1/2 of the sample image. On this principle, the size ranges of the final, manually corrected rectangular frames are: in the normal driving image, height h/9 to h/7 and width b/6 to b/3; in the image of the driver using a smartphone or tablet, height h/6 to h and width b/6 to b/3; in the image of the driver making a call, height h/6 to h/4 and width b/8 to b/6; in the image of the conversation with the co-driver, height h/4 to h/3 and width b/6 to b/5; in the drinking image, height h/7 to h and width b/8 to b/5; in the image of operating the center-console electronic equipment, height h/3 to h/2 and width b/6 to b/4.
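A sketch of how an initial rectangle centered on a skeleton point might be drawn per step a-2); the clipping to the image border and the example sizes are assumptions for illustration:

```python
def initial_box(center_x, center_y, box_h, box_w, img_h, img_w):
    """Rectangle of size (box_h, box_w) centered on a skeleton point,
    clipped to the image, giving the initial key-area position."""
    x_min = max(0, int(center_x - box_w / 2))
    x_max = min(img_w, int(center_x + box_w / 2))
    y_min = max(0, int(center_y - box_h / 2))
    y_max = min(img_h, int(center_y + box_h / 2))
    return x_min, x_max, y_min, y_max

# e.g. for the conversation-with-co-driver class (height h/3, width b/5):
# box = initial_box(head_x, head_y, box_h=h // 3, box_w=b // 5, img_h=h, img_w=b)
```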
Specifically, the step b) comprises the following steps:
b-1) Based on the key area, a two-dimensional Gaussian model $G(z)$ is established by the formula
$$G(z) = \alpha \exp\left(-\frac{1}{2}(z-\mu)^{T}\Sigma^{-1}(z-\mu)\right)$$
where $\alpha$ is the normalization factor, $\Sigma$ is the covariance matrix, $T$ denotes the transpose, $z$ is the variable of the two-dimensional Gaussian model, and $\mu = \left(\frac{x_{max}+x_{min}}{2}, \frac{y_{max}+y_{min}}{2}\right)$ is the key location in the driver behavior image; $x$ is the abscissa of the key area, $y$ is the ordinate of the key area, $x_{max}$ and $x_{min}$ are the maximum and minimum of $x$, and $y_{max}$ and $y_{min}$ are the maximum and minimum of $y$.
b-2) The two-dimensional Gaussian model $G(z)$ is converted into a two-dimensional image to obtain the probability heat map of the key area in the driver visual image.
b-3) Steps b-1) and b-2) are repeated for every image sample, and the probability heat maps of the key areas of all driver visual images are stored to obtain the driver behavior detection data set based on area positioning.
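A sketch of steps b-1) and b-2) in NumPy; the patent does not specify $\Sigma$, so the diagonal covariance derived from the box extents below is an assumption:

```python
import numpy as np

def key_area_heatmap(img_h, img_w, x_min, x_max, y_min, y_max):
    """Probability heat map of a key area: a 2-D Gaussian centred on the
    rectangle. The covariance (box extents scaled by 1/4) is assumed."""
    mu = np.array([(x_max + x_min) / 2.0, (y_max + y_min) / 2.0])
    sigma = np.diag([((x_max - x_min) / 4.0) ** 2,
                     ((y_max - y_min) / 4.0) ** 2])     # assumed diagonal covariance
    inv = np.linalg.inv(sigma)
    xs, ys = np.meshgrid(np.arange(img_w), np.arange(img_h))
    d = np.stack([xs - mu[0], ys - mu[1]], axis=-1)     # z - mu at every pixel
    mahal = np.einsum('...i,ij,...j->...', d, inv, d)   # (z-mu)^T Sigma^-1 (z-mu)
    g = np.exp(-0.5 * mahal)
    return g / g.max()                                  # normalised to [0, 1]
```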
Specifically, the step c) comprises the following steps:
c-1) A 50-layer ResNeXt neural network model is established, with a global average pooling layer at the top of the network.
c-2) A SoftMax classifier is established on top of the global pooling layer to output the driver behavior prediction probability values.
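A sketch of the model of c-1) and c-2), assuming PyTorch/torchvision (`resnext50_32x4d`); returning the last feature maps alongside the probabilities makes the class activation mapping of c-3) straightforward to compute:

```python
import torch
import torch.nn as nn
from torchvision.models import resnext50_32x4d

class DriverNet(nn.Module):
    """50-layer ResNeXt backbone with global average pooling and a
    SoftMax head over the six driver behavior states."""
    def __init__(self, num_classes=6):
        super().__init__()
        backbone = resnext50_32x4d(weights=None)   # older torchvision: pretrained=False
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # conv stages
        self.gap = nn.AdaptiveAvgPool2d(1)          # global average pooling layer
        self.fc = nn.Linear(2048, num_classes)      # top-layer weights w_k^c

    def forward(self, x):
        f = self.features(x)                        # feature maps f_k: (N, 2048, H, W)
        logits = self.fc(self.gap(f).flatten(1))
        return torch.softmax(logits, dim=1), f      # class probabilities + feature maps
```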
c-3) The class activation mapping of each driver behavior state class output by the top layer of the neural network is calculated by the formula
$$M_{c} = \sum_{k=1}^{n} w_{k}^{c} f_{k}$$
where $M_{c}$ is the class-$c$ heat map, $n$ is the number of neurons in the top layer, $w_{k}^{c}$ is the top-layer weight parameter, and $f_{k}$ is the mapping value of the layer preceding the global average pooling layer.
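A sketch of c-3): the class activation mapping $M_c = \sum_k w_k^c f_k$ computed from the feature maps returned by the `DriverNet` sketch above and the rows of its classifier weight matrix:

```python
import torch

def class_activation_map(feature_maps, fc_weight, cls):
    """M_c = sum_k w_k^c * f_k over the K feature maps of the layer
    before global average pooling."""
    # feature_maps: (N, K, H, W); fc_weight: (num_classes, K)
    return torch.einsum('k,nkhw->nhw', fc_weight[cls], feature_maps)
```

With the `DriverNet` sketch, `fc_weight` is `model.fc.weight`.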
c-4) The driver behavior prediction probability values and the class activation mapping of the neural network model are extracted, and the region-enhanced optimization function $L_{area}$ is calculated by the formula
$$L_{area} = -\lambda \sum_{i,j}\left[\varphi(M_{\hat{c}}) \odot \hat{M}_{\hat{c}}\right]_{i,j}$$
where $\varphi$ is a nonlinear transformation function, $M_{c}$ is the class activation mapping of the $c$-th behavior state class of the neural network, $M_{\hat{c}}$ is the class activation mapping of the class $\hat{c}$ that coincides with the true behavior state class, $\hat{c}$ is the true behavior class, $\lambda$ is a coefficient, $\odot$ is the Hadamard product, and $\hat{M}_{\hat{c}}$ is the predefined class-$\hat{c}$ activation mapping.
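The exact algebraic form of the region-enhanced term is not fully recoverable from the text, so the sketch below is one plausible reading: overlap between the normalized true-class CAM and the predefined heat map is rewarded via the Hadamard product, with ReLU standing in for the unspecified nonlinear transformation $\varphi$ — all of which are assumptions:

```python
import torch.nn.functional as F

def region_loss(cam, target_heatmap, lam=0.3):
    """Reward overlap between phi(M_c-hat) and the predefined key-area heat map.
    target_heatmap must be resized to the CAM resolution beforehand."""
    cam = F.relu(cam)                                          # phi: nonlinear transform (assumed)
    cam = cam / (cam.amax(dim=(-2, -1), keepdim=True) + 1e-6)  # normalise each map
    overlap = (cam * target_heatmap).sum(dim=(-2, -1))         # Hadamard product, summed
    return -lam * overlap.mean()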
c-5) The cost function $L$ is calculated by the formula
$$L = L_{cls} + L_{area}$$
where $L_{cls} = -\sum_{i} y_{i}\log p_{i}$ is the cost function based on the driver state value (the cost function used on its own by the traditional ResNeXt network), $p$ is the output value of the ResNeXt neural network model, and $y$ is the calibration value.
c-6) The ResNeXt neural network model is trained with the cost function $L$ until convergence, and the hyper-parameters of the model are established through cross-validation. The hyper-parameters are mainly: the learning rate of neural network training, the number of batch training samples (batch size), the coefficient in the loss function, and the momentum parameter β of the momentum optimizer.
Preferably, in this patent the learning rate of neural network training is r = 0.001, the batch size is 32, the coefficient in the loss function is λ = 0.3, and the momentum parameter of the momentum optimizer is β = 0.9.
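Putting the pieces together with the quoted hyper-parameters (learning rate 0.001, batch size 32, λ = 0.3, momentum 0.9); the data pipeline is assumed, and `DriverNet`, `class_activation_map` and `region_loss` refer to the sketches above:

```python
import torch
import torch.nn.functional as F

model = DriverNet(num_classes=6)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# batches of 32 samples are assumed to come from a DataLoader

def training_step(images, labels, heatmaps):
    """One optimization step; heatmaps are the predefined key-area maps
    (resized to the CAM resolution)."""
    probs, feats = model(images)
    cls_loss = F.nll_loss(torch.log(probs + 1e-8), labels)      # L_cls
    cams = torch.stack([class_activation_map(feats[i:i + 1], model.fc.weight,
                                             int(labels[i]))[0]
                        for i in range(len(labels))])           # M_c-hat per sample
    loss = cls_loss + region_loss(cams, heatmaps, lam=0.3)      # L = L_cls + L_area
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```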
Preferably, in step d) the camera is mounted on the vehicle roof above and to the right of the driver.
Specifically, the step d) comprises the following steps:
d-1) reading the trained ResNeXt neural network model in c-6) as a detection model.
d-2) inputting each frame of image of the driver acquired by the camera into the detection model.
d-3) obtaining the prediction probability value in a SoftMax classifier at the top layer of the ResNeXt neural network model, and identifying the current behavior state of the driver.
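A sketch of the online detection loop of step d), assuming the `DriverNet` sketch above and simple 224×224 RGB preprocessing (the patent specifies neither input resolution nor normalization):

```python
import cv2
import torch

STATES = ["normal driving", "phone/tablet use", "calling",
          "talking with co-driver", "drinking", "center-console use"]

def detect(model, frame):
    """Run one camera frame through the trained model and return the
    predicted behavior state."""
    model.eval()
    img = cv2.resize(frame, (224, 224))[:, :, ::-1] / 255.0     # BGR -> RGB, scale
    x = torch.tensor(img.copy(), dtype=torch.float32).permute(2, 0, 1).unsqueeze(0)
    with torch.no_grad():
        probs, _ = model(x)
    return STATES[int(probs.argmax(dim=1))]
```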
To verify that the driver distraction detection method based on the image area positioning mechanism improves detection accuracy, a driver behavior data set was constructed in a real-vehicle experiment. The data set contains 12,688 driver images covering the six behaviors of normal driving, using a smartphone or tablet, making a call, talking with the co-driver, drinking, and operating the center-console electronic equipment, collected from 40 drivers (10 female, 30 male). With this data set, a 50-layer ResNeXt model trained by the traditional training method reaches a recognition accuracy of only 89.75%, whereas the same model trained by the method of the invention reaches 95.59%.
To further verify the accuracy of the method, the same real-vehicle data set of 12,688 images of the six behaviors from 40 drivers (10 female, 30 male) was used to train and validate a 50-layer ResNeXt model. The experimental results are shown in Table 1, where C0 to C5 denote the six behaviors: normal driving, using a smartphone or tablet, making a call, talking with the co-driver, drinking, and operating the center-console electronic equipment. The results show that, compared with the traditional training method, the training method proposed in this patent effectively improves the recognition accuracy of the ResNeXt model.
TABLE 1: per-class recognition accuracy for behaviors C0 to C5 under the traditional and the proposed training methods (table reproduced as an image in the original).
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A driver distraction detection method based on an image area positioning mechanism, characterized by comprising the following steps:
a) acquiring visual images of different driver behaviors and, according to the behavior state of the driver in each visual image, determining the key areas that require attention in each state by automatic positioning and manual adjustment;
b) establishing probability heat maps of the key areas in the driver visual images with a Gaussian model, and building a driver behavior detection data set based on area positioning;
c) establishing a neural network model, constructing a cost function driven by class activation mapping and the key-area probability heat maps, and training the neural network with this cost function to obtain an optimized neural network model;
d) installing a camera in the vehicle, acquiring real-time side-view images of the driver, inputting them into the optimized neural network model, and extracting the model's output probability to obtain the driver's behavior state.
2. The method for detecting the distraction of the driver based on the image area localization mechanism according to claim 1, wherein: in the step a), a camera is installed in the vehicle, videos of different behaviors of a driver are collected through the camera, the videos are converted into visual images frame by frame, and the visual images are stored to obtain sample images.
3. The method for detecting driver distraction based on the image area positioning mechanism according to claim 1, wherein the behavior states of the driver in step a) are defined as: normal driving; using a smartphone or tablet computer; making a phone call; talking with the co-driver; drinking; and operating the center-console electronic equipment; when the driver is driving normally, the key areas that require attention are the driver's hands and upper arms in the visual image; when the driver is using a smartphone or tablet, the key area is the phone or tablet in the driver's hand; when the driver is making a call, the key areas are the driver's mouth and the phone; when the driver is talking with the co-driver, the key areas are the driver's mouth and face; when the driver is drinking, the key area is the container held by the driver; and when the driver is operating the center-console electronic equipment, the key areas are the driver's hand and the center-console equipment.
4. The method for detecting the distraction of the driver based on the image area positioning mechanism according to claim 3, wherein the step a) comprises:
a-1) finding the driver's limb movement areas while performing the different behaviors in the sample images, and establishing the key areas for the different behavior states in the driver images;
a-2) based on the established key areas, automatically acquiring the positions of the driver's upper-arm and head skeleton points in the sample image by a skeleton-point positioning method, and drawing a rectangular frame centered on each skeleton point to obtain the initial position of the key area in the image;
a-3) manually correcting the position and size of the rectangular frame to obtain a rectangular frame that finally contains the key area.
5. The method for detecting the distraction of the driver based on the image area positioning mechanism according to claim 1, wherein the step b) comprises the following steps:
b-1) based on the key area, establishing a two-dimensional Gaussian model $G(z)$ by the formula
$$G(z) = \alpha \exp\left(-\frac{1}{2}(z-\mu)^{T}\Sigma^{-1}(z-\mu)\right)$$
where $\alpha$ is the normalization factor, $\Sigma$ is the covariance matrix, $T$ denotes the transpose, $z$ is the variable of the two-dimensional Gaussian model, and $\mu = \left(\frac{x_{max}+x_{min}}{2}, \frac{y_{max}+y_{min}}{2}\right)$ is the key location in the driver behavior image; $x$ is the abscissa of the key area, $y$ is the ordinate of the key area, $x_{max}$ and $x_{min}$ are the maximum and minimum of $x$, and $y_{max}$ and $y_{min}$ are the maximum and minimum of $y$;
b-2) converting the two-dimensional Gaussian model $G(z)$ into a two-dimensional image to obtain the probability heat map of the key area in the driver visual image;
b-3) repeating steps b-1) and b-2) for every image sample, and storing the probability heat maps of the key areas of all driver visual images to obtain the driver behavior detection data set based on area positioning.
6. The method for detecting the distraction of the driver based on the image area positioning mechanism according to claim 1, wherein the step c) comprises the following steps:
c-1) establishing a ResNeXt neural network model, and adopting a global average pooling layer on the top layer of the neural network;
c-2) establishing a SoftMax classifier at the top layer of the global pooling layer to output a driver behavior prediction probability value;
c-3) calculating the class activation mapping of each driver behavior state class output by the top layer of the neural network by the formula
$$M_{c} = \sum_{k=1}^{n} w_{k}^{c} f_{k}$$
where $M_{c}$ is the class-$c$ heat map, $n$ is the number of neurons in the top layer, $w_{k}^{c}$ is the top-layer weight parameter, and $f_{k}$ is the mapping value of the layer preceding the global average pooling layer;
c-4) extracting the driver behavior prediction probability values and the class activation mapping of the neural network model, and calculating the region-enhanced optimization function $L_{area}$ by the formula
$$L_{area} = -\lambda \sum_{i,j}\left[\varphi(M_{\hat{c}}) \odot \hat{M}_{\hat{c}}\right]_{i,j}$$
where $\varphi$ is a nonlinear transformation function, $M_{c}$ is the class activation mapping of the $c$-th behavior state class of the neural network, $M_{\hat{c}}$ is the class activation mapping of the class $\hat{c}$ that coincides with the true behavior state class, $\hat{c}$ is the true behavior class, $\lambda$ is a coefficient, $\odot$ is the Hadamard product, and $\hat{M}_{\hat{c}}$ is the predefined class-$\hat{c}$ activation mapping;
c-5) calculating the cost function $L$ by the formula
$$L = L_{cls} + L_{area}$$
where $L_{cls} = -\sum_{i} y_{i}\log p_{i}$ is the cost function based on the driver state value, $p$ is the output value of the ResNeXt neural network model, and $y$ is the calibration value;
c-6) training the ResNeXt neural network model with the cost function $L$ until convergence, and establishing the hyper-parameters of the model through cross-validation.
7. The method for detecting the distraction of the driver based on the image area localization mechanism according to claim 1, wherein: and d) mounting a camera at the position of the roof above the right of the driver.
8. The method for detecting the distraction of the driver based on the image area positioning mechanism according to claim 6, wherein the step d) comprises the following steps:
d-1) reading the ResNeXt neural network model trained in c-6) as a detection model;
d-2) inputting each frame of image of the driver acquired by the camera into the detection model;
d-3) obtaining the prediction probability value in a SoftMax classifier at the top layer of the ResNeXt neural network model, and identifying the current behavior state of the driver.
CN202111110059.2A 2021-09-23 2021-09-23 Driver attention dispersion detection method based on image area positioning mechanism Active CN113569817B (en)

Priority Applications (1)

CN202111110059.2A (CN113569817B): Driver distraction detection method based on image area positioning mechanism

Publications (2)

CN113569817A, published 2021-10-29
CN113569817B, published 2021-12-21

Family ID: 78173969

Family Applications (1)

CN202111110059.2A (Active): Driver distraction detection method based on image area positioning mechanism

Country Status (1)

CN: CN113569817B (en)

Patent Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014167811A1 (en) * 2013-04-10 2014-10-16 株式会社デンソー Drowsiness prediction device and drowsiness prediction system
CN105809152A (en) * 2016-04-06 2016-07-27 清华大学 Monitoring method for cognitive distraction of driver on basis of multi-source information fusion
CN106778677A (en) * 2016-12-30 2017-05-31 东北农业大学 Feature based selection and driver's fatigue state recognition method and device of facial multizone combining classifiers
CN108227912A (en) * 2017-11-30 2018-06-29 北京市商汤科技开发有限公司 Apparatus control method and device, electronic equipment, computer storage media
CN108960065A (en) * 2018-06-01 2018-12-07 浙江零跑科技有限公司 A kind of driving behavior detection method of view-based access control model
CN108985259A (en) * 2018-08-03 2018-12-11 百度在线网络技术(北京)有限公司 Human motion recognition method and device
CN109583338A (en) * 2018-11-19 2019-04-05 山东派蒙机电技术有限公司 Driver Vision decentralized detection method based on depth integration neural network
CN109711463A (en) * 2018-12-25 2019-05-03 广东顺德西安交通大学研究院 Important object detection method based on attention
CN110119676A (en) * 2019-03-28 2019-08-13 广东工业大学 A kind of Driver Fatigue Detection neural network based
CN110298257A (en) * 2019-06-04 2019-10-01 东南大学 A kind of driving behavior recognition methods based on human body multiple location feature
WO2020122986A1 (en) * 2019-06-10 2020-06-18 Huawei Technologies Co.Ltd. Driver attention detection using heat maps
CN110288597A (en) * 2019-07-01 2019-09-27 哈尔滨工业大学 Wireless capsule endoscope saliency detection method based on attention mechanism
CN110728185A (en) * 2019-09-10 2020-01-24 西安工业大学 Detection method for judging existence of handheld mobile phone conversation behavior of driver
CN113051958A (en) * 2019-12-26 2021-06-29 深圳市光鉴科技有限公司 Driver state detection method, system, device and medium based on deep learning
CN111563468A (en) * 2020-05-13 2020-08-21 电子科技大学 Driver abnormal behavior detection method based on attention of neural network
CN111914902A (en) * 2020-07-08 2020-11-10 南京航空航天大学 Traditional Chinese medicine identification and surface defect detection method based on deep neural network
CN111860525A (en) * 2020-08-06 2020-10-30 宁夏宁电电力设计有限公司 Bottom-up optical character recognition method suitable for terminal block
CN112069988A (en) * 2020-09-04 2020-12-11 徐尔灵 Gun-ball linkage-based driver safe driving behavior detection method
CN112241679A (en) * 2020-09-14 2021-01-19 浙江理工大学 Automatic garbage classification method
CN112419670A (en) * 2020-09-15 2021-02-26 深圳市点创科技有限公司 Method, device and medium for detecting fatigue driving of driver by fusing key point positioning and image classification
CN112418261A (en) * 2020-09-17 2021-02-26 电子科技大学 Human body image multi-attribute classification method based on prior prototype attention mechanism
CN113313199A (en) * 2021-06-21 2021-08-27 北京工业大学 Brain CT medical report automatic generation method based on weak supervision attention

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
BOLEI ZHOU et al.: "Learning Deep Features for Discriminative Localization", 2016 IEEE Conference on Computer Vision and Pattern Recognition *
HESHAM M. ERAQI et al.: "Driver Distraction Identification with an Ensemble of Convolutional Neural Networks", Journal of Advanced Transportation *
LEI ZHAO et al.: "Driver behavior detection via adaptive spatial attention mechanism", Advanced Engineering Informatics *
SAINING XIE et al.: "Aggregated Residual Transformations for Deep Neural Networks", 2017 IEEE Conference on Computer Vision and Pattern Recognition *
廖南星 et al.: "Image description method based on class activation mapping and attention mechanism", Journal of Shandong University (Engineering Science) *
王超 et al.: "A driver model characterizing driving style and driver capability", Transactions of Beijing Institute of Technology *

Also Published As

CN113569817B, published 2021-12-21


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant