CN109977786B - Driver posture detection method based on video and skin color area distance - Google Patents

Driver posture detection method based on video and skin color area distance

Info

Publication number
CN109977786B
CN109977786B
Authority
CN
China
Prior art keywords
video
image
skin color
characteristic
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910156046.5A
Other languages
Chinese (zh)
Other versions
CN109977786A (en)
Inventor
何杰
汤慧
化丽茹
曦曙
郑有凤
赵池航
周博见
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN201910156046.5A
Publication of CN109977786A
Application granted
Publication of CN109977786B
Status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/59Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597Recognising the driver's state or behaviour, e.g. attention or drowsiness

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a driver posture detection method based on video and skin color region distance. Skin color regions are extracted from images sampled from a number of sample videos, the centroid coordinates of the skin color regions are computed and converted into characteristic distances that represent the characteristic value of each image, and a clustering algorithm fuses the characteristic values of the several images of one video into a single characteristic value. A BP neural network is then constructed, the fused characteristic values and the corresponding driving posture categories are input into it as training samples, and training yields a driver posture detection model. At detection time, a video of the driver to be detected is collected while driving, the characteristic value of this video is computed by the same steps, and the result is fed to the driver posture detection model, whose output is the driving posture category of the video to be detected. The method can effectively improve the driver posture detection rate, realize recognition and classification of driver behavior, and ultimately enable real-time early warning during commercial driving.

Description

Driver posture detection method based on video and skin color area distance
Technical Field
The invention belongs to the field of traffic safety, and particularly relates to a driver posture detection method based on a video and a skin color area distance.
Background
The World Health Organization's Global Status Report on Road Safety 2015 indicates that road traffic accidents are a leading cause of death worldwide: about 3,500 people die every day in road traffic collisions. Serious road traffic accidents not only harm national economies but also place a heavy burden on families, so improving traffic safety has become one of the primary tasks in the current work of many countries.
Traffic accidents result from the combined action of human, vehicle, road and environment factors, and researchers generally believe that more than 80% of accidents are caused by erroneous driving behaviors such as speeding, fatigued driving and mobile phone use. These behaviors seriously impair the driver's perception and judgment, so detecting and identifying driver behavior has become important for traffic safety.
With people's rising expectations for traffic safety, existing driver posture detection methods that are limited to the head can hardly meet safety requirements; intrusive detection methods have obvious limitations and are difficult to popularize; meanwhile, research results based on the head together with the left- and right-hand regions are few and urgently need deeper study and further optimization. In addition, currently applied skin color detection methods extract the pixels of a single full image as the feature, so the feature data is monotonous and its dimension too large, and problems such as momentary region overlap or partial occlusion can make detection impossible, degrading detection accuracy.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems in the prior art, the invention provides a driver posture detection method based on video and skin color region distance, which can effectively improve the driver posture detection rate, realize recognition and classification of driver behavior, and ultimately enable real-time early warning during commercial driving.
The technical scheme is as follows: the invention adopts the following technical scheme:
a driver posture detection method based on video and skin color area distance comprises the following steps:
(1) Collect N videos of drivers while driving, where each video contains only one of three different driving postures: both hands gripping the steering wheel, operating the gear lever, or an abnormal driving posture. The duration of the i-th video is t_i, its frame rate is fr_i, and its driving posture class is p_i, i = 1..N;
(2) Process the N videos collected in step (1) in sequence, capturing one image from each video every F frames to form the image data set J = (J_1, J_2, …, J_N), where the i-th video corresponds to the data set J_i = (J_i^1, J_i^2, …, J_i^{m_i}) and m_i is the number of images captured from the i-th video;
(3) Process the data set J_i corresponding to the i-th video in sequence: extract skin color regions and compute the centroid coordinates of the three skin color regions with the largest areas. If three skin color regions cannot be detected in an image of data set J_i, move on to the next image; if no image in J_i can be processed, capture one image from the i-th video every F' frames (F' < F) to form a new data set J_i and check again; if no image in the new data set J_i can be processed either, discard the i-th video;
(4) Convert the skin color region centroid coordinates extracted in step (3) into characteristic distances that represent the characteristic value of each image. Processing the images in data set J_i in sequence yields the characteristic values λ_i = (λ_i^1, λ_i^2, …, λ_i^{n_i}), where n_i is the number of actually valid images among those captured from the i-th video, n_i ≤ m_i. Use a clustering algorithm to fuse λ_i into a single characteristic value Λ_i; the results of processing the N videos finally form the feature set Λ = (Λ_1, Λ_2, …, Λ_W), W ≤ N;
(5) Using a BP neural network classifier, input the feature set Λ and the corresponding driving posture classes p_i into the classifier as training samples, and train the classifier to obtain the driver posture detection model;
(6) Collect a video of the driver to be detected while driving; the video contains only one of the three different driving postures (both hands gripping the steering wheel, operating the gear lever, or an abnormal driving posture). Capture images from the collected video according to the method in step (2) to form the image data set V = (v_1, v_2, …, v_m), where m is the number of captured images. Extract skin color regions from the images in data set V and compute the centroid coordinates of the three largest skin color regions according to the method in step (3). Obtain the characteristic value Λ_V of the image data set V according to the method in step (4), feed Λ_V to the BP neural network model trained in step (5), and output the driving posture category of the video to be detected.
The step (3) of calculating the centroid coordinates of the first three skin color regions with the largest area in the image specifically comprises the following steps:
(3-1) preprocessing an image to be processed by using a reference white method, extracting a characteristic region by using a skin color model based on a normalized RGB color space, and finally removing an interference region in the image to be processed by using a mathematical morphology method and reserving the first three characteristic regions with the largest area;
(3-2) scanning the pixels of the image processed in step (3-1) row by row and column by column using the bwlabel function, identifying the three feature regions from the pixel positions in the matrix returned by the function, and then computing the centroid coordinates of the three feature regions with the centroid formula:

x_k = sum_x / area,  y_k = sum_y / area

where x_k and y_k are the centroid abscissa and ordinate of the k-th region, sum_x is the sum of the abscissas of the pixels in the k-th region, sum_y is the sum of their ordinates, area is the number of pixels in the k-th region, and k = 1, 2, 3;
the region with the upper centroid among the three feature regions is set as a head region of a label H, the rest two regions on the left side are set as a left-hand region of a label L, and the regions on the right side are set as a right-hand region of a label R.
In step (4), the characteristic value Λ_i of data set J_i is computed as follows:

(4-1) Compute the characteristic value λ_i^q of each image J_i^q in data set J_i, q = 1, …, n_i:
Use the skin color region centroid coordinates computed in step (3) to calculate the characteristic distances that represent the characteristic value of image J_i^q:

l_1 = √((x_H − x_L)² + (y_H − y_L)²)
l_2 = √((x_H − x_R)² + (y_H − y_R)²)
l_3 = √((x_L − x_R)² + (y_L − y_R)²)

where (x_H, y_H), (x_L, y_L) and (x_R, y_R) are the centroid coordinates of the head, left-hand and right-hand regions in image J_i^q, and the characteristic value λ_i^q is formed from the distances l_1, l_2 and l_3;
(4-2) Compute the feature value of each image in data set J_i to form the set λ_i = (λ_i^1, λ_i^2, …, λ_i^{n_i}). Use a clustering algorithm to find the center of λ_i, iterating until the standard deviation of the data points λ_i^q from the cluster center reaches its minimum; this cluster center is the characteristic value Λ_i of J_i.
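The following minimal sketch illustrates steps (4-1) and (4-2) under two assumptions: that the per-image characteristic value is the raw distance triple (l_1, l_2, l_3) (the exact composition of the distances into λ_i^q appears only as an image in the original), and that the clustering fusion returns the center minimizing the squared deviation of the data points, which for a single cluster is the per-dimension mean. The helper names are hypothetical.

```python
import numpy as np

def feature_distances(cents: dict) -> np.ndarray:
    """Per-image characteristic value from the H/L/R centroids of step (3)."""
    (xH, yH), (xL, yL), (xR, yR) = cents["H"], cents["L"], cents["R"]
    l1 = np.hypot(xH - xL, yH - yL)  # head to left hand
    l2 = np.hypot(xH - xR, yH - yR)  # head to right hand
    l3 = np.hypot(xL - xR, yL - yR)  # left hand to right hand
    return np.array([l1, l2, l3])

def fuse(lambdas: list) -> np.ndarray:
    """Fuse the per-image values of one video into one characteristic value:
    the center minimizing the squared deviation of the points, i.e. the mean."""
    return np.mean(np.stack(lambdas), axis=0)
```

In the embodiment below, fuse would be applied to the per-image values of each video segment to produce its Λ_i.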
The step (5) is specifically as follows:
(5-1) Normalize Λ = (Λ_1, Λ_2, …, Λ_W) to obtain the set Λ' used as training samples; convert the driving posture class p_i to a vector representation, where [1,0,0] represents the first class, [0,1,0] the second class, and [0,0,1] the third class;
(5-2) Construct a BP neural network whose input layer has 2 input nodes, with two hidden layers of 5 hidden nodes each; the output layer has 3 nodes, and the values of the 3 output nodes form the vector representation of the driving posture category;
(5-3) Input the samples Λ'_i into the constructed BP neural network and, after forward layer-by-layer processing over the connections between the nodes, obtain the actual network output r_i. Compute the error between r_i and the class vector of p_i, propagate the error back through the preceding layers one by one, and apportion it to the connection weights so that the weights of the whole BP network move in the direction that reduces the error; repeat this for every input-output sample pair in the training set until the error over the whole training sample set falls below a preset threshold.
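To make the training procedure of step (5) concrete, here is a minimal NumPy sketch of a 2-5-5-3 network trained by plain error backpropagation on a squared-error loss, matching the layer sizes of step (5-2); the sigmoid activation, learning rate, random placeholder data and error threshold are illustrative assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BPNet:
    """Fully connected 2-5-5-3 network trained by error backpropagation."""
    def __init__(self, sizes=(2, 5, 5, 3)):
        self.W = [rng.normal(0.0, 0.5, (a, b)) for a, b in zip(sizes, sizes[1:])]
        self.b = [np.zeros(n) for n in sizes[1:]]

    def forward(self, x):
        acts = [np.asarray(x, dtype=float)]
        for W, b in zip(self.W, self.b):
            acts.append(sigmoid(acts[-1] @ W + b))  # layer-by-layer forward pass
        return acts

    def train_step(self, x, target, lr=0.5):
        acts = self.forward(x)
        # output-layer delta for squared error with sigmoid units
        delta = (acts[-1] - target) * acts[-1] * (1.0 - acts[-1])
        for i in reversed(range(len(self.W))):
            grad_W = np.outer(acts[i], delta)
            if i > 0:  # propagate the error back before touching the weights
                next_delta = (delta @ self.W[i].T) * acts[i] * (1.0 - acts[i])
            self.W[i] -= lr * grad_W
            self.b[i] -= lr * delta
            if i > 0:
                delta = next_delta
        return 0.5 * np.sum((acts[-1] - target) ** 2)

# training loop: iterate until the total error falls below a preset threshold
net = BPNet()
X = rng.random((182, 2))                 # placeholder for the normalized set Λ'
T = np.eye(3)[rng.integers(0, 3, 182)]   # placeholder one-hot posture classes
for epoch in range(5000):
    err = sum(net.train_step(x, t) for x, t in zip(X, T))
    if err < 1e-2:
        break
```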
When images are re-captured from the i-th video in step (3), the interval F' takes the value

F' = ⌈F/2⌉

where ⌈·⌉ is the rounding-up operator; in the embodiment below this halves the sampling interval from F = 30 frames to F' = 15 frames.
The abnormal driving postures include both hands off the steering wheel, one-handed driving, eating, and making a phone call.
Beneficial effects: Compared with the prior art, the driver posture detection method based on video and skin color region distance uses the posture sequence observation information obtained from the driving video to detect driver behavior in real time, ultimately enabling real-time early warning during commercial driving. Compared with existing detection methods that extract a single characteristic value from the full set of pixels of a single image, the method effectively reduces failures caused by momentary region overlap, partial occlusion and the like, thereby effectively improving the driving posture detection rate and ensuring safety in commercial passenger and freight vehicle operation.
Drawings
FIG. 1 is a flow chart of a driver gesture detection and recognition method disclosed in the present invention;
FIG. 2 is an original image of different postures of the driver captured from a video;
FIG. 3 is a diagram of the reference white processing effect;
FIG. 4 is a diagram of skin color detection results;
FIG. 5 is a diagram illustrating the effect of image morphological processing;
FIG. 6 is a schematic diagram of extracting centroid coordinates of a skin color region;
FIG. 7 is a diagram of the training effect of the BP neural network.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described below with reference to the accompanying drawings.
In this embodiment, a driver posture data set is built by filming different driving postures of different drivers in different real driving scenes, and detection and recognition of the driver posture are carried out on it; the flow chart is shown in fig. 1.
Step 1: the placement position of the camera is determined according to the vehicle type and the cab space during video acquisition, the upper body area of a driver is ensured to be positioned in the center of the lens, and the detection area is shown in fig. 2. The method comprises the steps of collecting a video of a driver during driving within 200 periods of time of 10 seconds and 30 frames per second, wherein the video comprises 50 gestures of holding a steering wheel by two hands, 50 gestures of operating a gear and 100 abnormal driving gestures, and the abnormal driving gestures comprise two hands leaving the steering wheel, one-hand driving, eating, making a call and the like;
step 2: sequentially processing 200 sections of videos collected in the step 1, and intercepting one image of each section of video every 30 frames to form an image data set J ═ J1,J2,…,J200) That is, each video segment intercepts 10 images, wherein the ith video segment corresponds to a data set
Figure BDA0001982939700000051
Step 3: Process the data set J_i corresponding to the i-th video in sequence: extract skin color regions and compute the centroid coordinates of the three skin color regions with the largest areas. If three skin color regions cannot be detected in an image of data set J_i, move on to the next image; if no image in J_i can be processed, capture one image from the i-th video every 15 frames to form a new data set J_i and check again; if no image in the new data set J_i can be processed either, discard the i-th video.
This embodiment finally yields 182 valid samples: 48 of both hands gripping the steering wheel, 50 of operating the gear lever, and 84 abnormal driving postures (both hands off the steering wheel, one-handed driving, eating, making a phone call);
The steps of extracting the three largest skin color regions in the image and computing their centroid coordinates are specifically:
(3-1) preprocessing the image by a reference white method before extracting the skin color area, wherein the processing effect is shown in fig. 3, wherein fig. 3(a) is the image before processing, and fig. 3(b) is the image after processing; then, extracting a characteristic region by adopting a skin color model based on a normalized RGB color space, and specifically comprising the following steps:
the RGB color space values (r, g, b) for each pixel are normalized according to equation (1):
Figure BDA0001982939700000052
normalized value (r)0,g0,b0) If the conditions of the formulas (2) and (3) are satisfied, the gray-scale value of the pixel is changed to 255, otherwise, the gray-scale value of the pixel is 0, and the result is shown in fig. 4 (b);
r_0 > 95, g_0 > 45, b_0 > 20, r_0 > g_0 + 15, r_0 > b_0  (2)

Max{r_0, g_0, b_0} − Min{r_0, g_0, b_0} > 15  (3)
The interference regions in the image are then removed using mathematical morphology and connected-region labeling, keeping the three feature regions with the largest areas. The processing effect is shown in fig. 5, where figs. 5(a) and 5(c) are images before removal of the interference regions and figs. 5(b) and 5(d) are the corresponding images afterwards;
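As an illustration of step (3-1), the following sketch builds the skin mask and performs the morphological cleanup in Python with NumPy/SciPy; the 255-scaled reading of equation (1) and the 3×3 structuring element are assumptions made for illustration.

```python
import numpy as np
from scipy import ndimage

def skin_mask(img: np.ndarray) -> np.ndarray:
    """Binary skin mask from an H x W x 3 uint8 RGB image using the
    normalized-RGB rules of equations (1)-(3)."""
    rgb = img.astype(np.float64)
    s = rgb.sum(axis=2, keepdims=True) + 1e-9           # avoid division by zero
    r0, g0, b0 = np.moveaxis(255.0 * rgb / s, 2, 0)     # equation (1)
    mx = np.maximum(np.maximum(r0, g0), b0)
    mn = np.minimum(np.minimum(r0, g0), b0)
    mask = ((r0 > 95) & (g0 > 45) & (b0 > 20) &         # equation (2)
            (r0 > g0 + 15) & (r0 > b0) &
            (mx - mn > 15))                             # equation (3)
    # mathematical morphology: drop speckle noise, close small holes
    mask = ndimage.binary_opening(mask, structure=np.ones((3, 3)))
    mask = ndimage.binary_closing(mask, structure=np.ones((3, 3)))
    return mask
```

The three largest connected regions of this mask are then kept and passed to the labeling step sketched earlier.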
(3-2) Scan the pixels of the image processed in step (3-1) row by row and column by column using the bwlabel function, identify the three feature regions from the pixel positions in the matrix returned by the function, and then compute the centroid coordinates of the three feature regions with the centroid formula:

x_k = sum_x / area,  y_k = sum_y / area

where x_k and y_k are the centroid abscissa and ordinate of the k-th region, sum_x is the sum of the abscissas of the pixels in the k-th region, sum_y is the sum of their ordinates, area is the number of pixels in the k-th region, and k = 1, 2, 3. Fig. 6 shows the centroid coordinates of the 3 regions.
The region whose centroid lies uppermost among the three feature regions is taken as the head region and labeled H; of the remaining two, the region on the left is taken as the left-hand region and labeled L, and the region on the right as the right-hand region and labeled R.
Step 4: Convert the skin color region centroid coordinates extracted in step 3 into characteristic distances that represent the characteristic value of each image. Processing the images in data set J_i in sequence yields the characteristic values λ_i = (λ_i^1, λ_i^2, …, λ_i^{n_i}); a clustering algorithm fuses λ_i into the single characteristic value Λ_i, and the 182 processed videos finally form the feature set Λ = (Λ_1, Λ_2, …, Λ_182);
Calculating the characteristic distance by using the centroid coordinates of the skin color area and fusing each group of processing results specifically comprises the following steps:
(4-1) Compute the characteristic value λ_i^q of each image J_i^q in data set J_i, q = 1, …, n_i:

Use the skin color region centroid coordinates computed in step (3) to calculate the characteristic distances that represent the characteristic value of image J_i^q:

l_1 = √((x_H − x_L)² + (y_H − y_L)²)
l_2 = √((x_H − x_R)² + (y_H − y_R)²)
l_3 = √((x_L − x_R)² + (y_L − y_R)²)

where (x_H, y_H), (x_L, y_L) and (x_R, y_R) are the centroid coordinates of the head, left-hand and right-hand regions in image J_i^q; l_1, l_2 and l_3 represent the head-to-left-hand, head-to-right-hand and left-hand-to-right-hand distances, respectively.
(4-2) Compute the feature value of each image in data set J_i to form the set λ_i = (λ_i^1, λ_i^2, …, λ_i^{n_i}). Use a clustering algorithm to find the center of λ_i, iterating until the standard deviation of the data points λ_i^q from the cluster center reaches its minimum; this cluster center is the characteristic value Λ_i of J_i. The data obtained in this embodiment are shown in Table 1, where class 1 is the posture of both hands gripping the steering wheel, class 2 is operating the gear lever, and class 3 is the abnormal driving postures, including both hands off the steering wheel, one-handed driving, eating and making a phone call;
and 5: adopting a BP neural network classifier to obtain a characteristic set Lambda and a corresponding driving posture class piInputting the training sample into a classifier, and training the classifier to obtain a driver posture detection model;
in this example, the results of the detection by the various classifiers are shown in table 2. The BP neural network is a neural network model with wider application, and is mainly used for function approximation, model identification and classification, time series prediction and the like. The method has certain superiority in the aspects of accuracy, function calling convenience and the like, so the embodiment adopts the BP neural network to train and classify the sample data.
The training and calling of the BP neural network specifically comprises the following steps:
(5-1) Normalize Λ = (Λ_1, Λ_2, …, Λ_182) to obtain the set Λ' used as training samples; convert the driving posture class p_i to a vector representation, where [1,0,0] represents the first class (both hands gripping the steering wheel), [0,1,0] the second class (operating the gear lever), and [0,0,1] the third class (abnormal driving posture);
(5-2) Construct a BP neural network whose input layer has 2 input nodes, with two hidden layers of 5 hidden nodes each; the output layer has 3 nodes, and the values of the 3 output nodes form the vector representation of the driving posture category;
(5-3) Input the samples Λ'_i into the constructed BP neural network and, after forward layer-by-layer processing over the connections between the nodes, obtain the actual network output r_i. Compute the error between r_i and the class vector of p_i, propagate the error back through the preceding layers one by one, and apportion it to the connection weights so that the weights of the whole BP network move in the direction that reduces the error; repeat this for every input-output sample pair in the training set until the error over the whole training sample set falls below a preset threshold.
Table 1: characteristic value obtained by processing different driving posture categories
Figure BDA0001982939700000081
Step 6: Collect a video of the driver to be detected while driving; the video contains only one of the three different driving postures (both hands gripping the steering wheel, operating the gear lever, or an abnormal driving posture). Capture images from the collected video according to the method in step (2) to form the image data set V = (v_1, v_2, …, v_m), where m is the number of captured images; extract skin color regions from the images in data set V and compute the centroid coordinates of the three largest skin color regions according to the method in step (3); obtain the characteristic value Λ_V of the image data set V according to the method in step (4), feed Λ_V to the BP neural network model trained in step (5), and output the driving posture category of the video to be detected.
This embodiment uses videos of known driving posture categories for testing and verification; the verification results are shown in fig. 7. The open circles in the figure represent the detection results of the method of the invention, i.e., the predicted output; the asterisks indicate the known driving posture categories, i.e., the expected output. As the figure shows, the expected and predicted outputs disagree for 4 of the 45 test samples and coincide exactly for the rest, an accuracy of 91%, which shows that the method has high detection accuracy.
TABLE 2 Evaluation of classification methods

Classification method            Accuracy   Training time
Decision tree                    89.0%      5.3883 ms
SVM                              86.8%      11.507 ms
KNN                              87.4%      12.542 ms
BP neural network (this method)  93.5%      6.4973 ms

Claims (5)

1. A driver posture detection method based on video and skin color area distance is characterized by comprising the following steps:
(1) collecting N videos of drivers while driving, where each video contains only one of three different driving postures: both hands gripping the steering wheel, operating the gear lever, or an abnormal driving posture; the duration of the i-th video is t_i, its frame rate is fr_i, and its driving posture class is p_i, i = 1..N;
(2) processing the N videos collected in step (1) in sequence, capturing one image from each video every F frames to form the image data set J = (J_1, J_2, …, J_N), where the i-th video corresponds to the data set J_i = (J_i^1, J_i^2, …, J_i^{m_i}) and m_i is the number of images captured from the i-th video;
(3) processing the data set J_i corresponding to the i-th video in sequence: extracting skin color regions and computing the centroid coordinates of the three skin color regions with the largest areas; if three skin color regions cannot be detected in an image of data set J_i, moving on to the next image; if no image in J_i can be processed, capturing one image from the i-th video every F' frames to form a new data set J_i and checking again; if no image in the new data set J_i can be processed either, discarding the i-th video, F' < F;
(4) converting the skin color region centroid coordinates extracted in step (3) into characteristic distances that represent the characteristic value of each image, where processing the images in data set J_i in sequence yields the characteristic values λ_i = (λ_i^1, λ_i^2, …, λ_i^{n_i}), n_i is the number of actually valid images among those captured from the i-th video, and n_i ≤ m_i; using a clustering algorithm to fuse λ_i into a single characteristic value Λ_i, the results of processing the N videos finally forming the feature set Λ = (Λ_1, Λ_2, …, Λ_W), W ≤ N;
(5) using a BP neural network classifier, inputting the feature set Λ and the corresponding driving posture classes p_i into the classifier as training samples, and training the classifier to obtain a driver posture detection model;
(6) collecting a video of the driver to be detected while driving, the video containing only one of the three different driving postures (both hands gripping the steering wheel, operating the gear lever, or an abnormal driving posture); capturing images from the collected video according to the method in step (2) to form the image data set V = (v_1, v_2, …, v_m), where m is the number of captured images; extracting skin color regions from the images in data set V and computing the centroid coordinates of the three largest skin color regions according to the method in step (3); obtaining the characteristic value Λ_V of the image data set V according to the method in step (4), feeding Λ_V to the BP neural network model trained in step (5), and outputting the driving posture category of the video to be detected;
in step (4), the characteristic value Λ_i of data set J_i is computed as follows:

(4-1) computing the characteristic value λ_i^q of each image J_i^q in data set J_i:

using the skin color region centroid coordinates computed in step (3) to calculate the characteristic distances that represent the characteristic value of image J_i^q:

l_1 = √((x_H − x_L)² + (y_H − y_L)²)
l_2 = √((x_H − x_R)² + (y_H − y_R)²)
l_3 = √((x_L − x_R)² + (y_L − y_R)²)

where (x_H, y_H), (x_L, y_L) and (x_R, y_R) are the centroid coordinates of the head, left-hand and right-hand regions in image J_i^q, and the characteristic value λ_i^q is formed from the distances l_1, l_2 and l_3;

(4-2) computing the feature value of each image in data set J_i to form the set λ_i = (λ_i^1, λ_i^2, …, λ_i^{n_i}); using a clustering algorithm to find the center of λ_i, iterating until the standard deviation of the data points λ_i^q from the cluster center reaches its minimum, the cluster center being the characteristic value Λ_i of J_i.
2. The method for detecting the posture of the driver based on the video and the distance between the skin color regions as claimed in claim 1, wherein the step (3) of calculating the coordinates of the center of mass of the first three skin color regions with the largest area in the image is specifically as follows:
(3-1) preprocessing an image to be processed by using a reference white method, extracting a characteristic region by using a skin color model based on a normalized RGB color space, and finally removing an interference region in the image to be processed by using a mathematical morphology method and reserving the first three characteristic regions with the largest area;
(3-2) scanning the pixels of the image processed in step (3-1) row by row and column by column using the bwlabel function, identifying the three feature regions from the pixel positions in the matrix returned by the function, and then computing the centroid coordinates of the three feature regions with the centroid formula:

x_k = sum_x / area,  y_k = sum_y / area

where x_k and y_k are the centroid abscissa and ordinate of the k-th region, sum_x is the sum of the abscissas of the pixels in the k-th region, sum_y is the sum of their ordinates, area is the number of pixels in the k-th region, and k = 1, 2, 3;

the region whose centroid lies uppermost among the three feature regions is taken as the head region and labeled H; of the remaining two, the region on the left is taken as the left-hand region and labeled L, and the region on the right as the right-hand region and labeled R.
3. The method for detecting the attitude of the driver based on the video and the skin color region distance as claimed in claim 1, wherein the step (5) is specifically as follows:
(5-1) normalizing Λ = (Λ_1, Λ_2, …, Λ_W) to obtain the set Λ' used as training samples; converting the driving posture class p_i to a vector representation, where [1,0,0] represents the first class, [0,1,0] the second class, and [0,0,1] the third class;
(5-2) constructing a BP neural network whose input layer has 2 input nodes, with two hidden layers of 5 hidden nodes each; the output layer has 3 nodes, and the values of the 3 output nodes form the vector representation of the driving posture category;
(5-3) inputting the samples Λ'_i into the constructed BP neural network and, after forward layer-by-layer processing over the connections between the nodes, obtaining the actual network output r_i; computing the error between r_i and the class vector of p_i, propagating the error back through the preceding layers one by one, and apportioning it to the connection weights so that the weights of the whole BP network move in the direction that reduces the error; repeating this for every input-output sample pair in the training set until the error over the whole training sample set falls below a preset threshold.
4. The driver posture detection method based on video and skin color region distance as claimed in claim 1, wherein when images are re-captured from the i-th video in step (3), the interval F' takes the value F' = ⌈F/2⌉, where ⌈·⌉ is the rounding-up operator.
5. The method of claim 1, wherein the abnormal driving postures comprise both hands off the steering wheel, one-handed driving, eating, and making a phone call.
CN201910156046.5A 2019-03-01 2019-03-01 Driver posture detection method based on video and skin color area distance Active CN109977786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910156046.5A CN109977786B (en) 2019-03-01 2019-03-01 Driver posture detection method based on video and skin color area distance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910156046.5A CN109977786B (en) 2019-03-01 2019-03-01 Driver posture detection method based on video and skin color area distance

Publications (2)

Publication Number Publication Date
CN109977786A CN109977786A (en) 2019-07-05
CN109977786B true CN109977786B (en) 2021-02-09

Family

ID=67077606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910156046.5A Active CN109977786B (en) 2019-03-01 2019-03-01 Driver posture detection method based on video and skin color area distance

Country Status (1)

Country Link
CN (1) CN109977786B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832446B (en) * 2020-06-30 2022-11-08 东南大学 Driver posture identification method based on double-view-angle video data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567743A (en) * 2011-12-20 2012-07-11 东南大学 Automatic identification method of driver gestures based on video images
CN104200199A (en) * 2014-08-27 2014-12-10 合肥工业大学 TOF (Time of Flight) camera based bad driving behavior detection method
CN104463146A (en) * 2014-12-30 2015-03-25 华南师范大学 Posture identification method and device based on near-infrared TOF camera depth information
EP3395237A1 (en) * 2017-04-24 2018-10-31 Oxehealth Limited Improvements in or relating to in-vehicle monitoring
CN109214370A (en) * 2018-10-29 2019-01-15 东南大学 A kind of driver gestures detection method based on arm area of skin color center-of-mass coordinate
CN109284698A (en) * 2018-09-03 2019-01-29 深圳市尼欧科技有限公司 A kind of fatigue driving behavioral value method based on image recognition technology

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102289660B (en) * 2011-07-26 2013-07-03 华南理工大学 Method for detecting illegal driving behavior based on hand gesture tracking
CN106295600A (en) * 2016-08-18 2017-01-04 宁波傲视智绘光电科技有限公司 Driver status real-time detection method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567743A (en) * 2011-12-20 2012-07-11 东南大学 Automatic identification method of driver gestures based on video images
CN104200199A (en) * 2014-08-27 2014-12-10 合肥工业大学 TOF (Time of Flight) camera based bad driving behavior detection method
CN104463146A (en) * 2014-12-30 2015-03-25 华南师范大学 Posture identification method and device based on near-infrared TOF camera depth information
EP3395237A1 (en) * 2017-04-24 2018-10-31 Oxehealth Limited Improvements in or relating to in-vehicle monitoring
CN109284698A (en) * 2018-09-03 2019-01-29 深圳市尼欧科技有限公司 A kind of fatigue driving behavioral value method based on image recognition technology
CN109214370A (en) * 2018-10-29 2019-01-15 东南大学 A kind of driver gestures detection method based on arm area of skin color center-of-mass coordinate

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Vision-based Classification of Driving Postures by Efficient Feature Extraction and Bayesian Approach; Chihang Zhao et al.; Journal of Intelligent & Robotic Systems; 2012-12-21; pp. 483-495 *
Research on visual perception of driver violation behavior in traffic checkpoint images (面向交通卡口图像的驾驶员违章行为视觉感知研究); 刘操; China Doctoral Dissertations Full-text Database; 2017-12-15; pp. C034-36 *

Also Published As

Publication number Publication date
CN109977786A (en) 2019-07-05

Similar Documents

Publication Publication Date Title
CN102289660B (en) Method for detecting illegal driving behavior based on hand gesture tracking
CN110728241A (en) Driver fatigue detection method based on deep learning multi-feature fusion
CN106169081B (en) A kind of image classification and processing method based on different illumination
CN103714660B (en) System for achieving fatigue driving judgment on basis of image processing and fusion between heart rate characteristic and expression characteristic
CN104123549B (en) Eye positioning method for real-time monitoring of fatigue driving
CN108664947A (en) A kind of fatigue driving method for early warning based on Expression Recognition
CN108596087B (en) Driving fatigue degree detection regression model based on double-network result
CN109685026B (en) Real-time monitoring method for mobile phone call held by driver
CN102122357A (en) Fatigue detection method based on human eye opening and closure state
CN108446586B (en) Method for detecting specific action of train driver
CN104778446A (en) Method for constructing image quality evaluation and face recognition efficiency relation model
CN109506628A (en) Object distance measuring method under a kind of truck environment based on deep learning
CN110879982A (en) Crowd counting system and method
CN107844783A (en) A kind of commerial vehicle abnormal driving behavioral value method and system
CN110348350B (en) Driver state detection method based on facial expressions
CN109977786B (en) Driver posture detection method based on video and skin color area distance
CN105718896A (en) Intelligent robot with target recognition function
CN112052829B (en) Pilot behavior monitoring method based on deep learning
CN111553217A (en) Driver call monitoring method and system
CN109214370B (en) Driver posture detection method based on arm skin color area centroid coordinates
CN115661800A (en) Dangerous driving behavior detection method based on sight direction time relation learning
CN110570469A (en) intelligent identification method for angle position of automobile picture
Chen et al. Vehicle classification and counting system
CN115861982A (en) Real-time driving fatigue detection method and system based on monitoring camera
CN113361452B (en) Driver fatigue driving real-time detection method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant