CN111814568B - Target detection method and device for monitoring state of driver
- Publication number: CN111814568B (application number CN202010532960.8A)
- Authority: CN (China)
- Prior art keywords: confidence, state information, driver state, threshold, confidence coefficient
- Prior art date: 2020-06-11
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V20/597: Recognising the driver's state or behaviour, e.g. attention or drowsiness (scenes; context or environment of the image inside of a vehicle)
- G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting (pattern recognition)
- G06N3/045: Combinations of networks (neural networks based on biological models)
- G06N3/08: Learning methods (neural networks based on biological models)
- G06V40/161: Detection; localisation; normalisation (human faces)
- G06V2201/07: Target detection (indexing scheme relating to image or video recognition or understanding)
Abstract
The application discloses a target detection method and device for monitoring a driver's state. The method comprises the following steps: collecting driver state information images; establishing a driver state information detection model and a key point alignment model based on a deep convolutional neural network; obtaining a driver state information bounding box and a first confidence score; filtering the bounding box according to the first confidence score; expanding the filtered driver state information bounding box; obtaining driver state key point coordinates and a second confidence score; filtering the bounding box according to the second confidence score; and obtaining the filtered driver state information bounding box. The device comprises a data acquisition module, a model building module, a first filtering module, a second filtering module, and an expansion module. The method and device reduce dependence on a single deep learning neural network model, improve detection precision, save labor cost, and maximize the detection capability of the models.
Description
Technical Field
The application relates to the technical field of face detection, in particular to a target detection method and device for monitoring a driver state.
Background
At present, with the development of artificial intelligence, face detection based on deep learning has achieved breakthrough performance. In the prior art, the accuracy of face detection can be greatly improved by constructing a deep convolutional neural network (DCNN) model, tuning the parameters of the target detection algorithm, and similar methods. In a driver state monitoring (DMS) application scenario, the driver's face position is generally determined by face detection or key point localization, and a region of interest (ROI) is then extracted from that position information as the input to driver behavior detection; that is, the quality of the face detection result directly affects the accuracy of driver behavior detection.
In the prior art, a single deep learning neural network model is typically used, and its predictions are judged against a single confidence threshold. For face detection this approach easily produces false detections and missed detections: when the threshold is set low, objects with skin-like color and roughly elliptical shape are easily detected as faces; when the threshold is set high, faces that are partially occluded by sunglasses, a mask, or a hat, or that have a large pose angle, are rejected by the threshold.
One way to address this problem is to improve the expressive power of the deep learning neural network model, for example by increasing the number and quality of training samples, adjusting the training method and hyper-parameters, or adjusting the network structure. However, these methods are costly and technically difficult, and enlarging the network, for example by increasing the model's width and depth, greatly increases inference time on the terminal; for embedded devices with limited computing power, such as a DMS, the real-time processing requirement then cannot be met. Another approach is to confirm the first detection result with a second model (or several models) to ensure the credibility of the detection result: the face is first localized by key points or detection boxes, the localization information is used to obtain the input of a secondary detection, and the secondary detection confirms the final result. However, this method still depends heavily on the first detection model: if the first localization is inaccurate, the second detection result, and thus the final result, is directly affected, and multiple models involve determining multiple thresholds.
No effective solution has yet been proposed for the problem of low detection precision caused by improper threshold selection in the related art.
Disclosure of Invention
The present application mainly aims to provide a target detection method and device for monitoring a driver state, so as to solve the problem of low detection result precision caused by improper threshold selection in the related art.
In order to achieve the above object, in a first aspect, the present application provides a target detection method for driver state monitoring.
S100: collecting driver state information images, and dividing all images into a training set and a testing set;
S101: establishing a driver state information detection model and a key point alignment model based on a deep convolutional neural network by using the training set;
S102: inputting the testing set into the deep-convolutional-neural-network-based driver state information detection model to obtain a driver state information bounding box and a first confidence score;
S103: setting a first confidence threshold, and filtering the driver state information bounding box according to the first confidence score;
S104: expanding the filtered driver state information bounding box;
S105: inputting the testing set into the deep-convolutional-neural-network-based driver state key point alignment model to obtain driver state key point coordinates and a second confidence score;
S106: setting a second confidence threshold, and filtering the driver state information bounding box according to the second confidence score;
S107: obtaining the filtered driver state information bounding box as the final detection output.
The method for expanding the filtered driver state information bounding box is as follows:
Assuming the upper left corner of the input image is the origin, let $(x_1, y_1)$ and $(x_2, y_2)$ be the coordinates of the upper left and lower right corners of the driver state information bounding box before expansion, and let the width and height of the input image be $W$ and $H$ respectively. The midpoint of the bounding box is taken as
$$(x_c, y_c) = \left(\tfrac{x_1 + x_2}{2},\ \tfrac{y_1 + y_2}{2}\right),$$
the width of the bounding box is $w = x_2 - x_1$, its height is $h = y_2 - y_1$, and its short side is $\min(w, h)$. The coordinates of the upper left corner of the expanded bounding box are then
$$(x_1', y_1') = \left(x_c - \tfrac{\delta \min(w, h)}{2},\ y_c - \tfrac{\delta \min(w, h)}{2}\right),$$
and the coordinates of the lower right corner are
$$(x_2', y_2') = \left(x_c + \tfrac{\delta \min(w, h)}{2},\ y_c + \tfrac{\delta \min(w, h)}{2}\right).$$
To prevent the expanded bounding box from overflowing the boundary of the input image, the coordinates are clamped:
$$x_1' \leftarrow \max(0, x_1'),\quad y_1' \leftarrow \max(0, y_1'),\quad x_2' \leftarrow \min(W, x_2'),\quad y_2' \leftarrow \min(H, y_2'),$$
where $\delta$ is the expansion ratio.
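As a worked illustration of these formulas (the numbers here are chosen for the example and do not come from the patent): for an input image with $W = 640$ and $H = 480$, a bounding box with $(x_1, y_1) = (300, 200)$ and $(x_2, y_2) = (400, 320)$, and $\delta = 1.2$, we get $w = 100$, $h = 120$, $\min(w, h) = 100$, midpoint $(350, 260)$, and half side length $\delta \cdot \min(w, h)/2 = 60$; the expanded box is therefore the $120 \times 120$ square from $(290, 200)$ to $(410, 320)$, which has a 1:1 aspect ratio and already lies inside the image, so the clamping step leaves it unchanged.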
The first confidence threshold and the second confidence threshold are set by using an automated optimal-threshold search method, which obtains an optimal threshold combination whose two values are used respectively as the set first confidence threshold and the set second confidence threshold. The flow is as follows:
Step S200: acquiring driver state information images and the labels corresponding to the images, and forming a verification set from the images and labels;
Step S201: initializing the first confidence threshold and the second confidence threshold, and setting the initial value, final value, and iteration step size for each;
Step S202: assigning the current first confidence threshold and second confidence threshold to the deep-convolutional-neural-network-based driver state information detection model and key point alignment model;
Step S203: inputting the images in the verification set into the driver state information detection model and key point alignment model, and obtaining a detection result;
Step S204: comparing the labels in the verification set with the detection result, and recording the accuracy;
Step S205: adding the step size to the first confidence threshold, and reassigning the result to the first confidence threshold;
Step S206: judging whether the first confidence threshold is greater than or equal to its final value; if so, go to step S207; if not, go to step S202;
Step S207: adding the step size to the second confidence threshold, and reassigning the result to the second confidence threshold;
Step S208: judging whether the second confidence threshold is greater than or equal to its final value; if so, ending the search and outputting the optimal threshold combination; if not, go to step S209;
Step S209: resetting the first confidence threshold to its initial value, and going to step S202.
The first confidence threshold and the second confidence threshold may instead be set by using an improved automated optimal-threshold search method, which obtains the optimal threshold combination used respectively as the set first confidence threshold and the set second confidence threshold. The flow is as follows:
Step S300: acquiring driver state information images and the labels corresponding to the images, and forming a verification set from the images and labels;
Step S301: initializing the first confidence threshold and the second confidence threshold; setting the initial value, final value, and iteration step size for each, together with a first list and a second list; putting the initial value of the first confidence threshold into the first list, and the initial value of the second confidence threshold into the second list;
Step S302: adding the step size to the first confidence threshold, reassigning the result to the first confidence threshold, and putting the reassigned first confidence threshold into the first list;
Step S303: judging whether the first confidence threshold is greater than or equal to its final value; if so, go to step S304; if not, go to step S302;
Step S304: adding the step size to the second confidence threshold, reassigning the result to the second confidence threshold, and putting the reassigned second confidence threshold into the second list;
Step S305: judging whether the second confidence threshold is greater than or equal to its final value; if so, ending the list construction, outputting the first list and the second list, and proceeding to step S307; if not, go to step S306;
Step S306: resetting the first confidence threshold to its initial value, and going to step S302;
Step S307: assigning the first list and the second list to the deep-convolutional-neural-network-based driver state information detection model and key point alignment model;
Step S308: inputting the images in the verification set into the driver state information detection model and key point alignment model, and obtaining a set of detection results;
Step S309: comparing the labels in the verification set with the set of detection results one by one, and recording a set of accuracies;
Step S310: finding the first confidence threshold and the second confidence threshold corresponding to the highest accuracy in the accuracy set, and outputting them as the optimal threshold combination.
The setting of the first confidence threshold and the filtering of the driver state information bounding box according to the first confidence score are specifically as follows: when the first confidence score is greater than the set first confidence threshold, the driver state information bounding box is valid and the next step, expanding the bounding box, is carried out; otherwise the bounding box is invalid and is discarded.
The setting of the second confidence threshold and the filtering of the driver state information bounding box according to the second confidence score are specifically as follows: when the second confidence score is greater than the set second confidence threshold, the driver state information bounding box is kept as the final detection output; otherwise the bounding box is rejected and discarded.
In a second aspect, the present application also provides a target detection device for driver state monitoring, comprising: a data acquisition module, a model building module, a first filtering module, a second filtering module, and an expansion module;
the acquisition module, the model building module, the first filtering module, the expansion module and the second filtering module are sequentially connected;
the data acquisition module: collecting a driver state information image, and dividing all images into a training set and a testing set;
the model building module: establishing a driver state information detection model and a key point alignment model based on a deep convolutional neural network by using a training set;
the first filtering module: setting a first confidence threshold, and filtering the driver state information bounding box according to the first confidence score;
the expansion module: expanding the filtered driver state information bounding box;
the second filtering module: setting a second confidence threshold, and filtering the driver state information bounding box according to the second confidence score.
The beneficial technical effects are as follows:
The method and device reduce dependence on a single deep learning neural network model: detection by the first model guarantees a high recall, and detection by the second model guarantees a high precision. Expanding the driver state information bounding box before it is input into the second model greatly reduces the influence of the first model on the second, improving detection precision. Because, given the characteristics of DCNN models, the setting of the confidence thresholds plays a critical role in detection accuracy, a method for automatically computing the optimal thresholds on a verification set is further provided, which saves labor cost and maximizes the detection capability of the models.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, are provided to offer a further understanding of the application and to make its other features, objects, and advantages more apparent. The drawings and their description illustrate embodiments of the application and do not limit it. In the drawings:
FIG. 1 is a flow chart of a target detection method for driver condition monitoring provided in accordance with an embodiment of the present application;
FIG. 2 is a flow chart of an automated method for searching for an optimal threshold provided in accordance with an embodiment of the present application;
FIG. 3 is a flow chart of an improved automated search method for optimal thresholds according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a target detection device for monitoring a driver state provided according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first", "second", and the like in the description, claims, and drawings of this application are used to distinguish between similar elements and not necessarily to describe a particular sequence or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances, so that the embodiments of the application described herein can be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises", "comprising", and "having", and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In this application, the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "lateral", "longitudinal", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings. These terms are used primarily to better describe the present application and its embodiments, and are not used to limit the indicated devices, elements or components to a particular orientation or to be constructed and operated in a particular orientation.
Moreover, some of the above terms may be used to indicate other meanings besides the orientation or positional relationship, for example, the term "on" may also be used to indicate some kind of attachment or connection relationship in some cases. The specific meaning of these terms in this application will be understood by those of ordinary skill in the art as appropriate.
In addition, the term "plurality" shall mean two or more.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
In a first aspect, the present application provides a target detection method for monitoring a driver state. As shown in FIG. 1, the method may be used to detect the driver's body posture, gestures, face, and the like.
S100: collecting driver state information images, and dividing all images into a training set and a testing set;
S101: establishing a driver state information detection model and a key point alignment model based on a deep convolutional neural network by using the training set;
S102: inputting the testing set into the deep-convolutional-neural-network-based driver state information detection model to obtain a face frame and a first confidence score;
S103: setting a first confidence threshold, and filtering the face frame according to the first confidence score;
S104: expanding the filtered face frame;
S105: inputting the testing set into the deep-convolutional-neural-network-based driver state key point alignment model to obtain driver state key point coordinates and a second confidence score;
S106: setting a second confidence threshold, and filtering the face frame according to the second confidence score;
S107: obtaining the filtered face frame as the final detection output.
The periphery of each filtered valid face frame is expanded to a certain extent. The purpose of the expansion is to avoid losing face information when the face frame position is inaccurate, and two points must be observed: first, the key point alignment model adopted in this application takes input images with a 1:1 aspect ratio, so to avoid scaling distortion the expanded face frame must also keep a 1:1 aspect ratio; second, the expanded face frame must not overflow the boundary of the original input image.
The method for expanding the filtered face frame comprises the following steps:
Assuming the upper left corner of the input image is the origin, let $(x_1, y_1)$ and $(x_2, y_2)$ be the coordinates of the upper left and lower right corners of the face frame before expansion, and let the width and height of the input image be $W$ and $H$ respectively. The midpoint of the face frame is taken as
$$(x_c, y_c) = \left(\tfrac{x_1 + x_2}{2},\ \tfrac{y_1 + y_2}{2}\right),$$
the width of the face frame is $w = x_2 - x_1$, its height is $h = y_2 - y_1$, and its short side is $\min(w, h)$. The coordinates of the upper left corner of the expanded face frame are then
$$(x_1', y_1') = \left(x_c - \tfrac{\delta \min(w, h)}{2},\ y_c - \tfrac{\delta \min(w, h)}{2}\right),$$
and the coordinates of the lower right corner are
$$(x_2', y_2') = \left(x_c + \tfrac{\delta \min(w, h)}{2},\ y_c + \tfrac{\delta \min(w, h)}{2}\right).$$
To prevent the expanded face frame from overflowing the boundary of the input image, the coordinates are clamped:
$$x_1' \leftarrow \max(0, x_1'),\quad y_1' \leftarrow \max(0, y_1'),\quad x_2' \leftarrow \min(W, x_2'),\quad y_2' \leftarrow \min(H, y_2'),$$
where $\delta$ is the expansion ratio. Generally $\delta$ takes a value of 1.1 to 1.3; the specific value is determined by the size and positional accuracy of the face frame output by the face detection model, and should ensure that the expanded face frame contains the entire face without too much redundant information. The image region corresponding to the new face frame is then input into the deep learning neural network model responsible for key point alignment.
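A minimal sketch of this expansion step follows, assuming the square-expansion formulas reconstructed above; the function name and signature are illustrative and not part of the patent:

```python
def expand_box(x1, y1, x2, y2, img_w, img_h, delta=1.2):
    """Expand a detected face frame into a square (1:1) crop region.

    delta is the expansion ratio (typically 1.1 to 1.3). The expanded frame
    is centred on the original frame's midpoint, sized by its short side,
    and clamped so it never overflows the input image.
    """
    xc, yc = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # midpoint of the frame
    w, h = x2 - x1, y2 - y1                     # width and height
    half = delta * min(w, h) / 2.0              # half of the new side length
    # Square expansion keeps the 1:1 aspect ratio the alignment model expects.
    nx1, ny1 = xc - half, yc - half
    nx2, ny2 = xc + half, yc + half
    # Clamp so the expanded frame stays inside [0, W] x [0, H]; at the image
    # border this clamping takes priority over the 1:1 aspect ratio.
    return (max(0.0, nx1), max(0.0, ny1),
            min(float(img_w), nx2), min(float(img_h), ny2))
```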
An automated optimal-threshold search method requires a sufficiently complete verification set. The verification set should contain enough positive and negative samples and cover as many scenes as possible from the practical application. Unlike a general face detection dataset, the driver face detection verification set must contain at most one face frame per image, because each vehicle has a single driver.
For the trained face detection model and key point alignment model, the choice of thresholds directly affects performance in the actual scene. The first confidence threshold and the second confidence threshold are set by an automated optimal-threshold search that outputs an optimal threshold combination, used respectively as the set first confidence threshold and the set second confidence threshold. As shown in FIG. 2, the flow is as follows:
Step S200: acquiring driver state information images and the labels corresponding to the images, and forming a verification set from the images and labels;
Step S201: initializing the first confidence threshold and the second confidence threshold, and setting the initial value, final value, and iteration step size for each;
Step S202: assigning the current first confidence threshold and second confidence threshold to the deep-convolutional-neural-network-based driver state information detection model and key point alignment model;
Step S203: inputting the images in the verification set into the driver state information detection model and key point alignment model, and obtaining a detection result;
Step S204: comparing the labels in the verification set with the detection result, and recording the accuracy;
Step S205: adding the step size to the first confidence threshold, and reassigning the result to the first confidence threshold;
Step S206: judging whether the first confidence threshold is greater than or equal to its final value; if so, go to step S207; if not, go to step S202;
Step S207: adding the step size to the second confidence threshold, and reassigning the result to the second confidence threshold;
Step S208: judging whether the second confidence threshold is greater than or equal to its final value; if so, ending the search and outputting the optimal threshold combination; if not, go to step S209;
Step S209: resetting the first confidence threshold to its initial value, and going to step S202.
This automated procedure obtains, without manual intervention and at the fastest speed, the best model and threshold combination under the specified verification set, greatly saving labor cost in model selection and threshold selection.
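For illustration only, the nested search of steps S200 to S209 can be sketched as follows; the helper run_models (which runs both models with the given thresholds and returns one prediction per verification image) and the simple accuracy measure are assumptions for this example, not details given by the patent:

```python
def search_thresholds(images, labels, t1_init, t1_final, t2_init, t2_final, step):
    best_pair, best_acc = (t1_init, t2_init), -1.0
    t2 = t2_init
    while t2 < t2_final:                 # outer loop over the second threshold (S207-S208)
        t1 = t1_init                     # S209: reset the first threshold to its initial value
        while t1 < t1_final:             # inner loop over the first threshold (S202-S206)
            preds = run_models(t1, t2, images)   # assumed helper: detect + align (S202-S203)
            acc = sum(p == l for p, l in zip(preds, labels)) / len(labels)  # accuracy (S204)
            if acc > best_acc:
                best_acc, best_pair = acc, (t1, t2)
            t1 += step                   # S205
        t2 += step                       # S207
    return best_pair, best_acc           # optimal threshold combination (S208)
```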
To further improve search efficiency and eliminate the repeated computation in the above search method, the following optimized strategy is proposed. As shown in FIG. 3, the flow is as follows:
Step S300: acquiring driver state information images and the labels corresponding to the images, and forming a verification set from the images and labels;
Step S301: initializing the first confidence threshold and the second confidence threshold; setting the initial value, final value, and iteration step size for each, together with a first list and a second list; putting the initial value of the first confidence threshold into the first list, and the initial value of the second confidence threshold into the second list;
Step S302: adding the step size to the first confidence threshold, reassigning the result to the first confidence threshold, and putting the reassigned first confidence threshold into the first list;
Step S303: judging whether the first confidence threshold is greater than or equal to its final value; if so, go to step S304; if not, go to step S302;
Step S304: adding the step size to the second confidence threshold, reassigning the result to the second confidence threshold, and putting the reassigned second confidence threshold into the second list;
Step S305: judging whether the second confidence threshold is greater than or equal to its final value; if so, ending the list construction, outputting the first list and the second list, and proceeding to step S307; if not, go to step S306;
Step S306: resetting the first confidence threshold to its initial value, and going to step S302;
Step S307: assigning the first list and the second list to the deep-convolutional-neural-network-based driver state information detection model and key point alignment model;
Step S308: inputting the images in the verification set into the driver state information detection model and key point alignment model, and obtaining a set of detection results;
Step S309: comparing the labels in the verification set with the set of detection results one by one, and recording a set of accuracies;
Step S310: finding the first confidence threshold and the second confidence threshold corresponding to the highest accuracy in the accuracy set, and outputting them as the optimal threshold combination.
This method needs to perform inference on the verification set only once, which greatly improves the efficiency of searching for the optimal thresholds.
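A sketch of this single-pass strategy follows, under the assumption that a hypothetical detect_raw helper returns the raw first-stage and second-stage confidence scores for an image, so that thresholding can be replayed offline; the names and the simplified keep/reject accuracy measure are illustrative only:

```python
def search_thresholds_fast(images, labels, first_list, second_list):
    # One inference pass over the verification set (S308): cache the raw
    # (first score, second score) pair per image with thresholding disabled.
    scores = [detect_raw(img) for img in images]   # assumed helper
    best_pair, best_acc = None, -1.0
    for t1 in first_list:                          # first list built in S301-S303
        for t2 in second_list:                     # second list built in S304-S305
            # Replay the dual-threshold decision offline for this combination.
            preds = [(s1 > t1) and (s2 > t2) for s1, s2 in scores]
            acc = sum(p == l for p, l in zip(preds, labels)) / len(labels)  # S309
            if acc > best_acc:                     # S310: keep the highest accuracy
                best_acc, best_pair = acc, (t1, t2)
    return best_pair, best_acc                     # optimal threshold combination
```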
The setting of the first confidence threshold and the filtering of the face frame according to the first confidence score are specifically as follows: when the first confidence score is greater than the set first confidence threshold, the face frame is valid and the next step, expanding the face frame, is carried out; otherwise the face frame is invalid and is discarded.
The setting of the second confidence threshold and the filtering of the face frame according to the second confidence score are specifically as follows: when the second confidence score is greater than the set second confidence threshold, the face frame is kept as the final face frame detection output; otherwise the face frame is rejected and discarded.
The method can be applied to any two-stage task in which the detection result of the first stage is used as the input of a second-stage key point alignment; judging the detection result with dual thresholds in this way improves detection accuracy.
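Putting the stages together, a minimal sketch of the dual-threshold cascade of steps S102 to S107 might look like the following, reusing the expand_box sketch above; detector and aligner stand in for the two DCNN models, and their predict interfaces are assumptions for this example rather than APIs defined by the patent:

```python
def detect_driver_face(image, detector, aligner, t1, t2, delta=1.2):
    img_h, img_w = image.shape[:2]           # image as an H x W (x C) array
    box, score1 = detector.predict(image)    # stage 1: face frame + first confidence score
    if box is None or score1 <= t1:          # first filter (S103): discard invalid frame
        return None
    x1, y1, x2, y2 = expand_box(*box, img_w, img_h, delta)  # expansion (S104)
    crop = image[int(y1):int(y2), int(x1):int(x2)]          # ROI fed to alignment model
    keypoints, score2 = aligner.predict(crop)  # stage 2: key points + second confidence score
    if score2 <= t2:                           # second filter (S106): reject and discard
        return None
    return box, keypoints                      # final detection output (S107)
```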
In a second aspect, the present application also provides a target detection device for driver state monitoring, comprising: a data acquisition module, a model building module, a first filtering module, a second filtering module, and an expansion module;
the data acquisition module, the model building module, the first filtering module, the expansion module, and the second filtering module are connected in sequence, as shown in FIG. 4;
the data acquisition module: collecting a driver state information image, and dividing all images into a training set and a testing set;
the model building module: establishing a driver state information detection model and a key point alignment model based on a deep convolutional neural network by using a training set;
the first filtering module: setting a first confidence threshold, and filtering the face frame according to the first confidence score;
the expansion module: expanding the filtered face frame;
the second filtering module: setting a second confidence threshold, and filtering the face frame according to the second confidence score.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (8)
1. A target detection method for driver state monitoring, comprising:
S100: collecting driver state information images, and dividing all images into a training set and a testing set;
S101: establishing a driver state information detection model and a key point alignment model based on a deep convolutional neural network by using the training set;
S102: inputting the testing set into the deep-convolutional-neural-network-based driver state information detection model to obtain a driver state information bounding box and a first confidence score;
S103: setting a first confidence threshold, and filtering the driver state information bounding box according to the first confidence score;
S104: expanding the filtered driver state information bounding box;
S105: inputting the testing set into the deep-convolutional-neural-network-based driver state key point alignment model to obtain driver state key point coordinates and a second confidence score;
S106: setting a second confidence threshold, and filtering the driver state information bounding box according to the second confidence score;
S107: obtaining the filtered driver state information bounding box as the final detection output;
the setting of the first confidence threshold and the second confidence threshold, and the adoption of an automatic search method of an optimal threshold to obtain an optimal threshold combination result, which are respectively used as the set first confidence threshold and the set second confidence threshold, comprises the following steps:
step S200: acquiring a driver state information image and a label corresponding to the image, and forming a verification set by the image and the label;
step S201: initializing a first confidence coefficient threshold and a second confidence coefficient threshold, and respectively setting initial and final values and step length of iteration of the first confidence coefficient threshold and the second confidence coefficient threshold;
step S202: the first confidence coefficient threshold value and the second confidence coefficient threshold value at the moment are given to a driver state information detection model and a key point alignment model based on the deep convolutional neural network;
step S203: inputting the images in the verification set into a driver state information detection model and a key point alignment model based on a deep convolutional neural network, and detecting to obtain a detection result;
step S204: comparing the label in the verification set with the detection result, and recording the accuracy;
step S205: adding the step length to the first confidence coefficient threshold value, and reassigning to the first confidence coefficient threshold value;
step S206: judging whether the first confidence coefficient threshold is larger than or equal to the final value of the first confidence coefficient threshold, if so, turning to the step S207, and if not, turning to the step S202;
step S207: adding the step length to the second confidence coefficient threshold value, and reassigning to the second confidence coefficient threshold value;
step S208: judging whether the second confidence coefficient threshold is greater than or equal to the final value of the second confidence coefficient threshold, if so, ending the search method, outputting the optimal threshold combination, and if not, going to step S209;
step S209: the first confidence threshold at this time is given as the initial value of the second confidence threshold, and the process goes to step S202.
2. The target detection method for driver state monitoring according to claim 1, wherein the filtered driver state information bounding box is expanded as follows:
assuming the upper left corner of the input image is the origin, let $(x_1, y_1)$ and $(x_2, y_2)$ be the coordinates of the upper left and lower right corners of the driver state information bounding box before expansion, and let the width and height of the input image be $W$ and $H$ respectively; the midpoint of the bounding box is taken as
$$(x_c, y_c) = \left(\tfrac{x_1 + x_2}{2},\ \tfrac{y_1 + y_2}{2}\right),$$
the width of the bounding box is $w = x_2 - x_1$, its height is $h = y_2 - y_1$, and its short side is $\min(w, h)$; the coordinates of the upper left corner of the expanded bounding box are then
$$(x_1', y_1') = \left(x_c - \tfrac{\delta \min(w, h)}{2},\ y_c - \tfrac{\delta \min(w, h)}{2}\right),$$
and the coordinates of the lower right corner are
$$(x_2', y_2') = \left(x_c + \tfrac{\delta \min(w, h)}{2},\ y_c + \tfrac{\delta \min(w, h)}{2}\right);$$
the expanded bounding box is prevented from overflowing the boundary of the input image by clamping
$$x_1' \leftarrow \max(0, x_1'),\quad y_1' \leftarrow \max(0, y_1'),\quad x_2' \leftarrow \min(W, x_2'),\quad y_2' \leftarrow \min(H, y_2'),$$
where $\delta$ is the expansion ratio.
3. The target detection method for driver state monitoring according to claim 1, wherein the first confidence threshold and the second confidence threshold are set by using an improved automated optimal-threshold search method, which obtains the optimal threshold combination used respectively as the set first confidence threshold and the set second confidence threshold, the flow being as follows:
Step S300: acquiring driver state information images and the labels corresponding to the images, and forming a verification set from the images and labels;
Step S301: initializing the first confidence threshold and the second confidence threshold; setting the initial value, final value, and iteration step size for each, together with a first list and a second list; putting the initial value of the first confidence threshold into the first list, and the initial value of the second confidence threshold into the second list;
Step S302: adding the step size to the first confidence threshold, reassigning the result to the first confidence threshold, and putting the reassigned first confidence threshold into the first list;
Step S303: judging whether the first confidence threshold is greater than or equal to its final value; if so, go to step S304; if not, go to step S302;
Step S304: adding the step size to the second confidence threshold, reassigning the result to the second confidence threshold, and putting the reassigned second confidence threshold into the second list;
Step S305: judging whether the second confidence threshold is greater than or equal to its final value; if so, ending the list construction, outputting the first list and the second list, and proceeding to step S307; if not, go to step S306;
Step S306: resetting the first confidence threshold to its initial value, and going to step S302;
Step S307: assigning the first list and the second list to the deep-convolutional-neural-network-based driver state information detection model and key point alignment model;
Step S308: inputting the images in the verification set into the driver state information detection model and key point alignment model, and obtaining a set of detection results;
Step S309: comparing the labels in the verification set with the set of detection results one by one, and recording a set of accuracies;
Step S310: finding the first confidence threshold and the second confidence threshold corresponding to the highest accuracy in the accuracy set, and outputting them as the optimal threshold combination.
4. The target detection method for driver state monitoring according to claim 1, wherein the setting of the first confidence threshold and the filtering of the driver state information bounding box according to the first confidence score are specifically: when the first confidence score is greater than the set first confidence threshold, the driver state information bounding box is valid and the next step, expanding the bounding box, is carried out; otherwise the bounding box is invalid and is discarded.
5. The target detection method for driver state monitoring according to claim 1, wherein the setting of the second confidence threshold and the filtering of the driver state information bounding box according to the second confidence score are specifically: when the second confidence score is greater than the set second confidence threshold, the driver state information bounding box is kept as the final detection output; otherwise the bounding box is rejected and discarded.
6. A target detection device for driver state monitoring, characterized by being implemented using the target detection method for driver state monitoring according to any one of claims 1 to 5, and comprising: a data acquisition module, a model building module, a first filtering module, a second filtering module, and an expansion module;
the acquisition module, the model building module, the first filtering module, the expansion module and the second filtering module are sequentially connected;
the data acquisition module: collecting a driver state information image, and dividing all images into a training set and a testing set;
the model building module: establishing a driver state information detection model and a key point alignment model based on a deep convolutional neural network by using a training set;
the first filtering module: setting a first confidence threshold, and filtering the driver state information bounding box according to the first confidence score;
the expansion module: expanding the filtered driver state information bounding box;
the second filtering module: setting a second confidence threshold, and filtering the driver state information bounding box according to the second confidence score;
wherein a threshold automatic search module is arranged in each of the first filtering module and the second filtering module, and the automatic setting of the first confidence threshold and the second confidence threshold is completed by the automated optimal-threshold search method or the improved automated optimal-threshold search method.
7. An electronic device, characterized in that the electronic device comprises a memory for storing a computer program and a processor for executing the computer program, so as to cause the electronic device to perform the target detection method for driver state monitoring according to any one of claims 1 to 5.
8. A readable storage medium having computer program instructions stored therein, wherein the computer program instructions, when read and executed by a processor, perform the target detection method for driver state monitoring according to any one of claims 1 to 5.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010532960.8A (CN111814568B) | 2020-06-11 | 2020-06-11 | Target detection method and device for monitoring state of driver |

Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN111814568A | 2020-10-23 |
| CN111814568B | 2022-08-02 |

Family

ID=72844882

Family Applications (1)

| Application Number | Status |
|---|---|
| CN202010532960.8A (CN111814568B) | Active |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN111814568B |
Citations (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107886074A | 2017-11-13 | 2018-04-06 | Suzhou Keda Technology Co., Ltd. | Face detection method and face detection system |
| WO2018188453A1 | 2017-04-11 | 2018-10-18 | Tencent Technology (Shenzhen) Co., Ltd. | Method for determining human face area, storage medium, and computer device |
| CN110309764A | 2019-06-27 | 2019-10-08 | Zhejiang University of Technology | Multistage deep-learning-based detection method for driver phone-call behavior |
| CN110837815A | 2019-11-15 | 2020-02-25 | Jining University | Driver state monitoring method based on convolutional neural network |

Family Cites Families (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| DE10345948B4 | 2003-10-02 | 2018-08-23 | Robert Bosch GmbH | Method for evaluation and temporal stabilization of classification results |

Patent Citations (5)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018188453A1 | 2017-04-11 | 2018-10-18 | Tencent Technology (Shenzhen) Co., Ltd. | Method for determining human face area, storage medium, and computer device |
| CN107886074A | 2017-11-13 | 2018-04-06 | Suzhou Keda Technology Co., Ltd. | Face detection method and face detection system |
| WO2019091271A1 | 2017-11-13 | 2019-05-16 | Suzhou Keda Technology Co., Ltd. | Human face detection method and human face detection system |
| CN110309764A | 2019-06-27 | 2019-10-08 | Zhejiang University of Technology | Multistage deep-learning-based detection method for driver phone-call behavior |
| CN110837815A | 2019-11-15 | 2020-02-25 | Jining University | Driver state monitoring method based on convolutional neural network |

Non-Patent Citations (1)

| Title |
|---|
| Jonathan Wenger et al., "Non-Parametric Calibration for Classification", arXiv, 2020-02-27, full text |

Also Published As

| Publication Number | Publication Date |
|---|---|
| CN111814568A | 2020-10-23 |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |