CN115171189A - Fatigue detection method, device, equipment and storage medium - Google Patents
Fatigue detection method, device, equipment and storage medium
- Publication number
- CN115171189A (application number CN202210822432.5A)
- Authority
- CN
- China
- Prior art keywords
- face
- eye
- image
- detected
- fatigue
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/166—Detection; Localisation; Normalisation using acquisition arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/59—Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
- G06V20/597—Recognising the driver's state or behaviour, e.g. attention or drowsiness
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
Abstract
Embodiments of the present disclosure provide a fatigue detection method, apparatus, device and storage medium. The method comprises: acquiring 3D face point cloud data of an object to be detected and converting the 3D face point cloud data into 2D image data; performing face detection on the 2D image data to obtain a face image of the object to be detected; determining a human eye region image of the object to be detected from the face image; recognizing the human eye region image with a deep learning convolutional neural network to obtain an eye recognition result, the eye recognition result representing the eye fatigue degree of the object to be detected; and determining that the object to be detected is in a fatigue state in response to the eye recognition results obtained within a preset time period reaching a prediction condition. The method recognizes the fatigue state of the object to be detected more accurately, so the recognition rate and robustness of fatigue detection are higher.
Description
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a fatigue detection method, apparatus, device, and storage medium.
Background
Fatigue detection based on the driver's face image is the main trend in fatigue-driving detection. However, the extraction of facial fatigue features is affected by factors such as illumination, occlusion, pose and viewing angle, so the detection accuracy of such methods is limited.
Disclosure of Invention
In view of the above, the disclosed embodiments provide at least one fatigue detection method, apparatus, device and storage medium.
Specifically, the embodiment of the present disclosure is implemented by the following technical solutions:
in a first aspect, a method for detecting fatigue is provided, the method comprising:
acquiring 3D face point cloud data of an object to be detected, and converting the 3D face point cloud data into 2D image data;
carrying out face detection on the 2D image data to obtain a face image of the object to be detected;
determining a human eye region image of the object to be detected according to the human face image;
identifying the human eye region image based on a deep learning convolutional neural network to obtain an eye identification result, wherein the eye identification result is used for representing the eye fatigue degree of the object to be detected;
and determining that the object to be detected is in a fatigue state in response to the eye recognition result obtained within a preset time period reaching a prediction condition.
In a second aspect, there is provided a fatigue detection apparatus, the apparatus comprising:
the data acquisition module is used for acquiring 3D face point cloud data of an object to be detected and converting the point cloud data into 2D image data;
the face detection module is used for carrying out face detection on the 2D image data to obtain a face image of the object to be detected;
the human eye determining module is used for determining a human eye area image of the object to be detected according to the face image;
the eye recognition module is used for recognizing the human eye region image based on a deep learning convolutional neural network to obtain an eye recognition result, and the eye recognition result is used for expressing the eye fatigue degree of the object to be detected;
and the state determination module is used for responding to the eye recognition result obtained in the preset time period to reach the prediction condition and determining that the object to be detected is in a fatigue state.
In a third aspect, an electronic device is provided, the device comprising a memory for storing computer instructions executable on a processor, the processor being configured to implement the fatigue detection method according to any of the embodiments of the present disclosure when executing the computer instructions.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the fatigue detection method according to any of the embodiments of the present disclosure.
According to the fatigue detection method provided by the technical solutions of the embodiments of the present disclosure, 3D face point cloud data of the object to be detected is acquired and converted into 2D image data for subsequent fatigue detection. The 3D face point cloud data preserves detailed facial features for detection, which reduces the influence of illumination, occlusion, pose and angle. Face detection is performed on the 2D image data to obtain a face image of the object to be detected, and the human eye region image is determined from the face image, so that the exact range of the eyes is located. The human eye region image is recognized by a deep learning convolutional neural network to obtain an eye recognition result, and the object to be detected is determined to be in a fatigue state in response to the eye recognition results obtained within a preset time period reaching a prediction condition. Because deep learning offers better generalization and recognition accuracy than traditional methods when extracting eye features, the fatigue state of the object to be detected can be recognized more accurately, giving fatigue detection a higher recognition rate and stronger robustness.
Drawings
To describe the technical solutions in one or more embodiments of the present disclosure or in the related art more clearly, the drawings used in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description show only some of the embodiments described in one or more embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart illustrating a fatigue detection method according to at least one embodiment of the present disclosure;
FIG. 2 is a diagram of a depth separable convolution structure according to at least one embodiment of the present disclosure;
FIG. 3 is a block diagram of a face detector according to at least one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of face key points according to at least one embodiment of the present disclosure;
FIG. 5 is a schematic diagram of curve fitting according to at least one embodiment of the present disclosure;
FIG. 6 is a schematic diagram of face alignment according to at least one embodiment of the present disclosure;
FIG. 7 is a schematic diagram of human eye alignment according to at least one embodiment of the present disclosure;
FIG. 8 is a block diagram of a deep learning convolutional neural network according to at least one embodiment of the present disclosure;
FIG. 9 is a network architecture diagram of a feature extraction module according to at least one embodiment of the present disclosure;
FIG. 10 is a graph of the degree of eye opening according to at least one embodiment of the present disclosure;
FIG. 11 is a flowchart illustrating yet another fatigue detection method according to at least one embodiment of the present disclosure;
FIG. 12 is a block diagram of a fatigue detection apparatus according to at least one embodiment of the present disclosure;
Fig. 13 is a hardware structure diagram of an electronic device according to at least one embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present specification; rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present specification. The word "if," as used herein, may be interpreted as "when" or "upon" or "in response to determining," depending on the context.
In recent years, with the development of intelligent vehicles, driver-assistance safety systems have become an important research topic, and fatigue driving is one of the major causes of traffic accidents. An effective driver fatigue detection system can remind the driver when fatigue occurs and thus effectively help avoid traffic accidents.
Currently, driver fatigue detection methods fall mainly into three categories. The first is fatigue detection based on the driving characteristics of the vehicle; the second is fatigue detection based on the driver's physiological characteristics; the third is fatigue detection based on changes in the driver's face. The first category is limited by conditions such as driver habits, vehicle model, weather and road conditions, so the detection result has a larger error and the accuracy needs to be improved. In the second category, the acquisition device collects signals through direct contact with the driver's body when extracting physiological characteristics, which may interfere with normal driving and driving safety. The third category, face-based fatigue detection, is the main trend in fatigue-driving detection, but the extraction of facial fatigue features is affected by factors such as illumination, occlusion, pose and angle, so the accuracy of this category of fatigue detection is relatively low.
Based on this, and building on the third category of face-based fatigue detection, the embodiments of the present disclosure provide a deep-learning-based fatigue detection algorithm for 3D (three-dimensional) faces.
As shown in fig. 1, fig. 1 is a flowchart illustrating a fatigue detection method, which may be used in a safety-assisted driving system of a vehicle, according to at least one embodiment of the present disclosure, and includes the following steps:
in step 102, 3D face point cloud data of an object to be detected is obtained and converted into 2D image data.
In this embodiment, the 3D face point cloud data is information about a set of points in space acquired by a 3D scanning device such as a lidar or a 3D camera module sensor; it may include XYZ position information, RGB color information, intensity information and the like, and is a set of vectors in a three-dimensional coordinate system. 3D face point cloud data provides rich geometric, shape and scale information and is not easily affected by changes in illumination intensity, occlusion by other objects, and so on. The 2D (two-dimensional) image data is image information on a two-dimensional plane.
When the object to be detected is a driver, 3D face point cloud data acquired by collecting the object to be detected in the vehicle cabin can be acquired.
In this step, the 3D point cloud data collected by the 3D scanning device may be preprocessed by cropping, filtering and the like. For example, a sphere of 8 cm radius centered on the nose tip of the face may be used as a boundary, and only the points inside the sphere are retained, i.e., the face point cloud is segmented and the background interference is removed. The segmented face point cloud can then be filtered to remove invalid noise, yielding the preprocessed 3D face point cloud data.
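A minimal sketch of this cropping and filtering step, assuming the point cloud is an N×3 NumPy array in metres and the nose-tip coordinate has already been located (the function and parameter names are illustrative, not from the patent):

```python
import numpy as np

def preprocess_face_cloud(points, nose_tip, radius=0.08, k_sigma=2.0):
    """Crop the cloud to a sphere around the nose tip, then drop statistical outliers.

    points   : (N, 3) array of XYZ coordinates in metres
    nose_tip : (3,) coordinate of the nose tip (assumed already detected)
    radius   : 8 cm sphere, as described above
    """
    # Keep only points inside the 8 cm sphere centred on the nose tip
    dist = np.linalg.norm(points - nose_tip, axis=1)
    face = points[dist <= radius]

    # Simple statistical filter: remove points whose distance to the cloud
    # centroid deviates by more than k_sigma standard deviations
    centroid = face.mean(axis=0)
    d = np.linalg.norm(face - centroid, axis=1)
    keep = np.abs(d - d.mean()) <= k_sigma * d.std()
    return face[keep]
```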
When converting the 3D face point cloud data into 2D image data, the information of the 3D face point cloud data may be projected onto a plane parallel to the face, or onto a curved surface such as a sphere or a cylinder, to obtain the 2D image data; the 2D image data contains the face image of the object to be detected.
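A minimal sketch of the planar projection, assuming the cloud is already roughly frontal and a per-point value (intensity or depth) is rasterised; the output resolution is an arbitrary assumption:

```python
import numpy as np

def project_to_image(points, values, size=128):
    """Orthographic projection of a face point cloud onto the XY plane.

    points : (N, 3) XYZ coordinates, assumed roughly parallel to the image plane
    values : (N,) per-point value to rasterise (e.g. intensity or depth)
    """
    xy = points[:, :2]
    # Normalise XY coordinates into pixel indices
    mins, maxs = xy.min(axis=0), xy.max(axis=0)
    pix = ((xy - mins) / (maxs - mins + 1e-9) * (size - 1)).astype(int)

    image = np.zeros((size, size), dtype=np.float32)
    image[pix[:, 1], pix[:, 0]] = values  # later points overwrite earlier ones
    return image
```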
In one embodiment, in order to avoid losing facial detail in the converted 2D image data and to make the face in the 2D image data more frontal, the following processing may be performed:
adjusting the offset angle of the 3D face point cloud data so that the adjusted 3D face point cloud data matches a standard 3D face model. The standard 3D face model is preset standard 3D face point cloud data; for example, it may be obtained by capturing the frontal face of a person with a straight neck looking straight ahead, and it serves as a template for aligning the acquired 3D face point cloud data. In actual acquisition, because of the pose of the target object, occlusion and other factors, the angle of the face in the acquired 3D face point cloud data is not fixed, and key facial information is likely to be lost if the data is projected directly into 2D image data. Therefore, in this example, the standard 3D face model is used as the template, and the offset angle of the 3D face point cloud data is adjusted; for example, the 3D face point cloud data can be rotated and translated as a whole about the X, Y and Z axes so that the adjusted 3D face point cloud data matches the standard 3D face model.
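One common way to realise this adjustment is a rigid (Kabsch-style) alignment between a few corresponding 3D landmarks on the captured cloud and on the standard model. The specific alignment method is not prescribed by the embodiment, so the following is only an illustrative sketch:

```python
import numpy as np

def align_to_template(src, dst):
    """Rigid (rotation + translation) alignment of src points to dst template points.

    src, dst : (N, 3) arrays of corresponding points (e.g. a few 3D landmarks
               detected on both the captured cloud and the standard model).
    Returns (R, t) such that src @ R.T + t approximates dst.
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

The whole cloud can then be transformed with `aligned = cloud @ R.T + t` before projection.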
Projecting the adjusted 3D face point cloud data into a plane corresponding to the standard 3D face model to obtain 2D image data, where the plane corresponding to the standard 3D face model is a plane parallel to the standard face. When the standard 3D face model is set, the plane parallel to the standard face is set at the same time; this plane may be determined from facial key points of the standard face, for example the key points at the centers of the two eyes and the key point between the eyebrows. The face in the adjusted 3D face point cloud data is also parallel to this standard plane, so projecting the adjusted 3D face point cloud data into the plane corresponding to the standard 3D face model yields 2D image data containing a frontal face.

In step 104, face detection is performed on the 2D image data to obtain a face image of the object to be detected.
In this embodiment, the manner of performing face detection on the 2D image data is not limited. For example, face detection may be performed with face detection models such as RetinaNet, YOLOv3 and PCN, or with face detection algorithms such as the face detectors in OpenCV or dlib.
In one example, a face detector may be used to detect the 2D image data to obtain a face detection frame containing the face image; the convolutional neural network in the face detector uses a depth separable convolution structure.
In this example, a face detector based on MTCNN (Multi-task Cascaded Convolutional Networks) may be designed. It is composed of three cascaded network models, namely PNet (Proposal Network), RNet (Refine Network) and ONet (Output Network), which balance performance and accuracy and realize coarse-to-fine face detection. The convolutional networks of all stages in the face detector adopt the depth separable convolution technique in place of conventional convolution, which keeps both performance and accuracy while reducing the computational cost of the network. As shown in fig. 2, (a) in fig. 2 is a standard convolution structure, and (b) in fig. 2 is a depth separable convolution structure.
The computational cost of the depth separable convolution structure can be compared with that of the ordinary standard convolution structure, expressed here as the ratio of the number of parameters used in the convolutions, as shown in equation (1):

(D_F*D_F*M*D_K*D_K + M*N*D_K*D_K) / (D_F*D_F*M*N*D_K*D_K) = 1/N + 1/(D_F*D_F)    (1)

where D_F*D_F*M is the size of the convolution kernel, D_K*D_K*M is the size of the feature map, and N is the number of convolution kernels. In general N is large, so with a 3x3 convolution the depth separable convolution reduces the computation by roughly a factor of 9 compared to the standard convolution.
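As an illustration only (the layer sizes below are arbitrary and not taken from the patent), a depth separable convolution can be written as a depthwise convolution followed by a 1x1 pointwise convolution; the PyTorch sketch also prints the parameter counts so the ratio in equation (1) can be checked numerically:

```python
import torch.nn as nn

def standard_conv(in_ch, out_ch, k=3):
    return nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)

def depthwise_separable_conv(in_ch, out_ch, k=3):
    return nn.Sequential(
        # depthwise: one k x k filter per input channel
        nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch, bias=False),
        # pointwise: 1 x 1 convolution mixes the channels
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
    )

def n_params(m):
    return sum(p.numel() for p in m.parameters())

std, sep = standard_conv(32, 64), depthwise_separable_conv(32, 64)
print(n_params(std), n_params(sep), n_params(sep) / n_params(std))
```

With 32 input channels, 64 output channels and a 3x3 kernel, the printed ratio is about 0.127, i.e. 1/64 + 1/9, consistent with equation (1).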
When the improved MTCNN with the depth separable convolution structure is used for face detection, the 2D image data is first scaled to different sizes to construct an image pyramid, so that faces of different sizes can be detected. The PNet model generates a large number of candidate target region boxes; the RNet model performs candidate selection and bounding box regression on these boxes to exclude most negative examples; and the ONet network performs the final judgment and bounding box regression on the remaining boxes, detecting and locating 5 key points of the face region to obtain a face image with 5 key points, where the 5 key points are the left and right mouth corners, the centers of the two eyes and the nose tip. The MTCNN model structure improved with the depth separable convolution structure is shown in fig. 3.
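The image pyramid mentioned above can be built by repeatedly rescaling the 2D image until the face would fall below the minimum size the first-stage network accepts. The sketch below is a generic illustration; the minimum size and scale factor are assumptions, not values taken from the patent:

```python
import cv2

def build_pyramid(image, min_size=12, factor=0.709):
    """Yield progressively smaller copies of `image` for multi-scale face detection.

    min_size : smallest face size (pixels) the first-stage network can handle
    factor   : per-level scale factor (0.709 is a commonly used value)
    """
    h, w = image.shape[:2]
    scales, scale = [], 1.0
    while min(h, w) * scale >= min_size:
        scales.append(scale)
        scale *= factor
    return [cv2.resize(image, None, fx=s, fy=s, interpolation=cv2.INTER_AREA)
            for s in scales]
```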
In step 106, the human eye region image of the object to be detected is determined according to the human face image.
The human eye region image is an image of the eye region of the object to be detected. For example, target detection with human eyes as the target can be performed on the face image to obtain the human eye region image; for another example, the human eye region may be segmented from the face image by image segmentation. This embodiment does not limit the specific manner of determining the human eye region image of the object to be detected from the face image.
In one example, in order to accurately locate a range of human eyes, when determining an image of a human eye region of an object to be detected according to a human face image, key point detection may be performed on the human face image to obtain a plurality of human eye key points; and performing curve fitting according to the key points of the human eyes to obtain an image of the human eye region. The human eye key points are key points on the eye contour, including left eye key points and right eye key points.
For example, in order to refine local key regions of the face, a face key point locator may be used to perform key point detection on the face image. In this example, 68-point localization of the face is used; 68-point localization describes the face contour and local features such as the eyes and mouth accurately while keeping the running time low, improving the real-time performance of face recognition. In other examples, other numbers of human eye key points may also be used. In the 68-point localization, as shown in FIG. 4, the left eye has 6 key points (37-42), the right eye has 6 key points (43-48), the mouth has 20 key points (49-68), and the rest are face contour key points. After the 68 key points are detected, the human eye key points, i.e., (37-42) and (43-48), are selected. As another example, an eye key point locator may be used directly on the face image to obtain the human eye key points.
Curve fitting is then performed on the detected human eye key points to accurately locate the eye region and obtain the human eye region image, as shown in fig. 5. Polynomial interpolation fitting, least-squares curve fitting or the like may be used for the curve fitting.
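A sketch of the landmark detection and eyelid curve fitting described above, using dlib's 68-point shape predictor; the model file name is an assumption, and note that dlib indexes the same landmarks from 0, so the left-eye points are 36-41:

```python
import dlib
import numpy as np

predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model file

def left_eye_region(gray_face):
    """Fit eyelid curves and crop the left-eye region from an already-cropped face image."""
    h, w = gray_face.shape[:2]
    rect = dlib.rectangle(0, 0, w, h)                 # the whole image is the face
    shape = predictor(gray_face, rect)
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(36, 42)])

    upper = pts[[0, 1, 2, 3]]                         # eye corners and upper-lid points
    lower = pts[[0, 5, 4, 3]]                         # eye corners and lower-lid points
    up_fit = np.polyfit(upper[:, 0], upper[:, 1], 2)  # quadratic least-squares fit
    low_fit = np.polyfit(lower[:, 0], lower[:, 1], 2)

    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    return up_fit, low_fit, gray_face[y0:y1 + 1, x0:x1 + 1]
```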
In one example, in order to reduce the influence of the facial pose and shooting angle of the object to be detected and make the extracted feature information of the face region more stable, before determining the human eye region image of the object to be detected from the face image, the method further includes: aligning the face in the face image to obtain an aligned face image. Face alignment is in fact the alignment of the facial key points; it allows subsequent models to extract features that depend only on the shape and texture of the facial features, not on their positions.
As shown in fig. 6, the first image is a tilted face and the second image is the aligned face; the position of the aligned face in the face image corresponds to a normal, upright pose. For example, alignment may be performed according to the angle between the line connecting the two eye key points and an axis of the face image, and this angle is corrected through an affine transformation to achieve face alignment. In other examples, face alignment may also be performed using a pre-trained neural network model.
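A minimal OpenCV sketch of this eye-line-based alignment (the function and parameter names are illustrative):

```python
import cv2
import numpy as np

def align_face(image, left_eye, right_eye):
    """Rotate `image` so the line through the two eye centres becomes horizontal.

    left_eye, right_eye : (x, y) centres of the two eyes in image coordinates.
    """
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))          # tilt of the eye line
    centre = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)   # rotate about the eye mid-point
    M = cv2.getRotationMatrix2D(centre, angle, 1.0)
    h, w = image.shape[:2]
    return cv2.warpAffine(image, M, (w, h))
```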
In one example, the actual situation of the object to be detected is usually complicated: because of the face pose or the camera angle, the eyes in the finally recognized eye region image may not be horizontal but tilted at a certain angle. The eye positions then need to be rotated back to horizontal to reduce the influence of such factors, so eye alignment is required. After determining the human eye region image of the object to be detected from the face image, the method further includes: aligning the human eyes in the human eye region image to obtain an aligned human eye region image. As shown in fig. 7, the left side shows tilted eyes and the right side shows the aligned eyes. During alignment, the left and right eyes may be aligned separately or simultaneously, and the alignment may be performed by affine transformation, a pre-trained neural network model or other means.
After the human eye region image is obtained, the human eye region image can be filtered, interference factors are removed, and fitting of the eye shape boundary in the human eye region image is refined.
In step 108, the eye region image is identified based on a deep learning convolutional neural network to obtain an eye identification result, wherein the eye identification result is used for representing the eye fatigue degree of the object to be detected.
With the continuous progress of machine learning technology, deep learning algorithms represented by convolutional neural networks are widely applied in many fields. Deep learning requires no manual intervention during feature extraction and classification, avoiding the complexity and low efficiency of hand-crafted feature extraction. Features are extracted layer by layer and fused well across layers, interference from noise and background is avoided, and deeper information can be mined from the training target, so the robustness is better than that of traditional pattern recognition methods. Therefore, in this step, a deep learning convolutional neural network is used to extract, classify and recognize the eye fatigue features of the object to be detected, improving the accuracy and robustness of the fatigue detection algorithm.
The deep learning convolutional neural network is used for identifying the eye fatigue degree of an object to be detected in the human eye region image, inputting the human eye region image into the deep learning convolutional neural network, and outputting an eye identification result corresponding to the frame of human eye region image. The eye recognition result may be classified into eye fatigue and eye non-fatigue, or eye non-fatigue, eye mild fatigue, eye moderate fatigue and eye severe fatigue.
In one example, a deep learning convolutional neural network, comprising: at least one feature extraction module, a global feature identification module; the convolution kernels in the at least one feature extraction module are different. The at least one feature extraction module is used for extracting features of the input human eye region image; the global feature recognition module is used for obtaining an eye recognition result based on the extracted feature recognition.
An exemplary framework of the deep learning convolutional neural network constructed in this example is shown in fig. 8. The input of the network is the filtered and aligned human eye region image, and the output is the eye recognition result corresponding to that frame of human eye region image. The framework adopts five feature extraction modules; the number of feature extraction modules is not limited, each is a deep learning computation unit, and they may share the same network structure or use different ones. When the same network structure is used, the network parameters in each module are different. The first feature extraction module extracts features from the input human eye region image and outputs them; the second feature extraction module continues to extract features from the output of the first, and each module after the first continues extraction based on the features from the previous module. The features extracted by the last feature extraction module are fed into the global feature recognition module, which recognizes them to obtain the eye recognition result.
For example, each module may use a Shortcut-CNN so as to better capture the local shape information of the eye. The Shortcut-CNN uses different convolution kernels in the local feature extractors of different modules, which provides different receptive fields and learns different features better, and it adds a skip connection so that the extracted features can be passed on to the next layer of the network. The structure of the Shortcut-CNN is shown in FIG. 9: the Shortcut-CNN of each module extracts features with different convolution kernels and, at the same time, with 1x1 convolution kernels, and then merges the features from the different receptive fields. As the network becomes deeper, the extracted feature information becomes more abstract and the learned features become better. The deep learning convolutional neural network in this embodiment uses five feature extraction modules to fuse feature information at different scales, making the learning capability of the network architecture stronger.
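In the spirit of Fig. 9, one possible form of such a feature extraction module is sketched below in PyTorch: parallel branches with different kernel sizes plus a 1x1 branch are concatenated and added to a projected skip connection. All channel counts and kernel sizes are illustrative assumptions, not values taken from the patent:

```python
import torch
import torch.nn as nn

class ShortcutBlock(nn.Module):
    """Multi-kernel local feature extractor with a skip connection."""

    def __init__(self, in_ch, branch_ch, kernels=(3, 5)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_ch, branch_ch, k, padding=k // 2),
                          nn.BatchNorm2d(branch_ch), nn.ReLU(inplace=True))
            for k in kernels
        ])
        # 1x1 branch, as in the description
        self.point = nn.Sequential(nn.Conv2d(in_ch, branch_ch, 1),
                                   nn.BatchNorm2d(branch_ch), nn.ReLU(inplace=True))
        out_ch = branch_ch * (len(kernels) + 1)
        # skip connection projected to the concatenated width
        self.skip = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        feats = [b(x) for b in self.branches] + [self.point(x)]
        return torch.relu(torch.cat(feats, dim=1) + self.skip(x))
```

Five such blocks stacked one after another, followed by a global recognition head, would correspond to the layout of Fig. 8.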
In step 110, it is determined that the object to be detected is in a fatigue state in response to the eye recognition result obtained within a preset time period reaching a prediction condition.
The prediction condition may be determined according to the content of the eye recognition result. For example, when the eye recognition result distinguishes eye fatigue from eye non-fatigue, the prediction condition may be that the number of frames of eye region images classified as eye fatigue within a preset time period reaches an early-warning frame number or early-warning proportion; when this condition is met, the object to be detected is determined to be in a fatigue state. For another example, when the eye recognition result distinguishes eye non-fatigue, mild eye fatigue, moderate eye fatigue and severe eye fatigue, the prediction condition may be that the number of frames of eye region images classified as mild, moderate or severe eye fatigue within a preset time period reaches an early-warning frame number or early-warning proportion; when this condition is met, the object to be detected is determined to be in a mild/moderate/severe fatigue state.
In an example, the eye recognition result indicates whether the object to be detected is in an eye-closed state or an eye-open state. The object to be detected may be determined to be in a fatigue state when, among the eye recognition results corresponding to the multiple frames of eye region images obtained within the preset time period, the number of frames in the eye-closed state reaches the early-warning frame number.
According to the fatigue detection method in the embodiments of the present disclosure, 3D face point cloud data of the object to be detected is acquired and converted into 2D image data for subsequent fatigue detection. The 3D face point cloud data preserves detailed facial features for detection, which reduces the influence of illumination, occlusion, pose and angle. Face detection is performed on the 2D image data to obtain a face image of the object to be detected, and the human eye region image is determined from the face image, so that the exact range of the eyes is located. The human eye region image is recognized by a deep learning convolutional neural network to obtain an eye recognition result, and the object to be detected is determined to be in a fatigue state in response to the eye recognition results obtained within a preset time period reaching a prediction condition. Because deep learning offers better generalization and recognition accuracy than traditional methods when extracting eye features, the fatigue state of the object to be detected can be recognized more accurately, giving fatigue detection a higher recognition rate and stronger robustness.
In one embodiment, the deep learning convolutional neural network may determine the state of the driver's eyes using the PERCLOS principle, which generally has three metrics: P70, P80 and EM. The P80 standard is selected as the fatigue criterion in this example: when the area of the pupil covered by the eyelid exceeds 80%, the eye is marked as closed, i.e., in a fatigue state. The eye opening/closing curve based on the PERCLOS principle is shown in FIG. 10, where the vertical axis is the degree of eye opening E_open and the horizontal axis is time t. t_1 is the moment the eye is 20% closed; the interval [t_1, t_2] is the time taken for the eye to go from 20% closed to 80% closed; the interval [t_2, t_3] is the time during which the eye remains fully closed; and the interval [t_3, t_4] is the time taken for the eye to open again from 80% closed back to 20% closed. The time corresponds to the image sequence in the video, so it can be expressed as a number of frames; the percentage of time within the preset period during which the eye-closure area exceeds 80% exceeding the threshold is equivalent to the number of eye region image frames in the eye-closed state reaching the early-warning frame number.
Let fp be the percentage of eye-closure time within this period, as shown in equation (2):

fp = (t_3 - t_2) / (t_4 - t_1) * 100%    (2)
the eye can be seen as two nested ellipses. Assuming that the width of the open eye is w and the vertical distance of the upper and lower eyelids is h, the area of the eye is approximated to S as shown in equation (3):
S=π*w*h (3)
Taking the left eye as an example, w and h of the human eye are calculated from the eye key points shown in fig. 4 according to expressions (4) and (5):

w = x_40 - x_37    (4)

h = y_39 - y_41    (5)

where x_37 and x_40 are the abscissas of the key points at the two ends of the left eye, and y_39 and y_41 are the ordinates of the key points on the upper and lower eyelids of the left eye. The right eye is treated in the same way, and the percentage by which the eye is open is calculated as shown in equation (6):

E_open = (w * h) / (w_max * h_max)    (6)

where w_max and h_max are respectively the width and height of the eye when it is fully open.
The calculated E_open values are used to determine t_1, t_2, t_3 and t_4 according to the PERCLOS principle; substituting them into equation (2) gives fp, which can be fed into the deep convolutional neural network for training to judge whether the object to be detected is in a fatigue state. Finally, the number of frames in the fatigue state within the preset time period is counted, and a driver fatigue warning is output when the early-warning frame number is reached. The complete flow of this embodiment is shown in FIG. 11.
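As a rough illustration of how the PERCLOS statistic can be computed per time window from per-frame eye-openness values: the 0.2 openness cut-off below corresponds to the P80 criterion above, while the warning threshold is an assumption; in the embodiment fp is further fed to the network rather than thresholded directly.

```python
import numpy as np

def perclos(e_open, closed_thresh=0.2):
    """PERCLOS (p80) over one window of per-frame eye-openness values.

    e_open        : sequence of E_open values in [0, 1], one per frame
    closed_thresh : E_open below 0.2 means the pupil is more than 80% covered
    Returns the fraction of frames in the window with the eye counted as closed.
    """
    e_open = np.asarray(e_open, dtype=float)
    return float(np.mean(e_open < closed_thresh))

def is_fatigued(e_open, fp_thresh=0.4):
    # Flag the window as fatigue if the closed-eye fraction exceeds a threshold
    return perclos(e_open) > fp_thresh
```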
As shown in fig. 12, fig. 12 is a block diagram of a fatigue detection apparatus according to at least one embodiment of the present disclosure, the apparatus including:
the data acquisition module 11 is used for acquiring 3D face point cloud data of an object to be detected and converting the 3D face point cloud data into 2D image data;
a face detection module 12, configured to perform face detection on the 2D image data to obtain a face image of the object to be detected;
the human eye determining module 13 is configured to determine a human eye region image of the object to be detected according to the face image;
the eye recognition module 14 is configured to recognize the human eye region image based on a deep learning convolutional neural network to obtain an eye recognition result, where the eye recognition result is used to indicate the eye fatigue degree of the object to be detected;
the state determining module 15 is configured to determine that the object to be detected is in a fatigue state in response to that the eye recognition result obtained within a preset time period reaches a prediction condition.
In an embodiment, the data obtaining module 11 is configured to, when obtaining 3D face point cloud data of an object to be detected, specifically: and acquiring 3D face point cloud data acquired by collecting the object to be detected in the vehicle cabin.
In one embodiment, the face detection module 12 is specifically configured to: detecting the 2D image data by using a face detector to obtain a face detection frame containing a face image; the convolutional neural network in the face detector is a depth separable convolutional structure.
In an embodiment, the human eye determining module 13 is specifically configured to: performing key point detection on the face image to obtain a plurality of human eye key points; and performing curve fitting according to the key points of the human eyes to obtain an image of the human eye region.
In one embodiment, the eye determination module 13 is further configured to: before determining the human eye region image of the object to be detected according to the human face image, aligning the human face in the human face image to obtain an aligned human face image; after the human eye region image of the object to be detected is determined according to the human face image, aligning human eyes in the human eye region image to obtain an aligned human eye region image.
In one embodiment, the deep learning convolutional neural network comprises: at least one feature extraction module, a global feature identification module; the convolution kernels in the at least one feature extraction module are different; the at least one feature extraction module is used for extracting features of the input human eye region image; the global feature recognition module is used for obtaining an eye recognition result based on the extracted feature recognition.
In one embodiment, the eye recognition result is that the object to be detected is in an eye closing state or an eye opening state; a state determination module 15 configured to: and responding to an eye recognition result corresponding to a plurality of frames of eye region images obtained within a preset time period, wherein the number of the early warning frames is reached by the eye region images in the eye closing state, and determining that the object to be detected is in the fatigue state.
The implementation process of the functions and actions of each module in the above device is detailed in the implementation process of the corresponding steps in the above method, and is not described herein again.
The embodiment of the present disclosure further provides an electronic device, as shown in fig. 13, where the electronic device includes a memory 21 and a processor 22, where the memory 21 is configured to store computer instructions executable on the processor, and the processor 22 is configured to implement the fatigue detection method according to any embodiment of the present disclosure when executing the computer instructions.
Embodiments of the present disclosure also provide a computer program product, which includes a computer program/instruction, and when the computer program/instruction is executed by a processor, the computer program/instruction implements the fatigue detection method according to any embodiment of the present disclosure.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method for detecting fatigue is implemented according to any one of the embodiments of the present disclosure.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the modules described as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the present specification. One of ordinary skill in the art can understand and implement it without inventive effort.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Other embodiments of the present description will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following the general principles of the specification and including such departures from the present disclosure as come within known or customary practice in the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.
It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.
Claims (11)
1. A method of fatigue detection, the method comprising:
acquiring 3D face point cloud data of an object to be detected, and converting the 3D face point cloud data into 2D image data;
carrying out face detection on the 2D image data to obtain a face image of the object to be detected;
determining a human eye area image of the object to be detected according to the human face image;
identifying the human eye region image based on a deep learning convolutional neural network to obtain an eye identification result, wherein the eye identification result is used for representing the eye fatigue degree of the object to be detected;
and determining that the object to be detected is in a fatigue state in response to the eye recognition result obtained within a preset time period reaching a prediction condition.
2. The method of claim 1,
the conversion into 2D image data includes:
adjusting the offset angle of the 3D face point cloud data to enable the adjusted 3D face point cloud data to be matched with a standard 3D face model;
projecting the adjusted 3D face point cloud data into a plane corresponding to the standard 3D face model to obtain 2D image data; and the plane corresponding to the standard 3D face model is a plane parallel to the standard face.
3. The method of claim 1,
the method for acquiring the 3D face point cloud data of the object to be detected comprises the following steps:
and acquiring 3D face point cloud data acquired by collecting the object to be detected in the vehicle cabin.
4. The method of claim 1,
the performing face detection on the 2D image data to obtain a face image of the object to be detected includes:
detecting the 2D image data by using a face detector to obtain a face detection frame containing a face image; the convolutional neural network in the face detector is a depth separable convolutional structure.
5. The method of claim 1,
the determining the human eye region image of the object to be detected according to the face image comprises the following steps:
carrying out key point detection on the face image to obtain a plurality of human eye key points;
and performing curve fitting according to the key points of the human eyes to obtain an image of the human eye region.
6. The method of claim 1,
before determining the human eye region image of the object to be detected according to the human face image, the method further comprises:
aligning the face in the face image to obtain an aligned face image;
after determining the human eye area image of the object to be detected according to the face image, the method further comprises:
and aligning the human eyes in the human eye area image to obtain the aligned human eye area image.
7. The method of claim 1,
the deep learning convolutional neural network comprises: at least one feature extraction module, a global feature identification module; the convolution kernels in the at least one feature extraction module are different;
the at least one feature extraction module is used for extracting features of the input human eye region image;
the global feature recognition module is used for obtaining an eye recognition result based on the extracted feature recognition.
8. The method according to claim 1, wherein the eye recognition result is that the object to be detected is in an eye-closed state or an eye-open state;
the determining that the object to be detected is in a fatigue state when the eye recognition result obtained in response to the preset time period reaches the prediction condition includes:
and responding to an eye recognition result corresponding to a plurality of frames of eye region images obtained within a preset time period, wherein the number of the early warning frames is reached by the eye region images in the eye closing state, and determining that the object to be detected is in the fatigue state.
9. A fatigue detection apparatus, characterized in that the apparatus comprises:
the data acquisition module is used for acquiring 3D face point cloud data of an object to be detected and converting the 3D face point cloud data into 2D image data;
the face detection module is used for carrying out face detection on the 2D image data to obtain a face image of the object to be detected;
the human eye determining module is used for determining a human eye region image of the object to be detected according to the human face image;
the eye recognition module is used for recognizing the human eye region image based on a deep learning convolutional neural network to obtain an eye recognition result, and the eye recognition result is used for expressing the eye fatigue degree of the object to be detected;
and the state determination module is used for responding to the eye recognition result obtained in the preset time period to reach the prediction condition and determining that the object to be detected is in a fatigue state.
10. An electronic device, comprising a memory for storing computer instructions executable on a processor, the processor being configured to implement the method of any one of claims 1 to 8 when executing the computer instructions.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210822432.5A CN115171189A (en) | 2022-07-12 | 2022-07-12 | Fatigue detection method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210822432.5A CN115171189A (en) | 2022-07-12 | 2022-07-12 | Fatigue detection method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115171189A true CN115171189A (en) | 2022-10-11 |
Family
ID=83493924
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210822432.5A Pending CN115171189A (en) | 2022-07-12 | 2022-07-12 | Fatigue detection method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115171189A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117392646A (en) * | 2023-10-11 | 2024-01-12 | 深圳市哲思特科技有限公司 | In-cabin detection method and system for new energy automobile |
CN117392646B (en) * | 2023-10-11 | 2024-07-23 | 深圳市哲思特科技有限公司 | In-cabin detection method and system for new energy automobile |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |