CN113537176A - Method, device and equipment for determining fatigue state of driver


Info

Publication number
CN113537176A
CN113537176A
Authority
CN
China
Prior art keywords
eye
open
images
model
eye recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111085378.2A
Other languages
Chinese (zh)
Inventor
姜英豪
朱星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Future Phantom Technology Co Ltd
Original Assignee
Wuhan Future Phantom Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Future Phantom Technology Co Ltd filed Critical Wuhan Future Phantom Technology Co Ltd
Priority to CN202111085378.2A
Publication of CN113537176A
Legal status: Pending

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/24 Classification techniques
                • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
                • G06N3/048 Activation functions
              • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present application provides a method, apparatus and device for determining the fatigue state of a driver, aiming to offer a more convenient detection scheme for driver fatigue detection on a vehicle, with markedly improved practicality in real applications. The method comprises the following steps: acquiring a plurality of initial images of the driving position on a vehicle; performing face recognition processing on the initial images to obtain a plurality of face images; extracting a plurality of human eye images according to the facial features corresponding to the face images; sequentially inputting the human eye images into an open-closed eye recognition model, which performs open-closed eye recognition processing on them and outputs a plurality of open-closed eye recognition results; judging, in combination with the image acquisition time points corresponding to the open-closed eye recognition results, whether there is a run of consecutive closed-eye recognition results whose time-point span exceeds a preset duration; and if so, determining that the driver corresponding to the initial images is in a fatigue state.

Description

Method, device and equipment for determining fatigue state of driver
Technical Field
The present application relates to the field of vehicles, and in particular to a method, apparatus and device for determining the fatigue state of a driver.
Background
With the continuous improvement of living standards, vehicle ownership keeps increasing. To a certain extent, the more severe road congestion becomes, the higher the probability of vehicle collisions; beyond driving skill itself, congested road conditions place higher demands on a driver's attention than roads in a normal state.
In addition, studies show that at least 8,000 traffic accidents are caused by driver inattention or fatigue, and fatigue driving is one of the main factors causing traffic accidents, so research on driver fatigue detection is of great significance.
In researching the related prior art, the inventors found that existing schemes for detecting a driver's fatigue state fall mainly into two types. The first type uses electroencephalography (EEG) to judge whether the driver is in a fatigue state; it is contact-based and may interfere with the driver's operation of the vehicle. The second type measures the driver's fatigue state with PERCLOS, a physical quantity for measuring fatigue/drowsiness that can be understood as the percentage of time the eyes are in the closed state within a unit of total time; however, the total time generally needs to be 1 minute or 30 seconds, so this scheme is time-consuming. Existing detection schemes are therefore inconvenient to apply.
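To make the contrast with the prior-art PERCLOS scheme concrete, the metric described above can be sketched as follows. This is an illustrative sketch only; the window length, frame rate and the 70% threshold are values mentioned in this background discussion, not parameters of the present application's method.

```python
def perclos(closed_flags, frame_interval_s, window_s=60.0):
    """Illustrative PERCLOS: fraction of a fixed observation window during
    which the eyes are judged closed. `closed_flags` holds one boolean per
    captured frame; `frame_interval_s` is the time between frames."""
    n_frames = int(window_s / frame_interval_s)
    window = closed_flags[:n_frames]
    if not window:
        return 0.0
    return sum(window) / len(window)

# A 60 s window sampled at 1 frame/s with 45 closed frames gives PERCLOS 0.75,
# which exceeds an example 70% fatigue threshold.
flags = [True] * 45 + [False] * 15
score = perclos(flags, frame_interval_s=1.0)
is_fatigued = score >= 0.7
```

The sketch makes the drawback visible: a verdict is only available once the full 30 to 60 second window has elapsed, which is the time cost the present application seeks to avoid.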
Disclosure of Invention
The present application provides a method, apparatus and device for determining the fatigue state of a driver, which offer a more convenient detection scheme for driver fatigue detection on a vehicle, with markedly improved practicality in real applications.
In a first aspect, the present application provides a method for determining a fatigue state of a driver, the method comprising:
acquiring a plurality of initial images of a driving position on a vehicle, wherein the plurality of initial images are obtained by shooting the driving position through a camera arranged on the vehicle;
carrying out face recognition processing on the plurality of initial images to obtain a plurality of face images contained in the plurality of initial images;
extracting a plurality of human eye images contained in the plurality of human face images according to human face features corresponding to the plurality of human face images, wherein the human eye images comprise a pupil area and an area in a preset range around the pupil area;
sequentially inputting the plurality of human eye images into an open-closed eye recognition model, which performs open-closed eye recognition processing on them to obtain a plurality of open-closed eye recognition results output by the model, wherein the open-closed eye recognition model is obtained by training an initial model with sample human eye images labeled with corresponding open-closed eye recognition results, and is used to recognize whether the human eyes in an input image are in an open-eye state or a closed-eye state;
judging, in combination with the image acquisition time points corresponding to the open-closed eye recognition results, whether there is a run of consecutive closed-eye recognition results whose time-point span is greater than a preset duration;
and if so, determining that the driver corresponding to the plurality of initial images is in a fatigue state.
With reference to the first aspect of the present application, in a first possible implementation manner of the first aspect of the present application, the open-close eye recognition model is specifically a neural network model that adopts a model architecture of the CNN LeNet-5 model.
With reference to the first possible implementation manner of the first aspect of the present application, in a second possible implementation manner of the first aspect of the present application, the size of the input layer of the open-close eye recognition model is 48 × 48 pixels, and the number of channels of each convolution layer in the open-close eye recognition model is twice the number of channels of the corresponding convolution layer in the CNN LeNet-5 model.
With reference to the first possible implementation manner of the first aspect of the present application, in a third possible implementation manner of the first aspect of the present application, an attention mechanism model architecture is configured between at least one front-layer model structure and at least one rear-layer model structure, the attention mechanism model architecture sequentially comprising a global pooling layer, a first fully connected layer, a second fully connected layer, and a sigmoid output layer.
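The attention branch described in this implementation manner (global pooling, two fully connected layers, a sigmoid output) resembles a squeeze-and-excitation style channel-attention block. A minimal NumPy sketch follows; the weight shapes, the reduction ratio, and the ReLU after the first fully connected layer are assumptions for illustration, not details given by the application.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feature_map, w1, b1, w2, b2):
    """Sketch of the claimed attention branch: global average pooling,
    two fully connected layers, then a sigmoid that rescales each channel.
    feature_map: (C, H, W); w1: (C//r, C); w2: (C, C//r), r assumed."""
    squeezed = feature_map.mean(axis=(1, 2))       # global pooling -> (C,)
    hidden = np.maximum(w1 @ squeezed + b1, 0.0)   # first FC + ReLU (assumed)
    weights = sigmoid(w2 @ hidden + b2)            # second FC + sigmoid -> (C,)
    return feature_map * weights[:, None, None]    # per-channel rescaling

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 4))
C, r = 8, 2
out = channel_attention(fmap,
                        rng.standard_normal((C // r, C)), np.zeros(C // r),
                        rng.standard_normal((C, C // r)), np.zeros(C))
# `out` has the same shape as `fmap`; each channel is scaled by a value in (0, 1).
```

Because the sigmoid output lies in (0, 1), the block can only attenuate channels, which is how such a branch lets the network emphasize eye-relevant feature channels.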
With reference to the first possible implementation manner of the first aspect of the present application, in a fourth possible implementation manner of the first aspect of the present application, during training of the open-closed eye recognition model, a focal loss function is used to train the model on the basis of positive and negative sample human eye images.
With reference to the first possible implementation manner of the first aspect of the present application, in a fifth possible implementation manner of the first aspect of the present application, among the sample human eye images used in training the open-closed eye recognition model, images in the completely closed-eye state are labeled with the closed-eye recognition result, while images in the squinting state, in the completely open-eye state, or containing non-eye regions are labeled with the open-eye recognition result; the sample human eye images are obtained by applying data enhancement processing to initial sample human eye images, the data enhancement processing comprising at least one of random color enhancement, random rotation, and random saturation processing.
With reference to the first aspect of the present application, in a sixth possible implementation manner of the first aspect of the present application, the preset time period is specifically 1 to 2 s.
In a second aspect, the present application provides a driver fatigue state determination apparatus, comprising:
an acquisition unit configured to acquire a plurality of initial images of a driving position on a vehicle, the plurality of initial images being obtained by photographing the driving position by a camera disposed on the vehicle;
the face recognition unit is used for carrying out face recognition processing on the plurality of initial images to obtain a plurality of face images contained in the plurality of initial images;
the extraction unit is used for extracting a plurality of human eye images contained in the plurality of human face images according to human face features corresponding to the plurality of human face images, and the human eye images comprise a pupil area and an area in a preset range around the pupil area;
an open-closed eye recognition unit configured to sequentially input the plurality of human eye images into an open-closed eye recognition model, which performs open-closed eye recognition processing on them to obtain a plurality of open-closed eye recognition results output by the model, wherein the open-closed eye recognition model is obtained by training an initial model with sample human eye images labeled with corresponding open-closed eye recognition results, and is used to recognize whether the human eyes in an input image are in an open-eye state or a closed-eye state;
a judging unit configured to judge, in combination with the image acquisition time points corresponding to the open-closed eye recognition results, whether there is a run of consecutive closed-eye recognition results whose time-point span is greater than a preset duration, and if so, to trigger the determining unit;
and a determining unit configured to determine that the driver corresponding to the plurality of initial images is in a fatigue state.
With reference to the second aspect of the present application, in a first possible implementation manner of the second aspect of the present application, the open-close eye recognition model is specifically a neural network model that adopts a model architecture of the CNN LeNet-5 model.
With reference to the first possible implementation manner of the second aspect of the present application, in the second possible implementation manner of the second aspect of the present application, the size of the input layer of the open-close eye recognition model is 48 × 48 pixels, and the number of channels of each convolution layer in the open-close eye recognition model is twice the number of channels of the corresponding convolution layer in the CNN LeNet-5 model.
With reference to the first possible implementation manner of the second aspect of the present application, in a third possible implementation manner of the second aspect of the present application, an attention mechanism model architecture is configured between at least one front-layer model structure and at least one rear-layer model structure, the attention mechanism model architecture sequentially comprising a global pooling layer, a first fully connected layer, a second fully connected layer, and a sigmoid output layer.
With reference to the first possible implementation manner of the second aspect of the present application, in a fourth possible implementation manner of the second aspect of the present application, during training of the open-closed eye recognition model, a focal loss function is used to train the model on the basis of positive and negative sample human eye images.
With reference to the first possible implementation manner of the second aspect of the present application, in a fifth possible implementation manner of the second aspect of the present application, among the sample human eye images used in training the open-closed eye recognition model, images in the completely closed-eye state are labeled with the closed-eye recognition result, while images in the squinting state, in the completely open-eye state, or containing non-eye regions are labeled with the open-eye recognition result; the sample human eye images are obtained by applying data enhancement processing to initial sample human eye images, the data enhancement processing comprising at least one of random color enhancement, random rotation, and random saturation processing.
With reference to the second aspect of the present application, in a sixth possible implementation manner of the second aspect of the present application, the preset time period is specifically 1 to 2 s.
In a third aspect, the present application provides a device for determining a fatigue state of a driver, including a processor and a memory, where the memory stores a computer program, and the processor executes a method provided in the first aspect of the present application or any one of possible implementation manners of the first aspect of the present application when calling the computer program in the memory.
In a fourth aspect, the present application provides a computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the method provided in the first aspect of the present application or any one of the possible implementations of the first aspect of the present application.
From the above, the present application has the following advantageous effects:
the method and the device for detecting the fatigue state of the driver have the advantages that the detection requirement for the fatigue state of the driver is met, a new detection scheme is provided, in the detection process, hardware is not relied on, the driver is judged to be in the fatigue state by adopting a continuous eye-closing recognition result on the basis of introducing image recognition, the time for processing the initial image in actual operation can be greatly reduced, stable high detection precision can be kept, namely, a more convenient detection scheme is provided, and the practicability of remarkable improvement is realized for actual application.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in describing the embodiments are briefly introduced below. Obviously, the following drawings show only some embodiments of the present application, and those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a method for determining a fatigue state of a driver according to the present application;
FIG. 2 is a schematic view of a scene of a face recognition result according to the present application;
FIG. 3 is a schematic diagram of a CNN LeNet-5 model in the prior art;
FIG. 4 is a schematic diagram of a model architecture of an open-closed eye recognition model of the present application;
FIG. 5 is a schematic view of a model architecture of the present attention mechanism model architecture;
FIG. 6 is a schematic structural diagram of a driver fatigue state determining apparatus according to the present application;
fig. 7 is a schematic structural diagram of the driver fatigue state determining apparatus according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Moreover, the terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus. The naming or numbering of the steps appearing in the present application does not mean that the steps in the method flow have to be executed in the chronological/logical order indicated by the naming or numbering, and the named or numbered process steps may be executed in a modified order depending on the technical purpose to be achieved, as long as the same or similar technical effects are achieved.
The division of the modules presented in this application is a logical division, and in practical applications, there may be another division, for example, multiple modules may be combined or integrated into another system, or some features may be omitted, or not executed, and in addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some interfaces, and the indirect coupling or communication connection between the modules may be in an electrical or other similar form, which is not limited in this application. The modules or sub-modules described as separate components may or may not be physically separated, may or may not be physical modules, or may be distributed in a plurality of circuit modules, and some or all of the modules may be selected according to actual needs to achieve the purpose of the present disclosure.
Before describing the method for determining the fatigue state of a driver provided by the present application, the background related to the present application will be described first.
The method, apparatus and device for determining the fatigue state of a driver, and the computer-readable storage medium, provided by the present application offer a more convenient detection scheme for driver fatigue detection on a vehicle, with markedly improved practicality in real applications.
In the method for determining the fatigue state of a driver, the executing entity may be a driver-fatigue-state determination apparatus, or any of various devices integrating such an apparatus, such as a server, a physical host, user equipment (UE), a vehicle-mounted terminal, or the vehicle itself. The determination apparatus may be implemented in hardware or software; the UE may specifically be a terminal device such as a smartphone, tablet computer, notebook computer, desktop computer, or personal digital assistant (PDA); and the determination apparatus may also be deployed as a device cluster.
For example, in practice the determination apparatus may be a vehicle-mounted terminal, or even the vehicle itself, so that while the vehicle operates it can locally judge whether the driver it carries is in a fatigue state and, if so, directly trigger a pre-configured response strategy such as a reminder or deceleration.
Alternatively, the determination apparatus may be a server or UE that, over a communication connection established with the camera or the vehicle, acquires the corresponding images, determines whether the driver is in a fatigue state, and can directly send a reminder if so.
Alternatively, the determination apparatus may perform non-real-time detection processing, judging whether the driver was in a fatigue state after the images have been acquired; this suits application scenarios with low real-time requirements, such as scene playback.
Next, a method for determining the fatigue state of the driver provided by the present application will be described.
First, referring to fig. 1, fig. 1 shows a schematic flow chart of the method for determining a fatigue state of a driver according to the present application, and the method for determining a fatigue state of a driver according to the present application may specifically include the following steps:
step S101, acquiring a plurality of initial images of a driving position on a vehicle, wherein the plurality of initial images are obtained by shooting the driving position through a camera arranged on the vehicle;
it will be appreciated that the camera, with its viewing angle, is generally facing the driving position so that the driver can be photographed when seated in the driving position.
Of course, the field of view of the camera does not necessarily include only the contents of the driving position, but may include contents of spatial positions other than the driving position, such as a door, a seat beside the driving position, and a seat behind the driving position.
In step S101, the initial images may be acquired in real time and directly, i.e., the camera is part of the determination device and the images are read from it directly; or in real time but indirectly, i.e., the images are retrieved in real time from the camera or from a device other than the determination device; or in non-real time and indirectly, i.e., stored historical images are retrieved locally, from the camera, or from other devices.
Obviously, the specific acquisition process may be adjusted according to the specific device form or application scenario of the device for determining the fatigue state of the driver, and is not limited herein.
Step S102, carrying out face recognition processing on a plurality of initial images to obtain a plurality of face images contained in the plurality of initial images;
it can be understood that the face recognition algorithm involved in the face recognition processing may generally directly adopt the existing recognition algorithm, such as retinaFace, Mtcnn, and the like. Of course, it is also possible to use an improved algorithm or even a completely new algorithm on the basis of an existing recognition algorithm.
Face recognition generally identifies image regions bearing facial features, for example landmark face parts such as the eyes, nose and lips; auxiliary recognition may further use the left-eye pupil center, the right-eye pupil center, the nose tip, and the left and right corners of the mouth.
The face recognition algorithm can generally be carried on a neural network model. During training, pre-configured sample images labeled with the corresponding face recognition results are fed into the model in turn; the model performs face recognition and outputs its result, completing forward propagation; a loss function is then computed from that result, and the model parameters are optimized according to the loss value, realizing backward propagation. Once the training requirements, such as training duration and recognition accuracy, are met, training is complete and the model can be put into practical use.
It can be understood that after the face in the initial image is recognized, the corresponding face image can be output.
The face image can be understood as being cut out from the initial image according to the recognized face region. Of course, the output process may involve processing such as image scaling.
Step S103, extracting a plurality of human eye images contained in the plurality of human face images according to human face features corresponding to the plurality of human face images, wherein the human eye images comprise a pupil area and an area in a preset range around the pupil area;
In the present application, after the face image is obtained, attention can be focused on the eye region within the face according to the facial features of the face image, and the corresponding human eye image extracted.
The facial features corresponding to the face image may be those recognized during the face recognition of step S102; in that case it suffices to extract the features directly labeled as eye features from the recognized feature set, or to screen that set for features matching eye characteristics.
Alternatively, the facial features corresponding to the face image may be obtained by a fresh recognition pass after the face recognition of step S102; the recognition principle is similar to that of face recognition and may likewise be implemented with a neural network model, which is not repeated here.
Alternatively, in some implementations a simpler extraction process may be employed. In practical applications, the face image obtained in step S102 may be set to a fixed viewing angle and/or fixed size, in which case the eye image can be cut or matted directly from the face image according to a preset detection frame or extraction scheme.
For example, the present application may matte outward from the left-eye pupil center and the right-eye pupil center, where the matting size depends on the width W of the face detection frame produced by the face recognition algorithm in step S102, as follows:
Referring to fig. 2, a scene schematic of a face recognition result of the present application, the face detection frame has width W and height H. Suppose the detected left-eye pupil center coordinates are
[equation image not reproduced]
then the upper-left and lower-right corner coordinates of the small rectangular region around the eye are:
[equation image not reproduced]
The right eye is treated similarly.
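The exact corner formula appears only in equation images that are not reproduced in this text, so the following is a hypothetical stand-in: a square eye-region box centred on the pupil whose half-size is an assumed fraction of the face-box width W. The function name and the `scale` value are illustrative, not taken from the application.

```python
def eye_crop_box(pupil_x, pupil_y, face_w, scale=0.15):
    """Hypothetical eye-region box centred on a pupil. The application's
    exact formula is in an unreproduced equation image; here the half-size
    is simply an assumed fraction `scale` of the face-box width W."""
    half = face_w * scale
    x1, y1 = pupil_x - half, pupil_y - half   # upper-left corner
    x2, y2 = pupil_x + half, pupil_y + half   # lower-right corner
    return (x1, y1), (x2, y2)

# Face detection frame 200 px wide, left pupil detected at (80, 60):
top_left, bottom_right = eye_crop_box(80, 60, 200)
```

Tying the crop size to W keeps the eye region proportional to the apparent face size, so the extracted eye image covers the pupil and a surrounding band regardless of how far the driver sits from the camera.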
Step S104, sequentially inputting the plurality of human eye images into an open-closed eye recognition model, which performs open-closed eye recognition processing on them to obtain a plurality of open-closed eye recognition results output by the model, wherein the open-closed eye recognition model is obtained by training an initial model with sample human eye images labeled with corresponding open-closed eye recognition results, and is used to recognize whether the human eyes in an input image are in an open-eye state or a closed-eye state;
It can be understood that the driver's fatigue state is determined, on the basis of image recognition, by taking the recognized closed-eye recognition results as the data basis; these results are produced by a neural network model specially trained in the present application, namely the open-closed eye recognition model.
Similar to the model training described above, training the open-closed eye recognition model consists of feeding pre-configured sample images labeled with the corresponding open-closed eye recognition results into the model in turn; the model performs open-closed eye recognition and outputs a result (an open-eye or closed-eye recognition result), completing forward propagation; the loss function is then computed from that result, and the model parameters are optimized accordingly, realizing backward propagation. When the training requirements, such as training duration and recognition accuracy, are met, training is complete and the model can be put into practical use.
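The fourth implementation manner states that a focal loss is used during this training. A minimal binary focal loss can be sketched as below; the `gamma` and `alpha` values are the commonly used defaults, assumed here rather than specified by the application.

```python
import numpy as np

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Minimal binary focal loss: FL = -alpha_t * (1 - p_t)**gamma * log(p_t).
    p: predicted probability of 'closed eye'; y: 1 for closed, 0 for open.
    gamma/alpha use the usual defaults, assumed rather than taken from the patent."""
    p = np.clip(p, 1e-7, 1 - 1e-7)                 # numerical safety
    p_t = np.where(y == 1, p, 1 - p)               # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)   # class-balancing weight
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# A confident correct prediction contributes far less loss than a hard one,
# which is the point of focal loss on imbalanced open/closed-eye samples:
easy = binary_focal_loss(np.array([0.95]), np.array([1]))
hard = binary_focal_loss(np.array([0.30]), np.array([1]))
```

Down-weighting easy examples this way focuses training on hard cases such as squinting eyes, which the labeling scheme in the fifth implementation manner treats as open-eye samples.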
Step S105, combining the image acquisition time points corresponding to the plurality of open-closed eye recognition results, judging whether the results contain a run of consecutive closed-eye recognition results whose time span is greater than a preset duration, and if so, triggering step S106;
in the prior art, an electroencephalogram-based detection scheme depends on dedicated hardware and acquires the driver's electroencephalogram to detect the fatigue state; a PERCLOS-based detection scheme measures the driver's fatigue by the percentage of a specified unit of time during which the eyes are closed, for example determining that the driver is in a fatigue state when the percentage reaches 70% or 80%.
The present method does not depend on the specially deployed hardware required by an electroencephalogram-based scheme, and, unlike the PERCLOS-based scheme, which consumes a fixed and comparatively long detection time, it judges that the driver is in a fatigue state on the basis of consecutive closed-eye recognition results.
It can be understood that each human eye image input to the model has a corresponding image acquisition time point, namely the acquisition time point of its initial image. In practical application, the plurality of initial images are acquired dynamically and continuously, so the images can be input to the model in the chronological order of their acquisition time points. Each recognition result output by the model likewise carries its corresponding image acquisition time point, so even results output out of chronological order can be sorted back into a continuous sequence of open-closed eye recognition results.
In this case, the present application uses a dynamic determination mechanism: when a run of consecutive closed-eye recognition results occurs, it is directly determined that the driver is in a fatigue state.
It can be understood that the run of consecutive closed-eye recognition results may be measured either by the number of closed-eye results or by the time span they cover; the principle is the same. When, during dynamic monitoring, a run of consecutive closed-eye recognition results spans more than the preset duration, the determination mechanism is triggered, and the driver is determined to be in a fatigue state for the corresponding period through the following step S106.
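The dynamic determination mechanism of steps S105 and S106 can be sketched as follows. This is a minimal illustration: the function name is an assumption, and the 1.5 s threshold is merely one value within the 1 to 2 s range the application mentions.

```python
from datetime import datetime, timedelta

# Sketch of the dynamic determination mechanism: scan time-stamped eye-state
# results (True = closed eye) and report fatigue once a run of consecutive
# closed-eye results spans more than a preset duration.
def is_fatigued(results, preset=timedelta(seconds=1.5)):
    """results: (timestamp, eyes_closed) pairs in chronological order."""
    run_start = None
    for ts, closed in results:
        if closed:
            if run_start is None:
                run_start = ts          # a new run of closed-eye results begins
            if ts - run_start > preset:
                return True             # span of the run exceeds the preset duration
        else:
            run_start = None            # an open-eye result breaks the run
    return False

t0 = datetime(2021, 9, 16, 8, 0, 0)
frames = [(t0 + timedelta(seconds=0.5 * i), True) for i in range(5)]  # 2.0 s closed
```

Here `is_fatigued(frames)` returns True, because the closed-eye run spans 2.0 s, exceeding the 1.5 s preset; a single open-eye result in the middle would reset the run and return False.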
It is easy to see that, under the present determination scheme, the dynamic determination mechanism removes the fixed detection period of the PERCLOS-based scheme, so the application is more flexible and the time required to determine the driver's fatigue state is reduced. Moreover, because image recognition is used and the fatigue state is determined from consecutive closed-eye recognition results, the error rate remains low in technical implementation, and short fatigue episodes are less likely to be missed (for example, a fatigue state lasting only 5 seconds may be missed by a PERCLOS-based scheme). The determination scheme is therefore both more accurate and faster, further and significantly reducing the time required to determine the driver's fatigue state.
For example, the preset duration in the present application may be set to 1 to 2 s; the required detection time is obviously far shorter than the 30 s to 60 s required by a PERCLOS-based detection scheme.
And step S106, determining that the driver corresponding to the plurality of initial images is in a fatigue state.
It can be understood that, when step S105 determines that there is a run of consecutive closed-eye recognition results whose time span is greater than the preset duration, the driver is determined to be in a fatigue state.
This determination may take the form of recording or outputting in the system that the corresponding driver is in a fatigue state, for example labeling the initial images "determined to be in a fatigue state".
Subsequently, response processing may be executed according to a configured response strategy for the event that the driver is determined to be in a fatigue state, such as issuing a voice prompt or decelerating the vehicle.
As can be seen from the embodiment shown in fig. 1, the present application provides a new detection scheme for the fatigue state of the driver. The detection process does not rely on dedicated hardware; instead, on the basis of image recognition, consecutive closed-eye recognition results are used to judge that the driver is in a fatigue state. The required processing time, including the acquisition and processing of the initial images, can be greatly reduced in actual operation, while stable, high detection accuracy is maintained. In other words, a more convenient detection scheme is provided, with significantly improved practicability in real applications.
Further, as mentioned above, the open-closed eye recognition model is specifically configured in the present application, and further model optimization settings are applied in practice to effectively improve its detection accuracy.
In the present application, the open-closed eye recognition model may specifically adopt the CNN LeNet-5 model, and the further model optimization settings are made on the basis of the CNN LeNet-5 model architecture.
It can be understood that these optimization settings retain the overall model architecture of the CNN LeNet-5 model and optimize within it.
First, referring to the model architecture diagram of the prior-art CNN LeNet-5 model shown in fig. 3 and the model architecture diagram of the open-closed eye recognition model of the present application shown in fig. 4, the CNN LeNet-5 architecture mainly includes an input layer (INPUT), convolutional layers (Convolutions), subsampling (pooling) layers (Subsampling), full connection layers (Full connection), a Gaussian connection layer (Gaussian connections), and an output layer (OUTPUT).
From the comparison between fig. 3 and fig. 4, it can be seen that the present application increases the size of the feature maps propagated through the model.
It can be understood that, compared with general lightweight neural networks such as MobileNetV1, MobileNetV2, and MobileNetV3, the CNN LeNet-5 model suffers a larger drop in accuracy: the smaller the input image, the smaller the feature maps become after the network's repeated downsampling, so very little useful information remains.
Simply enlarging the model's input image improves recognition accuracy to a certain extent, but increases the computation required by the algorithm, which is unfavorable for embedded deployment and raises cost. The present application therefore studies a small network and optimizes the parameters of the CNN LeNet-5 architecture: in the specific application, the input layer size is 48x48 pixels, the number of channels of each convolutional layer is twice that of the corresponding convolutional layer in the CNN LeNet-5 model, and the fully connected layers are kept unchanged. With this configuration the number of layers is not especially large, the feature maps remain reasonably large after the successive downsampling layers, recognition accuracy is high, and the computing power required by the model is small.
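As a quick arithmetic check (illustrative, not from the patent), the following traces the feature-map sizes through the classic two rounds of 5x5 valid convolution plus 2x2 subsampling in a LeNet-5-style stack, comparing the original 32x32 input with the 48x48 input used here. Larger inputs leave visibly larger feature maps after the same downsampling.

```python
# Feature-map size arithmetic for a LeNet-5-style stack:
# each round is a 5x5 valid convolution followed by 2x2 subsampling.
def lenet5_feature_sizes(input_size):
    s = input_size
    sizes = []
    for _ in range(2):
        s = s - 4          # 5x5 valid convolution shrinks each side by 4
        sizes.append(s)
        s = s // 2         # 2x2 subsampling halves each side
        sizes.append(s)
    return sizes

classic = lenet5_feature_sizes(32)    # original LeNet-5 input size
enlarged = lenet5_feature_sizes(48)   # input size used in this application
```

The classic stack ends with 5x5 feature maps, while the enlarged input ends with 9x9 maps, so more spatial information survives the downsampling before the fully connected layers.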
In addition, the present application may further introduce an attention mechanism model architecture. In specific operation, this architecture is configured between at least one pair of adjacent layer structures in the model, and sequentially comprises a global pooling layer (Global pooling), a first fully connected layer, a second fully connected layer, and a sigmoid output layer.
It can be understood that in a neural network the different pieces of feature information in each layer (whether different channels or different locations) differ in importance; the later layers should emphasize the important feature information and suppress the unimportant. Accordingly, the attention mechanism adjusts the weights of specific feature information, such as individual channels or locations, so that the network focuses on learning particular feature parameters, increasing their importance, that is, their attention, and passing the more important feature information on to the following layers.
For example, in eye classification, the feature information for eye opening and closing should receive more attention than feature information such as the skin color around the eyes or whether glasses are worn.
Referring to the model architecture diagram of the attention mechanism shown in fig. 5, suppose the original feature map has size c × h × w. Global pooling is first applied (the pooling window is h × w, producing a 1 × 1 output per channel with the channel count unchanged), yielding a c × 1 × 1 feature map. Two fully connected layers follow: the first has c/16 neurons, reducing the dimension from c; the second has c neurons, restoring the dimension to c. Compared with a single fully connected layer, this adds more nonlinearity, better fits the complex correlations between channels, and greatly reduces the parameter and computation counts. A sigmoid layer is then connected; sigmoid is used rather than softmax because the outputs express per-channel correlations and need not sum to 1. The sigmoid outputs a c × 1 × 1 channel-weight vector, which is multiplied with the original c × h × w feature map (element-wise, broadcast over each channel, not matrix multiplication), so that each channel of the resulting feature map carries a different importance.
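A minimal NumPy sketch of this channel-attention computation, assuming random weights and a standard ReLU activation after the first fully connected layer (an assumption; the patent does not state the intermediate activation):

```python
import numpy as np

# Channel attention as described: global pooling -> FC(c/16) -> FC(c) ->
# sigmoid -> channel-wise rescaling of the original c x h x w feature map.
def channel_attention(x, w1, w2):
    """x: (c, h, w) feature map; w1: (c//16, c); w2: (c, c//16)."""
    squeezed = x.mean(axis=(1, 2))              # global average pooling -> (c,)
    hidden = np.maximum(w1 @ squeezed, 0.0)     # first FC (dim c/16), assumed ReLU
    logits = w2 @ hidden                        # second FC, back to dim c
    weights = 1.0 / (1.0 + np.exp(-logits))     # sigmoid channel weights in (0, 1)
    return x * weights[:, None, None]           # broadcast multiply per channel

rng = np.random.default_rng(1)
c, h, w = 32, 6, 6
x = rng.normal(size=(c, h, w))
w1 = rng.normal(size=(c // 16, c)) * 0.1        # illustrative random weights
w2 = rng.normal(size=(c, c // 16)) * 0.1
y = channel_attention(x, w1, w2)
```

Because the weights lie in (0, 1), each output channel is a scaled-down copy of the input channel; in a trained model the scaling reflects learned channel importance.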
In addition, for the loss function used during model training, the model is trained with the Focal loss (Focal_loss) function on the basis of positive and negative sample human eye images. This avoids, to a certain extent, the problem of easily classified samples dominating the accumulated loss when they are too numerous: by introducing a modulating factor, the contribution of easily classified samples to the total loss is greatly reduced, while the loss contribution of misclassified samples is affected only slightly. Under this setting, the training accuracy and training efficiency of the model can be further improved.
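The patent's Focal loss formula is rendered only as an image in the source, so this sketch uses the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); the alpha and gamma values shown are common defaults, not values stated by the patent.

```python
import numpy as np

# Standard binary focal loss (assumed form; the patent's exact formula is
# not reproduced in the source). The (1 - p_t)^gamma modulating factor
# down-weights easily classified samples.
def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-9):
    """p: predicted open-eye probabilities; y: labels (1 = open, 0 = closed)."""
    p_t = np.where(y == 1, p, 1.0 - p)          # probability of the true class
    a_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balancing factor
    return -a_t * (1.0 - p_t) ** gamma * np.log(p_t + eps)

# An easy sample (p_t = 0.95) contributes far less than a hard one (p_t = 0.3):
easy = focal_loss(np.array([0.95]), np.array([1]))[0]
hard = focal_loss(np.array([0.30]), np.array([1]))[0]
```

This illustrates the mechanism described above: the easy sample's loss is suppressed by the modulating factor, so hard or misclassified samples dominate the total.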
In addition, for the sample human eye images used in training, the present application is specifically configured so that human eye images in the fully closed-eye state are labeled with the closed-eye recognition result, while human eye images in the squinting state, the fully open-eye state, and non-eye regions are labeled with the open-eye recognition result, making the open/closed classification more accurate.
Moreover, to enlarge the amount of sample data, the sample human eye images may be obtained by applying data enhancement processing to the initial sample human eye images, the data enhancement processing including at least one of random color enhancement processing, random rotation processing, and random saturation processing.
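A simplified sketch of this data-enhancement step (the exact transforms and parameter ranges are not specified in the patent, so everything here is an assumption): random per-channel color scaling, random 90-degree rotation as a stand-in for arbitrary-angle rotation, and a crude saturation jitter that blends each pixel toward its grayscale value.

```python
import numpy as np

# Hypothetical data-enhancement pipeline: color, rotation, saturation.
def augment(img, rng):
    """img: (h, w, 3) float array in [0, 1]; returns an augmented copy."""
    img = img * rng.uniform(0.8, 1.2, size=3)   # random color enhancement per channel
    img = np.rot90(img, k=rng.integers(0, 4))   # random rotation (90-degree steps)
    gray = img.mean(axis=2, keepdims=True)      # grayscale reference per pixel
    sat = rng.uniform(0.7, 1.3)                 # random saturation factor
    img = gray + sat * (img - gray)             # push toward / away from grayscale
    return np.clip(img, 0.0, 1.0)

rng = np.random.default_rng(42)
sample = rng.uniform(size=(48, 48, 3))          # toy 48x48 "eye image"
out = augment(sample, rng)
```

In practice the same initial sample yields many distinct augmented samples by drawing new random factors each call, which is how the training set is enlarged.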
For example, a total of 1,000,000 training sample eye images may be configured, of which 500,000 correspond to the open-eye recognition result and 500,000 to the closed-eye recognition result, covering various environments (different in-vehicle conditions such as bright light, dim light, and reflective lenses); a total of 200,000 test sample eye images may also be configured, 100,000 labeled with the open-eye recognition result and 100,000 with the closed-eye recognition result.
In addition, the initial model used for training may be a pre-trained model: a general model that has already undergone a certain amount of training can continue training for open-closed eye recognition on the specific sample human eye images, further improving training efficiency and recognition accuracy.
In order to facilitate understanding of the above optimization settings for the model, refer to the comparison of recognition accuracy under different optimization settings shown in table 1 below.
TABLE 1 Comparison of recognition accuracy under different optimization settings
Network | Recognition accuracy
Original CNN LeNet-5 model | 90%
Input size changed to 48x48 + channel number doubled | 95%
Input size changed to 48x48 + channel number doubled + attention mechanism | 96%
Input size changed to 48x48 + channel number doubled + attention mechanism + Focal_loss training | 97%
Input size changed to 48x48 + channel number doubled + attention mechanism + Focal_loss training + data enhancement | 98.6%
Input size changed to 48x48 + channel number doubled + attention mechanism + Focal_loss training + data enhancement + pre-training model | 99.6%
From the above table, it can be seen that after the CNN LeNet-5 model is improved, the recognition accuracy of the model can be obviously improved, so that the model can output a more accurate open-closed eye recognition result in practical application, and the determination accuracy of the model on the fatigue state of the driver is further improved.
The above describes the method for determining the fatigue state of a driver provided by the present application. In order to better implement this method, the present application also provides, from the perspective of functional modules, a device for determining the fatigue state of a driver.
Referring to fig. 6, fig. 6 is a schematic structural diagram of the apparatus for determining a fatigue state of a driver according to the present application, in which the apparatus 600 for determining a fatigue state of a driver may specifically include the following structures:
an acquisition unit 601 configured to acquire a plurality of initial images of a driving position on a vehicle, the plurality of initial images being obtained by capturing the driving position by a camera disposed on the vehicle;
a face recognition unit 602, configured to perform face recognition processing on the multiple initial images to obtain multiple face images included in the multiple initial images;
an extracting unit 603, configured to extract, according to face features corresponding to the plurality of face images, a plurality of eye images included in the plurality of face images, where the eye images include a pupil area and an area within a preset range around the pupil area;
an open-closed eye recognition unit 604 for sequentially inputting a plurality of human eye images to an open-closed eye recognition model, performing open-closed eye recognition processing on the human eye images by the open-closed eye recognition model, and obtaining a plurality of open-closed eye recognition results output by the open-closed eye recognition model, wherein the open-closed eye recognition model is obtained by training an initial model by a sample human eye image labeled with a corresponding open-closed eye recognition result, and the open-closed eye recognition model is used for recognizing whether the human eyes in the input image are in an open-eye state or a closed-eye state;
a judging unit 605, configured to judge, in combination with the image acquisition time points corresponding to the plurality of open-closed eye recognition results, whether the plurality of open-closed eye recognition results contain continuous closed-eye recognition results whose time span is greater than a preset duration, and if so, trigger the determining unit 606;
a determining unit 606, configured to determine that the driver corresponding to the plurality of initial images is in a fatigue state.
In one exemplary implementation, the open-closed eye recognition model is embodied as a neural network model employing a model architecture of the CNN LeNet-5 model.
In yet another exemplary implementation, the input layer size of the open-closed eye recognition model is 48 × 48 pixels, and the number of channels of each convolutional layer in the open-closed eye recognition model is twice the number of channels of the corresponding convolutional layer in the CNN LeNet-5 model.
In yet another exemplary implementation, the open-closed eye recognition model configures an attention mechanism model architecture between at least one pair of adjacent layer model structures, the attention mechanism model architecture comprising, in order, a global pooling layer, a first fully-connected layer, a second fully-connected layer, and a sigmoid output layer.
In another exemplary implementation, the open-closed eye recognition model is trained by using a Focal _ loss function on the basis of positive and negative sample human eye images in a training process.
In yet another exemplary implementation, in the sample human eye images used in the training process of the open-closed eye recognition model, human eye images in the fully closed-eye state are labeled with the closed-eye recognition result, and human eye images in the squinting state, the fully open-eye state, and non-eye regions are labeled with the open-eye recognition result; the sample human eye images are obtained by performing data enhancement processing on initial sample human eye images, the data enhancement processing including at least one of random color enhancement processing, random rotation processing, and random saturation processing.
In yet another exemplary implementation, the preset duration is specifically 1 to 2 s.
The present application further provides a device for determining a fatigue state of a driver from a hardware structure perspective, referring to fig. 7, fig. 7 shows a schematic structural diagram of the device for determining a fatigue state of a driver of the present application, specifically, the device for determining a fatigue state of a driver of the present application may include a processor 701, a memory 702, and an input/output device 703, where the processor 701 is configured to implement the steps of the method for determining a fatigue state of a driver in the corresponding embodiment of fig. 1 when executing a computer program stored in the memory 702; alternatively, the processor 701 is configured to implement the functions of the units in the embodiment corresponding to fig. 6 when executing the computer program stored in the memory 702, and the memory 702 is configured to store the computer program required by the processor 701 to execute the method for determining the fatigue state of the driver in the embodiment corresponding to fig. 1.
Illustratively, a computer program may be partitioned into one or more modules/units, which are stored in the memory 702 and executed by the processor 701 to accomplish the present application. One or more modules/units may be a series of computer program instruction segments capable of performing certain functions, the instruction segments being used to describe the execution of a computer program in a computer device.
The driver fatigue state determination device may include, but is not limited to, a processor 701, a memory 702, and an input-output device 703. It will be appreciated by those skilled in the art that the illustration is merely an example of a device for determining the fatigue state of a driver and does not constitute a limitation of the device, which may include more or fewer components than those illustrated, combine some components, or use different components; for example, the device may further include a network access device, a bus, etc., with the processor 701, the memory 702, the input-output device 703, etc. connected via the bus.
The Processor 701 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, the processor being the control center of the driver fatigue status determining device, with various interfaces and lines connecting the various parts of the overall device.
The memory 702 may be used to store computer programs and/or modules, and the processor 701 implements various functions of the computer apparatus by running or executing the computer programs and/or modules stored in the memory 702 and invoking data stored in the memory 702. The memory 702 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, the application programs required for at least one function, and the like; the data storage area may store data created during use of the device for determining the fatigue state of the driver, and the like. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
The processor 701, when executing the computer program stored in the memory 702, may specifically implement the following functions:
acquiring a plurality of initial images of a driving position on a vehicle, wherein the plurality of initial images are obtained by shooting the driving position through a camera arranged on the vehicle;
carrying out face recognition processing on the plurality of initial images to obtain a plurality of face images contained in the plurality of initial images;
extracting a plurality of human eye images contained in the plurality of human face images according to human face features corresponding to the plurality of human face images, wherein the human eye images comprise a pupil area and an area in a preset range around the pupil area;
sequentially inputting the plurality of human eye images into an open-closed eye recognition model, the open-closed eye recognition model performing open-closed eye recognition processing on the human eye images to obtain a plurality of open-closed eye recognition results output by the model, wherein the open-closed eye recognition model is obtained by training an initial model on sample human eye images labeled with the corresponding open-closed eye recognition results, and the open-closed eye recognition model is used for recognizing whether the human eyes in an input image are in an open-eye state or a closed-eye state;
judging whether continuous closed eye recognition results with time point span larger than preset time length exist in the open-closed eye recognition results or not by combining the image acquisition time points corresponding to the open-closed eye recognition results;
and if so, determining that the driver corresponding to the plurality of initial images is in a fatigue state.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the above-described specific working process of the determining apparatus and device for the fatigue state of the driver and the corresponding units thereof may refer to the description of the determining method for the fatigue state of the driver in the embodiment corresponding to fig. 1, and details are not repeated herein.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
For this reason, the present application provides a computer-readable storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps of the method for determining the fatigue state of the driver in the embodiment corresponding to fig. 1 in the present application, and specific operations may refer to the description of the method for determining the fatigue state of the driver in the embodiment corresponding to fig. 1, which is not repeated herein.
Wherein the computer-readable storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the computer-readable storage medium can execute the steps of the method for determining the fatigue state of the driver in the embodiment corresponding to fig. 1, the beneficial effects that can be achieved by the method for determining the fatigue state of the driver in the embodiment corresponding to fig. 1 can be achieved, which are detailed in the foregoing description and will not be repeated herein.
The method, the device, the equipment and the computer-readable storage medium for determining the fatigue state of the driver provided by the present application are described in detail above, and a specific example is applied in the present application to explain the principle and the implementation of the present application, and the description of the above embodiment is only used to help understanding the method and the core idea of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A method of determining a fatigue state of a driver, the method comprising:
acquiring a plurality of initial images of a driving position on a vehicle, wherein the plurality of initial images are obtained by shooting the driving position through a camera arranged on the vehicle;
carrying out face recognition processing on the plurality of initial images to obtain a plurality of face images contained in the plurality of initial images;
extracting a plurality of human eye images contained in the plurality of human face images according to human face features corresponding to the plurality of human face images, wherein the human eye images comprise a pupil area and an area in a preset range around the pupil area;
sequentially inputting the plurality of human eye images to an open-closed eye recognition model, performing open-closed eye recognition processing on the human eye images by the open-closed eye recognition model, and obtaining a plurality of open-closed eye recognition results output by the open-closed eye recognition model, wherein the open-closed eye recognition model is obtained by training an initial model on sample human eye images labeled with the corresponding open-closed eye recognition results, and the open-closed eye recognition model is used for recognizing whether human eyes in an input image are in an open-eye state or a closed-eye state;
judging whether continuous closed-eye recognition results with a time point span larger than a preset time length exist in the plurality of open-closed eye recognition results, in combination with the image acquisition time points corresponding to the plurality of open-closed eye recognition results;
and if so, determining that the driver corresponding to the plurality of initial images is in a fatigue state.
2. The method according to claim 1, characterized in that the open-closed eye recognition model is in particular a neural network model employing a model architecture of the CNN LeNet-5 model.
3. The method of claim 2, wherein the input layer size of the open-closed eye recognition model is 48x48 pixels, and the number of channels in each convolutional layer in the open-closed eye recognition model is twice the number of channels in the corresponding convolutional layer in the CNN LeNet-5 model.
4. The method of claim 2, wherein the open-closed eye recognition model configures an attention mechanism model architecture between at least one pair of adjacent layer model structures, the attention mechanism model architecture comprising, in order, a global pooling layer, a first fully-connected layer, a second fully-connected layer, and a sigmoid output layer.
5. The method of claim 2, wherein the open-closed eye recognition model is trained using a Focal _ loss function based on positive and negative sample eye images during the training process.
6. The method according to claim 2, characterized in that, in the sample human eye images used in the training process of the open-closed eye recognition model, human eye images in the fully closed-eye state are labeled with the closed-eye recognition result, and human eye images in the squinting state, the fully open-eye state, and non-eye regions are labeled with the open-eye recognition result; the sample human eye images are obtained by performing data enhancement processing on initial sample human eye images, the data enhancement processing including at least one of random color enhancement processing, random rotation processing, and random saturation processing.
7. Method according to claim 1, characterized in that said preset duration is in particular 1 to 2 s.
8. An apparatus for determining a fatigue state of a driver, the apparatus comprising:
an acquisition unit configured to acquire a plurality of initial images of a driving position on a vehicle, the plurality of initial images being obtained by photographing the driving position by a camera disposed on the vehicle;
a face recognition unit configured to perform face recognition processing on the plurality of initial images to obtain a plurality of human face images contained in the plurality of initial images;
an extraction unit configured to extract a plurality of human eye images contained in the plurality of human face images according to the human face features corresponding to the plurality of human face images, each human eye image comprising a pupil region and a region within a preset range around the pupil region;
an open-closed eye recognition unit configured to sequentially input the plurality of human eye images into an open-closed eye recognition model for open-closed eye recognition processing and to obtain a plurality of open-closed eye recognition results output by the model, wherein the open-closed eye recognition model is obtained by training an initial model on sample human eye images labeled with corresponding open-closed eye recognition results, and is used to recognize whether the human eyes in an input image are in an open-eye state or a closed-eye state;
a judging unit configured to judge, in combination with the image acquisition time points corresponding to the plurality of open-closed eye recognition results, whether the results contain consecutive closed-eye recognition results whose time points span more than a preset duration, and if so, to trigger the determining unit;
the determining unit, configured to determine that the driver corresponding to the plurality of initial images is in a fatigue state.
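The core of the judging unit is the check for a run of consecutive closed-eye results whose capture timestamps span more than the preset duration (1 to 2 seconds per claim 7). A minimal pure-Python sketch of that check; the function name, the `'open'`/`'closed'` labels, and the 1.5-second default are illustrative assumptions:

```python
def is_fatigued(results, timestamps, preset_duration=1.5):
    """results: per-frame 'open'/'closed' recognition results.
    timestamps: matching capture times in seconds.
    Returns True when consecutive 'closed' results span more
    than preset_duration; an 'open' result resets the run."""
    run_start = None
    for state, t in zip(results, timestamps):
        if state == 'closed':
            if run_start is None:
                run_start = t          # start of a closed-eye run
            elif t - run_start > preset_duration:
                return True            # run exceeds the preset duration
        else:
            run_start = None           # open eye resets the run
    return False
```

An ordinary blink produces a closed-eye run far shorter than the preset duration, so it does not trigger the fatigue determination.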
9. A device for determining a fatigue state of a driver, comprising a processor and a memory storing a computer program which, when invoked by the processor, performs the method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the method of any one of claims 1 to 7.
CN202111085378.2A 2021-09-16 2021-09-16 Method, device and equipment for determining fatigue state of driver Pending CN113537176A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111085378.2A CN113537176A (en) 2021-09-16 2021-09-16 Method, device and equipment for determining fatigue state of driver


Publications (1)

Publication Number Publication Date
CN113537176A true CN113537176A (en) 2021-10-22

Family

ID=78123239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111085378.2A Pending CN113537176A (en) 2021-09-16 2021-09-16 Method, device and equipment for determining fatigue state of driver

Country Status (1)

Country Link
CN (1) CN113537176A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596892A (en) * 2018-04-23 2018-09-28 西安交通大学 A kind of identification of Weld Defects based on improvement LeNet-5 models
CN109800650A (en) * 2018-12-21 2019-05-24 河海大学 A method of enhancing and identification traffic sign
CN110532976A (en) * 2019-09-03 2019-12-03 湘潭大学 Method for detecting fatigue driving and system based on machine learning and multiple features fusion
CN111191573A (en) * 2019-12-27 2020-05-22 中国电子科技集团公司第十五研究所 Driver fatigue detection method based on blink rule recognition
CN111626087A (en) * 2019-02-28 2020-09-04 北京市商汤科技开发有限公司 Neural network training and eye opening and closing state detection method, device and equipment
CN112241658A (en) * 2019-07-17 2021-01-19 青岛大学 Fatigue driving early warning system and method based on depth camera
CN112633169A (en) * 2020-12-23 2021-04-09 西安建筑科技大学 Pedestrian recognition algorithm based on improved LeNet-5 network
CN113076884A (en) * 2021-04-08 2021-07-06 华南理工大学 Cross-mode eye state identification method from near infrared light to visible light


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114030448A (en) * 2021-10-27 2022-02-11 武汉未来幻影科技有限公司 Vehicle and control system, method and device
CN114030448B (en) * 2021-10-27 2022-08-05 武汉未来幻影科技有限公司 Vehicle and control system, method and device

Similar Documents

Publication Publication Date Title
WO2021016873A1 (en) Cascaded neural network-based attention detection method, computer device, and computer-readable storage medium
CN109584507B (en) Driving behavior monitoring method, device, system, vehicle and storage medium
Rangesh et al. Driver gaze estimation in the real world: Overcoming the eyeglass challenge
CN108038466B (en) Multi-channel human eye closure recognition method based on convolutional neural network
Moslemi et al. Driver distraction recognition using 3d convolutional neural networks
CN111611947B (en) License plate detection method, device, equipment and medium
WO2020186883A1 (en) Methods, devices and apparatuses for gaze area detection and neural network training
CN111274881A (en) Driving safety monitoring method and device, computer equipment and storage medium
US20090051787A1 (en) Apparatus and method for photographing image using digital camera capable of providing preview images
CN106682620A (en) Human face image acquisition method and device
CN111062292B (en) Fatigue driving detection device and method
CN110633665B (en) Identification method, device and storage medium
CN112424795B (en) Face anti-counterfeiting method, processor chip and electronic equipment
CN107563299B (en) Pedestrian detection method using RecNN to fuse context information
CN104182721A (en) Image processing system and image processing method capable of improving face identification rate
CN108664849A (en) The detection device of event, method and image processing equipment in video
WO2019119515A1 (en) Face analysis and filtering method, device, embedded apparatus, dielectric and integrated circuit
CN111860316B (en) Driving behavior recognition method, device and storage medium
CN111488855A (en) Fatigue driving detection method, device, computer equipment and storage medium
WO2023124387A1 (en) Photographing apparatus obstruction detection method and apparatus, electronic device, storage medium, and computer program product
CN113283338A (en) Method, device and equipment for identifying driving behavior of driver and readable storage medium
CN108717520A (en) A kind of pedestrian recognition methods and device again
CN108064389A (en) A kind of target identification method, device and intelligent terminal
CN113537176A (en) Method, device and equipment for determining fatigue state of driver
CN111860253A (en) Multitask attribute identification method, multitask attribute identification device, multitask attribute identification medium and multitask attribute identification equipment for driving scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination