CN115761705A - Fatigue detection method and device and vehicle


Info

Publication number
CN115761705A
Authority
CN
China
Prior art keywords
target
state
fatigue
eye
target user
Prior art date
Legal status
Pending
Application number
CN202211346102.XA
Other languages
Chinese (zh)
Inventor
梁田峰
许猛
栗羽峰
张雅杰
Current Assignee
Great Wall Motor Co Ltd
Original Assignee
Great Wall Motor Co Ltd
Priority date
Filing date
Publication date
Application filed by Great Wall Motor Co Ltd filed Critical Great Wall Motor Co Ltd
Priority to CN202211346102.XA
Publication of CN115761705A

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a fatigue detection method, a fatigue detection device and a vehicle. The method comprises the following steps: acquiring multiple frames of target images including a target user within a first duration; extracting target features of each target image based on the face of the target user; generating a target feature sequence based on the target features of the multiple frames of target images; inputting the target feature sequence into a state detection model and outputting a state detection result of the target user within the first duration, wherein the state detection model is trained based on multiple frames of images within a plurality of first durations of known user states; and determining the fatigue detection result of the target user according to the state detection results corresponding to a plurality of consecutive first durations within a second duration, the second duration being equal to the total duration of the plurality of first durations. Determining the fatigue detection result from the state detection results of a plurality of consecutive first durations within the second duration improves the anti-interference capability of fatigue detection under complex environmental changes, and thereby improves the stability and accuracy of fatigue detection.

Description

Fatigue detection method and device and vehicle
Technical Field
The application relates to the technical field of auxiliary driving, in particular to a fatigue detection method and device and a vehicle.
Background
With the rapid development of the transportation industry in China, the number of vehicles on the road keeps increasing, and serious traffic accidents caused by driving occur frequently. Fatigue driving is one of the main causes of traffic accidents.
At present, fatigue detection methods mainly detect a single frame of image of the driver and then judge or classify fatigue within a given period according to conditions such as eye opening and closing or yawning. However, during actual driving, the lighting conditions inside the vehicle and the facial pose of the driver are complicated and changeable. When the fatigue detection methods in the related art are applied to fatigue detection of the driver inside the vehicle, they cannot adapt to the complex light changes inside the vehicle and the complicated, changeable facial pose of the driver during actual driving, and are prone to misjudgments or missed detections, so their anti-interference capability and detection accuracy still need to be improved.
Disclosure of Invention
The embodiments of the application provide a fatigue detection method, a fatigue detection device and a vehicle, which improve the anti-interference capability of fatigue detection when the vehicle faces a complex environment and the accuracy of fatigue detection. The technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a fatigue detection method, where the method includes:
acquiring a plurality of frames of target images within a first time length; the target image comprises a target user;
extracting target features of the target image based on the face of the target user;
generating a target feature sequence based on the target features of the multi-frame target images;
inputting the target characteristic sequence into a state detection model, and outputting a state detection result of the target user within the first duration; the state detection model is obtained by training based on multi-frame images in a plurality of first time lengths of known user states;
determining a fatigue detection result of the target user according to the state detection results corresponding to a plurality of consecutive first durations within a second duration; the second duration is equal to the total duration of the plurality of first durations.
In one possible implementation, the above target features include at least one of: a target yawning state feature, a target head state feature, and a target eye state feature;
the extracting of the target feature of the target image based on the face of the target user includes:
determining a target face image and/or a target eye image of the target user based on a preset detection algorithm and the target image;
extracting the target yawning state characteristic of the target face image by using a first model; and/or
Extracting the target head state characteristic of the target face image by using a second model; and/or
Extracting the target eye state features of the target eye image by using a third model;
the first model is obtained by performing contrastive learning training based on a plurality of face images with known yawning states; the second model is obtained by performing regression training based on a plurality of face images with known head states; the third model is obtained by performing contrastive learning training based on a plurality of eye images with known eye states.
In a possible implementation manner, the target feature sequence includes a target yawning state feature sequence and/or a target head state feature sequence and/or a target eye state feature sequence.
In one possible implementation, the target eye image includes a target left-eye image and a target right-eye image; the target eye state features comprise target left eye state features and target right eye state features; the target eye state feature sequence comprises a target left eye state feature sequence and a target right eye state feature sequence.
In a possible implementation manner, after the target feature sequence is generated based on the target features of the target images of the multiple frames in the first time period, before the target feature sequence is input into a state detection model and a state detection result of the target user in the first time period is output, the method further includes:
performing dimensionality reduction processing on the target feature sequence to obtain a first target feature sequence;
restoring the first target feature sequence to the same dimension as the target feature sequence to obtain a second target feature sequence;
the inputting the target feature sequence into a state detection model and outputting a state detection result of the target user within the first duration includes:
and inputting the second target characteristic sequence into a state detection model, and outputting a state detection result of the target user within the first time length.
In a possible implementation manner, the state detection result includes: a head state detection result and/or a yawning state detection result and/or an eye state detection result; the head state detection result comprises a head posture Euler angle of the target user within the first time length; the detection result of the yawning state comprises the probability that the target user is in the yawning state within the first duration; the eye state detection result includes a probability that the target user is in an eye-closing state within the first duration.
In a possible implementation manner, the fatigue detection result includes a fatigue degree distribution result; the fatigue degree distribution result comprises at least one fatigue degree of the target user in the second time period and the probability corresponding to the fatigue degree;
the determining the fatigue detection result of the target user according to the state detection results corresponding to a plurality of consecutive first time periods within the second time period includes:
extracting fatigue characteristic information in the state detection results corresponding to a plurality of continuous first time spans in a second time span;
inputting the fatigue characteristic information into a fourth model, and outputting a fatigue degree distribution result of the target user within the second duration; the fourth model is obtained by training based on a plurality of video images of the second duration with known fatigue degrees; the fatigue degree of a video image of the second duration comprises the fatigue degrees annotated for the video image by a plurality of annotators.
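One way to realize the fatigue degree distribution described above is to train the fourth model against the empirical distribution of the annotators' fatigue degree labels. The sketch below assumes a small fully-connected network, three fatigue degrees, nine fatigue characteristic values and a KL-divergence objective; none of these choices is specified by the application.

```python
# Hypothetical sketch: fourth model outputs a fatigue degree distribution and is
# trained against the distribution of fatigue degrees given by several annotators.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_DEGREES = 3    # assumed, e.g. awake / mildly fatigued / severely fatigued

fourth_model = nn.Sequential(nn.Linear(9, 32), nn.ReLU(), nn.Linear(32, NUM_DEGREES))

# One training sample: 9 fatigue characteristic values for a second duration, and the
# labels of 5 annotators (degree indices) turned into an empirical label distribution.
features = torch.randn(1, 9)
annotator_labels = torch.tensor([1, 1, 2, 1, 0])
target_dist = torch.bincount(annotator_labels, minlength=NUM_DEGREES).float()
target_dist = (target_dist / target_dist.sum()).unsqueeze(0)     # e.g. [0.2, 0.6, 0.2]

log_probs = F.log_softmax(fourth_model(features), dim=1)
loss = F.kl_div(log_probs, target_dist, reduction="batchmean")   # match the label distribution
loss.backward()
print(log_probs.exp())   # predicted fatigue degree distribution for the second duration
```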
In a possible implementation manner, the fatigue characteristic information includes at least one of the following: the total eye-closing duration of the target user within the second duration, the average duration of each eye closure, the blink frequency, the total yawning duration, the mean and variance of the head pose Euler angles, the percentage of time the head pitch angle is smaller than a first threshold, and the percentage of time the head yaw angle is smaller than a second threshold or larger than a third threshold.
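To make the fatigue characteristic information above concrete, the following sketch computes these statistics from per-first-duration state detection results. The window length, thresholds and field names are illustrative assumptions rather than values fixed by the application.

```python
# Hypothetical sketch: aggregating per-first-duration state detection results into
# the fatigue characteristic information listed above. All names, thresholds and
# the window length are assumptions for illustration.
import numpy as np

def fatigue_features(closed_eye_prob, yawn_prob, pitch, yaw,
                     window_s=0.5, eye_thr=0.5, yawn_thr=0.5,
                     pitch_thr=-20.0, yaw_low=-30.0, yaw_high=30.0):
    """Each argument holds one value per first duration within the second duration."""
    closed = np.asarray(closed_eye_prob) > eye_thr    # eyes closed in this window?
    yawning = np.asarray(yawn_prob) > yawn_thr        # yawning in this window?
    pitch, yaw = np.asarray(pitch), np.asarray(yaw)

    # Count runs of consecutive closed-eye windows as individual eye closures.
    closures = np.flatnonzero(np.diff(np.concatenate(([0], closed.astype(int)))) == 1)

    return {
        "total_eye_closed_s": closed.sum() * window_s,
        "mean_closure_s": (closed.sum() * window_s / len(closures)) if len(closures) else 0.0,
        "blink_frequency": len(closures) / (len(closed) * window_s),   # closures per second
        "total_yawn_s": yawning.sum() * window_s,
        "pitch_mean": pitch.mean(), "pitch_var": pitch.var(),
        "yaw_mean": yaw.mean(), "yaw_var": yaw.var(),
        "pct_pitch_below_thr": (pitch < pitch_thr).mean(),
        "pct_yaw_outside_range": ((yaw < yaw_low) | (yaw > yaw_high)).mean(),
    }
```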
In a second aspect, an embodiment of the present application provides a fatigue detection apparatus, including:
the acquisition module is used for acquiring multi-frame target images within a first time period; the target image comprises a target user;
a feature extraction module for extracting a target feature of the target image based on the face of the target user;
the generating module is used for generating a target characteristic sequence based on the target characteristics of the multi-frame target images;
the state detection module is used for inputting the target characteristic sequence into a state detection model and outputting a state detection result of the target user within the first time length; the state detection model is obtained by training based on multi-frame images in a plurality of first time lengths of known user states;
the determining module is used for determining the fatigue detection result of the target user according to the state detection results corresponding to a plurality of continuous first time periods within a second time period; the second duration is equal to a total duration of the plurality of first durations.
In one possible implementation, the above target features include at least one of: a target yawning state feature, a target head state feature, and a target eye state feature;
the above-mentioned feature extraction module includes:
the determining unit is used for determining a target face image and/or a target eye image of the target user based on a preset detection algorithm and the target image;
the first extraction unit is used for extracting the target yawning state characteristic of the target face image by using a first model; and/or
A second extraction unit for extracting a target head state feature of the target face image using a second model; and/or
A third extraction unit, configured to extract a target eye state feature of the target eye image using a third model;
the first model is obtained by performing contrastive learning training based on a plurality of face images with known yawning states; the second model is obtained by performing regression training based on a plurality of face images with known head states; the third model is obtained by performing contrastive learning training based on a plurality of eye images with known eye states.
In a possible implementation manner, the target feature sequence includes a target yawning state feature sequence and/or a target head state feature sequence and/or a target eye state feature sequence.
In one possible implementation, the target eye image includes a target left-eye image and a target right-eye image; the target eye state characteristics comprise target left eye state characteristics and target right eye state characteristics; the target eye state feature sequence comprises a target left eye state feature sequence and a target right eye state feature sequence.
In a possible implementation manner, the fatigue detection apparatus further includes:
the dimensionality reduction processing module is used for carrying out dimensionality reduction processing on the target feature sequence to obtain a first target feature sequence;
the restoring module is used for restoring the first target characteristic sequence to the dimension which is the same as that of the target characteristic sequence to obtain a second target characteristic sequence;
the state detection module is specifically configured to:
and inputting the second target characteristic sequence into a state detection model, and outputting a state detection result of the target user within the first time length.
In a possible implementation manner, the state detection result includes: a head state detection result and/or a yawning state detection result and/or an eye state detection result; the head state detection result comprises a head posture Euler angle of the target user within the first time length; the detection result of the yawning state comprises the probability that the target user is in the yawning state within the first duration; the eye state detection result includes a probability that the target user is in an eye-closing state within the first duration.
In a possible implementation manner, the fatigue detection result includes a fatigue degree distribution result; the fatigue degree distribution result comprises at least one fatigue degree of the target user in the second time length and the probability corresponding to the fatigue degree;
the determining module includes:
a fourth extraction unit, configured to extract fatigue feature information in the state detection results corresponding to multiple consecutive first time periods within a second time period;
a fatigue degree distribution detection unit, configured to input the fatigue characteristic information into a fourth model and output a fatigue degree distribution result of the target user within the second duration; the fourth model is obtained by training based on a plurality of video images of the second duration with known fatigue degrees; the fatigue degree of a video image of the second duration comprises the fatigue degrees annotated for the video image by a plurality of annotators.
In a possible implementation manner, the fatigue characteristic information includes at least one of the following: the total eye-closing duration of the target user within the second duration, the average duration of each eye closure, the blink frequency, the total yawning duration, the mean and variance of the head pose Euler angles, the percentage of time the head pitch angle is smaller than a first threshold, and the percentage of time the head yaw angle is smaller than a second threshold or larger than a third threshold.
In a third aspect, an embodiment of the present application provides a vehicle, including: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps provided in the first aspect of the embodiments of the present application or any one of the possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer storage medium, where the computer storage medium stores a plurality of instructions, and the instructions are adapted to be loaded by a processor and to perform the method steps provided in the first aspect of the present application or any one of the possible implementations of the first aspect.
The beneficial effects brought by the technical scheme provided by some embodiments of the application at least comprise:
in one or more embodiments of the application, multiple frames of target images including a target user are obtained within a first duration, target features of each target image are extracted based on the face of the target user, a target feature sequence is generated based on the target features of the multiple frames of target images, the target feature sequence is input into a state detection model, and a state detection result of the target user within the first duration is output, the state detection model being trained based on multiple frames of images within a plurality of first durations of known user states; finally, the fatigue detection result of the target user is determined according to the state detection results corresponding to a plurality of consecutive first durations within a second duration, the second duration being equal to the total duration of the plurality of first durations. Compared with state detection or fatigue detection based on a single frame of target image, which is easily affected by a complex environment and therefore not highly accurate, the embodiments of the application determine the state detection result of the target user within each first duration from a target feature sequence generated from the target features of multiple frames of target images within that first duration, and determine the fatigue detection result of the target user within the second duration from the state detection results of a plurality of consecutive first durations. This improves the anti-interference capability of fatigue detection under complex environmental changes, and thus improves the stability and accuracy of fatigue detection.
The foregoing description is only an overview of the technical solutions of the present application, and the embodiments of the present invention are described below in order to make the technical means of the present application more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a schematic diagram of a fatigue detection process provided in the related art;
fig. 2A is a schematic structural diagram of a fatigue detection system according to an exemplary embodiment of the present application;
fig. 2B is a schematic view of an application scenario of a fatigue detection method according to an exemplary embodiment of the present application;
FIG. 3 is a schematic flow chart of a fatigue detection method according to an exemplary embodiment of the present application;
FIG. 4A is a schematic diagram of a first model according to an exemplary embodiment of the present application;
FIG. 4B is a schematic diagram of a second model according to an exemplary embodiment of the present application;
FIG. 4C is a schematic diagram of a third model according to an exemplary embodiment of the present application;
fig. 5 is a schematic diagram of an implementation process of determining a state detection result of a target user within a first time period (target user state detection) in a fatigue detection method according to an exemplary embodiment of the present application;
fig. 6 is a schematic flow chart illustrating an implementation of determining a fatigue detection result of a target user according to a state detection result in a fatigue detection method according to an exemplary embodiment of the present application;
fig. 7 is a schematic diagram illustrating an implementation process of a fatigue detection method according to an exemplary embodiment of the present application;
fig. 8 is a schematic diagram of an implementation process of extracting fatigue feature information from a state detection result according to an exemplary embodiment of the present application;
FIG. 9 is a schematic flow chart diagram illustrating another fatigue detection method provided in an exemplary embodiment of the present application;
fig. 10 is a schematic structural diagram of a fatigue detection apparatus according to an exemplary embodiment of the present application;
fig. 11 is a schematic structural diagram of a vehicle according to an exemplary embodiment of the present application.
Detailed Description
In order to make the features and advantages of the present application more obvious and understandable, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
At present, as shown in fig. 1, a related fatigue detection method mainly determines a fatigue detection result according to eye features or mouth features in a single-frame target image of a target user directly, for example, first determining whether the target user closes the eye according to the eye features and determining whether the target user yawns according to the mouth features, and then determining whether the target user is tired or the fatigue level of the target user according to states of whether the target user closes the eye and yawns in the single-frame target image.
However, when fatigue detection is actually applied, ambient lighting conditions and the facial pose of the user are complicated and changeable, and fatigue behaviors such as yawning or opening and closing the eyes are dynamic processes. The fatigue detection method in the related art shown in fig. 1 therefore can neither adapt to the complex ambient lighting changes and the complicated, changeable facial pose of the user in practical applications, nor accurately obtain user states such as the eye state and the yawning state from a single frame of image, which affects the stability and accuracy of fatigue detection to a certain extent.
Based on this, the application provides one or more embodiments in which the states of the target user within a first duration, such as the eye state, the head state and the yawning state, are determined from a target feature sequence generated from the target features of multiple frames of target images within the first duration, and the fatigue detection result of the target user within a second duration is determined from the states corresponding to a plurality of consecutive first durations within the second duration. This improves the anti-interference capability of fatigue detection under complex environmental changes, thereby improving the stability of fatigue detection, and can also improve the accuracy of fatigue detection.
Referring to fig. 2A, fig. 2A schematically illustrates a structural diagram of a fatigue detection system according to an embodiment of the present application.
As shown in fig. 2A, the fatigue detection system may include: image capture device 210 and server 220. Wherein:
the image capturing device 210 may be a mobile phone, a tablet computer, a notebook computer, etc. equipped with a user version software and a camera, and may also be a camera or other devices such as a vehicle equipped with a camera, which is not limited in this embodiment of the present application.
Alternatively, when the fatigue condition of the target user is to be known, the target image corresponding to the target user may be acquired by the image acquisition device 210. Meanwhile, the image acquisition device 210 may extract the target feature of the target image based on the face of the target user in the target image, generate a target feature sequence based on the target features of the target images of multiple frames in the first time period and determine the state detection result of the target user in the first time period through the state detection model, and then determine the fatigue detection result of the target user according to the state detection results corresponding to multiple consecutive first time periods in the second time period. The state detection model is obtained by training based on multi-frame images in a plurality of first time lengths of known user states, and the second time length is equal to the total time length of the first time lengths.
Alternatively, after the target image of the target user is acquired by the image acquisition device 210, the image acquisition device 210 may establish a data connection with the server 220 through a network, for example, to send the target image or video image corresponding to the target user, and to receive from the server 220 the fatigue detection result of the target user within the second duration equal to the total duration of the multiple first durations, determined based on the multiple frames of target images of the target user within each of the multiple consecutive first durations, and the like.
The server 220 may be a server capable of providing multiple fatigue detections, and may receive data such as a target image or a video image sent by the image capturing device 210 or another server through a network, extract a target feature of the target image based on a face of a target user in the target image, generate a target feature sequence based on a target feature of multiple frames of target images in a first time period and a state detection model to determine a state detection result of the target user in the first time period, and then determine a fatigue detection result of the target user according to state detection results corresponding to a plurality of consecutive first time periods in a second time period. The state detection model is obtained by training based on multi-frame images in a plurality of first time lengths of known user states, and the second time length is equal to the total time length of the first time lengths. The server 220 may further send the determined fatigue detection result to the image capturing device 210 or other terminals through the network, so that the target user or other users can know the fatigue condition of the target user in time.
It should be understood that the target image may be obtained by directly transmitting the image capturing device 210 or other server frame by frame, or may be obtained by extracting or randomly extracting the video image of the second duration or the video images of the first durations at equal intervals from the image capturing device 210 or other server, or may be each frame image constituting the video image of the second duration or constituting the video images of the first durations, which is not limited in this embodiment of the application.
It can be understood that when the target images are extracted from the video images of the second duration or the plurality of video images of the first duration, in order to ensure the stability and accuracy of the fatigue detection, it is necessary to ensure that the number of the single-frame target images extracted in each first duration is equal.
Optionally, in order to ensure the timeliness and effectiveness of fatigue detection, when the image acquisition device 210 continuously acquires target images of the target user, or the server 220 continuously receives target images from the image acquisition device 210, the image acquisition device 210 or the server 220 may extract the target feature of each frame of target image based on the face of the target user as soon as that frame is acquired. Once the time elapsed since the first frame was acquired reaches the first duration, it may generate the target feature sequence based on the target features of the multiple frames of target images within that first duration and determine the state detection result of the target user within that first duration through the state detection model, and so on. Once the time elapsed since the first frame was acquired reaches a plurality of first durations, that is, the second duration, it may determine the fatigue detection result of the target user according to the state detection results corresponding to the multiple consecutive first durations within the second duration.
Optionally, after the image capturing device 210 captures a video image of a second duration or the server 220 receives a video image of a second duration from the image capturing device 210 or another server, the image capturing device 210 or the server 220 may first segment the video image into a plurality of video images of a first duration, then extract a target feature of multiple target images in each first duration based on the face of the target user in the video image, determine a state detection result of the target user in each first duration based on a target feature sequence (feature map) generated based on the target feature of the multiple target images in each first duration and the state detection model, and finally determine a fatigue detection result of the target user according to the state detection results corresponding to multiple consecutive first durations in the second duration.
It should be understood that the multi-frame target images in the first time period may be all the target images constituting the video image in the first time period, or may be multi-frame target images extracted at equal intervals or randomly extracted from the video image in the first time period, which is not limited in this embodiment of the application.
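The processing flow described above can be summarized by the following sketch: segment the frames of the second duration into consecutive windows of the first duration with equal frame counts, detect the user state per window, and aggregate the window results into a fatigue detection result. The function names and interfaces are placeholders assumed for illustration.

```python
# Hypothetical end-to-end sketch of the fatigue detection flow described above.
# extract_target_feature, state_detection_model and fatigue_model are placeholders.
from typing import Callable, List, Sequence

def detect_fatigue(frames: Sequence,                  # frames covering the second duration
                   frames_per_window: int,            # frames per first duration
                   extract_target_feature: Callable,  # per-frame feature extraction (S302)
                   state_detection_model: Callable,   # feature sequence -> state result (S304)
                   fatigue_model: Callable):          # window results -> fatigue result (S305)
    # Segment the second duration into consecutive first-duration windows (equal frame counts).
    windows: List[Sequence] = [frames[i:i + frames_per_window]
                               for i in range(0, len(frames) - frames_per_window + 1,
                                              frames_per_window)]

    state_results = []
    for window in windows:
        # S302/S303: extract per-frame target features and splice them in time order.
        feature_sequence = [extract_target_feature(frame) for frame in window]
        # S304: state detection result for this first duration.
        state_results.append(state_detection_model(feature_sequence))

    # S305: fatigue detection result over the whole second duration.
    return fatigue_model(state_results)
```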
It is to be appreciated that the server 220 can be, but is not limited to being, a hardware server, a virtual server, a cloud server, and the like.
It is to be understood that the fatigue detection may be performed by the image capturing device 210 or the server 220 alone, or may be performed by the image capturing device 210 and the server 220 in combination, which is not limited in the embodiments of the present application, and all of the following embodiments are described by taking the fatigue detection performed by the image capturing device 210 as an example.
The network may be a medium that provides a communication link between the server 220 and the image capturing device 210, or may be the internet including network devices and transmission media, without limitation. The transmission medium may be a wired link (such as, but not limited to, coaxial cable, fiber optic cable, and Digital Subscriber Line (DSL)) or a wireless link (such as, but not limited to, wireless networking (WIFI), Bluetooth, and mobile device networks).
For example, the image capturing device 210 in fig. 2A may be the vehicle 210A in fig. 2B, in an auxiliary driving scenario, in order to timely know the state of the driver during driving and reduce traffic accidents caused by fatigue driving of the driver, a target image 230 of the driver (target user) during driving may be continuously captured by a camera 211 mounted on the vehicle 210A, and fatigue detection of the driver (target user) during driving is achieved based on the captured target image 230 and the fatigue detection method provided by the present application. Meanwhile, once a fatigue detection result which can indicate that the driver (target user) belongs to fatigue driving appears, a reminding message can be sent immediately, such as but not limited to sending voice reminding or ringtone reminding, so that the driver (target user) is reminded of safety risks in time, the driver (target user) is enabled to adjust the state of the driver (target user) in time or consider parking rest, and the driving safety of the driver (target user) is improved. In the embodiment of the present application, after the fatigue detection result indicating that the driver (target user) belongs to fatigue driving occurs, the vehicle 210A may also be controlled to start the automatic driving system, so as to further ensure the driving safety of the driver (target user).
The fatigue detection method can be applied to a driving assistance scene, can also be applied to an auxiliary teaching scene, can realize timely understanding of the learning state of a student by detecting the fatigue condition of the student, or can be applied to an auxiliary operation management scene, can realize timely avoiding of potential safety hazards caused by fatigue operation of a machine by detecting the fatigue condition of workers on a production line, or can be applied to other scenes, and the embodiment of the application is not limited thereto.
It is understood that the number of the image capturing devices 210 and the servers 220 in the fatigue detection system shown in fig. 2A is only an example, and in a specific implementation, the fatigue detection system may include any number of image capturing devices and servers, which is not specifically limited in this embodiment of the present application. For example, but not limiting of, image capture device 210 may be an image capture device cluster comprised of a plurality of image capture devices, and server 220 may be a server cluster comprised of a plurality of servers.
Next, referring to fig. 2A and fig. 2B, a method for detecting fatigue provided by an embodiment of the present application will be described by taking fatigue detection performed by the image capturing device 210 as an example. Specifically, refer to fig. 3, which is a schematic flow chart of a fatigue detection method according to an exemplary embodiment of the present application. As shown in fig. 3, the fatigue detection method includes the following steps:
s301, obtaining multiple frames of target images in a first time period, wherein the target images comprise target users.
Specifically, when the fatigue condition of the target user is to be known, a target image of the target user can be acquired through the camera, and the target image at least comprises the face of the target user. The face of the target user may refer to the part of the target user above the neck, or to the part of the face from below the hairline down to and including the chin, and the like. The multiple frames of target images within the first duration may be all the frames constituting the video image of the first duration, or may be multiple frames extracted at equal intervals or randomly from the video image of the first duration, which is not limited in this embodiment of the application. The video image of the first duration may be a video image of any time period whose length is the first duration, which is not limited in this embodiment of the application. The first duration may be 0.5 s, 1 s, 2 s, and the like, which is not limited in this embodiment of the application.
S302, extracting target characteristics of the target image based on the face of the target user.
Specifically, the above target feature includes at least one of: a target yawning state feature, a target head state feature, and a target eye state feature.
Optionally, after the target image is acquired, at least one of the three features, namely the target yawning state feature, the target head state feature and the target eye state feature, may be directly extracted from the target image by using a deep learning model. For example, when the target yawning state feature and the target eye state feature of the target image are to be extracted, the target image may be directly input into a trained deep learning model, and the corresponding target yawning state feature and target eye state feature are extracted through a convolution module of the deep learning model, the deep learning model being obtained by contrastive learning training based on images with known yawning states and eye states.
Optionally, after the target image is acquired, in order to ensure the effectiveness and accuracy of the extracted target features, a face image of the target user may be extracted from the target image for extracting the target yawning state feature and/or the target head state feature, and/or an eye image of the target user may be extracted from the target image for extracting the target eye state feature. This avoids interference from regions of the target image unrelated to the target features, and thus largely guarantees the effectiveness and accuracy of the target features. Meanwhile, when a person yawns, the mouth not only opens wide or is covered by a hand; facial muscles also tense and contract, or tears appear in the eyes. Therefore, unlike approaches that determine the yawning state feature only from the mouth region, the embodiment of the application extracts the target yawning state feature from the face image of the target user, which avoids mistakenly judging that the target user is yawning when the target user merely opens the mouth to speak, ensures the comprehensiveness of the target yawning state feature, and further ensures its effectiveness and accuracy.
Further, after the target image is obtained, determining a target face image of the target user based on a preset detection algorithm and the target image, namely performing face key point detection on the target image to obtain a face position of the target user in the target image, intercepting the target face image of the target user from the target image according to the face position of the target user in the target image, and then extracting a target yawning state feature of the target face image by using a first model and/or extracting a target head state feature of the target face image by using a second model; and/or after the target image is obtained, determining a target eye image of the target user based on a preset detection algorithm and the target image, namely performing face key point detection on the target image to obtain the eye position of the target user in the target image, capturing the target eye image of the target user from the target image according to the eye position of the target user in the target image, and then extracting the target eye state feature of the target eye image by using a third model. The preset detection algorithm may be a face detection algorithm or a face key point detection algorithm, and the like, which is not limited in the embodiment of the present application.
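A minimal sketch of this cropping step is given below. The key-point detector, the named key-point groups and the crop margin are assumptions; the application only requires that a preset detection algorithm yields the face and eye positions from which the target face image and target eye images are cut out.

```python
# Hypothetical sketch of cropping the target face image and target eye images
# from a target image, given face key points from a preset detection algorithm.
# detect_face_keypoints is a placeholder for such an algorithm.
import numpy as np

def crop_region(image: np.ndarray, points: np.ndarray, margin: float = 0.2) -> np.ndarray:
    """Cut out the bounding box of the given key points, enlarged by a margin."""
    h, w = image.shape[:2]
    x0, y0 = points.min(axis=0)
    x1, y1 = points.max(axis=0)
    dx, dy = (x1 - x0) * margin, (y1 - y0) * margin
    x0, y0 = max(int(x0 - dx), 0), max(int(y0 - dy), 0)
    x1, y1 = min(int(x1 + dx), w), min(int(y1 + dy), h)
    return image[y0:y1, x0:x1]

def crop_face_and_eyes(image: np.ndarray, detect_face_keypoints):
    keypoints = detect_face_keypoints(image)   # dict of named key-point arrays (assumed)
    target_face_image = crop_region(image, keypoints["face"])
    target_left_eye_image = crop_region(image, keypoints["left_eye"])
    target_right_eye_image = crop_region(image, keypoints["right_eye"])
    return target_face_image, target_left_eye_image, target_right_eye_image
```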
Further, as shown in fig. 4A, the first model includes a first convolution module and a first fully-connected layer. When the first model is used to extract the target yawning state feature of the target face image, the target face image can be input into the trained first model, and the corresponding target yawning state feature is then extracted directly through the first convolution module of the first model. The first convolution module includes at least one convolutional layer. The first model is obtained by supervised contrastive learning training based on a plurality of face images with known yawning states, and the loss function of the first model is a contrastive learning loss function. The yawning state represents the yawning condition of the user in the face image, and includes a yawning state and a non-yawning state. In the embodiment of the application, the first model can be trained in a supervised contrastive learning manner, which pulls together the yawning state features of face images in the same yawning state input into the first model and pushes apart the yawning state features of face images in different yawning states, so that the features corresponding to different yawning states are more clearly distinguished, thereby improving the anti-interference capability of the fatigue detection process in a complex environment.
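As an illustration of the supervised contrastive training described for the first model, the following sketch pairs a small convolution module and fully-connected layer with a standard supervised contrastive loss over a batch of face images labeled yawning or not yawning. The network sizes, temperature and loss formulation are assumptions for illustration, not the exact configuration of the application.

```python
# Hypothetical sketch: first model (convolution module + fully-connected layer)
# trained with a supervised contrastive loss on yawning / not-yawning face images.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FirstModel(nn.Module):
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(                       # first convolution module (sizes assumed)
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc = nn.Linear(32, feature_dim)             # first fully-connected layer

    def forward(self, face_images: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(face_images))           # target yawning state features

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Pull together features with the same yawning label, push apart different labels."""
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / temperature                                    # pairwise similarities
    mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    mask.fill_diagonal_(0)                                           # exclude self-pairs
    logits_mask = torch.ones_like(mask).fill_diagonal_(0)
    log_prob = sim - torch.log((logits_mask * sim.exp()).sum(1, keepdim=True))
    mean_log_prob_pos = (mask * log_prob).sum(1) / mask.sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()

# Example: a batch of 8 face images with yawning labels (1 = yawning, 0 = not yawning).
model = FirstModel()
images = torch.randn(8, 3, 64, 64)
labels = torch.tensor([0, 1, 0, 1, 1, 0, 0, 1])
loss = supervised_contrastive_loss(model(images), labels)
loss.backward()
```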
Further, as shown in fig. 4B, the second model includes a second convolution module and 3 second fully-connected layers. When the second model is used to extract the target head state features of the target face image, the target face image can be input into the trained second model, and the corresponding target head state features are then extracted directly through the second convolution module of the second model. The target head state features may include a target pitch angle feature, a target yaw angle feature and a target roll angle feature; the second convolution module may include at least three convolutional layers, and each target head state feature may be extracted from the target face image through the convolutional layer corresponding to that feature. The second model is obtained by regression training based on a plurality of face images with known head states, where the head state represents the head pose of the user in the face image and includes a pitch angle, a yaw angle and a roll angle.
As shown in fig. 4B, the trained second model ultimately outputs an estimated angle value for the head of the target user in the target face image, and the value range of each angle is [-90, 90] degrees. If angle estimation were treated directly as a plain classification problem over 180 classes, with each class corresponding to one degree, then the loss between any wrongly estimated class and the real angle would be equally large no matter how far apart they are. For example, for a real angle of 2 degrees, 1 degree is obviously closer than 0 degrees, yet the loss between 0 degrees and 2 degrees would be as large as the loss between 1 degree and 2 degrees. The continuity and magnitude of the angle itself would thus be ignored, which would affect the accuracy of the second model and of the head state detection result, and further affect the effectiveness and accuracy of the target head state features extracted by the second model. To solve this problem, in the embodiment of the application the second model is trained in a regression manner: the 3 labeled angles of each face image, namely the pitch angle, the yaw angle and the roll angle, are discretized; assuming the number of discretized categories is X, X-1 groups of binary classifications are performed on the 3 angles of the plurality of face images, and ordered regression training of the second model is carried out according to the X-1 groups of binary classification results. This ensures the effectiveness and accuracy of the target head state features extracted by the second model, and also preserves the ordinal nature of the head pose Euler angles (pitch, yaw and roll) estimated based on the target head state features. X is an integer of 2 or more.
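The ordered regression for the second model can be made concrete with the following sketch: each Euler angle is discretized into X bins, and the corresponding fully-connected layer predicts X-1 binary decisions of the form "the angle exceeds bin boundary k", whose count recovers an ordered bin index. The bin count, feature dimension and decoding rule are illustrative assumptions.

```python
# Hypothetical sketch of the ordered (ordinal) regression used for the second model:
# X discretized angle categories -> X-1 binary classifications per Euler angle.
import torch
import torch.nn as nn
import torch.nn.functional as F

X = 18                        # assumed number of discretized categories
BIN_DEG = 180.0 / X           # each bin spans 10 degrees of the [-90, 90] range

class OrdinalAngleHead(nn.Module):
    """Predicts X-1 'angle exceeds boundary k' decisions for one Euler angle."""
    def __init__(self, in_dim: int = 128):
        super().__init__()
        self.fc = nn.Linear(in_dim, X - 1)    # one of the 3 second fully-connected layers

    def forward(self, head_state_feature: torch.Tensor) -> torch.Tensor:
        return self.fc(head_state_feature)    # logits for the X-1 binary classifications

def ordinal_targets(angle_deg: torch.Tensor) -> torch.Tensor:
    """Encode a labeled angle in [-90, 90] degrees as X-1 binary targets."""
    bin_index = ((angle_deg + 90.0) / BIN_DEG).clamp(0, X - 1).floor()   # 0 .. X-1
    boundaries = torch.arange(1, X, dtype=torch.float32)                 # 1 .. X-1
    return (bin_index.unsqueeze(-1) >= boundaries).float()

def predicted_angle(logits: torch.Tensor) -> torch.Tensor:
    """Decode: count the boundaries judged exceeded, map the bin centre back to degrees."""
    bin_index = (logits.sigmoid() > 0.5).sum(dim=-1).float()
    return -90.0 + (bin_index + 0.5) * BIN_DEG

# Example: train the pitch-angle head on one batch of head state features.
head = OrdinalAngleHead()
features = torch.randn(4, 128)
pitch_labels = torch.tensor([-45.0, 0.0, 12.0, 60.0])
loss = F.binary_cross_entropy_with_logits(head(features), ordinal_targets(pitch_labels))
loss.backward()
print(predicted_angle(head(features)))
```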
It is understood that the head state may include only one or two of the pitch angle, the yaw angle and the roll angle, which is not limited by the embodiment of the present application. The number of the target head state features corresponds to the number and types of the angles in the head state, for example, when the head state includes a pitch angle and a yaw angle, the target head state features extracted from the target face image by the second model obtained by performing regression training based on a plurality of face images of known head states may include target pitch angle features and target yaw angle features.
Further, as shown in fig. 4C, the third model includes a third convolution module and a third fully-connected layer. When the third model is used to extract the target eye state feature of the target eye image, the target eye image can be input into the trained third model, and the corresponding target eye state feature is then extracted directly through the third convolution module of the third model. The third convolution module includes at least one convolutional layer. The third model is obtained by contrastive learning training based on a plurality of eye images with known eye states, and the loss function of the third model is a contrastive learning loss function. The eye state represents the open and closed eye condition of the user in the eye image, and includes at least a closed-eye state and an open-eye state. In the embodiment of the application, the third model can be trained in a supervised contrastive learning manner, which pulls together the eye state features of eye images in the same eye state input into the third model and pushes apart the eye state features of eye images in different eye states, so that the features corresponding to different eye states are more clearly distinguished, thereby improving the anti-interference capability of the fatigue detection process in a complex environment.
Optionally, the target eye image includes a target left-eye image and a target right-eye image, and the target eye state features include target left-eye state features and target right-eye state features, that is, the target left-eye state features of the target left-eye image and the target right-eye state features of the target right-eye image may be extracted by using the third model, so that interference and influence of other regions unrelated to the eyes in the target image on the extracted target eye state features are further reduced.
After extracting the target feature of the target image based on the face of the target user, as shown in fig. 3, the fatigue detection method further includes:
s303, generating a target characteristic sequence based on the target characteristics of the target images of the plurality of frames.
Specifically, the target feature sequence may be obtained by splicing the target features corresponding to the multiple frames of target images within the first duration in time order. The target feature sequence may be a multi-dimensional feature or a feature map, which is not limited in this embodiment of the application. The multiple frames of target images within the first duration may be all the frames constituting the video image of the first duration, or multiple frames extracted at equal intervals or randomly from the video image of the first duration, which is not limited in this embodiment of the application. The video image of the first duration may be a video image of any time period whose length is the first duration, which is not limited in this embodiment of the application. The first duration may be 0.5 s, 1 s, 2 s, and the like, which is not limited in this embodiment of the application.
Specifically, after the target features corresponding to the multiple frames of target images in the first time period are extracted, the target features of the same category corresponding to the multiple frames of target images in the first time period may be obtained by stitching according to the time sequence of the corresponding target images in the first time period. The target characteristic sequence comprises a target yawning state characteristic sequence and/or a target head state characteristic sequence and/or a target eye state characteristic sequence. The type of the feature sequence included in the target feature sequence corresponds to the type of the target feature of the multi-frame target image in the first time length. After the target yawning state features corresponding to the multiple frames of target images in the first time length are extracted, splicing the target yawning state features corresponding to the multiple frames of target images in the first time length to obtain a target yawning state feature sequence; and/or after the target head state features corresponding to the multiple frames of target images in the first time length are extracted, the target head state features corresponding to the multiple frames of target images in the first time length can be spliced to obtain a target head state feature sequence; and/or after the target eye state features corresponding to the multiple frames of target images in the first time length are extracted, the target eye state features corresponding to the multiple frames of target images in the first time length can be spliced to obtain a target eye state feature sequence.
Further, when the extracted target head state features of the multi-frame target images in the first time period include at least two of the target pitch angle feature, the target yaw angle feature and the target roll angle feature, at least two target head state features corresponding to the multi-frame target images in the first time period can be respectively spliced according to the categories, so that at least two corresponding target head state feature sequences are obtained.
Optionally, when the target left-eye state features and the target right-eye state features corresponding to the multiple frames of target images within the first time period are extracted, the target left-eye state features corresponding to the multiple frames of target images within the first time period may be spliced to obtain a target left-eye state feature sequence, and the target right-eye state features corresponding to the multiple frames of target images may be spliced to obtain a target right-eye state feature sequence.
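The splicing of per-frame target features into a target feature sequence can be sketched as follows; stacking the features along a new time axis to form a feature map is one possible realization, and the feature dimensions shown are assumed.

```python
# Hypothetical sketch of S303: splice per-frame target features, in time order,
# into a target feature sequence (here a 2-D feature map of shape frames x feature_dim).
import torch

def build_feature_sequence(per_frame_features):
    """per_frame_features: list of per-frame feature tensors, already in time order."""
    return torch.stack(per_frame_features, dim=0)

# Example: 3 frames within the first duration, one 128-d yawning state feature each,
# plus 3 per-frame left-eye state features spliced into a separate sequence.
yawn_seq = build_feature_sequence([torch.randn(128) for _ in range(3)])      # (3, 128)
left_eye_seq = build_feature_sequence([torch.randn(64) for _ in range(3)])   # (3, 64)
print(yawn_seq.shape, left_eye_seq.shape)
```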
After generating the target feature sequence based on the target features of the target images of the multiple frames in the first time period, as shown in fig. 3, the fatigue detection method further includes:
s304, inputting the target feature sequence into the state detection model, and outputting the state detection result of the target user within the first time length.
Specifically, the state detection model is obtained by training based on multiple frame images within multiple first time periods of known user states. The state detection result includes: a head state detection result and/or a yawning state detection result and/or an eye state detection result. The head state detection result includes a head posture euler angle (at least one of a head pitch angle, a head yaw angle and a head roll angle) of the target user within a first time length, the yawning state detection result includes a probability that the target user is in a yawning state within the first time length, and the eye state detection result includes a probability that the target user is in a closed-eye state within the first time length.
It can be understood that, in the embodiment of the present application, the category to which the target feature sequence belongs, that is, the state related to the target feature sequence, is the same as the user state involved in the training of the state detection model.
Specifically, when the target feature sequence includes a target yawning state feature sequence, the target yawning state feature sequence may be input into a yawning state detection model (state detection model) trained based on multi-frame images within a plurality of first durations of known yawning states (user states), so as to output the yawning state detection result of the target user within the first duration; and/or, when the target feature sequence includes a target head state feature sequence, the target head state feature sequence may be input into a head state detection model (state detection model) trained based on multi-frame images within a plurality of first durations of known head states (user states), so as to output the head state detection result of the target user within the first duration; and/or, when the target feature sequence includes a target eye state feature sequence, the target eye state feature sequence may be input into an eye state detection model (state detection model) trained based on multi-frame images within a plurality of first durations of known eye states (user states), so as to output the eye state detection result of the target user within the first duration.
For example, in the fatigue detection process, an implementation of detecting user states of the target user within the first duration, such as the eye state, the yawning state and the head state, is shown in fig. 5. If the first duration is 0.5 s and each frame corresponds to 1/12 second, a total of 6 target images are included within the first duration. After the video image of the first duration, namely 6 consecutive target images, is obtained, the target feature of every other frame can be extracted based on the face of the target user in the target image, that is, the target features of 3 frames of target images within the first duration are extracted in total. The target features of the 3 frames are then spliced according to the order of the 3 frames within the first duration to obtain a target feature sequence or target feature map, which is input into the trained state detection model to output the state detection result of the target user corresponding to the first duration. The images within the n first durations (first duration 1, first duration 2, ..., first duration n) involved in training the state detection model may be images acquired before the 6 frames of target images of the first duration involved in the state detection.
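The frame selection in the example above (a 0.5 s first duration at 12 frames per second, taking every other frame) can be written out as the following sketch; both callables are toy placeholders rather than the models of the application.

```python
# Hypothetical sketch of the fig. 5 example: a 0.5 s first duration at 12 frames per
# second gives 6 frames; features of every other frame are extracted, spliced, and
# fed to the state detection model. Both callables below are toy placeholders.
import torch

def extract_target_feature(frame: torch.Tensor) -> torch.Tensor:
    return frame.mean(dim=(1, 2))                 # toy per-frame feature (assumed)

def state_detection_model(feature_map: torch.Tensor) -> torch.Tensor:
    return torch.sigmoid(feature_map.sum())       # toy probability output (assumed)

window_frames = [torch.randn(3, 64, 64) for _ in range(6)]   # 6 frames in the first duration
sampled = window_frames[::2]                                  # every other frame -> 3 frames
feature_map = torch.stack([extract_target_feature(f) for f in sampled], dim=0)
print(state_detection_model(feature_map))         # state detection result for this window
```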
It can be understood that, when it is desired to determine the yawning state of the target user within the first duration, the target feature in fig. 5 should be a target yawning state feature, the target feature sequence should be a target yawning state feature sequence, the state detection model should be a yawning state detection model, the user state involved in training the yawning state detection model should be the yawning state, and the state detection result output by the yawning state detection model may include the probability that the target user is in the yawning state within the first duration, and may also include the probability that the target user is in the non-yawning state within the first duration.
It is to be understood that, when it is desired to determine the head state of the target user within the first duration, the target feature in fig. 5 should be a target head state feature (including at least one of the 3 features related to the head pose Euler angles, namely the target pitch angle feature, the target yaw angle feature and the target roll angle feature), the target feature sequence should be a target head state feature sequence, the state detection model should be a head state detection model, the user state involved in training the head state detection model should be a head state (including the head pose Euler angles corresponding to the target head state features), and the state detection result output by the head state detection model includes at least one of the head pitch angle, head yaw angle and head roll angle, corresponding to the categories of head pose Euler angle included in the user state.
It can be understood that, when it is desired to determine the eye state of the target user within the first time period, the target feature in fig. 5 should be a target eye state feature, the target feature sequence should be a target eye state feature sequence, and the state detection model should be an eye state detection model, the user state involved in training the eye state detection model should be an eye state, that is, whether the user is in an eye-open state or an eye-closed state within the first time period, and the state detection result output by the eye state detection model may include a probability that the target user is in the eye-closed state at the first time, and may also include a probability that the target user is in the eye-open state at the first time.
Optionally, when the target image includes a target left-eye image and a target right-eye image, the state detection model in fig. 5 may respectively obtain a left-eye state detection result and a right-eye state detection result of the target user within the first duration, that is, the probability that the left eye is in the eye-closed state and the probability that the right eye is in the eye-closed state. A weighted summation of the left-eye state detection result and the right-eye state detection result, for example, but not limited to, averaging the two closed-eye probabilities, may then be performed to obtain the eye state detection result of the target user within the first duration, that is, the probability that the target user is in the eye-closed state.
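As a small sketch of this left/right-eye fusion, assuming equal weights (the weight values are not specified in the application):

```python
def fuse_eye_states(p_left_closed: float, p_right_closed: float,
                    w_left: float = 0.5, w_right: float = 0.5) -> float:
    # Weighted sum of the per-eye closed-eye probabilities;
    # equal weights reduce to a simple average.
    return w_left * p_left_closed + w_right * p_right_closed

# Example: left eye closed with probability 0.8, right eye 0.6
p_eye_closed = fuse_eye_states(0.8, 0.6)   # -> 0.7
```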
It can be understood that, in the embodiment of the present application, each state detection result corresponds to the first duration, rather than to the target image of a single frame, so as to avoid the problem of weak anti-interference capability for a complex environment change when determining the user state according to a single frame image, improve stability and accuracy of user state detection, thereby enhancing effectiveness and anti-interference capability of fatigue detection, and further improving accuracy of fatigue detection.
After determining the state detection results corresponding to a plurality of consecutive first time periods within the second time period, as shown in fig. 3, the fatigue detection method further includes:
S305, determining fatigue detection results of the target user according to state detection results corresponding to a plurality of continuous first time periods in the second time period.
Specifically, the second duration is equal to the total of the plurality of first durations; that is, the second duration may be composed of the plurality of first durations, and the time period corresponding to the second duration is obtained by splicing the time periods corresponding to the first durations in chronological order.
In the embodiment of the application, compared with state detection or fatigue detection based on a single-frame target image, which is easily affected by a complex environment and therefore less accurate, the present application determines the states of the target user within a first duration, such as the eye state, head state and yawning state, from a target feature sequence generated from the target features of multiple frames of target images within the first duration, and determines the fatigue detection result of the target user within the second duration according to the states corresponding to a plurality of consecutive first durations within the second duration. This improves the anti-interference capability of fatigue detection under complex environmental changes, and thus improves the stability and accuracy of fatigue detection.
Optionally, the fatigue detection result includes a fatigue degree distribution result; the fatigue degree distribution result comprises at least one fatigue degree of the target user in the second time period and the probability corresponding to the fatigue degree. As shown in fig. 6, the implementation process of determining the fatigue detection result of the target user in S305 may include several steps:
S601, extracting fatigue characteristic information in the state detection results corresponding to a plurality of continuous first time periods in a second time period.
Specifically, after the state detection results corresponding to a plurality of consecutive first durations within the second duration are obtained, the fatigue feature information of the target user within the second duration may be extracted from these state detection results. The fatigue feature information includes at least one of the following: the total eye-closing duration of the target user within the second duration, the average duration of each eye closure, the blink frequency, the total yawning duration, the mean and variance of the head pose Euler angles, the percentage of head pitch angles smaller than the first threshold, and the percentage of head yaw angles smaller than the second threshold or larger than the third threshold.
Further, since a person's head tends to droop or sway from side to side when the person is fatigued, the fatigue feature information within the second duration may include, but is not limited to: the mean and variance of the head pitch angles and/or head yaw angles and/or head roll angles of the target user over the first durations; the percentage of first durations whose corresponding head pitch angle is smaller than the first threshold (corresponding to the target user being in a head-down state within the second duration); the percentage of first durations whose corresponding head yaw angle is smaller than the second threshold or larger than the third threshold (corresponding to the target user's head turning or swaying to the left or right within the second duration); and the percentage of first durations whose corresponding head roll angle is smaller than the fourth threshold or larger than the fifth threshold (corresponding to the target user's head tilting to the left or right within the second duration). The first threshold may be -10 degrees, 0 degrees, etc., the second threshold may be -30 degrees, -10 degrees, etc., the third threshold may be 30 degrees, 20 degrees, etc., the fourth threshold may be -40 degrees, -10 degrees, etc., and the fifth threshold may be 35 degrees, 25 degrees, etc., which is not limited in the embodiment of the present application.
Further, because yawning often occurs when a person is fatigued, and the more fatigued the person is, the higher the frequency or the longer the duration of yawning, when the state detection result includes the probability that the target user is in the yawning state within the first duration, that is, the yawning state detection result, the fatigue feature information within the second duration may include, but is not limited to, the number of times or the total duration of yawning of the target user within the second duration. In the embodiment of the application, but not limited to, when the probability of being in the yawning state corresponding to a first duration is greater than the yawning probability threshold, the target user is considered to be in the yawning state within that first duration; when the probability is less than or equal to the yawning probability threshold, the target user is considered to be in the non-yawning state within that first duration. The yawning probability threshold may be 0.5, 0.6, and the like, which is not limited in the embodiment of the present application. The number of times of yawning within the second duration may be counted as the number of pairs of adjacent first durations within the second duration in which the target user is in the non-yawning state in the former and in the yawning state in the latter. The total yawning duration within the second duration may be the sum of all first durations within the second duration in which the target user is in the yawning state.
Further, since a person tends to doze off with eyes closed or to blink more slowly when fatigued, when the state detection result includes the probability that the target user is in the eye-closed state within the first duration, that is, the eye state detection result, the fatigue feature information within the second duration may include, but is not limited to, the total eye-closing duration of the target user within the second duration, the average duration of each eye closure, the blink frequency, and the like. In the embodiment of the present application, but not limited to, when the probability of being in the eye-closed state corresponding to a first duration is greater than the eye-closing probability threshold, the target user is considered to be in the eye-closed state within that first duration; when the probability is less than or equal to the eye-closing probability threshold, the target user is considered to be in the eye-open state within that first duration. The eye-closing probability threshold may be 0.68, 0.7, etc., which is not limited in this embodiment of the application. The total eye-closing duration within the second duration may be the sum of all first durations within the second duration in which the target user is in the eye-closed state. The number of eye closures within the second duration may be counted as the number of pairs of adjacent first durations within the second duration in which the target user is in the eye-open state in the former and in the eye-closed state in the latter. The average duration of each eye closure within the second duration is equal to the total eye-closing duration within the second duration divided by the number of eye closures within the second duration. The blink frequency within the second duration may be equal to the number of eye closures within the second duration divided by the second duration.
For example, as shown in fig. 7, the second duration may include 3 consecutive first durations. Suppose the state detection results of the target user in the 1st first duration include a yawning probability of 0.3, an eye-closing probability of 0.4 and a pitch angle of -3 degrees; in the 2nd first duration a yawning probability of 0.6, an eye-closing probability of 0.7 and a pitch angle of -15 degrees; and in the 3rd first duration a yawning probability of 0.8, an eye-closing probability of 0.9 and a pitch angle of -30 degrees. If the yawning probability threshold is 0.5, it may be determined that the target user is in a non-yawning state within the 1st first duration and in a yawning state within the 2nd and 3rd first durations, so the total yawning duration of the target user within the second duration is the sum of those 2 first durations; for example, if the first duration is 1 s and the second duration is 3 s, the total yawning duration of the target user within the 3 s is 2 s. If the eye-closing probability threshold is 0.6, it may be determined that the target user is in the eye-open state within the 1st first duration and in the eye-closed state within the 2nd and 3rd first durations, so the total eye-closing duration of the target user within the second duration is likewise the sum of those 2 first durations, i.e., 2 s when the first duration is 1 s and the second duration is 3 s. If the first threshold is -10 degrees, the head pitch angle within the 1st first duration is not smaller than the first threshold, that is, the target user is not in a head-down state, while the head pitch angles within the 2nd and 3rd first durations are both smaller than the first threshold, that is, the target user is in a head-down state in those durations, so the percentage of head pitch angles smaller than the first threshold within the second duration is (2/3) × 100%. The total yawning duration, the total eye-closing duration and the percentage of head pitch angles smaller than the first threshold determined in this way from the state detection results corresponding to the 3 consecutive first durations may be used as the fatigue feature information of the target user within the second duration.
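The arithmetic in this example can be reproduced with a short sketch; the per-duration result layout below simply mirrors the numbers above, and the transition-based blink count illustrates one way of counting state changes consistent with the description.

```python
# Per-first-duration state detection results taken from the example above
results = [
    {"p_yawn": 0.3, "p_eye_closed": 0.4, "pitch_deg": -3},
    {"p_yawn": 0.6, "p_eye_closed": 0.7, "pitch_deg": -15},
    {"p_yawn": 0.8, "p_eye_closed": 0.9, "pitch_deg": -30},
]
FIRST_DURATION_S = 1.0   # each first duration is 1 s, so the second duration is 3 s
YAWN_THRESH, EYE_THRESH, PITCH_THRESH = 0.5, 0.6, -10.0

yawn_flags = [r["p_yawn"] > YAWN_THRESH for r in results]
eye_flags = [r["p_eye_closed"] > EYE_THRESH for r in results]
head_down_flags = [r["pitch_deg"] < PITCH_THRESH for r in results]

total_yawn_s = sum(yawn_flags) * FIRST_DURATION_S            # 2.0 s
total_eye_closed_s = sum(eye_flags) * FIRST_DURATION_S       # 2.0 s
head_down_pct = 100.0 * sum(head_down_flags) / len(results)  # 66.7 %

# Eye closures counted as open -> closed transitions between adjacent first durations
blink_count = sum(1 for a, b in zip(eye_flags, eye_flags[1:]) if not a and b)  # 1
print(total_yawn_s, total_eye_closed_s, head_down_pct, blink_count)
```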
After extracting the fatigue feature information in the second time period, as shown in fig. 6, the implementation flow of determining the fatigue detection result of the target user in S305 further includes:
S602, inputting the fatigue feature information into a fourth model, and outputting a fatigue degree distribution result of the target user within the second duration, wherein the fourth model is obtained by training based on a plurality of video images of the second duration with known fatigue degrees, and the fatigue degree of a video image of the second duration includes the fatigue degrees annotated for the video image by a plurality of annotating personnel.
Specifically, after extracting the fatigue feature information in the state detection results corresponding to a plurality of consecutive first time periods within the second time period, the fatigue feature information may be directly input into the trained fourth model, so as to output the fatigue degree distribution result of the target user within the second time period. The fatigue degree may include the degree of no fatigue, light fatigue, and severe fatigue, and may also include multiple fatigue levels, such as level 0 fatigue, level 1 fatigue, and level 2 fatigue, which is not limited in the embodiments of the present application. The fatigue degree of the video image in the second time duration used in the training of the fourth model may include a fatigue degree probability distribution result obtained by performing statistics on the fatigue degree of the video image annotation by a plurality of annotating personnel and then normalizing the statistics.
Specifically, because each person's internal criterion for judging the degree of fatigue differs, in the embodiment of the present application the fourth model is made to learn the fatigue degree distributions corresponding to a plurality of video images of the second duration, so that it learns how a plurality of annotating personnel judge the fatigue degree of the user in the same video image. This improves the accuracy of fatigue detection and makes the fatigue detection result better match the overall cognition of the crowd.
Exemplarily, when the fatigue degree is divided into N levels in total, the loss function of the above fourth model may take a cumulative-distribution (earth mover's distance) form:

L = [ (1/N) · Σ_{k=1}^{N} | CDF_p(k) − CDF_p̂(k) |^r ]^(1/r)

wherein p represents the fatigue degree probability distribution predicted for the video image, p̂ represents the fatigue degree probability distribution annotated for the video image, CDF_p(k) represents the cumulative value of the probabilities of the first k fatigue degrees of the predicted distribution, CDF_p̂(k) represents the cumulative value of the probabilities of the first k fatigue degrees of the annotated distribution, and r is a hyperparameter.
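Under the assumption that the loss takes the earth mover's distance form sketched above, a minimal NumPy version could look like this; the exponent handling and the level names are assumptions, not values given in the application.

```python
import numpy as np

def emd_loss(p_pred: np.ndarray, p_label: np.ndarray, r: float = 2.0) -> float:
    # Earth mover's distance between two fatigue-degree distributions over N levels.
    # p_pred and p_label are probability vectors of length N that each sum to 1;
    # r is the hyperparameter penalizing differences between the two CDFs.
    cdf_pred = np.cumsum(p_pred)
    cdf_label = np.cumsum(p_label)
    return float(np.mean(np.abs(cdf_pred - cdf_label) ** r) ** (1.0 / r))

# Example: predicted vs. annotator-derived distribution over 3 levels (no/light/severe fatigue)
loss = emd_loss(np.array([0.6, 0.3, 0.1]), np.array([0.2, 0.5, 0.3]))
```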
For example, as shown in fig. 8, when the second duration includes 3 consecutive first durations, after the state detection results of the target user in the 3 first durations are determined in the manner of S304 in fig. 3, the fatigue feature information corresponding to the target user within the second duration may be extracted from the state detection results corresponding to the 3 consecutive first durations in the manner of S601 in fig. 6, and the fatigue feature information is then input into the trained fourth model to obtain the fatigue degree distribution result of the target user within the second duration, for example, a probability A that the target user is not fatigued, a probability B of slight fatigue and a probability C of severe fatigue within the second duration. When the fatigue detection process shown in fig. 8 is applied to a scene such as assisted driving, if the output probability C of severe fatigue is greater than a target fatigue probability threshold (for example, but not limited to, 0.6, 0.7, etc.), alarm information may be sent immediately to prompt that the target user is driving while fatigued and therefore poses a safety risk. As shown in fig. 8, in order to improve the accuracy of fatigue detection and make the fatigue detection result better match the overall cognition of the crowd, before fatigue detection is performed, K annotating personnel may annotate the fatigue degree of y video images of the second duration, so that each video image of the second duration carries the fatigue degrees annotated by the K annotating personnel, that is, the label corresponding to each video image of the second duration is a fatigue degree probability distribution result. The video images of the second duration with known fatigue degree distribution results are then used as training data to train the fourth model, so that the fourth model can learn the annotating personnel's overall cognition of the fatigue degree, improving the effectiveness and accuracy of fatigue detection. The plurality of video images of the second duration may be a plurality of video images of the same duration but of different time periods, or of the same duration and the same time period but of different users, and the like, which is not limited in the embodiment of the present application.
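A short sketch of how the per-video fatigue-degree label distribution described above could be built from the K annotators' labels (the level names are illustrative assumptions):

```python
from collections import Counter

def label_distribution(annotations: list, levels: list) -> list:
    # Normalize the K annotators' fatigue labels for one second-duration video
    # into a probability distribution over the fatigue levels.
    counts = Counter(annotations)
    k = len(annotations)
    return [counts.get(level, 0) / k for level in levels]

levels = ["no_fatigue", "light_fatigue", "severe_fatigue"]
# 5 annotators labelled the same video: 1 x no, 3 x light, 1 x severe -> [0.2, 0.6, 0.2]
dist = label_distribution(["no_fatigue", "light_fatigue", "light_fatigue",
                           "light_fatigue", "severe_fatigue"], levels)
```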
Optionally, after S601, that is, after extracting the fatigue feature information in the state detection results corresponding to a plurality of consecutive first durations within the second duration, it may also be directly determined whether the fatigue feature information within the second duration meets a preset fatigue condition. If it does, the target user is determined to be in a fatigue state within the second duration; if it does not, the target user is determined to be in a non-fatigue state within the second duration. The preset fatigue condition may be, but is not limited to, the total eye-closing duration or the total yawning duration of the target user within the second duration being greater than 1/3 of the second duration, or the percentage of head pitch angles smaller than the first threshold within the second duration being greater than 60%, and the like, which is not limited in the embodiment of the present application.
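As a sketch of such a direct rule-based check, assuming condition thresholds in line with the examples above (these values are illustrative only):

```python
def is_fatigued(total_eye_closed_s: float, total_yawn_s: float,
                head_down_pct: float, second_duration_s: float) -> bool:
    # Preset fatigue condition: eye closing or yawning occupies more than 1/3 of the
    # second duration, or the head is down for more than 60% of the first durations.
    return (total_eye_closed_s > second_duration_s / 3
            or total_yawn_s > second_duration_s / 3
            or head_down_pct > 60.0)

# Using the values from the earlier example: 2 s closed eyes, 2 s yawning, 66.7% head-down over 3 s
fatigued = is_fatigued(2.0, 2.0, 66.7, 3.0)   # -> True
```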
Optionally, after the fatigue feature information of the target user within the second duration is extracted in S601, instead of detecting the fatigue state of the target user within the second duration through the fourth model, the fatigue feature information within the second duration may also be directly matched against the feature information corresponding to each preset fatigue degree. When the fatigue feature information within the second duration matches the feature information corresponding to a certain preset fatigue degree, the fatigue degree corresponding to the matched feature information may be determined as the fatigue degree of the target user within the second duration.
Alternatively, after S303, the state of the target user may not be detected, but a target feature sequence generated based on the target features of the multiple frames of target images in the first time period is directly input into the trained fatigue state detection model, so as to output the fatigue state detection result of the target user in the first time period. The fatigue state detection result may include a probability that the target user is in a fatigue state within the first time period, or may include a plurality of fatigue degrees that the target user may be in within the first time period and a probability corresponding to each fatigue degree, that is, a fatigue degree distribution situation. The training process of the fatigue state detection model is similar to that of the fourth model in fig. 8, and details are not repeated here.
Next, refer to fig. 9, which is a flowchart illustrating another fatigue detection method according to an exemplary embodiment of the present application. As shown in fig. 9, the fatigue detection method includes the following steps:
S901, obtaining a plurality of frames of target images within a first duration, wherein the target images include the target user.
Specifically, S901 is identical to S301, and is not described herein again.
S902, extracting target features of the target image based on the face of the target user.
Specifically, S902 is identical to S302, and is not described herein again.
And S903, generating a target characteristic sequence based on the target characteristics of the target images of the multiple frames.
Specifically, S903 is identical to S303, and is not described herein again.
And S904, performing dimension reduction processing on the target feature sequence to obtain a first target feature sequence.
Specifically, after the target features of the multiple frames of target images within the first duration are spliced into the target feature sequence, the target feature sequence may be subjected to a dimension reduction process, so as to remove noise in the target feature sequence, and obtain the first target feature sequence. The dimension reduction processing may be, but not limited to, a Principal Component Analysis (PCA) dimension reduction method or the like.
S905, reducing the first target characteristic sequence to the dimension same as that of the target characteristic sequence to obtain a second target characteristic sequence.
Specifically, after the target feature sequence is subjected to dimensionality reduction processing to obtain a first target feature sequence with noise removed, in order to ensure that the state detection model can output a state detection result of the target user within a first duration based on the feature sequence with noise removed, the first target feature sequence with noise removed needs to be reduced to a dimensionality the same as that of the target feature sequence, so as to obtain a second target feature sequence.
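A minimal sketch of the reduce-then-restore denoising in S904-S905, using scikit-learn's PCA as one possible dimension reduction method (the component count and feature shapes are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

# Target feature sequence: e.g. 3 sampled frames x 128-dimensional features within the first duration
target_sequence = np.random.rand(3, 128)

# S904: project onto a lower-dimensional subspace, discarding components treated as noise
pca = PCA(n_components=2)
first_sequence = pca.fit_transform(target_sequence)

# S905: map back to the original dimensionality; the discarded components stay removed
second_sequence = pca.inverse_transform(first_sequence)
assert second_sequence.shape == target_sequence.shape
# second_sequence is then fed to the state detection model in place of target_sequence
```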
S906, inputting the second target feature sequence into the state detection model, and outputting the state detection result of the target user within the first time length.
Specifically, the implementation process of S906 is consistent with that of S304, and is not described herein again.
And S907, determining fatigue detection results of the target user according to the state detection results corresponding to the plurality of continuous first time periods within the second time period.
Specifically, S907 is identical to S305, and is not described herein again.
In the embodiment of the application, the target features of multiple frames of target images within a first duration are spliced into a target feature sequence, which is subjected to dimensionality reduction to obtain a first target feature sequence and then restored to obtain a second target feature sequence, thereby removing noise from the feature sequence and reducing the influence of external environmental factors. The denoised second target feature sequence is then input into the state detection model to output the state detection result of the target user within the first duration, and the fatigue detection result of the target user is determined according to the state detection results corresponding to a plurality of consecutive first durations within the second duration. Detecting the state of the target user from the denoised second target feature sequence corresponding to each first duration, and determining the fatigue condition of the target user over a plurality of consecutive first durations, further improves the anti-interference capability of fatigue detection, and thus the effectiveness of state detection and the accuracy of fatigue detection.
Referring to fig. 10, fig. 10 is a fatigue detecting device according to an exemplary embodiment of the present disclosure. The fatigue detection device 1000 includes:
the acquiring module 1010 is configured to acquire a plurality of frames of target images within a first time period; the target image comprises a target user;
a feature extraction module 1020 for extracting a target feature of the target image based on the face of the target user;
a generating module 1030, configured to generate a target feature sequence based on target features of the multiple frames of target images;
the state detection module 1040 is configured to input the target feature sequence into a state detection model, and output a state detection result of the target user within the first duration; the state detection model is obtained by training based on multi-frame images in a plurality of first time lengths of known user states;
a determining module 1050, configured to determine a fatigue detection result of the target user according to the state detection results corresponding to a plurality of consecutive first time periods within a second time period; the second duration is equal to a total duration of the plurality of first durations.
In one possible implementation, the above target features include at least one of: the target yawning state feature, the target head state feature and the target eye state feature;
the feature extraction module 1020 includes:
the determining unit is used for determining a target face image and/or a target eye image of the target user based on a preset detection algorithm and the target image;
the first extraction unit is used for extracting the target yawning state characteristic of the target face image by using a first model; and/or
A second extraction unit for extracting a target head state feature of the target face image using a second model; and/or
A third extraction unit configured to extract a target eye state feature of the target eye image using a third model; the first model is obtained by performing contrastive learning training based on a plurality of face images with known yawning states; the second model is obtained by performing regression training based on a plurality of face images with known head states; the third model is obtained by performing contrastive learning training based on a plurality of eye images with known eye states.
In a possible implementation manner, the target feature sequence includes a target yawning state feature sequence and/or a target head state feature sequence and/or a target eye state feature sequence.
In one possible implementation, the target eye image includes a target left-eye image and a target right-eye image; the target eye state features comprise target left eye state features and target right eye state features; the target eye state feature sequence comprises a target left eye state feature sequence and a target right eye state feature sequence.
In a possible implementation manner, the fatigue detection apparatus 1000 further includes:
the dimensionality reduction processing module is used for carrying out dimensionality reduction processing on the target feature sequence to obtain a first target feature sequence;
the restoring module is used for restoring the first target characteristic sequence to the dimension which is the same as that of the target characteristic sequence to obtain a second target characteristic sequence;
the state detection module 1040 is specifically configured to: and inputting the second target characteristic sequence into a state detection model, and outputting a state detection result of the target user within the first time length.
In a possible implementation manner, the state detection result includes: a head state detection result and/or a yawning state detection result and/or an eye state detection result; the head state detection result comprises a head posture Euler angle of the target user within the first time length; the detection result of the yawning state comprises the probability that the target user is in the yawning state within the first duration; the eye state detection result includes a probability that the target user is in an eye-closing state within the first duration.
In a possible implementation manner, the fatigue detection result includes a fatigue degree distribution result; the fatigue degree distribution result comprises at least one fatigue degree of the target user in the second time period and the probability corresponding to the fatigue degree; the determining module 1050 includes:
a fourth extracting unit, configured to extract fatigue feature information in the state detection result corresponding to multiple consecutive first time periods within a second time period;
a fatigue degree distribution detection unit, configured to input the fatigue feature information into a fourth model, and output a fatigue degree distribution result of the target user in the second time period; the fourth model is obtained by training based on a plurality of video images with known fatigue degrees and a second duration; the fatigue degree of the video image of the second duration comprises the fatigue degree of a plurality of annotators for annotating the video image.
In a possible implementation manner, the fatigue feature information includes at least one of the following: the total eye-closing duration of the target user within the second duration, the average duration of each eye closure, the blink frequency, the total yawning duration, the mean and variance of the head pose Euler angles, the percentage of head pitch angles smaller than the first threshold, and the percentage of head yaw angles smaller than the second threshold or larger than the third threshold.
The division of the modules in the fatigue detection apparatus is only for illustration, and in other embodiments, the fatigue detection apparatus may be divided into different modules as needed to complete all or part of the functions of the fatigue detection apparatus. The implementation of each module in the fatigue detection apparatus provided in the embodiments of the present specification may be in the form of a computer program. The computer program may be run on a terminal or a server. The program modules formed by the computer program may be stored on the memory of the terminal or the server. The computer program, when executed by a processor, implements all or part of the steps of the fatigue detection method described in the embodiments of the present specification.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a vehicle according to an exemplary embodiment of the present application. As shown in fig. 11, the vehicle 1100 may include: at least one processor 1110, at least one communication bus 1120, a user interface 1130, at least one network interface 1140, and memory 1150.
The communication bus 1120 can be used for realizing the connection communication of the above components.
User interface 1130 may include a Display screen (Display) and a Camera (Camera), and user interface 1130 may also include a standard wired interface or a wireless interface. The camera may be used to capture the face of a target user (driver) driving the vehicle 1100, resulting in a target image.
The network interface 1140 may include a bluetooth module, a Near Field Communication (NFC) module, a Wireless Fidelity (Wi-Fi) module, and the like.
Processor 1110 may include one or more processing cores. Processor 1110 connects various components throughout the vehicle 1100 using various interfaces and lines, and performs various functions of the vehicle 1100 and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 1150 and invoking data stored in the memory 1150. Optionally, the processor 1110 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA) and Programmable Logic Array (PLA). The processor 1110 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is responsible for rendering and drawing the content to be displayed on the display screen; the modem is used to handle wireless communications. It can be understood that the modem may also not be integrated into the processor 1110 and may instead be implemented by a separate chip.
The Memory 1150 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 1150 includes non-transitory computer-readable media. The memory 1150 may be used to store instructions, programs, code, sets of codes, or sets of instructions. The memory 1150 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as an acquire function, a state detection function, etc.), instructions for implementing the various method embodiments described above, and the like; the storage data area may store data and the like referred to in the above respective method embodiments. Memory 1150 may alternatively be at least one storage device located remotely from the aforementioned processor 1110. As shown in fig. 11, the memory 1150, which is one type of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and program instructions.
In particular, the processor 1110 may be configured to invoke program instructions stored in the memory 1150 and specifically perform the following operations: acquiring a plurality of frames of target images within a first time length; the target image comprises a target user; extracting target features of the target image based on the face of the target user; generating a target characteristic sequence based on the target characteristics of the multi-frame target images; inputting the target characteristic sequence into a state detection model, and outputting a state detection result of the target user within the first duration; the state detection model is obtained by training based on multi-frame images in a plurality of first time lengths of known user states.
Determining fatigue detection results of the target user according to the state detection results corresponding to a plurality of continuous first time periods within a second time period; the second duration is equal to a total duration of the plurality of first durations.
In some possible embodiments, the target feature includes at least one of: the target yawning state feature, the target head state feature and the target eye state feature; the processor 1110, when executing the extraction of the target feature of the target image based on the face of the target user, is specifically configured to: determine a target face image and/or a target eye image of the target user based on a preset detection algorithm and the target image; extract the target yawning state feature of the target face image using a first model; and/or extract the target head state feature of the target face image using a second model; and/or extract the target eye state feature of the target eye image using a third model; the first model is obtained by performing contrastive learning training based on a plurality of face images with known yawning states; the second model is obtained by performing regression training based on a plurality of face images with known head states; the third model is obtained by performing contrastive learning training based on a plurality of eye images with known eye states.
In some possible embodiments, the target feature sequence includes a target yawning state feature sequence and/or a target head state feature sequence and/or a target eye state feature sequence.
In some possible embodiments, the target eye image includes a target left eye image and a target right eye image; the target eye state characteristics comprise target left eye state characteristics and target right eye state characteristics; the target eye state feature sequence comprises a target left eye state feature sequence and a target right eye state feature sequence.
In some possible embodiments, after the processor 1110 performs generating a target feature sequence based on the target features of the multiple frames of target images, before inputting the target feature sequence into a state detection model and outputting a state detection result of the target user within the first time period, the processor is further configured to perform:
performing dimensionality reduction processing on the target feature sequence to obtain a first target feature sequence; and reducing the first target characteristic sequence to the dimension same as that of the target characteristic sequence to obtain a second target characteristic sequence.
The processor 1110 is specifically configured to perform, when the target feature sequence is input into a state detection model and a state detection result of the target user within the first duration is output: and inputting the second target characteristic sequence into a state detection model, and outputting a state detection result of the target user within the first time length.
In some possible embodiments, the state detection result includes: a head state detection result and/or a yawning state detection result and/or an eye state detection result; the head state detection result comprises a head posture Euler angle of the target user within the first time length; the detection result of the yawning state comprises the probability that the target user is in the yawning state within the first duration; the eye state detection result includes a probability that the target user is in an eye-closing state within the first duration.
In some possible embodiments, the fatigue detection result includes a fatigue degree distribution result; the fatigue degree distribution result comprises at least one fatigue degree of the target user in the second time period and the probability corresponding to the fatigue degree;
when the processor 1110 determines the fatigue detection result of the target user according to the state detection results corresponding to a plurality of consecutive first time periods within a second time period, it is specifically configured to perform: extracting fatigue characteristic information in the state detection results corresponding to a plurality of continuous first time spans in a second time span; inputting the fatigue characteristic information into a fourth model, and outputting a fatigue degree distribution result of the target user in the second time period; the fourth model is obtained by training based on a plurality of video images with known fatigue degrees and a second duration; the fatigue degree of the video image of the second duration comprises the fatigue degree of a plurality of annotating personnel for annotating the video image.
In some possible embodiments, the fatigue characteristic information includes at least one of the following: the total eye-closing duration of the target user within the second duration, the average duration of each eye closure, the blink frequency, the total yawning duration, the mean and variance of the head pose Euler angles, the percentage of head pitch angles smaller than the first threshold, and the percentage of head yaw angles smaller than the second threshold or larger than the third threshold.
Embodiments of the present application further provide a computer storage medium having stored therein instructions, which when executed on a computer or processor, cause the computer or processor to perform one or more steps of any of the above methods. The respective constituent modules of the above-described fatigue detection apparatus may be stored in the storage medium if they are implemented in the form of software functional units and sold or used as independent products.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), among others.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by instructing relevant hardware by a computer program, and the program may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. And the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks. The technical features in the present examples and embodiments may be arbitrarily combined without conflict.
The above-described embodiments are only preferred embodiments of the present application, and are not intended to limit the scope of the present application, and various modifications and improvements made to the technical solutions of the present application by those skilled in the art without departing from the design spirit of the present application should fall within the protection scope defined by the claims of the present application.

Claims (10)

1. A method of fatigue detection, the method comprising:
acquiring a plurality of frames of target images within a first time length; the target image comprises a target user;
extracting target features of the target image based on the face of the target user;
generating a target feature sequence based on the target features of the target images of the multiple frames;
inputting the target characteristic sequence into a state detection model, and outputting a state detection result of the target user within the first time length; the state detection model is obtained by training based on multi-frame images in a plurality of first time lengths of known user states;
determining fatigue detection results of the target user according to the state detection results corresponding to a plurality of continuous first time periods within a second time period; the second duration is equal to a total duration of the plurality of first durations.
2. The method of claim 1, wherein the target feature comprises at least one of: the target yawning state characteristic, the target head state characteristic and the target eye state characteristic;
the extracting the target feature of the target image based on the face of the target user comprises:
determining a target face image and/or a target eye image of the target user based on a preset detection algorithm and the target image;
extracting a target yawning state feature of the target face image by using a first model; and/or
Extracting a target head state feature of the target face image by using a second model; and/or
Extracting the target eye state characteristics of the target eye image by using a third model;
the first model is obtained by performing contrastive learning training based on a plurality of face images with known yawning states; the second model is obtained by carrying out regression training based on a plurality of face images with known head states; and the third model is obtained by performing contrastive learning training based on a plurality of eye images with known eye states.
3. The method of claim 2, in which the target feature sequence comprises a target yawning state feature sequence and/or a target head state feature sequence and/or a target eye state feature sequence.
4. The method of any one of claims 1-3, further comprising:
performing dimensionality reduction processing on the target feature sequence to obtain a first target feature sequence;
reducing the first target characteristic sequence to the dimension same as that of the target characteristic sequence to obtain a second target characteristic sequence;
the inputting the target feature sequence into a state detection model and outputting a state detection result of the target user within the first duration includes:
and inputting the second target feature sequence into a state detection model, and outputting a state detection result of the target user within the first time length.
5. The method of any one of claims 1-3, wherein the status detection result comprises: a head state detection result and/or a yawning state detection result and/or an eye state detection result; the head state detection result comprises a head posture Euler angle of the target user within the first duration; the detection result of the yawning state comprises the probability that the target user is in the yawning state within the first duration; the eye state detection result comprises the probability that the target user is in the eye closing state within the first duration.
6. The method of claim 1, wherein the fatigue detection results comprise fatigue degree distribution results; the fatigue degree distribution result comprises at least one fatigue degree of the target user in the second time length and the probability corresponding to the fatigue degree;
the determining the fatigue detection result of the target user according to the state detection results corresponding to a plurality of continuous first time periods within a second time period includes:
extracting fatigue characteristic information in the state detection result corresponding to a plurality of continuous first time spans in a second time span;
inputting the fatigue characteristic information into a fourth model, and outputting a fatigue degree distribution result of the target user in the second time period; the fourth model is obtained by training based on a plurality of video images with known fatigue degrees and second duration; the fatigue degree of the video image of the second duration comprises the fatigue degree of a plurality of annotating personnel on the annotation of the video image.
7. The method of claim 6, wherein the fatigue characteristic information comprises at least one of: the total eye-closing duration of the target user within the second duration, the average duration of each eye closure, the blink frequency, the total yawning duration, the mean and variance of the head pose Euler angles, the percentage of head pitch angles smaller than the first threshold, and the percentage of head yaw angles smaller than the second threshold or larger than the third threshold.
8. A fatigue detecting device, comprising:
the acquisition module is used for acquiring multi-frame target images within a first time period; the target image comprises a target user;
a feature extraction module for extracting a target feature of the target image based on the face of the target user;
the generating module is used for generating a target characteristic sequence based on the target characteristics of the multi-frame target images;
the state detection module is used for inputting the target feature sequence into a state detection model and outputting a state detection result of the target user within the first time length; the state detection model is obtained by training based on multi-frame images in a plurality of first time lengths of known user states;
the determining module is used for determining the fatigue detection result of the target user according to the state detection results corresponding to a plurality of continuous first time periods within a second time period; the second duration is equal to a total duration of the plurality of first durations.
9. The apparatus of claim 8, wherein the fatigue detection results comprise fatigue degree distribution results; the fatigue degree distribution result comprises at least one fatigue degree of the target user in the second time length and the probability corresponding to the fatigue degree;
the determining module comprises:
the extraction unit is used for extracting fatigue characteristic information in the state detection result corresponding to a plurality of continuous first time lengths in a second time length;
the fatigue degree distribution detection unit is used for inputting the fatigue characteristic information into a fourth model and outputting a fatigue degree distribution result of the target user in the second time length; the fourth model is obtained by training based on a plurality of video images with known fatigue degrees and second duration; the fatigue degree of the video image of the second duration comprises the fatigue degree of a plurality of annotating personnel on the annotation of the video image.
10. A vehicle, characterized by comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps according to any of claims 1-7.
CN202211346102.XA 2022-10-31 2022-10-31 Fatigue detection method and device and vehicle Pending CN115761705A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211346102.XA CN115761705A (en) 2022-10-31 2022-10-31 Fatigue detection method and device and vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211346102.XA CN115761705A (en) 2022-10-31 2022-10-31 Fatigue detection method and device and vehicle

Publications (1)

Publication Number Publication Date
CN115761705A true CN115761705A (en) 2023-03-07

Family

ID=85354480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211346102.XA Pending CN115761705A (en) 2022-10-31 2022-10-31 Fatigue detection method and device and vehicle

Country Status (1)

Country Link
CN (1) CN115761705A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination