CN113591699A - Online visual fatigue detection system and method based on deep learning - Google Patents


Info

Publication number
CN113591699A
CN113591699A
Authority
CN
China
Prior art keywords
network
data
eye movement
depth
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110869724.XA
Other languages
Chinese (zh)
Other versions
CN113591699B (en)
Inventor
牛毅
张子楠
马明明
李甫
石光明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202110869724.XA priority Critical patent/CN113591699B/en
Publication of CN113591699A publication Critical patent/CN113591699A/en
Application granted granted Critical
Publication of CN113591699B publication Critical patent/CN113591699B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an online visual fatigue detection system based on deep learning, which mainly addresses three problems of the prior art: the single type of operator information collected, the poor real-time performance of system operation, and the low accuracy of fatigue detection. The system comprises a data acquisition module, an image data processing module, and a fatigue detection module. The data acquisition module is arranged directly below a computer display and acquires eye movement data, RGB (red, green, blue) images, and depth information. The image data processing module detects the face position and facial feature points in the image data and extracts the depth information of the feature points. The fatigue detection module performs feature extraction, fusion, and classification on the eye movement data, facial feature point data, and depth data, and outputs the fatigue degree of the operator. By using a non-contact method, the invention reduces the influence on the operator's working state, avoids manual feature design, and improves the accuracy of visual fatigue detection. The system can be used to detect the visual fatigue level of an operator online and in real time.

Description

Online visual fatigue detection system and method based on deep learning
Technical Field
The invention belongs to the technical field of computer vision and video analysis, and further relates to an online visual fatigue detection system and method that can be used for online, real-time detection of an operator's visual fatigue level.
Background
With continuing social development, computer use has spread to every industry: more and more posts require practitioners to be skilled in computer-related work and to use computers for long periods. Such work involves little physical labor and monotonous, repetitive content, which easily induces visual fatigue in the operator, reduces working capacity, and greatly lowers working efficiency. Moreover, when a person remains in a state of visual fatigue for a long time, symptoms such as inattention, dry eyes, and dizziness easily follow. Therefore, how to detect the visual fatigue of an operator and intervene effectively in time, so that the operator can always maintain a good working state and task performance, is a matter of real concern. Existing visual fatigue detection approaches are mainly divided into those based on physiological information and those based on visual information: physiological information mainly comprises electroencephalogram and heart-rate signals, while visual information mainly comprises eye movement and facial information.
A fatigue detection method and device are disclosed in a patent document filed by the Kunshan Branch, Institute of Microelectronics, Chinese Academy of Sciences (application number 201811360966.0, application publication number CN109657550A, filed November 15, 2018). The disclosed method first captures a video clip, detects each face image in the clip along the time dimension, extracts feature points from several regions of each face image, then computes the eye-closure degree, mouth opening-and-closing degree, and nodding frequency at each moment from those feature points, and determines the operator's fatigue degree against corresponding thresholds. The method has the following defect: because the relevant features and thresholds are designed manually, the quality of the features and the size of the thresholds directly influence the final fatigue detection result. The disclosed device comprises a shooting module, a detection module, an extraction module, and a determination module. In operation, the shooting module first captures a video clip; the detection module then detects each face image in the captured clip along the time dimension; the extraction module extracts feature points from several regions of each detected face image; and finally the determination module determines the fatigue of the person from the extracted feature points. This device has two disadvantages: first, it takes only video data as input, a single data type lacking other effective information, so the fatigue-state judgment is inaccurate and easily affected by illumination; second, it inputs only 2D image data and lacks depth information, so it cannot represent face pose information.
The patent document "Driving fatigue detection method and system combining pseudo-3D convolutional neural network and attention mechanism" filed by Nanjing University of Science and Technology (application number 202010522475.2, application publication number CN111428699A, filed June 10, 2020) discloses a driving fatigue detection method and system of that name. The disclosed system comprises a video acquisition and cropping module, a driving fatigue detection module, and a display module. In operation, the video acquisition and cropping module first acquires a real-time video stream of the driver's upper body; the driving fatigue detection module then detects the driver's fatigue degree; and finally the display module shows the input video, the output driving-fatigue state, and a warning once driving fatigue is detected. The system's disadvantage is that it collects only video stream information, a single information type, so it cannot comprehensively detect the operator's fatigue state. The disclosed method first extracts and processes a driving video frame sequence; then performs spatio-temporal feature learning with a pseudo-3D convolution module; then constructs a P3D-Attention module that applies attention to channels and feature maps; and finally classifies with a 2D global average pooling layer and a Softmax classification layer. Because the method operates directly on image sequences, it has a large number of network parameters, much redundant information, and poor real-time performance; meeting real-time requirements places high demands on hardware.
Disclosure of Invention
Aiming at the defects of the prior art described above, the invention aims to provide an online visual fatigue detection system and method based on deep learning, so as to acquire rich physiological information about the operator, improve computational efficiency, and improve both the real-time performance of detection and the accuracy of fatigue detection.
The idea for realizing the purpose of the invention is as follows: against the problem that contact-type visual fatigue detection systems increase the operator's fatigue where the equipment touches the body, non-contact hardware is adopted to reduce the influence on the operator's working state; against the problem of single-type data in non-contact visual fatigue detection systems, the data acquisition system collects the operator's eye movement data, image data, and depth data, providing richer information for judging the visual fatigue state; against the problem that real-time requirements are hard to meet, a computer vision method converts the image data into text data, improving computational efficiency; and against the difficulty of manually designing features, a deep learning method applied in an end-to-end manner avoids manual feature design and greatly improves accuracy.
According to the above thought, the technical scheme of the invention is as follows:
1. An online visual fatigue detection system based on deep learning, comprising a data acquisition module, an image data processing module, and a fatigue detection module, characterized in that:
the data acquisition module is arranged right below the computer display and comprises an eye movement data acquisition submodule and an RGB image and depth data acquisition submodule which are respectively used for acquiring eye movement data and RGB image and depth information, the eye movement data acquired by the eye movement data acquisition submodule is input to the fatigue detection module, and the RGB image and depth information acquired by the RGB image and depth data acquisition submodule are input to the image data processing module;
the image data processing module comprises a face detection submodule, a face characteristic point extraction submodule and a depth information extraction submodule which are respectively used for detecting the position of a face in image data, detecting the face characteristic point in the image data and extracting the depth information of the characteristic point, and the face characteristic point data and the depth data which are output by the image data processing module are input to the fatigue detection module;
the fatigue detection module comprises a time sequence eye movement network, a space face network, a space depth network, a feature fusion network and a visual fatigue detection network, wherein the time sequence eye movement network, the space face network and the space depth network are connected in parallel and then are sequentially cascaded with the feature fusion network and the visual fatigue detection network, and the fatigue detection module is used for performing feature extraction, feature fusion and classification on eye movement data, human face feature point data and depth data by adopting a deep learning method and outputting the fatigue degree of an operator.
Further, the structure and parameters of each network in the fatigue detection module are as follows:
the time sequence eye movement network has a structure that an input layer → a first convolution layer → a second convolution layer → a third convolution layer → a fourth convolution layer → a fifth convolution layer;
the spatial face network has a structure of an input layer → a 1 st convolution layer → a 2 nd convolution layer → a 3 rd convolution layer → a 4 th convolution layer → a 5 th convolution layer → a 6 th convolution layer;
the spatial depth network has a structure of an input layer → a first convolution layer → a second convolution layer → a third convolution layer;
the feature fusion network has the structure convolution layer → first fully connected layer → second fully connected layer, where the first fully connected layer has size l_1 and the second fully connected layer has size l_2;
the visual fatigue detection network comprises a Softmax function whose input is a tensor of size l_2 and whose output is a tensor of size l_3; the index of the maximum value in the output tensor is the fatigue degree of the operator.
Each convolution layer in the time sequence eye movement network, the spatial face network, and the spatial depth network uses a one-dimensional convolution kernel of size 3 with a convolution stride of 1; the convolution layer in the feature fusion network uses a one-dimensional convolution kernel of size 1 with a convolution stride of 1.
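The text above fixes the layer counts, kernel sizes, and strides but not the channel widths, activations, or the sizes l_1, l_2, l_3. The following minimal PyTorch sketch shows how the three parallel one-dimensional convolutional branches, the fusion network, and the Softmax classifier could be wired together under stated assumptions: the ReLU activations, 32-channel convolutions, global average pooling before fusion, l_1 = 128, and l_2 = l_3 = 3 fatigue classes are all illustrative choices, not values taken from the patent.

```python
# Minimal sketch of the fatigue detection module, assuming PyTorch.
# Only the layer counts, kernel sizes (3 and 1), stride 1, and the
# splicing order come from the description; channel widths, pooling,
# activations, and the class count are illustrative assumptions.
import torch
import torch.nn as nn

def branch(in_ch, n_layers, width=32):
    """1-D conv stack: kernel size 3, stride 1, ReLU (activation assumed)."""
    layers, ch = [], in_ch
    for _ in range(n_layers):
        layers += [nn.Conv1d(ch, width, kernel_size=3, stride=1), nn.ReLU()]
        ch = width
    return nn.Sequential(*layers)

class FatigueDetector(nn.Module):
    def __init__(self, l1=128, n_classes=3):   # l1 and l2 = l3 = 3: assumptions
        super().__init__()
        self.eye_net   = branch(14, 5)   # time sequence eye movement network: 5 conv layers
        self.face_net  = branch(40, 6)   # spatial face network: 6 conv layers
        self.depth_net = branch(20, 3)   # spatial depth network: 3 conv layers
        self.pool = nn.AdaptiveAvgPool1d(1)               # assumed: pool branches to equal length
        self.fuse_conv = nn.Conv1d(64, 32, kernel_size=1, stride=1)  # fusion conv, kernel 1
        self.fc1 = nn.Linear(64, l1)                      # first fully connected layer (size l1)
        self.fc2 = nn.Linear(l1, n_classes)               # second fully connected layer

    def forward(self, eye, face, depth):
        xg = self.pool(self.eye_net(eye))                 # sequence eye movement feature x_g
        xi = self.pool(self.face_net(face))               # spatial facial feature x_i
        xd = self.pool(self.depth_net(depth))             # spatial depth feature x_d
        xgi = torch.cat([xg, xi], dim=1)                  # first splicing feature x_gi
        xgid = torch.cat([self.fuse_conv(xgi), xd], dim=1)  # conv, then second splicing x_gid
        x = self.fc2(torch.relu(self.fc1(xgid.flatten(1))))  # fusion feature through two FC layers
        return torch.softmax(x, dim=1)  # fatigue degree = index of the maximum output
```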
2. A method for on-line visual fatigue detection using the system of claim 1, comprising:
1) collecting data:
1a) obtaining eye movement data of size T_E × E through the eye movement data acquisition submodule, where E represents the dimension of the eye movement data, T_E = F_E × n, T_E represents the number of frames of eye movement data, F_E represents the sampling rate of the eye tracker, and n represents the acquisition time;
1b) obtaining an image sequence of size M × N × T_I and depth data of size M × N × T_I through the RGB image and depth data acquisition submodule, where M represents the width of the image, N represents the height of the image, T_I = F_I × n, T_I represents the number of frames of the image sequence, F_I represents the frame rate in frames per second (FPS), and n represents the acquisition time;
2) processing the image data:
2a) inputting RGB image data into the face detection submodule, which uses a face detection algorithm based on the histogram of oriented gradients (HOG) and support vector machine (SVM) methods in the Dlib library to output the upper-left corner coordinate P_1 and the lower-right corner coordinate P_2 of a rectangular frame marking the face position in the image;
2b) inputting the face position data output in step 2a) into the face feature point extraction submodule, which detects a coordinate set of 68 facial feature points using the gradient boosting decision tree (GBDT) based feature point detection and extraction algorithm in the Dlib library, and extracts the positions of 20 feature points comprising the binocular feature points and the inner mouth feature points;
2c) inputting the positions of the feature points obtained in the step 2b) into a depth information extraction submodule to obtain depth data corresponding to the positions of the feature points;
3) detecting the fatigue degree of an operator:
3a) inputting the eye movement data collected in step 1a), the facial feature point data output in step 2b), and the depth data output in step 2c) into the time sequence eye movement network, spatial face network, and spatial depth network of the fatigue detection module, respectively, and extracting the different features in the data by a deep learning method, namely the sequence eye movement feature x_g, the spatial facial feature x_i, and the spatial depth feature x_d;
3b) Inputting the different features output in 3a) into a feature fusion network in a fatigue detection module, and outputting fusion features;
3c) inputting the fusion characteristics output in 3b) into a visual fatigue detection network in a fatigue detection module, and outputting the fatigue degree of an operator.
Compared with the prior art, the invention has the following advantages:
First, the normal work of the operator is ensured.
The system adopts a non-contact data acquisition module placed directly below the computer display, so effective operator data are collected without interfering with the operator's working state, avoiding disturbance to the operator's work.
Second, the acquired physiological information is rich.
Because the system uses the eye movement data acquisition submodule and the RGB image and depth data acquisition submodule to simultaneously collect the operator's eye movement data, image data, and depth data, it overcomes the large deviation in estimating the operator's visual fatigue degree caused by collecting only a single type of data in the prior art, and increases the categories of available information.
Third, the detection accuracy is high.
Because the method uses deep learning in an end-to-end manner, it overcomes the low accuracy of visual fatigue detection results caused by the difficulty of manually designing features in the prior art, and improves detection accuracy.
Fourth, the operation has strong real-time performance.
The method converts the image data processing problem into a text data processing problem using image processing, overcoming the poor real-time performance caused by processing large amounts of image data with deep learning in the prior art, and improving the real-time performance of visual fatigue detection.
Drawings
FIG. 1 is a schematic structural diagram of an online visual fatigue detection system based on deep learning according to the present invention;
FIG. 2 is a schematic diagram of a fatigue detection module in the system of the present invention;
FIG. 3 is a flow chart of an implementation of the on-line visual fatigue detection method based on deep learning according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and examples.
Referring to fig. 1, the online visual fatigue detection system of the present invention includes a data acquisition module 1, an image data processing module 2, and a fatigue detection module 3.
The data acquisition module 1 is arranged directly below a computer display, 65 cm to 85 cm from the operator, and mainly comprises an eye movement data acquisition submodule 11 and an RGB image and depth data acquisition submodule 12, wherein:
the eye movement data acquisition submodule 11 enables the accuracy required by the visual fatigue detection system to be improvedUsing a Tobii Eye Tracker with a sampling rate of 90Hz, adopting an improved version of a PCCR telemetering type Eye movement tracking technology of a traditional pupil corneal reflex technology, using an image sensor to collect a near-infrared light source to generate reflection images on the cornea and the pupil of the Eye of an operator, and accurately calculating the space position and the sight line position of the Eye by an image processing algorithm built in a submodule, wherein the collection comprises a fixation point coordinate PG(x, y), left eye spatial position PLE(x, y, z), right eye spatial position PRE(x, y, z), head position PH(x, y, z), head pose RHEye movement data of (x, y, z);
the RGB image and Depth data acquisition sub-module 12 acquires RGB image data and Depth data each having a resolution of 640 × 360 using an Intel real sense Depth Camera D435 i.
The image data processing module 2 mainly comprises a face detection submodule 21, a face feature point extraction submodule 22 and a depth information extraction submodule 23, wherein:
the face detection submodule 21 obtains the operator's face position in the RGB image using a face detection algorithm based on the HOG and SVM methods in the Dlib library, chosen for the real-time performance required by the visual fatigue detection system;
the face feature point extraction submodule 22 obtains a set of 68 facial feature point positions of the operator using the GBDT-based feature point detection and extraction algorithm in the Dlib library, again for the required real-time performance, and outputs the positions of 20 feature points comprising the binocular feature points and the inner mouth feature points, where the binocular feature point positions represent the opening and closing state of the eyes and the inner mouth feature point positions represent the opening and closing state of the mouth;
the depth information extraction submodule 23 outputs depth data of size 75 × 20 corresponding to the extracted feature point positions.
The fatigue detection module 3 adopts a one-dimensional convolutional neural network algorithm and, in an end-to-end manner, obtains the operator's fatigue degree directly from the input data.
referring to fig. 2, the fatigue detection module includes a sequential eye movement network 31, a spatial facial network 32, a spatial depth network 33, a feature fusion network 34 and a visual fatigue detection network 35, and the sequential eye movement network 31, the spatial facial network 32 and the spatial depth network 33 are connected in parallel and then cascaded with the feature fusion network 34 and the visual fatigue detection network 35 in sequence, wherein:
the time sequence eye movement network 31 is formed by sequentially cascading an input layer and 5 convolution layers, wherein each convolution layer uses a one-dimensional convolution kernel with the size of 3, the convolution step length is 1, and the eye movement characteristic x of the output sequenceg
The spatial face network 32 is formed by sequentially cascading an input layer and 6 convolutional layers, each convolutional layer uses a one-dimensional convolution kernel with the size of 3, the convolution step length is 1, and the output generates a spatial face feature xi
The spatial depth network 33 is composed of an input layer and 3 convolutional layers which are sequentially cascaded, each convolutional layer uses a one-dimensional convolution kernel with the size of 3, the convolution step length is 1, and a spatial depth characteristic x is outputd
The feature fusion network 34 is formed by sequentially cascading 1 convolution layer and 2 fully-connected layers, wherein the convolution layer uses a one-dimensional convolution kernel with the size of 1, the convolution step length is 1, and a fusion feature x is output;
the visual fatigue detection network 35 classifies the visual fatigue by using a Softmax function, and outputs the degree of fatigue of the operator.
The steps of the online visual fatigue detection method of the present invention will be further described with reference to fig. 3.
Step 1, starting a data acquisition module.
Acquiring eye movement data of size 450 × 14 through the eye movement data acquisition submodule, the data comprising the fixation point coordinates P_G(x, y), the left eye spatial position P_LE(x, y, z), the right eye spatial position P_RE(x, y, z), the head position P_H(x, y, z), and the head pose R_H(x, y, z);
acquiring an image sequence of size 640 × 360 × 75 and depth data of size 640 × 360 × 75 through the RGB image and depth data acquisition submodule.
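These sizes are consistent with the formulas of step 1): the 14 eye movement dimensions decompose as 2 (fixation point) + 3 × 4 (left eye, right eye, head position, head pose) = 14, and, assuming the 90 Hz Tobii sampling rate given earlier, T_E = 450 frames corresponds to an acquisition time of n = 450 / 90 = 5 s; an image sequence of T_I = 75 frames over the same 5 s would then imply a frame rate of F_I = 75 / 5 = 15 FPS.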
Step 2, extracting the facial feature points and the corresponding depth information from the collected image data.
Inputting the RGB image data into the face detection submodule, which uses a face detection algorithm based on the histogram of oriented gradients (HOG) and support vector machine (SVM) methods in the Dlib library to output the upper-left corner coordinate P_1 and the lower-right corner coordinate P_2 of a rectangular frame containing the face in the image;
inputting the upper-left corner coordinate P_1 and the lower-right corner coordinate P_2 of the rectangular frame into the face feature point extraction submodule, which obtains a set of 68 facial feature point positions using the gradient boosting decision tree (GBDT) based feature point detection and extraction algorithm in the Dlib library, extracts from this set the positions of 20 feature points comprising the binocular feature points and the inner mouth feature points, and outputs facial feature point data of size 40 × 75;
inputting the facial feature point data into the depth information extraction submodule, and outputting the depth information corresponding to the feature point positions.
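A minimal sketch of this Dlib-based stage is given below. The shape-predictor model file name and the landmark index ranges for the eyes (36-47) and inner mouth (60-67), which together give the 20 points mentioned above, follow the standard iBUG 68-point annotation scheme used by Dlib's pretrained predictor; the per-landmark depth lookup assumes an aligned depth frame such as the one produced in the acquisition sketch earlier.

```python
# Hedged sketch of steps 2a)-2c): HOG+SVM face detection, 68-point GBDT
# landmarks, and per-landmark depth lookup. The index ranges (36-47 eyes,
# 60-67 inner mouth, i.e. 20 points) follow the standard 68-point scheme.
import dlib

detector = dlib.get_frontal_face_detector()                       # HOG + SVM
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

EYE_AND_INNER_MOUTH = list(range(36, 48)) + list(range(60, 68))   # 20 points

def extract_points(rgb_image, depth_frame):
    faces = detector(rgb_image, 1)           # rectangles (P1, P2 corners)
    if not faces:
        return None
    shape = predictor(rgb_image, faces[0])   # 68 landmark coordinates
    points, depths = [], []
    for i in EYE_AND_INNER_MOUTH:
        p = shape.part(i)
        points.append((p.x, p.y))
        depths.append(depth_frame.get_distance(p.x, p.y))  # aligned depth value
    return points, depths
```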
Step 3, detecting the fatigue degree of the operator.
Inputting the eye movement data into the time sequence eye movement network, and outputting the sequence eye movement feature x_g;
inputting the feature point position data into the spatial face network, and outputting the spatial facial feature x_i;
inputting the depth data into the spatial depth network, and outputting the spatial depth feature x_d;
inputting the sequence eye movement feature x_g, the spatial facial feature x_i, and the spatial depth feature x_d into the feature fusion network;
splicing the sequence eye movement feature x_g with the spatial facial feature x_i, and outputting the first splicing feature x_gi;
passing the first splicing feature x_gi through a convolution layer, splicing the result with the spatial depth feature x_d, and outputting the second splicing feature x_gid;
passing the second splicing feature x_gid through the two fully connected layers in sequence, and outputting the fusion feature x;
inputting the fusion feature x into the visual fatigue detection network, and outputting the fatigue degree of the operator.
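Tying these steps to the FatigueDetector sketch given earlier (whose channel widths and class count were assumptions), a forward pass with dummy tensors of the sizes stated in steps 1 and 2 would look like this:

```python
# Hedged usage example for the FatigueDetector sketch above, with dummy
# inputs of the stated sizes: eye data 450 x 14, facial feature point
# data 40 x 75, depth data 75 x 20 (batch dimension added, channels first).
import torch

model = FatigueDetector()
eye   = torch.randn(1, 14, 450)   # 450 frames x 14 eye movement dimensions
face  = torch.randn(1, 40, 75)    # 40 coordinate values x 75 frames
depth = torch.randn(1, 20, 75)    # 20 landmark depths x 75 frames
probs = model(eye, face, depth)   # softmax over the assumed 3 fatigue classes
fatigue_degree = probs.argmax(dim=1)  # index of maximum = fatigue degree
```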
The above description is only one specific example of the present invention and does not constitute any limitation of the present invention. It will be apparent to persons skilled in the relevant art that various modifications and changes in form and detail can be made therein without departing from the principles and arrangements of the invention, but these modifications and changes are still within the scope of the invention as defined in the appended claims.

Claims (5)

1. An online visual fatigue detection system based on deep learning, comprising a data acquisition module (1), an image data processing module (2), and a fatigue detection module (3), characterized in that:
the data acquisition module (1) is arranged right below a computer display and comprises an eye movement data acquisition submodule (11) and an RGB image and depth data acquisition submodule (12) which are respectively used for acquiring eye movement data and RGB image and depth information, the eye movement data acquired by the eye movement data acquisition submodule (11) are input into the fatigue detection module (3), and the RGB image and depth information acquired by the RGB image and depth data acquisition submodule (12) are input into the image data processing module (2);
the image data processing module (2) comprises a face detection submodule (21), a face characteristic point extraction submodule (22) and a depth information extraction submodule (23) which are respectively used for detecting the position of a face in image data, detecting the face characteristic points in the image data and extracting the depth information of the characteristic points, and the face characteristic point data and the depth data output by the image data processing module (2) are input to the fatigue detection module (3);
the fatigue detection module (3) comprises a time sequence eye movement network (31), a space face network (32), a space depth network (33), a feature fusion network (34) and a visual fatigue detection network (35), wherein the time sequence eye movement network (31), the space face network (32) and the space depth network (33) are connected in parallel and then are sequentially cascaded with the feature fusion network (34) and the visual fatigue detection network (35) for performing feature extraction, feature fusion and classification on eye movement data, human face feature point data and depth data by adopting a deep learning method, and outputting the fatigue degree of an operator.
2. The system according to claim 1, wherein the structure and parameters of each network in the fatigue detection module (3) are as follows:
the time-series eye movement network (31) has a structure of an input layer → a first convolution layer → a second convolution layer → a third convolution layer → a fourth convolution layer → a fifth convolution layer;
the spatial face network (32) has a structure of an input layer → a 1 st convolution layer → a 2 nd convolution layer → a 3 rd convolution layer → a 4 th convolution layer → a 5 th convolution layer → a 6 th convolution layer;
the spatial depth network (33) has a structure of an input layer → a first convolution layer → a second convolution layer → a third convolution layer;
the feature fusion network (34) has the structure convolution layer → first fully connected layer → second fully connected layer, where the first fully connected layer has size l_1 and the second fully connected layer has size l_2;
the visual fatigue detection network (35) comprises a Softmax function whose input is a tensor of size l_2 and whose output is a tensor of size l_3; the index of the maximum value in the output tensor is the fatigue degree of the operator.
Each convolution layer in the time sequence eye movement network (31), the spatial face network (32), and the spatial depth network (33) uses a one-dimensional convolution kernel of size 3 with a convolution stride of 1; the convolution layer in the feature fusion network (34) uses a one-dimensional convolution kernel of size 1 with a convolution stride of 1.
3. A method for on-line visual fatigue detection using the system of claim 1, comprising:
(1) collecting data:
(1a) obtaining eye movement data of size T_E × E through the eye movement data acquisition submodule, where E represents the dimension of the eye movement data, T_E = F_E × n, T_E represents the number of frames of eye movement data, F_E represents the sampling rate of the eye tracker, and n represents the acquisition time;
(1b) obtaining an image sequence of size M × N × T_I and depth data of size M × N × T_I through the RGB image and depth data acquisition submodule, where M represents the width of the image, N represents the height of the image, T_I = F_I × n, T_I represents the number of frames of the image sequence, F_I represents the frame rate in frames per second (FPS), and n represents the acquisition time;
(2) processing the image data:
(2a) inputting RGB image data into the face detection submodule, which uses a face detection algorithm based on the histogram of oriented gradients (HOG) and support vector machine (SVM) methods in the Dlib library to output the upper-left corner coordinate P_1 and the lower-right corner coordinate P_2 of a rectangular frame marking the face position in the image;
(2b) inputting the face position data output in step (2a) into the face feature point extraction submodule, which detects a coordinate set of 68 facial feature points using the gradient boosting decision tree (GBDT) based feature point detection and extraction algorithm in the Dlib library, and extracts the positions of 20 feature points comprising the binocular feature points and the inner mouth feature points;
(2c) inputting the positions of the feature points obtained in the step (2b) into a depth information extraction submodule to obtain depth data corresponding to the positions of the feature points;
(3) detecting the fatigue degree of an operator:
(3a) inputting the eye movement data collected in step (1a), the facial feature point data output in step (2b), and the depth data output in step (2c) into the time sequence eye movement network (31), spatial face network (32), and spatial depth network (33) of the fatigue detection module, respectively, and extracting the different features in the data by a deep learning method, namely the sequence eye movement feature x_g, the spatial facial feature x_i, and the spatial depth feature x_d;
(3b) Inputting the different features output in (3a) to a feature fusion network (34) in the fatigue detection module, and outputting the fusion features;
(3c) inputting the fusion feature outputted in (3b) to a visual fatigue detection network (35) in a fatigue detection module, and outputting the fatigue degree of the operator.
4. The method of claim 3, wherein the eye movement data collected in (1a) comprise: the fixation point coordinates P_G(x, y), the left eye spatial position P_LE(x, y, z), the right eye spatial position P_RE(x, y, z), the head position P_H(x, y, z), and the head pose R_H(x, y, z).
5. The method of claim 3, wherein inputting the different features output in (3a) into the feature fusion network (34) of the fatigue detection module in (3b) and outputting the fusion feature comprises: splicing the sequence eye movement feature x_g with the spatial facial feature x_i to obtain the first splicing feature x_gi; passing the first splicing feature x_gi through a convolution layer and splicing the result with the spatial depth feature x_d to obtain the second splicing feature x_gid; and passing the second splicing feature x_gid through the two fully connected layers in sequence to output the fusion feature x.
CN202110869724.XA 2021-07-30 2021-07-30 Online visual fatigue detection system and method based on deep learning Active CN113591699B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110869724.XA CN113591699B (en) 2021-07-30 2021-07-30 Online visual fatigue detection system and method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110869724.XA CN113591699B (en) 2021-07-30 2021-07-30 Online visual fatigue detection system and method based on deep learning

Publications (2)

Publication Number Publication Date
CN113591699A true CN113591699A (en) 2021-11-02
CN113591699B CN113591699B (en) 2024-02-09

Family

ID=78252369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110869724.XA Active CN113591699B (en) 2021-07-30 2021-07-30 Online visual fatigue detection system and method based on deep learning

Country Status (1)

Country Link
CN (1) CN113591699B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268479A (en) * 2013-05-29 2013-08-28 电子科技大学 Method for detecting fatigue driving around clock
US20170001648A1 (en) * 2014-01-15 2017-01-05 National University Of Defense Technology Method and Device for Detecting Safe Driving State of Driver
CN109044363A (en) * 2018-09-04 2018-12-21 华南师范大学 Driver Fatigue Detection based on head pose and eye movement
CN110728241A (en) * 2019-10-14 2020-01-24 湖南大学 Driver fatigue detection method based on deep learning multi-feature fusion
CN112528815A (en) * 2020-12-05 2021-03-19 西安电子科技大学 Fatigue driving detection method based on multi-mode information fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Changyong; Wu Jinqiang; Fang Aiqing: "Fatigue State Recognition Method Based on Multiple Information", Laser & Optoelectronics Progress, no. 10, pp. 239-245 *

Also Published As

Publication number Publication date
CN113591699B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN108062525B (en) Deep learning hand detection method based on hand region prediction
CN110532970B (en) Age and gender attribute analysis method, system, equipment and medium for 2D images of human faces
Wang et al. Blink detection using Adaboost and contour circle for fatigue recognition
CN108596102B (en) RGB-D-based indoor scene object segmentation classifier construction method
CN111091109B (en) Method, system and equipment for predicting age and gender based on face image
CN112733950A (en) Power equipment fault diagnosis method based on combination of image fusion and target detection
CN110796018B (en) Hand motion recognition method based on depth image and color image
CN113159227A (en) Acne image recognition method, system and device based on neural network
CN110837750B (en) Face quality evaluation method and device
CN105740775A (en) Three-dimensional face living body recognition method and device
JP2018055470A (en) Facial expression recognition method, facial expression recognition apparatus, computer program, and advertisement management system
CN110082821A (en) A kind of no label frame microseism signal detecting method and device
CN110378234A (en) Convolutional neural networks thermal imagery face identification method and system based on TensorFlow building
CN102567734A (en) Specific value based retina thin blood vessel segmentation method
CN110363087A (en) A kind of Long baselines binocular human face in-vivo detection method and system
CN105930793A (en) Human body detection method based on SAE characteristic visual learning
Jangade et al. Study on deep learning models for human pose estimation and its real time application
CN110473176A (en) Image processing method and device, method for processing fundus images, electronic equipment
CN114004854A (en) System and method for processing and displaying slice image under microscope in real time
CN115496700A (en) Disease detection system and method based on eye image
CN112949451A (en) Cross-modal target tracking method and system through modal perception feature learning
CN112488165A (en) Infrared pedestrian identification method and system based on deep learning model
CN113591699B (en) Online visual fatigue detection system and method based on deep learning
CN116523853A (en) Chip detection system and method based on deep learning
CN115346169A (en) Method and system for detecting sleep post behaviors

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant