WO2022257187A1 - A non-contact fatigue detection method and system
- Publication number: WO2022257187A1 (application PCT/CN2021/101744)
- Authority: WIPO (PCT)
Classifications
- A61B5/48: Other medical applications
- A61B5/0205: Simultaneously evaluating both cardiovascular conditions and different types of body conditions, e.g. heart and respiratory condition
- A61B5/024: Measuring pulse rate or heart rate
- A61B5/05: Detecting, measuring or recording for diagnosis by means of electric currents or magnetic fields; Measuring using microwaves or radio waves
- A61B5/0816: Measuring devices for examining respiratory frequency
- A61B5/725: Details of waveform analysis using specific filters therefor, e.g. Kalman or adaptive filters
- A61B5/7257: Details of waveform analysis characterised by using Fourier transforms
- A61B5/7264: Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
- A61B5/7267: Classification of physiological signals or data involving training the classification device
- G06N3/045: Computing arrangements based on biological models; Combinations of networks
- G06N3/08: Learning methods
- G06V10/764: Image or video recognition or understanding using classification, e.g. of video objects
- G06V10/806: Fusion at the feature extraction level of extracted features
- G06V10/82: Image or video recognition or understanding using neural networks
- G06V40/15: Biometric patterns based on physiological signals, e.g. heartbeat, blood flow
- G06V40/161: Human faces: Detection; Localisation; Normalisation
- G06V40/168: Human faces: Feature extraction; Face representation
- G06V40/171: Human faces: Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
- G06F2218/08: Pattern recognition for signal processing: Feature extraction
- G06F2218/10: Feature extraction by analysing the shape of a waveform, e.g. extracting parameters relating to peaks
- G06F2218/12: Classification; Matching
Definitions
- the invention belongs to the field of information technology, and more specifically relates to a non-contact fatigue detection method and system.
- Non-contact fatigue state detection has been gradually applied in many situations in academia and industry, such as: driving fatigue detection, learning fatigue detection, etc.
- Most existing fatigue state detection is based on video image processing technology, that is, the fatigue state is judged by extracting facial image features from the video, such as blink frequency and eye contour changes.
- Although this technology achieves a relatively high accuracy, it still has many defects: it is easily affected by objective factors such as dim light, uneven illumination, and face deflection or tilt, which cause detection errors; and it is easily fooled by the subject, who may, for example, pretend to close the eyes or fake facial expressions to conceal the true state, greatly interfering with the test.
- fatigue detection based on physiological signals generally includes the following two types: (1) extracting physiological signals with wearable devices, such as electrocardiography, photoplethysmography and electroencephalography; as a contact detection method, this approach is inconvenient to carry in daily life; (2) extracting physiological signals such as respiration and heart rate with millimeter-wave radar; as a non-contact detection method, this approach has gradually received widespread attention in the automotive industry.
- Fatigue detection based on millimeter-wave radar has the advantages of low power consumption and high precision, but problems remain: in signal acquisition, millimeter-wave radar is easily disturbed by environmental noise and body movement of the subject, which existing methods cannot properly resolve; in signal processing, existing methods are often limited to computing time-frequency domain features such as peak-to-peak intervals, without attention to nonlinear and temporal-sequence characteristics.
- the purpose of the present invention is to provide a non-contact fatigue detection method and system, aiming to solve the problems that existing fatigue detection based on video image processing is easily disturbed by objective environmental factors and subjective human factors, and that existing fatigue detection based on millimeter-wave radar is susceptible to interference from environmental noise and body movements of the subject.
- the present invention provides a non-contact fatigue detection method, comprising the following steps:
- send a millimeter-wave radar signal to the person to be detected and receive the echo signal reflected from that person; perform clutter suppression and echo selection on the echo signal, extract the vital sign signals of the person to be detected, and determine the time-frequency domain features, nonlinear features and temporal features of the vital sign signals;
- the vital sign signal includes: a respiratory signal and a heart rate signal;
- the fused features of the person to be detected are input to a pre-trained classifier to recognize the fatigue state and judge whether the person is fatigued;
- the classifier divides the state of the person to be detected into three states: alert, normal and fatigued; both the alert and normal states are non-fatigue states.
- the face detection and alignment are performed based on the face video image, so as to extract time domain features and spatial domain features of the face of the person to be detected, specifically:
- face detection is performed on the facial video image to extract facial feature points and obtain a facial feature point sequence; based on the point positions of the eyes and eyebrows, the position of the face center point is calculated, and the face in the current facial video image is calibrated and aligned by an affine transformation;
- resize the aligned facial video images to a preset size, group every L frames into a frame sequence and, following the Temporal Segment Network (TSN) processing flow, divide the frame sequence into K parts; randomly select one frame from each part as a final input frame, obtain a K-frame sequence, and generate a data set; L and K are both integers greater than 0;
- the data set is input into the residual network ResNet50 to extract the spatial features of the face video image
- a mixed attention module is used to extract inter-frame correlation features.
- the mixed attention module consists of two parts, a self-attention module and a spatial attention module; specifically: the extracted spatial features are input into the self-attention module to extract single-frame correlation features; the single-frame correlation features are input into the spatial attention module to extract spatial correlation features between adjacent frames; the spatial features are fused with the single-frame correlation features and the inter-frame spatial correlation features, and the fused features are input to a gated recurrent unit (GRU) to extract the temporal features of the facial video image;
- the spatial and temporal features of the facial video image are input to a fully connected layer, and the parameters of the fully connected layer represent the spatial-domain and time-domain features of the face to be detected.
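The patent does not publish code for the mixed attention plus GRU pipeline above, so the PyTorch sketch below is only one loose reading of it: per-frame ResNet50 features are refined by two attention stages over the K-frame sequence, fused with the originals, and passed through a GRU and a fully connected layer. Every layer size and the exact form of the two attention stages are assumptions, not the patent's design.

```python
# Illustrative sketch only: a loose reading of the "mixed attention + GRU" pipeline.
# All layer sizes are arbitrary assumptions, not taken from the patent.
import torch
import torch.nn as nn

class MixedAttentionGRU(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=256):
        super().__init__()
        # first attention stage (stands in for the patent's self-attention module)
        self.attn1 = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        # second attention stage (stands in for the spatial attention between adjacent frames)
        self.attn2 = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 128)   # fully connected layer representing the face features

    def forward(self, x):
        # x: (batch, K, feat_dim) pooled ResNet50 features of the K sampled frames
        a1, _ = self.attn1(x, x, x)            # attention over the frame sequence
        a2, _ = self.attn2(a1, a1, a1)         # second attention pass
        fused = x + a1 + a2                    # fuse the spatial features with both attention outputs
        out, _ = self.gru(fused)               # temporal modelling of the fused sequence
        return self.fc(out[:, -1])             # last hidden state -> fully connected representation

feats = torch.randn(2, 8, 2048)                # e.g. K = 8 frames per detection window (assumed)
print(MixedAttentionGRU()(feats).shape)        # torch.Size([2, 128])
```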
- the vital sign signals of the person to be detected are extracted, and the time-frequency domain features, nonlinear features and temporal features of the vital sign signals are determined, specifically:
- the waveform of the echo signal is reconstructed, specifically: wavelet band-pass filtering is used to remove noise, and the respiratory signal and heart rate signal are extracted separately as the vital sign signals;
- using time-frequency analysis and nonlinear analysis techniques, the time-domain, frequency-domain and nonlinear features of the vital sign signals are extracted respectively; for the respiratory signal, the extracted features include the mean, variance, power spectral density, fractal dimension and approximate entropy; for the heart rate signal, the extracted time-domain features include single-beat and multi-beat features, the extracted frequency-domain features include the low-frequency component, high-frequency component, low-frequency/high-frequency ratio, and the kurtosis and skewness of the spectrum, and the extracted nonlinear features include approximate entropy, sample entropy, Li exponent, Hurst exponent and detrended fluctuation index; the purpose of the single-beat features is to extract the instantaneous variation of each heartbeat, the purpose of the multi-beat and frequency-domain features is to extract the long-term variation over multiple heartbeats, and the purpose of the nonlinear features is to further extract the nonlinear variation of the heart rate, which is strongly correlated with the fatigue state and can improve the classifier's recognition accuracy (a minimal sketch of a few of these features follows this group of steps);
- deep learning technology is used to extract temporal features: first, sub-sliding windows are set within the detection window, and the time-domain, frequency-domain and nonlinear features of the vital sign signals within each sub-sliding window are extracted;
- second, the extracted features are fed in chronological order into a model combining a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM), and the features of its fully connected layer are extracted as the temporal features of the vital sign signals;
- based on statistical analysis and machine learning, features with relatively high correlation with fatigue state classification are selected from the extracted features as the final time-frequency domain features, nonlinear features and temporal features of the vital sign signals.
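The exact feature formulas are not given in the text; the sketch below computes only a few of the named quantities (mean, variance, power spectral density, low-frequency/high-frequency ratio, approximate entropy) for one windowed signal with numpy/scipy. The sampling rate, LF/HF band limits and approximate-entropy parameters are common defaults, not values from the patent.

```python
# Minimal sketch of a few of the named vital-sign features for one detection window.
import numpy as np
from scipy.signal import welch

def band_power(f, pxx, lo, hi):
    mask = (f >= lo) & (f < hi)
    return np.trapz(pxx[mask], f[mask])

def approximate_entropy(x, m=2, r=None):
    x = np.asarray(x, dtype=float)
    r = 0.2 * x.std() if r is None else r              # common default tolerance
    def phi(m):
        n = len(x) - m + 1
        emb = np.array([x[i:i + m] for i in range(n)])
        dist = np.max(np.abs(emb[:, None] - emb[None, :]), axis=2)
        return np.log((dist <= r).mean(axis=1)).mean()
    return phi(m) - phi(m + 1)

def window_features(sig, fs):
    f, pxx = welch(sig, fs=fs, nperseg=min(len(sig), 256))
    lf = band_power(f, pxx, 0.04, 0.15)                 # low-frequency component (assumed band)
    hf = band_power(f, pxx, 0.15, 0.40)                 # high-frequency component (assumed band)
    return {
        "mean": sig.mean(),
        "var": sig.var(),
        "psd_peak": f[np.argmax(pxx)],                  # dominant frequency of the PSD
        "lf_hf_ratio": lf / hf if hf > 0 else np.nan,
        "apen": approximate_entropy(sig),
    }

fs = 20.0                                               # assumed sampling rate of the reconstructed waveform
sig = np.sin(2 * np.pi * 0.25 * np.arange(0, 20, 1 / fs)) + 0.05 * np.random.randn(400)
print(window_features(sig, fs))
```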
- the time-frequency domain features, nonlinear features and temporal features of the vital sign signals are fused with the time-domain features and spatial-domain features of the face of the person to be detected to obtain the fused features of the person to be detected, specifically:
- the classifier is trained through the following steps:
- the training samples include the fused features of a plurality of trainers; the fused features of each trainer include the trainer's millimeter-wave radar features and facial video features; the millimeter-wave radar features include the time-frequency domain features, nonlinear features and temporal features of the vital sign signals; the facial video features include the time-domain features and spatial-domain features of the face;
- a state label is added to the data set corresponding to the fused features of each trainer to form the training data set of each trainer; the state label indicates the state of the trainer corresponding to the fused features; the state of the trainer is one of the three states of alert, normal and fatigued;
- the present invention provides a non-contact fatigue detection system, comprising:
- the millimeter-wave feature determination unit is used to send a millimeter-wave radar signal to the person to be detected and receive the echo signal reflected from that person; to perform clutter suppression and echo selection on the echo signal, extract the vital sign signals of the person to be detected, and determine the time-frequency domain features, nonlinear features and temporal features of the vital sign signals;
- the vital sign signals include: respiratory signals and heart rate signals;
- a face video feature determination unit configured to acquire a face video image of the person to be detected, perform face detection and alignment based on the face video image, to extract time domain features and space domain features of the face of the person to be detected;
- the feature fusion unit is used to fuse the time-frequency domain features, nonlinear features and time-series features of the vital sign signal with the time domain features and space domain features of the face of the subject to be detected to obtain the fused features of the subject to be detected;
- the fatigue detection unit is used to input the fused features of the person to be detected into a pre-trained classifier to identify the fatigue state of the person to be detected and judge whether the person is fatigued; based on the fused features, the classifier divides the state of the person to be detected into three states: alert, normal and fatigued; both the alert and normal states are non-fatigue states.
- the facial video feature determination unit performs face detection on the facial video image, extracts facial feature points, and obtains a facial feature point sequence; based on the facial feature point sequence, it calculates the position of the face center point from the point positions of the eye and eyebrow regions among the feature points, and calibrates and aligns the face in the current facial video image by an affine transformation; it resizes the aligned facial video images to a preset size, groups every L frames into a frame sequence and, following the Temporal Segment Network processing flow, divides the frame sequence into K parts, randomly selects one frame from each part as a final input frame to obtain a K-frame sequence and generate a data set, where L and K are integers greater than 0; it inputs the data set into the residual network ResNet50 to extract the spatial features of the facial video image; and it uses a mixed attention module, composed of a self-attention module and a spatial attention module, to extract inter-frame correlation features.
- the millimeter-wave feature determination unit performs waveform reconstruction on the echo signal, specifically: wavelet band-pass filtering is used to remove noise, and the respiratory signal and heart rate signal are extracted separately as the vital sign signals; time-frequency analysis and nonlinear analysis techniques are used to extract the time-domain, frequency-domain and nonlinear features of the vital sign signals; for the respiratory signal, the extracted time-frequency domain and nonlinear features include the mean, variance, power spectral density, fractal dimension and approximate entropy; for the heart rate signal, the extracted time-domain features include single-beat and multi-beat features, the extracted frequency-domain features include the low-frequency component, high-frequency component, low-frequency/high-frequency ratio, and the kurtosis and skewness of the spectrum, and the extracted nonlinear features include approximate entropy, sample entropy, Li exponent, Hurst exponent and detrended fluctuation index; the purpose of the single-beat features is to extract the instantaneous variation of each heartbeat, and the purpose of the multi-beat and frequency-domain features is to extract the long-term variation over multiple heartbeats.
- the feature fusion unit uses polynomial feature generation and deep feature synthesis techniques to fuse the time-frequency domain, nonlinear and temporal features of the vital sign signals of the sliding detection window and its sub-windows with the time-domain and spatial-domain features of the face of the person to be detected, obtaining preliminary fused features; it merges the preliminary fused features with the time-frequency domain, nonlinear and temporal features of the vital sign signals and the time-domain and spatial-domain features of the face, obtaining merged features; it uses a Transformer model to screen the merged features of the sliding detection window and its sub-windows based on an attention mechanism, where feature selection for the sliding window is based on the attention mechanism, and for the sub-windows the relevant features are fed in chronological order into a Transformer sequence model before attention-based feature selection; the features selected from the sliding window and the sub-windows are combined to obtain the fused features of the person to be detected.
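The fusion unit combines polynomial feature generation, deep feature synthesis and Transformer attention-based selection; the sketch below only illustrates the polynomial-expansion and attention-screening steps with scikit-learn and PyTorch. All dimensions, the top-k rule and the attention scoring are assumptions, and the deep-feature-synthesis step is omitted.

```python
# Illustrative sketch of the fusion step: polynomial interaction features over the
# concatenated millimeter-wave and video feature vectors, then a Transformer encoder
# whose attention weights are used to keep the most attended features.
import numpy as np
import torch
import torch.nn as nn
from sklearn.preprocessing import PolynomialFeatures

mmwave_feats = np.random.rand(1, 12)                 # time-frequency / nonlinear / temporal features (toy)
video_feats = np.random.rand(1, 8)                   # face time-domain / spatial-domain features (toy)

concat = np.hstack([mmwave_feats, video_feats])      # merged raw features
poly = PolynomialFeatures(degree=2, include_bias=False)
prelim = poly.fit_transform(concat)                  # preliminary fused (interaction) features
merged = np.hstack([prelim, concat])                 # merge preliminary fusion with the originals

# Treat each scalar feature as one token and score it with self-attention.
tokens = torch.tensor(merged, dtype=torch.float32).unsqueeze(-1)   # (1, n_feats, 1)
proj = nn.Linear(1, 32)
encoder = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
attn = nn.MultiheadAttention(32, num_heads=1, batch_first=True)

x = encoder(proj(tokens))
_, weights = attn(x, x, x, need_weights=True)        # (1, n_feats, n_feats) attention map
scores = weights.mean(dim=1).squeeze(0)              # average attention received by each feature
topk = torch.topk(scores, k=16).indices              # keep the 16 most attended features (assumed k)
selected = torch.tensor(merged, dtype=torch.float32)[0, topk]
print(selected.shape)                                # torch.Size([16])
```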
- the system further includes a classifier training unit, configured to determine training samples, the training samples including the fused features of multiple trainers; the fused features of each trainer include the trainer's millimeter-wave radar features and facial video features; the millimeter-wave radar features include the time-frequency domain, nonlinear and temporal features of the vital sign signals; the facial video features include the time-domain and spatial-domain features of the face; a state label is added to the data set corresponding to the fused features of each trainer to form the training data set of each trainer; the state label indicates the state of the trainer corresponding to the fused features, the state being one of alert, normal and fatigued; the training data set of each trainer is input into the classifier, which is trained with the state labels therein to obtain the trained classifier.
- the invention provides a non-contact fatigue detection method and system that collect millimeter-wave data and video image data of the person to be detected at the same time; in the millimeter-wave radar detection part, a millimeter-wave transceiver module sends a millimeter-wave radar signal to the subject and collects its echo signal, from which vital sign signals such as respiration and heart rate are extracted and their related features are computed; in the video image detection part, a video acquisition device continuously collects the subject's facial information and extracts related features.
- the features extracted by the two approaches are fused, and fatigue detection is performed on this basis.
- fusing the two technologies effectively suppresses the interference of subjective and objective factors.
- the temporal and spatial features of the video images are extracted through a mixed attention mechanism, and nonlinear analysis and deep learning extract the nonlinear and temporal features of the millimeter-wave radar signal, further improving the accuracy of fatigue detection.
- the method uses non-contact technology for fatigue detection, is highly flexible, makes up for the deficiencies of a single detection technology, and improves the robustness of detection.
- FIG. 1 is a flow chart of a non-contact fatigue detection method provided by an embodiment of the present invention
- Fig. 2 is a flow chart of the feature fusion part provided by the embodiment of the present invention.
- FIG. 3 is a block diagram of a non-contact fatigue detection system provided by an embodiment of the present invention.
- FIG. 4 is a schematic diagram of a sliding window provided by an embodiment of the present invention.
- FIG. 5 is a schematic structural diagram of the CNN+BiLSTM provided by the embodiment of the present invention.
- FIG. 6 is a schematic diagram of a single beat frequency waveform and its characteristic points of a heart rate signal provided by an embodiment of the present invention
- FIG. 7 is a structural diagram of a non-contact fatigue detection system provided by an embodiment of the present invention.
- Fig. 1 is the flowchart of the non-contact fatigue detection method provided by the embodiment of the present invention; as shown in Fig. 1 , it includes the following steps:
- S101: a millimeter-wave radar signal is sent to the person to be detected and the reflected echo signal is received; after clutter suppression and echo selection, the subject's vital sign signals (including respiration and heart rate) are extracted;
- the corresponding time-frequency domain, nonlinear and temporal features are extracted from the subject's vital sign signals, feature fusion is performed, and typical features related to the fatigue state are selected.
- S102 acquiring a face video image of the person to be detected, and performing face detection and alignment based on the face video image, so as to extract time-domain features and space-domain features of the face of the person to be detected;
- the features obtained by millimeter-wave radar and video image processing technology are fused; specifically, the process is shown in Figure 2: if there is an abnormality in the millimeter wave, such as environmental interference or continuous body movement of the subject, there is no millimeter-wave feature output, the millimeter-wave data in the detection window are deleted, and fatigue detection is performed based on the video images; if there is an abnormality in the video images and the face cannot be detected, for example because the ambient light is dim or the face is tilted and deflected, there is no video image feature output, the video data in the detection window are deleted, and fatigue detection is performed based on the millimeter-wave radar; if neither the millimeter wave nor the video images detect the subject, "no subject" is displayed because there are no features for subsequent computation, the loop is re-entered, and fatigue detection continues in the next detection window; if the detection results of both approaches are normal, the features extracted by millimeter-wave detection and video image detection are used together and fatigue detection is performed on the basis of their fusion.
- S104: the fused features of the subject to be detected are input into a pre-trained classifier to identify the fatigue state of the subject and judge whether the subject is fatigued; the classifier divides the state of the subject into three states: alert, normal and fatigued; both the alert and normal states are non-fatigue states.
- step S101 is implemented through the following process: transmitting a millimeter-wave radar signal to the subject; receiving the echo signal reflected from the subject; performing clutter suppression and echo selection on the echo signal and extracting the vital sign signals of the subject; and calculating the time-frequency domain, nonlinear and temporal features of the vital sign signals in each sliding detection window.
- step S102 is realized through the following process: real-time collection of the subject's video image information; extraction of facial feature points; face alignment; generation of the input data set; input into the ResNet network for spatial feature extraction; input into the mixed attention module for inter-frame correlation feature extraction; input into the GRU unit for temporal feature extraction; and input of the features into the fully connected layer.
- the invention provides a non-contact fatigue detection method and system, which have the advantages of high reliability, strong robustness, low power, good convenience and the like.
- the detection principle is as follows: first, the detection system simultaneously monitors millimeter-wave data and video image data; the millimeter-wave module emits low-power millimeter waves towards the subject and detects the echo signals produced by reflection of the signal from the human body (e.g., the chest), from which vital sign signals such as heart rate and respiration are extracted and their time-frequency domain, nonlinear and temporal features are computed. Secondly, for the video image data, operations such as face detection, facial feature point extraction and face alignment are performed, and on this basis the temporal and spatial feature information is extracted. Finally, fatigue detection is performed by a classifier based on the fusion of millimeter-wave and video image features.
- Fig. 3 is a block diagram of the non-contact fatigue detection system provided by the embodiment of the present invention, as shown in Fig. 3, mainly including:
- the millimeter wave radar part mainly includes: (1) millimeter wave transceiver module; (2) real-time signal processing module; (3) feature extraction module.
- the millimeter wave transceiver module is specifically: transmitting millimeter waves and receiving millimeter wave echo signals.
- the transmitter generates a chirp signal and, after a power amplifier, the transmitting antenna sends out Chirps (i.e., linear frequency-modulated pulses) with period T_f as a sawtooth wave with frequency-modulation bandwidth B, whose frame period (i.e., the sawtooth repetition period, each frame period containing multiple Chirps) is T_i.
- the receiving antenna at the receiving end detects and preprocesses echo signals generated by reflections from various objects and human bodies in the receiving environment.
- the real-time signal processing module specifically includes: performing real-time acquisition and processing of echo signals, and extracting heart rate and respiration signals. It mainly includes four steps: real-time acquisition of echo signals, clutter suppression, echo selection, and waveform reconstruction. The specific process is as follows:
- the echo signal of the millimeter wave may include various clutter interferences.
- Adaptive background subtraction and singular value decomposition are used respectively to filter out stationary noise from static objects (reflection signals) such as tables and walls and non-stationary noise from moving objects (reflection signals).
- echo selection: the distance of the subject is located accurately, and the column of signals representing that range bin is selected from the echo signal matrix Q; this column contains the raw heart rate and respiration signal of the subject. Specifically, first, a Fourier transform is performed on each row of the echo signal to obtain an N x M range matrix R, where N represents the number of frames and M the number of sampling points of each Chirp; each column of the matrix R represents a range bin. Second, the energy sum over each range bin is calculated. Third, the column m_max where the maximum energy sum is located is found; the range bin represented by this column is the distance between the subject and the fatigue detection system. Fourth, the m_max-th column of signals is extracted from the matrix Q, the phase is calculated with the arctangent function, and phase unwrapping is performed.
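The echo-selection steps above (row-wise Fourier transform, energy sum per range bin, selection of the maximum-energy column, arctangent phase and unwrapping) can be sketched with numpy as follows; the synthetic echo matrix and its dimensions are placeholders standing in for real radar data.

```python
# Sketch of the echo-selection steps on a synthetic echo matrix Q (N frames x M samples per Chirp).
import numpy as np

N, M = 200, 64                                   # number of frames, samples per Chirp (assumed)
rng = np.random.default_rng(0)
t = np.arange(N) / 20.0                          # slow time at an assumed 20 frames/s
target_bin = 17                                  # simulated range bin of the subject
chest = 0.4 * np.sin(2 * np.pi * 0.3 * t)        # simulated chest-motion phase (radians)
fast = np.arange(M)
# synthetic echo: a tone mapping to range bin 17, phase-modulated by the chest motion, plus noise
Q = np.exp(1j * (2 * np.pi * target_bin * fast / M + chest[:, None])) \
    + 0.1 * (rng.normal(size=(N, M)) + 1j * rng.normal(size=(N, M)))

R = np.fft.fft(Q, axis=1)                        # row-wise FFT -> N x M range matrix R
energy = np.sum(np.abs(R) ** 2, axis=0)          # energy sum of each range bin
m_max = int(np.argmax(energy))                   # range bin where the subject sits
phase = np.unwrap(np.angle(R[:, m_max]))         # arctangent phase followed by phase unwrapping
print(m_max, phase[:5])
```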
- the feature extraction module comprehensively utilizes time-frequency domain analysis, nonlinear analysis and deep learning to extract relevant features.
- a sliding detection window (for example, a sliding window with a length of 20 s and a step of 1 s) is set as a buffer to extract the relevant features of the millimeter-wave data and video image data within the window, as shown in Figure 4.
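A trivial helper for the 20 s / 1 s sliding detection window (and the 10 s sub-windows used later) might look as follows; the sampling rate is an assumption.

```python
# Trivial helper for the sliding detection window and its sub-windows.
import numpy as np

def sliding_windows(signal, fs, win_s=20.0, step_s=1.0):
    win, step = int(win_s * fs), int(step_s * fs)
    for start in range(0, len(signal) - win + 1, step):
        yield signal[start:start + win]

fs = 20.0                                        # assumed sampling rate of the reconstructed waveform
sig = np.random.randn(int(120 * fs))             # two minutes of a placeholder signal
windows = list(sliding_windows(sig, fs))         # 20 s windows, 1 s apart
subwins = list(sliding_windows(windows[0], fs, win_s=10.0, step_s=1.0))
print(len(windows), len(subwins))                # 101 windows, 11 sub-windows per window
```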
- the feature extraction part based on millimeter-wave radar includes two steps: feature calculation and feature selection:
- feature calculation: compute the features of the respiration and heart rate signals within the sliding detection window.
- the specific operation process is as follows. Firstly, time-frequency analysis and nonlinear analysis techniques are comprehensively used to extract the time-domain, frequency-domain and nonlinear features of the vital sign signal respectively. Among them, for the respiratory signal, the extracted time-frequency domain and nonlinear features include mean value, variance, power spectral density, fractal dimension and approximate entropy.
- taking the heart rate signal as an example, its features are shown in Table 1 below and mainly include time-domain features such as bpm, ibi, sdnn, sdsd, rmssd, pnn20 and pnn50; frequency-domain features such as the low-frequency component, high-frequency component and low-frequency/high-frequency ratio; and nonlinear features such as approximate entropy, sample entropy, Li exponent and Hurst exponent. Second, temporal features are extracted.
- within the detection window, sub-sliding windows are set (for example, within the 20 s detection window, a sub-sliding window with a length of 10 s and a step of 1 s is further set to subdivide the detection window), and the time-domain, frequency-domain and nonlinear features within each sub-window are extracted.
- the relevant features are fed into the CNN+BiLSTM model in chronological order, and the features of its fully connected layer are extracted to quantify the dynamic changes of the heart rate and respiratory signals.
- the CNN+BiLSTM model is shown in Fig. 5 and includes 1 CNN layer, 2 BiLSTM layers, 1 Attention layer and 2 Dense layers (an illustrative sketch of this model is given below).
- the purpose of the single-beat features is to extract the instantaneous variation of each heartbeat; the purpose of the multi-beat and frequency-domain features is to extract the long-term variation over multiple heartbeats; the purpose of the nonlinear features is to further extract the nonlinear variation of the heart rate, which is strongly correlated with the fatigue state and can improve the accuracy of the classifier in recognizing the fatigue state.
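The text fixes only the layer count of the CNN+BiLSTM model (1 CNN layer, 2 BiLSTM layers, 1 attention layer, 2 dense layers); the PyTorch sketch below follows that count, but the kernel size, hidden sizes and the additive-attention form are assumptions.

```python
# Sketch of the described CNN+BiLSTM model; all hyperparameters are assumptions.
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    def __init__(self, n_feats=32, hidden=64):
        super().__init__()
        self.cnn = nn.Conv1d(n_feats, 64, kernel_size=3, padding=1)   # 1 CNN layer
        self.bilstm = nn.LSTM(64, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)   # 2 BiLSTM layers
        self.attn = nn.Linear(2 * hidden, 1)                          # simple additive attention layer
        self.dense1 = nn.Linear(2 * hidden, 32)                       # dense layer (its output serves
        self.dense2 = nn.Linear(32, 3)                                # as the temporal feature vector)

    def forward(self, x):
        # x: (batch, T sub-windows, n_feats) per-sub-window time/frequency/nonlinear features
        h = torch.relu(self.cnn(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.bilstm(h)
        w = torch.softmax(self.attn(h), dim=1)        # attention weights over sub-windows
        ctx = (w * h).sum(dim=1)                      # attention-pooled context
        feat = torch.relu(self.dense1(ctx))           # fully connected features -> temporal features
        return feat, self.dense2(feat)

x = torch.randn(4, 11, 32)                            # 11 sub-windows, 32 features each (assumed)
feat, logits = CNNBiLSTM()(x)
print(feat.shape, logits.shape)                       # torch.Size([4, 32]) torch.Size([4, 3])
```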
- the video image processing part mainly includes: (1) video acquisition module; (2) real-time signal processing module; (3) feature extraction module.
- the video acquisition module is specifically: use the video acquisition equipment to collect the video image data of the subject in real time, and send the data back to the host computer in real time and save it for timely processing.
- the real-time signal processing module mainly includes three steps: face detection, face alignment and data set generation.
- the specific processing process is as follows:
- (2.1) Face detection: a facial feature point sequence is extracted; that is, in an optional embodiment, the facial data in the video image are acquired and facial feature points are extracted.
- the Haar feature extraction method is used to extract the region of interest (ROI) of the face by detecting grey-level changes in the image, and the pixel coordinates within the region are summed;
- the landmark algorithm in the dlib library is then used to extract 68 facial feature points (including the eyebrows, eyes, nose, mouth and facial contour), obtaining the feature point sequence p(t).
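A minimal sketch of this detection step with OpenCV's Haar cascade and dlib's 68-point landmark predictor; the predictor file path is a placeholder for the standard dlib model, which has to be downloaded separately.

```python
# Sketch: Haar-cascade face ROI followed by dlib's 68-point landmark predictor.
import cv2
import dlib

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # placeholder path

def face_landmarks(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                                    # no face: this window falls back to radar only
    x, y, w, h = faces[0]                              # take the first detected ROI
    rect = dlib.rectangle(int(x), int(y), int(x + w), int(y + h))
    shape = predictor(gray, rect)
    return [(shape.part(i).x, shape.part(i).y) for i in range(68)]   # p(t): 68 feature points
```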
- the feature extraction module specifically includes: performing feature extraction on the data set generated in the above steps, and giving the identification result of fatigue detection by means of a classifier.
- the fused features are input into the GRU unit to extract the temporal features of the video sequence.
- the parameters of the fully connected layer are used to characterize the temporal and spatial characteristics of the video sequence.
- the technology fusion part includes: (1) algorithm design; (2) feature fusion; (3) fatigue detection.
- Algorithm design specifically means judging the current test state when millimeter-wave technology and video image technology are fused: (i) if there is an abnormality in the video images (e.g., the ambient light is too dark or the face is tilted and deflected) and the face cannot be detected, then no video features are output, the video data in the sliding detection window are deleted, and fatigue detection is performed based on the millimeter-wave radar features; (ii) if there is an abnormality in the millimeter wave (e.g., the subject keeps moving during the test or there is other strong interference), then no millimeter-wave features are output, the millimeter-wave data in the sliding detection window are deleted, and fatigue detection is performed based on the video image features; (iii) if both the video and the millimeter wave are abnormal, a detection abnormality or no target to be tested is displayed, and the loop is re-entered to continue monitoring; (iv) if both the video and the millimeter wave are normal, fatigue state recognition is performed with the aid of the classifier on the basis of the fusion of the two kinds of features.
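The (i)-(iv) branching above can be summarised as a small dispatch function; the feature containers, the fuse function and the classifier below are hypothetical placeholders, and only the control flow mirrors the text.

```python
# Hypothetical sketch of the (i)-(iv) branching logic; only the control flow follows the text.
def detect_window(mmwave_feats, video_feats, fuse, classifier):
    """mmwave_feats / video_feats are None when that modality had no output this window."""
    if video_feats is None and mmwave_feats is None:
        return "no subject / detection abnormal"      # (iii): re-enter the loop next window
    if video_feats is None:
        fused = mmwave_feats                          # (i): face not detected, radar features only
    elif mmwave_feats is None:
        fused = video_feats                           # (ii): radar abnormal, video features only
    else:
        fused = fuse(mmwave_feats, video_feats)       # (iv): fuse both feature sets
    return classifier(fused)                          # -> alert / normal / fatigued

# usage with trivial placeholders
print(detect_window([0.1, 0.2], None, lambda a, b: a + b, lambda f: "normal"))
```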
- Feature fusion specifically means: first, polynomial feature generation and deep feature synthesis techniques are used to fuse the millimeter-wave features and video image features of the sliding detection window and its sub-sliding windows, preliminarily fusing the features related to the two technologies.
- the preliminarily fused features are then merged with the millimeter-wave features and video image features to form the merged features; secondly, the Transformer model is used to screen the merged features of the sliding detection window and its sub-windows based on the attention mechanism.
- for the sliding window, feature selection is based on the attention mechanism; for the sub-windows, the relevant features are fed into the Transformer sequence model in chronological order and then feature selection is based on the attention mechanism; the features selected from the sliding window and the sub-windows are combined to obtain the fused features.
- Fatigue detection specifically means building a Transformer-based three-class model to identify the three states of alert, normal and fatigued.
- the accuracy, confusion matrix, ROC curve and AUC are used as the evaluation indicators of fatigue detection; the larger the accuracy and the AUC, the better the recognition effect; the confusion matrix shows the specific prediction accuracy of each category.
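These evaluation indicators can be computed with scikit-learn as below; the labels and scores are fabricated toy values used only to show the calls, not results from the patent.

```python
# Sketch of the stated evaluation indicators (accuracy, confusion matrix, ROC/AUC).
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])             # 0 = alert, 1 = normal, 2 = fatigued (toy labels)
y_pred = np.array([0, 1, 1, 1, 2, 2, 1, 1])
scores = np.array([[.8, .1, .1], [.3, .5, .2], [.2, .7, .1], [.1, .8, .1],
                   [.1, .2, .7], [.0, .2, .8], [.2, .5, .3], [.2, .6, .2]])

print(accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))                  # per-class prediction accuracy
print(roc_auc_score(y_true, scores, multi_class="ovr"))  # one-vs-rest AUC for the 3 classes
```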
- the three-class model, i.e., the classifier, is trained through the following steps:
- training samples are determined; the training samples include the fused features of a plurality of trainers; the fused features of each trainer include the trainer's millimeter-wave radar features and facial video features; the millimeter-wave radar features include the time-frequency domain, nonlinear and temporal features of the vital sign signals; the facial video features include the time-domain and spatial-domain features of the face;
- a state label is added to the data set corresponding to the fused features of each trainer to form the training data set of each trainer; the state label indicates the state of the trainer corresponding to the fused features; the state of the trainer is one of the three states of alert, normal and fatigued;
- Fig. 7 is an architecture diagram of a non-contact fatigue detection system provided by an embodiment of the present invention; as shown in Fig. 7, it includes:
- the millimeter-wave feature determination unit 710 is configured to send a millimeter-wave radar signal to the subject to be detected and receive the echo signal reflected from the subject; to perform clutter suppression and echo selection on the echo signal, extract the subject's vital sign signals, and determine the time-frequency domain, nonlinear and temporal features of the vital sign signals; the vital sign signals include a respiratory signal and a heart rate signal;
- the face video feature determination unit 720 is used to acquire the face video image of the person to be detected, and perform face detection and alignment based on the face video image, so as to extract the time domain feature and the space domain feature of the face of the person to be detected;
- a feature fusion unit 730 configured to fuse the time-frequency domain features, nonlinear features, and time-series features of the vital sign signal with the time-domain features and spatial features of the face of the subject to be detected, to obtain the fused features of the subject to be detected ;
- the fatigue detection unit 740 is configured to input the fused features of the person to be detected into a pre-trained classifier, recognize the fatigue state of the person, and judge whether the person is fatigued; based on the fused features, the classifier divides the state of the person to be detected into three states: alert, normal and fatigued; both the alert and normal states are non-fatigue states.
- the classifier training unit 750 is used to determine training samples, the training samples including the fused features of multiple trainers; the fused features of each trainer include the trainer's millimeter-wave radar features and facial video features; the millimeter-wave radar features include the time-frequency domain, nonlinear and temporal features of the vital sign signals; the facial video features include the time-domain and spatial-domain features of the face; a state label is added to the data set corresponding to the fused features of each trainer to form the training data set of each trainer; the state label indicates the state of the trainer corresponding to the fused features, the state being one of alert, normal and fatigued; the training data set of each trainer is input into the classifier, which is trained with the state labels therein to obtain the trained classifier.
- the accuracy rate increases from 0.698 to 0.752.
- the fatigue detection method and system proposed by the present invention, in which nonlinear features are introduced for the millimeter-wave signal and the millimeter-wave temporal features are further combined with the video features, can accurately identify the fatigue state, with a recognition accuracy of 0.979.
Abstract
A non-contact fatigue detection method and system. The method mainly comprises: sending a millimeter-wave radar signal to a person to be detected, receiving the echo signal reflected from that person, and determining the time-frequency domain features, nonlinear features and temporal features of the vital sign signals (S101); acquiring a facial video image of the person to be detected and performing face detection and alignment based on the facial video image to extract the time-domain and spatial-domain features of the face of the person to be detected (S102); fusing the time-frequency domain, nonlinear and temporal features of the vital sign signals with the time-domain and spatial-domain features of the face to obtain fused features (S103); and inputting the fused features into a classifier to recognize the fatigue state of the person to be detected and judge whether the person is fatigued (S104). The method fuses the two detection technologies, thereby effectively suppressing interference from subjective and objective factors and improving the accuracy of fatigue detection.
Description
The invention belongs to the field of information technology and, more specifically, relates to a non-contact fatigue detection method and system.
Non-contact fatigue state detection has gradually been applied in many scenarios in academia and industry, such as driving fatigue detection and learning fatigue detection. Most existing fatigue state detection is based on video image processing technology, i.e., judging the fatigue state by extracting facial image features from video, such as blink frequency and eye contour changes. Although this technology achieves a relatively high accuracy, it still has many defects: it is easily affected by objective factors such as dim light, uneven illumination, and face deflection or tilt, which lead to detection errors; and it is easily fooled by the subject, for example by pretending to close the eyes or faking facial expressions to conceal the true state, which greatly interferes with the test.
Meanwhile, because physiological signals are unique and stable, fatigue state detection based on physiological signals has been developing continuously. In general, physiological-signal-based fatigue detection falls into two categories: (1) extracting physiological signals with wearable devices, such as electrocardiography, photoplethysmography and electroencephalography; as a contact detection method, this approach is inconvenient to carry in daily life; (2) extracting physiological signals such as respiration and heart rate with millimeter-wave radar; as a non-contact detection method, this approach has gradually received widespread attention in the automotive industry. Fatigue detection based on millimeter-wave radar has the advantages of low power consumption and high precision, but problems remain: in signal acquisition, millimeter-wave radar is easily disturbed by environmental noise and the subject's body movements, which existing methods cannot properly resolve; in signal processing, existing methods are often limited to computing time-frequency domain features such as peak-to-peak intervals, without attention to nonlinear and temporal-sequence characteristics.
Summary of the Invention
In view of the defects of the prior art, the purpose of the present invention is to provide a non-contact fatigue detection method and system, aiming to solve the problems that existing fatigue detection based on video image processing technology is easily disturbed by objective environmental factors and subjective human factors, and that existing fatigue detection technology based on millimeter-wave radar is easily disturbed by environmental noise and the subject's body movements.
To achieve the above purpose, in a first aspect, the present invention provides a non-contact fatigue detection method, comprising the following steps:
sending a millimeter-wave radar signal to the person to be detected and receiving the echo signal reflected from the person to be detected; performing clutter suppression and echo selection on the echo signal, extracting the vital sign signals of the person to be detected, and determining the time-frequency domain features, nonlinear features and temporal features of the vital sign signals; the vital sign signals include a respiratory signal and a heart rate signal;
acquiring a facial video image of the person to be detected, and performing face detection and alignment based on the facial video image to extract the time-domain features and spatial-domain features of the face of the person to be detected;
fusing the time-frequency domain features, nonlinear features and temporal features of the vital sign signals with the time-domain features and spatial-domain features of the face of the person to be detected to obtain the fused features of the person to be detected;
inputting the fused features of the person to be detected into a pre-trained classifier to recognize the fatigue state of the person to be detected and judge whether the person is fatigued; based on the fused features, the classifier divides the state of the person to be detected into three states: alert, normal and fatigued; both the alert and normal states are non-fatigue states.
In an optional example, the face detection and alignment based on the facial video image to extract the time-domain features and spatial-domain features of the face of the person to be detected is specifically:
performing face detection on the facial video image, extracting facial feature points, and obtaining a facial feature point sequence;
based on the facial feature point sequence, calculating the position of the face center point from the point positions of the eye and eyebrow regions among the feature points, and calibrating and aligning the face in the current facial video image by an affine transformation;
resizing the aligned facial video images to a preset size, grouping every L frames into a frame sequence and, following the Temporal Segment Network processing flow, dividing the frame sequence into K parts; randomly selecting one frame from each part as a final input frame to obtain a K-frame sequence and generate a data set; L and K are both integers greater than 0;
inputting the data set into the residual network ResNet50 to extract the spatial features of the facial video image;
using a mixed attention module to extract inter-frame correlation features, the mixed attention module consisting of two parts, a self-attention module and a spatial attention module; specifically: inputting the extracted spatial features into the self-attention module to extract single-frame correlation features; inputting the single-frame correlation features into the spatial attention module to extract spatial correlation features between adjacent frames; fusing the spatial features with the single-frame correlation features and the inter-frame spatial correlation features, and inputting the fused features into a gated recurrent unit (GRU) to extract the temporal features of the facial video image;
inputting the spatial features and temporal features of the facial video image into a fully connected layer, and using the parameters of the fully connected layer to represent the spatial-domain features and time-domain features of the face of the person to be detected.
In an optional example, performing clutter suppression and echo selection on the echo signal, extracting the vital sign signals of the person to be detected, and determining the time-frequency domain features, nonlinear features and temporal features of the vital sign signals is specifically:
performing waveform reconstruction on the echo signal, specifically: using wavelet band-pass filtering to remove noise and extracting the respiratory signal and the heart rate signal respectively as the vital sign signals;
using time-frequency analysis and nonlinear analysis techniques to extract the time-domain features, frequency-domain features and nonlinear features of the vital sign signals respectively; for the respiratory signal, the extracted time-domain, frequency-domain and nonlinear features include the mean, variance, power spectral density, fractal dimension and approximate entropy; for the heart rate signal, the extracted time-domain features include single-beat and multi-beat features; the extracted frequency-domain features include the low-frequency component, high-frequency component, low-frequency/high-frequency ratio, and the kurtosis and skewness of the spectrum; the extracted nonlinear features include approximate entropy, sample entropy, Li exponent, Hurst exponent and detrended fluctuation index; the purpose of the single-beat features is to extract the instantaneous variation of each heartbeat; the purpose of the multi-beat features and the frequency-domain features is to extract the long-term variation over multiple heartbeats; the purpose of the nonlinear features is to further extract the nonlinear variation of the heart rate; the nonlinear features are strongly correlated with the fatigue state and can improve the accuracy of the classifier in recognizing the fatigue state;
using deep learning techniques to extract temporal features: first, setting sub-sliding windows within the detection window and extracting the time-domain, frequency-domain and nonlinear features of the vital sign signals within each sub-sliding window; second, feeding the extracted features in chronological order into a model combining a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM), and extracting the features of its fully connected layer as the temporal features of the vital sign signals;
based on statistical analysis and machine learning, selecting from the extracted features those with relatively high correlation with fatigue state classification as the final time-frequency domain features, nonlinear features and temporal features of the vital sign signals.
In an optional example, fusing the time-frequency domain features, nonlinear features and temporal features of the vital sign signals with the time-domain features and spatial-domain features of the face of the person to be detected to obtain the fused features of the person to be detected is specifically:
using polynomial feature generation and deep feature synthesis techniques to fuse the time-frequency domain, nonlinear and temporal features of the vital sign signals of the sliding detection window and its sub-windows with the time-domain and spatial-domain features of the face of the person to be detected, obtaining preliminary fused features;
merging the preliminary fused features with the time-frequency domain, nonlinear and temporal features of the vital sign signals and the time-domain and spatial-domain features of the face of the person to be detected to obtain merged features;
using a Transformer model to screen the merged features of the sliding detection window and its sub-windows based on an attention mechanism; for the sliding window, performing feature selection based on the attention mechanism; for the sub-windows, feeding the relevant features in chronological order into a Transformer sequence model and then performing feature selection based on the attention mechanism; combining the features selected from the sliding window and the sub-windows to obtain the fused features of the person to be detected.
In an optional example, the classifier is trained through the following steps:
determining training samples, the training samples including fused features of multiple trainers; the fused features of each trainer include the trainer's millimeter-wave radar features and facial video features; the millimeter-wave radar features include the time-frequency domain features, nonlinear features and temporal features of the vital sign signals; the facial video features include the time-domain features and spatial-domain features of the face;
adding a state label to the data set corresponding to the fused features of each trainer to form the training data set of each trainer; the state label indicates the state of the trainer corresponding to the fused features; the state of the trainer is one of the three states of alert, normal and fatigued;
inputting the training data set of each trainer into the classifier, training the classifier with the state labels therein, and obtaining the trained classifier.
In a second aspect, the present invention provides a non-contact fatigue detection system, comprising:
a millimeter-wave feature determination unit, configured to send a millimeter-wave radar signal to the person to be detected and receive the echo signal reflected from the person to be detected; to perform clutter suppression and echo selection on the echo signal, extract the vital sign signals of the person to be detected, and determine the time-frequency domain features, nonlinear features and temporal features of the vital sign signals; the vital sign signals include a respiratory signal and a heart rate signal;
a facial video feature determination unit, configured to acquire a facial video image of the person to be detected and perform face detection and alignment based on the facial video image, so as to extract the time-domain features and spatial-domain features of the face of the person to be detected;
a feature fusion unit, configured to fuse the time-frequency domain features, nonlinear features and temporal features of the vital sign signals with the time-domain features and spatial-domain features of the face of the person to be detected to obtain the fused features of the person to be detected;
a fatigue detection unit, configured to input the fused features of the person to be detected into a pre-trained classifier to recognize the fatigue state of the person to be detected and judge whether the person is fatigued; based on the fused features, the classifier divides the state of the person to be detected into three states: alert, normal and fatigued; both the alert and normal states are non-fatigue states.
In an optional example, the facial video feature determination unit performs face detection on the facial video image, extracts facial feature points, and obtains a facial feature point sequence; based on the facial feature point sequence, it calculates the position of the face center point from the point positions of the eye and eyebrow regions among the feature points, and calibrates and aligns the face in the current facial video image by an affine transformation; it resizes the aligned facial video images to a preset size, groups every L frames into a frame sequence and, following the Temporal Segment Network processing flow, divides the frame sequence into K parts, randomly selects one frame from each part as a final input frame to obtain a K-frame sequence and generate a data set, where L and K are both integers greater than 0; it inputs the data set into the residual network ResNet50 to extract the spatial features of the facial video image; it uses a mixed attention module, consisting of a self-attention module and a spatial attention module, to extract inter-frame correlation features, specifically: it inputs the extracted spatial features into the self-attention module to extract single-frame correlation features, inputs the single-frame correlation features into the spatial attention module to extract spatial correlation features between adjacent frames, fuses the spatial features with the single-frame correlation features and the inter-frame spatial correlation features, and inputs the fused features into a gated recurrent unit (GRU) to extract the temporal features of the facial video image; and it inputs the spatial and temporal features of the facial video image into a fully connected layer and uses the parameters of the fully connected layer to represent the spatial-domain and time-domain features of the face of the person to be detected.
In an optional example, the millimeter-wave feature determination unit performs waveform reconstruction on the echo signal, specifically: using wavelet band-pass filtering to remove noise and extracting the respiratory signal and the heart rate signal respectively as the vital sign signals; it uses time-frequency analysis and nonlinear analysis techniques to extract the time-domain, frequency-domain and nonlinear features of the vital sign signals respectively; for the respiratory signal, the extracted time-frequency domain and nonlinear features include the mean, variance, power spectral density, fractal dimension and approximate entropy; for the heart rate signal, the extracted time-domain features include single-beat and multi-beat features, the extracted frequency-domain features include the low-frequency component, high-frequency component, low-frequency/high-frequency ratio, and the kurtosis and skewness of the spectrum, and the extracted nonlinear features include approximate entropy, sample entropy, Li exponent, Hurst exponent and detrended fluctuation index; the purpose of the single-beat features is to extract the instantaneous variation of each heartbeat, the purpose of the multi-beat features and the frequency-domain features is to extract the long-term variation over multiple heartbeats, and the purpose of the nonlinear features is to further extract the nonlinear variation of the heart rate; the nonlinear features are strongly correlated with the fatigue state and can improve the accuracy of the classifier in recognizing the fatigue state; it uses deep learning techniques to extract temporal features: first, sub-sliding windows are set within the detection window and the time-domain, frequency-domain and nonlinear features of the vital sign signals within each sub-sliding window are extracted; second, the extracted features are fed in chronological order into a model combining a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM), and the features of its fully connected layer are extracted as the temporal features of the vital sign signals; and based on statistical analysis and machine learning, features with relatively high correlation with fatigue state classification are selected from the extracted features as the final time-frequency domain features, nonlinear features and temporal features of the vital sign signals.
In an optional example, the feature fusion unit uses polynomial feature generation and deep feature synthesis techniques to fuse the time-frequency domain, nonlinear and temporal features of the vital sign signals of the sliding detection window and its sub-windows with the time-domain and spatial-domain features of the face of the person to be detected, obtaining preliminary fused features; it merges the preliminary fused features with the time-frequency domain, nonlinear and temporal features of the vital sign signals and the time-domain and spatial-domain features of the face of the person to be detected, obtaining merged features; it uses a Transformer model to screen the merged features of the sliding detection window and its sub-windows based on an attention mechanism, where feature selection for the sliding window is based on the attention mechanism, and for the sub-windows the relevant features are fed in chronological order into a Transformer sequence model before attention-based feature selection; and it combines the features selected from the sliding window and the sub-windows to obtain the fused features of the person to be detected.
In an optional example, the system further comprises a classifier training unit, configured to determine training samples, the training samples including fused features of multiple trainers; the fused features of each trainer include the trainer's millimeter-wave radar features and facial video features; the millimeter-wave radar features include the time-frequency domain features, nonlinear features and temporal features of the vital sign signals; the facial video features include the time-domain features and spatial-domain features of the face; a state label is added to the data set corresponding to the fused features of each trainer to form the training data set of each trainer; the state label indicates the state of the trainer corresponding to the fused features, the state being one of alert, normal and fatigued; the training data set of each trainer is input into the classifier, the classifier is trained with the state labels therein, and the trained classifier is obtained.
In general, compared with the prior art, the above technical solutions conceived by the present invention have the following beneficial effects:
The present invention provides a non-contact fatigue detection method and system that simultaneously collect millimeter-wave data and video image data of the person to be detected. In the millimeter-wave radar detection part, a millimeter-wave transceiver module sends a millimeter-wave radar signal to the subject and collects its echo signal, from which vital sign signals such as respiration and heart rate are extracted and their related features are computed; in the video image detection part, a video acquisition device continuously captures the subject's facial information and extracts related features. Finally, the features extracted by the two approaches are fused and fatigue detection is performed on this basis. Fusing the two technologies effectively suppresses interference from subjective and objective factors; the temporal and spatial features of the video images are extracted with a mixed attention mechanism, and the nonlinear and temporal features of the millimeter-wave radar signal are extracted through nonlinear analysis and deep learning, further improving the accuracy of fatigue detection. The method performs fatigue detection with non-contact technology, is highly flexible, makes up for the deficiencies of a single detection technology, and improves the robustness of detection.
Fig. 1 is a flow chart of the non-contact fatigue detection method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of the feature fusion part provided by an embodiment of the present invention;
Fig. 3 is a block diagram of the non-contact fatigue detection system provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the sliding window provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of the CNN+BiLSTM model provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the single-beat waveform of the heart rate signal and its characteristic points provided by an embodiment of the present invention;
Fig. 7 is an architecture diagram of the non-contact fatigue detection system provided by an embodiment of the present invention.
为了使本发明的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本发明进行进一步详细说明。应当理解,此处所描述的具体实施例仅用以解释本发明,并不用于限定本发明。
Fig. 1 is a flowchart of the non-contact fatigue detection method provided by an embodiment of the present invention; as shown in Fig. 1, it includes the following steps:
S101: transmit a millimeter-wave radar signal to the subject and receive the echo signal reflected from the subject; after clutter suppression and echo selection of the echo signal, extract the subject's vital sign signals and determine the time-frequency-domain, nonlinear and temporal features of the vital sign signals; the vital sign signals include a respiratory signal and a heart rate signal.
Specifically: transmit a millimeter-wave radar signal to the subject and receive the echo signal reflected from the subject; after clutter suppression and echo selection of the echo signal, extract the subject's vital sign signals (including respiration and heart rate); extract the corresponding time-frequency-domain, nonlinear and temporal features from these vital sign signals; perform feature fusion and select typical features related to the fatigue state.
S102: acquire facial video images of the subject and perform face detection and alignment on them so as to extract the time-domain and spatial-domain features of the subject's face.
Specifically: obtain the face position in the video images and extract facial landmark points; align the facial landmarks and generate the input dataset; use a ResNet network to extract the spatial features of the video data; use a hybrid attention module to extract single-frame correlation features and inter-frame correlation features and perform feature fusion; feed the fused features into a GRU unit to extract temporal features and output the features to a fully connected layer, which characterizes the extracted spatial-domain and time-domain features of the face.
S103: fuse the time-frequency-domain, nonlinear and temporal features of the vital sign signals with the time-domain and spatial-domain features of the subject's face to obtain the fused subject features.
It will be appreciated that the features obtained by the millimeter-wave radar and by video image processing are fused; the flow is shown in Fig. 2. If the millimeter wave is abnormal, for example because of environmental interference or because the subject keeps moving during the test, no millimeter-wave features are output, the millimeter-wave data in the detection window are discarded, and fatigue detection is based on the video images. If the video images are abnormal and no face can be detected, for example because the light is dim or the face is tilted or turned away, no video image features are output, the video data in the detection window are discarded, and fatigue detection is based on the millimeter-wave radar. If neither the millimeter wave nor the video images detect the subject, no features are available for the subsequent computation, the absence of a subject is reported, and the loop restarts so that fatigue detection continues in the next detection window. If the detection results of both modalities are normal, the features extracted by millimeter-wave detection and by video image detection are used jointly, and fatigue detection is performed on the fused features.
S104: feed the fused subject features into the pre-trained classifier to recognize the subject's fatigue state and determine whether the subject is fatigued; based on the fused subject features, the classifier divides the subject's state into three states: alert, normal and fatigued, where both the alert and normal states are non-fatigued states.
In a specific embodiment, step S101 is implemented as follows: transmit a millimeter-wave radar signal to the subject; receive the echo signal reflected from the subject; perform clutter suppression and echo selection on the echo signal and extract the subject's vital sign signals; and compute the time-frequency-domain, nonlinear and temporal features of the vital sign signals within each sliding detection window.
In a specific embodiment, step S102 is implemented as follows: capture the subject's video images in real time; extract facial landmark points; align the face; generate the input dataset; feed it into a ResNet network for spatial feature extraction; feed it into the hybrid attention module for inter-frame correlation feature extraction; feed it into a GRU unit for temporal feature extraction; and feed the features into a fully connected layer.
The present invention provides a non-contact fatigue detection method and system with high reliability, strong robustness, low power consumption and good convenience. Its detection principle is as follows. First, the detection system monitors millimeter-wave data and video image data simultaneously: the millimeter-wave module transmits a low-power millimeter wave to the subject, detects the echo produced when this signal is reflected from the body (e.g., the chest), extracts vital sign signals such as heart rate and respiration from it, and computes their time-frequency-domain, nonlinear and temporal features. Second, for the video image data, face detection, facial landmark extraction and face alignment are performed, and the time-domain and spatial-domain feature information is extracted on that basis. Finally, fatigue detection is performed by a classifier on the basis of the fused millimeter-wave and video image features.
Fig. 3 is a block diagram of the non-contact fatigue detection system provided by an embodiment of the present invention; as shown in Fig. 3, it mainly includes:
(I) Millimeter-wave radar part
The millimeter-wave radar part mainly includes: (1) a millimeter-wave transceiver module; (2) a real-time signal processing module; (3) a feature extraction module.
(1) The millimeter-wave transceiver module transmits millimeter waves and receives the millimeter-wave echo signals. Specifically, the transmitter generates a linear frequency-modulated signal which, after power amplification, is emitted by the transmitting antenna as chirps (linear frequency-modulated pulses) of period T_f, forming a sawtooth wave of modulation bandwidth B whose frame period (i.e., the sawtooth repetition period, each frame containing multiple chirps) is T_i. The receiving antenna at the receiver detects the echo signals reflected from the various objects and human bodies in the environment and pre-processes them.
(2) The real-time signal processing module performs real-time acquisition and processing of the echo signal and extracts the heart rate and respiratory signals. It mainly comprises four steps: real-time acquisition of the echo signal, clutter suppression, echo selection and waveform reconstruction. The processing is as follows:
(2-1) Real-time acquisition. A Socket module listens on a UDP port, captures the UDP packets in real time and saves the raw data on the host computer.
(2-2) Clutter suppression. The millimeter-wave echo signal may contain various kinds of clutter. Adaptive background subtraction and singular value decomposition are used, respectively, to remove the stationary noise reflected from static objects such as desks and walls and the non-stationary noise reflected from moving objects.
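A minimal sketch of this step, assuming the echo is arranged as an (n_frames, n_samples) matrix and using illustrative parameter values, might look as follows:

```python
import numpy as np

def suppress_clutter(echo, alpha=0.95, n_static_components=1):
    """Illustrative clutter suppression for an (n_frames, n_samples) echo matrix.

    Combines a simple adaptive background subtraction (exponential moving
    average over past frames) with SVD-based removal of the dominant,
    quasi-static component. Parameter values are illustrative only.
    """
    echo = np.asarray(echo, dtype=float)
    # Adaptive background subtraction: track a slowly varying background per range bin.
    background = np.zeros(echo.shape[1])
    filtered = np.empty_like(echo)
    for i, frame in enumerate(echo):
        background = alpha * background + (1 - alpha) * frame
        filtered[i] = frame - background
    # SVD: zero out the leading singular components, which capture stationary reflectors.
    u, s, vt = np.linalg.svd(filtered, full_matrices=False)
    s[:n_static_components] = 0.0
    return u @ np.diag(s) @ vt
```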
(2-3) Echo selection. The distance at which the subject is located is precisely determined, and the column of the echo signal matrix Q representing that range bin is selected; it contains the raw heart rate and respiratory signals of the subject. Specifically, first, a Fourier transform is applied to each row of the echo signal to obtain an N×M range matrix R, where N is the number of frames and M is the number of sampling points per chirp; each column of R represents one range bin. Next, the energy sum in each range bin is computed. Third, the column m_max with the maximum energy sum is found; the range bin it represents corresponds to the distance between the subject and the fatigue detection system. Fourth, the m_max-th column of the matrix Q is extracted, the phase is computed with the arctangent function and phase unwrapping is performed.
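The echo selection and phase extraction chain can be sketched with NumPy as below; the matrix shapes follow the description above, while the function name and defaults are illustrative:

```python
import numpy as np

def extract_phase(echo, n_range_bins=None):
    """Range-FFT each frame, pick the strongest range bin, return the unwrapped phase.

    `echo` is an (N_frames, M_samples) complex matrix holding one chirp per frame.
    """
    # Range matrix R: FFT along the fast-time (sample) axis of each row.
    R = np.fft.fft(echo, n=n_range_bins, axis=1)
    # Energy per range bin (column), summed over frames.
    energy = np.sum(np.abs(R) ** 2, axis=0)
    m_max = int(np.argmax(energy))       # range bin of the subject
    slow_time = R[:, m_max]              # slow-time signal at that range bin
    # Arctangent demodulation followed by phase unwrapping.
    phase = np.unwrap(np.angle(slow_time))
    return phase, m_max
```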
(2-4) Waveform reconstruction. Wavelet band-pass filtering removes the noise, and the respiratory and heart rate signals are extracted separately; the pass bands [f_L, f_H] for respiration and heart rate are [0.1, 0.6] Hz and [0.8, 2.5] Hz, respectively.
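As an illustration of the band separation, the sketch below uses a Butterworth band-pass as a stand-in for the wavelet filter described above; the pass bands are those given in the description:

```python
from scipy.signal import butter, filtfilt

def bandpass(x, fs, f_lo, f_hi, order=4):
    """Simple Butterworth band-pass used here in place of the wavelet band-pass."""
    b, a = butter(order, [f_lo, f_hi], btype="bandpass", fs=fs)
    return filtfilt(b, a, x)

def split_vital_signs(phase, fs):
    # Pass bands follow the description: respiration 0.1-0.6 Hz, heart rate 0.8-2.5 Hz.
    respiration = bandpass(phase, fs, 0.1, 0.6)
    heartbeat = bandpass(phase, fs, 0.8, 2.5)
    return respiration, heartbeat
```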
(3) The feature extraction module combines time-frequency-domain analysis, nonlinear analysis and deep learning to extract the relevant features. In an optional embodiment, a sliding detection window (for example, a window of length 20 s with a step of 1 s) is used as a buffer, and the relevant features of the millimeter-wave data and the video image data within the window are extracted, as shown in Fig. 4. Specifically, millimeter-wave radar feature extraction consists of two steps: feature computation and feature selection.
(3-1) Feature computation. The features of the respiratory and heart rate signals within the sliding detection window are computed. First, time-frequency analysis and nonlinear analysis techniques are combined to extract the time-domain, frequency-domain and nonlinear features of the vital sign signals. For the respiratory signal, the extracted time-frequency-domain and nonlinear features include the mean, variance, power spectral density, fractal dimension and approximate entropy. Taking the heart rate signal as an example, its features are listed in Table 1 and mainly include time-domain features such as bpm, ibi, sdnn, sdsd, rmssd, pnn20 and pnn50; frequency-domain features such as the low-frequency component, high-frequency component and low-frequency to high-frequency ratio; and nonlinear features such as approximate entropy, sample entropy, the Lyapunov exponent and the Hurst exponent. Second, temporal features are extracted. Within the detection window, sub-sliding windows are set (for example, within the 20 s detection window, sub-sliding windows of length 10 s and step 1 s further subdivide the detection window); the time-domain, frequency-domain and nonlinear features of each sub-window are extracted and then fed in chronological order into the CNN+BiLSTM model, whose fully connected layer features quantify the dynamic changes of the heart rate and respiratory signals. The CNN+BiLSTM model, shown in Fig. 5, contains one CNN layer, two BiLSTM layers, one attention layer and two dense layers.
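A minimal PyTorch sketch of such a CNN+BiLSTM temporal-feature extractor (layer sizes are illustrative and not taken from the disclosure) is:

```python
import torch
import torch.nn as nn

class CnnBiLstm(nn.Module):
    """One 1-D convolution, two bidirectional LSTM layers, a simple additive
    attention layer and two fully connected layers, as described above."""

    def __init__(self, n_features, hidden=64, n_classes=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.bilstm = nn.LSTM(32, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)   # additive attention scores
        self.fc1 = nn.Linear(2 * hidden, 32)   # fc1 output serves as the temporal feature
        self.fc2 = nn.Linear(32, n_classes)

    def forward(self, x):                       # x: (batch, time, n_features)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        h, _ = self.bilstm(h)                   # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over time
        context = (w * h).sum(dim=1)            # attention-weighted summary
        feat = torch.relu(self.fc1(context))    # temporal feature vector
        return self.fc2(feat), feat
```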
Table 1: Heart rate signal features
Specifically, the single-beat features aim to capture the instantaneous variation of each heartbeat; the multi-beat and frequency-domain features aim to capture the long-term variation over multiple heartbeats; and the nonlinear features aim to further capture the nonlinear variation of the heart rate. The nonlinear features are strongly correlated with the fatigue state and can improve the fatigue-recognition accuracy of the classifier.
(3-2) Feature selection. Typical features related to the fatigue state are screened. First, on the basis of feature pre-processing, redundant features are removed and abnormal features are handled and standardized. Second, polynomial feature generation and deep feature synthesis are used to aggregate and fuse the time-frequency, nonlinear and temporal features. Finally, statistical analysis (e.g., PCA and recursive feature elimination) and machine learning (e.g., random forest feature selection) are combined to pre-select the features most correlated with the fatigue-state class labels.
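The selection stage could be sketched with scikit-learn as follows; the polynomial degree, forest size and number of retained features are illustrative assumptions, and PCA or recursive feature elimination could be slotted in the same way:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.ensemble import RandomForestClassifier

def select_features(X, y, n_keep=30):
    """Standardise, generate polynomial cross terms, then keep the features a
    random forest ranks highest with respect to the fatigue-state labels y."""
    X = StandardScaler().fit_transform(X)
    X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_poly, y)
    top = np.argsort(forest.feature_importances_)[::-1][:n_keep]
    return X_poly[:, top], top
```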
(II) Video image processing part
The video image processing part mainly includes: (1) a video capture module; (2) a real-time signal processing module; (3) a feature extraction module.
(1) The video capture module uses a video capture device to record the subject's video images in real time and streams the data back to the host computer, where it is saved for timely processing.
(2) The real-time signal processing module mainly comprises three steps: face detection, face alignment and dataset generation. The processing is as follows:
(2.1) Face detection. A facial landmark sequence is extracted; that is, in an optional embodiment, the face data in the video images are obtained and facial landmark points are extracted. First, the Haar feature extraction method detects grey-level changes in the image to locate the face region of interest (ROI), and the pixel coordinates inside the region are summed. Then the landmark algorithm in the dlib library extracts 68 facial landmark points (covering the eyebrows, eyes, nose, mouth and facial contour), yielding the facial landmark sequence p(t).
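A possible implementation of this step with OpenCV and dlib is sketched below; the detector parameters and the landmark-model path are assumptions, and the dlib 68-point predictor file must be obtained separately:

```python
import cv2
import dlib
import numpy as np

# Haar cascade bundled with OpenCV; the 68-point model path is an assumption.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
landmark_predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def face_landmarks(frame):
    """Return the 68 facial landmark points p(t) of the first detected face, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    rect = dlib.rectangle(int(x), int(y), int(x + w), int(y + h))
    shape = landmark_predictor(gray, rect)
    return np.array([(p.x, p.y) for p in shape.parts()])   # shape (68, 2)
```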
(2.2) Face alignment. Based on the facial landmark sequence, the position of the face center point is computed from the landmark points of the eye and eyebrow regions, and an affine transformation is used to calibrate and align the face in the current video sequence.
(2.3) Input dataset generation. First, the aligned face images are resized to 224×224 pixels; second, the fatigue-state label of each video image is encoded; then a frame sequence is formed from every group of L frames, L being the total number of video frames in the sliding window. Because video sampling rates differ (e.g., 25 fps or 30 fps), L varies; therefore, following the Temporal Segment Networks (TSN) processing pipeline, the video frames are divided into K parts and one frame is randomly selected from each part as a final input frame, and the resulting K-frame sequence is concatenated with the corresponding fatigue-state label to form an input dataset.
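The TSN-style segment sampling can be sketched as follows (the function name and its behaviour for very short windows are illustrative choices):

```python
import numpy as np

def sample_segments(frames, k):
    """Split the L frames of a window into K roughly equal segments and draw
    one random frame from each, giving a fixed-length K-frame sequence."""
    L = len(frames)
    bounds = np.linspace(0, L, k + 1, dtype=int)
    idx = [np.random.randint(lo, hi)
           for lo, hi in zip(bounds[:-1], bounds[1:]) if hi > lo]
    return [frames[i] for i in idx]
```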
(3) The feature extraction module extracts features from the dataset generated in the previous steps, and the classifier provides the fatigue detection result. First, the dataset is fed into the residual network ResNet50 to extract the spatial features of the video sequence. Second, a hybrid attention module (composed of a self-attention module and a spatial attention module) extracts inter-frame correlation features: the extracted spatial features are fed into the self-attention module to extract single-frame correlation features, and the single-frame correlation features are fed into the spatial attention module to extract the spatial correlation features between adjacent frames; a feature fusion operation then combines the spatial features with the single-frame correlation features and the adjacent-frame correlation features. Next, the fused features are fed into a GRU unit to extract the temporal features of the video sequence. Finally, the feature vector is reshaped and fed into a fully connected layer, whose parameters characterize the temporal and spatial properties of the video sequence.
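A PyTorch sketch of this video branch might look as below; a multi-head self-attention layer and a simple sigmoid gate stand in for the hybrid attention module, the backbone is not pre-loaded with weights, and all layer sizes are illustrative:

```python
import torch
import torch.nn as nn
from torchvision import models

class VideoFatigueFeatures(nn.Module):
    """Per-frame ResNet50 spatial features, a simplified hybrid attention block,
    a GRU for temporal features and a fully connected output layer."""

    def __init__(self, feat_dim=2048, hidden=256, out_dim=128):
        super().__init__()
        backbone = models.resnet50(weights=None)          # pretrained weights optional
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the classifier
        self.self_attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.spatial_attn = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, out_dim)

    def forward(self, clip):                   # clip: (batch, K, 3, 224, 224)
        b, k = clip.shape[:2]
        spatial = self.cnn(clip.flatten(0, 1)).flatten(1).reshape(b, k, -1)
        intra, _ = self.self_attn(spatial, spatial, spatial)  # single-frame relations
        inter = self.spatial_attn(intra) * intra              # adjacent-frame gating
        fused = spatial + intra + inter                       # fuse the three streams
        temporal, _ = self.gru(fused)
        return self.fc(temporal[:, -1])        # spatio-temporal face feature
```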
(III) The technology fusion part includes: (1) algorithm design; (2) feature fusion; (3) fatigue detection.
(1) Algorithm design: when the millimeter-wave technique and the video image technique are fused, the current test state is assessed. (i) If the video images are abnormal (e.g., the light is too dim or the face is tilted or turned away) and no face can be detected, then, with no video features output and the video data in the sliding detection window discarded, fatigue detection is based on the millimeter-wave radar features. (ii) If the millimeter wave is abnormal (e.g., the subject keeps moving during the test or there is other strong interference), then, with no millimeter-wave features output and the millimeter-wave data in the sliding detection window discarded, fatigue detection is based on the video image features. (iii) If both the video and the millimeter wave are abnormal, a detection anomaly or the absence of a target is reported and the loop restarts to continue monitoring. (iv) If both the video and the millimeter wave are normal, fatigue-state recognition is performed by the classifier on the basis of the fused features of the two.
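The decision logic of this step can be summarised in a short sketch; the classifier objects and the fuse_features helper are hypothetical placeholders that follow a scikit-learn-style predict interface:

```python
def detect_fatigue(radar_feats, video_feats, clf_fused, clf_radar, clf_video):
    """Fall back to whichever modality is still valid in the current window."""
    if radar_feats is None and video_feats is None:
        return "no subject detected"             # try again in the next window
    if radar_feats is None:                      # radar disturbed (motion, interference)
        return clf_video.predict([video_feats])[0]
    if video_feats is None:                      # no face (dim light, head turned away)
        return clf_radar.predict([radar_feats])[0]
    fused = fuse_features(radar_feats, video_feats)   # hypothetical fusion helper
    return clf_fused.predict([fused])[0]
```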
(2) Feature fusion: first, polynomial feature generation and deep feature synthesis are used to fuse the millimeter-wave features and the video image features of the sliding detection window and of its sub-sliding windows, achieving a preliminary fusion of the features of the two techniques. The preliminarily fused features are then merged with the millimeter-wave features and the video image features to form the merged features. Second, a Transformer model screens the merged features of the sliding detection window and of its sub-windows separately on the basis of an attention mechanism: for the sliding window, feature selection is performed by the attention mechanism, while for the sub-windows, the relevant features are fed in chronological order into a Transformer temporal model and feature selection is then performed by the attention mechanism. The features selected for the sliding window and for the sub-windows are merged to obtain the fused features.
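One way to sketch the attention-based screening of the sub-window sequence is a small Transformer encoder over the merged sub-window feature vectors; the dimensions and the mean pooling are illustrative choices:

```python
import torch.nn as nn

class SubWindowFusion(nn.Module):
    """The merged feature vectors of the sub-windows form a sequence, a
    Transformer encoder weighs them by attention, and the pooled output is
    the fused representation of the detection window."""

    def __init__(self, feat_dim=128, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):          # x: (batch, n_subwindows, feat_dim)
        return self.encoder(x).mean(dim=1)
```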
(3) Fatigue detection: a three-class model is built on a Transformer to recognize the three states alert, normal and fatigued. The experiments use accuracy, the confusion matrix, the ROC curve and the AUC as evaluation metrics for fatigue detection; the larger the accuracy and the AUC, the better the recognition, and the confusion matrix shows the prediction accuracy for each class.
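These metrics can be computed with scikit-learn, for example as below; y_score is assumed to be the per-class probability matrix produced by the three-class model:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

def evaluate(y_true, y_pred, y_score):
    """Accuracy, per-class confusion matrix and one-vs-rest AUC for the
    alert / normal / fatigued classifier."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
        "auc_ovr": roc_auc_score(y_true, y_score, multi_class="ovr"),
    }
```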
Specifically, the three-class model, i.e., the classifier, is trained as follows:
Training samples are determined; the training samples comprise the fused features of a number of trainees, and the fused features of each trainee include that trainee's millimeter-wave radar features and facial video features, where the millimeter-wave radar features include the time-frequency-domain, nonlinear and temporal features of the vital sign signals and the facial video features include the time-domain and spatial-domain features of the face.
A state label is added to the dataset corresponding to each trainee's fused features to form that trainee's training dataset; the state label indicates the trainee's state corresponding to the fused features, the state being one of the three states alert, normal and fatigued.
Each trainee's training dataset is fed into the classifier, which is trained with the state labels to obtain the trained classifier.
Fig. 7 is an architecture diagram of the non-contact fatigue detection system provided by an embodiment of the present invention; as shown in Fig. 7, it includes:
a millimeter-wave feature determination unit 710, configured to transmit a millimeter-wave radar signal to the subject and receive the echo signal reflected from the subject; extract the subject's vital sign signals after clutter suppression and echo selection of the echo signal, and determine the time-frequency-domain, nonlinear and temporal features of the vital sign signals; the vital sign signals include a respiratory signal and a heart rate signal;
a facial video feature determination unit 720, configured to acquire facial video images of the subject and perform face detection and alignment on them so as to extract the time-domain and spatial-domain features of the subject's face;
a feature fusion unit 730, configured to fuse the time-frequency-domain, nonlinear and temporal features of the vital sign signals with the time-domain and spatial-domain features of the subject's face to obtain the fused subject features;
a fatigue detection unit 740, configured to feed the fused subject features into the pre-trained classifier to recognize the subject's fatigue state and determine whether the subject is fatigued; based on the fused subject features, the classifier divides the subject's state into three states: alert, normal and fatigued, where both the alert and normal states are non-fatigued states; and
a classifier training unit 750, configured to determine training samples comprising the fused features of a number of trainees, each trainee's fused features including that trainee's millimeter-wave radar features and facial video features, where the millimeter-wave radar features include the time-frequency-domain, nonlinear and temporal features of the vital sign signals and the facial video features include the time-domain and spatial-domain features of the face; to add a state label to the dataset corresponding to each trainee's fused features to form that trainee's training dataset, the state label indicating the trainee's state corresponding to the fused features, the state being one of the three states alert, normal and fatigued; and to feed each trainee's training dataset into the classifier, which is trained with the state labels to obtain the trained classifier.
For the detailed functions of the units in Fig. 7, reference may be made to the description in the foregoing method embodiments, which is not repeated here.
To verify the reliability of the proposed non-contact fatigue detection method and system, twelve subjects were recruited for this embodiment, and each took part in two ten-minute tests, one in a fatigued state and one in a non-fatigued state. Before each test, the subjects filled in the Karolinska Sleepiness Scale to assess their fatigue level; during the tests, a mobile phone and a millimeter-wave device were used to collect the video and millimeter-wave data, respectively, for fatigue detection. The detection results are shown in Table 2. Table 2 shows, first, that introducing the millimeter-wave nonlinear features significantly improves the recognition accuracy of fatigue detection: accuracy, precision, F1 score and AUC all increase by more than 0.05. Taking accuracy as an example, it rises from 0.698 to 0.752 with the introduction of the nonlinear features. Second, the proposed fatigue detection method and system, which introduce the millimeter-wave nonlinear features and further combine the millimeter-wave temporal features with the video features, can recognize the fatigue state accurately, reaching a recognition accuracy of 0.979.
Table 2: Comparison of fatigue detection results (average over 10-fold cross-validation)
Those skilled in the art will readily understand that the above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
- A non-contact fatigue detection method, characterized by comprising the following steps: transmitting a millimeter-wave radar signal to a subject and receiving the echo signal reflected from the subject; extracting the subject's vital sign signals after performing clutter suppression and echo selection on the echo signal, and determining time-frequency-domain features, nonlinear features and temporal features of the vital sign signals, the vital sign signals including a respiratory signal and a heart rate signal; acquiring facial video images of the subject and performing face detection and alignment on the facial video images so as to extract time-domain and spatial-domain features of the subject's face; fusing the time-frequency-domain, nonlinear and temporal features of the vital sign signals with the time-domain and spatial-domain features of the subject's face to obtain fused subject features; and feeding the fused subject features into a pre-trained classifier to recognize the subject's fatigue state and determine whether the subject is fatigued, the classifier dividing the subject's state, on the basis of the fused subject features, into three states: alert, normal and fatigued, where both the alert and normal states are non-fatigued states.
- The non-contact fatigue detection method according to claim 1, characterized in that performing face detection and alignment on the facial video images so as to extract the time-domain and spatial-domain features of the subject's face comprises: performing face detection on the facial video images, extracting facial landmark points and obtaining a facial landmark sequence; based on the facial landmark sequence, computing the position of the face center point from the landmark points of the eye and eyebrow regions and using an affine transformation to calibrate and align the face in the current facial video image; resizing the aligned facial video images to a preset size, forming a frame sequence from every group of L frames and, following the temporal segment network processing pipeline, dividing the frame sequence into K parts and randomly selecting one frame from each part as a final input frame, obtaining a K-frame sequence from which a dataset is generated, where L and K are integers greater than 0; feeding the dataset into the residual network ResNet50 to extract the spatial features of the facial video images; extracting inter-frame correlation features with a hybrid attention module composed of a self-attention module and a spatial attention module, specifically: feeding the extracted spatial features into the self-attention module to extract single-frame correlation features, feeding the single-frame correlation features into the spatial attention module to extract the spatial correlation features between adjacent frames, fusing the spatial features with the single-frame correlation features and the adjacent-frame spatial correlation features, and feeding the fused features into a gated recurrent unit GRU to extract the temporal features of the facial video images; and feeding the spatial and temporal features of the facial video images into a fully connected layer whose parameters characterize the spatial-domain and time-domain features of the subject's face.
- The non-contact fatigue detection method according to claim 1, characterized in that extracting the subject's vital sign signals after performing clutter suppression and echo selection on the echo signal and determining the time-frequency-domain, nonlinear and temporal features of the vital sign signals comprises: performing waveform reconstruction on the echo signal, specifically: using wavelet band-pass filtering to remove noise and extracting the respiratory signal and the heart rate signal separately as the vital sign signals; extracting the time-domain, frequency-domain and nonlinear features of the vital sign signals with time-frequency analysis and nonlinear analysis techniques, where for the respiratory signal the extracted time-domain, frequency-domain and nonlinear features include the mean, variance, power spectral density, fractal dimension and approximate entropy; for the heart rate signal the extracted time-domain features include single-beat features and multi-beat features, the extracted frequency-domain features include the low-frequency component, high-frequency component, low-frequency to high-frequency ratio, and the kurtosis and skewness of the spectrum, and the extracted nonlinear features include approximate entropy, sample entropy, the Lyapunov exponent, the Hurst exponent and the detrended fluctuation index; the single-beat features aim to capture the instantaneous variation of each heartbeat, the multi-beat and frequency-domain features aim to capture the long-term variation over multiple heartbeats, and the nonlinear features aim to further capture the nonlinear variation of the heart rate, the nonlinear features being strongly correlated with the fatigue state and able to improve the fatigue-recognition accuracy of the classifier; extracting temporal features with deep learning: first, setting sub-sliding windows within the detection window and extracting the time-domain, frequency-domain and nonlinear features of the vital sign signals in each sub-sliding window; second, feeding the extracted features in chronological order into a model combining a convolutional neural network CNN and a bidirectional long short-term memory network BiLSTM, and taking the features of its fully connected layer as the temporal features of the vital sign signals; and, based on statistical analysis and machine learning, selecting from the extracted features those with relatively high correlation to the fatigue-state classes as the final time-frequency-domain, nonlinear and temporal features of the vital sign signals.
- The non-contact fatigue detection method according to claim 1, characterized in that fusing the time-frequency-domain, nonlinear and temporal features of the vital sign signals with the time-domain and spatial-domain features of the subject's face to obtain the fused subject features comprises: using polynomial feature generation and deep feature synthesis to fuse, for the sliding detection window and its sub-windows, the time-frequency-domain, nonlinear and temporal features of the vital sign signals with the time-domain and spatial-domain features of the subject's face, obtaining preliminary fused features; merging the preliminary fused features with the time-frequency-domain, nonlinear and temporal features of the vital sign signals and the time-domain and spatial-domain features of the subject's face to obtain merged features; screening, with a Transformer model and on the basis of an attention mechanism, the merged features of the sliding detection window and of its sub-windows separately, where for the sliding window feature selection is performed by the attention mechanism and for the sub-windows the relevant features are fed in chronological order into a Transformer temporal model and feature selection is then performed by the attention mechanism; and merging the features selected for the sliding window and for the sub-windows to obtain the fused subject features.
- The non-contact fatigue detection method according to any one of claims 1 to 4, characterized in that the classifier is trained as follows: determining training samples, the training samples comprising the fused features of a number of trainees, each trainee's fused features including that trainee's millimeter-wave radar features and facial video features, where the millimeter-wave radar features include the time-frequency-domain, nonlinear and temporal features of the vital sign signals and the facial video features include the time-domain and spatial-domain features of the face; adding a state label to the dataset corresponding to each trainee's fused features to form that trainee's training dataset, the state label indicating the trainee's state corresponding to the fused features, the state being one of the three states alert, normal and fatigued; and feeding each trainee's training dataset into the classifier, which is trained with the state labels to obtain the trained classifier.
- A non-contact fatigue detection system, characterized by comprising: a millimeter-wave feature determination unit, configured to transmit a millimeter-wave radar signal to a subject and receive the echo signal reflected from the subject, extract the subject's vital sign signals after performing clutter suppression and echo selection on the echo signal, and determine time-frequency-domain features, nonlinear features and temporal features of the vital sign signals, the vital sign signals including a respiratory signal and a heart rate signal; a facial video feature determination unit, configured to acquire facial video images of the subject and perform face detection and alignment on the facial video images so as to extract time-domain and spatial-domain features of the subject's face; a feature fusion unit, configured to fuse the time-frequency-domain, nonlinear and temporal features of the vital sign signals with the time-domain and spatial-domain features of the subject's face to obtain fused subject features; and a fatigue detection unit, configured to feed the fused subject features into a pre-trained classifier to recognize the subject's fatigue state and determine whether the subject is fatigued, the classifier dividing the subject's state, on the basis of the fused subject features, into three states: alert, normal and fatigued, where both the alert and normal states are non-fatigued states.
- The non-contact fatigue detection system according to claim 6, characterized in that the facial video feature determination unit performs face detection on the facial video images, extracts facial landmark points and obtains a facial landmark sequence; based on the facial landmark sequence, computes the position of the face center point from the landmark points of the eye and eyebrow regions and uses an affine transformation to calibrate and align the face in the current facial video image; resizes the aligned facial video images to a preset size, forms a frame sequence from every group of L frames and, following the temporal segment network processing pipeline, divides the frame sequence into K parts and randomly selects one frame from each part as a final input frame, obtaining a K-frame sequence from which a dataset is generated, where L and K are integers greater than 0; feeds the dataset into the residual network ResNet50 to extract the spatial features of the facial video images; extracts inter-frame correlation features with a hybrid attention module composed of a self-attention module and a spatial attention module, specifically: feeds the extracted spatial features into the self-attention module to extract single-frame correlation features, feeds the single-frame correlation features into the spatial attention module to extract the spatial correlation features between adjacent frames, fuses the spatial features with the single-frame correlation features and the adjacent-frame spatial correlation features, and feeds the fused features into a gated recurrent unit GRU to extract the temporal features of the facial video images; and feeds the spatial and temporal features of the facial video images into a fully connected layer whose parameters characterize the spatial-domain and time-domain features of the subject's face.
- The non-contact fatigue detection system according to claim 6, characterized in that the millimeter-wave feature determination unit performs waveform reconstruction on the echo signal, specifically: uses wavelet band-pass filtering to remove noise and extracts the respiratory signal and the heart rate signal separately as the vital sign signals; extracts the time-domain, frequency-domain and nonlinear features of the vital sign signals with time-frequency analysis and nonlinear analysis techniques, where for the respiratory signal the extracted time-domain, frequency-domain and nonlinear features include the mean, variance, power spectral density, fractal dimension and approximate entropy; for the heart rate signal the extracted time-domain features include single-beat features and multi-beat features, the extracted frequency-domain features include the low-frequency component, high-frequency component, low-frequency to high-frequency ratio, and the kurtosis and skewness of the spectrum, and the extracted nonlinear features include approximate entropy, sample entropy, the Lyapunov exponent, the Hurst exponent and the detrended fluctuation index; the single-beat features aim to capture the instantaneous variation of each heartbeat, the multi-beat and frequency-domain features aim to capture the long-term variation over multiple heartbeats, and the nonlinear features aim to further capture the nonlinear variation of the heart rate, the nonlinear features being strongly correlated with the fatigue state and able to improve the fatigue-recognition accuracy of the classifier; extracts temporal features with deep learning: first, sets sub-sliding windows within the detection window and extracts the time-domain, frequency-domain and nonlinear features of the vital sign signals in each sub-sliding window; second, feeds the extracted features in chronological order into a model combining a convolutional neural network CNN and a bidirectional long short-term memory network BiLSTM and takes the features of its fully connected layer as the temporal features of the vital sign signals; and, based on statistical analysis and machine learning, selects from the extracted features those with relatively high correlation to the fatigue-state classes as the final time-frequency-domain, nonlinear and temporal features of the vital sign signals.
- The non-contact fatigue detection system according to claim 6, characterized in that the feature fusion unit uses polynomial feature generation and deep feature synthesis to fuse, for the sliding detection window and its sub-windows, the time-frequency-domain, nonlinear and temporal features of the vital sign signals with the time-domain and spatial-domain features of the subject's face, obtaining preliminary fused features; merges the preliminary fused features with the time-frequency-domain, nonlinear and temporal features of the vital sign signals and the time-domain and spatial-domain features of the subject's face to obtain merged features; screens, with a Transformer model and on the basis of an attention mechanism, the merged features of the sliding detection window and of its sub-windows separately, where for the sliding window feature selection is performed by the attention mechanism and for the sub-windows the relevant features are fed in chronological order into a Transformer temporal model and feature selection is then performed by the attention mechanism; and merges the features selected for the sliding window and for the sub-windows to obtain the fused subject features.
- The non-contact fatigue detection system according to any one of claims 6 to 9, characterized by further comprising a classifier training unit configured to determine training samples, the training samples comprising the fused features of a number of trainees, each trainee's fused features including that trainee's millimeter-wave radar features and facial video features, where the millimeter-wave radar features include the time-frequency-domain, nonlinear and temporal features of the vital sign signals and the facial video features include the time-domain and spatial-domain features of the face; to add a state label to the dataset corresponding to each trainee's fused features to form that trainee's training dataset, the state label indicating the trainee's state corresponding to the fused features, the state being one of the three states alert, normal and fatigued; and to feed each trainee's training dataset into the classifier, which is trained with the state labels to obtain the trained classifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/038,989 US20240023884A1 (en) | 2021-06-11 | 2021-06-23 | Non-contact fatigue detection method and system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110652542.7A CN113420624B (zh) | 2021-06-11 | 2021-06-11 | 一种非接触式疲劳检测方法及系统 |
CN202110652542.7 | 2021-06-11 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022257187A1 true WO2022257187A1 (zh) | 2022-12-15 |
Family
ID=77788356
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/101744 WO2022257187A1 (zh) | 2021-06-11 | 2021-06-23 | 一种非接触式疲劳检测方法及系统 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20240023884A1 (zh) |
CN (1) | CN113420624B (zh) |
WO (1) | WO2022257187A1 (zh) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12122392B2 (en) | 2021-08-24 | 2024-10-22 | Nvidia Corporation | Context-based state estimation |
US11830259B2 (en) * | 2021-08-24 | 2023-11-28 | Nvidia Corporation | Robust state estimation |
CN114052692B (zh) * | 2021-10-26 | 2024-01-16 | 珠海脉动时代健康科技有限公司 | 基于毫米波雷达的心率分析方法及设备 |
US20240245314A1 (en) * | 2021-11-01 | 2024-07-25 | Honor Device Co., Ltd. | Vital Sign Detection Method and Electronic Device |
CN114052740B (zh) * | 2021-11-29 | 2022-12-30 | 中国科学技术大学 | 基于毫米波雷达的非接触心电图监测方法 |
CN114259255B (zh) * | 2021-12-06 | 2023-12-08 | 深圳信息职业技术学院 | 一种基于频域信号与时域信号的模态融合胎心率分类方法 |
CN114202794B (zh) * | 2022-02-17 | 2022-11-25 | 之江实验室 | 一种基于人脸ppg信号的疲劳检测方法和装置 |
CN114469178A (zh) * | 2022-02-25 | 2022-05-13 | 大连理工大学 | 一种可应用于智能手机的基于声波信号的眨眼检测方法 |
CN114343661B (zh) * | 2022-03-07 | 2022-05-27 | 西南交通大学 | 高铁司机反应时间估计方法、装置、设备及可读存储介质 |
CN114821713B (zh) * | 2022-04-08 | 2023-04-07 | 湖南大学 | 一种基于Video Transformer的疲劳驾驶检测方法 |
CN114936203B (zh) * | 2022-05-20 | 2023-04-07 | 北京思路智园科技有限公司 | 基于时序数据和业务数据融合分析的方法 |
CN114781465B (zh) * | 2022-06-20 | 2022-08-30 | 华中师范大学 | 一种基于rPPG的非接触式疲劳检测系统及方法 |
CN115721294B (zh) * | 2022-11-24 | 2023-09-12 | 北京金茂绿建科技有限公司 | 基于毫米波感知的呼吸监测方法、装置、电子设备和介质 |
CN118709050B (zh) * | 2024-08-28 | 2024-10-29 | 安徽大学 | 一种空间异构融合的鲸类信号分类方法及系统 |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170036541A1 (en) * | 2014-04-14 | 2017-02-09 | Novelic D.O.O. | Mm-wave radar driver fatigue sensor apparatus |
US10292585B1 (en) * | 2016-12-23 | 2019-05-21 | X Development Llc | Mental state measurement using sensors attached to non-wearable objects |
CN110115592A (zh) * | 2018-02-07 | 2019-08-13 | 英飞凌科技股份有限公司 | 使用毫米波雷达传感器确定人的参与水平的系统和方法 |
CN111166357A (zh) * | 2020-01-06 | 2020-05-19 | 四川宇然智荟科技有限公司 | 多传感器融合的疲劳监测装置系统及其监测方法 |
CN111329455A (zh) * | 2020-03-18 | 2020-06-26 | 南京润楠医疗电子研究院有限公司 | 一种非接触式的心血管健康评估方法 |
CN112401863A (zh) * | 2020-11-19 | 2021-02-26 | 华中师范大学 | 一种基于毫米波雷达的非接触式实时生命体征监测系统及方法 |
CN112418095A (zh) * | 2020-11-24 | 2021-02-26 | 华中师范大学 | 一种结合注意力机制的面部表情识别方法及系统 |
CN112686094A (zh) * | 2020-12-03 | 2021-04-20 | 华中师范大学 | 一种基于毫米波雷达的非接触式身份识别方法及系统 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI436305B (zh) * | 2011-07-26 | 2014-05-01 | Holux Technology Inc | 疲勞度偵測方法及其裝置 |
CN103714660B (zh) * | 2013-12-26 | 2017-02-08 | 苏州清研微视电子科技有限公司 | 基于图像处理融合心率特征与表情特征实现疲劳驾驶判别的系统 |
CN110200640B (zh) * | 2019-05-14 | 2022-02-18 | 南京理工大学 | 基于双模态传感器的非接触式情绪识别方法 |
CN112381011B (zh) * | 2020-11-18 | 2023-08-22 | 中国科学院自动化研究所 | 基于人脸图像的非接触式心率测量方法、系统及装置 |
- 2021-06-11: CN CN202110652542.7A patent/CN113420624B/zh active Active
- 2021-06-23: US US18/038,989 patent/US20240023884A1/en active Pending
- 2021-06-23: WO PCT/CN2021/101744 patent/WO2022257187A1/zh active Application Filing
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230012177A1 (en) * | 2021-07-07 | 2023-01-12 | The Bank Of New York Mellon | System and methods for generating optimal data predictions in real-time for time series data signals |
CN116458852A (zh) * | 2023-06-16 | 2023-07-21 | 山东协和学院 | 基于云平台及下肢康复机器人的康复训练系统及方法 |
CN116458852B (zh) * | 2023-06-16 | 2023-09-01 | 山东协和学院 | 基于云平台及下肢康复机器人的康复训练系统及方法 |
CN116776130A (zh) * | 2023-08-23 | 2023-09-19 | 成都新欣神风电子科技有限公司 | 一种用于异常电路信号的检测方法及装置 |
CN116933145A (zh) * | 2023-09-18 | 2023-10-24 | 北京交通大学 | 工业设备中的部件的故障确定方法及相关设备 |
CN116933145B (zh) * | 2023-09-18 | 2023-12-01 | 北京交通大学 | 工业设备中的部件的故障确定方法及相关设备 |
CN118817924A (zh) * | 2024-09-18 | 2024-10-22 | 常州赛格电子仪器有限公司 | 基于多模态信息的油色谱油样动态检测方法及系统 |
Also Published As
Publication number | Publication date |
---|---|
CN113420624A (zh) | 2021-09-21 |
US20240023884A1 (en) | 2024-01-25 |
CN113420624B (zh) | 2022-04-26 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21944693; Country of ref document: EP; Kind code of ref document: A1
| WWE | Wipo information: entry into national phase | Ref document number: 18038989; Country of ref document: US
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21944693; Country of ref document: EP; Kind code of ref document: A1