WO2020051781A1 - Systems and methods for drowsiness detection - Google Patents

Systems and methods for drowsiness detection

Info

Publication number
WO2020051781A1
Authority
WO
WIPO (PCT)
Prior art keywords
eye
yawning
training
blinking
driver
Prior art date
Application number
PCT/CN2018/105132
Other languages
English (en)
Inventor
Guangda YU
Original Assignee
Beijing Didi Infinity Technology And Development Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology And Development Co., Ltd.
Priority to MX2021002807A priority Critical patent/MX2021002807A/es
Priority to PCT/CN2018/105132 priority patent/WO2020051781A1/fr
Priority to BR112021004647-0A priority patent/BR112021004647B1/pt
Priority to CN201880001325.8A priority patent/CN111052127A/zh
Publication of WO2020051781A1 publication Critical patent/WO2020051781A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/59Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34Route searching; Route guidance
    • G01C21/36Input/output arrangements for on-board computers
    • G01C21/3697Output of additional, non-guidance related information, e.g. low fuel level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris

Definitions

  • the present disclosure generally relates to systems and methods for user behavior, and in particular, to systems and methods for detecting drowsiness of a user.
  • the existing methods of drowsy driving detection include analyzing images of the driver and determining that the driver is sleepy if the driver's eyes are frequently kept closed.
  • the reliability of assessment of the driver’s sleepy condition may deteriorate if there is noise in the driver’s images, given that the eyes of the driver may appear small in these images. It is desirable to provide systems and methods for detecting drowsiness of the driver more reliably and efficiently.
  • a system may include at least one computer-readable storage device including a set of instructions for determining a degree of drowsiness of a driver, and at least one processor in communication with the at least one computer-readable storage device.
  • the at least one processor may be configured to cause the system to receive a plurality of video frames from a camera.
  • the at least one processor may be also configured to cause the system to detect a face of a driver in the plurality of video frames, and extract the detected faces in the plurality of video frames.
  • the at least one processor may be further configured to cause the system to obtain a trained eye-blinking detection model and a trained yawning detection model.
  • the at least one processor may be further configured to cause the system to generate an eye-blinking detection result by inputting the extracted faces into the trained eye-blinking detection model, and generate a yawning detection result by inputting the extracted faces into the trained yawning detection model.
  • the at least one processor may be further configured to cause the system to determine a degree of drowsiness of the driver based on the eye-blinking detection result and the yawning detection result.
  • the at least one processor may be further configured to cause the system to generate a notification based on the degree of drowsiness.
  • the trained eye-blinking detection model may be generated by a process for training an eye-blinking detection model.
  • the process may include obtaining a preliminary eye-blinking detection model; obtaining a plurality of training face samples; classifying the plurality of training face samples into a set of positive eye-blink training samples and a set of negative eye-blink training samples, wherein eyes in each of the positive eye-blink training samples blink and eyes in each of the negative eye-blink training samples do not blink; and training the preliminary eye-blinking detection model based on the set of positive eye-blink training samples and the set of negative eye-blink training samples to generate the trained eye-blinking detection model.
  • the trained yawning detection model may be generated by a process for training a yawning detection model.
  • the process may include obtaining a preliminary yawning detection model; obtaining a plurality of training face samples; classifying the plurality of training face samples into a set of positive yawn training samples and a set of negative yawn training samples, wherein a person of the face in each of the positive yawn training samples yawns and a person of the face in each of the negative yawn training samples does not yawn; and training the preliminary yawning detection model based on the set of positive yawn training samples and the set of negative yawn training samples to generate the trained yawning detection model.
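  • In other words, both training processes described above amount to supervised binary classification over labeled face samples. The following is a minimal sketch of that idea in Python with PyTorch, assuming tensor-shaped face samples and an unspecified model architecture; it is an illustration, not the implementation claimed in this disclosure.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train_binary_detector(model: nn.Module,
                          positive_samples: torch.Tensor,
                          negative_samples: torch.Tensor,
                          epochs: int = 10) -> nn.Module:
    """Train a preliminary detection model (blink or yawn) on positive and
    negative face training samples, as described above."""
    # Label positive samples 1 (behavior present) and negative samples 0.
    x = torch.cat([positive_samples, negative_samples])
    y = torch.cat([torch.ones(len(positive_samples)),
                   torch.zeros(len(negative_samples))])
    loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for faces, labels in loader:
            optimizer.zero_grad()
            logits = model(faces).squeeze(1)   # one logit per face sample
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()
    return model
```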
  • the at least one processor may be further configured to cause the system to determine, for at least one video frame of the plurality of video frames, whether an angle between a direction perpendicular to the face of the driver in the at least one video frame and a capturing direction of the camera is greater than a threshold.
  • the at least one processor may be further configured to cause the system to discard the at least one video frame from the plurality of video frames to be inputted into the trained eye-blinking detection model or the trained yawning detection model.
  • the at least one processor may be configured to cause the system to input the extracted face in each of the plurality of video frames into a trained angle determination model to generate a result indicating whether the angle between the direction perpendicular to the face of the driver and the capturing direction of the camera is greater than the threshold.
  • the trained angle determination model may be generated by a process for training an angle determination model.
  • the process may include obtaining a preliminary angle determination model; obtaining a plurality of training face samples; classifying the plurality of training face samples into a set of positive angle training samples and a set of negative angle training samples, wherein the angle between the direction perpendicular to a face of a driver and the capturing direction of a camera in each positive angle training sample is greater than 60 degrees, and the angle between the direction perpendicular to the face of the driver and the capturing direction of the camera in each negative angle training sample is less than or equal to 60 degrees; and training the preliminary angle determination model based on the set of positive angle training samples and the set of negative angle training samples to generate the trained angle determination model.
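  • A hedged sketch of how such a trained angle determination model might be applied to screen frames before blink or yawn detection follows; the model interface (a callable returning the probability that the angle exceeds the threshold) is an assumption for illustration.

```python
def filter_frames_by_head_pose(extracted_faces, angle_model, threshold_prob=0.5):
    """Discard frames in which the trained angle determination model predicts
    that the angle between the face normal and the camera capturing direction
    exceeds the threshold (e.g., 60 degrees)."""
    kept = []
    for face in extracted_faces:
        # Assumed interface: probability that the angle exceeds the threshold.
        p_over_threshold = angle_model(face)
        if p_over_threshold <= threshold_prob:
            kept.append(face)   # face is sufficiently frontal; keep the frame
        # otherwise the frame is discarded, as described above
    return kept
```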
  • the at least one processor may be further configured to cause the system to determine a frequency of eye-blinking by the driver based on the eye-blinking detection result; determine a frequency of yawning by the driver based on the yawning detection result; and determine the degree of drowsiness based on the frequency of eye-blinking and the frequency of yawning.
  • the at least one processor may be further configured to cause the system to determine a number count of one or more blinks by the driver detected in the plurality of video frames based on the eye-blinking detection result; determine a total time length of the plurality of video frames; and determine the frequency of eye-blinking based on a number count of the one or more blinks and the total time length of the plurality of video frames.
  • the at least one processor may be further configured to cause the system to determine a number count of one or more yawns by the driver detected in the plurality of video frames based on the yawning detection result; determine a total time length of the plurality of video frames; and determine the frequency of yawning based on the number count of the one or more yawns of the driver and the total time length of the plurality of video frames.
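  • Both frequencies reduce to an event count divided by the total time length of the analyzed video frames. A minimal sketch, assuming a known frame rate:

```python
def event_frequency(event_count: int, num_frames: int, fps: float) -> float:
    """Frequency of blinks or yawns in events per minute, given the number of
    detected events and the total time length of the video frames."""
    total_seconds = num_frames / fps           # total time length of the frames
    return event_count / total_seconds * 60.0  # events per minute

# Example: 12 blinks and 2 yawns detected over 1800 frames at 30 fps (1 minute).
blink_freq = event_frequency(12, 1800, 30.0)   # 12 blinks per minute
yawn_freq = event_frequency(2, 1800, 30.0)     # 2 yawns per minute
```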
  • the notification may include at least one of a sound, a vibration, or a light.
  • a loudness or frequency of the sound, a strength of the vibration, or an intensity or frequency of the light may depend on the degree of drowsiness.
  • a method for determining a degree of drowsiness of a driver may include receiving a plurality of video frames from a camera. The method may also include detecting a face of a driver in the plurality of video frames. The method may further include extracting the detected faces in the plurality of video frames. The method may further include obtaining a trained eye-blinking detection model and a trained yawning detection model. The method may further include generating an eye-blinking detection result by inputting the extracted faces into the trained eye-blinking detection model, and generating a yawning detection result by inputting the extracted faces into the trained yawning detection model. The method may further include determining a degree of drowsiness of the driver based on the eye-blinking detection result and the yawning detection result. The method may further include generating a notification based on the degree of drowsiness.
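  • Read as a whole, the method above is a per-frame face pipeline followed by an aggregation and notification step. The following Python sketch only illustrates that flow; every function it calls (face_detector, blink_model, yawn_model, score_drowsiness, generate_notification) is a hypothetical placeholder for the corresponding component described in this disclosure, not an actual API.

```python
def detect_drowsiness(video_frames, face_detector, blink_model, yawn_model, fps):
    """Hedged sketch of the overall method: detect and extract faces, run the
    trained blink and yawn models, then score drowsiness and notify."""
    faces = []
    for frame in video_frames:
        face = face_detector(frame)          # detect the driver's face
        if face is not None:
            faces.append(face)               # extract (crop) the detected face

    blink_result = blink_model(faces)        # eye-blinking detection result
    yawn_result = yawn_model(faces)          # yawning detection result

    degree = score_drowsiness(blink_result, yawn_result, len(faces), fps)
    return generate_notification(degree)     # e.g., sound, vibration, or light
```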
  • a system for determining a degree of drowsiness of a driver may include an acquisition module, a detection module, an extraction module, a generation module and a determination module.
  • the acquisition module may be configured to receive a plurality of video frames from a camera, and obtain a trained eye-blinking detection model and a trained yawning detection model.
  • the detection module may be configured to detect a face of a driver in the plurality of video frames.
  • the extraction module may be configured to extract the detected faces in the plurality of video frames.
  • the generation module may be configured to generate an eye-blinking detection result by inputting the extracted faces into the trained eye-blinking detection model, generate a yawning detection result by inputting the extracted faces into the trained yawning detection model, and generate a notification based on a degree of drowsiness.
  • the determination module may be configured to determine the degree of drowsiness of the driver based on the eye-blinking detection result and the yawning detection result.
  • a non-transitory computer-readable storage medium embodying a computer program product.
  • the computer program product including instructions may be configured to cause a computing device to receive a plurality of video frames from a camera.
  • the computer program product including instructions may be also configured to cause the computing device to detect a face of a driver in the plurality of video frames.
  • the computer program product including instructions may be further configured to cause the computing device to extract the detected faces in the plurality of video frames.
  • the computer program product including instructions may be further configured to cause the computing device to obtain a trained eye-blinking detection model and a trained yawning detection model.
  • the computer program product including instructions may be further configured to cause the computing device to generate an eye-blinking detection result by inputting the extracted faces into the trained eye-blinking detection model, and generate a yawning detection result by inputting the extracted faces into the trained yawning detection model.
  • the computer program product including instructions may be further configured to cause the computing device to determine a degree of drowsiness of the driver based on the eye-blinking detection result and the yawning detection result.
  • the computer program product including instructions may be further configured to cause the computing device to generate a notification based on the degree of drowsiness.
  • FIG. 1 is a schematic diagram illustrating an exemplary drowsiness detection system according to some embodiments of the present disclosure
  • FIG. 2 is a schematic diagram illustrating exemplary hardware and/or software components of a computing device according to some embodiments of the present disclosure
  • FIG. 3 is a schematic diagram illustrating exemplary hardware and/or software components of a mobile device on which a terminal may be implemented according to some embodiments of the present disclosure
  • FIG. 4 is a block diagram illustrating an exemplary processing engine according to some embodiments of the present disclosure
  • FIG. 5 is a flowchart illustrating an exemplary process for determining a degree of drowsiness of a driver according to some embodiments of the present disclosure
  • FIG. 6 is a flowchart illustrating an exemplary process for determining a frequency of eye-blinking according to some embodiments of the present disclosure
  • FIG. 7 is a flowchart illustrating an exemplary process for determining a frequency of yawning according to some embodiments of the present disclosure
  • FIG. 8 is a schematic diagram illustrating an exemplary blink according to some embodiments of the present disclosure.
  • FIG. 9 is a schematic diagram illustrating an exemplary yawn according to some embodiments of the present disclosure.
  • FIG. 10 is a schematic diagram illustrating exemplary images of yawning according to some embodiments of the present disclosure.
  • FIG. 11 is a schematic diagram illustrating an exemplary model according to some embodiments of the present disclosure.
  • FIG. 12 is a schematic diagram illustrating an exemplary automobile data recorder for detecting drowsiness of a driver according to some embodiments of the present disclosure.
  • the flowcharts used in the present disclosure illustrate operations that systems implement according to some embodiments of the present disclosure. It is to be expressly understood that the operations of the flowcharts need not be implemented in the order shown. Conversely, the operations may be implemented in inverted order or simultaneously. Moreover, one or more other operations may be added to the flowcharts, and one or more operations may be removed from the flowcharts.
  • the systems and methods disclosed in the present disclosure are described primarily regarding driver drowsiness detection service, it should also be understood that this is only one exemplary embodiment.
  • the system or method of the present disclosure may be applied to any other kind of drowsiness detection.
  • the systems or methods of the present disclosure may be used to detect the degree of drowsiness of the student or the audience.
  • the systems or methods of the present disclosure may be used to detect the degree of drowsiness of the mechanical engineer or the operator.
  • the term “user” in the present disclosure may refer to an individual, an entity or a tool that may be detected in the drowsiness detection system.
  • the user may be a driver, an operator, a student, a worker, or the like, or a combination thereof.
  • the terms “drowsiness” and “degree of drowsiness” may be used herein interchangeably.
  • An aspect of the present disclosure relates to systems and methods for detecting drowsy driving of a driver.
  • the drowsy driving of the driver may be assessed based on the degree of drowsiness of the driver.
  • the degree of the drowsiness may be determined based on an eye-blinking detection result and a yawning detection result.
  • the eye-blinking detection result may be generated by inputting face image (s) of the driver to a trained eye-blinking detection model.
  • the yawning detection result may be generated by inputting the face image (s) of the driver to a trained yawning detection model.
  • the face image (s) of the driver may be extracted from a plurality of real-time video frames that include the face of the driver.
  • a notification (e.g., a sound, a vibration, or a light) may be generated based on the degree of drowsiness.
  • the type and/or strength of the notification may depend on the degree of drowsiness.
  • FIG. 1 is a schematic diagram illustrating an exemplary drowsiness detection system 100 according to some embodiments of the present disclosure.
  • the drowsiness detection system 100 may be used to detect a drowsy driving of a driver.
  • the drowsiness detection system 100 may be used to determine a degree of drowsiness of a mechanical engineer or operator who is working on a production line.
  • the drowsiness detection system 100 may be used to determine a degree of drowsiness of a student (or an audience) that attends a class (or a lecture) .
  • the drowsiness detection system 100 may be mounted on a vehicle or a component thereof (e.g., an automobile data recorder in a vehicle) .
  • the drowsiness detection system 100 may include a processing device 110, a terminal 120, an image capturing device 130, a network 140 and a storage 150.
  • the processing device 110 may be a single processing device or a processing device group.
  • the processing device group may be centralized, or distributed (e.g., the processing device 110 may be a distributed system) .
  • the processing device 110 may be local or remote.
  • the processing device 110 may access information and/or data stored in the terminal 120, the image capturing device 130, and/or the storage 150 via the network 140.
  • the processing device 110 may be directly connected to the terminal 120, the image capturing device 130, and/or the storage 150 to access stored information and/or data.
  • the processing device 110 may be implemented on a cloud platform.
  • the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or a combination thereof.
  • the processing device 110 may be implemented on a computing device 200 having one or more components illustrated in FIG. 2 in the present disclosure.
  • the processing device 110 may include a processing engine 112.
  • the processing engine 112 may process information and/or data related to the drowsiness detection to perform one or more functions described in the present disclosure. For example, the processing engine 112 may determine a degree of drowsiness of the driver based on an eye-blinking detection result and a yawning detection result.
  • the processing engine 112 may include one or more processing engines (e.g., single-core processing engine (s) or multi-core processor (s) ) .
  • the processing engine 112 may include a central processing unit (CPU) , an application-specific integrated circuit (ASIC) , an application-specific instruction-set processor (ASIP) , a graphics processing unit (GPU) , a physics processing unit (PPU) , a digital signal processor (DSP) , a field-programmable gate array (FPGA) , a programmable logic device (PLD) , a controller, a microcontroller unit, a reduced instruction-set computer (RISC) , a microprocessor, or the like, or a combination thereof.
  • the terminal 120 may include a tablet computer 120-1, a laptop computer 120-2, a built-in device in a vehicle 120-3, a mobile device 120-4, or the like, or a combination thereof.
  • the mobile device 120-4 may include a smart home device, a wearable device, a smart mobile device, an augmented reality device, or the like, or a combination thereof.
  • the wearable device may include a smart bracelet, a smart footgear, smart glasses, a smart helmet, a smart watch, smart clothing, a smart backpack, a smart accessory, or the like, or a combination thereof.
  • the smart mobile device may include a smartphone, a personal digital assistant (PDA) , a gaming device, a navigation device, a point of sale (POS) device, or the like, or a combination thereof.
  • the built-in device in the vehicle 120-3 may include an onboard computer, an automobile data recorder, an onboard human-computer interaction (HCI) system, an onboard television, etc.
  • the terminal 120 may be a device with positioning technology for locating the position of the terminal 120.
  • the terminal 120 may generate a notification based on the degree of drowsiness determined by the processing engine 112.
  • the image capturing device 130 may be configured to capture an image of one or more objects.
  • the image may include a still picture, a motion picture, a video (offline or live streaming) , a frame of a video (or referred to as a video frame) , or a combination thereof.
  • the one or more objects may be static or moving.
  • the one or more objects may be an animal, a human being (a driver, an operator, a student, a worker) or a portion thereof (face) , goods, or the like, or a combination thereof.
  • the image capturing device 130 may include an automobile data recorder 130-1, a dome camera 130-2, a fixed camera 130-3, or the like, or a combination thereof.
  • the automobile data recorder 130-1 may be mounted in a vehicle and generally configured to record a road condition around the vehicle when the driver is driving.
  • the automobile data recorder 130-1 may include a camera oriented toward the face of the driver to capture images of the face of the driver.
  • the imaging capturing device 130 may be combined with the terminal 120 (e.g., the mobile device 120-4) .
  • a mobile device of a driver may include a camera that may capture images of the face of the driver.
  • the camera of the mobile device may simultaneously capture images of the face of the driver.
  • the drowsiness detection method described in the present disclosure may be implemented by the mobile device or processors thereof to determine a degree of drowsiness of the driver based on the captured images.
  • the mobile device may further include a component for generating a notification (e.g., a screen, a loudspeaker, a vibrating component) to generate a notification based on the determined degree of drowsiness of the driver.
  • the notification may include a sound, a vibration, a light, or the like, or a combination thereof.
  • the network 140 may facilitate exchange of information and/or data.
  • one or more components of the drowsiness detection system 100 (e.g., the processing device 110, the terminal 120, the image capturing device 130, or the storage 150) may exchange information and/or data with each other via the network 140.
  • the processing device 110 or the processing engine 112 may receive a plurality of video frames from the image capturing device 130 via the network 140.
  • the processing device 110 (or the processing engine 112) may send a notification to the terminal 120 via the network 140.
  • the network 140 may be any type of wired or wireless network, or combination thereof.
  • the network 140 may include a cable network, a wireline network, an optical fiber network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public telephone switched network (PSTN), a Bluetooth network, a ZigBee network, a near field communication (NFC) network, or the like, or a combination thereof.
  • the network 140 may include one or more network access points.
  • the network 140 may include wired or wireless network access points such as base stations and/or internet exchange points 140-1, 140-2, through which one or more components of the drowsiness detection system 100 may be connected to the network 140 to exchange data and/or information.
  • the storage 150 may store data and/or instructions.
  • the storage 150 may store data obtained from the terminal 120 and/or the image capturing device 130.
  • the storage 150 may store a plurality of images of one or more objects captured from the image capturing device 130.
  • the storage 150 may store data and/or instructions that the processing device 110 may execute or use to perform exemplary methods described in the present disclosure.
  • the storage 150 may include mass storage, removable storage, volatile read-and-write memory, read-only memory (ROM), or the like, or a combination thereof.
  • Exemplary mass storage may include a magnetic disk, an optical disk, solid-state drives, etc.
  • Exemplary removable storage may include a flash drive, a floppy disk, an optical disk, a memory card, a zip disk, a magnetic tape, etc.
  • Exemplary volatile read-and-write memory may include random-access memory (RAM) .
  • Exemplary RAM may include a dynamic RAM (DRAM), a double data rate synchronous dynamic RAM (DDR SDRAM), a static RAM (SRAM), a thyristor RAM (T-RAM), and a zero-capacitor RAM (Z-RAM), etc.
  • Exemplary ROM may include a mask ROM (MROM) , a programmable ROM (PROM) , an erasable programmable ROM (EPROM) , an electrically-erasable programmable ROM (EEPROM) , a compact disk ROM (CD-ROM) , and a digital versatile disk ROM, etc.
  • the storage 150 may be implemented on a cloud platform.
  • the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or a combination thereof.
  • the storage 150 may be connected to the network 140 to communicate with one or more components of the drowsiness detection system 100 (e.g., the processing device 110, the terminal 120, or the image capturing device 130) .
  • One or more components of the drowsiness detection system 100 may access the data or instructions stored in the storage 150 via the network 140.
  • the storage 150 may be directly connected to or communicate with one or more components of the drowsiness detection system 100 (e.g., the processing device 110, the terminal 120, the image capturing device 130) .
  • the storage 150 may be part of the processing device 110.
  • one or more components of the drowsiness detection system 100 may have permissions to access the storage 150.
  • one or more components of the drowsiness detection system 100 may read and/or modify information when one or more conditions are met.
  • FIG. 2 is a schematic diagram illustrating exemplary hardware and/or software components of a computing device according to some embodiments of the present disclosure.
  • the processing device 110, the terminal 120, and/or the image capturing device 130 may be implemented on the computing device 200.
  • the processing engine 112 may be implemented on the computing device 200 and configured to perform functions of the processing engine 112 disclosed in this disclosure.
  • the computing device 200 may be used to implement any component of the drowsiness detection system 100 as described herein.
  • the processing engine 112 may be implemented on the computing device 200, via its hardware, software program, firmware, or a combination thereof.
  • although only one such computer is shown for convenience, the computer functions relating to the drowsiness detection as described herein may be implemented in a distributed fashion on a number of similar platforms to distribute the processing load.
  • the computing device 200 may include COM ports 250 connected to and from a network connected thereto to facilitate data communications.
  • the computing device 200 may also include a processor 220, in the form of one or more processors (e.g., logic circuits) , for executing program instructions.
  • the processor 220 may include interface circuits and processing circuits therein.
  • the interface circuits may be configured to receive electronic signals from a bus 210, wherein the electronic signals encode structured data and/or instructions for the processing circuits to process.
  • the processing circuits may conduct logic calculations, and then determine a conclusion, a result, and/or an instruction encoded as electronic signals. Then the interface circuits may send out the electronic signals from the processing circuits via the bus 210.
  • the computing device 200 may further include program storage and data storage of different forms including, for example, a disk 270, a read-only memory (ROM) 230, or a random access memory (RAM) 240, for various data files to be processed and/or transmitted by the computing device.
  • the exemplary computer platform may also include program instructions stored in the ROM 230, RAM 240, and/or another type of non-transitory storage medium to be executed by the processor 220.
  • the methods and/or processes of the present disclosure may be implemented as the program instructions.
  • the computing device 200 also includes an I/O component 260, supporting input/output between the computer and other components.
  • the computing device 200 may also receive programming and data via network communications.
  • step A and step B may also be performed by two different CPUs and/or processors jointly or separately in the computing device 200 (e.g., the first processor executes step A and the second processor executes step B, or the first and second processors jointly execute steps A and B) .
  • FIG. 3 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary mobile device 300 on which the terminal 120 and/or the image capturing device 130 may be implemented according to some embodiments of the present disclosure.
  • the mobile device 300 may include a communication platform 310, a display 320, a graphics processing unit (GPU) 330, a central processing unit (CPU) 340, an I/O 350, a memory 360, a mobile operating system (OS) 370, and a storage 390.
  • any other suitable component including but not limited to a system bus or a controller (not shown) , may also be included in the mobile device 300.
  • the mobile operating system 370 (e.g., iOS™, Android™, Windows Phone™, etc.) and one or more applications 380 may be loaded into the memory 360 from the storage 390 in order to be executed by the CPU 340.
  • the applications 380 may include a browser or any other suitable mobile apps for receiving and rendering information relating to processing or other information from the drowsiness detection system 100.
  • User interactions with the information stream may be achieved via the I/O 350 and provided to the processing engine 112 and/or other components of the drowsiness detection system 100 via the network 140.
  • a camera (not shown in the figure) of the mobile device 300 may capture image (s) of a face of a driver.
  • the CPU 340 of the mobile device 300 may determine a degree of drowsiness of the driver based on the captured video of the face of the driver. Then the mobile device 300 may generate a notification based on the degree of drowsiness of the driver.
  • the notification may include a sound from a loudspeaker (not shown in the figure) of the mobile device 300, a vibration from a vibrator (not shown in the figure) of the mobile device 300, a light from the display 320 (or LEDs not shown in the figure) , or the like, or a combination thereof.
  • in some embodiments, the generation of the notification may be realized by a device outside the mobile device 300.
  • the mobile device 300 may work merely as a processing device and control another device to generate the notification.
  • the mobile device 300 may receive an instruction of notification and work merely as a device for generating notifications.
  • computer hardware platforms may be used as the hardware platform (s) for one or more of the elements described herein.
  • a computer with user interface elements may be used to implement a personal computer (PC) or any other type of work station or terminal device.
  • a computer may also act as a system if appropriately programmed.
  • FIG. 4 is a block diagram illustrating an exemplary processing engine according to some embodiments of the present disclosure.
  • the processing engine 112 may include an acquisition module 410, a detection module 420, an extraction module 430, a generation module 440 and a determination module 450.
  • the acquisition module 410 may be configured to receive information.
  • the acquisition module 410 may receive a plurality of images.
  • the images may be video frames of a video captured by a camera via the network 140.
  • the processing engine 112 may receive a plurality of video frames from a camera.
  • the camera may be a network camera, a fixed camera, a dome camera, a covert camera, a Pan-Tilt-Zoom (PTZ) camera, an infrared camera, a thermal camera, or the like, or a combination thereof.
  • the plurality of video frames may correspond to a driver, especially, a face of the driver.
  • the acquisition module 410 may also receive a trained eye-blinking detection model and a trained yawning detection model from a storage device (e.g., the storage 150) via the network 140.
  • the trained eye-blinking detection model may be generated by training a preliminary eye-blinking detection model based on a plurality of training face samples.
  • the trained yawning detection model may be generated by training a preliminary yawning detection model based on a plurality of training face samples.
  • the training face samples used in training the preliminary eye-blinking detection model and the preliminary yawning detection model may be the same or different.
  • the detection module 420 may be configured to detect the face of a user in the plurality of images (e.g., video frames of a video) .
  • the user may be anyone who needs to be monitored.
  • the user may be a driver, an operator, a worker, a student, an audience, or the like, or a combination thereof.
  • the face detection may be implemented based on a template matching, a skin color segmentation, a geometric rule confirmation, or the like, or a combination thereof.
  • the face detection may be implemented based on a model that employs a singular value algorithm, a binary wavelet transformation algorithm, an AdaBoost algorithm, etc.
  • the extraction module 430 may be configured to extract the detected faces from the plurality of video frames.
  • the detected faces may be in the form of image, or pixel value.
  • the detected faces may include a plurality of face features including size, shape and/or a location of face outline, hair, lips, jaw, eyes, mouth, eyebrows, nose, or the like, or a combination thereof.
  • the generation module 440 may be configured to generate information associated with drowsiness detection. In some embodiments, the generation module 440 may generate detection result associated with drowsiness detection. For example, the generation module 440 may generate an eye-blinking detection result by inputting the extracted faces into the trained eye-blinking detecting model. As another example, the generation module 440 may generate a yawning detection result by inputting the extracted faces into the trained yawning detecting model. In some embodiments, the generation module 440 may generate a notification (or an instruction for notification) based on the eye-blinking detection result and the yawning detection result.
  • the determination module 450 may be configured to determine a degree of drowsiness. In some embodiments, the degree of drowsiness of the user may be determined based on the frequency of eye-blinking and the frequency of yawning. To determine the frequency of eye-blinking, the determination module 450 may determine the number of blinks by the user detected in a plurality of video frames based on the eye-blinking detection result and a total time length of the plurality of video frames. To determine the frequency of yawning, the determination module 450 may determine the number of yawns by the user detected in a plurality of video frames and a total time length of the plurality of video frames.
  • the processing engine 112 may further include a model training module (not shown in the figure) .
  • the model training module may be configured to train a preliminary eye-blinking detection model based on a plurality of training face samples to generate a trained eye-blinking detection model.
  • the model training module may also train a preliminary yawning detection model based on a plurality of training face samples to generate a trained yawning detection model.
  • the model training module may further train a preliminary angle determination model based on a plurality of training face samples to generate a trained angle determination model.
  • the angle determination model may be used to determine whether an angle between the direction perpendicular to the face of a user and the capturing direction of the camera is greater than a threshold (e.g., 30 degrees, 60 degrees, 100 degrees) .
  • the plurality of training face samples used in training the preliminary eye-blinking detection model, preliminary yawning detection model, and the preliminary angle determination model may be the same or different. Detailed descriptions of the exemplary process of training and using the model may be found elsewhere in this disclosure (e.g., FIG. 11 and the descriptions thereof) .
  • the modules in the processing engine 112 may be connected to or communicate with each other via a wired connection or a wireless connection.
  • the wired connection may include a metal cable, an optical cable, a hybrid cable, or the like, or a combination thereof.
  • the wireless connection may include a Local Area Network (LAN) , a Wide Area Network (WAN) , a Bluetooth, a ZigBee, a Near Field Communication (NFC) , or the like, or a combination thereof.
  • FIG. 5 is a flowchart illustrating an exemplary process for determining a degree of drowsiness of a driver according to some embodiments of the present disclosure.
  • the process 500 may be implemented as a set of instructions (e.g., an application) stored in the storage 390, ROM 230 or RAM 240.
  • the CPU 340, processor 220 and/or the modules in FIG. 4 may execute the set of instructions, and when executing the instructions, the CPU 340, the processor 220 and/or the modules in FIG. 4 may be configured to perform the process 500.
  • the operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 500 may be accomplished with one or more additional operations not described and/or without one or more of the operations herein discussed. Additionally, the order of the operations of the process as illustrated in FIG. 5 and described below is not intended to be limiting.
  • the processing engine 112 may receive a plurality of images.
  • the images may be video frames of a video captured by a camera.
  • the processing engine 112 may receive a plurality of video frames from a camera.
  • the camera may include but is not limited to a network camera, a fixed camera, a dome camera, a covert camera, a Pan–Tilt–Zoom (PTZ) camera, an infrared camera, a thermal camera, or the like, or a combination thereof.
  • the camera may be a standalone camera or integrated into a vehicle-mounted device (e.g., an automobile data recorder) .
  • the camera may be mounted on or part of a mobile device of a user.
  • the camera may capture a video of an object.
  • the video may include a still picture, a motion picture, an offline video, a live streaming video, or a combination thereof.
  • the object may be a user that is needed to be monitored (e.g., a driver, an operator, a student, a worker) .
  • the user may be a driver who is driving.
  • the user in order to reduce the risk of mechanical injury of a mechanical engineer or operator, the user may be the mechanical engineer or operator.
  • the user in order to raise the attention of a student (or an audience) , the user may be a student (or an audience) that attends a class (or a lecture) .
  • the plurality of video frames may be extracted continuously or discontinuously from the video.
  • the plurality of video frames may be extracted at fixed time intervals (e.g., 5 minutes, 10 minutes, 30 minutes, 1 hour, 2 hours, 5 hours, etc. ) or certain time points (e.g., 4 a.m., 9 p.m. ) .
  • the plurality of extracted video frames may correspond to a portion of the video wherein the user inside the video is at a certain condition or doing a certain thing (e.g., driving, teaching, working) .
  • the processing engine 112 may detect a face of a user in the plurality of images (e.g., video frames of a video) .
  • the face detection may be implemented in various ways including a template matching, a skin color segmentation, a geometric rule confirmation, or the like, or a combination thereof.
  • the face detection may be implemented based on a model that employs a singular value algorithm, a binary wavelet transformation algorithm, an AdaBoost algorithm or the like, or a combination thereof.
  • any video frame from the plurality of video frames that does not contain a face of the desired user may be discarded.
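  • One widely available way to implement the face detection and extraction steps (and to discard frames without a detectable face) is OpenCV's pretrained Haar cascade; it is offered here only as an illustrative stand-in for the detection approaches listed above, not as the method of this disclosure.

```python
import cv2

# Load OpenCV's pretrained frontal-face Haar cascade.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_face(frame):
    """Return a cropped face image from a video frame, or None if no face
    is found (such frames would be discarded, as described above)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]            # take the first detection
    return frame[y:y + h, x:x + w]   # crop the detected face region
```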
  • the processing engine 112 may determine an angle between the direction perpendicular to the face of the user and the capturing direction of the camera in each of the plurality of video frames.
  • the capturing direction of the camera refers to the orientation of the lens of the camera.
  • the processing engine 112 may further determine whether the angle is greater than a threshold (e.g., 30 degrees, 60 degrees, 100 degrees). If the processing engine 112 determines that the angle between the direction perpendicular to the face of the user and the capturing direction of the camera in a video frame is greater than the threshold, the processing engine 112 may discard the video frame from the plurality of video frames. The discarded video frames may not be processed in the further operations illustrated in FIG. 5.
  • the determination of the angle between the direction perpendicular to the face of the user and the capturing direction of the camera may be implemented based on a model. Detailed descriptions of the exemplary process of training and using the model may be found elsewhere in this disclosure (e.g., FIG. 11 and the descriptions thereof) .
  • the processing engine 112 may extract the detected faces from the plurality of video frames.
  • the extracted faces may be in the form of images (pixel values) or part thereof.
  • the extracted faces may correspond to a plurality of face features including size, shape and/or a location of face outline, hair, lips, jaw, eyes, mouth, eyebrows, nose, or the like, or a combination thereof.
  • the extracted faces may be stored in a storage device (e.g., the storage 150) .
  • the processing engine 112 may obtain a trained eye-blinking detection model and a trained yawning detection model.
  • the trained eye-blinking detection model and the trained yawning detection model may be generated by training a preliminary eye-blinking detection model and a preliminary yawning detection model, respectively.
  • the preliminary eye-blinking detection model and/or the preliminary yawning detection model may be a convolutional neural network (CNN) , a deep belief network (DBN) , Stacked Auto-Encoders (SAE) , a logistic regression (LR) model, a support vector machine (SVM) , a decision tree model, a Naive Bayesian Model, a random forest model, a Restricted Boltzmann Machine (RBM) , a Q-learning Model, or the like, or a combination thereof.
  • the CNN model may include at least one of a convolutional layer, a Rectified Linear Unit (ReLU) layer, a fully connected layer or a pooling layer.
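  • For concreteness, a model containing the layer types mentioned above (convolutional, ReLU, pooling, and fully connected layers) could look like the following PyTorch sketch; the input size and channel counts are assumptions, not values specified in this disclosure.

```python
from torch import nn

class FaceBehaviorCNN(nn.Module):
    """Tiny CNN sketch for binary classification of face crops
    (e.g., blinking vs. not blinking, or yawning vs. not yawning)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
            nn.ReLU(),                                   # ReLU layer
            nn.MaxPool2d(2),                             # pooling layer
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, 1)     # fully connected layer

    def forward(self, x):                                # x: (N, 3, 64, 64) assumed
        x = self.features(x)
        return self.classifier(x.flatten(1))             # one logit per face
```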
  • the trained eye-blinking detection model may be generated by training a preliminary eye-blinking detection model based on a plurality of training face samples.
  • the trained yawning detection model may be generated by training a preliminary yawning detection model based on a plurality of training face samples.
  • the training face samples used in training the preliminary eye-blinking detection model and the preliminary yawning detection model may be the same or different. Detailed descriptions of the exemplary process of training the preliminary eye-blinking detection model and the preliminary yawning detection model may be found elsewhere in disclosure (e.g., FIG. 11 and the descriptions thereof) .
  • the plurality of training face samples may be acquired from a storage device (e.g., the storage 150) , the terminal 120 via the network 140, the image capturing device 130 via the network 140, or the like, or a combination thereof.
  • the plurality of training face samples may include a set of positive eye-blink training samples and a set of negative eye-blink training samples based on whether the eye(s) of the user in the training face samples blink. For example, an eye in each of the positive eye-blink training samples blinks, and an eye in each of the negative eye-blink training samples does not blink.
  • a blink refers to a series of eye actions in which an action of closing eyes (also referred to as eye-close) is followed by an action of opening eyes (also referred to as eye-open) .
  • the eye-open and/or eye-close may each last for at least one video frame.
  • Detailed descriptions of the exemplary process of the determination of an eye-blink may be found elsewhere in this disclosure (e.g., FIG. 8 and the descriptions thereof).
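  • Given a per-frame eye state (open or closed), a blink as defined above (an eye-close followed by an eye-open, each lasting at least one frame) can be counted with a small state machine. A hedged sketch, assuming the per-frame states have already been produced by the detection model:

```python
def count_blinks(eye_states):
    """Count blinks in a sequence of per-frame eye states.
    eye_states: iterable of booleans, True meaning the eyes are closed in
    that video frame. A blink is an eye-close followed by an eye-open,
    each lasting at least one frame."""
    blinks = 0
    eyes_closed = False
    for closed in eye_states:
        if closed:
            eyes_closed = True       # eye-close action observed
        elif eyes_closed:
            blinks += 1              # eye-open after eye-close: one blink
            eyes_closed = False
    return blinks

# Example: open, closed, closed, open, open, closed, open -> 2 blinks
assert count_blinks([False, True, True, False, False, True, False]) == 2
```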
  • the plurality of training face samples may include a set of positive yawn training samples and a set of negative yawn training samples based on whether the face of the user in the training face samples yawns.
  • the face in each of the positive yawn training samples yawns, and the face in each of the negative yawn training samples does not yawn.
  • a yawn refers to a series of face actions including changes of the face outline, mouth, eyes, eyebrows, cheeks, nose, jaw, hair, or the like, or a combination thereof. For example, when a person is yawning, his or her mouth may open, eyes may narrow, eyebrows may rise, and jaw may move downward.
  • the plurality of training face samples may include faces in the form of images (pixels) .
  • the face may be a segmented face with different portions or facial organs (e.g., eyes, mouth, nose, jaw, hair) segmented out.
  • the face may be an entire face without segmentation.
  • the yawn detection model may learn the change of greyscales of the entire face during a yawn without the need of knowing the default shape and size of each portion of the face.
  • the location of each portion of the face may be marked (e.g., eyes, mouth) to speed up the training of the yawn detection model.
  • the yawning detection model may learn the location and shape of the portions of the face during the training process.
  • Detailed descriptions of the exemplary process of the determination of yawning and obtaining of the trained yawning detection model may be found elsewhere in this disclosure (e.g., FIGs. 9-11 and the descriptions thereof) .
  • the processing engine 112 may generate an eye-blinking detection result by inputting the extracted faces into the trained eye-blinking detecting model.
  • the eye-blinking detection result may include a result that the eyes in the extracted faces blink or a result that the eyes in the extracted faces do not blink.
  • the result that the eyes blink may be determined according to a series of eye actions including an action of closing eyes (also referred to as eye-close) followed by an action of opening eyes (also referred to as eye-open) .
  • the eye-open and/or eye-close may each last for at least one video frame.
  • Detailed descriptions of the exemplary process of the generation of the eye-blinking detection result may be found elsewhere in this disclosure (e.g., FIG. 8 and the descriptions thereof).
  • the processing engine 112 may further determine the frequency of eye-blinking by the user based on the eye-blinking result.
  • the frequency of eye-blinking may be in a range of 0 to 100 times per minute (e.g., zero, 5 times per minute, 10 times per minute, 20 times per minute, 50 times per minute, 100 times per minute) .
  • Detailed descriptions of the exemplary process of the determination of the frequency of eye-blinking may be found elsewhere in this disclosure (e.g., FIG. 6 and the relevant descriptions thereof) .
  • the processing engine 112 may generate a yawning detection result by inputting the extracted faces into the trained yawning detecting model.
  • the yawning detection result may include a result of yawning and/or a result of not being yawning.
  • Detailed descriptions of the exemplary process of the determination of yawning result may be found elsewhere in this disclosure (e.g., FIG. 9, 10 and the descriptions thereof) .
  • the processing engine 112 may further determine the frequency of yawning by the user based on the yawning detection result.
  • the frequency of yawning may be in a range of 0 to 30 times per minute (e.g., zero, 1 time per minute, 2 times per minute, 5 times per minute, 10 times per minute, 30 times per minute).
  • Detailed descriptions of the exemplary process of the determination of the frequency of yawning may be found elsewhere in this disclosure (e.g., FIG. 7 and the relevant descriptions thereof) .
  • the processing engine 112 may determine a degree of drowsiness of the user based on the eye-blinking detection result and the yawning detection result. More particularly, the degree of drowsiness of the user may be determined based on the frequency of eye-blinking and the frequency of yawning. For example, a drowsiness score may be determined according to Equation (1) as below:

    d = a × f1 + b × f2,     (1)

    where d represents the drowsiness score, a represents a weight coefficient of the frequency of eye-blinking, f1 represents the frequency of eye-blinking, b represents a weight coefficient of the frequency of yawning, and f2 represents the frequency of yawning.
  • the weight coefficient of the frequency of eye-blinking, a, and the weight coefficient of the frequency of yawning, b, may be the same or different.
  • the weight coefficients may be default parameters stored in a storage device (e.g., the storage device 150) , or set or adjusted by an operator of the drowsiness detection system 100.
  • the drowsiness score may be represented by a numeric value (e.g., 0, 1, 2, 5, 10, 20, 50, 100) .
  • the degree of drowsiness may be determined based on the drowsiness score.
  • the degree of drowsiness may include a plurality of levels (e.g., low, medium, high) . For example, when the drowsiness score is less than a first threshold (e.g., 10) , the degree of drowsiness may be a low level. When the drowsiness score is less than a second threshold (e.g., 30) but greater than or equal to the first threshold, the degree of drowsiness may be a medium level. When the drowsiness score is greater than or equal to the second threshold, the degree of drowsiness may be determined at a high level.
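  • Putting Equation (1) and the example thresholds together, the scoring and leveling step might be sketched as follows; the weight coefficients below are assumed values for illustration, while the thresholds (10 and 30) are the example values given above.

```python
def drowsiness_level(blink_freq, yawn_freq, a=0.5, b=5.0,
                     first_threshold=10.0, second_threshold=30.0):
    """Compute the drowsiness score d = a*f1 + b*f2 and map it to a level."""
    d = a * blink_freq + b * yawn_freq      # Equation (1)
    if d < first_threshold:
        return d, "low"
    if d < second_threshold:
        return d, "medium"
    return d, "high"

# Example: 20 blinks/min and 3 yawns/min with the assumed weights.
score, level = drowsiness_level(20, 3)      # score = 0.5*20 + 5*3 = 25 -> "medium"
```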
  • the processing engine 112 may generate a notification based on the degree of drowsiness.
  • the processing engine 112 may generate an instruction for generating a notification based on the degree of drowsiness, which may be transmitted to a terminal of the user via a network (e.g., the network 140) .
  • the terminal may, after receiving the instruction, generate a notification according to the received instruction.
  • the notification may include a sound, a vibration, a light, or the like, or a combination thereof.
  • the processing engine 112 may determine the loudness or frequency of the sound, strength or frequency of the vibration, an intensity or twinkling frequency of the light based on the degree of drowsiness. For example, when the degree of drowsiness of the user is in the low level, the processing engine 112 or the terminal 120 may generate the notification that merely includes twinkling light. When the degree of drowsiness of the user is in the medium level, the processing engine 112 or the terminal 120 may generate the notification that includes twinkling light and a soft sound.
  • when the degree of drowsiness of the user is in the high level, the processing engine 112 or the terminal 120 may generate the notification that includes a loud sound (e.g., with high decibels), bright light with high-frequency twinkling, and/or a strong vibration.
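  • The mapping from drowsiness level to notification parameters could then be a simple lookup table; the concrete loudness, twinkling-frequency, and vibration values below are assumptions for illustration only.

```python
# Hypothetical notification settings per drowsiness level (values are assumed).
NOTIFICATIONS = {
    "low":    {"light_blink_hz": 1, "sound_db": 0,  "vibration": "off"},
    "medium": {"light_blink_hz": 2, "sound_db": 50, "vibration": "off"},
    "high":   {"light_blink_hz": 5, "sound_db": 85, "vibration": "strong"},
}

def notification_for(level: str) -> dict:
    """Return notification parameters (light, sound, vibration) for a level."""
    return NOTIFICATIONS[level]
```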
  • the camera of the mobile device may simultaneously capture images of the face of the driver.
  • the drowsiness detection method described in 500 may be implemented by the mobile device or processors thereof to determine a degree of drowsiness of the driver based on the captured images.
  • the mobile device may further include a notification generating component (e.g., a screen, a loudspeaker, a vibrating component) to generate a notification based on the determined degree of drowsiness of the driver.
  • the notification may include a sound, a vibration, a light, or the like, or a combination thereof.
  • one or more other optional operations may be omitted in the exemplary process 500.
  • For example, operation 570 may be omitted.
  • the generation module 440 may generate a notification based on the eye-blinking result in 550 and the yawning detection result in 560.
  • As another example, operation 550 may be omitted.
  • the determination module 450 may determine the degree of drowsiness of the user based on the yawning detection result.
  • FIG. 6 is a flowchart illustrating an exemplary process for determining the frequency of eye-blinking according to some embodiments of the present disclosure.
  • the process 600 may be implemented as a set of instructions (e.g., an application) stored in the storage 390, ROM 230 or RAM 240.
  • the CPU 340, processor 220 and/or the modules in FIG. 4 may execute the set of instructions, and when executing the instructions, the CPU 340, the processor 220 and/or the modules in FIG. 4 may be configured to perform the process 600.
  • the operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 600 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed herein.
  • the order of the operations of the process 600 as illustrated in FIG. 6 and described below is not intended to be limiting.
  • the determination of the frequency of eye-blinking by the user based on the eye-blinking detection result described above in connection with operation 550 of the process 500 may be performed according to the process 600.
  • the processing engine 112 may determine the number (or a number count) of blinks (also referred to as eye-blinks or eye-blinking) in the plurality of video frames based on an eye-blinking detection result.
  • the number of blinks may be zero times, 1 time, 2 times, 5 times, 10 times, 20 times.
  • a blink refers to a series of eye actions in which an action of closing eyes (also referred to as eye-close) is followed by an action of opening eyes (also referred to as eye-open) .
  • the eye-open and/or eye-close may each last for at least one video frame.
  • Detailed descriptions of the exemplary process of the determination of the number of blinks may be found elsewhere in this disclosure (e.g., FIG. 8 and the descriptions thereof) .
  • the processing engine 112 may determine the total time length of the plurality of video frames.
  • the total time length of the plurality of video frames may be determined by summing up the length of each of the plurality of video frames.
  • the total time length may be 5 seconds, 10 seconds, 20 seconds, 1 minute, 5 minutes.
  • at least one of the plurality of video frames may be discarded (or skipped) .
  • the time lengths of the plurality of video frames may be the same or different.
  • the time length of each of the plurality of video frames may be a default parameter stored in a storage device (e.g., the storage device 150) , or set or adjusted by an operator of the drowsiness detection system 100.
  • the processing engine 112 may determine the frequency of eye-blinking based on the number of blinks and the total time length of the plurality of video frames.
  • the frequency of eye-blinking may be 0, once per second, twice per second, once per minute, twice per minute, 10 times per minute, 20 times per minute, etc.
  • For example, if the number of blinks is 6 and the total time length of the plurality of video frames is 48 seconds, the frequency of eye-blinking may be 6/48 = 0.125 times per second (i.e., 7.5 times per minute) .
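  • The frequency computation in process 600 (and the analogous computation of the frequency of yawning in process 700) reduces to dividing an event count by the total time length of the video frames; a minimal Python sketch under that reading:

        def frequency_per_minute(event_count, total_time_seconds):
            # e.g., 6 blinks over 48 seconds -> 7.5 blinks per minute;
            #       2 yawns  over 48 seconds -> 2.5 yawns per minute.
            return event_count / total_time_seconds * 60.0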
  • operation 610 and operation 620 may be combined as a single operation in which the processing engine 112 may determine both the number of blinks and the total time length of the plurality of video frames.
  • FIG. 7 is a flowchart illustrating an exemplary process for determining the frequency of yawning according to some embodiments of the present disclosure.
  • the process 700 may be implemented as a set of instructions (e.g., an application) stored in the storage 390, ROM 230 or RAM 240.
  • the CPU 340, processor 220 and/or the modules in FIG. 4 may execute the set of instructions, and when executing the instructions, the CPU 340, the processor 220 and/or the modules in FIG. 4 may be configured to perform the process 700.
  • the operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 700 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed herein. Additionally, the order of the operations of the process 700 as illustrated in FIG. 7 and described below is not intended to be limiting.
  • the processing engine 112 may determine the number (or a number count) of yawns in the plurality of video frames based on a yawning detection result.
  • the number of yawns may be 0, 1 time, 5 times, 10 times, 20 times, etc.
  • a yawn refers to a series of face actions including the changes of face outline, mouth, eyebrow, nose, hair, or the like, or a combination thereof. Detailed descriptions of the exemplary process of the determination of the number of yawns may be found elsewhere in this disclosure (e.g., FIGs. 9 and 10 and the descriptions thereof) .
  • the processing engine 112 may determine the total time length of the plurality of video frames.
  • the total time length of the plurality of video frames may be determined by summing up the length of each of the plurality of video frames.
  • the total time length may be 5 seconds, 10 seconds, 20 seconds, 1 minute, 5 minutes.
  • the processing engine 112 may determine a frequency of yawning based on the number of yawns and the total time length of the plurality of video frames.
  • the frequency of yawning may be 0, once per minute, twice per minute, 3 times per minute, 5 times per minute, etc.
  • For example, if the number of yawns is 2 and the total time length of the plurality of video frames is 48 seconds, the frequency of yawning may be 2/48 times per second (i.e., 2.5 times per minute) .
  • operation 710 and operation 720 may be combined as a single operation in which the processing engine 112 may determine both the number of yawns and the total time length of the plurality of video frames.
  • FIG. 8 is a schematic diagram illustrating an exemplary blink according to some embodiments of the present disclosure.
  • a blink may be determined according to a series of eye actions detected, which include an action of closing eyes (also referred to as eye-close) followed by an action of opening eyes (also referred to as eye-open) .
  • the eye-open and/or eye-close may each last for at least one video frame.
  • the length of each video frame may be less than 0.2 s, which is a typical duration of a blink.
  • the trained eye-blinking detection model may generate a result of “open” or “close” of eyes in response to each inputted video frame.
  • the result of “open” or “close” of eyes may be arranged in a time sequence of the video frames.
  • “Open” 810 may represent a video frame with opened eyes
  • “close” 820 may represent a video frame with closed eyes.
  • “Open” 810 and “close” 820 in the continuous video frames may represent an action of closing eyes (eye-close) .
  • “close” 830 may represent a video frame with closed eyes
  • “open” 840 may represent a video frame with opened eyes.
  • “Close” 830 and “open” 840 in the continuous video frames may represent an action of opening eyes.
  • the successive action of closing eyes and opening eyes may constitute a blink.
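  • Under the per-frame "open"/"close" reading of FIG. 8, counting blinks amounts to counting eye-close events that are followed by an eye-open event; a minimal Python sketch (the label strings and function name are assumptions):

        def count_blinks(per_frame_states):
            # per_frame_states: time-ordered "open"/"close" labels, one per video
            # frame, as produced by the trained eye-blinking detection model.
            blinks = 0
            eye_was_closed = False
            for state in per_frame_states:
                if state == "close":
                    eye_was_closed = True
                elif state == "open" and eye_was_closed:
                    blinks += 1  # an eye-close followed by an eye-open is one blink
                    eye_was_closed = False
            return blinks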
  • the trained eye-blinking detection model may directly generate a result that the eyes of the inputted face blink or not.
  • FIG. 9 is a schematic diagram illustrating an exemplary yawn according to some embodiments of the present disclosure.
  • yawning may be confirmed only when a positive (or affirmative) result of yawning is generated by the yawning detection model in a number of continuous video frames (or lasts longer than a time threshold) .
  • “yes” frames 910, 920, 930, 940 may represent frames in which the face is determined to yawn by the yawning detection model (or referred to as frames with positive yawning result)
  • the face in “no” frames 950 and 960 is determined not to yawn by the yawning detection model (or referred to as frames with a negative yawning result) .
  • the successive video frames 910-930 may be confirmed to be a yawn.
  • because the frames 940-960 include only one “yes” video frame, the frames 940-960 are not confirmed as a yawn.
  • the positive yawning result (e.g., “yes” ) in frame 940 may be caused by other activities of the driver that involve a change of the entire face (e.g., talking, laughing) .
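  • The confirmation rule of FIG. 9 may be sketched in Python as counting only runs of consecutive positive frames; the minimum run length used below is an assumed threshold, not a value fixed by this disclosure.

        def count_yawns(per_frame_positive, min_consecutive_frames=2):
            # per_frame_positive: time-ordered booleans from the yawning detection
            # model. A yawn is confirmed only when the positive result persists for
            # at least min_consecutive_frames, filtering out isolated positives
            # caused by, e.g., talking or laughing.
            yawns = 0
            run = 0
            for positive in per_frame_positive:
                if positive:
                    run += 1
                else:
                    if run >= min_consecutive_frames:
                        yawns += 1
                    run = 0
            if run >= min_consecutive_frames:  # handle a run ending at the last frame
                yawns += 1
            return yawns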
  • FIG. 10 is a schematic diagram illustrating an exemplary process for detecting a yawn according to some embodiments of the present disclosure.
  • a yawn may cause the changes of shapes, sizes, and/or locations of any portion of the face (e.g., eyes, mouth, ears, lips, nose, eyebrow, hair) .
  • For example, when a person yawns, his or her mouth may open, his or her cheek may be stressed, his or her head may lean backward, his or her jaw may go down, and/or his or her eyes may become smaller.
  • video frames 1010-1080 may correspond to a change of the face during a yawn.
  • in video frame 1010, the man’s mouth is closed, and the other portions of his face are in a usual condition.
  • the yawning detection model may generate a result that the man in 1010 is not yawning.
  • in the following video frames, the man’s mouth begins to open or has opened slightly, and his head leans backward slightly.
  • the yawning detection model may generate a result that the man has a low chance of yawning (but a high chance of, for example, deep breathing or talking) .
  • in video frames 1040 and 1050, the mouth of the man is opened widely, the cheek is stressed, the eyes are closed slightly, and the head leans backward.
  • the yawning detection model may generate a result that the man has a high chance of yawning.
  • the processing engine 112 may use the method described in, e.g., FIG. 9 to confirm whether the man is yawning. In video frames 1050-1080, the face of the man is returning to its usual condition.
  • the yawning detection model may not identify the portions of the face to generate the yawning detection result but may generate the yawning detection result based on the greyscale change of pixels in the face. For example, the yawning detection model may not know that a bottom portion of the face of the man includes a mouth, nor track the change of shape or size of the mouth. However, the yawning detection model may recognize that the average pixel value in a bottom section of the face changes (the region becomes darker when the mouth opens) and determine that the man is yawning based on this change.
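  • A minimal Python sketch of the kind of coarse greyscale cue described above (the bottom-third crop and the function name are illustrative assumptions):

        import numpy as np

        def bottom_face_mean_intensity(face_gray):
            # face_gray: 2-D array of a cropped, greyscale face image. Returns the
            # mean pixel value of the bottom third of the crop; a marked frame-to-
            # frame change in this value is the kind of cue a model may exploit
            # without explicitly locating the mouth.
            height = face_gray.shape[0]
            return float(face_gray[2 * height // 3:, :].mean())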
  • FIG. 11 is a schematic diagram illustrating an exemplary model according to some embodiments of the present disclosure.
  • the model 1110 may be obtained by operation 540 in process 500.
  • the model 1110 may be a convolutional neural network (CNN) , a deep belief network (DBN) , stacked auto-encoders (SAE) , a logistic regression (LR) model, a support vector machine (SVM) , a decision tree model, a naive Bayesian model, a random forest model, a restricted Boltzmann machine (RBM) , a Q-learning model, or the like, or a combination thereof.
  • the model 1110 may be used to generate an eye-blinking result, a yawning result, an angle determination result, or the like, or a combination thereof.
  • the angle determination result may include whether an angle between the direction perpendicular to the face of a user and the capturing direction of the camera is greater than a threshold (e.g., 30 degrees, 60 degrees, 100 degrees) .
  • the model 1110 may be trained in a plurality of iterations. During each of the plurality of iterations, a plurality of training face samples 1120 may be input into the model 1110 to generate a preliminary result 1130. In some embodiments, the plurality of training face samples 1120 may be labeled as positive and/or negative samples.
  • the plurality of training samples may be labelled as yawn or not yawn, eye-blinking or not eye-blinking, appropriate angle (an angle between the direction perpendicular to the face of a user and the capturing direction of the camera less than or equal to the threshold) , or inappropriate angle (an angle between the direction perpendicular to the face of a user and the capturing direction of the camera greater than the threshold) .
  • the one or more internal parameters of the model 1110 may include a weight factor, a bias term, etc.
  • a loss function may be obtained based on the label of the training face sample and the preliminary result 1130. For example, if the preliminary result 1130 and the label of the training face sample are the same, the loss function may be small. If the preliminary result 1130 and the label of the training face sample are different, the loss function may be large. The loss function may be used to update the one or more parameters of the model 1110.
  • the iterations may terminate when a preset condition is satisfied.
  • the preset condition may include that a loss function reaches a minimum value (convergence) in the training process of the model.
  • the preset condition may include the number of iterations (for example, two hundred times) performed reaches a threshold number. If the number of iterations performed reaches the threshold number (e.g., the preset condition is satisfied) , the iterations may terminate.
  • the threshold number may be set by an operator or according to default settings of drowsiness detection system 100, or a combination thereof.
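  • The iterative training described above may be sketched, for one of the listed model types (a logistic regression model) , as the following minimal Python example; the learning rate, iteration threshold, and convergence tolerance are illustrative assumptions.

        import numpy as np

        def train_binary_classifier(samples, labels, learning_rate=0.1,
                                     threshold_iterations=200, tolerance=1e-6):
            # samples: 2-D array of training face features; labels: 1 for positive
            # samples and 0 for negative samples. Internal parameters (weights and
            # a bias term) are updated each iteration, and training stops when the
            # loss converges or the iteration count reaches the threshold number.
            n_samples, n_features = samples.shape
            weights = np.zeros(n_features)
            bias = 0.0
            previous_loss = float("inf")
            for _ in range(threshold_iterations):
                probabilities = 1.0 / (1.0 + np.exp(-(samples @ weights + bias)))
                loss = -np.mean(labels * np.log(probabilities + 1e-12)
                                + (1 - labels) * np.log(1 - probabilities + 1e-12))
                if abs(previous_loss - loss) < tolerance:  # convergence reached
                    break
                previous_loss = loss
                error = probabilities - labels
                weights -= learning_rate * (samples.T @ error) / n_samples
                bias -= learning_rate * error.mean()
            return weights, bias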
  • different training face samples and labels may be used to generate different models.
  • the plurality of training face samples 1120 may be classified into a set of positive eye-blink training samples and a set of negative eye-blink training samples based on whether the eye (s) of the user in the training face samples blink.
  • the set of positive eye-blink training samples and the set of negative eye-blink training samples may be used to train an eye-blinking detection model.
  • the plurality of training face samples 1120 may be classified into a set of positive yawn training samples and a set of negative yawn training samples based on whether the face of the user in the training face samples 1120 yawns.
  • the set of positive yawn training samples and the set of negative yawn training samples may be used to train a yawning detection model.
  • the plurality of training face samples 1120 may be classified into a set of positive angle training samples and a set of negative angle training samples based on whether the angle between the direction perpendicular to the face of a user and the capturing direction of the camera is less than or equal to the threshold. More particularly, the angle between the direction perpendicular to the face of a user and the capturing direction of the camera may be less than or equal to the threshold in the set of positive angle training samples, while the angle between the direction perpendicular to the face of a user and the capturing direction of the camera may be greater than the threshold in the set of negative angle training samples.
  • the set of positive angle training samples and the set of negative angle training samples may be used to train an angle determination model.
  • a trained model may be generated.
  • Detected faces 1140 may be inputted to the trained model, and the trained model may generate detecting results 1150 in response to the detected faces 1140.
  • a trained eye-blinking detection model may output a detection result 1150 as to whether one or more eyes in the detected faces 1140 blink.
  • a trained yawning detection model may output a detection result 1150 as to whether the detected faces 1140 yawn.
  • a trained angle determination model may generate a result as to whether an angle between the direction perpendicular to detected faces 1140 and the capturing direction of the camera is less than or equal to a threshold (e.g., 30 degrees, 60 degrees) .
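  • Applying the trained models to a detected face may be sketched as below; the .predict interface is an assumed wrapper around the trained models, not an interface defined by this disclosure.

        def detect_face_states(face_image, blink_model, yawn_model, angle_model):
            # Each trained model maps a detected face image to its own result.
            return {
                "eye_blinking": blink_model.predict(face_image),
                "yawning": yawn_model.predict(face_image),
                "angle_appropriate": angle_model.predict(face_image),
            }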
  • the model 1110 (and/or its one or more internal parameters) may be further updated based on the detecting results 1150.
  • FIG. 12 is a schematic diagram illustrating an exemplary automobile data recorder according to some embodiments of the present disclosure.
  • a camera in the automobile data recorder 1210 may be orientated toward a driver to acquire a plurality of video frames of the face of the driver.
  • the automobile data recorder 1210 may be connected to the vehicle via an electric wire 1220.
  • an onboard HCI system, a mobile device of the driver, or the automobile data recorder 1210 itself may process the acquired plurality of video frames to determine a degree of drowsiness of the driver.
  • a notification may be generated by the onboard HCI system, the mobile device of the driver or the automobile data recorder 1210 based on the degree of drowsiness of the driver to warn the driver of drowsy driving and remind the driver to take a break.
  • aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of software and hardware that may all generally be referred to herein as a "block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer-readable media having computer readable program code embodied thereon.
  • a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electromagnetic, optical, or the like, or any suitable combination thereof.
  • a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including wireless, wireline, optical fiber cable, RF, or the like, or any suitable combination of the foregoing.
  • Computer program code for carrying out operations for aspects of the present disclosure may be written in a combination of one or more programming languages, including an object-oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python or the like, conventional procedural programming languages, such as the "C" programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN) , or the connection may be made to an external computer or in a cloud computing environment or offered as a service such as a software as a service (SaaS) .

Abstract

A method for determining a degree of drowsiness of a driver is provided. The method may include receiving a plurality of video frames from a camera (510). The method may also include detecting a face of a driver in the plurality of video frames (520) and extracting the detected faces in the plurality of video frames (530). The method may further include obtaining a trained eye-blinking detection model and a trained yawning detection model (540). The method may further include generating an eye-blinking detection result by inputting the extracted faces into the trained eye-blinking detection model (550), and generating a yawning detection result by inputting the extracted faces into the trained yawning detection model (560). The method may further include determining a degree of drowsiness of the driver based on the eye-blinking detection result and the yawning detection result (570). The method may further include generating a notification based on the degree of drowsiness (580).
PCT/CN2018/105132 2018-09-12 2018-09-12 Systèmes et procédés de détection de somnolence WO2020051781A1 (fr)

Priority Applications (4)

Application Number Priority Date Filing Date Title
MX2021002807A MX2021002807A (es) 2018-09-12 2018-09-12 Sistemas y métodos para detección de estado de somnolencia.
PCT/CN2018/105132 WO2020051781A1 (fr) 2018-09-12 2018-09-12 Systèmes et procédés de détection de somnolence
BR112021004647-0A BR112021004647B1 (pt) 2018-09-12 2018-09-12 Sistemas e métodos para detecção de sonolência
CN201880001325.8A CN111052127A (zh) 2018-09-12 2018-09-12 疲劳检测的系统和方法

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/105132 WO2020051781A1 (fr) 2018-09-12 2018-09-12 Systèmes et procédés de détection de somnolence

Publications (1)

Publication Number Publication Date
WO2020051781A1 true WO2020051781A1 (fr) 2020-03-19

Family

ID=69777342

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/105132 WO2020051781A1 (fr) 2018-09-12 2018-09-12 Systèmes et procédés de détection de somnolence

Country Status (4)

Country Link
CN (1) CN111052127A (fr)
BR (1) BR112021004647B1 (fr)
MX (1) MX2021002807A (fr)
WO (1) WO2020051781A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114312800A (zh) * 2022-02-14 2022-04-12 深圳市发掘科技有限公司 车辆安全驾驶方法、装置、计算机设备及存储介质

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112992352A (zh) * 2021-03-10 2021-06-18 广州云从鼎望科技有限公司 员工健康预警方法、装置及介质
CN113243917B (zh) * 2021-05-18 2023-05-12 中国民用航空总局第二研究所 一种民航管制员的疲劳检测方法、装置、电子设备及介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040170304A1 (en) * 2003-02-28 2004-09-02 Haven Richard Earl Apparatus and method for detecting pupils
EP2557549A1 (fr) * 2010-04-05 2013-02-13 Toyota Jidosha Kabushiki Kaisha Dispositif d'estimation d'état de corps biologique
CN106295600A (zh) * 2016-08-18 2017-01-04 宁波傲视智绘光电科技有限公司 驾驶员状态实时检测方法和装置
CN106446811A (zh) * 2016-09-12 2017-02-22 北京智芯原动科技有限公司 基于深度学习的驾驶员疲劳检测方法及装置
CN107491769A (zh) * 2017-09-11 2017-12-19 中国地质大学(武汉) 基于AdaBoost算法的疲劳驾驶检测方法及系统
CN107697069A (zh) * 2017-10-31 2018-02-16 上海汽车集团股份有限公司 汽车驾驶员疲劳驾驶智能控制方法
CN108294759A (zh) * 2017-01-13 2018-07-20 天津工业大学 一种基于cnn眼部状态识别的驾驶员疲劳检测方法

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102324166B (zh) * 2011-09-19 2013-06-12 深圳市汉华安道科技有限责任公司 一种疲劳驾驶检测方法及装置
CN104732251B (zh) * 2015-04-23 2017-12-22 郑州畅想高科股份有限公司 一种基于视频的机车司机驾驶状态检测方法
CN108020931A (zh) * 2016-10-28 2018-05-11 北京嘀嘀无限科技发展有限公司 驾驶辅助系统


Also Published As

Publication number Publication date
CN111052127A (zh) 2020-04-21
MX2021002807A (es) 2021-08-11
BR112021004647A2 (pt) 2021-06-01
BR112021004647B1 (pt) 2024-01-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18933275

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112021004647

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 112021004647

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20210311

122 Ep: pct application non-entry in european phase

Ref document number: 18933275

Country of ref document: EP

Kind code of ref document: A1