CN112101186A - Device and method for identifying a vehicle driver and use thereof

Device and method for identifying a vehicle driver and use thereof

Info

Publication number
CN112101186A
Authority
CN
China
Prior art keywords
nir light
nir
sensing unit
face
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010953462.0A
Other languages
Chinese (zh)
Inventor
张聪
冯天鹏
吕骋
郭彦东
马君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xiaopeng Autopilot Technology Co Ltd
Original Assignee
Guangzhou Xiaopeng Autopilot Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Autopilot Technology Co Ltd filed Critical Guangzhou Xiaopeng Autopilot Technology Co Ltd
Priority to CN202010953462.0A priority Critical patent/CN112101186A/en
Publication of CN112101186A publication Critical patent/CN112101186A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/12Details of acquisition arrangements; Constructional details thereof
    • G06V10/14Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/141Control of illumination
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/12Details of acquisition arrangements; Constructional details thereof
    • G06V10/14Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/143Sensing or illuminating at different wavelengths
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/59Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

An apparatus for vehicle driver identification comprising: an NIR LED illuminator configured to emit NIR light in a vehicle; an NIR light sensing unit configured to capture reflected NIR light; an image control and processing unit configured to coordinate the NIR LED illuminator and the NIR light sensing unit and to analyze the reflected NIR light to generate an image; a face detector configured to determine that a human face exists in the image and to recognize a face region; a facial feature extractor configured to analyze the facial region to extract a feature vector representing the facial region; a facial feature dictionary configured to store existing feature vectors; a face retrieval system configured to generate a recognition result indicating whether a similarity between the feature vector and any existing feature vector is greater than a first threshold; and a user interface configured to display the recognition result.

Description

Device and method for identifying a vehicle driver and use thereof
Technical Field
The present invention relates generally to artificial intelligence, and more particularly to an apparatus and method for in-cabin driver identification and applications thereof.
Background
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Subject matter discussed in this background section should not be considered prior art merely because it is mentioned in this section. Similarly, problems mentioned in this section, or associated with its subject matter, should not be considered to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which may themselves be inventions.
Personal identification, which distinguishes the identity of an individual among a group of people, is a key function in many applications. Such identification requires the collection of biometrics from a group of target people. A good collection of such biometric information, and more specifically a good statistical distribution of the collected data, requires that each data point has a statistically significant separation in its feature space.
In-cabin driver identification (driver ID) has recently attracted increasing attention in the automotive industry because of its potential to enable numerous intelligent or safety functions in vehicles. For example, in-cabin driver identification may be used as a substitute for vehicle keys to provide a seamless keyless-entry experience, or to trigger alerts that prevent unauthorized entry. In addition, in-cabin driver identification lets the vehicle apply custom settings based on the driver's preferences, such as seat position, in-vehicle temperature, and rearview mirror angle. More recently, in-cabin driver identification has also been used as an input to many in-cabin entertainment systems or interfaces. With knowledge of the driver ID, the embedded system of the vehicle can provide custom entertainment functions, such as playing the driver's favorite albums, personalized route navigation, and presenting the driver's favorite news channels.
In the industry, there are mainly two types of person identification: invasive and contactless. Invasive methods typically require direct measurement of biological characteristics, such as DNA fingerprints, that are unambiguously associated with an individual's identity. Invasive methods are superior in identification accuracy. However, these methods rely heavily on well-controlled laboratory environments, expensive equipment, and long test times. Furthermore, the direct contact between the system and the individual's body that invasive methods require can result in an unpleasant and annoying experience. Thus, if an invasive approach were applied to most real-time applications, including vehicle-cabin installations, its time requirements and cost would make it impractical, if not impossible.
Contactless methods attempt to identify individuals based on indirect measurements of biometric features, such as footprints, handwriting, or faces. Most contactless identification technologies require neither expensive equipment nor highly skilled professionals, greatly reducing costs.
Of all these contactless technologies, camera-based person recognition has attracted great interest in both academia and industry. Camera-based person recognition relies only on a camera module and subsequent computing modules containing specific algorithms to distinguish individuals. These algorithms aim to extract salient features from the input image and then make decisions based on similarity or dissimilarity against a previously learned feature dictionary. The features may be derived from various cues such as body posture, height, weight, skin, and motion patterns. The most robust and accurate camera-based recognition method in wide industrial use, especially in automotive cabin applications, is face recognition.
In-cabin driver face recognition is an emerging technology that provides a contactless and accurate solution to the driver identification problem. It uses a camera to capture facial biometric information and compares it to a stored database of facial information to find the best match. As such, the system comprises two modules: a facial feature library and a face recognition module. The facial feature library can register new facial information or delete existing facial information. The face recognition module captures images through a camera, extracts features using designed algorithms, and finds the best match in a pre-constructed facial feature library. The first process is called face registration and the second is called face retrieval.
However, in-cabin face recognition differs fundamentally from ordinary face recognition in many ways. First, camera-based face recognition within the cabin must be robust under all lighting conditions, including in a dark environment. Unlike most face recognition settings in well-lit outdoor scenes or under well-controlled indoor lighting, in-cabin face recognition presents more challenges in imaging and image quality. For under-illuminated outdoor identification applications, compensating illumination can easily be added. In-cabin illumination, however, is difficult: cabin equipment is very sensitive to power consumption, and its small size compared to ordinary face recognition equipment makes severe heat generation likely. In addition, a compensating light source that improves imaging is visible to the human eye and can create an unpleasant experience for the vehicle driver. Bright, visible light aimed at the driver's face not only hurts the eyes but also distracts the driver, possibly leading to traffic accidents.
In addition to the imaging challenges, on-board face recognition is also unique in that it must operate with constrained computing resources. On-board face recognition, including face registration and retrieval, must minimize its consumption of computing resources, since all algorithms run in an embedded system, i.e., an Electronic Control Unit (ECU), equipped with only limited computing power. Compared to general face recognition methods, which run on powerful local servers or even online clouds with theoretically unlimited computing power, the ECU has little advantage in scalability or real-time performance. To produce a practical on-board face recognition system, the face recognition algorithms must be designed very carefully to reduce computation.
Third, on-board face recognition plays a very special role in the vehicle, as it serves as an input to various modules and must also be presented well in the user interface. As described above, the face recognition result should be connected to various function modules and transmitted to a vehicle display such as an embedded display, a head-up display, or a dashboard. The complex connectivity of the face recognition system with other modules further increases the difficulty of designing such systems.
In-cabin face recognition also differs from other recognition scenarios in which the camera is located outside the vehicle. For example, an exterior surveillance identification system may capture only one or at most a few images for ID registration, whereas an in-cabin system can achieve higher accuracy by capturing multiple images of the target face from multiple angles. By capturing images from more angles, the system gains a more complete understanding of the driver's face regardless of head pose, making recognition more robust.
Therefore, a heretofore unaddressed need exists in the art to address the aforementioned deficiencies and inadequacies.
Disclosure of Invention
The invention relates to a device and a method for identifying a vehicle driver and applications thereof.
In one aspect of the present invention, an apparatus for vehicle driver identification includes: a Near Infrared (NIR) Light Emitting Diode (LED) illuminator configured to emit NIR light in a vehicle; a Near Infrared (NIR) light sensing unit configured to capture reflected NIR light; an image control and processing unit configured to coordinate the NIR LED illuminator and the NIR light sensing unit, and to analyze the reflected NIR light captured by the NIR light sensing unit to generate an image; a face detector configured to determine that a human face exists in the image and to recognize a face region of the human face; a facial feature extractor configured to analyze the facial region to extract a feature vector representing the facial region; a facial feature dictionary configured to store existing feature vectors; a face retrieval system configured to generate a recognition result indicating whether a similarity between the feature vector and any existing feature vector is greater than a first threshold; and a user interface configured to display the recognition result.
In one embodiment, the NIR light-sensing unit is a Focal Plane Array (FPA) NIR light-sensing unit.
In one embodiment, the NIR light sensing unit is covered with a filter having a pass band between 825 nm and 875 nm. In another embodiment, the NIR light sensing unit is covered with a filter having a pass band between 915 nm and 965 nm.
In one embodiment, the image control and processing unit is configured to coordinate the NIR LED illuminator and the NIR light sensing unit by controlling one or more of a duty cycle of the NIR LED illuminator, an analog gain of the NIR light sensing unit, a digital gain of the NIR light sensing unit, an exposure time of the NIR light sensing unit, and a frame rate of the NIR light sensing unit.
In one embodiment, the image control and processing unit is configured to coordinate the NIR LED illuminator and the NIR light sensing unit to generate an image with the best imaging quality.
In one embodiment, the face detector is configured to use a Deep Neural Network (DNN) to determine that a human face is present in the image and to identify the face region of the human face. In one embodiment, the deep neural network is a multi-task cascaded convolutional neural network (MTCNN). In another embodiment, the deep neural network is a Fast region-based convolutional neural network (Fast R-CNN).
In one embodiment, the apparatus further comprises a face alignment unit. The face alignment unit is configured to calibrate the face region to a calibrated face region associated with an upright posture of the driver, wherein the facial feature extractor is configured to analyze the calibrated face region to extract a feature vector representing the calibrated face region.
In one embodiment, the facial feature extractor is configured to employ one or more of a backbone network, local feature descriptors, clustering techniques, and dimension reduction techniques.
In one embodiment, the similarity is a cosine similarity.
In another aspect of the invention, a method for vehicle driver identification includes: emitting Near Infrared (NIR) light in a vehicle by an NIR Light Emitting Diode (LED) illuminator; capturing reflected NIR light by an NIR light sensing unit; coordinating, by an image control and processing unit, the NIR LED illuminator and the NIR light sensing unit; analyzing, by the image control and processing unit, the reflected NIR light captured by the NIR light sensing unit to generate an image; determining that a human face exists in the image; recognizing a face region of the human face; analyzing the face region to extract a feature vector representing the face region; determining whether a similarity between the feature vector and any existing feature vector in a facial feature dictionary is greater than a first threshold; when the similarity between the feature vector and a first existing feature vector in the facial feature dictionary is greater than the first threshold, generating a first recognition result indicating an identity associated with the first existing feature vector, and displaying the first recognition result; and when the similarity between the feature vector and every existing feature vector in the facial feature dictionary is not greater than the first threshold, generating a second recognition result indicating that the feature vector is not present in the facial feature dictionary, displaying the second recognition result, and storing the feature vector in the facial feature dictionary.
In one embodiment, the NIR light-sensing unit is a Focal Plane Array (FPA) NIR light-sensing unit.
In one embodiment, the NIR light sensing unit is covered with a filter having a pass band between 825 nm and 875 nm. In another embodiment, the NIR light sensing unit is covered with a filter having a pass band between 915 nm and 965 nm.
In one embodiment, the image control and processing unit coordinates the NIR LED illuminator and the NIR light sensing unit by controlling one or more of a duty cycle of the NIR LED illuminator, an analog gain of the NIR light sensing unit, a digital gain of the NIR light sensing unit, an exposure time of the NIR light sensing unit, and a frame rate of the NIR light sensing unit.
In one embodiment, the image control and processing unit coordinates the NIR LED illuminator and the NIR light sensing unit to generate an image with the best imaging quality.
In one embodiment, the presence of a human face in the image is determined, and the face region of the human face is identified, by employing a Deep Neural Network (DNN). In one embodiment, the deep neural network is a multi-task cascaded convolutional neural network (MTCNN). In another embodiment, the deep neural network is a Fast region-based convolutional neural network (Fast R-CNN).
In one embodiment, the method further comprises: the face region is calibrated to a calibrated face region associated with an upright posture of the driver, wherein the calibrated face region is analyzed to extract a feature vector representing the calibrated face region.
In one embodiment, the face region is analyzed by employing one or more of a backbone network, local feature descriptors, clustering techniques, and dimension reduction techniques to extract feature vectors representing the face region.
In one embodiment, the similarity is a cosine similarity.
In another aspect, the invention relates to a non-transitory tangible computer-readable medium storing instructions that, when executed by one or more processors, cause a method of vehicle driver identification to be performed. The method comprises: emitting Near Infrared (NIR) light in a vehicle by an NIR Light Emitting Diode (LED) illuminator; capturing reflected NIR light by an NIR light sensing unit; coordinating, by an image control and processing unit, the NIR LED illuminator and the NIR light sensing unit; analyzing, by the image control and processing unit, the reflected NIR light captured by the NIR light sensing unit to generate an image; determining that a human face exists in the image; recognizing a face region of the human face; analyzing the face region to extract a feature vector representing the face region; determining whether a similarity between the feature vector and any existing feature vector in a facial feature dictionary is greater than a first threshold; when the similarity between the feature vector and a first existing feature vector in the facial feature dictionary is greater than the first threshold, generating a first recognition result indicating an identity associated with the first existing feature vector, and displaying the first recognition result; and when the similarity between the feature vector and every existing feature vector in the facial feature dictionary is not greater than the first threshold, generating a second recognition result indicating that the feature vector is not present in the facial feature dictionary, displaying the second recognition result, and storing the feature vector in the facial feature dictionary.
These and other aspects of the present invention will become apparent from the following description of the preferred embodiments taken in conjunction with the accompanying drawings, although variations and modifications therein may be effected without departing from the spirit and scope of the novel concepts of the disclosure.
Drawings
The accompanying drawings illustrate one or more embodiments of the invention and, together with the written description, serve to explain the principles of the invention. The same reference numbers may be used throughout the drawings to refer to the same or like elements of an embodiment.
Fig. 1 schematically shows the overall architecture of a system for vehicle driver identification according to one embodiment of the present invention.
Fig. 2 schematically shows a flow chart of face registration using a system for vehicle driver recognition according to one embodiment of the present invention.
FIG. 3 schematically shows a flow diagram for facial retrieval using a system for vehicle driver recognition according to one embodiment of the present invention.
Fig. 4 schematically shows a flow chart of a method for vehicle driver identification according to an embodiment of the invention.
Fig. 5 schematically shows a flow chart of a method for vehicle driver identification according to an embodiment of the invention.
Detailed Description
The present invention now will be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. Like reference numerals refer to like elements throughout.
In the context of the present invention, and in the specific context in which each term is used, the terms in this specification generally have their ordinary meaning in the art. Certain terms used to describe the invention are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner regarding the description of the invention. For convenience, certain terms may be highlighted, for example using italics and/or quotation marks. The use of highlighting has no influence on the scope and meaning of a term; the scope and meaning of a term are the same, in the same context, whether or not it is highlighted. It will be appreciated that the same thing can be said in more than one way. Consequently, alternative language and synonyms may be used for any one or more of the terms discussed herein, and no special significance attaches to whether or not a term is elaborated or discussed herein. Synonyms for certain terms are provided, but the recitation of one or more synonyms does not exclude the use of other synonyms. Examples used anywhere in this specification, including examples of any terms discussed herein, are illustrative only and in no way limit the scope and meaning of the invention or of any exemplified term. Likewise, the invention is not limited to the various embodiments given in this specification.
It will be understood that, as used in the specification herein and throughout the claims that follow, the meaning of "a", "an", and "the" includes plural referents unless the context clearly dictates otherwise. Also, it will be understood that when an element is referred to as being "on" another element, it can be directly on the other element or intervening elements may be present therebetween. In contrast, when an element is referred to as being "directly on" another element, there are no intervening elements present. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, components, regions, layers and/or sections, these elements, components, regions, layers and/or sections should not be limited by these terms. These terms are only used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Thus, a first element, component, region, layer or section discussed below could be termed a second element, component, region, layer or section without departing from the teachings of the present invention.
Furthermore, relative terms, such as "lower" or "bottom" and "upper" or "top," may be used herein to describe one element's relationship to another element as illustrated. It will be understood that relative terms are intended to encompass different orientations of the device in addition to the orientation depicted in the figures. For example, if the device in one of the figures is turned over, elements described as being on the "lower" side of other elements would then be oriented on "upper" sides of the other elements. Thus, the exemplary term "lower" can encompass both an orientation of "lower" and "upper," depending on the particular orientation of the figure. Similarly, if the device in one of the figures is turned over, elements described as "below" or "beneath" other elements would then be oriented "above" the other elements. Thus, the exemplary terms "below" or "beneath" can encompass both an orientation of above and below.
It will be further understood that the terms "comprises," "comprising," or "includes," or "including," or "having," or "carrying," or "including," or "involving," etc., are open-ended, i.e., mean including, but not limited to. When used in this specification, they specify the presence of stated features, regions, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, regions, integers, steps, operations, elements, components, and/or groups thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present invention and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, the phrase "at least one of A, B, and C" should be construed to mean a logical (A or B or C), using a non-exclusive logical OR. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
As used herein, the term module may include an Application Specific Integrated Circuit (ASIC), an electronic circuit; a combinational logic circuit; a Field Programmable Gate Array (FPGA); a processor (shared, dedicated, or group) that executes code; other suitable hardware components that can provide the above-described functionality; or a combination of some or all of the above, such as in a system on a chip, or may refer to portions thereof. The term module may include memory (shared, dedicated, or group) that stores code executed by the processor.
As used herein, the term chip or computer chip generally refers to a hardware electronic component, and may refer to or include a small electronic circuit unit, also known as an Integrated Circuit (IC), or a combination of electronic circuits or ICs.
As used herein, the term microcontroller unit or its abbreviation MCU generally refers to a small computer on a single IC chip that can execute programs for controlling other devices or machines. A microcontroller unit contains one or more CPUs (processor cores) as well as memory and programmable input/output (I/O) peripherals, typically designed for embedded applications.
As used herein, the term interface generally refers to a communication tool or device at the point of interaction between components for performing wired or wireless data communication between the components. In general, the interface may be applicable on both hardware and software, and may be a unidirectional or bidirectional interface. Examples of physical hardware interfaces may include electrical connectors, buses, ports, cables, terminals, and other I/O devices or components. The components in communication with the interface may be, for example, components or peripherals of a computer system.
As used herein, the term code may include software, firmware, and/or microcode, and may refer to programs, routines, functions, classes, and/or objects. A single (shared) processor may be used to execute some or all code from multiple modules. In addition, some or all code from multiple modules may be stored by a single (shared) memory. A set of processors may also be used to execute some or all code from a single module. In addition, a set of memories may be used to store some or all of the code from a single module.
The apparatus and methods are described in the following detailed description and in conjunction with the following figures by way of various blocks, components, circuits, processes, algorithms, etc. (collectively referred to as "elements"). These elements may be implemented using electronic hardware, computer software, or any combination thereof. Whether such elements are implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. For example, an element or any portion of an element or any combination of elements may be implemented as a "processing system" that includes one or more processors. Examples of processors include microprocessors, microcontrollers, Graphics Processing Units (GPUs), Central Processing Units (CPUs), application processors, Digital Signal Processors (DSPs), Reduced Instruction Set Computing (RISC) processors, systems on chip (SoCs), baseband processors, Field Programmable Gate Arrays (FPGAs), Programmable Logic Devices (PLDs), state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functions described throughout this disclosure. One or more processors in the processing system may execute software. Software should be construed broadly to mean instructions, instruction sets, code segments, program code, programs, subprograms, software components, applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
Thus, in one or more example embodiments, the functions described may be implemented in hardware, software, or any combination thereof. If implemented in software, the functions may be stored on or encoded on a computer-readable medium as one or more instructions or code. Computer readable media includes computer storage media. A storage media may be any available media that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable ROM (EEPROM), optical disk storage, magnetic disk storage, other magnetic storage devices, combinations of the above types of computer-readable media, or any other medium which can be used to store computer-executable code in the form of computer-accessible instructions or data structures.
The following description is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. The broad teachings of the present invention can be implemented in a variety of forms. Therefore, while this disclosure includes particular examples, the true scope of the disclosure should not be so limited since other modifications will become apparent upon a study of the drawings, the specification, and the following claims. For purposes of clarity, the same reference numbers will be used in the drawings to identify similar elements. It should be understood that one or more steps of the method may be performed in a different order (or simultaneously) without altering the principles of the present invention.
Fig. 1 schematically shows the overall architecture of a system for vehicle driver identification according to one embodiment of the present invention. The system 100 includes a Near Infrared (NIR) Light Emitting Diode (LED) illuminator 102, a Near Infrared (NIR) light sensing unit 104, an image control and processing unit 106, a face detector 108, a facial feature extractor 110, a facial feature dictionary 112, a face retrieval system 114, a user interface 116, and one or more application interfaces 118.
The NIR LED illuminator 102 emits light (i.e., electromagnetic radiation) in the NIR spectrum. The NIR LED illuminator 102 is configured to synchronize with the NIR light sensing unit 104 and to provide sufficient brightness so that the NIR light sensing unit 104 can capture details of the driver's face even at night. At the same time, the light from the NIR LED illuminator 102 is invisible to the driver, because the NIR spectrum does not overlap the visible spectrum of humans, eliminating the possibility of distraction and eye hazards. The human visual system responds only to the spectrum of about 400 nm to 700 nm. In one embodiment, the NIR LED illuminator 102 emits light in the 825 nm to 875 nm band, aligned with the response spectrum of the NIR light sensing unit 104. In another embodiment, the NIR LED illuminator 102 emits light in the 915 nm to 965 nm band, aligned with the response spectrum of the NIR light sensing unit 104. In one embodiment, the NIR LED illuminator 102 is powered by Pulse Width Modulation (PWM) with a configurable duty cycle. A larger duty cycle provides more light to the NIR light sensing unit 104, but at the cost of a higher risk of overheating. A balance between lighting quality and overheating is required because the limited module size results in poor thermal conduction. Also, the duty cycle should be synchronized with the NIR light sensing unit 104 so that the radiated energy is fully utilized by the NIR light sensing unit 104, yielding optimal sensing quality. This synchronization is achieved by the image control and processing unit 106, described in detail below.
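Purely as an illustrative sketch in Python (the set_pwm hook, the 0.5 duty-cycle cap, and the example timings are assumptions, not part of this disclosure), the duty-cycle synchronization described above might look as follows:

class NIRLedIlluminator:
    def __init__(self, set_pwm, max_duty=0.5):
        self.set_pwm = set_pwm    # callback that writes a duty cycle to hardware
        self.max_duty = max_duty  # cap chosen to limit the risk of overheating

    def sync_to_exposure(self, exposure_ms, frame_period_ms):
        # Light the LED only while the shutter is open, so its radiation
        # is fully utilized by the NIR light sensing unit.
        duty = min(exposure_ms / frame_period_ms, self.max_duty)
        self.set_pwm(duty)
        return duty

# Example: an 8 ms exposure at 30 fps (~33.3 ms frame period) -> ~0.24 duty.
led = NIRLedIlluminator(set_pwm=lambda duty: None)
print(led.sync_to_exposure(exposure_ms=8.0, frame_period_ms=1000 / 30))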
The NIR light sensing unit 104 captures light originating from the NIR LED illuminator 102 or from other light sources, such as the sun, the moon, or other illuminators with a rich NIR component in their spectrum. The NIR light sensing unit 104 converts the illumination intensity into an electrical signal, more specifically into a digital readout. In one embodiment, the NIR light sensing unit 104 is designed to respond only to specific spectral bands. For example, the NIR light sensing unit 104 responds only to the 825 nm to 875 nm band. In another example, it responds only to the 915 nm to 965 nm band. This spectral selectivity may be achieved by applying a band-pass filter on top of the NIR light sensing unit 104. It should be noted that the NIR light sensing unit does not rely solely on the NIR LED illuminator 102. The spectrum of sunlight is much broader than that of the human visual system and is also strong in the NIR band, so the NIR light sensing unit 104 works well, or even better, under good sunlight. Nevertheless, when no natural NIR light source is present, the NIR LED illuminator 102 is essential for low-light or dark conditions.
In one embodiment, the NIR light sensing unit 104 is a camera covered with an NIR band-pass filter. In one embodiment, the NIR light sensing unit 104 is a Focal Plane Array (FPA) NIR light sensing unit. An FPA is an image sensing device consisting of an array (typically rectangular) of light-sensing pixels at the focal plane of a lens. It operates by detecting photons of particular wavelengths and generating a charge, voltage, or resistance proportional to the number of photons detected at each pixel. This charge, voltage, or resistance is then measured, digitized, and used to construct an image of the object, scene, or phenomenon that emitted the photons. The FPA NIR light sensing unit 104 exposes various attributes that control its sensing behavior, including exposure time, digital gain, analog gain, gamma value, and frame rate. These attributes are crucial for the image control and processing unit 106 to obtain the best image quality, which is the basis for subsequent face registration and face retrieval.
The image control and processing unit 106 adjusts the behavior of both the NIR LED illuminator 102 and the NIR light sensing unit 104, and generates an image. The image control and processing unit 106 is configured to jointly adjust the on/off period of the NIR LED illuminator 102 and the light-sensing shutter of the NIR light sensing unit 104 to fully utilize the energy from the NIR LED illuminator 102 for better imaging quality. The image control and processing unit 106 may also analyze statistics of the digital readout of the NIR light sensing unit 104 and send commands to both the NIR LED illuminator 102 and the NIR light sensing unit 104. Properties of the NIR light sensing unit 104 that the image control and processing unit 106 may control include exposure time, analog gain, digital gain, gamma value, and frame rate. Based on the commands sent by the image control and processing unit 106, the NIR LED illuminator 102 and the NIR light sensing unit 104 adjust their properties accordingly. The image control and processing unit 106 then generates an image with the best imaging quality, maximizing the likelihood of successful face registration and face retrieval.
In one embodiment, the image control and processing unit 106 is included in an Electronic Control Unit (ECU) of the vehicle. The image control and processing unit 106 sends commands to the NIR LED illuminator 102 and the NIR light sensing unit 104 to coordinate them. More specifically, under low-light conditions the image control and processing unit 106 aligns the duty cycle of the NIR LED illuminator 102 with the exposure time of the NIR light sensing unit 104. In one embodiment, the image control and processing unit 106 may analyze statistics of the digital readout of the NIR light sensing unit 104 to evaluate the lighting conditions. If the image control and processing unit 106 determines that the ambient lighting is sufficiently intense, it turns off the NIR LED illuminator 102, significantly reducing power consumption and heat generation.
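A minimal sketch of this coordination loop follows; the readout statistic, the target brightness band, and the step sizes are illustrative assumptions that this disclosure does not specify:

import numpy as np

def coordinate(frame, exposure_ms, duty):
    # Evaluate lighting from a simple readout statistic, then update the
    # sensor exposure and the illuminator duty cycle; returns new settings.
    mean = float(np.asarray(frame).mean())
    if mean > 120:                                  # ambient NIR is sufficient
        return exposure_ms, 0.0                     # LED off: less power and heat
    if mean < 90:                                   # underexposed
        return min(exposure_ms * 1.2, 30.0), min(duty + 0.05, 0.5)
    if mean > 110:                                  # overexposed
        return max(exposure_ms * 0.9, 0.1), duty
    return exposure_ms, duty                        # within target band: hold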
Fig. 2 schematically shows a flow chart of face registration using a system for vehicle driver recognition according to one embodiment of the present invention. The face detector 108 is configured to determine whether a face region is present in the image 202 generated by the image control and processing unit 106. If the face detector 108 determines that a person's face region 204 is present in the image 202, it locates the face region 204 with a bounding box. The face detector 108 may be implemented by a Deep Neural Network (DNN), such as a multi-task cascaded convolutional neural network (MTCNN), a Fast region-based convolutional neural network (Fast R-CNN), or any other deep neural network. A deep neural network is an artificial neural network with multiple layers between the input and output layers. The DNN finds the mathematical operations, whether linear or non-linear, that convert the input into the output. Each operation is considered a layer, and a complex deep neural network has many layers. The network traverses the layers and computes the probability of each output. Convolutional Neural Networks (CNNs) are the class of deep neural networks most commonly applied to analyzing visual imagery.
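As one possible off-the-shelf realization of such a detector (the facenet-pytorch library, the confidence threshold, and the file name are illustrative assumptions; the disclosure does not prescribe a library):

from facenet_pytorch import MTCNN
from PIL import Image

mtcnn = MTCNN()
img = Image.open("cabin_frame.png")                # hypothetical NIR frame
boxes, probs, landmarks = mtcnn.detect(img, landmarks=True)
if boxes is not None and probs[0] > 0.9:           # a face region is present
    x1, y1, x2, y2 = boxes[0]                      # bounding box of the face
    five_points = landmarks[0]                     # eyes, nose tip, mouth corners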
In one embodiment, the face detector 108 may be a software or hardware implementation in the ECU that determines whether a person's face region 204 is present in the image 202 generated by the image control and processing unit 106. As described above, the image 202 has already been optimized by the image control and processing unit 106. The face detector 108 may crop the face region 204 from the image 202. It should be noted that for in-cabin driver registration, images of the driver's face from different perspectives enhance the robustness of subsequent face retrieval, making it relatively immune to perspective changes. Capturing images of the driver's face from different perspectives may be accomplished by displaying visual guidance to the driver on the embedded display, so that the driver can follow the guidance and move his or her head during the registration process until images from different perspectives have been captured.
In addition to bounding boxes, the face detector 108 may output a series of key facial points called landmarks 208. These landmarks 208 are points on the human face such as the tip of the nose, the centers of the eyes, and the corners of the mouth. They are critical for subsequent feature generation. Since the discrimination between faces is greatest in these salient regions, regions of the face near the landmarks 208 may be given higher weight during subsequent face registration and face retrieval. More importantly, the landmarks 208 can be used for perspective correction of the driver's face. During face registration or face retrieval, the driver's face is likely not perfectly perpendicular to the imaging plane, so the person's face region 204 must be perspective-corrected before further processing.
Based on the landmarks 208, the face alignment unit 206 calibrates the posture of the driver's face to an upright posture. In one embodiment, the face alignment unit 206 aligns the pose of the driver's face using three axis angles, namely yaw, pitch, and roll, thereby providing more information for subsequent face recognition. It should be noted that the face alignment unit 206 may be a unit separate from the face detector 108, or a component integrated in the face detector 108. It should also be noted that landmark-based alignment by the face alignment unit 206 increases the accuracy of subsequent face recognition compared to systems that generate only bounding boxes.
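A sketch of landmark-based alignment using OpenCV's similarity-transform estimator; the 112x112 canonical template coordinates are a commonly used convention, assumed here for illustration only:

import cv2
import numpy as np

# Canonical five-landmark template for a 112x112 upright crop (assumed):
# left eye, right eye, nose tip, left and right mouth corners.
TEMPLATE = np.float32([[38.3, 51.7], [73.5, 51.5],
                       [56.0, 71.7],
                       [41.5, 92.4], [70.7, 92.2]])

def align_face(image, landmarks5):
    # Estimate a similarity transform (rotation, scale, translation) that
    # maps the detected landmarks onto the template, then warp the image.
    M, _ = cv2.estimateAffinePartial2D(np.float32(landmarks5), TEMPLATE)
    return cv2.warpAffine(image, M, (112, 112))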
The facial feature extractor 110 analyzes the aligned face region 204 and extracts a feature vector 212 representing the face region 204. This process may also be referred to as face encoding. The facial feature extractor 110 may be implemented by various techniques, such as a backbone network (e.g., MobileNet), or local feature descriptors (e.g., SIFT) plus clustering (e.g., K-means) and subsequent dimension-reduction techniques such as bag of words (BoW). Here, a backbone network refers to the feature-extraction trunk of a neural network, whose intermediate representations are reused by downstream task heads. Scale-invariant feature transform (SIFT) is a feature detection algorithm in computer vision that detects and describes local features in images. K-means clustering is a vector quantization method, originally from signal processing, that is popular in cluster analysis and data mining; it partitions n observations into K clusters, where each observation belongs to the cluster with the nearest mean, which serves as the cluster's prototype. The bag-of-words (BoW) model is a simplified representation used in natural language processing and information retrieval, in which a text (such as a sentence or document) is represented as the bag (multiset) of its words, ignoring grammar and even word order while preserving multiplicity; BoW models have also been applied in computer vision. It should be noted that for face registration, the system may need to capture multiple images of the same driver to match better.
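An illustrative encoder along these lines, using a generic MobileNetV2 trunk; the 256-dimensional output and untrained weights are assumptions, and a production extractor would be trained on face data (e.g., with a metric-learning loss):

import torch
import torch.nn.functional as F
from torchvision import models

# Generic MobileNetV2 trunk with a 256-D head standing in for the trained
# facial feature extractor; weights here are untrained placeholders.
backbone = models.mobilenet_v2(weights=None)
backbone.classifier = torch.nn.Linear(1280, 256)
backbone.eval()

def extract_feature(face_chw):
    # face_chw: float tensor (3, 112, 112) holding an aligned face crop.
    with torch.no_grad():
        v = backbone(face_chw.unsqueeze(0)).squeeze(0)
    return F.normalize(v, dim=0)   # unit norm simplifies cosine similarity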
In one embodiment, the facial feature extractor 110 is essentially an encoder of the information used to identify each driver's face region 204. In one embodiment, the facial feature extractor 110 is included in the ECU. The facial feature extractor 110 outputs a high-dimensional feature vector 212 representing the detected face region 204 of the driver. The feature vector 212 should be a faithful representation (mathematically, a well-discriminating distribution) of the detected face region 204: regardless of how they are captured, the feature vectors 212 of the same person should lie very close to each other in feature space, while the feature vectors 212 of two different persons should be well separated, even if captured under the same conditions.
The facial feature dictionary 112 stores the feature vectors of all registered drivers. The facial feature dictionary 112 can add new feature vectors (e.g., feature vector 212) and can delete existing feature vectors. It may add or delete feature vectors in response to a user's command (e.g., the driver's command). Alternatively, it may add or delete feature vectors automatically in some cases (e.g., adding feature vector 212 once it has been determined to be new). The facial feature dictionary 112 includes the feature vectors 212 of different drivers and a lookup table that associates each feature vector 212 with its corresponding driver.
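A minimal sketch of such a dictionary with its lookup table; all names are illustrative, and unit-norm vectors are assumed so that a dot product equals cosine similarity:

import numpy as np

class FacialFeatureDictionary:
    # Stores unit-norm feature vectors plus a parallel lookup table that
    # associates each vector with its registered driver.
    def __init__(self):
        self.vectors = []
        self.drivers = []

    def add(self, vector, driver_id):
        self.vectors.append(np.asarray(vector, dtype=np.float32))
        self.drivers.append(driver_id)

    def delete(self, driver_id):
        keep = [i for i, d in enumerate(self.drivers) if d != driver_id]
        self.vectors = [self.vectors[i] for i in keep]
        self.drivers = [self.drivers[i] for i in keep]

    def best_match(self, query):
        # Return (driver_id, similarity) of the closest entry, or None.
        if not self.vectors:
            return None
        sims = np.stack(self.vectors) @ np.asarray(query, dtype=np.float32)
        i = int(np.argmax(sims))
        return self.drivers[i], float(sims[i])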
Operating in the NIR spectrum reduces the negative impact of complex lighting on recognition performance, but makes algorithm design more challenging. First, NIR-based recognition discards hue information that would otherwise help build stronger recognition. Second, a large-scale face data set must be collected to train the facial feature extractor 110, which is based on a deep Convolutional Neural Network (CNN). Given that almost all public facial images are in the Visible (VIS) spectrum rather than the NIR spectrum, the deep CNN may fail to generalize because of the significant domain gap. Thus, public VIS datasets (e.g., MS-Celeb-1M, VGGFace2) are used together with private NIR images, and the NIR images are oversampled to remove the imbalance. MS-Celeb-1M is a public VIS dataset of facial images associated with corresponding entity keys in a knowledge base. VGGFace2 is another public VIS dataset, containing 3.31 million images of 9131 subjects.
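A hedged sketch of the rebalancing step using PyTorch's WeightedRandomSampler; the sample counts below are placeholders, not the real dataset sizes:

import torch
from torch.utils.data import WeightedRandomSampler

# Weight each sample inversely to its domain's size so NIR images are
# effectively oversampled; counts are placeholders, not real dataset sizes.
n_vis, n_nir = 1_000_000, 50_000
weights = torch.cat([torch.full((n_vis,), 1.0 / n_vis),
                     torch.full((n_nir,), 1.0 / n_nir)])
sampler = WeightedRandomSampler(weights, num_samples=n_vis + n_nir,
                                replacement=True)
# Passing this sampler to a DataLoader then draws VIS and NIR samples with
# roughly equal probability per domain, removing the imbalance.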
Given the limited computational resources and the size of the CNN parameters, controlling the trade-off between complexity (in both time and space) and model capacity is crucial. Intensive arithmetic operations consume a lot of power and cause overheating, while too many parameters incur intolerable load times. Therefore, lightweight networks (e.g., MobileNet, ShuffleNet) built on depthwise separable convolutions (DW Conv) and pointwise convolutions (PW Conv) may be employed. In addition, quantization and distillation techniques are employed to reduce the computational load.
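The depthwise-separable building block underlying such lightweight networks can be sketched as follows (PyTorch; layer sizes are illustrative):

import torch.nn as nn

def dw_separable(in_ch, out_ch, stride=1):
    # Depthwise 3x3 convolution (one filter per channel) followed by a 1x1
    # pointwise convolution that mixes channels, replacing a dense 3x3 conv.
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False),  # DW Conv
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),                          # PW Conv
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

Per output pixel, a dense 3x3 convolution costs roughly 9 x in_ch x out_ch multiplications, whereas the pair above costs about 9 x in_ch + in_ch x out_ch, which is the source of the savings in computation and parameters.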
Compared to outdoor environments, where the true positive rate is critical, in-cabin scenarios place more emphasis on safety and convenience, i.e., a lower false acceptance rate and a higher true acceptance rate. Therefore, a test data set containing various hard cases (e.g., glasses, lighting, head pose) is created, and extensive experiments are performed on it to find better solutions and select the best model. A model is deployed only after rigorous testing.
To give the user (e.g., the driver) a sense of participation and to improve recognition accuracy, a registration guide may be displayed on the screen. More specifically, the user is guided through several different poses so that feature vectors corresponding to different perspectives can be extracted and stored in the facial feature dictionary 112. In principle, the information loss caused by projecting the three-dimensional (3D) real world onto a two-dimensional (2D) face image can be partially compensated in this way, improving face recognition performance in practice.
FIG. 3 schematically shows a flow diagram of face retrieval using a system for vehicle driver recognition according to one embodiment of the present invention. As in the face registration flow shown in Fig. 2, the face detector 108, the face alignment unit 206, and the facial feature extractor 110 function in the same manner. If the face detector 108 determines that a person's face region 204 is present in the image 202 generated by the image control and processing unit 106, it locates the face region 204 with a bounding box and outputs a series of landmarks 208. Based on the landmarks 208, the face alignment unit 206 calibrates the posture of the driver's face to an upright posture. The facial feature extractor 110 then analyzes the aligned face region 204 and extracts a feature vector 212 representing it.
The feature vector 212 generated by the facial feature extractor 110 is ultimately used to query the facial feature dictionary 112. More specifically, the face retrieval system 114 attempts to find the most similar feature vector in the facial feature dictionary 112. There are many ways to quantify the similarity between two feature vectors and rank candidates accordingly. In one embodiment, the face retrieval system is a similarity comparator 316 that uses cosine similarity as its metric. For two N-dimensional feature vectors $f_q$ and $f_i$, representing the extracted feature vector 212 and an existing feature vector in the facial feature dictionary 112 respectively, their cosine similarity is given by the formula

$$\mathrm{sim}(f_q, f_i) = \frac{f_q \cdot f_i}{\lVert f_q \rVert \, \lVert f_i \rVert}$$

The numerator is the dot product of the two N-dimensional feature vectors $f_q$ and $f_i$, and the denominator is the product of their magnitudes. If the feature vectors $f_q$ and $f_i$ are normalized to unit length, the denominator can be omitted; in that case ranking by cosine similarity is equivalent to ranking by Euclidean distance. The similarity comparator 316 then compares the cosine similarity with a predetermined similarity threshold. If the cosine similarity is greater than the predetermined similarity threshold, the similarity comparator 316 generates a recognition result 318; in other words, it identifies the identity of the driver (e.g., John Doe).
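A direct implementation of this metric and threshold test follows; the 0.7 threshold is an illustrative assumption, as the disclosure leaves the threshold value unspecified:

import numpy as np

def cosine_similarity(f_q, f_i):
    # Dot product over the product of magnitudes, per the formula above.
    return float(np.dot(f_q, f_i) /
                 (np.linalg.norm(f_q) * np.linalg.norm(f_i)))

def is_match(f_q, f_i, threshold=0.7):   # 0.7 is an illustrative threshold
    return cosine_similarity(f_q, f_i) > threshold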
However, if the similarity comparator 316 finds no match in the facial feature dictionary 112 (i.e., no cosine similarity exceeds the predetermined similarity threshold), it outputs a recognition result 318 indicating that the driver is not registered, signaling either a potentially unauthorized vehicle entry or that face registration is required.
As shown in Figs. 1 and 3, the face retrieval system 114 (e.g., the similarity comparator 316) outputs the recognition result 318 to the user interface 116, which then displays it to the driver (and passengers) accordingly. In one embodiment, the user interface 116 is an in-cabin display 320, which presents visual or audio feedback to inform the driver of the recognition result 318. The user interface 116 may be an in-cabin video or audio device including, but not limited to, an instrument panel display, an embedded display, a head-up display, and a speaker. The recognition result 318 may be presented graphically or textually, indicating failure or success of the face retrieval. The user interface 116 also provides guidance for the driver to switch between the face registration mode and the face retrieval mode. For security reasons, switching from retrieval mode to registration mode requires a second authentication step, such as a password, fingerprint, or key activation.
The face retrieval system 114 (e.g., the similarity comparator 316) may also output the recognition result 318 to one or more application interfaces 118. As shown in Fig. 3, the one or more application interfaces 118 may include, but are not limited to, a personalized entertainment system 322, a keyless entry system 324, and a personalized seating system 326. They may also include an anti-theft system, a driving-pattern customization system, and a digital payment verification system. All of these application interfaces previously required additional identity verification, such as a password, fingerprint, or key activation.
In another aspect of the present invention, a method for vehicle driver identification is provided. Figs. 4 and 5 together schematically show a flow chart of the method according to an embodiment of the invention. The method may be implemented by the system 100 for vehicle driver identification described above, although it may also be implemented by other means. It should be noted that all or part of the steps according to the embodiments of the present invention may be implemented by hardware or by a program instructing the relevant hardware.
At step 402, NIR LED illuminator 102 emits NIR light in a vehicle.
In step 404, the NIR light sensing unit 104 captures the reflected NIR light.
At step 406, the image control and processing unit 106 coordinates the NIR LED illuminator and the NIR light sensing unit.
In step 408, the image control and processing unit 106 analyzes the reflected NIR light captured by the NIR light sensing unit to generate an image.
In step 410, the face detector 108 determines that a human face is present in the image.
In step 412, the face detector 108 identifies a face region of the human face.
In step 414, the facial feature extractor 110 analyzes the face region to extract a feature vector representing the face region. Step 414 is followed by step 502 in fig. 5.
At step 502, the face retrieval system 114 determines whether the similarity between the feature vectors and any existing feature vectors in the facial feature dictionary 112 is greater than a first threshold.
When the similarity between the feature vector and a first existing feature vector in the facial feature dictionary 112 is greater than the first threshold, the face retrieval system 114 generates, at step 504, a first recognition result indicating the identity associated with the first existing feature vector. At step 512, the user interface 116 displays the first recognition result.
When the similarity between the feature vector and any existing feature vectors in the facial feature dictionary 112 is not greater than the first threshold, the face retrieval system 114 generates a second recognition result indicating that the feature vector does not exist in the facial feature dictionary 112 at step 506. At step 508, the user interface 116 displays the second recognition result. At step 510, the facial feature dictionary 112 stores the facial features in the facial feature dictionary 112.
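Steps 502 to 510 amount to a nearest-neighbor lookup under cosine similarity with an acceptance threshold. A minimal Python sketch follows, assuming unit-normalized vectors and an illustrative threshold of 0.6; the dictionary layout and key naming are hypothetical.

    import numpy as np

    # Nearest-neighbor lookup under cosine similarity with an acceptance
    # threshold; enrolls the query when nothing matches. The 0.6 threshold,
    # the dict layout, and the key naming are illustrative assumptions.

    FIRST_THRESHOLD = 0.6

    def retrieve_or_enroll(query, dictionary):
        q = query / np.linalg.norm(query)
        best_id, best_sim = None, -1.0
        for identity, vec in dictionary.items():
            sim = float(np.dot(q, vec))       # cosine similarity of unit vectors
            if sim > best_sim:
                best_id, best_sim = identity, sim
        if best_sim > FIRST_THRESHOLD:        # step 504: known driver
            return {"matched": True, "identity": best_id, "similarity": best_sim}
        dictionary["driver_%d" % len(dictionary)] = q  # step 510: store new vector
        return {"matched": False}

    dictionary = {"alice": np.eye(4)[0], "bob": np.eye(4)[1]}
    print(retrieve_or_enroll(np.array([0.9, 0.1, 0.0, 0.0]), dictionary))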
Yet another aspect of the invention provides a non-transitory tangible computer-readable medium storing instructions that, when executed by one or more processors, cause performance of the method for vehicle driver identification disclosed above. The computer-executable instructions or program code enable the above-disclosed apparatus or a similar system to perform various operations in accordance with the above-disclosed methods. The storage medium or memory may include, but is not limited to, high-speed random access memory such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices, and non-volatile memory such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state memory devices.
The foregoing description of the exemplary embodiments of the invention has been presented only for the purposes of illustration and description and is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching.
The embodiments were chosen and described in order to explain the principles of the invention and their practical application, so as to enable others skilled in the art to utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. Alternative embodiments will become apparent to those skilled in the art to which the present invention pertains without departing from its spirit and scope. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description and the exemplary embodiments described therein.
In the specification of this disclosure, some references are cited and discussed, which may include patents, patent applications, and various publications. Citation and/or discussion of such references is provided merely to clarify the description of the present disclosure and is not an admission that any such reference is "prior art" to the disclosure described herein. All references cited and discussed in this specification are herein incorporated by reference in their entirety to the same extent as if each reference were individually incorporated by reference.

Claims (13)

1. An apparatus for vehicle driver identification, comprising:
a Near Infrared (NIR) Light Emitting Diode (LED) illuminator configured to emit NIR light in the vehicle;
a Near Infrared (NIR) light sensing unit configured to capture reflected NIR light;
an image control and processing unit configured to coordinate the NIR LED illuminator and the NIR light sensing unit and analyze the reflected NIR light captured by the NIR light sensing unit to generate an image;
a face detector configured to determine that a human face exists in the image and to identify a face region of the human face;
a facial feature extractor configured to analyze the facial region to extract a feature vector representing the facial region;
a facial feature dictionary configured to store existing feature vectors;
a face retrieval system configured to generate a recognition result indicating whether a similarity between the feature vector and any of the existing feature vectors is greater than a first threshold; and
a user interface configured to display the recognition result.
2. The apparatus of claim 1, wherein the image control and processing unit is configured to coordinate the NIR LED illuminator and the NIR light sensing unit by controlling one or more of a duty cycle of the NIR LED illuminator, an analog gain of the NIR light sensing unit, a digital gain of the NIR light sensing unit, an exposure time of the NIR light sensing unit, and a frame rate of the NIR light sensing unit.
3. The apparatus of claim 2, wherein the image control and processing unit is configured to coordinate the NIR LED illuminator and the NIR light sensing unit to generate an image with optimal imaging quality.
4. The apparatus of claim 1, further comprising:
a face alignment unit configured to calibrate the face region to a calibrated face region associated with an upright posture of a driver, wherein the facial feature extractor is configured to analyze the calibrated face region to extract a feature vector representing the calibrated face region.
5. The apparatus of claim 1, wherein the facial feature extractor is configured to employ one or more of a backbone network, local feature descriptors, clustering techniques, and dimension reduction techniques.
6. The apparatus of claim 1, wherein the similarity is a cosine similarity.
7. A method for vehicle driver identification, comprising:
emitting NIR light in the vehicle by a near infrared (NIR) light emitting diode (LED) illuminator;
capturing reflected NIR light by a near infrared (NIR) light sensing unit;
coordinating, by an image control and processing unit, the NIR LED illuminator and the NIR light sensing unit;
analyzing, by the image control and processing unit, the reflected NIR light captured by the NIR light sensing unit to generate an image;
determining that a human face exists in the image;
identifying a face region of the human face;
analyzing the face region to extract a feature vector representing the face region;
determining whether a similarity between the feature vector and any existing feature vectors in a facial feature dictionary is greater than a first threshold; and
when the similarity between the feature vector and a first existing feature vector in the facial feature dictionary is greater than the first threshold,
generating a first recognition result indicating an identity associated with the first existing feature vector; and
displaying the first recognition result;
when the similarity between the feature vector and any existing feature vector in the facial feature dictionary is not greater than the first threshold,
generating a second recognition result indicating that the feature vector does not exist in the facial feature dictionary;
displaying the second recognition result; and
storing the feature vector in the facial feature dictionary.
8. The method of claim 7, wherein the image control and processing unit coordinates the NIR LED illuminator and the NIR light sensing unit by controlling one or more of a duty cycle of the NIR LED illuminator, an analog gain of the NIR light sensing unit, a digital gain of the NIR light sensing unit, an exposure time of the NIR light sensing unit, and a frame rate of the NIR light sensing unit.
9. The method of claim 8, wherein the image control and processing unit coordinates the NIR LED illuminator and the NIR light sensing unit to generate an image with optimal imaging quality.
10. The method of claim 7, further comprising:
calibrating the face region to a calibrated face region associated with an upright posture of a driver, wherein the calibrated face region is analyzed to extract the feature vector representing the calibrated face region.
11. The method of claim 7, wherein the facial region is analyzed to extract a feature vector representing the facial region by employing one or more of a backbone network, local feature descriptors, clustering techniques, and dimension reduction techniques.
12. The method of claim 7, wherein the similarity is a cosine similarity.
13. A non-transitory tangible computer-readable medium storing instructions that, when executed by one or more processors, cause performance of a method for vehicle driver identification, the method comprising:
emitting NIR light in the vehicle by a near infrared (NIR) light emitting diode (LED) illuminator;
capturing reflected NIR light by a near infrared (NIR) light sensing unit;
coordinating, by an image control and processing unit, the NIR LED illuminator and the NIR light sensing unit;
analyzing, by the image control and processing unit, the reflected NIR light captured by the NIR light sensing unit to generate an image;
determining that a human face exists in the image;
identifying a face region of the human face;
analyzing the face region to extract a feature vector representing the face region;
determining whether a similarity between the feature vector and any existing feature vectors in a facial feature dictionary is greater than a first threshold; and
when the similarity between the feature vector and a first existing feature vector in the facial feature dictionary is greater than the first threshold,
generating a first recognition result indicating an identity associated with the first existing feature vector; and
displaying the first recognition result;
when the similarity between the feature vector and any existing feature vector in the facial feature dictionary is not greater than the first threshold,
generating a second recognition result indicating that the feature vector does not exist in the facial feature dictionary;
displaying the second recognition result; and
storing the feature vector in the facial feature dictionary.
CN202010953462.0A 2020-09-11 2020-09-11 Device and method for identifying a vehicle driver and use thereof Pending CN112101186A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010953462.0A CN112101186A (en) 2020-09-11 2020-09-11 Device and method for identifying a vehicle driver and use thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010953462.0A CN112101186A (en) 2020-09-11 2020-09-11 Device and method for identifying a vehicle driver and use thereof

Publications (1)

Publication Number Publication Date
CN112101186A true CN112101186A (en) 2020-12-18

Family

ID=73752403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010953462.0A Pending CN112101186A (en) 2020-09-11 2020-09-11 Device and method for identifying a vehicle driver and use thereof

Country Status (1)

Country Link
CN (1) CN112101186A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116749A (en) * 2013-03-12 2013-05-22 上海洪剑智能科技有限公司 Near-infrared face identification method based on self-built image library
CN106250877A (en) * 2016-08-19 2016-12-21 深圳市赛为智能股份有限公司 Near-infrared face identification method and device
CN110234545A (en) * 2017-03-10 2019-09-13 金泰克斯公司 The imaging system and method for identifying and monitoring for vehicle occupant
CN108810421A (en) * 2017-05-03 2018-11-13 福特全球技术公司 Improve the vehicle camera performance in low illumination scene using near-infrared luminaire
CN108304789A (en) * 2017-12-12 2018-07-20 北京深醒科技有限公司 Recognition algorithms and device
CN108162915A (en) * 2017-12-25 2018-06-15 四川长虹电器股份有限公司 Vehicle-mounted middle control personalized configuration system based on recognition of face
CN110728234A (en) * 2019-10-12 2020-01-24 爱驰汽车有限公司 Driver face recognition method, system, device and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Edited by the Editorial Department of Infrared and Laser Engineering: "Advances in Modern Optics and Photonics: Celebrating the 65th Anniversary of Academician Wang Daheng's Scientific Research Career", 28 February 2003, Tianjin Science and Technology Press *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3130218A1 (en) * 2021-12-15 2023-06-16 Robert Bosch Gmbh Method for acquiring images from the interior of a vehicle and vehicle interior camera system
CN115033145A (en) * 2022-06-22 2022-09-09 上海集度汽车有限公司 Vehicle-mounted screen display method and device, vehicle and storage medium
CN115909254A (en) * 2022-12-27 2023-04-04 钧捷智能(深圳)有限公司 DMS system based on camera original image and image processing method thereof
CN115909254B (en) * 2022-12-27 2024-05-10 钧捷智能(深圳)有限公司 DMS system based on camera original image and image processing method thereof

Similar Documents

Publication Publication Date Title
US11068701B2 (en) Apparatus and method for vehicle driver recognition and applications of same
US11314324B2 (en) Neural network image processing apparatus
CN108921100B (en) Face recognition method and system based on visible light image and infrared image fusion
Mandal et al. Towards detection of bus driver fatigue based on robust visual analysis of eye state
Lwin et al. Automatic door access system using face recognition
CN109800643B (en) Identity recognition method for living human face in multiple angles
Shi et al. Real-time traffic light detection with adaptive background suppression filter
Li et al. Illumination invariant face recognition using near-infrared images
Cyganek et al. Hybrid computer vision system for drivers' eye recognition and fatigue monitoring
Solanki et al. Review of face recognition techniques
CN112101186A (en) Device and method for identifying a vehicle driver and use thereof
KR101760258B1 (en) Face recognition apparatus and method thereof
Cheng et al. Multi-spectral and multi-perspective video arrays for driver body tracking and activity analysis
US20030169906A1 (en) Method and apparatus for recognizing objects
Jeong et al. Driver facial landmark detection in real driving situations
Gupta et al. Face detection using modified Viola jones algorithm
Nikisins et al. RGB-DT based face recognition
Verma et al. Framework for dynamic hand gesture recognition using Grassmann manifold for intelligent vehicles
JP4992823B2 (en) Face detection apparatus and face detection method
Li et al. Robust RGB-D face recognition using Kinect sensor
Menon et al. Driver face recognition and sober drunk classification using thermal images
Kang et al. Face recognition for vehicle personalization with near infrared frame differencing
Oh et al. Recognition of a driver's gaze for vehicle headlamp control
Li et al. A brief survey on recent progress in iris recognition
Ahsan Real time face recognition in unconstrained environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination