WO2023158060A1

WO2023158060A1 - Multi-sensor fusion-based driver monitoring apparatus and method

Info

Publication number: WO2023158060A1
Application number: PCT/KR2022/017955
Authority: WO
Inventors: Dong Seog Han (한동석); Alwin Poulose (알윈폴로즈)
Original assignee: Kyungpook National University Industry-Academic Cooperation Foundation (경북대학교 산학협력단)
Priority date: 2022-02-18
Filing date: 2022-11-15
Publication date: 2023-08-24

Abstract

The present invention relates to a multi-sensor fusion-based driver monitoring apparatus and method. The multi-sensor fusion-based driver monitoring apparatus according to the present invention comprises: an information collecting unit for collecting biometric information and front image information of the driver; a preprocessing unit for preprocessing the biometric information and the front image information; a shape vector generating unit that extracts respective feature vectors on the basis of the biometric information and front image information preprocessed by the preprocessing unit, and generates a shape vector by combining the respective feature vectors; a determining unit for determining emotions and actions of the driver on the basis of the shape vector; and an output unit for outputting the emotions and actions of the driver. By monitoring and outputting the driver's emotions and actions in this way, the apparatus increases the efficiency of an autonomous vehicle.

Description

Multi-sensor fusion-based driver monitoring device and method

The present invention relates to an apparatus and method for driver monitoring based on multi-sensor fusion and, more particularly, to a multi-sensor fusion-based driver monitoring apparatus and method that determine the driver's emotions and behaviors from a frontal image of the driver and biosignals.

A driver monitoring system (DMS), conventionally known as a driver attention monitor, plays a very important role in autonomous vehicles.

The driver monitoring system acquires images of the driver through hardware devices installed inside the vehicle, such as cameras and biometric devices, judges the driver's actions through tasks such as face recognition, fatigue detection, forward-gaze detection, and gesture detection, and monitors the driver's emotions based on the driver's image.

In autonomous vehicles, the driver's emotions and behaviors are very useful information for safe driving and for improving the efficiency of autonomous driving. To implement autonomous driving efficiently, the driver monitoring system therefore needs technology that can process the driver's emotions and behaviors in real time without delay.

However, existing technologies that determine the driver's emotions or behaviors without adequate information processing are not free from errors arising in the process of judging the driver's behavior or emotion, and such errors occur frequently in real-time autonomous driving systems.

Therefore, there is a need for research and development on a technology for determining a driver's emotion or behavior in real time free from classification errors and implementation errors.

[Prior art literature]

[Patent Literature]

(Patent Document 1) (Republic of Korea) Patent Registration No. 10-1241841

The present invention has been made to solve the above problems, and an object of the present invention is to provide a multi-sensor fusion-based driver monitoring apparatus and method that simultaneously predict the driver's emotions and behaviors while reducing classification errors.

To achieve the above object, a multi-sensor fusion-based driver monitoring device according to an embodiment of the present invention includes: an information collection unit that collects the driver's front image information and biometric information; a pre-processing unit that pre-processes the front image information and biometric information; a shape vector generating unit that extracts each feature vector based on the front image information and biometric information pre-processed by the pre-processing unit and generates a shape vector by combining the respective feature vectors; a determination unit that determines the driver's emotion and behavior based on the shape vector; and an output unit that outputs the driver's emotions and behaviors.

Here, the information collection unit may include an image collection unit that collects front image information of the driver, and a biometric collection unit that collects biometric information of the driver and includes a heart rate collection unit that collects the driver's heart rate information and a voice collection unit that collects the driver's voice information.

Accordingly, the pre-processing unit may set the driver's region of interest in the front image information, convert the front image information for which the region of interest is set into foreground extraction information, front feature point information, and front heat map information, perform heartbeat filtering to remove a noise signal present in the heartbeat information, and perform voice filtering to remove a noise signal present in the voice information.

Accordingly, the shape vector generation unit may use a feature vector extractor that performs vector combining to extract feature vectors from the foreground extraction information, front feature point information, and front heat map information converted from the front image information, and from the heartbeat information and voice information of the biometric information from which noise signals have been removed, and may generate a single shape vector by combining the feature vectors.

A multi-sensor fusion-based driver monitoring method according to another embodiment of the present invention is performed through a multi-sensor fusion-based driver monitoring device, and includes: an information collection step of collecting the driver's front image information and biometric information; a preprocessing step of preprocessing the front image information and biometric information; a shape vector generation step of extracting each feature vector based on the preprocessed front image information and biometric information, and generating a shape vector by combining the respective feature vectors; a determination step of determining the driver's emotion and behavior based on the shape vector; and an output step of outputting the driver's emotions and behaviors.

Here, the information collection step may include collecting front image information of the driver; and collecting the biometric information including the driver's heartbeat information and the driver's voice information.

Accordingly, the preprocessing step may include setting a region of interest of the driver in the front image information; converting front image information set as the region of interest into foreground extraction information, front feature point information, and front heat map information; and performing filtering to remove a noise signal present in the heartbeat information and the voice information included in the biometric information.

Accordingly, the shape vector generation step may include: extracting, by using a feature vector extractor that performs vector combining, each feature vector based on the foreground extraction information, the front feature point information, the front heat map information, and the heartbeat information and voice information included in the biometric information; and generating a single shape vector by combining the respective feature vectors.

According to one aspect of the present invention described above, providing a multi-sensor fusion-based driver monitoring device and method can solve both the combination errors that occur when combining driver information measured by different devices and the recognition errors that occur when recognizing the driver's emotions and behaviors.

In addition, providing a multi-sensor fusion-based driver monitoring device and method can increase the driver's convenience and the efficiency of autonomous driving.

FIG. 1 is a block diagram of a multi-sensor fusion-based driver monitoring device according to an embodiment of the present invention.

FIG. 2 is a block diagram of the information collection unit of FIG. 1.

FIG. 3 is a diagram of the multi-sensor fusion-based driver monitoring device of FIG. 1.

FIG. 4 is a diagram for explaining pre-processing that converts front image information into foreground extraction information in the pre-processing unit of FIG. 1.

FIG. 5 is a diagram for explaining pre-processing that converts front image information into front feature point information in the pre-processing unit of FIG. 1.

FIG. 6 is a flowchart of a multi-sensor fusion-based driver monitoring method according to another embodiment of the present invention.

FIG. 7 is a flowchart of the information collection step of the multi-sensor fusion-based driver monitoring method of FIG. 6.

FIG. 8 is a flowchart of the preprocessing step of the multi-sensor fusion-based driver monitoring method of FIG. 6.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The detailed description of the present invention that follows refers to the accompanying drawings, which show, by way of illustration, specific embodiments in which the present invention may be practiced. These embodiments are described in sufficient detail to enable one skilled in the art to practice the present invention. It should be understood that the various embodiments of the present invention, although different from each other, are not necessarily mutually exclusive. For example, specific shapes, structures, and characteristics described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the invention. Additionally, it should be understood that the location or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. Accordingly, the detailed description set forth below is not to be taken in a limiting sense, and the scope of the present invention, if properly described, is limited only by the appended claims, together with all equivalents to which such claims are entitled. Like reference numerals in the drawings indicate the same or similar functions throughout the various aspects.

Terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, singular forms also include plural forms unless the context specifically states otherwise. As used herein, "comprises" and "comprising" do not exclude the presence or addition of one or more elements other than the recited elements. Throughout the specification, like reference numerals refer to like elements, and "and/or" includes each of the recited elements and all combinations of one or more of them. Although the terms "first", "second", etc. are used to describe various elements, these elements are of course not limited by these terms. These terms are only used to distinguish one element from another. Therefore, a first element mentioned below may, of course, be a second element within the technical spirit of the present invention.

Unless otherwise defined, all terms used herein have the meanings commonly understood by those skilled in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or overly formal sense unless expressly so defined.

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the drawings.

First, a multi-sensor fusion-based driver monitoring device according to an embodiment of the present invention will be described with reference to FIGS. 1 to 5.

FIG. 1 is a block diagram of a multi-sensor fusion-based driver monitoring device 100 according to an embodiment of the present invention.

Referring to FIG. 1, a multi-sensor fusion-based driver monitoring device 100 according to an embodiment of the present invention includes an information collection unit 110, a pre-processing unit 130, a shape vector generator 150, a determination unit 170, and an output unit 190.

The multi-sensor fusion-based driver monitoring device 100 according to an embodiment of the present invention may be part of a vehicle control system, or may be a separate terminal implemented as an optional vehicle function. In addition, the information collection unit 110, the pre-processing unit 130, the shape vector generation unit 150, the determination unit 170, and the output unit 190 may be formed as an integrated module or composed of one or more modules. Conversely, each component may also be composed of a separate module.

In addition, the multi-sensor fusion-based driver monitoring device 100 may be mobile or fixed. For example, the multi-sensor fusion-based driver monitoring apparatus 100 may take the form of a server or an engine, and may be referred to by other terms such as a device, an apparatus, a terminal, a user equipment (UE), a mobile station (MS), a wireless device, or a handheld device.

In addition, the multi-sensor fusion-based driver monitoring device 100 may execute or create various software based on an operating system (OS). Here, the operating system is a system program that enables software to use the hardware of the device, and may include mobile operating systems such as Android OS, Windows Mobile OS, Bada OS, Symbian OS, and BlackBerry OS, as well as computer operating systems such as Windows-based, Linux-based, and Unix-based systems, Mac OS, AIX, and HP-UX.

FIG. 2 is a block diagram of the information collection unit 110 included in the multi-sensor fusion-based driver monitoring device 100 according to an embodiment of the present invention.

First, the information collection unit 110 collects the driver's front image information and biometric information.

At this time, the information collection unit 110 may collect the driver's face image captured by any one of a camera, a vision sensor, and a motion recognition sensor provided in the vehicle as the driver's front image information. To this end, the information collection unit 110 may include an image collection unit 111 that collects front image information of the driver.

In addition, the information collection unit 110 may further include a biometric collection unit 113 that collects the driver's biometric information. In this case, the biometric collection unit 113 may include a heartbeat collection unit 113a that collects the driver's heartbeat information and a voice collection unit 113b that collects the driver's voice information.

The heartbeat collection unit 113a included in the biometric collection unit 113 may collect, as heartbeat information, the driver's heartbeat measured by a smart watch worn by the driver or by a heart rate measuring device capable of measuring the driver's heart rate. The voice collection unit 113b may collect, as voice information, the driver's voice measured by any one voice recognition means among a microphone, a voice input/output device, and a voice recognition sensor provided in the vehicle. Here, the smart watch that measures the driver's heart rate is preferably a watch-type device capable of measuring the driver's heart rate, such as the Galaxy Watch series or Apple Watch series available on the smart watch market, but is not limited thereto.

FIG. 3 is a diagram of a multi-sensor fusion-based driver monitoring device according to an embodiment of the present invention.

Referring to FIG. 3, when the information collection unit 110 collects the driver's front image information and biometric information, the pre-processing unit 130 preprocesses the collected information.

First, the pre-processing unit 130 may set a region of interest (ROI) in the front image information collected by the information collection unit 110.

Accordingly, the pre-processing unit 130 may select a partial region included in the front image information according to the front image information collected by the information collection unit 110 and set the selected partial region as a region of interest.
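The specification leaves open how the partial region is chosen. As a minimal sketch, assuming the region of interest is an axis-aligned rectangle given as (x, y, width, height), for example from a face detector, selecting it amounts to a crop clipped to the image bounds. The name `crop_roi` and the rectangle convention are illustrative assumptions, not part of the disclosure.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # single-channel image stored as rows of pixel values


def crop_roi(image: Sequence[Sequence[int]], roi: Tuple[int, int, int, int]) -> Image:
    """Return the sub-image inside roi = (x, y, width, height), clipped to the image."""
    x, y, w, h = roi
    rows = len(image)
    cols = len(image[0]) if rows else 0
    # Clip the rectangle so a partially out-of-frame ROI still yields a valid crop.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(cols, x + w), min(rows, y + h)
    return [list(row[x0:x1]) for row in image[y0:y1]]
```

For a 3x4 image whose pixel at row r, column c is `10 * r + c`, `crop_roi(image, (1, 1, 2, 2))` keeps the 2x2 block starting at row 1, column 1.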

The pre-processing unit 130 may be provided as a single module that pre-processes the driver's front image information and biometric information at the same time, or as a plurality of pre-processing modules that each pre-process the front image information and the biometric information in parallel and convert them into different information. For ease of description, the pre-processing unit 130 included in the multi-sensor fusion-based driver monitoring device 100 according to an embodiment of the present invention is described in this specification as being provided with a plurality of pre-processing modules, but it is not limited thereto.

Accordingly, although not shown in FIG. 3 , the pre-processing unit 130 may include a first pre-processing module for pre-processing the front image information and a second pre-processing module for pre-processing the driver's biometric information.

First, a process in which the first pre-processing module pre-processes the front image information of the driver, which is collected by the information collection unit 110 and has a region of interest set, will be described with reference to FIGS. 4 and 5 .

FIG. 4 is a diagram for explaining pre-processing that converts front image information into foreground extraction information in the pre-processing unit 130 of FIG. 1.

Referring to FIG. 4, the first pre-processing module included in the pre-processing unit 130 may pre-process the driver's front image information, which is collected by the information collection unit 110 and for which the region of interest has been set, and convert it into foreground extraction information.

In this regard, the first preprocessing module may generate a data set based on front image information of the driver for which the region of interest is set.

In this case, the first pre-processing module may set the region set as the region of interest in the generated data set as the driver's face region.

Also, the first pre-processing module may set an area not set as an ROI in the generated data set as a foreground area of the front image information.

Accordingly, the first pre-processing module may use a foreground extraction algorithm to extract the foreground region that is included in the data set but not set as the region of interest. Here, the first pre-processing module may use the GrabCut algorithm as the foreground extraction algorithm to extract the foreground from the driver's front image information, and may be provided with the GrabCut algorithm for this purpose. The GrabCut algorithm is used here to describe the present invention simply; any known algorithm or foreground extraction method capable of extracting or deleting a foreground from an image may be used, so the invention is not limited thereto.

Through this, the first pre-processing module may convert the front image information into foreground extraction information containing only the driver's face, based on the data set from which the foreground region has been extracted. In addition, the first pre-processing module may further convert this foreground extraction information into grayscale foreground extraction information in order to reduce the computational complexity of the deep learning model described later.
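The masking and grayscale steps above can be sketched as follows, assuming the foreground extraction has already produced a binary mask (1 for the driver's face, 0 elsewhere), as could be derived from the label mask that OpenCV's `cv2.grabCut` returns. The ITU-R BT.601 luma weights are an assumption; the text only says the result is converted to gray. `to_gray_foreground` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def to_gray_foreground(image: Sequence[Sequence[RGB]],
                       mask: Sequence[Sequence[int]]) -> List[List[int]]:
    """Keep pixels where mask is nonzero, convert them to gray, and set the rest to 0."""
    out = []
    for row, mrow in zip(image, mask):
        out_row = []
        for (r, g, b), m in zip(row, mrow):
            if m:
                # ITU-R BT.601 luma weights (assumed; the specification gives no formula).
                out_row.append(round(0.299 * r + 0.587 * g + 0.114 * b))
            else:
                out_row.append(0)  # background pixel removed by foreground extraction
        out.append(out_row)
    return out
```

Storing one gray value per pixel instead of three color channels is what reduces the input size for the later model.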

Meanwhile, FIG. 5 is a diagram for explaining pre-processing of converting front image information into front feature point information in the pre-processing unit of FIG. 1 .

Referring to FIG. 5, the first pre-processing module may extract the driver's feature points from the foreground extraction information obtained by converting the driver's front image information (collected by the information collection unit 110 and with the region of interest set), and may further convert these feature points into front feature point information.

Accordingly, the first pre-processing module extracts the foreground from the front image information in which the region of interest is set using the GrabCut algorithm, and then applies the Facial Image Threshing (FIT) method to the grayscale foreground extraction information, deleting the excluded area to adjust the size of the foreground extraction information. The FIT method corrects missing emotion or behavior data, removes inappropriate data, merges large-scale data, and resizes and crops images, so that input emotion and behavior video sequences are converted into output images to which image editing and separation have been applied. The FIT method is performed by a facial image threshing machine, which may include a data receiver, a multi-task cascaded convolutional network (MTCNN), an image resizer, and a data separator that is a pre-trained Xception model. The facial image threshing machine converts the emotion and behavior video sequences into images with the data receiver, identifies a person's face in each converted image using the MTCNN, and reduces the image size with the image resizer. Finally, the data separator sorts the emotion and behavior images into appropriately labeled directories.
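The MTCNN and Xception stages of the FIT method are beyond a short example. The sketch below covers only the size adjustment, under the assumption that pixels excluded by foreground extraction are stored as 0: it trims the all-zero border so that only the face region remains. `crop_to_content` is a hypothetical helper name.

```python
from typing import List, Sequence


def crop_to_content(gray: Sequence[Sequence[int]]) -> List[List[int]]:
    """Trim all-zero border rows and columns, i.e. delete the excluded area."""
    rows = [i for i, row in enumerate(gray) if any(row)]
    if not rows:
        return []  # nothing survived foreground extraction
    cols = [j for j in range(len(gray[0])) if any(row[j] for row in gray)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    # Keep the tight bounding box of all nonzero pixels.
    return [list(row[left:right + 1]) for row in gray[top:bottom + 1]]
```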

Subsequently, the first pre-processing module may identify feature points on the driver's face by applying a feature point tracking algorithm to the foreground extraction information in which only the face region exists. The SURF (Speeded-Up Robust Features) algorithm may be used as such a feature point tracking algorithm. In this specification, the known SURF algorithm is used to describe the present invention simply, but any known algorithm or facial feature point method for detecting or extracting facial feature points in an image may be used, and the invention is not limited thereto. In addition, the first pre-processing module may further include a device or module capable of performing the facial image threshing technique and the SURF algorithm, in order to extract the driver's facial feature points from the foreground extraction information and convert them into front feature point information.

Accordingly, the first pre-processing module may extract the pixel values of the region of interest from the resized foreground extraction information, combine them with the identified feature points, and convert the result into front feature point information. Such front feature point information may be used to train, or to perform classification with, a deep learning model that extracts the driver's emotions and behaviors.

Meanwhile, the first pre-processing module may further convert the driver's front image information in which the region of interest is set into front heat map information. To this end, the first pre-processing module may identify an object in the region set as the region of interest and track the location of the object. Here, the object may be a facial feature involved in changes of the driver's facial expression, such as the eyes, eyebrows, or mouth.

The first pre-processing module may determine the movement of the object from changes in pixel values in the image, and may determine whether the object is in contact with the region of interest according to whether the pixels whose values change are in contact with pixels inside the region of interest.
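As an illustration of this movement and contact test, assuming single-channel frames, a fixed change threshold, and a rectangular region of interest (all hypothetical choices), the changed pixels can be found by frame differencing and then checked for 8-neighbour contact with the region.

```python
from typing import Sequence, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def changed_pixels(prev: Sequence[Sequence[int]], curr: Sequence[Sequence[int]],
                   threshold: int = 10) -> Set[Pixel]:
    """Pixels whose value changed by more than threshold between two frames."""
    return {
        (r, c)
        for r, (prow, crow) in enumerate(zip(prev, curr))
        for c, (p, q) in enumerate(zip(prow, crow))
        if abs(q - p) > threshold
    }


def touches_roi(pixels: Set[Pixel], roi: Tuple[int, int, int, int]) -> bool:
    """True if any changed pixel lies inside or 8-adjacent to roi = (x, y, width, height)."""
    x, y, w, h = roi
    for r, c in pixels:
        # Expand the ROI by one pixel on every side to include its neighbours.
        if x - 1 <= c <= x + w and y - 1 <= r <= y + h:
            return True
    return False
```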

In addition, the first pre-processing module may check a pixel value of an object that changes according to the movement of the object and track the location of the object according to the location where the pixel value changes.

In this case, the first pre-processing module may selectively track only objects of at least a predetermined size among the objects in contact with the region of interest. Through this, the first pre-processing module may also track body parts other than those on the driver's face, such as fingers, hands, and arms.

Accordingly, the first pre-processing module tracks the objects in contact with the ROI, that is, the eyes, eyebrows, mouth, etc. of the driver's face and the driver's fingers, hands, etc., checks the coordinate values of the pixels at the locations of the tracked objects, and may generate a heat map of the driver's front image information by accumulating these pixel coordinate values. Through this, the first pre-processing module may generate a heat map of the driver's face based on the front image information in which the region of interest is set, and further convert the heat map into front heat map information.
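Accumulating the tracked coordinates into a heat map can be sketched as a per-pixel visit counter. The grid size, the dropping of out-of-frame coordinates, and the final normalization to [0, 1] are assumptions made for this illustration.

```python
from typing import Iterable, List, Tuple


def accumulate_heatmap(tracked: Iterable[Tuple[int, int]],
                       height: int, width: int) -> List[List[int]]:
    """Count how often each (row, col) coordinate of a tracked object was visited."""
    heat = [[0] * width for _ in range(height)]
    for r, c in tracked:
        if 0 <= r < height and 0 <= c < width:  # ignore coordinates outside the frame
            heat[r][c] += 1
    return heat


def normalize(heat: List[List[int]]) -> List[List[float]]:
    """Scale counts to [0, 1] so heat maps from clips of different lengths are comparable."""
    peak = max((v for row in heat for v in row), default=0)
    return [[v / peak if peak else 0.0 for v in row] for row in heat]
```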

Accordingly, the first pre-processing module may convert front image information of the driver for which the region of interest is set into foreground extraction information, front feature point information, and front heat map information.

Meanwhile, the second pre-processing module included in the pre-processing unit 130 may perform filtering to remove noise signals from the biometric information collected by the information collection unit 110. Accordingly, the second pre-processing module may perform filtering to remove the noise signal from the driver's heartbeat information collected by the heartbeat collection unit 113a included in the information collection unit 110. Here, the heartbeat information is a one-dimensional analog signal obtained by measuring, with the standard 12-lead method, the action current generated in the myocardium with each heartbeat, and the measured signal may include changes in the amplitude of the heartbeat measured in real time.

Accordingly, the second pre-processing module may remove baseline wander noise from the heartbeat signal included in the heartbeat information. Baseline wander is a low-frequency component below 1 Hz caused by the breathing of the person being measured, and it can be removed using a band-pass filter. In this way, the second pre-processing module may remove the noise signal included in the heartbeat information.
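The text specifies a band-pass filter but not its order or cut-off frequencies. The sketch below deliberately substitutes a simpler technique: subtracting a centered moving average of about one second, which acts as a crude high-pass filter that attenuates slow baseline drift. A production design would more likely use a proper band-pass filter, for example one designed with `scipy.signal.butter`. The window length is an assumption.

```python
from typing import List, Sequence


def remove_baseline(ecg: Sequence[float], fs: float, window_s: float = 1.0) -> List[float]:
    """Subtract a centered moving average (about window_s seconds long) from the signal.

    This is a simple high-pass stand-in for the band-pass filter named in the text.
    """
    half = max(1, int(window_s * fs) // 2)
    n = len(ecg)
    # Prefix sums make each window mean O(1).
    prefix = [0.0]
    for v in ecg:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)  # window shrinks at the edges
        baseline = (prefix[hi] - prefix[lo]) / (hi - lo)
        out.append(ecg[i] - baseline)
    return out
```

A constant offset is removed entirely, and a linear drift is removed wherever the window is symmetric, which is why this works as a rough baseline corrector.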

Meanwhile, the second pre-processing module may perform filtering to remove a noise signal from the driver's voice information collected by the voice collection unit 113b. Here, the voice signal included in the voice information is generated by the vibration of a person's vocal cords, and the frequency correlation between adjacent signals is very high, so it can be distinguished from ambient background noise.

Accordingly, the second pre-processing module may identify, as background noise, any signal in the driver's voice information whose frequency correlation is less than a threshold value. Here, the background noise may be any sound other than the driver's voice, such as a vehicle's exhaust sound or the sound of a vehicle parking nearby.
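One way to read "frequency correlation between adjacent signals" is as the correlation between the magnitude spectra of consecutive short frames. The sketch below takes that reading (an assumption, as are the frame length and the 0.5 threshold) and flags frames whose spectrum correlates poorly with the previous frame; flagged frames are the candidates for deletion. A naive DFT keeps it dependency-free.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """|DFT| for bins 0..N/2 (naive O(N^2) DFT, adequate for a short illustration)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def flag_background_noise(signal: Sequence[float], frame_len: int,
                          threshold: float = 0.5) -> List[bool]:
    """Flag frame i as background noise when its spectrum correlates with frame i-1
    below threshold; the first frame has no predecessor and is kept."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    spectra = [magnitude_spectrum(f) for f in frames]
    flags = [False]
    for prev, curr in zip(spectra, spectra[1:]):
        flags.append(pearson(prev, curr) < threshold)
    return flags[:len(frames)]
```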

Accordingly, the second pre-processing module may remove the noise signal included in the voice information by separating and deleting the voice signal determined as background noise from the voice information.

As such, the pre-processing unit 130 may filter the heartbeat information and voice information collected by the information collecting unit 110 through the second pre-processing module to remove noise signals.

Meanwhile, the shape vector generator 150 extracts each feature vector based on the front image information and biometric information preprocessed by the preprocessor 130 and generates a shape vector by combining each feature vector.

Here, the shape vector generator 150 may use a feature vector extractor that performs vector combining to extract each feature vector based on the foreground extraction information, front feature point information, and front heat map information converted from the driver's front image information, and on the heartbeat information and voice information of the biometric information from which noise signals have been removed.

In addition, the shape vector generator 150 may extract each feature vector using a feature vector extractor, and combine the extracted feature vectors to generate a single shape vector.
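Combining the feature vectors can be as simple as concatenation in a fixed modality order, so that each position of the shape vector always has the same meaning. This is a sketch under that assumption; the specification does not say whether the vectors are normalized or weighted before combining, and `build_shape_vector` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def build_shape_vector(features: Dict[str, Sequence[float]],
                       order: Sequence[str]) -> List[float]:
    """Concatenate per-modality feature vectors in a fixed order into one shape vector."""
    missing = [name for name in order if name not in features]
    if missing:
        # A missing modality would silently shift every later position, so fail loudly.
        raise KeyError(f"missing feature vectors: {missing}")
    shape: List[float] = []
    for name in order:
        shape.extend(float(v) for v in features[name])
    return shape
```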

The shape vector generator 150 may extract pixel values, front feature points, and a front heat map of the face image included in the foreground extraction information, front feature point information, and front heat map information converted in the first preprocessing module.

In this case, the shape vector generator 150 may extract a face feature vector based on the pixel values of the face image, the front feature points, and the front heat map using a feature vector extractor. Here, the feature vector extractor may extract the face feature vector using a known wavelet transform technique.
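The specification only says "a known wavelet transform technique". As an assumed example, the sketch below applies a one-dimensional orthonormal Haar transform to a flattened input and uses the energy of each sub-band as a compact feature; a two-dimensional transform applied to the image itself would be a natural alternative.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT: (approximation, detail) coefficients."""
    if len(x) % 2:
        x = list(x) + [x[-1]]  # pad odd-length input by repeating the last sample
    s = 1 / math.sqrt(2)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail


def wavelet_features(x: Sequence[float], levels: int = 2) -> List[float]:
    """Energy of the detail band at each level, followed by the final approximation energy."""
    feats = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        feats.append(sum(d * d for d in detail))
    feats.append(sum(a * a for a in approx))
    return feats
```

Because the transform is orthonormal, the band energies sum to the energy of the input, so the features partition the signal energy by scale.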

Meanwhile, the shape vector generator 150 may detect a waveform from a heartbeat signal included in the heartbeat information in order to extract a feature vector from the heartbeat information on which filtering has been performed. Here, the heartbeat signal is a signal composed of a waveform including a P wave, Q wave, R wave, S wave, and T wave in one cycle, and can be detected by various known techniques. Also, the heartbeat signal may be a heartbeat signal having one cycle or a plurality of cycles.

Accordingly, five waveforms (the P, Q, R, S, and T waves) may be detected in a heartbeat signal of one cycle, and in a heartbeat signal having a plurality of cycles, a corresponding multiple of the waveforms included in one cycle of the ECG signal may be detected.

Accordingly, the shape vector generator 150 may extract specific coordinates from each of the detected waveforms. Here, the specific coordinates are coordinates specified through the five waveforms and include PP, the peak point of the P wave; QP, the peak point of the Q wave; RP, the peak point of the R wave; SP, the peak point of the S wave; and TP, the peak point of the T wave.

In addition, the shape vector generation unit 150 may set the point with the highest amplitude value as the peak point of the P, R, and T waves, and the point with the lowest amplitude value as the peak point of the Q and S waves.
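Once the five waves have been located, the peak rule above is an argmax or argmin inside each wave's window. The sketch assumes the windows (start and end sample indices per wave) are already known from a delineation step, which the specification leaves to "various known techniques"; times are expressed in seconds so that they can serve as the x-coordinate. `pick_peaks` is a hypothetical name.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # (time in seconds, amplitude)


def pick_peaks(cycle: Sequence[float], fs: float,
               windows: Dict[str, Tuple[int, int]]) -> Dict[str, Point]:
    """Return PP, QP, RP, SP, TP for one cycle.

    P, R and T take the maximum amplitude in their window; Q and S take the minimum.
    windows maps a wave name ("P", "Q", ...) to a [start, end) range of sample indices.
    """
    peaks = {}
    for wave, (start, end) in windows.items():
        pick = min if wave in ("Q", "S") else max
        idx = pick(range(start, end), key=lambda i: cycle[i])
        peaks[wave + "P"] = (idx / fs, cycle[idx])  # sample index -> seconds
    return peaks
```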

Through this, the shape vector generator 150 may extract a feature vector based on two or more specific coordinates included in a heartbeat signal having one cycle or a plurality of cycles. Here, the feature vector may be calculated using time corresponding to the x-axis and amplitude corresponding to the y-axis of specific coordinates.

In this regard, when the shape vector generator 150 extracts a feature vector using two or more specific coordinates from a heartbeat signal consisting of one cycle, it can calculate the distance or slope between a total of 10 pairs of peak points: PP and RP, RP and TP, SP and TP, PP and SP, PP and TP, PP and QP, QP and RP, QP and SP, QP and TP, and RP and SP.
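The ten pairs listed above are exactly the unordered pairs of the five peak points, so they can be generated with `itertools.combinations`. The sketch computes both the Euclidean distance and the slope for each pair (the text says "distance or slope"; returning both is an assumption), giving 20 values for five peaks.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (time, amplitude)


def pairwise_features(peaks: Dict[str, Point],
                      names: Sequence[str] = ("PP", "QP", "RP", "SP", "TP")) -> List[float]:
    """For every unordered pair of peak points, append the Euclidean distance and the slope."""
    feats = []
    for a, b in combinations(names, 2):
        (t1, y1), (t2, y2) = peaks[a], peaks[b]
        feats.append(math.dist((t1, y1), (t2, y2)))
        # Slope = rise over run; 0.0 for two peaks at the same time stamp (hypothetical convention).
        feats.append((y2 - y1) / (t2 - t1) if t2 != t1 else 0.0)
    return feats
```

Passing `names=("RP", "SP", "TP")` restricts the calculation to the three peaks described in the next paragraph, yielding three pairs (six values).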

In addition, when the distances and slopes are not calculated using the peak points of all waveforms, the shape vector generation unit 150 can calculate them using only RP, the peak point of the R wave, SP, the peak point of the S wave, and TP, the peak point of the T wave, which clearly represent the external characteristics of the heartbeat signal.

Here, the shape vector generator 150 may calculate the distance between peak points using a known metric such as the Manhattan distance, the Euclidean distance, or the Minkowski distance.

Also, the shape vector generator 150 may calculate the slope between peak points using a known slope formula between two coordinate points.
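The single-cycle distance and slope features can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes each peak is a (time in seconds, amplitude) tuple keyed by the labels PP, QP, RP, SP, TP, and the helper names `minkowski`, `slope`, and `single_cycle_features` are hypothetical. Setting `p=1` gives the Manhattan distance and `p=2` the Euclidean distance; passing only RP, SP, and TP reproduces the reduced feature set.

```python
from itertools import combinations

# Peak labels in ECG order; each peak is a (time_s, amplitude) coordinate.
PEAKS = ("PP", "QP", "RP", "SP", "TP")


def minkowski(a, b, p=2.0):
    """Minkowski distance between two points; p=1 is Manhattan, p=2 is Euclidean."""
    return (abs(a[0] - b[0]) ** p + abs(a[1] - b[1]) ** p) ** (1.0 / p)


def slope(a, b):
    """Slope between two (time, amplitude) points: delta amplitude / delta time."""
    dx = b[0] - a[0]
    if dx == 0:
        raise ValueError("peaks share the same time coordinate")
    return (b[1] - a[1]) / dx


def single_cycle_features(peaks, labels=PEAKS, p=2.0):
    """Distance and slope for every pair of the selected peaks of one cycle.

    With all five peaks this yields C(5, 2) = 10 pairs; passing
    labels=("RP", "SP", "TP") restricts the features to those three peaks.
    """
    features = {}
    for a, b in combinations(labels, 2):
        features[f"{b}-{a}"] = (minkowski(peaks[a], peaks[b], p), slope(peaks[a], peaks[b]))
    return features
```

Note that the Euclidean and Minkowski distances mix a time axis and an amplitude axis, so in practice the two axes would likely need consistent scaling before the distances are compared.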

Through this, the shape vector generator 150 may use the specific coordinates of two or more of the five waveforms included in the heartbeat signal to calculate values and extract them as feature vectors.

In addition, when the heartbeat signal has a plurality of cycles, the shape vector generator 150 may extract a feature vector by selecting one or more coordinates from the waveforms of each cycle, that is, from the heartbeat signal corresponding to the current cycle (n) and the heartbeat signal corresponding to the previous cycle (n-1).

In this regard, when the heartbeat signal includes a plurality of cycles and specific coordinates of two or more waveforms are used, the shape vector generator 150 may calculate the distance or slope between PP(n-1), the peak point of the P wave P(n-1) of the previous cycle (n-1), and RP(n), the peak point of the R wave R(n) of the current cycle (n), denoted R(n)-P(n-1), as well as the distance or slope between PP(n-1) and TP(n), the peak point of the T wave T(n) of the current cycle (n), denoted T(n)-P(n-1), and extract them as a feature vector. In this way, the shape vector generator 150 may extract a heartbeat feature vector from the filtered heartbeat information.
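The two inter-cycle features R(n)-P(n-1) and T(n)-P(n-1) can be sketched as below. As an assumption not stated in the text, peak times are taken as absolute times from the start of the recording, so that a peak of cycle n lies later than the P-wave peak of cycle n-1; the Euclidean distance is used here, and the function names are hypothetical.

```python
import math


def _distance_and_slope(a, b):
    """Euclidean distance and slope between two (time, amplitude) points."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    return math.hypot(dx, dy), dy / dx


def inter_cycle_features(prev_cycle, curr_cycle):
    """Link the P-wave peak of cycle n-1 to the R- and T-wave peaks of cycle n."""
    pp_prev = prev_cycle["PP"]
    return {
        "R(n)-P(n-1)": _distance_and_slope(pp_prev, curr_cycle["RP"]),
        "T(n)-P(n-1)": _distance_and_slope(pp_prev, curr_cycle["TP"]),
    }
```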

Meanwhile, to extract a feature vector from the filtered voice information, the shape vector generator 150 may calculate feature values of the voice signal included in the voice information using the Mel Frequency Cepstral Coefficient (MFCC) method. The MFCC method detects effective spectrum-based feature values by exploiting the nonlinear frequency characteristics of the human ear. However, the shape vector generation unit 150 is not limited to the MFCC method and may calculate the feature values of the voice signal by various other methods known in the art.

Subsequently, the shape vector generator 150 may extract a feature vector of the voice signal based on the feature values calculated using the MFCC method. In this regard, the shape vector generator 150 may use the MFCC method to detect the feature values of the voice signal as a feature sequence.
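The MFCC feature sequence can be illustrated with the following minimal pure-Python sketch of the standard pipeline: pre-emphasis, framing, Hamming window, power spectrum, triangular mel filterbank, logarithm, and DCT-II. The parameter values (400-sample frames and 160-sample hop, i.e. 25 ms and 10 ms at 16 kHz, 26 filters, 13 coefficients) are common defaults assumed for illustration, not values from the source. A direct DFT is used only to keep the block self-contained; a practical system would use an FFT.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale over n_fft // 2 + 1 bins."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, center):   # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge (peak of 1.0 at center)
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    """Return one list of n_ceps cepstral coefficients per frame (the feature sequence)."""
    # Pre-emphasis boosts high frequencies.
    emph = [signal[0]] + [signal[i] - 0.97 * signal[i - 1] for i in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    n_bins = n_fft // 2 + 1
    features = []
    for start in range(0, len(emph) - frame_len + 1, hop):
        frame = [emph[start + n] * window[n] for n in range(frame_len)]
        # Power spectrum via a direct DFT of the frame zero-padded to n_fft.
        power = []
        for k in range(n_bins):
            re = sum(frame[n] * math.cos(2 * math.pi * k * n / n_fft) for n in range(frame_len))
            im = -sum(frame[n] * math.sin(2 * math.pi * k * n / n_fft) for n in range(frame_len))
            power.append((re * re + im * im) / n_fft)
        # Log mel-filterbank energies; the floor avoids log(0).
        log_e = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10)) for row in bank]
        # DCT-II decorrelates the log energies; keep the first n_ceps coefficients.
        ceps = [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters) for m, e in enumerate(log_e))
                for c in range(n_ceps)]
        features.append(ceps)
    return features
```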

In addition, the shape vector generator 150 may calculate an i-vector from the feature values of the voice signal by a joint factor analysis method using Baum-Welch statistics. Baum-Welch statistics are a well-known technique, and a description thereof is omitted in this specification.
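For readers unfamiliar with the statistics mentioned above, the following sketch computes the zeroth-order (N) and first-order (F) Baum-Welch statistics of a sequence of feature frames with respect to a Gaussian mixture model with diagonal covariances, which is the usual input to i-vector extraction. The GMM parameters are assumed to come from a separately trained background model that the text does not describe. The subsequent step, in which the i-vector is estimated from these statistics using a total-variability matrix, is not shown because the source gives no details of it.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def baum_welch_stats(frames, weights, means, variances):
    """Zeroth- and first-order Baum-Welch statistics of frames w.r.t. a GMM.

    Returns (N, F) with N[c] = sum_t gamma_t(c) and F[c] = sum_t gamma_t(c) * x_t,
    where gamma_t(c) is the posterior of component c for frame t.
    """
    n_comp, dim = len(weights), len(means[0])
    N = [0.0] * n_comp
    F = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        # Component posteriors computed in the log domain for numerical stability.
        logs = [math.log(w) + log_gaussian_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        exps = [math.exp(l - top) for l in logs]
        total = sum(exps)
        for c in range(n_comp):
            gamma = exps[c] / total
            N[c] += gamma
            for d in range(dim):
                F[c][d] += gamma * x[d]
    return N, F
```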

Accordingly, the shape vector generator 150 may extract the i-vector calculated through the feature values of the voice signal as the voice feature vector for the voice information.

On the other hand, the shape vector generator 150 may further use a feature vector extractor that performs vector combination to combine the face feature vector extracted from the foreground extraction information, the front feature point information, and the front heat map information with the heartbeat feature vector extracted from the heartbeat signal and the voice feature vector extracted from the voice signal.

At this time, the shape vector generator 150 may take each of the extracted feature vectors as input and output them in combined form, thereby generating a single shape vector.
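The text does not specify how the feature vectors are combined. A minimal sketch, assuming simple concatenation, is shown below. The optional per-modality L2 normalization is an added assumption meant to keep one modality from dominating only because of its numeric range. The function name is hypothetical.

```python
import math


def combine_feature_vectors(vectors, normalize=True):
    """Concatenate per-modality feature vectors (face, heartbeat, voice) into one shape vector.

    If normalize is True, each vector is first scaled to unit L2 norm;
    all-zero vectors are passed through unchanged.
    """
    shape_vector = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v)) if normalize else 1.0
        if norm == 0:
            norm = 1.0
        shape_vector.extend(x / norm for x in v)
    return shape_vector
```

For example, combining a face vector `[3.0, 4.0]`, a heartbeat vector `[1.0, 0.0, 0.0]`, and a voice vector `[0.0, 2.0]` yields a 7-element shape vector.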

Meanwhile, the determination unit 170 determines the driver's emotion and behavior based on the shape vector generated by the shape vector generator 150. In this case, the determination unit 170 may determine the driver's emotion and behavior by inputting the shape vector into a pre-trained deep learning model. The driver's emotion is classified as one of normal, surprise, sadness, happiness, fear, disgust, anger, and boredom, and the driver's behavior is classified as one of normal driving, yawning, blinking, checking the mirror, smoking, using a mobile phone, and checking the driver's vision.
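One way to turn a classifier's raw outputs into the emotion and behavior labels listed above is a softmax followed by an argmax. This sketch assumes, hypothetically, that the model has two output heads, one with 8 emotion scores and one with 7 behavior scores, in the order given in the text; the source does not describe the output layer.

```python
import math

EMOTIONS = ("normal", "surprise", "sadness", "happiness", "fear", "disgust", "anger", "boredom")
BEHAVIORS = ("normal driving", "yawning", "blinking", "checking the mirror",
             "smoking", "using a mobile phone", "checking vision")


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(emotion_logits, behavior_logits):
    """Map two hypothetical output heads to (emotion, behavior, p_emotion, p_behavior)."""
    if len(emotion_logits) != len(EMOTIONS) or len(behavior_logits) != len(BEHAVIORS):
        raise ValueError("unexpected number of scores")
    pe, pb = softmax(emotion_logits), softmax(behavior_logits)
    ie = max(range(len(pe)), key=pe.__getitem__)
    ib = max(range(len(pb)), key=pb.__getitem__)
    return EMOTIONS[ie], BEHAVIORS[ib], pe[ie], pb[ib]
```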

Here, deep learning is defined as a set of machine learning algorithms that attempt a high level of abstraction through a combination of several nonlinear transformations; broadly, it is a field of machine learning that teaches computers human ways of thinking.

Accordingly, the deep learning model used in the present invention to determine the driver's emotion and behavior may be one of a deep neural network (DNN) model, a convolutional neural network (CNN) model, a deep belief network (DBN) model, and a deep residual network (ResNet) model of the kind introduced in "Deep Residual Learning for Image Recognition". In the present invention, a 34-layer residual neural network model obtained by modifying the structure of a known residual neural network model is used, but the invention is not limited thereto.

The 34-layer residual neural network model is an image classification model in which shortcut connections are added so that accuracy does not deteriorate as the model becomes deeper. Because a 34-layer structure is most preferable, the model is referred to as the 34-layer residual neural network model; however, structures of 18, 50, 101, or 152 layers, or structures with even more layers, may also be used.
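The shortcut idea can be seen in a deliberately simplified residual block: the block outputs ReLU(F(x) + x), so if the learned transform F drives its weights toward zero, the block passes its input through unchanged, which is why stacking more blocks need not hurt accuracy. This sketch replaces the convolution and batch normalization layers of a real ResNet with plain fully connected layers; the function names are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Return ReLU(F(x) + x) with F(x) = W2 * ReLU(W1 * x).

    The identity shortcut requires F(x) to have the same length as x.
    """
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])
```

With all-zero weights, `residual_block([1.0, 2.0], zeros, zeros)` returns `[1.0, 2.0]`, i.e. the block reduces to the identity for non-negative inputs.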

Specifically, the 34-layer residual neural network model may be composed of a 7x7 convolution layer with 64 output channels followed by a 3x3 max pooling layer, a batch normalization layer after each convolution layer, and four modules, each consisting of a plurality of residual blocks with the same number of output channels. The first module may use the same number of channels as its number of input channels.

Through this, unlike the Inception architecture, in the 34-layer residual neural network model the first residual block of each subsequent module doubles the number of channels relative to the previous module (here, the first module) and halves the height and width. The model is therefore simpler to modify and to optimize than the Inception architecture, and even when the network depth is increased by adopting a structure with more layers, optimization remains simple and higher accuracy can be achieved.
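The module layout described above can be checked by tracing tensor sizes. This sketch uses the standard ResNet-34 configuration of 3, 4, 6, and 3 basic blocks with 64, 128, 256, and 512 channels and assumes a 224x224 input. Both the block counts and the input size are assumptions taken from the published ResNet-34, not from the source, which describes a modified structure without giving these numbers. Counting one stem convolution, two convolutions per basic block, and one final fully connected layer gives the 34 weighted layers.

```python
# Standard ResNet-34 layout (assumed): (number of basic blocks, output channels) per module.
RESNET34_MODULES = [(3, 64), (4, 128), (6, 256), (3, 512)]


def conv_out(size, kernel, stride, padding):
    """Output height/width of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(height=224, width=224, modules=RESNET34_MODULES):
    """Return (stage, channels, height, width) after the stem and each module."""
    shapes = []
    # Stem: 7x7 conv with 64 channels, stride 2, padding 3.
    h, w = conv_out(height, 7, 2, 3), conv_out(width, 7, 2, 3)
    shapes.append(("conv1", 64, h, w))
    # 3x3 max pooling, stride 2, padding 1.
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    shapes.append(("maxpool", 64, h, w))
    for i, (_, channels) in enumerate(modules, start=1):
        if i > 1:  # first block of modules 2-4 halves height/width (and doubles channels)
            h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
        shapes.append((f"module{i}", channels, h, w))
    return shapes


def count_weighted_layers(modules=RESNET34_MODULES):
    """1 stem conv + 2 convs per basic block + 1 fully connected layer."""
    return 1 + 2 * sum(blocks for blocks, _ in modules) + 1
```

Under these assumptions the feature map goes from 56x56x64 after the first module to 7x7x512 after the fourth.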

Therefore, by using a residual neural network model whose accuracy improves as the network becomes deeper, the determination unit 170 can reduce recognition errors in recognizing the driver's emotion and behavior, and can determine the driver's emotion and behavior with higher accuracy based on the feature vectors of the collected images and signals.

Finally, the output unit 190 outputs the driver's emotion and behavior determined by the determination unit 170. In this regard, the self-driving vehicle may control vehicle functions based on the driver's emotion and behavior output from the output unit 190. For example, when the determination unit 170 uses the deep learning model to determine the driver's emotion as boredom and the driver's behavior as yawning, the output unit 190 outputs boredom and yawning as the driver's emotion and behavior. Based on this output, the self-driving vehicle determines that the driver is tired and may control vehicle functions, such as opening a window or playing an upbeat song, to change the driver's condition. As another example, when the determination unit 170 uses the deep learning model to determine the driver's emotion as anger and the driver's behavior as mobile phone use, the output unit 190 outputs anger and mobile phone use. Based on this output, the self-driving vehicle determines that a situation has arisen in which a traffic accident may occur while the driver is using the mobile phone, and may control vehicle functions, such as adjusting the vehicle speed or sounding a warning, so that the driver can concentrate on driving.
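The two examples above amount to a rule table from the monitored driver state to vehicle actions. The following is a hypothetical sketch of such a table; the action names are illustrative placeholders, not real vehicle APIs, and a production system would define its own policy.

```python
# Hypothetical rules: (predicate on (emotion, behavior), actions to request).
RULES = [
    (lambda e, b: e == "boredom" and b == "yawning",
     ["open_window", "play_upbeat_music"]),          # driver appears tired
    (lambda e, b: e == "anger" and b == "using a mobile phone",
     ["adjust_speed", "sound_warning"]),             # distraction raises accident risk
]


def vehicle_actions(emotion, behavior):
    """Return the actions of the first matching rule, or an empty list."""
    for matches, actions in RULES:
        if matches(emotion, behavior):
            return list(actions)
    return []
```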

As such, the multi-sensor fusion-based driver monitoring device 100 according to an embodiment of the present invention can monitor and output the driver's emotions and behaviors so that the efficiency of the autonomous vehicle can be increased through the above-described configuration.

A multi-sensor fusion-based driver monitoring method according to another embodiment of the present invention is performed through the multi-sensor fusion-based driver monitoring device 100 of FIG. 1 and may be performed with substantially the same configuration as that device. Accordingly, the same reference numerals are given to the same components as those of the multi-sensor fusion-based driver monitoring device 100 of FIG. 1, and repeated descriptions are omitted.

Referring to FIG. 6, the multi-sensor fusion-based driver monitoring method according to another embodiment of the present invention includes an information collection step 610 of collecting the driver's front image information and biometric information, a preprocessing step 630 of preprocessing the front image information and biometric information, a shape vector generation step 650 of extracting feature vectors from the preprocessed front image information and biometric information and combining them to generate a shape vector, a determination step 670 of determining the driver's emotion and behavior based on the shape vector, and an output step 690 of outputting the driver's emotion and behavior.
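The order of steps 610 to 690 can be summarized as a small driver function. This is a sketch only: every stage is passed in as a caller-supplied callable with a hypothetical name, since the concrete collection, preprocessing, and model code are outside the scope of this passage.

```python
def monitor_driver(collect, preprocess, extract_features, combine, classify, output):
    """Run one pass of steps 610-690 using caller-supplied stage functions."""
    image, biometrics = collect()                        # 610: front image + biometric info
    image, biometrics = preprocess(image, biometrics)    # 630: ROI, conversion, filtering
    vectors = extract_features(image, biometrics)        # 650: per-modality feature vectors
    shape_vector = combine(vectors)                      # 650: single shape vector
    emotion, behavior = classify(shape_vector)           # 670: deep learning model
    output(emotion, behavior)                            # 690: report to the vehicle
    return emotion, behavior
```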

Referring to FIG. 7, the information collection step 610 may include collecting the driver's front image information (611) and collecting biometric information including the driver's heartbeat information and voice information (613). In the information collection step 610, collecting the front image information (611) and collecting the biometric information (613) may be performed simultaneously.

Meanwhile, FIG. 8 is a flowchart of a preprocessing step of the multi-sensor fusion-based driver monitoring method of FIG. 6 .

Referring to FIG. 8, the preprocessing step 630 may include a step 631 of setting the driver's region of interest from the front image information, a step 633 of converting the front image information set as the region of interest into foreground extraction information, front feature point information, and front heat map information, and a step 635 of performing filtering to remove noise signals present in the heartbeat information and voice information included in the biometric information.

Step 633 includes converting the front image information into foreground extraction information (633a), converting the front image information into front feature point information (633b), and converting the front image information into front heat map information (633c). These steps may be performed simultaneously or, although not shown in FIG. 8, sequentially in the order 633a, 633b, 633c.

Meanwhile, in the preprocessing step 630, the step 631 of setting the region of interest of the driver and the step 635 of performing filtering may be simultaneously performed.

In this regard, performing filtering (635) includes performing filtering to remove noise signals from the heartbeat information (635a) and performing filtering to remove noise signals from the voice information (635b).
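This passage does not name the filter used in steps 635a and 635b. As one simple illustration, a centered moving average acts as a low-pass filter that suppresses high-frequency noise in a sampled signal; it is an assumed stand-in here, and the window width is a tuning parameter not given in the source.

```python
def moving_average(signal, width=5):
    """Centered moving-average low-pass filter; edge samples average what is available."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out
```

For example, a single spike `[0, 0, 3, 0, 0]` filtered with `width=3` is spread to `[0.0, 1.0, 1.0, 1.0, 0.0]`.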

The shape vector generation step 650 may then include using a feature vector extractor that performs vector combining to extract a feature vector from each of the foreground extraction information, the front feature point information, the front heat map information, and the heartbeat information and voice information included in the biometric information, and generating a single shape vector by combining the feature vectors.

As such, the multi-sensor fusion-based driver monitoring method according to another embodiment of the present invention can monitor and output the driver's emotions and behaviors so that the efficiency of the autonomous vehicle can be increased through the above-described steps.

Although the above has been described with reference to embodiments, those skilled in the art will understand that the present invention can be variously modified and changed without departing from the spirit and scope of the present invention set forth in the claims below.

[Description of reference numerals]

100: multi-sensor fusion-based driver monitoring device

110: information collection unit

111: image collection unit

113: biometric collection unit, 113a: heartbeat collection unit, 113b: voice collection unit

130: pre-processing unit

150: shape vector generation unit

170: determination unit

190: output unit

Claims

an information collection unit that collects driver's front image information and biometric information;

a pre-processing unit pre-processing the front image information and biometric information;

a shape vector generating unit extracting each feature vector based on the front image information and biometric information preprocessed by the pre-processing unit and generating a shape vector by combining the respective feature vectors;

a determination unit determining the driver's emotion and behavior based on the shape vector; and

A multi-sensor fusion-based driver monitoring device comprising an output unit outputting the driver's emotions and behaviors.
According to claim 1,

The information collection unit,

an image collection unit that collects front image information of the driver; and

a biometric collection unit that collects the driver's biometric information and includes a heartbeat collection unit that collects the driver's heartbeat information and a voice collection unit that collects the driver's voice information; the multi-sensor fusion-based driver monitoring device being characterized by including these units.
According to claim 2,

The pre-processing unit,

sets the driver's region of interest from the front image information, converts the front image information set as the region of interest into foreground extraction information, front feature point information, and front heat map information, and performs filtering to remove noise signals present in the heartbeat information and the voice information included in the biometric information; the multi-sensor fusion-based driver monitoring device being characterized thereby.
According to claim 3,

The shape vector generator,

extracts, using a feature vector extractor that performs vector combining, a feature vector from each of the foreground extraction information, the front feature point information, and the front heat map information converted from the front image information, and the heartbeat information and the voice information of the biometric information from which noise signals have been removed, and combines the feature vectors to generate a single shape vector; the multi-sensor fusion-based driver monitoring device being characterized thereby.
A driver monitoring method performed through a multi-sensor fusion-based driver monitoring device,

an information collection step of collecting driver's front image information and biometric information;

a preprocessing step of preprocessing the front image information and biometric information;

a shape vector generation step of extracting each feature vector based on the preprocessed front image information and biometric information, and generating a shape vector by combining the respective feature vectors;

a determination step of determining the driver's emotion and behavior based on the shape vector; and

A multi-sensor fusion-based driver monitoring method comprising an output step of outputting the driver's emotions and behaviors.
According to claim 5,

In the information gathering step,

collecting front image information of the driver; and

and collecting the biometric information including the driver's heart rate information and the driver's voice information.
According to claim 6,

In the preprocessing step,

setting a region of interest of the driver from the front image information;

converting front image information set as the region of interest into foreground extraction information, front feature point information, and front heat map information; and

and performing filtering to remove noise signals present in the heartbeat information and the voice information included in the biometric information.
According to claim 7,

The shape vector generation step,

extracting, using a feature vector extractor that performs vector combining, each feature vector based on the foreground extraction information, the front feature point information, the front heat map information, and the heart rate information and the voice information included in the biometric information; and

The multi-sensor fusion-based driver monitoring method comprising the step of generating a single shape vector by combining the respective feature vectors.