CN117485348A - Driver intention recognition method - Google Patents

Driver intention recognition method

Info

Publication number
CN117485348A
Authority
CN
China
Prior art keywords
driver
intention
information
semantic
driving environment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311618805.8A
Other languages
Chinese (zh)
Other versions
CN117485348B (en)
Inventor
牛超
赵运
郑岳琦
马天发
全杰
孙德荣
王晋武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun Automotive Test Center Co ltd
Original Assignee
Changchun Automotive Test Center Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun Automotive Test Center Co ltd filed Critical Changchun Automotive Test Center Co ltd
Priority to CN202311618805.8A priority Critical patent/CN117485348B/en
Publication of CN117485348A publication Critical patent/CN117485348A/en
Application granted granted Critical
Publication of CN117485348B publication Critical patent/CN117485348B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/08Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to drivers or passengers
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2540/00Input parameters relating to occupants
    • B60W2540/043Identity of occupants
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2540/00Input parameters relating to occupants
    • B60W2540/223Posture, e.g. hand, foot, or seat position, turned or inclined
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2552/00Input parameters relating to infrastructure
    • B60W2552/50Barriers
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2552/00Input parameters relating to infrastructure
    • B60W2552/53Road markings, e.g. lane marker or crosswalk
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/40Dynamic objects, e.g. animals, windblown objects
    • B60W2554/402Type
    • B60W2554/4029Pedestrians
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Mathematical Physics (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention provides a driver intention recognition method comprising the following steps. Step S1: collecting driver behavior data and image and video data of the driving environment. Step S2: extracting key visual features from the data of step S1 for subsequent intention recognition. Step S3: performing semantic analysis on the acquired driving environment images and videos and extracting key semantic information. Step S4: fusing the feature information obtained from the visual feature extraction and the semantic analysis to obtain a driver intention recognition result.

Description

Driver intention recognition method
Technical Field
The invention relates to the technical field of intelligent interaction of automobiles, in particular to a driver intention recognition method.
Background
As automobiles become increasingly intelligent, users expect the vehicle to understand their state and needs and to customize services and driving assistance accordingly. Accurately recognizing the driver's intention therefore plays an extremely important role in providing more humanized services and safer, more comfortable assisted driving.
At present, existing driver-intention recognition functions are integrated into different ADAS controllers and usually infer the intention from a single signal, such as a turn-signal or brake-switch signal. Because few signals are used and the recognition logic is simple, misjudgments are common and system performance suffers; when several judgment signals are available they may conflict, and the driver's intention still cannot be recognized accurately.
Disclosure of Invention
The present invention is directed to a driver intention recognition method, which solves the above-mentioned problems of the related art.
The invention is realized by the following technical scheme: a driver intention recognition method, the method comprising the steps of:
step S1: collecting data of driver behaviors and image and video data of driving environment;
step S2: extracting key visual features from the data of the step S1 for subsequent intention recognition;
step S3: carrying out semantic analysis on the acquired driving environment image and video data, and extracting key semantic information;
step S4: and fusing various feature information according to the visual feature extraction and the semantic analysis to obtain a driver intention recognition result.
Specifically, the visual feature extraction in the step S2 includes the following steps:
step S2.1: face detection and recognition, namely detecting the face in the driver behavior data through an MTCNN neural network, and recognizing the detected face through a FaceNet face recognition algorithm to acquire the identity information of the driver;
step S2.2: expression analysis, namely analyzing the facial expression of the driver through a FERNET network, and extracting the current expression state of the driver, including anger, happiness and confusion, through a facial expression recognition algorithm;
step S2.3: gesture recognition, namely recognizing the hand actions of the driver through a spatio-temporal attention network, and extracting the type and state of the driver's gesture;
step S2.4: spatial pose estimation, namely estimating the head pose of the driver through a 3D pose estimation network to acquire information such as the rotation angle and direction of the head;
step S2.5: driving scene analysis, namely performing target detection and tracking on images of the driving environment through a Faster R-CNN model, and extracting scene information such as road signs, traffic lights, pedestrians and obstacles.
Specifically, the key semantic information in the step S3 is extracted through the following steps:
step S3.1: segmenting the driving environment image and extracting semantic information of different regions, specifically segmenting regions such as road, sky and buildings and inferring the semantic meaning of each region;
step S3.2: establishing a semantic relation model through a graph neural network and analyzing the relations between target objects in the driving environment, such as the relation between traffic signals and pedestrians and between vehicles and road signs;
step S3.3: action recognition and intention reasoning, namely comprehensively analyzing the behavior of the driver and the driving environment through an action recognition and intention reasoning algorithm, and inferring the intention of the driver from the driver's actions and the environmental semantic information;
step S3.4: modeling and analyzing the driver behavior and the driving environment through a recurrent neural network in combination with historical data, and predicting possible future events, including traffic jams and road conditions, by taking the historical data of the current driving environment into account.
Specifically, the multi-modal fusion in the step S4 proceeds as follows:
step S4.1: fusing the feature information from visual feature extraction and semantic analysis, specifically fusing the visual features from face recognition, expression analysis and gesture recognition with the semantic features from target detection and scene segmentation to form a multi-modal feature vector;
step S4.2: adjusting the weight of the feature information according to the contribution of each modality, and further adjusting the weights of different features according to the complexity of the driving scene and the criticality of the driver's behavior, so that important feature information has greater influence on the final intention recognition result during fusion;
step S4.3: resolving conflicts when different modality information disagrees, for example when the face analysis indicates that the driver is angry but the other visual features and the semantic analysis indicate a normal driving scene, so as to determine the final intention recognition result;
step S4.4: predicting the intention of the driver at the current moment by combining the intention recognition result of the previous moment with the current driving environment state, so as to obtain a more accurate intention recognition result that adapts to changes in the driving environment.
Specifically, in the step S4.2, the weights of the feature information are adjusted as follows:
first, the weight of each modality is initialized, either randomly or according to prior knowledge;
second, the feature information of each modality in the multi-modal feature vector is forward-propagated and the output of each modality is computed;
third, based on the outputs of the modalities, the total loss is computed as the weighted sum of the per-modality losses;
fourth, the loss is back-propagated through the network and the contribution of each modality to the total loss is computed;
fifth, the gradient of each modality weight is computed from that modality's contribution to the total loss;
sixth, the modality weights are updated with a gradient-descent optimizer.
Specifically, in step S4.3, when different modality information conflicts, the conflict is resolved as follows:
conflicts are detected in the fused multi-modal data through a support vector machine model and the conflicting regions of the fused data are marked; for each detected conflict, a decision tree algorithm makes a decision that combines the weights of the modality information, thereby determining the final intention recognition result.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a driver intention recognition method, which fuses multi-mode information, can accurately recognize the intention of a driver by carrying out weight adjustment and conflict resolution on the multi-mode information, provides accurate and comprehensive driver intention information for an intelligent driving system, and improves driving safety and driving experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only preferred embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is an overall structure diagram of a driver intention recognition method provided by the present invention.
Fig. 2 is a specific flowchart of step S2 visual feature extraction provided in the present invention.
Fig. 3 is a specific flowchart of the semantic analysis of step S3 provided in the present invention.
Fig. 4 is a specific flowchart of step S4 multi-mode fusion provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present invention and not all embodiments of the present invention, and it should be understood that the present invention is not limited by the example embodiments described herein. Based on the embodiments of the invention described in the present application, all other embodiments that a person skilled in the art would have without inventive effort shall fall within the scope of the invention.
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the invention may be practiced without one or more of these details. In other instances, well-known features have not been described in detail in order to avoid obscuring the invention.
It should be understood that the present invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes any and all combinations of the associated listed items.
In order to provide a thorough understanding of the present invention, detailed structures will be presented in the following description in order to illustrate the technical solutions presented by the present invention. Alternative embodiments of the invention are described in detail below, however, the invention may have other implementations in addition to these detailed descriptions.
Referring to fig. 1, a driver intention recognition method includes the following steps:
step S1: collecting data of driver behaviors and image and video data of the driving environment;
step S2: extracting key visual features from the data of step S1 for subsequent intention recognition;
step S3: carrying out semantic analysis on the acquired driving environment image and video data, and extracting key semantic information;
step S4: fusing the feature information obtained from the visual feature extraction and the semantic analysis to obtain a driver intention recognition result.
According to the driver intention recognition method, the face in the driver behavior data is first detected with an MTCNN neural network, and the detected face is then recognized with the FaceNet face recognition algorithm to obtain the driver's identity information. Facial expression recognition is performed with the FERNET network: the driver's facial expression is analyzed and the current expression state, such as anger, happiness or confusion, is extracted from the driver behavior data in order to judge the driver's emotion and intention. At the same time, the driver's hand actions are recognized with a spatio-temporal attention network to extract the type and state of the driver's gestures, such as turning the steering wheel or pressing a button, which helps to judge the driver's operating intention. On this basis, target objects in the driving environment are detected and tracked with the Faster R-CNN network so that information such as their positions and motion states can be extracted to assist intention judgment, and the driver's head pose is estimated with a 3D pose estimation network to obtain the head rotation angle and direction and to judge the driver's line of sight and direction of attention, which helps to determine the driving scene the driver is currently in and the corresponding intention.
A semantic relation model built with a graph neural network then analyzes the relations between target objects in the driving environment, for example the relation between traffic lights and pedestrians or between vehicles and road signs; such models provide richer semantic information that supports intention judgment. A recurrent neural network (RNN) comprehensively analyzes the driver's behavior and the driving environment and infers the driver's intention, such as lane changing, acceleration or parking, from the driver's actions and the environmental semantic information; a long short-term memory (LSTM) model is then used to model and analyze the driver behavior and the driving environment, taking the historical data of the current driving environment into account to predict events that may occur in the future, such as traffic jams and road conditions, which improves the accuracy of driver intention recognition. The feature information obtained from visual feature extraction and semantic analysis is finally fused, making comprehensive use of the different modalities to improve recognition accuracy, and post-processing and reasoning are applied to the fused result: statistical methods infer the driver's next behavior, and the recognized intention is converted into corresponding driving operations such as acceleration, braking or steering, that is, into specific driving instructions that support the decision-making and control of the intelligent driving system.
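As an illustration of the LSTM-based prediction mentioned above, the following minimal sketch (written in PyTorch, an assumption; the patent does not name a framework) feeds a sequence of per-frame driving-environment feature vectors to an LSTM and outputs probabilities of upcoming events such as congestion. The feature size, hidden size and event classes are illustrative choices, not values from the patent.

```python
import torch
import torch.nn as nn

class EventPredictor(nn.Module):
    """Predicts probabilities of upcoming driving events from a feature history."""
    def __init__(self, feat_dim=128, hidden_dim=64, num_events=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_events)

    def forward(self, history):                       # history: [batch, time, feat_dim]
        _, (h_n, _) = self.lstm(history)               # final hidden state: [1, batch, hidden]
        return torch.sigmoid(self.head(h_n.squeeze(0)))  # per-event probabilities

# Example: 30 past frames of 128-d environment features for one sequence.
probs = EventPredictor()(torch.randn(1, 30, 128))      # tensor of shape [1, 2]
```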
Specifically, referring to fig. 2, the visual feature extraction in step S2 includes the following steps:
step S2.1: face detection and recognition, namely detecting the face in the driver behavior data through an MTCNN neural network, and recognizing the detected face through a FaceNet face recognition algorithm to acquire the identity information of the driver;
By way of example, the specific steps of step S2.1 are as follows:
first, a driver behavior data set is prepared for training and testing; the data set should contain the driver's behavior data together with the face images associated with each behavior, and the face images should already be labeled with the corresponding identity information.
An image from the driver behavior data is input to the MTCNN model, which outputs the detected face positions;
the face regions detected by MTCNN are extracted from the original image and preprocessed; preprocessing may include cropping, resizing and normalization, so that all face images have the same size and feature representation;
the preprocessed face images are recognized with the FaceNet face recognition algorithm: each image is mapped into a high-dimensional feature space and the similarity between faces is computed, so the most similar identity can be found by comparing the face to be recognized with the faces of known identity;
a suitable similarity threshold is then chosen based on the FaceNet output; if the similarity between the face to be recognized and a face of known identity exceeds the threshold, the face is considered to belong to that identity and the corresponding identity information is extracted.
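The following sketch illustrates one way these sub-steps could be wired together using the open-source facenet-pytorch package, which bundles an MTCNN detector and a FaceNet-style embedder; the package choice, the distance threshold and the enrollment dictionary are assumptions for illustration, not part of the patent.

```python
import torch
from PIL import Image
from facenet_pytorch import MTCNN, InceptionResnetV1

mtcnn = MTCNN(image_size=160)                                # face detector / cropper
embedder = InceptionResnetV1(pretrained='vggface2').eval()   # FaceNet-style embedder

def identify_driver(frame_path, enrolled, threshold=0.8):
    """Compare the face in the frame with enrolled drivers (id -> 512-d embedding)
    and return the closest id, or None if no face is found or no match is close enough."""
    face = mtcnn(Image.open(frame_path))          # cropped, normalized face tensor or None
    if face is None:
        return None
    with torch.no_grad():
        emb = embedder(face.unsqueeze(0))         # [1, 512] embedding
    distances = {pid: torch.dist(emb, ref).item() for pid, ref in enrolled.items()}
    pid, dist = min(distances.items(), key=lambda kv: kv[1])
    return pid if dist < threshold else None      # threshold is an illustrative value
```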
Step S2.2: the method comprises the following steps of performing expression analysis, namely analyzing the facial expression of a driver through a FERNET network, and extracting the current expression state of the driver through a facial expression recognition algorithm, wherein the expression state comprises anger, happiness and confusion;
exemplary, specific steps of step S2.2 are as follows:
firstly, a driver facial-expression data set is prepared for training and testing; the data set contains facial expression images of the driver under different emotions, labeled with the corresponding expression categories;
secondly, the data set is divided into a training set and a test set, and the FERNET network is trained with the training set: the preprocessed expression images are input to the FERNET network, the network weights and biases are updated with the back-propagation algorithm and a suitable optimizer (such as Adam), and a suitable loss function (such as cross-entropy loss) is defined during training to measure the difference between the model's predictions and the true labels;
finally, a threshold is set on the FERNET output: when the predicted probability of the anger class exceeds the threshold, the driver is judged to be angry; when the predicted probability of the happiness class exceeds the threshold, the driver is judged to be happy; and when the predicted probability of the confusion class exceeds the threshold, the driver is judged to be confused.
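A minimal sketch of this thresholding rule follows; `fernet_model` stands in for the trained FERNET network (its interface is assumed), and the class list and threshold value are illustrative.

```python
import torch
import torch.nn.functional as F

EXPRESSIONS = ["anger", "happiness", "confusion"]   # the states named in the text

def expression_state(fernet_model, face_tensor, threshold=0.6):
    """Return the expression whose softmax probability exceeds the threshold, else None."""
    with torch.no_grad():
        logits = fernet_model(face_tensor.unsqueeze(0))      # assumed shape: [1, 3]
        probs = F.softmax(logits, dim=1).squeeze(0)
    best = int(torch.argmax(probs))
    return EXPRESSIONS[best] if probs[best].item() > threshold else None
```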
Step S2.3: gesture recognition, namely recognizing hand actions of a driver through a space-time attention network, and extracting the type and state of the gesture of the driver;
exemplary, specific steps of step S2.3 are as follows:
first, a driver hand motion data set is prepared for training and testing. The data set should contain video sequences of the driver under different gestures and be annotated with the corresponding gesture types and states.
The spatio-temporal attention network is then trained with the training set: the preprocessed video sequences are input to the network, and a loss function measures the difference between the model's predictions and the true labels;
a threshold is set on the output of the spatio-temporal attention network: when the predicted probability of a gesture type exceeds the threshold, the driver is judged to be performing that gesture; when the predicted probability of a gesture state exceeds the threshold, the driver is judged to be in that gesture state.
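Analogously, the gesture thresholding could look like the sketch below, assuming the spatio-temporal attention network exposes two output heads (gesture type and gesture state); the label sets and threshold are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

GESTURE_TYPES = ["steering", "button_press", "gear_shift"]   # illustrative labels
GESTURE_STATES = ["starting", "in_progress", "finished"]     # illustrative labels

def recognize_gesture(stan_model, clip, threshold=0.6):
    """clip: [frames, C, H, W] video tensor. Returns (gesture_type, gesture_state),
    where either element is None if that head is not confident enough."""
    with torch.no_grad():
        type_logits, state_logits = stan_model(clip.unsqueeze(0))   # assumed two-head output
        type_p = F.softmax(type_logits, dim=1).squeeze(0)
        state_p = F.softmax(state_logits, dim=1).squeeze(0)
    g_type = GESTURE_TYPES[int(type_p.argmax())] if type_p.max() > threshold else None
    g_state = GESTURE_STATES[int(state_p.argmax())] if state_p.max() > threshold else None
    return g_type, g_state
```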
Step S2.4: estimating the space gesture, namely estimating the head gesture of a driver through a 3D gesture estimation network to acquire information such as the rotation angle and the direction of the head;
exemplary, specific steps of step S2.4 are as follows:
first, a driver head-pose data set is prepared for training and testing; the data set should contain head images or video sequences of the driver in different poses, labeled with the corresponding head rotation angles and directions.
The 3D pose estimation network is trained with the training set by inputting the preprocessed head images or video sequences into the network;
from the output of the 3D pose estimation network, information such as the rotation angle and direction of the driver's head can be obtained; the rotation angle and direction can be extracted from the Euler angles or the rotation matrix output by the network.
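As a small worked example of this last step, the sketch below recovers yaw, pitch and roll (in degrees) from a 3x3 rotation matrix such as a pose network might output; the ZYX angle convention is an assumption, since the patent does not fix one.

```python
import numpy as np

def rotation_to_euler(R):
    """Convert a 3x3 rotation matrix to (yaw, pitch, roll) in degrees (ZYX convention)."""
    sy = np.sqrt(R[0, 0] ** 2 + R[1, 0] ** 2)
    if sy > 1e-6:                                  # regular case
        yaw = np.arctan2(R[1, 0], R[0, 0])
        pitch = np.arctan2(-R[2, 0], sy)
        roll = np.arctan2(R[2, 1], R[2, 2])
    else:                                          # gimbal-lock fallback
        yaw = 0.0
        pitch = np.arctan2(-R[2, 0], sy)
        roll = np.arctan2(-R[1, 2], R[1, 1])
    return tuple(np.degrees([yaw, pitch, roll]))

# Example: the identity rotation corresponds to a neutral, forward-facing head pose.
print(rotation_to_euler(np.eye(3)))                # (0.0, 0.0, 0.0)
```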
Step S2.5: and (3) analyzing driving scenes, namely performing target detection and tracking on images of driving environments through a fast R-CNN model, and extracting scene information such as road signs, traffic lights, pedestrians, obstacles and the like.
Exemplary, specific steps of step S2.5 are as follows:
first, a driving environment image dataset for training and testing is prepared. The dataset should contain images of various objects in the driving environment (e.g., road signs, traffic lights, pedestrians, obstacles, etc.), and be labeled with bounding box locations and category labels for the corresponding objects.
The Faster R-CNN model is trained with the training set: the preprocessed driving environment images are input to the model, which outputs the detected bounding boxes, categories and confidence scores;
from the output of the Faster R-CNN model, scene information such as road signs, traffic lights, pedestrians and obstacles in the driving environment can then be extracted.
Specifically, referring to fig. 3, the key semantic information in step S3 is extracted through the following steps:
step S3.1: segmenting the driving environment image and extracting semantic information of different regions, specifically segmenting regions such as road, sky and buildings and inferring the semantic meaning of each region (a segmentation sketch follows this list);
step S3.2: establishing a semantic relation model through a graph neural network and analyzing the relations between target objects in the driving environment, such as the relation between traffic signals and pedestrians and between vehicles and road signs;
step S3.3: action recognition and intention reasoning, namely comprehensively analyzing the behavior of the driver and the driving environment through an action recognition and intention reasoning algorithm, and inferring the intention of the driver from the driver's actions and the environmental semantic information;
step S3.4: modeling and analyzing the driver behavior and the driving environment through a recurrent neural network in combination with historical data, and predicting possible future events, including traffic jams and road conditions, by taking the historical data of the current driving environment into account.
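For step S3.1, the sketch below uses torchvision's DeepLabV3 as a stand-in segmenter to produce a per-pixel class map; the model choice is an assumption, and its pretrained classes do not include road, sky or buildings, so it would need fine-tuning on a driving data set (e.g. Cityscapes) to match the regions named in the text.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

segmenter = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT")
segmenter.eval()

def segment_regions(image_path):
    """Return an [H, W] map of per-pixel class ids for the driving-environment image;
    with driving-specific fine-tuning the ids would correspond to road, sky, buildings, etc."""
    img = to_tensor(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        logits = segmenter(img)["out"]             # [1, num_classes, H, W]
    return logits.argmax(dim=1).squeeze(0)
```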
Specifically, referring to fig. 4, the multi-modal fusion in step S4 proceeds as follows:
step S4.1: fusing the feature information from visual feature extraction and semantic analysis, specifically fusing the visual features from face recognition, expression analysis and gesture recognition with the semantic features from target detection and scene segmentation to form a multi-modal feature vector;
step S4.2: adjusting the weight of the feature information according to the contribution of each modality, and further adjusting the weights of different features according to the complexity of the driving scene and the criticality of the driver's behavior, so that important feature information has greater influence on the final intention recognition result during fusion;
step S4.3: resolving conflicts when different modality information disagrees, for example when the face analysis indicates that the driver is angry but the other visual features and the semantic analysis indicate a normal driving scene, so as to determine the final intention recognition result;
step S4.4: predicting the intention of the driver at the current moment by combining the intention recognition result of the previous moment with the current driving environment state, so as to obtain a more accurate intention recognition result that adapts to changes in the driving environment; a simple sketch of this temporal update follows.
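A minimal sketch of the temporal update in step S4.4, assuming intentions are represented as a probability distribution over intention classes; the previous result is blended with the evidence from the current driving environment, and the blending factor alpha is an illustrative assumption.

```python
import numpy as np

def update_intention(prev_probs, current_evidence, alpha=0.7):
    """Blend the previous intention distribution with evidence from the current
    driving-environment state; both inputs are arrays over intention classes."""
    blended = alpha * np.asarray(current_evidence) + (1 - alpha) * np.asarray(prev_probs)
    return blended / blended.sum()

# Example: the previous result favoured "lane change"; fresh evidence agrees, so it strengthens.
print(update_intention([0.2, 0.7, 0.1], [0.1, 0.8, 0.1]))
```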
Specifically, in the step S4.2, the weights of the feature information are adjusted as follows:
first, the weight of each modality is initialized, either randomly or according to prior knowledge;
second, the feature information of each modality in the multi-modal feature vector is forward-propagated and the output of each modality is computed;
third, based on the outputs of the modalities, the total loss is computed as the weighted sum of the per-modality losses;
fourth, the loss is back-propagated through the network and the contribution of each modality to the total loss is computed;
fifth, the gradient of each modality weight is computed from that modality's contribution to the total loss;
sixth, the modality weights are updated with a gradient-descent optimizer.
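The six steps above can be sketched as follows in PyTorch (an assumption), with one learnable weight per modality, per-modality losses combined into a total loss, and a gradient-descent optimizer updating the weights; the dimensions, intention count and exact loss combination are illustrative choices.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    def __init__(self, modal_dims, num_intentions):
        super().__init__()
        # step 1: initialize one weight per modality (uniformly here)
        self.weights = nn.Parameter(torch.ones(len(modal_dims)))
        self.heads = nn.ModuleList([nn.Linear(d, num_intentions) for d in modal_dims])

    def forward(self, modal_feats):
        # step 2: forward-propagate each modality and collect its output
        outs = [head(f) for head, f in zip(self.heads, modal_feats)]
        w = torch.softmax(self.weights, dim=0)           # normalized modality weights
        fused = sum(wi * o for wi, o in zip(w, outs))    # weighted fusion of the outputs
        return fused, outs

# modal_feats: e.g. [visual_feats, semantic_feats, gesture_feats], each [batch, dim]
model = WeightedFusion(modal_dims=[128, 64, 32], num_intentions=5)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # step 6 uses gradient descent
criterion = nn.CrossEntropyLoss()

def training_step(modal_feats, target):
    fused, outs = model(modal_feats)
    # step 3: total loss = fused loss plus the weighted sum of per-modality losses
    per_modal = torch.stack([criterion(o, target) for o in outs])
    loss = criterion(fused, target) + (torch.softmax(model.weights, dim=0) * per_modal).sum()
    optimizer.zero_grad()
    loss.backward()      # steps 4-5: back-propagation yields each weight's gradient
    optimizer.step()     # step 6: the optimizer updates the modality weights (and heads)
    return loss.item()
```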
Specifically, in step S4.3, when different modality information conflicts, the conflict is resolved as follows:
conflicts are detected in the fused multi-modal data through a support vector machine model and the conflicting regions of the fused data are marked; for each detected conflict, a decision tree algorithm makes a decision that combines the weights of the modality information, thereby determining the final intention recognition result.
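A hedged sketch of this conflict handling using scikit-learn (an assumption): an SVM flags conflicting fused samples, and a decision tree, given the fused features together with the modality weights, decides the final intention for flagged samples; the feature layout and labels are illustrative.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

conflict_detector = SVC(kernel="rbf")            # trained on fused vectors; 1 = conflict
arbiter = DecisionTreeClassifier(max_depth=5)    # trained to output the final intention

def fit(fused_X, conflict_y, arbiter_X, intent_y):
    """fused_X: fused multi-modal vectors; arbiter_X: fused vectors concatenated with
    the modality weights; conflict_y / intent_y: the corresponding labels."""
    conflict_detector.fit(fused_X, conflict_y)
    arbiter.fit(arbiter_X, intent_y)

def resolve(fused_x, modal_weights, default_intent):
    """Return the final intention for one fused sample, arbitrating only on conflict."""
    if conflict_detector.predict(fused_x.reshape(1, -1))[0] == 1:    # conflict flagged
        features = np.concatenate([fused_x, modal_weights]).reshape(1, -1)
        return arbiter.predict(features)[0]
    return default_intent
```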
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (6)

1. A driver intention recognition method, characterized in that the method comprises the steps of:
step S1: collecting data of driver behaviors and image and video data of driving environment;
step S2: extracting key visual features from the data of the step S1 for subsequent intention recognition;
step S3: carrying out semantic analysis on the acquired driving environment image and video data, and extracting key semantic information;
step S4: and fusing various feature information according to the visual feature extraction and the semantic analysis to obtain a driver intention recognition result.
2. The method for recognizing the intention of a driver according to claim 1, wherein the visual feature extraction in the step S2 specifically includes the following steps:
step S2.1: face detection and recognition, namely detecting the face in the driver behavior data through an MTCNN neural network, and recognizing the detected face through a FaceNet face recognition algorithm to acquire the identity information of the driver;
step S2.2: expression analysis, namely analyzing the facial expression of the driver through a FERNET network, and extracting the current expression state of the driver, including anger, happiness and confusion, through a facial expression recognition algorithm;
step S2.3: gesture recognition, namely recognizing the hand actions of the driver through a spatio-temporal attention network, and extracting the type and state of the driver's gesture;
step S2.4: spatial pose estimation, namely estimating the head pose of the driver through a 3D pose estimation network to acquire information such as the rotation angle and direction of the head;
step S2.5: driving scene analysis, namely performing target detection and tracking on images of the driving environment through a Faster R-CNN model, and extracting scene information such as road signs, traffic lights, pedestrians and obstacles.
3. The method for identifying the intention of the driver according to claim 2, wherein the key semantic information in the step S3 is extracted through the following steps:
step S3.1: segmenting the driving environment image and extracting semantic information of different regions, specifically segmenting regions such as road, sky and buildings and inferring the semantic meaning of each region;
step S3.2: establishing a semantic relation model through a graph neural network and analyzing the relations between target objects in the driving environment, such as the relation between traffic signals and pedestrians and between vehicles and road signs;
step S3.3: action recognition and intention reasoning, namely comprehensively analyzing the behavior of the driver and the driving environment through an action recognition and intention reasoning algorithm, and inferring the intention of the driver from the driver's actions and the environmental semantic information;
step S3.4: modeling and analyzing the driver behavior and the driving environment through a recurrent neural network in combination with historical data, and predicting possible future events, including traffic jams and road conditions, by taking the historical data of the current driving environment into account.
4. The method for identifying the driver's intention according to claim 3, wherein the multi-modal fusion in step S4 proceeds as follows:
step S4.1: fusing the feature information from visual feature extraction and semantic analysis, specifically fusing the visual features from face recognition, expression analysis and gesture recognition with the semantic features from target detection and scene segmentation to form a multi-modal feature vector;
step S4.2: adjusting the weight of the feature information according to the contribution of each modality, and further adjusting the weights of different features according to the complexity of the driving scene and the criticality of the driver's behavior, so that important feature information has greater influence on the final intention recognition result during fusion;
step S4.3: resolving conflicts when different modality information disagrees, for example when the face analysis indicates that the driver is angry but the other visual features and the semantic analysis indicate a normal driving scene, so as to determine the final intention recognition result;
step S4.4: predicting the intention of the driver at the current moment by combining the intention recognition result of the previous moment with the current driving environment state, so as to obtain a more accurate intention recognition result that adapts to changes in the driving environment.
5. The method for identifying the intention of the driver according to claim 4, wherein in the step S4.2, the weights of the feature information are adjusted as follows:
first, the weight of each modality is initialized, either randomly or according to prior knowledge;
second, the feature information of each modality in the multi-modal feature vector is forward-propagated and the output of each modality is computed;
third, based on the outputs of the modalities, the total loss is computed as the weighted sum of the per-modality losses;
fourth, the loss is back-propagated through the network and the contribution of each modality to the total loss is computed;
fifth, the gradient of each modality weight is computed from that modality's contribution to the total loss;
sixth, the modality weights are updated with a gradient-descent optimizer.
6. The method for identifying the intention of the driver according to claim 5, wherein in the step S4.3, when different modality information conflicts, the conflict is resolved as follows:
conflicts are detected in the fused multi-modal data through a support vector machine model and the conflicting regions of the fused data are marked; for each detected conflict, a decision tree algorithm makes a decision that combines the weights of the modality information, thereby determining the final intention recognition result.
CN202311618805.8A 2023-11-30 2023-11-30 Driver intention recognition method Active CN117485348B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311618805.8A CN117485348B (en) 2023-11-30 2023-11-30 Driver intention recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311618805.8A CN117485348B (en) 2023-11-30 2023-11-30 Driver intention recognition method

Publications (2)

Publication Number Publication Date
CN117485348A true CN117485348A (en) 2024-02-02
CN117485348B CN117485348B (en) 2024-07-19

Family

ID=89676424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311618805.8A Active CN117485348B (en) 2023-11-30 2023-11-30 Driver intention recognition method

Country Status (1)

Country Link
CN (1) CN117485348B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109658503A (en) * 2018-12-29 2019-04-19 北京理工大学 A kind of driving behavior intention detection method merging EEG signals
WO2020074565A1 (en) * 2018-10-11 2020-04-16 Continental Automotive Gmbh Driver assistance system for a vehicle
CN112434588A (en) * 2020-11-18 2021-03-02 青岛慧拓智能机器有限公司 Inference method for end-to-end driver expressway lane change intention
CN113392692A (en) * 2020-02-26 2021-09-14 本田技研工业株式会社 Driver-centric risk assessment: risk object identification via causal reasoning for intent-aware driving models
CN113386775A (en) * 2021-06-16 2021-09-14 杭州电子科技大学 Driver intention identification method considering human-vehicle-road characteristics
CN115027484A (en) * 2022-05-23 2022-09-09 吉林大学 Human-computer fusion perception method for high-degree automatic driving

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020074565A1 (en) * 2018-10-11 2020-04-16 Continental Automotive Gmbh Driver assistance system for a vehicle
CN109658503A (en) * 2018-12-29 2019-04-19 北京理工大学 A kind of driving behavior intention detection method merging EEG signals
CN113392692A (en) * 2020-02-26 2021-09-14 本田技研工业株式会社 Driver-centric risk assessment: risk object identification via causal reasoning for intent-aware driving models
CN112434588A (en) * 2020-11-18 2021-03-02 青岛慧拓智能机器有限公司 Inference method for end-to-end driver expressway lane change intention
CN113386775A (en) * 2021-06-16 2021-09-14 杭州电子科技大学 Driver intention identification method considering human-vehicle-road characteristics
CN115027484A (en) * 2022-05-23 2022-09-09 吉林大学 Human-computer fusion perception method for high-degree automatic driving

Also Published As

Publication number Publication date
CN117485348B (en) 2024-07-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant