CN117485348A - Driver intention recognition method - Google Patents
Driver intention recognition method
- Publication number
- CN117485348A CN117485348A CN202311618805.8A CN202311618805A CN117485348A CN 117485348 A CN117485348 A CN 117485348A CN 202311618805 A CN202311618805 A CN 202311618805A CN 117485348 A CN117485348 A CN 117485348A
- Authority
- CN
- China
- Prior art keywords
- driver
- intention
- information
- semantic
- driving environment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 36
- 230000006399 behavior Effects 0.000 claims abstract description 29
- 230000000007 visual effect Effects 0.000 claims abstract description 22
- 238000004458 analytical method Methods 0.000 claims abstract description 16
- 238000000605 extraction Methods 0.000 claims abstract description 12
- 230000009471 action Effects 0.000 claims description 14
- 230000004927 fusion Effects 0.000 claims description 13
- 238000013528 artificial neural network Methods 0.000 claims description 12
- 230000008921 facial expression Effects 0.000 claims description 10
- 238000001514 detection method Methods 0.000 claims description 9
- 230000008569 process Effects 0.000 claims description 9
- 238000013527 convolutional neural network Methods 0.000 claims description 6
- 238000010195 expression analysis Methods 0.000 claims description 6
- 230000008859 change Effects 0.000 claims description 3
- 238000003066 decision tree Methods 0.000 claims description 3
- 238000007499 fusion processing Methods 0.000 claims description 3
- 230000000644 propagated effect Effects 0.000 claims description 3
- 230000011218 segmentation Effects 0.000 claims description 3
- 238000012706 support-vector machine Methods 0.000 claims description 3
- 238000012549 training Methods 0.000 description 14
- 238000012360 testing method Methods 0.000 description 6
- 230000006870 function Effects 0.000 description 3
- 238000007781 pre-processing Methods 0.000 description 3
- 230000001133 acceleration Effects 0.000 description 2
- 230000008451 emotion Effects 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 238000012805 post-processing Methods 0.000 description 1
- 230000036544 posture Effects 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W40/00—Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
- B60W40/08—Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to drivers or passengers
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2540/00—Input parameters relating to occupants
- B60W2540/043—Identity of occupants
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2540/00—Input parameters relating to occupants
- B60W2540/223—Posture, e.g. hand, foot, or seat position, turned or inclined
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2552/00—Input parameters relating to infrastructure
- B60W2552/50—Barriers
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2552/00—Input parameters relating to infrastructure
- B60W2552/53—Road markings, e.g. lane marker or crosswalk
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2554/00—Input parameters relating to objects
- B60W2554/40—Dynamic objects, e.g. animals, windblown objects
- B60W2554/402—Type
- B60W2554/4029—Pedestrians
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Mathematical Physics (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Image Analysis (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention provides a driver intention recognition method comprising the following steps. Step S1: collect driver behavior data and image and video data of the driving environment. Step S2: extract key visual features from the data of step S1 for subsequent intention recognition. Step S3: perform semantic analysis on the acquired driving environment images and videos and extract key semantic information. Step S4: fuse the various feature information obtained from the visual feature extraction and the semantic analysis to obtain a driver intention recognition result.
Description
Technical Field
The invention relates to the technical field of intelligent interaction of automobiles, in particular to a driver intention recognition method.
Background
As automobiles become increasingly intelligent, users expect their vehicles to understand them better and to tailor services and driving assistance to their state and needs. Accurately identifying the driver's intention therefore plays an extremely important role in providing the driver with more personalized services and with safer, more comfortable assisted driving.
At present, existing driver intention recognition functions are distributed across different ADAS controllers and typically rely on a single signal, such as a turn-signal or brake-switch signal. Because the available signals are few and the recognition method is simple, misjudgments frequently occur and system performance suffers; and when multiple judgment signals are available, they may conflict with one another, so the driver's intention cannot be recognized accurately.
Disclosure of Invention
The present invention is directed to a driver intention recognition method, which solves the above-mentioned problems of the related art.
The invention is realized by the following technical scheme: a driver intention recognition method, the method comprising the steps of:
step S1: collecting data of driver behaviors and image and video data of driving environment;
step S2: extracting key visual features from the data of the step S1 for subsequent intention recognition;
step S3: carrying out semantic analysis on the acquired driving environment image and video data, and extracting key semantic information;
step S4: and fusing various feature information according to the visual feature extraction and the semantic analysis to obtain a driver intention recognition result.
Specifically, the visual feature extraction in step S2 includes the following steps:
step S2.1: face detection and recognition, namely detecting the face in the driver behavior data through an MTCNN neural network, and recognizing the detected face through a FaceNet face recognition algorithm to acquire the identity information of the driver;
step S2.2: expression analysis, namely analyzing the facial expression of the driver through a FERNET network and extracting the driver's current expression state, including anger, happiness and confusion, through a facial expression recognition algorithm;
step S2.3: gesture recognition, namely recognizing hand actions of a driver through a space-time attention network, and extracting the type and state of the gesture of the driver;
step S2.4: spatial pose estimation, namely estimating the head pose of the driver through a 3D pose estimation network to acquire information such as the rotation angle and direction of the head;
step S2.5: driving scene analysis, namely performing target detection and tracking on images of the driving environment through a Faster R-CNN model, and extracting scene information such as road signs, traffic lights, pedestrians and obstacles.
Specifically, the specific steps for extracting the key semantic information in the step S3 are as follows:
step S3.1: dividing the driving environment image, extracting semantic information of different areas, specifically comprising dividing different areas such as roads, sky, buildings and the like, and deducing semantic meaning of each area;
step S3.2: establishing a semantic relation model through a graph neural network, analyzing the relation between target objects in a driving environment, and judging the relation between traffic signals and pedestrians and the relation between vehicles and road signs;
step S3.3: action recognition and intention reasoning, namely comprehensively analyzing the driver's behavior and the driving environment using action recognition and intention reasoning algorithms, and inferring the driver's intention from the driver's behavior actions and the environmental semantic information;
step S3.4: modeling and analyzing the driver behavior and the driving environment through a recurrent neural network in combination with historical data, and predicting possible future events, including traffic jams and road conditions, by taking the historical data of the current driving environment into account.
Specifically, the specific process of multi-modal fusion in step S4 is as follows:
step S4.1: fusing the feature information from visual feature extraction and semantic analysis, specifically fusing the visual features of face recognition, expression analysis and gesture recognition with the semantic features of target detection and scene segmentation to form a multi-modal feature vector;
step S4.2: according to the contribution degree of different modal information, the weight of the characteristic information is adjusted, and according to the complexity degree of a driving scene and the criticality of the behavior of a driver, the weight of different characteristics is adjusted, so that in the fusion process, important characteristic information can better influence the final intention recognition result;
step S4.3: when a conflict exists among different modal information, conflict resolution is carried out; for example, when the expression analysis result shows that the driver is in an angry state but other visual features and semantic analysis results indicate that the driving scene is normal, the conflict must be resolved to determine the final intention recognition result;
step S4.4: and predicting the intention of the driver at the current moment by combining the intention recognition result at the previous moment and the current driving environment state, obtaining a more accurate intention recognition result and adapting to the change of the driving environment.
Specifically, in the step S4.2, the specific process of adjusting the weight of the feature information is as follows:
firstly, setting by random assignment or according to priori knowledge, and initializing the weight of each mode;
secondly, forward propagation is carried out on various characteristic information data in the multi-mode characteristic vector through each mode, and an output value of each mode is calculated;
thirdly, calculating a total loss value by weighting and summing the losses of each mode based on the output value of each mode;
fourth, the loss values are back propagated to the network, and the contribution degree of each mode to the total loss is calculated;
fifthly, calculating the gradient of each modal weight according to the contribution degree of each modal to the total loss;
sixth, the gradient descent optimizer is used to update the weights of each modality.
Specifically, in step S4.3, when there is a conflict between different modality information, a specific process of resolving the conflict is as follows:
conflicts are detected in the fused multi-modal data by a support vector machine model and the conflicting regions of the fused data are marked; for each detected conflict, a decision tree algorithm makes the decision by combining the weights of the modal information, thereby determining the final intention recognition result.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a driver intention recognition method, which fuses multi-mode information, can accurately recognize the intention of a driver by carrying out weight adjustment and conflict resolution on the multi-mode information, provides accurate and comprehensive driver intention information for an intelligent driving system, and improves driving safety and driving experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only preferred embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is an overall structure diagram of a driver intention recognition method provided by the present invention.
Fig. 2 is a specific flowchart of step S2 visual feature extraction provided in the present invention.
Fig. 3 is a specific flowchart of the semantic analysis of step S3 provided in the present invention.
Fig. 4 is a specific flowchart of step S4 multi-mode fusion provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present invention and not all embodiments of the present invention, and it should be understood that the present invention is not limited by the example embodiments described herein. Based on the embodiments of the invention described in the present application, all other embodiments that a person skilled in the art would have without inventive effort shall fall within the scope of the invention.
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the invention may be practiced without one or more of these details. In other instances, well-known features have not been described in detail in order to avoid obscuring the invention.
It should be understood that the present invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes any and all combinations of the associated listed items.
In order to provide a thorough understanding of the present invention, detailed structures will be presented in the following description in order to illustrate the technical solutions presented by the present invention. Alternative embodiments of the invention are described in detail below, however, the invention may have other implementations in addition to these detailed descriptions.
Referring to fig. 1, a driver intention recognition method includes the steps of:
step S1: collecting data of driver behaviors and image and video data of driving environment;
step S2: extracting key visual features from the data of step S1 for subsequent intention recognition;
step S3: carrying out semantic analysis on the acquired driving environment image and video data, and extracting key semantic information;
step S4: and fusing various feature information according to the visual feature extraction and the semantic analysis to obtain a driver intention recognition result.
According to the driver intention recognition method, the face in the driver behavior data is first detected by an MTCNN neural network, and the detected face is then recognized with the FaceNet face recognition algorithm to obtain the driver's identity information. Facial expression recognition is performed with the FERNET network: the driver's facial expressions are analyzed and the current expression state, such as anger, happiness or confusion, is extracted from the driver behavior data to judge the driver's emotion and intention. Meanwhile, the driver's hand actions are recognized through the space-time attention network to extract the type and state of the driver's gestures, such as turning the steering wheel or pressing a button, which helps to judge the driver's operation intention. On this basis, target objects in the driving environment are detected and tracked through the Faster R-CNN network to extract information such as their positions and motion states, further assisting the judgment of the driver's intention. The driver's head pose is then estimated through a 3D pose estimation network to acquire information such as the head rotation angle and direction and to judge the driver's line of sight and direction of attention, which helps to determine the driving scenario the driver is currently in and the corresponding intention.

A graph neural network semantic relation model is used to analyze the relations between target objects in the driving environment, for example the relation between traffic lights and pedestrians or between vehicles and road signs; these models provide richer semantic information that aids in determining the driver's intention. Using a recurrent neural network (RNN), the driver's behavior and the driving environment are analyzed comprehensively, and the driver's intention, such as lane changing, acceleration or parking, is inferred from the driver's behavior actions and the environmental semantic information; a long short-term memory (LSTM) model is then used to model and analyze the driver's behavior and the driving environment, taking the historical data of the current driving environment into account to predict events that may occur in the future, such as traffic jams and road conditions, so as to improve the accuracy of driver intention recognition.

Finally, the feature information obtained from visual feature extraction and semantic analysis is fused, so that feature information of different modalities is used comprehensively and the accuracy of driver intention recognition is improved. Post-processing reasoning is performed on the fused intention recognition result: statistical methods are used to infer the driver's next behavior, and the intention recognition result is converted into the corresponding driving operation, such as acceleration, braking or steering. In this way the driver's intention is translated into specific driving instructions to support the decision-making and control of the intelligent driving system.
Specifically, referring to fig. 2, the visual feature extraction in step S2 includes the following steps:
step S2.1: face detection and recognition, namely detecting the face in the driver behavior data through an MTCNN neural network, and recognizing the detected face through a FaceNet face recognition algorithm to acquire the identity information of the driver;
exemplary, specific steps of step S2.1 are as follows:
first, a driver behavior data set needs to be prepared for training and testing. The data set should contain the driver's behavior data together with the face images associated with each behavior, and the face images in the data set must be labeled with the corresponding identity information.
An image from the driver behavior data is input to the MTCNN model, which outputs the detected face positions;
the face regions detected by MTCNN are extracted from the original image and preprocessed; the preprocessing step can include cropping, resizing and normalization, so that all face images have the same size and feature representation;
the preprocessed face images are recognized with the FaceNet face recognition algorithm, which maps each face image into a high-dimensional feature space and computes the similarity between faces; the most similar identity can be found by comparing the face image to be recognized with face images of known identity;
a suitable similarity threshold is selected based on the FaceNet output and the face to be recognized is compared with the faces of known identity; if the similarity exceeds the threshold, the face is considered to belong to that known identity and the corresponding identity information is extracted.
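The detection-plus-recognition pipeline of step S2.1 could look roughly as follows. This is a minimal sketch assuming the facenet-pytorch package (which bundles an MTCNN detector and a FaceNet-style embedding network); the enrollment dictionary, distance metric and threshold value are illustrative placeholders rather than values prescribed by the patent.

```python
import torch
from facenet_pytorch import MTCNN, InceptionResnetV1
from PIL import Image

mtcnn = MTCNN(image_size=160)                                # face detector
embedder = InceptionResnetV1(pretrained='vggface2').eval()   # FaceNet-style embedder

def identify_driver(frame_path, known_embeddings, threshold=0.8):
    """Return the enrolled driver id whose embedding is closest, or None."""
    face = mtcnn(Image.open(frame_path).convert('RGB'))      # cropped, normalized face tensor
    if face is None:
        return None                                           # no face detected in this frame
    with torch.no_grad():
        emb = embedder(face.unsqueeze(0))                     # 512-d embedding
    # Compare against enrolled drivers by Euclidean distance.
    best_id, best_dist = None, float('inf')
    for driver_id, ref in known_embeddings.items():
        dist = torch.dist(emb, ref).item()
        if dist < best_dist:
            best_id, best_dist = driver_id, dist
    return best_id if best_dist < threshold else None
```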
Step S2.2: expression analysis, namely analyzing the facial expression of the driver through a FERNET network and extracting the driver's current expression state, including anger, happiness and confusion, through a facial expression recognition algorithm;
exemplary, specific steps of step S2.2 are as follows:
firstly, preparing a driver facial expression data set for training and testing, wherein the data set contains facial expression images of a driver under different emotions and is marked with corresponding expression categories;
the data set is divided into a training set and a testing set; the FERNET network is trained on the training set by inputting the preprocessed expression images into the network and updating its weights and biases with a back-propagation algorithm and a suitable optimizer (such as Adam), and a suitable loss function (such as cross-entropy loss) is defined to measure the difference between the model's predictions and the ground-truth labels during training;
setting a threshold according to the output result of the FERNET network, and judging that the driver is in an anger state when the prediction probability of the anger class exceeds the threshold; when the prediction probability of the happy category exceeds a threshold value, judging that the driver is in a happy state; when the prediction probability of the confusion class exceeds the threshold value, it is judged that the driver is in a confusion state.
Step S2.3: gesture recognition, namely recognizing hand actions of a driver through a space-time attention network, and extracting the type and state of the gesture of the driver;
exemplary, specific steps of step S2.3 are as follows:
first, a driver hand motion data set is prepared for training and testing. The data set should contain video sequences of the driver under different gestures and be annotated with the corresponding gesture types and states.
The space-time attention network is trained on the training set by inputting the preprocessed video sequences into the network, and a loss function measures the difference between the model's predictions and the ground-truth labels;
setting a threshold according to the output result of the space-time attention network, and judging that the gesture is being performed by the driver when the prediction probability of a certain gesture type exceeds the threshold; when the prediction probability of a certain gesture state exceeds a threshold value, the driver is judged to be in the gesture state.
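As an illustration of the gesture-recognition idea, the sketch below shows a temporal-attention-only simplification of a space-time attention classifier operating on pre-extracted per-frame features; the feature dimension, number of gesture classes and single-layer attention are assumptions for illustration, not the architecture the patent specifies.

```python
import torch
import torch.nn as nn

class TemporalAttentionGestureNet(nn.Module):
    def __init__(self, feat_dim=512, num_gestures=5):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)              # scores each frame
        self.classifier = nn.Linear(feat_dim, num_gestures)

    def forward(self, frame_feats):                     # (batch, time, feat_dim)
        scores = self.attn(frame_feats)                 # (batch, time, 1)
        weights = torch.softmax(scores, dim=1)          # attention over time
        clip_feat = (weights * frame_feats).sum(dim=1)  # weighted temporal pooling
        return self.classifier(clip_feat)               # gesture logits

# Example: a batch of 2 clips, 16 frames each, with 512-d per-frame features
logits = TemporalAttentionGestureNet()(torch.randn(2, 16, 512))
print(logits.shape)  # torch.Size([2, 5])
```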
Step S2.4: spatial pose estimation, namely estimating the head pose of the driver through a 3D pose estimation network to acquire information such as the rotation angle and direction of the head;
exemplary, specific steps of step S2.4 are as follows:
first, a driver head pose data set for training and testing is prepared. The data set should contain head images or video sequences of the driver in different postures, and corresponding information such as head rotation angle and direction is marked.
The 3D pose estimation network is trained on the training set by inputting the preprocessed head images or video sequences into the network;
according to the output of the 3D pose estimation network, information such as the rotation angle and direction of the driver's head can be obtained; the head rotation angle and direction can be extracted from the Euler angles or the rotation matrix output by the network.
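A brief sketch of how head rotation angles could be recovered from a rotation matrix output by the pose network; the ZYX (yaw-pitch-roll) convention and the use of SciPy are assumptions, since the patent does not fix a convention or library.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def head_angles_from_matrix(rot_matrix: np.ndarray):
    """Convert a 3x3 head rotation matrix to yaw, pitch, roll in degrees."""
    yaw, pitch, roll = Rotation.from_matrix(rot_matrix).as_euler('ZYX', degrees=True)
    return {"yaw": yaw, "pitch": pitch, "roll": roll}

# Example: a head turned 30 degrees about the vertical axis
R = Rotation.from_euler('ZYX', [30.0, 0.0, 0.0], degrees=True).as_matrix()
print(head_angles_from_matrix(R))  # approximately {'yaw': 30.0, 'pitch': 0.0, 'roll': 0.0}
```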
Step S2.5: driving scene analysis, namely performing target detection and tracking on images of the driving environment through a Faster R-CNN model, and extracting scene information such as road signs, traffic lights, pedestrians and obstacles.
Exemplary, specific steps of step S2.5 are as follows:
first, a driving environment image dataset for training and testing is prepared. The dataset should contain images of various objects in the driving environment (e.g., road signs, traffic lights, pedestrians, obstacles, etc.), and be labeled with bounding box locations and category labels for the corresponding objects.
The Faster R-CNN model is trained on the training set by inputting the preprocessed driving environment images into the model, which then outputs the detection results;
according to the output result of the fast R-CNN model, scene information such as road signs, traffic lights, pedestrians, obstacles and the like in a driving environment can be extracted.
Specifically, referring to fig. 3, the specific steps for extracting the key semantic information in step S3 are as follows:
step S3.1: dividing the driving environment image, extracting semantic information of different areas, specifically comprising dividing different areas such as roads, sky, buildings and the like, and deducing semantic meaning of each area;
step S3.2: establishing a semantic relation model through a graph neural network, analyzing the relation between target objects in a driving environment, and judging the relation between traffic signals and pedestrians and the relation between vehicles and road signs;
step S3.3: action recognition and intention reasoning, namely comprehensively analyzing the driver's behavior and the driving environment using action recognition and intention reasoning algorithms, and inferring the driver's intention from the driver's behavior actions and the environmental semantic information;
step S3.4: modeling and analyzing the driver behavior and the driving environment through a recurrent neural network in combination with historical data, and predicting possible future events, including traffic jams and road conditions, by taking the historical data of the current driving environment into account (a minimal code sketch of this prediction step follows the list).
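A minimal sketch of the prediction step S3.4 referenced above, assuming each timestep is summarized by a fused driver-behavior/environment feature vector; the feature size, hidden size and the three event classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DrivingEventPredictor(nn.Module):
    """LSTM over historical driving-context features -> future event logits."""
    def __init__(self, feat_dim=64, hidden_dim=128, num_events=3):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_events)   # e.g. jam / slowdown / clear

    def forward(self, history):                          # (batch, time, feat_dim)
        _, (h_n, _) = self.lstm(history)                 # final hidden state summarizes history
        return self.head(h_n[-1])                        # (batch, num_events)

# Example: 30 timesteps of fused behavior + environment features
logits = DrivingEventPredictor()(torch.randn(4, 30, 64))
print(logits.shape)  # torch.Size([4, 3])
```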
Specifically, referring to fig. 4, the specific process of the multi-modal fusion in step S4 is as follows:
step S4.1: fusing the feature information from visual feature extraction and semantic analysis, specifically fusing the visual features of face recognition, expression analysis and gesture recognition with the semantic features of target detection and scene segmentation to form a multi-modal feature vector;
step S4.2: according to the contribution degree of different modal information, the weight of the characteristic information is adjusted, and according to the complexity degree of a driving scene and the criticality of the behavior of a driver, the weight of different characteristics is adjusted, so that in the fusion process, important characteristic information can better influence the final intention recognition result;
step S4.3: when a conflict exists among different modal information, conflict resolution is carried out; for example, when the expression analysis result shows that the driver is in an angry state but other visual features and semantic analysis results indicate that the driving scene is normal, the conflict must be resolved to determine the final intention recognition result;
step S4.4: and predicting the intention of the driver at the current moment by combining the intention recognition result at the previous moment and the current driving environment state, obtaining a more accurate intention recognition result and adapting to the change of the driving environment.
Specifically, in step S4.2, the specific process of adjusting the weights of the feature information is as follows (a code sketch follows this list):
first, the weight of each modality is initialized, either by random assignment or according to prior knowledge;
second, the feature data of each modality in the multi-modal feature vector is forward-propagated through that modality's network, and the output value of each modality is computed;
third, based on the output value of each modality, the per-modality losses are weighted and summed to obtain a total loss value;
fourth, the total loss value is back-propagated through the network, and the contribution of each modality to the total loss is computed;
fifth, the gradient of each modality weight is computed according to that modality's contribution to the total loss;
sixth, a gradient descent optimizer is used to update the weight of each modality.
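The six steps above could be realized, for instance, with learnable per-modality weights trained jointly with the modality networks; the sketch below is an illustrative PyTorch interpretation in which the stand-in linear heads, feature sizes and softmax-normalized weights are assumptions, not the patent's concrete design.

```python
import torch
import torch.nn as nn

modalities = ["visual", "expression", "semantic"]
nets = nn.ModuleDict({m: nn.Linear(32, 4) for m in modalities})    # stand-in per-modality heads
weights = nn.Parameter(torch.ones(len(modalities)))                # step 1: initialize weights
optimizer = torch.optim.SGD(list(nets.parameters()) + [weights], lr=0.01)
criterion = nn.CrossEntropyLoss()

def training_step(features, target):
    """features: dict of (batch, 32) tensors per modality; target: (batch,) labels."""
    optimizer.zero_grad()
    losses = []
    for m in modalities:
        out = nets[m](features[m])                 # step 2: forward pass per modality
        losses.append(criterion(out, target))      # per-modality loss
    w = torch.softmax(weights, dim=0)              # keep weights positive, summing to 1
    total_loss = (w * torch.stack(losses)).sum()   # step 3: weighted total loss
    total_loss.backward()                          # steps 4-5: per-modality contributions/gradients
    optimizer.step()                               # step 6: gradient-descent weight update
    return total_loss.item(), w.detach().tolist()

# Example call with random data
feats = {m: torch.randn(8, 32) for m in modalities}
print(training_step(feats, torch.randint(0, 4, (8,))))
```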
Specifically, in step S4.3, when there is a conflict between different modality information, a specific process of resolving the conflict is as follows:
conflicts are detected in the fused multi-modal data by a support vector machine model and the conflicting regions of the fused data are marked; for each detected conflict, a decision tree algorithm makes the decision by combining the weights of the modal information, thereby determining the final intention recognition result.
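A minimal sketch of this conflict-handling logic, using scikit-learn's SVC as the conflict detector and DecisionTreeClassifier as the resolver; the random training data, feature layout, three modality weights and intent classes are illustrative placeholders, not the patent's actual data or models.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n, d = 200, 16
X_fused = rng.normal(size=(n, d))               # fused multi-modal feature vectors
W = rng.dirichlet(np.ones(3), size=n)           # per-sample modality weights (3 modalities)
conflict_labels = rng.integers(0, 2, size=n)    # 1 = modalities disagree (annotated)
intent_labels = rng.integers(0, 3, size=n)      # e.g. lane change / brake / keep lane

# SVM marks conflicting regions of the fused data; the decision tree decides the
# final intention from the fused features together with the modality weights.
conflict_detector = SVC(kernel="rbf").fit(X_fused, conflict_labels)
resolver = DecisionTreeClassifier(max_depth=5).fit(np.hstack([X_fused, W]), intent_labels)

def resolve_intention(fused_vec, modality_weights, fused_prediction):
    """Keep the fusion result unless the SVM flags a conflict; then let the
    decision tree re-decide using the modality weights."""
    if conflict_detector.predict(fused_vec.reshape(1, -1))[0] == 1:
        augmented = np.concatenate([fused_vec, modality_weights]).reshape(1, -1)
        return int(resolver.predict(augmented)[0])
    return fused_prediction

print(resolve_intention(X_fused[0], W[0], fused_prediction=2))
```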
The foregoing is only a description of preferred embodiments of the invention and is not intended to limit the invention; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (6)
1. A driver intention recognition method, characterized in that the method comprises the steps of:
step S1: collecting data of driver behaviors and image and video data of driving environment;
step S2: extracting key visual features from the data of the step S1 for subsequent intention recognition;
step S3: carrying out semantic analysis on the acquired driving environment image and video data, and extracting key semantic information;
step S4: and fusing various feature information according to the visual feature extraction and the semantic analysis to obtain a driver intention recognition result.
2. The method for recognizing the intention of a driver according to claim 1, wherein the visual feature extraction in the step S2 specifically includes the steps of:
step S2.1: face detection and recognition, namely detecting the face in the driver behavior data through an MTCNN neural network, and recognizing the detected face through a FaceNet face recognition algorithm to acquire the identity information of the driver;
step S2.2: the method comprises the following steps of performing expression analysis, namely analyzing the facial expression of a driver through a FERNET network, and extracting the current expression state of the driver through a facial expression recognition algorithm, wherein the expression state comprises anger, happiness and confusion;
step S2.3: gesture recognition, namely recognizing hand actions of a driver through a space-time attention network, and extracting the type and state of the gesture of the driver;
step S2.4: estimating the spatial pose, namely estimating the head pose of a driver through a 3D pose estimation network to acquire information such as the rotation angle and the direction of the head;
step S2.5: analyzing driving scenes, namely performing target detection and tracking on images of driving environments through a Faster R-CNN model, and extracting scene information such as road signs, traffic lights, pedestrians, obstacles and the like.
3. The method for identifying the intention of the driver according to claim 2, wherein the specific step of extracting the key semantic information in the step S3 is as follows:
step S3.1: dividing the driving environment image, extracting semantic information of different areas, specifically comprising dividing different areas such as roads, sky, buildings and the like, and deducing semantic meaning of each area;
step S3.2: establishing a semantic relation model through a graph neural network, analyzing the relation between target objects in a driving environment, and judging the relation between traffic signals and pedestrians and the relation between vehicles and road signs;
step S3.3: action recognition and intention reasoning utilize action recognition and intention reasoning algorithm to comprehensively analyze the behavior and driving environment of the driver, and deduce the intention of the driver by analyzing the behavior action and environment semantic information of the driver;
step S3.4: and modeling and analyzing the driver behaviors and the driving environment through a recurrent neural network by combining the historical data, and predicting possible future events, including traffic jams and road conditions, by considering the historical data of the current driving environment.
4. A method for identifying driver's intention according to claim 3, wherein the specific procedure of the multi-modal fusion in step S4 is as follows:
step S4.1: the method comprises the steps of carrying out fusion on feature information from visual feature extraction and semantic analysis, and specifically comprises the steps of fusing visual features of face recognition, expression analysis and gesture recognition with semantic features of target detection and scene segmentation to form a multi-modal feature vector;
step S4.2: according to the contribution degree of different modal information, the weight of the characteristic information is adjusted, and according to the complexity degree of a driving scene and the criticality of the behavior of a driver, the weight of different characteristics is adjusted, so that in the fusion process, important characteristic information can better influence the final intention recognition result;
step S4.3: when conflict exists among different modal information, conflict resolution is carried out, when the face recognition result shows that the driver is in an anger state, but other visual characteristics and semantic analysis results indicate that the driving scene is normal, the conflict resolution is needed, and the final intention recognition result is determined;
step S4.4: and predicting the intention of the driver at the current moment by combining the intention recognition result at the previous moment and the current driving environment state, obtaining a more accurate intention recognition result and adapting to the change of the driving environment.
5. The method for identifying the intention of the driver according to claim 4, wherein in the step S4.2, the specific process of weighting the characteristic information is as follows:
firstly, setting by random assignment or according to priori knowledge, and initializing the weight of each mode;
secondly, forward propagation is carried out on various characteristic information data in the multi-mode characteristic vector through each mode, and an output value of each mode is calculated;
thirdly, calculating a total loss value by weighting and summing the losses of each mode based on the output value of each mode;
fourth, the loss values are back propagated to the network, and the contribution degree of each mode to the total loss is calculated;
fifthly, calculating the gradient of each modal weight according to the contribution degree of each modal to the total loss;
sixth, the gradient descent optimizer is used to update the weights of each modality.
6. The method for identifying the intention of the driver according to claim 5, wherein in the step S4.3, when there is a conflict between different modality information, the specific process of performing the conflict resolution is as follows:
the conflict is detected in the data after the multi-mode fusion through the support vector machine model, the area with the conflict in the fusion data is marked, and for the detected conflict, the decision is made through combining the weight of each mode information through the decision tree algorithm, so that the final intention recognition result is determined.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311618805.8A CN117485348B (en) | 2023-11-30 | 2023-11-30 | Driver intention recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311618805.8A CN117485348B (en) | 2023-11-30 | 2023-11-30 | Driver intention recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117485348A true CN117485348A (en) | 2024-02-02 |
CN117485348B CN117485348B (en) | 2024-07-19 |
Family
ID=89676424
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311618805.8A Active CN117485348B (en) | 2023-11-30 | 2023-11-30 | Driver intention recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117485348B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020074565A1 (en) * | 2018-10-11 | 2020-04-16 | Continental Automotive Gmbh | Driver assistance system for a vehicle |
CN109658503A (en) * | 2018-12-29 | 2019-04-19 | 北京理工大学 | A kind of driving behavior intention detection method merging EEG signals |
CN113392692A (en) * | 2020-02-26 | 2021-09-14 | 本田技研工业株式会社 | Driver-centric risk assessment: risk object identification via causal reasoning for intent-aware driving models |
CN112434588A (en) * | 2020-11-18 | 2021-03-02 | 青岛慧拓智能机器有限公司 | Inference method for end-to-end driver expressway lane change intention |
CN113386775A (en) * | 2021-06-16 | 2021-09-14 | 杭州电子科技大学 | Driver intention identification method considering human-vehicle-road characteristics |
CN115027484A (en) * | 2022-05-23 | 2022-09-09 | 吉林大学 | Human-computer fusion perception method for high-degree automatic driving |
Also Published As
Publication number | Publication date |
---|---|
CN117485348B (en) | 2024-07-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||