CN117218716B - DVS-based automobile cabin gesture recognition system and method - Google Patents
Abstract
The invention discloses a DVS-based automobile cabin gesture recognition system and method. The system comprises a perception layer, a decision layer, and an execution layer connected in sequence: the perception layer acquires gesture images of the user with a DVS; the decision layer classifies the gesture image signals through an algorithm processing module and an output module; and the execution layer executes the gesture command through a feedback module. The method collects gesture data with the DVS, transmits the image signal to the algorithm processing module, extracts depth information and three-dimensional hand-skeleton features for multi-mode fusion, outputs a signal to the execution layer after algorithm verification, signal analysis, and model verification in the output module, and finally has the feedback module execute the command. The system and method can filter out the gesture-habit differences of individual users and address the problems of low recognition accuracy and speed.
Description
Technical Field
The invention relates to a gesture recognition system, in particular to a DVS-based automobile cabin gesture recognition system and method, and belongs to the technical field of automobile cabins and vehicle-mounted gesture recognition.
Background
With the development of automobile intelligence, gesture recognition technology is widely applied to human-machine interaction and control in automobile cabins, enriching people's daily lives and providing a pleasant experience. Advances in human-machine interaction can improve driving safety and convenience, reduce the visual and operational load on the driver, and have pushed in-cabin gesture interaction systems into a period of rapid development.
In the prior art, as disclosed in publication CN110119200A, an infrared distance sensor is wired to an optical sensor, which is in turn wired to a gyroscope, an accelerometer, and an MCU processor; the MCU processor is wired to a gesture storage module. Such an automobile gesture recognition system can reduce the number of physical keys, simplify vehicle operation, make it faster, more efficient, and more intelligent, and even provide some entertainment for the operator. However, existing cabin gesture recognition is mainly static; wearable sensor devices, though common and relatively robust, are expensive to deploy and unsuitable for mass production. Vehicle-mounted static gesture recognition, although accurate and easy to implement, no longer meets current production and living standards. Based on DVS, gesture actions can instead be captured in real time: spatio-temporal features are fused through LSTM and MobileNetV3 networks to build a three-dimensional gesture database of the user's gestures, and the matched output signals are fed back in real time. Compared with a conventional camera, this approach offers a clear precision advantage and sharper imagery.
Therefore, to overcome the drawbacks of the prior art and meet the market demand for intelligent automobile cabins, it is necessary to design a DVS-based gesture recognition system and method for the intelligent automobile cabin that solve the above problems.
Disclosure of Invention
The invention aims to solve at least one of the above technical problems by providing a DVS-based automobile cabin gesture recognition system and method with high recognition accuracy and safety, whose gesture learning and optimization can adapt intelligently to environmental changes and provide the user with a good human-machine interaction experience.
The invention achieves the above purpose through the following technical scheme: a DVS-based automobile cabin gesture recognition system comprises a perception layer, a decision layer, and an execution layer connected in sequence, characterized in that the perception layer consists of a DVS, the decision layer consists of an algorithm processing module and an output module, and the execution layer consists of a feedback module.
As still further aspects of the invention: the DVS of the perception layer is assembled within the cabin, captures the gesture motion (brightness or distance change information) of the driver or passenger with microsecond time resolution, and generates time-related events that carry the time stamp and spatial location information of event findings.
As still further aspects of the invention: the DVS is configured to receive the events, filter, cluster, and sort the time to reconstruct a time series of gesture actions.
As still further aspects of the invention: the decision layer receives gesture signals acquired by the perception layer DVS, processes the gesture signals through a decision layer algorithm and outputs the gesture signals to the execution layer.
As still further aspects of the invention: the algorithm processing module combines the depth information and the three-dimensional hand skeleton information characteristics, and carries out multi-mode fusion with the LSTM through the MobileNet V3.
As still further aspects of the invention: the output module processes the information output by the algorithm processing module through algorithm verification, signal analysis and model verification, and inputs the information to the execution layer;
wherein, the output module comprises the following steps:
s1, constructing a DVS gesture library: collecting various three-dimensional gesture data according to vehicle-mounted function requirements and interaction habits, and constructing a three-dimensional gesture library, wherein each gesture corresponds to a vehicle control operation;
s2, gesture matching: performing gesture matching on the acquired three-dimensional gesture image sequence by using a fusion model of LSTM and MobilenetV3, and matching with each gesture template in a gesture library to obtain the most matched gesture category and matching degree;
s3, gesture filtering: setting a threshold value based on the matching degree of the three-dimensional gestures, filtering out gestures with lower matching degree, and only selecting gestures with matching degree higher than the threshold value to perform subsequent control operation;
s4, control instruction generation: generating a corresponding vehicle control instruction according to a gesture template which is most matched with the input three-dimensional gesture;
s5, scene judgment: judging the driving scene of the current vehicle, if the gesture operation which is the best match is not matched with the current scene, not generating a control instruction, and giving a warning;
s6, visual feedback: displaying the most matched gesture templates on the vehicle-mounted display screen, and displaying the execution effect of the corresponding vehicle-mounted functions;
s7, operation record: recording a three-dimensional gesture operation process of a driver and a feedback process of a system, wherein the three-dimensional gesture operation process and the feedback process are used for incremental learning of a gesture library and optimization of template matching;
s8, matching optimization: the matching model between the three-dimensional gesture and the template is continuously optimized by using an incremental learning method.
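Steps S2–S4 above (gesture matching, threshold filtering, control instruction generation) can be sketched as follows. The template vectors, gesture names, cosine-similarity metric, and 0.8 threshold are hypothetical stand-ins for the LSTM/MobileNetV3 matching model described in the text.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (a simple stand-in
    for the fusion model's matching score)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical gesture library: template feature vector per vehicle operation.
GESTURE_LIBRARY = {
    "open_window": [1.0, 0.0, 0.2],
    "volume_up":   [0.1, 1.0, 0.0],
}

def match_gesture(feature, threshold=0.8):
    """S2/S3: match a fused feature vector against every template and
    return (gesture, score) for the best match above the threshold, or
    (None, score) so that no control instruction is generated (S4)."""
    best, score = max(
        ((name, cosine(feature, tpl)) for name, tpl in GESTURE_LIBRARY.items()),
        key=lambda p: p[1],
    )
    return (best, score) if score >= threshold else (None, score)
```

Returning `None` below the threshold implements the filtering of low-confidence gestures, which is what prevents spurious vehicle commands.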
As still further aspects of the invention: the feedback module of the execution layer receives the command signal output by the decision layer and executes the command, and the output mode comprises air conditioner air volume, media volume, window lifting and center control screen page turning;
wherein, the feedback module comprises the following steps:
s1, visual feedback: the best matched gesture template and corresponding vehicle-mounted function execution effect (such as opening of a vehicle window) are displayed on a vehicle-mounted display screen, so that proper visual feedback is given to a driver, and gestures are convenient to correct or re-input;
s2, voice feedback: the system informs the driver of the most matched gesture operation and the executed vehicle-mounted control instruction in a voice mode, and carries out necessary voice reminding and interaction;
s3, executing the function: and after receiving the control instruction generated by the mapping, the vehicle-mounted system controls the corresponding vehicle-mounted functional module to perform operation execution (such as opening a vehicle window). The execution result is also displayed as a feedback;
s4, matching results: the system informs the driver of the matching result between the three-dimensional gesture and the gesture template, including the gesture type and the matching degree of the best matching. This can also be used as a feedback for the driver to judge the accuracy of the gesture input;
s5, reporting errors and reminding: if the system detects that the three-dimensional gesture is not matched with the current driving scene and a control instruction is not generated, a fault reporting prompt is given in a visual voice mode, and a driver is prompted to input the gesture again;
s6, operation record: the system records the whole three-dimensional gesture operation process and feedback process for subsequent analysis of interaction effect and improvement of experience. The recorded content can also be used for online learning and optimization of a matching model;
s7, user evaluation: the system inquires evaluation feedback of the three-dimensional gesture interaction effect to the driver, and accordingly selects whether the matching model and the interaction rule need to be updated to achieve personalized optimization.
As still further aspects of the invention: the feedback module specifically comprises: the air conditioner, the volume, the central control screen, the car window and the like can respond differently when receiving command signals, wherein the central control screen can respond according to gesture page scrolling, page turning, space clicking and the like; the volume can be adjusted according to the distance between the thumb and the index finger in the z direction of the space coordinate system; the vehicle window can ascend or descend according to the up-and-down swing of the gesture, and the front-and-back swing enables the skylight to be opened or closed; the air quantity of the air conditioner is adjusted according to the distance between the thumb and the index finger in the x direction of the space coordinate system.
As still further aspects of the invention: the feedback module performs feedback output and mainly comprises the following gesture and control instruction mapping activation: when the index finger is touched with the thumb, the distance between the thumb and the index finger is changed to output the media volume adjusting signal, and the size of the media volume is changed; when the palms are horizontal and the rest four thumbs point downwards to swing upwards and downwards, the vehicle window descends along with the palms, and otherwise ascends; when the palm stands up and swings forward and backward, the sunroof at the top of the vehicle is opened or closed forward and backward along with the palm; when the human-computer interaction relates to a vehicle-mounted screen, a fist is held and the index finger is singly extended to move left and right up and down, the left and right page turning of the screen page can be controlled, the sliding of the up and down contents can be controlled, when the index finger joint has larger angle change, a single click screen command is responded at the corresponding position of the screen once, the time interval is less than 0.5 seconds, the double click command is continuously generated twice, the double click operation is carried out on the screen once; when a fist-making gesture command occurs, the user can autonomously judge whether the running environment is safe or not, and the vehicle immediately enters a decelerating running state.
The DVS-based automobile cabin gesture recognition method comprises an automobile cabin gesture recognition system, wherein the gesture recognition method is based on a multi-mode fusion network model for detection, and comprises the following steps of:
s1, capturing gesture actions of a user in real time through DVS, generating a gesture sequence frame image, forming an event sequence of the gesture image, further processing the event sequence as original input of a network to extract time sequence and spatial characteristics, acquiring gestures with different visual angles and modes, enriching input information, and improving detection robustness;
s2, detecting gesture key points: detecting gesture key points in each frame of images on the time sequence of the gesture images by using a key point detection model OpenPose, obtaining time sequence coordinates of the gesture key points with multiple views, and capturing fine action characteristics and three-dimensional space information of the gestures;
s3, preprocessing the gesture images: performing scale normalization, image rotation, noise filtering, frame selection, and modal registration on the acquired gesture image time series to improve the matching degree and feature-extraction quality across the time series of different modalities;
s4, extracting spatial features of the gesture image: carrying out feature extraction on each frame of image of the preprocessed gesture image time sequence by using a lightweight MobilenetV3 and other networks so as to acquire advanced spatial feature mapping of the images, and enhancing understanding of gesture details;
s5, multi-mode space-time feature network fusion: fusing the multi-view gesture key point time series and the image spatial feature maps obtained in steps S2 and S4, constructing the space-time features of the gesture, and using them as the input of an LSTM network for gesture detection;
s6, post-processing of detection results: post-processing is carried out on the prediction result of the LSTM, wherein the post-processing comprises multi-mode feature mapping, coordinate mapping, smoothing processing and three-dimensional reconstruction to obtain gesture types, multi-view key point time sequences and three-dimensional gesture space information.
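The fusion in steps S2–S5 can be sketched structurally as follows. The two extractors are deliberately trivial stand-ins — a real system would run MobileNetV3 for the spatial features and OpenPose for the keypoints — but the shape of the fused `(T, D)` matrix the LSTM would consume is the point of the sketch.

```python
import numpy as np

def spatial_features(frame):
    """Stand-in for MobileNetV3: reduce an HxW frame to a small spatial
    feature vector (hypothetical; a real system would run the CNN)."""
    return np.array([frame.mean(), frame.std(), frame.max()])

def keypoint_features(keypoints):
    """Stand-in for OpenPose output: flatten (N, 2) keypoint coordinates
    into a single per-frame vector."""
    return np.asarray(keypoints, dtype=float).ravel()

def fuse_sequence(frames, keypoint_seq):
    """S5: concatenate per-frame spatial and keypoint features into a
    (T, D) spatio-temporal matrix — the sequence an LSTM would consume."""
    rows = [
        np.concatenate([spatial_features(f), keypoint_features(k)])
        for f, k in zip(frames, keypoint_seq)
    ]
    return np.stack(rows)
```

Fusing at the feature level, rather than deciding per modality and voting, is what lets the temporal model see image appearance and hand geometry jointly at every time step.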
The beneficial effects of the invention are as follows:
1. The system can be personalized: by recording and analyzing the driver's gesture operations and evaluating the feedback effect, the system can optimize three-dimensional gesture interaction for the individual, selecting the interaction rules and mapping model that best match the driver's operating habits and providing a personalized human-machine interaction experience;
2. The operation record and feedback mechanism can provide data to support the after-the-fact investigation of traffic accidents, making it easier to determine how a control instruction was generated and who is responsible, and helping to continuously improve the system's safety guarantees and error-proofing, thereby improving safety.
3. The large amount of collected gesture operation data creates the conditions for system learning and optimization; analyzing this data deepens the system's understanding of complex scenes and personal habits, continuously improves the human-machine interaction mechanism, and enables intelligent adaptation to environmental changes.
4. The three-dimensional gesture interaction system and its targeted control feedback mechanism are well suited to intelligent driving scenarios, achieving close coordination between natural gesture interaction and the vehicle-mounted system, improving the driver's operating convenience and the system's usability, and offering high practical value.
Drawings
FIG. 1 is a block diagram of a gesture recognition system of the present invention;
FIG. 2 is a block diagram of a gesture recognition method according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Embodiment one: as shown in fig. 1, a DVS-based gesture recognition system for an automobile cabin includes a sensing layer, a decision layer, and an executing layer, which are sequentially connected, and is characterized in that: the perception layer consists of DVS; the decision layer consists of an algorithm processing module and an output module; the execution layer is composed of a feedback module.
Embodiment two: in addition to all the technical features in the first embodiment, the present embodiment further includes:
the DVS of the perception layer is assembled within the cabin, captures the gesture motion (brightness or distance change information) of the driver or passenger with microsecond time resolution, and generates time-related events that carry the time stamp and spatial location information of event findings.
The DVS is configured to receive the events, filter and cluster them, and sort them by time to reconstruct a time series of gesture actions.
The decision layer receives gesture signals acquired by the perception layer DVS, processes the gesture signals through a decision layer algorithm and outputs the gesture signals to the execution layer.
The algorithm processing module combines the depth information with the three-dimensional hand-skeleton features and performs multi-mode fusion through MobileNetV3 and LSTM.
Embodiment III: in addition to all the technical features in the first embodiment, the present embodiment further includes:
the output module processes the information output by the algorithm processing module through algorithm verification, signal analysis and model verification, and inputs the information to the execution layer;
wherein, the output module comprises the following steps:
s1, constructing a DVS gesture library: collecting various three-dimensional gesture data according to vehicle-mounted function requirements and interaction habits, and constructing a three-dimensional gesture library, wherein each gesture corresponds to a vehicle control operation;
s2, gesture matching: performing gesture matching on the acquired three-dimensional gesture image sequence by using a fusion model of LSTM and MobilenetV3, and matching with each gesture template in a gesture library to obtain the most matched gesture category and matching degree;
s3, gesture filtering: setting a threshold value based on the matching degree of the three-dimensional gestures, filtering out gestures with lower matching degree, and only selecting gestures with higher matching degree than the threshold value for subsequent control operation, so that the accuracy of instructions and the robustness of a system can be improved;
s4, control instruction generation: generating the corresponding vehicle control instruction according to the gesture template that best matches the input three-dimensional gesture; for example, if the operation corresponding to the best-matched template is opening the left front window, a control instruction to open the left front window is generated;
s5, scene judgment: judging the driving scene of the current vehicle, if the gesture operation which is the best match is not matched with the current scene, not generating a control instruction, and giving an alarm. The misoperation caused by scene change in the three-dimensional gesture interaction process can be avoided;
s6, visual feedback: the best matched gesture template is displayed on the vehicle-mounted display screen, the execution effect of the corresponding vehicle-mounted function is displayed, appropriate visual feedback is given to a driver, and the gesture is convenient to correct or re-input;
s7, operation record: and recording a three-dimensional gesture operation process of a driver and a feedback process of a system, and optimizing incremental learning and template matching of a gesture library to realize personalized customization.
S8, matching optimization: the incremental learning method is used for continuously optimizing a matching model between the three-dimensional gesture and the template, so that accuracy and robustness of gesture matching are improved, and a foundation is provided for realizing high-performance man-machine interaction.
The output method can improve the data processing speed, maintain good accuracy and robustness and provide personalized man-machine interaction.
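The incremental learning in step S8 can be sketched minimally as follows. The exponential-moving-average update rule and the `alpha` learning rate are assumptions standing in for whatever online optimization the matching model actually uses.

```python
def update_template(template, sample, alpha=0.1):
    """S8: nudge a stored gesture template toward a newly confirmed
    gesture sample via an exponential moving average, so the library
    gradually adapts to the individual driver's gesture habits."""
    return [(1 - alpha) * t + alpha * s for t, s in zip(template, sample)]
```

Updating only on confirmed matches (e.g. after the driver's positive evaluation in S7) keeps misrecognized gestures from corrupting the template.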
Embodiment four: in addition to all the technical features in the first embodiment, the present embodiment further includes:
the feedback module of the execution layer receives the command signal output by the decision layer and executes the command, and the output mode comprises air conditioner air volume, media volume, window lifting and center control screen page turning;
wherein, the feedback module comprises the following steps:
s1, visual feedback: the best matched gesture template and corresponding vehicle-mounted function execution effect (such as opening of a vehicle window) are displayed on a vehicle-mounted display screen, so that proper visual feedback is given to a driver, and gestures are convenient to correct or re-input;
s2, voice feedback: the system informs the driver of the most matched gesture operation and the executed vehicle-mounted control instruction in a voice mode, and carries out necessary voice reminding and interaction;
s3, executing the function: and after receiving the control instruction generated by the mapping, the vehicle-mounted system controls the corresponding vehicle-mounted functional module to perform operation execution (such as opening a vehicle window). The execution result is also displayed as a feedback;
s4, matching results: the system informs the driver of the matching result between the three-dimensional gesture and the gesture template, including the gesture type and the matching degree of the best matching. This can also be used as a feedback for the driver to judge the accuracy of the gesture input;
s5, reporting errors and reminding: if the system detects that the three-dimensional gesture is not matched with the current driving scene and a control instruction is not generated, a fault reporting prompt is given in a visual voice mode, and a driver is prompted to input the gesture again;
s6, operation record: the system records the whole three-dimensional gesture operation process and feedback process for subsequent analysis of interaction effect and improvement of experience. The recorded content can also be used for online learning and optimization of a matching model;
s7, user evaluation: the system inquires evaluation feedback of the three-dimensional gesture interaction effect to the driver, and accordingly selects whether the matching model and the interaction rule need to be updated to achieve personalized optimization.
Through the feedback module, interactive feedback and communication of man-machine interaction can be effectively realized, dangerous operation is avoided, the intelligence of the system is enhanced, and the safety of the system is ensured.
Fifth embodiment: in addition to all the technical features in the first embodiment, the present embodiment further includes:
the feedback module specifically comprises: the air conditioner, the volume, the central control screen, the car window and the like can respond differently when receiving command signals, wherein the central control screen can respond according to gesture page scrolling, page turning, space clicking and the like; the volume can be adjusted according to the distance between the thumb and the index finger in the z direction of the space coordinate system; the vehicle window can ascend or descend according to the up-and-down swing of the gesture, and the front-and-back swing enables the skylight to be opened or closed; the air quantity of the air conditioner is adjusted according to the distance between the thumb and the index finger in the x direction of the space coordinate system.
The feedback output of the feedback module mainly activates the following gesture-to-control-instruction mappings: when the index finger touches the thumb, changing the thumb-index distance outputs a media-volume adjustment signal that changes the media volume; when the palm is horizontal with the other four fingers pointing downward and swings up and down, the window lowers with the palm, and otherwise rises; when the palm is upright and swings forward and backward, the sunroof opens or closes forward and backward with the palm; when the interaction involves the vehicle-mounted screen, a fist with only the index finger extended moving left-right or up-down controls page turning and content scrolling, a large angle change of the index finger joint triggers one single-click at the corresponding screen position, and two such changes within 0.5 seconds are combined into one double-click on the screen; when a fist-clenching gesture command occurs, the vehicle immediately enters a decelerating state, with the user judging autonomously whether the driving environment is safe.
Example six: as shown in fig. 2, a DVS-based car cabin gesture recognition method, including a car cabin gesture recognition system, the gesture recognition method detects based on a multi-mode fusion network model, the gesture recognition method includes the following steps:
s1, capturing gesture actions of a user in real time through DVS, generating a gesture sequence frame image, forming an event sequence of the gesture image, further processing the event sequence as original input of a network to extract time sequence and spatial characteristics, acquiring gestures with different visual angles and modes, enriching input information, and improving detection robustness;
s2, detecting gesture key points: detecting gesture key points in each frame of images on the time sequence of the gesture images by using a key point detection model OpenPose, obtaining time sequence coordinates of the gesture key points with multiple views, and capturing fine action characteristics and three-dimensional space information of the gestures;
s3, preprocessing the gesture images: performing scale normalization, image rotation, noise filtering, frame selection, and modal registration on the acquired gesture image time series to improve the matching degree and feature-extraction quality across the time series of different modalities;
s4, extracting spatial features of the gesture image: carrying out feature extraction on each frame of image of the preprocessed gesture image time sequence by using a lightweight MobilenetV3 and other networks so as to acquire advanced spatial feature mapping of the images, and enhancing understanding of gesture details;
s5, multi-modal spatio-temporal feature network fusion: fusing the multi-view gesture key point time sequence and the image spatial feature maps obtained in steps s2 and s4 to construct the spatio-temporal features of the gesture, which serve as the input of an LSTM network for gesture detection;
s6, post-processing of detection results: post-processing the prediction result of the LSTM, including multi-modal feature mapping, coordinate mapping, smoothing and three-dimensional reconstruction, to obtain the gesture category, the multi-view key point time sequence and the three-dimensional gesture spatial information.
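The six steps above can be sketched at the data-flow level. The OpenPose, MobileNetV3 and LSTM stages are replaced here by random stand-in functions, and the dimensions (21 hand key points, two views, a 576-dimensional feature vector) are illustrative assumptions:

```python
import numpy as np

# Illustrative dimensions: 21 hand key points x 2 views x (x, y) coordinates,
# plus a 576-dimensional spatial feature vector per frame.
N_FRAMES, KEYPOINT_DIM, CNN_DIM = 16, 21 * 2 * 2, 576

def extract_keypoints(frame):
    """Stand-in for OpenPose (s2): multi-view key point coordinates per frame."""
    return np.random.rand(KEYPOINT_DIM)

def extract_spatial_features(frame):
    """Stand-in for MobileNetV3 (s4): spatial feature vector per frame."""
    return np.random.rand(CNN_DIM)

def fuse_spatiotemporal(frames):
    """s5: concatenate per-frame key point and spatial features into the
    (n_frames, keypoint_dim + cnn_dim) sequence that feeds the LSTM."""
    rows = [np.concatenate([extract_keypoints(f), extract_spatial_features(f)])
            for f in frames]
    return np.stack(rows)

# s1 would supply event frames reconstructed from the DVS event stream.
event_frames = [np.zeros((128, 128)) for _ in range(N_FRAMES)]
sequence = fuse_spatiotemporal(event_frames)
assert sequence.shape == (N_FRAMES, KEYPOINT_DIM + CNN_DIM)
```

The fused `(frames, features)` matrix is exactly the shape an LSTM-style sequence model consumes; steps s3 and s6 would wrap this core with preprocessing and smoothing.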
Through multi-modal network fusion, a network model with high precision, strong real-time performance and high robustness can be obtained.
Since the method is realized based on a DVS, the overall stability of the gesture recognition system is improved: the accuracy of automobile cabin gesture recognition is effectively increased, recognition delay is reduced, the overall robustness of the system is improved, and a better human-computer interaction experience is provided. The operation record and feedback mechanism can provide data support for investigating possible traffic accidents after the fact, making it easier to determine how a control instruction was generated and to attribute responsibility, and helping to continuously improve the system's safety guarantee and error-proofing capability. The gesture recognition system and its targeted feedback mechanism are well suited to intelligent driving scenarios, realizing natural gesture interaction in close cooperation with the automobile cabin system and improving both the operating convenience for the driver and the usability of the system; the method therefore has high practical application value.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted merely for clarity, and the specification should be taken as a whole, as the technical solutions in the respective embodiments may be suitably combined to form other implementations that will be apparent to those skilled in the art.
Claims (2)
1. A DVS-based car cabin gesture recognition system, comprising a perception layer, a decision layer and an execution layer connected in sequence, characterized in that: the perception layer consists of a DVS; the decision layer consists of an algorithm processing module and an output module; the execution layer consists of a feedback module;
the DVS of the perception layer is mounted in the vehicle cabin and captures the gesture actions of the driver or a passenger with microsecond time resolution, generating time-correlated events that carry time stamps and the spatial position information of the event occurrence; the received events are filtered, clustered and sorted in time to reconstruct the time sequence of the gesture actions;
the decision layer receives the gesture signals acquired by the DVS of the perception layer, processes them through the decision layer algorithm and outputs the result to the execution layer; the algorithm processing module combines depth information with three-dimensional hand skeleton features and performs multi-modal fusion of MobileNetV3 and LSTM;
the output module processes the information output by the algorithm processing module through algorithm verification, signal analysis and model verification, and inputs the information to the execution layer;
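The event handling described for the perception layer (time-stamped events filtered, clustered and time-sorted into a gesture time sequence) can be approximated by accumulating events into fixed-window count frames; the window length, resolution and event-tuple layout below are illustrative assumptions, not the patent's exact reconstruction:

```python
def events_to_frames(events, window_us=10_000, shape=(128, 128)):
    """Accumulate DVS events (timestamp_us, x, y, polarity) into fixed-window
    polarity-count frames, after sorting them in time (illustrative sketch)."""
    events = sorted(events)  # time-sort the event stream by timestamp
    frames = []
    frame = [[0] * shape[1] for _ in range(shape[0])]
    t0 = None
    for t, x, y, polarity in events:
        if t0 is None:
            t0 = t
        if t - t0 >= window_us:
            # Close the current accumulation window and start a new frame.
            frames.append(frame)
            frame = [[0] * shape[1] for _ in range(shape[0])]
            t0 = t
        frame[y][x] += 1 if polarity else -1
    frames.append(frame)
    return frames
```

Each resulting frame is one element of the gesture time sequence that the decision layer consumes.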
wherein, the output module comprises the following steps:
s1, constructing a DVS gesture library: collecting various three-dimensional gesture data according to vehicle-mounted function requirements and interaction habits, and constructing a three-dimensional gesture library, wherein each gesture corresponds to a vehicle control operation;
s2, gesture matching: performing gesture matching on the acquired three-dimensional gesture image sequence by using a fusion model of LSTM and MobileNetV3, matching it against each gesture template in the gesture library to obtain the best-matched gesture category and its matching degree;
s3, gesture filtering: setting a threshold value based on the matching degree of the three-dimensional gestures, filtering out gestures with lower matching degree, and only selecting gestures with matching degree higher than the threshold value to perform subsequent control operation;
s4, control instruction generation: generating a corresponding vehicle control instruction according to the gesture template that best matches the input three-dimensional gesture; for example, if the operation corresponding to the best-matched gesture template is opening the left front door window, a control instruction for opening the left front door window is generated;
s5, scene judgment: judging the current driving scene of the vehicle; if the best-matched gesture operation does not match the current scene, no control instruction is generated and a warning is given;
s6, visual feedback: displaying the best matched gesture template on the vehicle-mounted display screen, displaying the execution effect of the corresponding vehicle-mounted function, and giving appropriate visual feedback to a driver;
s7, operation record: recording a three-dimensional gesture operation process of a driver and a feedback process of a system, wherein the three-dimensional gesture operation process and the feedback process are used for incremental learning of a gesture library and optimization of template matching;
s8, matching optimization: the incremental learning method is used for continuously optimizing a matching model between the three-dimensional gesture and the template, so that accuracy and robustness of gesture matching are improved, and a foundation is provided for realizing high-performance man-machine interaction;
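Steps s1 to s4 of the output module (gesture library, template matching, threshold filtering and instruction generation) can be sketched as follows. The cosine-similarity measure, the 0.8 threshold and the template vectors are illustrative assumptions standing in for the LSTM + MobileNetV3 matching model, not values from the claim:

```python
import numpy as np

MATCH_THRESHOLD = 0.8  # hypothetical matching-degree threshold for s3

# Hypothetical gesture library (s1): one template feature vector per operation.
GESTURE_LIBRARY = {
    "open_left_front_window": np.array([1.0, 0.0, 0.0]),
    "volume_up": np.array([0.0, 1.0, 0.0]),
    "sunroof_open": np.array([0.0, 0.0, 1.0]),
}

def match_gesture(feature):
    """s2: score the input against every template; cosine similarity is an
    assumed stand-in for the fusion model's matching degree."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    scores = {label: cos(feature, tpl) for label, tpl in GESTURE_LIBRARY.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

def to_command(feature):
    """s3 + s4: filter low-confidence gestures, then emit the mapped command."""
    label, degree = match_gesture(feature)
    if degree < MATCH_THRESHOLD:
        return None  # gesture filtered out, no control operation follows
    return {"command": label, "matching_degree": degree}
```

Steps s7 and s8 would then log each `(feature, command)` pair and periodically refit the templates from those records.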
the feedback module of the execution layer receives the command signal output by the decision layer and executes the command, and the output mode comprises air conditioner air volume, media volume, window lifting and center control screen page turning;
wherein, the feedback module comprises the following steps:
s1, visual feedback: the best-matched gesture template and the execution effect of the corresponding vehicle-mounted function are displayed on the vehicle-mounted display screen, giving the driver appropriate visual feedback and making it convenient to correct or re-input the gesture;
s2, voice feedback: the system informs the driver of the most matched gesture operation and the executed vehicle-mounted control instruction in a voice mode, and carries out necessary voice reminding and interaction;
s3, executing the function: after receiving the control instruction generated by mapping, the vehicle-mounted system controls the corresponding vehicle-mounted functional module to perform operation execution;
s4, matching results: the system informs the driver of the matching result between the three-dimensional gesture and the gesture template, including the most matched gesture category and matching degree;
s5, error reporting and reminding: if the system detects that the three-dimensional gesture does not match the current driving scene and no control instruction is generated, an error prompt is given visually and by voice, prompting the driver to input the gesture again;
s6, operation record: the system records the whole three-dimensional gesture operation process and feedback process for subsequent analysis of interaction effect and improvement of experience;
s7, user evaluation: the system inquires evaluation feedback of the three-dimensional gesture interaction effect to a driver, and accordingly selects whether a matching model and an interaction rule need to be updated so as to realize personalized optimization;
the feedback module specifically comprises: the air conditioner, the volume, the central control screen, the car window and the like, which respond differently when receiving command signals; the central control screen responds with page scrolling, page turning and air-tap clicking according to the gesture; the volume is adjusted according to the distance between the thumb and the index finger along the z direction of the spatial coordinate system; the window rises or falls according to the up-and-down swing of the gesture, while a forward-and-backward swing opens or closes the sunroof; the air-conditioner air volume is adjusted according to the distance between the thumb and the index finger along the x direction of the spatial coordinate system;
the feedback module performs feedback output, comprising the following gesture and control instruction mapping activations:
when the index finger touches the thumb, changing the distance between the thumb and the index finger outputs a media volume adjustment signal that changes the media volume;
when the palm is horizontal with the other four fingers pointing downwards and swings up and down, the vehicle window descends as the palm moves down and otherwise ascends;
when the palm is upright and swings forward and backward, the vehicle sunroof opens or closes forward and backward with the palm; when human-computer interaction involves the vehicle-mounted screen, making a fist with only the index finger extended and moving it left, right, up and down controls left-right page turning of the screen page and up-down scrolling of the content; when the index finger joint undergoes a large angle change, a single-click command is triggered once at the corresponding screen position, and two such changes within a time interval of less than 0.5 seconds generate a double-click command, performing one double-click operation on the screen; when a fist-clenching gesture command occurs, the system autonomously judges whether the driving environment is safe, and the vehicle immediately enters a decelerated driving state.
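The distance-based adjustments in the mapping above (media volume along the z axis, air-conditioner air volume along the x axis) amount to clamped linear scaling. The pinch span (1-10 cm) and output ranges below are illustrative assumptions, not values from the claims:

```python
def scale_to_range(distance, d_min, d_max, out_min, out_max):
    """Linearly map a thumb-index distance onto a control range, clamped."""
    t = (distance - d_min) / (d_max - d_min)
    t = min(max(t, 0.0), 1.0)  # clamp to the valid pinch span
    return out_min + t * (out_max - out_min)

def volume_from_pinch(z_distance_m):
    # Media volume follows the thumb-index distance along the z axis
    # (assumed 1-10 cm pinch span mapped to 0-100 %).
    return scale_to_range(z_distance_m, 0.01, 0.10, 0, 100)

def fan_from_pinch(x_distance_m):
    # Air-conditioner air volume follows the distance along the x axis
    # (assumed 7 discrete fan levels).
    return round(scale_to_range(x_distance_m, 0.01, 0.10, 1, 7))
```

Clamping makes out-of-range pinch distances saturate at the minimum or maximum setting instead of producing invalid commands.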
2. A DVS-based car cabin gesture recognition method, applied to the car cabin gesture recognition system of claim 1, characterized in that the gesture recognition method performs detection based on a multi-modal fusion network model and comprises the following steps:
s1, capturing the user's gesture actions in real time through the DVS and generating gesture sequence frame images to form an event sequence of gesture images, which is further processed as the raw input of the network to extract temporal and spatial features, wherein gestures of different viewing angles and modalities are to be acquired;
s2, detecting gesture key points: detecting the gesture key points in each frame of the gesture image time sequence by using the key point detection model OpenPose, obtaining multi-view time sequence coordinates of the gesture key points and capturing the fine action features and three-dimensional spatial information of the gesture;
s3, preprocessing the gesture images: performing scale normalization, image rotation, noise filtering, frame selection and modality registration preprocessing on the acquired gesture image time sequence;
s4, extracting spatial features of the gesture images: extracting features from each frame of the preprocessed gesture image time sequence by using a lightweight MobileNetV3 network;
s5, multi-modal spatio-temporal feature network fusion: fusing the multi-view gesture key point time sequence and the image spatial feature maps obtained in steps s2 and s4 to construct the spatio-temporal features of the gesture, which serve as the input of an LSTM network for gesture detection;
s6, post-processing of detection results: post-processing the prediction result of the LSTM, including multi-modal feature mapping, coordinate mapping, smoothing and three-dimensional reconstruction, to obtain the gesture category, the multi-view key point time sequence and the three-dimensional gesture spatial information.
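The smoothing named in step s6 could, for instance, be an exponential moving average over the predicted key point coordinates; this stand-in and its gain are assumptions for illustration, not the patent's specific method:

```python
def smooth_keypoints(frames, alpha=0.6):
    """Exponential moving average over a key point time sequence.

    frames: list of per-frame coordinate lists; alpha is a hypothetical
    smoothing gain (higher = follow new predictions more closely).
    """
    smoothed, prev = [], None
    for pts in frames:
        if prev is None:
            prev = pts  # first frame passes through unchanged
        else:
            prev = [alpha * p + (1 - alpha) * q for p, q in zip(pts, prev)]
        smoothed.append(prev)
    return smoothed
```

Applied per coordinate, this damps frame-to-frame jitter in the LSTM output before coordinate mapping and three-dimensional reconstruction.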
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311005673.1A CN117218716B (en) | 2023-08-10 | 2023-08-10 | DVS-based automobile cabin gesture recognition system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117218716A CN117218716A (en) | 2023-12-12 |
CN117218716B (en) | 2024-04-09
Family
ID=89050131
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311005673.1A Active CN117218716B (en) | 2023-08-10 | 2023-08-10 | DVS-based automobile cabin gesture recognition system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117218716B (en) |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8396252B2 (en) * | 2010-05-20 | 2013-03-12 | Edge 3 Technologies | Systems and related methods for three dimensional gesture recognition in vehicles |
WO2015184308A1 (en) * | 2014-05-29 | 2015-12-03 | Northwestern University | Motion contrast depth scanning |
KR102530219B1 (en) * | 2015-10-30 | 2023-05-09 | Samsung Electronics Co., Ltd. | Method and apparatus of detecting gesture recognition error |
KR20190104929A (en) * | 2019-08-22 | 2019-09-11 | 엘지전자 주식회사 | Method for performing user authentication and function execution simultaneously and electronic device for the same |
US11144129B2 (en) * | 2020-03-04 | 2021-10-12 | Panasonic Avionics Corporation | Depth sensing infrared input device and associated methods thereof |
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103076877A (en) * | 2011-12-16 | 2013-05-01 | Microsoft Corporation | Interacting with a mobile device within a vehicle using gestures |
KR20140101276A (en) * | 2013-02-07 | 2014-08-19 | 삼성전자주식회사 | Method of displaying menu based on depth information and space gesture of user |
KR20200055202A (en) * | 2018-11-12 | 2020-05-21 | 삼성전자주식회사 | Electronic device which provides voice recognition service triggered by gesture and method of operating the same |
CN111988493A (en) * | 2019-05-21 | 2020-11-24 | 北京小米移动软件有限公司 | Interaction processing method, device, equipment and storage medium |
CN111813224A (en) * | 2020-07-09 | 2020-10-23 | 电子科技大学 | Method for establishing and identifying fine gesture library based on ultrahigh-resolution radar |
CN112507898A (en) * | 2020-12-14 | 2021-03-16 | 重庆邮电大学 | Multi-modal dynamic gesture recognition method based on lightweight 3D residual error network and TCN |
CN112558305A (en) * | 2020-12-22 | 2021-03-26 | 华人运通(上海)云计算科技有限公司 | Control method, device and medium for display picture and head-up display control system |
CN112905004A (en) * | 2021-01-21 | 2021-06-04 | 浙江吉利控股集团有限公司 | Gesture control method and device for vehicle-mounted display screen and storage medium |
CN112949512A (en) * | 2021-03-08 | 2021-06-11 | 豪威芯仑传感器(上海)有限公司 | Dynamic gesture recognition method, gesture interaction method and interaction system |
WO2022188259A1 (en) * | 2021-03-08 | 2022-09-15 | 豪威芯仑传感器(上海)有限公司 | Dynamic gesture recognition method, gesture interaction method, and interaction system |
CN113807287A (en) * | 2021-09-24 | 2021-12-17 | 福建平潭瑞谦智能科技有限公司 | 3D structured light face recognition method |
CN114265498A (en) * | 2021-12-16 | 2022-04-01 | 中国电子科技集团公司第二十八研究所 | Method for combining multi-modal gesture recognition and visual feedback mechanism |
CN114973408A (en) * | 2022-05-10 | 2022-08-30 | 西安交通大学 | Dynamic gesture recognition method and device |
CN116071817A (en) * | 2022-10-25 | 2023-05-05 | 中国矿业大学 | Network architecture and training method of gesture recognition system for automobile cabin |
CN116449947A (en) * | 2023-03-22 | 2023-07-18 | 江苏北斗星通汽车电子有限公司 | Automobile cabin domain gesture recognition system and method based on TOF camera |
Non-Patent Citations (2)
Title |
---|
Dynamic hand gesture recognition of Arabic sign language by using deep convolutional neural networks; Mohammad H. Ismai et al.; Indonesian Journal of Electrical Engineering and Computer Science; 2022-02-28; pp. 952-962 *
Research and Application of a Multi-branch Lightweight Network Model Based on Deep Learning; Wang Linghao; Electronic Journal of China Excellent Master's Theses; 2023-01-15; pp. 30-40 *
Similar Documents
Publication | Title
---|---
Pickering et al. | A research study of hand gesture recognition technologies and applications for human vehicle interaction
CN111104820A | Gesture recognition method based on deep learning
CN103530540B | User identity attribute detection method based on man-machine interaction behavior characteristics
CN104750397A | Somatosensory-based natural interaction method for virtual mine
CN105844263A | Summary view of video objects sharing common attributes
CN102880292A | Mobile terminal and control method thereof
CN113378641B | Gesture recognition method based on deep neural network and attention mechanism
CN113591659B | Gesture control intention recognition method and system based on multi-mode input
CN113327479B | MR technology-based intelligent training system for driving motor vehicle
CN104881127A | Virtual vehicle man-machine interaction method and system
CN113377193A | Vending machine interaction method and system based on reliable gesture recognition
CN114821753B | Eye movement interaction system based on visual image information
Meng et al. | Application and development of AI technology in automobile intelligent cockpit
CN111695408A | Intelligent gesture information recognition system and method and information data processing terminal
Wang et al. | Gaze-aware hand gesture recognition for intelligent construction
CN117218716B | DVS-based automobile cabin gesture recognition system and method
KR20120048190A | Vehicle control system using motion recognition
CN116449947B | Automobile cabin domain gesture recognition system and method based on TOF camera
CN105929944B | Three-dimensional man-machine interaction method
Zhang et al. | Mid-air gestures for in-vehicle media player: elicitation, segmentation, recognition, and eye-tracking testing
CN110413106B | Augmented reality input method and system based on voice and gestures
CN109582136B | Three-dimensional window gesture navigation method and device, mobile terminal and storage medium
Fu et al. | Research on application of cognitive-driven human-computer interaction
CN113807280A | Kinect-based virtual ship cabin system and method
CN109144237B | Multi-channel man-machine interactive navigation method for robot
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||