CN117218716B - DVS-based automobile cabin gesture recognition system and method - Google Patents
Abstract
The invention discloses a DVS-based automobile cabin gesture recognition system and method. The system comprises a perception layer, a decision layer, and an execution layer connected in sequence: the perception layer acquires gesture images of the user with a DVS; the decision layer classifies the gesture image signals through an algorithm processing module and an output module; and the execution layer executes the gesture command through a feedback module. The method collects gesture data with the DVS, transmits the image signal to the algorithm processing module, extracts depth information and three-dimensional hand-skeleton features for multi-mode fusion, outputs a signal to the execution layer after algorithm verification, signal analysis, and model verification in the output module, and finally has the feedback module execute the command. The system and method can filter out the gesture-habit differences of individual users and address the problems of low recognition accuracy and speed.
Description
Technical Field
The invention relates to a gesture recognition system, in particular to a DVS-based automobile cabin gesture recognition system and method, and belongs to the technical field of automobile cabins and vehicle-mounted gesture recognition.
Background
With the development of automobile intelligence, gesture recognition technology is widely applied to human-machine interaction and control in automobile cabins, enriching people's daily lives and providing a pleasant experience. Advances in human-machine interaction can improve driving safety and convenience, reduce the visual and operational load on the driver, and have pushed in-cabin gesture interaction systems into a period of rapid development.
In the prior art, as disclosed in publication CN110119200A, an infrared distance sensor is wired to an optical sensor, which is in turn wired to a gyroscope, an accelerometer, and an MCU processor; the MCU processor is wired to a gesture storage module. Such an automobile gesture recognition system can reduce the number of physical keys, simplify vehicle operation, make it faster, more efficient, and more intelligent, and even provide some entertainment for the operator. However, existing cabin gesture recognition is mainly static; wearable sensor devices, though common and relatively robust, are expensive to deploy and unsuitable for mass production. Vehicle-mounted static gesture recognition, although accurate and easy to implement, no longer meets current production and living standards. Based on DVS, gesture actions can instead be captured in real time: spatio-temporal features are fused through LSTM and MobileNetV3 networks to build a three-dimensional gesture database of the user's gestures, and the matched output signals are fed back in real time. Compared with a conventional camera, this approach offers a clear precision advantage and sharper imagery.
Therefore, to overcome the drawbacks of the prior art and meet the market demand for intelligent automobile cabins, it is necessary to design a DVS-based gesture recognition system and method for the intelligent automobile cabin that solve the above problems.
Disclosure of Invention
The invention aims to solve at least one of the above technical problems by providing a DVS-based automobile cabin gesture recognition system and method with high recognition accuracy and safety, whose gesture learning and optimization can adapt intelligently to environmental changes and provide the user with a good human-machine interaction experience.
The invention achieves the above purpose through the following technical scheme: a DVS-based automobile cabin gesture recognition system comprises a perception layer, a decision layer, and an execution layer connected in sequence, characterized in that the perception layer consists of a DVS, the decision layer consists of an algorithm processing module and an output module, and the execution layer consists of a feedback module.
As still further aspects of the invention: the DVS of the perception layer is assembled within the cabin, captures the gesture motion (brightness or distance change information) of the driver or passenger with microsecond time resolution, and generates time-related events that carry the time stamp and spatial location information of event findings.
As still further aspects of the invention: the DVS is configured to receive the events, filter, cluster, and sort the time to reconstruct a time series of gesture actions.
As still further aspects of the invention: the decision layer receives gesture signals acquired by the perception layer DVS, processes the gesture signals through a decision layer algorithm and outputs the gesture signals to the execution layer.
As still further aspects of the invention: the algorithm processing module combines the depth information and the three-dimensional hand skeleton information characteristics, and carries out multi-mode fusion with the LSTM through the MobileNet V3.
As still further aspects of the invention: the output module processes the information output by the algorithm processing module through algorithm verification, signal analysis and model verification, and inputs the information to the execution layer;
wherein, the output module comprises the following steps:
s1, constructing a DVS gesture library: collecting various three-dimensional gesture data according to vehicle-mounted function requirements and interaction habits, and constructing a three-dimensional gesture library, wherein each gesture corresponds to a vehicle control operation;
s2, gesture matching: performing gesture matching on the acquired three-dimensional gesture image sequence by using a fusion model of LSTM and MobilenetV3, and matching with each gesture template in a gesture library to obtain the most matched gesture category and matching degree;
s3, gesture filtering: setting a threshold value based on the matching degree of the three-dimensional gestures, filtering out gestures with lower matching degree, and only selecting gestures with matching degree higher than the threshold value to perform subsequent control operation;
s4, control instruction generation: generating a corresponding vehicle control instruction according to a gesture template which is most matched with the input three-dimensional gesture;
s5, scene judgment: judging the driving scene of the current vehicle, if the gesture operation which is the best match is not matched with the current scene, not generating a control instruction, and giving a warning;
s6, visual feedback: displaying the most matched gesture templates on the vehicle-mounted display screen, and displaying the execution effect of the corresponding vehicle-mounted functions;
s7, operation record: recording a three-dimensional gesture operation process of a driver and a feedback process of a system, wherein the three-dimensional gesture operation process and the feedback process are used for incremental learning of a gesture library and optimization of template matching;
s8, matching optimization: the matching model between the three-dimensional gesture and the template is continuously optimized by using an incremental learning method.
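Steps S2–S4 above (gesture matching, threshold filtering, control instruction generation) can be sketched as follows. The template vectors, gesture names, cosine-similarity metric, and 0.8 threshold are hypothetical stand-ins for the LSTM/MobileNetV3 matching model described in the text.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (a simple stand-in
    for the fusion model's matching score)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical gesture library: template feature vector per vehicle operation.
GESTURE_LIBRARY = {
    "open_window": [1.0, 0.0, 0.2],
    "volume_up":   [0.1, 1.0, 0.0],
}

def match_gesture(feature, threshold=0.8):
    """S2/S3: match a fused feature vector against every template and
    return (gesture, score) for the best match above the threshold, or
    (None, score) so that no control instruction is generated (S4)."""
    best, score = max(
        ((name, cosine(feature, tpl)) for name, tpl in GESTURE_LIBRARY.items()),
        key=lambda p: p[1],
    )
    return (best, score) if score >= threshold else (None, score)
```

Returning `None` below the threshold implements the filtering of low-confidence gestures, which is what prevents spurious vehicle commands.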
As still further aspects of the invention: the feedback module of the execution layer receives the command signal output by the decision layer and executes the command, and the output mode comprises air conditioner air volume, media volume, window lifting and center control screen page turning;
wherein, the feedback module comprises the following steps:
s1, visual feedback: the best matched gesture template and corresponding vehicle-mounted function execution effect (such as opening of a vehicle window) are displayed on a vehicle-mounted display screen, so that proper visual feedback is given to a driver, and gestures are convenient to correct or re-input;
s2, voice feedback: the system informs the driver of the most matched gesture operation and the executed vehicle-mounted control instruction in a voice mode, and carries out necessary voice reminding and interaction;
s3, executing the function: and after receiving the control instruction generated by the mapping, the vehicle-mounted system controls the corresponding vehicle-mounted functional module to perform operation execution (such as opening a vehicle window). The execution result is also displayed as a feedback;
s4, matching results: the system informs the driver of the matching result between the three-dimensional gesture and the gesture template, including the gesture type and the matching degree of the best matching. This can also be used as a feedback for the driver to judge the accuracy of the gesture input;
s5, reporting errors and reminding: if the system detects that the three-dimensional gesture is not matched with the current driving scene and a control instruction is not generated, a fault reporting prompt is given in a visual voice mode, and a driver is prompted to input the gesture again;
s6, operation record: the system records the whole three-dimensional gesture operation process and feedback process for subsequent analysis of interaction effect and improvement of experience. The recorded content can also be used for online learning and optimization of a matching model;
s7, user evaluation: the system inquires evaluation feedback of the three-dimensional gesture interaction effect to the driver, and accordingly selects whether the matching model and the interaction rule need to be updated to achieve personalized optimization.
As still further aspects of the invention: the feedback module specifically comprises: the air conditioner, the volume, the central control screen, the car window and the like can respond differently when receiving command signals, wherein the central control screen can respond according to gesture page scrolling, page turning, space clicking and the like; the volume can be adjusted according to the distance between the thumb and the index finger in the z direction of the space coordinate system; the vehicle window can ascend or descend according to the up-and-down swing of the gesture, and the front-and-back swing enables the skylight to be opened or closed; the air quantity of the air conditioner is adjusted according to the distance between the thumb and the index finger in the x direction of the space coordinate system.
As still further aspects of the invention: the feedback module performs feedback output and mainly comprises the following gesture and control instruction mapping activation: when the index finger is touched with the thumb, the distance between the thumb and the index finger is changed to output the media volume adjusting signal, and the size of the media volume is changed; when the palms are horizontal and the rest four thumbs point downwards to swing upwards and downwards, the vehicle window descends along with the palms, and otherwise ascends; when the palm stands up and swings forward and backward, the sunroof at the top of the vehicle is opened or closed forward and backward along with the palm; when the human-computer interaction relates to a vehicle-mounted screen, a fist is held and the index finger is singly extended to move left and right up and down, the left and right page turning of the screen page can be controlled, the sliding of the up and down contents can be controlled, when the index finger joint has larger angle change, a single click screen command is responded at the corresponding position of the screen once, the time interval is less than 0.5 seconds, the double click command is continuously generated twice, the double click operation is carried out on the screen once; when a fist-making gesture command occurs, the user can autonomously judge whether the running environment is safe or not, and the vehicle immediately enters a decelerating running state.
The DVS-based automobile cabin gesture recognition method comprises an automobile cabin gesture recognition system, wherein the gesture recognition method is based on a multi-mode fusion network model for detection, and comprises the following steps of:
s1, capturing gesture actions of a user in real time through DVS, generating a gesture sequence frame image, forming an event sequence of the gesture image, further processing the event sequence as original input of a network to extract time sequence and spatial characteristics, acquiring gestures with different visual angles and modes, enriching input information, and improving detection robustness;
s2, detecting gesture key points: detecting gesture key points in each frame of images on the time sequence of the gesture images by using a key point detection model OpenPose, obtaining time sequence coordinates of the gesture key points with multiple views, and capturing fine action characteristics and three-dimensional space information of the gestures;
s3, preprocessing the gesture images: performing scale normalization, image rotation, noise filtering, frame selection, and modal registration on the acquired gesture image time series to improve the matching degree and feature-extraction quality across the time series of different modalities;
s4, extracting spatial features of the gesture image: carrying out feature extraction on each frame of image of the preprocessed gesture image time sequence by using a lightweight MobilenetV3 and other networks so as to acquire advanced spatial feature mapping of the images, and enhancing understanding of gesture details;
s5, multi-mode space-time feature network fusion: fusing the multi-view gesture key point time series and the image spatial feature maps obtained in steps S2 and S4, constructing the space-time features of the gesture, and using them as the input of an LSTM network for gesture detection;
s6, post-processing of detection results: post-processing is carried out on the prediction result of the LSTM, wherein the post-processing comprises multi-mode feature mapping, coordinate mapping, smoothing processing and three-dimensional reconstruction to obtain gesture types, multi-view key point time sequences and three-dimensional gesture space information.
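The fusion in steps S2–S5 can be sketched structurally as follows. The two extractors are deliberately trivial stand-ins — a real system would run MobileNetV3 for the spatial features and OpenPose for the keypoints — but the shape of the fused `(T, D)` matrix the LSTM would consume is the point of the sketch.

```python
import numpy as np

def spatial_features(frame):
    """Stand-in for MobileNetV3: reduce an HxW frame to a small spatial
    feature vector (hypothetical; a real system would run the CNN)."""
    return np.array([frame.mean(), frame.std(), frame.max()])

def keypoint_features(keypoints):
    """Stand-in for OpenPose output: flatten (N, 2) keypoint coordinates
    into a single per-frame vector."""
    return np.asarray(keypoints, dtype=float).ravel()

def fuse_sequence(frames, keypoint_seq):
    """S5: concatenate per-frame spatial and keypoint features into a
    (T, D) spatio-temporal matrix — the sequence an LSTM would consume."""
    rows = [
        np.concatenate([spatial_features(f), keypoint_features(k)])
        for f, k in zip(frames, keypoint_seq)
    ]
    return np.stack(rows)
```

Fusing at the feature level, rather than deciding per modality and voting, is what lets the temporal model see image appearance and hand geometry jointly at every time step.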
The beneficial effects of the invention are as follows:
1. The system can be personalized: by recording and analyzing the driver's gesture operations and evaluating the feedback effect, the system can optimize three-dimensional gesture interaction for the individual, selecting the interaction rules and mapping model that best match the driver's operating habits and providing a personalized human-machine interaction experience;
2. The operation record and feedback mechanism can provide data to support the after-the-fact investigation of traffic accidents, making it easier to determine how a control instruction was generated and who is responsible, and helping to continuously improve the system's safety guarantees and error-proofing, thereby improving safety.
3. The large amount of collected gesture operation data creates the conditions for system learning and optimization; analyzing this data deepens the system's understanding of complex scenes and personal habits, continuously improves the human-machine interaction mechanism, and enables intelligent adaptation to environmental changes.
4. The three-dimensional gesture interaction system and its targeted control feedback mechanism are well suited to intelligent driving scenarios, achieving close coordination between natural gesture interaction and the vehicle-mounted system, improving the driver's operating convenience and the system's usability, and offering high practical value.
Drawings
FIG. 1 is a block diagram of a gesture recognition system of the present invention;
FIG. 2 is a block diagram of a gesture recognition method according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Embodiment one: as shown in fig. 1, a DVS-based gesture recognition system for an automobile cabin includes a sensing layer, a decision layer, and an executing layer, which are sequentially connected, and is characterized in that: the perception layer consists of DVS; the decision layer consists of an algorithm processing module and an output module; the execution layer is composed of a feedback module.
Embodiment two: in addition to all the technical features in the first embodiment, the present embodiment further includes:
the DVS of the perception layer is assembled within the cabin, captures the gesture motion (brightness or distance change information) of the driver or passenger with microsecond time resolution, and generates time-related events that carry the time stamp and spatial location information of event findings.
The DVS is configured to receive the events, filter and cluster them, and sort them by time to reconstruct a time series of gesture actions.
The decision layer receives gesture signals acquired by the perception layer DVS, processes the gesture signals through a decision layer algorithm and outputs the gesture signals to the execution layer.
The algorithm processing module combines the depth information with the three-dimensional hand-skeleton features and performs multi-mode fusion through MobileNetV3 and LSTM.
Embodiment III: in addition to all the technical features in the first embodiment, the present embodiment further includes:
the output module processes the information output by the algorithm processing module through algorithm verification, signal analysis and model verification, and inputs the information to the execution layer;
wherein, the output module comprises the following steps:
s1, constructing a DVS gesture library: collecting various three-dimensional gesture data according to vehicle-mounted function requirements and interaction habits, and constructing a three-dimensional gesture library, wherein each gesture corresponds to a vehicle control operation;
s2, gesture matching: performing gesture matching on the acquired three-dimensional gesture image sequence by using a fusion model of LSTM and MobilenetV3, and matching with each gesture template in a gesture library to obtain the most matched gesture category and matching degree;
s3, gesture filtering: setting a threshold value based on the matching degree of the three-dimensional gestures, filtering out gestures with lower matching degree, and only selecting gestures with higher matching degree than the threshold value for subsequent control operation, so that the accuracy of instructions and the robustness of a system can be improved;
s4, control instruction generation: generating the corresponding vehicle control instruction according to the gesture template that best matches the input three-dimensional gesture; for example, if the operation corresponding to the best-matched template is opening the left front window, a control instruction to open the left front window is generated;
s5, scene judgment: judging the driving scene of the current vehicle, if the gesture operation which is the best match is not matched with the current scene, not generating a control instruction, and giving an alarm. The misoperation caused by scene change in the three-dimensional gesture interaction process can be avoided;
s6, visual feedback: the best matched gesture template is displayed on the vehicle-mounted display screen, the execution effect of the corresponding vehicle-mounted function is displayed, appropriate visual feedback is given to a driver, and the gesture is convenient to correct or re-input;
s7, operation record: and recording a three-dimensional gesture operation process of a driver and a feedback process of a system, and optimizing incremental learning and template matching of a gesture library to realize personalized customization.
S8, matching optimization: the incremental learning method is used for continuously optimizing a matching model between the three-dimensional gesture and the template, so that accuracy and robustness of gesture matching are improved, and a foundation is provided for realizing high-performance man-machine interaction.
The output method can improve the data processing speed, maintain good accuracy and robustness and provide personalized man-machine interaction.
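The incremental learning in step S8 can be sketched minimally as follows. The exponential-moving-average update rule and the `alpha` learning rate are assumptions standing in for whatever online optimization the matching model actually uses.

```python
def update_template(template, sample, alpha=0.1):
    """S8: nudge a stored gesture template toward a newly confirmed
    gesture sample via an exponential moving average, so the library
    gradually adapts to the individual driver's gesture habits."""
    return [(1 - alpha) * t + alpha * s for t, s in zip(template, sample)]
```

Updating only on confirmed matches (e.g. after the driver's positive evaluation in S7) keeps misrecognized gestures from corrupting the template.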
Embodiment four: in addition to all the technical features in the first embodiment, the present embodiment further includes:
the feedback module of the execution layer receives the command signal output by the decision layer and executes the command, and the output mode comprises air conditioner air volume, media volume, window lifting and center control screen page turning;
wherein, the feedback module comprises the following steps:
s1, visual feedback: the best matched gesture template and corresponding vehicle-mounted function execution effect (such as opening of a vehicle window) are displayed on a vehicle-mounted display screen, so that proper visual feedback is given to a driver, and gestures are convenient to correct or re-input;
s2, voice feedback: the system informs the driver of the most matched gesture operation and the executed vehicle-mounted control instruction in a voice mode, and carries out necessary voice reminding and interaction;
s3, executing the function: and after receiving the control instruction generated by the mapping, the vehicle-mounted system controls the corresponding vehicle-mounted functional module to perform operation execution (such as opening a vehicle window). The execution result is also displayed as a feedback;
s4, matching results: the system informs the driver of the matching result between the three-dimensional gesture and the gesture template, including the gesture type and the matching degree of the best matching. This can also be used as a feedback for the driver to judge the accuracy of the gesture input;
s5, reporting errors and reminding: if the system detects that the three-dimensional gesture is not matched with the current driving scene and a control instruction is not generated, a fault reporting prompt is given in a visual voice mode, and a driver is prompted to input the gesture again;
s6, operation record: the system records the whole three-dimensional gesture operation process and feedback process for subsequent analysis of interaction effect and improvement of experience. The recorded content can also be used for online learning and optimization of a matching model;
s7, user evaluation: the system inquires evaluation feedback of the three-dimensional gesture interaction effect to the driver, and accordingly selects whether the matching model and the interaction rule need to be updated to achieve personalized optimization.
Through the feedback module, interactive feedback and communication of man-machine interaction can be effectively realized, dangerous operation is avoided, the intelligence of the system is enhanced, and the safety of the system is ensured.
Fifth embodiment: in addition to all the technical features in the first embodiment, the present embodiment further includes:
the feedback module specifically comprises: the air conditioner, the volume, the central control screen, the car window and the like can respond differently when receiving command signals, wherein the central control screen can respond according to gesture page scrolling, page turning, space clicking and the like; the volume can be adjusted according to the distance between the thumb and the index finger in the z direction of the space coordinate system; the vehicle window can ascend or descend according to the up-and-down swing of the gesture, and the front-and-back swing enables the skylight to be opened or closed; the air quantity of the air conditioner is adjusted according to the distance between the thumb and the index finger in the x direction of the space coordinate system.
The feedback output of the feedback module mainly activates the following gesture-to-control-instruction mappings: when the index finger touches the thumb, changing the thumb-index distance outputs a media-volume adjustment signal that changes the media volume; when the palm is horizontal with the other four fingers pointing downward and swings up and down, the window lowers with the palm, and otherwise rises; when the palm is upright and swings forward and backward, the sunroof opens or closes forward and backward with the palm; when the interaction involves the vehicle-mounted screen, a fist with only the index finger extended moving left-right or up-down controls page turning and content scrolling, a large angle change of the index finger joint triggers one single-click at the corresponding screen position, and two such changes within 0.5 seconds are combined into one double-click on the screen; when a fist-clenching gesture command occurs, the vehicle immediately enters a decelerating state, with the user judging autonomously whether the driving environment is safe.
Example six: as shown in fig. 2, a DVS-based car cabin gesture recognition method, including a car cabin gesture recognition system, the gesture recognition method detects based on a multi-mode fusion network model, the gesture recognition method includes the following steps:
s1, capturing gesture actions of a user in real time through DVS, generating a gesture sequence frame image, forming an event sequence of the gesture image, further processing the event sequence as original input of a network to extract time sequence and spatial characteristics, acquiring gestures with different visual angles and modes, enriching input information, and improving detection robustness;
s2, detecting gesture key points: detecting gesture key points in each frame of images on the time sequence of the gesture images by using a key point detection model OpenPose, obtaining time sequence coordinates of the gesture key points with multiple views, and capturing fine action characteristics and three-dimensional space information of the gestures;
s3, preprocessing the gesture images: performing scale normalization, image rotation, noise filtering, frame selection, and modal registration on the acquired gesture image time series to improve the matching degree and feature-extraction quality across the time series of different modalities;
s4, extracting spatial features of the gesture image: carrying out feature extraction on each frame of image of the preprocessed gesture image time sequence by using a lightweight MobilenetV3 and other networks so as to acquire advanced spatial feature mapping of the images, and enhancing understanding of gesture details;
s5, multi-modal spatio-temporal feature network fusion: fusing the multi-view gesture key point time sequence and the image spatial feature maps obtained in steps s2 and s4 to construct the spatio-temporal features of the gesture, which serve as the input of an LSTM network for gesture detection;
s6, post-processing of detection results: post-processing the prediction result of the LSTM, including multi-modal feature mapping, coordinate mapping, smoothing and three-dimensional reconstruction, to obtain the gesture category, the multi-view key point time sequence and the three-dimensional gesture spatial information.
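The six steps above can be sketched at the data-flow level. The OpenPose, MobileNetV3 and LSTM stages are replaced here by random stand-in functions, and the dimensions (21 hand key points, two views, a 576-dimensional feature vector) are illustrative assumptions:

```python
import numpy as np

# Illustrative dimensions: 21 hand key points x 2 views x (x, y) coordinates,
# plus a 576-dimensional spatial feature vector per frame.
N_FRAMES, KEYPOINT_DIM, CNN_DIM = 16, 21 * 2 * 2, 576

def extract_keypoints(frame):
    """Stand-in for OpenPose (s2): multi-view key point coordinates per frame."""
    return np.random.rand(KEYPOINT_DIM)

def extract_spatial_features(frame):
    """Stand-in for MobileNetV3 (s4): spatial feature vector per frame."""
    return np.random.rand(CNN_DIM)

def fuse_spatiotemporal(frames):
    """s5: concatenate per-frame key point and spatial features into the
    (n_frames, keypoint_dim + cnn_dim) sequence that feeds the LSTM."""
    rows = [np.concatenate([extract_keypoints(f), extract_spatial_features(f)])
            for f in frames]
    return np.stack(rows)

# s1 would supply event frames reconstructed from the DVS event stream.
event_frames = [np.zeros((128, 128)) for _ in range(N_FRAMES)]
sequence = fuse_spatiotemporal(event_frames)
assert sequence.shape == (N_FRAMES, KEYPOINT_DIM + CNN_DIM)
```

The fused `(frames, features)` matrix is exactly the shape an LSTM-style sequence model consumes; steps s3 and s6 would wrap this core with preprocessing and smoothing.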
Through multi-modal network fusion, a network model with high precision, strong real-time performance and high robustness can be obtained.
Since the method is realized based on a DVS, the overall stability of the gesture recognition system is improved: the accuracy of automobile cabin gesture recognition is effectively increased, recognition delay is reduced, the overall robustness of the system is improved, and a better human-computer interaction experience is provided. The operation record and feedback mechanism can provide data support for investigating possible traffic accidents after the fact, making it easier to determine how a control instruction was generated and to attribute responsibility, and helping to continuously improve the system's safety guarantee and error-proofing capability. The gesture recognition system and its targeted feedback mechanism are well suited to intelligent driving scenarios, realizing natural gesture interaction in close cooperation with the automobile cabin system and improving both the operating convenience for the driver and the usability of the system; the method therefore has high practical application value.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted merely for clarity, and the specification should be taken as a whole, as the technical solutions in the respective embodiments may be suitably combined to form other implementations that will be apparent to those skilled in the art.
Claims (2)
1. A DVS-based car cabin gesture recognition system, comprising a perception layer, a decision layer and an execution layer connected in sequence, characterized in that: the perception layer consists of a DVS; the decision layer consists of an algorithm processing module and an output module; the execution layer consists of a feedback module;
the DVS of the perception layer is mounted in the vehicle cabin and captures the gesture actions of the driver or a passenger with microsecond time resolution, generating time-correlated events that carry time stamps and the spatial position information of the event occurrence; the received events are filtered, clustered and sorted in time to reconstruct the time sequence of the gesture actions;
the decision layer receives the gesture signals acquired by the DVS of the perception layer, processes them through the decision layer algorithm and outputs the result to the execution layer; the algorithm processing module combines depth information with three-dimensional hand skeleton features and performs multi-modal fusion of MobileNetV3 and LSTM;
the output module processes the information output by the algorithm processing module through algorithm verification, signal analysis and model verification, and inputs the information to the execution layer;
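The event handling described for the perception layer (time-stamped events filtered, clustered and time-sorted into a gesture time sequence) can be approximated by accumulating events into fixed-window count frames; the window length, resolution and event-tuple layout below are illustrative assumptions, not the patent's exact reconstruction:

```python
def events_to_frames(events, window_us=10_000, shape=(128, 128)):
    """Accumulate DVS events (timestamp_us, x, y, polarity) into fixed-window
    polarity-count frames, after sorting them in time (illustrative sketch)."""
    events = sorted(events)  # time-sort the event stream by timestamp
    frames = []
    frame = [[0] * shape[1] for _ in range(shape[0])]
    t0 = None
    for t, x, y, polarity in events:
        if t0 is None:
            t0 = t
        if t - t0 >= window_us:
            # Close the current accumulation window and start a new frame.
            frames.append(frame)
            frame = [[0] * shape[1] for _ in range(shape[0])]
            t0 = t
        frame[y][x] += 1 if polarity else -1
    frames.append(frame)
    return frames
```

Each resulting frame is one element of the gesture time sequence that the decision layer consumes.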
wherein, the output module comprises the following steps:
s1, constructing a DVS gesture library: collecting various three-dimensional gesture data according to vehicle-mounted function requirements and interaction habits, and constructing a three-dimensional gesture library, wherein each gesture corresponds to a vehicle control operation;
s2, gesture matching: performing gesture matching on the acquired three-dimensional gesture image sequence by using a fusion model of LSTM and MobileNetV3, matching it against each gesture template in the gesture library to obtain the best-matched gesture category and its matching degree;
s3, gesture filtering: setting a threshold value based on the matching degree of the three-dimensional gestures, filtering out gestures with lower matching degree, and only selecting gestures with matching degree higher than the threshold value to perform subsequent control operation;
s4, control instruction generation: generating a corresponding vehicle control instruction according to the gesture template that best matches the input three-dimensional gesture; for example, if the operation corresponding to the best-matched gesture template is opening the left front door window, a control instruction for opening the left front door window is generated;
s5, scene judgment: judging the current driving scene of the vehicle; if the best-matched gesture operation does not match the current scene, no control instruction is generated and a warning is given;
s6, visual feedback: displaying the best matched gesture template on the vehicle-mounted display screen, displaying the execution effect of the corresponding vehicle-mounted function, and giving appropriate visual feedback to a driver;
s7, operation record: recording a three-dimensional gesture operation process of a driver and a feedback process of a system, wherein the three-dimensional gesture operation process and the feedback process are used for incremental learning of a gesture library and optimization of template matching;
s8, matching optimization: the incremental learning method is used for continuously optimizing a matching model between the three-dimensional gesture and the template, so that accuracy and robustness of gesture matching are improved, and a foundation is provided for realizing high-performance man-machine interaction;
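Steps s1 to s4 of the output module (gesture library, template matching, threshold filtering and instruction generation) can be sketched as follows. The cosine-similarity measure, the 0.8 threshold and the template vectors are illustrative assumptions standing in for the LSTM + MobileNetV3 matching model, not values from the claim:

```python
import numpy as np

MATCH_THRESHOLD = 0.8  # hypothetical matching-degree threshold for s3

# Hypothetical gesture library (s1): one template feature vector per operation.
GESTURE_LIBRARY = {
    "open_left_front_window": np.array([1.0, 0.0, 0.0]),
    "volume_up": np.array([0.0, 1.0, 0.0]),
    "sunroof_open": np.array([0.0, 0.0, 1.0]),
}

def match_gesture(feature):
    """s2: score the input against every template; cosine similarity is an
    assumed stand-in for the fusion model's matching degree."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    scores = {label: cos(feature, tpl) for label, tpl in GESTURE_LIBRARY.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

def to_command(feature):
    """s3 + s4: filter low-confidence gestures, then emit the mapped command."""
    label, degree = match_gesture(feature)
    if degree < MATCH_THRESHOLD:
        return None  # gesture filtered out, no control operation follows
    return {"command": label, "matching_degree": degree}
```

Steps s7 and s8 would then log each `(feature, command)` pair and periodically refit the templates from those records.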
the feedback module of the execution layer receives the command signal output by the decision layer and executes the command, and the output mode comprises air conditioner air volume, media volume, window lifting and center control screen page turning;
wherein, the feedback module comprises the following steps:
s1, visual feedback: the best-matched gesture template and the execution effect of the corresponding vehicle-mounted function are displayed on the vehicle-mounted display screen, giving the driver appropriate visual feedback and making it convenient to correct or re-input the gesture;
s2, voice feedback: the system informs the driver of the most matched gesture operation and the executed vehicle-mounted control instruction in a voice mode, and carries out necessary voice reminding and interaction;
s3, executing the function: after receiving the control instruction generated by mapping, the vehicle-mounted system controls the corresponding vehicle-mounted functional module to perform operation execution;
s4, matching results: the system informs the driver of the matching result between the three-dimensional gesture and the gesture template, including the most matched gesture category and matching degree;
s5, error reporting and reminding: if the system detects that the three-dimensional gesture does not match the current driving scene and no control instruction is generated, an error prompt is given visually and by voice, prompting the driver to input the gesture again;
s6, operation record: the system records the whole three-dimensional gesture operation process and feedback process for subsequent analysis of interaction effect and improvement of experience;
s7, user evaluation: the system inquires evaluation feedback of the three-dimensional gesture interaction effect to a driver, and accordingly selects whether a matching model and an interaction rule need to be updated so as to realize personalized optimization;
the feedback module specifically comprises: the air conditioner, the volume, the central control screen, the car window and the like, which respond differently when receiving command signals; the central control screen responds with page scrolling, page turning and air-tap clicking according to the gesture; the volume is adjusted according to the distance between the thumb and the index finger along the z direction of the spatial coordinate system; the window rises or falls according to the up-and-down swing of the gesture, while a forward-and-backward swing opens or closes the sunroof; the air-conditioner air volume is adjusted according to the distance between the thumb and the index finger along the x direction of the spatial coordinate system;
the feedback module performs feedback output, comprising the following gesture and control instruction mapping activations:
when the index finger touches the thumb, changing the distance between the thumb and the index finger outputs a media volume adjustment signal that changes the media volume;
when the palm is horizontal with the other four fingers pointing downwards and swings up and down, the vehicle window descends as the palm moves down and otherwise ascends;
when the palm is upright and swings forward and backward, the vehicle sunroof opens or closes forward and backward with the palm; when human-computer interaction involves the vehicle-mounted screen, making a fist with only the index finger extended and moving it left, right, up and down controls left-right page turning of the screen page and up-down scrolling of the content; when the index finger joint undergoes a large angle change, a single-click command is triggered once at the corresponding screen position, and two such changes within a time interval of less than 0.5 seconds generate a double-click command, performing one double-click operation on the screen; when a fist-clenching gesture command occurs, the system autonomously judges whether the driving environment is safe, and the vehicle immediately enters a decelerated driving state.
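The distance-based adjustments in the mapping above (media volume along the z axis, air-conditioner air volume along the x axis) amount to clamped linear scaling. The pinch span (1-10 cm) and output ranges below are illustrative assumptions, not values from the claims:

```python
def scale_to_range(distance, d_min, d_max, out_min, out_max):
    """Linearly map a thumb-index distance onto a control range, clamped."""
    t = (distance - d_min) / (d_max - d_min)
    t = min(max(t, 0.0), 1.0)  # clamp to the valid pinch span
    return out_min + t * (out_max - out_min)

def volume_from_pinch(z_distance_m):
    # Media volume follows the thumb-index distance along the z axis
    # (assumed 1-10 cm pinch span mapped to 0-100 %).
    return scale_to_range(z_distance_m, 0.01, 0.10, 0, 100)

def fan_from_pinch(x_distance_m):
    # Air-conditioner air volume follows the distance along the x axis
    # (assumed 7 discrete fan levels).
    return round(scale_to_range(x_distance_m, 0.01, 0.10, 1, 7))
```

Clamping makes out-of-range pinch distances saturate at the minimum or maximum setting instead of producing invalid commands.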
2. A DVS-based car cabin gesture recognition method, applied to the car cabin gesture recognition system of claim 1, characterized in that the gesture recognition method performs detection based on a multi-modal fusion network model and comprises the following steps:
s1, capturing the user's gesture actions in real time through the DVS and generating gesture sequence frame images to form an event sequence of gesture images, which is further processed as the raw input of the network to extract temporal and spatial features, wherein gestures of different viewing angles and modalities are to be acquired;
s2, detecting gesture key points: detecting the gesture key points in each frame of the gesture image time sequence by using the key point detection model OpenPose, obtaining multi-view time sequence coordinates of the gesture key points and capturing the fine action features and three-dimensional spatial information of the gesture;
s3, preprocessing the gesture images: performing scale normalization, image rotation, noise filtering, frame selection and modality registration preprocessing on the acquired gesture image time sequence;
s4, extracting spatial features of the gesture images: extracting features from each frame of the preprocessed gesture image time sequence by using a lightweight MobileNetV3 network;
s5, multi-modal spatio-temporal feature network fusion: fusing the multi-view gesture key point time sequence and the image spatial feature maps obtained in steps s2 and s4 to construct the spatio-temporal features of the gesture, which serve as the input of an LSTM network for gesture detection;
s6, post-processing of detection results: post-processing the prediction result of the LSTM, including multi-modal feature mapping, coordinate mapping, smoothing and three-dimensional reconstruction, to obtain the gesture category, the multi-view key point time sequence and the three-dimensional gesture spatial information.
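The smoothing named in step s6 could, for instance, be an exponential moving average over the predicted key point coordinates; this stand-in and its gain are assumptions for illustration, not the patent's specific method:

```python
def smooth_keypoints(frames, alpha=0.6):
    """Exponential moving average over a key point time sequence.

    frames: list of per-frame coordinate lists; alpha is a hypothetical
    smoothing gain (higher = follow new predictions more closely).
    """
    smoothed, prev = [], None
    for pts in frames:
        if prev is None:
            prev = pts  # first frame passes through unchanged
        else:
            prev = [alpha * p + (1 - alpha) * q for p, q in zip(pts, prev)]
        smoothed.append(prev)
    return smoothed
```

Applied per coordinate, this damps frame-to-frame jitter in the LSTM output before coordinate mapping and three-dimensional reconstruction.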
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311005673.1A CN117218716B (en) | 2023-08-10 | 2023-08-10 | DVS-based automobile cabin gesture recognition system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117218716A CN117218716A (en) | 2023-12-12 |
CN117218716B (en) | 2024-04-09
Family
ID=89050131
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311005673.1A Active CN117218716B (en) | 2023-08-10 | 2023-08-10 | DVS-based automobile cabin gesture recognition system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117218716B (en) |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8396252B2 (en) * | 2010-05-20 | 2013-03-12 | Edge 3 Technologies | Systems and related methods for three dimensional gesture recognition in vehicles |
WO2015184308A1 (en) * | 2014-05-29 | 2015-12-03 | Northwestern University | Motion contrast depth scanning |
KR102530219B1 (en) * | 2015-10-30 | 2023-05-09 | Samsung Electronics Co., Ltd. | Method and apparatus of detecting gesture recognition error |
KR20190104929A (en) * | 2019-08-22 | 2019-09-11 | 엘지전자 주식회사 | Method for performing user authentication and function execution simultaneously and electronic device for the same |
US11144129B2 (en) * | 2020-03-04 | 2021-10-12 | Panasonic Avionics Corporation | Depth sensing infrared input device and associated methods thereof |
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103076877A (en) * | 2011-12-16 | 2013-05-01 | Microsoft Corporation | Interacting with a mobile device within a vehicle using gestures |
KR20140101276A (en) * | 2013-02-07 | 2014-08-19 | 삼성전자주식회사 | Method of displaying menu based on depth information and space gesture of user |
KR20200055202A (en) * | 2018-11-12 | 2020-05-21 | 삼성전자주식회사 | Electronic device which provides voice recognition service triggered by gesture and method of operating the same |
CN111988493A (en) * | 2019-05-21 | 2020-11-24 | 北京小米移动软件有限公司 | Interaction processing method, device, equipment and storage medium |
CN111813224A (en) * | 2020-07-09 | 2020-10-23 | 电子科技大学 | Method for establishing and identifying fine gesture library based on ultrahigh-resolution radar |
CN112507898A (en) * | 2020-12-14 | 2021-03-16 | 重庆邮电大学 | Multi-modal dynamic gesture recognition method based on lightweight 3D residual error network and TCN |
CN112558305A (en) * | 2020-12-22 | 2021-03-26 | 华人运通(上海)云计算科技有限公司 | Control method, device and medium for display picture and head-up display control system |
CN112905004A (en) * | 2021-01-21 | 2021-06-04 | 浙江吉利控股集团有限公司 | Gesture control method and device for vehicle-mounted display screen and storage medium |
CN112949512A (en) * | 2021-03-08 | 2021-06-11 | 豪威芯仑传感器(上海)有限公司 | Dynamic gesture recognition method, gesture interaction method and interaction system |
WO2022188259A1 (en) * | 2021-03-08 | 2022-09-15 | 豪威芯仑传感器(上海)有限公司 | Dynamic gesture recognition method, gesture interaction method, and interaction system |
CN113807287A (en) * | 2021-09-24 | 2021-12-17 | 福建平潭瑞谦智能科技有限公司 | 3D structured light face recognition method |
CN114265498A (en) * | 2021-12-16 | 2022-04-01 | 中国电子科技集团公司第二十八研究所 | Method for combining multi-modal gesture recognition and visual feedback mechanism |
CN114973408A (en) * | 2022-05-10 | 2022-08-30 | 西安交通大学 | Dynamic gesture recognition method and device |
CN116071817A (en) * | 2022-10-25 | 2023-05-05 | 中国矿业大学 | Network architecture and training method of gesture recognition system for automobile cabin |
CN116449947A (en) * | 2023-03-22 | 2023-07-18 | 江苏北斗星通汽车电子有限公司 | Automobile cabin domain gesture recognition system and method based on TOF camera |
Non-Patent Citations (2)
Title |
---|
Dynamic hand gesture recognition of Arabic sign language by using deep convolutional neural networks; Mohammad H. Ismai et al.; Indonesian Journal of Electrical Engineering and Computer Science; 2022-02-28; pp. 952-962 *
Research and Application of a Multi-branch Lightweight Network Model Based on Deep Learning; Wang Linghao; Electronic Journal of China Excellent Master's Theses; 2023-01-15; pp. 30-40 *
Similar Documents
Publication | Title
---|---
Pickering et al. | A research study of hand gesture recognition technologies and applications for human vehicle interaction
CN111104820A | Gesture recognition method based on deep learning
CN103530540B | User identity attribute detection method based on man-machine interaction behavior characteristics
CN104750397A | Somatosensory-based natural interaction method for virtual mine
CN105844263A | Summary view of video objects sharing common attributes
CN102880292A | Mobile terminal and control method thereof
CN113378641B | Gesture recognition method based on deep neural network and attention mechanism
CN113591659B | Gesture control intention recognition method and system based on multi-mode input
CN113327479B | MR technology-based intelligent training system for driving motor vehicle
CN104881127A | Virtual vehicle man-machine interaction method and system
CN113377193A | Vending machine interaction method and system based on reliable gesture recognition
CN114821753B | Eye movement interaction system based on visual image information
Meng et al. | Application and development of AI technology in automobile intelligent cockpit
CN111695408A | Intelligent gesture information recognition system and method and information data processing terminal
Wang et al. | Gaze-aware hand gesture recognition for intelligent construction
CN117218716B | DVS-based automobile cabin gesture recognition system and method
KR20120048190A | Vehicle control system using motion recognition
CN116449947B | Automobile cabin domain gesture recognition system and method based on TOF camera
CN105929944B | Three-dimensional man-machine interaction method
Zhang et al. | Mid-air gestures for in-vehicle media player: elicitation, segmentation, recognition, and eye-tracking testing
CN110413106B | Augmented reality input method and system based on voice and gestures
CN109582136B | Three-dimensional window gesture navigation method and device, mobile terminal and storage medium
Fu et al. | Research on application of cognitive-driven human-computer interaction
CN113807280A | Kinect-based virtual ship cabin system and method
CN109144237B | Multi-channel man-machine interactive navigation method for robot
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||