CN116449947A - Automobile cabin domain gesture recognition system and method based on TOF camera - Google Patents
- Publication number
- CN116449947A (application number CN202310284871.XA)
- Authority
- CN
- China
- Prior art keywords
- gesture
- image
- tof
- tof camera
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- B60R25/2045—Means to switch the anti-theft system on or off by hand gestures
- B60K35/00—Instruments specially adapted for vehicles; Arrangement of instruments in or on vehicles
- B60K35/10—Input arrangements, i.e. from user to vehicle, associated with vehicle functions or specially adapted therefor
- B60R16/037—Electric circuits for occupant comfort, e.g. for automatic adjustment of appliances according to personal settings, e.g. seats, mirrors, steering wheel
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T5/73—Deblurring; Sharpening
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
- B60K2360/146—Instrument input by gesture
- G06T2207/10028—Range image; Depth image; 3D point clouds
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20221—Image fusion; Image merging
- Y02T10/40—Engine management systems
Abstract
The invention discloses an automobile cabin domain gesture recognition system and method based on a TOF camera. The system comprises a sensing layer, a decision layer and an execution layer connected in sequence. The sensing layer acquires gesture images of the user with a TOF camera, while an external vehicle condition sensor assists in sensing the environment; the decision layer collects and processes the gesture image signals through the cabin domain controller and the chip module; the execution layer executes gesture commands through the response module. In the method, gesture data are collected by the TOF camera and the gesture signals are transmitted to the cabin controller; the chip module performs signal processing, while the external vehicle condition sensor assists in sensing the external environment and outputs vehicle running state signals. After logic verification, signal processing and model verification in the chip module, signals are output to the execution layer, where the response module executes the commands. The system improves the human-machine interaction experience of the user while driving and improves the reliability of driving safety.
Description
Technical Field
The invention relates to the field of gesture recognition of automobile cabins, in particular to a gesture recognition system and method of automobile cabins based on a TOF camera.
Background
With the development of artificial intelligence, machine vision technology has gradually entered daily life, enriching people's cultural life and bringing pleasant experiences. Gesture recognition as an intelligent interaction mode allows users to control their interaction with devices by gestures, which has promoted the development of human-machine interaction, and vehicle-mounted gesture interaction systems have entered a period of rapid development.
At present, vehicle-mounted gesture recognition mainly relies on wearable sensing devices or simple static gesture recognition. Wearable sensing devices offer high accuracy and good robustness, but their cost is relatively high and they are unsuitable for mass production. Vehicle-mounted static gesture recognition achieves a high recognition rate and is easy to implement, but it cannot meet current usage demands. A TOF camera can generate and track depth images in real time; through a TOF optimization algorithm, a depth fusion algorithm and an image defogging algorithm, a high-resolution depth image of the user's gesture is obtained, giving a clear advantage in precision and image clarity over a conventional camera. However, TOF imaging still has certain defects: it is easily disturbed by external ambient light, which limits imaging precision.
Therefore, to overcome the drawbacks of the prior art, it is necessary to design an automobile intelligent cabin gesture recognition system and method based on a TOF camera that solve the above problems.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a TOF-camera-based automobile cabin domain gesture recognition system and method which have high recognition precision and efficiency, are not easily disturbed by external noise and ambient light, and provide a good human-computer interaction experience for users.
To achieve the above object, the present invention adopts the following technical scheme:
An automobile cabin domain gesture recognition system based on a TOF camera comprises a perception layer, a decision layer and an execution layer connected in sequence. The perception layer consists of a TOF camera and an external vehicle condition sensor; the decision layer consists of a cabin domain controller and a chip module; the execution layer consists of a response module.
The TOF camera in the perception layer collects gesture information within the cabin in real time, and the external vehicle condition sensor assists the gesture system in detecting the external environment. If an abnormality is detected, the occupant's seat belt is slowly tightened and the vehicle is decelerated.
The decision layer receives the signals from the perception layer, processes them, and sends them to the execution layer. A visual domain module and an intelligent information domain module are added in the cabin domain controller to increase visual image tracking processing capacity and information transmission efficiency, respectively. Three steps (logic verification, signal processing and model verification) are set in the chip module, which match and verify the acquired user gesture information and send out the corresponding command signals.
The response module in the execution layer receives the command signal output by the decision layer, executes the command, and presents the output through the central control screen, the media speaker, the vehicle sunroof and the like.
Further, the central control screen, media speaker and vehicle sunroof of the response module make different responses upon receiving command signals: the central control screen responds to gestures with up-and-down scrolling, left-and-right page turning and mid-air clicking; the media speaker adjusts the volume according to the distance between the thumb and the index finger; the vehicle sunroof and surrounding windows rise or fall with the up-and-down swing of the palm, while a front-and-back swing opens or closes the sunroof. The seat belt can also be slowly tightened and the vehicle decelerated according to output signals from the external vehicle condition sensor.
A gesture recognition method for an automobile cabin area based on a TOF camera comprises the following steps:
(1) Collecting a real-time gesture image of the user with the TOF camera: the TOF module emits pulsed light, a built-in sensor receives the light reflected back from the user's hand, and a real-time depth image of the user's gesture is obtained from the time difference between emission and reception.
(2) Optimizing an image obtained by the TOF camera by adopting a deep Convolutional Neural Network (CNN);
(3) Processing the image with a depth fusion framework to obtain a high-resolution depth map;
(4) Removing ambient light noise by adopting a defogging algorithm to obtain a high-resolution gray level image;
(5) Obtaining a fused high-resolution depth image through image fusion, obtaining a three-dimensional gesture model through point cloud data processing, deriving a three-dimensional hand mesh with a three-dimensional gesture regressor, rendering the hand mesh with a mesh renderer to obtain a real-time three-dimensional gesture image of the hand, testing and matching the data, and outputting the matched signal to the execution layer, where the underlying hardware of the response module (central control screen, media speaker, vehicle sunroof and the like) responds.
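Step (1) rests on the pulse time-of-flight principle: depth is half the round-trip distance travelled by the light pulse. A minimal sketch (not from the patent) of that conversion:

```python
# Minimal illustration: a TOF sensor measures the time between emitting a
# light pulse and receiving its reflection; depth is half the round-trip
# distance travelled at the speed of light.
C = 299_792_458.0  # speed of light, m/s

def tof_depth(time_diff_s: float) -> float:
    """Depth in metres from the emit-to-receive time difference."""
    return C * time_diff_s / 2.0

# A hand about 0.5 m from the camera returns the pulse after ~3.34 ns.
depth_m = tof_depth(3.336e-9)
```

In practice a TOF camera measures this time difference per pixel (often via phase shift of modulated light), producing a full depth map rather than a single value.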
Further, optimizing the image with the deep convolutional neural network comprises the following steps:
(2-1) Producing a TOF dataset. A large number of simulated TOF datasets are produced on a computer-graphics basis, containing TOF raw measurements with sufficient error simulation, high-resolution depth ground truth, and corresponding RGB images.
(2-2) Performing error correction on the TOF depth map. A cascaded iterative convolutional neural network is proposed to repair errors in the depth map step by step through residual prediction.
(2-3) RGB-guided super-resolution of the TOF depth map. The modal difference between the RGB image and the depth map is analysed; a pre-fusion module, a post-fusion module and a cross-modal spatial attention fusion module are designed, and a second-order gradient smoothing consistency loss is applied in the loss function.
(2-4) Acceleration and deployment of the TOF imaging optimization algorithm on low-power embedded devices. Through network structure optimization, the time complexity of the method is reduced with minimal accuracy loss, improving the running speed of the algorithm while maintaining good accuracy.
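The cascade in step (2-2) can be pictured as repeated residual correction: each stage predicts an error map that is added back to the depth. The sketch below is a toy stand-in, assuming each learned CNN stage is replaced by a simple local-mean residual, to show the iterative structure only:

```python
import numpy as np

def box_smooth(d: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge padding (stand-in for a learned stage)."""
    h, w = d.shape
    p = np.pad(d, 1, mode="edge")
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def residual_stage(d: np.ndarray) -> np.ndarray:
    # A real stage is a small CNN trained to predict the depth error;
    # this toy "predicted residual" simply pulls each pixel toward its
    # local mean, which suppresses zero-mean sensor noise.
    return box_smooth(d) - d

def cascade_repair(depth: np.ndarray, n_stages: int = 3) -> np.ndarray:
    """Iteratively apply residual corrections, as in a cascaded network."""
    for _ in range(n_stages):
        depth = depth + residual_stage(depth)
    return depth
```

On a flat surface corrupted by noise, each pass of `depth + residual` moves the map closer to the true plane, which is the behaviour the trained cascade aims for on real TOF errors.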
Further, the depth fusion framework processes the image through the following steps:
(3-1) projecting depth information obtained by the TOF camera onto a reference stereoscopic camera view angle;
(3-2) calculating by a stereo matching algorithm to obtain a high-resolution depth map;
(3-3) estimating the confidence of the stereoscopic parallax and the TOF depth map by using a CNN network;
(3-4) fusing the upsampled TOF output result and the stereoscopic parallax; the depth map precision can be effectively improved through the fusion frame.
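Step (3-4) can be sketched as a per-pixel convex combination of the two depth estimates, weighted by the confidences estimated in step (3-3). This is an illustrative sketch, not the patent's exact fusion rule:

```python
import numpy as np

def fuse_depth(tof_up: np.ndarray, stereo: np.ndarray,
               conf_tof: np.ndarray, conf_stereo: np.ndarray) -> np.ndarray:
    """Blend upsampled TOF depth with stereo depth, per pixel, according
    to their (e.g. CNN-estimated) confidence maps."""
    w = conf_tof / (conf_tof + conf_stereo + 1e-8)  # weight in [0, 1]
    return w * tof_up + (1.0 - w) * stereo
```

Where the TOF confidence dominates (e.g. textureless regions that defeat stereo matching) the output follows the TOF depth, and vice versa, which is how the fusion improves overall depth-map precision.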
Further, the defogging algorithm removes ambient light noise to obtain a high-resolution gray-scale image through the following steps:
(4-1) setting different thresholds based on the ambient light difference to divide the region where the ambient light difference is large;
(4-2) defogging the bright area by adopting a contrast map priori defogging algorithm, and processing the dark area by adopting a dark channel priori defogging algorithm;
(4-3) adopting a dynamic threshold white balance algorithm to adjust the brightness and color of the restored image;
This processing method effectively solves the problems of colour distortion and quality degradation in defogged images caused by differences in ambient light, improves the robustness of the output image, and produces restored images with higher colour fidelity and definition that better match human visual perception.
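The dark channel prior used for dark regions in step (4-2) can be sketched as follows; a minimal version under the usual atmospheric scattering model I = J*t + A*(1-t), with a simplified airlight A assumed known:

```python
import numpy as np

def dark_channel(img: np.ndarray, patch: int = 3) -> np.ndarray:
    """Per-pixel minimum over the colour channels and a local patch."""
    mins = img.min(axis=2)
    r = patch // 2
    p = np.pad(mins, r, mode="edge")
    h, w = mins.shape
    out = np.empty_like(mins)
    for i in range(h):
        for j in range(w):
            out[i, j] = p[i:i + patch, j:j + patch].min()
    return out

def dehaze(img: np.ndarray, airlight: float = 1.0,
           omega: float = 0.95, t_min: float = 0.1) -> np.ndarray:
    """Recover scene radiance J from I = J*t + A*(1-t).
    Transmission t is estimated from the dark channel prior."""
    t = 1.0 - omega * dark_channel(img / airlight)
    t = np.clip(t, t_min, 1.0)  # avoid division blow-up in dense haze
    return (img - airlight) / t[..., None] + airlight
```

A haze-free image (dark channel near zero) passes through unchanged, while a uniformly hazy image is pushed back toward its unveiled radiance.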
Further, the method for identifying the gesture of the automobile cabin domain based on the TOF camera is characterized in that the image fusion method is a multi-scale transformation image fusion method, and the method comprises the following steps:
(1) Respectively carrying out multi-scale decomposition on the original image to obtain a series of sub-images of a transformation domain;
(2) Extracting the most effective features on each scale in the transformation domain by adopting a certain fusion rule to obtain a composite multi-scale representation;
(3) Performing multi-scale inverse transformation on the composite multi-scale representation to obtain a fused image;
This processing method yields a high-resolution fused depth image of the user's hand;
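The three fusion steps above can be sketched at their smallest scale: a hypothetical two-level stand-in for a full wavelet or pyramid transform, decomposing each image into a coarse base and a detail residual, fusing details with a max-absolute rule, then inverting the transform:

```python
import numpy as np

def down(img: np.ndarray) -> np.ndarray:
    """2x2 mean downsample (coarse scale); assumes even dimensions."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(img: np.ndarray) -> np.ndarray:
    """Nearest-neighbour upsample back to the fine scale."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def fuse_multiscale(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Step 1: multi-scale decomposition into base + detail sub-images.
    base_a, base_b = down(a), down(b)
    det_a, det_b = a - up(base_a), b - up(base_b)
    # Step 2: fusion rule - keep the stronger detail, average the base.
    detail = np.where(np.abs(det_a) >= np.abs(det_b), det_a, det_b)
    base = (base_a + base_b) / 2.0
    # Step 3: multi-scale inverse transform.
    return up(base) + detail
```

Fusing an image with itself returns it exactly, confirming that the decomposition/inverse pair is lossless; real schemes use more levels and smoother filters.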
Further, the method for generating the three-dimensional gesture image from the depth map point cloud is as follows: an undistort operation is performed on the image coordinate system to obtain the point cloud coordinate system, and the transformation formula is:
x = (x' - c_x) * D / f_x,  y = (y' - c_y) * D / f_y,  z = D
wherein (x, y, z) are the point cloud coordinates, (x', y') are the image coordinates, D is the depth value, and f_x, f_y, c_x, c_y are the focal lengths and principal point given by the TOF camera intrinsic parameters;
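Assuming the standard pinhole back-projection with intrinsics f_x, f_y, c_x, c_y (names introduced here for illustration; the patent text only names x, y, z, x', y' and D), the depth-map-to-point-cloud step can be sketched as:

```python
import numpy as np

def depth_to_pointcloud(depth: np.ndarray, fx: float, fy: float,
                        cx: float, cy: float) -> np.ndarray:
    """Back-project an H x W depth map into an (H*W) x 3 point cloud.
    (x', y') are pixel coordinates; D is the depth value at that pixel."""
    h, w = depth.shape
    xp, yp = np.meshgrid(np.arange(w), np.arange(h))  # pixel grids x', y'
    x = (xp - cx) * depth / fx
    y = (yp - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```

The pixel at the principal point maps to (0, 0, D), i.e. a point straight along the optical axis, which is a quick sanity check on the intrinsics.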
further, the method for generating the three-dimensional gesture by the point cloud image comprises the following steps:
(1) Preprocessing of point cloud images
(2) Single-frame point cloud gesture recognition algorithm for template matching
(3) Gesture recognition algorithm of continuous point cloud
With this method, edge detection is performed on the user's gesture, boundary points and interior points are separated, fingertips and bent joints are detected and marked, the three-dimensional spatial positions of the obtained key points are fitted to a hand skeleton model in the gesture library, and the matched gesture instruction signal is output to the response module;
further, the gesture library is matched, and the response of the execution layer mainly comprises the following gestures and the response of the bottom hardware: when the index finger is touched with the thumb, the distance between the thumb and the index finger is changed to output the media volume adjusting signal, and the size of the media volume is changed; when the palms are nearly parallel, the rest four points except the thumb swing downwards to swing upwards and downwards, the vehicle window descends along with the movement, and otherwise ascends; when the palm stands up and swings forward and backward, the sunroof at the top of the vehicle is opened or closed forward and backward along with the palm; when the human-computer interaction relates to a vehicle-mounted screen, a fist is held and a forefinger is singly extended to move left and right up and down, the left and right page turning of a screen page can be controlled, the sliding of the up and down contents can be controlled, when the joint of the forefinger has larger angle change, a single-click screen command is responded at the corresponding position of the screen once, the time interval is less than 0.5 seconds, the double-click response of the screen is obtained when the single-machine command continuously appears twice, and the double-click operation is carried out on the screen once; when a fist-making gesture command occurs, the user can autonomously judge whether the running environment is safe or not, and the vehicle immediately enters a decelerating running state.
Compared with the prior art, the invention has the following beneficial effects:
1. The system processes gesture information effectively and efficiently; under the CNN optimization algorithm it reduces device power consumption and computation time, and enhances the robustness of the cabin domain system;
2. The external vehicle condition sensor assisting the perception layer detects the external environment during driving, further ensuring the driving safety of the user;
3. The optimization algorithm, defogging algorithm, image fusion method and point-cloud-to-three-dimensional-gesture method effectively improve the resolution of the user's gesture image and the robustness of gesture recognition in complex environments;
4. Compared with traditional touch-screen control, gesture recognition improves operability for the user; the gestures are simple, easy to understand and easy to remember, cover most operable functions in the cabin, and improve the user's human-machine interaction experience.
Drawings
FIG. 1 is a block diagram of a system architecture of the present invention;
FIG. 2 is a block diagram of the method of the present invention;
FIG. 3 is a block diagram of a CNN optimization method of the present invention;
FIG. 4 is a depth fusion framework of the present invention;
FIG. 5 is a block diagram of an image defogging model of the present invention;
FIG. 6 is a block diagram of a multi-scale transformed image fusion of the present invention;
FIG. 7 is a graph of the point cloud versus depth coordinate system change matrix of the present invention;
FIG. 8 is a block diagram of three-dimensional features of a point cloud processing generated gesture of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
Examples
As shown in fig. 1, the gesture recognition system for the automobile cabin area based on the TOF camera comprises a perception layer, a decision layer and an execution layer which are sequentially connected; wherein:
the sensing layer consists of a TOF camera and an external vehicle condition sensor; the TOF camera is used for acquiring gesture information in the cabin range in real time, acquiring a hand depth image of a user, the external vehicle condition sensor is used for assisting the gesture recognition system to detect the external environment of the vehicle, if the external environment or the driving state is abnormal, the safety belt can be slowly tightened, the vehicle is decelerated, and then a fist-making gesture signal is matched.
The decision layer consists of the cabin domain controller and the chip module. The received perception layer signals are processed by the decision layer and sent to the execution layer. A visual domain module and an intelligent information domain module are added in the cabin domain controller to increase visual image tracking processing capacity and information transmission efficiency respectively, providing a better execution environment for the user gesture recognition algorithm. Meanwhile, three steps of logic verification, signal processing and model verification are arranged in the chip module; these steps accelerate the recognition of gesture signals and the efficiency of matching against the gesture library, match and verify the acquired user gesture information, and send out the corresponding command signals.
The execution layer consists of a response module; the response module receives the command signal output by the decision layer, executes the command, and presents the output through the central control screen, media speaker, vehicle sunroof, seat belt and the like.
When receiving a command signal, the central control screen, media speaker and vehicle sunroof of the response module make different responses: the central control screen responds to gestures with up-and-down scrolling, left-and-right page turning and mid-air clicking; the media speaker adjusts the volume according to the distance between the thumb and the index finger; the vehicle sunroof and surrounding windows rise or fall with the up-and-down swing of the palm, while a front-and-back swing opens or closes the sunroof.
In this embodiment, a flowchart of a specific working method of the gesture recognition process of the decision layer is shown in fig. 2, and includes:
and acquiring a user real-time gesture image through a TOF camera, transmitting pulse light by the TOF module, and receiving light reflected by the hand of the user by utilizing a built-in sensor. Then, according to the time difference between the two, a gesture depth image of a user in real time is obtained and transmitted to a cabin controller, then a gesture signal is processed by a vision domain module and an intelligent information domain module, and is processed again through three steps of logic verification, signal processing and model verification of a chip module, wherein the chip module acquires the gesture image of the user in real time, and an optimization algorithm is adopted to optimize the image obtained by a TOF camera, so that the processing speed is improved; processing the image by adopting a depth fusion frame to obtain a high-precision depth map; removing ambient light noise by adopting a defogging algorithm to obtain a high-resolution gray level image; the method comprises the steps of obtaining a fusion depth high-resolution image through image fusion, obtaining a three-dimensional gesture model through point cloud data processing, processing a three-dimensional hand network through a three-dimensional gesture regressive, rendering the hand network through a grid renderer, obtaining a real-time three-dimensional gesture image of the hand, testing and matching data, outputting a signal of matching information to an execution layer, and responding to bottom hardware such as a central control screen, a media sound, an automobile skylight, a safety belt and the like of a response module.
In this embodiment, a specific working method flowchart of the optimization method for TOF images, as shown in fig. 3, includes:
S1, making a simulated TOF data set. A large number of simulated TOF data sets are produced graphically; each contains TOF raw measurements with sufficient error simulation, high-resolution depth ground truth, and the corresponding RGB image.
S2, performing error repair on the TOF depth map. A cascaded iterative convolutional neural network is provided to repair errors in the depth map step by step through residual prediction.
S3, RGB-guided super-resolution of the TOF depth map. The modal difference between the RGB image and the depth map is analyzed; a pre-fusion module, a post-fusion module and a cross-modal spatial attention fusion module are designed; and a second-order gradient smoothing consistency loss is applied in the loss function.
S4, accelerating and deploying the TOF imaging optimization algorithm on low-power embedded devices. Through optimized network structure design, the time complexity of the method is reduced with minimal loss of accuracy.
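The cascaded residual repair of step S2 can be illustrated with a toy refinement loop. The sketch below is an assumption for illustration only: a hand-written box filter stands in for a learned CNN stage, and `cascade_repair` applies the residual-prediction update that a trained cascade would learn:

```python
import numpy as np

def smooth(depth: np.ndarray) -> np.ndarray:
    """Toy stand-in for one CNN stage: a 3x3 box filter built from shifts."""
    padded = np.pad(depth, 1, mode="edge")
    acc = np.zeros_like(depth, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += padded[1 + dy: 1 + dy + depth.shape[0],
                          1 + dx: 1 + dx + depth.shape[1]]
    return acc / 9.0

def cascade_repair(depth: np.ndarray, stages: int = 3) -> np.ndarray:
    """Cascaded residual prediction: each stage adds a correction term."""
    d = depth.astype(float)
    for _ in range(stages):
        residual = smooth(d) - d   # a real network would *learn* this term
        d = d + 0.5 * residual     # step-by-step repair of depth errors
    return d
```

Each iteration damps isolated depth errors while the cascade keeps the update small per stage, mirroring the step-by-step repair described in S2.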
The TOF simulation data set mainly covers scenes such as the driver's seat, the front passenger seat and the rear cabin space. Generation of the data set comprises a scene construction part and an imaging rendering part. Scene construction is mainly based on a Beidou-satellite vehicle cabin model; 3D models are used to build in-cabin scenes under different light intensities so as to simulate the actual usage scenes of the TOF camera. Imaging rendering of the correlation coefficient maps uses transient rendering: the rendering result is amplitude-modulated by the light received by the rendering camera in each time segment to obtain a simulated correlation coefficient map. Because the underlying renderer is based on the ray-tracing principle, it can simulate multipath errors, and it can render time-resolved images for the random positions and orientations of the user's gestures. Assuming the rendering time interval is divided into N small segments, a rendering result R of size H × W × N is obtained, and the correlation coefficient map to be simulated can be obtained by the following formula:
where N is the number of time segments, R is the time-resolved rendering result, f is the modulation frequency, τ is the duration of a time segment (in seconds), and C_0 and C_1 are the resulting correlation coefficient maps at different phase offsets.
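The formula itself is not reproduced in this text. A standard continuous-wave TOF correlation of the assumed form C_ψ[h, w] = Σ_n R[h, w, n]·cos(2πfnτ + ψ) is consistent with the variables listed above and can be sketched as:

```python
import numpy as np

def correlation_map(R: np.ndarray, f: float, tau: float, psi: float) -> np.ndarray:
    """Correlate a time-resolved rendering R (H x W x N) with a cosine
    reference of modulation frequency f, segment duration tau and phase
    offset psi. This standard CW-TOF form is an assumption; the patent's
    exact formula is not reproduced in the source text."""
    n = np.arange(R.shape[2])
    ref = np.cos(2.0 * np.pi * f * n * tau + psi)
    return np.tensordot(R, ref, axes=([2], [0]))

# C0 and C1 at two phase offsets, as in the description.
R = np.random.default_rng(0).random((4, 4, 16))
C0 = correlation_map(R, f=20e6, tau=1e-9, psi=0.0)
C1 = correlation_map(R, f=20e6, tau=1e-9, psi=np.pi / 2)
```

The two phase-offset maps C0 and C1 are the inputs from which a TOF pipeline recovers phase, and hence depth, per pixel.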
The optimization method can improve the operation speed of the algorithm and simultaneously maintain good accuracy.
In this embodiment, a specific workflow of image processing by the depth fusion framework, as shown in fig. 4, includes:
S1, projecting the depth information obtained by the TOF camera onto the reference stereo camera view angle;
S2, computing a high-resolution depth map with a stereo matching algorithm;
S3, estimating the confidence of the stereo disparity and of the TOF depth map with a CNN;
S4, fusing the upsampled TOF output with the stereo disparity.
This fusion framework effectively improves the accuracy of the depth map.
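Steps S3 and S4 can be sketched as a confidence-weighted blend. The fusion rule below is one plausible reading of "fusing the upsampled TOF output and the stereo disparity", since the text does not spell the rule out:

```python
import numpy as np

def fuse_depth(tof_up: np.ndarray, stereo: np.ndarray,
               conf_tof: np.ndarray, conf_stereo: np.ndarray) -> np.ndarray:
    """Confidence-weighted fusion of the upsampled TOF depth map and the
    stereo depth map. The per-pixel confidences would come from the CNN of
    step S3; here they are plain arrays in [0, 1]."""
    w = conf_tof + conf_stereo
    w = np.where(w == 0, 1.0, w)  # avoid division by zero where both are 0
    return (conf_tof * tof_up + conf_stereo * stereo) / w
```

Where the TOF confidence dominates, the output follows the TOF depth; where stereo is more reliable (e.g. at fine detail the low-resolution TOF sensor misses), the stereo value wins.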
In this embodiment, a specific working method flowchart of removing ambient light noise by the defogging algorithm to obtain a high-resolution gray scale image is shown in fig. 5, and includes:
S1, setting different thresholds based on ambient light differences, so as to segment regions with large differences in ambient light;
S2, defogging bright regions with a contrast-map prior defogging algorithm, and processing dark regions with a dark channel prior defogging algorithm;
S3, adjusting the brightness and color of the restored image with a dynamic threshold white balance algorithm.
the processing of ambient light converts an original image into a gray image, obtains gradient information of the gray image, denoises the gradient of the gray image, sets a gradient threshold w, judges that the gradient threshold w is smaller than the gradient threshold as an initial area, counts the average brightness of pixel values of corresponding positions of the original image for each divided connected area, and if the average brightness is larger than the brightness threshold w 1 Then remain as bright areas; otherwise, dark areas are set, thereby further refining the bright areas.
This processing method effectively solves the defogging color distortion and image quality degradation caused by ambient light differences and yields higher contrast. It improves the robustness of the output image, and the restored image has higher color fidelity and sharpness, better matching human visual characteristics.
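The bright/dark segmentation described above can be sketched as follows. The function is an illustrative simplification (the connected-component brightness statistics are reduced to a per-pixel test); the names `w` and `w1` follow the thresholds in the text:

```python
import numpy as np

def split_bright_dark(gray: np.ndarray, w: float, w1: float) -> np.ndarray:
    """Return a mask with 1 for bright regions and 0 for dark regions.
    Low-gradient pixels form the candidate regions; a candidate stays
    bright only if its brightness exceeds w1. A full implementation would
    average brightness over each connected component instead."""
    gy, gx = np.gradient(gray.astype(float))
    grad = np.hypot(gx, gy)
    candidate = grad < w                 # initial (low-gradient) regions
    bright = candidate & (gray > w1)     # refine by the brightness threshold
    return bright.astype(np.uint8)
```

The resulting mask selects which prior (contrast-map or dark channel) is applied per region in step S2.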
In this embodiment, a flowchart of a specific working method for multi-scale transformation image fusion, as shown in fig. 6, includes:
S1, performing multi-scale decomposition on each original image to obtain a series of sub-images in the transform domain;
S2, extracting the most effective features at each scale in the transform domain with a chosen fusion rule to obtain a composite multi-scale representation;
S3, applying the multi-scale inverse transform to the composite multi-scale representation to obtain the fused image.
the processing method can obtain the high-resolution fusion depth image of the user hand, adopts a CVT conversion method, has multi-resolution and time-frequency localization analysis characteristics of DWT conversion, has anisotropy and strong directivity, can accurately and sparsely represent the edge information of the image by using fewer non-zero coefficients, integrates most useful information of the image by approximating the non-zero coefficient points after the curve-like singular feature conversion of the image, has more concentrated energy, and is beneficial to analyzing important features such as edges, textures and the like of the image.
In this embodiment, a specific workflow diagram for generating a three-dimensional gesture image by using a depth map point cloud, as shown in fig. 7, includes:
S1, performing an undistortion operation on the image coordinate system to obtain the point cloud coordinate system;
S2, converting the image coordinate system into the world coordinate system;
S3, setting the extrinsic matrix; the world coordinate origin coincides with the camera origin, i.e. there is neither rotation nor translation, so the extrinsic matrix is formed as:
S4, since the coordinate origins of the camera coordinate system and the world coordinate system coincide, the same object has the same depth in camera coordinates and world coordinates, i.e. Z_c = Z_w, which allows the further simplification:
S5, deriving from the transformation matrix the formula that maps an image point [u, v]^T to a world coordinate point [x_w, y_w, z_w]^T, with the constraint condition:
where x, y, z are the point cloud coordinates, x', y' are the image coordinates, and D is the depth value;
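With identity extrinsics (steps S3 and S4), back-projection from the depth map to world points reduces to the standard pinhole relations. The intrinsics `fx`, `fy`, `cx`, `cy` below are assumed parameters, not values from the patent:

```python
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project a depth map into a point cloud with the pinhole model.
    With identity extrinsics (world origin = camera origin) camera
    coordinates equal world coordinates and Z_c = Z_w, so each pixel
    (u, v) with depth z maps to ((u-cx)z/fx, (v-cy)z/fy, z)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # (H, W, 3) world points
```

The pixel at the principal point (cx, cy) back-projects onto the optical axis, i.e. x = y = 0 with z equal to its measured depth.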
in this embodiment, a specific working method for generating a gesture three-dimensional feature map by point cloud processing, as shown in fig. 8, includes:
S1, preprocessing the point cloud image;
S2, a template-matched single-frame point cloud gesture recognition algorithm;
S3, a gesture recognition algorithm for continuous-frame point clouds;
The point cloud preprocessing framework of filtering, segmentation and normal estimation is implemented with statistical noise filtering, a region growing algorithm and point cloud voxelization. The template matching algorithm comprises three steps: preprocessing, plane point extraction and template matching; the preprocessing step pre-classifies the acquired gestures by the number of extended fingers, which reduces the number of matching operations and improves the efficiency of the algorithm. The gesture recognition algorithm for continuous-frame point clouds marks the key points of the gesture and raises the level of abstraction of the algorithm by constructing feature vectors. Through these steps, edge detection is performed on the user's gesture, boundary points and interior points are separated, and the marks of fingertips and bent joints are detected. Finally, the three-dimensional positions of the obtained key points are fitted to the hand skeleton model of the gesture library, and the matched gesture command signal is output to the response module in the execution layer.
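The template-matching step can be sketched as a nearest-neighbour search over key-point feature vectors. The gesture library entries below are hypothetical placeholders (normalized per-finger extension values), not the patent's templates:

```python
from typing import Optional
import numpy as np

# Hypothetical gesture library: name -> feature vector, e.g. one normalized
# extension value per finger. Values are illustrative only.
GESTURE_LIBRARY = {
    "fist":      np.array([0.2, 0.2, 0.2, 0.2, 0.2]),
    "open_palm": np.array([1.0, 1.0, 1.0, 1.0, 1.0]),
    "point":     np.array([0.2, 1.0, 0.2, 0.2, 0.2]),
}

def match_gesture(features: np.ndarray, max_dist: float = 0.5) -> Optional[str]:
    """Nearest-neighbour template matching on a key-point feature vector;
    returns None when no template is close enough to count as a match."""
    best, best_d = None, float("inf")
    for name, template in GESTURE_LIBRARY.items():
        d = float(np.linalg.norm(features - template))
        if d < best_d:
            best, best_d = name, d
    return best if best_d <= max_dist else None
```

Pre-classifying by the number of extended fingers, as the text describes, would simply restrict the loop to templates with the same finger count before the distance comparison.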
In this embodiment, in the response module, the user's gesture signal is matched against the gesture library, and the responses performed by the execution layer mainly comprise the following gestures and the corresponding responses of the underlying hardware: when the index finger touches the thumb, changing the distance between the thumb and the index finger outputs a media volume adjustment signal and changes the media volume; when the palm is held nearly level, swinging the four fingers other than the thumb up and down makes the vehicle window descend as they swing downward, and ascend otherwise; when the palm is upright and swings forward and backward, the sunroof at the top of the vehicle opens or closes forward and backward with the palm; when the human-computer interaction involves the vehicle-mounted screen, making a fist with only the index finger extended and moving it left, right, up and down controls left-right page turning of the screen page and up-down scrolling of the content; when the index finger joint shows a large angle change, a single-click screen command is issued once at the corresponding screen position, and when two such single-click commands occur in succession with a time interval of less than 0.5 seconds, the screen responds with a double click and one double-click operation is performed on the screen; when a fist-clenching gesture command occurs, the system autonomously judges whether the driving environment is safe, and the vehicle immediately enters a decelerated driving state.
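The 0.5-second double-click window can be illustrated with a small state machine. This is a sketch under the stated timing assumption, not the patent's implementation; timestamps in seconds are supplied by the caller:

```python
class ScreenClickHandler:
    """Turn index-finger joint-angle clicks into single/double-click events,
    using the 0.5 s double-click window from the description."""

    DOUBLE_CLICK_WINDOW = 0.5  # seconds, per the text

    def __init__(self) -> None:
        self._last_click_t = None

    def on_click(self, t: float) -> str:
        """Register a click at time t and report the resulting screen event."""
        if (self._last_click_t is not None
                and t - self._last_click_t < self.DOUBLE_CLICK_WINDOW):
            self._last_click_t = None  # consume the pair
            return "double_click"
        self._last_click_t = t
        return "single_click"
```

Two clicks 0.3 s apart therefore produce one double-click event, while a click arriving later than 0.5 s after the previous one starts a fresh single click.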
In summary, the method is implemented on the basis of a TOF camera and an automobile cabin domain controller, which improves the overall stability of the system. It effectively improves the accuracy of in-vehicle gesture recognition, reduces interference from external ambient light and noise, and improves the robustness of the whole recognition process. The TOF image processing method is optimized to increase processing speed, and multi-scale feature fusion of the depth map and the grayscale map yields a fused high-resolution depth map, greatly improving matching accuracy. The gesture operations are simple and intuitive, which improves user experience and the gesture recognition rate while also improving, to a certain extent, the user's safety while driving.
The present invention is not limited to the above embodiments, but is capable of modification and variation in detail, and other modifications and variations can be made by those skilled in the art without departing from the scope of the present invention.
Claims (10)
1. The gesture recognition system for the automobile cabin area based on the TOF camera is characterized by comprising a perception layer, a decision layer and an execution layer which are sequentially connected; the decision layer receives and processes the signals of the perception layer and then sends them to the execution layer, wherein,
the perception layer consists of a TOF camera and an external vehicle condition sensor; the TOF camera collects gesture information within the cabin in real time, and the external vehicle condition sensor assists the gesture recognition system in perceiving the external environment; if an abnormal environment state is detected, the seat belt in the cabin is slowly tightened and the vehicle is decelerated;
the decision layer consists of a cabin domain controller and a chip module; the cabin domain controller comprises a visual domain module and an intelligent information domain module, which respectively increase the visual image tracking processing capacity and the information transmission efficiency; meanwhile, three steps of logic verification, signal processing and model verification are set in the chip module, the obtained user gesture information is matched and verified, and a corresponding command signal is output;
the execution layer consists of a response module; and the response module receives the command signal output by the decision layer and executes the command.
2. The response module of claim 1, wherein the outputted command signal is displayed through a center screen, a media sound or a sunroof of a vehicle, and a surrounding window.
3. The response module according to claim 2, wherein the central control screen, the media sound system and the vehicle sunroof respond differently upon receiving the command signal: the central control screen performs up-and-down scrolling, left-and-right page turning and mid-air click responses according to the gesture; the media sound system adjusts the volume according to the distance between the thumb and the index finger; the vehicle sunroof and the surrounding windows rise or fall according to up-and-down swings of the palm, while front-and-back swings open or close the sunroof.
4. The gesture recognition method for the automobile cabin area based on the TOF camera is characterized by comprising the following steps of:
(1) Acquiring a real-time gesture image of the user through the TOF camera: the TOF module emits pulsed light, a built-in sensor receives the light reflected from the user's hand, and a real-time gesture depth image of the user is obtained from the time difference between the two;
(2) Optimizing an image obtained by the TOF camera by adopting a CNN network;
(3) Processing the image by adopting a depth fusion frame to obtain a high-resolution depth map;
(4) Removing ambient light noise by adopting a defogging algorithm to obtain a high-resolution gray level image;
(5) Obtaining a fused high-resolution depth image through image fusion, obtaining a three-dimensional gesture model through point cloud data processing, processing the three-dimensional hand mesh with a three-dimensional gesture regressor, rendering the hand mesh with a mesh renderer to obtain a real-time three-dimensional gesture image of the hand, test-matching the data, outputting a matching information signal to the execution layer, and responding with the underlying hardware in the response module.
5. The method for recognizing gesture in car cockpit area based on TOF camera according to claim 4, wherein said CNN network image optimizing step comprises:
(1-1) making TOF datasets
Creating a plurality of simulated TOF data sets graphically, each including TOF raw measurements with sufficient error simulation, high-resolution depth ground truth, and the corresponding RGB image;
(1-2) error correction of TOF depth map
Providing a cascaded iterative convolutional neural network to repair errors in the depth map step by step through residual prediction;
(1-3) super resolution of RGB directed TOF depth map
Analyzing the modal difference between the RGB image and the depth map, designing a pre-fusion module, a post-fusion module and a cross-modal spatial attention fusion module, and applying a second-order gradient smoothing consistency loss in the loss function;
(1-4) acceleration and deployment of TOF imaging optimization algorithm on low-power consumption embedded device
Through optimized network structure design, the time complexity of the method is reduced with minimal loss of accuracy.
6. The method for recognizing gesture in car cockpit area based on TOF camera in claim 4, wherein said depth fusion frame processes the image, comprising:
(2-1) projecting depth information obtained by the TOF camera onto a reference stereoscopic camera view angle;
(2-2) calculating by a stereo matching algorithm to obtain a high-resolution depth map;
(2-3) estimating the confidence of the stereoscopic parallax and the TOF depth map by using a CNN network;
(2-4) fusing the upsampled TOF output result and the stereoscopic parallax.
7. The method for recognizing gesture in automobile cabin area based on TOF camera according to claim 4, wherein the step of removing ambient light noise by defogging algorithm to obtain high resolution gray scale image comprises:
(3-1) dividing a region where the ambient light difference is large by setting different thresholds based on the ambient light difference;
(3-2) defogging the bright area by adopting a contrast map priori defogging algorithm, and processing the dark area by adopting a dark channel priori defogging algorithm;
(3-3) adjusting the brightness and color of the restored image by adopting a dynamic threshold white balance algorithm.
8. The method for recognizing gesture in automobile cabin based on TOF camera according to claim 4, wherein the image fusion method is a multi-scale image fusion method, comprising the steps of:
(4-1) respectively carrying out multi-scale decomposition on the original images to obtain a series of sub-images of a transformation domain;
(4-2) extracting the most effective features on each scale in the transformation domain by adopting a certain fusion rule to obtain a composite multi-scale representation;
(4-3) performing multi-scale inverse transformation on the composite multi-scale representation to obtain a fused image.
9. The method for recognizing the gesture in the automobile cabin area based on the TOF camera according to claim 8, wherein the method for processing the point cloud data is as follows:
(4-5) performing an undistortion operation on the image coordinate system to obtain the point cloud coordinate system, wherein the transformation formula is as follows:
wherein x, y, z are the point cloud coordinate system, x 'y' is the image coordinate system, and D is the depth value;
(4-6) preprocessing the point cloud image;
(4-7) a single-frame point cloud gesture recognition algorithm matched with a template;
(4-8) a gesture recognition algorithm of the continuous point cloud;
and finally, matching the three-dimensional space position of the obtained key point with a gesture library, and outputting a matched gesture instruction signal to a response module.
10. The method for recognizing the gesture in the automobile cabin area based on the TOF camera according to claim 9, wherein the gesture library is matched, and the response of the execution layer mainly comprises the following gestures and the corresponding responses of the underlying hardware: when the index finger touches the thumb, changing the distance between the thumb and the index finger outputs a media volume adjustment signal and changes the media volume; when the palm is held nearly level, swinging the four fingers other than the thumb up and down makes the vehicle window descend as they swing downward, and ascend otherwise; when the palm is upright and swings forward and backward, the sunroof at the top of the vehicle opens or closes forward and backward with the palm; when the human-computer interaction involves the vehicle-mounted screen, making a fist with only the index finger extended and moving it left, right, up and down controls left-right page turning of the screen page and up-down scrolling of the content; when the index finger joint shows a large angle change, a single-click screen command is issued once at the corresponding screen position, and when two such single-click commands occur in succession with a time interval of less than 0.5 seconds, the screen responds with a double click and one double-click operation is performed on the screen; when a fist-clenching gesture command occurs, the system autonomously judges whether the driving environment is safe, and the vehicle immediately enters a decelerated driving state.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310284871.XA CN116449947B (en) | 2023-03-22 | 2023-03-22 | Automobile cabin domain gesture recognition system and method based on TOF camera |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310284871.XA CN116449947B (en) | 2023-03-22 | 2023-03-22 | Automobile cabin domain gesture recognition system and method based on TOF camera |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116449947A true CN116449947A (en) | 2023-07-18 |
CN116449947B CN116449947B (en) | 2024-02-02 |
Family
ID=87126619
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310284871.XA Active CN116449947B (en) | 2023-03-22 | 2023-03-22 | Automobile cabin domain gesture recognition system and method based on TOF camera |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116449947B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117218716A (en) * | 2023-08-10 | 2023-12-12 | 中国矿业大学 | DVS-based automobile cabin gesture recognition system and method |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110286676A1 (en) * | 2010-05-20 | 2011-11-24 | Edge3 Technologies Llc | Systems and related methods for three dimensional gesture recognition in vehicles |
CN108803426A (en) * | 2018-06-27 | 2018-11-13 | 常州星宇车灯股份有限公司 | A kind of vehicle device control system based on TOF gesture identifications |
US20190302895A1 (en) * | 2018-03-27 | 2019-10-03 | Usens Inc. | Hand gesture recognition system for vehicular interactive control |
CN112507924A (en) * | 2020-12-16 | 2021-03-16 | 深圳荆虹科技有限公司 | 3D gesture recognition method, device and system |
KR20210057358A (en) * | 2019-11-12 | 2021-05-21 | 주식회사 에스오에스랩 | Gesture recognition method and gesture recognition device performing the same |
CN113448429A (en) * | 2020-03-25 | 2021-09-28 | 南京人工智能高等研究院有限公司 | Method and device for controlling electronic equipment based on gestures, storage medium and electronic equipment |
US20220036050A1 (en) * | 2018-02-12 | 2022-02-03 | Avodah, Inc. | Real-time gesture recognition method and apparatus |
WO2022188259A1 (en) * | 2021-03-08 | 2022-09-15 | 豪威芯仑传感器(上海)有限公司 | Dynamic gesture recognition method, gesture interaction method, and interaction system |
CN115719507A (en) * | 2021-08-23 | 2023-02-28 | 中移(上海)信息通信科技有限公司 | Image identification method and device and electronic equipment |
Non-Patent Citations (1)
Title |
---|
Sun Xiaojing: "Research on Hand Pose Estimation Methods Based on Multimodal Data", CNKI Outstanding Master's Thesis Database, no. 10, pages 22 - 35 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117218716A (en) * | 2023-08-10 | 2023-12-12 | 中国矿业大学 | DVS-based automobile cabin gesture recognition system and method |
CN117218716B (en) * | 2023-08-10 | 2024-04-09 | 中国矿业大学 | DVS-based automobile cabin gesture recognition system and method |
Also Published As
Publication number | Publication date |
---|---|
CN116449947B (en) | 2024-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108345869A (en) | Driver's gesture recognition method based on depth image and virtual data | |
CN104598915B (en) | A kind of gesture identification method and device | |
CN111563446A (en) | Human-machine interaction safety early warning and control method based on digital twin | |
CN109558832A (en) | A kind of human body attitude detection method, device, equipment and storage medium | |
CN108519812B (en) | Three-dimensional micro Doppler gesture recognition method based on convolutional neural network | |
CN105739702A (en) | Multi-posture fingertip tracking method for natural man-machine interaction | |
CN110688965B (en) | IPT simulation training gesture recognition method based on binocular vision | |
CN110047101A (en) | Gestures of object estimation method, the method for obtaining dense depth image, related device | |
CN102426480A (en) | Man-machine interactive system and real-time gesture tracking processing method for same | |
CN110796018A (en) | Hand motion recognition method based on depth image and color image | |
CN103473801A (en) | Facial expression editing method based on single camera and motion capturing data | |
CN104038799A (en) | Three-dimensional television-oriented gesture manipulation method | |
CN116449947B (en) | Automobile cabin domain gesture recognition system and method based on TOF camera | |
CN107621880A (en) | A kind of robot wheel chair interaction control method based on improvement head orientation estimation method | |
CN111158476B (en) | Key recognition method, system, equipment and storage medium of virtual keyboard | |
CN111161160A (en) | Method and device for detecting obstacle in foggy weather, electronic equipment and storage medium | |
CN111695408A (en) | Intelligent gesture information recognition system and method and information data processing terminal | |
CN105930793A (en) | Human body detection method based on SAE characteristic visual learning | |
CN110490165B (en) | Dynamic gesture tracking method based on convolutional neural network | |
CN116466827A (en) | Intelligent man-machine interaction system and method thereof | |
CN114067359B (en) | Pedestrian detection method integrating human body key points and visible part attention characteristics | |
CN113920498B (en) | Point cloud 3D object detection method based on multilayer feature pyramid | |
Zheng | Gesture recognition real-time control system based on YOLOV4 | |
CN113807280A (en) | Kinect-based virtual ship cabin system and method | |
CN111695475A (en) | Method for intelligently controlling household appliances based on NMI |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||