CN116449947A - Automobile cabin domain gesture recognition system and method based on TOF camera


Info

Publication number
CN116449947A
Authority
CN
China
Prior art keywords
gesture
image
tof
tof camera
module
Prior art date
Legal status
Granted
Application number
CN202310284871.XA
Other languages
Chinese (zh)
Other versions
CN116449947B (en)
Inventor
郝敬宾
孙晓凯
张正烜
徐林浩
刘新华
华德正
梁赐
刘晓帆
周皓
Current Assignee
Jiangsu Bdstar Navigation Automotive Electronics Co ltd
Original Assignee
Jiangsu Bdstar Navigation Automotive Electronics Co ltd
Priority date
Filing date
Publication date
Application filed by Jiangsu Bdstar Navigation Automotive Electronics Co ltd
Priority to CN202310284871.XA
Publication of CN116449947A
Application granted
Publication of CN116449947B
Status: Active

Classifications

    • B60R25/2045 Means to switch the anti-theft system on or off by hand gestures
    • B60K35/00 Instruments specially adapted for vehicles; Arrangement of instruments in or on vehicles
    • B60K35/10 Input arrangements, i.e. from user to vehicle, associated with vehicle functions or specially adapted therefor
    • B60R16/037 Electric constitutive elements for occupant comfort, e.g. for automatic adjustment of appliances according to personal settings, e.g. seats, mirrors, steering wheel
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T5/73 Deblurring; Sharpening
    • G06V10/82 Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • B60K2360/146 Instrument input by gesture
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20221 Image fusion; Image merging
    • Y02T10/40 Engine management systems


Abstract

The invention discloses an automobile cabin domain gesture recognition system and method based on a TOF camera. The system comprises a perception layer, a decision layer and an execution layer which are connected in sequence. The perception layer acquires gesture images of the user with a TOF camera, while an external vehicle condition sensor assists in sensing the environment; the decision layer collects the gesture image signals through a cabin domain controller and a chip module and processes them; the execution layer executes gesture commands through a response module. In the method, gesture data are collected by the TOF camera, the gesture signals are transmitted to the cabin domain controller and processed by the chip module, and the external vehicle condition sensor assists in sensing the external environment and outputs vehicle running state signals. After logic verification, signal processing and model verification in the chip module, signals are output to the execution layer, where the response module executes the commands. The system improves the human-machine interaction experience of the user while driving and enhances driving safety and reliability.

Description

Automobile cabin domain gesture recognition system and method based on TOF camera
Technical Field
The invention relates to the field of gesture recognition in automobile cabins, and in particular to an automobile cabin domain gesture recognition system and method based on a TOF camera.
Background
With the development of artificial intelligence, machine vision technology has gradually entered daily life, enriching people's experience and bringing new forms of interaction. Gesture recognition, as an artificial-intelligence interaction mode, lets a user control a device with hand gestures alone, advancing human-machine interaction; as a result, vehicle-mounted gesture interaction systems have entered a period of rapid development.
At present, vehicle-mounted gesture recognition mainly relies on wearable sensing devices or simple static gesture recognition. Wearable devices achieve high accuracy and good robustness, but their cost is relatively high and they are unsuitable for mass production. Static gesture recognition achieves a high recognition rate and is easy to implement, but it no longer matches current production and usage requirements. A TOF camera can generate and track depth images in real time; with a TOF optimization algorithm, a depth fusion algorithm and an image defogging algorithm, a high-resolution depth image of the user's gesture can be obtained, giving a clear advantage in precision and image clarity over a conventional camera. However, TOF imaging also has shortcomings: it is easily disturbed by external ambient light and limited by imaging precision.
Therefore, to overcome the drawbacks of the prior art, it is necessary to design an automobile intelligent cabin gesture recognition system and method based on a TOF camera that solves the above problems.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an automobile cabin domain gesture recognition system and method based on a TOF camera that offer higher recognition precision and efficiency, are not easily disturbed by external noise and ambient light, and provide the user with a good human-machine interaction experience.
To achieve the above object, the invention adopts the following technical scheme:
an automobile cabin domain gesture recognition system based on a TOF camera comprises a perception layer, a decision layer and an execution layer which are sequentially connected; the sensing layer consists of a TOF camera and an external vehicle condition sensor; the decision layer consists of a cabin domain controller and a chip module; the execution layer is composed of response modules.
The TOF camera in the perception layer is used for collecting gesture information in the cabin range in real time, the external vehicle condition sensor is used for assisting the gesture system to detect the external environment, if the detection is abnormal, the user safety belt in the cabin is slowly tightened, and the vehicle is subjected to deceleration operation.
The signal of the receiving perception layer in the decision layer is processed by the decision layer and sent to the execution layer, wherein a visual domain module and an intelligent information domain module are added in the cabin domain controller, so that the functions of increasing the visual image tracking processing capacity and the information transmission efficiency are respectively achieved; and simultaneously, three steps of logic verification, signal processing and model verification are set in the chip module, and the obtained gesture information of the user is subjected to matching verification and a corresponding command signal is sent out.
And the response module in the execution layer receives the command signal output by the decision layer, executes the command and displays the output mode through a central control screen, a media sound, a vehicle skylight and the like.
Further, the response module central control screen, the media sound and the vehicle skylight can make different responses when receiving command signals, wherein the central control screen can perform responses such as up-and-down screen sliding, left-and-right page turning, space clicking and the like according to gestures; the media sound can adjust the volume according to the distance between the thumb and the index finger; the vehicle skylight and the surrounding windows can ascend or descend according to the up-and-down swing of the palm, the front-and-back swing enables the skylight to be opened or closed, the safety belt can be slowly tightened according to the output signals transmitted by the external vehicle condition sensor, and the vehicle is decelerated.
A gesture recognition method for the automobile cabin domain based on a TOF camera comprises the following steps:
(1) Collect a real-time gesture image of the user with the TOF camera: the TOF module emits pulsed light, a built-in sensor receives the light reflected from the user's hand, and a real-time depth image of the user's gesture is obtained from the time difference between emission and reception.
(2) Optimize the image obtained by the TOF camera with a deep convolutional neural network (CNN);
(3) Process the image with a depth fusion framework to obtain a high-resolution depth map;
(4) Remove ambient light noise with a defogging algorithm to obtain a high-resolution grayscale image;
(5) Obtain a fused high-resolution depth image through image fusion, obtain a three-dimensional gesture model through point cloud data processing, process the three-dimensional hand mesh with a three-dimensional gesture regressor, and render the hand mesh with a mesh renderer to obtain a real-time three-dimensional gesture image of the hand; the data are then tested and matched, and the matching signal is output to the execution layer, where the underlying hardware of the response module (central control screen, media audio system, vehicle sunroof, etc.) responds.
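The time-of-flight principle in step (1) can be sketched as follows; the function name and the numeric example are illustrative, not the patent's implementation. Depth is half of the distance light travels during the measured round trip:

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def tof_depth(t_emit, t_return):
    """Pulsed time-of-flight: the pulse travels to the hand and back,
    so depth is half of the round-trip path length."""
    dt = np.asarray(t_return) - np.asarray(t_emit)
    return C * dt / 2.0

# a target 1 m away reflects the pulse back after one round trip
round_trip = 2.0 / C
d = tof_depth(0.0, round_trip)
```

In practice the sensor measures this time difference per pixel, which yields the depth image of step (1).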
Further, optimizing the image with the deep convolutional neural network comprises the following steps:
(2-1) Produce a TOF dataset. A large number of simulated TOF datasets are generated by graphics rendering, containing TOF raw measurements with adequately simulated errors, high-resolution depth ground truth, and the corresponding RGB images.
(2-2) Correct errors in the TOF depth map. A cascaded iterative convolutional neural network is proposed that repairs the errors in the depth map step by step through residual prediction.
(2-3) Apply RGB-guided super-resolution to the TOF depth map. The modal difference between the RGB image and the depth map is analyzed; a pre-fusion module, a post-fusion module and a cross-modal spatial attention fusion module are designed, and a second-order gradient smoothing consistency loss is applied in the loss function.
(2-4) Accelerate and deploy the TOF imaging optimization algorithm on a low-power embedded device. The time complexity of the method is reduced with minimal accuracy loss through optimized network structure design. This optimization increases the running speed of the algorithm while maintaining good accuracy.
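The cascaded residual repair of step (2-2) can be sketched structurally; the stage functions below are toy stand-ins for the trained CNN stages, which the patent does not detail:

```python
import numpy as np

def cascade_refine(depth, stages):
    """Cascaded iterative refinement: each stage predicts a residual
    correction that is added to the current estimate, so depth errors
    are repaired step by step rather than in a single pass."""
    for stage in stages:
        depth = depth + stage(depth)  # residual prediction
    return depth

# toy stages: each removes half of the remaining bias of a depth map
# whose true value is 1.0 m but which reads 0.2 m too far
stages = [lambda d: -0.5 * (d - 1.0) for _ in range(4)]
biased = np.full((4, 4), 1.2)
refined = cascade_refine(biased, stages)  # converges toward 1.0 m
```

The same structure applies when each stage is a learned network: the cascade only ever has to model the remaining error, not the full depth map.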
Further, processing the image with the depth fusion framework comprises:
(3-1) projecting the depth information obtained by the TOF camera onto the reference stereoscopic camera view;
(3-2) computing a high-resolution depth map with a stereo matching algorithm;
(3-3) estimating the confidence of the stereoscopic disparity and of the TOF depth map with a CNN;
(3-4) fusing the upsampled TOF output with the stereoscopic disparity. The fusion framework effectively improves depth map precision.
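A minimal sketch of the confidence-weighted fusion implied by steps (3-3) and (3-4); the per-pixel weighting rule is an assumption, since the patent does not specify how the two confidences are combined:

```python
import numpy as np

def fuse_depth(tof_up, stereo, conf_tof, conf_stereo, eps=1e-6):
    """Blend the upsampled TOF depth and the stereo depth per pixel,
    weighting each source by its estimated confidence."""
    w = conf_tof + conf_stereo + eps
    return (conf_tof * tof_up + conf_stereo * stereo) / w

tof    = np.array([[1.0, 2.0]])
stereo = np.array([[1.2, 2.2]])
c_tof  = np.array([[0.9, 0.1]])  # TOF trusted at the left pixel
c_st   = np.array([[0.1, 0.9]])  # stereo trusted at the right pixel
fused = fuse_depth(tof, stereo, c_tof, c_st)
```

Each output pixel leans toward whichever sensor the CNN trusts more at that location.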
Further, removing ambient light noise with the defogging algorithm to obtain a high-resolution grayscale image comprises the following steps:
(4-1) setting different thresholds based on ambient light differences to segment the regions where the ambient light differs strongly;
(4-2) dehazing the bright regions with a contrast-map-prior defogging algorithm and processing the dark regions with a dark-channel-prior defogging algorithm;
(4-3) adjusting the brightness and color of the restored image with a dynamic-threshold white balance algorithm.
This processing effectively resolves the color distortion and quality degradation that ambient light differences cause during defogging, improves the robustness of the output image, and yields a restored image with higher color fidelity and definition that better matches human visual perception.
Further, the image fusion method is a multi-scale transform image fusion method comprising the following steps:
(1) Perform multi-scale decomposition on each original image to obtain a series of sub-images in the transform domain;
(2) Extract the most effective features at each scale in the transform domain according to a fusion rule to obtain a composite multi-scale representation;
(3) Apply the inverse multi-scale transform to the composite representation to obtain the fused image.
This processing yields a high-resolution fused depth image of the user's hand.
Further, the three-dimensional gesture image is generated from the depth map point cloud as follows: an undistort operation is applied to the image coordinate system to obtain the point cloud coordinate system, with the transformation
z = D, x = (x' - c_x) * z / f_x, y = (y' - c_y) * z / f_y,
where (x, y, z) is the point cloud coordinate system, (x', y') is the image coordinate system, D is the depth value, and f_x, f_y, c_x, c_y are the camera's focal lengths and principal point.
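The back-projection above can be written out with the standard pinhole model; the intrinsic parameters f_x, f_y, c_x, c_y are assumed known from calibration (the original text names only x', y' and D):

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project an undistorted depth image into a camera-frame
    point cloud: z = D, x = (x' - cx) * z / fx, y = (y' - cy) * z / fy."""
    h, w = depth.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))  # pixel coords x', y'
    z = depth
    x = (xs - cx) * z / fx
    y = (ys - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # shape (h, w, 3)

cloud = depth_to_pointcloud(np.full((4, 4), 2.0),
                            fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

Each pixel becomes one 3D point; the resulting cloud feeds the gesture model construction described next.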
Further, the method for generating the three-dimensional gesture from the point cloud image comprises:
(1) preprocessing of the point cloud image;
(2) a single-frame point cloud gesture recognition algorithm based on template matching;
(3) a gesture recognition algorithm for continuous point clouds.
With this method, edge detection is performed on the user's gesture, boundary points and interior points are separated, fingertips and flexed joints are detected and marked, the three-dimensional positions of the obtained key points are fitted to a hand skeleton model of the gesture library, and the matched gesture command signal is output to the response module.
further, the gesture library is matched, and the response of the execution layer mainly comprises the following gestures and the response of the bottom hardware: when the index finger is touched with the thumb, the distance between the thumb and the index finger is changed to output the media volume adjusting signal, and the size of the media volume is changed; when the palms are nearly parallel, the rest four points except the thumb swing downwards to swing upwards and downwards, the vehicle window descends along with the movement, and otherwise ascends; when the palm stands up and swings forward and backward, the sunroof at the top of the vehicle is opened or closed forward and backward along with the palm; when the human-computer interaction relates to a vehicle-mounted screen, a fist is held and a forefinger is singly extended to move left and right up and down, the left and right page turning of a screen page can be controlled, the sliding of the up and down contents can be controlled, when the joint of the forefinger has larger angle change, a single-click screen command is responded at the corresponding position of the screen once, the time interval is less than 0.5 seconds, the double-click response of the screen is obtained when the single-machine command continuously appears twice, and the double-click operation is carried out on the screen once; when a fist-making gesture command occurs, the user can autonomously judge whether the running environment is safe or not, and the vehicle immediately enters a decelerating running state.
Compared with the prior art, the invention has the following beneficial effects:
1. The system processes gesture information effectively and efficiently; under the CNN optimization algorithm it reduces device power consumption and computation time, and it enhances the robustness of the cabin domain system.
2. The auxiliary external vehicle condition sensor of the perception layer monitors the external environment while driving, further ensuring the user's driving safety.
3. The optimization algorithm, defogging algorithm, image fusion method and point-cloud-to-three-dimensional-gesture method effectively improve the resolution of the user's gesture image and the robustness of gesture recognition in complex environments.
4. Compared with conventional touch-screen control, gesture recognition improves hand operability for the user; the gestures are simple, intuitive and easy to remember, cover most operable functions in the cabin, and improve the user's human-machine interaction experience.
Drawings
FIG. 1 is a block diagram of a system architecture of the present invention;
FIG. 2 is a block diagram of the method of the present invention;
FIG. 3 is a block diagram of a CNN optimization method of the present invention;
FIG. 4 is a depth fusion framework of the present invention;
FIG. 5 is a block diagram of an image defogging model of the present invention;
FIG. 6 is a block diagram of a multi-scale transformed image fusion of the present invention;
FIG. 7 shows the transformation matrix between the point cloud and depth coordinate systems of the present invention;
FIG. 8 is a block diagram of three-dimensional features of a point cloud processing generated gesture of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
Examples
As shown in fig. 1, the automobile cabin domain gesture recognition system based on the TOF camera comprises a perception layer, a decision layer and an execution layer which are connected in sequence, wherein:
The perception layer consists of a TOF camera and an external vehicle condition sensor. The TOF camera collects gesture information within the cabin in real time and acquires a depth image of the user's hand; the external vehicle condition sensor assists the gesture recognition system in monitoring the environment outside the vehicle. If the external environment or the driving state is abnormal, the seat belt is slowly tightened, the vehicle is decelerated, and a fist gesture signal is then matched.
The decision layer consists of a cabin domain controller and a chip module. The received perception layer signals are processed by the decision layer and sent to the execution layer. A vision domain module and an intelligent information domain module are added to the cabin domain controller to increase the visual image tracking capacity and the information transmission efficiency respectively, providing a better execution environment for the user gesture recognition algorithm. Meanwhile, three stages of logic verification, signal processing and model verification are arranged in the chip module; these stages accelerate the recognition of gesture signals and the matching against the gesture library, verify the acquired user gesture information, and send out the corresponding command signal.
The execution layer consists of a response module. The response module receives the command signal output by the decision layer, executes the command, and presents the result through the central control screen, media audio system, vehicle sunroof, seat belt and other hardware.
When a command signal is received, the central control screen, media audio system and vehicle sunroof of the response module respond differently: the central control screen supports up-and-down scrolling, left-and-right page turning and mid-air clicking according to the gesture; the media audio volume is adjusted according to the distance between the thumb and the index finger; the sunroof and the side windows rise or fall with the up-and-down swing of the palm, and a forward-and-backward swing opens or closes the sunroof.
In this embodiment, the specific working method of the gesture recognition process of the decision layer, shown in the flowchart of fig. 2, comprises:
A real-time gesture image of the user is collected by the TOF camera: the TOF module emits pulsed light and a built-in sensor receives the light reflected from the user's hand. From the time difference between the two, a real-time gesture depth image of the user is obtained and transmitted to the cabin domain controller. The gesture signal is then processed by the vision domain module and the intelligent information domain module, and processed again through the three stages of logic verification, signal processing and model verification in the chip module. Specifically, the chip module acquires the user's gesture image in real time and optimizes the image obtained by the TOF camera with the optimization algorithm to increase the processing speed; processes the image with the depth fusion framework to obtain a high-precision depth map; removes ambient light noise with the defogging algorithm to obtain a high-resolution grayscale image; obtains a fused high-resolution depth image through image fusion; obtains a three-dimensional gesture model through point cloud data processing; processes the three-dimensional hand mesh with the three-dimensional gesture regressor; and renders the hand mesh with the mesh renderer to obtain a real-time three-dimensional gesture image of the hand. The data are then tested and matched, and the matching signal is output to the execution layer, where the underlying hardware of the response module (central control screen, media audio system, vehicle sunroof, seat belt, etc.) responds.
In this embodiment, the specific working method of the TOF image optimization, shown in the flowchart of fig. 3, comprises:
S1, producing a simulated TOF dataset. A large number of simulated TOF datasets are generated by graphics rendering, containing TOF raw measurements with adequately simulated errors, high-resolution depth ground truth, and the corresponding RGB images.
S2, repairing errors in the TOF depth map. A cascaded iterative convolutional neural network repairs the errors in the depth map step by step through residual prediction.
S3, applying RGB-guided super-resolution to the TOF depth map. The modal difference between the RGB image and the depth map is analyzed; a pre-fusion module, a post-fusion module and a cross-modal spatial attention fusion module are designed, and a second-order gradient smoothing consistency loss is applied in the loss function.
S4, accelerating and deploying the TOF imaging optimization algorithm on a low-power embedded device. The time complexity of the method is reduced with minimal accuracy loss through optimized network structure design.
The TOF simulation dataset mainly covers scenes such as the driver's seat, the front passenger seat and the rear cabin space. Dataset generation consists of a scene construction part and an imaging rendering part. Scene construction is based mainly on the BDStar vehicle cabin model: 3D models are used to build in-cabin scenes under different light intensities that simulate the actual operating scenes of the TOF camera. Imaging rendering of the correlation coefficient maps uses transient rendering: the light received by the rendering camera in each time segment is amplitude-modulated to obtain the simulated correlation coefficient maps. Because the renderer is based on ray tracing, multipath errors can be simulated, and since the position and orientation of the user's gesture are random, the renderer can render the time-resolved image for any pose. Assuming the rendering time interval is divided into N small segments, a rendering result R of size H x W x N is obtained, and the correlation coefficient maps to be simulated are computed as
C_0 = Σ_{n=1}^{N} R_n * cos(2π f n τ),   C_1 = Σ_{n=1}^{N} R_n * sin(2π f n τ),
where N is the number of time segments, R is the time-resolved rendering result, f is the modulation frequency, τ is the duration of a time segment in seconds, and C_0 and C_1 are the resulting correlation coefficient maps at the two phase offsets.
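The amplitude modulation above can be sketched numerically; using cosine and sine (in-phase and quadrature) reference signals for the two phase offsets is an assumption, since the text only says "different phase offsets":

```python
import numpy as np

def correlation_maps(R, f, tau):
    """Amplitude-modulate a time-resolved rendering R of shape
    (H, W, N) to get the two phase-offset correlation maps."""
    n = np.arange(R.shape[-1])
    phase = 2.0 * np.pi * f * n * tau
    c0 = (R * np.cos(phase)).sum(axis=-1)
    c1 = (R * np.sin(phase)).sum(axis=-1)
    return c0, c1

# a single-bounce return landing in time bin 2, with 8 bins covering
# one modulation period, gives correlation phase 2*pi*(2/8) = pi/2
R = np.zeros((1, 1, 8))
R[0, 0, 2] = 1.0
f, tau = 1.0e6, 1.0 / (8 * 1.0e6)
c0, c1 = correlation_maps(R, f, tau)
```

The phase recovered from atan2(c1, c0) encodes the return delay, which is how the correlation maps carry depth information.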
The optimization method can improve the operation speed of the algorithm and simultaneously maintain good accuracy.
In this embodiment, a specific workflow of image processing by the depth fusion framework, as shown in fig. 4, includes:
S1, projecting the depth information obtained by the TOF camera onto the reference stereo camera view angle;
S2, calculating a high-resolution depth map by a stereo matching algorithm;
S3, estimating the confidence of the stereo disparity and of the TOF depth map by using a CNN network;
S4, fusing the up-sampled TOF output result and the stereo disparity.
The depth map accuracy can be effectively improved through this fusion framework.
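The fusion in S4 can be sketched as a confidence-weighted average of the two depth sources. The confidence maps here stand in for the CNN-estimated confidences of step S3, and the weighting rule is one plausible choice rather than the patent's exact method:

```python
import numpy as np

def fuse_depth(tof_up, stereo, conf_tof, conf_stereo, eps=1e-6):
    """Confidence-weighted fusion of an up-sampled ToF depth map and a
    stereo depth map (both H x W, metres). Higher-confidence source
    dominates at each pixel; eps avoids division by zero."""
    w_t = conf_tof / (conf_tof + conf_stereo + eps)
    return w_t * tof_up + (1.0 - w_t) * stereo

# Hypothetical example: ToF is trusted (conf 0.9) over stereo (conf 0.1),
# so the fused depth lands close to the ToF value.
tof_up = np.full((4, 4), 2.00)
stereo = np.full((4, 4), 2.10)
conf_t = np.full((4, 4), 0.9)
conf_s = np.full((4, 4), 0.1)
fused = fuse_depth(tof_up, stereo, conf_t, conf_s)
```

In a real pipeline the confidences vary per pixel, so stereo dominates in textured regions where matching is reliable and ToF dominates in textureless ones.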
In this embodiment, a specific working method flowchart of removing ambient light noise by the defogging algorithm to obtain a high-resolution gray scale image is shown in fig. 5, and includes:
S1, dividing regions with large ambient light differences by setting different thresholds based on the ambient light difference;
S2, defogging bright areas with a contrast-map prior defogging algorithm, and processing dark areas with a dark channel prior defogging algorithm;
S3, adjusting the brightness and color of the restored image with a dynamic threshold white balance algorithm.
For the ambient light processing, the original image is first converted into a grayscale image, its gradient information is obtained, and the grayscale gradient is denoised. A gradient threshold w is set, and pixels whose gradient is below w are taken as initial regions. For each resulting connected region, the average brightness of the corresponding pixels in the original image is counted: if the average brightness exceeds a brightness threshold w_1, the region is kept as a bright area; otherwise it is set as a dark area, thereby further refining the bright areas.
This processing effectively addresses the defogging color distortion and image quality degradation caused by ambient light differences and yields higher contrast. It improves the robustness of the output image, and the restored image has higher color fidelity and clarity, better matching human visual perception.
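The bright/dark segmentation described above can be sketched as follows. This simplified version thresholds per pixel rather than per connected region (a full implementation would label connected components and average the brightness over each one), and the threshold values are illustrative:

```python
import numpy as np

def split_bright_dark(gray, grad_thresh, bright_thresh):
    """Split a grayscale image into candidate bright/dark areas:
    low-gradient pixels form the initial regions, then brightness
    against bright_thresh decides bright vs dark. Per-pixel
    simplification of the region-mean rule described in the text."""
    gy, gx = np.gradient(gray.astype(float))
    grad = np.hypot(gx, gy)
    flat = grad < grad_thresh               # initial (low-gradient) area
    bright = flat & (gray > bright_thresh)  # kept as bright area
    dark = flat & ~bright                   # set as dark area
    return bright, dark

# Synthetic scene: a bright left half (200) and a dark right half (30).
gray = np.zeros((10, 10))
gray[:, :5] = 200.0
gray[:, 5:] = 30.0
bright, dark = split_bright_dark(gray, grad_thresh=10.0, bright_thresh=100.0)
```

Pixels on the boundary between the two halves have a large gradient and are excluded from both masks, which matches the intent of seeding regions only where the ambient light is locally uniform.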
In this embodiment, a flowchart of a specific working method for multi-scale transformation image fusion, as shown in fig. 6, includes:
S1, performing multi-scale decomposition on each original image to obtain a series of transform-domain sub-images;
S2, extracting the most effective features at each scale in the transform domain with a chosen fusion rule to obtain a composite multi-scale representation;
S3, performing the multi-scale inverse transform on the composite multi-scale representation to obtain the fused image.
This processing yields a high-resolution fused depth image of the user's hand. The CVT (curvelet transform) method is adopted: it retains the multi-resolution and time-frequency localization properties of the DWT while adding anisotropy and strong directionality, so the edge information of the image can be represented accurately and sparsely with fewer non-zero coefficients. By approximating the non-zero coefficient points after the curve-like singular features of the image are transformed, most of the useful image information is integrated with more concentrated energy, which aids the analysis of important features such as edges and textures.
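Since the curvelet transform requires specialised libraries, the multi-scale fusion principle of steps S1 to S3 can be illustrated with a single-level base/detail decomposition in plain NumPy. The max-absolute rule on the detail layer and averaging on the base layer are common fusion-rule choices, not necessarily the patent's exact rule:

```python
import numpy as np

def blur(img):
    """3x3 box blur with edge padding; stands in for the low-pass
    filter of a proper multi-scale (wavelet/curvelet) decomposition."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def fuse_two(a, b):
    """One-level multi-scale fusion: decompose each image into a base
    (low-frequency) layer and a detail (high-frequency) layer, fuse the
    details with a max-absolute rule, average the bases, then invert."""
    base_a, base_b = blur(a), blur(b)
    det_a, det_b = a - base_a, b - base_b
    det = np.where(np.abs(det_a) >= np.abs(det_b), det_a, det_b)
    return (base_a + base_b) / 2.0 + det  # inverse transform = base + detail
```

Fusing an image with itself reproduces it exactly, a useful sanity check that the decomposition and inverse are consistent; in the pipeline above, `a` and `b` would be the depth map and the gray image at the same resolution.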
In this embodiment, a specific workflow diagram for generating a three-dimensional gesture image by using a depth map point cloud, as shown in fig. 7, includes:
S1, performing an undistortion operation on the image coordinate system to obtain the point cloud coordinate system;
S2, converting the image coordinate system into the world coordinate system;
S3, setting the extrinsic matrix: the world coordinate origin coincides with the camera origin, i.e., there is no rotation or translation, so the extrinsic matrix is [I | 0];
S4, since the coordinate origins of the camera coordinate system and the world coordinate system coincide, the same object has the same depth under camera and world coordinates, i.e., Z_c = Z_w, allowing further simplification;
S5, deriving from the transformation matrix formula the transformation from an image point [u, v]^T to a world coordinate point [x_w, y_w, z_w]^T, with the constraint condition:
where x, y, z are coordinates in the point cloud coordinate system, x', y' are coordinates in the image coordinate system, and D is the depth value.
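With the extrinsic matrix equal to [I | 0] as in steps S3 and S4, back-projecting an image point [u, v]^T with depth D to a world point reduces to the pinhole camera model. The intrinsic parameters below (fx, fy, cx, cy) are hypothetical values for illustration:

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth D to world coordinates.
    Because the extrinsics are [I | 0] (no rotation or translation),
    Z_c = Z_w and the camera-frame point is the world-frame point."""
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# The principal point at 1 m lies on the optical axis: (0, 0, 1).
pt_center = backproject(320.0, 240.0, 1.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
# A pixel 100 px right of centre at 2 m maps to x = 100 * 2 / 500 = 0.4 m.
pt_right = backproject(420.0, 240.0, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Applying this to every valid pixel of the fused depth map produces the point cloud used by the subsequent gesture recognition steps.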
in this embodiment, a specific working method for generating a gesture three-dimensional feature map by point cloud processing, as shown in fig. 8, includes:
S1, preprocessing the point cloud image;
S2, a template-matching single-frame point cloud gesture recognition algorithm;
S3, a gesture recognition algorithm for continuous point clouds.
The point cloud preprocessing framework of filtering, segmentation and normal estimation is implemented based on statistical noise filtering, a region-growing algorithm and point cloud voxelization. The template matching algorithm comprises three steps: preprocessing, plane point extraction and template matching, where the preprocessing step pre-classifies the acquired gestures by the number of extended fingers so as to reduce the number of matching passes and improve algorithm efficiency. The gesture recognition algorithm for continuous-frame point clouds marks gesture key points and raises the abstraction level of the algorithm by constructing feature vectors. Through these steps, edge detection can be performed on the user's gesture, boundary points and interior points are separated, and fingertip and bent-joint markers are detected; finally, the three-dimensional spatial positions of the obtained key points are fitted to the hand skeleton model of the gesture library, and the matched gesture instruction signal is output to the response module in the execution layer.
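The statistical noise filtering in the preprocessing framework can be sketched as follows. This is a brute-force O(N²) nearest-neighbour version for clarity; production point cloud libraries use a KD-tree for the neighbour search. The k and std_ratio parameters are conventional defaults, not taken from the patent:

```python
import numpy as np

def statistical_filter(points, k=8, std_ratio=1.0):
    """Statistical outlier removal: drop points whose mean distance to
    their k nearest neighbours exceeds the global mean of that statistic
    plus std_ratio times its standard deviation."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    d.sort(axis=1)                          # row-wise ascending distances
    mean_knn = d[:, 1:k + 1].mean(axis=1)   # skip the zero self-distance
    keep = mean_knn <= mean_knn.mean() + std_ratio * mean_knn.std()
    return points[keep]

# A tight 3x3x3 grid of hand-surface points plus one far-away noise point:
g = np.arange(3) * 0.1
cluster = np.array(np.meshgrid(g, g, g)).reshape(3, -1).T  # 27 points
points = np.vstack([cluster, [[5.0, 5.0, 5.0]]])
filtered = statistical_filter(points)       # the noise point is removed
```

The same idea carries over to the voxelization step: after filtering, points are binned into fixed-size voxels and each voxel is represented by its centroid, which bounds the cloud size before template matching.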
In this embodiment, in the response module the user's gesture signal is matched against the gesture library, and the responses performed by the execution layer mainly include the following gestures and underlying hardware responses. When the index finger pinches the thumb, changing the distance between thumb and index finger outputs a media volume adjustment signal and changes the media volume. When the palm is held nearly level, the four fingers other than the thumb waving up and down cause the vehicle window to lower with a downward wave and rise with an upward wave. When the palm is held upright and waved forward and backward, the sunroof at the top of the vehicle opens or closes accordingly. When the human-computer interaction involves the vehicle-mounted screen, making a fist with only the index finger extended and moving it left, right, up and down controls left/right page turning of the screen and up/down scrolling of the content; when the index finger joint shows a large angle change, a single-click screen command is triggered once at the corresponding screen position, and if the single-click command occurs twice in succession with an interval of less than 0.5 seconds, it is treated as a double-click response and a double-click operation is performed on the screen. When a fist-clenching gesture command occurs, the user having autonomously judged whether the driving environment is safe, the vehicle immediately enters a decelerated driving state.
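The 0.5-second single/double-click timing rule above can be sketched as a small dispatcher. This simplified version reports the second tap as a double-tap immediately rather than delaying the single-tap response, and the class and method names are hypothetical:

```python
class ScreenTapDispatcher:
    """Turn index-finger joint-angle tap events into single/double-tap
    commands: two taps within DOUBLE_TAP_WINDOW seconds form a
    double-tap. Gesture detection itself is out of scope here."""
    DOUBLE_TAP_WINDOW = 0.5

    def __init__(self):
        self._last_tap = None

    def on_tap(self, t):
        """Handle a tap at timestamp t (seconds); return the command."""
        if self._last_tap is not None and t - self._last_tap < self.DOUBLE_TAP_WINDOW:
            self._last_tap = None           # consume the pair
            return "double_tap"
        self._last_tap = t
        return "single_tap"

d = ScreenTapDispatcher()
events = [d.on_tap(0.0), d.on_tap(0.3), d.on_tap(2.0)]
# events == ["single_tap", "double_tap", "single_tap"]
```

A production version would buffer the first tap and only emit "single_tap" after the window expires, so that a double-tap does not also trigger a single-tap action.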
In summary, the method is implemented on a TOF camera and an automobile cabin domain controller, which improves the overall stability of the system. It can effectively improve the accuracy of vehicle-mounted gesture recognition, reduce interference from external ambient light and noise, and improve the robustness of the whole recognition process. The processing of TOF images is optimized to increase the operation speed, and multi-scale feature fusion of the depth map and the gray map yields a fused high-resolution depth map, greatly improving matching recognition accuracy. The gesture operations are simple and intuitive, improving user experience and the gesture recognition rate while also improving, to a certain extent, the user's safety during driving.
The present invention is not limited to the above embodiments, but is capable of modification and variation in detail, and other modifications and variations can be made by those skilled in the art without departing from the scope of the present invention.

Claims (10)

1. The gesture recognition system for the automobile cabin area based on the TOF camera is characterized by comprising a perception layer, a decision layer and an execution layer which are sequentially connected; the decision layer receives the signal processing of the sensing layer and then sends the signal processing to the execution layer, wherein,
the sensing layer consists of a TOF camera and an external vehicle condition sensor; the TOF camera is used for collecting gesture information in the cabin range in real time, the external vehicle condition sensor is used for assisting the gesture recognition system to sense the external environment, if the detected environment state is abnormal, the safety belt in the cabin is slowly tightened, and the vehicle is subjected to deceleration operation;
the decision layer consists of a cabin domain controller and a chip module; the cabin domain controller comprises a visual domain module and an intelligent information domain module, which respectively serve to increase visual image tracking processing capability and information transmission efficiency; meanwhile, three steps of logic verification, signal processing and model verification are set in the chip module to perform matching verification on the obtained user gesture information and output a corresponding command signal;
the execution layer consists of a response module; and the response module receives the command signal output by the decision layer and executes the command.
2. The TOF camera-based automobile cabin domain gesture recognition system according to claim 1, wherein the command signal output by the response module is presented through a central control screen, a media sound system, or the vehicle sunroof and surrounding windows.
3. The TOF camera-based automobile cabin domain gesture recognition system according to claim 2, wherein the central control screen, the media sound system and the vehicle sunroof respond differently when receiving the command signal: the central control screen performs up-down sliding, left-right page turning and air-click responses according to the gesture; the media sound system adjusts the volume according to the distance between thumb and index finger; the vehicle sunroof and surrounding windows rise or fall with up-and-down waves of the palm, and forward-and-backward waves open or close the sunroof.
4. The gesture recognition method for the automobile cabin area based on the TOF camera is characterized by comprising the following steps of:
(1) Acquiring a real-time gesture image of the user through the TOF camera: the TOF module emits pulsed light, a built-in sensor receives the light reflected from the user's hand, and a real-time gesture depth image of the user is obtained from the time difference between emission and reception;
(2) Optimizing an image obtained by the TOF camera by adopting a CNN network;
(3) Processing the image by adopting a depth fusion frame to obtain a high-resolution depth map;
(4) Removing ambient light noise by adopting a defogging algorithm to obtain a high-resolution gray level image;
(5) Obtaining a fused high-resolution depth image through image fusion; obtaining a three-dimensional gesture model through point cloud data processing; processing the three-dimensional hand mesh with a three-dimensional gesture regressor and rendering it with a mesh renderer to obtain a real-time three-dimensional gesture image of the hand; performing matching tests on the data, outputting the matched-information signal to the execution layer, and responding with the underlying hardware in the response module.
5. The method for recognizing gesture in car cockpit area based on TOF camera according to claim 4, wherein said CNN network image optimizing step comprises:
(1-1) making TOF datasets
Creating a plurality of simulated TOF datasets on a graphical basis, including TOF raw measurements with sufficient error simulation, depth truth values at high resolution, and corresponding RGB images;
(1-2) error correction of TOF depth map
Providing a cascade iterative convolutional neural network to repair errors in the depth map step by step in a residual prediction mode;
(1-3) super resolution of RGB directed TOF depth map
Analyzing the modal difference between the RGB image and the depth map, designing a pre-fusion module, a post-fusion module and a cross-modal spatial-domain attention fusion module, and applying a second-order gradient smoothness consistency loss in the loss function;
(1-4) acceleration and deployment of TOF imaging optimization algorithm on low-power consumption embedded device
The time complexity of the method is reduced on the premise of minimum accuracy loss through network structure optimization design.
6. The TOF camera-based automobile cabin domain gesture recognition method according to claim 4, wherein the depth fusion framework processes the image by:
(2-1) projecting depth information obtained by the TOF camera onto a reference stereoscopic camera view angle;
(2-2) calculating by a stereo matching algorithm to obtain a high-resolution depth map;
(2-3) estimating the confidence of the stereoscopic parallax and the TOF depth map by using a CNN network;
(2-4) fusing the upsampled TOF output result and the stereoscopic parallax.
7. The method for recognizing gesture in automobile cabin area based on TOF camera according to claim 4, wherein the step of removing ambient light noise by defogging algorithm to obtain high resolution gray scale image comprises:
(3-1) dividing a region where the ambient light difference is large by setting different thresholds based on the ambient light difference;
(3-2) defogging the bright area by adopting a contrast map priori defogging algorithm, and processing the dark area by adopting a dark channel priori defogging algorithm;
(3-3) adjusting the brightness and color of the restored image by adopting a dynamic threshold white balance algorithm.
8. The method for recognizing gesture in automobile cabin based on TOF camera according to claim 4, wherein the image fusion method is a multi-scale image fusion method, comprising the steps of:
(4-1) respectively carrying out multi-scale decomposition on the original images to obtain a series of sub-images of a transformation domain;
(4-2) extracting the most effective features on each scale in the transformation domain by adopting a certain fusion rule to obtain a composite multi-scale representation;
(4-3) performing multi-scale inverse transformation on the composite multi-scale representation to obtain a fused image.
9. The method for recognizing the gesture in the automobile cabin area based on the TOF camera according to claim 8, wherein the method for processing the point cloud data is as follows:
(4-5) performing an undistortion operation on the image coordinate system to obtain the point cloud coordinate system, with the transformation formula:
where x, y, z are coordinates in the point cloud coordinate system, x', y' are coordinates in the image coordinate system, and D is the depth value;
(4-6) preprocessing the point cloud image;
(4-7) a single-frame point cloud gesture recognition algorithm matched with a template;
(4-8) a gesture recognition algorithm of the continuous point cloud;
and finally, matching the three-dimensional space position of the obtained key point with a gesture library, and outputting a matched gesture instruction signal to a response module.
10. The TOF camera-based automobile cabin domain gesture recognition method according to claim 9, wherein the gesture library is matched and the response performed by the execution layer mainly comprises the following gestures and underlying hardware responses: when the index finger pinches the thumb, changing the distance between thumb and index finger outputs a media volume adjustment signal and changes the media volume; when the palm is held nearly level, the four fingers other than the thumb waving up and down cause the vehicle window to lower with a downward wave and rise with an upward wave; when the palm is held upright and waved forward and backward, the sunroof at the top of the vehicle opens or closes accordingly; when the human-computer interaction involves the vehicle-mounted screen, making a fist with only the index finger extended and moving it left, right, up and down controls left/right page turning of the screen and up/down scrolling of the content, and when the index finger joint shows a large angle change, a single-click screen command is triggered once at the corresponding screen position, two such single-click commands in succession within an interval of less than 0.5 seconds being treated as a double-click response performing a double-click operation on the screen; when a fist-clenching gesture command occurs, the user having autonomously judged whether the driving environment is safe, the vehicle immediately enters a decelerated driving state.
CN202310284871.XA 2023-03-22 2023-03-22 Automobile cabin domain gesture recognition system and method based on TOF camera Active CN116449947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310284871.XA CN116449947B (en) 2023-03-22 2023-03-22 Automobile cabin domain gesture recognition system and method based on TOF camera


Publications (2)

Publication Number Publication Date
CN116449947A true CN116449947A (en) 2023-07-18
CN116449947B CN116449947B (en) 2024-02-02

Family

ID=87126619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310284871.XA Active CN116449947B (en) 2023-03-22 2023-03-22 Automobile cabin domain gesture recognition system and method based on TOF camera

Country Status (1)

Country Link
CN (1) CN116449947B (en)


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110286676A1 (en) * 2010-05-20 2011-11-24 Edge3 Technologies Llc Systems and related methods for three dimensional gesture recognition in vehicles
CN108803426A (en) * 2018-06-27 2018-11-13 常州星宇车灯股份有限公司 A kind of vehicle device control system based on TOF gesture identifications
US20190302895A1 (en) * 2018-03-27 2019-10-03 Usens Inc. Hand gesture recognition system for vehicular interactive control
CN112507924A (en) * 2020-12-16 2021-03-16 深圳荆虹科技有限公司 3D gesture recognition method, device and system
KR20210057358A (en) * 2019-11-12 2021-05-21 주식회사 에스오에스랩 Gesture recognition method and gesture recognition device performing the same
CN113448429A (en) * 2020-03-25 2021-09-28 南京人工智能高等研究院有限公司 Method and device for controlling electronic equipment based on gestures, storage medium and electronic equipment
US20220036050A1 (en) * 2018-02-12 2022-02-03 Avodah, Inc. Real-time gesture recognition method and apparatus
WO2022188259A1 (en) * 2021-03-08 2022-09-15 豪威芯仑传感器(上海)有限公司 Dynamic gesture recognition method, gesture interaction method, and interaction system
CN115719507A (en) * 2021-08-23 2023-02-28 中移(上海)信息通信科技有限公司 Image identification method and device and electronic equipment


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孙晓静: "基于多模态的手部姿态估计方法研究", 中国知网优秀硕士论文库, no. 10, pages 22 - 35 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117218716A (en) * 2023-08-10 2023-12-12 中国矿业大学 DVS-based automobile cabin gesture recognition system and method
CN117218716B (en) * 2023-08-10 2024-04-09 中国矿业大学 DVS-based automobile cabin gesture recognition system and method

Also Published As

Publication number Publication date
CN116449947B (en) 2024-02-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant