CN115063849A - Dynamic gesture vehicle control system and method based on deep learning - Google Patents
- Publication number
- CN115063849A (application CN202210561028.7A)
- Authority
- CN
- China
- Prior art keywords
- module
- vehicle
- gesture
- dynamic gesture
- face
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V 40/168 — Human faces: feature extraction; face representation
- G06N 3/08 — Neural networks: learning methods
- G06V 10/761 — Proximity, similarity or dissimilarity measures in feature spaces
- G06V 10/806 — Fusion of extracted features at the sensor, preprocessing, feature-extraction or classification level
- G06V 10/82 — Image or video recognition using neural networks
- G06V 40/28 — Recognition of hand or arm movements, e.g. recognition of deaf sign language
Abstract
A dynamic gesture vehicle control system and method based on deep learning relates to the technical field of deep learning, solves the problems of low accuracy and slow speed in existing dynamic gesture recognition, and can be applied to mid- to high-end vehicle control systems. The system comprises a vehicle-mounted terminal and a cloud platform. The vehicle-mounted terminal comprises a face recognition module, a gesture recognition module, and a vehicle control module, interconnected over Ethernet; the cloud platform comprises an identity authentication module connected to the vehicle-mounted terminal via the ACP protocol. The dynamic gesture vehicle control method comprises: capturing and recognizing a static face image with a high-precision camera, the vehicle-mounted communication terminal uploading the face signal to the cloud via the ACP protocol for identity verification; capturing the user's dynamic gesture images and computing an accurate gesture signal in real time through the gesture recognition module; and transmitting the gesture signal over Ethernet to the on-board computing unit for fusion processing, then sending it to the drive-by-wire unit to control the vehicle from inside or outside.
Description
Technical Field
The present invention relates to the technical field of deep learning, and in particular to a dynamic gesture vehicle control technique based on deep learning.
Background Art
With economic development and the continuous improvement of living standards, vehicle sales keep climbing and parking space grows ever scarcer. The need to control a vehicle from outside when parking or unparking is increasingly evident, and expectations for comfortable in-vehicle control are increasingly diverse, making such features an important configuration of mid- to high-end vehicles. Gesture-based vehicle control has therefore become one of the key innovation directions pursued by major automakers.
At present, systems for controlling a vehicle from outside remain at the level of smart terminals such as mobile phone apps, where terminal control typically relies on the cloud to issue control commands. With little system-level optimization, such solutions suffer from overly long remote-control links, heavy resource consumption, poor user experience, timeouts, and abnormal commands. For in-vehicle control, gesture recognition offers a better human-computer interaction experience than touch buttons: it helps the user stay focused on driving and replaces two-dimensional planar control with three-dimensional spatial control. However, existing dynamic gesture vehicle control technology suffers from low gesture recognition accuracy and slow recognition speed, which seriously degrades the user experience.
Summary of the Invention
To solve the prior-art problems of low accuracy and slow speed in dynamic gesture recognition, the present invention proposes a dynamic gesture vehicle control system and method based on deep learning.
The technical solution of the present invention is as follows:
A dynamic gesture vehicle control system based on deep learning comprises a vehicle-mounted terminal and a cloud platform. The vehicle-mounted terminal comprises a face recognition module, a gesture recognition module, and a vehicle control module, interconnected over Ethernet; the cloud platform comprises an identity authentication module connected to the vehicle-mounted terminal via the ACP protocol.
The gesture recognition module comprises a depthwise separable convolutional neural network module, a gesture action localization module, a loss function calculation module, a channel attention module, and a dynamic gesture database.
The dynamic gesture database captures and stores the user's commonly used gesture information. The loss function calculation module works with the gesture action localization module to make predicted boxes fit ground-truth boxes more precisely; the channel attention module outputs feature detection results; and the depthwise separable convolutional neural network module performs deep training on the user's gesture information, achieving fast feature extraction and recognition of user gesture operations.
Preferably, the face recognition module comprises a cascaded convolutional neural network, a deep convolutional neural network, a metric module, a normalization calculation module, and a loss function calculation module.
The face recognition module takes the static face image captured by the camera, extracts the target face's image information through the cascaded convolutional neural network, and then extracts deep feature vectors from the face image through the deep convolutional neural network module. The metric module judges the degree of correlation between feature vectors; finally, the face feature data is computed and consolidated into precise face data by the normalization calculation module and the loss function calculation module, which is compared against the cloud face database for authentication, thereby achieving face recognition.
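As an illustration of how a metric module might judge the correlation of deep feature vectors, here is a minimal sketch using cosine similarity over L2-normalized embeddings. The function names and the 0.7 decision threshold are hypothetical illustrations, not values taken from the patent; a real system would calibrate the threshold on validation data.

```python
import math

def l2_normalize(v):
    # Scale the embedding to unit length so dot products become
    # cosine similarities.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_similarity(a, b):
    # Correlation of two face embeddings, in [-1, 1].
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

def is_same_face(a, b, threshold=0.7):
    # Hypothetical acceptance threshold for the comparison against
    # a stored (e.g. cloud-side) face embedding.
    return cosine_similarity(a, b) >= threshold
```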
Preferably, the depthwise separable convolutional neural network module performs deep fusion using depthwise convolution and 1×1 convolution. Both the first and last steps of the module use 1×1 convolution, while the intermediate step applies the ResNet feature fusion concept to fuse shallow features, reducing the number of network parameters by compressing the number of channels.
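The parameter savings from replacing a standard convolution with a depthwise convolution followed by a 1×1 pointwise convolution can be sketched with a simple parameter count (the channel sizes and kernel size below are arbitrary examples, not values from the patent):

```python
def standard_conv_params(c_in, c_out, k):
    # A k x k kernel spans all input channels for each output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise: one k x k filter per input channel.
    # Pointwise: a 1x1 convolution that mixes channels.
    return k * k * c_in + c_in * c_out

# Example: a 3x3 convolution mapping 128 channels to 256 channels.
std = standard_conv_params(128, 256, 3)        # 294912 parameters
sep = depthwise_separable_params(128, 256, 3)  # 33920 parameters
```

For this example the separable form uses roughly 8.7× fewer parameters, which is why such modules suit mobile and embedded deployment.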
Preferably, the loss function calculation module computes a similarity loss over the respective aspect ratios of the detection box and the ground-truth box, calculated as follows:
L_CIoU = 1 − IoU + ρ²(b, b1)/c² + αv, with v = (4/π²)·(arctan(x/y) − arctan(x1/y1))² and α = v/((1 − IoU) + v),

where b and b1 denote the center points of the detection box and the ground-truth box respectively, ρ is the Euclidean distance, c is the distance between the two farthest vertices of the detection box and the ground-truth box, IoU is the intersection-over-union of b and b1, x and y are the width and height of the detection box, and x1 and y1 are the width and height of the ground-truth box.
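Assuming the similarity loss takes the CIoU form implied by the variables above, a minimal sketch in Python might look like this. Boxes are given as corner coordinates (x1, y1, x2, y2); the small epsilon in the α term is an added assumption for numerical stability, not part of the patent text.

```python
import math

def box_iou(a, b):
    # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def ciou_loss(pred, gt):
    # Center-distance term rho^2(b, b1).
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gcx, gcy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (pcx - gcx) ** 2 + (pcy - gcy) ** 2
    # c^2: squared distance between the farthest vertices, i.e. the
    # diagonal of the smallest box enclosing both boxes.
    ex1, ey1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    ex2, ey2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    # Aspect-ratio similarity term v.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    i = box_iou(pred, gt)
    alpha = v / ((1 - i) + v + 1e-9)  # epsilon added for stability
    return 1 - i + rho2 / c2 + alpha * v
```

A perfectly fitting prediction yields a loss of zero; the loss grows as the center distance, overlap, or aspect-ratio mismatch worsens.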
The gesture action localization module's loss replaces the bounding-box prediction loss term of the traditional localization algorithm. The improved loss function L consists of three parts: localization error, confidence error, and classification error:
L = L_C + L_con + L_s,
where s² denotes the number of grid cells and B the number of bounding boxes per cell; an indicator marks whether the target falls into the j-th bounding box of the i-th grid cell, taking the value 1 if it does and 0 otherwise. The network fuses features at the 13×13 and 26×26 scales, so s² = 13² or s² = 26², and B = 2.
Preferably, the channel attention module is dual-channel: a channel attention module is added after the two different-scale outputs of the dual channels, different weights are assigned to the feature outputs at the two scales, and non-maximum suppression yields the final detection result.
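A minimal sketch of the non-maximum suppression step that produces the final detection result, assuming axis-aligned boxes with per-box confidence scores. The 0.5 overlap threshold is an illustrative assumption, not a value from the patent.

```python
def box_iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    # Greedily keep the highest-scoring box, then drop every remaining
    # box that overlaps it beyond the threshold; repeat until done.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if box_iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```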
Preferably, the identity authentication module is a digital signature authentication system based on a public-key cryptosystem. Face images are captured by high-precision cameras inside and outside the vehicle; after preprocessing, the image data is transmitted over the CAN bus to the vehicle-mounted communication terminal (TBOX), which forwards it to the cloud platform (TSP) for initial face image storage and generation of a unique identification code. An initial digital certificate application is then made to the PKI authentication system: the RA module first verifies the user's identity registration request, after which the CA module obtains and issues a digital certificate for the user from the certificate store, binding the user's identity information to the user's public key in the form of a digital certificate and thereby achieving user identity authentication.
Preferably, the vehicle control module comprises a vehicle-mounted high-precision camera, a network communication module, an on-board computing unit, and a drive-by-wire unit. The vehicle-mounted high-precision camera extracts an accurate gesture image signal via the gesture recognition algorithm and transmits it in real time through the network communication module to the on-board computing unit, which filters and processes the gesture signal, fuses it with anti-collision signals perceived by other sensors, and sends the result to the drive-by-wire unit to realize gesture control of the vehicle.
Preferably, gesture control of the vehicle includes both external and in-vehicle control: external control includes reversing, braking, and steering, while in-vehicle control includes control of the electronic seats, the head-unit screen, the in-vehicle audio, and the ambient lighting.
A dynamic gesture vehicle control method based on deep learning, applying the dynamic gesture vehicle control system described above, comprises the following steps:
S1. A static face image is captured and recognized by a high-precision camera, and the vehicle-mounted communication terminal uploads the face signal to the cloud via the ACP protocol for identity verification.
S2. After identity verification passes, the user's dynamic gesture images are captured, and an accurate gesture signal is computed in real time by the gesture recognition module.
S3. The gesture signal is transmitted over Ethernet to the on-board computing unit for fusion processing, then sent to the drive-by-wire unit to control the vehicle from inside or outside.
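The three steps above can be sketched as a control flow. Every function name here is a hypothetical stand-in for the modules described in the patent (cloud-side verification over ACP, the gesture recognition module, and the drive-by-wire unit), passed in as callables so the sketch stays self-contained.

```python
def authenticate_face(face_image, cloud_verify):
    # S1: submit the captured face signal for cloud-side identity
    # verification (the real system uses the ACP protocol).
    return cloud_verify(face_image)

def control_vehicle(frames, cloud_verify, recognize_gesture, send_to_actuator):
    # frames[0] is treated as the static face image; the rest are
    # the dynamic gesture frames.
    if not authenticate_face(frames[0], cloud_verify):
        return "rejected"
    # S2: compute an accurate gesture signal from the dynamic frames.
    gesture = recognize_gesture(frames[1:])
    # S3: fuse and forward the command to the drive-by-wire unit.
    return send_to_actuator(gesture)
```

Example usage with stub callables: a successful verification forwards the recognized gesture command, while a failed one short-circuits to rejection.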
Compared with the prior art, the present invention solves the problems of low dynamic gesture recognition accuracy and slow recognition speed. The specific beneficial effects are:
1. Given today's scarce parking resources, there are scenarios in which the user cannot enter the car in the available space. The deep learning-based dynamic gesture vehicle control system provided by the present invention lets the user park the vehicle in and out by gestures from outside the car, controlling parking maneuvers in narrow parking scenarios. Inside the car, dynamic gesture recognition is combined with the in-vehicle drive-by-wire capability, so that control requires no touching of the instrument panel or head-unit screen, keeping the driver focused on driving while greatly improving the human-computer interaction experience.
2. Under complex backgrounds, the system reduces the network model's overhead on the terminal through its convolutional neural network algorithm and improves dynamic gesture recognition accuracy with a deep learning-based video gesture action localization algorithm. A dynamic gesture database is also established for training the gesture model. Through convolutional neural networks, the loss function, channel attention, and related techniques, the system raises gesture recognition precision while preserving real-time performance, respecting terminal capabilities, and matching user scenarios. User-customized gestures are entered into the dynamic gesture database, and after deep training the system recognizes the user's simple gesture operations more quickly, fitting the gesture-controlled vehicle scenario.
3. The deep learning-based dynamic gesture vehicle control method combines face recognition and dynamic gesture recognition to provide low-complexity yet precise image signals, and combines the cloud authentication system with the vehicle-side control system, realizing a complete gesture vehicle control system that improves overall human-computer interaction and keeps the user driving safely.
Brief Description of the Drawings
Fig. 1 is a schematic structural diagram of the dynamic gesture vehicle control system according to Embodiment 1;
Fig. 2 is a schematic structural diagram of the face recognition module according to Embodiment 2 of the present invention;
Fig. 3 is a schematic structural diagram of gesture recognition according to Embodiment 1 of the present invention;
Fig. 4 is a schematic structural diagram of the separable convolutional neural network module according to Embodiment 3 of the present invention;
Fig. 5 is a schematic structural diagram of the channel attention module according to Embodiment 5 of the present invention;
Fig. 6 is a schematic structural diagram of the identity authentication module according to Embodiment 6 of the present invention;
Fig. 7 is a schematic structural diagram of the vehicle control module according to Embodiment 7 of the present invention.
Detailed Description of the Embodiments
To make the technical solution of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. It should be noted that the following embodiments are intended only to aid understanding of the technical solution of the present invention and should not be construed as limiting it.
Embodiment 1.
This embodiment provides a dynamic gesture vehicle control system based on deep learning, whose structure is shown schematically in Fig. 1. It comprises a vehicle-mounted terminal and a cloud platform. The vehicle-mounted terminal comprises a face recognition module, a gesture recognition module, and a vehicle control module, interconnected over Ethernet; the cloud platform comprises an identity authentication module connected to the vehicle-mounted terminal via the ACP protocol.
The system combines face recognition and dynamic gesture recognition to provide low-complexity yet precise image signals, and combines the cloud authentication system with the vehicle-side control system, realizing a complete gesture vehicle control system that improves overall human-computer interaction and keeps the user driving safely.
The gesture recognition module comprises a depthwise separable convolutional neural network module, a gesture action localization module, a loss function calculation module, a channel attention module, and a dynamic gesture database.
The dynamic gesture database captures and stores the user's commonly used gesture information; the loss function calculation module works with the gesture action localization module to make predicted boxes fit ground-truth boxes more precisely; the channel attention module models the relationships between different feature channels; and the depthwise separable convolutional neural network module reduces network overhead, achieving fast feature extraction and recognition of user gesture operations after deep training.
Fig. 3 is a schematic structural diagram of the gesture recognition module provided by this embodiment. The depthwise separable convolutional neural network module reduces network overhead, greatly improves network processing speed and recognition accuracy, ensures the system's real-time performance, and suits deployment on mobile and embedded devices. The gesture action localization module remedies the defects of traditional localization methods, namely zero gradients and the inability to align different objects, by taking into account the distance between the center points of the ground-truth box and the detection box and adding a similarity loss term over their respective aspect ratios, so that the predicted box fits the ground-truth box more precisely.
The dynamic gesture database can better fit real gesture-controlled vehicle scenarios. A camera collects gesture images and background data, the latter including light intensity, angle, and similar conditions. The database also varies along the time dimension to accommodate different users' operating habits. It captures the user's commonly used gesture information, and deep training improves recognition precision; when the network is abnormal, recognition can still draw on the database. Through convolutional neural networks, the loss function, channel attention, and related techniques, this embodiment raises gesture recognition precision while preserving real-time performance, respecting terminal capabilities, and matching user scenarios; user-customized gestures are entered into the dynamic gesture database, and after deep training the system recognizes the user's simple gesture operations more quickly, fitting the gesture-controlled vehicle scenario.
Embodiment 2.
This embodiment further illustrates Embodiment 1. The face recognition module comprises a cascaded convolutional neural network, a deep convolutional neural network, a metric module, a normalization calculation module, and a loss function calculation module.
Fig. 2 is a schematic structural diagram of the face recognition module provided by this embodiment. The face recognition module is based on a FaceNet Pearson discriminant network. It takes the static face image captured by the camera, extracts the target face's image information through the cascaded convolutional neural network, and then extracts deep feature vectors from the face image through the deep convolutional neural network module. The metric module judges the degree of correlation between feature vectors; finally, the face feature data is computed and consolidated into precise face data by the normalization calculation module and the loss function calculation module, which is compared against the cloud face database for authentication, thereby achieving face recognition.
Embodiment 3.
This embodiment further illustrates Embodiment 1. The depthwise separable convolutional neural network module performs deep fusion using depthwise convolution and 1×1 convolution. Both the first and last steps of the module use 1×1 convolution, while the intermediate step applies the ResNet feature fusion concept to fuse shallow features, reducing the number of network parameters by compressing the number of channels.
A schematic diagram of the depthwise separable convolutional neural network module of this embodiment is shown in Fig. 4. The first step raises the dimensionality of the input features to avoid feature loss caused by nonlinear activation; the intermediate step applies the ResNet feature fusion concept to fuse shallow features, reducing the number of network parameters by compressing the number of channels; and the last step reduces the number of fused features. This embodiment lowers the system's computational load without affecting network performance.
Embodiment 4.
This embodiment further illustrates Embodiment 1. The loss function calculation module computes a similarity loss over the respective aspect ratios of the detection box and the ground-truth box, calculated as follows:
L_CIoU = 1 − IoU + ρ²(b, b1)/c² + αv, with v = (4/π²)·(arctan(x/y) − arctan(x1/y1))² and α = v/((1 − IoU) + v),

where b and b1 denote the center points of the detection box and the ground-truth box respectively, ρ is the Euclidean distance, c is the distance between the two farthest vertices of the detection box and the ground-truth box, IoU is the intersection-over-union of b and b1, x and y are the width and height of the detection box, and x1 and y1 are the width and height of the ground-truth box.
The gesture action localization module's loss replaces the bounding-box prediction loss term of the traditional localization algorithm. The improved loss function L consists of three parts: localization error, confidence error, and classification error:
L = L_C + L_con + L_s,
where s² denotes the number of grid cells and B the number of bounding boxes per cell; an indicator marks whether the target falls into the j-th bounding box of the i-th grid cell, taking the value 1 if it does and 0 otherwise. The network fuses features at the 13×13 and 26×26 scales, so s² = 13² or s² = 26², and B = 2.
Embodiment 5.
This embodiment further illustrates Embodiment 1. The channel attention module is dual-channel: a channel attention module is added after the two different-scale outputs of the dual channels, different weights are assigned to the feature outputs at the two scales, and non-maximum suppression yields the final detection result.
Because the information carried by different channels differs in how well it expresses gesture-target features, the channel attention module models the relationships between feature channels, raising the weight of key information while suppressing irrelevant information, and thereby improves gesture-detection precision. To boost network performance, this embodiment proposes a dual-channel attention module, shown in Figure 5: a gesture image passes through the preprocessing module, a channel attention module is added after the dual-channel outputs at the two scales s1 and s2, different weights are assigned to the feature outputs of the two scales, and non-maximum suppression yields the final detection result.
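A minimal sketch of one plausible channel attention design (squeeze-and-excitation style, in pure Python for clarity). The layer shapes, weight layout, and function names here are assumptions for illustration, not the patent's exact architecture:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feature_maps, w1, w2):
    """SE-style channel attention over a list of C feature maps, each an
    HxW nested list. w1 maps C -> C//r, w2 maps C//r -> C (rows of weights)."""
    # Squeeze: global average pooling per channel.
    squeezed = [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
                for fm in feature_maps]
    # Excitation: two tiny fully connected layers (ReLU, then sigmoid gates).
    hidden = [max(0.0, sum(wi * s for wi, s in zip(ws, squeezed))) for ws in w1]
    gates = [sigmoid(sum(wi * h for wi, h in zip(ws, hidden))) for ws in w2]
    # Re-scale each channel by its learned gate (key channels keep more weight).
    return [[[v * g for v in row] for row in fm]
            for fm, g in zip(feature_maps, gates)]
```

The gates are per-channel scalars in (0, 1), so channels whose pooled response drives the excitation layers higher are preserved, while the rest are attenuated — the "assign different weights" behavior the embodiment describes.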
Embodiment 6.
This embodiment further illustrates Embodiment 1. Figure 6 is a schematic diagram of the identity authentication module provided by an embodiment of the present invention. The identity authentication module is a digital signature authentication system based on public key cryptography. High-precision cameras inside and outside the vehicle capture face images; after preprocessing, the image data are transmitted over the CAN bus to the in-vehicle communication terminal (TBOX), which forwards them to the cloud platform (TSP) for initial face-image storage and generation of a unique identification code. An initial digital certificate application is then submitted to the PKI authentication system: within the PKI system, the RA module first verifies the user's identity registration request, after which the CA module obtains a digital certificate from the certificate store and issues it to the user, binding the user's identity information to the user's public key in the form of a digital certificate and thereby completing user identity authentication.
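The sign-and-verify step at the heart of such a PKI flow can be illustrated with textbook RSA on deliberately tiny primes. This is a toy for exposition only, never usable as real cryptography, and the function names are placeholders rather than the patent's API:

```python
import hashlib

# Toy RSA key pair with tiny primes -- for illustration only, never for real use.
P, Q = 61, 53
N = P * Q                  # modulus, 3233
PHI = (P - 1) * (Q - 1)    # Euler's totient, 3120
E = 17                     # public exponent
D = pow(E, -1, PHI)        # private exponent: modular inverse of E mod PHI

def sign(message: bytes) -> int:
    """CA side: hash the enrolment record and sign the digest with the private key."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % N
    return pow(digest, D, N)

def verify(message: bytes, signature: int) -> bool:
    """Vehicle/TSP side: recompute the digest and check it against the signature."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % N
    return pow(signature, E, N) == digest
```

The point of the sketch is the asymmetry: only the holder of D can produce a signature, while anyone holding the certified public pair (N, E) can check it, which is what lets the cloud platform bind a face-derived identification code to a user's public key.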
Embodiment 7.
This embodiment further illustrates Embodiment 1. Figure 7 is a schematic diagram of the vehicle control module provided in this embodiment. The vehicle control module comprises an on-board high-precision camera, a network communication module, an on-board computing unit, and a by-wire control unit. The on-board high-precision camera extracts an accurate gesture image signal via the gesture recognition algorithm and transmits it in real time through the network communication module to the on-board computing unit, which filters and processes the gesture signal, fuses it with anti-collision signals perceived by other sensors, and sends the result to the by-wire control unit, thereby realizing gesture control of the vehicle.
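The arbitration performed by the on-board computing unit — filtering the gesture signal and fusing it with anti-collision inputs before dispatch to the by-wire unit — might look like the following sketch. All names, the confidence threshold, and the veto policy are illustrative assumptions, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class GestureCommand:
    action: str        # e.g. "reverse", "brake", "steer_left"
    confidence: float  # recognizer score in [0, 1]

def fuse(command: GestureCommand, collision_risk: bool, threshold: float = 0.8):
    """Hypothetical arbitration: a gesture command is forwarded to the
    by-wire unit only if the recognizer is confident enough and no
    anti-collision sensor vetoes it."""
    if collision_risk:
        return "brake"             # the safety signal always wins
    if command.confidence >= threshold:
        return command.action      # confident gesture passes through
    return None                    # low-confidence gestures are dropped
```

Giving the anti-collision signal unconditional priority is one common design choice for such fusion stages: a misrecognized gesture can be annoying, but overriding a collision warning cannot be allowed.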
Embodiment 8.
This embodiment further illustrates Embodiment 7. Gesture control of the vehicle includes external control and in-cabin control: external control covers reversing, braking, and steering, while in-cabin control covers the electric seats, the infotainment screen, the in-car audio, and the ambient lighting.
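One way to organize this split between external and in-cabin control is a pair of lookup tables routing recognized gestures to vehicle functions. The gesture names below are hypothetical placeholders; only the control targets come from the embodiment:

```python
# Hypothetical mapping from recognized dynamic gestures to vehicle functions.
EXTERIOR_COMMANDS = {
    "palm_push": "brake",
    "beckon": "reverse",
    "swipe_left": "steer_left",
    "swipe_right": "steer_right",
}
INTERIOR_COMMANDS = {
    "swipe_up": "seat_forward",          # electric seat
    "pinch": "screen_select",            # infotainment screen
    "rotate_cw": "audio_volume_up",      # in-car audio
    "open_palm": "ambient_light_toggle", # ambient lighting
}

def dispatch(gesture: str, inside_cabin: bool):
    """Route a recognized gesture to the matching control domain;
    unknown gestures map to None and are ignored."""
    table = INTERIOR_COMMANDS if inside_cabin else EXTERIOR_COMMANDS
    return table.get(gesture)
```

Keeping the two domains in separate tables means the same physical gesture can safely mean different things inside and outside the cabin, which matches the embodiment's distinction between external and in-cabin control.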
Embodiment 9.
This embodiment provides a deep-learning-based dynamic gesture vehicle control method that uses the dynamic gesture vehicle control system of any one of Embodiments 1-8. The method includes the following steps:
S1. A high-precision camera captures and recognizes a static face image, and the in-vehicle communication terminal uploads the face signal to the cloud over the ACP protocol for identity verification;
S2. After identity verification succeeds, the user's dynamic gesture images are captured, and the gesture recognition module analyzes them in real time to compute an accurate gesture signal;
S3. The gesture signal is transmitted over Ethernet to the on-board computing unit for fusion processing and then sent to the by-wire control unit to control the vehicle from inside or outside.
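Steps S1-S3 can be sketched as a small pipeline with the three stages injected as callables; every name here is a placeholder rather than the patent's API:

```python
def gesture_control_pipeline(face_image, gesture_frames,
                             authenticate, recognize, fuse_and_send):
    """Illustrative composition of the method's three steps.
    S1: face capture -> cloud identity check;
    S2: dynamic gesture recognition on the captured frames;
    S3: fusion on the computing unit -> dispatch to the by-wire unit."""
    if not authenticate(face_image):       # S1: identity verification gate
        return "rejected"
    gesture = recognize(gesture_frames)    # S2: gesture recognition
    return fuse_and_send(gesture)          # S3: fusion + by-wire dispatch
```

Structuring the flow this way makes the safety property of S1 explicit: no gesture frame is ever interpreted, let alone acted on, for an unauthenticated user.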
The deep-learning-based dynamic gesture vehicle control method provided by this embodiment combines face recognition with dynamic gesture recognition to deliver low-complexity, accurate image signals, and couples a cloud authentication system with a vehicle-side control system, realizing a complete gesture-controlled vehicle system, improving overall human-machine interaction, and helping ensure the user drives safely.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210561028.7A CN115063849A (en) | 2022-05-23 | 2022-05-23 | Dynamic gesture vehicle control system and method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115063849A true CN115063849A (en) | 2022-09-16 |
Family
ID=83198308
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210561028.7A Pending CN115063849A (en) | 2022-05-23 | 2022-05-23 | Dynamic gesture vehicle control system and method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115063849A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115620397A (en) * | 2022-11-07 | 2023-01-17 | 江苏北斗星通汽车电子有限公司 | Vehicle-mounted gesture recognition system based on Leapmotion sensor |
CN118675204A (en) * | 2024-08-26 | 2024-09-20 | 杭州锐见智行科技有限公司 | Hiss gesture detection method and device, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111231892A (en) * | 2019-12-29 | 2020-06-05 | 的卢技术有限公司 | Automatic automobile unlocking control method and system based on face and gesture recognition |
CN111901681A (en) * | 2020-05-04 | 2020-11-06 | 东南大学 | A smart TV control device and method based on face recognition and gesture recognition |
CN112733632A (en) * | 2020-12-28 | 2021-04-30 | 华南理工大学 | Robot control method based on face recognition and gesture recognition |
Non-Patent Citations (2)

Title |
---|
Niu Yarui et al., "Gesture recognition and detection based on a lightweight convolutional neural network", Electronic Measurement Technology, vol. 45, no. 4, 28 February 2022 (2022-02-28), pp. 91-98 *
Han Wenjing, "Research on gesture recognition methods based on improved convolutional neural networks", China Masters' Theses Full-text Database, Information Science & Technology, 15 September 2021 (2021-09-15), pp. 1-4 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||