CN115063849A - Dynamic gesture vehicle control system and method based on deep learning - Google Patents

Dynamic gesture vehicle control system and method based on deep learning

Info

Publication number
CN115063849A
CN115063849A
Authority
CN
China
Prior art keywords
module
vehicle
gesture
face
dynamic gesture
Prior art date
Legal status
Pending
Application number
CN202210561028.7A
Other languages
Chinese (zh)
Inventor
刘赫
吕贵林
陈涛
孙玉洋
Current Assignee
FAW Group Corp
Original Assignee
FAW Group Corp
Priority date
Filing date
Publication date
Application filed by FAW Group Corp
Priority to CN202210561028.7A
Publication of CN115063849A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A dynamic gesture vehicle control system and method based on deep learning relates to the technical field of deep learning, solves the problems of low accuracy and slow speed in existing dynamic gesture recognition, and can be applied to control systems of mid- and high-end vehicles. The system comprises a vehicle-mounted terminal and a cloud platform. The vehicle-mounted terminal comprises a face recognition module, a gesture recognition module and a vehicle control module, all communicatively connected via Ethernet; the cloud platform comprises an identity authentication module connected to the vehicle-mounted terminal through the ACP protocol. The dynamic gesture vehicle control method comprises the following steps: collect and recognize a static face image with a high-precision camera, and upload the face signal through the vehicle-mounted communication terminal to the cloud via the ACP protocol for identity verification; capture the user's dynamic gesture images and compute an accurate gesture signal in real time through the gesture recognition module; transmit the data over Ethernet to the vehicle-mounted computing unit for fusion processing and send them to the by-wire control unit to realize vehicle control inside and outside the vehicle.

Description

Dynamic gesture vehicle control system and method based on deep learning
Technical Field
The invention relates to the technical field of deep learning, in particular to a dynamic gesture vehicle control technology based on deep learning.
Background
With economic development and the continuous improvement of living standards, vehicle sales keep rising and parking space resources are increasingly scarce. The demand for controlling a vehicle to park from outside the vehicle is increasingly evident, in-vehicle comfort requirements are increasingly diverse, and such capabilities have become an important feature of mid- and high-end automobiles. Gesture-based vehicle control has therefore become one of the key directions in which major automakers pursue innovation.
At present, external vehicle control systems remain at the smart-terminal level, for example mobile phone apps, where terminal control often depends on the cloud to issue control commands. With little system performance optimization, such remote control suffers from overly long control links, high resource consumption, poor user experience, timeouts and abnormal commands. For in-vehicle control, compared with touch keys, gesture recognition offers a better human-computer interaction experience, keeps the driver's attention on the road, and replaces two-dimensional planar control with three-dimensional spatial control. However, existing touchless dynamic-gesture control technology suffers from low gesture recognition accuracy and slow recognition speed, which seriously degrades the user experience.
Disclosure of Invention
The invention provides a dynamic gesture vehicle control system and method based on deep learning, aiming to solve the prior-art problems of low dynamic gesture recognition accuracy and slow recognition speed.
The technical scheme of the invention is as follows:
a dynamic gesture vehicle control system based on deep learning comprises a vehicle-mounted terminal and a cloud platform; the vehicle-mounted terminal comprises a face recognition module, a gesture recognition module and a vehicle control module, all communicatively connected via Ethernet; the cloud platform comprises an identity authentication module connected to the vehicle-mounted terminal through the ACP protocol;
the gesture recognition module comprises a depthwise separable convolutional neural network module, a gesture action positioning module, a loss function calculation module, a channel attention module and a dynamic gesture database;
the dynamic gesture database captures and stores the user's commonly used gesture information; the loss function calculation module and the gesture action positioning module cooperate to fit the predicted box more accurately to the ground-truth box; the channel attention module outputs the feature detection result; and the depthwise separable convolutional neural network module performs deep training on the user's gesture information, enabling fast feature extraction and user gesture recognition.
Preferably, the face recognition module comprises a cascaded convolutional neural network, a deep convolutional neural network, a metric module, a normalization calculation module and a loss function calculation module;
the face recognition module extracts image information of the target face from the static face image captured by the camera using the cascaded convolutional neural network, extracts a deep feature vector from the face image using the deep convolutional neural network module, judges the degree of correlation of the feature vectors using the metric module, computes and consolidates accurate face data from the facial feature data using the normalization calculation module and the loss function calculation module, and compares the result against the cloud face database, finally achieving face recognition.
Preferably, the depthwise separable convolutional neural network module performs deep fusion using depthwise convolution and 1×1 convolution; the first and last steps of the module both use 1×1 convolutions, the intermediate step fuses shallow features following the ResNet feature-fusion concept, and the number of network parameters is reduced by compressing the number of channels.
Preferably, the loss function calculation module is configured to calculate the similarity loss of the aspect ratios of the detection box and the ground-truth box; the specific calculation is:

$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b_1)}{c^2} + \alpha v$$

wherein

$$v = \frac{4}{\pi^2}\left(\arctan\frac{x}{y} - \arctan\frac{x_1}{y_1}\right)^2, \qquad \alpha = \frac{v}{(1 - IoU) + v}$$

$b$ and $b_1$ respectively represent the center points of the detection box and the ground-truth box, $\rho$ is the Euclidean distance, $c$ is the distance between the farthest pair of vertices of the detection box and the ground-truth box (the diagonal of their smallest enclosing box), $IoU$ is the intersection-over-union of the two boxes, $x$ and $y$ respectively represent the width and height of the detection box, and $x_1$ and $y_1$ respectively represent the width and height of the ground-truth box;

the gesture action positioning module replaces the bounding-box prediction loss term of the traditional positioning algorithm with this loss, and the improved loss function $L$ consists of three parts, the localization error, the confidence error and the classification error:

$$L = L_C + L_{con} + L_s$$

wherein

$$L_C = \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \, L_{CIoU}$$

where $s^2$ represents the number of grid cells and $B$ represents the number of bounding boxes; $\mathbb{1}_{ij}^{obj}$ indicates whether the object falls into the $j$-th bounding box of the $i$-th grid cell, taking the value 1 when it does and 0 when it does not; the network fuses feature maps at the two scales 13×13 and 26×26, thus $s^2 = 13^2$ or $s^2 = 26^2$, and $B = 2$.
Preferably, the channel attention module uses two channels: a channel attention module is added after each of the two different-scale outputs of the two channels, different weights are assigned, the features at the two scales are output, and the final detection result is obtained through non-maximum suppression.
Preferably, the identity authentication module is a digital signature authentication system based on a public key cryptosystem. Face images are acquired through high-precision cameras inside and outside the vehicle; after preprocessing, the image data are transmitted over the CAN (Controller Area Network) bus to the vehicle-mounted communication terminal TBOX; the TBOX sends the data to the cloud platform TSP (Telematics Service Provider) for initial face image storage and generation of a unique identification code, and applies to the PKI authentication system for an initial digital certificate. In the PKI system, the RA (Registration Authority) module first verifies the user's identity registration request, then the CA (Certificate Authority) module obtains a digital certificate from the certificate repository and issues it to the user, and the user's identity information is bound to the user's public key in the form of the digital certificate, achieving user identity authentication.
Preferably, the vehicle control module comprises a vehicle-mounted high-precision camera, a network communication module, a vehicle-mounted computing unit and a by-wire control unit; the vehicle-mounted high-precision camera extracts an accurate gesture image signal through the gesture recognition algorithm and transmits it in real time to the vehicle-mounted computing unit via the network communication module, which filters and processes the gesture signal, fuses it with the anti-collision signals sensed by other sensors, and sends the result to the by-wire control unit to realize gesture control of the vehicle.
Preferably, gesture control of the vehicle comprises external control and in-vehicle control; external control includes reversing, braking and steering, and in-vehicle control includes control of the electronic seats, the head-unit screen, the in-vehicle audio and the ambience lights.
A dynamic gesture car control method based on deep learning is applied to the dynamic gesture car control system, and comprises the following steps:
S1, collect and recognize a static face image through the high-precision camera, and upload the face signal through the vehicle-mounted communication terminal to the cloud via the ACP protocol for identity verification;
S2, after the identity verification passes, capture the user's dynamic gesture images and analyze and compute an accurate gesture signal in real time through the gesture recognition module;
S3, transmit the gesture signal over Ethernet to the vehicle-mounted computing unit for fusion processing, and then send it to the by-wire control unit to realize control inside and outside the vehicle.
Compared with the prior art, the invention solves the problems of low accuracy and slow speed in dynamic gesture recognition, with the following specific beneficial effects:
1. The dynamic gesture vehicle control system based on deep learning addresses the shortage of parking spaces and scenarios in which the user cannot enter the vehicle in the available space: it realizes gesture-controlled parking from outside the vehicle, so the vehicle can be parked into and driven out of narrow spaces. Inside the vehicle, dynamic gesture recognition is combined with in-vehicle by-wire control, so vehicle functions can be controlled without touching instruments or the screen, keeping the driver's attention on the road while greatly improving the human-computer interaction experience.
2. In the dynamic gesture vehicle control system based on deep learning, under complex backgrounds the convolutional neural network algorithm reduces the network model overhead at the terminal, and a deep-learning-based video gesture action positioning algorithm improves the accuracy of dynamic gesture recognition. A dynamic gesture database is established for training the gesture model. Through the convolutional neural network, the loss function and channel attention, gesture recognition accuracy is improved while real-time performance and terminal capability are guaranteed and the user scenario is matched; a user-customized dynamic gesture database is entered, and after deep training the user's simple gesture operations can be recognized more quickly, fitting the user's gesture vehicle-control scenario.
3. The dynamic gesture vehicle control method based on deep learning combines face recognition with dynamic gesture recognition to provide a low-complexity, accurate image signal, and combines the cloud authentication system with the vehicle-side control system to realize a complete gesture vehicle-control system, improving overall human-computer interaction performance and ensuring the user's safe driving.
Drawings
Fig. 1 is a schematic structural diagram of a dynamic gesture car control system according to embodiment 1;
fig. 2 is a schematic structural diagram of a face recognition module according to embodiment 2 of the present invention;
fig. 3 is a schematic structural diagram of gesture recognition according to embodiment 1 of the present invention;
fig. 4 is a schematic structural diagram of the depthwise separable convolutional neural network module according to embodiment 3 of the present invention;
fig. 5 is a schematic structural diagram of a channel attention module according to embodiment 5 of the present invention;
fig. 6 is a schematic structural diagram of an identity authentication module according to embodiment 6 of the present invention;
fig. 7 is a schematic structural diagram of a vehicle control module according to embodiment 7 of the present invention.
Detailed Description
In order to make the technical solutions of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the specification of the present invention, and it should be noted that the following embodiments are only used for better understanding of the technical solutions of the present invention, and should not be construed as limiting the present invention.
Example 1.
The embodiment provides a dynamic gesture vehicle control system based on deep learning, which is shown in a structural schematic diagram of fig. 1 and comprises a vehicle-mounted terminal and a cloud platform; the vehicle-mounted terminal comprises a face recognition module, a gesture recognition module and a vehicle control module, and all the modules are in communication connection through the Ethernet; the cloud platform comprises an identity authentication module, and the identity authentication module is connected with the vehicle-mounted terminal through an ACP protocol.
The embodiment provides a dynamic gesture car control system, and the system combines face recognition and dynamic gesture recognition to provide low-complexity and accurate image signals, and combines a cloud authentication system and a car end car control system, so that the whole gesture car control system in the true sense is realized, the whole human-computer interaction performance is improved, and the safe driving of a user is guaranteed.
The gesture recognition module includes: the system comprises a depth separable convolutional neural network module, a gesture action positioning module, a loss function calculation module, a channel attention module and a dynamic gesture database;
the dynamic gesture database captures and stores the user's commonly used gesture information; the loss function calculation module and the gesture action positioning module fit the predicted box more accurately to the ground-truth box; the channel attention module models the relations between different feature channels; and the depthwise separable convolutional neural network module reduces network overhead, achieving fast feature extraction and user gesture recognition after deep training.
Fig. 3 is a schematic structural diagram of the gesture recognition module provided in this embodiment. The depthwise separable convolutional neural network module reduces network overhead, greatly improves processing speed and recognition accuracy, guarantees real-time performance, and is well suited to mobile terminals and embedded devices. The gesture action positioning module overcomes the defect of traditional positioning methods that the gradient becomes 0 when boxes do not overlap, so boxes cannot be aligned; it takes into account the distance between the center points of the ground-truth box and the detection box and adds a similarity loss term for their aspect ratios, so the predicted box fits the ground-truth box more accurately.
The dynamic gesture database fits the actual gesture vehicle-control scenario better: a camera collects gesture images together with background data such as illumination intensity and angle. In addition, the dynamic gesture database varies in the time dimension so as to match the operating habits of different users. Capturing the user's common gestures in the database improves recognition accuracy through deep training, and the common gesture information can still be extracted and recognized from the database when the network is abnormal. Through the convolutional neural network, the loss function and channel attention, gesture recognition accuracy is improved while real-time performance and terminal capability are guaranteed and the user scenario is matched; a user-customized dynamic gesture database is entered, and after deep training the user's simple gesture operations can be recognized more quickly, fitting the user's gesture vehicle-control scenario.
Example 2.
This embodiment is a further example of embodiment 1, and the face recognition module includes: the device comprises a cascade convolution neural network, a deep convolution neural network, a measurement module, a normalization calculation module and a loss function calculation module;
fig. 2 is a schematic structural diagram of the face recognition module provided in this embodiment. The face recognition module is based on a FaceNet-type discrimination network: the cascaded convolutional neural network extracts image information of the target face from the static face image captured by the camera, the deep convolutional neural network module extracts a deep feature vector from the face image, the metric module judges the degree of correlation of the feature vectors (e.g., by Pearson correlation), and the normalization calculation module and loss function calculation module compute and consolidate accurate face data, which is compared against the cloud face database for authentication, finally achieving face recognition.
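The verification step described above can be illustrated with a short sketch. This is a minimal example assuming FaceNet-style embeddings have already been extracted by the deep network; the function names and the threshold value are illustrative, not taken from the patent:

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Project an embedding onto the unit hypersphere, as FaceNet does."""
    return v / np.linalg.norm(v)

def face_match(probe: np.ndarray, enrolled: np.ndarray,
               threshold: float = 1.1) -> bool:
    """Accept when the squared Euclidean distance between L2-normalized
    embeddings falls below a tuned threshold (the value here is illustrative)."""
    d = float(np.sum((l2_normalize(probe) - l2_normalize(enrolled)) ** 2))
    return d < threshold
```

In practice the enrolled embedding would come from the cloud face database, and the threshold is tuned on a validation set to balance false accepts and false rejects.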
Example 3.
This embodiment further illustrates embodiment 1. The depthwise separable convolutional neural network module performs deep fusion using depthwise convolution and 1×1 convolution; the first and last steps of the module both use 1×1 convolutions, the intermediate step fuses shallow features following the ResNet feature-fusion concept, and the number of network parameters is reduced by compressing the number of channels.
The structure of the depthwise separable convolutional neural network module of this embodiment is shown in fig. 4. The first 1×1 convolution raises the dimensionality of the input features to avoid the feature loss caused by nonlinear activation; the middle step fuses shallow features following the ResNet feature-fusion concept and reduces the number of network parameters by compressing the number of channels; and the last step reduces the number of fused features.
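A minimal PyTorch sketch of such a block, assuming an inverted-residual layout (1×1 expand, depthwise 3×3, 1×1 compress, with a ResNet-style shortcut fusing shallow features); the channel sizes and layer choices are illustrative, not the patent's exact architecture:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    """Sketch of the described block: 1x1 expand -> depthwise 3x3 ->
    1x1 compress, with a shortcut that fuses shallow features."""

    def __init__(self, in_ch: int, expand_ch: int, out_ch: int):
        super().__init__()
        self.expand = nn.Sequential(      # first step: 1x1 raises dimensionality
            nn.Conv2d(in_ch, expand_ch, 1, bias=False),
            nn.BatchNorm2d(expand_ch), nn.ReLU6(inplace=True))
        self.depthwise = nn.Sequential(   # per-channel 3x3 convolution
            nn.Conv2d(expand_ch, expand_ch, 3, padding=1,
                      groups=expand_ch, bias=False),
            nn.BatchNorm2d(expand_ch), nn.ReLU6(inplace=True))
        self.compress = nn.Sequential(    # last step: 1x1 compresses channels
            nn.Conv2d(expand_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch))
        self.use_shortcut = in_ch == out_ch  # ResNet-style shallow-feature fusion

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.compress(self.depthwise(self.expand(x)))
        return x + y if self.use_shortcut else y
```

Grouped (depthwise) convolution is what cuts the parameter count: each channel is filtered independently, and the 1×1 convolutions handle cross-channel mixing.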
Example 4.
This embodiment further illustrates embodiment 1. The loss function calculation module is configured to calculate the similarity loss of the aspect ratios of the detection box and the ground-truth box; the specific calculation is:

$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b_1)}{c^2} + \alpha v$$

wherein

$$v = \frac{4}{\pi^2}\left(\arctan\frac{x}{y} - \arctan\frac{x_1}{y_1}\right)^2, \qquad \alpha = \frac{v}{(1 - IoU) + v}$$

$b$ and $b_1$ respectively represent the center points of the detection box and the ground-truth box, $\rho$ is the Euclidean distance, $c$ is the distance between the farthest pair of vertices of the detection box and the ground-truth box (the diagonal of their smallest enclosing box), $IoU$ is the intersection-over-union of the two boxes, $x$ and $y$ respectively represent the width and height of the detection box, and $x_1$ and $y_1$ respectively represent the width and height of the ground-truth box;

the gesture action positioning module replaces the bounding-box prediction loss term of the traditional positioning algorithm with this loss, and the improved loss function $L$ consists of three parts, the localization error, the confidence error and the classification error:

$$L = L_C + L_{con} + L_s$$

wherein

$$L_C = \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \, L_{CIoU}$$

where $s^2$ represents the number of grid cells and $B$ represents the number of bounding boxes; $\mathbb{1}_{ij}^{obj}$ indicates whether the object falls into the $j$-th bounding box of the $i$-th grid cell, taking the value 1 when it does and 0 when it does not; the network fuses feature maps at the two scales 13×13 and 26×26, thus $s^2 = 13^2$ or $s^2 = 26^2$, and $B = 2$.
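A runnable sketch of the reconstructed localization loss, assuming the standard CIoU formulation that the description matches; the (cx, cy, w, h) box layout is an assumption, not the patent's exact interface:

```python
import math
import torch

def ciou_loss(det: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Localization loss: 1 - IoU + rho^2/c^2 + alpha*v, matching the
    formulas above. det and gt are (cx, cy, w, h) tensors."""
    # corner coordinates of the detection box and the ground-truth box
    dx1, dy1 = det[0] - det[2] / 2, det[1] - det[3] / 2
    dx2, dy2 = det[0] + det[2] / 2, det[1] + det[3] / 2
    gx1, gy1 = gt[0] - gt[2] / 2, gt[1] - gt[3] / 2
    gx2, gy2 = gt[0] + gt[2] / 2, gt[1] + gt[3] / 2

    # intersection-over-union of the two boxes
    iw = (torch.min(dx2, gx2) - torch.max(dx1, gx1)).clamp(min=0)
    ih = (torch.min(dy2, gy2) - torch.max(dy1, gy1)).clamp(min=0)
    inter = iw * ih
    iou = inter / (det[2] * det[3] + gt[2] * gt[3] - inter)

    # squared center distance rho^2 over squared diagonal c^2 of the enclosing box
    rho2 = (det[0] - gt[0]) ** 2 + (det[1] - gt[1]) ** 2
    c2 = ((torch.max(dx2, gx2) - torch.min(dx1, gx1)) ** 2
          + (torch.max(dy2, gy2) - torch.min(dy1, gy1)) ** 2)

    # aspect-ratio similarity term v and its weight alpha
    v = (4 / math.pi ** 2) * (torch.atan(det[2] / det[3])
                              - torch.atan(gt[2] / gt[3])) ** 2
    alpha = v / ((1 - iou) + v)
    return 1 - iou + rho2 / c2 + alpha * v
```

Unlike a plain IoU loss, the rho^2/c^2 term still produces a gradient when the boxes do not overlap, which is exactly the defect of traditional positioning methods noted in embodiment 1.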
Example 5.
This embodiment further illustrates embodiment 1. The channel attention module uses two channels: a channel attention module is added after each of the two different-scale outputs of the two channels, different weights are assigned, and the final detection result is obtained by applying non-maximum suppression to the feature outputs of the two scales.
Because different channels have different feature expression capabilities for the gesture target, the channel attention module models the relations between feature channels, highlights the weights of key information and removes irrelevant information, improving the accuracy of gesture detection. To improve network performance, this embodiment provides a dual-channel attention module, as shown in fig. 5: the gesture image passes through the preprocessing module, a channel attention module is added after each of the two outputs at scales s1 and s2, different weights are assigned, and the final detection result is obtained by performing non-maximum suppression on the feature outputs of the two scales.
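A squeeze-and-excitation-style sketch of the channel attention described above; the reduction ratio and channel counts are illustrative assumptions rather than the patent's exact design:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: learn a weight per feature channel
    and re-scale the feature map with it."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze: global average pool per channel
        return x * w.view(b, c, 1, 1)     # excite: re-weight each channel

# One attention block per detection scale, e.g. the 13x13 and 26x26 outputs;
# the weighted features then pass through non-maximum suppression downstream.
att_s1, att_s2 = ChannelAttention(256), ChannelAttention(128)
```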
Example 6.
This embodiment further illustrates embodiment 1. Fig. 6 is a schematic structural diagram of the identity authentication module provided in an embodiment of the present invention. The identity authentication module is a digital signature authentication system based on a public key cryptosystem. A face image is acquired by the high-precision cameras inside and outside the vehicle; after preprocessing, the image data are transmitted over the CAN bus to the vehicle-mounted communication terminal TBOX; the TBOX forwards the data to the cloud platform TSP (Telematics Service Provider) for initial face image storage and generation of a unique identification code, and applies to the PKI authentication system for an initial digital certificate. In the PKI system, the RA (Registration Authority) module first verifies the user's identity registration request, then the CA (Certificate Authority) module obtains a digital certificate from the certificate repository and issues it to the user, and the user's identity information is bound to the user's public key in the form of the digital certificate, implementing user identity authentication.
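The digital-signature primitive underlying this PKI flow can be sketched with Python's cryptography package. This is a minimal illustration of signing and verifying with the user's key pair; the RA/CA certificate-issuance steps are summarized in comments, and the record contents are hypothetical:

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# User key pair; in the described PKI, the public half is what the CA binds
# to the user's identity inside the issued digital certificate.
user_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# RA step (summarized): verify the user's identity registration request.
# CA step (summarized): fetch and issue a certificate binding the identity
# to user_key.public_key(). Below, only the signature primitive is shown.
record = b"unique-identification-code-from-TSP"  # hypothetical record contents
pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)
signature = user_key.sign(record, pss, hashes.SHA256())

# Verification raises InvalidSignature if the data or signature was tampered with.
user_key.public_key().verify(signature, record, pss, hashes.SHA256())
```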
Example 7.
This embodiment further illustrates embodiment 1. Fig. 7 is a schematic structural diagram of the vehicle control module provided in this embodiment. The vehicle control module comprises a vehicle-mounted high-precision camera, a network communication module, a vehicle-mounted computing unit and a by-wire control unit; the camera extracts an accurate gesture image signal through the gesture recognition algorithm and transmits it in real time to the vehicle-mounted computing unit via the network communication module, which filters and processes the gesture signal, fuses it with the anti-collision signals sensed by other sensors, and sends the result to the by-wire control unit to realize gesture control of the vehicle.
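The computing-unit logic implied here can be sketched as follows; the confidence threshold, command set and interface names are illustrative assumptions, not defined by the patent:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable

class Command(Enum):
    REVERSE = auto()
    BRAKE = auto()
    STEER_LEFT = auto()
    STEER_RIGHT = auto()

@dataclass
class GestureSignal:
    command: Command
    confidence: float  # recognition confidence from the gesture module

def fuse_and_dispatch(gesture: GestureSignal, collision_risk: bool,
                      send_to_bywire: Callable[[Command], None]) -> None:
    """Filter the gesture signal, fuse it with the anti-collision input,
    and forward a single command to the by-wire unit."""
    if gesture.confidence < 0.8:  # illustrative threshold: drop uncertain gestures
        return
    if collision_risk:            # safety signal overrides any gesture command
        send_to_bywire(Command.BRAKE)
        return
    send_to_bywire(gesture.command)
```

Giving the anti-collision signal priority over any recognized gesture matches the design intent of unifying gesture and sensor inputs before actuation.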
Example 8.
This embodiment further illustrates embodiment 7. Gesture control of the vehicle comprises external control and in-vehicle control: external control includes reversing, braking and steering, and in-vehicle control includes control of the electronic seats, the head-unit screen, the in-vehicle audio and the ambience lights.
Example 9.
This embodiment provides a dynamic gesture vehicle control method based on deep learning, applied to the dynamic gesture vehicle control system of any one of embodiments 1 to 8, comprising the following steps:
S1, collect and recognize a static face image through the high-precision camera, and upload the face signal through the vehicle-mounted communication terminal to the cloud via the ACP protocol for identity verification;
S2, after the identity verification passes, capture the user's dynamic gesture images and analyze and compute an accurate gesture signal in real time through the gesture recognition module;
S3, transmit the gesture signal over Ethernet to the vehicle-mounted computing unit for fusion processing, and then send it to the by-wire control unit to realize control inside and outside the vehicle.
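An end-to-end sketch of steps S1-S3; every object here is a hypothetical interface standing in for the hardware and modules named above, not an API defined by the patent:

```python
def dynamic_gesture_vehicle_control(camera, tbox, gesture_recognizer,
                                    compute_unit, bywire_unit) -> None:
    # S1: capture a static face image and verify identity via the cloud (ACP)
    face_image = camera.capture_still()
    if not tbox.upload_for_authentication(face_image):
        return  # identity verification failed: refuse vehicle control
    # S2: capture the dynamic gesture stream and recognize it in real time
    gesture_signal = gesture_recognizer.recognize(camera.capture_stream())
    # S3: fuse on the vehicle-mounted computing unit, then actuate by wire
    command = compute_unit.fuse(gesture_signal)
    bywire_unit.execute(command)
```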
The dynamic gesture vehicle control method based on deep learning combines face recognition with dynamic gesture recognition to provide a low-complexity, accurate image signal, and combines the cloud authentication system with the vehicle-side control system to realize a complete gesture vehicle-control system, improving overall human-computer interaction performance and ensuring the user's safe driving.

Claims (9)

1. A dynamic gesture vehicle control system based on deep learning, characterized by comprising a vehicle-mounted terminal and a cloud platform; the vehicle-mounted terminal comprises a face recognition module, a gesture recognition module and a vehicle control module, all communicatively connected via Ethernet; the cloud platform comprises an identity authentication module connected to the vehicle-mounted terminal through the ACP protocol;
the gesture recognition module comprises a depthwise separable convolutional neural network module, a gesture action positioning module, a loss function calculation module, a channel attention module and a dynamic gesture database;
the dynamic gesture database captures and stores the user's commonly used gesture information; the loss function calculation module and the gesture action positioning module cooperate to fit the predicted box more accurately to the ground-truth box; the channel attention module outputs the feature detection result; and the depthwise separable convolutional neural network module performs deep training on the user's gesture information, enabling fast feature extraction and user gesture recognition.
2. The dynamic gesture vehicle control system based on deep learning according to claim 1, wherein the face recognition module comprises a cascaded convolutional neural network, a deep convolutional neural network, a metric module, a normalization calculation module and a loss function calculation module;
the face recognition module extracts image information of the target face from the static face image captured by the camera using the cascaded convolutional neural network, extracts a deep feature vector from the face image using the deep convolutional neural network module, judges the degree of correlation of the feature vectors using the metric module, computes and consolidates accurate face data from the facial feature data using the normalization calculation module and the loss function calculation module, and compares the result against the cloud face database, finally achieving face recognition.
3. The dynamic gesture vehicle control system based on deep learning according to claim 1, wherein the depthwise separable convolutional neural network module performs deep fusion using depthwise convolution and 1×1 convolution; the first and last steps of the module both use 1×1 convolutions, the intermediate step fuses shallow features following the ResNet feature-fusion concept, and the number of network parameters is reduced by compressing the number of channels.
4. The dynamic gesture vehicle control system based on deep learning according to claim 1, wherein the loss function calculation module is configured to calculate the similarity loss of the aspect ratios of the detection box and the ground-truth box; the specific calculation is:

$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b_1)}{c^2} + \alpha v$$

wherein

$$v = \frac{4}{\pi^2}\left(\arctan\frac{x}{y} - \arctan\frac{x_1}{y_1}\right)^2, \qquad \alpha = \frac{v}{(1 - IoU) + v}$$

$b$ and $b_1$ respectively represent the center points of the detection box and the ground-truth box, $\rho$ is the Euclidean distance, $c$ is the distance between the farthest pair of vertices of the detection box and the ground-truth box, $IoU$ is the intersection-over-union of the two boxes, $x$ and $y$ respectively represent the width and height of the detection box, and $x_1$ and $y_1$ respectively represent the width and height of the ground-truth box;

the gesture action positioning module replaces the bounding-box prediction loss term of the traditional positioning algorithm with this loss, and the improved loss function $L$ consists of three parts, the localization error, the confidence error and the classification error:

$$L = L_C + L_{con} + L_s$$

wherein

$$L_C = \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \, L_{CIoU}$$

where $s^2$ represents the number of grid cells and $B$ represents the number of bounding boxes; $\mathbb{1}_{ij}^{obj}$ indicates whether the object falls into the $j$-th bounding box of the $i$-th grid cell, taking the value 1 when it does and 0 when it does not; the network fuses feature maps at the two scales 13×13 and 26×26, thus $s^2 = 13^2$ or $s^2 = 26^2$, and $B = 2$.
5. The dynamic gesture vehicle control system based on deep learning according to claim 1, wherein the channel attention module uses two channels: a channel attention module is added after each of the two different-scale outputs of the two channels, different weights are assigned to the feature outputs of the two scales, and the final detection result is obtained through non-maximum suppression.
6. The dynamic gesture vehicle control system based on deep learning according to claim 1, wherein the identity authentication module is a digital signature authentication system based on a public key cryptosystem; face images are acquired through high-precision cameras inside and outside the vehicle; after preprocessing, the image data are transmitted over the CAN (Controller Area Network) bus to the vehicle-mounted communication terminal TBOX; the TBOX transmits the data to the cloud platform TSP (Telematics Service Provider) for initial face image storage and generation of a unique identification code, and applies to the PKI authentication system for an initial digital certificate; in the PKI system, the RA (Registration Authority) module first verifies the user's identity registration request, then the CA (Certificate Authority) module obtains a digital certificate from the certificate repository and issues it to the user, and the user's identity information is bound to the user's public key in the form of the digital certificate, achieving user identity authentication.
7. The dynamic gesture vehicle control system based on deep learning according to claim 1, wherein the vehicle control module comprises a vehicle-mounted high-precision camera, a network communication module, a vehicle-mounted computing unit and a by-wire control unit; the vehicle-mounted high-precision camera extracts an accurate gesture image signal through the gesture recognition algorithm and transmits it in real time to the vehicle-mounted computing unit via the network communication module, which filters and processes the gesture signal, fuses it with the anti-collision signals sensed by other sensors, and sends the result to the by-wire control unit to realize gesture control of the vehicle.
8. The dynamic gesture vehicle control system based on deep learning according to claim 7, wherein gesture control of the vehicle comprises external control and in-vehicle control; external control includes reversing, braking and steering, and in-vehicle control includes control of the electronic seats, the head-unit screen, the in-vehicle audio and the ambience lights.
9. A dynamic gesture vehicle control method based on deep learning, characterized in that the dynamic gesture vehicle control system of any one of claims 1-8 is applied, the method comprising the following steps:
S1, collect and recognize a static face image through the high-precision camera, and upload the face signal through the vehicle-mounted communication terminal to the cloud via the ACP protocol for identity verification;
S2, after the identity verification passes, capture the user's dynamic gesture images and analyze and compute an accurate gesture signal in real time through the gesture recognition module;
S3, transmit the gesture signal over Ethernet to the vehicle-mounted computing unit for fusion processing, and then send it to the by-wire control unit to realize control inside and outside the vehicle.
CN202210561028.7A, priority date 2022-05-23, filed 2022-05-23: Dynamic gesture vehicle control system and method based on deep learning. Status: Pending. Published as CN115063849A.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210561028.7A CN115063849A (en) 2022-05-23 2022-05-23 Dynamic gesture vehicle control system and method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210561028.7A CN115063849A (en) 2022-05-23 2022-05-23 Dynamic gesture vehicle control system and method based on deep learning

Publications (1)

Publication Number Publication Date
CN115063849A true CN115063849A (en) 2022-09-16

Family

ID=83198308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210561028.7A Pending CN115063849A (en) 2022-05-23 2022-05-23 Dynamic gesture vehicle control system and method based on deep learning

Country Status (1)

Country Link
CN (1) CN115063849A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115620397A (en) * 2022-11-07 2023-01-17 江苏北斗星通汽车电子有限公司 Vehicle-mounted gesture recognition system based on Leapmotion sensor
CN118675204A (en) * 2024-08-26 2024-09-20 杭州锐见智行科技有限公司 Hiss gesture detection method and device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
CN115063849A (en) Dynamic gesture vehicle control system and method based on deep learning
CN112085952B (en) Method and device for monitoring vehicle data, computer equipment and storage medium
US20220277558A1 (en) Cascaded Neural Network-Based Attention Detection Method, Computer Device, And Computer-Readable Storage Medium
WO2016157499A1 (en) Image processing apparatus, object detection apparatus, and image processing method
KR20210101313A (en) Face recognition method, neural network training method, apparatus and electronic device
WO2020100922A1 (en) Data distribution system, sensor device, and server
US20210155250A1 (en) Human-computer interaction method, vehicle-mounted device and readable storage medium
EP2722815A1 (en) Object recognition device
CN112597995B (en) License plate detection model training method, device, equipment and medium
CN112434566A (en) Passenger flow statistical method and device, electronic equipment and storage medium
KR20130106640A (en) Apparatus for trace of wanted criminal and missing person using image recognition and method thereof
WO2022179124A1 (en) Image restoration method and apparatus
CN111062311B (en) Pedestrian gesture recognition and interaction method based on depth-level separable convolution network
CN111339834B (en) Method for identifying vehicle driving direction, computer device and storage medium
CN108074395B (en) Identity recognition method and device
CN109388368B (en) Human-computer interaction method and device, unmanned vehicle and storage medium thereof
CN114120634B (en) Dangerous driving behavior identification method, device, equipment and storage medium based on WiFi
CN111600839B (en) Traffic accident handling method, equipment and storage medium
CN114743277A (en) Living body detection method, living body detection device, electronic apparatus, storage medium, and program product
CN114379582A (en) Method, system and storage medium for controlling respective automatic driving functions of vehicles
CN112699798A (en) Traffic police action recognition method and device with vehicle-road cooperation
CN113822115A (en) Image recognition method, image recognition device and computer-readable storage medium
Neelima et al. A computer vision model for vehicle detection in traffic surveillance
CN112969053A (en) In-vehicle information transmission method and device, vehicle-mounted equipment and storage medium
JP6922447B2 (en) Information processing system, server and communication method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination