CN115063849A - Dynamic gesture vehicle control system and method based on deep learning - Google Patents
Dynamic gesture vehicle control system and method based on deep learning
- Publication number
- CN115063849A (application CN202210561028.7A)
- Authority
- CN
- China
- Prior art keywords
- module
- vehicle
- gesture
- face
- dynamic gesture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Abstract
A dynamic gesture vehicle control system and method based on deep learning relate to the technical field of deep learning; they address the low accuracy and slow speed of existing dynamic gesture recognition and can be applied to mid- and high-end vehicle control systems. The system comprises a vehicle-mounted terminal and a cloud platform. The vehicle-mounted terminal comprises a face recognition module, a gesture recognition module and a vehicle control module, all communicating over Ethernet; the cloud platform comprises an identity authentication module, which is connected to the vehicle-mounted terminal through the ACP protocol. The dynamic gesture vehicle control method comprises the following steps: a static face image is acquired and recognized by a high-precision camera, and the face signal is uploaded by the vehicle-mounted communication terminal to the cloud through the ACP protocol for identity verification; the user's dynamic gesture image is then captured, and an accurate gesture signal is analyzed and computed in real time by the gesture recognition module; the data are transmitted over Ethernet to the vehicle-mounted computing unit for fusion processing, and are sent to the drive-by-wire unit to control the vehicle from inside and outside.
Description
Technical Field
The invention relates to the technical field of deep learning, and in particular to a deep-learning-based dynamic gesture vehicle control technology.
Background
With economic development and the continuous improvement of living standards, vehicle sales keep rising and parking space grows increasingly scarce. The demand for controlling a vehicle to park from outside the vehicle is becoming more pronounced, and in-vehicle comfort control needs are increasingly diverse; such capabilities are an important configuration of mid- and high-end automobiles. Gesture-based vehicle control has therefore become one of the key directions in which major automakers pursue innovation.
At present, external vehicle control systems remain at the level of smart terminals such as mobile phone apps, and terminal control usually depends on the cloud to issue control commands. With little system-level optimization, such remote control suffers from overly long control links, high resource consumption, poor user experience, timeouts and abnormal commands. For in-vehicle control, compared with touch keys, gesture recognition offers a better human-computer interaction experience, helps the user keep attention on driving, and replaces two-dimensional surface control with control in three-dimensional space. However, existing touchless dynamic gesture control technology has low gesture recognition accuracy and slow recognition speed, which seriously degrades the user experience.
Disclosure of Invention
The invention provides a dynamic gesture vehicle control system and method based on deep learning, aiming to solve the prior-art problems that dynamic gesture recognition accuracy is low and recognition speed is slow.
The technical scheme of the invention is as follows:
a dynamic gesture vehicle control system based on deep learning comprises a vehicle-mounted terminal and a cloud platform; the vehicle-mounted terminal comprises a face recognition module, a gesture recognition module and a vehicle control module, all communicating over Ethernet; the cloud platform comprises an identity authentication module, which is connected to the vehicle-mounted terminal through the ACP protocol;
the gesture recognition module comprises: a depthwise separable convolutional neural network module, a gesture action positioning module, a loss function calculation module, a channel attention module and a dynamic gesture database;
the dynamic gesture database is used to capture and store the user's commonly used gesture information; the loss function calculation module cooperates with the gesture action positioning module so that the prediction box fits the real box more accurately; the channel attention module outputs the feature detection result; and the depthwise separable convolutional neural network module performs deep training on the user's gesture information, achieving fast feature extraction and user gesture recognition.
Preferably, the face recognition module comprises: a cascaded convolutional neural network, a deep convolutional neural network, a measurement module, a normalization calculation module and a loss function calculation module;
the face recognition module extracts the image information of the target face from a static face image collected by the camera through the cascaded convolutional neural network, extracts a depth feature vector from the face image through the deep convolutional neural network module, judges the degree of correlation of feature vectors through the measurement module, computes and integrates accurate face data from the face feature data through the normalization calculation module and the loss function calculation module, and compares it with the cloud face database, finally achieving face recognition.
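As an illustration of the measurement and normalization steps described above, a minimal sketch follows; it assumes the deep network has already mapped face crops to embedding vectors, and the decision threshold is an assumption, not a value from the patent:

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    # normalization step: scale the feature vector to unit length
    return v / np.linalg.norm(v)

def is_same_person(query_emb: np.ndarray, enrolled_emb: np.ndarray,
                   threshold: float = 1.1) -> bool:
    # measurement step: Euclidean distance between normalized embeddings,
    # compared against a decision threshold (value is illustrative)
    distance = np.linalg.norm(l2_normalize(query_emb) - l2_normalize(enrolled_emb))
    return distance < threshold
```

In practice the enrolled embedding would come from the cloud face database, and the query embedding from the in-vehicle camera pipeline.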
Preferably, the depthwise separable convolutional neural network module performs deep fusion using depthwise convolution and 1 × 1 convolution; the first step and the last step of the module both use 1 × 1 convolution, the intermediate step fuses shallow features following the ResNet feature-fusion concept, and the number of network parameters is reduced by compressing the number of channels.
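A minimal PyTorch sketch of one such block, under the assumption of a 1 × 1 expansion, a depthwise 3 × 3 convolution, a 1 × 1 compression, and a ResNet-style shortcut; the layer sizes are illustrative, not taken from the patent:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    """First and last steps are 1x1 convolutions; the shortcut fuses the
    shallow input features, and channel compression keeps parameters low."""
    def __init__(self, channels: int, expand: int = 4):
        super().__init__()
        hidden = channels * expand
        self.expand = nn.Sequential(       # first step: 1x1 conv raises dimensions
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True))
        self.depthwise = nn.Sequential(    # depthwise 3x3: one filter per channel
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True))
        self.compress = nn.Sequential(     # last step: 1x1 conv compresses channels
            nn.Conv2d(hidden, channels, 1, bias=False),
            nn.BatchNorm2d(channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ResNet-style shortcut fusing shallow features with the block output
        return x + self.compress(self.depthwise(self.expand(x)))
```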
Preferably, the loss function calculation module is configured to calculate the similarity loss of the aspect ratios of the detection box and the real box; the specific calculation is

$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b_1)}{c^2} + \alpha v$,

wherein

$v = \frac{4}{\pi^2}\left(\arctan\frac{x}{y} - \arctan\frac{x_1}{y_1}\right)^2$, $\alpha = \frac{v}{(1 - IoU) + v}$,

b and b_1 respectively represent the center points of the detection box and the real box, ρ is the Euclidean distance, c is the distance between the farthest vertices of the detection box and the real box, IoU is the intersection over union of the detection box and the real box, x and y respectively represent the width and height of the detection box, and x_1 and y_1 respectively represent the width and height of the real box;
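For illustration, the similarity loss above corresponds to the following PyTorch sketch; the (cx, cy, w, h) box layout is an assumption:

```python
import math
import torch

def ciou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """CIoU-style loss for boxes given as (cx, cy, w, h), shape (N, 4)."""
    px, py, pw, ph = pred.unbind(-1)
    tx, ty, tw, th = target.unbind(-1)
    # corner coordinates of both boxes
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t_x1, t_y1, t_x2, t_y2 = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2
    # intersection over union
    inter_w = (torch.min(p_x2, t_x2) - torch.max(p_x1, t_x1)).clamp(min=0)
    inter_h = (torch.min(p_y2, t_y2) - torch.max(p_y1, t_y1)).clamp(min=0)
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / union.clamp(min=1e-7)
    # squared center distance rho^2 and squared enclosing-box diagonal c^2
    rho2 = (px - tx) ** 2 + (py - ty) ** 2
    cw = torch.max(p_x2, t_x2) - torch.min(p_x1, t_x1)
    ch = torch.max(p_y2, t_y2) - torch.min(p_y1, t_y1)
    c2 = (cw ** 2 + ch ** 2).clamp(min=1e-7)
    # aspect-ratio similarity term v and its weight alpha
    v = (4 / math.pi ** 2) * (torch.atan(pw / ph.clamp(min=1e-7))
                              - torch.atan(tw / th.clamp(min=1e-7))) ** 2
    alpha = v / ((1 - iou) + v).clamp(min=1e-7)
    return (1 - iou + rho2 / c2 + alpha * v).mean()
```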
the gesture action positioning module replaces the bounding-box prediction loss term of the traditional positioning algorithm with this loss, and the improved loss function L consists of three parts, namely coordinate error, confidence error and classification error:

$L = L_c + L_{con} + L_s$,

wherein $s^2$ represents the number of grid cells and B the number of bounding boxes per cell; the indicator $\mathbb{1}_{ij}^{obj}$ denotes that the object falls within the j-th bounding box of the i-th grid cell, and $\mathbb{1}_{ij}^{noobj}$ that it does not. The network fuses dimension features at the two scales 13 × 13 and 26 × 26, so $s^2 = 13^2$ and $s^2 = 26^2$, with B = 2.
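A simplified sketch of how the three error terms could be combined, reusing the `ciou_loss` sketch above; the tensor layouts, the sigmoid confidence outputs and the specific error functions are assumptions rather than the patent's exact formulas:

```python
import torch
import torch.nn.functional as F

def detection_loss(pred_boxes: torch.Tensor, true_boxes: torch.Tensor,   # (s, s, B, 4)
                   pred_conf: torch.Tensor, true_conf: torch.Tensor,     # (s, s, B), sigmoid outputs
                   pred_cls: torch.Tensor, true_cls: torch.Tensor,       # (s, s, B, C) logits / (s, s, B) labels
                   obj_mask: torch.Tensor) -> torch.Tensor:              # (s, s, B) bool
    # obj_mask plays the role of the indicator: True where an object falls
    # into the j-th bounding box of the i-th grid cell
    l_c = ciou_loss(pred_boxes[obj_mask], true_boxes[obj_mask])    # coordinate error
    l_con = F.binary_cross_entropy(pred_conf, true_conf)           # confidence error
    l_s = F.cross_entropy(pred_cls[obj_mask], true_cls[obj_mask])  # classification error
    return l_c + l_con + l_s
```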
Preferably, the channel attention module adopts two channels: a channel attention module is added after the outputs of the two channels at two different scales, different weights are assigned to the feature outputs of the two scales, and the final detection result is obtained through non-maximum suppression.
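One common realization of channel attention is squeeze-and-excitation style reweighting; the following is a sketch under that assumption (the reduction ratio is illustrative, and this is not necessarily the patent's exact module):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Models relationships between feature channels and reweights them,
    highlighting informative channels and suppressing irrelevant ones."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(x.mean(dim=(2, 3)))   # squeeze: global average pool per channel
        return x * weights.view(b, c, 1, 1)     # excite: per-channel reweighting
```

In the dual-channel arrangement described above, one such module would be attached to each of the two output scales before non-maximum suppression merges their detections.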
Preferably, the identity authentication module is a digital signature authentication system based on a public key cryptosystem. Face images are acquired through high-precision cameras inside and outside the vehicle; after preprocessing, the image data are transmitted over the CAN (Controller Area Network) bus to the vehicle-mounted communication terminal (TBOX); the TBOX sends the data to the cloud platform TSP (telematics service provider) for initial face image storage and generation of a unique identification code, and an initial digital certificate is requested from the PKI authentication system. In the PKI system, the RA (Registration Authority) module first verifies the user's identity registration request, then the CA (Certificate Authority) module obtains a digital certificate from the certificate repository and issues it to the user, and the user's identity information is bound to the user's public key in the form of the digital certificate, achieving user identity authentication.
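For illustration, a minimal sketch of the sign/verify primitive underlying such a digital-signature authentication system, using the Python cryptography package; the key size, padding scheme and signed payload are assumptions, and the RA/CA certificate issuance is omitted:

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# key pair: the certificate binds the public key to the user's identity
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# hypothetical payload: face-image hash plus the unique identification code
face_record = b"face-image-hash||unique-identification-code"

signature = private_key.sign(
    face_record,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256())

# the verifier (cloud platform) checks the signature with the certified
# public key; verify() raises InvalidSignature on mismatch
public_key.verify(
    signature, face_record,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256())
```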
Preferably, the vehicle control module comprises a vehicle-mounted high-precision camera, a network communication module, a vehicle-mounted computing unit and a drive-by-wire unit; the vehicle-mounted high-precision camera extracts an accurate gesture image signal through the gesture recognition algorithm and transmits it in real time through the network communication module to the vehicle-mounted computing unit, which filters, computes and processes the gesture signal, integrates it with anti-collision signals sensed by other sensors, and sends the result to the drive-by-wire unit to realize gesture control of the vehicle.
Preferably, gesture control of the vehicle comprises external control and in-vehicle control; external control comprises reversing, braking and steering, and in-vehicle control comprises control of the electronic seats, the head-unit screen, the in-vehicle speakers and the ambient lighting.
A dynamic gesture vehicle control method based on deep learning, applied to the above dynamic gesture vehicle control system, comprises the following steps:
S1, a static face image is acquired and recognized by the high-precision camera, and the face signal is uploaded by the vehicle-mounted communication terminal to the cloud through the ACP protocol for identity verification;
S2, after identity authentication passes, the user's dynamic gesture image is captured, and an accurate gesture signal is analyzed and computed in real time by the gesture recognition module;
S3, the gesture signal is transmitted over Ethernet to the vehicle-mounted computing unit for fusion processing and then sent to the drive-by-wire unit, realizing control of the vehicle from inside and outside.
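Read together, steps S1 to S3 amount to the control loop sketched below; every object and method name here is a hypothetical placeholder for the modules described in this document, not a real API:

```python
def gesture_vehicle_control(camera, cloud, recognize_gesture, compute_unit, wire_unit):
    face_image = camera.capture_still_face()         # S1: static face image
    if not cloud.verify_identity(face_image):        # upload via ACP for verification
        return                                       # reject unauthenticated users
    for frames in camera.stream_gesture_clips():     # S2: dynamic gesture capture
        gesture_signal = recognize_gesture(frames)   # real-time analysis
        command = compute_unit.fuse(gesture_signal)  # S3: Ethernet transfer and fusion
        wire_unit.execute(command)                   # drive-by-wire control
```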
Compared with the prior art, the invention solves the problems of low accuracy and slow speed in dynamic gesture recognition, with the following specific beneficial effects:
1. The deep-learning-based dynamic gesture vehicle control system addresses the situation in which parking space is scarce and the user cannot enter the vehicle in the available space: it realizes gesture-controlled parking in and out from outside the vehicle, so the vehicle can be commanded to enter and leave narrow parking spots. Inside the vehicle, dynamic gesture recognition is combined with the in-vehicle drive-by-wire capability, so control is achieved without touching instruments or the head-unit screen, keeping the owner's attention on driving while greatly improving the human-computer interaction experience.
2. Under complex backgrounds, the deep-learning-based dynamic gesture vehicle control system reduces network model overhead at the terminal through the convolutional neural network algorithm and improves the accuracy of dynamic gesture recognition with a deep-learning-based video gesture action positioning algorithm. A dynamic gesture database is also established for training the gesture model. Through the convolutional neural network, the loss function and channel attention, gesture recognition accuracy is improved while real-time performance and terminal capability are guaranteed and the user scenario is matched; a user-customized dynamic gesture database is entered, and after deep training the user's simple gesture operations can be recognized more quickly, fitting the user's gesture vehicle-control scenario.
3. The deep-learning-based dynamic gesture vehicle control method combines face recognition and dynamic gesture recognition to provide low-complexity, accurate image signals, and combines the cloud authentication system with the vehicle-end control system, realizing a complete gesture vehicle control system, improving overall human-computer interaction and ensuring the user's safe driving.
Drawings
Fig. 1 is a schematic structural diagram of the dynamic gesture vehicle control system according to embodiment 1 of the present invention;
fig. 2 is a schematic structural diagram of a face recognition module according to embodiment 2 of the present invention;
fig. 3 is a schematic structural diagram of the gesture recognition module according to embodiment 1 of the present invention;
fig. 4 is a schematic structural diagram of the depthwise separable convolutional neural network module according to embodiment 3 of the present invention;
fig. 5 is a schematic structural diagram of a channel attention module according to embodiment 5 of the present invention;
fig. 6 is a schematic structural diagram of an identity authentication module according to embodiment 6 of the present invention;
fig. 7 is a schematic structural diagram of a vehicle control module according to embodiment 7 of the present invention.
Detailed Description
To make the technical solutions of the present invention clearer, the technical solutions in the embodiments of the invention are described below with reference to the accompanying drawings. It should be noted that the following embodiments serve only to aid understanding of the technical solutions and should not be construed as limiting the invention.
Embodiment 1.
This embodiment provides a dynamic gesture vehicle control system based on deep learning. As shown in the structural diagram of fig. 1, it comprises a vehicle-mounted terminal and a cloud platform; the vehicle-mounted terminal comprises a face recognition module, a gesture recognition module and a vehicle control module, all communicating over Ethernet; the cloud platform comprises an identity authentication module, which is connected to the vehicle-mounted terminal through the ACP protocol.
The system combines face recognition and dynamic gesture recognition to provide low-complexity, accurate image signals, and combines the cloud authentication system with the vehicle-end control system, realizing a complete gesture vehicle control system, improving overall human-computer interaction and ensuring the user's safe driving.
The gesture recognition module comprises: a depthwise separable convolutional neural network module, a gesture action positioning module, a loss function calculation module, a channel attention module and a dynamic gesture database;
the dynamic gesture database captures and stores the user's commonly used gesture information; the loss function calculation module and the gesture action positioning module make the prediction box fit the real box more accurately; the channel attention module models the relationships between different feature channels; and the depthwise separable convolutional neural network module reduces network overhead, achieving fast feature extraction and user gesture recognition after deep training.
Fig. 3 is a schematic structural diagram of the gesture recognition module provided in this embodiment. The depthwise separable convolutional neural network module reduces network overhead, greatly improves processing speed and recognition accuracy, guarantees real-time performance, and is well suited to mobile terminals and embedded devices. The gesture action positioning module overcomes the defects of the traditional positioning method, whose gradient is 0 when boxes do not overlap and which cannot distinguish different alignments: it takes into account the distance between the center points of the real box and the detection box and adds a similarity loss term for the aspect ratios of the detection box and the real box, so the prediction box fits the real box more accurately.
The dynamic gesture database fits the actual gesture vehicle-control scenario better: a camera collects gesture images and background data, including illumination intensity and angle. In addition, the dynamic gesture database varies along the time dimension to match the operating habits of different users. Capturing the user's common gestures in the database improves recognition accuracy through deep training, and the common gesture information can still be extracted and recognized from the database when the network is abnormal. Through the convolutional neural network, the loss function and channel attention, the system improves gesture recognition accuracy while guaranteeing real-time performance and terminal capability and matching the user scenario; a user-customized dynamic gesture database is entered, and after deep training the user's simple gesture operations are recognized more quickly, fitting the user's gesture vehicle-control scenario.
Embodiment 2.
This embodiment further details embodiment 1. The face recognition module comprises: a cascaded convolutional neural network, a deep convolutional neural network, a measurement module, a normalization calculation module and a loss function calculation module;
fig. 2 is a schematic structural diagram of the face recognition module provided in this embodiment. The face recognition module is based on a FaceNet discrimination network: it extracts the image information of the target face from the static face image acquired by the camera through the cascaded convolutional neural network, extracts a depth feature vector from the face image through the deep convolutional neural network module, judges the degree of correlation of feature vectors through the measurement module, then computes and integrates accurate face data from the face feature data through the normalization calculation module and the loss function calculation module, and compares it with the cloud face database for authentication, finally achieving face recognition.
Embodiment 3.
This embodiment further details embodiment 1: the depthwise separable convolutional neural network module performs deep fusion using depthwise convolution and 1 × 1 convolution; the first step and the last step of the module both use 1 × 1 convolution, the intermediate step fuses shallow features following the ResNet feature-fusion concept, and the number of network parameters is reduced by compressing the number of channels.
The structure of the depthwise separable convolutional neural network module of this embodiment is shown in fig. 4. The first step raises the dimension of the input features to avoid the feature loss caused by nonlinear activation; the middle step fuses shallow features following the ResNet feature-fusion concept and reduces the number of network parameters by compressing the number of channels; the last step reduces the number of fused features.
Embodiment 4.
This embodiment further details embodiment 1. The loss function calculation module is configured to calculate the similarity loss of the aspect ratios of the detection box and the real box; the specific calculation is

$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b_1)}{c^2} + \alpha v$,

wherein $v = \frac{4}{\pi^2}\left(\arctan\frac{x}{y} - \arctan\frac{x_1}{y_1}\right)^2$ and $\alpha = \frac{v}{(1 - IoU) + v}$; b and b_1 respectively represent the center points of the detection box and the real box, ρ is the Euclidean distance, c is the distance between the farthest vertices of the detection box and the real box, IoU is the intersection over union of the detection box and the real box, x and y respectively represent the width and height of the detection box, and x_1 and y_1 respectively represent the width and height of the real box;
the gesture action positioning module replaces the bounding-box prediction loss term of the traditional positioning algorithm with this loss, and the improved loss function L consists of three parts, namely coordinate error, confidence error and classification error:

$L = L_c + L_{con} + L_s$,

wherein $s^2$ represents the number of grid cells and B the number of bounding boxes per cell; the indicator $\mathbb{1}_{ij}^{obj}$ denotes that the object falls within the j-th bounding box of the i-th grid cell, and $\mathbb{1}_{ij}^{noobj}$ that it does not. The network fuses dimension features at the two scales 13 × 13 and 26 × 26, so $s^2 = 13^2$ and $s^2 = 26^2$, with B = 2.
Embodiment 5.
This embodiment further details embodiment 1: the channel attention module adopts two channels; a channel attention module is added after the outputs of the two channels at two different scales, different weights are assigned to the feature outputs of the two scales, and the final detection result is obtained through non-maximum suppression.
Because the information in different channels has different expressive power for the gesture target, the channel attention module models the relationships between feature channels, highlights the weight of key information and removes irrelevant information, improving the accuracy of gesture detection. To improve network performance, this embodiment provides a dual-channel attention module, as shown in fig. 5: the gesture image passes through the preprocessing module, a channel attention module is added after the outputs at two different scales, different weights are assigned, and the final detection result is obtained by applying non-maximum suppression to the feature outputs at the two scales s1 and s2.
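A minimal sketch of the non-maximum suppression step that merges the weighted detections from the two scales; the (x1, y1, x2, y2) box format and the IoU threshold are assumptions:

```python
import torch

def nms(boxes: torch.Tensor, scores: torch.Tensor, iou_thr: float = 0.5) -> list:
    """Keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0].item()
        keep.append(i)
        if order.numel() == 1:
            break
        rest = order[1:]
        # IoU of the top box with all remaining boxes
        x1 = torch.max(boxes[i, 0], boxes[rest, 0])
        y1 = torch.max(boxes[i, 1], boxes[rest, 1])
        x2 = torch.min(boxes[i, 2], boxes[rest, 2])
        y2 = torch.min(boxes[i, 3], boxes[rest, 3])
        inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thr]   # suppress boxes overlapping the kept box
    return keep
```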
Embodiment 6.
This embodiment further details embodiment 1. Fig. 6 is a schematic structural diagram of the identity authentication module provided in this embodiment. The identity authentication module is a digital signature authentication system based on a public key cryptosystem. Face images are acquired through high-precision cameras inside and outside the vehicle; after preprocessing, the image data are transmitted over the CAN (Controller Area Network) bus to the vehicle-mounted communication terminal (TBOX); the TBOX forwards the data to the cloud platform TSP (telematics service provider) for initial face image storage and generation of a unique identification code, and an initial digital certificate is requested from the PKI authentication system. In the PKI system, the RA (Registration Authority) module first verifies the user's identity registration request, then the CA (Certificate Authority) module obtains a digital certificate from the certificate repository and issues it to the user, and the user's identity information is bound to the user's public key in the form of the digital certificate, achieving user identity authentication.
Embodiment 7.
This embodiment further details embodiment 1. Fig. 7 is a schematic structural diagram of the vehicle control module provided in this embodiment. The vehicle control module comprises a vehicle-mounted high-precision camera, a network communication module, a vehicle-mounted computing unit and a drive-by-wire unit; the vehicle-mounted high-precision camera extracts an accurate gesture image signal through the gesture recognition algorithm and transmits it in real time through the network communication module to the vehicle-mounted computing unit, which filters, computes and processes the gesture signal, integrates it with anti-collision signals sensed by other sensors, and sends the result to the drive-by-wire unit to realize gesture control of the vehicle.
Embodiment 8.
This embodiment further details embodiment 7: gesture control of the vehicle comprises external control and in-vehicle control; external control comprises reversing, braking and steering, and in-vehicle control comprises control of the electronic seats, the head-unit screen, the in-vehicle speakers and the ambient lighting.
Embodiment 9.
This embodiment provides a dynamic gesture vehicle control method based on deep learning, applied to the dynamic gesture vehicle control system of any of embodiments 1 to 8, comprising the following steps:
S1, a static face image is acquired and recognized by the high-precision camera, and the face signal is uploaded by the vehicle-mounted communication terminal to the cloud through the ACP protocol for identity verification;
S2, after identity authentication passes, the user's dynamic gesture image is captured, and an accurate gesture signal is analyzed and computed in real time by the gesture recognition module;
S3, the gesture signal is transmitted over Ethernet to the vehicle-mounted computing unit for fusion processing and then sent to the drive-by-wire unit, realizing control of the vehicle from inside and outside.
This deep-learning-based dynamic gesture vehicle control method combines face recognition and dynamic gesture recognition to provide low-complexity, accurate image signals, and combines the cloud authentication system with the vehicle-end control system, realizing a complete gesture vehicle control system, improving overall human-computer interaction and ensuring the user's safe driving.
Claims (9)
1. A dynamic gesture vehicle control system based on deep learning, characterized by comprising a vehicle-mounted terminal and a cloud platform; the vehicle-mounted terminal comprises a face recognition module, a gesture recognition module and a vehicle control module, all communicating over Ethernet; the cloud platform comprises an identity authentication module, which is connected to the vehicle-mounted terminal through the ACP protocol;
the gesture recognition module comprises: a depthwise separable convolutional neural network module, a gesture action positioning module, a loss function calculation module, a channel attention module and a dynamic gesture database;
the dynamic gesture database is used to capture and store the user's commonly used gesture information; the loss function calculation module cooperates with the gesture action positioning module so that the prediction box fits the real box more accurately; the channel attention module outputs the feature detection result; and the depthwise separable convolutional neural network module performs deep training on the user's gesture information, achieving fast feature extraction and user gesture recognition.
2. The deep-learning-based dynamic gesture vehicle control system according to claim 1, wherein the face recognition module comprises: a cascaded convolutional neural network, a deep convolutional neural network, a measurement module, a normalization calculation module and a loss function calculation module;
the face recognition module extracts the image information of the target face from a static face image collected by the camera through the cascaded convolutional neural network, extracts a depth feature vector from the face image through the deep convolutional neural network module, judges the degree of correlation of feature vectors through the measurement module, computes and integrates accurate face data from the face feature data through the normalization calculation module and the loss function calculation module, and compares it with the cloud face database, finally achieving face recognition.
3. The deep-learning-based dynamic gesture vehicle control system according to claim 1, wherein the depthwise separable convolutional neural network module performs deep fusion using depthwise convolution and 1 × 1 convolution; the first step and the last step of the module both use 1 × 1 convolution, the intermediate step fuses shallow features following the ResNet feature-fusion concept, and the number of network parameters is reduced by compressing the number of channels.
4. The deep-learning-based dynamic gesture vehicle control system according to claim 1, wherein the loss function calculation module is configured to calculate the similarity loss of the aspect ratios of the detection box and the real box, the specific calculation being

$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b_1)}{c^2} + \alpha v$,

wherein $v = \frac{4}{\pi^2}\left(\arctan\frac{x}{y} - \arctan\frac{x_1}{y_1}\right)^2$ and $\alpha = \frac{v}{(1 - IoU) + v}$; b and b_1 respectively represent the center points of the detection box and the real box, ρ is the Euclidean distance, c is the distance between the farthest vertices of the detection box and the real box, IoU is the intersection over union of the detection box and the real box, x and y respectively represent the width and height of the detection box, and x_1 and y_1 respectively represent the width and height of the real box;

the gesture action positioning module replaces the bounding-box prediction loss term of the traditional positioning algorithm with this loss, and the improved loss function L consists of three parts, namely coordinate error, confidence error and classification error:

$L = L_c + L_{con} + L_s$,

wherein $s^2$ represents the number of grid cells and B the number of bounding boxes per cell; the indicator $\mathbb{1}_{ij}^{obj}$ denotes that the object falls within the j-th bounding box of the i-th grid cell, and $\mathbb{1}_{ij}^{noobj}$ that it does not; the network fuses dimension features at the two scales 13 × 13 and 26 × 26, so $s^2 = 13^2$ and $s^2 = 26^2$, with B = 2.
5. The deep-learning-based dynamic gesture vehicle control system according to claim 1, wherein the channel attention module adopts two channels; a channel attention module is added after the outputs of the two channels at two different scales, different weights are assigned to the feature outputs of the two scales, and the final detection result is obtained through non-maximum suppression.
6. The deep-learning-based dynamic gesture vehicle control system according to claim 1, wherein the identity authentication module is a digital signature authentication system based on a public key cryptosystem; face images are acquired through high-precision cameras inside and outside the vehicle; after preprocessing, the image data are transmitted over the CAN (Controller Area Network) bus to the vehicle-mounted communication terminal (TBOX); the TBOX transmits the data to the cloud platform TSP (telematics service provider) for initial face image storage and generation of a unique identification code, and an initial digital certificate is requested from the PKI authentication system; in the PKI system, the RA (Registration Authority) module first verifies the user's identity registration request, then the CA (Certificate Authority) module obtains a digital certificate from the certificate repository and issues it to the user, and the user's identity information is bound to the user's public key in the form of the digital certificate, achieving user identity authentication.
7. The deep-learning-based dynamic gesture vehicle control system according to claim 1, wherein the vehicle control module comprises a vehicle-mounted high-precision camera, a network communication module, a vehicle-mounted computing unit and a drive-by-wire unit; the vehicle-mounted high-precision camera extracts an accurate gesture image signal through the gesture recognition algorithm and transmits it in real time through the network communication module to the vehicle-mounted computing unit, which filters, computes and processes the gesture signal, integrates it with anti-collision signals sensed by other sensors, and sends the result to the drive-by-wire unit to realize gesture control of the vehicle.
8. The deep-learning-based dynamic gesture vehicle control system according to claim 7, wherein gesture control of the vehicle comprises external control and in-vehicle control; external control comprises reversing, braking and steering, and in-vehicle control comprises control of the electronic seats, the head-unit screen, the in-vehicle speakers and the ambient lighting.
9. A dynamic gesture vehicle control method based on deep learning, characterized in that it is applied to the dynamic gesture vehicle control system of any one of claims 1 to 8, the method comprising the following steps:
S1, a static face image is acquired and recognized by the high-precision camera, and the face signal is uploaded by the vehicle-mounted communication terminal to the cloud through the ACP protocol for identity verification;
S2, after identity authentication passes, the user's dynamic gesture image is captured, and an accurate gesture signal is analyzed and computed in real time by the gesture recognition module;
S3, the gesture signal is transmitted over Ethernet to the vehicle-mounted computing unit for fusion processing and then sent to the drive-by-wire unit, realizing control of the vehicle from inside and outside.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210561028.7A | 2022-05-23 | 2022-05-23 | Dynamic gesture vehicle control system and method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210561028.7A | 2022-05-23 | 2022-05-23 | Dynamic gesture vehicle control system and method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115063849A (en) | 2022-09-16 |
Family
ID=83198308
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210561028.7A Pending CN115063849A (en) | 2022-05-23 | 2022-05-23 | Dynamic gesture vehicle control system and method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115063849A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115620397A (en) * | 2022-11-07 | 2023-01-17 | 江苏北斗星通汽车电子有限公司 | Vehicle-mounted gesture recognition system based on Leapmotion sensor |
CN118675204A (en) * | 2024-08-26 | 2024-09-20 | 杭州锐见智行科技有限公司 | Hiss gesture detection method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||