CN113296604A - True 3D gesture interaction method based on convolutional neural network - Google Patents

True 3D gesture interaction method based on convolutional neural network

Info

Publication number
CN113296604A
CN113296604A (application CN202110564285.1A)
Authority
CN
China
Prior art keywords
gesture
model
neural network
real
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110564285.1A
Other languages
Chinese (zh)
Other versions
CN113296604B (en)
Inventor
王琼华
张力
李小伟
李大海
马孝铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Beihang University
Original Assignee
Sichuan University
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University, Beihang University filed Critical Sichuan University
Priority to CN202110564285.1A priority Critical patent/CN113296604B/en
Publication of CN113296604A publication Critical patent/CN113296604A/en
Application granted granted Critical
Publication of CN113296604B publication Critical patent/CN113296604B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 - Static hand or arm
    • G06V40/113 - Recognition of static hand signs
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Abstract

The invention provides a true 3D gesture interaction method based on a convolutional neural network. Gesture data are captured with a Leap Motion controller, and the semantics of each gesture are obtained by combining the gesture instruction predicted by the Leap Motion with the gesture instruction predicted by a trained neural network model. The recognized gestures drive interaction with the 3D image, the interacted 3D model is rendered with backward ray tracing, a spatial bounding-box technique enables real-time rendering of the 3D model, and the rendered 3D model is finally displayed on an integrated imaging 3D display. The invention markedly improves the accuracy of gesture interaction and the user experience while presenting a real 3D scene.

Description

True 3D gesture interaction method based on convolutional neural network
1. Technical Field
The invention belongs to the technical field of interaction, and particularly relates to a true 3D gesture interaction method based on a convolutional neural network.
2. Background Art
With the rapid development of the digital information era and 3D display technology, traditional display platforms are gradually falling out of favor because of their single display form and poor user experience; they are being replaced by new media technologies that attract users' attention and enable interactive experiences. There is therefore a demand for a 3D image interaction technology that supports human-computer interaction. Gesture interaction is an ergonomic interaction mode that can quickly and naturally express simple user intentions. However, existing true 3D display platforms suffer from low gesture recognition accuracy and a poor user experience, and are not yet practical.
3. Summary of the Invention
The invention provides a true 3D gesture interaction method based on a convolutional neural network. A somatosensory interaction device acquires gesture data, and gesture images with their corresponding gesture semantics are fed into a purpose-designed convolutional neural network for training. The final gesture semantics are corrected by combining the semantics output by the Leap Motion with the semantics output by the trained convolutional neural network; the defined gestures are shown in FIG. 1. The gestures interact with the 3D image, the interacted model is rendered with backward ray tracing, a spatial bounding-box technique enables real-time rendering of the 3D model, and the rendered 3D model is finally displayed on an integrated imaging 3D display. While presenting a real 3D scene, the method offers the user a more accurate gesture interaction mode and improves the user experience. The method comprises three processes: gesture interaction with the 3D model together with gesture semantic correction, real-time rendering of the 3D model, and real-time display of the 3D image.
In the gesture interaction process of the 3D model, the scaling factors of the 3D model are controlled by the five-finger spread-and-pinch gesture (gesture 100 in FIG. 1), and the scaling operation on the three-dimensional affine coordinates can be expressed as:

$$\begin{bmatrix} L' \\ W' \\ H' \\ 1 \end{bmatrix} = \begin{bmatrix} S_1 & 0 & 0 & 0 \\ 0 & S_2 & 0 & 0 \\ 0 & 0 & S_3 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} L \\ W \\ H \\ 1 \end{bmatrix}$$

where L, W, and H are the length, width, and height of the maximum bounding box of the 3D model, and $S_1$, $S_2$, $S_3$ are the scaling factors applied to L, W, and H, respectively.
Translating the palm controls the movement of the 3D model (gesture 102 in FIG. 1), and the translation operation on the three-dimensional affine coordinates can be expressed as:

$$\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & T_x \\ 0 & 1 & 0 & T_y \\ 0 & 0 & 1 & T_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$

where x, y, z are the coordinates of the 3D model centroid and $T_x$, $T_y$, $T_z$ are the offsets along x, y, and z, respectively.
Circling the index finger controls the rotation of the 3D model (gesture 101 in FIG. 1). The rotation operations of the three-dimensional affine coordinates about the x-axis, y-axis, and z-axis can be expressed respectively as:

$$R_x(\theta)=\begin{bmatrix}1&0&0&0\\0&\cos\theta&-\sin\theta&0\\0&\sin\theta&\cos\theta&0\\0&0&0&1\end{bmatrix},\quad R_y(\theta)=\begin{bmatrix}\cos\theta&0&\sin\theta&0\\0&1&0&0\\-\sin\theta&0&\cos\theta&0\\0&0&0&1\end{bmatrix},\quad R_z(\theta)=\begin{bmatrix}\cos\theta&-\sin\theta&0&0\\\sin\theta&\cos\theta&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}$$
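To make the three affine operations concrete, a minimal Python/NumPy sketch is given below. The matrix layouts follow the standard homogeneous-coordinate forms above; the function names and the example values are illustrative assumptions, not code from the patent.

```python
import numpy as np

def scale(s1, s2, s3):
    """4x4 homogeneous scaling matrix for factors S1, S2, S3."""
    return np.diag([s1, s2, s3, 1.0])

def translate(tx, ty, tz):
    """4x4 homogeneous translation matrix for offsets Tx, Ty, Tz."""
    m = np.eye(4)
    m[:3, 3] = [tx, ty, tz]
    return m

def rotate_x(theta):
    """4x4 homogeneous rotation about the x-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0,   c,  -s, 0.0],
                     [0.0,   s,   c, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

# Example: apply a gesture-driven transform to the model centroid (x, y, z).
p = np.array([0.2, -0.1, 0.5, 1.0])   # homogeneous point
p_new = translate(0.1, 0.0, 0.0) @ rotate_x(np.pi / 12) @ scale(1.2, 1.2, 1.2) @ p
```

Rotations about the y-axis and z-axis follow the same pattern with the corresponding matrices above.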
where θ is the change in the rotation angle of the palm. The gesture semantic correction process trains a gesture set with deep learning; the network structure is shown in FIG. 2. By capturing pictures of the gestures during interaction and attaching labels, a training set containing M samples is formed:

$$D=\{(I_i, y_i)\}_{i=1}^{M}$$

where $I_i$ is the i-th image and $y_i=\{y_{i0}, y_{i1}, y_{i2}, \ldots, y_{i(C-1)}\}$ is the corresponding annotation; if the sample is labeled as category c, then $y_{ic}=1$, and 0 otherwise. Given an image, the network produces a score vector $s_i=\{s_{i0}, s_{i1}, s_{i2}, \ldots, s_{i(C-1)}\}$, from which the corresponding probability vector $p_i=\{p_{i0}, p_{i1}, p_{i2}, \ldots, p_{i(C-1)}\}$ is computed by the softmax function, $p_{ic}=\mathrm{softmax}(s_{ic})$. Cross entropy is taken as the target loss function:

$$L=-\frac{1}{M}\sum_{i=1}^{M}\sum_{c=0}^{C-1} y_{ic}\log p_{ic}$$
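A small NumPy sketch of the softmax and cross-entropy computations just described; the variable names mirror the notation above ($s_i$, $p_i$, $y_i$), while the toy batch shapes are assumptions for illustration.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax over class scores s_i, yielding probabilities p_i."""
    z = scores - scores.max(axis=1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels_onehot, eps=1e-12):
    """Mean cross-entropy loss L over M samples and C classes."""
    m = probs.shape[0]
    return -np.sum(labels_onehot * np.log(probs + eps)) / m

# Toy example: M = 2 samples, C = 3 gesture classes.
s = np.array([[2.0, 0.5, -1.0],
              [0.1, 1.5,  0.3]])
y = np.array([[1, 0, 0],
              [0, 1, 0]])
loss = cross_entropy(softmax(s), y)
```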
the network model is trained by loss L in an end-to-end mode, specifically, a labeled data set is used for training, an Adam optimizer is used for optimizing, when the error L reaches a stable state, the network is trained completely, the training is stopped, then probability vectors calculated by Leap Motion are synthesized, different weights are defined according to different instructions, and finally a predicted value is calculated, wherein the process is shown in the attached figure 3.
In the real-time rendering process of the 3D model, the lens parameters, the integrated imaging 3D display parameters, and the 3D model are input first. After the parameters are input, a three-dimensional scene group, a virtual camera, and an image plane are created to preprocess the input data. The interaction module then checks whether an interaction instruction has been detected; if so, the parameters of the integrated light-field visual model are changed and the pipeline enters the rendering module. In the rendering module, real-time rendering of the 3D model is achieved with ray tracing and the bounding-box technique.
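As an illustration of the bounding-box acceleration mentioned here, below is a minimal slab-method ray/AABB intersection test in Python. The patent does not publish its rendering code, so this is a generic sketch: rays whose bounding-box test fails can skip the more expensive ray/mesh intersection used for radiance evaluation.

```python
import numpy as np

def ray_aabb_hit(origin, direction, box_min, box_max):
    """Slab-method test: does the ray origin + t*direction hit the AABB?

    Returns (hit, t_near), where t_near is the entry distance along the ray.
    """
    inv_d = 1.0 / np.where(direction == 0.0, 1e-12, direction)  # avoid divide by zero
    t1 = (box_min - origin) * inv_d
    t2 = (box_max - origin) * inv_d
    t_near = np.max(np.minimum(t1, t2))
    t_far  = np.min(np.maximum(t1, t2))
    return (t_far >= max(t_near, 0.0)), t_near

hit, t = ray_aabb_hit(np.array([0.0, 0.0, -5.0]),
                      np.array([0.0, 0.0, 1.0]),
                      np.array([-1.0, -1.0, -1.0]),
                      np.array([ 1.0,  1.0,  1.0]))
```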
In the 3D image display process, the micro-image array is fed to the integrated imaging 3D display to present a true 3D image with stereoscopic vision. The overall setup is shown in FIG. 4, in which 400 is the Leap Motion gesture information acquisition device, 401 is the real-time information processing device, and 402 is the integrated imaging 3D display.
The invention addresses the gesture interaction problem of true 3D display by means of a convolutional neural network and provides a high-accuracy true 3D gesture interaction method.
4. Description of the Drawings
FIG. 1 is a diagram of an interaction gesture.
FIG. 2 is a diagram of a convolutional neural network architecture in accordance with the present invention.
FIG. 3 is a schematic diagram of a convolutional neural network-based gesture recognition system according to the present invention.
FIG. 4 is a diagram of the overall effect of true 3D gesture interaction based on a convolutional neural network.
Reference numerals in the drawings: 100, five-finger spread-and-pinch gesture; 101, index-finger circling gesture; 102, palm translation gesture; 400, Leap Motion gesture information acquisition device; 401, real-time information processing device; 402, integrated imaging 3D display.
It should be understood that the above-described figures are merely schematic and are not drawn to scale.
5. Detailed Description of the Invention
The following exemplary embodiment of the convolutional neural network-based true 3D gesture interaction method further details the present invention. It should be noted that the embodiment is for illustrative purposes only and should not be construed as limiting the scope of the invention; those skilled in the art may modify and vary the invention without departing from its scope.
The true 3D gesture interaction method based on integrated imaging specifically comprises three processes: gesture interaction with the 3D model, real-time rendering of the 3D model, and real-time display of the 3D image.
In the gesture interaction process of the 3D model, the interaction commands are three interaction gestures, defined by computing the moving direction, speed, and displacement of the hand as well as the changes of the pitch, roll, and yaw angles from the palm direction and normal vector, the center and radius of the palm sphere, and the direction and position of the fingers detected by the Leap Motion device, as shown in FIG. 1: the five-finger spread-and-pinch gesture 100, the index-finger circling gesture 101, and the palm translation gesture 102 (a sketch of deriving the palm angles follows the equations below). Scaling, moving, and rotating the 3D model are realized by matrix operations on the three-dimensional scene group of the 3D model. The scaling of the three-dimensional affine coordinates can be expressed as:
$$\begin{bmatrix} L' \\ W' \\ H' \\ 1 \end{bmatrix} = \begin{bmatrix} S_1 & 0 & 0 & 0 \\ 0 & S_2 & 0 & 0 \\ 0 & 0 & S_3 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} L \\ W \\ H \\ 1 \end{bmatrix}$$

where L, W, and H are the length, width, and height of the maximum bounding box of the 3D model, and $S_1$, $S_2$, $S_3$ are the scaling factors applied to L, W, and H, respectively.
The translation operation on the three-dimensional affine coordinates can be expressed as:

$$\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & T_x \\ 0 & 1 & 0 & T_y \\ 0 & 0 & 1 & T_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$

where x, y, z are the coordinates of the 3D model centroid and $T_x$, $T_y$, $T_z$ are the offsets along x, y, and z, respectively.
The rotation operations of the three-dimensional affine coordinates about the x-axis, y-axis, and z-axis can be expressed respectively as:

$$R_x(\theta)=\begin{bmatrix}1&0&0&0\\0&\cos\theta&-\sin\theta&0\\0&\sin\theta&\cos\theta&0\\0&0&0&1\end{bmatrix},\quad R_y(\theta)=\begin{bmatrix}\cos\theta&0&\sin\theta&0\\0&1&0&0\\-\sin\theta&0&\cos\theta&0\\0&0&0&1\end{bmatrix},\quad R_z(\theta)=\begin{bmatrix}\cos\theta&-\sin\theta&0&0\\\sin\theta&\cos\theta&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}$$

where θ is the change in the rotation angle of the palm.
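As a sketch of how the pitch, yaw, and roll angles mentioned above might be derived from the palm direction and normal vectors reported by the sensor, the Python snippet below uses common Leap Motion-style conventions (y up, -z toward the screen). The exact conventions and formulas are assumptions for illustration, not the patent's own.

```python
import numpy as np

def palm_angles(direction, normal):
    """Estimate pitch, yaw, roll (radians) from palm direction/normal vectors.

    direction: unit vector from the palm toward the fingers
    normal:    unit vector pointing out of the palm
    The axis conventions here are assumptions for illustration only.
    """
    pitch = np.arctan2(direction[1], -direction[2])   # rotation about the x-axis
    yaw   = np.arctan2(direction[0], -direction[2])   # rotation about the y-axis
    roll  = np.arctan2(normal[0],    -normal[1])      # rotation about the z-axis
    return pitch, yaw, roll

pitch, yaw, roll = palm_angles(np.array([0.0, 0.1, -0.99]),
                               np.array([0.0, -0.99, -0.1]))
```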
the method mainly comprises two parts of data preprocessing and convolutional neural network training, wherein the network structure is shown in figure 2, the data preprocessing is to compress and graye an obtained gesture image, finally, corresponding instruction labels are attached, and a training sample containing M training samples is established
Figure BDA0003080315220000054
In which IiRepresenting the ith image, yi={yi0,yi1,yi2,...,yi(c-1)The data set entered for the corresponding annotation. The convolutional neural network framework is composed of 8 convolutional layers in which the convolutional kernel size is 3 × 3 and ReLU is used as an activation function, and two fully-connected layers and 2 pooling layers. The convolutional layer performs feature extraction on the image, and finally generates a feature map with an output channel of 1024. Then inputting the probability vector into a full connection layer, introducing a Dropout mechanism in the full connection process, preventing over-fitting of the network, enhancing the robustness of the network, and finally calculating a corresponding probability vector p through a softmax functioni={pi0,pi1,pi2,...,pi(c-1)During the training process, we adopt cross entropy as a loss function of the target:
Figure BDA0003080315220000055
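Under the architecture described (eight 3 × 3 convolution layers with ReLU, two pooling layers, two fully connected layers with Dropout, softmax output), one plausible PyTorch realization is sketched below. The channel progression, pooling positions, input size, and the global average pooling before the classifier are assumptions, since the text does not fix them; only the layer counts and the 1024-channel feature map come from the description.

```python
import torch
import torch.nn as nn

class GestureCNN(nn.Module):
    """Sketch of the 8-conv / 2-pool / 2-FC gesture classifier (C classes)."""
    def __init__(self, num_classes: int):
        super().__init__()
        def block(c_in, c_out):
            return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True))
        self.features = nn.Sequential(
            block(1, 64), block(64, 64), block(64, 128), block(128, 128),
            nn.MaxPool2d(2),                       # first pooling layer
            block(128, 256), block(256, 256), block(256, 512), block(512, 1024),
            nn.MaxPool2d(2),                       # second pooling layer
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(1024, 256), nn.ReLU(inplace=True),
            nn.Dropout(0.5),                       # Dropout against over-fitting
            nn.Linear(256, num_classes),           # scores s_i; softmax applied in the loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = GestureCNN(num_classes=3)
scores = model(torch.randn(4, 1, 64, 64))          # grayscale gesture crops (assumed size)
```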
the network model is trained in an end-to-end mode through loss L, specifically, a labeled data set is used for training, an Adam optimizer is used for optimizing, when the error L reaches a stable state, the fact that the network is trained is indicated, and the training is stopped. And then testing different instructions, weighting the predicted value of the trained network model and the predicted value of the Leap Motion, wherein different instructions have different weight values, and finally obtaining the predicted value of the gesture interaction system, wherein the schematic block diagram of the gesture interaction system is shown in the attached figure 3.
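A hedged sketch of the end-to-end training loop with an Adam optimizer and cross-entropy loss follows. The learning rate, epoch budget, and the loss-stability check are placeholders for the "error L reaches a stable state" criterion; `model` and `train_loader` are assumed to be defined elsewhere (for example, the CNN sketch above and a DataLoader over the labeled gesture set).

```python
import torch
import torch.nn as nn

# `model` and a DataLoader `train_loader` of (image, label) pairs are assumed to exist.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()          # softmax + cross-entropy in one step

prev_loss = float("inf")
for epoch in range(100):
    epoch_loss = 0.0
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    # Stop once the loss has stabilized (placeholder for the patent's stopping criterion).
    if abs(prev_loss - epoch_loss) < 1e-4:
        break
    prev_loss = epoch_loss
```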
In the real-time rendering process of the 3D model, the lens parameters, the display screen parameters, and the 3D model are input first. After the parameters are input, a three-dimensional scene group, a virtual camera, and an image plane are created to preprocess the input data. The interaction module then checks whether an interaction instruction has been detected; if so, the parameters of the integrated light-field visual model are changed and the pipeline enters the rendering module, which applies ray tracing and the bounding-box technique.
In the display process of the 3D image, a true 3D image with interactive functionality is shown on the integrated imaging 3D display by tuning the relevant parameters between the display and the image elements. The overall setup is shown in FIG. 4, in which 400 is the Leap Motion gesture information acquisition device, 401 is the real-time information processing device, and 402 is the integrated imaging 3D display.

Claims (1)

1. A true 3D gesture interaction method based on a convolutional neural network, characterized in that a Leap Motion somatosensory controller is adopted to acquire gesture data information; the semantics of a gesture are output by combining the gesture instruction predicted by the Leap Motion with the gesture instruction predicted by a trained neural network model; the three-dimensional affine coordinates of a 3D model are changed so that zooming, translation, and rotation interactions are realized on the 3D model; the interacted 3D image is rendered with backward ray tracing; a spatial bounding-box technique achieves real-time rendering of the 3D model during rendering; and the rendered 3D model is finally displayed on an integrated imaging 3D display; the specific steps are as follows:
step one: a Leap Motion somatosensory controller acquires gesture data information; the semantics of a gesture are output by combining the gesture instruction predicted by the Leap Motion with the gesture instruction predicted by the trained neural network model; three interactive gestures, five-finger spread-and-pinch, palm translation, and index-finger circling, are defined; and the scaling, moving, and rotating of the 3D model are realized by matrix transformations of the three-dimensional affine coordinates of the 3D model;
step two: real-time rendering of the 3D model, wherein the rendering module uses ray tracing and the bounding-box technique to compute the incident radiance value at the closest intersection point between a ray emitted from the ray-emitting plane and the surface of the 3D model, taking it as the color value of the corresponding pixel of the elemental image array, thereby generating the micro-image array;
step three: real-time display of the 3D image, wherein the micro-image array generated in step two is fed to the integrated imaging 3D display to present a true 3D image with stereoscopic vision.
CN202110564285.1A 2021-05-24 2021-05-24 True 3D gesture interaction method based on convolutional neural network Active CN113296604B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110564285.1A CN113296604B (en) 2021-05-24 2021-05-24 True 3D gesture interaction method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110564285.1A CN113296604B (en) 2021-05-24 2021-05-24 True 3D gesture interaction method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN113296604A true CN113296604A (en) 2021-08-24
CN113296604B CN113296604B (en) 2022-07-08

Family

ID=77324142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110564285.1A Active CN113296604B (en) 2021-05-24 2021-05-24 True 3D gesture interaction method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN113296604B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050859A (en) * 2014-05-08 2014-09-17 南京大学 Interactive digital stereoscopic sand table system
CN108182728A (en) * 2018-01-19 2018-06-19 武汉理工大学 A kind of online body-sensing three-dimensional modeling method and system based on Leap Motion
US20190107894A1 (en) * 2017-10-07 2019-04-11 Tata Consultancy Services Limited System and method for deep learning based hand gesture recognition in first person view
CN109657634A (en) * 2018-12-26 2019-04-19 中国地质大学(武汉) A kind of 3D gesture identification method and system based on depth convolutional neural networks
CN109933097A (en) * 2016-11-21 2019-06-25 清华大学深圳研究生院 A kind of robot for space remote control system based on three-dimension gesture

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050859A (en) * 2014-05-08 2014-09-17 南京大学 Interactive digital stereoscopic sand table system
CN109933097A (en) * 2016-11-21 2019-06-25 清华大学深圳研究生院 A kind of robot for space remote control system based on three-dimension gesture
US20190107894A1 (en) * 2017-10-07 2019-04-11 Tata Consultancy Services Limited System and method for deep learning based hand gesture recognition in first person view
CN108182728A (en) * 2018-01-19 2018-06-19 武汉理工大学 A kind of online body-sensing three-dimensional modeling method and system based on Leap Motion
CN109657634A (en) * 2018-12-26 2019-04-19 中国地质大学(武汉) A kind of 3D gesture identification method and system based on depth convolutional neural networks

Also Published As

Publication number Publication date
CN113296604B (en) 2022-07-08

Similar Documents

Publication Publication Date Title
Liu et al. Semantic-aware implicit neural audio-driven video portrait generation
US11600013B2 (en) Facial features tracker with advanced training for natural rendering of human faces in real-time
Memo et al. Head-mounted gesture controlled interface for human-computer interaction
US11727596B1 (en) Controllable video characters with natural motions extracted from real-world videos
TWI654539B (en) Virtual reality interaction method, device and system
Dash et al. Designing of marker-based augmented reality learning environment for kids using convolutional neural network architecture
US20220301295A1 (en) Recurrent multi-task convolutional neural network architecture
CN111124117B (en) Augmented reality interaction method and device based on sketch of hand drawing
CN113255457A (en) Animation character facial expression generation method and system based on facial expression recognition
CN113034652A (en) Virtual image driving method, device, equipment and storage medium
CN112954292B (en) Digital museum navigation system and method based on augmented reality
WO2024001095A1 (en) Facial expression recognition method, terminal device and storage medium
CN115008454A (en) Robot online hand-eye calibration method based on multi-frame pseudo label data enhancement
Albanis et al. Dronepose: photorealistic uav-assistant dataset synthesis for 3d pose estimation via a smooth silhouette loss
Bhakar et al. A review on classifications of tracking systems in augmented reality
CN113296604B (en) True 3D gesture interaction method based on convolutional neural network
CN114049678B (en) Facial motion capturing method and system based on deep learning
CN113076918B (en) Video-based facial expression cloning method
KR20230078502A (en) Apparatus and method for image processing
CN115761143A (en) 3D virtual reloading model generation method and device based on 2D image
TW202311815A (en) Display of digital media content on physical surface
CN115222917A (en) Training method, device and equipment for three-dimensional reconstruction model and storage medium
CN114779942A (en) Virtual reality immersive interaction system, equipment and method
Okamoto et al. Assembly assisted by augmented reality (A 3 R)
EP4275176A1 (en) Three-dimensional scan registration with deformable models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant