CN116403280A - Monocular camera augmented reality gesture interaction method based on key point detection - Google Patents

Monocular camera augmented reality gesture interaction method based on key point detection

Info

Publication number
CN116403280A
CN116403280A
Authority
CN
China
Prior art keywords
hand
virtual
key
key point
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310309434.9A
Other languages
Chinese (zh)
Inventor
张玉梅
肖跃灵
吴晓军
李鼎钺
戎宇莹
赵焱青
刘诗轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Normal University
Original Assignee
Shaanxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi Normal University filed Critical Shaanxi Normal University
Priority to CN202310309434.9A priority Critical patent/CN116403280A/en
Publication of CN116403280A publication Critical patent/CN116403280A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/245 Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

A monocular camera augmented reality gesture interaction method based on key point detection comprises the steps of acquiring an input image, detecting hand key point coordinates, determining character string data, transmitting the character string data S, constructing a virtual hand model, storing the character string data S, driving virtual hand motion, triggering augmented reality gesture interaction, and evaluating performance. The invention applies a hand key point detection network to monocular-camera gesture tracking and recognition in augmented reality, so that the hand features of the current frame are effectively screened; at the same time the tracked hand key point information is standardized, making the data more regular and easier to use for driving the motion of the virtual hand, with high real-time performance. The method has the advantages of high real-time performance, strong immersion and low equipment cost, and can perform gesture tracking and recognition against different backgrounds.

Description

Monocular camera augmented reality gesture interaction method based on key point detection
Technical Field
The invention belongs to the technical field of augmented reality interaction, and particularly relates to a method for gesture tracking, recognition and data transmission.
Background
In recent years, with the gradual popularization of virtual reality equipment, virtual reality interaction has become a very active topic and research hotspot. As the economy and society develop at high speed, almost everyone lives under some degree of pressure and anxiety; in the virtual reality world, people can temporarily step away from their current emotional world and enter a brand-new virtual world. For the virtual reality experience, immersion is very important, and immersion largely comes from the naturalness and realism of interaction: an interaction experience closer to reality brings stronger immersion. Research on interaction that is closer to natural interaction therefore has profound significance.
Most interaction involves the hand. Human hand movement mainly controls the motion of the fingers through the muscles, with the nerves driving the muscles and tendons to move the bones. The bones of the human hand comprise the carpal bones, metacarpal bones and phalanges; the metacarpophalangeal joints and the finger joints mainly provide the functions of flexion, extension, adduction, abduction and rotation, so the posture of the hand depends largely on the positions of the hand joints. If the positions of the real hand joints can be obtained through a monocular camera and transmitted to a virtual hand in the virtual world, the various poses of the virtual hand can be controlled, and hand movement in the virtual world can be driven directly by the movement of the real hand.
There are various interaction modes for virtual reality. The most traditional interaction is performed through a game controller, where interaction is completed by pressing the various buttons on the controller; this differs greatly from direct hand interaction in the real world and offers low immersion. Interaction based on data gloves gives a strong sense of immersion, but the experience cost is high, which makes wide popularization difficult.
Interaction based on monocular camera machine vision is a new research direction in the field of gesture interaction; introducing machine learning brings interaction closer to its original goal, namely artificial intelligence. Deep learning methods include artificial neural networks, convolutional neural networks and recurrent neural networks, and deep learning can automatically learn features from big data. At present, deep learning can effectively perform gesture tracking in the field of gesture interaction.
In the field of extended reality interaction technology, an urgent technical problem is to provide a gesture interaction method with higher accuracy, stronger immersion and lower equipment cost.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the defects of the prior art and provide a monocular camera augmented reality gesture interaction method based on key point detection that has higher accuracy, stronger immersion and lower equipment cost.
The technical scheme adopted for solving the technical problems comprises the following steps.
(1) Acquiring an input image
Taking a real-time image shot by a monocular camera as an input image, wherein the width w of the input image is at least 400 pixels, and the height h is at least 200 pixels.
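As an illustration only (not part of the patent text), the following Python sketch grabs a real-time frame with opencv-python — one of the libraries listed in the embodiment — and checks the minimum resolution stated above; the camera index and the helper name get_input_image are assumptions.

```python
import cv2

MIN_W, MIN_H = 400, 200  # minimum input size required by the method

def get_input_image(cap: cv2.VideoCapture):
    """Grab one real-time frame from the monocular camera and validate its size."""
    ok, frame = cap.read()
    if not ok:
        return None
    h, w = frame.shape[:2]
    if w < MIN_W or h < MIN_H:
        raise ValueError(f"input image {w}x{h} is smaller than {MIN_W}x{MIN_H}")
    return frame

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)   # hypothetical camera index
    frame = get_input_image(cap)
    cap.release()
```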
(2) Detecting hand keypoint coordinates
The hand key point detection network is used for obtaining hand key point coordinates of an input image according to the following method:
1) Set a hand confidence threshold θ, θ ∈ (0, 1); the hand detection model is started when the hand confidence is lower than θ.
2) Carry out hand detection on the input image; from left to right the hands are numbered 0 and 1, n is 0 or 1, the n-th hand is denoted H_n, and H_n carries a left-hand label l or a right-hand label r.
3) And positioning the detected hand, and cutting out the hand area.
4) Input each hand region into the hand key point detection network to detect the hand key points, and output the coordinates of the 21 key points of hand H_n, as follows:
Key point 0 is the wrist; key points 1 to 4 are the four joint points from the root of the thumb to the fingertip; key points 5 to 8 are the four joint points from the root of the index finger to the fingertip; key points 9 to 12 are the four joint points from the root of the middle finger to the fingertip; key points 13 to 16 are the four joint points from the root of the ring finger to the fingertip; and key points 17 to 20 are the four joint points from the root of the little finger to the fingertip.
Hand key point j of H_n and its x-, y- and z-axis coordinates are denoted by symbols that appear as formula images in the original publication. The x-axis and y-axis coordinates are coordinates relative to key point 0 on the input image; the z-axis coordinate of key point 0 is a minimal value and is taken as the z-axis origin of the n-th hand H_n. If the z-axis value is negative, the wrist root is farther from the camera; otherwise the wrist is nearer to the camera.
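The patent does not publish the hand key point detection network itself. Since the embodiment lists mediapipe 0.9.1 among its third-party libraries, the sketch below shows one plausible way to obtain 21 key points per hand with MediaPipe Hands; mapping θ onto MediaPipe's detection and tracking confidence parameters, and the variable names, are assumptions.

```python
import cv2
import mediapipe as mp

THETA = 0.68  # hand confidence threshold θ from the preferred embodiment

hands = mp.solutions.hands.Hands(
    static_image_mode=False,        # video mode: detector reruns only when tracking is lost
    max_num_hands=2,                # hands numbered 0 and 1 from left to right
    min_detection_confidence=THETA,
    min_tracking_confidence=THETA,
)

def detect_keypoints(frame_bgr):
    """Return a list of (label, [(x, y, z)] * 21) for each detected hand H_n."""
    results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    detections = []
    if results.multi_hand_landmarks:
        for landmarks, handedness in zip(results.multi_hand_landmarks,
                                         results.multi_handedness):
            label = handedness.classification[0].label   # 'Left' or 'Right'
            pts = [(lm.x, lm.y, lm.z) for lm in landmarks.landmark]  # 21 key points
            detections.append((label, pts))
    return detections
```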
(3) Determining character string data
The key point coordinates of hand H_n are converted according to the conversion formulas given as images in the original publication, yielding converted x-, y- and z-axis coordinates for each key point. The distance between key point 5 and key point 17 of the n-th hand H_n is calculated as the palm width L_n. The character string data S is then assembled from the converted coordinates of the key points k and the palm width, where k ∈ {0, 1, ..., 20} (both formulas are likewise given as images in the original).
(4) Transmitting character string data S
The character string data S is transmitted to the Unity engine through the User Datagram Protocol.
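The exact layout of S is defined by a formula image in the original, so the comma-separated field order below (handedness label, 21 × 3 converted coordinates, palm width), the endpoint address and the function name are assumptions; the UDP transport itself matches the description and uses Python's standard socket module.

```python
import socket

UDP_ADDR = ("127.0.0.1", 5052)  # hypothetical Unity-side endpoint
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_hand_string(label, keypoints, palm_width):
    """Serialize one hand as comma-separated text and send it via UDP.

    keypoints: 21 (x, y, z) tuples. The receiver is assumed to route on the
    leading label and store the remaining 64 values (63 coordinates + palm width).
    """
    fields = [label]
    for x, y, z in keypoints:
        fields += [f"{x:.6f}", f"{y:.6f}", f"{z:.6f}"]
    fields.append(f"{palm_width:.6f}")
    sock.sendto(",".join(fields).encode("utf-8"), UDP_ADDR)
```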
(5) Constructing virtual hand models
The bone positions are drawn using the Unity engine, and bone rotation angles and relative displacements are added at the key points.
(6) Storing character string data S
1) The hand coordinate data is transmitted to the Unity engine.
2) Normalized character string data S' is obtained according to the normalization formula given as an image in the original publication.
3) Storing normalized string data S'
The normalized string data S' is stored into the left-hand string h_l and the right-hand string h_r according to the formulas given as images in the original, where N is a null value.
(7) Virtual hand movement
1) The left-hand string h_l and the right-hand string h_r are split at commas into a left-hand string array F_l' and a right-hand string array F_r', each containing 64 substrings. Using the float.Parse function, F_l' and F_r' are converted into a left-hand floating point array F_l and a right-hand floating point array F_r, respectively. When the left-hand string h_l is N, nothing is assigned to the left-hand floating point array F_l; when the right-hand string h_r is N, nothing is assigned to the right-hand floating point array F_r.
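In the patent this parsing runs inside the Unity engine using float.Parse; for continuity with the other sketches the equivalent logic is shown here in Python. The 64-substring layout (21 × 3 coordinates plus one palm-width value) and the null marker N follow the description above, while the helper name is hypothetical.

```python
def parse_hand_string(hand_string):
    """Split a per-hand string on commas and convert it to a float array F.

    Returns None when the string is the null value 'N' (no hand tracked).
    """
    if hand_string == "N":
        return None
    parts = hand_string.split(",")      # 64 substrings: 21*3 coordinates + palm width
    if len(parts) != 64:
        raise ValueError(f"expected 64 substrings, got {len(parts)}")
    return [float(p) for p in parts]    # counterpart of Unity's float.Parse
```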
2) The virtual hand as a whole moves with the camera as a sub-object of the camera.
3) Determining relative coordinates of key points
The relative coordinates of left-hand key point i with respect to left-hand key point 0 and the left-hand palm width d_lH are determined according to the formulas given as images in the original. In these formulas, element 3i of the left-hand floating point array F_l receives the x-axis coordinate of left-hand key point i, element 0 receives the x-axis coordinate of left-hand key point 0, element 3i+1 receives the y-axis coordinate of left-hand key point i, element 1 receives the y-axis coordinate of left-hand key point 0, element 3i+2 receives the z-axis coordinate of left-hand key point i, and element 2 receives the z-axis coordinate of left-hand key point 0.
The relative coordinates of right-hand key point i with respect to right-hand key point 0 and the right-hand palm width d_rH are determined in the same way from the right-hand floating point array F_r: element 3i receives the x-axis coordinate of right-hand key point i, element 0 the x-axis coordinate of right-hand key point 0, element 3i+1 the y-axis coordinate of right-hand key point i, element 1 the y-axis coordinate of right-hand key point 0, element 3i+2 the z-axis coordinate of right-hand key point i, and element 2 the z-axis coordinate of right-hand key point 0.
4) Determining virtual hand scaling ratio
The distance d_lR between left-hand key point 0 and left-hand key point 1 and the distance d_rR between right-hand key point 0 and right-hand key point 1 are determined according to the formulas given as images in the original. From these, the virtual left-hand scaling ratio M_l and the virtual right-hand scaling ratio M_r are determined, where d_lM is the distance between key point 0 and key point 1 of the left hand of the virtual hand model and d_rM is the distance between key point 0 and key point 1 of the right hand of the virtual hand model.
5) Determining relative hand movement position
The coordinates C_lx, C_ly, C_lz of the virtual left-hand position relative to the camera are determined according to the formulas given as images in the original, where D_lx, D_ly, D_lz are the coordinates of the initial position D_l of the virtual left hand relative to the camera. The coordinates C_rx, C_ry, C_rz of the virtual right-hand position relative to the camera are determined in the same way, where D_rx, D_ry, D_rz are the coordinates of the initial position D_r of the virtual right hand relative to the camera.
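The formulas for steps 3) to 5) are published only as images, so the sketch below is a guess at one plausible implementation under explicit assumptions: relative coordinates as simple differences from key point 0, the scaling ratio as the real key point 0-1 distance divided by the model distance d_M, and the hand position as the initial position plus the key point 0 offset. It is meant to make the data flow concrete, not to reproduce the patented formulas.

```python
import math

def virtual_hand_update(F, d_M, D):
    """Sketch of steps 3)-5) for one hand, under the stated assumptions.

    F   : 64-element float array (21*3 coordinates followed by the palm width)
    d_M : distance between key points 0 and 1 of the virtual hand model
    D   : (x, y, z) initial position of the virtual hand relative to the camera
    """
    # 3) coordinates of key point i relative to key point 0 (assumed: differences)
    rel = [(F[3*i] - F[0], F[3*i+1] - F[1], F[3*i+2] - F[2]) for i in range(21)]
    d_H = F[63]                        # palm width (assumed to be the last value)

    # 4) scaling ratio (assumed: real 0-1 distance over the model's 0-1 distance)
    d_R = math.dist(rel[0], rel[1])
    M = d_R / d_M

    # 5) position relative to the camera (assumed: initial position plus wrist offset)
    C = (D[0] + F[0], D[1] + F[1], D[2] + F[2])
    return rel, d_H, M, C
```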
(8) Gesture interactions that trigger augmented reality
1) When the distance between the current position of the virtual hand of the augmented reality and the virtual object is less than or equal to 0.3 and the distances between the virtual hand key points 8, 12, 16 and 20 and the virtual hand key point 0 are all less than or equal to 0.05, triggering gesture interaction for picking up the virtual object.
2) When the distances between the virtual hand key points 8, 12, 16 and 20 of the augmented reality and the virtual hand key point 0 are all more than 0.05, triggering gesture interaction for putting down the virtual object.
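A small sketch of the two trigger conditions, reusing the thresholds 0.3 and 0.05 from the text; treating them as Euclidean distances in Unity world units, and the function names, are assumptions.

```python
import math

PICK_DIST = 0.3    # max distance between the virtual hand and the virtual object
CURL_DIST = 0.05   # max fingertip-to-wrist distance for a closed (grasping) hand
FINGERTIPS = (8, 12, 16, 20)

def should_pick_up(hand_pos, object_pos, keypoints):
    """keypoints[i] is the virtual-hand position of key point i."""
    near_object = math.dist(hand_pos, object_pos) <= PICK_DIST
    hand_closed = all(math.dist(keypoints[t], keypoints[0]) <= CURL_DIST
                      for t in FINGERTIPS)
    return near_object and hand_closed

def should_put_down(keypoints):
    return all(math.dist(keypoints[t], keypoints[0]) > CURL_DIST for t in FINGERTIPS)
```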
(9) Performance evaluation
The number of image frames processed per second (FPS) by the monocular camera augmented reality gesture interaction method based on key point detection is evaluated as:
FPS = 1 / (t_e - t_s)
where t_e is the time at which processing of a frame finishes and t_s is the time at which processing of that frame starts. When the FPS of processed images exceeds 30 frames/second, the method has high real-time performance.
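A minimal sketch of the per-frame evaluation FPS = 1/(t_e - t_s); the use of time.perf_counter and the loop structure are assumptions.

```python
import time

def measure_fps(process_frame, frames):
    """Report FPS = 1 / (t_e - t_s) for each processed frame."""
    fps_values = []
    for frame in frames:
        t_s = time.perf_counter()   # time processing of the frame starts
        process_frame(frame)
        t_e = time.perf_counter()   # time processing of the frame finishes
        fps_values.append(1.0 / (t_e - t_s))
    return fps_values
```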
In the step (2) of detecting hand keypoint coordinates of the present invention, the confidence threshold of the hand detection is preferably 0.68.
In step (5) of constructing the virtual hand model, the bone rotation angles added at the key points are as follows: the rotation angle of wrist key point 0 is 0-180 degrees, the rotation angles of the remaining key points are 0-90 degrees, and the radius of a finger on the camera is within 12d_lM, where d_lM is the distance between key point 0 and key point 1 of the left hand of the virtual hand model.
The invention adopts a hand confidence threshold in the hand key point detection network step, and the hand detection model is restarted only when the hand confidence falls below the threshold. Hand detection followed by key point detection is needed only for the first frame of the image: because the video is continuous, the hand region can be predicted from the hand key point coordinates of the previous frame and sent to the key point detection model for the next frame. The hand detection model therefore does not have to be run repeatedly; each frame only needs to infer the hand region from the key points of the previous frame and send it to the key point detection model of the next frame.
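The sketch below illustrates the decision just described — run the full hand detection model only when the confidence drops below θ, and otherwise crop the hand region predicted from the previous frame's key points; the padding factor and the placeholder callables are assumptions.

```python
def track_or_detect(frame, prev_keypoints, prev_confidence, theta,
                    detect_hands, detect_keypoints_in_roi, pad=0.25):
    """Reuse the previous frame's key points to predict the hand region.

    detect_hands / detect_keypoints_in_roi are placeholders for the hand
    detection model and the key point detection network.
    """
    if prev_keypoints is None or prev_confidence < theta:
        rois = detect_hands(frame)                 # (re)start the hand detection model
    else:
        xs = [p[0] for p in prev_keypoints]
        ys = [p[1] for p in prev_keypoints]
        w, h = max(xs) - min(xs), max(ys) - min(ys)
        rois = [(min(xs) - pad * w, min(ys) - pad * h,
                 max(xs) + pad * w, max(ys) + pad * h)]   # predicted hand region
    return [detect_keypoints_in_roi(frame, roi) for roi in rois]
```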
Compared with the prior art, the invention has the following advantages:
According to the invention, a hand key point detection network is adopted and applied to monocular-camera gesture tracking and recognition in augmented reality, so the hand features of the current frame are effectively screened; at the same time the tracked hand key point information is standardized, making the data more regular and easier to use for driving the motion of the virtual hand, with high real-time performance. The method has the advantages of high real-time performance, strong immersion and low equipment cost, and can perform gesture tracking and recognition against different backgrounds.
Drawings
Fig. 1 is a flow chart of embodiment 1 of the present invention.
Fig. 2 is a schematic diagram of the hand key point numbering.
Fig. 3 is a frame rate plot of the hand keypoint tracking detection of the method of example 1.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, but the present invention is not limited to the following embodiments.
Example 1
The monocular camera augmented reality gesture interaction method based on key point detection of the embodiment comprises the following steps (see fig. 1):
(1) Acquiring an input image
Taking a real-time image shot by a monocular camera as an input image, wherein the width w of the input image is at least 400 pixels, and the height h is at least 200 pixels.
(2) Detecting hand keypoint coordinates
The hand key point detection network is used for obtaining hand key point coordinates of an input image according to the following method:
1) Set a hand confidence threshold θ, θ ∈ (0, 1); in this embodiment θ is 0.68, and the hand detection model is started when the hand confidence is lower than θ.
2) Carry out hand detection on the input image; from left to right the hands are numbered 0 and 1, n is 0 or 1, the n-th hand is denoted H_n, and H_n carries a left-hand label l or a right-hand label r.
3) And positioning the detected hand, and cutting out the hand area.
4) Input each hand region into the hand key point detection network to detect the hand key points, and output the coordinates of the 21 key points of hand H_n, as follows:
In FIG. 2, key point 0 is the wrist; key points 1 to 4 are the four joint points from the root of the thumb to the fingertip; key points 5 to 8 are the four joint points from the root of the index finger to the fingertip; key points 9 to 12 are the four joint points from the root of the middle finger to the fingertip; key points 13 to 16 are the four joint points from the root of the ring finger to the fingertip; and key points 17 to 20 are the four joint points from the root of the little finger to the fingertip.
Hand key point j of H_n and its x-, y- and z-axis coordinates are denoted by symbols that appear as formula images in the original publication. The x-axis and y-axis coordinates are coordinates relative to key point 0 on the input image; the z-axis coordinate of key point 0 is a minimal value and is taken as the z-axis origin of the n-th hand H_n. If the z-axis value is negative, the wrist root is farther from the camera; otherwise the wrist is nearer to the camera.
(3) Determining character string data
The key point coordinates of hand H_n are converted according to the conversion formulas given as images in the original publication, yielding converted x-, y- and z-axis coordinates for each key point. The distance between key point 5 and key point 17 of the n-th hand H_n is calculated as the palm width L_n. The character string data S is then assembled from the converted coordinates of the key points k and the palm width, where k ∈ {0, 1, ..., 20} (both formulas are likewise given as images in the original).
The invention adopts the step of determining the character string data so that the data are better suited to transmission and use over the User Datagram Protocol, which improves the data transmission speed and processing efficiency.
(4) Transmitting character string data S
The character string data S is transmitted to the Unity engine through the User Datagram Protocol.
(5) Constructing virtual hand models
The bone positions are drawn using the Unity engine, and bone rotation angles and relative displacements are added at the key points. The rotation angle of wrist key point 0 ranges from 0 to 180 degrees and the rotation angles of the other key points range from 0 to 90 degrees; in this embodiment the rotation angle of wrist key point 0 is 90 degrees, the rotation angles of the other key points are 45 degrees, and the radius of a finger on the camera is within 12d_lM, where d_lM is the distance between key point 0 and key point 1 of the left hand of the virtual hand model.
(6) Storing character string data S
1) The hand coordinate data is transmitted to the Unity engine.
2) Normalized character string data S' is obtained according to the normalization formula given as an image in the original publication.
3) Storing normalized string data S'
The normalized string data S' is stored into the left-hand string h_l and the right-hand string h_r according to the formulas given as images in the original, where N is a null value.
The invention adopts the step of storing the character string data, screens out the meaningful information obtained from the user datagram protocol transmission, and lays a foundation for driving the motion of the virtual hand.
(7) Virtual hand movement
1) The left-hand string h_l and the right-hand string h_r are split at commas into a left-hand string array F_l' and a right-hand string array F_r', each containing 64 substrings. Using the float.Parse function, F_l' and F_r' are converted into a left-hand floating point array F_l and a right-hand floating point array F_r, respectively. When the left-hand string h_l is N, nothing is assigned to the left-hand floating point array F_l; when the right-hand string h_r is N, nothing is assigned to the right-hand floating point array F_r.
2) The virtual hand as a whole moves with the camera as a sub-object of the camera.
3) Determining relative coordinates of key points
The relative coordinates of left-hand key point i with respect to left-hand key point 0 and the left-hand palm width d_lH are determined according to the formulas given as images in the original. In these formulas, element 3i of the left-hand floating point array F_l receives the x-axis coordinate of left-hand key point i, element 0 receives the x-axis coordinate of left-hand key point 0, element 3i+1 receives the y-axis coordinate of left-hand key point i, element 1 receives the y-axis coordinate of left-hand key point 0, element 3i+2 receives the z-axis coordinate of left-hand key point i, and element 2 receives the z-axis coordinate of left-hand key point 0.
The relative coordinates of right-hand key point i with respect to right-hand key point 0 and the right-hand palm width d_rH are determined in the same way from the right-hand floating point array F_r: element 3i receives the x-axis coordinate of right-hand key point i, element 0 the x-axis coordinate of right-hand key point 0, element 3i+1 the y-axis coordinate of right-hand key point i, element 1 the y-axis coordinate of right-hand key point 0, element 3i+2 the z-axis coordinate of right-hand key point i, and element 2 the z-axis coordinate of right-hand key point 0.
4) Determining virtual hand scaling ratio
The distance d_lR between left-hand key point 0 and left-hand key point 1 and the distance d_rR between right-hand key point 0 and right-hand key point 1 are determined according to the formulas given as images in the original. From these, the virtual left-hand scaling ratio M_l and the virtual right-hand scaling ratio M_r are determined, where d_lM is the distance between key point 0 and key point 1 of the left hand of the virtual hand model and d_rM is the distance between key point 0 and key point 1 of the right hand of the virtual hand model.
5) Determining relative hand movement position
The coordinates C_lx, C_ly, C_lz of the virtual left-hand position relative to the camera are determined according to the formulas given as images in the original, where D_lx, D_ly, D_lz are the coordinates of the initial position D_l of the virtual left hand relative to the camera. The coordinates C_rx, C_ry, C_rz of the virtual right-hand position relative to the camera are determined in the same way, where D_rx, D_ry, D_rz are the coordinates of the initial position D_r of the virtual right hand relative to the camera.
Because the invention drives the virtual hand to move with data, the motion of the virtual hand is finer and more natural and better matches real hand motion, which improves the experimenter's interactive immersion in the augmented reality environment.
(8) Gesture interactions that trigger augmented reality
1) When the distance between the current position of the virtual hand of the augmented reality and the virtual object is less than or equal to 0.3 and the distances between the virtual hand key points 8, 12, 16 and 20 and the virtual hand key point 0 are all less than or equal to 0.05, triggering gesture interaction for picking up the virtual object.
2) When the distances between the virtual hand key points 8, 12, 16 and 20 of the augmented reality and the virtual hand key point 0 are all more than 0.05, triggering gesture interaction for putting down the virtual object.
Because the invention captures hand data for interaction with a key point detection network, compared with traditional controller and data glove interaction, which is expensive and requires special hardware, the invention has lower equipment cost and stronger universality.
(9) Performance evaluation
The number of image frames processed per second (FPS) by the monocular camera augmented reality gesture interaction method based on key point detection is evaluated as:
FPS = 1 / (t_e - t_s)
where t_e is the time at which processing of a frame finishes and t_s is the time at which processing of that frame starts. When the FPS of processed images exceeds 30 frames/second, the method has high real-time performance.
And (3) completing the monocular camera augmented reality gesture interaction method based on key point detection.
Example 2
The monocular camera augmented reality gesture interaction method based on key point detection in the embodiment comprises the following steps:
(1) Acquiring an input image
This step is the same as in example 1.
(2) Detecting hand keypoint coordinates
The hand key point detection network is used for obtaining hand key point coordinates of an input image according to the following method:
1) Set a hand confidence threshold θ, θ ∈ (0, 1); in this embodiment θ is 0.01, and the hand detection model is started when the hand confidence is lower than θ.
The other steps of this step are the same as those of example 1.
(3) Determining character string data
This step is the same as in example 1.
(4) Transmitting character string data S
This step is the same as in example 1.
(5) Constructing virtual hand models
The bone positions are drawn using the Unity engine, and bone rotation angles and relative displacements are added at the key points. The rotation angle of wrist key point 0 ranges from 0 to 180 degrees and the rotation angles of the other key points range from 0 to 90 degrees; in this embodiment the rotation angle of wrist key point 0 is 0 degrees, the rotation angles of the other key points are 0 degrees, and the radius of a finger on the camera is within 12d_lM, where d_lM is the distance between key point 0 and key point 1 of the left hand of the virtual hand model.
The other steps were the same as in example 1. And (3) completing the monocular camera augmented reality gesture interaction method based on key point detection.
Example 3
The monocular camera augmented reality gesture interaction method based on key point detection in the embodiment comprises the following steps:
(1) Acquiring an input image
This step is the same as in example 1.
(2) Detecting hand keypoint coordinates
The hand key point detection network is used for obtaining hand key point coordinates of an input image according to the following method:
1) Set a hand confidence threshold θ, θ ∈ (0, 1); in this embodiment θ is 0.98, and the hand detection model is started when the hand confidence is lower than θ.
The other steps of this step are the same as those of example 1.
(3) Determining character string data
This step is the same as in example 1.
(4) Transmitting character string data S
This step is the same as in example 1.
(5) Constructing virtual hand models
The bone positions are drawn using the Unity engine, and bone rotation angles and relative displacements are added at the key points. The rotation angle of wrist key point 0 ranges from 0 to 180 degrees and the rotation angles of the other key points range from 0 to 90 degrees; in this embodiment the rotation angle of wrist key point 0 is 180 degrees, the rotation angles of the other key points are 90 degrees, and the radius of a finger on the camera is within 12d_lM, where d_lM is the distance between key point 0 and key point 1 of the left hand of the virtual hand model.
The other steps were the same as in example 1. And (3) completing the monocular camera augmented reality gesture interaction method based on key point detection.
In order to verify the beneficial effects of the invention, the inventor adopts the method of the embodiment 1 of the invention to carry out simulation experiments, and the experimental conditions are as follows:
1. simulation conditions
Software environment: pyCharm 2019.3.1x64.
The hardware conditions are as follows: 1 personal computer, 1 Nvidia3060Ti video card, 1 1080P camera, 1 personal mobile phone.
Computer configuration:
1) A processor: intel (R) Core (TM) i7-10700 CPU@2.90GHz 2.90GHz.
2) Memory: 32.0GB.
The software platform is as follows: python3.8.
Other third-party libraries: opencv-python 4.6.0, mediapipe 0.9.1, socket.
2. Simulation content and results
Experiments were performed under the above simulation conditions, and the experimental results are shown in fig. 3.
In fig. 3, the abscissa represents the running time of the invention and the ordinate represents the number of frames the invention can process per second, i.e. the FPS. As can be seen from fig. 3, the number of video image frames processed per second fluctuates around 30, indicating that the video image processing speed is high and real-time performance is achieved.

Claims (3)

1. The monocular camera augmented reality gesture interaction method based on key point detection is characterized by comprising the following steps of:
(1) Acquiring an input image
Taking a real-time image shot by a monocular camera as an input image, wherein the width w of the input image is at least 400 pixels, and the height h is at least 200 pixels;
(2) Detecting hand keypoint coordinates
The hand key point detection network is used for obtaining hand key point coordinates of an input image according to the following method:
1) Setting a hand confidence threshold θ, θ ∈ (0, 1), and starting the hand detection model when the hand confidence is lower than θ;
2) Carrying out hand detection on the input image; from left to right the hands are numbered 0 and 1, n is 0 or 1, the n-th hand is denoted H_n, and H_n includes a left-hand label l or a right-hand label r;
3) Positioning the detected hand, and cutting out a hand area;
4) Inputting each hand region into the hand key point detection network to detect the hand key points, and outputting the coordinates of the 21 key points of hand H_n, as follows:
key point 0 is the wrist; key points 1 to 4 are the four joint points from the root of the thumb to the fingertip; key points 5 to 8 are the four joint points from the root of the index finger to the fingertip; key points 9 to 12 are the four joint points from the root of the middle finger to the fingertip; key points 13 to 16 are the four joint points from the root of the ring finger to the fingertip; and key points 17 to 20 are the four joint points from the root of the little finger to the fingertip;
hand key point j of H_n and its x-, y- and z-axis coordinates are denoted by symbols that appear as formula images in the original publication; the x-axis and y-axis coordinates are coordinates relative to key point 0 on the input image, and the z-axis coordinate of key point 0 is a minimal value taken as the z-axis origin of the n-th hand H_n; if the z-axis value is negative, the wrist root is farther from the camera, otherwise the wrist is nearer to the camera;
(3) Determining character string data
The key point coordinates of hand H_n are converted according to the conversion formulas given as images in the original publication, yielding converted x-, y- and z-axis coordinates for each key point; the distance between key point 5 and key point 17 of the n-th hand H_n is calculated as the palm width L_n; the character string data S is then assembled from the converted coordinates of the key points k and the palm width, wherein k ∈ {0, 1, ..., 20} (both formulas are likewise given as images in the original);
(4) Transmitting character string data S
Transmitting the character string data S to the Unity engine through a user datagram protocol;
(5) Constructing virtual hand models
Drawing skeleton positions by using a Unity engine, and adding skeleton rotation angles and relative displacement on key points;
(6) Storing character string data S
1) The hand coordinate data is transmitted to a Unity engine;
2) Normalized character string data S' is obtained according to the normalization formula given as an image in the original publication;
3) Storing normalized string data S'
the normalized string data S' is stored into the left-hand string h_l and the right-hand string h_r according to the formulas given as images in the original, wherein N is a null value;
(7) Virtual hand movement
1) The left-hand string h_l and the right-hand string h_r are split at commas into a left-hand string array F_l' and a right-hand string array F_r', each containing 64 substrings; using the float.Parse function, F_l' and F_r' are converted into a left-hand floating point array F_l and a right-hand floating point array F_r, respectively; when the left-hand string h_l is N, nothing is assigned to the left-hand floating point array F_l, and when the right-hand string h_r is N, nothing is assigned to the right-hand floating point array F_r;
2) The virtual hand as a whole is used as a sub-object of the camera to move along with the camera;
3) Determining relative coordinates of key points
the relative coordinates of left-hand key point i with respect to left-hand key point 0 and the left-hand palm width d_lH are determined according to the formulas given as images in the original; in these formulas, element 3i of the left-hand floating point array F_l receives the x-axis coordinate of left-hand key point i, element 0 receives the x-axis coordinate of left-hand key point 0, element 3i+1 receives the y-axis coordinate of left-hand key point i, element 1 receives the y-axis coordinate of left-hand key point 0, element 3i+2 receives the z-axis coordinate of left-hand key point i, and element 2 receives the z-axis coordinate of left-hand key point 0;
the relative coordinates of right-hand key point i with respect to right-hand key point 0 and the right-hand palm width d_rH are determined in the same way from the right-hand floating point array F_r: element 3i receives the x-axis coordinate of right-hand key point i, element 0 the x-axis coordinate of right-hand key point 0, element 3i+1 the y-axis coordinate of right-hand key point i, element 1 the y-axis coordinate of right-hand key point 0, element 3i+2 the z-axis coordinate of right-hand key point i, and element 2 the z-axis coordinate of right-hand key point 0;
4) Determining virtual hand scaling ratio
The distance d_lR between left-hand key point 0 and left-hand key point 1 and the distance d_rR between right-hand key point 0 and right-hand key point 1 are determined according to the formulas given as images in the original; the virtual left-hand scaling ratio M_l and the virtual right-hand scaling ratio M_r are then determined from these distances, wherein d_lM is the distance between key point 0 and key point 1 of the left hand of the virtual hand model and d_rM is the distance between key point 0 and key point 1 of the right hand of the virtual hand model;
5) Determining relative hand movement position
The coordinates C_lx, C_ly, C_lz of the virtual left-hand position relative to the camera are determined according to the formulas given as images in the original, wherein D_lx, D_ly, D_lz are the coordinates of the initial position D_l of the virtual left hand relative to the camera; the coordinates C_rx, C_ry, C_rz of the virtual right-hand position relative to the camera are determined in the same way, wherein D_rx, D_ry, D_rz are the coordinates of the initial position D_r of the virtual right hand relative to the camera;
(8) Gesture interactions that trigger augmented reality
1) Triggering gesture interaction for picking up the virtual object when the distance between the current position of the virtual hand of the augmented reality and the virtual object is less than or equal to 0.3 and the distances between the virtual hand key points 8, 12, 16 and 20 and the virtual hand key point 0 are all less than or equal to 0.05;
2) Triggering gesture interaction for putting down a virtual object when the distances between the virtual hand key points 8, 12, 16 and 20 of the augmented reality and the virtual hand key point 0 are all more than 0.05;
(9) Performance evaluation
The number of image frames processed per second (FPS) by the monocular camera augmented reality gesture interaction method based on key point detection is evaluated as FPS = 1 / (t_e - t_s), wherein t_e is the time at which processing of a frame finishes and t_s is the time at which processing of that frame starts; when the FPS of processed images exceeds 30 frames/second, the method has high real-time performance.
2. The monocular camera augmented reality gesture interaction method based on keypoint detection of claim 1, wherein the method is characterized by: in the step (2) of detecting the hand key point coordinates, the confidence threshold of the hand detection is 0.68.
3. The monocular camera augmented reality gesture interaction method based on key point detection of claim 1, characterized in that: in step (5) of constructing the virtual hand model, the bone rotation angles added at the key points are as follows: the rotation angle of wrist key point 0 is 0-180 degrees, the rotation angles of the remaining key points are 0-90 degrees, and the radius of a finger on the camera is within 12d_lM, where d_lM is the distance between key point 0 and key point 1 of the left hand of the virtual hand model.
CN202310309434.9A 2023-03-28 2023-03-28 Monocular camera augmented reality gesture interaction method based on key point detection Pending CN116403280A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310309434.9A CN116403280A (en) 2023-03-28 2023-03-28 Monocular camera augmented reality gesture interaction method based on key point detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310309434.9A CN116403280A (en) 2023-03-28 2023-03-28 Monocular camera augmented reality gesture interaction method based on key point detection

Publications (1)

Publication Number Publication Date
CN116403280A true CN116403280A (en) 2023-07-07

Family

ID=87015360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310309434.9A Pending CN116403280A (en) 2023-03-28 2023-03-28 Monocular camera augmented reality gesture interaction method based on key point detection

Country Status (1)

Country Link
CN (1) CN116403280A (en)

Similar Documents

Publication Publication Date Title
WO2021103648A1 (en) Hand key point detection method, gesture recognition method, and related devices
CN107563494B (en) First-view-angle fingertip detection method based on convolutional neural network and heat map
Zhou et al. A novel finger and hand pose estimation technique for real-time hand gesture recognition
CN112926423B (en) Pinch gesture detection and recognition method, device and system
CN109597485B (en) Gesture interaction system based on double-fingered-area features and working method thereof
CN108509026B (en) Remote maintenance support system and method based on enhanced interaction mode
CN111680594A (en) Augmented reality interaction method based on gesture recognition
US20100103092A1 (en) Video-based handwritten character input apparatus and method thereof
CN107832736B (en) Real-time human body action recognition method and real-time human body action recognition device
CN109145802B (en) Kinect-based multi-person gesture man-machine interaction method and device
Linqin et al. Dynamic hand gesture recognition using RGB-D data for natural human-computer interaction
CN111444488A (en) Identity authentication method based on dynamic gesture
CN107292295B (en) Gesture segmentation method and device
JP2003256850A (en) Movement recognizing device and image processor and its program
WO2024078088A1 (en) Interaction processing method and apparatus
Liu et al. Ultrasonic positioning and IMU data fusion for pen-based 3D hand gesture recognition
CN116403280A (en) Monocular camera augmented reality gesture interaction method based on key point detection
Dhamanskar et al. Human computer interaction using hand gestures and voice
CN117011929A (en) Head posture estimation method, device, equipment and storage medium
Jiang et al. A brief analysis of gesture recognition in VR
CN114077307A (en) Simulation system and method with input interface
Wang Real-time hand-tracking as a user input device
Dutta et al. A Hand Gesture-operated System for Rehabilitation using an End-to-End Detection Framework
CN113703564A (en) Man-machine interaction equipment and system based on facial features
CN113705280B (en) Human-computer interaction method and device based on facial features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination