CN116403280A - Monocular camera augmented reality gesture interaction method based on key point detection - Google Patents
- Publication number: CN116403280A (application CN202310309434.9A)
- Authority: CN (China)
- Legal status: Pending (listed as an assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G06V10/245—Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Abstract
A monocular camera augmented reality gesture interaction method based on key point detection comprises the steps of acquiring an input image, detecting hand key point coordinates, determining character string data, transmitting the character string data S, constructing a virtual hand model, storing the character string data S, performing virtual hand motion, triggering gesture interaction in augmented reality, and evaluating performance. By determining, transmitting and storing the character string data S, driving virtual hand motion, and triggering augmented reality gesture interaction, the invention applies a hand key point detection network to monocular-camera gesture tracking and recognition in augmented reality. The features of the hand in the current frame are screened effectively, and the tracked hand key point information is standardized, so that the data are more regular, conveniently drive the motion of the virtual hand, and support real-time operation. The method has the advantages of high real-time performance, strong immersion and low equipment cost, and can be used for gesture tracking and recognition against different backgrounds.
Description
Technical Field
The invention belongs to the technical field of augmented reality interaction, and particularly relates to a method for gesture tracking, recognition and data transmission.
Background
In recent years, with the gradual popularization of virtual reality equipment, virtual reality interaction has become a very active topic and research hotspot. As the economy and society develop at high speed, almost everyone lives under some degree of pressure and anxiety; in the virtual world, people can temporarily withdraw from their current emotional world and enter a brand-new one. For the virtual reality experience, immersion is paramount, and immersion derives to a great extent from the naturalness of interaction: an interaction experience closer to reality brings stronger immersion. Research on interaction closer to natural interaction therefore has profound significance.
Most interaction involves the hand. Human hand movement is mainly the control of the fingers by muscles, with nerves driving the muscles and tendons to move the bones. The bones of the human hand comprise the carpals, metacarpals and phalanges; the metacarpophalangeal joints and the interphalangeal joints provide flexion, extension, adduction, abduction and rotation, so the pose of the hand depends largely on the positions of its joints. If the positions of the hand joints in reality can be obtained through a monocular camera and transmitted to a virtual hand in the virtual world, the various poses of the virtual hand can be controlled, and hand motion in the virtual world can be driven directly by real hand motion.
The interaction modes of virtual reality are various. Most traditional interaction is performed through a game handle, where interaction is completed by pressing various buttons on the handle; this differs greatly from direct hand interaction in the real world and yields low immersion. Interaction based on data gloves gives strong immersion, but the experience cost is high, which makes wide popularization difficult.
Interaction based on monocular-camera machine vision is a new research direction in the field of gesture interaction; introducing machine learning brings it closer to the original goal, namely artificial intelligence. Deep learning methods include artificial neural networks, convolutional neural networks and recurrent neural networks, and deep learning can automatically learn features from big data. At present, deep learning can perform gesture tracking effectively in the field of gesture interaction.
In the field of extended reality interaction technology, a technical problem to be solved urgently at present is to provide a gesture interaction method with higher accuracy, stronger immersion and lower equipment cost.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the defects of the prior art and provide a monocular camera augmented reality gesture interaction method based on key point detection that has higher accuracy, stronger immersion and lower equipment cost.
The technical scheme adopted for solving the technical problems comprises the following steps.
(1) Acquiring an input image
Taking a real-time image shot by a monocular camera as an input image, wherein the width w of the input image is at least 400 pixels, and the height h is at least 200 pixels.
(2) Detecting hand keypoint coordinates
The hand key point detection network is used for obtaining hand key point coordinates of an input image according to the following method:
1) Setting a hand confidence threshold θ, θ ∈ (0, 1); the hand detection model is started when the hand confidence is lower than θ.
2) Hand detection is carried out on the input image. The hands are numbered 0 and 1 from left to right; hand n (n = 0 or 1) is denoted H_n, and H_n carries a left-hand label l or a right-hand label r.
3) And positioning the detected hand, and cutting out the hand area.
4) Each hand region is input into the hand key point detection network to detect the hand key points, and the coordinates of the 21 hand key points of H_n are output as follows:
The hand key point j of H_n is denoted P_j^n, with x-, y- and z-axis coordinates x_j^n, y_j^n and z_j^n. The x- and y-axis coordinates are relative to key point 0 on the input image; the z-axis coordinate of key point 0 is a minimal value taken as the z-axis coordinate origin of hand H_n. A negative z_j^n indicates that the point is farther from the camera than the wrist root; otherwise it is closer to the camera.
(3) Determining character string data
The hand key point coordinates of H_n are converted according to the following formula, the converted x-, y- and z-axis coordinates being x'_k, y'_k and z'_k.
The palm width L_n of hand H_n is calculated as the distance between key point 5 and key point 17:
L_n = √((x'_5 − x'_17)² + (y'_5 − y'_17)² + (z'_5 − z'_17)²)
The character string data S is then obtained from the hand label of H_n, the converted coordinates x'_k, y'_k, z'_k for k ∈ {0, 1, ..., 20}, and the palm width L_n.
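Step (3) can be sketched in Python as follows. The exact field layout of S is given in the source only as an image; the layout assumed here (hand label, then x, y, z of key points 0–20, then the palm width, comma-separated) is an inference from the later split into 64 numeric substrings plus the label:

```python
import math

def build_hand_string(label, keypoints):
    """Build the per-hand character string S of step (3).

    label     -- 'l' or 'r' (hand tag from the detector)
    keypoints -- list of 21 (x, y, z) tuples, key point order 0..20

    Field layout is an assumption: label, then x,y,z of key points
    0..20, then the palm width (1 + 63 + 1 = 65 comma-separated fields,
    leaving 64 numeric fields once the label is stripped).
    """
    assert len(keypoints) == 21
    # Palm width L_n: Euclidean distance between key points 5 and 17
    x5, y5, z5 = keypoints[5]
    x17, y17, z17 = keypoints[17]
    palm_width = math.sqrt((x5 - x17) ** 2 + (y5 - y17) ** 2 + (z5 - z17) ** 2)
    fields = [label]
    for (x, y, z) in keypoints:
        fields += [f"{x:.6f}", f"{y:.6f}", f"{z:.6f}"]
    fields.append(f"{palm_width:.6f}")
    return ",".join(fields)
```

A fixed decimal format keeps the datagram size predictable for the UDP transmission of step (4).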
(4) Transmitting character string data S
The character string data S is transmitted to the Unity engine through the User Datagram Protocol (UDP).
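The UDP transmission of step (4) can be sketched as follows; the host and port are hypothetical, since the source only states that S is sent to the Unity engine over UDP:

```python
import socket

# Hypothetical address of the machine running the Unity engine;
# the source specifies only "UDP to the Unity engine".
UNITY_HOST, UNITY_PORT = "127.0.0.1", 5052

def send_hand_string(s, host=UNITY_HOST, port=UNITY_PORT):
    """Send the character string data S to Unity as one UDP datagram."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(s.encode("utf-8"), (host, port))
    finally:
        sock.close()
```

On the Unity side a `UdpClient` would receive the datagram each frame; UDP is a natural fit here because a lost frame of coordinates is simply superseded by the next one.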
(5) Constructing virtual hand models
The bone positions are drawn with the Unity engine, and bone rotation angles and relative displacements are added at the key points.
(6) Storing character string data S
1) The hand coordinate data is transmitted to the Unity engine.
2) The standardized character string data S' is obtained according to the following formula.
3) Storing the normalized string data S'
The normalized string data S' is stored into the left-hand string h_l and the right-hand string h_r as follows:
wherein N is a null value, assigned to a hand that was not detected.
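The routing of step (6.3) can be sketched as follows, under the same assumed field layout as step (3) (label first, then the 64 numeric fields):

```python
def store_hand_strings(hand_strings):
    """Step (6.3): route the normalized string data to per-hand slots.

    hand_strings -- list of per-hand strings whose first comma-separated
    field is the hand label 'l' or 'r' (layout assumed, see step (3)).
    A hand that was not detected keeps the null value N.
    """
    h_l, h_r = "N", "N"          # N is the null value
    for s in hand_strings:
        label, _, rest = s.partition(",")
        if label == "l":
            h_l = rest           # the 64 numeric fields remain
        elif label == "r":
            h_r = rest
    return h_l, h_r
```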
(7) Virtual hand movement
1) The left-hand string h_l and the right-hand string h_r are each split at commas into a left-hand string array F'_l and a right-hand string array F'_r of 64 substrings. Using the float.Parse function, F'_l and F'_r are converted into the left-hand floating-point array F_l and the right-hand floating-point array F_r, respectively.
When the left-hand string h_l is N, no value is assigned to the left-hand floating-point array F_l; when the right-hand string h_r is N, no value is assigned to the right-hand floating-point array F_r.
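In Unity this conversion is done with C#'s float.Parse; an equivalent Python sketch of step (7.1) is:

```python
def parse_hand_string(h):
    """Split a stored hand string into a floating-point array (step 7.1).

    Mirrors the described float.Parse conversion: the string is split at
    commas into 64 substrings, each converted to float. When the stored
    value is the null marker N, no float array is produced.
    """
    if h == "N":          # null value: hand not detected this frame
        return None
    parts = h.split(",")
    assert len(parts) == 64, "expected 21*3 coordinates plus the palm width"
    return [float(p) for p in parts]
```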
2) The virtual hand is a child object of the camera, so the virtual hand as a whole moves with the camera.
3) Determining relative coordinates of key points
The relative coordinates P_i^l of the left-hand key point i with respect to the left-hand key point 0, and the left-hand palm width d_lH, are determined as follows:
P_i^l = (F_l[3i] − F_l[0], F_l[3i+1] − F_l[1], F_l[3i+2] − F_l[2])
wherein element 3i of the left-hand floating-point array F_l receives the x-axis coordinate of left-hand key point i and element 0 the x-axis coordinate of left-hand key point 0; element 3i+1 receives the y-axis coordinate of key point i and element 1 the y-axis coordinate of key point 0; element 3i+2 receives the z-axis coordinate of key point i and element 2 the z-axis coordinate of key point 0.
The relative coordinates P_i^r of the right-hand key point i with respect to the right-hand key point 0, and the right-hand palm width d_rH, are determined in the same way from the right-hand floating-point array F_r:
P_i^r = (F_r[3i] − F_r[0], F_r[3i+1] − F_r[1], F_r[3i+2] − F_r[2])
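The relative-coordinate computation of step (7.3) can be sketched directly from the array layout described above:

```python
def relative_keypoints(F):
    """Relative coordinates of key points w.r.t. key point 0 (step 7.3).

    F is the 64-element floating-point array; F[3i], F[3i+1], F[3i+2]
    hold the x, y, z coordinates of key point i (the final element is
    the palm width).
    """
    x0, y0, z0 = F[0], F[1], F[2]
    return [(F[3 * i] - x0, F[3 * i + 1] - y0, F[3 * i + 2] - z0)
            for i in range(21)]
```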
4) Determining virtual hand scaling ratio
The distance d_lR between the left-hand key point 0 and the left-hand key point 1 is determined as follows:
d_lR = √((F_l[3] − F_l[0])² + (F_l[4] − F_l[1])² + (F_l[5] − F_l[2])²)
The distance d_rR between the right-hand key point 0 and the right-hand key point 1 is determined in the same way from F_r.
The virtual left-hand scaling ratio M_l is then determined from d_lR and d_lM, and the virtual right-hand scaling ratio M_r from d_rR and d_rM, wherein d_lM is the distance between key point 0 and key point 1 of the left hand of the virtual hand model, and d_rM is the distance between key point 0 and key point 1 of the right hand of the virtual hand model.
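The scaling of step (7.4) can be sketched as follows. The source gives the ratio only as an image, so the direction of the ratio is an assumption: here M is taken as the detected wrist-to-thumb-root distance d_R divided by the corresponding model distance d_M, i.e. the factor by which the model is scaled toward the tracked hand:

```python
import math

def hand_scale_ratio(F, d_model):
    """Virtual-hand scaling ratio of step (7.4).

    Assumption: M = d_R / d_M, where d_R is the detected distance
    between key points 0 and 1 (from the floating-point array F) and
    d_model (d_M) is the same distance on the virtual hand model.
    """
    dx = F[3] - F[0]
    dy = F[4] - F[1]
    dz = F[5] - F[2]
    d_real = math.sqrt(dx * dx + dy * dy + dz * dz)
    return d_real / d_model
```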
5) Determining relative hand movement position
The coordinates C_lx, C_ly, C_lz of the virtual left hand's motion relative to the camera are determined from the initial position D_l of the virtual left hand, whose coordinates relative to the camera are D_lx, D_ly, D_lz.
The coordinates C_rx, C_ry, C_rz of the virtual right hand's motion relative to the camera are determined likewise from the initial position D_r of the virtual right hand, whose coordinates relative to the camera are D_rx, D_ry, D_rz.
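The position update of step (7.5) is given in the source only as an image. A minimal sketch, under the assumption that the wrist key point's (possibly scaled) coordinates displace the virtual hand from its initial camera-relative position D, would be:

```python
def virtual_hand_position(D, wrist, scale=1.0):
    """Sketch of step (7.5): camera-relative virtual hand position.

    Assumption (source formula unavailable): the wrist key point's
    coordinates, optionally scaled into scene units, are added to the
    initial position D = (Dx, Dy, Dz).
    """
    return tuple(d + scale * w for d, w in zip(D, wrist))
```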
(8) Gesture interactions that trigger augmented reality
1) When the distance between the current position of the augmented-reality virtual hand and the virtual object is less than or equal to 0.3, and the distances from virtual hand key points 8, 12, 16 and 20 to virtual hand key point 0 are all less than or equal to 0.05, the gesture interaction of picking up the virtual object is triggered.
2) When the distances from virtual hand key points 8, 12, 16 and 20 to virtual hand key point 0 are all greater than 0.05, the gesture interaction of putting down the virtual object is triggered.
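The two trigger conditions of step (8) use only Euclidean distances and the stated thresholds (0.3 and 0.05, in the virtual scene's length units), and can be sketched as:

```python
import math

def _dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def classify_gesture(hand_pos, object_pos, keypoints):
    """Pick-up / put-down trigger of step (8).

    hand_pos, object_pos -- 3D positions of the virtual hand and object
    keypoints            -- dict mapping key point index to 3D position
    Returns 'pick', 'drop', or None, using the thresholds 0.3 and 0.05
    stated in the text.
    """
    fingertips = [8, 12, 16, 20]   # index/middle/ring/little fingertips
    tip_dists = [_dist(keypoints[t], keypoints[0]) for t in fingertips]
    if _dist(hand_pos, object_pos) <= 0.3 and all(d <= 0.05 for d in tip_dists):
        return "pick"              # closed hand near the object
    if all(d > 0.05 for d in tip_dists):
        return "drop"              # hand opened again
    return None
```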
(9) Performance evaluation
The number of frames processed per second (FPS) of the monocular camera augmented reality gesture interaction method based on key point detection is evaluated as follows:
FPS = 1 / (t_e − t_s)
wherein t_s is the time at which processing of a frame starts and t_e is the time at which the frame is fully processed. When the FPS exceeds 30 frames/second, the method is regarded as having high real-time performance.
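The per-frame measurement FPS = 1/(t_e − t_s) can be sketched as:

```python
import time

def measure_fps(process_frame):
    """FPS of step (9): FPS = 1 / (t_e - t_s) for one processed frame."""
    t_s = time.perf_counter()   # time at which frame processing starts
    process_frame()
    t_e = time.perf_counter()   # time at which the frame is processed
    return 1.0 / (t_e - t_s)
```

An FPS above 30 frames/second over the processing pipeline is the real-time criterion stated above.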
In step (2) of detecting hand key point coordinates, the hand confidence threshold θ is preferably 0.68.
In step (5) of constructing the virtual hand model, the bone rotation angles added at the key points are as follows: the rotation angle of the wrist key point 0 ranges over 0–180 degrees, the rotation angles of the remaining key points range over 0–90 degrees, and the radius of a finger shown on camera is within d_lM/12, wherein d_lM is the distance between key point 0 and key point 1 of the left hand of the virtual hand model.
The invention adopts a hand confidence threshold in the hand key point detection step: the hand detection model is restarted only when the hand confidence falls below the threshold. Hand detection followed by key point detection is needed only on the first frame of the image. Because the video is continuous, the hand region can be pre-judged from the previous frame's hand key point coordinates and fed to the key point detection model for the next frame; the hand detection model therefore does not have to be run repeatedly, and each frame only needs to infer the hand region from the previous frame's key points.
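The re-detection strategy described above can be sketched as follows; `detect_hands` and `detect_keypoints` are hypothetical stand-ins for the two network models, with `detect_keypoints` returning the key points together with a tracking confidence:

```python
def track_hands(frames, detect_hands, detect_keypoints, theta=0.68):
    """Run the expensive hand detector only when needed.

    The full hand-detection model runs on the first frame, or whenever
    the tracking confidence drops below theta; otherwise the hand
    region for the next frame is inferred from the current key points.
    """
    region = None
    results = []
    for frame in frames:
        if region is None:
            region = detect_hands(frame)          # expensive detector
        keypoints, confidence = detect_keypoints(frame, region)
        results.append(keypoints)
        # Pre-judge the next frame's hand region from the current key
        # points, or force re-detection when confidence is too low.
        region = keypoints if confidence >= theta else None
    return results
```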
Compared with the prior art, the invention has the following advantages:
The invention adopts a hand key point detection network and applies it to monocular-camera gesture tracking and recognition in augmented reality, so the features of the hand in the current frame are screened effectively; at the same time, the tracked hand key point information is standardized, so the data are more regular, conveniently drive the motion of the virtual hand, and support real-time operation. The method has the advantages of higher real-time performance, stronger immersion and lower equipment cost, and can perform gesture tracking and recognition against different backgrounds.
Drawings
Fig. 1 is a flow chart of embodiment 1 of the present invention.
Fig. 2 is a schematic diagram of the numbering of the hand key points.
Fig. 3 is a frame rate plot of the hand keypoint tracking detection of the method of example 1.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, but the present invention is not limited to the following embodiments.
Example 1
The monocular camera augmented reality gesture interaction method based on key point detection of the embodiment comprises the following steps (see fig. 1):
(1) Acquiring an input image
Taking a real-time image shot by a monocular camera as an input image, wherein the width w of the input image is at least 400 pixels, and the height h is at least 200 pixels.
(2) Detecting hand keypoint coordinates
The hand key point detection network is used for obtaining hand key point coordinates of an input image according to the following method:
1) Setting a hand confidence threshold θ, θ ∈ (0, 1); in this embodiment θ is 0.68, and the hand detection model is started when the hand confidence is lower than θ.
2) Hand detection is carried out on the input image. The hands are numbered 0 and 1 from left to right; hand n (n = 0 or 1) is denoted H_n, and H_n carries a left-hand label l or a right-hand label r.
3) And positioning the detected hand, and cutting out the hand area.
4) Each hand region is input into the hand key point detection network to detect the hand key points, and the coordinates of the 21 hand key points of H_n are output as follows:
In FIG. 2, key point 0 is the wrist, denoted P_0; key points 1 to 4 are the four joints from the root of the thumb to its tip, denoted P_1 to P_4; key points 5 to 8 are the four joints from the root of the index finger to its tip, denoted P_5 to P_8; key points 9 to 12 are the four joints from the root of the middle finger to its tip, denoted P_9 to P_12; key points 13 to 16 are the four joints from the root of the ring finger to its tip, denoted P_13 to P_16; and key points 17 to 20 are the four joints from the root of the little finger to its tip, denoted P_17 to P_20.
The hand key point j of H_n is denoted P_j^n, with x-, y- and z-axis coordinates x_j^n, y_j^n and z_j^n. The x- and y-axis coordinates are relative to key point 0 on the input image; the z-axis coordinate of key point 0 is a minimal value taken as the z-axis coordinate origin of hand H_n. A negative z_j^n indicates that the point is farther from the camera than the wrist root; otherwise it is closer to the camera.
(3) Determining character string data
The hand key point coordinates of H_n are converted according to the following formula, the converted x-, y- and z-axis coordinates being x'_k, y'_k and z'_k.
The palm width L_n of hand H_n is calculated as the distance between key point 5 and key point 17:
L_n = √((x'_5 − x'_17)² + (y'_5 − y'_17)² + (z'_5 − z'_17)²)
The character string data S is then obtained from the hand label of H_n, the converted coordinates x'_k, y'_k, z'_k for k ∈ {0, 1, ..., 20}, and the palm width L_n.
The invention adopts the step of determining the character string data, which makes the data more suitable for transmission and use via the User Datagram Protocol and improves the data transmission speed and processing efficiency.
(4) Transmitting character string data S
The character string data S is transmitted to the Unity engine through the User Datagram Protocol (UDP).
(5) Constructing virtual hand models
The bone positions are drawn with the Unity engine, and bone rotation angles and relative displacements are added at the key points. The rotation angle of the wrist key point 0 ranges over 0–180 degrees and the rotation angles of the other key points range over 0–90 degrees; in this embodiment the rotation angle of the wrist key point 0 is 90 degrees and the rotation angles of the other key points are 45 degrees, and the radius of a finger on camera is within d_lM/12, wherein d_lM is the distance between key point 0 and key point 1 of the left hand of the virtual hand model.
(6) Storing character string data S
1) The hand coordinate data is transmitted to the Unity engine.
2) The standardized character string data S' is obtained according to the following formula.
3) Storing the normalized string data S'
The normalized string data S' is stored into the left-hand string h_l and the right-hand string h_r as follows:
wherein N is a null value, assigned to a hand that was not detected.
The invention adopts the step of storing the character string data, screening out the meaningful information obtained from the User Datagram Protocol transmission and laying a foundation for driving the motion of the virtual hand.
(7) Virtual hand movement
1) The left-hand string h_l and the right-hand string h_r are each split at commas into a left-hand string array F'_l and a right-hand string array F'_r of 64 substrings. Using the float.Parse function, F'_l and F'_r are converted into the left-hand floating-point array F_l and the right-hand floating-point array F_r, respectively.
When the left-hand string h_l is N, no value is assigned to the left-hand floating-point array F_l; when the right-hand string h_r is N, no value is assigned to the right-hand floating-point array F_r.
2) The virtual hand is a child object of the camera, so the virtual hand as a whole moves with the camera.
3) Determining relative coordinates of key points
The relative coordinates P_i^l of the left-hand key point i with respect to the left-hand key point 0, and the left-hand palm width d_lH, are determined as follows:
P_i^l = (F_l[3i] − F_l[0], F_l[3i+1] − F_l[1], F_l[3i+2] − F_l[2])
wherein element 3i of the left-hand floating-point array F_l receives the x-axis coordinate of left-hand key point i and element 0 the x-axis coordinate of left-hand key point 0; element 3i+1 receives the y-axis coordinate of key point i and element 1 the y-axis coordinate of key point 0; element 3i+2 receives the z-axis coordinate of key point i and element 2 the z-axis coordinate of key point 0.
The relative coordinates P_i^r of the right-hand key point i with respect to the right-hand key point 0, and the right-hand palm width d_rH, are determined in the same way from the right-hand floating-point array F_r:
P_i^r = (F_r[3i] − F_r[0], F_r[3i+1] − F_r[1], F_r[3i+2] − F_r[2])
4) Determining virtual hand scaling ratio
The distance d_lR between the left-hand key point 0 and the left-hand key point 1 is determined as follows:
d_lR = √((F_l[3] − F_l[0])² + (F_l[4] − F_l[1])² + (F_l[5] − F_l[2])²)
The distance d_rR between the right-hand key point 0 and the right-hand key point 1 is determined in the same way from F_r.
The virtual left-hand scaling ratio M_l is then determined from d_lR and d_lM, and the virtual right-hand scaling ratio M_r from d_rR and d_rM, wherein d_lM is the distance between key point 0 and key point 1 of the left hand of the virtual hand model, and d_rM is the distance between key point 0 and key point 1 of the right hand of the virtual hand model.
5) Determining relative hand movement position
The coordinates C_lx, C_ly, C_lz of the virtual left hand's motion relative to the camera are determined from the initial position D_l of the virtual left hand, whose coordinates relative to the camera are D_lx, D_ly, D_lz.
The coordinates C_rx, C_ry, C_rz of the virtual right hand's motion relative to the camera are determined likewise from the initial position D_r of the virtual right hand, whose coordinates relative to the camera are D_rx, D_ry, D_rz.
Because the invention uses data to drive the motion of the virtual hand, that motion can be finer, more natural, and more consistent with real hand motion, improving the experimenter's immersion during interaction in the augmented reality environment.
(8) Gesture interactions that trigger augmented reality
1) When the distance between the current position of the augmented-reality virtual hand and the virtual object is less than or equal to 0.3, and the distances from virtual hand key points 8, 12, 16 and 20 to virtual hand key point 0 are all less than or equal to 0.05, the gesture interaction of picking up the virtual object is triggered.
2) When the distances from virtual hand key points 8, 12, 16 and 20 to virtual hand key point 0 are all greater than 0.05, the gesture interaction of putting down the virtual object is triggered.
Because the invention uses a key point detection network to capture hand data for interaction, its equipment cost is lower and its universality stronger than the traditional handle and data-glove interaction modes, which are expensive and require special hardware.
(9) Performance evaluation
The number of frames processed per second (FPS) of the monocular camera augmented reality gesture interaction method based on key point detection is evaluated as follows:
FPS = 1 / (t_e − t_s)
wherein t_s is the time at which processing of a frame starts and t_e is the time at which the frame is fully processed. When the FPS exceeds 30 frames/second, the method is regarded as having high real-time performance.
And (3) completing the monocular camera augmented reality gesture interaction method based on key point detection.
Example 2
The monocular camera augmented reality gesture interaction method based on key point detection in the embodiment comprises the following steps:
(1) Acquiring an input image
This step is the same as in example 1.
(2) Detecting hand keypoint coordinates
The hand key point detection network is used for obtaining hand key point coordinates of an input image according to the following method:
1) Setting a hand confidence threshold θ, θ ∈ (0, 1); in this embodiment θ is 0.01, and the hand detection model is started when the hand confidence is lower than θ.
The other steps of this step are the same as those of example 1.
(3) Determining character string data
This step is the same as in example 1.
(4) Transmitting character string data S
This step is the same as in example 1.
(5) Constructing virtual hand models
The bone positions are drawn with the Unity engine, and bone rotation angles and relative displacements are added at the key points. The rotation angle of the wrist key point 0 ranges over 0–180 degrees and the rotation angles of the other key points range over 0–90 degrees; in this embodiment the rotation angle of the wrist key point 0 is 0 degrees and the rotation angles of the other key points are 0 degrees, and the radius of a finger on camera is within d_lM/12, wherein d_lM is the distance between key point 0 and key point 1 of the left hand of the virtual hand model.
The other steps are the same as in Example 1. This completes the monocular camera augmented reality gesture interaction method based on key point detection.
Example 3
The monocular camera augmented reality gesture interaction method based on key point detection in the embodiment comprises the following steps:
(1) Acquiring an input image
This step is the same as in example 1.
(2) Detecting hand keypoint coordinates
The hand key point detection network is used for obtaining hand key point coordinates of an input image according to the following method:
1) Setting a hand confidence threshold θ, θ ∈ (0, 1); in this embodiment θ is 0.98, and the hand detection model is started when the hand confidence is lower than θ.
The other steps of this step are the same as those of example 1.
(3) Determining character string data
This step is the same as in example 1.
(4) Transmitting character string data S
This step is the same as in example 1.
(5) Constructing virtual hand models
Drawing skeleton positions by using the Unity engine and adding skeleton rotation angles and relative displacement on the key points: the rotation angle of wrist key point 0 ranges from 0 to 180 degrees and is set to 180 degrees in this embodiment; the rotation angles of the other key points range from 0 to 90 degrees and are set to 90 degrees; the radius of a finger on the camera is d_lM/12, where d_lM is the distance between key point 0 and key point 1 of the left hand of the virtual hand model.
The other steps are the same as in Example 1. This completes the monocular camera augmented reality gesture interaction method based on key point detection.
To verify the beneficial effects of the invention, the inventor carried out simulation experiments using the method of Example 1 of the invention under the following experimental conditions:
1. simulation conditions
Software environment: pyCharm 2019.3.1x64.
Hardware conditions: 1 personal computer, 1 Nvidia RTX 3060 Ti graphics card, 1 1080P camera, 1 mobile phone.
Computer configuration:
1) Processor: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz.
2) Memory: 32.0GB.
The software platform is as follows: python3.8.
Other third-party libraries: opencv-python 4.6.0, mediapipe 0.9.1, socket.
2. Simulation content and results
Experiments were performed under the above simulation conditions, and the experimental results are shown in fig. 3.
In fig. 3, the abscissa represents the running time of the invention and the ordinate represents the number of frames the invention can process per second, i.e. the FPS; as can be seen from fig. 3, the number of video frames processed per second fluctuates around 30, indicating that the video image processing speed is high and real-time performance is achieved.
Claims (3)
1. The monocular camera augmented reality gesture interaction method based on key point detection is characterized by comprising the following steps of:
(1) Acquiring an input image
Taking a real-time image shot by a monocular camera as an input image, wherein the width w of the input image is at least 400 pixels, and the height h is at least 200 pixels;
(2) Detecting hand keypoint coordinates
The hand key point detection network is used for obtaining hand key point coordinates of an input image according to the following method:
1) Setting a hand confidence threshold θ, θ ∈ (0, 1), and starting the hand detection model when the hand confidence is lower than θ;
2) Performing hand detection on the input image; the hands are numbered 0 and 1 from left to right, and hand n (n = 0 or 1) is denoted H_n, where H_n carries a left-hand label l or a right-hand label r;
3) Positioning the detected hand, and cutting out a hand area;
4) Inputting each hand region into the hand key point detection network to detect hand key points, and outputting the coordinates of the 21 key points of H_n as follows:
key point 0 is the wrist; key points 1 to 4 are the four joint points from the root of the thumb to the fingertip; key points 5 to 8 are the four joint points from the root of the index finger to the fingertip; key points 9 to 12 are the four joint points from the root of the middle finger to the fingertip; key points 13 to 16 are the four joint points from the root of the ring finger to the fingertip; key points 17 to 20 are the four joint points from the root of the little finger to the fingertip;
the hand key point j of H_n is recorded with x-, y- and z-axis coordinates (x_j^n, y_j^n, z_j^n), where the x- and y-axis coordinates are the relative coordinates of the key point on the input image, and the z-axis coordinate is a small value giving the depth relative to key point 0: if z_j^n is negative, the key point is farther from the camera than the wrist root; otherwise it is closer;
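For illustration, the 21-key-point layout above can be expressed directly in Python; the flat (x, y, z) ordering matches the output format of common hand key point detectors such as MediaPipe Hands, which the embodiment's mediapipe dependency suggests is the network used:

```python
# Key point indices grouped by finger, following the layout in step (2.4):
FINGERS = {
    "wrist": (0,),
    "thumb": (1, 2, 3, 4),
    "index": (5, 6, 7, 8),
    "middle": (9, 10, 11, 12),
    "ring": (13, 14, 15, 16),
    "little": (17, 18, 19, 20),
}

def group_keypoints(flat):
    """Turn a flat list of 63 floats (x0, y0, z0, ..., x20, y20, z20)
    into a list of 21 (x, y, z) tuples, one per hand key point."""
    assert len(flat) == 63
    return [tuple(flat[3 * j:3 * j + 3]) for j in range(21)]
```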
(3) Determining character string data
Converting the hand key point coordinates of H_n according to the following formula, obtaining the converted x-, y- and z-axis coordinates respectively:
Calculating, according to the following formula, the distance between key point 5 and key point 17 of hand H_n as the palm width L_n:

L_n = sqrt((x_5^n − x_17^n)² + (y_5^n − y_17^n)² + (z_5^n − z_17^n)²)
The character string data S is obtained as follows:
wherein k ∈ {0, 1, …, 20};
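A minimal Python sketch of the palm-width computation and the string serialization of steps (2)–(3) is given below. The exact layout of S is elided in the claim; packing the 63 coordinates followed by the palm width into 64 comma-separated fields is an assumption, chosen because step (7) later splits each hand string into 64 substrings:

```python
import math

def palm_width(kps):
    """Palm width L_n: Euclidean distance between key point 5 (index-finger
    root) and key point 17 (little-finger root)."""
    (x5, y5, z5), (x17, y17, z17) = kps[5], kps[17]
    return math.sqrt((x5 - x17) ** 2 + (y5 - y17) ** 2 + (z5 - z17) ** 2)

def hand_to_string(kps):
    """Serialize one hand's 21 (x, y, z) key points plus its palm width
    into 64 comma-separated fields (hypothetical layout of S)."""
    fields = [c for kp in kps for c in kp] + [palm_width(kps)]
    return ",".join(f"{v:.5f}" for v in fields)
```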
(4) Transmitting character string data S
Transmitting the character string data S to the Unity engine through a user datagram protocol;
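Step (4) uses the user datagram protocol; a minimal Python sender is sketched below. The port number 5052 is an assumption — any port on which the Unity-side listener is bound works, since UDP is connectionless:

```python
import socket

def send_to_unity(data, host="127.0.0.1", port=5052):
    """Send the character string data S to the Unity engine over UDP.
    UDP is connectionless, so the sender neither connects nor waits
    for an acknowledgement."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(data.encode("utf-8"), (host, port))
    finally:
        sock.close()
```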
(5) Constructing virtual hand models
Drawing skeleton positions by using a Unity engine, and adding skeleton rotation angles and relative displacement on key points;
(6) Storing character string data S
1) The hand coordinate data is transmitted to a Unity engine;
2) Standardized character string data S' is obtained as follows:
3) Storing normalized string data S'
The normalized string data S' is stored into the left-hand string h_l and the right-hand string h_r as follows:
wherein N is a null value;
(7) Virtual hand movement
1) Splitting the left-hand string h_l and the right-hand string h_r at commas into a left-hand string array F_l′ and a right-hand string array F_r′, each containing 64 substrings, and converting the left-hand string array F_l′ and the right-hand string array F_r′ into a left-hand floating point array F_l and a right-hand floating point array F_r, respectively, using the float.Parse function:
wherein, when the left-hand string h_l is the null value N, nothing is assigned to the left-hand floating point array F_l; when the right-hand string h_r is the null value N, nothing is assigned to the right-hand floating point array F_r;
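Step (7.1) can be sketched in Python (the Unity-side code would use C#'s float.Parse; Python's float plays the same role here). Handling of the null value N follows the claim: an N string produces no floating point array at all:

```python
def parse_hand_string(h):
    """Split a hand string at commas into 64 substrings and convert each
    to a float, the counterpart of float.Parse in step (7.1). When the
    string is the null value 'N', nothing is assigned and None is returned."""
    if h == "N":
        return None
    parts = h.split(",")
    assert len(parts) == 64, "expected 64 substrings per hand"
    return [float(p) for p in parts]
```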
2) The virtual hand as a whole is used as a sub-object of the camera to move along with the camera;
3) Determining relative coordinates of key points
Determining, according to the following formula, the relative coordinates of left-hand key point i with respect to left-hand key point 0 and the left-hand palm width d_lH:
wherein F_l[i] is element i of the left-hand floating point array F_l: element 3i receives the x-axis coordinate of left-hand key point i and element 0 receives the x-axis coordinate of left-hand key point 0; element 3i+1 receives the y-axis coordinate of left-hand key point i and element 1 receives the y-axis coordinate of left-hand key point 0; element 3i+2 receives the z-axis coordinate of left-hand key point i and element 2 receives the z-axis coordinate of left-hand key point 0;
determining, according to the following formula, the relative coordinates of right-hand key point i with respect to right-hand key point 0 and the right-hand palm width d_rH:
wherein F_r[i] is element i of the right-hand floating point array F_r: element 3i receives the x-axis coordinate of right-hand key point i and element 0 receives the x-axis coordinate of right-hand key point 0; element 3i+1 receives the y-axis coordinate of right-hand key point i and element 1 receives the y-axis coordinate of right-hand key point 0; element 3i+2 receives the z-axis coordinate of right-hand key point i and element 2 receives the z-axis coordinate of right-hand key point 0;
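Under the element layout stated in the claim (elements 3i, 3i+1, 3i+2 of the floating point array hold the x-, y- and z-axis coordinates of key point i), the relative coordinates reduce to a per-axis difference against key point 0. A sketch follows; the subtraction itself is an assumption, since the claim's formula is elided:

```python
def relative_coords(F, i):
    """Relative coordinates of key point i with respect to key point 0.
    F[3i], F[3i+1], F[3i+2] hold the x-, y- and z-axis coordinates of
    key point i; F[0], F[1], F[2] those of key point 0 (the wrist)."""
    return (F[3 * i] - F[0], F[3 * i + 1] - F[1], F[3 * i + 2] - F[2])
```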
4) Determining virtual hand scaling ratio
Determining the distance d_lR between left-hand key point 0 and left-hand key point 1 according to the following formula:
determining the distance d_rR between right-hand key point 0 and right-hand key point 1 according to the following formula:
Determining the virtual left-hand scaling ratio M_l according to the following formula:
determining the virtual right-hand scaling ratio M_r according to the following formula:
wherein d_lM is the distance between key point 0 and key point 1 of the left hand of the virtual hand model, and d_rM is the distance between key point 0 and key point 1 of the right hand of the virtual hand model;
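A sketch of the scaling step under stated assumptions: d_R is taken as the Euclidean distance between key points 0 and 1 of the detected hand, and the ratio direction M = d_R / d_M is a guess, since both formulas are elided in the claim:

```python
import math

def scaling_ratio(F, d_M):
    """Virtual hand scaling ratio M = d_R / d_M (assumed direction), where
    d_R is the key point 0 to key point 1 distance of the detected hand and
    d_M is the same distance on the virtual hand model."""
    d_R = math.sqrt((F[3] - F[0]) ** 2 + (F[4] - F[1]) ** 2 + (F[5] - F[2]) ** 2)
    return d_R / d_M
```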
5) Determining relative hand movement position
Determining, according to the following formula, the coordinates C_lx, C_ly, C_lz of the virtual left hand relative to the camera when its position moves:
wherein D_lx, D_ly, D_lz are the coordinates of the initial position D_l of the virtual left hand relative to the camera;
determining, according to the following formula, the coordinates C_rx, C_ry, C_rz of the virtual right hand relative to the camera when its position moves:
wherein D_rx, D_ry, D_rz are the coordinates of the initial position D_r of the virtual right hand relative to the camera;
(8) Gesture interactions that trigger augmented reality
1) Triggering gesture interaction for picking up the virtual object when the distance between the current position of the virtual hand of the augmented reality and the virtual object is less than or equal to 0.3 and the distances between the virtual hand key points 8, 12, 16 and 20 and the virtual hand key point 0 are all less than or equal to 0.05;
2) Triggering gesture interaction for putting down a virtual object when the distances between the virtual hand key points 8, 12, 16 and 20 of the augmented reality and the virtual hand key point 0 are all more than 0.05;
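The two trigger conditions of step (8) can be sketched directly; the 0.3 and 0.05 thresholds are the ones given in the claim, expressed in the scene's distance units:

```python
import math

def dist(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def pick_up(hand_pos, obj_pos, kps):
    """Step (8.1): pick up when the virtual hand is within 0.3 of the
    virtual object and all four fingertips (key points 8, 12, 16, 20)
    are within 0.05 of the wrist (key point 0), i.e. the hand is closed."""
    near = dist(hand_pos, obj_pos) <= 0.3
    closed = all(dist(kps[t], kps[0]) <= 0.05 for t in (8, 12, 16, 20))
    return near and closed

def put_down(kps):
    """Step (8.2): release when all four fingertips are farther than 0.05
    from the wrist, i.e. the hand has opened."""
    return all(dist(kps[t], kps[0]) > 0.05 for t in (8, 12, 16, 20))
```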
(9) Performance evaluation
The number of frames of image processed per second, FPS, of the monocular camera augmented reality gesture interaction method based on key point detection is evaluated according to the following formula:

FPS = 1 / (t_e − t_s)

wherein t_e is the time at which processing of one frame ends and t_s is the time at which processing of the frame starts; when the FPS of processed images is greater than 30 frames/second, the method has high real-time performance.
2. The monocular camera augmented reality gesture interaction method based on key point detection of claim 1, characterized in that: in step (2) of detecting hand key point coordinates, the hand detection confidence threshold is 0.68.
3. The monocular camera augmented reality gesture interaction method based on key point detection of claim 1, characterized in that: in step (5) of constructing the virtual hand model, the skeleton rotation angles added on the key points are as follows: the rotation angle of wrist key point 0 ranges from 0 to 180 degrees, the rotation angles of the remaining key points range from 0 to 90 degrees, and the radius of a finger on the camera is d_lM/12, where d_lM is the distance between key point 0 and key point 1 of the left hand of the virtual hand model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310309434.9A CN116403280A (en) | 2023-03-28 | 2023-03-28 | Monocular camera augmented reality gesture interaction method based on key point detection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116403280A true CN116403280A (en) | 2023-07-07 |
Family
ID=87015360
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310309434.9A Pending CN116403280A (en) | 2023-03-28 | 2023-03-28 | Monocular camera augmented reality gesture interaction method based on key point detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116403280A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||