CN111696140B - Monocular-based three-dimensional gesture tracking method

Info

Publication number: CN111696140B
Application number: CN202010387724.1A
Authority: CN (China)
Prior art keywords: dimensional, tracking, coordinates, bone, head
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Other versions: CN111696140A
Inventors: 吴涛, 周锋宜
Current assignee: Qingdao Xiaoniao Kankan Technology Co Ltd
Original assignee: Qingdao Xiaoniao Kankan Technology Co Ltd
Application filed by Qingdao Xiaoniao Kankan Technology Co Ltd; priority to CN202010387724.1A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/269 - Analysis of motion using gradient-based methods
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 - Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/20 - Input arrangements for video game devices
    • A63F13/21 - Input arrangements for video game devices characterised by their sensors, purposes or types
    • A63F13/213 - Input arrangements for video game devices characterised by their sensors, purposes or types comprising photodetecting means, e.g. cameras, photodiodes or infrared cells
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/10 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals
    • A63F2300/1087 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals comprising photodetecting means, e.g. a camera
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30196 - Human being; Person

Abstract

The invention provides a monocular-based three-dimensional gesture tracking method, which comprises the following steps: training a hand detection model and a skeleton point recognition model; starting the hand detection model and the tracking module according to the number of hands detected in the previous frame image; recognizing skeleton points in the region of interest of the current frame stored in the tracking queue Trackhand through the skeleton point recognition model, and performing smoothing filtering on the recognized skeleton points; recording the position and pose of the head in each frame image and storing the head data in real time into the queue Trackhead of the tracking module; determining the three-dimensional skeleton coordinates of the smoothed skeleton points by combining the head data in the Trackhead; and rendering the three-dimensional skeleton coordinates to complete gesture tracking. Replacing two infrared binocular cameras with one monocular camera reduces cost; even if two monocular cameras are installed, the cost is still lower than that of one infrared binocular camera. The overall power consumption and heat dissipation are reduced, the overall weight of the headset is lightened, and wearing comfort is improved.

Description

Monocular-based three-dimensional gesture tracking method
Technical Field
The invention relates to the field of computer vision, in particular to a monocular-based three-dimensional gesture tracking method.
Background
In order to enhance the immersion of the virtual-real combination in VR/AR/MR and provide a better experience, a human-computer interaction module is indispensable; in particular, high-precision, real-time restoration of the 3D gesture of the hand in a VR/AR/MR scene greatly influences the immersion of the user experience.
At present, in the VR/AR/MR field, a gesture recognition tracker must be additionally added to mainstream headsets; the traditional method separately adds two infrared binocular cameras or depth cameras to realize finger tracking, which raises the following problems: 1. additional cost; 2. additional power consumption: mainstream headsets are all-in-one devices powered by their own battery, so the power consumption of the whole system greatly affects the available interaction time; 3. the increased power consumption also makes heat dissipation a significant challenge; 4. the added cameras increase the complexity of the structural design and the industrial design (ID), running against the development goals of a head-mounted all-in-one device that is small, portable, and comfortable for long-term wear; 5. the FOV of currently mature, popular depth cameras is generally no larger than about 90 degrees, while the FOV required by a headset is generally about 110 degrees, so with a depth camera some hand motion trajectories are very easily lost.
Therefore, there is a need for a monocular-based three-dimensional gesture tracking method that saves cost, reduces power consumption and heat dissipation, enlarges the visible area, lightens the headset, and improves wearing comfort.
Disclosure of Invention
In view of the above problems, the present invention aims to provide a monocular-based three-dimensional gesture tracking method, so as to solve the problems of the existing methods: the high cost, high power consumption, and high heat dissipation of infrared binocular cameras; the increased complexity of the headset's structural design; the increased headset volume and the discomfort caused by long-term wear; and the small viewing angle, which very easily leaves hand motion trajectories incompletely tracked.
The invention provides a monocular-based three-dimensional gesture tracking method, which is characterized by comprising the following steps:
training a hand detection model and a skeleton point recognition model, so that the hand detection model automatically locks the hand region of an image as a region of interest, and the skeleton point recognition model automatically recognizes the skeleton points in the region of interest;
starting the hand detection model and the tracking module according to the number of hands detected in the previous frame image to acquire the region of interest of the current frame, and storing the data information of the current frame into a tracking queue Trackhand of the tracking module, the data information of the current frame at least comprising the region of interest of the current frame;
performing skeleton point recognition on the region of interest of the current frame in the Trackhand through the skeleton point recognition model, and performing smoothing filtering on the recognized skeleton points according to the historical data in the Trackhand;
and storing data about the position and pose of the head in each frame image into a queue Trackhead of the tracking module in real time, and determining the three-dimensional skeleton coordinates of the smoothed skeleton points by combining the head data in the Trackhead, so as to complete gesture tracking.
Preferably, in training the hand detection model and the skeletal point recognition model,
collecting hand image data of at least 100 users by using a head tracking camera as an action behavior case;
and inputting the action behavior cases into the hand detection model and the bone point recognition model for model training.
Preferably, in the process of starting the hand detection model and the tracking module according to the detection number of the hands in the previous frame of image,
if the detection number is 0 or 1, starting the hand detection model and the tracking module;
if the detection number is 2, only the tracking module is started.
Preferably, in the process of performing skeleton point recognition on the region of interest of the current frame in the Trackhand,
the region of interest comprises the position coordinates of the hand in the image and the size of the region corresponding to the hand;
the number of the skeleton points is 21.
Preferably, in the process of acquiring the region of interest of the current frame, the method further includes:
and estimating the region of interest of the next frame according to the region of interest of the current frame based on an optical flow tracking algorithm so as to provide a reference for bone point identification of the next frame.
Preferably, in determining the three-dimensional skeleton coordinates of the smoothed skeleton points in combination with the head data in the Trackhead,
reading the head data in the Trackhead, and acquiring the translation matrix T and the rotation matrix R of the head in the current frame relative to the previous frame;
and determining the three-dimensional coordinates of the skeleton points according to preset calibration parameters of the tracking camera, the translation matrix T and the rotation matrix R.
Preferably, the preset calibration parameters of the tracking camera are

\(K = \begin{bmatrix} fx & 0 & cx \\ 0 & fy & cy \\ 0 & 0 & 1 \end{bmatrix}\)

wherein fx, fy represent the pixel focal lengths of the tracking camera, and cx, cy represent the position of the tracking camera's optical axis in image coordinates.
Preferably, in the process of determining the three-dimensional coordinates of the skeleton points according to the preset calibration parameters of the tracking camera, the translation matrix T and the rotation matrix R,
selecting any one of the bone points to perform three-dimensional coordinate calculation;
each bone point is calculated in turn until all bone points of both hands are calculated.
Preferably, in selecting any one of the skeleton points for three-dimensional coordinate calculation,
selecting a point P from the skeleton points and acquiring the coordinate information of the point P: the three-dimensional coordinates of the point P in the previous frame are \(P_1 = (X_1, Y_1, Z_1)^T\), with two-dimensional image coordinates \(L_1 = (u_1, v_1)^T\); the two-dimensional image coordinates of the point P in the current frame are known to be \(L_2 = (u_2, v_2)^T\); and the three-dimensional coordinates of the point P in the current frame are assumed to be \(P_2 = (X_2, Y_2, Z_2)^T\);
from the coordinate information of the point P, \(Z_1 (u_1, v_1, 1)^T = K P_1\), \(Z_2 (u_2, v_2, 1)^T = K P_2\), and \(P_2 = R P_1 + T\); then \(Z_2 (u_2, v_2, 1)^T = K (R Z_1 K^{-1} (u_1, v_1, 1)^T + T)\), wherein \(K^{-1}\) is the inverse matrix of the calibration parameters K;
acquiring the three-dimensional skeleton coordinates \(P_2 = (X_2, Y_2, Z_2)^T\) of the point P in the current frame, and completing the conversion from two-dimensional skeleton coordinates to three-dimensional skeleton coordinates.
Preferably, in determining the three-dimensional skeleton coordinates of the smoothed skeleton points in combination with the head data in the Trackhead to complete gesture tracking,
fusing the smoothed skeleton points with the head tracking data, and transforming the fused data from the camera coordinate system to the coordinate system of the VR (virtual reality) headset to form three-dimensional gesture information;
and transmitting the three-dimensional gesture information to a game engine, rendering it, and transmitting it back to the VR headset in real time for display, so as to complete gesture tracking.
According to the technical scheme, the monocular-based three-dimensional gesture tracking method trains the hand detection model and the skeleton point recognition model to obtain the region of interest in the hand image shot by the monocular camera, estimates the region of interest of the next frame image from the region of interest of the previous frame image, and performs skeleton point recognition on the region of interest; at the same time, head motion data is acquired, and the two-dimensional image data captured by the monocular camera is lifted to three dimensions by combining the head data, so that the three-dimensional coordinates of all skeleton points are determined and the two-dimensional to three-dimensional conversion is completed. The three-dimensional coordinates of the hand region can therefore be displayed with one ordinary monocular camera. Replacing two infrared binocular cameras with one monocular camera reduces cost, and even if two monocular cameras are installed, the cost is still lower than that of one infrared binocular camera. The overall power consumption and heat dissipation are reduced; mounting a monocular camera on the headset simplifies the structural design, lightens the overall weight of the headset, and improves wearing comfort, so the user feels no discomfort even when wearing the headset for a long time. In addition, the monocular camera has a larger FOV, so the motion trajectory of the whole hand can be captured more comprehensively.
Drawings
Other objects and attainments together with a more complete understanding of the invention will become apparent and appreciated by referring to the following description taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 is a schematic diagram of the monocular-based three-dimensional gesture tracking method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the hand skeleton points in the monocular-based three-dimensional gesture tracking method according to an embodiment of the present invention.
Detailed Description
At present, in the VR/AR/MR field, a gesture recognition tracker must be additionally added to mainstream headsets, i.e., two infrared binocular cameras or depth cameras are added as the gesture recognition tracker, which has the following problems: 1. additional cost; 2. additional power consumption: mainstream headsets are all-in-one devices powered by their own battery, so the power consumption of the whole system greatly affects the available interaction time; 3. the increased power consumption also makes heat dissipation a significant challenge; 4. the added cameras increase the complexity of the structural design and the industrial design (ID), running against the development goals of a head-mounted all-in-one device that is small, portable, and comfortable for long-term wear; 5. the FOV of currently mature, popular depth cameras is generally no larger than about 90 degrees, while the FOV required by a headset is generally about 110 degrees, so with a depth camera some hand motion trajectories are very easily lost.
In view of the foregoing, the present invention provides a three-dimensional gesture tracking method based on monocular, and specific embodiments of the present invention will be described in detail with reference to the accompanying drawings.
In order to illustrate the monocular-based three-dimensional gesture tracking method provided by the invention, FIG. 1 illustrates the method according to an embodiment of the invention.
The following description of the exemplary embodiment(s) is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses. Techniques and equipment known to those of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate.
As shown in FIG. 1, the monocular-based three-dimensional gesture tracking method provided by the invention comprises the following steps:
S110: training a hand detection model and a skeleton point recognition model, so that the hand detection model automatically locks the hand region of an image as a region of interest, and the skeleton point recognition model automatically recognizes the skeleton points in the region of interest;
s120: firstly judging the detection number of hands in the previous frame of image, starting the hand detection model and the tracking module according to the detection number of the hands in the previous frame of image so as to acquire the region of interest of the current frame, and storing the data information of the current frame into a tracking queue Trackhand of the tracking module;
s130: acquiring a region of interest on a current frame image from the Trackhand, identifying a bone point of the region of interest of the current frame in the Trackhand through the bone point identification model, and carrying out smoothing filter processing on the identified bone point according to historical data in the Trackhand;
s140: and counting data of the head part on the position and the gesture in each frame of image, storing the data of the head part into a queue Trackhead of the tracking module in real time, and determining three-dimensional skeleton coordinates of skeleton points after smoothing filter processing by combining the data of the head part in the Trackhead so as to finish gesture tracking.
As shown in FIG. 1, in step S110, a head tracking camera is first used to collect hand image data of at least 100 users as action behavior cases, and after collection these cases are input into the hand detection model and the skeleton point recognition model for model training. The head tracking camera is not particularly limited: it may be an infrared binocular camera or a monocular camera. The acquisition method is likewise not particularly limited: it may be the traditional method of manually annotating skeleton points to obtain hand image data labeled with skeleton points, or a cutting-edge automated annotation. In this embodiment, images are collected with the head tracking camera worn on the head, and the skeleton points of the hand images are then annotated, so that the annotation is accurate and the hand detection model and the skeleton point recognition model are trained more accurately. If one head tracking camera is used for head position tracking, each frame input to the hand detection model and the skeleton point recognition model is a single image; if several head tracking cameras are used, each frame input to the models consists of several images. In this embodiment, a single head tracking camera is used. The amount of collected hand image data is not particularly limited, and the more data, the higher the model accuracy; in this embodiment, hand images of at least 100 users are collected as action behavior cases, so that the hand detection model and the skeleton point recognition model are more accurate.
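As an illustration of what one collected training sample might look like, the sketch below shows a hypothetical record holding a hand bounding box and 21 annotated skeleton points; the patent only specifies hand images labeled with skeleton points, so every field name, path, and value here is an assumption for illustration.

```python
# Hypothetical shape of one annotated training sample; all field names,
# the path, and the values are illustrative assumptions, not the patent's
# actual data format.
sample = {
    "user_id": 17,
    "image_path": "hands/user017/frame_000123.png",    # placeholder path
    "hand_rois": [(212, 148, 96, 96)],                 # (x, y, w, h) per hand
    "skeleton_points": [                               # 21 (u, v) per hand
        [(230.5 + 3.0 * i, 160.2 + 2.0 * i) for i in range(21)],
    ],
}
```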
As shown in FIG. 1, in the monocular-based three-dimensional gesture tracking method provided by the invention, in step S120, the number of hands detected in the previous frame image is first judged. If the number is 0 or 1, the hand detection model and the tracking module are started; if the number is 2, only the tracking module is started. That is, a person has at most two hands: if the detected number is 2, both hands were already in the picture in the previous frame, and only the data information of the current frame needs to be stored in the tracking module to await subsequent skeleton point recognition; if no hand or only one hand was detected in the previous frame, the region of interest of the current frame must first be acquired so that subsequent skeleton point recognition can be performed on it.
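A minimal sketch of this dispatch logic follows; the detector/tracker interfaces, the queue length, and the naive merge of detected and tracked regions are assumptions, since the patent does not specify an implementation.

```python
# Per-frame dispatch sketch: run detection plus tracking when fewer than
# two hands were found in the previous frame, tracking only otherwise.
from collections import deque

class GestureTracker:
    def __init__(self, detector, tracker, history=30):
        self.detector = detector                 # trained hand detection model
        self.tracker = tracker                   # tracking module
        self.trackhand = deque(maxlen=history)   # tracking queue "Trackhand"

    def process_frame(self, frame, prev_hand_count):
        if prev_hand_count < 2:
            # 0 or 1 hands in the previous frame: start the hand detection
            # model and the tracking module (duplicate merging omitted)
            rois = self.detector.detect(frame) + self.tracker.track(frame)
        else:
            # both hands already in the picture: tracking module only
            rois = self.tracker.track(frame)
        # store the current frame's data information, including its ROIs
        self.trackhand.append({"frame": frame, "rois": rois})
        return rois
```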
As shown in FIG. 1 and FIG. 2 together, in step S120, the region of interest comprises the position coordinates of the hand in the image and the size of the corresponding hand region, and the total number of skeleton points of a hand is 21. In the process of acquiring the region of interest of the current frame and recognizing skeleton points, the tracking module is started and the data information of the current frame is stored into the tracking queue Trackhand of the tracking module; meanwhile, based on an optical flow tracking algorithm, the region of interest of the next frame is estimated from the region of interest of the current frame so as to provide a reference for the skeleton point recognition of the next frame.
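The patent names only "an optical flow tracking algorithm", so the following sketch, which predicts the next frame's region of interest with pyramidal Lucas-Kanade flow (OpenCV's cv2.calcOpticalFlowPyrLK), is one plausible choice rather than the patented implementation; the feature-tracking parameters are placeholders.

```python
# Predict the next frame's hand ROI by tracking corner features inside the
# current ROI with Lucas-Kanade optical flow and shifting the ROI by the
# mean displacement of the successfully tracked features.
import cv2
import numpy as np

def predict_next_roi(prev_gray, next_gray, roi):
    x, y, w, h = roi                        # ROI: hand position and size
    patch = prev_gray[y:y + h, x:x + w]
    pts = cv2.goodFeaturesToTrack(patch, maxCorners=50,
                                  qualityLevel=0.01, minDistance=5)
    if pts is None:
        return roi                          # nothing to track: keep old ROI
    pts = pts + np.array([[[x, y]]], dtype=np.float32)  # patch -> image coords
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray,
                                                  pts, None)
    mask = status.flatten() == 1
    if not mask.any():
        return roi
    dx, dy = (new_pts[mask] - pts[mask]).reshape(-1, 2).mean(axis=0)
    return (int(x + dx), int(y + dy), w, h)  # shifted ROI for the next frame
```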
As shown in FIG. 1, in the monocular-based three-dimensional gesture tracking method provided by the invention, in step S130, the region of interest of the hand on the current frame image is obtained from the Trackhand, skeleton point recognition of the hand is performed on the region of interest through the skeleton point recognition model, and each skeleton point is then smoothed by comparison with its historical data; this avoids the possibility that some skeleton point in some frame is recognized unstably, and improves the accuracy and stability of hand skeleton point recognition. The historical data here refers to all the data stored in the tracking queue Trackhand each time step S120 is performed, that is, the accumulated set of the data information of each stored current frame.
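The patent does not name the smoothing filter, so the sketch below assumes a simple exponential moving average over the skeleton history kept in Trackhand; the blending weight alpha is a placeholder.

```python
# Smooth the 21 recognized skeleton points against the stored history to
# suppress per-frame jitter; an exponential moving average is assumed here.
import numpy as np

def smooth_skeleton(current, history, alpha=0.6):
    """current: (21, 2) image coords; history: list of earlier (21, 2) arrays."""
    if not history:
        return np.asarray(current)
    prev = np.asarray(history[-1])          # most recent smoothed skeleton
    # blend the new measurement with the history
    return alpha * np.asarray(current) + (1.0 - alpha) * prev
```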
As shown in FIG. 1, in the monocular-based three-dimensional gesture tracking method provided by the invention, in step S140, the data about the position and pose of the head in each frame image is recorded, the head data is stored into the queue Trackhead of the tracking module in real time, and the three-dimensional skeleton coordinates of the skeleton points are calculated by combining the data in the Trackhead, completing two-dimensional to three-dimensional gesture tracking.
As shown in FIG. 1, in the monocular-based three-dimensional gesture tracking method provided by the invention, in step S140, in the process of determining the three-dimensional skeleton coordinates of the smoothed skeleton points by combining the head data in the Trackhead, the head data in the Trackhead is first read, and the translation matrix T and the rotation matrix R of the head in the current frame relative to the previous frame are acquired; the three-dimensional coordinates of the skeleton points are then determined according to the preset calibration parameters of the tracking camera, the translation matrix T and the rotation matrix R. The preset calibration parameters of the tracking camera are

\(K = \begin{bmatrix} fx & 0 & cx \\ 0 & fy & cy \\ 0 & 0 & 1 \end{bmatrix}\)

wherein fx, fy represent the pixel focal lengths of the tracking camera, and cx, cy represent the position of the tracking camera's optical axis in image coordinates. When performing the coordinate calculation, one skeleton point can be selected for three-dimensional calculation, and then the three-dimensional coordinates of the remaining forty-one skeleton points of the two hands (each hand comprises 21 skeleton points) are calculated in turn by the same method until the three-dimensional coordinates of all skeleton points of both hands have been calculated; subsequent rendering then completes the gesture tracking of both hands.
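For concreteness, the following sketch builds K from placeholder calibration values and shows the back-projection z * K^{-1} (u, v, 1)^T that the conversion below relies on; the numeric values are assumptions, not real device calibration.

```python
# The intrinsic matrix K described above, plus the back-projection that
# lifts a pixel at a known depth into camera coordinates.
import numpy as np

fx, fy = 460.0, 460.0       # pixel focal lengths (assumed values)
cx, cy = 320.0, 240.0       # optical-axis position in image coordinates

K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

def backproject(u, v, z):
    """Lift pixel (u, v) at depth z to camera coordinates: z * K^-1 (u, v, 1)^T."""
    return z * np.linalg.inv(K) @ np.array([u, v, 1.0])
```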
As shown in FIG. 1, in the monocular-based three-dimensional gesture tracking method provided by the present invention, in step S140, the specific way of selecting any one of the skeleton points for three-dimensional coordinate calculation is not particularly limited. In this embodiment, a point P is first selected from the skeleton points and its coordinate information is acquired: the three-dimensional coordinates of the point P in the previous frame are \(P_1 = (X_1, Y_1, Z_1)^T\), with two-dimensional image coordinates \(L_1 = (u_1, v_1)^T\); the two-dimensional image coordinates of the point P in the current frame are known to be \(L_2 = (u_2, v_2)^T\); and the three-dimensional coordinates of the point P in the current frame are assumed to be \(P_2 = (X_2, Y_2, Z_2)^T\). Here P1, P2, L1 and L2 are treated as column vectors, and the operations below are matrix operations.

From the coordinate information of the point P:

\(Z_1 (u_1, v_1, 1)^T = K P_1\); (1)

\(Z_2 (u_2, v_2, 1)^T = K P_2\); (2)

\(P_2 = R P_1 + T\); (3)

and derived from (1), (2) and (3):

\(Z_2 (u_2, v_2, 1)^T = K (R Z_1 K^{-1} (u_1, v_1, 1)^T + T)\)

wherein \(K^{-1}\) is the inverse matrix of the calibration parameters K.

The three-dimensional skeleton coordinates \(P_2 = (X_2, Y_2, Z_2)^T\) of the point P in the current frame are thus acquired, and the conversion from two-dimensional skeleton coordinates to three-dimensional skeleton coordinates is completed; the calculation above omits an intermediate solving step, which is performed by the chip inside the headset.
As shown in FIG. 1, in the monocular-based three-dimensional gesture tracking method provided by the present invention, in step S140, in the process of determining the three-dimensional skeleton coordinates of the smoothed skeleton points by combining the head data in the Trackhead so as to complete gesture tracking,
the smoothed skeleton points are fused with the head tracking data, and the fused data is transformed from the camera coordinate system to the coordinate system of the VR (virtual reality) headset to form three-dimensional gesture information; the three-dimensional gesture information is then transmitted to a game engine, rendered, and transmitted back to the VR headset for display, so that the user sees the corresponding picture on the display of the VR headset and the gesture tracking is completed.
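A hedged sketch of this final coordinate move is shown below; the camera-to-headset extrinsics R_ch and t_ch stand in for the device's real calibration, which the patent does not give.

```python
# Transform the fused 3D skeleton from the camera coordinate system into the
# VR headset coordinate system; R_ch (rotation) and t_ch (offset) are
# placeholder extrinsics for illustration.
import numpy as np

def to_headset_frame(points_cam, R_ch, t_ch):
    """points_cam: (21, 3) skeleton in camera coordinates."""
    return points_cam @ R_ch.T + t_ch   # rotate, then translate each point

# Example: camera assumed 5 cm in front of and 2 cm below the headset origin
R_ch = np.eye(3)
t_ch = np.array([0.0, -0.02, 0.05])
hand_headset = to_headset_frame(np.zeros((21, 3)), R_ch, t_ch)
# hand_headset would then be handed to the game engine for rendering, and the
# rendered view streamed back to the headset display.
```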
According to the monocular-based three-dimensional gesture tracking method described above, the hand detection model and the skeleton point recognition model are trained to obtain the region of interest in the hand image shot by the monocular camera; the region of interest of the next frame image is estimated from the region of interest of the previous frame image, and skeleton point recognition is then performed on the region of interest; meanwhile, head motion data is acquired, and the two-dimensional image data captured by the monocular camera is lifted to three dimensions by combining the head data, so that the three-dimensional coordinates of all skeleton points are determined and the conversion from two-dimensional to three-dimensional is completed. The three-dimensional coordinates of the hand region can therefore be displayed with one ordinary monocular camera, reducing camera cost; even if two monocular cameras are installed on the headset, the cost remains low, the overall power consumption and heat dissipation are reduced, the structural design is simplified, the overall weight of the headset is lightened, and wearing comfort is improved, so the user feels no discomfort even after wearing it for a long time. In addition, the monocular camera has a larger FOV, so the hand motion trajectory can be captured more comprehensively.
The proposed monocular-based three-dimensional gesture tracking method according to the present invention is described above by way of example with reference to the accompanying drawings. However, it will be appreciated by those skilled in the art that various modifications may be made to the monocular-based three-dimensional gesture tracking method set forth above without departing from the teachings of the present disclosure. Accordingly, the scope of the invention should be determined from the following claims.

Claims (9)

1. A monocular-based three-dimensional gesture tracking method, comprising:
training a hand detection model and a skeleton point recognition model, so that the hand detection model automatically locks the hand region of an image as a region of interest, and the skeleton point recognition model automatically recognizes the skeleton points in the region of interest;
starting the hand detection model and the tracking module according to the number of hands detected in the previous frame image, acquiring the region of interest of the current frame, and storing the data information of the current frame into a tracking queue Trackhand of the tracking module, the data information of the current frame at least comprising the region of interest of the current frame;
performing skeleton point recognition on the region of interest of the current frame in the Trackhand through the skeleton point recognition model, and performing smoothing filtering on the recognized skeleton points according to the historical data in the Trackhand;
storing data about the position and pose of the head in each frame image into a queue Trackhead of the tracking module in real time, and determining the three-dimensional skeleton coordinates of the smoothed skeleton points by combining the head data in the Trackhead, so as to complete gesture tracking;
in determining the three-dimensional skeleton coordinates of the smoothed skeleton points in combination with the head data in the Trackhead,
reading the head data in the Trackhead, and acquiring the translation matrix T and the rotation matrix R of the head in the current frame relative to the previous frame;
and determining the three-dimensional coordinates of the skeleton points according to preset calibration parameters of a tracking camera, the translation matrix T and the rotation matrix R.
2. The method of claim 1, wherein, in training the hand detection model and the skeletal point recognition model,
collecting hand image data of at least 100 users by using a head tracking camera as an action behavior case;
and inputting the action behavior cases into the hand detection model and the bone point recognition model for model training.
3. The method of claim 1, wherein in the process of starting the hand detection model and the tracking module according to the number of hands detected in the previous frame of image,
if the detection number is 0 or 1, starting the hand detection model and the tracking module;
if the detection number is 2, only the tracking module is started.
4. The method of claim 1, wherein during skeletal point recognition of a region of interest of a current frame in the Trackhand,
the region of interest comprises the position coordinates of the hand in the image and the size of the region corresponding to the hand;
the number of the skeleton points is 21.
5. The monocular-based three-dimensional gesture tracking method of claim 1, further comprising, in acquiring the region of interest of the current frame:
and estimating the region of interest of the next frame according to the region of interest of the current frame based on an optical flow tracking algorithm so as to provide a reference for bone point identification of the next frame.
6. The monocular-based three-dimensional gesture tracking method of claim 1, wherein the preset calibration parameters of the tracking camera are

\(K = \begin{bmatrix} fx & 0 & cx \\ 0 & fy & cy \\ 0 & 0 & 1 \end{bmatrix}\)

wherein fx, fy represent the pixel focal lengths of the tracking camera, and cx, cy represent the position of the tracking camera's optical axis in image coordinates.
7. The monocular-based three-dimensional gesture tracking method of claim 1, wherein, in determining the three-dimensional coordinates of the skeleton points according to the preset calibration parameters of the tracking camera, the translation matrix T and the rotation matrix R,
selecting any one of the bone points to perform three-dimensional coordinate calculation;
each bone point is calculated in turn until all bone points of both hands are calculated.
8. The monocular-based three-dimensional gesture tracking method of claim 6, wherein, in selecting any one of the skeleton points for three-dimensional coordinate calculation,
selecting a point P from the skeleton points and acquiring the coordinate information of the point P: the three-dimensional coordinates of the point P in the previous frame are \(P_1 = (X_1, Y_1, Z_1)^T\), with two-dimensional image coordinates \(L_1 = (u_1, v_1)^T\); the two-dimensional image coordinates of the point P in the current frame are known to be \(L_2 = (u_2, v_2)^T\); and the three-dimensional coordinates of the point P in the current frame are assumed to be \(P_2 = (X_2, Y_2, Z_2)^T\);
from the coordinate information of the point P, \(Z_1 (u_1, v_1, 1)^T = K P_1\), \(Z_2 (u_2, v_2, 1)^T = K P_2\), and \(P_2 = R P_1 + T\); then \(Z_2 (u_2, v_2, 1)^T = K (R Z_1 K^{-1} (u_1, v_1, 1)^T + T)\), wherein \(K^{-1}\) is the inverse matrix of the calibration parameters K;
and acquiring the three-dimensional skeleton coordinates \(P_2 = (X_2, Y_2, Z_2)^T\) of the point P in the current frame, completing the conversion from two-dimensional skeleton coordinates to three-dimensional skeleton coordinates.
9. The monocular-based three-dimensional gesture tracking method of claim 1, wherein, in determining the three-dimensional skeleton coordinates of the smoothed skeleton points in combination with the head data in the Trackhead to complete gesture tracking,
fusing the smoothed skeleton points with the head tracking data, and transforming the fused data from the camera coordinate system to the coordinate system of the VR (virtual reality) headset to form three-dimensional gesture information;
and transmitting the three-dimensional gesture information to a game engine, rendering it, and transmitting it back to the VR headset in real time for display, so as to complete gesture tracking.
CN202010387724.1A (priority date 2020-05-09, filing date 2020-05-09): Monocular-based three-dimensional gesture tracking method. Status: Active. Granted as CN111696140B (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010387724.1A | 2020-05-09 | 2020-05-09 | Monocular-based three-dimensional gesture tracking method


Publications (2)

Publication Number | Publication Date
CN111696140A (en) | 2020-09-22
CN111696140B (en) | 2024-02-13

Family

ID: 72477396

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202010387724.1A | Active | CN111696140B (en) | 2020-05-09 | 2020-05-09 | Monocular-based three-dimensional gesture tracking method

Country Status (1)

CN: CN111696140B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112416125A (en) * 2020-11-17 2021-02-26 青岛小鸟看看科技有限公司 VR head-mounted all-in-one machine
CN112927259A (en) * 2021-02-18 2021-06-08 青岛小鸟看看科技有限公司 Multi-camera-based bare hand tracking display method, device and system
CN112927290A (en) * 2021-02-18 2021-06-08 青岛小鸟看看科技有限公司 Bare hand data labeling method and system based on sensor
CN113240741B (en) 2021-05-06 2023-04-07 青岛小鸟看看科技有限公司 Transparent object tracking method and system based on image difference
CN113674395B (en) * 2021-07-19 2023-04-18 广州紫为云科技有限公司 3D hand lightweight real-time capturing and reconstructing system based on monocular RGB camera
CN117809380A (en) * 2024-02-29 2024-04-02 万有引力(宁波)电子科技有限公司 Gesture tracking method, gesture tracking device, gesture tracking apparatus, gesture tracking program product and readable storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104460967A (en) * 2013-11-25 2015-03-25 安徽寰智信息科技股份有限公司 Recognition method of upper limb bone gestures of human body
CN104833360A (en) * 2014-02-08 2015-08-12 无锡维森智能传感技术有限公司 Method for transforming two-dimensional coordinates into three-dimensional coordinates
CN104992171A (en) * 2015-08-04 2015-10-21 易视腾科技有限公司 Method and system for gesture recognition and man-machine interaction based on 2D video sequence
CN106250867A (en) * 2016-08-12 2016-12-21 南京华捷艾米软件科技有限公司 A kind of skeleton based on depth data follows the tracks of the implementation method of system
CN106648103A (en) * 2016-12-28 2017-05-10 歌尔科技有限公司 Gesture tracking method for VR headset device and VR headset device
CN106945059A (en) * 2017-03-27 2017-07-14 中国地质大学(武汉) A kind of gesture tracking method based on population random disorder multi-objective genetic algorithm
CN108196679A (en) * 2018-01-23 2018-06-22 河北中科恒运软件科技股份有限公司 Gesture-capture and grain table method and system based on video flowing
CN108919943A (en) * 2018-05-22 2018-11-30 南京邮电大学 A kind of real-time hand method for tracing based on depth transducer
CN110825234A (en) * 2019-11-11 2020-02-21 江南大学 Projection type augmented reality tracking display method and system for industrial scene


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Wan Qin; Yu Hongshan; Wu Di; Lin Guohan. A survey of multi-moving-target tracking methods based on three-dimensional vision systems. Computer Engineering and Applications, 2017, No. 19, full text. *
Yang Kai; Wei Benzheng; Ren Xiaoqiang; Wang Qingxiang; Liu Huaihui. Human motion pose tracking and recognition algorithm based on depth images. Journal of Data Acquisition and Processing, 2015, No. 5, full text. *
Yang Lujing et al. Intelligent Image Processing and Applications. 2019, full text. *
Chen Hanxiong; Huang Yayun; Liu Yu; Yan Mengkui; Liu Feng. Research and implementation of mid-air gesture tracking and recognition based on Kinect. Video Engineering, 2015, No. 21, full text. *
Huang Dunbo; Lin Zhixian; Yao Jianmin; Guo Tailiang. Monocular gesture tracking based on adaptive extraction and improved CAMSHIFT. Video Engineering, 2016, No. 7, full text. *

Also Published As

Publication number Publication date
CN111696140A (en) 2020-09-22

Similar Documents

Publication Publication Date Title
CN111696140B (en) Monocular-based three-dimensional gesture tracking method
CN106873778B (en) Application operation control method and device and virtual reality equipment
CN106251399B (en) A kind of outdoor scene three-dimensional rebuilding method and implementing device based on lsd-slam
CN103140879B Information presentation device, digital camera, head mounted display, projector, information presentation method, and information presentation program
CN107545302B (en) Eye direction calculation method for combination of left eye image and right eye image of human eye
CN110363867B (en) Virtual decorating system, method, device and medium
CN103400119B (en) Face recognition technology-based mixed reality spectacle interactive display method
CN104978548A (en) Visual line estimation method and visual line estimation device based on three-dimensional active shape model
US7404774B1 (en) Rule based body mechanics calculation
CN109343700B (en) Eye movement control calibration data acquisition method and device
JP7015152B2 (en) Processing equipment, methods and programs related to key point data
CN108983982B (en) AR head display equipment and terminal equipment combined system
CN104571511B (en) The system and method for object are reappeared in a kind of 3D scenes
WO2022174594A1 (en) Multi-camera-based bare hand tracking and display method and system, and apparatus
CN104364733A (en) Position-of-interest detection device, position-of-interest detection method, and position-of-interest detection program
CN112198959A (en) Virtual reality interaction method, device and system
CN109359514B (en) DeskVR-oriented gesture tracking and recognition combined strategy method
CN104821010A (en) Binocular-vision-based real-time extraction method and system for three-dimensional hand information
JP5526465B2 (en) Nail position data detection device, nail position data detection method, and nail position data detection program
CN110955329A (en) Transmission method, electronic device, and computer storage medium
Perra et al. Adaptive eye-camera calibration for head-worn devices
WO2014100449A1 (en) Capturing photos without a camera
CN110841266A (en) Auxiliary training system and method
TWI768852B (en) Device for detecting human body direction and method for detecting human body direction
TWI361093B (en) Measuring object contour method and measuring object contour apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant