CN111696140A - Monocular-based three-dimensional gesture tracking method - Google Patents

Monocular-based three-dimensional gesture tracking method

Info

Publication number
CN111696140A
Authority
CN
China
Prior art keywords
dimensional
tracking
bone
data
head
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010387724.1A
Other languages
Chinese (zh)
Other versions
CN111696140B (en)
Inventor
吴涛
周锋宜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Xiaoniao Kankan Technology Co Ltd
Original Assignee
Qingdao Xiaoniao Kankan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Xiaoniao Kankan Technology Co Ltd filed Critical Qingdao Xiaoniao Kankan Technology Co Ltd
Priority to CN202010387724.1A priority Critical patent/CN111696140B/en
Publication of CN111696140A publication Critical patent/CN111696140A/en
Application granted granted Critical
Publication of CN111696140B publication Critical patent/CN111696140B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/269 Analysis of motion using gradient-based methods
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/20 Input arrangements for video game devices
    • A63F13/21 Input arrangements for video game devices characterised by their sensors, purposes or types
    • A63F13/213 Input arrangements for video game devices characterised by their sensors, purposes or types comprising photodetecting means, e.g. cameras, photodiodes or infrared cells
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/10 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals
    • A63F2300/1087 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals comprising photodetecting means, e.g. a camera
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Abstract

The invention provides a monocular-based three-dimensional gesture tracking method comprising the following steps: training a hand detection model and a skeleton point recognition model; starting the hand detection model and a tracking module according to the number of hands detected in the previous frame of image; recognizing skeleton points in the region of interest of the current frame held in the tracking queue Trackhand through the skeleton point recognition model, and smooth-filtering the recognized skeleton points; recording the position and posture data of the head for each frame and storing it in real time in the tracking module's queue Trackhead; and determining the three-dimensional skeleton coordinates of the smooth-filtered skeleton points by combining the head data in the Trackhead, then rendering them to complete gesture tracking. A single monocular camera replaces two infrared binocular cameras; even installing two monocular cameras costs less than the infrared binocular cameras. This reduces overall power consumption and heat dissipation, lightens the headset, and improves wearing comfort.

Description

Monocular-based three-dimensional gesture tracking method
Technical Field
The invention relates to the field of computer vision, in particular to a monocular-based three-dimensional gesture tracking method.
Background
To enhance the immersion of the virtual-real combination in VR/AR/MR and deliver a better experience, a human-computer interaction module is indispensable; in particular, high-precision, real-time restoration of the 3D gesture of the hand in a VR/AR/MR scene greatly influences how immersive the user's experience is.
At present, in the VR/AR/MR field, a gesture recognition tracker must be added to a mainstream all-in-one headset; conventionally, this means separately adding 2 infrared binocular cameras, or using a depth camera for finger tracking. Either approach raises the following problems: 1. additional cost; 2. additional power consumption: mainstream headsets are battery-powered all-in-one devices, so the power consumption of the whole system greatly affects the user's interaction time; 3. alongside the power consumption, heat dissipation also becomes a significant challenge; 4. added complexity in the structural and industrial design (ID), contrary to the development goals of a small, lightly worn all-in-one headset that remains comfortable over long sessions; 5. the FOV of mature, widely available depth cameras is small, generally about 90 degrees, whereas a headset generally requires about 110 degrees, so the conventional depth-camera approach easily fails to track part of the hand's motion trajectory.
Therefore, there is a need for a monocular-based three-dimensional gesture tracking method that saves cost, reduces power consumption and heat dissipation, enlarges the visible area, lightens the headset, and improves wearing comfort.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a monocular-based three-dimensional gesture tracking method, so as to solve the problems of existing methods: the high cost, high power consumption, and heat dissipation of infrared binocular cameras; the added complexity and volume of the headset's structural design; discomfort when worn for a long time; and a small viewing angle that cannot fully track the hand's motion trajectory.
The invention provides a monocular-based three-dimensional gesture tracking method which is characterized by comprising the following steps:
training a hand detection model and a skeleton point recognition model, so that the hand detection model automatically locks the hand area of an image as a region of interest, and the skeleton point recognition model automatically recognizes the skeleton points in the region of interest;
starting the hand detection model and the tracking module according to the number of hands detected in the previous frame of image, to acquire the region of interest of the current frame, and storing the data information of the current frame into a tracking queue Trackhand of the tracking module, the data information of the current frame comprising at least the region of interest of the current frame;
recognizing skeleton points in the region of interest of the current frame in the Trackhand through the skeleton point recognition model, and smooth-filtering the recognized skeleton points according to the historical data in the Trackhand; and
storing the position and posture data of the head for each frame of image into a queue Trackhead of the tracking module in real time, and determining the three-dimensional skeleton coordinates of the smooth-filtered skeleton points by combining the head data in the Trackhead, so as to complete gesture tracking.
Preferably, in the process of training the hand detection model and the skeleton point recognition model,
hand image data from at least 100 users are collected with a head tracking camera as action behavior cases;
and the action behavior cases are input into the hand detection model and the skeleton point recognition model for model training.
Preferably, in the process of starting the hand detection model and the tracking module according to the number of hands detected in the previous frame of image,
if the number detected is 0 or 1, both the hand detection model and the tracking module are started;
if the number detected is 2, only the tracking module is started.
Preferably, in the process of recognizing the skeleton points of the region of interest of the current frame in the Trackhand,
the region of interest comprises the position coordinates of the hand in the image and the size of the region corresponding to the hand;
and the number of skeleton points per hand is 21.
Preferably, the process of acquiring the region of interest of the current frame further includes:
estimating the region of interest of the next frame from the region of interest of the current frame with an optical flow tracking algorithm, to provide a reference for skeleton point recognition in the next frame.
Preferably, in the process of determining the three-dimensional skeleton coordinates of the smooth-filtered skeleton points by combining the head data in the Trackhead,
the head data in the Trackhead are read, and the translation matrix T and rotation matrix R of the current frame relative to the previous frame's head pose are acquired;
and the three-dimensional coordinates of the skeleton points are determined from the preset calibration parameters of the tracking camera, the translation matrix T, and the rotation matrix R.
Preferably, the preset calibration parameter of the tracking camera is

$$K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix},$$

where fx, fy denote the focal length of the tracking camera in pixels, and cx, cy denote the position of the tracking camera's optical axis in image coordinates.
Preferably, in the process of determining the three-dimensional coordinates of the skeleton points from the preset calibration parameters of the tracking camera, the translation matrix T, and the rotation matrix R,
any one of the skeleton points is selected for three-dimensional coordinate calculation;
and each skeleton point is calculated in turn until all the skeleton points of both hands have been calculated.
Preferably, in selecting any one of the skeleton points for three-dimensional coordinate calculation,
a point P among the skeleton points is selected and its coordinate information acquired. Let the three-dimensional coordinate of point P in the previous frame be

$$P_1 = (X_1, Y_1, Z_1)^T,$$

its two-dimensional image coordinate in the previous frame be

$$p_1 = (u_1, v_1)^T,$$

and its known two-dimensional image coordinate in the current frame be

$$p_2 = (u_2, v_2)^T.$$

Assume the three-dimensional coordinate of point P in the current frame is

$$P_2 = (X_2, Y_2, Z_2)^T.$$

From the coordinate information of point P,

$$Z_1 (u_1, v_1, 1)^T = K P_1, \quad Z_2 (u_2, v_2, 1)^T = K P_2, \quad P_2 = R P_1 + T;$$

then

$$Z_2 K^{-1} (u_2, v_2, 1)^T = R\, Z_1 K^{-1} (u_1, v_1, 1)^T + T,$$

where K^{-1} is the inverse matrix of the calibration parameter K.

Solving for Z_2 and substituting back yields the three-dimensional skeleton coordinate of point P in the current frame,

$$P_2 = (X_2, Y_2, Z_2)^T,$$

completing the conversion from two-dimensional skeleton coordinates to three-dimensional skeleton coordinates.
Preferably, in the process of determining the three-dimensional skeleton coordinates of the smooth-filtered skeleton points by combining the head data in the Trackhead to complete gesture tracking,
the smooth-filtered skeleton points are fused with the tracking data of the head, and the fused data are transformed from the camera coordinate system to the VR headset coordinate system to form three-dimensional gesture information;
and the three-dimensional gesture information is transmitted to a game engine, rendered, and then transmitted back to the VR headset in real time for display, completing gesture tracking.
From the above technical solutions, the monocular-based three-dimensional gesture tracking method provided by the present invention trains a hand detection model and a skeleton point recognition model to obtain the region of interest in the hand image captured by a monocular camera, estimates the region of interest of the next frame from that of the current frame, recognizes the skeleton points in the region of interest, and simultaneously acquires head motion data; the two-dimensional image data captured by the monocular camera are combined with the head data to compute the three-dimensional coordinates of each skeleton point, completing the conversion from two dimensions to three. The three-dimensional coordinates of the hand can therefore be recovered and displayed with one ordinary monocular camera. Replacing two infrared binocular cameras with one monocular camera reduces camera cost (even installing two monocular cameras costs less than one infrared binocular camera) and lowers overall power consumption and heat dissipation. Mounting the monocular camera on the headset also simplifies the structural design, lightens the headset, and improves wearing comfort, so the user feels no discomfort even after long wear; in addition, the larger FOV of the monocular camera captures the hand's motion trajectory more completely.
Drawings
Other objects and results of the present invention will become more apparent and more readily appreciated as the same becomes better understood by reference to the following specification taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 is a schematic diagram of the monocular-based three-dimensional gesture tracking method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating hand skeleton points in a monocular-based three-dimensional gesture tracking method according to an embodiment of the present invention.
Detailed Description
At present, in the VR/AR/MR field, a gesture recognition tracker must be added to a mainstream all-in-one headset, namely 2 infrared binocular cameras or a depth camera, which raises the following problems: 1. additional cost; 2. additional power consumption: mainstream headsets are battery-powered all-in-one devices, so the power consumption of the whole system greatly affects the user's interaction time; 3. alongside the power consumption, heat dissipation also becomes a significant challenge; 4. added complexity in the structural and industrial design (ID), contrary to the development goals of a small, lightly worn all-in-one headset that remains comfortable over long sessions; 5. the FOV of mature, widely available depth cameras is small, generally about 90 degrees, whereas a headset generally requires about 110 degrees, so the conventional depth-camera approach easily fails to track part of the hand's motion trajectory.
In view of the above problems, the present invention provides a monocular-based three-dimensional gesture tracking method; an embodiment of the present invention is described in detail below with reference to the accompanying drawings.
To explain the monocular-based three-dimensional gesture tracking method provided by the present invention, fig. 1 illustrates the method according to the embodiment of the present invention.
The following description of the exemplary embodiment(s) is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. Techniques and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be considered a part of the specification where appropriate.
As shown in fig. 1, the monocular-based three-dimensional gesture tracking method provided by the present invention includes:
S110: training a hand detection model and a skeleton point recognition model, so that the hand detection model automatically locks the hand area of an image as a region of interest, and the skeleton point recognition model automatically recognizes the skeleton points in the region of interest;
S120: judging the number of hands detected in the previous frame of image, starting the hand detection model and the tracking module accordingly to acquire the region of interest of the current frame, and storing the data information of the current frame into a tracking queue Trackhand of the tracking module;
S130: acquiring the region of interest of the current frame image from the Trackhand, recognizing its skeleton points through the skeleton point recognition model, and smooth-filtering the recognized skeleton points according to the historical data in the Trackhand;
S140: recording the position and posture data of the head in each frame of image, storing the head data into a queue Trackhead of the tracking module in real time, and determining the three-dimensional skeleton coordinates of the smooth-filtered skeleton points by combining the head data in the Trackhead, so as to complete gesture tracking.
As shown in fig. 1, in step S110 of the monocular-based three-dimensional gesture tracking method provided by the present invention, a head tracking camera first collects hand image data from at least 100 users as action behavior cases, and these cases are then input into the hand detection model and the skeleton point recognition model for training. The head tracking camera is not specifically limited: it may be an infrared binocular camera or a monocular camera. The collection method is likewise not limited: the skeleton points of the hand images may be annotated manually, in the traditional way, or with state-of-the-art automated tooling. In this embodiment, the images are collected with the head-mounted tracking camera and the hand skeleton points are then annotated, so that the annotations are accurate and the hand detection model and skeleton point recognition model are trained more precisely. If 1 head tracking camera is used for head position tracking, each frame fed to the hand detection model and the skeleton point recognition model is a single image; if several head tracking cameras are used, each frame comprises several images. This embodiment uses a single head tracking camera. The amount of hand image data collected is not specifically limited; the more data, the higher the model accuracy. In this embodiment, hand images from at least 100 users are collected as action behavior cases, giving the hand detection model and the skeleton point recognition model higher precision.
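For illustration only (the patent does not prescribe an implementation), the following minimal Python sketch shows how the two trained models could be chained at inference time; the class name, the detect/predict interfaces, and the ROI format are assumptions, not the patent's API.

```python
import numpy as np

KEYPOINTS_PER_HAND = 21  # the patent specifies 21 skeleton points per hand

class GesturePipeline:
    """Chains the two trained models: the detector locks the hand region
    of interest, then the skeleton model recognizes keypoints inside it."""

    def __init__(self, hand_detector, skeleton_model):
        self.hand_detector = hand_detector    # assumed: detect(frame) -> ROIs (x, y, w, h)
        self.skeleton_model = skeleton_model  # assumed: predict(crop) -> (21, 2) keypoints

    def process(self, frame: np.ndarray) -> list:
        hands = []
        for (x, y, w, h) in self.hand_detector.detect(frame):
            crop = frame[y:y + h, x:x + w]
            # Keypoints come back in crop coordinates; shift them into
            # full-image coordinates so later stages share one frame.
            kps = self.skeleton_model.predict(crop) + np.array([x, y])
            hands.append({"roi": (x, y, w, h), "keypoints_2d": kps})
        return hands
```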
As shown in fig. 1, in step S120, the number of hands detected in the previous frame of image is first judged. If the number is 0 or 1, both the hand detection model and the tracking module are started; if the number is 2, only the tracking module is started. That is, a person has at most two hands: if 2 hands were detected, both hands were already in the picture in the previous frame, so only the data information of the current frame needs to be stored in the tracking module to await subsequent skeleton point recognition. If the previous frame contained no hand or only one hand, the region of interest must first be acquired in the current frame so that skeleton point recognition can then be performed on it.
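The dispatch rule above condenses into a short sketch; the detector and tracker objects and their attributes are hypothetical stand-ins for the patent's hand detection model and tracking module.

```python
def acquire_current_frame(prev_hand_count, frame, detector, tracker):
    """Step S120 dispatch: with 0 or 1 hands found in the previous frame,
    run the hand detection model to (re)acquire regions of interest; with
    both hands already in view, the tracking module alone carries the
    predicted ROIs forward."""
    if prev_hand_count < 2:
        rois = detector.detect(frame)     # hand detection model runs
    else:
        rois = tracker.predicted_rois     # both hands already tracked
    entry = {"frame": frame, "rois": rois}
    tracker.trackhand.append(entry)       # enqueue into Trackhand
    return entry
```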
As shown in fig. 1 and fig. 2, in step S120 the region of interest comprises the position coordinates of the hand in the image and the size of the region corresponding to the hand, and each hand has 21 skeleton points. In the process of acquiring the region of interest of the current frame and recognizing the skeleton points, the tracking module is started and the data information of the current frame is stored in the tracking queue Trackhand of the tracking module; meanwhile, the region of interest of the next frame is estimated from the region of interest of the current frame with an optical flow tracking algorithm, providing a reference for skeleton point recognition in the next frame, as sketched below.
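As a hedged sketch of the optical-flow estimate (the patent does not name a specific algorithm), pyramidal Lucas-Kanade flow from OpenCV can carry an ROI from one frame to the next:

```python
import cv2
import numpy as np

def predict_next_roi(prev_gray, next_gray, roi):
    """Estimate where a hand ROI will be in the next frame with pyramidal
    Lucas-Kanade optical flow, one plausible choice of 'optical flow
    tracking algorithm' in the sense of the patent."""
    x, y, w, h = roi
    # Seed points: the four ROI corners plus its centre.
    pts = np.array([[x, y], [x + w, y], [x, y + h], [x + w, y + h],
                    [x + w / 2.0, y + h / 2.0]],
                   dtype=np.float32).reshape(-1, 1, 2)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    ok = status.ravel() == 1
    if not ok.any():
        return roi  # flow lost: reuse the current ROI as the estimate
    # Shift the ROI by the mean displacement of the tracked points.
    shift = (nxt.reshape(-1, 2)[ok] - pts.reshape(-1, 2)[ok]).mean(axis=0)
    return (x + float(shift[0]), y + float(shift[1]), w, h)
```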
As shown in fig. 1, in step S130 of the monocular-based three-dimensional gesture tracking method provided by the present invention, the region of interest of the hand in the current frame image is obtained from the Trackhand, the hand's skeleton points are recognized in that region through the skeleton point recognition model, and each skeleton point is then smooth-filtered against its historical data. This guards against an unstable recognition of some skeleton point in some frame and improves the accuracy and stability of hand skeleton point recognition. The historical data are all the data already stored in the tracking queue Trackhand each time step S120 is performed, that is, the accumulated sets of current-frame data information stored when the region of interest of each frame was acquired. A simple smoothing scheme is sketched below.
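One possible smoothing filter consistent with this description is a short window average over the keypoint history; the window size and plain averaging are illustrative choices, not taken from the patent.

```python
from collections import deque
import numpy as np

class KeypointSmoother:
    """Smooths each recognized skeleton point against its history in the
    tracking queue, so one unstable recognition in one frame does not
    make the hand jitter."""

    def __init__(self, window=5):
        self.history = deque(maxlen=window)   # recent (21, 2) keypoint sets

    def smooth(self, keypoints):
        self.history.append(np.asarray(keypoints, dtype=np.float32))
        return np.mean(self.history, axis=0)  # window average per point
```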
As shown in fig. 1, in step S140 of the monocular-based three-dimensional gesture tracking method provided by the present invention, the position and posture data of the head in each frame of image are recorded and stored in the queue Trackhead of the tracking module in real time, and the three-dimensional skeleton coordinates of the skeleton points are calculated by combining the data in the Trackhead, completing the two-dimensional to three-dimensional gesture tracking.
As shown in fig. 1, in step S140, in the process of determining the three-dimensional skeleton coordinates of the smooth-filtered skeleton points by combining the head data in the Trackhead, the head data in the Trackhead are read first, and the translation matrix T and rotation matrix R of the current frame relative to the previous frame's head pose are acquired; the three-dimensional coordinates of the skeleton points are then determined from the preset calibration parameters of the tracking camera, the translation matrix T, and the rotation matrix R. The preset calibration parameter of the tracking camera is

$$K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix},$$

where fx, fy denote the focal length of the tracking camera in pixels, and cx, cy denote the position of the tracking camera's optical axis in image coordinates. For the coordinate calculation, one skeleton point is selected for three-dimensional computation, and the remaining 41 skeleton points of the two hands (each hand comprises 21 skeleton points) are then computed in turn by the same method, until the three-dimensional coordinates of all the skeleton points of both hands have been calculated; subsequent rendering then completes the gesture tracking of both hands.
As shown in fig. 1, in step S140, in the process of selecting any one of the skeleton points for three-dimensional coordinate calculation, the specific operation is not specifically limited. In this embodiment, a point P among the skeleton points is first selected and its coordinate information acquired. Let the three-dimensional coordinate of point P in the previous frame be

$$P_1 = (X_1, Y_1, Z_1)^T,$$

its two-dimensional image coordinate in the previous frame be

$$p_1 = (u_1, v_1)^T,$$

and its known two-dimensional image coordinate in the current frame be

$$p_2 = (u_2, v_2)^T.$$

Assume the three-dimensional coordinate of point P in the current frame is

$$P_2 = (X_2, Y_2, Z_2)^T.$$

Here P_1 and P_2 are column vectors, as are the homogeneous image coordinates below; all of the following are matrix operations.

From the coordinate information of point P,

$$Z_1 (u_1, v_1, 1)^T = K P_1, \quad ①$$
$$Z_2 (u_2, v_2, 1)^T = K P_2, \quad ②$$
$$P_2 = R P_1 + T. \quad ③$$

From formulas ①, ②, and ③,

$$Z_2 K^{-1} (u_2, v_2, 1)^T = R\, Z_1 K^{-1} (u_1, v_1, 1)^T + T = R P_1 + T,$$

where K^{-1} is the inverse matrix of the calibration parameter K. Solving this for Z_2 (for example, from the third row) and substituting back gives the three-dimensional skeleton coordinate of point P in the current frame,

$$P_2 = Z_2 K^{-1} (u_2, v_2, 1)^T = (X_2, Y_2, Z_2)^T,$$

completing the conversion from two-dimensional skeleton coordinates to three-dimensional skeleton coordinates. The calculation above omits intermediate steps and is executed by the chip inside the headset.
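The derivation condenses into a few lines of linear algebra. The following sketch assumes, as formulas ① to ③ do, that the point is effectively static in the world between the two frames; K, R, T, P_1, and p_2 come from the steps already described, and the numeric values in the example are invented for illustration.

```python
import numpy as np

def lift_to_3d(K, R, T, P1, p2):
    """Recover the current-frame 3D coordinate P2 of one skeleton point
    from formulas ①-③: Z2 * K^-1 * (u2, v2, 1)^T = R * P1 + T.
    K: 3x3 intrinsics, R: 3x3 head rotation, T: (3,) head translation,
    P1: previous-frame 3D point, p2: current-frame 2D observation."""
    rhs = R @ P1 + T                           # = P2 by formula ③
    ray = np.linalg.inv(K) @ np.array([p2[0], p2[1], 1.0])
    Z2 = rhs[2] / ray[2]                       # solve the third component for Z2
    return Z2 * ray                            # P2 = Z2 * K^-1 * (u2, v2, 1)^T

# Example with invented numbers:
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
T = np.array([0.01, 0.0, 0.0])                 # small head translation
P1 = np.array([0.10, -0.05, 0.40])             # previous-frame 3D point (metres)
p2 = K @ (R @ P1 + T); p2 = p2[:2] / p2[2]     # consistent 2D observation
print(lift_to_3d(K, R, T, P1, p2))             # approximately R @ P1 + T
```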
As shown in fig. 1, in step S140, in the process of determining the three-dimensional skeleton coordinates of the smooth-filtered skeleton points by combining the head data in the Trackhead to complete gesture tracking,
the smooth-filtered skeleton points are fused with the tracking data of the head, and the fused data are transformed from the camera coordinate system to the VR headset coordinate system to form three-dimensional gesture information; the three-dimensional gesture information is then transmitted to a game engine, rendered, and transmitted back to the VR headset in real time for display, so that users see their own hands on the headset's display, completing the gesture tracking.
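A minimal sketch of the coordinate transfer, assuming known camera-to-headset extrinsics (the names R_ch and t_ch are illustrative):

```python
import numpy as np

def to_headset_frame(points_cam, R_ch, t_ch):
    """Transforms fused 3D skeleton points from the tracking-camera
    coordinate system into the VR headset coordinate system before they
    are handed to the game engine. R_ch (3x3) and t_ch (3,) are the
    camera-to-headset extrinsics, assumed known from headset calibration."""
    points_cam = np.asarray(points_cam)        # shape (N, 3)
    return points_cam @ R_ch.T + t_ch          # row-vector convention

# Usage: hands_hmd = to_headset_frame(hands_cam, R_ch, t_ch)
```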
It can be seen from the foregoing embodiments that the monocular-based three-dimensional gesture tracking method provided by the present invention trains a hand detection model and a skeleton point recognition model to obtain the region of interest in the hand image captured by a monocular camera, estimates the region of interest of the next frame from that of the current frame, recognizes the skeleton points in the region of interest, and simultaneously acquires head motion data; the two-dimensional image data captured by the monocular camera are combined with the head data to compute the three-dimensional coordinates of each skeleton point, completing the two-dimensional to three-dimensional conversion. The three-dimensional coordinates of the hand can therefore be displayed with one ordinary monocular camera, reducing camera cost (even installing two monocular cameras costs less than one infrared binocular camera) and lowering overall power consumption and heat dissipation. Mounting the monocular camera on the headset also simplifies the structural design, lightens the headset, and improves wearing comfort, so the user feels no discomfort even after long wear; in addition, the larger FOV of the monocular camera captures the hand's motion trajectory more completely.
The proposed monocular-based three-dimensional gesture tracking method according to the present invention is described above by way of example with reference to the accompanying drawings. However, it should be understood by those skilled in the art that various modifications can be made to the monocular-based three-dimensional gesture tracking method of the present invention without departing from the scope of the present invention. Therefore, the scope of the present invention should be determined by the contents of the appended claims.

Claims (10)

1. A monocular-based three-dimensional gesture tracking method is characterized by comprising the following steps:
training a hand detection model and a skeleton point recognition model, so that the hand detection model automatically locks the hand area of an image as a region of interest, and the skeleton point recognition model automatically recognizes the skeleton points in the region of interest;
starting the hand detection model and the tracking module according to the number of hands detected in the previous frame of image, acquiring the region of interest of the current frame, and storing the data information of the current frame into a tracking queue Trackhand of the tracking module, the data information of the current frame comprising at least the region of interest of the current frame;
recognizing skeleton points in the region of interest of the current frame in the Trackhand through the skeleton point recognition model, and smooth-filtering the recognized skeleton points according to the historical data in the Trackhand; and
storing the position and posture data of the head for each frame of image into a queue Trackhead of the tracking module in real time, and determining the three-dimensional skeleton coordinates of the smooth-filtered skeleton points by combining the head data in the Trackhead, so as to complete gesture tracking.
2. The monocular-based three-dimensional gesture tracking method of claim 1, wherein, in the process of training the hand detection model and the skeleton point recognition model,
hand image data from at least 100 users are collected with a head tracking camera as action behavior cases;
and the action behavior cases are input into the hand detection model and the skeleton point recognition model for model training.
3. The monocular-based three-dimensional gesture tracking method according to claim 1, wherein, in the process of starting the hand detection model and the tracking module according to the number of hands detected in the previous frame of image,
if the number detected is 0 or 1, both the hand detection model and the tracking module are started;
if the number detected is 2, only the tracking module is started.
4. The monocular-based three-dimensional gesture tracking method according to claim 1, wherein, in the process of recognizing the skeleton points of the region of interest of the current frame in the Trackhand,
the region of interest comprises the position coordinates of the hand in the image and the size of the region corresponding to the hand;
and the number of skeleton points per hand is 21.
5. The monocular-based three-dimensional gesture tracking method according to claim 1, further comprising, in the process of acquiring the region of interest of the current frame:
estimating the region of interest of the next frame from the region of interest of the current frame with an optical flow tracking algorithm, to provide a reference for skeleton point recognition in the next frame.
6. The monocular-based three-dimensional gesture tracking method according to claim 1, wherein, in determining the three-dimensional skeleton coordinates of the smooth-filtered skeleton points in combination with the head data in the Trackhead,
the head data in the Trackhead are read, and the translation matrix T and rotation matrix R of the current frame relative to the previous frame's head pose are acquired;
and the three-dimensional coordinates of the skeleton points are determined from the preset calibration parameters of the tracking camera, the translation matrix T, and the rotation matrix R.
7. The monocular-based three-dimensional gesture tracking method of claim 6, wherein the preset calibration parameter of the tracking camera is

$$K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix},$$

where fx, fy denote the focal length of the tracking camera in pixels, and cx, cy denote the position of the tracking camera's optical axis in image coordinates.
8. The monocular-based three-dimensional gesture tracking method according to claim 6, wherein, in the process of determining the three-dimensional coordinates of the skeleton points from the preset calibration parameters of the tracking camera, the translation matrix T, and the rotation matrix R,
any one of the skeleton points is selected for three-dimensional coordinate calculation;
and each skeleton point is calculated in turn until all the skeleton points of both hands have been calculated.
9. The monocular-based three-dimensional gesture tracking method according to any one of claims 6 to 8, wherein, in selecting any one of the skeleton points for three-dimensional coordinate calculation,
a point P among the skeleton points is selected and its coordinate information acquired. Let the three-dimensional coordinate of point P in the previous frame be

$$P_1 = (X_1, Y_1, Z_1)^T,$$

its two-dimensional image coordinate in the previous frame be

$$p_1 = (u_1, v_1)^T,$$

and its known two-dimensional image coordinate in the current frame be

$$p_2 = (u_2, v_2)^T.$$

Assume the three-dimensional coordinate of point P in the current frame is

$$P_2 = (X_2, Y_2, Z_2)^T.$$

From the coordinate information of point P,

$$Z_1 (u_1, v_1, 1)^T = K P_1, \quad Z_2 (u_2, v_2, 1)^T = K P_2, \quad P_2 = R P_1 + T;$$

then

$$Z_2 K^{-1} (u_2, v_2, 1)^T = R\, Z_1 K^{-1} (u_1, v_1, 1)^T + T,$$

where K^{-1} is the inverse matrix of the calibration parameter K.

Solving for Z_2 and substituting back yields the three-dimensional skeleton coordinate of point P in the current frame,

$$P_2 = (X_2, Y_2, Z_2)^T,$$

completing the conversion from two-dimensional skeleton coordinates to three-dimensional skeleton coordinates.
10. The monocular-based three-dimensional gesture tracking method according to claim 1, wherein, in the process of determining the three-dimensional skeleton coordinates of the smooth-filtered skeleton points by combining the head data in the Trackhead to complete gesture tracking,
the smooth-filtered skeleton points are fused with the tracking data of the head, and the fused data are transformed from the camera coordinate system to the VR headset coordinate system to form three-dimensional gesture information;
and the three-dimensional gesture information is transmitted to a game engine, rendered, and then transmitted back to the VR headset in real time for display, completing gesture tracking.
CN202010387724.1A 2020-05-09 2020-05-09 Monocular-based three-dimensional gesture tracking method Active CN111696140B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010387724.1A CN111696140B (en) 2020-05-09 2020-05-09 Monocular-based three-dimensional gesture tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010387724.1A CN111696140B (en) 2020-05-09 2020-05-09 Monocular-based three-dimensional gesture tracking method

Publications (2)

Publication Number Publication Date
CN111696140A true CN111696140A (en) 2020-09-22
CN111696140B CN111696140B (en) 2024-02-13

Family

ID=72477396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010387724.1A Active CN111696140B (en) 2020-05-09 2020-05-09 Monocular-based three-dimensional gesture tracking method

Country Status (1)

Country Link
CN (1) CN111696140B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927259A (en) * 2021-02-18 2021-06-08 青岛小鸟看看科技有限公司 Multi-camera-based bare hand tracking display method, device and system
CN112927290A (en) * 2021-02-18 2021-06-08 青岛小鸟看看科技有限公司 Bare hand data labeling method and system based on sensor
CN113674395A (en) * 2021-07-19 2021-11-19 广州紫为云科技有限公司 3D hand lightweight real-time capturing and reconstructing system based on monocular RGB camera
WO2022105613A1 (en) * 2020-11-17 2022-05-27 青岛小鸟看看科技有限公司 Head-mounted vr all-in-one machine
WO2022233111A1 (en) * 2021-05-06 2022-11-10 青岛小鸟看看科技有限公司 Transparent object tracking method and system based on image difference

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104460967A (en) * 2013-11-25 2015-03-25 安徽寰智信息科技股份有限公司 Recognition method of upper limb bone gestures of human body
CN104833360A (en) * 2014-02-08 2015-08-12 无锡维森智能传感技术有限公司 Method for transforming two-dimensional coordinates into three-dimensional coordinates
CN104992171A (en) * 2015-08-04 2015-10-21 易视腾科技有限公司 Method and system for gesture recognition and man-machine interaction based on 2D video sequence
CN106250867A (en) * 2016-08-12 2016-12-21 南京华捷艾米软件科技有限公司 A kind of skeleton based on depth data follows the tracks of the implementation method of system
CN106648103A (en) * 2016-12-28 2017-05-10 歌尔科技有限公司 Gesture tracking method for VR headset device and VR headset device
CN106945059A (en) * 2017-03-27 2017-07-14 中国地质大学(武汉) A kind of gesture tracking method based on population random disorder multi-objective genetic algorithm
CN108196679A (en) * 2018-01-23 2018-06-22 河北中科恒运软件科技股份有限公司 Gesture-capture and grain table method and system based on video flowing
CN108919943A (en) * 2018-05-22 2018-11-30 南京邮电大学 A kind of real-time hand method for tracing based on depth transducer
CN110825234A (en) * 2019-11-11 2020-02-21 江南大学 Projection type augmented reality tracking display method and system for industrial scene

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104460967A (en) * 2013-11-25 2015-03-25 安徽寰智信息科技股份有限公司 Recognition method of upper limb bone gestures of human body
CN104833360A (en) * 2014-02-08 2015-08-12 无锡维森智能传感技术有限公司 Method for transforming two-dimensional coordinates into three-dimensional coordinates
CN104992171A (en) * 2015-08-04 2015-10-21 易视腾科技有限公司 Method and system for gesture recognition and man-machine interaction based on 2D video sequence
CN106250867A (en) * 2016-08-12 2016-12-21 南京华捷艾米软件科技有限公司 A kind of skeleton based on depth data follows the tracks of the implementation method of system
US20180047175A1 (en) * 2016-08-12 2018-02-15 Nanjing Huajie Imi Technology Co., Ltd Method for implementing human skeleton tracking system based on depth data
CN106648103A (en) * 2016-12-28 2017-05-10 歌尔科技有限公司 Gesture tracking method for VR headset device and VR headset device
CN106945059A (en) * 2017-03-27 2017-07-14 中国地质大学(武汉) A kind of gesture tracking method based on population random disorder multi-objective genetic algorithm
CN108196679A (en) * 2018-01-23 2018-06-22 河北中科恒运软件科技股份有限公司 Gesture-capture and grain table method and system based on video flowing
CN108919943A (en) * 2018-05-22 2018-11-30 南京邮电大学 A kind of real-time hand method for tracing based on depth transducer
CN110825234A (en) * 2019-11-11 2020-02-21 江南大学 Projection type augmented reality tracking display method and system for industrial scene

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
VR陀螺: "uSens releases three-dimensional hand gesture skeleton recognition on a monocular RGB camera", pages 1 - 6 *
万琴; 余洪山; 吴迪; 林国汉: "A survey of multi-moving-target tracking methods based on three-dimensional vision systems", no. 19 *
杨凯; 魏本征; 任晓强; 王庆祥; 刘怀辉: "Human motion pose tracking and recognition algorithm based on depth images", no. 05 *
杨露菁 et al.: "Intelligent Image Processing and Applications" *
陈翰雄; 黄雅云; 刘宇; 闫梦奎; 刘峰: "Research and implementation of mid-air gesture tracking and recognition based on Kinect", no. 21 *
黄敦博; 林志贤; 姚剑敏; 郭太良: "Monocular hand gesture tracking based on adaptive extraction and improved CAMSHIFT", no. 07 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022105613A1 (en) * 2020-11-17 2022-05-27 青岛小鸟看看科技有限公司 Head-mounted vr all-in-one machine
US11941167B2 (en) 2020-11-17 2024-03-26 Qingdao Pico Technology Co., Ltd Head-mounted VR all-in-one machine
CN112927259A (en) * 2021-02-18 2021-06-08 青岛小鸟看看科技有限公司 Multi-camera-based bare hand tracking display method, device and system
CN112927290A (en) * 2021-02-18 2021-06-08 青岛小鸟看看科技有限公司 Bare hand data labeling method and system based on sensor
WO2022174594A1 (en) * 2021-02-18 2022-08-25 青岛小鸟看看科技有限公司 Multi-camera-based bare hand tracking and display method and system, and apparatus
US20220383523A1 (en) * 2021-02-18 2022-12-01 Qingdao Pico Technology Co., Ltd. Hand tracking method, device and system
US11798177B2 (en) * 2021-02-18 2023-10-24 Qingdao Pico Technology Co., Ltd. Hand tracking method, device and system
WO2022233111A1 (en) * 2021-05-06 2022-11-10 青岛小鸟看看科技有限公司 Transparent object tracking method and system based on image difference
US11645764B2 (en) 2021-05-06 2023-05-09 Qingdao Pico Technology Co., Ltd. Image difference-based method and system for tracking a transparent object
CN113674395A (en) * 2021-07-19 2021-11-19 广州紫为云科技有限公司 3D hand lightweight real-time capturing and reconstructing system based on monocular RGB camera

Also Published As

Publication number Publication date
CN111696140B (en) 2024-02-13

Similar Documents

Publication Publication Date Title
CN111696140B (en) Monocular-based three-dimensional gesture tracking method
CN106873778B (en) Application operation control method and device and virtual reality equipment
CN103140879B (en) Information presentation device, digital camera, head mounted display, projecting apparatus, information demonstrating method and information are presented program
CN110363867B (en) Virtual decorating system, method, device and medium
CN105389539B (en) A kind of three-dimension gesture Attitude estimation method and system based on depth data
CN107545302B (en) Eye direction calculation method for combination of left eye image and right eye image of human eye
CN104978548B (en) A kind of gaze estimation method and device based on three-dimensional active shape model
JP7015152B2 (en) Processing equipment, methods and programs related to key point data
CN108983982B (en) AR head display equipment and terminal equipment combined system
WO2020042542A1 (en) Method and apparatus for acquiring eye movement control calibration data
JP4692526B2 (en) Gaze direction estimation apparatus, gaze direction estimation method, and program for causing computer to execute gaze direction estimation method
US20090042661A1 (en) Rule based body mechanics calculation
KR101892735B1 (en) Apparatus and Method for Intuitive Interaction
WO2022174594A1 (en) Multi-camera-based bare hand tracking and display method and system, and apparatus
JP4936491B2 (en) Gaze direction estimation apparatus, gaze direction estimation method, and program for causing computer to execute gaze direction estimation method
JP5526465B2 (en) Nail position data detection device, nail position data detection method, and nail position data detection program
Gurbuz et al. Model free head pose estimation using stereovision
CN112416125A (en) VR head-mounted all-in-one machine
CN107422844A (en) A kind of information processing method and electronic equipment
Zou et al. Automatic reconstruction of 3D human motion pose from uncalibrated monocular video sequences based on markerless human motion tracking
JP2015100032A (en) Video display device, video presentation method and program
CN112766097B (en) Sight line recognition model training method, sight line recognition device and sight line recognition equipment
Lages et al. Enhanced geometric techniques for point marking in model-free augmented reality
CN115223240B (en) Motion real-time counting method and system based on dynamic time warping algorithm
JP2017227687A (en) Camera assembly, finger shape detection system using camera assembly, finger shape detection method using camera assembly, program implementing detection method, and recording medium of program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant