CN116363741A - Gesture data labeling method and device - Google Patents
- Publication number
- CN116363741A (Application CN202111579908.9A)
- Authority
- CN
- China
- Prior art keywords
- image
- gesture
- key point
- acquisition device
- image acquisition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
- G06T5/70—Denoising; Smoothing
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06T2207/10028—Range image; Depth image; 3D point clouds
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The embodiment of the invention provides a method and a device for labeling gesture data, relating to the technical field of image processing. The method comprises the following steps: acquiring a gesture image and a depth image corresponding to the gesture image; identifying each hand key point in the depth image; acquiring position information of each hand key point in the depth image; mapping the position information of each hand key point in the depth image into position information of each hand key point in the gesture image; and labeling the gesture data of the gesture image according to the position information of each hand key point in the gesture image. The embodiment of the invention is used for solving the problems of high cost, low efficiency, large error and susceptibility to subjective influence caused by manually labeling gesture images.
Description
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for labeling gesture data.
Background
Gestures have the advantages of being visual, easy to understand, and limited by neither environment nor language; a simple gesture can clearly express a basic intention without ambiguity. Gestures are therefore a very important part of daily communication.
With the rapid development of computer vision technology, controlling intelligent devices through gestures has become an important control mode for intelligent devices. The general implementation principle is as follows: a gesture recognition technology is adopted to recognize the gesture of the user, and the control instruction corresponding to the recognized gesture is executed. Currently, one of the most widely used gesture recognition techniques is based on a deep neural network (Deep Neural Networks, DNN) model: a deep neural network model is first trained with gesture images labeled with gesture data, and the trained model then performs gesture recognition on newly acquired gesture images. Training the deep neural network model requires a large number of gesture images labeled with gesture data as training samples, and in the prior art these training samples are obtained by labeling gesture images manually. However, manually labeling gesture images brings problems such as high cost, low efficiency, large error, and susceptibility to subjective influence.
Disclosure of Invention
In view of the above, the invention provides a method and a device for labeling gesture data, which are used for solving the problems of high cost, low efficiency, large error and susceptibility to subjective influence caused by manually labeling gesture images.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
in a first aspect, an embodiment of the present invention provides a method for labeling gesture data, including:
acquiring a gesture image and a depth image corresponding to the gesture image;
identifying each hand keypoint in the depth image;
acquiring position information of each hand key point in the depth image;
mapping the position information of each hand key point in the depth image into the position information of each hand key point in the gesture image;
and marking the gesture data of the gesture image according to the position information of each hand key point in the gesture image.
As an optional implementation manner of the embodiment of the present invention, the identifying each hand keypoint in the depth image includes:
identifying each hand key point in the depth image through a key point identification model;
the key point recognition model is a model obtained by training a neural network model based on sample data, and the sample data comprises a plurality of depth images marked with key points of hands.
As an optional implementation manner of the embodiment of the present invention, the acquiring the gesture image and the depth image corresponding to the gesture image includes:
performing image sampling through a first image acquisition device to acquire the gesture image, and synchronously performing image sampling through a second image acquisition device to acquire the depth image.
As an optional implementation manner of the embodiment of the present invention, the position information of each hand key point in the depth image is a pixel coordinate of each hand key point in the depth image;
the position information of each hand key point in the gesture image is the pixel coordinate of each hand key point in the gesture image.
As an optional implementation manner of the embodiment of the present invention, mapping the position information of each hand key point in the depth image to the position information of each hand key point in the gesture image includes:
calibrating the first image acquisition device to acquire an inner parameter of the first image acquisition device and an outer parameter of the first image acquisition device;
calibrating the second image acquisition device to acquire the internal parameters of the second image acquisition device and the external parameters of the second image acquisition device;
acquiring a rotation parameter and a position parameter according to the external parameter of the first image acquisition device and the external parameter of the second image acquisition device, wherein the rotation parameter is used for representing the rotation relation between the second image acquisition device and the first image acquisition device, and the position parameter is used for representing the position relation between the second image acquisition device and the first image acquisition device;
And acquiring pixel coordinates of each hand key point in the gesture image according to the pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters and the depth values of each hand key point.
As an optional implementation manner of the embodiment of the present invention, the obtaining, according to the pixel coordinates of each hand key point in the depth image, the internal parameter of the first image capturing device, the internal parameter of the second image capturing device, the rotation parameter, the position parameter, and the depth value of each hand key point, the pixel coordinates of each hand key point in the gesture image includes:
acquiring pixel coordinates of each hand key point in the gesture image according to pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters, the depth values of each hand key point and the following formula:
p_1 = R_1 · R · R_2^(-1) · p_2 + R_1 · T · z_h^(-1)
wherein p_1 is the pixel coordinate of the hand key point in the gesture image, R_1 is the internal parameter of the first image acquisition device, R is the rotation parameter, R_2 is the internal parameter of the second image acquisition device, p_2 is the pixel coordinate of the hand key point in the depth image, T is the position parameter, and z_h is the depth value of the hand key point.
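The mapping formula above can be sketched numerically. The sketch below assumes an illustrative reading of the formula in which R_1 and R_2 are the 3×3 intrinsic matrices of the two devices, p_1 and p_2 are homogeneous pixel coordinates, and the result is normalized back to pixels; it is not the patent's reference implementation.

```python
import numpy as np

def map_depth_to_gesture(p2_px, z_h, K1, K2, R, T):
    """Map a hand key point from depth-image pixels to gesture-image pixels.

    Implements p1 = K1 * R * K2^-1 * p2 + K1 * T / z_h, reading the
    patent's R1/R2 as intrinsic matrices K1/K2 (an assumption) and
    p1/p2 as homogeneous pixel coordinates.
    """
    p2 = np.array([p2_px[0], p2_px[1], 1.0])            # homogeneous depth pixel
    p1 = K1 @ (R @ (np.linalg.inv(K2) @ p2)) + (K1 @ T) / z_h
    return p1[:2] / p1[2]                               # normalize to pixels
```

With identical intrinsics, identity rotation, and zero translation the mapping reduces to the identity, which is a quick sanity check on the formula.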
As an optional implementation manner of the embodiment of the present invention, the method further includes:
taking a preset time length as a period, periodically acquiring the gesture image and the depth image, and marking gesture data of the gesture image acquired in each period;
and carrying out smoothing processing on gesture data of the gesture images acquired in each period.
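The patent does not fix a particular smoothing filter; one simple choice consistent with the claim is a moving average over the per-period keypoint annotations. A minimal sketch, assuming each period yields one list of (x, y) keypoint coordinates:

```python
def smooth_keypoints(frames, window=3):
    """Moving-average smoothing of per-period keypoint coordinates.

    frames: list of frames, each a list of (x, y) tuples (one per key point).
    Returns smoothed frames; edge frames average over the available window.
    """
    half = window // 2
    smoothed = []
    for i in range(len(frames)):
        lo, hi = max(0, i - half), min(len(frames), i + half + 1)
        n = hi - lo
        frame = []
        for k in range(len(frames[i])):
            x = sum(frames[j][k][0] for j in range(lo, hi)) / n
            y = sum(frames[j][k][1] for j in range(lo, hi)) / n
            frame.append((x, y))
        smoothed.append(frame)
    return smoothed
```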
As an optional implementation manner of the embodiment of the present invention, before the gesture image is acquired through image acquisition by the first image acquisition device and the depth image corresponding to the gesture image is acquired through image acquisition by the second image acquisition device, the method further includes:
constructing a data acquisition thread; the data acquisition thread is used for controlling the first image acquisition device and the second image acquisition device to synchronously acquire images.
In a second aspect, an embodiment of the present invention provides a device for labeling gesture data, including:
the image acquisition unit is used for acquiring a gesture image and a depth image corresponding to the gesture image;
a key point identification unit, configured to identify each hand key point in the depth image;
the position acquisition unit is used for acquiring the position information of each hand key point in the depth image;
the mapping unit is used for mapping the position information of each hand key point in the depth image into the position information of each hand key point in the gesture image;
and the labeling unit is used for labeling the gesture data of the gesture image according to the position information of each hand key point in the gesture image.
As an optional implementation manner of the embodiment of the present invention, the keypoint identification unit is specifically configured to identify each hand keypoint in the depth image through a keypoint identification model;
the key point recognition model is a model obtained by training a neural network model based on sample data, and the sample data comprises a plurality of depth images marked with key points of hands.
As an optional implementation manner of the embodiment of the present invention, the image acquisition unit is specifically configured to acquire the gesture image by performing image sampling through a first image acquisition device, and acquire the depth image by performing image sampling through a second image acquisition device.
As an alternative to the embodiment of the present invention,
the position information of each hand key point in the depth image is the pixel coordinate of each hand key point in the depth image;
the position information of each hand key point in the gesture image is the pixel coordinate of each hand key point in the gesture image.
As an optional implementation manner of the embodiment of the present invention, the mapping unit is specifically configured to calibrate the first image capturing device, and obtain an internal parameter of the first image capturing device and an external parameter of the first image capturing device; calibrating the second image acquisition device to acquire the internal parameters of the second image acquisition device and the external parameters of the second image acquisition device; acquiring a rotation parameter and a position parameter according to the external parameter of the first image acquisition device and the external parameter of the second image acquisition device, wherein the rotation parameter is used for representing the rotation relation between the second image acquisition device and the first image acquisition device, and the position parameter is used for representing the position relation between the second image acquisition device and the first image acquisition device; and acquiring pixel coordinates of each hand key point in the gesture image according to the pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters and the depth values of each hand key point.
As an optional implementation manner of the embodiment of the present invention, the mapping unit is specifically configured to obtain, according to a pixel coordinate of each hand key point in the depth image, an internal parameter of the first image capturing device, an internal parameter of the second image capturing device, the rotation parameter, the position parameter, and a depth value of each hand key point, and the following formula, a pixel coordinate of each hand key point in the gesture image:
p_1 = R_1 · R · R_2^(-1) · p_2 + R_1 · T · z_h^(-1)
wherein p_1 is the pixel coordinate of the hand key point in the gesture image, R_1 is the internal parameter of the first image acquisition device, R is the rotation parameter, R_2 is the internal parameter of the second image acquisition device, p_2 is the pixel coordinate of the hand key point in the depth image, T is the position parameter, and z_h is the depth value of the hand key point.
As an optional implementation manner of the embodiment of the present invention, the device for labeling gesture data further includes: a correction unit;
the image acquisition unit is further used for periodically acquiring the gesture image and the depth image by taking a preset duration as a period;
the labeling unit is further used for labeling gesture data of the gesture images acquired in each period;
And the correction unit is used for smoothing the gesture data of the gesture image acquired in each period.
As an optional implementation manner of the embodiment of the present invention, the image acquisition unit is further configured to build a data acquisition thread before performing image acquisition by using the first image acquisition device to acquire a gesture image, and performing image acquisition by using the second image acquisition device to acquire a depth image corresponding to the gesture image;
the data acquisition thread is used for controlling the first image acquisition device and the second image acquisition device to synchronously acquire images.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory and a processor, the memory for storing a computer program; the processor is configured to, when invoking a computer program, cause the electronic device to implement the method for labeling gesture data according to the first aspect or any optional implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium storing a computer program which, when executed by a computing device, causes the computing device to implement the method for labeling gesture data according to the first aspect or any optional implementation manner of the first aspect.
In a fifth aspect, embodiments of the present invention provide a computer program product, which when run on a computer causes the computer to implement the method for labeling gesture data according to the first aspect or any of the optional embodiments of the first aspect.
The method for labeling gesture data provided by the embodiment of the invention comprises the following steps: firstly, acquiring a gesture image and a depth image corresponding to the gesture image; then identifying each hand key point in the depth image and acquiring the position information of each hand key point in the depth image; next, mapping the position information of each hand key point in the depth image into the position information of each hand key point in the gesture image; and finally, labeling the gesture data of the gesture image according to the position information of each hand key point in the gesture image. Because the embodiment of the invention identifies each hand key point in the depth image, maps its position information into the gesture image, and labels the gesture data of the gesture image according to the mapped position information, it provides a fully automatic labeling method in which no manual operation is needed. The embodiment of the invention can therefore solve the problems of high cost, low efficiency, large error and susceptibility to subjective influence caused by manually labeling gesture images.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a scene structure diagram of a method for labeling gesture data according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of a method for labeling gesture data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a hand key point provided in an embodiment of the present invention;
FIG. 4 is a schematic diagram of smoothing processing according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a device for labeling gesture data according to an embodiment of the present invention;
FIG. 6 is a second schematic diagram of a device for labeling gesture data according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the invention will be more clearly understood, a further description of the invention will be made. It should be noted that, without conflict, the embodiments of the present invention and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the invention.
In embodiments of the invention, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion. Furthermore, in the description of the embodiments of the present invention, unless otherwise indicated, the meaning of "plurality" means two or more.
The following describes a scene architecture of a labeling method for gesture data according to an embodiment of the present invention. Referring to fig. 1, a scene architecture of a method for labeling gesture data according to an embodiment of the present invention includes: a two-dimensional image acquisition device 11, a depth image acquisition device 12, and an image processing device 13. The two-dimensional image acquisition device 11 is used for acquiring gesture images to be annotated, and transmitting the gesture images to be annotated to the image processing device 13. The depth image capturing device 12 is configured to capture an image of an image capturing object of the two-dimensional image capturing device 11 at the same time, obtain a depth image corresponding to a gesture image to be annotated, and transmit the depth image to the image processing device 13. The image processing device 13 is configured to identify each hand key point in the depth image, thereby acquiring position information of each hand key point in the depth image, then map the position information of each hand key point in the depth image to position information of each hand key point in the gesture image, and mark gesture data of the gesture image according to the position information of each hand key point in the gesture image.
It should be noted that, in the scene architecture shown in fig. 1, the two-dimensional image capturing device 11, the depth image capturing device 12, and the image processing device 13 are respectively devices independent from each other in hardware, but the embodiment of the present invention is not limited thereto, and two or all of the two-dimensional image capturing device 11, the depth image capturing device 12, and the image processing device 13 may be integrated into the same hardware device based on the above embodiment. For example: the two-dimensional image acquisition device 11 is integrated in the same hardware device as the image processing device 13.
The embodiment of the invention provides a method for labeling gesture data, which is shown by referring to fig. 2, and comprises the following steps S11 to S15:
s11, acquiring a gesture image and a depth image corresponding to the gesture image.
As an optional implementation manner of the embodiment of the present invention, the acquiring the gesture image and the depth image corresponding to the gesture image includes:
and acquiring the gesture image through the first image acquisition device, and synchronously acquiring the depth image corresponding to the gesture image through the second image acquisition device.
Optionally, the first image capturing device in the embodiment of the present invention may be an image capturing device of a Virtual Reality (VR) or Augmented Reality (AR) device; the second image acquisition device may be a depth camera.
Further, in order to realize that the first image capturing device and the second image capturing device can perform image capturing at the same time, before step S11 (performing image capturing by the first image capturing device to obtain a gesture image, and performing image capturing by the second image capturing device to obtain a depth image corresponding to the gesture image) the method for labeling gesture data according to the embodiment of the present invention further includes: and (5) constructing a data acquisition thread. The data acquisition thread is used for controlling the first image acquisition device and the second image acquisition device to synchronously acquire images.
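The data acquisition thread can be sketched with Python's standard threading module. The `.capture()` method on the camera objects is an assumed, hypothetical interface (the patent names no camera SDK); the point of the sketch is that both devices are triggered in the same loop iteration, so each gesture image is paired with its depth image.

```python
import threading
import time

def start_capture_thread(rgb_cam, depth_cam, period_s, frames, stop_event):
    """Start a data-acquisition thread that captures from both devices
    in lockstep and appends (gesture_image, depth_image) pairs to frames.

    rgb_cam / depth_cam are assumed to expose a .capture() method.
    """
    def worker():
        while not stop_event.is_set():
            gesture_img = rgb_cam.capture()   # 2-D gesture image
            depth_img = depth_cam.capture()   # corresponding depth image
            frames.append((gesture_img, depth_img))
            time.sleep(period_s)              # sampling period
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

Setting `stop_event` ends acquisition; the list of paired frames then feeds the labeling pipeline described in steps S12 to S15.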
And S12, identifying each hand key point in the depth image.
Illustratively, referring to fig. 3, the hand key points in the embodiment of the present invention may include the fingertips of the five fingers of the hand (5 key points) and all the joints of the hand (16 key points), for a total of 21 key points.
As an optional implementation manner of the embodiment of the present invention, the step S12 (identifying each hand key point in the depth image) includes:
Identifying each hand key point in the depth image through a key point identification model;
the key point recognition model is a model obtained by training a neural network model based on sample data, and the sample data comprises a plurality of depth images marked with key points of hands.
That is, a neural network model for identifying hand keypoints in the depth image is trained in advance, and each hand keypoint in the depth image is acquired through the trained neural network model.
Of course, in the embodiment of the present invention, each hand key point in the depth image may be identified in other manners, which is not limited in the embodiment of the present invention, and it is important that each hand key point in the depth image can be accurately identified.
S13, acquiring position information of each hand key point in the depth image.
As an optional implementation manner of the embodiment of the present invention, the position information of each hand key point in the depth image is a pixel coordinate of each hand key point in the depth image.
Specifically, the pixel coordinates of each hand key point in the depth image may be determined according to the pixel points corresponding to each hand key point in the depth image.
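One common way a keypoint network exposes those pixel positions is through per-keypoint heatmaps, where each channel's peak marks one key point; the patent does not fix an architecture, so the decoding sketch below is an assumption about the model's output form.

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps):
    """Decode per-keypoint heatmaps into pixel coordinates.

    heatmaps: array of shape (21, H, W), one channel per hand key point.
    Returns a list of (u, v) pixel coordinates, one per key point.
    """
    coords = []
    for hm in heatmaps:
        v, u = np.unravel_index(np.argmax(hm), hm.shape)  # row, col of peak
        coords.append((int(u), int(v)))
    return coords
```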
And S14, mapping the position information of each hand key point in the depth image into the position information of each hand key point in the gesture image.
As an optional implementation manner of the embodiment of the present invention, the position information of each hand key point in the gesture image is the pixel coordinate of each hand key point in the gesture image.
As an optional implementation manner of the embodiment of the present invention, the implementation manner of the step S14 (mapping the position information of each hand key point in the depth image to the position information of each hand key point in the gesture image) may include the following steps a to d:
and a step a of calibrating the first image acquisition device to acquire the internal parameters of the first image acquisition device and the external parameters of the first image acquisition device.
Specifically, in the image measurement process and the machine vision application, in order to determine the correlation between the three-dimensional geometric position of a point on the surface of a space object and the corresponding pixel point in the image, a geometric model of camera imaging must be established, wherein parameters of the geometric model are an internal parameter and an external parameter of the camera, and the process of establishing the geometric model of camera imaging is called camera calibration. Wherein, the intrinsic parameters of the camera are parameters related to the characteristics of the camera, including: the focal length, pixel size, luminous flux, etc. of the camera, and the external parameters of the camera are used to describe information such as the position, rotation angle, etc. of the camera in the real world (world coordinate system).
Optionally, an implementation manner of calibrating the first image acquisition device may include: photographing a calibration object with the first image acquisition device, establishing correspondences between points of known coordinates on the calibration object and their corresponding pixel points in the image, and solving for a camera model of the first image acquisition device with a suitable algorithm, thereby obtaining the internal parameters and external parameters of the first image acquisition device.
And b, calibrating the second image acquisition device to acquire the internal parameters of the second image acquisition device and the external parameters of the second image acquisition device.
The implementation manner of calibrating the second image acquisition device is similar to that of calibrating the first image acquisition device in the step a, and is not described herein.
And c, acquiring a rotation parameter and a position parameter according to the external parameter of the first image acquisition device and the external parameter of the second image acquisition device.
The rotation parameter is used for representing the rotation relation between the second image acquisition device and the first image acquisition device, and the position parameter is used for representing the position relation between the second image acquisition device and the first image acquisition device.
That is, the rotational relationship and the positional relationship between the first image capturing device and the second image capturing device are obtained by the external parameters of the first image capturing device and the external parameters of the second image capturing device.
Illustratively, the external parameters of a camera describe information such as its position and rotation angle in the real world (the world coordinate system). Setting: the external parameters of the first image acquisition device include the position coordinates (x1, y1, z1) of the first image acquisition device in the world coordinate system and its rotation angle α; the external parameters of the second image acquisition device include the position coordinates (x2, y2, z2) of the second image acquisition device in the world coordinate system and its rotation angle β; the position parameter is T, and the rotation parameter is R. Then:

T = (x2 - x1, y2 - y1, z2 - z1)

R = β - α
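The simplified relations above can be sketched as follows. The camera positions and angles are hypothetical, and note that a real calibration would express the rotation as a full 3x3 matrix rather than a single angle; this sketch only mirrors the scalar form used in the example.

```python
import numpy as np

# Hypothetical extrinsics of the two cameras in a shared world frame:
# position (x, y, z) in metres and an in-plane rotation angle in radians.
cam1_pos, cam1_angle = np.array([0.10, 0.00, 0.00]), np.deg2rad(5.0)
cam2_pos, cam2_angle = np.array([0.16, 0.00, 0.00]), np.deg2rad(2.0)

# Position parameter T: offset of the second camera relative to the first.
T = cam2_pos - cam1_pos

# Rotation parameter R: relative rotation angle between the two cameras.
R = cam2_angle - cam1_angle

print(T)  # -> [0.06 0.   0.  ]
```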
and d, acquiring pixel coordinates of each hand key point in the gesture image according to the pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters and the depth values of each hand key point.
Further optionally, the implementation manner of step d (obtaining the pixel coordinates of each hand key point in the gesture image according to the pixel coordinates of each hand key point in the depth image, the internal parameter of the first image capturing device, the internal parameter of the second image capturing device, the rotation parameter, the position parameter, and the depth value of each hand key point) includes:
acquiring pixel coordinates of each hand key point in the gesture image according to pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters, the depth values of each hand key point and the following formula:
p1 = R1 * R * R2^-1 * p2 + R1 * T * z_h^-1

wherein p1 is the pixel coordinate of the hand key point in the gesture image, R1 is the internal parameter of the first image acquisition device, R is the rotation parameter, R2 is the internal parameter of the second image acquisition device, p2 is the pixel coordinate of the hand key point in the depth image, T is the position parameter, and z_h is the depth value of the hand key point.
The following describes the implementation principle of acquiring the pixel coordinates of each hand key point in the gesture image according to the pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters, the depth values of each hand key point, and the above formulas.
Setting: a hand key point P_h on the object photographed by the first image acquisition device and the second image acquisition device has the coordinate value (x_h, y_h, z_h) in the world coordinate system whose origin is the position of the second image acquisition device; the hand key point P_h images as P1 in the first image acquisition device, and the hand key point P_h images as P2 in the second image acquisition device. Then:

P1 = R * P2 + T    (1)

wherein R is the rotation parameter and T is the position parameter.

The pixel coordinate p1, in the gesture image, of the image P1 of the hand key point P_h in the first image acquisition device is:

p1 = R1 * P1 / z_h    (2)

The pixel coordinate p2, in the depth image, of the image P2 of the hand key point P_h in the second image acquisition device is:

p2 = R2 * P2 / z_h    (3)

From the above formulas (1) and (2), it can be derived that:

p1 = R1 * (R * P2 + T) / z_h = R1 * R * P2 / z_h + R1 * T / z_h    (4)

Again because, from formula (3):

P2 / z_h = R2^-1 * p2    (5)

substituting the above formula (5) into formula (4) gives:

p1 = R1 * R * R2^-1 * p2 + R1 * T / z_h = R1 * R * R2^-1 * p2 + R1 * T * z_h^-1
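The derivation can be sketched numerically in Python, treating the internal parameters as 3x3 intrinsic matrices and the pixel coordinates as homogeneous vectors (u, v, 1); all numeric values are hypothetical. The final check projects an assumed 3-D keypoint into both cameras and confirms that the mapping formula reproduces the gesture-image pixel computed directly.

```python
import numpy as np

# Hypothetical intrinsics of the two cameras (illustrative values only).
K1 = np.array([[610.0, 0.0, 320.0], [0.0, 610.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[580.0, 0.0, 316.0], [0.0, 580.0, 236.0], [0.0, 0.0, 1.0]])

# Hypothetical rotation (small yaw) and position offset between the cameras.
theta = np.deg2rad(2.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
T = np.array([0.05, 0.0, 0.0])

def depth_to_gesture_pixel(p2, z_h):
    """Map a homogeneous depth-image pixel p2 = (u, v, 1) with depth z_h to a
    homogeneous gesture-image pixel: p1 = K1*R*inv(K2)*p2 + K1*T/z_h."""
    return K1 @ R @ np.linalg.inv(K2) @ p2 + K1 @ T / z_h

# Self-consistency check: project a 3-D hand keypoint into both cameras.
P2 = np.array([0.03, -0.02, 0.60])       # keypoint in camera-2 coordinates
z_h = P2[2]                              # depth value of the keypoint
p2 = K2 @ P2 / z_h                       # depth-image pixel (homogeneous)
p1_direct = K1 @ (R @ P2 + T) / z_h      # gesture-image pixel (homogeneous)
print(np.allclose(depth_to_gesture_pixel(p2, z_h), p1_direct))  # -> True
```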
s15, marking gesture data of the gesture image according to the position information of each hand key point in the gesture image.
For example, gesture data obtained by labeling gesture data of the gesture image according to the position information of each hand key point in the gesture image may be as shown in table 1 below:
TABLE 1
The method for labeling gesture data provided by the embodiment of the invention comprises: first acquiring a gesture image and a depth image corresponding to the gesture image; then identifying each hand key point in the depth image and acquiring the position information of each hand key point in the depth image; mapping the position information of each hand key point in the depth image to the position information of each hand key point in the gesture image; and finally labeling the gesture data of the gesture image according to the position information of each hand key point in the gesture image. Because the hand key points are identified in the depth image, their position information is mapped into the gesture image, and the gesture data of the gesture image is labeled accordingly without any manual operation, the embodiment of the invention provides a fully automatic method for labeling gesture data. The embodiment of the invention can therefore solve the problems of high cost, low efficiency, large error, and susceptibility to subjective influence that are caused by manually labeling gesture images.
As an optional implementation manner of the embodiment of the present invention, the method for labeling gesture data provided by the embodiment of the present invention further includes:
taking a preset time length as a period, periodically acquiring the gesture image and the depth image, and marking gesture data of the gesture image acquired in each period;
and carrying out smoothing processing on gesture data of the gesture images acquired in each period.
That is, the first image acquisition device and the second image acquisition device are controlled to acquire images once at fixed time intervals, the gesture data of the gesture images acquired in each period are marked by the marking method of the gesture data provided by the embodiment shown in fig. 2, and then the gesture data of the gesture images acquired in each period are smoothed for each hand key point.
In general, the position of each hand key point changes continuously and smoothly across neighboring periods. If the position of a certain hand key point in the gesture data fluctuates sharply and is clearly not smooth with respect to its positions in the neighboring periods, that position is likely the result of mislabeling. By further smoothing the gesture data of the gesture images acquired in each period, the above embodiment makes the labeled gesture data more accurate.
The preset duration may be, for example, 33 ms; that is, the first image acquisition device and the second image acquisition device are controlled to perform image acquisition once every 33 ms.
Illustratively, referring to FIG. 4, after the gesture data of the gesture image of each period has been labeled, suppose the pixel coordinates of a certain hand key point in the (n-2)-th, (n-1)-th, n-th, (n+1)-th and (n+2)-th periods are (x1, y1), (x2, y2), (x3, y3), (x4, y4) and (x5, y5), respectively. Filtering is performed on the pixel coordinates (x3, y3) of the hand key point in the n-th period according to the pixel coordinates (x1, y1) of the hand key point in the (n-2)-th period, (x2, y2) in the (n-1)-th period, (x4, y4) in the (n+1)-th period, and (x5, y5) in the (n+2)-th period, and the coordinate value obtained after the filtering is (x'3, y'3). Therefore, the pixel coordinate value of the hand key point in the gesture data of the gesture image acquired in the n-th period can be corrected to (x'3, y'3).
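A minimal sketch of this correction step, assuming a simple mean filter over a centered five-period window (the text does not fix the exact filter type, so the mean is an assumption); the coordinate values are hypothetical.

```python
import numpy as np

def smooth_keypoint(track, n, window=5):
    """Replace the keypoint coordinate of period n by the mean of the
    coordinates in a centered window of neighboring periods."""
    half = window // 2
    neighborhood = track[n - half : n + half + 1]
    return neighborhood.mean(axis=0)

# Hypothetical pixel coordinates of one hand keypoint over five periods;
# the middle sample (period n) has jumped, suggesting a mislabel.
track = np.array([[100.0, 200.0],
                  [102.0, 201.0],
                  [140.0, 230.0],   # outlier at period n
                  [106.0, 203.0],
                  [108.0, 204.0]])

print(smooth_keypoint(track, 2))  # corrected (x'3, y'3) -> [111.2 207.6]
```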
Based on the same inventive concept, as an implementation of the method, the embodiment of the present invention further provides a device for labeling gesture data, where the embodiment of the device corresponds to the embodiment of the method, and for convenience of reading, the embodiment of the present invention does not describe details in the embodiment of the method one by one, but it should be clear that the device for labeling gesture data in the embodiment can correspondingly implement all the details in the embodiment of the method.
An embodiment of the present invention provides a device for labeling gesture data, and fig. 5 is a schematic structural diagram of the device for labeling gesture data, as shown in fig. 5, a device 500 for labeling gesture data includes:
an image acquisition unit 51, configured to acquire a gesture image and a depth image corresponding to the gesture image;
a keypoint identification unit 52 for identifying each hand keypoint in the depth image;
a position obtaining unit 53, configured to obtain position information of each hand key point in the depth image;
a mapping unit 54, configured to map position information of each hand key point in the depth image to position information of each hand key point in the gesture image;
and a labeling unit 55, configured to label the gesture data of the gesture image according to the position information of each hand key point in the gesture image.
As an optional implementation manner of the embodiment of the present invention, the keypoint identifying unit 52 is specifically configured to identify each hand keypoint in the depth image through a keypoint identification model;
the key point recognition model is a model obtained by training a neural network model based on sample data, and the sample data comprises a plurality of depth images marked with key points of hands.
As an optional implementation manner of the embodiment of the present invention, the image capturing unit 51 is specifically configured to obtain the gesture image by performing image sampling by using a first image capturing device, and obtain the depth image by performing image sampling by using a second image capturing device synchronously.
As an optional implementation manner of the embodiment of the present invention,
the position information of each hand key point in the depth image is the pixel coordinate of each hand key point in the depth image;
the position information of each hand key point in the gesture image is the pixel coordinate of each hand key point in the gesture image.
As an optional implementation manner of the embodiment of the present invention, the mapping unit 54 is specifically configured to: calibrate the first image acquisition device to acquire an internal parameter of the first image acquisition device and an external parameter of the first image acquisition device; calibrate the second image acquisition device to acquire an internal parameter of the second image acquisition device and an external parameter of the second image acquisition device; acquire a rotation parameter and a position parameter according to the external parameter of the first image acquisition device and the external parameter of the second image acquisition device, wherein the rotation parameter is used for representing the rotation relation between the second image acquisition device and the first image acquisition device, and the position parameter is used for representing the position relation between the second image acquisition device and the first image acquisition device; and acquire pixel coordinates of each hand key point in the gesture image according to the pixel coordinates of each hand key point in the depth image, the internal parameter of the first image acquisition device, the internal parameter of the second image acquisition device, the rotation parameter, the position parameter, and the depth value of each hand key point.
As an optional implementation manner of this embodiment of the present invention, the mapping unit 54 is specifically configured to obtain, according to a pixel coordinate of each hand keypoint in the depth image, an internal parameter of the first image capturing device, an internal parameter of the second image capturing device, the rotation parameter, the position parameter, and a depth value of each hand keypoint, and the following formula, a pixel coordinate of each hand keypoint in the gesture image:
p1 = R1 * R * R2^-1 * p2 + R1 * T * z_h^-1

wherein p1 is the pixel coordinate of the hand key point in the gesture image, R1 is the internal parameter of the first image acquisition device, R is the rotation parameter, R2 is the internal parameter of the second image acquisition device, p2 is the pixel coordinate of the hand key point in the depth image, T is the position parameter, and z_h is the depth value of the hand key point.
As an alternative implementation manner of the embodiment of the present invention, referring to fig. 6, the device 500 for labeling gesture data further includes: a correction unit 56;
the image acquisition unit 51 is further configured to periodically acquire the gesture image and the depth image with a preset duration as a period;
the labeling unit 55 is further configured to label gesture data of the gesture image acquired in each period;
The correction unit 56 is configured to perform smoothing processing on gesture data of the gesture image acquired in each period.
As an optional implementation manner of the embodiment of the present invention, the image acquisition unit 51 is further configured to build a data acquisition thread before performing image acquisition by using a first image acquisition device to acquire a gesture image and performing image acquisition by using a second image acquisition device to acquire a depth image corresponding to the gesture image;
the data acquisition thread is used for controlling the first image acquisition device and the second image acquisition device to synchronously acquire images.
The gesture data labeling device provided in this embodiment may execute the gesture data labeling method provided in the above method embodiment, and its implementation principle and technical effects are similar, and are not repeated here.
Based on the same inventive concept, the embodiment of the invention also provides electronic equipment. Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, as shown in fig. 7, where the electronic device provided in this embodiment includes: a memory 701 and a processor 702, the memory 701 for storing a computer program; the processor 702 is configured to execute the method for labeling gesture data provided in the foregoing embodiment when a computer program is invoked.
Based on the same inventive concept, the embodiment of the present invention further provides a computer readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computing device is caused to implement the method for labeling gesture data provided in the foregoing embodiment.
Based on the same inventive concept, the embodiment of the present invention further provides a computer program product, which when running on a computer, causes the computing device to implement the method for labeling gesture data provided in the above embodiment.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein.
The processor may be a central processing unit (Central Processing Unit, CPU), but may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM) and/or nonvolatile memory, etc., such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable storage media. A storage medium may implement information storage by any method or technology, and the information may be computer-readable instructions, data structures, program modules, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media) such as modulated data signals and carrier waves.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.
Claims (12)
1. A method for labeling gesture data, comprising:
acquiring a gesture image and a depth image corresponding to the gesture image;
identifying each hand keypoint in the depth image;
acquiring position information of each hand key point in the depth image;
mapping the position information of each hand key point in the depth image into the position information of each hand key point in the gesture image;
and marking the gesture data of the gesture image according to the position information of each hand key point in the gesture image.
2. The method of claim 1, wherein the identifying individual hand keypoints in the depth image comprises:
Identifying each hand key point in the depth image through a key point identification model;
the key point recognition model is a model obtained by training a neural network model based on sample data, and the sample data comprises a plurality of depth images marked with key points of hands.
3. The method of claim 1, wherein the acquiring the gesture image and the corresponding depth image of the gesture image comprises:
and acquiring the gesture image by performing image sampling through a first image acquisition device, and synchronously acquiring the depth image by performing image sampling through a second image acquisition device.
4. The method according to claim 3, wherein:
the position information of each hand key point in the depth image is the pixel coordinate of each hand key point in the depth image;
the position information of each hand key point in the gesture image is the pixel coordinate of each hand key point in the gesture image.
5. A method according to claim 3, wherein mapping the position information of each hand keypoint in the depth image to the position information of each hand keypoint in the gesture image comprises:
Calibrating the first image acquisition device to acquire an inner parameter of the first image acquisition device and an outer parameter of the first image acquisition device;
calibrating the second image acquisition device to acquire the internal parameters of the second image acquisition device and the external parameters of the second image acquisition device;
acquiring a rotation parameter and a position parameter according to the external parameter of the first image acquisition device and the external parameter of the second image acquisition device, wherein the rotation parameter is used for representing the rotation relation between the second image acquisition device and the first image acquisition device, and the position parameter is used for representing the position relation between the second image acquisition device and the first image acquisition device;
and acquiring pixel coordinates of each hand key point in the gesture image according to the pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters and the depth values of each hand key point.
6. The method according to claim 5, wherein the acquiring the pixel coordinates of each hand keypoint in the gesture image according to the pixel coordinates of each hand keypoint in the depth image, the internal parameters of the first image capturing device, the internal parameters of the second image capturing device, the rotation parameters, the position parameters, and the depth value of each hand keypoint comprises:
Acquiring pixel coordinates of each hand key point in the gesture image according to pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters, the depth values of each hand key point and the following formula:
p1 = R1 * R * R2^-1 * p2 + R1 * T * z_h^-1

wherein p1 is the pixel coordinate of the hand key point in the gesture image, R1 is the internal parameter of the first image acquisition device, R is the rotation parameter, R2 is the internal parameter of the second image acquisition device, p2 is the pixel coordinate of the hand key point in the depth image, T is the position parameter, and z_h is the depth value of the hand key point.
7. The method according to any one of claims 1-6, further comprising:
taking a preset time length as a period, periodically acquiring the gesture image and the depth image, and marking gesture data of the gesture image acquired in each period;
and carrying out smoothing processing on gesture data of the gesture images acquired in each period.
8. The method of any of claims 3-6, wherein prior to acquiring a gesture image by image acquisition by a first image acquisition device and synchronizing image acquisition by a second image acquisition device to acquire a depth image corresponding to the gesture image, the method further comprises:
Constructing a data acquisition thread; the data acquisition thread is used for controlling the first image acquisition device and the second image acquisition device to synchronously acquire images.
9. A device for labeling gesture data, comprising:
the image acquisition unit is used for acquiring a gesture image and a depth image corresponding to the gesture image;
a key point identification unit, configured to identify each hand key point in the depth image;
the position acquisition unit is used for acquiring the position information of each hand key point in the depth image;
the mapping unit is used for mapping the position information of each hand key point in the depth image into the position information of each hand key point in the gesture image;
and the labeling unit is used for labeling the gesture data of the gesture image according to the position information of each hand key point in the gesture image.
10. An electronic device, comprising: a memory and a processor, the memory for storing a computer program; the processor is configured to cause the electronic device to implement the method for labeling gesture data according to any one of claims 1-8 when the computer program is invoked.
11. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program, which when executed by a computing device, causes the computing device to implement the method of labeling gesture data according to any of claims 1-8.
12. A computer program product, characterized in that the computer program product, when run on a computer, causes the computer to carry out the method of labeling gesture data according to any one of claims 1-8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111579908.9A CN116363741A (en) | 2021-12-22 | 2021-12-22 | Gesture data labeling method and device |
PCT/CN2022/139979 WO2023116620A1 (en) | 2021-12-22 | 2022-12-19 | Gesture data annotation method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111579908.9A CN116363741A (en) | 2021-12-22 | 2021-12-22 | Gesture data labeling method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116363741A true CN116363741A (en) | 2023-06-30 |
Family
ID=86901345
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111579908.9A Pending CN116363741A (en) | 2021-12-22 | 2021-12-22 | Gesture data labeling method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116363741A (en) |
WO (1) | WO2023116620A1 (en) |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109934065B (en) * | 2017-12-18 | 2021-11-09 | 虹软科技股份有限公司 | Method and device for gesture recognition |
CN109710071B (en) * | 2018-12-26 | 2022-05-17 | 青岛小鸟看看科技有限公司 | Screen control method and device |
CN111815754B (en) * | 2019-04-12 | 2023-05-30 | Oppo广东移动通信有限公司 | Three-dimensional information determining method, three-dimensional information determining device and terminal equipment |
CN112150448B (en) * | 2020-09-28 | 2023-09-26 | 杭州海康威视数字技术股份有限公司 | Image processing method, device and equipment and storage medium |
CN112613384B (en) * | 2020-12-18 | 2023-09-19 | 安徽鸿程光电有限公司 | Gesture recognition method, gesture recognition device and control method of interactive display equipment |
CN114882524A (en) * | 2022-04-15 | 2022-08-09 | 华南理工大学 | Monocular three-dimensional gesture estimation method based on full convolution neural network |
- 2021-12-22: CN CN202111579908.9A patent/CN116363741A/en active Pending
- 2022-12-19: WO PCT/CN2022/139979 patent/WO2023116620A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
WO2023116620A1 (en) | 2023-06-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110827247B (en) | Label identification method and device | |
CN109740670B (en) | Video classification method and device | |
CN107633526B (en) | Image tracking point acquisition method and device and storage medium | |
CN102831386B (en) | Object identification method and recognition device | |
CN111354042A (en) | Method and device for extracting features of robot visual image, robot and medium | |
CN111612834B (en) | Method, device and equipment for generating target image | |
CN109919971B (en) | Image processing method, image processing device, electronic equipment and computer readable storage medium | |
CN109299663A (en) | Hand-written script recognition methods, system and terminal device | |
CN110956131B (en) | Single-target tracking method, device and system | |
CN108765532B (en) | Child drawing model building method, reading robot and storage device | |
CN110852954B (en) | Image inclination correction method and system for pointer instrument | |
CN111738036A (en) | Image processing method, device, equipment and storage medium | |
CN112102404B (en) | Object detection tracking method and device and head-mounted display equipment | |
CN112307786B (en) | Batch positioning and identifying method for multiple irregular two-dimensional codes | |
CN108447092B (en) | Method and device for visually positioning marker | |
CN115937003A (en) | Image processing method, image processing device, terminal equipment and readable storage medium | |
CN111758118B (en) | Visual positioning method, device, equipment and readable storage medium | |
CN109558505A (en) | Visual search method, apparatus, computer equipment and storage medium | |
CN116363741A (en) | Gesture data labeling method and device | |
CN111281355B (en) | Method and equipment for determining pulse acquisition position | |
CN110910478B (en) | GIF map generation method and device, electronic equipment and storage medium | |
CN117253022A (en) | Object identification method, device and inspection equipment | |
CN106030658B (en) | For determining the method and device in the orientation of video | |
CN110610184B (en) | Method, device and equipment for detecting salient targets of images | |
CN109816709B (en) | Monocular camera-based depth estimation method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||