CN116363741A - Gesture data labeling method and device - Google Patents

Gesture data labeling method and device Download PDF

Info

Publication number
CN116363741A
CN116363741A (application CN202111579908.9A)
Authority
CN
China
Prior art keywords
image
gesture
key point
acquisition device
image acquisition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111579908.9A
Other languages
Chinese (zh)
Inventor
孙飞
余海桃
吴涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd filed Critical Beijing Zitiao Network Technology Co Ltd
Priority to CN202111579908.9A priority Critical patent/CN116363741A/en
Priority to PCT/CN2022/139979 priority patent/WO2023116620A1/en
Publication of CN116363741A publication Critical patent/CN116363741A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiment of the invention provides a method and a device for labeling gesture data, relating to the technical field of image processing. The method comprises the following steps: acquiring a gesture image and a depth image corresponding to the gesture image; identifying each hand key point in the depth image; acquiring position information of each hand key point in the depth image; mapping the position information of each hand key point in the depth image to position information of each hand key point in the gesture image; and labeling the gesture data of the gesture image according to the position information of each hand key point in the gesture image. The embodiment of the invention addresses the high cost, low efficiency, large error and susceptibility to subjective influence of manually labeling gesture images.

Description

Gesture data labeling method and device
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for labeling gesture data.
Background
Gestures have the advantages of being intuitive, easy to understand, and unrestricted by environment or language. Simple gestures can express basic intentions clearly and without ambiguity, which makes gestures a very important part of daily communication.
With the rapid development of computer vision technology, controlling smart devices through gestures has become an important control mode. The general principle is as follows: a gesture recognition technique recognizes the user's gesture, and the control instruction corresponding to the recognized gesture is executed. Currently, one of the most widely used gesture recognition techniques is based on deep neural network (Deep Neural Network, DNN) models: a deep neural network model is first trained on gesture images labeled with gesture data, and the trained model then performs gesture recognition on newly acquired gesture images. Training such a model requires a large number of gesture images labeled with gesture data as training samples, and in the prior art these training samples are obtained by labeling the gesture images manually. However, manually labeling gesture images brings high cost, low efficiency, large error, and susceptibility to subjective influence.
Disclosure of Invention
In view of the above, the present invention provides a method and a device for labeling gesture data, which address the high cost, low efficiency, large error and susceptibility to subjective influence of manually labeling gesture images.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
in a first aspect, an embodiment of the present invention provides a method for labeling gesture data, including:
acquiring a gesture image and a depth image corresponding to the gesture image;
identifying each hand keypoint in the depth image;
acquiring position information of each hand key point in the depth image;
mapping the position information of each hand key point in the depth image into the position information of each hand key point in the gesture image;
and marking the gesture data of the gesture image according to the position information of each hand key point in the gesture image.
As an optional implementation manner of the embodiment of the present invention, the identifying each hand keypoint in the depth image includes:
identifying each hand key point in the depth image through a key point identification model;
the key point recognition model is a model obtained by training a neural network model based on sample data, and the sample data comprises a plurality of depth images marked with key points of hands.
As an optional implementation manner of the embodiment of the present invention, the acquiring the gesture image and the depth image corresponding to the gesture image includes:
performing image sampling through a first image acquisition device to acquire the gesture image, and synchronously performing image sampling through a second image acquisition device to acquire the depth image.
As an optional implementation manner of the embodiment of the present invention, the position information of each hand key point in the depth image is a pixel coordinate of each hand key point in the depth image;
the position information of each hand key point in the gesture image is the pixel coordinate of each hand key point in the gesture image.
As an optional implementation manner of the embodiment of the present invention, mapping the position information of each hand key point in the depth image to the position information of each hand key point in the gesture image includes:
calibrating the first image acquisition device to acquire the internal parameters of the first image acquisition device and the external parameters of the first image acquisition device;
calibrating the second image acquisition device to acquire the internal parameters of the second image acquisition device and the external parameters of the second image acquisition device;
acquiring a rotation parameter and a position parameter according to the external parameter of the first image acquisition device and the external parameter of the second image acquisition device, wherein the rotation parameter is used for representing the rotation relation between the second image acquisition device and the first image acquisition device, and the position parameter is used for representing the position relation between the second image acquisition device and the first image acquisition device;
and acquiring pixel coordinates of each hand key point in the gesture image according to the pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters and the depth values of each hand key point.
As an optional implementation manner of the embodiment of the present invention, the obtaining, according to the pixel coordinates of each hand key point in the depth image, the internal parameter of the first image capturing device, the internal parameter of the second image capturing device, the rotation parameter, the position parameter, and the depth value of each hand key point, the pixel coordinates of each hand key point in the gesture image includes:
acquiring pixel coordinates of each hand key point in the gesture image according to pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters, the depth values of each hand key point and the following formula:
p₁ = R₁·R·R₂⁻¹·p₂ + R₁·T·zₕ⁻¹
wherein p₁ is the pixel coordinate of the hand key point in the gesture image, R₁ is the internal parameter of the first image acquisition device, R is the rotation parameter, R₂ is the internal parameter of the second image acquisition device, p₂ is the pixel coordinate of the hand key point in the depth image, T is the position parameter, and zₕ is the depth value of the hand key point.
As an optional implementation manner of the embodiment of the present invention, the method further includes:
taking a preset time length as a period, periodically acquiring the gesture image and the depth image, and marking gesture data of the gesture image acquired in each period;
and carrying out smoothing processing on gesture data of the gesture images acquired in each period.
As an optional implementation manner of the embodiment of the present invention, before the gesture image is acquired through the first image acquisition device and the depth image corresponding to the gesture image is acquired through the second image acquisition device, the method further includes:
constructing a data acquisition thread; the data acquisition thread is used for controlling the first image acquisition device and the second image acquisition device to synchronously acquire images.
In a second aspect, an embodiment of the present invention provides a device for labeling gesture data, including:
the image acquisition unit is used for acquiring a gesture image and a depth image corresponding to the gesture image;
a key point identification unit, configured to identify each hand key point in the depth image;
the position acquisition unit is used for acquiring the position information of each hand key point in the depth image;
the mapping unit is used for mapping the position information of each hand key point in the depth image into the position information of each hand key point in the gesture image;
and the labeling unit is used for labeling the gesture data of the gesture image according to the position information of each hand key point in the gesture image.
As an optional implementation manner of the embodiment of the present invention, the keypoint identification unit is specifically configured to identify each hand keypoint in the depth image through a keypoint identification model;
the key point recognition model is a model obtained by training a neural network model based on sample data, and the sample data comprises a plurality of depth images marked with key points of hands.
As an optional implementation manner of the embodiment of the present invention, the image acquisition unit is specifically configured to acquire the gesture image by performing image sampling through a first image acquisition device, and acquire the depth image by performing image sampling through a second image acquisition device.
As an optional implementation manner of the embodiment of the present invention,
the position information of each hand key point in the depth image is the pixel coordinate of each hand key point in the depth image;
the position information of each hand key point in the gesture image is the pixel coordinate of each hand key point in the gesture image.
As an optional implementation manner of the embodiment of the present invention, the mapping unit is specifically configured to calibrate the first image capturing device, and obtain an internal parameter of the first image capturing device and an external parameter of the first image capturing device; calibrating the second image acquisition device to acquire the internal parameters of the second image acquisition device and the external parameters of the second image acquisition device; acquiring a rotation parameter and a position parameter according to the external parameter of the first image acquisition device and the external parameter of the second image acquisition device, wherein the rotation parameter is used for representing the rotation relation between the second image acquisition device and the first image acquisition device, and the position parameter is used for representing the position relation between the second image acquisition device and the first image acquisition device; and acquiring pixel coordinates of each hand key point in the gesture image according to the pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters and the depth values of each hand key point.
As an optional implementation manner of the embodiment of the present invention, the mapping unit is specifically configured to obtain, according to a pixel coordinate of each hand key point in the depth image, an internal parameter of the first image capturing device, an internal parameter of the second image capturing device, the rotation parameter, the position parameter, and a depth value of each hand key point, and the following formula, a pixel coordinate of each hand key point in the gesture image:
p₁ = R₁·R·R₂⁻¹·p₂ + R₁·T·zₕ⁻¹
wherein p₁ is the pixel coordinate of the hand key point in the gesture image, R₁ is the internal parameter of the first image acquisition device, R is the rotation parameter, R₂ is the internal parameter of the second image acquisition device, p₂ is the pixel coordinate of the hand key point in the depth image, T is the position parameter, and zₕ is the depth value of the hand key point.
As an optional implementation manner of the embodiment of the present invention, the device for labeling gesture data further includes: a correction unit;
the image acquisition unit is further used for periodically acquiring the gesture image and the depth image by taking a preset duration as a period;
the labeling unit is further used for labeling gesture data of the gesture images acquired in each period;
And the correction unit is used for smoothing the gesture data of the gesture image acquired in each period.
As an optional implementation manner of the embodiment of the present invention, the image acquisition unit is further configured to build a data acquisition thread before performing image acquisition by using the first image acquisition device to acquire a gesture image, and performing image acquisition by using the second image acquisition device to acquire a depth image corresponding to the gesture image;
the data acquisition thread is used for controlling the first image acquisition device and the second image acquisition device to synchronously acquire images.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory and a processor, the memory for storing a computer program; the processor is configured to, when invoking a computer program, cause the electronic device to implement the method for labeling gesture data according to the first aspect or any optional implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a computing device, causing the computing device to implement the method for labeling gesture data according to the first aspect or any optional implementation manner of the first aspect.
In a fifth aspect, embodiments of the present invention provide a computer program product, which when run on a computer causes the computer to implement the method for labeling gesture data according to the first aspect or any of the optional embodiments of the first aspect.
The method for labeling gesture data provided by the embodiment of the invention comprises the following steps: firstly, acquiring a gesture image and a depth image corresponding to the gesture image; then identifying each hand key point in the depth image and acquiring the position information of each hand key point in the depth image; next, mapping the position information of each hand key point in the depth image into the position information of each hand key point in the gesture image; and finally, labeling the gesture data of the gesture image according to the position information of each hand key point in the gesture image. Because the embodiment of the invention identifies each hand key point in the depth image, acquires its position information there, maps that position information into the gesture image, and labels the gesture data of the gesture image from the mapped positions, it provides a fully automatic gesture data labeling method in which no manual operation is needed during the labeling process. The embodiment of the invention can therefore solve the problems of high cost, low efficiency, large error and susceptibility to subjective influence caused by manually labeling gesture images.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a scene structure diagram of a method for labeling gesture data according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of a method for labeling gesture data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a hand key point provided in an embodiment of the present invention;
FIG. 4 is a schematic diagram of smoothing processing according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a device for labeling gesture data according to an embodiment of the present invention;
FIG. 6 is a second schematic diagram of a device for labeling gesture data according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the invention will be more clearly understood, a further description of the invention will be made. It should be noted that, without conflict, the embodiments of the present invention and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the invention.
In embodiments of the invention, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described as "exemplary" or "such as" in the embodiments of the present invention should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion. Furthermore, in the description of the embodiments of the present invention, unless otherwise indicated, "plurality" means two or more.
The following describes a scene architecture of a labeling method for gesture data according to an embodiment of the present invention. Referring to fig. 1, a scene architecture of a method for labeling gesture data according to an embodiment of the present invention includes: a two-dimensional image acquisition device 11, a depth image acquisition device 12, and an image processing device 13. The two-dimensional image acquisition device 11 is used for acquiring gesture images to be annotated, and transmitting the gesture images to be annotated to the image processing device 13. The depth image capturing device 12 is configured to capture an image of an image capturing object of the two-dimensional image capturing device 11 at the same time, obtain a depth image corresponding to a gesture image to be annotated, and transmit the depth image to the image processing device 13. The image processing device 13 is configured to identify each hand key point in the depth image, thereby acquiring position information of each hand key point in the depth image, then map the position information of each hand key point in the depth image to position information of each hand key point in the gesture image, and mark gesture data of the gesture image according to the position information of each hand key point in the gesture image.
It should be noted that, in the scene architecture shown in fig. 1, the two-dimensional image acquisition device 11, the depth image acquisition device 12, and the image processing device 13 are separate hardware devices. However, the embodiment of the present invention is not limited thereto: two or all of the two-dimensional image acquisition device 11, the depth image acquisition device 12, and the image processing device 13 may be integrated into the same hardware device. For example: the two-dimensional image acquisition device 11 is integrated in the same hardware device as the image processing device 13.
The embodiment of the invention provides a method for labeling gesture data, which is shown by referring to fig. 2, and comprises the following steps S11 to S15:
s11, acquiring a gesture image and a depth image corresponding to the gesture image.
As an optional implementation manner of the embodiment of the present invention, the acquiring the gesture image and the depth image corresponding to the gesture image includes:
and acquiring the gesture image through the first image acquisition device, and synchronously acquiring the depth image corresponding to the gesture image through the second image acquisition device.
Optionally, the first image acquisition device in the embodiment of the present invention may be an image acquisition device of a Virtual Reality (VR) or Augmented Reality (AR) device; the second image acquisition device may be a depth camera.
Further, in order to enable the first image acquisition device and the second image acquisition device to capture images at the same time, before step S11 (acquiring the gesture image through the first image acquisition device and acquiring the depth image corresponding to the gesture image through the second image acquisition device), the method for labeling gesture data according to the embodiment of the present invention further includes: constructing a data acquisition thread. The data acquisition thread is used for controlling the first image acquisition device and the second image acquisition device to synchronously acquire images.
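A minimal sketch of such a data acquisition thread is shown below. It assumes hypothetical `rgb_camera.read()` and `depth_camera.read()` interfaces and a `handle_pair` callback; the patent does not name a concrete camera API.

```python
import threading
import time

def acquisition_loop(rgb_camera, depth_camera, on_frame_pair,
                     period_s=0.033, stop_event=None):
    """Poll both cameras in lockstep so that every gesture image has a depth
    image captured at (approximately) the same instant."""
    stop_event = stop_event or threading.Event()
    while not stop_event.is_set():
        t0 = time.monotonic()
        gesture_image = rgb_camera.read()   # hypothetical API: HxWx3 color frame
        depth_image = depth_camera.read()   # hypothetical API: HxW depth map
        on_frame_pair(gesture_image, depth_image)
        # sleep out the remainder of the period (33 ms by default, matching
        # the acquisition period described later in the text)
        time.sleep(max(0.0, period_s - (time.monotonic() - t0)))

# started once before labeling begins:
# thread = threading.Thread(target=acquisition_loop,
#                           args=(rgb_cam, depth_cam, handle_pair), daemon=True)
# thread.start()
```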
S12, identifying each hand key point in the depth image.
Illustratively, referring to fig. 3, the hand key points in the embodiment of the present invention may include the fingertips of the five fingers of the hand (5 key points) and all joints of the hand (16 key points), for a total of 21 key points.
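For concreteness, the 21 key points could be enumerated as follows; the names are illustrative assumptions, since the patent identifies the points only by count and by fig. 3.

```python
FINGERS = ["thumb", "index", "middle", "ring", "pinky"]

# 16 joints (the wrist plus 3 joints per finger) and 5 fingertips
# give the 21 hand key points counted in the text.
HAND_KEYPOINTS = (
    ["wrist"]
    + [f"{finger}_{joint}" for finger in FINGERS for joint in ("mcp", "pip", "dip")]
    + [f"{finger}_tip" for finger in FINGERS]
)
assert len(HAND_KEYPOINTS) == 21
```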
As an optional implementation manner of the embodiment of the present invention, the step S12 (identifying each hand key point in the depth image) includes:
identifying each hand key point in the depth image through a key point identification model;
the key point recognition model is a model obtained by training a neural network model based on sample data, and the sample data comprises a plurality of depth images marked with key points of hands.
That is, a neural network model for identifying hand keypoints in the depth image is trained in advance, and each hand keypoint in the depth image is acquired through the trained neural network model.
Of course, in the embodiment of the present invention, each hand key point in the depth image may also be identified in other manners; the embodiment of the present invention is not limited in this respect, as long as each hand key point in the depth image can be accurately identified.
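As an illustration only, the key point identification model could take a form such as the following toy network, which regresses the pixel coordinates of 21 key points from a depth map. The architecture is an assumption; the patent only requires a neural network model trained on depth images labeled with hand key points.

```python
import torch
import torch.nn as nn

class DepthKeypointNet(nn.Module):
    """Toy stand-in for the key point identification model: a small CNN that
    regresses the pixel coordinates of 21 hand key points from a depth image."""
    def __init__(self, num_keypoints=21):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_keypoints * 2)

    def forward(self, depth):                # depth: (B, 1, H, W)
        x = self.features(depth).flatten(1)  # (B, 32)
        return self.head(x).view(-1, self.num_keypoints, 2)  # (B, 21, 2)

# inference on one depth image tensor of shape (1, 1, H, W):
# model.eval()
# with torch.no_grad():
#     keypoints_px = model(depth_tensor)  # pixel coordinates of the 21 key points
```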
S13, acquiring position information of each hand key point in the depth image.
As an optional implementation manner of the embodiment of the present invention, the position information of each hand key point in the depth image is a pixel coordinate of each hand key point in the depth image.
Specifically, the pixel coordinates of each hand key point in the depth image may be determined according to the pixel points corresponding to each hand key point in the depth image.
S14, mapping the position information of each hand key point in the depth image into the position information of each hand key point in the gesture image.
As an optional implementation manner of the embodiment of the present invention, the position information of each hand key point in the gesture image is the pixel coordinate of each hand key point in the gesture image.
As an optional implementation manner of the embodiment of the present invention, the implementation manner of the step S14 (mapping the position information of each hand key point in the depth image to the position information of each hand key point in the gesture image) may include the following steps a to d:
Step a: calibrating the first image acquisition device to acquire the internal parameters of the first image acquisition device and the external parameters of the first image acquisition device.
Specifically, in the image measurement process and the machine vision application, in order to determine the correlation between the three-dimensional geometric position of a point on the surface of a space object and the corresponding pixel point in the image, a geometric model of camera imaging must be established, wherein parameters of the geometric model are an internal parameter and an external parameter of the camera, and the process of establishing the geometric model of camera imaging is called camera calibration. Wherein, the intrinsic parameters of the camera are parameters related to the characteristics of the camera, including: the focal length, pixel size, luminous flux, etc. of the camera, and the external parameters of the camera are used to describe information such as the position, rotation angle, etc. of the camera in the real world (world coordinate system).
Optionally, calibrating the first image acquisition device may be implemented as follows: a calibration object is photographed by the first image acquisition device, and by establishing the correspondence between points with known coordinates on the calibration object and their pixel points in the image and applying a calibration algorithm, a camera model of the first image acquisition device is obtained, yielding the internal parameters and external parameters of the first image acquisition device.
Step b: calibrating the second image acquisition device to acquire the internal parameters of the second image acquisition device and the external parameters of the second image acquisition device.
The implementation manner of calibrating the second image acquisition device is similar to that of calibrating the first image acquisition device in step a, and is not repeated here.
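For example, both devices could be calibrated from checkerboard views with OpenCV; a sketch under that assumption, where the corner detections are presumed to come from `cv2.findChessboardCorners`:

```python
import cv2
import numpy as np

def calibrate(object_points, image_points, image_size):
    """object_points: list of (N, 3) float32 arrays of known checkerboard
    corner positions; image_points: list of matching (N, 1, 2) float32 arrays
    of detected pixel corners; image_size: (width, height).
    Returns the intrinsic matrix, distortion coefficients, and the per-view
    extrinsics (rotation and translation vectors)."""
    ret, intrinsics, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
        object_points, image_points, image_size, None, None)
    return intrinsics, dist_coeffs, rvecs, tvecs

# applied once per device:
# K1, dist1, rvecs1, tvecs1 = calibrate(obj_pts, img_pts_gesture, (w, h))
# K2, dist2, rvecs2, tvecs2 = calibrate(obj_pts, img_pts_depth, (w, h))
```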
Step c: acquiring a rotation parameter and a position parameter according to the external parameters of the first image acquisition device and the external parameters of the second image acquisition device.
The rotation parameter is used for representing the rotation relation between the second image acquisition device and the first image acquisition device, and the position parameter is used for representing the position relation between the second image acquisition device and the first image acquisition device.
That is, the rotational relationship and the positional relationship between the first image capturing device and the second image capturing device are obtained by the external parameters of the first image capturing device and the external parameters of the second image capturing device.
Illustratively, the external parameters of a camera describe its position and rotation angle in the real world (world coordinate system). Suppose the external parameters of the first image acquisition device give its position in the world coordinate system as (x₁, y₁, z₁) and its rotation angle as α, and the external parameters of the second image acquisition device give its position as (x₂, y₂, z₂) and its rotation angle as β. With T denoting the position parameter and R the rotation parameter:
T = (x₂ - x₁, y₂ - y₁, z₂ - z₁)
R = β - α
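A direct transcription of this simplified one-angle formulation is shown below; in the general case the rotation would be a 3x3 matrix (e.g. R = R₂·R₁ᵀ) rather than a single angle difference.

```python
import numpy as np

def relative_pose(pos1, alpha, pos2, beta):
    """pos1, pos2: world positions (x, y, z) of the two devices taken from
    their external parameters; alpha, beta: their rotation angles.
    Mirrors the patent's simplified formulas T = (x2-x1, y2-y1, z2-z1)
    and R = beta - alpha."""
    T = np.asarray(pos2, dtype=float) - np.asarray(pos1, dtype=float)
    R = beta - alpha
    return R, T
```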
Step d: acquiring pixel coordinates of each hand key point in the gesture image according to the pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters and the depth values of each hand key point.
Further optionally, the implementation manner of step d (obtaining the pixel coordinates of each hand key point in the gesture image according to the pixel coordinates of each hand key point in the depth image, the internal parameter of the first image capturing device, the internal parameter of the second image capturing device, the rotation parameter, the position parameter, and the depth value of each hand key point) includes:
acquiring pixel coordinates of each hand key point in the gesture image according to pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters, the depth values of each hand key point and the following formula:
p₁ = R₁·R·R₂⁻¹·p₂ + R₁·T·zₕ⁻¹
wherein p₁ is the pixel coordinate of the hand key point in the gesture image, R₁ is the internal parameter of the first image acquisition device, R is the rotation parameter, R₂ is the internal parameter of the second image acquisition device, p₂ is the pixel coordinate of the hand key point in the depth image, T is the position parameter, and zₕ is the depth value of the hand key point.
The following describes the implementation principle of acquiring the pixel coordinates of each hand key point in the gesture image according to the pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters, the depth values of each hand key point, and the above formulas.
Suppose a hand key point Pₕ on the object photographed by both the first image acquisition device and the second image acquisition device has the coordinate value (xₕ, yₕ, zₕ) in the world coordinate system whose origin is the position of the second image acquisition device. The hand key point Pₕ images as P₁ in the first image acquisition device and as P₂ in the second image acquisition device. Then:
P₁ = R·P₂ + T (1)
wherein R is the rotation parameter and T is the position parameter.
The pixel coordinate p₁ in the gesture image of P₁, the imaging of the hand key point Pₕ in the first image acquisition device, is:
p₁ = R₁·P₁/zₕ (2)
The pixel coordinate p₂ in the depth image of P₂, the imaging of the hand key point Pₕ in the second image acquisition device, is:
p₂ = R₂·P₂/zₕ (3)
From formulas (1), (2) and (3) it follows that:
p₁ = R₁·(R·P₂ + T)/zₕ = R₁·R·P₂/zₕ + R₁·T/zₕ (4)
And because:
P₂/zₕ = R₂⁻¹·p₂ (5)
substituting formula (5) into formula (4) yields:
p₁ = R₁·R·R₂⁻¹·p₂ + R₁·T/zₕ = R₁·R·R₂⁻¹·p₂ + R₁·T·zₕ⁻¹
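Interpreting R₁ and R₂ as the 3x3 intrinsic matrices of the two devices and the pixel coordinates as homogeneous vectors (u, v, 1)ᵀ, the mapping is a few lines of numpy. This is a sketch under those interpretive assumptions, not the patent's reference implementation:

```python
import numpy as np

def depth_px_to_gesture_px(p2_uv, z_h, K1, K2, R, T):
    """Map a hand key point from depth-image pixel coordinates to
    gesture-image pixel coordinates.
    p2_uv: (u, v) pixel in the depth image; z_h: its depth value;
    K1, K2: 3x3 intrinsic matrices of the gesture and depth devices;
    R: 3x3 rotation and T: length-3 translation between the two devices."""
    p2 = np.array([p2_uv[0], p2_uv[1], 1.0])   # homogeneous pixel coordinate
    P2 = z_h * (np.linalg.inv(K2) @ p2)        # back-project: P2 = z_h * K2^-1 * p2
    P1 = R @ P2 + np.asarray(T, dtype=float)   # formula (1): P1 = R*P2 + T
    # formula (2) divides by z_h, i.e. it assumes the depth value is
    # (approximately) unchanged between the two camera frames; the exact
    # projection would divide by P1[2] instead.
    p1 = K1 @ P1 / z_h
    return p1[:2]                              # (u, v) in the gesture image
```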
s15, marking gesture data of the gesture image according to the position information of each hand key point in the gesture image.
For example, gesture data obtained by labeling gesture data of the gesture image according to the position information of each hand key point in the gesture image may be as shown in table 1 below:
TABLE 1 (reproduced as an image in the original document)
The method for labeling gesture data provided by the embodiment of the invention comprises the following steps: firstly, acquiring a gesture image and a depth image corresponding to the gesture image; then identifying each hand key point in the depth image and acquiring the position information of each hand key point in the depth image; next, mapping the position information of each hand key point in the depth image into the position information of each hand key point in the gesture image; and finally, labeling the gesture data of the gesture image according to the position information of each hand key point in the gesture image. Because the embodiment of the invention identifies each hand key point in the depth image, acquires its position information there, maps that position information into the gesture image, and labels the gesture data of the gesture image from the mapped positions, it provides a fully automatic gesture data labeling method in which no manual operation is needed during the labeling process. The embodiment of the invention can therefore solve the problems of high cost, low efficiency, large error and susceptibility to subjective influence caused by manually labeling gesture images.
As an optional implementation manner of the embodiment of the present invention, the method for labeling gesture data provided by the embodiment of the present invention further includes:
taking a preset time length as a period, periodically acquiring the gesture image and the depth image, and marking gesture data of the gesture image acquired in each period;
and carrying out smoothing processing on gesture data of the gesture images acquired in each period.
That is, the first image acquisition device and the second image acquisition device are controlled to acquire images once per fixed time interval, the gesture data of the gesture image acquired in each period are labeled by the labeling method provided by the embodiment shown in fig. 2, and the labeled gesture data are then smoothed for each hand key point across periods.
In general, the position of each hand key point changes continuously and smoothly across neighboring periods. If the position of a hand key point in the gesture data fluctuates sharply and is clearly not smooth across neighboring periods, it is likely the result of mislabeling. By additionally smoothing the gesture data of the gesture images acquired in each period, the above embodiment makes the labeled gesture data more accurate.
The preset duration may be, for example, 33 ms; that is, the first image acquisition device and the second image acquisition device are controlled to acquire images once every 33 ms.
For example, referring to fig. 4, suppose that after the gesture data of each period have been labeled, the pixel coordinates of a certain hand key point in the (n-2)-th, (n-1)-th, n-th, (n+1)-th and (n+2)-th periods are (x₁, y₁), (x₂, y₂), (x₃, y₃), (x₄, y₄) and (x₅, y₅) respectively. The coordinates (x₃, y₃) of the hand key point in the n-th period are filtered according to its coordinates (x₁, y₁) in the (n-2)-th period, (x₂, y₂) in the (n-1)-th period, (x₄, y₄) in the (n+1)-th period and (x₅, y₅) in the (n+2)-th period, yielding filtered coordinates (x₃′, y₃′). The pixel coordinates of the hand key point in the gesture data of the gesture image acquired in the n-th period can therefore be corrected to (x₃′, y₃′).
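The neighborhood filtering of fig. 4 could be realized, for instance, as a centered moving average over a five-period window; the choice of filter is an assumption, since the patent does not fix one.

```python
import numpy as np

def smooth_keypoint_track(coords, window=5):
    """coords: (num_periods, 2) array holding one hand key point's pixel
    coordinates, one row per acquisition period. Replaces each interior
    position with the mean of its centered window (edges are kept as-is)."""
    coords = np.asarray(coords, dtype=float)
    out = coords.copy()
    half = window // 2
    for n in range(half, len(coords) - half):
        out[n] = coords[n - half:n + half + 1].mean(axis=0)
    return out

# e.g. the n-th period's (x3, y3) becomes (x3', y3'), the mean over periods
# n-2 .. n+2, and the labeled gesture data is corrected accordingly.
```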
Based on the same inventive concept, as an implementation of the method, the embodiment of the present invention further provides a device for labeling gesture data, where the embodiment of the device corresponds to the embodiment of the method, and for convenience of reading, the embodiment of the present invention does not describe details in the embodiment of the method one by one, but it should be clear that the device for labeling gesture data in the embodiment can correspondingly implement all the details in the embodiment of the method.
An embodiment of the present invention provides a device for labeling gesture data, and fig. 5 is a schematic structural diagram of the device for labeling gesture data, as shown in fig. 5, a device 500 for labeling gesture data includes:
an image acquisition unit 51, configured to acquire a gesture image and a depth image corresponding to the gesture image;
a keypoint identification unit 52 for identifying each hand keypoint in the depth image;
a position obtaining unit 53, configured to obtain position information of each hand key point in the depth image;
a mapping unit 54, configured to map position information of each hand key point in the depth image to position information of each hand key point in the gesture image;
and the labeling unit 55 is used for labeling the gesture data of the gesture image according to the position information of each hand key point in the gesture image.
As an optional implementation manner of the embodiment of the present invention, the keypoint identifying unit 52 is specifically configured to identify each hand keypoint in the depth image through a keypoint identification model;
the key point recognition model is a model obtained by training a neural network model based on sample data, and the sample data comprises a plurality of depth images marked with key points of hands.
As an optional implementation manner of the embodiment of the present invention, the image capturing unit 51 is specifically configured to obtain the gesture image by performing image sampling by using a first image capturing device, and obtain the depth image by performing image sampling by using a second image capturing device synchronously.
As an optional implementation manner of the embodiment of the present invention,
the position information of each hand key point in the depth image is the pixel coordinate of each hand key point in the depth image;
the position information of each hand key point in the gesture image is the pixel coordinate of each hand key point in the gesture image.
As an optional implementation manner of the embodiment of the present invention, the mapping unit 54 is specifically configured to: calibrate the first image acquisition device, acquiring the internal parameters of the first image acquisition device and the external parameters of the first image acquisition device; calibrate the second image acquisition device, acquiring the internal parameters of the second image acquisition device and the external parameters of the second image acquisition device; acquire a rotation parameter and a position parameter according to the external parameters of the first image acquisition device and the external parameters of the second image acquisition device, wherein the rotation parameter is used for representing the rotation relation between the second image acquisition device and the first image acquisition device, and the position parameter is used for representing the position relation between the second image acquisition device and the first image acquisition device; and acquire pixel coordinates of each hand key point in the gesture image according to the pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters and the depth values of each hand key point.
As an optional implementation manner of this embodiment of the present invention, the mapping unit 54 is specifically configured to obtain, according to a pixel coordinate of each hand keypoint in the depth image, an internal parameter of the first image capturing device, an internal parameter of the second image capturing device, the rotation parameter, the position parameter, and a depth value of each hand keypoint, and the following formula, a pixel coordinate of each hand keypoint in the gesture image:
p₁ = R₁·R·R₂⁻¹·p₂ + R₁·T·zₕ⁻¹
wherein p₁ is the pixel coordinate of the hand key point in the gesture image, R₁ is the internal parameter of the first image acquisition device, R is the rotation parameter, R₂ is the internal parameter of the second image acquisition device, p₂ is the pixel coordinate of the hand key point in the depth image, T is the position parameter, and zₕ is the depth value of the hand key point.
As an alternative implementation manner of the embodiment of the present invention, referring to fig. 6, the device 500 for labeling gesture data further includes: a correction unit 56;
the image acquisition unit 51 is further configured to periodically acquire the gesture image and the depth image with a preset duration as a period;
the labeling unit 55 is further configured to label gesture data of the gesture image acquired in each period;
The correction unit 56 is configured to perform smoothing processing on gesture data of the gesture image acquired in each period.
As an optional implementation manner of the embodiment of the present invention, the image acquisition unit 51 is further configured to build a data acquisition thread before performing image acquisition by using a first image acquisition device to acquire a gesture image and performing image acquisition by using a second image acquisition device to acquire a depth image corresponding to the gesture image;
the data acquisition thread is used for controlling the first image acquisition device and the second image acquisition device to synchronously acquire images.
The gesture data labeling device provided in this embodiment may execute the gesture data labeling method provided in the above method embodiment, and its implementation principle and technical effects are similar, and are not repeated here.
Based on the same inventive concept, the embodiment of the invention also provides electronic equipment. Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, as shown in fig. 7, where the electronic device provided in this embodiment includes: a memory 701 and a processor 702, the memory 701 for storing a computer program; the processor 702 is configured to execute the method for labeling gesture data provided in the foregoing embodiment when a computer program is invoked.
Based on the same inventive concept, the embodiment of the present invention further provides a computer readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computing device is caused to implement the method for labeling gesture data provided in the foregoing embodiment.
Based on the same inventive concept, the embodiment of the present invention further provides a computer program product, which when running on a computer, causes the computing device to implement the method for labeling gesture data provided in the above embodiment.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein.
The processor may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may include volatile memory in a computer-readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable storage media. A storage medium may implement information storage by any method or technology, and the information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (12)

1. A method for labeling gesture data, comprising:
acquiring a gesture image and a depth image corresponding to the gesture image;
identifying each hand keypoint in the depth image;
acquiring position information of each hand key point in the depth image;
mapping the position information of each hand key point in the depth image into the position information of each hand key point in the gesture image;
and marking the gesture data of the gesture image according to the position information of each hand key point in the gesture image.
2. The method of claim 1, wherein the identifying individual hand keypoints in the depth image comprises:
identifying each hand key point in the depth image through a key point identification model;
the key point recognition model is a model obtained by training a neural network model based on sample data, and the sample data comprises a plurality of depth images marked with key points of hands.
3. The method of claim 1, wherein the acquiring the gesture image and the corresponding depth image of the gesture image comprises:
and acquiring the gesture image by performing image sampling through a first image acquisition device, and synchronously acquiring the depth image by performing image sampling through a second image acquisition device.
4. The method of claim 3, wherein the step of,
the position information of each hand key point in the depth image is the pixel coordinate of each hand key point in the depth image;
the position information of each hand key point in the gesture image is the pixel coordinate of each hand key point in the gesture image.
5. A method according to claim 3, wherein mapping the position information of each hand keypoint in the depth image to the position information of each hand keypoint in the gesture image comprises:
calibrating the first image acquisition device to acquire the internal parameters of the first image acquisition device and the external parameters of the first image acquisition device;
calibrating the second image acquisition device to acquire the internal parameters of the second image acquisition device and the external parameters of the second image acquisition device;
acquiring a rotation parameter and a position parameter according to the external parameter of the first image acquisition device and the external parameter of the second image acquisition device, wherein the rotation parameter is used for representing the rotation relation between the second image acquisition device and the first image acquisition device, and the position parameter is used for representing the position relation between the second image acquisition device and the first image acquisition device;
and acquiring pixel coordinates of each hand key point in the gesture image according to the pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters and the depth values of each hand key point.
6. The method according to claim 5, wherein the acquiring the pixel coordinates of each hand keypoint in the gesture image according to the pixel coordinates of each hand keypoint in the depth image, the internal parameters of the first image capturing device, the internal parameters of the second image capturing device, the rotation parameters, the position parameters, and the depth value of each hand keypoint comprises:
acquiring pixel coordinates of each hand key point in the gesture image according to pixel coordinates of each hand key point in the depth image, the internal parameters of the first image acquisition device, the internal parameters of the second image acquisition device, the rotation parameters, the position parameters, the depth values of each hand key point and the following formula:
p₁ = R₁·R·R₂⁻¹·p₂ + R₁·T·zₕ⁻¹
wherein p₁ is the pixel coordinate of the hand key point in the gesture image, R₁ is the internal parameter of the first image acquisition device, R is the rotation parameter, R₂ is the internal parameter of the second image acquisition device, p₂ is the pixel coordinate of the hand key point in the depth image, T is the position parameter, and zₕ is the depth value of the hand key point.
7. The method according to any one of claims 1-6, further comprising:
taking a preset time length as a period, periodically acquiring the gesture image and the depth image, and marking gesture data of the gesture image acquired in each period;
and carrying out smoothing processing on gesture data of the gesture images acquired in each period.
8. The method of any of claims 3-6, wherein before the gesture image is acquired by performing image sampling through the first image acquisition device and the depth image corresponding to the gesture image is synchronously acquired by performing image sampling through the second image acquisition device, the method further comprises:
constructing a data acquisition thread; the data acquisition thread is used for controlling the first image acquisition device and the second image acquisition device to synchronously acquire images.
9. A device for labeling gesture data, comprising:
the image acquisition unit is used for acquiring a gesture image and a depth image corresponding to the gesture image;
a key point identification unit, configured to identify each hand key point in the depth image;
the position acquisition unit is used for acquiring the position information of each hand key point in the depth image;
the mapping unit is used for mapping the position information of each hand key point in the depth image into the position information of each hand key point in the gesture image;
and the labeling unit is used for labeling the gesture data of the gesture image according to the position information of each hand key point in the gesture image.
10. An electronic device, comprising: a memory and a processor, the memory for storing a computer program; the processor is configured to cause the electronic device to implement the method for labeling gesture data according to any one of claims 1-8 when the computer program is invoked.
11. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program, which when executed by a computing device, causes the computing device to implement the method of labeling gesture data according to any of claims 1-8.
12. A computer program product, characterized in that the computer program product, when run on a computer, causes the computer to carry out the method of labeling gesture data according to any one of claims 1-8.
CN202111579908.9A 2021-12-22 2021-12-22 Gesture data labeling method and device Pending CN116363741A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111579908.9A CN116363741A (en) 2021-12-22 2021-12-22 Gesture data labeling method and device
PCT/CN2022/139979 WO2023116620A1 (en) 2021-12-22 2022-12-19 Gesture data annotation method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111579908.9A CN116363741A (en) 2021-12-22 2021-12-22 Gesture data labeling method and device

Publications (1)

Publication Number Publication Date
CN116363741A 2023-06-30

Family

ID=86901345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111579908.9A Pending CN116363741A (en) 2021-12-22 2021-12-22 Gesture data labeling method and device

Country Status (2)

Country Link
CN (1) CN116363741A (en)
WO (1) WO2023116620A1 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934065B (en) * 2017-12-18 2021-11-09 虹软科技股份有限公司 Method and device for gesture recognition
CN109710071B (en) * 2018-12-26 2022-05-17 青岛小鸟看看科技有限公司 Screen control method and device
CN111815754B (en) * 2019-04-12 2023-05-30 Oppo广东移动通信有限公司 Three-dimensional information determining method, three-dimensional information determining device and terminal equipment
CN112150448B (en) * 2020-09-28 2023-09-26 杭州海康威视数字技术股份有限公司 Image processing method, device and equipment and storage medium
CN112613384B (en) * 2020-12-18 2023-09-19 安徽鸿程光电有限公司 Gesture recognition method, gesture recognition device and control method of interactive display equipment
CN114882524A (en) * 2022-04-15 2022-08-09 华南理工大学 Monocular three-dimensional gesture estimation method based on full convolution neural network

Also Published As

Publication number Publication date
WO2023116620A1 (en) 2023-06-29

Similar Documents

Publication Publication Date Title
CN110827247B (en) Label identification method and device
CN109740670B (en) Video classification method and device
CN108304765B (en) Multi-task detection device for face key point positioning and semantic segmentation
CN110580723B (en) Method for carrying out accurate positioning by utilizing deep learning and computer vision
CN107633526B (en) Image tracking point acquisition method and device and storage medium
CN102831386B (en) Object identification method and recognition device
CN111354042A (en) Method and device for extracting features of robot visual image, robot and medium
WO2018014828A1 (en) Method and system for recognizing location information in two-dimensional code
CN111612834B (en) Method, device and equipment for generating target image
JP6997369B2 (en) Programs, ranging methods, and ranging devices
CN108765532B (en) Child drawing model building method, reading robot and storage device
CN110852954B (en) Image inclination correction method and system for pointer instrument
CN112307786B (en) Batch positioning and identifying method for multiple irregular two-dimensional codes
CN111738036A (en) Image processing method, device, equipment and storage medium
CN110956131B (en) Single-target tracking method, device and system
CN112132754A (en) Vehicle movement track correction method and related device
JP7204786B2 (en) Visual search method, device, computer equipment and storage medium
CN115937003A (en) Image processing method, image processing device, terminal equipment and readable storage medium
CN108447092B (en) Method and device for visually positioning marker
CN112102404B (en) Object detection tracking method and device and head-mounted display equipment
CN111758118B (en) Visual positioning method, device, equipment and readable storage medium
CN116363741A (en) Gesture data labeling method and device
CN110610184B (en) Method, device and equipment for detecting salient targets of images
CN106651950B (en) Single-camera pose estimation method based on quadratic curve perspective projection invariance
Eldesokey et al. Ellipse detection for visual cyclists analysis “In the wild”

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination