CN112233161B - Hand image depth determination method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN112233161B (application CN202011102705.6A)
- Authority
- CN
- China
- Prior art keywords
- hand
- depth
- key point
- image
- detection network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/90—Determination of colour characteristics
Abstract
When depth analysis is performed on a hand region in an image to be processed, the method first inputs the image into a key point detection network to detect the key points of the hand region, wherein a preset feature layer of the key point detection network is processed to obtain key point features of the hand region, and the preset feature layer is connected with a depth detection network, so that the key point features of the hand region are input into the depth detection network for depth detection to obtain the depth information of the hand region. Because the scheme uses the hand key point features to locate the hand region in the image, interference from the image background is avoided, and the accuracy of the hand region depth analysis result is therefore improved.
Description
Technical Field
The disclosure relates to the technical field of image processing, and in particular relates to a hand image depth determining method, a hand image depth determining device, electronic equipment and a storage medium.
Background
In the related art, depth estimation for hand images is mainly implemented by training a depth prediction network on hand image data annotated with depth. Because real hand depth data is noisy, virtual hand images annotated with hand depth data are typically used to train the depth prediction network. However, the background of a virtual hand image is usually a single color, so such a depth prediction network is easily disturbed by complex backgrounds in practical applications; for example, background regions whose color is close to that of the hand may be predicted as the hand, which lowers the accuracy of the hand depth prediction result.
Disclosure of Invention
The disclosure provides a hand image depth determining method, a hand image depth determining device, electronic equipment and a storage medium, so as to at least solve the problem of low accuracy of hand image depth analysis results in the related art. The technical scheme of the present disclosure is as follows:
according to a first aspect of an embodiment of the present disclosure, there is provided a hand image depth determining method, including:
acquiring an image to be processed containing a hand area;
inputting the image to be processed into a key point detection network, and detecting key points of the hand area;
connecting a depth detection network to a preset feature layer of the key point detection network, so as to input the key point features of the hand region, which are obtained by processing at the preset feature layer, into the depth detection network for depth detection, to obtain the depth information of the hand region.
In a possible implementation manner of the first aspect, the step of connecting the depth detection network to the preset feature layer of the key point detection network includes:
the key point detection network comprises a plurality of hourglass network structures which are sequentially connected, and the feature layer with the smallest size in the last of the plurality of hourglass network structures is connected with the depth detection network.
In another possible implementation manner of the first aspect, the step of inputting the key point features of the hand region detected by the preset feature layer into the depth detection network to perform depth detection, and obtaining depth information of the hand region includes:
inputting the key point characteristics of the hand region into the depth detection network to carry out depth detection, and obtaining depth values of all pixel points of the hand region;
When the depth value of the pixel point is larger than or equal to a preset depth threshold value, determining that the pixel point belongs to a background area;
When the depth value of the pixel point is smaller than the preset depth threshold value, determining that the pixel point belongs to the hand region;
and obtaining depth information of the hand region according to the depth values of all the pixel points belonging to the hand region.
In still another possible implementation manner of the first aspect, the step of inputting the key point features of the hand region detected by the preset feature layer into the depth detection network to perform depth detection, and obtaining depth information of the hand region includes:
Inputting the key point characteristics of the hand region into the depth detection network for depth detection to obtain relative depth values between each pixel point contained in the hand region and a preset reference hand key point;
When the relative depth value between the pixel point and the reference hand key point is greater than or equal to a preset relative depth threshold value, determining that the pixel point belongs to a background area;
When the relative depth value between the pixel point and the reference hand key point is smaller than the preset relative depth threshold value, determining that the pixel point belongs to the hand region;
and obtaining depth information of the hand region according to the depth values of all the pixel points belonging to the hand region.
In another possible implementation manner of the first aspect, the method further includes:
acquiring an absolute depth value corresponding to the reference hand key point;
And adding the relative depth value between each pixel point contained in the hand region and the reference hand key point to the absolute depth value of the reference hand key point, to obtain the absolute depth value of each pixel point contained in the hand region.
In a further possible implementation manner of the first aspect, the training process of the keypoint detection network includes:
Acquiring a key point training sample set containing hand key point labeling information, wherein the key point training sample set comprises a real hand sample image and a virtual hand sample image, the real hand sample image comprises two-dimensional hand key point labeling information, and the virtual hand sample image comprises three-dimensional hand key point labeling information;
Inputting training samples in the key point training sample set into an initial key point detection network to detect hand key points to obtain hand key point detection results of the training samples, wherein the hand key point detection results corresponding to the virtual hand sample images comprise three-dimensional position information of all hand key points, and the hand key point detection results corresponding to the real hand sample images comprise two-dimensional position information of all hand key points;
And calculating the key point loss corresponding to the training sample according to the hand key point detection result and the hand key point labeling information corresponding to the training sample, and adjusting the network parameters of the initial key point detection network according to the key point loss until the key point loss meets the key point convergence condition to obtain the key point detection network.
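The mixed 2D/3D supervision above can be sketched as follows. This is a minimal numpy illustration: the function name `keypoint_loss` and the squared-error form are assumptions for illustration, not the loss actually disclosed.

```python
import numpy as np

def keypoint_loss(pred, label, is_virtual):
    """Hypothetical mean-squared keypoint loss for mixed supervision.

    pred, label: (21, 3) arrays of (x, y, z) per hand keypoint.
    Real sample images carry only two-dimensional annotations, so for
    them the z (depth) column is excluded from the loss; virtual sample
    images supervise all three coordinates.
    """
    diff = pred - label
    if not is_virtual:
        diff = diff[:, :2]  # real sample: supervise x, y only
    return float(np.mean(diff ** 2))

# A z-axis error is penalised for a virtual sample but ignored for a real one.
pred = np.zeros((21, 3))
label = np.zeros((21, 3))
label[:, 2] = 1.0
virtual_loss = keypoint_loss(pred, label, is_virtual=True)
real_loss = keypoint_loss(pred, label, is_virtual=False)
```

The network parameters would then be adjusted from this loss (for example by gradient descent) until the convergence condition is met.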
In another possible implementation manner of the first aspect, the training process of the depth detection network includes:
obtaining a virtual hand image training sample set marked with relative depth marking information, wherein the relative depth marking information is the relative depth of each pixel point relative to a reference hand key point;
inputting training samples in the virtual hand image training sample set into a key point detection network obtained through training, and obtaining hand area key point characteristics of the training samples at a preset characteristic layer of the key point detection network;
inputting the hand region key point characteristics of the training sample into an initial depth detection network to obtain a relative depth detection result of each pixel point in the training sample relative to the reference hand key point;
And calculating the depth loss corresponding to the training sample according to the relative depth detection result and the relative depth marking information corresponding to the training sample, and adjusting network parameters of the initial depth detection network according to the depth loss until the depth loss meets a depth convergence condition to obtain the depth detection network.
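The depth-branch objective can be sketched in the same spirit; the L1 form of `depth_loss` below is an assumption for illustration, not the disclosed loss.

```python
import numpy as np

def depth_loss(pred_rel_depth, label_rel_depth):
    """Hypothetical per-pixel L1 loss between the predicted and
    annotated relative-depth maps of a virtual hand training sample."""
    return float(np.mean(np.abs(pred_rel_depth - label_rel_depth)))

# During this stage only the depth detection network is updated; the
# keypoint network that supplies the preset-feature-layer input is
# already trained. Values below are illustrative.
label = np.full((4, 4), 0.1)  # annotated relative depth map
pred = np.zeros((4, 4))       # initial depth-network output
loss = depth_loss(pred, label)
```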
In a further possible implementation manner of the first aspect, the reference hand keypoints are palm root keypoints.
According to a second aspect of embodiments of the present disclosure, there is provided a hand image depth determination apparatus, comprising:
the to-be-processed image acquisition module is configured to acquire a to-be-processed image containing a hand region;
the key point detection module is configured to input the image to be processed into a key point detection network and detect key points of the hand area;
the depth detection module is configured to connect a depth detection network to a preset feature layer of the key point detection network, so as to input the key point features of the hand region, which are obtained by processing the preset feature layer, into the depth detection network to carry out depth detection, and obtain the depth information of the hand region.
In a possible implementation manner of the second aspect, the depth detection module is specifically configured to:
the key point detection network comprises a plurality of hourglass network structures which are sequentially connected, and the characteristic layer with the smallest size in the last hourglass network structure in the plurality of hourglass network structures is connected with the depth detection network.
In another possible implementation manner of the second aspect, the depth detection module includes:
The depth detection sub-module is configured to input the key point characteristics of the hand region into the depth detection network to carry out depth detection so as to obtain the depth value of each pixel point of the hand region;
The background determination submodule is configured to determine that the pixel point belongs to a background area when the depth value of the pixel point is greater than or equal to a preset depth threshold value;
a hand region determination submodule configured to determine that the pixel point belongs to the hand region when the depth value of the pixel point is smaller than the preset depth threshold value;
A hand depth determination submodule configured to obtain depth information of the hand region according to depth values of all pixel points belonging to the hand region.
In a further possible implementation manner of the second aspect, the depth detection module is specifically configured to:
Inputting the key point characteristics of the hand region into the depth detection network for depth detection to obtain relative depth values between each pixel point contained in the hand region and a preset reference hand key point;
When the relative depth value between the pixel point and the reference hand key point is greater than or equal to a preset relative depth threshold value, determining that the pixel point belongs to a background area;
When the relative depth value between the pixel point and the reference hand key point is smaller than the preset relative depth threshold value, determining that the pixel point belongs to the hand region;
and obtaining depth information of the hand region according to the depth values of all the pixel points belonging to the hand region.
In a further possible implementation manner of the second aspect, the apparatus further includes:
The reference point depth acquisition module is configured to acquire an absolute depth value corresponding to the reference hand key point;
the hand region absolute depth acquisition module is configured to add the relative depth value between each pixel point contained in the hand region and the reference hand key point to the absolute depth value of the reference hand key point, to obtain the absolute depth value of each pixel point contained in the hand region.
In a further possible implementation manner of the second aspect, the apparatus further includes:
The key point training sample acquisition module is configured to acquire a key point training sample set containing hand key point labeling information, wherein the key point training sample set comprises a real hand sample image and a virtual hand sample image, the real hand sample image comprises two-dimensional hand key point labeling information, and the virtual hand sample image comprises three-dimensional hand key point labeling information;
The sample key point detection module is configured to input training samples in the key point training sample set into an initial key point detection network to detect hand key points to obtain hand key point detection results of the training samples, wherein the hand key point detection results corresponding to the virtual hand sample images comprise three-dimensional position information of all hand key points, and the hand key point detection results corresponding to the real hand sample images comprise two-dimensional position information of all hand key points;
The key point detection network adjustment module is configured to calculate and obtain the key point loss corresponding to the training sample according to the hand key point detection result and the hand key point labeling information corresponding to the training sample, and adjust the network parameters of the initial key point detection network according to the key point loss until the key point loss meets the key point convergence condition, so as to obtain the key point detection network.
In a further possible implementation manner of the second aspect, the apparatus includes:
The depth training sample acquisition module is configured to acquire a virtual hand image training sample set marked with relative depth marking information, wherein the relative depth marking information is the relative depth of each pixel point relative to a reference hand key point;
The sample depth detection module is configured to input training samples in the virtual hand image training sample set into a training-obtained key point detection network, obtain hand region key point characteristics of the training samples at a preset characteristic layer of the key point detection network, and input hand region key point characteristics of the training samples into an initial depth detection network to obtain a relative depth detection result of each pixel point in the training samples relative to the reference hand key point;
The depth detection network adjustment module is configured to calculate and obtain depth loss corresponding to the training sample according to the relative depth detection result and the relative depth labeling information corresponding to the training sample, and adjust network parameters of the initial depth detection network according to the depth loss until the depth loss meets a depth convergence condition, so as to obtain the depth detection network.
In a further possible implementation manner of the second aspect, the reference hand keypoint is a palm root keypoint.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
A processor;
A memory for storing the processor-executable instructions;
Wherein the processor is configured to execute the instructions to implement the hand image depth determination method of any one of the first aspects.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the hand image depth determination method of any one of the first aspects.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product having instructions stored therein which, when executed by a processor of an electronic device, implement the hand image depth determination method of any one of the first aspects.
The technical scheme provided by the embodiments of the present disclosure at least brings the following beneficial effects: when depth analysis is performed on a hand region in an image to be processed, the image is first input into a key point detection network to detect the key points of the hand region, wherein a preset feature layer of the key point detection network is processed to obtain the key point features of the hand region, and the preset feature layer is connected with a depth detection network, so that the key point features of the hand region are input into the depth detection network for depth detection to obtain the depth information of the hand region. Because the scheme uses the hand key point features to locate the hand region in the image, interference from the image background is avoided, and the accuracy of the hand region depth analysis result is therefore improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a flow chart illustrating a method of hand image depth determination according to an exemplary embodiment;
FIG. 2 is a schematic diagram of key points of a hand image, according to an exemplary embodiment;
FIG. 3 is a flowchart illustrating a process of obtaining depth information for a hand image, according to an example embodiment;
FIG. 4 is a flowchart illustrating another hand image depth determination method according to an exemplary embodiment;
FIG. 5 is a schematic diagram of a network architecture of a keypoint detection network and a depth detection network, according to an exemplary embodiment;
FIG. 6 is a flowchart illustrating a keypoint detection network training process, according to an exemplary embodiment;
FIG. 7 is a flowchart illustrating a depth detection network training process according to an example embodiment;
FIG. 8 is a block diagram of a hand image depth determination device, according to an exemplary embodiment;
FIG. 9 is a block diagram of another hand image depth determination device, shown according to an exemplary embodiment;
FIG. 10 is a block diagram of another hand image depth determination device, according to an example embodiment;
fig. 11 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
Fig. 1 is a flow chart of a hand image depth determination method for use in a device having computing capabilities, such as a PC, server, or mobile smart terminal, etc., according to an exemplary embodiment. As shown in fig. 1, the method may include the following steps.
In S110, a to-be-processed image including a hand region is acquired.
In one possible implementation, the image to be processed may be an image including a hand region captured by the smart terminal. In another possible implementation, the image to be processed may also be a locally stored image containing hand regions.
In S120, the image to be processed is input to the keypoint detection network to detect keypoints of the hand region.
Before the hand region depth analysis is carried out on the image to be processed, hand key points of the hand region in the image to be processed are analyzed, and hand region key point characteristics are obtained.
In one possible implementation, the keypoint detection network may be used to analyze the keypoint information of the hand region included in the image to be processed to obtain the hand keypoint feature. The key point detection network can adopt a deep convolutional neural network.
The keypoint detection network is capable of detecting the position of the hand region in an image and locating the individual key points of the hand region. As shown in fig. 2, the hand key points mainly comprise 21 skeletal nodes, such as the fingertips and the joints of each phalanx.
In S130, a depth detection network is connected to a preset feature layer of the key point detection network, so that the key point features of the hand region detected by the preset feature layer are input into the depth detection network to perform depth detection to obtain depth information of the hand region.
The key point detection network comprises a plurality of feature layers; features are extracted from the image to be processed layer by layer through these feature layers, and the positions of the key points contained in the hand region are finally determined. The key point features of the hand region are obtained at a preset feature layer, and these features cover most of the pixel points of the hand region.
In one possible implementation of the present disclosure, the key point detection network employs a neural network including a plurality of hourglass network structures connected in sequence, and accordingly, the preset feature layer may be a feature layer with a minimum size in the last hourglass network structure. For example, the keypoint detection network comprises two hourglass network structures, wherein the smallest-sized feature layer in the second hourglass network structure is the preset feature layer described above.
And connecting the preset feature layer with a depth detection network so as to input the key point features of the hand region obtained by the detection of the preset feature layer into the depth detection network for depth detection, and finally obtaining the depth information of the hand region.
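The wiring described above can be sketched at the shape level. The stubs below are a minimal Python illustration; all function names, channel counts and map sizes are assumptions, not values from the disclosure.

```python
import numpy as np

def keypoint_net(image):
    """Stub of the stacked-hourglass keypoint network (shapes only).

    Returns the per-keypoint heatmaps and the smallest feature map of
    the last hourglass, i.e. the 'preset feature layer' that the depth
    detection network is attached to.
    """
    heatmaps = np.zeros((21, 64, 64))        # one heatmap per hand keypoint
    preset_features = np.zeros((256, 4, 4))  # smallest map, last hourglass
    return heatmaps, preset_features

def depth_net(preset_features):
    """Stub of the depth detection network fed by the preset feature layer."""
    return np.zeros((64, 64))                # per-pixel relative depth map

image = np.zeros((3, 256, 256))
heatmaps, feats = keypoint_net(image)
depth_map = depth_net(feats)
```

In a real implementation the depth branch would share the keypoint network's intermediate activations rather than re-running the backbone, which is what connecting it at the preset feature layer achieves.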
The technical scheme provided by the embodiment at least brings the following beneficial effects: when the depth analysis is carried out on the hand region in the image, the hand key point detection is carried out on the hand region in the image to obtain the hand region key point characteristics in the image. And then carrying out depth analysis on the key point characteristics of the hand region to obtain the depth information of the hand region. According to the scheme, the hand region in the image is captured by combining the hand key point characteristics, so that interference of the background in the image is avoided, and the accuracy of the hand region depth analysis result is improved.
In one possible implementation, as shown in fig. 3, depth information of pixels in the image may be used to further distinguish between the background and the hand region, where the distinguishing process may include:
in S131, the hand region key point feature is input to the depth detection network to perform depth detection to obtain a depth value of each pixel point of the hand region.
Since the depth of the hand region is usually small and the depth of the background region is large in the image including the hand, the background region and the hand region can be further distinguished by setting a depth threshold value, and finally the depth information of each pixel point belonging to the hand region is obtained.
The preset depth threshold may be set according to an actual service requirement, which is not limited in the present disclosure.
In S132, if the depth of the pixel is greater than or equal to the preset depth threshold, it is determined that the pixel belongs to the background area.
In S133, if the depth of the pixel is less than the preset depth threshold, it is determined that the pixel belongs to the hand region.
In S134, depth information of the hand region is obtained from the depth values of all the pixel points belonging to the hand region.
After the above judgment process is performed on each pixel point in the hand region, all the pixel points belonging to the hand region are obtained, and depth information corresponding to the pixel points, namely, the depth information of the hand region.
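Steps S131 to S134 can be illustrated with a minimal numpy sketch; the threshold and depth values below are assumptions for illustration.

```python
import numpy as np

def split_hand_background(depth_map, depth_threshold):
    """Classify pixels by the preset depth threshold (S131-S134).

    Pixels whose depth value is greater than or equal to the threshold
    are treated as background; the remaining pixels form the hand
    region, and their depth values are the hand-region depth information.
    """
    hand_mask = depth_map < depth_threshold
    return hand_mask, depth_map[hand_mask]

# Toy 2x2 depth map; the two larger values play the role of background.
depth_map = np.array([[0.3, 2.0],
                      [0.4, 1.8]])
mask, hand_depths = split_hand_background(depth_map, depth_threshold=1.0)
```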
In one possible implementation of the present disclosure, the depth difference between the pixels in the hand region is small, and to increase the contrast, a hand key point (for example, a palm root node) is selected as a reference hand key point. The depth detection network outputs the relative depth value of each pixel point in the hand region and the key point of the reference hand. Correspondingly, if the relative depth value of the pixel point is greater than or equal to a preset relative depth threshold value, determining that the pixel point belongs to a background area; if the relative depth value of the pixel point is smaller than a preset relative depth threshold value, determining that the pixel point belongs to the hand area; the relative depth of the hand region is calculated from the relative depths of all the pixel points belonging to the hand region.
According to the hand image depth determination method provided by this embodiment, after the depth information of the hand region is obtained by combining the hand key point features, the depth value of each pixel point is compared with the preset depth threshold, the pixel points belonging to the background region are identified and removed, and the depth values of the remaining pixel points, all of which belong to the hand region, are taken as the depth information of the hand region. By further using the depth values of the pixel points to screen out background pixels, the scheme further improves the accuracy of the depth information of the hand region.
In one possible implementation of the present disclosure, the depth detection network outputs the relative depth value between each pixel point in the hand region and the reference hand key point. Therefore, in an application scene that requires the absolute depth value of the hand region, the relative depth of each pixel point in the hand region needs to be converted into an absolute depth.
Fig. 4 is a flowchart illustrating another hand image depth determination method according to an exemplary embodiment, which further includes the following steps on the basis of the embodiment shown in fig. 1.
In S140, an absolute depth value of the reference hand keypoint is acquired.
In one possible implementation, the keypoint detection network is trained using the portion of the training samples that contains three-dimensional keypoint labeling information, so that the keypoint detection network obtains the three-dimensional information (x, y, z) of the hand keypoints in the hand region, where x and y are the coordinates of a hand keypoint on the image plane and z is its coordinate in the depth direction perpendicular to the image plane.
In one embodiment of the present disclosure, the reference hand keypoint may be a palm root keypoint. For example, the hand keypoint numbered 0 in fig. 2 is the palm root keypoint.
Of course, in other embodiments of the present disclosure, other key points of the hand may also be determined as reference points, which are not described herein.
In S150, the relative depth value between each pixel point of the hand region and the reference hand key point is superimposed with the absolute depth value of the reference hand key point to obtain the absolute depth value of each pixel point of the hand region.
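Step S150 is a straightforward per-pixel addition. The sketch below is illustrative (the function name is hypothetical); it assumes the relative depth map and the reference key point's absolute depth use the same units.

```python
import numpy as np

def to_absolute_depth(rel_depth_map, ref_absolute_depth):
    """Superimpose the reference key point's absolute depth onto the
    per-pixel relative depths, yielding per-pixel absolute depths."""
    # absolute(p) = relative(p, ref) + absolute(ref), applied element-wise
    return rel_depth_map + ref_absolute_depth
```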
According to the hand image depth determination method provided by this embodiment, after the depth detection network detects the relative depth value between each pixel point of the hand region in the image to be processed and the reference hand key point, the absolute depth value of each pixel point of the hand region is recovered from the absolute depth value of the reference hand key point and the relative depth value of each pixel point, so that the requirements of different application scenes are met.
Fig. 5 is a network architecture diagram of a keypoint detection network and a depth detection network, according to an exemplary embodiment.
As shown in fig. 5, this embodiment is described by taking a neural network with a double-hourglass network structure as an example. The double-hourglass network structure includes two hourglass network structures, and in each hourglass the size of the feature layers first decreases and then increases.
In this embodiment, the depth detection network is connected to the feature layer with the smallest size (i.e., the preset feature layer) in the second hourglass of the double-hourglass network structure. This structure enables the depth detection network to directly use the hand region key point features extracted by that smallest feature layer.
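The wiring described above — the depth head tapping the smallest feature layer of the second hourglass rather than the network's final output — can be sketched abstractly. Everything here (function names, the representation of an hourglass as a callable returning its output feature and its smallest internal feature) is an assumption made for illustration.

```python
def double_hourglass_forward(x, hourglass1, hourglass2, keypoint_head, depth_head):
    """Toy forward pass of the fig. 5 arrangement.

    Each `hourglass` is a callable returning (output_feature,
    smallest_internal_feature). The depth head consumes the preset
    feature layer: the smallest feature of the SECOND hourglass.
    """
    feat1, _ = hourglass1(x)              # first hourglass
    feat2, smallest2 = hourglass2(feat1)  # second hourglass
    keypoints = keypoint_head(feat2)      # keypoint prediction from final features
    depth = depth_head(smallest2)         # depth prediction from preset layer
    return keypoints, depth
```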
In other embodiments of the present disclosure, the key point detection network may be implemented with other neural networks; in that case, a feature layer that captures most of the pixel point features of the hand region is chosen as the preset feature layer according to the specific network structure, which is not described in detail herein.
The depth detection network is connected to a preset feature layer of the key point detection network; that is, the features extracted by the preset feature layer are passed directly to the depth detection network. The accuracy of the key point detection network therefore directly affects the accuracy of the downstream depth detection network, so the accuracy of the key point detection network must be ensured. To improve it, the key point detection network can be trained jointly on real hand images and virtual hand images.
When training the depth detection network, however, virtual hand images with accurate depth annotation data are typically used, because the depth annotation data of real hand images is noisy.
In addition, the keypoint detection network needs to be trained with a combination of real hand images and virtual hand images, while the depth detection network only needs to be trained with virtual hand images, so that the two networks need to be trained separately.
The function of the key point detection network is to determine the position of the hand region by predicting the position of the hand key point in the image, so that the depth detection network can accurately identify the hand region and the background region when the depth of the hand region is predicted, namely, the accuracy of the key point detection network influences the accuracy of the depth detection network. Therefore, it is necessary to train the keypoint detection network first and train the depth detection network after the keypoint detection network training is completed.
In addition, in the process of training the depth detection network, if the network parameters of the key point detection network are not fixed, back-propagating the depth loss would change those parameters and thereby degrade the accuracy of the key point prediction results. Therefore, when training the depth detection network, the network parameters of the key point detection network must be fixed and only the network parameters of the depth detection network adjusted.
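The freeze-then-train pattern can be sketched framework-agnostically as an update step that touches only the depth network's parameter groups. This is a minimal sketch under assumed names (`sgd_step`, the `'kp.'`/`'depth.'` prefixes); in a real framework the same effect is achieved by marking the key point network's parameters as non-trainable.

```python
def sgd_step(params, grads, lr, trainable):
    """One gradient step that updates only the parameter names listed
    in `trainable`, leaving the frozen key point network weights
    untouched."""
    return {
        name: (w - lr * grads[name]) if name in trainable else w
        for name, w in params.items()
    }
```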
FIG. 6 is a flowchart illustrating a keypoint detection network training process, as shown in FIG. 6, which may include the following steps, according to an exemplary embodiment.
In S210, a set of keypoint training samples containing hand keypoint labeling information is obtained.
In one possible implementation, the set of keypoint training samples includes a real hand sample image and a virtual hand sample image.
A virtual hand image differs from a real hand image in that a specific hand shape is first constructed, and preset skin texture, background, and other information are added during rendering. Three-dimensional labeling information (x, y, z) of the hand key points is generated automatically during rendering, comprising the two-dimensional coordinates (x, y) of the hand key points in the image and the coordinate z in the depth direction. However, due to the limitations of virtual data generation technology, both the background and the hand shape information of virtual hand images are limited.
The real hand image is a hand image obtained by shooting a real hand and contains rich background and hand type information. However, it is difficult to obtain accurate three-dimensional labeling information of the hand key points in the real hand image, and only two-dimensional labeling information, namely, two-dimensional coordinates (x, y) of the hand key points in the image plane, can be obtained.
Therefore, in order to improve the prediction performance of the keypoint detection network, it is necessary to train it on a combination of real hand images and virtual hand images, so that the network can learn both the features of the hand keypoints and the features that distinguish the hand from the background.
In addition, since the key point labeling information in the virtual hand image is three-dimensional labeling information and comprises information of the key points in the depth direction, the key point detection network can learn the information of the key points of the hand in the depth direction, namely, three-dimensional position information of the key points of the hand can be finally obtained through prediction.
In S220, the training samples in the key point training sample set are input into the initial key point detection network to perform hand key point detection, so as to obtain a hand key point detection result of the training samples.
The keypoint detection network may adopt the neural network with the double-hourglass structure shown in fig. 5. The keypoint detection network performs forward propagation on each keypoint training sample to obtain the corresponding hand keypoint detection result: for a virtual hand sample image, the detection result is the three-dimensional position information of the hand keypoints; for a real hand sample image, it is the two-dimensional position information of the hand keypoints.
In one possible implementation of the present disclosure, the key point convergence criterion may be that the loss calculated using the loss function is no longer reduced.
The hand keypoint labeling information is extracted from the keypoint training sample. In one possible implementation, the position information of the hand keypoints in a training sample can be converted into a corresponding heatmap, so that the keypoint detection network can extract the spatial information of the hand keypoints from the heatmap.
A heatmap is a common presentation in data visualization; through gradations of color it intuitively reflects information such as hot-spot distribution and regional aggregation. In the heatmap corresponding to a keypoint training sample, the brightness (gray value) of the area around a hand keypoint is higher than that of non-keypoint areas, so the closer a pixel is to a hand keypoint, the brighter it appears. Using the brightness or gray value of each pixel point, the keypoint detection network can easily determine whether a pixel belongs to the hand or the background.
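A common way to render such a keypoint heatmap — one the patent does not specify, so this is an illustrative assumption — is a Gaussian bump centered on the keypoint: bright (near 1) at the keypoint, dark elsewhere.

```python
import numpy as np

def keypoint_heatmap(h, w, cx, cy, sigma=2.0):
    """Render one hand keypoint at image position (cx, cy) as an
    (h, w) Gaussian heatmap with peak value 1 at the keypoint."""
    ys, xs = np.mgrid[0:h, 0:w]
    # Brightness falls off with squared distance from the keypoint.
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
```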
In S230, the keypoint loss is calculated according to the hand keypoint detection result and the hand keypoint labeling information corresponding to the training sample, and whether the keypoint loss meets the keypoint convergence condition is determined.
In a supervised machine learning network, the error between the predicted value and the true value of a single sample is called the loss; the smaller the loss, the better the network. The function used to calculate the loss is called the loss function, and it measures the quality of each prediction the network makes. The loss here includes the loss between the hand keypoint detection result and the labeled hand keypoints, as well as the gradient information of that loss.
In S240, if the key point loss does not meet the key point convergence condition, adjusting network parameters of the initial key point detection network according to the key point loss, updating the key point loss according to the detection result of the adjusted key point detection network, and continuously determining whether the updated key point loss meets the key point convergence condition.
The initial keypoint detection network predicts the hand keypoint information of each keypoint training sample, and whether the loss of the prediction result (i.e., the keypoint prediction error) satisfies the keypoint convergence condition is judged. If it does not, the network parameters are adjusted, the adjusted network continues to predict the hand keypoints of each training sample, and the loss is recalculated and checked against the convergence condition. Once the loss satisfies the keypoint convergence condition, the training process ends and the current network parameters are taken as the final network parameters.
In S250, if the keypoint loss satisfies the keypoint convergence condition, the initial keypoint detection network is determined to be the keypoint detection network.
In one possible implementation manner of the present disclosure, the weight parameter of each layer in the keypoint detection network when the loss function satisfies the keypoint convergence condition may be solved by using a gradient descent method, so as to obtain a final keypoint detection network.
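The iterate-until-the-loss-stops-decreasing procedure in S240/S250 can be sketched generically. The function name, the tolerance, and the callable interfaces are assumptions for illustration; the patent only specifies the convergence test, not the update rule.

```python
def train_until_converged(loss_fn, params, update_fn, tol=1e-4, max_iters=1000):
    """Loop: compute the loss, stop once it no longer decreases by
    more than `tol` (the convergence condition), otherwise let
    `update_fn` adjust the parameters (e.g. a gradient descent step)."""
    prev = float('inf')
    loss = loss_fn(params)
    for _ in range(max_iters):
        loss = loss_fn(params)
        if prev - loss < tol:
            break                      # convergence condition satisfied
        params = update_fn(params, loss)
        prev = loss
    return params, loss
```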
According to the key point detection network training process provided by the embodiment, the key point detection network is trained by utilizing the combination of the real hand image and the virtual hand image, the key point detection network can learn the characteristics of the key points of the hand well through the virtual hand image, and the key point detection network can learn the characteristics of the hand region different from the background region through the real hand image, so that the key point detection network obtained by training through the process has a good key point detection effect.
FIG. 7 is a flowchart illustrating a depth detection network training process, as shown in FIG. 7, which may include the following steps, according to an example embodiment.
In S310, a virtual hand image training sample set labeled with relative depth labeling information is acquired.
The relative depth labeling information is the relative depth of each pixel point in the virtual hand image with respect to the reference hand key point. Because the depth differences between the pixel points in the hand region are small, the palm root key point (i.e., the key point labeled 0 in fig. 2) is selected as the reference hand key point to increase the contrast. Subtracting the absolute depth value of the reference hand key point from the absolute depth value of every other pixel point in the hand region yields the relative depth map of the hand region. The relative depth data in the hand region is then normalized, for example into the range 0-1, which reduces the learning difficulty of the depth detection network and improves the learning effect.
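The label construction just described — subtract the palm-root depth, then normalize into 0-1 — can be sketched directly. The function name and the min-max normalization choice are assumptions; the patent says only that the data is normalized, e.g. to 0-1.

```python
import numpy as np

def relative_depth_labels(abs_depth_map, palm_root_xy):
    """Build relative-depth labels for a virtual hand sample: subtract
    the palm root key point's absolute depth, then min-max normalize
    the result into [0, 1]."""
    cx, cy = palm_root_xy
    rel = abs_depth_map - abs_depth_map[cy, cx]   # depth relative to palm root
    lo, hi = rel.min(), rel.max()
    # Guard against a constant-depth patch to avoid division by zero.
    return (rel - lo) / (hi - lo) if hi > lo else np.zeros_like(rel)
```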
In S320, training samples in the virtual hand image training sample set are input into a key point detection network obtained by training, and hand region key point features of the training samples are obtained at a preset feature layer of the key point detection network.
In S330, the hand region key point features are input into the initial depth detection network, and a relative depth detection result of each pixel point in the training sample relative to the reference hand key point is obtained.
In S340, according to the relative depth detection result and the relative depth labeling information corresponding to the training sample, the depth loss corresponding to the training sample is calculated, and the network parameters of the initial depth detection network are adjusted according to the depth loss until the depth loss meets the depth convergence condition, so as to obtain the depth detection network.
The depth loss is calculated from the depth detection result produced by the depth detection network and the relative depth labeling information. If the depth loss does not satisfy the depth convergence condition, the network parameters are updated by the gradient descent method. This is an iterative process: in each iteration the depth loss is recalculated and checked against the convergence condition, until the depth loss satisfies the depth convergence condition and the final depth detection network is obtained.
In the depth detection network training process provided by the embodiment, the depth detection network is independently trained by using the virtual hand image marked with accurate depth information, so that the depth detection network can learn the hand depth characteristics well, and the accuracy of the network prediction result is improved.
Corresponding to the above-mentioned hand image depth determining method embodiment, the present disclosure further provides a hand image depth determining device embodiment.
Fig. 8 is a block diagram of a hand image depth determination apparatus according to an exemplary embodiment, and referring to fig. 8, the apparatus includes a pending image acquisition module 110, a keypoint detection module 120, and a depth detection module 130.
The pending image acquisition module 110 is configured to acquire an image to be processed containing a hand region.
The keypoint detection module 120 is configured to input the image to be processed into the keypoint detection network for detecting keypoints of the hand region.
The depth detection module 130 is configured to connect a depth detection network to a preset feature layer of the key point detection network, so as to input the key point feature of the hand region detected by the preset feature layer to the depth detection network for depth detection to obtain depth information of the hand region.
In one possible implementation, the key point detection network includes a plurality of hourglass network structures connected in sequence, and the feature layer with the smallest size in the last hourglass network structure in the plurality of hourglass network structures is connected with the depth detection network.
In one possible implementation, the depth detection module 130 includes:
The depth detection sub-module is configured to input the key point characteristics of the hand region into the depth detection network for depth detection to obtain the depth value of each pixel point of the hand region.
The background determination submodule is configured to determine that the pixel point belongs to a background area when the depth value of the pixel point is greater than or equal to a preset depth threshold value.
The hand region determination submodule is configured to determine that the pixel point belongs to the hand region when the depth value of the pixel point is smaller than a preset depth threshold value.
The hand depth determination submodule is configured to obtain depth information of the hand region according to depth values of all pixel points belonging to the hand region.
In one possible implementation of the present disclosure, because the depth differences between the pixel points in the hand region are small, a hand key point (for example, the palm root key point) is selected as a reference hand key point in order to increase the contrast. The depth detection network then outputs the relative depth value between each pixel point in the hand region and the reference hand key point. Correspondingly, if the relative depth value of a pixel point is greater than or equal to a preset relative depth threshold, the pixel point is determined to belong to the background region; if the relative depth value of a pixel point is smaller than the preset relative depth threshold, the pixel point is determined to belong to the hand region. The relative depth of the hand region is then calculated from the relative depths of all the pixel points belonging to the hand region.
When the depth analysis is performed on the hand region in the image, the hand image depth determining device provided by the embodiment performs hand key point detection on the hand region in the image to obtain the hand region key point feature in the image. And then carrying out depth analysis on the key point characteristics of the hand region to obtain the depth information of the hand region. According to the scheme, the hand region in the image is captured by combining the hand key point characteristics, so that interference of the background in the image is avoided, and the accuracy of the hand region depth analysis result is improved.
Fig. 9 is a block diagram of another hand image depth determination device according to an exemplary embodiment, and referring to fig. 9, the device further includes a reference point depth acquisition module 210 and a hand region absolute depth acquisition module 220 on the basis of the embodiment shown in fig. 8.
In another possible implementation, the depth detection module 130 is specifically configured to: and inputting the key point characteristics of the hand region into a depth detection network for depth detection to obtain relative depth values between each pixel point contained in the hand region and a preset reference hand key point.
The reference point depth acquisition module 210 is configured to acquire an absolute depth value corresponding to a reference hand key point.
In one possible implementation, the reference hand keypoint may be a palm root keypoint.
The hand region absolute depth obtaining module 220 is configured to superimpose the relative depth value between each pixel point included in the hand region and the reference hand key point with the absolute depth value of the reference hand key point to obtain the absolute depth value of each pixel point included in the hand region.
According to the hand image depth determining device provided by this embodiment, after the depth detection network detects the relative depth value between each pixel point of the hand region in the image to be processed and the reference hand key point, the absolute depth value of each pixel point of the hand region is recovered from the absolute depth value of the reference hand key point and the relative depth value of each pixel point, so that the requirements of different application scenes are met.
Fig. 10 is a block diagram of another hand image depth determination apparatus according to an exemplary embodiment, and referring to fig. 10, the apparatus further includes, on the basis of the embodiment shown in fig. 8: the system comprises a keypoint training sample acquisition module 310, a sample keypoint detection module 320, a keypoint detection network adjustment module 330, a depth training sample acquisition module 340, a sample depth detection module 350, and a depth detection network adjustment module 360.
A keypoint training sample acquisition module 310 configured to acquire a keypoint training sample set containing hand keypoint labeling information, the keypoint training sample set comprising a real hand sample image and a virtual hand sample image, wherein the real hand sample image comprises two-dimensional hand keypoint labeling information and the virtual hand sample image comprises three-dimensional hand keypoint labeling information;
The sample keypoint detection module 320 is configured to input a training sample in the keypoint training sample set into the initial keypoint detection network to perform hand keypoint detection, so as to obtain a hand keypoint detection result of the training sample, wherein the hand keypoint detection result corresponding to the virtual hand sample image comprises three-dimensional position information of each hand keypoint, and the hand keypoint detection result corresponding to the real hand sample image comprises two-dimensional position information of each hand keypoint;
the key point detection network adjustment module 330 is configured to calculate and obtain a key point loss corresponding to the training sample according to the hand key point detection result and the hand key point labeling information corresponding to the training sample, and adjust network parameters of the initial key point detection network according to the key point loss until the key point loss meets the key point convergence condition, thereby obtaining the key point detection network.
The depth training sample acquiring module 340 is configured to acquire a virtual hand image training sample set labeled with relative depth labeling information, where the relative depth labeling information is the relative depth of each pixel point relative to the reference hand key point.
The sample depth detection module 350 is configured to input training samples in the virtual hand image training sample set into a trained key point detection network, obtain hand region key point features of the training samples at a preset feature layer of the key point detection network, and input the hand region key point features into an initial depth detection network to obtain a relative depth detection result of each pixel point in the training samples relative to a reference hand key point;
The depth detection network adjustment module 360 is configured to calculate a depth loss corresponding to the training sample according to the relative depth detection result and the relative depth labeling information corresponding to the training sample, and adjust network parameters of the initial depth detection network according to the depth loss until the depth loss meets a depth convergence condition, thereby obtaining the depth detection network.
According to the hand image depth determining device provided by the embodiment, the key point detection network is trained by combining the real hand image and the virtual hand image, the key point detection network can learn the characteristics of the key points of the hand well through the virtual hand image, and the key point detection network can learn the characteristics of the hand region different from the background region through the real hand image. After the key point detection network is trained, the virtual hand image is utilized to train the depth detection network independently, so that the depth detection network can learn hand depth characteristics well, and accuracy of network prediction results is improved.
The specific manner in which each module performs its operations in the apparatus of the above embodiments has been described in detail in the method embodiments and will not be repeated here.
Fig. 11 is a block diagram of an electronic device, according to an example embodiment. Referring to fig. 11, the electronic device includes a processor 410 and a memory 420; wherein the processor 410 and the memory 420 communicate with each other via a bus 430.
The memory 420 stores instructions executable by the processor 410, and the processor 410 executes the instructions in the memory 420 to implement the hand image depth determination method described above.
In an exemplary embodiment, a storage medium is also provided, such as the memory 420, including instructions executable by the processor 410 of the electronic device to perform the above-described method. Alternatively, the storage medium may be a non-transitory computer-readable storage medium, for example, a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, having instructions stored thereon, which when executed by a processor in an electronic device, implement the hand image depth determination method described above.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (16)
1. A method for determining depth of a hand image, comprising:
acquiring an image to be processed containing a hand area;
inputting the image to be processed into a key point detection network, and detecting key points of the hand area;
Connecting a depth detection network to a preset feature layer of the key point detection network, so as to input the key point features of the hand region, which are obtained by processing the preset feature layer, into the depth detection network for depth detection, and obtaining depth information of the hand region;
the training process of the key point detection network comprises the following steps:
Acquiring a key point training sample set containing hand key point labeling information, wherein the key point training sample set comprises a real hand sample image and a virtual hand sample image, the real hand sample image comprises two-dimensional hand key point labeling information, and the virtual hand sample image comprises three-dimensional hand key point labeling information;
Inputting training samples in the key point training sample set into an initial key point detection network to detect hand key points to obtain hand key point detection results of the training samples, wherein the hand key point detection results corresponding to the virtual hand sample images comprise three-dimensional position information of all hand key points, and the hand key point detection results corresponding to the real hand sample images comprise two-dimensional position information of all hand key points;
And calculating the key point loss corresponding to the training sample according to the hand key point detection result and the hand key point labeling information corresponding to the training sample, and adjusting the network parameters of the initial key point detection network according to the key point loss until the key point loss meets the key point convergence condition to obtain the key point detection network.
2. The method for determining the depth of the hand image according to claim 1, wherein the step of inputting the hand region key point features obtained by the detection of the preset feature layer into the depth detection network to perform depth detection, and obtaining the depth information of the hand region includes:
inputting the key point characteristics of the hand region into the depth detection network to carry out depth detection, and obtaining depth values of all pixel points of the hand region;
When the depth value of the pixel point is larger than or equal to a preset depth threshold value, determining that the pixel point belongs to a background area;
When the depth value of the pixel point is smaller than the preset depth threshold value, determining that the pixel point belongs to the hand region;
and obtaining depth information of the hand region according to the depth values of all the pixel points belonging to the hand region.
3. The hand image depth determination method according to claim 1, wherein the step of inputting the hand region key point features obtained at the preset feature layer into the depth detection network for depth detection to obtain the depth information of the hand region comprises:
inputting the hand region key point features into the depth detection network for depth detection to obtain a relative depth value between each pixel point contained in the hand region and a preset reference hand key point;
when the relative depth value between a pixel point and the reference hand key point is greater than or equal to a preset relative depth threshold, determining that the pixel point belongs to a background region;
when the relative depth value between a pixel point and the reference hand key point is smaller than the preset relative depth threshold, determining that the pixel point belongs to the hand region;
and obtaining the depth information of the hand region according to the relative depth values of all the pixel points belonging to the hand region.
4. The hand image depth determination method according to claim 3, wherein the method further comprises:
acquiring an absolute depth value corresponding to the reference hand key point;
and superposing the relative depth value between each pixel point contained in the hand region and the reference hand key point onto the absolute depth value of the reference hand key point, to obtain the absolute depth value of each pixel point contained in the hand region.
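The superposition in claim 4 — adding the reference key point's absolute depth to each pixel's relative depth — amounts to a single elementwise addition (the numeric values below are made up for illustration):

```python
import numpy as np

def absolute_depth(relative_depth_map, reference_absolute_depth):
    """Recover per-pixel absolute depth by superposing (adding) the absolute
    depth of the reference hand key point onto each relative depth value."""
    return relative_depth_map + reference_absolute_depth

# Relative depth of each hand pixel w.r.t. the reference key point (metres)
rel = np.array([[-0.02, 0.01],
                [ 0.00, 0.03]])
abs_d = absolute_depth(rel, 0.60)  # reference key point at 0.60 m absolute depth
```

This is why the depth network only needs to predict relative depths: one scalar absolute measurement at the reference key point anchors the whole hand.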
5. The hand image depth determination method according to any one of claims 1 to 4, wherein the key point detection network comprises a plurality of sequentially connected hourglass network structures, and the step of connecting a depth detection network at a preset feature layer of the key point detection network comprises:
connecting the depth detection network to the feature layer with the smallest size in the last of the plurality of hourglass network structures.
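The architecture in claim 5 — several hourglass stages in sequence, with the depth network tapping the smallest-size feature layer of the last stage — can be sketched structurally (a toy numpy stand-in that uses strided slicing instead of real convolutions; the class names and shapes are assumptions, not the patented network):

```python
import numpy as np

class HourglassStage:
    """Toy stand-in for one hourglass stage: downsample to the smallest
    feature map (the bottleneck), then upsample back to the input size."""
    def forward(self, x):
        small = x[:, ::2, ::2]                          # bottleneck: smallest feature layer
        up = np.repeat(np.repeat(small, 2, axis=1), 2, axis=2)
        return up, small

class DepthHead:
    """Toy depth detection head attached to the bottleneck features."""
    def forward(self, features):
        return features.mean(axis=0)                    # collapse channels into a depth map

# Key point network = sequentially connected hourglass stages; the depth
# head is connected to the smallest feature layer of the LAST stage only.
stages = [HourglassStage() for _ in range(3)]
depth_head = DepthHead()

x = np.random.rand(8, 16, 16)                           # (channels, H, W) image features
smallest = None
for stage in stages:
    x, smallest = stage.forward(x)
depth_map = depth_head.forward(smallest)                # depth from the last bottleneck
```

Tapping the last bottleneck means the depth head sees the most refined, most semantically aggregated features the hourglass stack produces.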
6. The hand image depth determination method of claim 1, wherein the training process of the depth detection network comprises:
acquiring a virtual hand image training sample set labeled with relative depth labeling information, wherein the relative depth labeling information is the relative depth of each pixel point with respect to a reference hand key point;
inputting training samples in the virtual hand image training sample set into the trained key point detection network, and obtaining the hand region key point features of the training samples at a preset feature layer of the key point detection network;
inputting the hand region key point features of the training samples into an initial depth detection network to obtain a relative depth detection result of each pixel point in the training samples with respect to the reference hand key point;
and calculating the depth loss corresponding to the training sample according to the relative depth detection result and the relative depth labeling information corresponding to the training sample, and adjusting the network parameters of the initial depth detection network according to the depth loss until the depth loss meets a depth convergence condition, to obtain the depth detection network.
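The depth-network training loop of claim 6 — predict relative depth from the key point features, compute a loss against the relative depth labels, and adjust parameters until a convergence condition holds — can be sketched with a linear toy model (the squared-error loss, plain gradient descent, and all names are assumptions; the claim only specifies a loss and a convergence condition):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for key point features of virtual-hand training samples:
# 32 pixels, each with a 4-dim feature from the (frozen) key point network.
features = rng.normal(size=(32, 4))
true_w = np.array([0.2, -0.1, 0.05, 0.3])
rel_depth_labels = features @ true_w       # relative depth labeling information

w = np.zeros(4)                            # initial depth detection network (linear toy)
lr, tol = 0.1, 1e-6
for step in range(5000):
    pred = features @ w                    # relative depth detection result
    depth_loss = np.mean((pred - rel_depth_labels) ** 2)
    if depth_loss < tol:                   # depth convergence condition met
        break
    grad = 2 * features.T @ (pred - rel_depth_labels) / len(features)
    w -= lr * grad                         # adjust network parameters by the loss
```

Virtual (rendered) samples are used here because per-pixel relative depth labels are essentially free to generate synthetically, while they are very costly to annotate on real images.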
7. The hand image depth determination method according to claim 3 or 6, wherein the reference hand key point is a palm root key point.
8. A hand image depth determination apparatus, comprising:
a to-be-processed image acquisition module configured to acquire an image to be processed containing a hand region;
a key point detection module configured to input the image to be processed into a key point detection network for hand region key point detection;
a depth detection module configured to connect a depth detection network to a preset feature layer of the key point detection network, so as to input the hand region key point features obtained at the preset feature layer into the depth detection network for depth detection and obtain the depth information of the hand region;
The apparatus further comprises:
a key point training sample acquisition module configured to acquire a key point training sample set containing hand key point labeling information, wherein the key point training sample set comprises real hand sample images and virtual hand sample images, the real hand sample images comprise two-dimensional hand key point labeling information, and the virtual hand sample images comprise three-dimensional hand key point labeling information;
a sample key point detection module configured to input training samples in the key point training sample set into an initial key point detection network for hand key point detection to obtain hand key point detection results of the training samples, wherein the hand key point detection result corresponding to a virtual hand sample image comprises three-dimensional position information of each hand key point, and the hand key point detection result corresponding to a real hand sample image comprises two-dimensional position information of each hand key point;
a key point detection network adjustment module configured to calculate the key point loss corresponding to the training sample according to the hand key point detection result and the hand key point labeling information corresponding to the training sample, and adjust the network parameters of the initial key point detection network according to the key point loss until the key point loss meets the key point convergence condition, to obtain the key point detection network.
9. The hand image depth determination device of claim 8, wherein the depth detection module comprises:
a depth detection sub-module configured to input the hand region key point features into the depth detection network for depth detection to obtain depth values of all pixel points of the hand region;
a background determination sub-module configured to determine that a pixel point belongs to a background region when the depth value of the pixel point is greater than or equal to a preset depth threshold;
a hand region determination sub-module configured to determine that a pixel point belongs to the hand region when the depth value of the pixel point is smaller than the preset depth threshold;
and a hand depth determination sub-module configured to obtain the depth information of the hand region according to the depth values of all the pixel points belonging to the hand region.
10. The hand image depth determination device of claim 8, wherein the depth detection module is specifically configured to:
inputting the hand region key point features into the depth detection network for depth detection to obtain a relative depth value between each pixel point contained in the hand region and a preset reference hand key point;
when the relative depth value between a pixel point and the reference hand key point is greater than or equal to a preset relative depth threshold, determining that the pixel point belongs to a background region;
when the relative depth value between a pixel point and the reference hand key point is smaller than the preset relative depth threshold, determining that the pixel point belongs to the hand region;
and obtaining the depth information of the hand region according to the relative depth values of all the pixel points belonging to the hand region.
11. The hand image depth determination device of claim 10, wherein the device further comprises:
a reference point depth acquisition module configured to acquire an absolute depth value corresponding to the reference hand key point;
a hand region absolute depth acquisition module configured to superpose the relative depth value between each pixel point contained in the hand region and the reference hand key point onto the absolute depth value of the reference hand key point, to obtain the absolute depth value of each pixel point contained in the hand region.
12. The hand image depth determination device according to any one of claims 8 to 11, wherein the key point detection network comprises a plurality of sequentially connected hourglass network structures, and the depth detection module is specifically configured to connect the depth detection network to the feature layer with the smallest size in the last of the plurality of hourglass network structures.
13. The hand image depth determination device according to claim 8, wherein the device further comprises:
a depth training sample acquisition module configured to acquire a virtual hand image training sample set labeled with relative depth labeling information, wherein the relative depth labeling information is the relative depth of each pixel point with respect to a reference hand key point;
a sample depth detection module configured to input training samples in the virtual hand image training sample set into the trained key point detection network, obtain the hand region key point features of the training samples at a preset feature layer of the key point detection network, and input the hand region key point features of the training samples into an initial depth detection network to obtain a relative depth detection result of each pixel point in the training samples with respect to the reference hand key point;
a depth detection network adjustment module configured to calculate the depth loss corresponding to the training sample according to the relative depth detection result and the relative depth labeling information corresponding to the training sample, and adjust the network parameters of the initial depth detection network according to the depth loss until the depth loss meets a depth convergence condition, to obtain the depth detection network.
14. The hand image depth determination device according to claim 10 or 13, wherein the reference hand key point is a palm root key point.
15. An electronic device, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the hand image depth determination method according to any one of claims 1 to 7.
16. A storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the hand image depth determination method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011102705.6A CN112233161B (en) | 2020-10-15 | 2020-10-15 | Hand image depth determination method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112233161A CN112233161A (en) | 2021-01-15 |
CN112233161B true CN112233161B (en) | 2024-05-17 |
Family
ID=74113738
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011102705.6A Active CN112233161B (en) | 2020-10-15 | 2020-10-15 | Hand image depth determination method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112233161B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112287868B (en) * | 2020-11-10 | 2021-07-13 | 上海依图网络科技有限公司 | Human body action recognition method and device |
CN112861783A (en) * | 2021-03-08 | 2021-05-28 | 北京华捷艾米科技有限公司 | Hand detection method and system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108154466A (en) * | 2017-12-19 | 2018-06-12 | 北京小米移动软件有限公司 | Image processing method and device |
CN108460338A (en) * | 2018-02-02 | 2018-08-28 | 北京市商汤科技开发有限公司 | Estimation method of human posture and device, electronic equipment, storage medium, program |
CN109299685A (en) * | 2018-09-14 | 2019-02-01 | 北京航空航天大学青岛研究院 | Deduction network and its method for the estimation of human synovial 3D coordinate |
CN110852311A (en) * | 2020-01-14 | 2020-02-28 | 长沙小钴科技有限公司 | Three-dimensional human hand key point positioning method and device |
CN111310528A (en) * | 2018-12-12 | 2020-06-19 | 马上消费金融股份有限公司 | Image detection method, identity verification method, payment method and payment device |
CN111428579A (en) * | 2020-03-03 | 2020-07-17 | 平安科技(深圳)有限公司 | Face image acquisition method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP3735344B2 (en) | Calibration apparatus, calibration method, and calibration program | |
CN109934847B (en) | Method and device for estimating posture of weak texture three-dimensional object | |
CN109472828B (en) | Positioning method, positioning device, electronic equipment and computer readable storage medium | |
CN111062263B (en) | Method, apparatus, computer apparatus and storage medium for hand gesture estimation | |
CN110930386B (en) | Image processing method, device, equipment and storage medium | |
JP2015522200A (en) | Human face feature point positioning method, apparatus, and storage medium | |
CN109640066B (en) | Method and device for generating high-precision dense depth image | |
CN112233161B (en) | Hand image depth determination method and device, electronic equipment and storage medium | |
CN112967388B (en) | Training method and device for three-dimensional time sequence image neural network model | |
CN112862757A (en) | Weight evaluation system based on computer vision technology and implementation method | |
CN113706472B (en) | Highway pavement disease detection method, device, equipment and storage medium | |
CN115131437A (en) | Pose estimation method, and training method, device, equipment and medium of relevant model | |
CN115841602A (en) | Construction method and device of three-dimensional attitude estimation data set based on multiple visual angles | |
KR102665603B1 (en) | Hardware disparity evaluation for stereo matching | |
CN113902932A (en) | Feature extraction method, visual positioning method and device, medium and electronic equipment | |
JP3919722B2 (en) | Skin shape measuring method and skin shape measuring apparatus | |
CN116597246A (en) | Model training method, target detection method, electronic device and storage medium | |
CN114694257B (en) | Multi-user real-time three-dimensional action recognition evaluation method, device, equipment and medium | |
CN113537407B (en) | Image data evaluation processing method and device based on machine learning | |
CN112862002A (en) | Training method of multi-scale target detection model, target detection method and device | |
CN114140414A (en) | Non-contact human body measuring method and device and electronic equipment | |
CN112270357A (en) | VIO vision system and method | |
CN113658313B (en) | Face model rendering method and device and electronic equipment | |
CN111476821B (en) | Target tracking method based on online learning | |
CN118537339B (en) | Method and system for evaluating surface quality of part based on sander |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||