CN110852311A - Three-dimensional human hand key point positioning method and device - Google Patents

Three-dimensional human hand key point positioning method and device

Info

Publication number
CN110852311A
CN110852311A
Authority
CN
China
Prior art keywords
palm
depth image
neural network
normalized
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010034582.0A
Other languages
Chinese (zh)
Inventor
陈俊逸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Small Cobalt Technology Co Ltd
Original Assignee
Changsha Small Cobalt Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Small Cobalt Technology Co Ltd filed Critical Changsha Small Cobalt Technology Co Ltd
Priority to CN202010034582.0A
Publication of CN110852311A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a method and a device for positioning key points of a three-dimensional human hand. The method comprises the following steps: acquiring a depth image of an actual scene; performing palm region segmentation on the depth image through a first neural network to obtain a segmented palm region; performing normalization processing and size transformation on the palm region to obtain a depth map of the normalized palm region; judging, through a second neural network, whether the actual scene corresponding to the depth map of the normalized palm region contains a real palm; and if so, predicting the key point coordinates of the depth map of the normalized palm region through a third neural network and, from the predicted coordinates, determining the key point coordinates of the palm in the depth image of the actual scene. The practicability and reliability of three-dimensional positioning of human hand key points can thereby be improved.

Description

Three-dimensional human hand key point positioning method and device
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a three-dimensional human hand key point positioning method, a three-dimensional human hand key point positioning device, terminal equipment and a computer readable medium.
Background
Biometric recognition technologies such as palm print and palm vein recognition, as well as gesture recognition technologies, all require palm detection and key point positioning in images. In palm print recognition, quickly and accurately locating the palm region is critical and directly affects recognition performance. In gesture recognition, if the coordinate positions of the joint points of the fingers and palm can be obtained, the gesture can be judged from the relative positions of the fingers and palm. Palm region positioning and palm key point positioning are therefore very important. Existing palm key point detection is mainly based on two-dimensional RGB images or near-infrared images, and falls into three main categories. The first segments the palm from the background using the palm's color information and infers the key point positions from the palm contour. The second applies a contour extraction algorithm directly to the palm in the image to obtain contour information for the fingers, palm, wrist and other parts, and then infers the key points from that contour information. The third uses deep learning: a deep neural network for object detection is applied to the image to directly obtain a rectangular frame containing the palm, finger knuckle line segments are then located, and the joint point positions of the fingers and palm, i.e. the key point positions, are obtained.
The patent most similar to the present invention is CN108427942A, which comprises the following steps: S1, collecting training samples; S2, constructing a network model, namely a CNN (convolutional neural network) feature extraction network, an RPN (region proposal network) candidate region extraction network and a discrimination network; S3, training the network model by initializing the three networks; S4, constructing a detection model; S5, performing palm detection and key point positioning. That patent uses Faster R-CNN, the highest-performing object detection framework at the time, for fast palm region positioning, and uses a key point positioning network model to perform palm contour detection and key point positioning on the palm image to be detected. It takes a near-infrared image as the input of the object detection framework and the key point positioning network model; compared with an RGB image this is less affected by lighting, but lighting still has an influence. It also cannot resist spoofing: a printed static two-dimensional palm picture is recognized directly, which makes subsequent biometric recognition and gesture recognition unreliable. In addition, that patent uses a two-dimensional near-infrared image for key point positioning, so it cannot locate the three-dimensional information of the key points, and when the palm deflects at an angle, gesture recognition on a two-dimensional picture is severely affected.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for positioning a three-dimensional human hand key point, a terminal device, and a computer readable medium, which can improve the practicability and reliability of three-dimensional positioning of a human hand key point.
The first aspect of the embodiments of the present invention provides a method for positioning a three-dimensional human hand key point, including:
acquiring a depth image of an actual scene;
carrying out palm region segmentation on the depth image through a first neural network to obtain a segmented palm region;
carrying out normalization processing and size transformation on the palm area to obtain a depth map of the normalized palm area;
judging whether the actual scene corresponding to the depth map of the normalized palm area contains the real palm or not through a second neural network;
if yes, predicting the key point coordinates of the depth map of the normalized palm area through a third neural network, and determining the key point coordinates of the palm in the depth image of the actual scene through the predicted key point coordinates of the depth map of the normalized palm area.
A second aspect of the embodiments of the present invention provides a three-dimensional human hand key point positioning device, including:
the acquisition module is used for acquiring a depth image of an actual scene;
the segmentation module is used for carrying out palm region segmentation on the depth image through a first neural network to obtain a segmented palm region;
the normalizing module is used for carrying out normalization processing and size conversion on the palm area to obtain a depth map of the normalized palm area;
the anti-counterfeiting module is used for judging whether the actual scene corresponding to the depth map of the normalized palm area contains the real palm or not through a second neural network;
and the positioning module is used for predicting the key point coordinates of the depth map of the normalized palm area through a third neural network when the anti-counterfeiting module detects that the actual scene corresponding to the depth map contains the real palm, and determining the key point coordinates of the palm in the depth image of the actual scene through the predicted key point coordinates of the depth map of the normalized palm area.
A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above three-dimensional human hand key point positioning method when executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer-readable medium storing a computer program which, when executed by a processor, implements the steps of the above three-dimensional human hand key point positioning method.
The three-dimensional human hand key point positioning method provided by the embodiment of the invention acquires a depth image of an actual scene, performs palm region segmentation on the depth image through a first neural network to obtain a segmented palm region, performs normalization processing and size transformation on the palm region to obtain a depth map of the normalized palm region, and judges through a second neural network whether the actual scene corresponding to the depth map contains a real palm. If so, the key point coordinates of the depth map of the normalized palm region are predicted by a third neural network, and the key point coordinates of the palm in the depth image of the actual scene are determined from the predicted coordinates. The practicability and reliability of three-dimensional positioning of human hand key points can thereby be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of a three-dimensional human hand key point positioning method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a three-dimensional human hand key point positioning device according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a refinement of the segmentation module in FIG. 2;
FIG. 4 is a schematic diagram of a detailed structure of the anti-counterfeiting module in FIG. 2;
fig. 5 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Referring to fig. 1, fig. 1 is a flowchart of a three-dimensional human hand key point positioning method according to an embodiment of the present invention. As shown in fig. 1, the three-dimensional human hand key point positioning method of the present embodiment includes the following steps:
s101: a depth image of an actual scene is acquired.
In the embodiment of the invention, in an actual scene, the depth image to be recognized can be acquired by a depth camera device. The value of each pixel in the depth image represents the distance from the depth camera to the object at that pixel.
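Purely as an illustration, one possible capture path is sketched below using an Intel RealSense depth camera; the device choice and all names are assumptions, since the embodiment only requires some depth camera.

```python
# Illustrative depth-image capture with a RealSense camera (assumed device;
# the embodiment only requires that each pixel store a camera-to-object distance).
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
pipeline.start()                      # default configuration streams depth frames
try:
    frames = pipeline.wait_for_frames()
    depth_frame = frames.get_depth_frame()
    depth = np.asanyarray(depth_frame.get_data())   # uint16 array of distance values
finally:
    pipeline.stop()
```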
S102: and carrying out palm region segmentation on the depth image through a first neural network to obtain a segmented palm region.
In the embodiment of the invention, the depth image can first be size-transformed to obtain a depth image of a fixed size; the fixed-size depth image is input into a first neural network, which performs palm region segmentation on it; finally, the segmented fixed-size depth image is denoised to obtain the segmented palm region.
More specifically, the first neural network is a convolutional neural network comprising an encoder and a decoder. The encoder comprises 8 convolutional layers, each followed by an activation function layer; the activation function is preferably the linear rectification function (ReLU). Each of the first 6 convolutional layers is followed by a down-sampling layer with a 2 x 2 window; the last 2 convolutional layers are not. The convolution kernel size of all 8 layers is 3 x 3, and the numbers of output feature maps are 64, 64, 128, 128, 256, 256, 512 and 512 respectively.
The decoder is connected to the encoder and takes the encoder's output as input. It comprises 9 convolutional layers: the first 8 are each followed by a ReLU activation function, and the last is followed by a Sigmoid activation function. An up-sampling layer with a 2 x 2 window is connected after each of the first 6 convolution + activation layers of the decoder. The convolution kernel size of the first 8 decoder layers is 3 x 3, with output feature map counts of 256, 256, 128, 128, 64, 64, 32 and 32 respectively; the last convolutional layer has a 1 x 1 kernel and outputs 1 feature map. The output size of the decoder is consistent with the input size of the convolutional neural network and indicates, for each pixel, whether it belongs to the palm region: the point value is 1 if it does and 0 otherwise.
Further, the segmented palm region output by the first neural network often contains noise; for example, some non-palm connected regions may be wrongly judged as palm regions. The output therefore needs to be denoised so that it contains only one connected region, the palm region, with all other connected regions eliminated.
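A minimal PyTorch sketch of this encoder-decoder, under the layer ordering described above, follows; the training procedure, the input resolution and the exact placement of the sampling layers are not fully specified by the text and are assumptions here.

```python
# Sketch of the first (segmentation) network: 8 encoder conv layers
# (2x2 down-sampling after the first six) and 9 decoder conv layers
# (2x2 up-sampling after the first six), ending in a 1x1 conv + Sigmoid.
import torch.nn as nn

class PalmSegNet(nn.Module):
    def __init__(self):
        super().__init__()
        enc, in_ch = [], 1                          # single-channel depth input (assumed)
        for i, out_ch in enumerate([64, 64, 128, 128, 256, 256, 512, 512]):
            enc += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
            if i < 6:
                enc.append(nn.MaxPool2d(2))         # 2 x 2 down-sampling window
            in_ch = out_ch
        self.encoder = nn.Sequential(*enc)

        dec = []
        for i, out_ch in enumerate([256, 256, 128, 128, 64, 64, 32, 32]):
            dec += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
            if i < 6:
                dec.append(nn.Upsample(scale_factor=2))  # 2 x 2 up-sampling window
            in_ch = out_ch
        dec += [nn.Conv2d(in_ch, 1, 1), nn.Sigmoid()]    # per-pixel palm probability
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):                           # x: (N, 1, H, W), H and W divisible by 64
        return self.decoder(self.encoder(x))
```

Thresholding the Sigmoid output at 0.5 would yield the 0/1 palm mask described above.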
The denoising can proceed as follows:
1) Select a first starting point in the segmented fixed-size depth image, extract the nearby pixel points connected to it, store them in a first list, and mark them as processed; the pixel points connected to the starting point are those whose segmentation value equals that of the starting point.
2) Among the unprocessed pixel points of the segmented fixed-size depth image, keep searching for points that are adjacent to points already in the first list and have the same segmentation value, and add them to the first list, until all pixel points connected to the first starting point have been found, marked as processed, and added to the first list.
3) Select a second starting point among the unprocessed pixel points, find and mark all pixel points connected to it, and add them to a second list.
4) Select further starting points among the remaining unprocessed pixel points, and find and mark all pixel points connected to each of them, until every point in the segmented fixed-size depth image is marked as processed. This yields a number of lists, each holding all pixel points of one connected region.
5) Among all the lists, find the one whose pixel points have value 1 and whose number of pixel points is largest; that list corresponds to the palm region.
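The flood-fill procedure above can be sketched as follows; the function and variable names are illustrative, and 4-connectivity is an assumption.

```python
# Keep only the largest connected region of segmentation value 1 (the palm),
# following steps 1)-5) above.
import numpy as np
from collections import deque

def keep_largest_palm_region(seg: np.ndarray) -> np.ndarray:
    h, w = seg.shape
    processed = np.zeros((h, w), dtype=bool)
    best = []                                    # pixels of the largest value-1 region found
    for sy in range(h):
        for sx in range(w):
            if processed[sy, sx]:
                continue
            value = seg[sy, sx]                  # segmentation value of this start point
            region, queue = [], deque([(sy, sx)])
            processed[sy, sx] = True
            while queue:                         # grow one connected region (one "list")
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not processed[ny, nx] \
                            and seg[ny, nx] == value:
                        processed[ny, nx] = True
                        queue.append((ny, nx))
            if value == 1 and len(region) > len(best):
                best = region
    cleaned = np.zeros_like(seg)
    for y, x in best:
        cleaned[y, x] = 1                        # every other connected region is eliminated
    return cleaned
```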
S103: and carrying out normalization processing and size transformation on the palm area to obtain a depth map of the normalized palm area.
In the embodiment of the present invention, the maximum and minimum values of the horizontal and vertical coordinates of all pixel points in the palm region, in the coordinate system of the fixed-size depth image of S102, are selected, giving (x_min, y_min) and (x_max, y_max). From these, the diagonal position coordinates of the rectangular frame containing the human hand are obtained: the position coordinate of the upper-left corner is (x_min, y_min) and that of the lower-right corner is (x_max, y_max). The depth image is then cropped by this rectangular frame to obtain a depth map of the palm region, the depth values of the palm region are normalized, and the palm region is size-transformed to obtain a normalized, fixed-size depth map of the palm region.
S104: and judging whether the actual scene corresponding to the depth map of the normalized palm region contains the real palm or not through a second neural network.
In the embodiment of the invention, the depth map of the normalized palm region is fed into a second neural network, which outputs a judgment value; this value determines whether the actual scene corresponding to the depth map contains a real palm. The second neural network comprises 5 convolutional layers and 3 fully-connected layers. All convolution kernels are 3 x 3, and the numbers of output feature maps are 32, 64, 128, 256 and 512 respectively. A down-sampling layer with a 3 x 3 window is connected after each of the first three convolutional layers, and a ReLU activation function follows every convolutional layer. The fully-connected layers have 4096, 1024 and 3 nodes respectively; the first two are each followed by a Dropout function, which randomly sets outputs to zero with a certain probability to prevent overfitting. The 3 nodes of the last fully-connected layer represent 3 categories: real person, picture attack and video attack. If the actual scene corresponding to the depth map of the normalized palm region contains a real palm, the process goes to S105; if it does not (for example, it contains a picture or a video), the process returns to S101 to re-acquire a depth image of the actual scene.
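A sketch of this classifier, under the stated layer counts, might look as follows; the input resolution and dropout probability are not specified, so nn.LazyLinear is used to infer the flattened size.

```python
# Sketch of the second (anti-counterfeiting) network: 5 conv layers with
# 3x3 kernels, 3x3 down-sampling after the first three, then FC layers
# of 4096, 1024 and 3 nodes with Dropout after the first two.
import torch.nn as nn

def make_antispoof_net(p_drop: float = 0.5) -> nn.Sequential:   # p_drop is assumed
    layers, in_ch = [], 1
    for i, out_ch in enumerate([32, 64, 128, 256, 512]):
        layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
        if i < 3:
            layers.append(nn.MaxPool2d(3))       # 3 x 3 down-sampling window
        in_ch = out_ch
    layers += [
        nn.Flatten(),
        nn.LazyLinear(4096), nn.Dropout(p_drop),
        nn.Linear(4096, 1024), nn.Dropout(p_drop),
        nn.Linear(1024, 3),                      # real person / picture attack / video attack
    ]
    return nn.Sequential(*layers)
```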
S105: and predicting the key point coordinates of the depth map of the normalized palm region through a third neural network, and determining the key point coordinates of the palm in the depth image of the actual scene through the predicted key point coordinates of the depth map of the normalized palm region.
In this embodiment of the present invention, the depth map of the normalized palm region is fed into a third neural network, which outputs predicted coordinate values of the key points of the normalized palm region. The number of predicted values is m x 3, where m is the number of palm key points and each value is an abscissa, an ordinate or a depth value; for example, with m key points in total, the first key point has coordinates (x1, y1, d1), the second (x2, y2, d2), ..., and the m-th (xm, ym, dm). The predicted coordinate values are then converted into coordinate values in the palm region before size transformation (the palm region of S102) by the conversion formula:
xi' = xi / s_w, yi' = yi / s_h, i = 1, ..., m (1)
where s_w and s_h respectively denote the width and height expansion factors of the size transformation, (xi, yi) respectively denote the abscissa and ordinate of a predicted key point coordinate, (xi', yi') respectively denote the abscissa and ordinate of that key point in the palm region before size transformation, and m denotes the number of key points. Finally, the minimum horizontal and vertical coordinates (x_min, y_min) of all pixel points of the pre-transformation palm region, in the coordinate system of the fixed-size depth image, are added to the horizontal and vertical coordinate values respectively, giving the coordinates of the palm key points in the fixed-size depth image of S102; combining these with the depth values of the key points in the fixed-size depth image yields the key point coordinates in three-dimensional space. Preferably, the third neural network is a convolutional neural network with the following structure: 8 convolutional layers and 3 fully-connected layers. Each convolution kernel is 3 x 3, and the numbers of output feature maps are 32, 32, 64, 64, 128, 128, 256 and 256 respectively. A ReLU activation function layer follows each convolutional layer, and a down-sampling layer with a 2 x 2 window follows each of the first 4 convolution + activation layers. The fully-connected layers have 4096, 2048 and m x 3 nodes respectively; the first two are each followed by a ReLU activation function and then a Dropout function, which randomly sets outputs to zero with a certain probability.
In the three-dimensional human hand positioning method of fig. 1, because depth information (a depth image) is used as input, the method is insensitive to illumination and can still locate the human hand under extreme lighting conditions, giving it high practicability. Moreover, the three-dimensional anti-counterfeiting algorithm provided by the embodiment of the invention can effectively prevent malicious picture and video attacks, improving the reliability of human hand positioning. In addition, the embodiment of the invention directly obtains the coordinates of the key points in three-dimensional space, so when the palm deflects, an angle transformation can be performed in three-dimensional space to obtain a front-facing palm, reducing the negative influence on subsequent gesture recognition algorithms.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a three-dimensional human hand key point positioning device according to an embodiment of the present invention. As shown in fig. 2, the three-dimensional human hand key point positioning device 2 of this embodiment includes an obtaining module 21, a segmentation module 22, a normalizing module 23, an anti-counterfeiting module 24 and a positioning module 25, which are respectively configured to execute the specific methods of S101, S102, S103, S104 and S105 in fig. 1; details can be found in the related description of fig. 1 and are only briefly stated here:
an obtaining module 21, configured to obtain a depth image of an actual scene.
And the segmentation module 22 is configured to perform palm region segmentation on the depth image through a first neural network to obtain a segmented palm region.
And the normalizing module 23 is configured to perform normalization processing and size conversion on the palm area to obtain a depth map of the normalized palm area.
And the anti-counterfeiting module 24 is configured to judge whether the actual scene corresponding to the depth map of the normalized palm region includes a real palm through a second neural network.
And the positioning module 25 is configured to predict, when the anti-counterfeit module 24 detects that the actual scene corresponding to the depth map includes a real palm, the key point coordinates of the depth map of the normalized palm region through a third neural network, and determine the key point coordinates of the palm in the depth image of the actual scene through the predicted key point coordinates of the depth map of the normalized palm region.
Further, as can be seen in fig. 3, the segmentation module 22 may specifically include a transformation unit 221, a segmentation unit 222, and a denoising unit 223:
a transforming unit 221, configured to perform size transformation on the depth image to obtain a depth image with a fixed size.
A segmentation unit 222, configured to input the fixed-size depth image into a first neural network, and perform palm region segmentation on the fixed-size depth image using the first neural network.
And a denoising unit 223, configured to perform denoising processing on the depth image with the fixed size after the palm region is segmented, so as to obtain a segmented palm region.
Further, referring to fig. 4, the anti-counterfeit module 24 may specifically include an input unit 241 and a determination unit 242:
an input unit 241, configured to input the normalized depth map of the palm region into a second neural network.
A determining unit 242, configured to obtain a judgment value through the second neural network and determine from it whether the actual scene corresponding to the depth map of the normalized palm region contains a real palm. The second neural network is a convolutional neural network comprising 5 convolutional layers and 3 fully-connected layers; all convolution kernels are 3 x 3, the numbers of output feature maps are 32, 64, 128, 256 and 512 respectively, a down-sampling layer with a 3 x 3 window follows each of the first three convolutional layers, a linear rectification function (ReLU) follows every convolutional layer, the fully-connected layers have 4096, 1024 and 3 nodes respectively, the first two fully-connected layers are each followed by a Dropout function that randomly sets outputs to zero with a certain probability to prevent overfitting, and the 3 nodes of the last fully-connected layer represent 3 categories: real palm, picture attack and video attack.
Because the three-dimensional human hand positioning device of fig. 2 takes depth information (a depth image) as input, it is insensitive to illumination and can still locate the human hand under extreme lighting conditions, giving it high practicability. Moreover, the three-dimensional anti-counterfeiting algorithm it adopts can effectively prevent malicious picture and video attacks, improving the reliability of hand positioning. In addition, the device directly obtains the coordinates of the key points in three-dimensional space, so when the palm deflects, an angle transformation can be performed in three-dimensional space to obtain a front-facing palm, reducing the negative influence on subsequent gesture recognition algorithms.
Fig. 5 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 5, the terminal device 5 of this embodiment includes: a processor 50, a memory 51 and a computer program 52 stored in said memory 51 and executable on said processor 50, such as a program for performing the three-dimensional human hand key point positioning method. The processor 50, when executing the computer program 52, implements the steps in the above-described method embodiments, e.g., S101 to S105 shown in fig. 1. Alternatively, the processor 50, when executing the computer program 52, implements the functions of the modules/units in the system embodiments, such as the functions of the modules 21 to 25 shown in fig. 2.
Illustratively, the computer program 52 may be partitioned into one or more modules/units that are stored in the memory 51 and executed by the processor 50 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which describe the execution process of the computer program 52 in the terminal device 5. For example, the computer program 52 may be partitioned into the obtaining module 21, the segmentation module 22, the normalizing module 23, the anti-counterfeiting module 24 and the positioning module 25 (modules in a virtual system), whose specific functions are as follows:
an obtaining module 21, configured to obtain a depth image of an actual scene.
And the segmentation module 22 is configured to perform palm region segmentation on the depth image through a first neural network to obtain a segmented palm region.
And the normalizing module 23 is configured to perform normalization processing and size conversion on the palm area to obtain a depth map of the normalized palm area.
And the anti-counterfeiting module 24 is configured to judge whether the actual scene corresponding to the depth map of the normalized palm region includes a real palm through a second neural network.
And the positioning module 25 is configured to predict, when the anti-counterfeit module 24 detects that the actual scene corresponding to the depth map includes a real palm, the key point coordinates of the depth map of the normalized palm region through a third neural network, and determine the key point coordinates of the palm in the depth image of the actual scene through the predicted key point coordinates of the depth map of the normalized palm region.
The terminal device 5 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device 5 may include, but is not limited to, a processor 50, a memory 51. Those skilled in the art will appreciate that fig. 5 is merely an example of a terminal device 5 and does not constitute a limitation of terminal device 5 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The Processor 50 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 51 may be an internal storage unit of the terminal device 5, such as a hard disk or a memory of the terminal device 5. The memory 51 may also be an external storage device of the terminal device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) and the like provided on the terminal device 5. Further, the memory 51 may also include both an internal storage unit of the terminal device 5 and an external storage device. The memory 51 is used for storing the computer programs and other programs and data required by the terminal device 5. The memory 51 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the functional units, sub-units and modules described above are illustrated as examples, and in practical applications, the functions may be distributed as needed to different functional units, sub-units and modules, that is, the internal structure of the system may be divided into different functional units, sub-units or modules to complete all or part of the functions described above. Each functional unit, sub-unit, and module in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit, and the integrated units or sub-units may be implemented in a form of hardware, or may be implemented in a form of software functional units. In addition, specific names of the functional units, the sub-units and the modules are only used for distinguishing one from another, and are not used for limiting the protection scope of the application. The specific working processes of the units, sub-units, and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed system/terminal device and method can be implemented in other ways. For example, the above-described system/terminal device embodiments are merely illustrative, and for example, the division of the modules, units or sub-units is only one logical function division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A three-dimensional human hand key point positioning method is characterized by comprising the following steps:
acquiring a depth image of an actual scene;
carrying out palm region segmentation on the depth image through a first neural network to obtain a segmented palm region;
carrying out normalization processing and size transformation on the palm area to obtain a depth map of the normalized palm area;
judging whether the actual scene corresponding to the depth map of the normalized palm area contains the real palm or not through a second neural network;
if yes, predicting the key point coordinates of the depth map of the normalized palm area through a third neural network, and determining the key point coordinates of the palm in the depth image of the actual scene through the predicted key point coordinates of the depth map of the normalized palm area.
2. The three-dimensional human hand key point positioning method according to claim 1, wherein the first neural network is a convolutional neural network, and the palm region segmentation of the depth image by the first neural network to obtain a segmented palm region comprises:
carrying out size transformation on the depth image to obtain a depth image with a fixed size;
inputting the depth image with the fixed size into a first neural network, and performing palm region segmentation on the depth image with the fixed size by using the first neural network;
and denoising the depth image with the fixed size after the palm region segmentation to obtain a segmented palm region.
3. The three-dimensional human hand key point positioning method according to claim 2, wherein the denoising of the fixed-size depth image after palm region segmentation to obtain a segmented palm region comprises:
selecting a first starting point in the segmented fixed-size depth image, extracting the nearby pixel points connected to the starting point, storing them in a first list, and marking them as processed, wherein the pixel points connected to the starting point are those whose segmentation value equals that of the starting point;
among the unprocessed pixel points of the segmented fixed-size depth image, continuing to search for points that are adjacent to points in the first list and have the same segmentation value, and adding them to the first list, until all pixel points connected to the first starting point are found, marked as processed, and added to the first list;
selecting a second starting point among the unprocessed pixel points of the segmented fixed-size depth image, finding and marking all pixel points connected to the second starting point, and adding them to a second list;
selecting further starting points, other than the first starting point and the second starting point, among the unprocessed pixel points of the segmented fixed-size depth image, and finding and marking all pixel points connected to them, until all points in the segmented fixed-size depth image are marked as processed, obtaining a plurality of lists, each list holding all pixel points of one connected region;
and finding, among all the lists, the list whose pixel points have value 1 and whose number of pixel points is largest, and determining that this list corresponds to the palm region.
4. The three-dimensional human hand key point positioning method according to claim 2, wherein the performing normalization processing and size transformation on the palm region to obtain a depth map of the normalized palm region comprises:
selecting the maximum and minimum values of the horizontal and vertical coordinates of all pixel points in the palm region, in the coordinate system of the fixed-size depth image, and obtaining from them the diagonal position coordinates of the rectangular frame containing the human hand, wherein the position coordinate of the upper-left corner is (x_min, y_min) and that of the lower-right corner is (x_max, y_max);
and cropping the depth image by the rectangular frame to obtain a depth map of the palm region, normalizing the depth values of the palm region, and size-transforming the palm region to obtain a normalized, fixed-size depth map of the palm region.
5. The three-dimensional human hand key point positioning method according to claim 1, wherein the second neural network is an anti-counterfeiting convolutional neural network for judging whether the acquired depth image belongs to a malicious attack, and the judging, through the second neural network, whether the actual scene corresponding to the depth map of the normalized palm region contains a real palm comprises:
feeding the depth map of the normalized palm region into the second neural network;
obtaining a judgment value through the second neural network, and judging from the judgment value whether the actual scene corresponding to the depth map of the normalized palm region contains a real palm, wherein the second neural network comprises 5 convolutional layers and 3 fully-connected layers, all convolution kernels are 3 x 3, the numbers of output feature maps are 32, 64, 128, 256 and 512 respectively, a down-sampling layer with a 3 x 3 window follows each of the first three convolutional layers, a linear rectification function ReLU follows every convolutional layer, the fully-connected layers have 4096, 1024 and 3 nodes respectively, the first two fully-connected layers are each followed by a Dropout function that randomly sets outputs to zero with a preset probability to prevent overfitting, and the 3 nodes of the last fully-connected layer represent 3 categories: real person, picture attack and video attack.
6. The three-dimensional human hand key point positioning method of claim 4, wherein the third neural network is a convolutional neural network, and the predicting, by the third neural network, key point coordinates of the depth map of the normalized palm region and determining, from the predicted key point coordinates, the key point coordinates of the palm in the depth image of the actual scene comprises:
feeding the depth map of the normalized palm region into the third neural network, the third neural network outputting predicted coordinate values of the key points of the normalized palm region, wherein the number of predicted coordinate values is m x 3, m represents the number of palm key points, and each coordinate value represents an abscissa, an ordinate or a depth value;
converting the predicted coordinate values into coordinate values in the palm region before size transformation, wherein the conversion formula is:
xi' = xi / s_w, yi' = yi / s_h, i = 1, ..., m (1)
wherein s_w and s_h respectively represent the width and height expansion factors of the size transformation, (xi, yi) respectively represent the abscissa and ordinate of a predicted key point coordinate value, (xi', yi') respectively represent the abscissa and ordinate of that key point in the palm region before size transformation, and m represents the number of key points;
and adding, to the horizontal and vertical coordinate values of the key points in the palm region before size transformation, the minimum values (x_min, y_min) of the horizontal and vertical coordinates of all pixel points of the pre-transformation palm region in the coordinate system of the fixed-size depth image, obtaining the coordinates of the palm key points in the fixed-size depth image.
7. A three-dimensional human hand key point positioning device is characterized by comprising:
the acquisition module is used for acquiring a depth image of an actual scene;
the segmentation module is used for carrying out palm region segmentation on the depth image through a first neural network to obtain a segmented palm region;
the normalizing module is used for carrying out normalization processing and size conversion on the palm area to obtain a depth map of the normalized palm area;
the anti-counterfeiting module is used for judging whether the actual scene corresponding to the depth map of the normalized palm area contains the real palm or not through a second neural network;
and the positioning module is used for predicting the key point coordinates of the depth map of the normalized palm area through a third neural network when the anti-counterfeiting module detects that the actual scene corresponding to the depth map contains the real palm, and determining the key point coordinates of the palm in the depth image of the actual scene through the predicted key point coordinates of the depth map of the normalized palm area.
8. The three-dimensional human hand keypoint locating device of claim 7, wherein said segmentation module comprises:
the transformation unit is used for carrying out size transformation on the depth image to obtain a depth image with a fixed size;
a segmentation unit, configured to input the fixed-size depth image into a first neural network, and perform palm region segmentation on the fixed-size depth image using the first neural network;
and the denoising unit is used for denoising the depth image with the fixed size after the palm region is segmented to obtain the segmented palm region.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-6 when executing the computer program.
10. A computer-readable medium, in which a computer program is stored which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202010034582.0A 2020-01-14 2020-01-14 Three-dimensional human hand key point positioning method and device Pending CN110852311A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010034582.0A CN110852311A (en) 2020-01-14 2020-01-14 Three-dimensional human hand key point positioning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010034582.0A CN110852311A (en) 2020-01-14 2020-01-14 Three-dimensional human hand key point positioning method and device

Publications (1)

Publication Number Publication Date
CN110852311A true CN110852311A (en) 2020-02-28

Family

ID=69610681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010034582.0A Pending CN110852311A (en) 2020-01-14 2020-01-14 Three-dimensional human hand key point positioning method and device

Country Status (1)

Country Link
CN (1) CN110852311A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709268A (en) * 2020-04-24 2020-09-25 中国科学院软件研究所 Human hand posture estimation method and device based on human hand structure guidance in depth image
CN112156451A (en) * 2020-09-22 2021-01-01 歌尔科技有限公司 Handle and size adjusting method, size adjusting system and size adjusting device thereof
CN112233161A (en) * 2020-10-15 2021-01-15 北京达佳互联信息技术有限公司 Hand image depth determination method and device, electronic equipment and storage medium
CN112861783A (en) * 2021-03-08 2021-05-28 北京华捷艾米科技有限公司 Hand detection method and system
CN113065458A (en) * 2021-03-29 2021-07-02 新疆爱华盈通信息技术有限公司 Voting method and system based on gesture recognition and electronic device
CN113269089A (en) * 2021-05-25 2021-08-17 上海人工智能研究院有限公司 Real-time gesture recognition method and system based on deep learning
CN113724317A (en) * 2021-08-31 2021-11-30 南京未来网络产业创新有限公司 Hand joint positioning and local area calculation method, processor and memory
CN114581535A (en) * 2022-03-03 2022-06-03 北京深光科技有限公司 Method, device, storage medium and equipment for marking key points of user bones in image
CN113065458B (en) * 2021-03-29 2024-05-28 芯算一体(深圳)科技有限公司 Voting method and system based on gesture recognition and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106648103A (en) * 2016-12-28 2017-05-10 歌尔科技有限公司 Gesture tracking method for VR headset device and VR headset device
CN108776773A (en) * 2018-05-04 2018-11-09 华南理工大学 A kind of three-dimensional gesture recognition method and interactive system based on depth image
CN109684925A (en) * 2018-11-21 2019-04-26 深圳奥比中光科技有限公司 A kind of human face in-vivo detection method and equipment based on depth image
US20190310716A1 (en) * 2015-12-15 2019-10-10 Purdue Research Foundation Method and System for Hand Pose Detection
CN110383288A (en) * 2019-06-06 2019-10-25 深圳市汇顶科技股份有限公司 The method, apparatus and electronic equipment of recognition of face

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190310716A1 (en) * 2015-12-15 2019-10-10 Purdue Research Foundation Method and System for Hand Pose Detection
CN106648103A (en) * 2016-12-28 2017-05-10 歌尔科技有限公司 Gesture tracking method for VR headset device and VR headset device
CN108776773A (en) * 2018-05-04 2018-11-09 华南理工大学 A kind of three-dimensional gesture recognition method and interactive system based on depth image
CN109684925A (en) * 2018-11-21 2019-04-26 深圳奥比中光科技有限公司 A kind of human face in-vivo detection method and equipment based on depth image
CN110383288A (en) * 2019-06-06 2019-10-25 深圳市汇顶科技股份有限公司 The method, apparatus and electronic equipment of recognition of face

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
王江明 et al., "Multi-gradient fusion RGB-D image edge detection", Journal of Electronic Measurement and Instrumentation *
蔡汉明 et al., "Interactive Microcomputer Graphics", 31 October 1994 *
郭卡 et al., "Python Data Crawling Technology and Practice Manual", 31 August 2018 *
郭峰 et al., "Underwater Robot Motion", 30 April 2012, Harbin Engineering University Press *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709268A (en) * 2020-04-24 2020-09-25 中国科学院软件研究所 Human hand posture estimation method and device based on human hand structure guidance in depth image
CN111709268B (en) * 2020-04-24 2022-10-14 中国科学院软件研究所 Human hand posture estimation method and device based on human hand structure guidance in depth image
CN112156451B (en) * 2020-09-22 2022-07-22 歌尔科技有限公司 Handle and size adjusting method, size adjusting system and size adjusting device thereof
CN112156451A (en) * 2020-09-22 2021-01-01 歌尔科技有限公司 Handle and size adjusting method, size adjusting system and size adjusting device thereof
CN112233161A (en) * 2020-10-15 2021-01-15 北京达佳互联信息技术有限公司 Hand image depth determination method and device, electronic equipment and storage medium
CN112233161B (en) * 2020-10-15 2024-05-17 北京达佳互联信息技术有限公司 Hand image depth determination method and device, electronic equipment and storage medium
CN112861783A (en) * 2021-03-08 2021-05-28 北京华捷艾米科技有限公司 Hand detection method and system
CN113065458A (en) * 2021-03-29 2021-07-02 新疆爱华盈通信息技术有限公司 Voting method and system based on gesture recognition and electronic device
CN113065458B (en) * 2021-03-29 2024-05-28 芯算一体(深圳)科技有限公司 Voting method and system based on gesture recognition and electronic equipment
CN113269089A (en) * 2021-05-25 2021-08-17 上海人工智能研究院有限公司 Real-time gesture recognition method and system based on deep learning
CN113724317A (en) * 2021-08-31 2021-11-30 南京未来网络产业创新有限公司 Hand joint positioning and local area calculation method, processor and memory
CN113724317B (en) * 2021-08-31 2023-09-29 南京未来网络产业创新有限公司 Hand joint positioning and local area calculating method, processor and memory
CN114581535A (en) * 2022-03-03 2022-06-03 北京深光科技有限公司 Method, device, storage medium and equipment for marking key points of user bones in image

Similar Documents

Publication Publication Date Title
CN110852311A (en) Three-dimensional human hand key point positioning method and device
CN110414507B (en) License plate recognition method and device, computer equipment and storage medium
EP3916627A1 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN102667810B (en) Face recognition in digital images
CN110378297B (en) Remote sensing image target detection method and device based on deep learning and storage medium
US10445602B2 (en) Apparatus and method for recognizing traffic signs
CN110222572B (en) Tracking method, tracking device, electronic equipment and storage medium
CN110717497B (en) Image similarity matching method, device and computer readable storage medium
CN111414888A (en) Low-resolution face recognition method, system, device and storage medium
CN112085701A (en) Face ambiguity detection method and device, terminal equipment and storage medium
CN113490947A (en) Detection model training method and device, detection model using method and storage medium
CN111191582A (en) Three-dimensional target detection method, detection device, terminal device and computer-readable storage medium
CN112651380A (en) Face recognition method, face recognition device, terminal equipment and storage medium
CN111199558A (en) Image matching method based on deep learning
CN111104941B (en) Image direction correction method and device and electronic equipment
CN113111880A (en) Certificate image correction method and device, electronic equipment and storage medium
CN110232381B (en) License plate segmentation method, license plate segmentation device, computer equipment and computer readable storage medium
CN108960246B (en) Binarization processing device and method for image recognition
CN112488054B (en) Face recognition method, device, terminal equipment and storage medium
CN111353325A (en) Key point detection model training method and device
CN113191189A (en) Face living body detection method, terminal device and computer readable storage medium
El Ouariachi et al. RGB-D feature extraction method for hand gesture recognition based on a new fast and accurate multi-channel cartesian Jacobi moment invariants
CN112348008A (en) Certificate information identification method and device, terminal equipment and storage medium
CN113228105A (en) Image processing method and device and electronic equipment
WO2023011606A1 (en) Training method of live body detection network, method and apparatus of live body detectoin

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination