CN111368668A

CN111368668A - Three-dimensional hand recognition method and device, electronic equipment and storage medium

Info

Publication number: CN111368668A
Application number: CN202010117526.3A
Authority: CN
Inventors: 卢艺帆
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Beijing ByteDance Network Technology Co Ltd
Priority date: 2020-02-25
Filing date: 2020-02-25
Publication date: 2020-07-03
Anticipated expiration: 2040-02-25
Also published as: CN111368668B

Abstract

In the three-dimensional hand recognition method, the three-dimensional hand recognition device, the electronic device, and the storage medium provided in the embodiments of the present invention, the two-dimensional joint point coordinates of a hand are first obtained by recognition, and a conversion model then converts these coordinates from two dimensions to three dimensions, yielding the coordinates of each joint point of the hand in a three-dimensional coordinate system, from which a three-dimensional hand model is established. In this way, gestures or hand shapes in two-dimensional images can be presented as a three-dimensional hand model, so the method can be applied to a wide range of scenes controlled on the basis of a three-dimensional hand model and effectively improves control flexibility.

Description

Three-dimensional hand recognition method and device, electronic equipment and storage medium

Technical Field

The embodiment of the disclosure relates to the field of image processing, and in particular relates to a three-dimensional hand recognition method and device, an electronic device and a storage medium.

Background

The hand recognition of gestures or hand shapes in images is widely applied to various scene interactions. With the development of machine learning algorithms, it becomes possible to automatically recognize hands appearing in images using neural network models.

In the prior art, an image acquisition device captures an image of a hand shape or gesture to obtain an image to be recognized, and the hand shape or gesture in that image is recognized based on a neural network model. Because the image to be recognized is a two-dimensional image, the recognition result obtained is only a projection of the hand shape or gesture onto a two-dimensional plane.

However, with the development of intelligent technology, three-dimensional gestures or hand shapes can also be used to interact with devices. A method for recognizing three-dimensional gestures or hand shapes is therefore needed to meet this application requirement.

Disclosure of Invention

In view of the above problems, the present disclosure provides a three-dimensional hand recognition method, apparatus, electronic device, and storage medium.

In a first aspect, an embodiment of the present disclosure provides a three-dimensional hand recognition method, including:

obtaining a hand image to be recognized;

processing the hand image to be recognized by using a hand two-dimensional joint point recognition model to obtain the coordinates of each joint point of the hand in a two-dimensional image coordinate system;

converting the coordinates of each joint point of the hand under a two-dimensional image coordinate system by using a hand three-dimensional joint point conversion model to obtain the coordinates of each joint point of the hand under the three-dimensional coordinate system;

and generating and outputting a three-dimensional hand model according to the coordinates of each joint point of the hand in a three-dimensional coordinate system.
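The four steps of the first aspect can be read as a simple pipeline. The following Python sketch only illustrates the data flow; the function name `recognize_hand_3d` and the three callables passed to it are hypothetical placeholders, not names used by the disclosure, and the toy stand-ins at the bottom do no real recognition.

```python
from typing import Callable, List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def recognize_hand_3d(
    image,
    detect_2d: Callable[[object], List[Point2D]],
    lift_to_3d: Callable[[List[Point2D]], List[Point3D]],
    build_model: Callable[[List[Point3D]], dict],
) -> dict:
    """Run the steps in order; the three callables are placeholders."""
    joints_2d = detect_2d(image)        # 2D joint point recognition model
    joints_3d = lift_to_3d(joints_2d)   # 2D-to-3D joint point conversion model
    return build_model(joints_3d)       # 3D hand model generation


# Toy stand-ins (assumptions), used only to show how data moves through the steps.
def fake_detect(image):
    return [(float(i), float(i)) for i in range(21)]


def fake_lift(joints_2d):
    return [(x, y, 0.0) for x, y in joints_2d]


def fake_build(joints_3d):
    return {"joints": joints_3d}


model = recognize_hand_3d(None, fake_detect, fake_lift, fake_build)
```

Passing the models in as callables keeps the sketch independent of any particular network implementation.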

In a second aspect, an embodiment of the present disclosure provides a three-dimensional hand recognition device, including:

the acquisition module is used for acquiring a hand image to be recognized;

the recognition module is used for processing the hand image to be recognized by utilizing a hand two-dimensional joint point recognition model to obtain the coordinates of each joint point of the hand in a two-dimensional image coordinate system;

the conversion module is used for converting the coordinates of each joint point of the hand under the two-dimensional image coordinate system by using the hand three-dimensional joint point conversion model to obtain the coordinates of each joint point of the hand under the three-dimensional coordinate system;

and the modeling module is used for generating and outputting a three-dimensional hand model according to the coordinates of each joint point of the hand in a three-dimensional coordinate system.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor and memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the three-dimensional hand recognition method described in the first aspect and the various possible designs of the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the three-dimensional hand recognition method described in the first aspect and the various possible designs of the first aspect is implemented.

Drawings

In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present disclosure, and for those skilled in the art, other drawings can be obtained according to the drawings without inventive exercise.

FIG. 1 is a schematic diagram of a network architecture upon which the present disclosure is based;

fig. 2 is a schematic flow chart of a three-dimensional hand recognition method according to an embodiment of the present disclosure;

fig. 3 is an interface schematic diagram of a three-dimensional hand recognition method according to an embodiment of the present disclosure;

fig. 4 is a schematic flow chart of another three-dimensional hand recognition method provided by the embodiment of the present disclosure;

fig. 5 is a block diagram of a three-dimensional hand recognition device according to an embodiment of the present disclosure;

fig. 6 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.

In view of the above problems, the present disclosure provides a three-dimensional hand recognition method, apparatus, electronic device, and storage medium, in which two-dimensional joint coordinates of a hand are obtained by recognition, and then the joint coordinates are converted from two-dimensional coordinates to three-dimensional coordinates by using a conversion model, so as to obtain joint coordinates of the hand in a three-dimensional coordinate system, and a three-dimensional hand model is established based on the obtained coordinates.

Referring to fig. 1, fig. 1 is a schematic diagram of a network architecture on which the present disclosure is based. The network architecture shown in fig. 1 may specifically include a three-dimensional hand recognition device 2 and a terminal 1.

The terminal 1 may be a hardware device that can be used to collect images, such as a user's mobile phone, desktop computer, smart home device, or tablet computer. The three-dimensional hand recognition device 2 is hardware or software that can interact with each terminal 1 through a network and can be used to execute the three-dimensional hand recognition method in each of the following examples: it performs hand recognition on the images obtained from each terminal 1, obtains a recognition result that includes a three-dimensional hand model, and outputs the recognition result to each terminal 1.

In the network architecture shown in fig. 1, when the three-dimensional hand recognition device 2 is hardware, it may include a cloud server with a computing function; when the three-dimensional hand recognition device 2 is software, it can be installed in an electronic device with a computing function, where the electronic device includes, but is not limited to, a laptop computer, a desktop computer, and the like.

That is, the three-dimensional hand recognition method of the present disclosure may be implemented on the architecture shown in fig. 1 and is applicable to various application scenarios, including but not limited to: modeling and simulation of human hands, gesture-based intelligent control, and the like.

In a human hand modeling and simulation scene, the terminal 1 may be a physical device such as a desktop computer and a tablet computer, in which a hand image captured by a camera is stored, and after the three-dimensional hand recognition device obtains the hand image and completes processing, the obtained three-dimensional hand model is returned to the terminal 1 for display on an interface, or after further image special effect processing, the three-dimensional hand model is displayed, so as to complete human hand modeling and simulation.

In the gesture-based intelligent control scene, in order to acquire a three-dimensional gesture or hand shape of a user, a three-dimensional hand recognition device is required to process acquired image data to acquire a three-dimensional hand model, and then the control device further analyzes the gesture or hand shape presented by the hand based on the three-dimensional hand model to acquire a control instruction corresponding to the gesture or hand shape, so as to control the controlled device based on the control instruction.

In a first aspect, referring to fig. 2, fig. 2 is a schematic flow chart of a three-dimensional hand recognition method according to an embodiment of the present disclosure. The three-dimensional hand recognition method provided by the embodiment of the disclosure comprises the following steps:

Step 101, obtaining a hand image to be recognized.

It should be noted that the execution subject of the processing method provided by this example is the aforementioned three-dimensional hand recognition device, which may obtain hand images by interacting with the terminal while the terminal performs its own tasks, or may obtain hand images that were stored in the terminal in advance for processing. These images are preprocessed into hand images that the three-dimensional hand recognition device can process. The preprocessing includes, but is not limited to, hand recognition, image segmentation, denoising, and conversion of the image into matrix form.

Step 102, processing the hand image to be recognized by using a hand two-dimensional joint point recognition model to obtain the coordinates of each joint point of the hand in a two-dimensional image coordinate system.

The embodiment provided by the disclosure can process the hand image to be recognized by adopting a two-dimensional joint point recognition model.

Specifically, the hand two-dimensional joint point recognition model is a neural network model for recognizing and outputting joint types and corresponding positions of hand joints in an image.

Further, the hand two-dimensional joint point recognition model may specifically use a pixel detection technique, that is, it determines the joint type and position by deciding which joint each pixel in the image belongs to. For example, if some pixels belong to the first knuckle of the index finger, the positions of those pixels give the joint position of the first knuckle of the index finger, and the joint point coordinates of that knuckle can be determined directly from this position.

The joint point coordinates output by the hand two-dimensional joint point recognition model are based on an image coordinate system, that is, the joint point coordinates are two-dimensional and are represented by (x, y), wherein x is an abscissa of the joint point in the two-dimensional image coordinate system, and y is an ordinate of the joint point in the two-dimensional image coordinate system.
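As a rough illustration of how per-pixel joint attribution could yield (x, y) joint coordinates, the sketch below takes a label map in which each pixel holds the identifier of the joint it belongs to and reports the centroid of each joint's pixels. The label-map input format, the `BACKGROUND` value, and the use of a centroid are all assumptions made for this example; the disclosure does not specify how pixel positions are reduced to a single coordinate.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BACKGROUND = -1  # hypothetical label for pixels that belong to no joint


def joints_from_label_map(label_map: List[List[int]]) -> Dict[int, Tuple[float, float]]:
    """Return {joint_id: (x, y)} as the centroid of the pixels assigned to each joint."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # joint_id -> [sum_x, sum_y, count]
    for y, row in enumerate(label_map):
        for x, joint_id in enumerate(row):
            if joint_id != BACKGROUND:
                acc = sums[joint_id]
                acc[0] += x
                acc[1] += y
                acc[2] += 1
    return {j: (sx / n, sy / n) for j, (sx, sy, n) in sums.items()}


# A tiny 3x4 label map: joint 0 covers a 2x2 patch, joint 1 covers one pixel.
label_map = [
    [-1, 0, 0, -1],
    [-1, 0, 0, -1],
    [-1, -1, -1, 1],
]
coords = joints_from_label_map(label_map)
```

Here x indexes columns and y indexes rows, matching the abscissa and ordinate of the two-dimensional image coordinate system.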

Step 103, converting the coordinates of each joint point of the hand in the two-dimensional image coordinate system by using a hand three-dimensional joint point conversion model to obtain the coordinates of each joint point of the hand in the three-dimensional coordinate system.

Specifically, the hand three-dimensional joint point conversion model may be implemented based on a machine learning algorithm, such as a CNN model, VGGNet, and the like. In the embodiment of the present disclosure, the coordinates of each joint point of the hand in the two-dimensional image coordinate system must be converted into coordinates in the three-dimensional coordinate system, so the machine learning algorithm needs a sufficiently deep network to handle this conversion. A residual network model is suited to such deeper processing: because its residual blocks use skip connections, it alleviates the vanishing-gradient problem that arises when the depth of a deep neural network is increased.

Based on this, the hand three-dimensional joint point conversion model of the embodiment of the present disclosure may specifically adopt a residual network model. That is, the coordinates of each joint point of the hand in the two-dimensional image coordinate system are input to the residual network model, which performs residual processing on them to obtain the coordinates of each joint point of the hand in the three-dimensional coordinate system.

Specifically, fig. 3 is a schematic structural diagram of the three-dimensional joint point conversion model of a three-dimensional hand recognition method according to an embodiment of the present disclosure. As shown in fig. 3, the residual network model is composed of a first mapping layer 100, a first fully-connected layer 200, a second fully-connected layer 300, and a second mapping layer 400; each fully-connected layer comprises two fully-connected sub-modules connected in series.

When the three-dimensional joint point conversion model is used to convert the two-dimensional joint point coordinates, the coordinates of each joint point of the hand in the two-dimensional image coordinate system are first input to the first mapping layer 100, so that the first mapping layer 100 maps them to a first input vector. Specifically, as shown in fig. 3, if one hand has 21 joint points and each joint point has two-dimensional coordinates, there are 42 coordinate values in total, which form a 42-dimensional vector; the first mapping layer 100 maps this vector to 1024 dimensions, yielding a 1024-dimensional first input vector.
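The dimension bookkeeping of the first mapping layer can be sketched as follows. The disclosure states only that the 42-dimensional vector is mapped to 1024 dimensions; treating the mapping as a single linear layer (W x + b), the interleaved [x0, y0, x1, y1, ...] ordering, and the random untrained weights are assumptions made for this illustration, and all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def flatten_joints(joints_2d: Sequence[Tuple[float, float]]) -> List[float]:
    """21 (x, y) pairs -> one 42-dimensional vector [x0, y0, x1, y1, ...]."""
    return [value for point in joints_2d for value in point]


def make_linear(in_dim: int, out_dim: int, rng: random.Random):
    """Randomly initialised weights and zero biases (untrained placeholders)."""
    weights = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(x: Sequence[float], weights, bias) -> List[float]:
    """Compute W x + b with plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


rng = random.Random(0)
joints_2d = [(rng.random(), rng.random()) for _ in range(21)]  # synthetic input
vector_42 = flatten_joints(joints_2d)
w_map, b_map = make_linear(42, 1024, rng)
first_input_vector = linear(vector_42, w_map, b_map)  # 1024-dimensional
```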

The first input vector is then input into the first fully-connected layer 200 of the residual network model, so that the two fully-connected sub-modules in the first fully-connected layer 200 process it in sequence to obtain a first processing vector. Specifically, as shown in fig. 3, the first fully-connected layer 200 includes two fully-connected sub-modules connected in series, which process the 1024-dimensional first input vector and output a 1024-dimensional first processing vector.

The first input vector and the first processing vector are then superposed to obtain the first output vector of the first fully-connected layer. Specifically, as shown in fig. 3, the 1024-dimensional first processing vector and the 1024-dimensional first input vector are superposed to obtain a 1024-dimensional first output vector, which is input to the second fully-connected layer 300.
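A minimal sketch of one such fully-connected layer with its skip connection is given below. It assumes that each fully-connected sub-module is a linear layer followed by a ReLU activation and that superposition means element-wise addition; the disclosure does not name the activation function. The demo uses a width of 8 instead of 1024 purely so that the pure-Python example runs quickly, and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def make_linear(in_dim: int, out_dim: int, rng: random.Random):
    """Randomly initialised weights and zero biases (untrained placeholders)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def fc_submodule(x: Sequence[float], params) -> List[float]:
    """One fully-connected sub-module: linear layer followed by ReLU (assumed)."""
    weights, bias = params
    pre = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in pre]


def residual_block(x: Sequence[float], sub1, sub2) -> List[float]:
    """Two sub-modules in series, then add the block input back (skip connection)."""
    processed = fc_submodule(fc_submodule(x, sub1), sub2)  # processing vector
    return [a + b for a, b in zip(x, processed)]           # output vector


DIM = 8  # the disclosure uses 1024; a small width keeps the demo fast
rng = random.Random(0)
x = [rng.random() for _ in range(DIM)]
sub1 = make_linear(DIM, DIM, rng)
sub2 = make_linear(DIM, DIM, rng)
first_output_vector = residual_block(x, sub1, sub2)
```

Because the input is added back unchanged, a block whose sub-modules output zeros simply passes its input through, which is what lets gradients flow past the block during training.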

The first output vector is input, as a second input vector, into the second fully-connected layer of the residual network model, so that the two fully-connected sub-modules in the second fully-connected layer process the second input vector in sequence to obtain a second processing vector; the second input vector and the second processing vector are then superposed to obtain the second output vector of the second fully-connected layer.

Specifically, as shown in fig. 3, similarly to the foregoing process, the second fully-connected layer 300 includes two fully-connected sub-modules connected in series, which process the 1024-dimensional second input vector and output a 1024-dimensional second processing vector. The 1024-dimensional second processing vector and the 1024-dimensional second input vector are superposed to obtain a 1024-dimensional second output vector, which is input to the second mapping layer 400.

The second output vector is input to the second mapping layer 400, so that the second mapping layer 400 maps it to obtain the coordinates of each joint point of the hand in the three-dimensional coordinate system. Specifically, as shown in fig. 3, the second mapping layer 400 maps the 1024-dimensional second output vector to 63 dimensions, that is, 63 coordinate values for the 21 joint points, three per joint point. Each joint point's coordinates may be written as (x, y, z), where x is the abscissa of the joint point in the normal plane, y is the ordinate of the joint point in the normal plane, and z is the coordinate of the joint point perpendicular to the normal plane.
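Putting the layers together, the forward pass from 42 input values to 21 (x, y, z) triples might look like the sketch below. It assumes linear mapping layers, ReLU activations inside the fully-connected sub-modules, and element-wise addition for the superposition; none of these details is fixed by the disclosure. The weights are random and untrained, so the output values are meaningless and only the shapes are of interest. The hidden width is a parameter (1024 in the disclosure, 64 in the demo for speed), and all names are hypothetical.

```python
import random

NUM_JOINTS = 21


def make_linear(in_dim, out_dim, rng):
    """Randomly initialised weights and zero biases (untrained placeholders)."""
    weights = [[rng.gauss(0.0, 0.05) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(x, params):
    weights, bias = params
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_params(hidden, rng):
    return {
        "map_in": make_linear(2 * NUM_JOINTS, hidden, rng),    # first mapping layer
        "blocks": [[make_linear(hidden, hidden, rng) for _ in range(2)]
                   for _ in range(2)],                         # two FC layers x two sub-modules
        "map_out": make_linear(hidden, 3 * NUM_JOINTS, rng),   # second mapping layer
    }


def lift_2d_to_3d(joints_2d, params):
    """42 values -> hidden -> two residual blocks -> 63 values -> 21 (x, y, z) triples."""
    x = [v for point in joints_2d for v in point]
    h = linear(x, params["map_in"])
    for block in params["blocks"]:
        processed = h
        for sub in block:
            processed = relu(linear(processed, sub))
        h = [a + b for a, b in zip(h, processed)]  # skip connection
    out = linear(h, params["map_out"])
    return [tuple(out[i:i + 3]) for i in range(0, len(out), 3)]


rng = random.Random(0)
params = init_params(hidden=64, rng=rng)  # the disclosure uses hidden=1024
joints_2d = [(rng.random(), rng.random()) for _ in range(NUM_JOINTS)]
joints_3d = lift_2d_to_3d(joints_2d, params)
```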

Step 104, generating and outputting a three-dimensional hand model according to the coordinates of each joint point of the hand in the three-dimensional coordinate system.

Specifically, after the coordinates of each joint point of the hand in the three-dimensional coordinate system are obtained, a three-dimensional model of the hand is built from these coordinates; the specific modeling method may adopt the prior art and is not limited by the present disclosure. Fig. 4 is an interface schematic diagram of a three-dimensional hand recognition method according to an embodiment of the disclosure. As shown in fig. 4, the three-dimensional hand model shown in the right drawing can be obtained by processing the hand image shown in the left drawing with the two-dimensional joint point recognition model and the three-dimensional joint point conversion model.
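Since the disclosure leaves the modeling method open, the sketch below shows only one simple possibility: a stick-figure skeleton that connects the 21 three-dimensional joint points with bones. The joint ordering (index 0 for the wrist, then four consecutive joints per finger from thumb to little finger) is an assumption borrowed from common hand-keypoint conventions, not something stated in the source, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def hand_bones() -> List[Tuple[int, int]]:
    """Joint index pairs for an assumed layout: 0 = wrist, then 4 joints per finger."""
    bones = []
    for finger in range(5):
        base = 1 + 4 * finger
        bones.append((0, base))                                  # wrist to finger base
        bones.extend((j, j + 1) for j in range(base, base + 3))  # along the finger
    return bones


def build_hand_model(joints_3d: Sequence[Point3D]) -> Dict[str, list]:
    """Bundle joints, bone connectivity, and bone lengths into a simple skeleton model."""
    bones = hand_bones()
    lengths = [math.dist(joints_3d[a], joints_3d[b]) for a, b in bones]
    return {"joints": list(joints_3d), "bones": bones, "bone_lengths": lengths}


# Synthetic joints placed along the x axis, only to exercise the function.
joints_3d = [(float(i), 0.0, 0.0) for i in range(21)]
hand_model = build_hand_model(joints_3d)
```

A renderer or gesture analyzer could consume such a structure, for example by drawing each bone as a line segment or by comparing angles between adjacent bones.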

The three-dimensional hand model may be further image processed, such as hand effect processing, to render modeling more realistic.

In an optional embodiment, the three-dimensional hand recognition method further includes:

establishing a hand two-dimensional joint point identification model to be trained, and obtaining a training sample of the hand two-dimensional joint point identification model; wherein, the training sample of the hand two-dimensional joint point identification model comprises: the hand image sample and the coordinates of all joint points of the hand in the hand image sample under a two-dimensional image coordinate system; and training the hand two-dimensional joint point recognition model to be trained by using the training sample of the hand two-dimensional joint point recognition model to obtain the trained hand two-dimensional joint point recognition model.

establishing a hand three-dimensional joint point conversion model to be trained, and obtaining a training sample of the hand three-dimensional joint point conversion model; wherein, the training sample of the hand three-dimensional joint point conversion model comprises: coordinates of each joint point of the hand in a two-dimensional image coordinate system and coordinates of each corresponding joint point of the hand in a three-dimensional coordinate system; and training the hand three-dimensional joint point conversion model to be trained by using the training sample of the hand three-dimensional joint point conversion model to obtain the trained hand three-dimensional joint point conversion model.
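Training the conversion model on paired two-dimensional and three-dimensional joint coordinates can be sketched as ordinary supervised regression. The disclosure does not specify a loss function or optimizer, so the mean squared error and plain stochastic gradient descent used here are assumptions, and a single linear layer stands in for the residual network to keep the example short. The training pairs are synthetic (the z value is simply set to 0) and exist only to make the sketch runnable.

```python
import random


def predict(x, weights, bias):
    """Stand-in model: one linear layer mapping 42 inputs to 63 outputs."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def sgd_step(x, target, weights, bias, lr):
    """One gradient-descent update of the MSE loss on a single training pair."""
    pred = predict(x, weights, bias)
    scale = 2.0 / len(target)
    for i, (p, t) in enumerate(zip(pred, target)):
        grad = scale * (p - t)          # d(loss)/d(pred_i)
        row = weights[i]
        for j, v in enumerate(x):
            row[j] -= lr * grad * v
        bias[i] -= lr * grad


def mean_loss(samples, weights, bias):
    return sum(mse(predict(x, weights, bias), y) for x, y in samples) / len(samples)


# Synthetic training samples: (42 two-dimensional values, 63 three-dimensional values).
rng = random.Random(0)
samples = []
for _ in range(8):
    joints_2d = [(rng.random(), rng.random()) for _ in range(21)]
    x = [v for point in joints_2d for v in point]
    y = [v for (px, py) in joints_2d for v in (px, py, 0.0)]  # fabricated targets
    samples.append((x, y))

weights = [[0.0] * 42 for _ in range(63)]
bias = [0.0] * 63
loss_before = mean_loss(samples, weights, bias)
for _ in range(30):
    for x, y in samples:
        sgd_step(x, y, weights, bias, lr=0.5)
loss_after = mean_loss(samples, weights, bias)
```

In practice the targets would come from annotated or captured three-dimensional joint positions, and the model being updated would be the residual network rather than a single linear layer.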

According to the three-dimensional hand recognition method provided by this embodiment, the two-dimensional joint point coordinates of the hand are obtained by recognition, and the conversion model then converts the joint point coordinates from two dimensions to three dimensions to obtain the coordinates of each joint point of the hand in the three-dimensional coordinate system, from which the three-dimensional hand model is established. In this way, gestures or hand shapes in two-dimensional images can be presented as a three-dimensional hand model.

Fig. 5 is a block diagram of a three-dimensional hand recognition device according to an embodiment of the present disclosure, which corresponds to the three-dimensional hand recognition method according to the foregoing embodiment. For ease of illustration, only portions that are relevant to embodiments of the present disclosure are shown. Referring to fig. 5, the three-dimensional hand recognition device includes: an acquisition module 10, a recognition module 20, a conversion module 30 and a modeling module 40.

The acquisition module 10 is configured to acquire a hand image to be recognized;

the recognition module 20 is configured to process the hand image to be recognized by using a hand two-dimensional joint point recognition model, and obtain coordinates of each joint point of the hand in a two-dimensional image coordinate system;

the conversion module 30 is configured to convert coordinates of each joint point of the hand in the two-dimensional image coordinate system by using the hand three-dimensional joint point conversion model to obtain coordinates of each joint point of the hand in the three-dimensional coordinate system;

and the modeling module 40 is used for generating and outputting a three-dimensional hand model according to the coordinates of each joint point of the hand in the three-dimensional coordinate system.

In an optional embodiment, the hand three-dimensional joint point conversion model is a residual network model;

the conversion module 30 is specifically configured to: input the coordinates of each joint point of the hand in the two-dimensional image coordinate system into the residual network model, so that the residual network model performs residual processing on these coordinates to obtain the coordinates of each joint point of the hand in the three-dimensional coordinate system.

In an optional embodiment, the residual network model is composed of a first mapping layer, a first fully-connected layer, a second fully-connected layer, and a second mapping layer; each full-connection layer comprises two full-connection sub-modules connected in series;

correspondingly, the conversion module 30 is specifically configured to: input the coordinates of each joint point of the hand in the two-dimensional image coordinate system to the first mapping layer, so that the first mapping layer maps the coordinates into a first input vector; input the first input vector into the first fully-connected layer of the residual network model, so that the two fully-connected sub-modules in the first fully-connected layer process the first input vector in sequence to obtain a first processing vector; superpose the first input vector and the first processing vector to obtain the first output vector of the first fully-connected layer; input the first output vector, as a second input vector, into the second fully-connected layer of the residual network model, so that the two fully-connected sub-modules in the second fully-connected layer process the second input vector in sequence to obtain a second processing vector; superpose the second input vector and the second processing vector to obtain the second output vector of the second fully-connected layer; and input the second output vector to the second mapping layer, so that the second mapping layer maps the second output vector to obtain the coordinates of each joint point of the hand in the three-dimensional coordinate system.

In an optional embodiment, the three-dimensional hand recognition device further includes: a first training module;

the first training module is used for establishing a hand two-dimensional joint point identification model to be trained and obtaining a training sample of the hand two-dimensional joint point identification model; wherein, the training sample of the hand two-dimensional joint point identification model comprises: the hand image sample and the coordinates of all joint points of the hand in the hand image sample under a two-dimensional image coordinate system;

and training the hand two-dimensional joint point recognition model to be trained by using the training sample of the hand two-dimensional joint point recognition model to obtain the trained hand two-dimensional joint point recognition model.

In an optional embodiment, the three-dimensional hand recognition device further includes: a second training module;

the second training module is used for establishing a hand three-dimensional joint point conversion model to be trained and obtaining a training sample of the hand three-dimensional joint point conversion model; wherein, the training sample of the hand three-dimensional joint point conversion model comprises: coordinates of each joint point of the hand in a two-dimensional image coordinate system and coordinates of each corresponding joint point of the hand in a three-dimensional coordinate system;

and training the hand three-dimensional joint point conversion model to be trained by using the training sample of the hand three-dimensional joint point conversion model to obtain the trained hand three-dimensional joint point conversion model.

In the three-dimensional hand recognition device provided in this embodiment, the two-dimensional joint point coordinates of the hand are obtained by recognition, and the conversion model then converts the joint point coordinates from two dimensions to three dimensions to obtain the coordinates of each joint point of the hand in the three-dimensional coordinate system, from which the three-dimensional hand model is established. In this way, gestures or hand shapes in two-dimensional images can be presented as a three-dimensional hand model, so the method can be applied to a wide range of scenes controlled on the basis of a three-dimensional hand model and effectively improves control flexibility.

The electronic device provided in this embodiment may be used to implement the technical solutions of the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again.

Referring to fig. 6, a schematic structural diagram of an electronic device 900 suitable for implementing an embodiment of the present disclosure is shown; the electronic device 900 may be a terminal device or a server. The terminal device may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a Personal Digital Assistant (PDA), a tablet computer (PAD), a Portable Multimedia Player (PMP), and a car terminal (e.g., a car navigation terminal), as well as fixed terminals such as a digital TV and a desktop computer. The electronic device shown in fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

As shown in fig. 6, the electronic device 900 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage device 908 into a Random Access Memory (RAM) 903. The RAM 903 also stores various programs and data necessary for the operation of the electronic device 900. The processing device 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

Generally, the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 907 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 908 including, for example, magnetic tape, hard disk, etc.; and a communication device 909. The communication device 909 may allow the electronic apparatus 900 to perform wireless or wired communication with other apparatuses to exchange data. While fig. 6 illustrates an electronic device 900 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 901.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above embodiments.

Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a unit does not constitute a limitation of the unit itself; for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The following are some embodiments of the disclosure.

In a first aspect, according to one or more embodiments of the present disclosure, a three-dimensional hand recognition method includes:

obtaining a hand image to be identified;

In an optional embodiment provided by the present disclosure, the hand three-dimensional joint point transformation model is a residual network model;

the method for converting the coordinates of each joint point of the hand under the two-dimensional image coordinate system by using the hand three-dimensional joint point conversion model to obtain the coordinates of each joint point of the hand under the three-dimensional coordinate system comprises the following steps:

inputting the coordinates of each joint point of the hand in the two-dimensional image coordinate system into the residual network model, so that the residual network model performs residual processing on those coordinates to obtain the coordinates of each joint point of the hand in the three-dimensional coordinate system.

In an optional embodiment provided by the present disclosure, the residual network model is composed of a first mapping layer, a first fully-connected layer, a second fully-connected layer, and a second mapping layer; each full-connection layer comprises two full-connection sub-modules connected in series;

correspondingly, inputting the coordinates of each joint point of the hand in the two-dimensional image coordinate system into the residual network model, so that the residual network model performs residual processing on those coordinates to obtain the coordinates of each joint point of the hand in the three-dimensional coordinate system, includes:

inputting coordinates of all joint points of a hand under a two-dimensional image coordinate system to a first mapping layer so that the first mapping layer maps the coordinates into a first input vector;

inputting the first input vector into the first fully-connected layer of the residual network model, so that the two fully-connected sub-modules in the first fully-connected layer sequentially process the first input vector to obtain a first processing vector;

superposing the first input vector and the first processing vector to obtain a first output vector output by the first full-connection layer;

inputting the first output vector, serving as a second input vector, into the second fully-connected layer of the residual network model, so that the two fully-connected sub-modules in the second fully-connected layer sequentially process the second input vector to obtain a second processing vector;

superposing the second input vector and the second processing vector to obtain a second output vector output by the second fully-connected layer;

and inputting the second output vector to a second mapping layer so that the second mapping layer maps the second output vector to obtain the coordinates of each joint point of the hand in the three-dimensional coordinate system.
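The data flow described in the steps above can be illustrated with a minimal sketch in plain Python. It is not the implementation of the disclosure: the number of joint points (`NUM_JOINTS = 21`), the hidden width (`HIDDEN = 32`), the ReLU activation, the random untrained weights, and all function names are assumptions made only for illustration. The sketch shows a first mapping layer, two residual blocks that each contain two fully-connected sub-modules in series and superpose their input onto their processing vector, and a second mapping layer that outputs three coordinates per joint point.

```python
import random

NUM_JOINTS = 21  # assumption: the source does not fix the number of joint points
HIDDEN = 32      # assumption: hidden width chosen only for illustration


def make_linear(n_in, n_out, rng):
    """Create (weights, bias) for one dense layer with small random weights."""
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Apply a dense layer: y = W x + b."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    # Assumption: the source does not name the activation function.
    return [max(0.0, v) for v in x]


def residual_block(block, x):
    """Two fully-connected sub-modules in series, then superpose the input."""
    sub1, sub2 = block
    processed = relu(linear(sub2, relu(linear(sub1, x))))  # processing vector
    return [a + b for a, b in zip(x, processed)]           # input + processed


def make_model(seed=0):
    """Build an untrained model with the four layers named in the text."""
    rng = random.Random(seed)
    return {
        "map_in": make_linear(NUM_JOINTS * 2, HIDDEN, rng),
        "block1": (make_linear(HIDDEN, HIDDEN, rng),
                   make_linear(HIDDEN, HIDDEN, rng)),
        "block2": (make_linear(HIDDEN, HIDDEN, rng),
                   make_linear(HIDDEN, HIDDEN, rng)),
        "map_out": make_linear(HIDDEN, NUM_JOINTS * 3, rng),
    }


def lift_2d_to_3d(model, joints_2d):
    """Map a list of (x, y) joint coordinates to a list of (x, y, z) tuples."""
    flat = [c for joint in joints_2d for c in joint]
    first_input = linear(model["map_in"], flat)                  # first mapping layer
    first_output = residual_block(model["block1"], first_input)  # first block
    second_output = residual_block(model["block2"], first_output)  # second block
    out = linear(model["map_out"], second_output)                # second mapping layer
    return [tuple(out[i:i + 3]) for i in range(0, len(out), 3)]


if __name__ == "__main__":
    model = make_model(seed=0)
    joints_2d = [(0.1 * i, 0.05 * i) for i in range(NUM_JOINTS)]
    joints_3d = lift_2d_to_3d(model, joints_2d)
    print(len(joints_3d), "joint points, each with", len(joints_3d[0]), "coordinates")
```

Because the weights here are random, the output coordinates carry no meaning; in practice the model would first be trained on samples that pair two-dimensional joint coordinates with three-dimensional ones. The superposition in `residual_block` lets each block learn only a correction to its input, which is the usual motivation for residual connections.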

In an optional embodiment provided by the present disclosure, the three-dimensional hand recognition method further includes:

establishing a hand two-dimensional joint point identification model to be trained, and obtaining a training sample of the hand two-dimensional joint point identification model; wherein, the training sample of the hand two-dimensional joint point identification model comprises: the hand image sample and the coordinates of all joint points of the hand in the hand image sample under a two-dimensional image coordinate system;

In a second aspect, according to one or more embodiments of the present disclosure, a three-dimensional hand recognition device comprises:

the acquisition module is used for acquiring a hand image to be identified;

the conversion module is specifically configured to: input the coordinates of each joint point of the hand in the two-dimensional image coordinate system into a residual network model, so that the residual network model performs residual processing on those coordinates to obtain the coordinates of each joint point of the hand in the three-dimensional coordinate system.

correspondingly, the conversion module is specifically configured to: input the coordinates of all joint points of the hand in the two-dimensional image coordinate system to the first mapping layer, so that the first mapping layer maps the coordinates into a first input vector; input the first input vector into the first fully-connected layer of the residual network model, so that the two fully-connected sub-modules in the first fully-connected layer sequentially process the first input vector to obtain a first processing vector; superpose the first input vector and the first processing vector to obtain a first output vector output by the first fully-connected layer; input the first output vector, serving as a second input vector, into the second fully-connected layer of the residual network model, so that the two fully-connected sub-modules in the second fully-connected layer sequentially process the second input vector to obtain a second processing vector; superpose the second input vector and the second processing vector to obtain a second output vector output by the second fully-connected layer; and input the second output vector to the second mapping layer, so that the second mapping layer maps the second output vector to obtain the coordinates of each joint point of the hand in the three-dimensional coordinate system.

In an optional embodiment provided by the present disclosure, the three-dimensional hand recognition device further includes: a first training module;

In an optional embodiment provided by the present disclosure, the three-dimensional hand recognition device further comprises:

In a third aspect, in accordance with one or more embodiments of the present disclosure, an electronic device comprises: at least one processor and memory;

the memory stores computer-executable instructions;

the at least one processor executing the memory-stored computer-executable instructions causes the at least one processor to perform the three-dimensional hand recognition method of any one of the preceding claims.

In a fourth aspect, according to one or more embodiments of the present disclosure, a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement a three-dimensional hand recognition method as in any one of the preceding claims.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. One example is a technical solution formed by replacing the above features with features having similar functions, including but not limited to those disclosed in this disclosure.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. A three-dimensional hand recognition method, comprising:

obtaining a hand image to be identified;

2. The three-dimensional hand recognition method of claim 1, wherein the hand three-dimensional joint point transformation model is a residual network model;

3. The three-dimensional hand recognition method of claim 2, wherein the residual network model is comprised of a first mapping layer, a first fully-connected layer, a second fully-connected layer, and a second mapping layer; each full-connection layer comprises two full-connection sub-modules connected in series;

superposing the second input vector and the second processing vector to obtain a second output vector output by the second full-connection layer;

4. The three-dimensional hand recognition method according to any one of claims 1 to 3, further comprising:

5. The three-dimensional hand recognition method according to any one of claims 1 to 3, further comprising:

6. A three-dimensional hand recognition device, comprising:

the acquisition module is used for acquiring a hand image to be identified;

7. The three-dimensional hand recognition device of claim 6, wherein the hand three-dimensional joint point transformation model is a residual network model;

8. The three-dimensional hand recognition device of claim 7, wherein the residual network model is comprised of a first mapping layer, a first fully-connected layer, a second fully-connected layer, and a second mapping layer; each full-connection layer comprises two full-connection sub-modules connected in series;

correspondingly, the conversion module is specifically configured to: input the coordinates of all joint points of the hand in the two-dimensional image coordinate system to the first mapping layer, so that the first mapping layer maps the coordinates into a first input vector; input the first input vector into the first fully-connected layer of the residual network model, so that the two fully-connected sub-modules in the first fully-connected layer sequentially process the first input vector to obtain a first processing vector; superpose the first input vector and the first processing vector to obtain a first output vector output by the first fully-connected layer; input the first output vector, serving as a second input vector, into the second fully-connected layer of the residual network model, so that the two fully-connected sub-modules in the second fully-connected layer sequentially process the second input vector to obtain a second processing vector; superpose the second input vector and the second processing vector to obtain a second output vector output by the second fully-connected layer; and input the second output vector to the second mapping layer, so that the second mapping layer maps the second output vector to obtain the coordinates of each joint point of the hand in the three-dimensional coordinate system.

9. The three-dimensional hand recognition device of any one of claims 6-8, further comprising: a first training module;

10. The three-dimensional hand recognition device of any one of claims 6-8, further comprising:

11. An electronic device, comprising: at least one processor and memory;

the memory stores computer-executable instructions;

the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the three-dimensional hand recognition method of any of claims 1-5.

12. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the three-dimensional hand recognition method of any one of claims 1-5.