CN112883757B - Method for generating a tracking pose result - Google Patents

Method for generating a tracking pose result

Info

Publication number: CN112883757B
Authority: CN (China)
Prior art keywords: point, hand, dimensional, point set, determining
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN201911197449.0A
Other languages: Chinese (zh)
Other versions: CN112883757A
Inventors: 梁晓辉 (Liang Xiaohui), 徐方宇 (Xu Fangyu)
Original and current assignee: Beihang University
Application filed by Beihang University; priority to CN201911197449.0A
Publication of CN112883757A; application granted; publication of CN112883757B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/11: Region-based segmentation
    • G06T7/20: Analysis of motion
    • G06T7/215: Motion-based segmentation
    • G06T7/70: Determining position or orientation of objects or cameras
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75: Determining position or orientation of objects or cameras using feature-based methods involving models
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107: Static hand or arm
    • G06V40/113: Recognition of static hand signs
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06V40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

Embodiments of the present disclosure disclose methods, apparatuses, electronic devices, and computer-readable media for generating tracking pose results. One embodiment of the method comprises: segmenting the hand region in a depth image according to the color features of a wristband marker in the image and the morphological features of the arm above the hand to obtain a point cloud set; determining a first corresponding point set and a second corresponding point set based on the point cloud set and a three-dimensional hand model; constructing constraint equations according to the first and second corresponding point sets; and generating a tracking pose result according to the constraint equations. This embodiment generates a stable hand tracking pose result while keeping the time delay of the result relatively short.

Description

Method for generating a tracking pose result
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a method and an apparatus for generating a tracking pose result, an electronic device, and a computer-readable medium.
Background
With the continuous progress of information technology, gesture interaction has become increasingly important. Gesture interaction technology mostly relies on hand tracking. However, the hand tracking results of existing techniques are prone to jitter or suffer from long time delays.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose methods, apparatuses, electronic devices and computer readable media for generating tracking pose results to solve the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a method for generating tracking pose results, the method comprising: segmenting the hand region in a depth image according to the color features of the wristband marker in the image and the morphological features of the arm above the hand to obtain a point cloud set; determining a first corresponding point set and a second corresponding point set based on the point cloud set and a three-dimensional hand model; constructing constraint equations according to the first corresponding point set and the second corresponding point set; and generating a tracking pose result according to the constraint equations.
In a second aspect, some embodiments of the present disclosure provide an apparatus for generating tracking pose results, the apparatus comprising: a segmentation unit configured to segment the hand region in a depth image according to the color features of the wristband marker in the image and the morphological features of the arm above the hand to obtain a point cloud set; a determining unit configured to determine a first corresponding point set and a second corresponding point set based on the point cloud set and a three-dimensional hand model, wherein the first corresponding point set comprises corresponding points between each point of the point cloud set and the surface of the three-dimensional hand model, and the second corresponding point set comprises corresponding points between the three-dimensional hand model and the two-dimensional contour of the depth image; a construction unit configured to construct constraint equations according to the first corresponding point set and the second corresponding point set, wherein the constraint equations comprise a space constraint equation, a pose constraint equation, and a timing constraint equation of hand tracking; and a generating unit configured to generate a tracking pose result according to the constraint equations.
In a third aspect, an embodiment of the present application provides an electronic device, the electronic device comprising: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
The above embodiments of the present disclosure have the following beneficial effect: a point cloud set is obtained by segmenting the hand region in a depth image, corresponding points are determined from the point cloud set and a three-dimensional hand model, constraint equations are constructed from those corresponding points, and a tracking pose result is generated from the constraint equations. A stable hand tracking pose result is thereby generated while keeping the time delay of the result relatively short.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is an architectural diagram of an exemplary system in which some embodiments of the present disclosure may be applied;
FIG. 2 is a flow diagram of some embodiments of a method of generating tracking pose results according to the present disclosure;
FIG. 3 is a schematic block diagram of some embodiments of an apparatus for generating tracked pose results according to the present disclosure;
FIG. 4 is a schematic block diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting; those skilled in the art will understand them to mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the method for generating a tracking pose result, or of the apparatus for generating a tracking pose result, of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to provide images, etc. Various image capturing applications, such as a camera application, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When hardware, they may be various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, desktop computers, and the like. When software, they may be installed in the electronic devices listed above and implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server providing various services, for example a background server supporting the generation of tracking pose results on the terminal devices 101, 102, 103. The background server may analyze and otherwise process received data such as an image request, and feed the processing result (e.g., a tracking pose result) back to the terminal device.
It should be noted that the method for generating a tracking pose result provided by the embodiments of the present disclosure may be executed by the terminal devices 101, 102, 103 or by the server 105. Accordingly, the apparatus for generating a tracking pose result may be provided in the terminal devices 101, 102, 103 or in the server 105. No specific limitation is made here.
The server and the client may be hardware or software. When hardware, the server may be implemented as a distributed cluster of multiple servers or as a single server. When software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of some embodiments of a method of generating a tracking pose result according to the present disclosure is shown. The method comprises the following steps:
Step 201, segmenting the hand region in the depth image according to the color features of the wristband marker in the image and the morphological features of the arm above the hand to obtain a point cloud set.
In some embodiments, an executing body of the method (e.g., the server shown in fig. 1) may segment the hand region in the depth image based on the color features of the wristband marker in the image and the morphological features of the arm above the hand, obtaining a point cloud set.
Here, the wristband marker is generally a solid-color marker, for example a yellow cloth band. The morphological features of the arm above the hand are generally those of the forearm, for example a contour map of the forearm. A depth image is an image in which the distance (depth) from the camera to each point in the scene is stored as the pixel value. A point cloud is the set of points obtained by acquiring the spatial coordinates of sampling points on the object's surface. Segmentation here refers to image segmentation, i.e., the technique and process of dividing an image into several regions with distinctive properties and extracting the object of interest. As an example, the segmentation may take the wristband marker as the boundary between the hand and the arm in the depth image.
In some optional implementations of some embodiments, the executing body may filter the image to obtain the set of pixels corresponding to the wristband marker region; determine the average of the depth values of those pixels in the depth image; take pixels in the depth image whose depth differs from the average by less than a preset threshold as forearm and hand pixels, and obtain the corresponding three-dimensional point cloud point set; determine a segmentation region based on the pixel set and the three-dimensional point cloud point set; and take the points of the point cloud point set and the pixel set that fall inside the segmentation region as the hand segmentation result.
Here, filtering generally means filtering the image in a color space. The HSV color space, proposed in 1978 based on the intuitive properties of color and also called the hexcone model, describes colors by hue, saturation, and value. The segmentation region is generally determined from the pixel set and the three-dimensional point cloud point set as follows: principal component analysis is performed on the point cloud point set, and the principal direction is taken as the orientation vector of the forearm; the centroid of the point cloud point set is taken as the wrist center of gravity; the sphere center of the segmentation region is then determined by starting from the wrist center of gravity and offsetting along the forearm orientation vector.
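For concreteness, the following Python sketch (numpy and OpenCV) walks through this segmentation pipeline. It assumes an RGB image aligned with the depth map; the HSV range for the wristband, the camera intrinsics, the depth threshold, the sphere radius, and the sign-disambiguation heuristic are illustrative placeholders, not values from the patent.

```python
import cv2
import numpy as np

def back_project(xs, ys, zs, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    # Pinhole back-projection from pixels to camera coordinates
    # (the intrinsics here are placeholder values).
    return np.stack([(xs - cx) * zs / fx, (ys - cy) * zs / fy, zs], axis=1)

def segment_hand(rgb, depth, depth_thresh=150.0, sphere_radius=120.0):
    # 1. Filter in HSV space to find wristband-marker pixels
    #    (the range below suits a yellow band and is illustrative).
    hsv = cv2.cvtColor(rgb, cv2.COLOR_BGR2HSV)
    band_mask = cv2.inRange(hsv, (20, 100, 100), (35, 255, 255)) > 0

    # 2. Average depth over the wristband region.
    mean_depth = depth[band_mask].mean()

    # 3. Pixels whose depth is close to the average form the forearm + hand
    #    region; back-project them to a 3-D point set.
    region_mask = np.abs(depth - mean_depth) < depth_thresh
    ys, xs = np.nonzero(region_mask)
    points = back_project(xs, ys, depth[ys, xs])

    # 4. PCA (via SVD) on the centered points gives the forearm direction;
    #    the wristband points' centroid is the wrist center of gravity.
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    forearm_dir = vt[0] / np.linalg.norm(vt[0])
    bys, bxs = np.nonzero(band_mask)
    wrist_cog = back_project(bxs, bys, depth[bys, bxs]).mean(axis=0)

    # The PCA axis has an arbitrary sign; this heuristic flips it so it
    # points from the region centroid toward the wristband (roughly toward
    # the hand). A real system would disambiguate this more carefully.
    if np.dot(forearm_dir, wrist_cog - points.mean(axis=0)) < 0:
        forearm_dir = -forearm_dir

    # 5. Offset from the wrist center of gravity along the forearm direction
    #    to get the sphere center of the segmentation region; keep the points
    #    inside the sphere as the hand cloud.
    center = wrist_cog + sphere_radius * forearm_dir
    keep = np.linalg.norm(points - center, axis=1) < sphere_radius
    return points[keep]
```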
Step 202, determining a first corresponding point set and a second corresponding point set based on the point cloud set and the three-dimensional hand model.
In some embodiments, the executing body (e.g., the server shown in fig. 1) may determine a first corresponding point set and a second corresponding point set based on the point cloud set and the three-dimensional hand model, where the first corresponding point set comprises, for each point in the point cloud set, its corresponding point on the surface of the three-dimensional hand model, and the second corresponding point set comprises the corresponding points between the three-dimensional hand model and the two-dimensional contour of the depth image. Here, the three-dimensional hand model is generally a polygonal representation of the hand, typically displayed by a computer or other video device. As an example, the three-dimensional hand model may be constructed layer by layer from, for each hand part, a given set of key-point coordinates, a set of structure-sphere radii at the key points, the connection relations between the tangent planes of the structure spheres, the corresponding set of transformation matrices of each part, and the hand pose parameters; the hand pose and the model surface are determined accordingly.
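As an illustration, the following Python sketch shows one way such a sphere-based hand model could be organized. The class layout and field names are ours, and the tangent-plane links between structure spheres, which the patent's model also uses, are omitted:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
import numpy as np

@dataclass
class Sphere:
    center: np.ndarray   # 3-D center in the rest pose
    radius: float
    joint: int           # index of the joint whose transform drives it

class HandModel:
    """Sphere-approximated hand model (tangent-plane links omitted)."""
    def __init__(self, spheres: List[Sphere],
                 transforms: Dict[int, Callable[[np.ndarray], np.ndarray]]):
        self.spheres = spheres
        self.transforms = transforms   # joint -> pose -> 4x4 rigid transform

    def posed_spheres(self, pose: np.ndarray):
        # Apply each joint's pose-driven transform to its sphere center,
        # yielding the posed model surface as (center, radius) pairs.
        out = []
        for s in self.spheres:
            T = self.transforms[s.joint](pose)
            c = T @ np.append(s.center, 1.0)
            out.append((c[:3], s.radius))
        return out
```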
As an example, the first corresponding point set may be determined as follows: take a point from the point cloud set, compute its distance and projection point to each basic geometric body forming the three-dimensional hand model, and take the projection point on the surface of the nearest geometric body as a pair of spatial corresponding points; traverse the whole hand point cloud set and collect all such corresponding points as the first corresponding point set.
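The following Python sketch illustrates this traversal, assuming the model's basic geometric bodies are the posed spheres produced above; the brute-force nearest-surface search is a simplification:

```python
import numpy as np

def first_correspondences(cloud, posed_spheres):
    """cloud: (N, 3) hand point cloud; posed_spheres: list of (center, radius).
    For each cloud point, project onto every sphere and keep the projection
    on the nearest surface, giving one spatial corresponding point per point."""
    matches = np.empty_like(cloud)
    for i, p in enumerate(cloud):
        best_proj, best_dist = None, np.inf
        for c, r in posed_spheres:
            v = p - c
            d = np.linalg.norm(v)
            proj = c + (v / max(d, 1e-9)) * r    # projection onto the sphere
            surf_dist = abs(d - r)               # distance from p to surface
            if surf_dist < best_dist:
                best_proj, best_dist = proj, surf_dist
        matches[i] = best_proj
    return matches
```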
In some optional implementations of some embodiments, the executing body may orthogonally project the three-dimensional hand model to obtain the model contour; sample the model contour to obtain a contour point set; and, for each point in the contour point set, determine its closest corresponding point on the contour of the depth image as a second corresponding point, obtaining the second corresponding point set.
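A minimal Python sketch of this contour matching, assuming binary silhouette masks for the orthographically projected model and for the depth image are already available (rendering the model silhouette is outside the sketch):

```python
import cv2
import numpy as np

def second_correspondences(model_mask, depth_mask, n_samples=64):
    """model_mask: uint8 silhouette of the orthographically projected model;
    depth_mask: uint8 hand silhouette from the depth image (same image grid).
    Returns sampled model-contour points and their nearest depth-contour points."""
    mc, _ = cv2.findContours(model_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    dc, _ = cv2.findContours(depth_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    model_pts = mc[0].reshape(-1, 2).astype(np.float32)
    depth_pts = dc[0].reshape(-1, 2).astype(np.float32)

    # Sample the model contour evenly.
    idx = np.linspace(0, len(model_pts) - 1, n_samples).astype(int)
    samples = model_pts[idx]

    # For each sample, pick the closest point on the depth-image contour.
    d2 = ((samples[:, None, :] - depth_pts[None, :, :]) ** 2).sum(axis=-1)
    return samples, depth_pts[d2.argmin(axis=1)]
```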
Step 203, constructing constraint equations according to the first corresponding point set and the second corresponding point set.
In some embodiments, the executing body of the method may construct constraint equations from the first corresponding point set and the second corresponding point set. A constraint here generally refers to a restriction that other objects impose on the object under consideration; the action of a constraint on the constrained object is called a constraint force or constraint reaction, often simply a reaction. A constraint equation is the mathematical expression of a constraint.
The executing body may construct the constraint equations in various ways, for example as follows: specify a value interval for each one-dimensional component of the hand pose parameters; build a joint-parameter selection matrix according to whether each component lies inside its interval, setting the corresponding diagonal entry to 0 if it does and to 1 if it does not; build a parameter correction vector from each component's distance to its interval; finally, assemble this information into an augmented matrix and obtain the constraint equation.
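A minimal Python sketch of this pose-constraint construction; the function and variable names are ours, and the per-component bounds are assumed to be given:

```python
import numpy as np

def pose_constraint(theta, lower, upper):
    """theta, lower, upper: (D,) pose parameters with per-component bounds.
    Builds the joint-parameter selection matrix (diagonal 1 where a component
    violates its interval, 0 otherwise) and the correction vector holding each
    violating component's distance back to the nearest bound."""
    below = theta < lower
    above = theta > upper
    S = np.diag((below | above).astype(float))
    r = np.where(below, lower - theta, 0.0) + np.where(above, upper - theta, 0.0)
    return S, r   # rows [S | r] form the augmented matrix of this constraint
```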
In some optional implementations of some embodiments, the constraint equations include at least one of: a space constraint equation, a pose constraint equation, and a timing constraint equation of hand tracking.
In some optional implementations of some embodiments, the executing body may, for each pair of corresponding points between the first corresponding point set and the point cloud set, determine the coordinate difference of the pair and the Jacobian matrix of the first corresponding point's position on the three-dimensional hand model with respect to each pose parameter, and construct the space constraint equation from the coordinate differences and the Jacobian matrices.
Here, the Jacobian matrix refers to the matrix of first-order partial derivatives of a vector-valued function, arranged in a fixed order; its determinant is called the Jacobian.
As an example, the space constraint equation is generally constructed from the coordinate differences and the Jacobian matrices as follows: stack the Jacobian matrices vertically to form the coefficient matrix, append the column of corresponding-point coordinate differences to its right to obtain the augmented matrix, and thereby obtain the matrix equation of the space constraint.
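The following Python sketch assembles that augmented matrix; the `jacobian` callback stands in for the hand model's analytic derivatives, which are assumed to be available:

```python
import numpy as np

def spatial_constraint(cloud_pts, model_pts, jacobian, theta):
    """cloud_pts, model_pts: (N, 3) paired corresponding points.
    jacobian(point, theta) -> (3, D): derivative of the model point's position
    with respect to the D pose parameters (assumed supplied by the hand model).
    Stacks the Jacobians vertically as the coefficient matrix and the
    coordinate differences as the right-hand side; [J | e] is the augmented
    matrix of the space constraint."""
    J = np.vstack([jacobian(p, theta) for p in model_pts])            # (3N, D)
    e = (np.asarray(cloud_pts) - np.asarray(model_pts)).reshape(-1)   # (3N,)
    return J, e
```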
Step 204, generating a tracking pose result according to the constraint equations.
In some embodiments, the executing body may solve the constraint equations to generate the tracking pose result. Here, a tracking pose result generally refers to a numerical value, image, text, or video characterizing the tracked pose. As an example, the executing body may solve the constraint equations to obtain tracking pose parameters, and then construct an image or video of the tracked pose from those parameters.
In some optional implementations of some embodiments, the executing body may convert the constraint equations into a solvable mathematical form, and solve the converted equations to obtain the tracking pose result.
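As a sketch of how the combined system might be solved, the following Python function stacks the spatial, pose, and a simple temporal term into one damped least-squares solve. The weights, the damping, and the previous-frame form of the temporal constraint are our assumptions for illustration, not values prescribed by the patent:

```python
import numpy as np

def solve_pose_update(J, e, S, r, theta, theta_prev,
                      w_pose=1.0, w_time=0.1, damping=1e-3):
    """Stack the spatial constraint (J, e), the pose constraint (S, r), and a
    temporal term pulling toward the previous frame's pose into one linear
    system A @ delta = b, then solve it by damped least squares."""
    D = theta.size
    A = np.vstack([J, w_pose * S, w_time * np.eye(D)])
    b = np.concatenate([e, w_pose * r, w_time * (theta_prev - theta)])
    # Damped normal equations: (A^T A + damping * I) delta = A^T b
    delta = np.linalg.solve(A.T @ A + damping * np.eye(D), A.T @ b)
    return theta + delta
```

In a tracker, this update would typically be iterated several times per frame, recomputing the correspondences of steps 202 and 203 after each solve.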
Some embodiments of the present disclosure thus provide methods that generate a relatively stable hand tracking pose result while keeping the time delay of the result relatively short.
With further reference to fig. 3, as an implementation of the methods illustrated in the above figures, the present disclosure provides some embodiments of an apparatus for generating a tracking pose result. These apparatus embodiments correspond to the method embodiments shown in fig. 2, and the apparatus may be applied in various electronic devices.
As shown in FIG. 3, an apparatus 300 for generating a tracking pose result of some embodiments comprises: a segmentation unit 301, a determining unit 302, a construction unit 303, and a generating unit 304. The segmentation unit 301 is configured to segment the hand region in a depth image according to the color features of the wristband marker in the image and the morphological features of the arm above the hand to obtain a point cloud set; the determining unit 302 is configured to determine a first corresponding point set and a second corresponding point set based on the point cloud set and the three-dimensional hand model, wherein the first corresponding point set comprises corresponding points between each point of the point cloud set and the surface of the three-dimensional hand model, and the second corresponding point set comprises corresponding points between the three-dimensional hand model and the two-dimensional contour of the depth image; the construction unit 303 is configured to construct constraint equations according to the first corresponding point set and the second corresponding point set; and the generating unit 304 is configured to generate a tracking pose result according to the constraint equations.
In an optional implementation of some embodiments, the segmentation unit 301 of the apparatus 300 is further configured to: filter the image to obtain the set of pixels corresponding to the wristband marker region; determine the average of the depth values of those pixels in the depth image; take pixels in the depth image whose depth differs from the average by less than a preset threshold as forearm and hand pixels, and obtain the corresponding three-dimensional point cloud point set; determine a segmentation region based on the pixel set and the three-dimensional point cloud point set; and take the points of the point cloud point set and the pixel set inside the segmentation region as the hand segmentation result.
In an optional implementation of some embodiments, the determining unit 302 of the apparatus 300 to generate the tracked pose result is further configured to: performing orthogonal projection on the three-dimensional hand model to obtain a model outline; sampling the model contour to obtain a contour point set; and for each point in the contour point set, determining the closest corresponding point of the point in the contour of the depth image as a second corresponding point, and obtaining a second corresponding point set.
In an optional implementation of some embodiments, the constraint equations include at least one of: a space constraint equation, a pose constraint equation, and a timing constraint equation of hand tracking.
In an optional implementation of some embodiments, the construction unit 303 of the apparatus 300 is further configured to: for each pair of corresponding points between the first corresponding point set and the point cloud set, determine the coordinate difference of the pair and the Jacobian matrix of the first corresponding point's position on the three-dimensional hand model with respect to each pose parameter; and construct a space constraint equation from the coordinate differences and the Jacobian matrices.
In an optional implementation of some embodiments, the generating unit 304 of the apparatus 300 is further configured to: convert the constraint equations into a solvable mathematical form; and solve the converted equations to obtain the tracking pose result.
Referring now to fig. 4, a schematic diagram of an electronic device (e.g., the server of fig. 1) 400 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 4, electronic device 400 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 401 that may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or loaded from a storage device 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data necessary for the operation of the electronic device 400. The processing device 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
Generally, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 407 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 408 including, for example, tape, hard disk, etc.; and a communication device 409. The communication means 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. While fig. 4 illustrates an electronic device 400 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 4 may represent one device or may represent multiple devices as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 409, or from the storage device 408, or from the ROM 402. The computer program, when executed by the processing apparatus 401, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described above in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The computer readable medium may be included in the electronic device, or it may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: segment the hand region in a depth image according to the color features of the wristband marker in the image and the morphological features of the arm above the hand to obtain a point cloud set; determine a first corresponding point set and a second corresponding point set based on the point cloud set and the three-dimensional hand model, wherein the first corresponding point set comprises corresponding points between each point of the point cloud set and the surface of the three-dimensional hand model, and the second corresponding point set comprises corresponding points between the three-dimensional hand model and the two-dimensional contour of the depth image; construct constraint equations according to the first corresponding point set and the second corresponding point set; and generate a tracking pose result according to the constraint equations.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the remote-computer case, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor, for example described as: a processor comprising a segmentation unit, a determining unit, a construction unit, and a generating unit. The names of these units do not in some cases limit the units themselves; for example, the segmentation unit may also be described as "a unit that segments the hand region in a depth image according to the color features of the wristband marker in the image and the morphological features of the arm above the hand to obtain a point cloud set".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The foregoing description presents only preferred embodiments of the present disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features of similar function disclosed in (but not limited to) the embodiments of the present disclosure.

Claims (8)

1. A method for generating a tracking pose result, comprising:
segmenting the hand region in a depth image according to the color features of the wristband marker in the image and the morphological features of the arm above the hand to obtain a point cloud set;
determining a first corresponding point set and a second corresponding point set based on the point cloud set and the three-dimensional hand model, wherein the first corresponding point set comprises corresponding points of each point in the point cloud set and the surface of the three-dimensional hand model, and the second corresponding point set comprises corresponding points of the three-dimensional hand model and the two-dimensional contour of the depth image;
constructing a constraint equation according to the first corresponding point set and the second corresponding point set;
generating a tracking pose result according to the constraint equation;
wherein segmenting the hand region in the depth image according to the color features of the wristband marker in the image and the morphological features of the arm above the hand comprises:
filtering the image to obtain a set of pixels corresponding to the wristband marker region;
determining the average of the depth values corresponding to the pixels of the pixel set in the depth image;
determining pixels in the depth image whose depth value differs from the average by less than a preset threshold as pixels of the forearm and hand region, and obtaining the corresponding three-dimensional point cloud point set;
determining a segmentation region based on the pixel set and the three-dimensional point cloud point set; and
taking the points of the three-dimensional point cloud point set and the pixel set that fall inside the segmentation region as the hand segmentation result;
wherein determining a segmentation region based on the pixel set and the three-dimensional point cloud point set comprises:
performing principal component analysis on the three-dimensional point cloud point set to obtain the principal direction as the orientation vector of the forearm, determining the centroid of the three-dimensional point cloud point set as the wrist center of gravity, and determining the sphere center of the segmentation region by taking the wrist center of gravity as the starting point and the forearm orientation vector as the offset direction.
2. The method of claim 1, wherein determining a first corresponding point set and a second corresponding point set based on the point cloud set and the three-dimensional hand model comprises:
performing orthogonal projection on the three-dimensional hand model to obtain a model outline;
sampling the model contour to obtain a contour point set; and
and for each point in the contour point set, determining the closest corresponding point of the point in the contour of the depth image as a second corresponding point, and obtaining a second corresponding point set.
3. The method of claim 1, wherein the constraint equation comprises at least one of: a space constraint equation, a pose constraint equation, and a timing constraint equation of hand tracking.
4. The method of claim 3, wherein constructing a constraint equation according to the first corresponding point set and the second corresponding point set comprises:
for each pair of corresponding points between the first corresponding point set and the point cloud set, determining the coordinate difference of the pair and the Jacobian matrix of the first corresponding point's position on the three-dimensional hand model with respect to each pose parameter; and
constructing a space constraint equation according to the coordinate differences and the Jacobian matrices.
5. The method of claim 1, wherein generating a tracking pose result according to the constraint equation comprises:
converting the constraint equation into a solvable mathematical form; and
solving the converted constraint equation to obtain the tracking pose result.
6. An apparatus for generating a tracking pose result, comprising:
a segmentation unit configured to segment the hand region in a depth image according to the color features of the wristband marker in the image and the morphological features of the arm above the hand to obtain a point cloud set;
a determining unit configured to determine a first corresponding point set and a second corresponding point set based on the point cloud set and the three-dimensional hand model, wherein the first corresponding point set comprises corresponding points between each point of the point cloud set and the surface of the three-dimensional hand model, and the second corresponding point set comprises corresponding points between the three-dimensional hand model and the two-dimensional contour of the depth image;
a construction unit configured to construct a constraint equation according to the first corresponding point set and the second corresponding point set; and
a generating unit configured to generate a tracking pose result according to the constraint equation;
wherein the segmentation unit is further configured to: filter the image to obtain a set of pixels corresponding to the wristband marker region; determine the average of the depth values corresponding to the pixels of the pixel set in the depth image; determine pixels in the depth image whose depth value differs from the average by less than a preset threshold as pixels of the forearm and hand region, and obtain the corresponding three-dimensional point cloud point set; determine a segmentation region based on the pixel set and the three-dimensional point cloud point set; and take the points of the three-dimensional point cloud point set and the pixel set that fall inside the segmentation region as the hand segmentation result;
wherein determining a segmentation region based on the pixel set and the three-dimensional point cloud point set comprises: performing principal component analysis on the three-dimensional point cloud point set to obtain the principal direction as the orientation vector of the forearm, determining the centroid of the three-dimensional point cloud point set as the wrist center of gravity, and determining the sphere center of the segmentation region by taking the wrist center of gravity as the starting point and the forearm orientation vector as the offset direction.
7. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
8. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-5.
CN201911197449.0A 2019-11-29 2019-11-29 Method for generating a tracking pose result Active CN112883757B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911197449.0A (CN112883757B) 2019-11-29 2019-11-29 Method for generating a tracking pose result

Publications (2)

Publication Number Publication Date
CN112883757A (en) 2021-06-01
CN112883757B (en) 2023-03-24

Family

ID=76038998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911197449.0A Active CN112883757B (en) 2019-11-29 2019-11-29 Method for generating a tracking pose result

Country Status (1)

Country Link
CN (1) CN112883757B (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant