CN111079618A - Three-dimensional tracking gesture observation likelihood modeling method and device


Info

Publication number
CN111079618A
CN111079618A (application CN201911258862.3A)
Authority
CN
China
Prior art keywords
gesture
foreground
edge
observation likelihood
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911258862.3A
Other languages
Chinese (zh)
Inventor
周智 (Zhou Zhi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unicloud Nanjing Digital Technology Co Ltd
Original Assignee
Unicloud Nanjing Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unicloud Nanjing Digital Technology Co Ltd filed Critical Unicloud Nanjing Digital Technology Co Ltd
Priority to CN201911258862.3A priority Critical patent/CN111079618A/en
Publication of CN111079618A publication Critical patent/CN111079618A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field, by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A three-dimensional tracked gesture observation likelihood modeling method and device, the method comprising: acquiring a gesture image and preprocessing the gesture image; performing a gesture observation likelihood metric based on image edge features and a gesture observation likelihood metric based on gesture foreground features on the preprocessed gesture image; and combining the two metrics to construct a gesture observation likelihood model. The method can accurately extract depth information in three-dimensional stereoscopic vision tracking and achieves better gesture understanding and recognition.

Description

Three-dimensional tracking gesture observation likelihood modeling method and device
Technical Field
The invention belongs to the field of computers, and particularly relates to a three-dimensional tracking gesture observation likelihood modeling method and device, and a computer-readable storage medium.
Background
With the development and popularization of artificial intelligence technology, modeling and recognition of hand gestures are increasingly applied to human emotion recognition and intelligent traffic control. An effective gesture model must be established to obtain good understanding and recognition results, and gesture state tracking in particular is the most widely applied technique in gesture understanding and recognition. However, the gesture modeling methods in current use still follow the gradient-descent principle in later recognition, so they cannot avoid the trap of local minima and their recognition effect is poor.
Gesture modeling is the most basic step in artificial-intelligence emotion recognition and computer-vision gesture tracking. The field's classical three-dimensional gesture modeling methods address, to some extent, the rapid generation of two-hand interaction models based on hand actions and postures, but they exhibit excessive linear saturation in three-dimensional stereoscopic tracking and, in particular, cannot accurately extract three-dimensional scene depth information.
Disclosure of Invention
In view of the above deficiencies of the prior art, one objective of the present invention is to solve the problem that existing gesture modeling methods cannot accurately extract depth information in three-dimensional stereoscopic tracking.
The embodiment of the invention discloses a three-dimensional tracking gesture observation likelihood modeling method and a three-dimensional tracking gesture observation likelihood modeling device, wherein the method comprises the following steps: acquiring a gesture image and preprocessing the gesture image; performing gesture observation likelihood measurement based on image edge characteristics and gesture observation likelihood measurement based on gesture foreground characteristics according to the preprocessed gesture image; and combining the gesture observation likelihood measurement based on the image edge characteristic and the gesture observation likelihood measurement based on the gesture foreground characteristic to construct a gesture observation likelihood model. The method can accurately extract the depth information in three-dimensional stereoscopic vision tracking, and can obtain better gesture understanding and recognition effects.
In one possible embodiment, the gesture observation likelihood model is as follows:
[The gesture observation likelihood model formula is reproduced only as an image in the original; it fuses w_edge(Z|x_i) and w_foreground(Z|x_i) using the weight variances d_edge and d_foreground.]
where x_i denotes the i-th particle; w_edge(Z|x_i) is the weight of the i-th particle after sampling under the gesture observation likelihood metric based on image edge features; w_foreground(Z|x_i) is the weight of the i-th particle under the gesture observation likelihood metric based on gesture foreground features; and d_edge and d_foreground denote the variances of the particle weights obtained from w_edge(Z|x_i) and w_foreground(Z|x_i), respectively.
In one possible embodiment, the pre-processing comprises: and carrying out serialization processing on the gesture image.
In one possible embodiment, performing the gesture observation likelihood metric based on image edge features and the gesture observation likelihood metric based on gesture foreground features includes: acquiring edge features of the gesture frame image using edge detection; obtaining the two-dimensional gesture contour of the hand model from the projection coordinates of each joint point of the model on the two-dimensional image plane; calculating the similarity between the model state and the observed value from the edge and the gesture contour; and comparing the similarity of the model state and the current observed value according to the degree of coincidence between the gesture foreground and the model's projection in the two-dimensional plane.
In one possible embodiment, the image edge feature-based gesture observation likelihood metric is expressed as: p_edge = exp(-d_chamfer(edge, contour)),
The gesture observation likelihood metric based on the gesture foreground features is expressed as:
P_foreground = exp{-([S_foreground ∪ S_projection] - [S_foreground ∩ S_projection])},
where d_chamfer denotes the Chamfer distance between the image edge and the gesture model contour, S_foreground denotes the gesture foreground region, and S_projection denotes the projection of the three-dimensional gesture model.
In one possible embodiment, the method further comprises: adjusting the weights w_edge(Z|x_i) and w_foreground(Z|x_i) so as to obtain a modeling result that meets a preset accuracy.
The embodiment of the invention also discloses a three-dimensional tracking gesture observation likelihood modeling device, which comprises: the preprocessing module is used for acquiring a gesture image and preprocessing the gesture image; the observation likelihood measurement module is used for performing gesture observation likelihood measurement based on image edge characteristics and gesture observation likelihood measurement based on gesture foreground characteristics according to the preprocessed gesture image; and the construction module is used for constructing a gesture observation likelihood model by combining the gesture observation likelihood metric based on the image edge characteristics and the gesture observation likelihood metric based on the gesture foreground characteristics.
In one possible embodiment, the gesture observation likelihood model is as follows:
[The gesture observation likelihood model formula is reproduced only as an image in the original; it fuses w_edge(Z|x_i) and w_foreground(Z|x_i) using the weight variances d_edge and d_foreground.]
where x_i denotes the i-th particle; w_edge(Z|x_i) is the weight of the i-th particle after sampling under the gesture observation likelihood metric based on image edge features; w_foreground(Z|x_i) is the weight of the i-th particle under the gesture observation likelihood metric based on gesture foreground features; and d_edge and d_foreground denote the variances of the particle weights obtained from w_edge(Z|x_i) and w_foreground(Z|x_i), respectively.
In one possible embodiment, the preprocessing module is further configured to: and carrying out serialization processing on the gesture image.
In a possible embodiment, the observation likelihood measurement module is further configured to acquire an edge feature of the gesture frame image by using edge detection, acquire a two-dimensional gesture contour of the hand model according to projection coordinates of each joint point of the model on a two-dimensional image plane, and calculate similarity between the model state and the observation value according to the edge and the gesture contour; and comparing the similarity of the model state and the current observed value according to the coincidence degree between the gesture foreground and the projection of the model in the two-dimensional plane.
In one possible embodiment, the apparatus further comprises: an adjusting module configured to adjust the weights w_edge(Z|x_i) and w_foreground(Z|x_i) so as to obtain a modeling result that meets a preset accuracy.
The invention also discloses a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the above three-dimensional tracking gesture observation likelihood modeling methods.
The invention has the beneficial effects that: through preprocessing of the gesture image information, a gesture similarity metric based on image information, and a similarity metric based on foreground gesture features, the method provides a particle-filter-based observation likelihood modeling approach for three-dimensionally tracked gestures. The method improves the precision of the particle filter algorithm and achieves better gesture understanding and recognition. Verification of the three-dimensional tracking gesture observation likelihood model shows that the result of fusing the two new three-dimensional tracking observations matches the original gesture information of the model's base platform well, achieving a good modeling effect.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an image preprocessing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a result of detecting an edge of a gesture image according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a model verification effect according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
To facilitate understanding by those skilled in the art, the present invention is further described below with reference to examples and drawings, which are not intended to limit the invention.
The embodiment of the invention discloses a three-dimensional tracking gesture observation likelihood modeling method, comprising the following steps:
s101, acquiring a gesture image and preprocessing the gesture image.
In one embodiment, the gesture images may be acquired by a camera.
In one embodiment, the pre-processing comprises: and carrying out serialization processing on the gesture image.
Specifically, basic preprocessing is performed according to the representation of the image. The raw gesture signal is obtained from a calibrated camera; however, interference such as overexposure and non-Gaussian environmental disturbance can occur while the camera acquires the image, leaving the captured gesture image severely distorted. Such distortion would greatly impair later recognition of the gesture posture, so the interfered signal must be preprocessed in advance to meet the requirements of high-quality digital image processing.
FIG. 2 shows the preprocessing flow. The first step of preprocessing is gesture image acquisition. The scheme can be built on the OpenCV library under a C++ integrated development environment: the CreatCaputewineWindow() method is called to create a capture window for the data source, the CraetWebDriver() method establishes the connection between the gesture camera and the IDE, the DlgCammmorsource() method configures the parameters for camera gesture capture, and finally a callback pointer to the CameraCallbackImg() method is pointed at the capture window to complete transmission of the digital image. After each transmission, the environment automatically checks whether the captured object satisfies the minimum frame-rate rule; if so, capture of the digital signal continues, otherwise the image is sampled again.
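The capture method names above are reproduced as printed in the source and do not correspond to public OpenCV identifiers. Purely as a minimal sketch, the loop below shows an equivalent capture stage using the standard OpenCV C++ interface; the device index and the minimum frame-rate threshold are assumptions:

```cpp
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);                  // open the gesture camera (device 0 assumed)
    if (!cap.isOpened()) return -1;

    const double kMinFps = 15.0;              // assumed minimum frame-rate rule
    cv::Mat frame;
    while (cap.read(frame)) {
        // Many webcams report 0 for CAP_PROP_FPS; treat that as "unknown" and accept.
        double fps = cap.get(cv::CAP_PROP_FPS);
        if (fps > 0.0 && fps < kMinFps) continue;  // below the rule: re-sample the image

        // Hand `frame` to the preprocessing pipeline (serialization, grayscale, edges).
        cv::imshow("gesture capture", frame);
        if (cv::waitKey(1) == 27) break;      // ESC stops capture
    }
    return 0;
}
```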
Gesture image serialization adds a fourth, temporal dimension to the three-dimensional RGB digital image signal. It is generally configured with the settimecapacity() method; once the image carries the fourth, time-related dimension, digital image processing operations such as segmentation and enhancement can be applied. Digital image processing operates on grayscale images, and a color gesture image can be converted to grayscale with the Laplace grayscale transformation shown in equation (3-1):
Gray(x,y)=0.286*R(x,y)+0.584*G(x,y)+0.128*B(x,y) (3-1)
Canny edge detection is then carried out: a Gaussian filter smooths the image; the gradient magnitude and direction are computed using finite differences of the first-order partial derivatives; depth information is extracted at the corner points; non-maximum suppression is applied to the combined Canny detection image; and finally edges are detected and linked with a dual-threshold algorithm and hysteresis thresholding, which yields the high-dimensional depth-of-field information of the edge detection. Although edge detection can extract the background depth information to the greatest extent, insufficient corner extraction can still leave discontinuous edges. A Chamfer distance transformation built from hand-shape topological information can then be adopted, and the complementary effect of the topology used to complete the missing contours. The edge detection results are shown in FIG. 3.
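A sketch of this grayscale-plus-Canny stage follows, using the coefficients of equation (3-1); the smoothing kernel and the dual thresholds are illustrative assumptions, and cv::Canny supplies the gradient computation, non-maximum suppression, and hysteresis linking described above:

```cpp
#include <opencv2/opencv.hpp>

// Grayscale conversion with the weights of equation (3-1), then Canny detection.
cv::Mat detectGestureEdges(const cv::Mat& bgr) {
    // Equation (3-1): Gray = 0.286*R + 0.584*G + 0.128*B.
    // OpenCV stores channels as B, G, R, so the coefficient order is reversed.
    cv::Mat gray;
    cv::transform(bgr, gray, cv::Matx13f(0.128f, 0.584f, 0.286f));

    cv::Mat blurred, edges;
    cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 1.4);  // Gaussian smoothing
    cv::Canny(blurred, edges, 50, 150);                    // assumed dual thresholds
    return edges;                                          // binary edge map, 255 on edges
}
```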
And S102, performing gesture observation likelihood measurement based on image edge characteristics and gesture observation likelihood measurement based on gesture foreground characteristics according to the preprocessed gesture image.
Specifically, edge characteristics of a gesture frame image can be obtained by using edge detection, a two-dimensional gesture outline of the hand model is obtained according to projection coordinates of each joint point of the model on a two-dimensional image plane, and similarity between a model state and an observed value is calculated according to the edge and the gesture outline; furthermore, the similarity of the model state and the current observed value can be compared according to the coincidence degree between the gesture foreground and the projection of the model in the two-dimensional plane.
In one embodiment, performing an image edge feature-based gesture similarity metric comprises:
First, two data points are defined (their formal definitions are reproduced only as equation images in the original). The Chamfer distance transformation can then be performed on the binarized, discontinuous edge; its definition, equation (3-2), likewise appears only as an image in the original.
Then the scanning and traversal process over each digital primitive is constructed. Let P and its companion set (whose symbol appears only as an image in the original) denote, respectively, the high-dimensional depth-of-field pixel set and the three-dimensional gesture-modeling pixel set in the binary image. A first traversal is performed with the binary primitive method of equation (3-3): when the coordinate element of the next edge-image point no longer belongs to an edge, the result of the scan is marked 0; when it does belong to an edge, the scan result adds 1 to the previous state. A final minimization step ensures that the Chamfer distance carries a well-defined digital frame gradient.

f_1(p) = min{ f_1(q) + 1 : q ∈ B(p) }, ∀p ∈ P    (3-3)
Then, taking the coordinates of p to be (x, y), the 4-neighbourhood of the point contains the four element points (x+1, y), (x-1, y), (x, y+1) and (x, y-1). For the edge points remaining after the first scan, a second scan is performed as in equation (3-4); this processing ensures that the depth of the second-order gradient can be matched against the model produced by classical three-dimensional gesture modeling. So that the accuracies of the two scans reinforce each other, the second scan runs in the direction opposite to the first.

f_2(p) = min{ f_1(p), f_2(q) + 1 : q ∈ A(p) }    (3-4)
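A minimal sketch of this two-pass transform follows, under the assumption that B(p) holds the causal 4-neighbours visited by the forward scan and A(p) the remaining 4-neighbours covered by the reverse scan:

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>

// Two-pass Chamfer (city-block) distance transform over a binary edge map
// (nonzero = edge), as in equations (3-3)/(3-4): a forward scan relaxes against
// the causal neighbours B(p), then a reverse scan in the opposite direction
// covers the remaining neighbours A(p).
cv::Mat chamferTransform(const cv::Mat& edges) {
    const int INF = 1 << 20;
    cv::Mat dist(edges.size(), CV_32S, cv::Scalar(INF));

    // f(p) = 0 on edge pixels.
    for (int y = 0; y < edges.rows; ++y)
        for (int x = 0; x < edges.cols; ++x)
            if (edges.at<uchar>(y, x)) dist.at<int>(y, x) = 0;

    // First scan: top-left to bottom-right, equation (3-3).
    for (int y = 0; y < edges.rows; ++y)
        for (int x = 0; x < edges.cols; ++x) {
            int& d = dist.at<int>(y, x);
            if (x > 0) d = std::min(d, dist.at<int>(y, x - 1) + 1);
            if (y > 0) d = std::min(d, dist.at<int>(y - 1, x) + 1);
        }

    // Second scan: bottom-right to top-left, equation (3-4).
    for (int y = edges.rows - 1; y >= 0; --y)
        for (int x = edges.cols - 1; x >= 0; --x) {
            int& d = dist.at<int>(y, x);
            if (x + 1 < edges.cols) d = std::min(d, dist.at<int>(y, x + 1) + 1);
            if (y + 1 < edges.rows) d = std::min(d, dist.at<int>(y + 1, x) + 1);
        }
    return dist;   // per-pixel distance to the nearest edge pixel
}
```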
After the edges are put through the Chamfer distance transformation, a gesture similarity measure between the edge and the gesture contour can be defined as in equation (3-5), yielding the three-dimensionally tracked gesture observation likelihood information:

p_edge = exp(-d_chamfer(edge, contour))    (3-5)
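A hedged sketch of evaluating equation (3-5) follows; d_chamfer is taken here to be the mean Chamfer distance from the projected model-contour points to the nearest image edge, an interpretation assumed from the surrounding text:

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Edge-based observation likelihood of equation (3-5): average the Chamfer
// distance over the projected model-contour points, then exponentiate.
double edgeLikelihood(const cv::Mat& chamferDist,              // CV_32S, from the transform above
                      const std::vector<cv::Point>& contour) { // projected model contour points
    if (contour.empty()) return 0.0;
    double sum = 0.0;
    for (const cv::Point& p : contour)
        sum += chamferDist.at<int>(p);   // distance from this contour point to the nearest edge
    double dChamfer = sum / contour.size();
    return std::exp(-dChamfer);          // p_edge = exp(-d_chamfer(edge, contour))
}
```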
In one embodiment, performing the similarity metric based on the gesture foreground features comprises: obtaining the gesture foreground image with a Gaussian mixture model, or with background subtraction combined with a Bayesian method. With the camera parameters and the model state parameters known, the projection of the gesture-state model in the image plane can be obtained.
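For the Gaussian-mixture route, a minimal foreground-extraction sketch with OpenCV's MOG2 background subtractor is given below; the shadow threshold and median-filter size are illustrative assumptions:

```cpp
#include <opencv2/opencv.hpp>

// Minimal foreground-extraction sketch using OpenCV's Gaussian-mixture
// background model (one of the two options named in the text).
cv::Mat extractForeground(cv::Ptr<cv::BackgroundSubtractorMOG2>& bgModel,
                          const cv::Mat& frame) {
    cv::Mat fgMask;
    bgModel->apply(frame, fgMask);        // per-pixel mixture-of-Gaussians background test
    // MOG2 marks shadows with value 127; keep only confident foreground (255).
    cv::threshold(fgMask, fgMask, 200, 255, cv::THRESH_BINARY);
    cv::medianBlur(fgMask, fgMask, 5);    // suppress isolated noise pixels
    return fgMask;                        // binary gesture-foreground mask
}

// Usage: auto bg = cv::createBackgroundSubtractorMOG2();
//        cv::Mat fg = extractForeground(bg, frame);
```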
A similarity likelihood function is defined from the gesture foreground image and the model projection information, as in equation (3-6). The union denotes the maximal combined pixel area of the classical three-dimensional gesture model's foreground pixels and the high-order projected gesture pixels, while the intersection denotes their common area. The resulting similarity measure is denoted P_foreground; the expression shows that this similarity metric adds high-order projected gesture information on top of the classical model.

P_foreground = exp{-([S_foreground ∪ S_projection] - [S_foreground ∩ S_projection])}    (3-6)
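Reading the bracketed terms of equation (3-6) as pixel counts of the union and intersection masks, a sketch follows; the normalisation by the union area is an added assumption to keep the exponent in a numerically useful range:

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>

// Foreground observation likelihood of equation (3-6), reading the bracketed
// terms as pixel counts of the union and intersection of the two binary masks.
// Normalising by the union area is an added assumption (raw pixel counts would
// drive exp() to zero for any realistic image size).
double foregroundLikelihood(const cv::Mat& fgMask,      // gesture foreground, 8-bit binary
                            const cv::Mat& projMask) {  // rendered 3D-model projection mask
    cv::Mat uni, inter;
    cv::bitwise_or(fgMask, projMask, uni);
    cv::bitwise_and(fgMask, projMask, inter);
    int nonOverlap = cv::countNonZero(uni) - cv::countNonZero(inter);
    double denom = std::max(1, cv::countNonZero(uni));
    return std::exp(-nonOverlap / denom);  // 1.0 on perfect overlap, smaller otherwise
}
```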
S103, combining the gesture observation likelihood measurement based on the image edge feature and the gesture observation likelihood measurement based on the gesture foreground feature to construct a gesture observation likelihood model.
After the gesture observation likelihood metric based on image edge features and the metric based on gesture foreground features have been obtained, the three-dimensional tracking gesture observation likelihood model can be established. Its main task is to redistribute the post-sampling weights of the edge-based metric and the weights of the foreground-based metric so as to obtain a more satisfactory model.
The gesture observation likelihood model is as follows:
[The gesture observation likelihood model formula is reproduced only as an image in the original; it fuses w_edge(Z|x_i) and w_foreground(Z|x_i) using the weight variances d_edge and d_foreground.]
where x_i denotes the i-th particle; w_edge(Z|x_i) is the weight of the i-th particle after sampling under the gesture observation likelihood metric based on image edge features; w_foreground(Z|x_i) is the weight of the i-th particle under the gesture observation likelihood metric based on gesture foreground features; and d_edge and d_foreground denote the variances of the particle weights obtained from w_edge(Z|x_i) and w_foreground(Z|x_i), respectively.
The method further comprises: adjusting the weights w_edge(Z|x_i) and w_foreground(Z|x_i) so as to obtain a modeling result that meets a preset accuracy.
The weight adjustment is given by equations (3-8) and (3-9), both of which are reproduced only as images in the original.
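Because equations (3-7) through (3-9) survive only as images, the exact fusion rule is not recoverable from this text. Purely as an illustrative sketch, the following assumes the two cues' particle weights are blended with coefficients derived from the weight variances d_edge and d_foreground and then re-normalised; the variance-proportional choice is an assumption, not the patent's formula:

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Variance of one cue's particle weights (d_edge or d_foreground in the text).
static double variance(const std::vector<double>& w) {
    double mean = std::accumulate(w.begin(), w.end(), 0.0) / w.size();
    double acc = 0.0;
    for (double v : w) acc += (v - mean) * (v - mean);
    return acc / w.size();
}

// Assumed fusion: blend the edge and foreground weights per particle using
// variance-derived coefficients, then re-normalise to a probability vector.
std::vector<double> fuseWeights(const std::vector<double>& wEdge,
                                const std::vector<double>& wForeground) {
    double dEdge = variance(wEdge), dForeground = variance(wForeground);
    double total = dEdge + dForeground;
    double aEdge = total > 0.0 ? dEdge / total : 0.5;   // assumed blending coefficient

    std::vector<double> fused(wEdge.size());
    for (size_t i = 0; i < wEdge.size(); ++i)
        fused[i] = aEdge * wEdge[i] + (1.0 - aEdge) * wForeground[i];

    double s = std::accumulate(fused.begin(), fused.end(), 0.0);
    for (double& v : fused) v /= s;                     // weights sum to 1
    return fused;
}
```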
the method provides a gesture observation likelihood modeling method based on three-dimensional tracking of particle filtering through preprocessing of gesture image information, image information gesture similarity measurement and similarity measurement based on foreground gesture features. The method improves the precision of the particle filter algorithm, can accurately extract depth information in three-dimensional stereoscopic vision tracking, and can obtain better gesture understanding and recognition effects. And the conclusion that the result of the gesture observation likelihood model obtained by observing and fusing the two new three-dimensional tracking can be well matched with the original gesture information of the three-dimensional tracking gesture observation likelihood model basic platform is obtained through the verification of the three-dimensional tracking gesture observation likelihood model, so that a better modeling effect is achieved.
In the verification stage, the camera captures gestures with the palm posture held fixed while the index finger is made to translate along the x coordinate axis, rotate slowly about the y coordinate axis, and rotate slowly about the z coordinate axis. Three representative groups of hand motions are then selected according to the experimental characteristics, representing palm translation and finger rotation about the two axes, respectively. As shown in FIG. 4, the left side of each plot shows the classical three-dimensionally tracked gesture and the high-order projected gesture information; their fusion in the plots matches the original gesture information of the three-dimensional tracking gesture observation likelihood model's base platform well, achieving a good modeling effect.
The embodiment of the invention also discloses a three-dimensional tracking gesture observation likelihood modeling device 10, which comprises: the preprocessing module 101 is configured to acquire a gesture image and preprocess the gesture image; the observation likelihood measurement module 102 is configured to perform gesture observation likelihood measurement based on image edge features and gesture observation likelihood measurement based on gesture foreground features according to the preprocessed gesture image; and the constructing module 103 is used for constructing a gesture observation likelihood model by combining the gesture observation likelihood metric based on the image edge feature and the gesture observation likelihood metric based on the gesture foreground feature.
In one embodiment the gesture observation likelihood model is as follows:
[The gesture observation likelihood model formula is reproduced only as an image in the original; it fuses w_edge(Z|x_i) and w_foreground(Z|x_i) using the weight variances d_edge and d_foreground.]
where x_i denotes the i-th particle; w_edge(Z|x_i) is the weight of the i-th particle after sampling under the gesture observation likelihood metric based on image edge features; w_foreground(Z|x_i) is the weight of the i-th particle under the gesture observation likelihood metric based on gesture foreground features; and d_edge and d_foreground denote the variances of the particle weights obtained from w_edge(Z|x_i) and w_foreground(Z|x_i), respectively.
In one embodiment the pre-processing module is further configured to: and carrying out serialization processing on the gesture image.
In one embodiment, the observation likelihood measurement module is further configured to acquire an edge feature of a gesture frame image by using edge detection, acquire a two-dimensional gesture contour of the hand model according to projection coordinates of each joint point of the model on a two-dimensional image plane, and calculate similarity between a model state and an observation value according to the edge and the gesture contour; and comparing the similarity of the model state and the current observed value according to the coincidence degree between the gesture foreground and the projection of the model in the two-dimensional plane.
In one embodiment the apparatus 10 further comprises: an adjusting module 104 configured to adjust the weights w_edge(Z|x_i) and w_foreground(Z|x_i) so as to obtain a modeling result that meets a preset accuracy.
For the specific implementation of the apparatus 10, reference may be made to the method embodiments; the details are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described embodiments are merely illustrative, and for example, a division of a unit is merely a division of a logic function, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
While the invention has been described in terms of its preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims (11)

1. A three-dimensional tracked gesture observation likelihood modeling method is characterized by comprising the following steps:
acquiring a gesture image and preprocessing the gesture image;
performing gesture observation likelihood measurement based on image edge characteristics and gesture observation likelihood measurement based on gesture foreground characteristics according to the preprocessed gesture image;
and combining the gesture observation likelihood measurement based on the image edge characteristic and the gesture observation likelihood measurement based on the gesture foreground characteristic to construct a gesture observation likelihood model.
2. The method of claim 1, wherein the gesture observation likelihood model is as follows:
[The model formula is reproduced only as an image in the original.]
wherein x_i denotes the i-th particle; w_edge(Z|x_i) is the weight of the i-th particle after sampling under the gesture observation likelihood metric based on image edge features; w_foreground(Z|x_i) is the weight of the i-th particle under the gesture observation likelihood metric based on gesture foreground features; and d_edge and d_foreground denote the variances of the particle weights obtained from w_edge(Z|x_i) and w_foreground(Z|x_i), respectively.
3. The method of claim 1, wherein the pre-processing comprises: and carrying out serialization processing on the gesture image.
4. The method of claim 1 or 2, wherein performing the image edge feature based gesture observation likelihood metric and the gesture foreground feature based gesture observation likelihood metric comprises: acquiring edge characteristics of a gesture frame image by using edge detection, acquiring a two-dimensional gesture outline of a hand model according to projection coordinates of each joint point of the model on a two-dimensional image plane, and calculating the similarity between the state of the model and an observed value according to the edge and the gesture outline; and comparing the similarity of the model state and the current observed value according to the coincidence degree between the gesture foreground and the projection of the model in the two-dimensional plane.
5. The method of claim 4, wherein the image edge feature-based gesture observation likelihood metric is expressed as: p_edge = exp(-d_chamfer(edge, contour)),
The gesture observation likelihood metric based on the gesture foreground features is expressed as:
P_foreground = exp{-([S_foreground ∪ S_projection] - [S_foreground ∩ S_projection])},
wherein d_chamfer denotes the Chamfer distance between the image edge and the gesture model contour, S_foreground denotes the gesture foreground region, and S_projection denotes the projection of the three-dimensional gesture model.
6. The method of claim 1, further comprising: adjusting the weights w_edge(Z|x_i) and w_foreground(Z|x_i) so as to obtain a modeling result that meets a preset accuracy.
7. A three-dimensional tracked gesture observation likelihood modeling apparatus, comprising: the preprocessing module is used for acquiring a gesture image and preprocessing the gesture image; the observation likelihood measurement module is used for performing gesture observation likelihood measurement based on image edge characteristics and gesture observation likelihood measurement based on gesture foreground characteristics according to the preprocessed gesture image; and the construction module is used for constructing a gesture observation likelihood model by combining the gesture observation likelihood metric based on the image edge characteristics and the gesture observation likelihood metric based on the gesture foreground characteristics.
8. The apparatus of claim 7, wherein the gesture observation likelihood model is as follows:
[The model formula is reproduced only as an image in the original.]
wherein x_i denotes the i-th particle; w_edge(Z|x_i) is the weight of the i-th particle after sampling under the gesture observation likelihood metric based on image edge features; w_foreground(Z|x_i) is the weight of the i-th particle under the gesture observation likelihood metric based on gesture foreground features; and d_edge and d_foreground denote the variances of the particle weights obtained from w_edge(Z|x_i) and w_foreground(Z|x_i), respectively.
9. The apparatus of claim 7, wherein the preprocessing module is further configured to: and carrying out serialization processing on the gesture image.
10. The apparatus of claim 7 or 8, wherein the observation likelihood metric module is further configured to obtain edge features of the gesture frame image by using edge detection, obtain a two-dimensional gesture contour of the hand model according to projection coordinates of each joint point of the model in a two-dimensional image plane, and calculate similarity between the model state and the observation value according to the edge and the gesture contour; and comparing the similarity of the model state and the current observed value according to the coincidence degree between the gesture foreground and the projection of the model in the two-dimensional plane.
11. The apparatus of claim 7, further comprising: an adjusting module configured to adjust the weights w_edge(Z|x_i) and w_foreground(Z|x_i) so as to obtain a modeling result that meets a preset accuracy.
CN201911258862.3A 2019-12-10 2019-12-10 Three-dimensional tracking gesture observation likelihood modeling method and device Pending CN111079618A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911258862.3A CN111079618A (en) 2019-12-10 2019-12-10 Three-dimensional tracking gesture observation likelihood modeling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911258862.3A CN111079618A (en) 2019-12-10 2019-12-10 Three-dimensional tracking gesture observation likelihood modeling method and device

Publications (1)

Publication Number Publication Date
CN111079618A true CN111079618A (en) 2020-04-28

Family

ID=70313581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911258862.3A Pending CN111079618A (en) 2019-12-10 2019-12-10 Three-dimensional tracking gesture observation likelihood modeling method and device

Country Status (1)

Country Link
CN (1) CN111079618A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116895A (en) * 2013-03-06 2013-05-22 清华大学 Method and device of gesture tracking calculation based on three-dimensional model
CN108256461A (en) * 2018-01-11 2018-07-06 深圳市鑫汇达机械设计有限公司 A kind of gesture identifying device for virtual reality device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116895A (en) * 2013-03-06 2013-05-22 清华大学 Method and device of gesture tracking calculation based on three-dimensional model
CN108256461A (en) * 2018-01-11 2018-07-06 深圳市鑫汇达机械设计有限公司 A kind of gesture identifying device for virtual reality device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘炳超 (Liu Bingchao): "Research on the observation likelihood model in three-dimensional gesture tracking", China Master's Theses Full-text Database (Information Science and Technology) *

Similar Documents

Publication Publication Date Title
CN110675487B (en) Three-dimensional face modeling and recognition method and device based on multi-angle two-dimensional face
WO2019228473A1 (en) Method and apparatus for beautifying face image
KR100682889B1 (en) Method and Apparatus for image-based photorealistic 3D face modeling
JP4597391B2 (en) Facial region detection apparatus and method, and computer-readable recording medium
JP2015522200A (en) Human face feature point positioning method, apparatus, and storage medium
JP2009525543A (en) 3D face reconstruction from 2D images
WO2023071964A1 (en) Data processing method and apparatus, and electronic device and computer-readable storage medium
CN111160291B (en) Human eye detection method based on depth information and CNN
CN112464847B (en) Human body action segmentation method and device in video
CN111553284A (en) Face image processing method and device, computer equipment and storage medium
US20230230305A1 (en) Online streamer avatar generation method and apparatus
Chen et al. A particle filtering framework for joint video tracking and pose estimation
CN112101208A (en) Feature series fusion gesture recognition method and device for elderly people
JP2021517281A (en) Multi-gesture fine division method for smart home scenes
CN110895683A (en) Kinect-based single-viewpoint gesture and posture recognition method
KR20110021500A (en) Method for real-time moving object tracking and distance measurement and apparatus thereof
CN112233161B (en) Hand image depth determination method and device, electronic equipment and storage medium
CN116580169B (en) Digital man driving method and device, electronic equipment and storage medium
Bauer et al. Refining the fusion of pepper robot and estimated depth maps method for improved 3d perception
KR101923405B1 (en) System for tracking and modeling human faces using AAM with geometric transformation and method therefor
CN111079618A (en) Three-dimensional tracking gesture observation likelihood modeling method and device
CN112183155B (en) Method and device for establishing action posture library, generating action posture and identifying action posture
Ahmed et al. A comparative analysis of time coherent 3D animation reconstruction methods from RGB-D video data
CN111241971A (en) Three-dimensional tracking gesture observation likelihood modeling method
Zhang et al. A 3D Face Modeling and Recognition Method Based on Binocular Stereo Vision and Depth-Sensing Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20200428)