CN110634160B - Method for constructing target three-dimensional key point extraction model and recognizing posture in two-dimensional graph - Google Patents

Method for constructing target three-dimensional key point extraction model and recognizing posture in two-dimensional graph

Info

Publication number
CN110634160B
CN110634160B (application CN201910738138.4A)
Authority
CN
China
Prior art keywords
dimensional
key point
image
target
dimensional key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910738138.4A
Other languages
Chinese (zh)
Other versions
CN110634160A (en
Inventor
彭进业
张少博
赵万青
祝轩
李斌
张薇
乐明楠
李展
罗迒哉
王珺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest University
Original Assignee
Northwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest University filed Critical Northwest University
Priority to CN201910738138.4A priority Critical patent/CN110634160B/en
Publication of CN110634160A publication Critical patent/CN110634160A/en
Application granted granted Critical
Publication of CN110634160B publication Critical patent/CN110634160B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10004 - Still image; Photographic image
    • G06T2207/10012 - Stereo images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for constructing a target three-dimensional key point extraction model and recognizing the posture of a target in a two-dimensional image. By designing the network structure of the three-dimensional key point extraction model, the coordinates of the target's three-dimensional key points can be output accurately and directly; by means of the designed key point loss function, the network can autonomously learn to extract key points with semantic and geometric consistency in an unsupervised manner, improving the accuracy of three-dimensional key point extraction.

Description

Method for constructing target three-dimensional key point extraction model and recognizing posture in two-dimensional graph
Technical Field
The invention relates to methods for three-dimensional target pose recognition, and in particular to a method for constructing a target three-dimensional key point extraction model and recognizing the posture of a target in a two-dimensional image.
Background
Three-dimensional target pose recognition refers to recognizing the three-dimensional position and orientation of a target object, and is a key module in many computer vision applications such as augmented reality, robot control, and unmanned tasks. However, recognizing the three-dimensional pose of a target requires first extracting the target object's three-dimensional key points, i.e., finding the two-dimensional position of the object in the image and extracting key points such as the projection of the object's 3D bounding box onto the image. Existing methods of this kind are effective when a large amount of supervision information is used, but annotating three-dimensional information on images involves a huge workload, demands a high level of specialist knowledge and tedious preparation, and such methods cannot handle images with occlusion or complex backgrounds.
In addition, even after the target's three-dimensional key points have been obtained, its three-dimensional pose still cannot be recognized accurately. Prior-art methods for obtaining the three-dimensional pose of a target object from a two-dimensional image therefore suffer from low pose-acquisition accuracy, a large workload, poor real-time performance and poor robustness.
Disclosure of Invention
The invention aims to provide a method for constructing a target three-dimensional key point extraction model and recognizing a posture in a two-dimensional image, so as to solve problems of the prior art such as the low accuracy of recognizing the three-dimensional key points of a target object in a two-dimensional image and the low accuracy of pose recognition.
In order to realize the task, the invention adopts the following technical scheme:
a method for constructing a target three-dimensional key point extraction model in a two-dimensional graph is implemented according to the following steps:
step 1, acquiring a plurality of two-dimensional image groups containing targets to be recognized, wherein two-dimensional images in the two-dimensional image groups are different in image acquisition angle;
obtaining a training image set;
step 2, inputting the training image set into a neural network for training;
the neural network comprises a feature extraction sub-network, and the feature extraction sub-network is respectively connected with a key point extraction sub-network and a target detection sub-network;
the feature extraction sub-network comprises a feature map extraction module and a region-of-interest extraction module which are arranged in sequence;
the target detection sub-network comprises a target classification module and a bounding box detection module which are connected in parallel;
the key point extraction sub-network comprises a key point probability obtaining module and a key point output module which are connected in series;
the key point probability obtaining module is used for obtaining the probability that each pixel point is a three-dimensional key point;
the key point output module obtains the coordinates of each three-dimensional key point using Formula I:
[x_i, y_i] = Σ_u Σ_v [u, v] · P_i(u, v)    (Formula I)
where [x_i, y_i] are the coordinates of the i-th three-dimensional key point, i = 1, 2, …, I, I is a positive integer, P_i(u, v) is the probability, output by the key point probability obtaining module, that the pixel at (u, v) in the two-dimensional image is the i-th three-dimensional key point, (u, v) are coordinates in the two-dimensional image, and u and v are positive integers;
and obtaining a three-dimensional key point extraction model.
Furthermore, the feature map extraction module comprises a feature pyramid network and a residual network which are arranged in sequence; the region-of-interest extraction module comprises a region proposal network.
Further, the key point probability obtaining module comprises a plurality of convolution blocks, an up-sampling layer and a softmax layer which are sequentially connected in series;
the convolution block comprises a convolution layer and a ReLU activation layer connected in sequence.
Further, the loss function L of the three-dimensional key point extraction model is:
L = Σ_neg L_class + Σ_pos (L_class + β·L_box + γ·L_keypoints)
where Σ_neg L_class is the sum of the classification loss functions of all negative samples, Σ_pos (L_class + β·L_box + γ·L_keypoints) is the sum, over all positive samples, of the target classification loss function L_class, the bounding box detection loss function L_box and the keypoint detection loss function L_keypoints, and β and γ are both greater than 0;
the negative samples are the two-dimensional images of regions of interest, extracted by the region-of-interest extraction module, that do not contain a target; the positive samples are the two-dimensional images of regions of interest, extracted by the region-of-interest extraction module, that contain a target;
wherein the key point detection loss function is
L_keypoints = L_dis + τ·L_dep + ε·L_con + μ·L_sep + φ·L_pose
where L_dis is the saliency loss function, L_dep is the depth prediction loss function, L_con is the three-dimensional consistency loss function, L_sep is the separation loss function, L_pose is the relative pose estimation loss function, and the weighting coefficients τ, ε, μ and φ (the weight of L_pose is denoted φ here) are all greater than 0.
Further, β, γ, τ, ε, μ and φ are all 1, and δ is 0.08.
A method for extracting a target three-dimensional key point in a two-dimensional graph is implemented according to the following steps:
step A, collecting a two-dimensional image containing a target to be identified to obtain an image to be identified;
and B, inputting the image to be recognized into a three-dimensional key point extraction model constructed by the construction method of the target three-dimensional key point extraction model in the two-dimensional graph to obtain a three-dimensional key point set of the target to be recognized, wherein the three-dimensional key point set comprises Q three-dimensional key points, and Q is a positive integer.
A method for recognizing a three-dimensional posture of a target in a two-dimensional image is used for obtaining a three-dimensional posture matrix of the target in the two-dimensional image and is executed according to the following steps:
step I, acquiring a two-dimensional image containing a target to be identified, and acquiring an image to be identified;
step II, obtaining a three-dimensional key point set of the target to be recognized in the image to be recognized by adopting the method for extracting the three-dimensional key points of the target in the two-dimensional graph as claimed in claim 6;
step III, calculating the distance between the three-dimensional key point set of the target to be recognized in the image to be recognized and the three-dimensional key point set of each image in the reference image library;
the reference image library comprises a plurality of reference images and information of each reference image, wherein the information of each reference image comprises a three-dimensional key point set of each reference image and a three-dimensional attitude matrix of a target in each reference image, which are obtained by adopting the method for extracting the target three-dimensional key point in the two-dimensional graph of claim 6 for each reference image;
taking the image corresponding to the three-dimensional key point set with the minimum distance as a comparison image, and obtaining the three-dimensional key point set of the comparison image and a three-dimensional attitude matrix of a target in the comparison image;
step IV, subtracting the coordinates of the mass center of the three-dimensional key point set of the target to be recognized from the coordinates of each three-dimensional key point in the three-dimensional key point set of the target to be recognized to obtain a new three-dimensional key point set of the target to be recognized;
subtracting the coordinate of the mass center of the three-dimensional key point set of the contrast image from the coordinate of each three-dimensional key point in the three-dimensional key point set of the contrast image to obtain a new three-dimensional key point set of the contrast image;
Step V, decomposing, by singular value decomposition, the matrix
W = Σ_{n=1}^{N_P} X'_n·(P'_n)^T
to obtain a rotation matrix R;
where X'_n are the coordinates of the n-th point in the new three-dimensional key point set of the target to be recognized, P'_n are the coordinates of the n-th point in the new three-dimensional key point set of the comparison image, and N_P is the total number of three-dimensional key points in the new three-dimensional key point set of the target to be recognized or in the new three-dimensional key point set of the comparison image;
Step VI, obtaining a pose matrix T = [R | t], where t = μ_X - R·μ_P, μ_X is the mean coordinate of the new three-dimensional key point set of the target to be recognized, and μ_P is the mean coordinate of the new three-dimensional key point set of the comparison image;
Step VII, obtaining the three-dimensional pose matrix T_input of the target to be recognized in the image to be recognized using Formula III:
T_input = T · T_ref    (Formula III)
where T_ref is the three-dimensional pose matrix of the target in the comparison image.
Compared with the prior art, the invention has the following technical effects:
1. According to the method for constructing a target three-dimensional key point extraction model in a two-dimensional image, the coordinates of the target's three-dimensional key points can be output accurately and directly by designing the network structure of the three-dimensional key point extraction model; with the designed key point loss function, the network can autonomously learn to extract key points with semantic and geometric consistency in an unsupervised manner, improving the accuracy of three-dimensional key point extraction;
2. In the network training stage of the method, neither a three-dimensional model of the object nor three-dimensional annotations on the images are required; compared with existing methods, this greatly reduces the annotation workload and improves the efficiency of the extraction method;
3. According to the method for recognizing the three-dimensional posture of a target in a two-dimensional image, a three-dimensional spatial coordinate system is established by selecting a comparison image, which improves the recognition accuracy.
Drawings
FIG. 1 is an internal structure diagram of a three-dimensional key point extraction model provided by the present invention;
FIG. 2 is a diagram of an internal structure of a keypoint probability acquisition module provided in an embodiment of the present invention;
FIG. 3 is an image to be recognized provided in an embodiment of the present invention;
fig. 4 is an image representation of a three-dimensional key point set obtained by performing three-dimensional key point extraction on the image to be recognized shown in fig. 3 according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the drawings and embodiments so that those skilled in the art can better understand it. It should be noted that, in the following description, detailed descriptions of known functions and designs are omitted where they would obscure the subject matter of the present invention.
The following definitions or conceptual connotations relating to the present invention are provided for illustration:
Three-dimensional key points: points located on prominent structures of an object; they represent local features of the object's surface and are invariant to rotation of the object.
A bounding box: for marking the position of an object in the image.
Saliency loss function: uses light-and-shade characteristics of the image so that the key points fall on salient positions of the object.
Depth prediction loss function: the epipolar geometry principle is used to train the network so that the depth of the key points can be accurately predicted.
Three-dimensional consistency loss function: for ensuring that the same area can be stably tracked under different viewing angles.
Separation loss function: keeps a certain distance between every two key points so that the key points do not overlap.
Relative pose estimation loss function: a penalty term on the angle, i.e. the difference between the ground-truth angle of the relative camera pose between the pair of input images and the relative angle estimated from the detected key points; this loss term helps produce a meaningful and natural set of 3D key points.
Rotation matrix: describes the rotation of an object about the x, y and z axes; it is a 3x3 orthogonal matrix with determinant 1.
Pose matrix: a matrix [R | t], where R is a rotation matrix and t is a translation vector; it describes the rotation and translation of an object in three-dimensional space.
Example one
This embodiment discloses a method for constructing a three-dimensional key point extraction model of a target in a two-dimensional image. The three-dimensional key point extraction model is used to extract three-dimensional key points with geometric and semantic consistency on the target object in an image; in this embodiment, the three-dimensional key points on the object are predicted directly by a purpose-built CNN network.
The method comprises the following steps:
Step 1, acquiring a plurality of two-dimensional image groups containing the target to be recognized, where the two-dimensional images within each group are captured from different image acquisition angles;
obtaining a training image set;
In this embodiment, because the depths of corresponding key points in two images taken from different angles can be computed by epipolar geometry, the multitask loss function is trained with two pictures of the same object taken from different viewpoints together with the relative pose change between the viewpoints, so that the network learns to predict key points on the object that have geometric and semantic consistency.
Step 2, inputting the training image set into a neural network for training;
the neural network comprises a feature extraction sub-network, and the feature extraction sub-network is respectively connected with the key point extraction sub-network and the target detection sub-network;
the feature extraction sub-network comprises a feature map extraction module and a region-of-interest extraction module which are arranged in sequence;
the target detection sub-network comprises a target classification module and a bounding box detection module which are connected in parallel;
the key point extraction sub-network comprises a key point probability obtaining module and a key point output module which are connected in series;
the key point probability obtaining module is used for obtaining the probability that each pixel point is a three-dimensional key point;
the key point output module obtains the coordinates of each three-dimensional key point using Formula I:
[x_i, y_i] = Σ_u Σ_v [u, v] · P_i(u, v)    (Formula I)
where [x_i, y_i] are the coordinates of the i-th three-dimensional key point, i = 1, 2, …, I, I is a positive integer, P_i(u, v) is the probability, output by the key point probability obtaining module, that the pixel at (u, v) in the two-dimensional image is the i-th three-dimensional key point, (u, v) are coordinates in the two-dimensional image, and u and v are positive integers;
and obtaining a three-dimensional key point extraction model.
In this embodiment, as shown in Fig. 1, the image is first input into the feature extraction sub-network, which outputs region-of-interest sub-images of the image. Each region-of-interest sub-image is then input into the key point extraction sub-network and the target detection sub-network respectively: the key point extraction sub-network outputs the key point coordinates of the target in the sub-image, and the target detection sub-network outputs the class of the target and the coordinates of its bounding box, the bounding box being used to frame the target in the two-dimensional image.
In this embodiment, the feature extraction sub-network is assembled from existing CNN networks; specifically, the feature map extraction module comprises a feature pyramid network and a residual network arranged in sequence, and the region-of-interest extraction module comprises a region proposal network.
In this embodiment, the target classification and bounding box coordinate output of the target detection sub-network can be implemented with prior-art CNN networks.
In this embodiment, unlike the prior art, the key point extraction sub-network first obtains, for each pixel, the probability that it is a three-dimensional key point, and then obtains the coordinates of each three-dimensional key point by accumulating these probabilities.
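For illustration, a minimal NumPy sketch of this probability-weighted accumulation (Formula I) is given below; the array name, its shape and the axis conventions are assumptions made only for the example:

```python
import numpy as np

def keypoint_coordinates(prob_maps):
    """Formula I: expected (x, y) coordinate of each key point.

    prob_maps: array of shape (I, H, W); prob_maps[i, v, u] is the
    probability (softmax over all pixels) that pixel (u, v) is the
    i-th three-dimensional key point.
    Returns an (I, 2) array of [x_i, y_i] coordinates.
    """
    num_kp, height, width = prob_maps.shape
    us = np.arange(width, dtype=np.float64)       # u (x) axis
    vs = np.arange(height, dtype=np.float64)      # v (y) axis
    x = (prob_maps.sum(axis=1) * us).sum(axis=1)  # sum over v, weight by u
    y = (prob_maps.sum(axis=2) * vs).sum(axis=1)  # sum over u, weight by v
    return np.stack([x, y], axis=1)
```

The depth (Z) coordinate of each key point is supervised by the depth prediction loss described below.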
Optionally, the key point probability obtaining module comprises a plurality of convolution blocks, an up-sampling layer and a softmax layer connected in series in sequence;
the convolution block comprises a convolution layer and a ReLU activation layer connected in sequence.
In this embodiment, as shown in fig. 2, the key point probability obtaining module includes 4 volume blocks, an upsampling layer, and a softmax layer, which are connected in series in sequence.
The convolution kernels in the convolution blocks are of size 3x3.
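A sketch of such a key point probability obtaining module in PyTorch is shown below; PyTorch itself, the channel widths and the upsampling factor are illustrative assumptions, since the embodiment only fixes the block structure, the 3x3 kernels and the number of key points:

```python
import torch
import torch.nn as nn

class KeypointProbabilityModule(nn.Module):
    """Four 3x3 convolution blocks (conv + ReLU), an up-sampling layer and
    a softmax over all pixels, giving one probability map per key point."""

    def __init__(self, in_channels=256, num_keypoints=10, upscale=2):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, num_keypoints, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.upsample = nn.Upsample(scale_factor=upscale, mode="bilinear",
                                    align_corners=False)

    def forward(self, roi_features):          # (N, C, H, W) RoI feature map
        x = self.upsample(self.blocks(roi_features))
        n, k, h, w = x.shape
        # softmax over the pixels of each map -> P_i(u, v) of Formula I
        return torch.softmax(x.view(n, k, h * w), dim=-1).view(n, k, h, w)
```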
Optionally, the loss function L of the three-dimensional key point extraction model is:
L = Σ_neg L_class + Σ_pos (L_class + β·L_box + γ·L_keypoints)
where Σ_neg L_class is the sum of the classification loss functions of all negative samples, Σ_pos (L_class + β·L_box + γ·L_keypoints) is the sum, over all positive samples, of the target classification loss function L_class, the bounding box detection loss function L_box and the keypoint detection loss function L_keypoints, and β and γ are both greater than 0;
the negative samples are the region-of-interest two-dimensional images, extracted by the region-of-interest extraction module, that do not contain a target; the positive samples are the region-of-interest two-dimensional images, extracted by the region-of-interest extraction module, that contain a target;
wherein the key point detection loss function is
L_keypoints = L_dis + τ·L_dep + ε·L_con + μ·L_sep + φ·L_pose
where L_dis is the saliency loss function, L_dep is the depth prediction loss function, L_con is the three-dimensional consistency loss function, L_sep is the separation loss function, L_pose is the relative pose estimation loss function, and τ, ε, μ and φ are all greater than 0.
Optionally, β, γ, τ, ε, μ and φ are all 1, and δ is 0.08.
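The way the individual terms could be combined with these weights is sketched below; the per-term losses are assumed to be computed elsewhere and passed in as scalars, and the name phi for the weight of L_pose is our placeholder:

```python
def total_loss(cls_neg, cls_pos, box_pos, kp_pos, beta=1.0, gamma=1.0):
    """Overall loss L: classification loss of the negative samples plus,
    for each positive sample, classification + beta*box + gamma*keypoint
    loss.  Each argument is a list of per-sample scalar losses."""
    negative = sum(cls_neg)
    positive = sum(c + beta * b + gamma * k
                   for c, b, k in zip(cls_pos, box_pos, kp_pos))
    return negative + positive


def keypoint_loss(l_dis, l_dep, l_con, l_sep, l_pose,
                  tau=1.0, eps=1.0, mu=1.0, phi=1.0):
    """Key point detection loss: saliency term plus weighted depth,
    consistency, separation and relative-pose terms (all weights are 1
    in this embodiment)."""
    return l_dis + tau * l_dep + eps * l_con + mu * l_sep + phi * l_pose
```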
In this step, the saliency loss function is used to ensure that the three-dimensional key points fall within the salient regions of the object. It is computed from the saliency map value l(x_i, y_i) at the position of the i-th predicted key point and the corresponding key point probability P_i(x_i, y_i), with i = 1, 2, …, N, where N is the total number of three-dimensional key points, N is a positive integer, and N = 10 in this embodiment. The saliency map l(x_i, y_i) = l(u, v) is obtained with the following procedure:
Step (1), Gaussian filtering is applied to the image and the Hessian matrix of each pixel is obtained,
H(u, v) = [[∂²I/∂u², ∂²I/∂u∂v], [∂²I/∂v∂u, ∂²I/∂v²]]
where I(u, v) is the filtered image intensity, and the determinant of each Hessian matrix is calculated.
A non-maximum suppression algorithm is then used to find the points whose determinant is maximal within a 3x3 neighbourhood; these points are taken as points on the salient region.
The profile of the salient region is then generated as the binary map l(u, v), which is 1 at the salient points found above and 0 elsewhere.
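A NumPy/SciPy sketch of this saliency-map computation follows; the Gaussian sigma and the positive-determinant check are assumptions, since the embodiment does not state them:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def salient_region_map(gray_image, sigma=1.0):
    """Binary map l(u, v): 1 on salient points, 0 elsewhere.

    Steps: Gaussian filtering, per-pixel Hessian determinant,
    non-maximum suppression of the determinant in a 3x3 neighbourhood.
    """
    smoothed = gaussian_filter(gray_image.astype(np.float64), sigma)
    # second derivatives forming the Hessian of the smoothed image
    iy, ix = np.gradient(smoothed)
    iyy, iyx = np.gradient(iy)
    ixy, ixx = np.gradient(ix)
    det_hessian = ixx * iyy - ixy * iyx
    # keep only local maxima of the determinant within 3x3 windows
    local_max = maximum_filter(det_hessian, size=3)
    return ((det_hessian == local_max) & (det_hessian > 0)).astype(np.float64)
```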
In this step, the depth prediction loss function L_dep reduces the error between the predicted depth of each key point and the depth calculated from epipolar geometry, where z_i is the predicted Z-axis coordinate of the i-th three-dimensional key point in one two-dimensional image of an image group, z'_i is the predicted Z-axis coordinate of the i-th three-dimensional key point in the other two-dimensional image of the group, d_i is the depth of the i-th three-dimensional key point computed for the first image, and d'_i is the depth of the i-th three-dimensional key point computed for the other image.
The depth d' is calculated with the least-squares method from the formula d'·ê·R·e' + ê·t = 0, and the depth d of the other point is calculated in the same way from the corresponding formula for the other image, where e and e' are matched key points on the two images of an image group during training, ê denotes the skew-symmetric (cross-product) matrix of e, and (R, t) is the relative pose between the two views.
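A sketch of this least-squares depth recovery is shown below; it recasts the relation between the matched key points as a small linear system in the two depths, and the convention that (R, t) maps the second view's coordinates into the first view is an assumption of the example:

```python
import numpy as np

def keypoint_depths(e1, e2, R, t):
    """Least-squares depths of a matched key point pair.

    e1, e2: matched key points of the two views in normalized homogeneous
    camera coordinates (3-vectors); (R, t) is the relative pose taking
    view-2 coordinates into view 1, so that d1 * e1 = d2 * (R @ e2) + t.
    Returns (d1, d2).
    """
    A = np.stack([e1, -(R @ e2)], axis=1)          # 3x2 system in (d1, d2)
    depths, *_ = np.linalg.lstsq(A, t, rcond=None)
    return float(depths[0]), float(depths[1])
```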
The three-dimensional consistency loss function keeps the positions of the three-dimensional key points stable with respect to the object; it penalizes the discrepancy between m_i, the coordinates of the i-th three-dimensional key point obtained from one image of a two-dimensional image group, and m'_i, the coordinates of the same key point obtained from the other image of the group.
The separation loss function ensures that a certain distance is kept between the three-dimensional key points so that they do not fall on the same point; it penalizes pairs of key points (x_i, y_i, z_i) and (x_j, y_j, z_j), i ≠ j, whose mutual distance falls below the set key point separation δ.
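One common hinge-style form of such a term is sketched below; the exact expression is not spelled out in the embodiment, so the squared-distance hinge and the 1/N² normalization are assumptions:

```python
import numpy as np

def separation_loss(keypoints_3d, delta=0.08):
    """Penalize pairs of predicted 3-D key points closer than delta.

    keypoints_3d: array of shape (N, 3) with the (x, y, z) coordinates.
    """
    diffs = keypoints_3d[:, None, :] - keypoints_3d[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)          # (N, N) squared distances
    n = keypoints_3d.shape[0]
    hinge = np.maximum(0.0, delta - sq_dists)
    hinge[np.arange(n), np.arange(n)] = 0.0       # drop the i == j terms
    return float(hinge.sum() / (n * n))
```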
Wherein the relative pose estimation loss function makes the obtained three-dimensional key points more suitable for the pose estimation task,
Figure BDA0002162963720000121
and R' is the posture change of the object in the two images calculated by utilizing the key points, and R is group-Truth.
Example two
A method for extracting three-dimensional key points of a target in a two-dimensional graph is implemented according to the following steps:
step A, collecting a two-dimensional image containing a target to be identified to obtain an image to be identified;
and step B, inputting the image to be recognized into the three-dimensional key point extraction model constructed by the construction method of the target three-dimensional key point extraction model in the two-dimensional graph of the embodiment, and obtaining the three-dimensional key point set of the target to be recognized.
In this embodiment, the extraction of three-dimensional keypoints is performed on the image to be recognized as shown in fig. 3, and an image representation of a three-dimensional keypoint set as shown in fig. 4 is obtained.
EXAMPLE III
A method for recognizing a three-dimensional posture of a target in a two-dimensional image is used for obtaining a three-dimensional posture matrix of the target in the two-dimensional image and is executed according to the following steps:
step I, acquiring a two-dimensional image containing a target to be identified, and acquiring the image to be identified;
step II, obtaining a three-dimensional key point set of the target to be recognized in the image to be recognized by adopting the second embodiment method;
step III, calculating the distance between the three-dimensional key point set of the target to be recognized in the image to be recognized and the three-dimensional key point set of each image in the reference image library;
the reference image library comprises a plurality of reference images and information of each reference image, wherein the information of each reference image comprises a three-dimensional key point set of each reference image and a three-dimensional attitude matrix of a target in each reference image, which are obtained by each reference image by adopting the method of the second embodiment;
taking the image corresponding to the three-dimensional key point set with the minimum distance as a comparison image, and obtaining the three-dimensional key point set of the comparison image and a three-dimensional attitude matrix of a target in the comparison image;
step IV, subtracting the coordinates of the mass center of the three-dimensional key point set of the target to be recognized from the coordinates of each three-dimensional key point in the three-dimensional key point set of the target to be recognized to obtain a new three-dimensional key point set of the target to be recognized;
subtracting the coordinate of the mass center of the three-dimensional key point set of the comparison image from the coordinate of each three-dimensional key point in the three-dimensional key point set of the comparison image to obtain a new three-dimensional key point set of the comparison image;
Step V, decomposing, by singular value decomposition, the matrix
W = Σ_{n=1}^{N_P} X'_n·(P'_n)^T
to obtain a rotation matrix R;
where X'_n are the coordinates of the n-th point in the new three-dimensional key point set of the target to be recognized, P'_n are the coordinates of the n-th point in the new three-dimensional key point set of the comparison image, and N_P is the total number of three-dimensional key points in the new three-dimensional key point set of the target to be recognized or in the new three-dimensional key point set of the comparison image;
Step VI, obtaining a pose matrix T = [R | t], where t = μ_X - R·μ_P, μ_X is the mean coordinate of the new three-dimensional key point set of the target to be recognized, and μ_P is the mean coordinate of the new three-dimensional key point set of the comparison image;
Step VII, obtaining the three-dimensional pose matrix T_input of the target to be recognized in the image to be recognized using Formula III:
T_input = T · T_ref    (Formula III)
where T_ref is the three-dimensional pose matrix of the target in the comparison image.
In this embodiment, a three-dimensional key point set X of the target to be recognized in the image to be recognized and a three-dimensional key point set P of the comparison image are obtained through Step III; in this embodiment, N_P = 10.
Step IV: the centroid coordinates of the three-dimensional key point set of the target to be recognized are obtained as μ_X = (1/N_P) Σ_{n=1}^{N_P} X_n, and the centroid coordinates of the three-dimensional key point set of the comparison image as μ_P = (1/N_P) Σ_{n=1}^{N_P} P_n. Subtracting μ_X from the coordinates of each three-dimensional key point in the set X of the target to be recognized gives the new three-dimensional key point set X' of the target to be recognized; the new three-dimensional key point set P' of the comparison image is obtained in the same way.
Step V: each pair of three-dimensional key points in the new key point set X' of the target to be recognized and the new key point set P' of the comparison image is processed to obtain the total matrix W = Σ_{n=1}^{N_P} X'_n·(P'_n)^T, and singular value decomposition of W then yields the rotation matrix R.
Step VI then gives the pose matrix T = [R | t], where [R | t] denotes the horizontal concatenation of the rotation matrix R and the translation vector t; in this embodiment the translation component is t = [-0.05903081, -0.02168849, -0.01671735].
Finally, the three-dimensional pose matrix T_input of the target to be recognized in the image to be recognized is obtained with Formula III, T_input = T·T_ref, where · denotes matrix multiplication.
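Putting Steps III to VII together, the pose-recognition stage could be sketched as follows; the Frobenius-norm distance between key point sets and the determinant correction guarding against reflections are standard choices that the embodiment does not spell out:

```python
import numpy as np

def recognize_pose(query_kps, reference_kps, reference_poses):
    """query_kps: (N, 3) key points of the target in the image to be
    recognized; reference_kps: list of (N, 3) key point sets of the
    reference images; reference_poses: list of 3x4 pose matrices T_ref.
    Returns the 3x4 pose matrix T_input of the target."""
    # Step III: pick the reference whose key point set is closest.
    dists = [np.linalg.norm(query_kps - ref) for ref in reference_kps]
    best = int(np.argmin(dists))
    contrast_kps, t_ref = reference_kps[best], reference_poses[best]

    # Step IV: subtract the centroids of both key point sets.
    mu_x, mu_p = query_kps.mean(axis=0), contrast_kps.mean(axis=0)
    x_c, p_c = query_kps - mu_x, contrast_kps - mu_p

    # Step V: SVD of the cross-covariance matrix W gives the rotation R.
    w = x_c.T @ p_c                        # sum_n X'_n (P'_n)^T
    u, _, vt = np.linalg.svd(w)
    d = np.sign(np.linalg.det(u @ vt))     # guard against reflections
    r = u @ np.diag([1.0, 1.0, d]) @ vt

    # Step VI: pose matrix T = [R | t] with t = mu_X - R mu_P.
    t = mu_x - r @ mu_p
    pose = np.hstack([r, t[:, None]])      # 3x4

    # Step VII: T_input = T * T_ref (compose with the reference pose).
    t_ref_h = np.vstack([t_ref, [0.0, 0.0, 0.0, 1.0]])  # homogeneous 4x4
    return pose @ t_ref_h
```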

Claims (7)

1. A method for constructing a three-dimensional key point extraction model of a target in a two-dimensional graph is characterized by comprising the following steps of:
Step 1, acquiring a plurality of two-dimensional image groups containing the target to be recognized, where the two-dimensional images within each group are captured from different image acquisition angles;
obtaining a training image set;
step 2, inputting the training image set into a neural network for training;
the neural network comprises a feature extraction sub-network, and the feature extraction sub-network is respectively connected with a key point extraction sub-network and a target detection sub-network;
the feature extraction sub-network comprises a feature map extraction module and a region-of-interest extraction module which are arranged in sequence;
the target detection sub-network comprises a target classification module and a bounding box detection module which are connected in parallel;
the key point extraction sub-network comprises a key point probability obtaining module and a key point output module which are connected in series;
the key point probability obtaining module is used for obtaining the probability that each pixel point is a three-dimensional key point;
the key point output module obtains the coordinates of each three-dimensional key point using Formula I:
[x_i, y_i] = Σ_u Σ_v [u, v] · P_i(u, v)    (Formula I)
where [x_i, y_i] are the coordinates of the i-th three-dimensional key point, i = 1, 2, …, I, I is a positive integer, P_i(u, v) is the probability, output by the key point probability obtaining module, that the pixel at (u, v) in the two-dimensional image is the i-th three-dimensional key point, (u, v) are coordinates in the two-dimensional image, and u and v are positive integers;
and obtaining a three-dimensional key point extraction model.
2. The method for constructing the extraction model of the target three-dimensional key points in the two-dimensional graph as claimed in claim 1, wherein the feature map extraction module comprises a feature pyramid network and a residual network which are sequentially arranged; the region-of-interest extraction module comprises a region proposal network.
3. The method for constructing the three-dimensional key point extraction model of the target in the two-dimensional graph, as claimed in claim 1, wherein the key point probability obtaining module comprises a plurality of convolution blocks, an up-sampling layer and a softmax layer which are connected in series in sequence;
the convolution block comprises a convolution layer and a ReLU activation layer connected in sequence.
4. The method for constructing a three-dimensional key point extraction model of an object in a two-dimensional graph according to claim 1, wherein the loss function L of the three-dimensional key point extraction model is as follows:
L = Σ_neg L_class + Σ_pos (L_class + β·L_box + γ·L_keypoints)
where Σ_neg L_class is the sum of the classification loss functions of all negative samples, Σ_pos (L_class + β·L_box + γ·L_keypoints) is the sum, over all positive samples, of the target classification loss function L_class, the bounding box detection loss function L_box and the keypoint detection loss function L_keypoints, and β and γ are both greater than 0;
the negative samples are the two-dimensional images of regions of interest, extracted by the region-of-interest extraction module, that do not contain a target; the positive samples are the two-dimensional images of regions of interest, extracted by the region-of-interest extraction module, that contain a target;
wherein the key point detection loss function is
L_keypoints = L_dis + τ·L_dep + ε·L_con + μ·L_sep + φ·L_pose
where L_dis is the saliency loss function, L_dep is the depth prediction loss function, L_con is the three-dimensional consistency loss function, L_sep is the separation loss function, L_pose is the relative pose estimation loss function, and τ, ε, μ and φ are all greater than 0.
5. The method for constructing a model for extracting three-dimensional key points of an object from a two-dimensional figure as claimed in claim 4, wherein β, γ, τ, ε, μ and φ are all 1, and δ is 0.08.
6. A method for extracting a target three-dimensional key point in a two-dimensional graph is characterized by comprising the following steps:
step A, collecting a two-dimensional image containing a target to be identified to obtain an image to be identified;
and step B, inputting the image to be recognized into the three-dimensional key point extraction model constructed by the method for constructing the three-dimensional key point extraction model of the target in the two-dimensional graph according to any one of claims 1 to 5, and obtaining a three-dimensional key point set of the target to be recognized, wherein the three-dimensional key point set comprises Q three-dimensional key points, and Q is a positive integer.
7. A method for recognizing a three-dimensional posture of an object in a two-dimensional image is used for obtaining a three-dimensional posture matrix of the object in the two-dimensional image, and is characterized by comprising the following steps:
step I, acquiring a two-dimensional image containing a target to be identified, and acquiring the image to be identified;
step II, obtaining a three-dimensional key point set of the target to be recognized in the image to be recognized by adopting the method for extracting the three-dimensional key points of the target in the two-dimensional graph as claimed in claim 6;
step III, calculating the distance between the three-dimensional key point set of the target to be recognized in the image to be recognized and the three-dimensional key point set of each image in the reference image library;
the reference image library comprises a plurality of reference images and information of each reference image, wherein the information of each reference image comprises a three-dimensional key point set of each reference image and a three-dimensional attitude matrix of a target in each reference image, which are obtained by adopting the method for extracting the target three-dimensional key point in the two-dimensional graph of claim 6 for each reference image;
taking the image corresponding to the three-dimensional key point set with the minimum distance as a comparison image, and obtaining the three-dimensional key point set of the comparison image and a three-dimensional attitude matrix of a target in the comparison image;
step IV, subtracting the coordinates of the mass center of the three-dimensional key point set of the target to be recognized from the coordinates of each three-dimensional key point in the three-dimensional key point set of the target to be recognized to obtain a new three-dimensional key point set of the target to be recognized;
subtracting the coordinate of the mass center of the three-dimensional key point set of the contrast image from the coordinate of each three-dimensional key point in the three-dimensional key point set of the contrast image to obtain a new three-dimensional key point set of the contrast image;
Step V, decomposing, by singular value decomposition, the matrix
W = Σ_{n=1}^{N_P} X'_n·(P'_n)^T
to obtain a rotation matrix R;
where X'_n are the coordinates of the n-th point in the new three-dimensional key point set of the target to be recognized, P'_n are the coordinates of the n-th point in the new three-dimensional key point set of the comparison image, and N_P is the total number of three-dimensional key points in the new three-dimensional key point set of the target to be recognized or in the new three-dimensional key point set of the comparison image;
Step VI, obtaining a pose matrix T = [R | t], where t = μ_X - R·μ_P, μ_X is the mean coordinate of the new three-dimensional key point set of the target to be recognized, and μ_P is the mean coordinate of the new three-dimensional key point set of the comparison image;
Step VII, obtaining the three-dimensional pose matrix T_input of the target to be recognized in the image to be recognized using Formula III:
T_input = T · T_ref    (Formula III)
where T_ref is the three-dimensional pose matrix of the target in the comparison image.
CN201910738138.4A 2019-08-12 2019-08-12 Method for constructing target three-dimensional key point extraction model and recognizing posture in two-dimensional graph Active CN110634160B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910738138.4A CN110634160B (en) 2019-08-12 2019-08-12 Method for constructing target three-dimensional key point extraction model and recognizing posture in two-dimensional graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910738138.4A CN110634160B (en) 2019-08-12 2019-08-12 Method for constructing target three-dimensional key point extraction model and recognizing posture in two-dimensional graph

Publications (2)

Publication Number Publication Date
CN110634160A CN110634160A (en) 2019-12-31
CN110634160B true CN110634160B (en) 2022-11-18

Family

ID=68969864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910738138.4A Active CN110634160B (en) 2019-08-12 2019-08-12 Method for constructing target three-dimensional key point extraction model and recognizing posture in two-dimensional graph

Country Status (1)

Country Link
CN (1) CN110634160B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783820B (en) * 2020-05-08 2024-04-16 北京沃东天骏信息技术有限公司 Image labeling method and device
CN114926610A (en) * 2022-05-27 2022-08-19 北京达佳互联信息技术有限公司 Position determination model training method, position determination method, device and medium
CN115661577B (en) * 2022-11-01 2024-04-16 吉咖智能机器人有限公司 Method, apparatus and computer readable storage medium for object detection

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295567A (en) * 2016-08-10 2017-01-04 腾讯科技(深圳)有限公司 The localization method of a kind of key point and terminal
US9898682B1 (en) * 2012-01-22 2018-02-20 Sr2 Group, Llc System and method for tracking coherently structured feature dynamically defined within migratory medium
CN108830172A (en) * 2018-05-24 2018-11-16 天津大学 Aircraft remote sensing images detection method based on depth residual error network and SV coding
CN109598234A (en) * 2018-12-04 2019-04-09 深圳美图创新科技有限公司 Critical point detection method and apparatus
CN110020633A (en) * 2019-04-12 2019-07-16 腾讯科技(深圳)有限公司 Training method, image-recognizing method and the device of gesture recognition model

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9898682B1 (en) * 2012-01-22 2018-02-20 Sr2 Group, Llc System and method for tracking coherently structured feature dynamically defined within migratory medium
CN106295567A (en) * 2016-08-10 2017-01-04 腾讯科技(深圳)有限公司 The localization method of a kind of key point and terminal
CN108830172A (en) * 2018-05-24 2018-11-16 天津大学 Aircraft remote sensing images detection method based on depth residual error network and SV coding
CN109598234A (en) * 2018-12-04 2019-04-09 深圳美图创新科技有限公司 Critical point detection method and apparatus
CN110020633A (en) * 2019-04-12 2019-07-16 腾讯科技(深圳)有限公司 Training method, image-recognizing method and the device of gesture recognition model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A three-dimensional face key point detection method combining multiple features; Feng Chao et al.; Chinese Journal of Liquid Crystals and Displays; 2018-04-15 (No. 04); entire document *
Multi-pose face recognition based on random forests and Haar features under small-sample conditions; Zhou Zhifu et al.; Computer Applications and Software; 2015-12-15 (No. 12); entire document *

Also Published As

Publication number Publication date
CN110634160A (en) 2019-12-31

Similar Documents

Publication Publication Date Title
CN112634451B (en) Outdoor large-scene three-dimensional mapping method integrating multiple sensors
CN111325843B (en) Real-time semantic map construction method based on semantic inverse depth filtering
Fraundorfer et al. Visual odometry: Part ii: Matching, robustness, optimization, and applications
CN111340797A (en) Laser radar and binocular camera data fusion detection method and system
CN110634160B (en) Method for constructing target three-dimensional key point extraction model and recognizing posture in two-dimensional graph
JP2021163503A (en) Three-dimensional pose estimation by two-dimensional camera
CN112712589A (en) Plant 3D modeling method and system based on laser radar and deep learning
EP3185212B1 (en) Dynamic particle filter parameterization
CN113267761A (en) Laser radar target detection and identification method and system and computer readable storage medium
Zhao et al. Visual odometry-A review of approaches
Cai et al. Improving CNN-based planar object detection with geometric prior knowledge
Islam et al. MVS‐SLAM: Enhanced multiview geometry for improved semantic RGBD SLAM in dynamic environment
Rogelio et al. Object detection and segmentation using Deeplabv3 deep neural network for a portable X-ray source model
Guan et al. Relative pose estimation for multi-camera systems from affine correspondences
Hwang et al. Frame-to-frame visual odometry estimation network with error relaxation method
CN114913289B (en) Three-dimensional dynamic uncertainty semantic SLAM method for production workshop
Chen et al. End-to-end multi-view structure-from-motion with hypercorrelation volume
Chen et al. Epipole Estimation under Pure Camera Translation.
Nakano Stereo vision based single-shot 6d object pose estimation for bin-picking by a robot manipulator
Wang et al. A Survey on Approaches of Monocular CAD Model-Based 3D Objects Pose Estimation and Tracking
Zhang et al. Dynamic Semantics SLAM Based on Improved Mask R-CNN
CN113239936A (en) Unmanned aerial vehicle visual navigation method based on deep learning and feature point extraction
CN112001223A (en) Rapid virtualization construction method of real environment map
Khurshid et al. Vision Based 3D Localization of UAV Using Deep Image Matching
Peng et al. An improved algorithm for detection and pose estimation of texture-less objects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant