CN110634160B - Method for constructing a target three-dimensional key point extraction model and recognizing pose in a two-dimensional image
- Publication number
- CN110634160B (application CN201910738138.4A)
- Authority
- CN
- China
- Prior art keywords
- three-dimensional key point
- image
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06N 3/045: Physics; Computing; Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
- G06T 7/73: Image data processing or generation; Image analysis; Determining position or orientation of objects or cameras using feature-based methods
- G06T 2207/10004: Indexing scheme for image analysis or enhancement; Image acquisition modality; Still image; Photographic image
- G06T 2207/10012: Image acquisition modality; Stereo images
- G06T 2207/20081: Special algorithmic details; Training; Learning
Abstract
The invention discloses a method for constructing a target three-dimensional key point extraction model and recognizing pose in a two-dimensional image. By designing the network structure of the three-dimensional key point extraction model, the coordinates of the target's three-dimensional key points can be output accurately and directly; with the designed key point loss function, the network can autonomously learn, in an unsupervised manner, to extract key points with semantic and geometric consistency, improving the accuracy of three-dimensional key point extraction.
Description
Technical Field
The invention relates to methods for three-dimensional target pose recognition, and in particular to a method for constructing a target three-dimensional key point extraction model and recognizing pose in a two-dimensional image.
Background
Three-dimensional target pose recognition refers to recognizing the three-dimensional position and orientation of a target object, and is a key module in many computer vision applications such as augmented reality, robot control, and unmanned systems. However, three-dimensional pose recognition first requires extracting three-dimensional key points of the target object: finding the two-dimensional position of the object in the image and extracting key points such as the projection of the object's 3D bounding box onto the image. Such methods are effective when a large amount of supervision information is available, but annotating three-dimensional information on images entails an enormous workload, demands a high level of expertise and laborious preparation, and these methods cannot handle images with occlusion or complex backgrounds.
Moreover, even once the target's three-dimensional key points are obtained, the target's three-dimensional pose still cannot be accurately recognized. Prior-art methods for acquiring the three-dimensional pose of a target object from a two-dimensional image therefore suffer from low pose accuracy, heavy workload, poor real-time performance, and low robustness.
Disclosure of Invention
The invention aims to provide a method for constructing a target three-dimensional key point extraction model and recognizing pose in a two-dimensional image, so as to solve the prior-art problems of low accuracy in recognizing the three-dimensional key points of a target object in a two-dimensional image and low pose recognition accuracy.
To achieve this task, the invention adopts the following technical scheme:
A method for constructing a target three-dimensional key point extraction model in a two-dimensional image, implemented according to the following steps:
step 1, acquiring a plurality of two-dimensional image groups containing the target to be recognized, the two-dimensional images within each group differing in image acquisition angle, to obtain a training image set;
step 2, inputting the training image set into a neural network for training;
the neural network comprises a feature extraction sub-network connected to both a key point extraction sub-network and a target detection sub-network;
the feature extraction sub-network comprises a feature map extraction module and a region-of-interest extraction module arranged in sequence;
the target detection sub-network comprises a target classification module and a bounding box detection module which are connected in parallel;
the key point extraction sub-network comprises a key point probability obtaining module and a key point output module which are connected in series;
the key point probability obtaining module is used for obtaining the probability that each pixel point is a three-dimensional key point;
the key point output module obtains the coordinates of each three-dimensional key point using formula I:

$$[x_i, y_i] = \sum_{u}\sum_{v} [u, v] \cdot P_i(u, v) \qquad \text{(formula I)}$$

wherein $[x_i, y_i]$ are the coordinates of the i-th three-dimensional key point, $i = 1, 2, \ldots, I$, I being a positive integer; $P_i(u, v)$ is the probability, output by the key point probability obtaining module, that the pixel at $(u, v)$ in the two-dimensional image is the i-th three-dimensional key point; $(u, v)$ are two-dimensional image coordinates, with u and v both positive integers;
and obtaining a three-dimensional key point extraction model.
Furthermore, the feature map extraction module comprises a feature pyramid network and a residual network arranged in sequence; the region-of-interest extraction module comprises a region proposal network.
Further, the key point probability obtaining module comprises a plurality of convolution blocks, an upsampling layer and a softmax layer connected in series in sequence;
each convolution block comprises a convolution layer and a ReLU activation layer connected in sequence.
Further, the loss function L of the three-dimensional key point extraction model is:

$$L = \sum_{\text{neg}} L_{class} + \sum_{\text{pos}} \left( L_{class} + \beta L_{box} + \gamma L_{keypoints} \right)$$

where the first sum is the classification loss over all negative samples and the second sum, over all positive samples, combines the target classification loss $L_{class}$, the bounding box detection loss $L_{box}$ and the key point detection loss $L_{keypoints}$, with $\beta$ and $\gamma$ both greater than 0;

a negative sample is a region of interest, extracted by the region-of-interest extraction module, that does not contain a target; a positive sample is a region of interest, extracted by the region-of-interest extraction module, that contains a target;

wherein the key point detection loss is

$$L_{keypoints} = \tau L_{dis} + \epsilon L_{dep} + \mu L_{con} + \nu L_{sep} + \rho L_{pose}$$

where $L_{dis}$ is the saliency loss function, $L_{dep}$ the depth prediction loss function, $L_{con}$ the three-dimensional consistency loss function, $L_{sep}$ the separation loss function and $L_{pose}$ the relative pose estimation loss function; the weights $\tau, \epsilon, \mu, \nu, \rho$ are all greater than 0.
A method for extracting target three-dimensional key points in a two-dimensional image, implemented according to the following steps:
step A, collecting a two-dimensional image containing a target to be identified to obtain an image to be identified;
and step B, inputting the image to be recognized into a three-dimensional key point extraction model constructed by the above method for constructing a target three-dimensional key point extraction model in a two-dimensional image, to obtain a three-dimensional key point set of the target to be recognized, wherein the three-dimensional key point set comprises Q three-dimensional key points, Q being a positive integer.
A method for recognizing the three-dimensional pose of a target in a two-dimensional image, used to obtain the three-dimensional pose matrix of the target in the two-dimensional image, executed according to the following steps:
step I, acquiring a two-dimensional image containing a target to be identified, and acquiring an image to be identified;
step II, obtaining a three-dimensional key point set of the target to be recognized in the image to be recognized by the method for extracting target three-dimensional key points in a two-dimensional image described above;
step III, calculating the distance between the three-dimensional key point set of the target to be recognized in the image to be recognized and the three-dimensional key point set of each image in the reference image library;
the reference image library comprises a plurality of reference images and the information of each reference image, the information comprising the three-dimensional key point set of each reference image, obtained by applying the above extraction method to that reference image, and the three-dimensional pose matrix of the target in each reference image;
taking the image corresponding to the three-dimensional key point set with the minimum distance as a comparison image, and obtaining the three-dimensional key point set of the comparison image and a three-dimensional attitude matrix of a target in the comparison image;
step IV, subtracting the coordinates of the mass center of the three-dimensional key point set of the target to be recognized from the coordinates of each three-dimensional key point in the three-dimensional key point set of the target to be recognized to obtain a new three-dimensional key point set of the target to be recognized;
subtracting the coordinate of the mass center of the three-dimensional key point set of the contrast image from the coordinate of each three-dimensional key point in the three-dimensional key point set of the contrast image to obtain a new three-dimensional key point set of the contrast image;
step V, applying singular value decomposition to the matrix

$$W = \sum_{n=1}^{N_P} X'_n (P'_n)^{\top} \qquad \text{(formula II)}$$

to obtain a rotation matrix R; where $X'_n$ is the coordinate of the n-th point in the new three-dimensional key point set of the target to be recognized, $P'_n$ the coordinate of the n-th point in the new three-dimensional key point set of the comparison image, and $N_P$ the total number of three-dimensional key points in the new three-dimensional key point set of the target to be recognized or of the comparison image;
step VI, obtaining the pose matrix $T = [R \mid t]$, where $t = \mu_X - R\mu_P$; $\mu_X$ is the mean coordinate of the new three-dimensional key point set of the target to be recognized, and $\mu_P$ is the mean coordinate of the new three-dimensional key point set of the comparison image;
step VII, obtaining the three-dimensional pose matrix $T_{input}$ of the target to be recognized in the image to be recognized using formula III:

$$T_{input} = T \cdot T_{ref} \qquad \text{(formula III)}$$

where $T_{ref}$ is the three-dimensional pose matrix of the target in the comparison image.
Compared with the prior art, the invention has the following technical effects:
1. In the method for constructing a target three-dimensional key point extraction model in a two-dimensional image, the designed network structure allows the coordinates of the target's three-dimensional key points to be output accurately and directly; with the designed key point loss function, the network can autonomously learn, in an unsupervised manner, to extract key points with semantic and geometric consistency, improving the accuracy of three-dimensional key point extraction;
2. In the network training stage, no three-dimensional model of any object and no three-dimensional annotations on the images are needed; compared with existing methods, this greatly reduces the annotation workload and improves the efficiency of the extraction method;
3. In the method for recognizing the three-dimensional pose of a target in a two-dimensional image, a three-dimensional spatial coordinate system is established by selecting a comparison image, improving recognition accuracy.
Drawings
FIG. 1 is an internal structure diagram of a three-dimensional key point extraction model provided by the present invention;
FIG. 2 is a diagram of an internal structure of a keypoint probability acquisition module provided in an embodiment of the present invention;
FIG. 3 is an image to be recognized provided in an embodiment of the present invention;
fig. 4 is an image representation of a three-dimensional key point set obtained by performing three-dimensional key point extraction on the image to be recognized shown in fig. 3 according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the drawings and embodiments, so that those skilled in the art can better understand it. It should be expressly noted that detailed descriptions of known functions and designs are omitted where they would obscure the subject matter of the invention.
The following definitions or conceptual connotations relating to the present invention are provided for illustration:
three-dimensional key points: located on a structure where the object is more prominent, represents a local feature of the surface of the object that is rotationally invariant with respect to the object.
A bounding box: for marking the position of an object in the image.
Saliency loss function: uses brightness (light-and-shade) characteristics so that key points fall on salient positions of the object.
Depth prediction loss function: trains the network using the epipolar geometry principle so that the depths of key points can be accurately predicted.
Three-dimensional consistency loss function: ensures that the same region can be stably tracked across different viewing angles.
Separation loss function: keeps a certain distance between any two key points to prevent them from coinciding.
Relative pose estimation loss function: a penalty term on angle, i.e. the difference between the ground-truth angle of the camera's relative pose between the pair of input images and the relative angle estimated from the detected key points; this loss term helps generate a meaningful and natural set of 3D key points.
Rotation matrix: describes the rotation of an object about the x, y, z axes; a 3×3 orthogonal matrix with determinant 1.
Pose matrix: [R | t], where R is the rotation matrix and t the translation vector, describing the rotation and translation information of an object in three-dimensional space.
Example one
The embodiment discloses a method for constructing a target three-dimensional key point extraction model in a two-dimensional image. In this embodiment, the three-dimensional key point extraction model is used to extract three-dimensional key points with geometric and semantic consistency on a target object in an image, and the three-dimensional key points on the object are directly predicted using a custom-designed CNN.
The method comprises the following steps:
the method comprises the following steps that 1, a plurality of two-dimensional image groups containing targets to be identified are obtained, and two-dimensional images in the two-dimensional image groups are different in image acquisition angle;
obtaining a training image set;
in this embodiment, because the depth information of the corresponding key points in the two images can be calculated by epipolar geometry using two different angles, a multitask loss function is trained by using two pictures of the same object taken from different viewpoints and the relative posture change between the viewpoints during training, so that the network can predict the key points with geometric consistency and semantic consistency on the object.
Step 2, inputting the training image set into a neural network for training;
the neural network comprises a feature extraction sub-network, and the feature extraction sub-network is respectively connected with the key point extraction sub-network and the target detection sub-network;
the feature extraction sub-network comprises a feature map extraction module and a region-of-interest extraction module arranged in sequence;
the target detection sub-network comprises a target classification module and a bounding box detection module which are connected in parallel;
the key point extraction sub-network comprises a key point probability obtaining module and a key point output module which are connected in series;
the key point probability obtaining module is used for obtaining the probability that each pixel point is a three-dimensional key point;
the key point output module obtains the coordinates of each three-dimensional key point using formula I:

$$[x_i, y_i] = \sum_{u}\sum_{v} [u, v] \cdot P_i(u, v) \qquad \text{(formula I)}$$

wherein $[x_i, y_i]$ are the coordinates of the i-th three-dimensional key point, $i = 1, 2, \ldots, I$, I being a positive integer; $P_i(u, v)$ is the probability, output by the key point probability obtaining module, that the pixel at $(u, v)$ in the two-dimensional image is the i-th three-dimensional key point; $(u, v)$ are two-dimensional image coordinates, with u and v both positive integers;
and obtaining a three-dimensional key point extraction model.
In this embodiment, as shown in fig. 1, the image is first input into the feature extraction sub-network, which outputs region-of-interest sub-images. Each region-of-interest sub-image is then input into the key point extraction sub-network and the target detection sub-network: the key point extraction sub-network outputs the key point coordinates of the target in the sub-image, and the target detection sub-network outputs the classification of the target and the coordinates of its bounding box, where the bounding box frames the target in the two-dimensional image.
In this embodiment, the feature extraction sub-network is assembled from existing CNN components; specifically, the feature map extraction module comprises a feature pyramid network and a residual network arranged in sequence, and the region-of-interest extraction module comprises a region proposal network.
In this embodiment, the target classification and bounding box coordinate output functions of the target detection sub-network can be implemented with prior-art CNN networks.
In this embodiment, unlike the prior art, the key point extraction sub-network first obtains, for each pixel, the probability of being a three-dimensional key point, and then obtains the coordinates of each three-dimensional key point by accumulating these probabilities, as sketched below.
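As a concrete illustration of formula I, the following is a minimal NumPy sketch of the probability-weighted coordinate computation (a soft argmax); the array layout and function name are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def keypoints_from_probability_maps(prob):
    """prob: (I, H, W) array; each prob[i] sums to 1 over all pixels (softmax output)."""
    num_kp, height, width = prob.shape
    # Pixel coordinate grids: u indexes columns (x), v indexes rows (y).
    v, u = np.mgrid[0:height, 0:width]
    coords = np.empty((num_kp, 2))
    for i in range(num_kp):
        coords[i, 0] = np.sum(u * prob[i])  # x_i = sum over (u, v) of u * P_i(u, v)
        coords[i, 1] = np.sum(v * prob[i])  # y_i = sum over (u, v) of v * P_i(u, v)
    return coords

# A probability map peaked at row 3, column 5 yields coordinates (5, 3).
p = np.zeros((1, 8, 8))
p[0, 3, 5] = 1.0
print(keypoints_from_probability_maps(p))  # [[5. 3.]]
```

Because the output is a weighted average of pixel coordinates rather than a hard argmax, it is differentiable, which is what allows the model to be trained end to end.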
Optionally, the key point probability obtaining module comprises a plurality of convolution blocks, an upsampling layer and a softmax layer connected in series in sequence;
each convolution block comprises a convolution layer and a ReLU activation layer connected in sequence.
In this embodiment, as shown in fig. 2, the key point probability obtaining module comprises 4 convolution blocks, an upsampling layer, and a softmax layer connected in series in sequence, where the convolution kernel size in each convolution block is 3×3.
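This module can be sketched in PyTorch as follows; it is a hedged illustration that assumes typical channel widths, an upsampling factor of 2, a 1×1 projection to one map per key point, and 10 key points, none of which the text specifies beyond the 4 blocks, the 3×3 kernels, the upsampling layer, and the softmax.

```python
import torch
import torch.nn as nn

class KeypointProbabilityModule(nn.Module):
    def __init__(self, in_channels=256, num_keypoints=10):
        super().__init__()
        blocks = []
        ch = in_channels
        for _ in range(4):  # 4 convolution blocks (3x3 conv + ReLU) in series
            blocks += [nn.Conv2d(ch, 256, kernel_size=3, padding=1), nn.ReLU()]
            ch = 256
        self.blocks = nn.Sequential(*blocks)
        self.upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.head = nn.Conv2d(256, num_keypoints, kernel_size=1)  # assumed projection to I maps

    def forward(self, x):
        x = self.head(self.upsample(self.blocks(x)))
        n, i, h, w = x.shape
        # Softmax over all pixels of each map, so each P_i sums to 1.
        return torch.softmax(x.view(n, i, h * w), dim=-1).view(n, i, h, w)
```

The softmax is taken over all pixels of each map so that each P_i is a proper probability distribution, matching the premise of formula I.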
Optionally, the loss function L of the three-dimensional key point extraction model is:

$$L = \sum_{\text{neg}} L_{class} + \sum_{\text{pos}} \left( L_{class} + \beta L_{box} + \gamma L_{keypoints} \right)$$

where the first sum is the classification loss over all negative samples and the second sum, over all positive samples, combines the target classification loss $L_{class}$, the bounding box detection loss $L_{box}$ and the key point detection loss $L_{keypoints}$, with $\beta$ and $\gamma$ both greater than 0;

a negative sample is a region of interest, extracted by the region-of-interest extraction module, that does not contain a target; a positive sample is a region of interest, extracted by the region-of-interest extraction module, that contains a target;

wherein the key point detection loss is

$$L_{keypoints} = \tau L_{dis} + \epsilon L_{dep} + \mu L_{con} + \nu L_{sep} + \rho L_{pose}$$

where $L_{dis}$ is the saliency loss function, $L_{dep}$ the depth prediction loss function, $L_{con}$ the three-dimensional consistency loss function, $L_{sep}$ the separation loss function and $L_{pose}$ the relative pose estimation loss function; the weights $\tau, \epsilon, \mu, \nu, \rho$ are all greater than 0.
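The composition of these losses can be sketched as follows; the per-term loss values and the weight defaults are placeholders (the text only constrains the weights to be positive).

```python
def model_loss(neg_cls, pos_cls, pos_box, pos_kp, beta=1.0, gamma=1.0):
    """Each argument is a list of per-sample loss values (scalars or tensors)."""
    loss = sum(neg_cls)  # classification loss over negative regions of interest
    # Positive regions add classification, bounding box and key point terms.
    loss += sum(c + beta * b + gamma * k for c, b, k in zip(pos_cls, pos_box, pos_kp))
    return loss

def keypoint_loss(l_dis, l_dep, l_con, l_sep, l_pose,
                  tau=1.0, eps=1.0, mu=1.0, nu=1.0, rho=1.0):
    # Weighted sum of the five key point terms; all weights must be > 0.
    return tau * l_dis + eps * l_dep + mu * l_con + nu * l_sep + rho * l_pose
```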
In this step, the saliency loss function is used to ensure that the three-dimensional key points fall within the salient regions of the object; here $l(x_i, y_i)$ indicates whether the i-th key point's coordinates $(x_i, y_i)$ lie on a salient region, $P_i(x_i, y_i)$ is the probability value at those coordinates, $i = 1, 2, \ldots, N$, and N (a positive integer) is the total number of three-dimensional key points, N = 10 in this embodiment. The salient-region indicator $l(x_i, y_i) = l(u, v)$ is obtained using the following procedure:
step (1), carrying out Gaussian filtering on the image and obtaining a Hessian matrix of each pixel,
the determinant of each hessian matrix is calculated.
And finding out the point with the maximum determinant value in the range of 3*3 as the point on the salient region by using a non-maximum suppression algorithm.
thus generating the map of the salient region. A sketch of this procedure follows:
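The sketch below uses standard SciPy building blocks; the Gaussian scale and the positive-determinant test are assumptions the text does not fix.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def salient_region_map(image, sigma=1.0):
    smoothed = gaussian_filter(image.astype(float), sigma)
    # Gaussian derivatives give the 2x2 Hessian at every pixel.
    ixx = gaussian_filter(smoothed, sigma, order=(0, 2))
    iyy = gaussian_filter(smoothed, sigma, order=(2, 0))
    ixy = gaussian_filter(smoothed, sigma, order=(1, 1))
    det = ixx * iyy - ixy ** 2                     # determinant of the Hessian
    # Non-maximum suppression: keep points that are maxima in a 3x3 window.
    local_max = (det == maximum_filter(det, size=3)) & (det > 0)
    return local_max.astype(np.uint8)              # l(u, v): 1 on salient points
```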
in this step, the depth prediction penalty function reduces the error of the predicted depth from the depth calculated by the epipolar geometry,wherein z is i Is the Z-axis coordinate, Z, of the ith three-dimensional key point in one two-dimensional image in the two-dimensional image group i ' is the ith three-dimensional relation in another two-dimensional image in the two-dimensional image groupZ-axis coordinate of key point, d i The depth of the ith three-dimensional key point in one two-dimensional image in the two-dimensional image group, d i ' is the depth of the ith three-dimensional key point in the other two-dimensional image in the two-dimensional image group.
The depths follow from the epipolar relation $d'\,e' = d\,R e + t$, where $e$ and $e'$ are matched key points on the two images of a training image group: taking the cross product with $e'$ eliminates $d'$ and gives $d\,(\hat{e}' R e) + \hat{e}' t = 0$, which is solved for $d$ by least squares; the depth $d'$ of the corresponding point in the other image is computed analogously, as in the sketch below.
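The following sketch assumes e and e' are matched key points in normalized camera coordinates and writes the hat operator as a cross product; the symmetric elimination used for d' is an assumption consistent with the relation above.

```python
import numpy as np

def depths_from_epipolar(e, e_prime, R, t):
    """e, e_prime: (3,) normalized image points; R, t: known relative pose.
    Returns the least-squares depths (d, d_prime)."""
    a = np.cross(e_prime, R @ e)          # coefficient of d in a*d + b = 0
    b = np.cross(e_prime, t)              # constant term
    d = -np.dot(a, b) / np.dot(a, a)      # least-squares scalar solution
    # Symmetric elimination for d' from the inverted relation d e = R^T (d' e' - t).
    a2 = np.cross(e, R.T @ e_prime)
    b2 = -np.cross(e, R.T @ t)
    d_prime = -np.dot(a2, b2) / np.dot(a2, a2)
    return d, d_prime
```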
The three-dimensional consistency loss function maintains the position of the three-dimensional key points relative to the object:

$$L_{con} = \frac{1}{N}\sum_{i=1}^{N}\left\| m_i - m'_i \right\|^2$$

where $m_i$ are the coordinates of the i-th three-dimensional key point in one image of a two-dimensional image group and $m'_i$ are its coordinates in the other image of the group.
The separation loss function ensures that three-dimensional key points keep a certain distance from each other and do not fall on the same point:

$$L_{sep} = \frac{1}{N^2}\sum_{i \neq j} \max\left(0,\ \delta^2 - \left\| X_i - X_j \right\|^2\right)$$

where $X_i = (x_i, y_i, z_i)$ are the coordinates of the i-th key point, $X_j = (x_j, y_j, z_j)$ those of the j-th key point, $i \neq j$, and $\delta$ is the required distance between key points.
The relative pose estimation loss function makes the obtained three-dimensional key points better suited to the pose estimation task:

$$L_{pose} = 2\arcsin\left(\frac{1}{2\sqrt{2}}\left\| R' - R \right\|_F\right)$$

where $R'$ is the pose change of the object between the two images computed from the key points, and $R$ is the ground truth.
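For concreteness, here is a hedged PyTorch sketch of the separation and relative-pose terms; the hinge form of L_sep and the arcsin form of L_pose match the reconstructions above, and the default value of delta is an arbitrary assumption.

```python
import torch

def separation_loss(X, delta=0.05):
    """X: (N, 3) keypoint coordinates; penalize pairs closer than delta."""
    d2 = torch.cdist(X, X) ** 2                          # pairwise squared distances
    hinge = torch.clamp(delta ** 2 - d2, min=0.0)
    n = X.shape[0]
    off_diag = ~torch.eye(n, dtype=torch.bool)           # exclude i == j terms
    return hinge[off_diag].sum() / (n * n)

def pose_loss(R_est, R_gt):
    """Angular difference between estimated and ground-truth relative rotations."""
    frob = torch.linalg.norm(R_est - R_gt)               # Frobenius norm of the difference
    return 2.0 * torch.asin(torch.clamp(frob / (2.0 * 2 ** 0.5), max=1.0))
```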
Example two
A method for extracting target three-dimensional key points in a two-dimensional image, implemented according to the following steps:
step A, collecting a two-dimensional image containing a target to be identified to obtain an image to be identified;
and step B, inputting the image to be recognized into the three-dimensional key point extraction model constructed by the construction method of Example one, to obtain the three-dimensional key point set of the target to be recognized.
In this embodiment, the extraction of three-dimensional keypoints is performed on the image to be recognized as shown in fig. 3, and an image representation of a three-dimensional keypoint set as shown in fig. 4 is obtained.
Example three
A method for recognizing the three-dimensional pose of a target in a two-dimensional image, used to obtain the three-dimensional pose matrix of the target in the two-dimensional image, executed according to the following steps:
step I, acquiring a two-dimensional image containing a target to be identified, and acquiring the image to be identified;
step II, obtaining a three-dimensional key point set of the target to be recognized in the image to be recognized using the method of Example two;
step III, calculating the distance between the three-dimensional key point set of the target to be recognized in the image to be recognized and the three-dimensional key point set of each image in the reference image library;
the reference image library comprises a plurality of reference images and the information of each reference image, the information comprising the three-dimensional key point set of each reference image, obtained by applying the method of Example two to that reference image, and the three-dimensional pose matrix of the target in each reference image;
taking the image corresponding to the three-dimensional key point set with the minimum distance as a comparison image, and obtaining the three-dimensional key point set of the comparison image and a three-dimensional attitude matrix of a target in the comparison image;
step IV, subtracting the coordinates of the mass center of the three-dimensional key point set of the target to be recognized from the coordinates of each three-dimensional key point in the three-dimensional key point set of the target to be recognized to obtain a new three-dimensional key point set of the target to be recognized;
subtracting the coordinate of the mass center of the three-dimensional key point set of the comparison image from the coordinate of each three-dimensional key point in the three-dimensional key point set of the comparison image to obtain a new three-dimensional key point set of the comparison image;
step V, applying singular value decomposition to the matrix

$$W = \sum_{n=1}^{N_P} X'_n (P'_n)^{\top} \qquad \text{(formula II)}$$

to obtain the rotation matrix R; where $X'_n$ is the coordinate of the n-th point in the new three-dimensional key point set of the target to be recognized, $P'_n$ the coordinate of the n-th point in the new three-dimensional key point set of the comparison image, and $N_P$ the total number of three-dimensional key points in either of the new sets;
step VI, obtaining the pose matrix $T = [R \mid t]$, where $t = \mu_X - R\mu_P$; $\mu_X$ is the mean coordinate of the new three-dimensional key point set of the target to be recognized and $\mu_P$ the mean coordinate of the new three-dimensional key point set of the comparison image;
step VII, obtaining the three-dimensional pose matrix $T_{input}$ of the target to be recognized in the image to be recognized using formula III:

$$T_{input} = T \cdot T_{ref} \qquad \text{(formula III)}$$

where $T_{ref}$ is the three-dimensional pose matrix of the target in the comparison image.
In this embodiment, the three-dimensional key point set X of the target to be recognized in the image to be recognized and the three-dimensional key point set P of the comparison image are obtained through step III; in this embodiment $N_P = 10$.
In step IV, the centroid $\mu_X = \frac{1}{N_P}\sum_{n=1}^{N_P} X_n$ of the target's three-dimensional key point set and the centroid $\mu_P = \frac{1}{N_P}\sum_{n=1}^{N_P} P_n$ of the comparison image's three-dimensional key point set are obtained.
Subtracting the centroid coordinates from the coordinates of each three-dimensional key point in the set X of the target to be recognized gives the new three-dimensional key point set X' of the target to be recognized; the new three-dimensional key point set P' of the comparison image is obtained in the same way.
In step V, each pair of corresponding three-dimensional key points in the new sets X' and P' is accumulated into the total matrix

$$W = \sum_{n=1}^{N_P} X'_n (P'_n)^{\top}$$

and singular value decomposition $W = U \Sigma V^{\top}$ is applied to W to obtain the rotation matrix $R = U V^{\top}$.

Step VI then yields the pose matrix $T = [R \mid t]$, where in this embodiment $t = [-0.05903081,\ -0.02168849,\ -0.01671735]^{\top}$, and $[R \mid t]$ denotes the horizontal concatenation of the rotation matrix R and the translation vector t.
Then the three-dimensional pose matrix $T_{input}$ of the target to be recognized in the image to be recognized is obtained using formula III, $T_{input} = T \cdot T_{ref}$, where $\cdot$ denotes matrix multiplication. A sketch of steps IV to VII follows.
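The NumPy sketch below walks through steps IV to VII as reconstructed above (a Kabsch-style alignment); the reflection guard D is a standard safeguard that the text does not mention, and the function name and 3×4 pose layout are assumptions.

```python
import numpy as np

def estimate_pose(X, P, T_ref):
    """X, P: (N_P, 3) keypoint sets (input image, comparison image).
    T_ref: (3, 4) pose matrix [R_ref | t_ref] of the comparison image."""
    mu_x, mu_p = X.mean(axis=0), P.mean(axis=0)          # centroids (step IV)
    Xc, Pc = X - mu_x, P - mu_p                          # new key point sets X', P'
    W = Xc.T @ Pc                                        # sum_n X'_n (P'_n)^T (step V)
    U, _, Vt = np.linalg.svd(W)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt                                       # rotation mapping P onto X
    t = mu_x - R @ mu_p                                  # step VI
    T = np.hstack([R, t[:, None]])                       # pose matrix [R | t]
    # Step VII: compose with the comparison image's pose via 4x4 homogeneous matrices.
    to_h = lambda M: np.vstack([M, [0, 0, 0, 1]])
    return (to_h(T) @ to_h(T_ref))[:3]                   # T_input = T . T_ref
```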
Claims (7)
1. A method for constructing a target three-dimensional key point extraction model in a two-dimensional image, characterized by being implemented according to the following steps:
step 1, acquiring a plurality of two-dimensional image groups containing the target to be recognized, the two-dimensional images within each group differing in image acquisition angle, to obtain a training image set;
step 2, inputting the training image set into a neural network for training;
the neural network comprises a feature extraction sub-network, and the feature extraction sub-network is respectively connected with a key point extraction sub-network and a target detection sub-network;
the feature extraction sub-network comprises a feature map extraction module and a region-of-interest extraction module arranged in sequence;
the target detection sub-network comprises a target classification module and a bounding box detection module which are connected in parallel;
the key point extraction sub-network comprises a key point probability obtaining module and a key point output module which are connected in series;
the key point probability obtaining module is used for obtaining the probability that each pixel point is a three-dimensional key point;
the key point output module obtains the coordinates of each three-dimensional key point using formula I:

$$[x_i, y_i] = \sum_{u}\sum_{v} [u, v] \cdot P_i(u, v) \qquad \text{(formula I)}$$

wherein $[x_i, y_i]$ are the coordinates of the i-th three-dimensional key point, $i = 1, 2, \ldots, I$, I being a positive integer; $P_i(u, v)$ is the probability, output by the key point probability obtaining module, that the pixel at $(u, v)$ in the two-dimensional image is the i-th three-dimensional key point, $(u, v)$ being two-dimensional image coordinates with u and v both positive integers;
and obtaining a three-dimensional key point extraction model.
2. The method for constructing a target three-dimensional key point extraction model in a two-dimensional image according to claim 1, wherein the feature map extraction module comprises a feature pyramid network and a residual network arranged in sequence; the region-of-interest extraction module comprises a region proposal network.
3. The method for constructing a target three-dimensional key point extraction model in a two-dimensional image according to claim 1, wherein the key point probability obtaining module comprises a plurality of convolution blocks, an upsampling layer and a softmax layer connected in series in sequence;
each convolution block comprises a convolution layer and a ReLU activation layer connected in sequence.
4. The method for constructing a target three-dimensional key point extraction model in a two-dimensional image according to claim 1, wherein the loss function L of the three-dimensional key point extraction model is:

$$L = \sum_{\text{neg}} L_{class} + \sum_{\text{pos}} \left( L_{class} + \beta L_{box} + \gamma L_{keypoints} \right)$$

where the first sum is the classification loss over all negative samples and the second sum, over all positive samples, combines the target classification loss $L_{class}$, the bounding box detection loss $L_{box}$ and the key point detection loss $L_{keypoints}$, with $\beta$ and $\gamma$ both greater than 0;

a negative sample is a region of interest, extracted by the region-of-interest extraction module, that does not contain a target; a positive sample is a region of interest, extracted by the region-of-interest extraction module, that contains a target;

wherein the key point detection loss is

$$L_{keypoints} = \tau L_{dis} + \epsilon L_{dep} + \mu L_{con} + \nu L_{sep} + \rho L_{pose}$$

where $L_{dis}$ is the saliency loss function, $L_{dep}$ the depth prediction loss function, $L_{con}$ the three-dimensional consistency loss function, $L_{sep}$ the separation loss function and $L_{pose}$ the relative pose estimation loss function, the weights $\tau, \epsilon, \mu, \nu, \rho$ all being greater than 0.
6. A method for extracting target three-dimensional key points in a two-dimensional image, characterized by comprising the following steps:
step A, collecting a two-dimensional image containing a target to be identified to obtain an image to be identified;
and step B, inputting the image to be recognized into the three-dimensional key point extraction model constructed by the method for constructing a target three-dimensional key point extraction model in a two-dimensional image according to any one of claims 1 to 5, to obtain a three-dimensional key point set of the target to be recognized, wherein the three-dimensional key point set comprises Q three-dimensional key points, Q being a positive integer.
7. A method for recognizing the three-dimensional pose of a target in a two-dimensional image, used to obtain a three-dimensional pose matrix of the target in the two-dimensional image, characterized by comprising the following steps:
step I, acquiring a two-dimensional image containing a target to be identified, and acquiring the image to be identified;
step II, obtaining a three-dimensional key point set of the target to be recognized in the image to be recognized by the method for extracting target three-dimensional key points in a two-dimensional image as claimed in claim 6;
step III, calculating the distance between the three-dimensional key point set of the target to be recognized in the image to be recognized and the three-dimensional key point set of each image in the reference image library;
the reference image library comprises a plurality of reference images and the information of each reference image, the information comprising the three-dimensional key point set of each reference image, obtained by applying the method for extracting target three-dimensional key points in a two-dimensional image of claim 6 to that reference image, and the three-dimensional pose matrix of the target in each reference image;
taking the image corresponding to the three-dimensional key point set with the minimum distance as a comparison image, and obtaining the three-dimensional key point set of the comparison image and a three-dimensional attitude matrix of a target in the comparison image;
step IV, subtracting the coordinates of the mass center of the three-dimensional key point set of the target to be recognized from the coordinates of each three-dimensional key point in the three-dimensional key point set of the target to be recognized to obtain a new three-dimensional key point set of the target to be recognized;
subtracting the coordinate of the mass center of the three-dimensional key point set of the contrast image from the coordinate of each three-dimensional key point in the three-dimensional key point set of the contrast image to obtain a new three-dimensional key point set of the contrast image;
step V, applying singular value decomposition to $W = \sum_{n=1}^{N_P} X'_n (P'_n)^{\top}$ (formula II) to obtain a rotation matrix R; wherein $X'_n$ is the coordinate of the n-th point in the new three-dimensional key point set of the target to be recognized, $P'_n$ the coordinate of the n-th point in the new three-dimensional key point set of the comparison image, and $N_P$ the total number of three-dimensional key points in the new three-dimensional key point set of the target to be recognized or of the comparison image;
step VI, obtaining the pose matrix $T = [R \mid t]$, wherein $t = \mu_X - R\mu_P$, $\mu_X$ being the mean coordinate of the new three-dimensional key point set of the target to be recognized and $\mu_P$ the mean coordinate of the new three-dimensional key point set of the comparison image;
step VII, obtaining the three-dimensional pose matrix $T_{input}$ of the target to be recognized in the image to be recognized using formula III:

$$T_{input} = T \cdot T_{ref} \qquad \text{(formula III)}$$

wherein $T_{ref}$ is the three-dimensional pose matrix of the target in the comparison image.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910738138.4A | 2019-08-12 | 2019-08-12 | Method for constructing target three-dimensional key point extraction model and recognizing pose in two-dimensional image |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN110634160A (en) | 2019-12-31 |
| CN110634160B (en) | 2022-11-18 |
Family
ID=68969864

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910738138.4A | Method for constructing target three-dimensional key point extraction model and recognizing pose in two-dimensional image | 2019-08-12 | 2019-08-12 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN110634160B (en) |
Families Citing this family (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111783820B * | 2020-05-08 | 2024-04-16 | 北京沃东天骏信息技术有限公司 | Image labeling method and device |
| CN114926610A * | 2022-05-27 | 2022-08-19 | 北京达佳互联信息技术有限公司 | Position determination model training method, position determination method, device and medium |
| CN115661577B * | 2022-11-01 | 2024-04-16 | 吉咖智能机器人有限公司 | Method, apparatus and computer readable storage medium for object detection |
Citations (5)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9898682B1 * | 2012-01-22 | 2018-02-20 | Sr2 Group, Llc | System and method for tracking coherently structured feature dynamically defined within migratory medium |
| CN106295567A * | 2016-08-10 | 2017-01-04 | 腾讯科技(深圳)有限公司 | A key point localization method and terminal |
| CN108830172A * | 2018-05-24 | 2018-11-16 | 天津大学 | Aircraft remote sensing image detection method based on deep residual network and SV coding |
| CN109598234A * | 2018-12-04 | 2019-04-09 | 深圳美图创新科技有限公司 | Key point detection method and apparatus |
| CN110020633A * | 2019-04-12 | 2019-07-16 | 腾讯科技(深圳)有限公司 | Training method, image recognition method and device of gesture recognition model |
Non-Patent Citations (2)

| Title |
|---|
| "A multi-feature 3D face key point detection method" (一种多特征相结合的三维人脸关键点检测方法), Feng Chao et al., Chinese Journal of Liquid Crystals and Displays, 2018-04-15 (No. 04), full text * |
| "Multi-pose face recognition based on random forest and Haar features under small-sample conditions" (小样本条件下基于随机森林和Haar特征的多姿态人脸识别), Zhou Zhifu et al., Computer Applications and Software, 2015-12-15 (No. 12), full text * |
Also Published As

| Publication number | Publication date |
|---|---|
| CN110634160A (en) | 2019-12-31 |
Legal Events

| Date | Code | Title |
|---|---|---|
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |
| | GR01 | Patent grant |