CN111915632A

CN111915632A - Poor texture target object truth value database construction method based on machine learning

Info

Publication number: CN111915632A
Application number: CN202010726969.2A
Authority: CN
Inventors: 董延超; 冀玲玲; 宁少淳; 王浩天; 岳继光; 何斌; 沈润杰
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2020-07-26
Filing date: 2020-07-26
Publication date: 2020-11-10

Abstract

The invention relates to a poor texture target object truth value database construction method based on machine learning, which comprises the following steps of 1: acquiring an image data set of a poor texture target object and a three-dimensional model of the target object; step 2: extracting image edges of all images in the image data set and edge grating points in the three-dimensional model; and step 3: calculating DCM tensors of all images in the image data set, and constructing a direction chamfer distance error function; and 4, step 4: obtaining a rough classification initial pose; and 5: obtaining an optimized initial pose by using a target tracking sub-method; step 6: obtaining the optimized pose of each image in the image data set by using the optimized initial pose and a camera projection model and using a target tracking sub-method; and 7: acquiring a true value of a target object; and 8: a truth database is constructed using the truth values of the target object. Compared with the prior art, the method has the advantages of high precision, high speed, more comprehensive target object data, flexible target database manufacturing and the like.

Description

Poor texture target object truth value database construction method based on machine learning

Technical Field

The invention relates to the technical field of computer vision, in particular to a poor texture target object truth value database construction method based on machine learning.

Background

With the promotion of deep learning theory, computer vision technology has made great progress, especially in the direction of classification, detection and segmentation. In recent years, vision-based pose estimation is also increasingly popular, and particularly, in a deep learning-based pose estimation method, such techniques are often required to be performed based on a large amount of data, and at present, the pose of a target object is estimated and tracked through an image. There are two main types of acquisition methods for the target object pose database at present: one is that the real pose of the object is calculated by a manual measurement method, the method is often used in cooperation with a sensor, large manpower and material resources are needed, and the obtained pose and the real pose have a certain difference, especially for poor texture target objects in industrial scenes, the objects are often metallic and difficult to track by extracting feature points, a drawing-pasted assisted positioning mode greatly limits a working space, and the calculated precision is low and difficult to meet the requirements, for example, a rapid target detection method is disclosed in Chinese patent CN109558902A, the method uses the extracted features of the target object for identification, but when the method is used for identifying poor texture target objects, the precision and the accuracy are low; the other type is that a true value such as a corresponding pose is obtained by rendering a target object through 3D software in a computer graphics mode, the pose obtained by the method is accurate, but the object in a virtual scene is different from the object in a real scene, so that a real application scene is difficult to simulate, and the method is difficult to realize. Therefore, it is necessary to accurately and rapidly generate a truth database for poor texture target objects in real scenes.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a poor texture target object truth value database construction method based on machine learning, which has the advantages of high precision, high speed, more comprehensive target object data and more flexible target database manufacture.

The purpose of the invention can be realized by the following technical scheme:

a poor texture target object truth value database construction method based on machine learning comprises the following steps:

step 1: acquiring an image data set of a poor texture target object and a three-dimensional model of the target object;

step 2: extracting image edges of all images in the image data set and edge grating points in the three-dimensional model;

and step 3: calculating DCM tensors of all images in the image data set, and constructing a direction chamfer distance error function;

and 4, step 4: obtaining a rough classification initial pose;

and 5: obtaining an optimized initial pose by using a target tracking sub-method;

step 6: obtaining the optimized pose of each image in the image data set by using the optimized initial pose and a camera projection model and using a target tracking sub-method;

and 7: acquiring a true value of a target object;

and 8: a truth database is constructed using the truth values of the target object.

Preferably, the image data set of the target object in step 1 is specifically:

in a video of a target object captured by an industrial camera and acquiring a grayscale image each frame containing the target object, an image dataset is constructed that includes grayscale images of the target object in all poses.

Preferably, the directional chamfering distance error function in step 3 is specifically:

wherein M ═ { M ═ M_i1,2,3, …, where | M | is a point in the three-dimensional model of the target object, which is mapped in the image coordinate system after being rasterized by discrete sampling; n ═ N_jJ ═ 1,2,3, …, | N | is an edge point in the image; λ is the directional error weight; omega is the number of edge points in the three-dimensional model of the target object, namely omega is | M |; phi (-) is the directional operator in the image, i.e., phi (m)_i) For edge points m in the three-dimensional model_iCorresponding model edge direction in the camera imaging plane, phi (n)_j) For image edge points n_jA corresponding edge direction;

then using a bidirectional dynamic programming algorithm, firstly initializing the chamfer distance of all angles into an image two-dimensional distance, and calculating the distance minimum value corresponding to each point according to forward recursion and backward recursion:

wherein | · | purple sweet_πIs the absolute value of the edge direction difference.

More preferably, the discrete sampling specifically includes: discretizing the edge direction according to a certain angle interval.

More preferably, the step 4 specifically includes:

detecting position information of a target object in an image by using a random forest classification detector, firstly inputting a DCM tensor map as a feature map, performing feature extraction by using normalized pixel difference features, training a random forest classifier, and finally detecting and acquiring the position of the target object in the image through a sliding window;

extracting the characteristics of the matched nearest graph, and describing the difference among all pixel points in the image by adopting normalized characteristics;

after target detection is finished, obtaining a translation vector of an object coordinate system relative to a camera coordinate system by using a regression method, determining the translation vector of the object coordinate system relative to the camera coordinate system through two-dimensional coordinates of the target in an image and model information, training by using initial pose information of the target object as the characteristics of a regression tree, and outputting the translation vector T as the output of the regression tree;

and finally, acquiring the rotation relation R ═ R of the target object relative to the camera coordinate system_x,r_y.r_z]Wherein r is_x、r_yAnd r_zEuler angles of rotation about the x, y and z axes of the camera coordinate system, respectively.

More preferably, the difference function of the differences between the pixels is specifically:

where x and y are pixel values of any two pixels in the image.

More preferably, the initial pose information of the target object in the image includes the pixel coordinates of the left vertex of the bounding box of the target object map in the image, the pixel coordinates of the center point of the bounding box, and the coarse classification result of the random forest classification detector.

Preferably, the target tracking sub-method specifically comprises:

constructing an objective function by using the DCM tensor, wherein the objective function is specifically as follows:

wherein o is_iEdge grating points of the three-dimensional model of the target object; pi (·) is the camera projection model;

representing a three-dimensional space pose transformation relation of a target object relative to a camera;

after the target function is optimized, the accurate pose transformation relation of the target object in the current frame image relative to the camera is obtained through the optimized target function, and the pose relation of the frame is used as the initial pose of the next frame image, so that the pose tracking is realized.

More preferably, the specific method for optimizing the objective function is as follows:

optimizing the objective function by adopting a self-adaptive weight optimization algorithm, wherein the optimization weight is as follows:

adding the optimization weight into the objective function to obtain a new optimization objective function, which specifically comprises the following steps:

coordinate point x of image_iConversion to three-dimensional spatial points o_iObtaining:

preferably, the true value of the target object includes: the position of the target object, the posture of the target object, the Mask of the target object, the two-dimensional bounding box of the target object and the three-dimensional bounding box of the target object.

Compared with the prior art, the invention has the following advantages:

firstly, the precision is high, and is fast: the truth value acquisition method of the invention uses DCM tensor to construct objective function to solve the pose of each frame of image of the target object, the solved pose has higher precision, the angle error is within 2 degrees, and the translation vector error is within 1 mm; the corresponding minimum distance between the pixel points is solved by using a bidirectional dynamic programming algorithm, so that the speed of obtaining a true value is increased, and the speed can reach 10 frames/second; meanwhile, due to the adoption of self-adaptive weight optimization, the target tracking robustness is stronger.

Secondly, the target object data are more comprehensive, and the target database is more flexible to manufacture: the real value of the target object finally obtained by the real obtaining method comprises the following steps: the position of the target object, the posture of the target object, the Mask of the target object, the two-dimensional boundary frame of the target object, the three-dimensional boundary frame of the target object and the like, the data related to the target object is more comprehensive, and the target database is more flexibly manufactured.

Drawings

FIG. 1 is a schematic flow chart of a method for obtaining a true value of a target object according to the present invention;

FIG. 2 is a schematic illustration of a three-dimensional model in an embodiment of the invention;

FIG. 3 is a diagram illustrating a first Mask of a target object according to an embodiment of the present invention;

FIG. 4 is a diagram illustrating a second Mask of the target object according to the embodiment of the present invention;

FIG. 5 is a diagram illustrating a first effect of a database generated in an embodiment of the present invention;

fig. 6 is a diagram illustrating a second effect of the database generated in the embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.

A method for constructing a truth value database of a poor texture target object based on machine learning, the flow of which is shown in fig. 1, includes:

the image data set acquisition method comprises the following steps:

acquiring a gray level image containing a target object in each frame in a target object video acquired by an industrial camera to form an image data set, wherein the data set comprises the gray level images of the target object in all postures;

the three-dimensional model of the target object is a three-dimensional model created in advance by 3Dmax software.

and step 3: calculating DCM tensors of images in the image data set, and constructing a direction chamfer distance error function to obtain a DCM tensor map;

the direction chamfer distance error function is specifically:

wherein M ═ { M ═ M_i1,2,3, …, where | M | is a point in the three-dimensional model of the target object, which is mapped in the image coordinate system after being rasterized by discrete sampling; n ═ N_jJ ═ 1,2,3, …, | N | is an edge point in the image; λ is the directional error weight; omega is the number of edge points in the three-dimensional model of the target object, namely omega is | M |; phi (-) is the directional operator in the image, i.e., phi (m)_i) For edge points m in the three-dimensional model_iCorresponding model edge direction in the camera imaging plane, phi (n)_j) For image edge points n_jCorresponding edge direction. Experiments prove that the DCM-based matching method is higher in accuracy and higher in robustness under the conditions of occlusion and other complex backgrounds.

In order to accelerate the calculation of the direction error of the DCM tensor, the present embodiment performs discretization processing on the edge direction at a certain angle interval in the scene edge image, so that the edge in the angle range is imaged separately.

The method has high calculation speed, and can obtain the corresponding pixel points of all the pixel points in the edge images of all the angles after the forward and backward circulation is carried out for at most 1.5 times

The time is again controlled within O (q), where q represents the number of pixels in the image.

And 4, step 4: the method for obtaining the rough classification initial pose specifically comprises the following steps:

The initial pose information of the target object in the image comprises the pixel coordinate of the left vertex of the boundary box of the target object image in the image, the pixel coordinate of the center point of the boundary box and the rough classification result of the random forest classification detector.

Extracting the characteristics of the matched nearest graph, describing the difference among all pixel points in the image by adopting normalized characteristics, wherein the difference function of the difference among the pixel points is as follows:

where x and y are pixel values of any two pixels in the image.

and 7: acquiring a true value of a target object;

the truth value of the target object in the present embodiment includes: the position of the target object, the posture of the target object, the Mask of the target object, the two-dimensional boundary box of the target object, the three-dimensional boundary box of the target object and the like. And acquiring the pose of the target object in each frame of image by using a target tracking sub-method, and mapping the model grating points to an image plane through the pose information and the three-dimensional model of the target object. Traversing all raster point image coordinates (x)_i,y_i) Through the maximum point x of the raster point coordinates_min、y_min、x_maxAnd y_maxThe two-dimensional boundary frame of the target object is determined, and the three-dimensional model information of the target object can project the coordinates of the three-dimensional space point of the object to the two-dimensional image by using the pose, so that the two-dimensional image is displayedObtaining a three-dimensional boundary frame of the object; and obtaining the Mask corresponding to the target object on the image by using the pose information. In addition, negative sample data can be obtained, and after the image range of the target object in each frame of image is obtained, images with the same size are intercepted in other areas of the current image by using a random method and are stored as negative samples.

The target tracking sub-method specifically comprises the following steps:

and obtaining the accurate pose transformation relation of the target object in the current frame image relative to the camera by optimizing the target function, and using the pose relation of the frame as the initial pose of the next frame image to realize pose tracking.

The optimization method of the objective function comprises the following steps:

in order to improve the robustness of tracking, an adaptive weight optimization algorithm is adopted to optimize the objective function. When the gradient size is larger than a set threshold, the matching point is proved to be matched with the edge of the scene image, but whether the matching of the model edge point and the scene edge is correct or not and the matching precision need to be further determined by using the direction of the model edge and the gradient direction of the scene. By calculating the sine value of the difference between the edge direction and the gradient direction as the optimized weight of the grating point, the weight under the condition that the edge direction and the gradient direction are perpendicular to each other is higher, and thus the following results are obtained:

by using the method, the disturbance of image noise and edge extraction errors to the system can be reduced, the influence of interference points on the overall optimization can be timely reduced in a complex background, and the robustness of the system is ensured.

And 8: the truth values of the target objects are used to construct a truth value database of the poor texture target objects, and the effect graphs of the constructed database in the embodiment are shown in fig. 5 and 6.

A schematic diagram of the three-dimensional model of the target object in this embodiment is shown in fig. 2, masks of the finally generated target object are shown in fig. 3 and 4, and the finally obtained pose of the target object and the position of the target object in the image are shown in table 1 and table 2, respectively.

TABLE 1 pose of target object

TABLE 2 location of target object in image

While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A poor texture target object truth value database construction method based on machine learning is characterized by comprising the following steps:

and 4, step 4: obtaining a rough classification initial pose;

and 7: acquiring a true value of a target object;

2. The method for constructing the poor-texture target object truth database based on the machine learning as claimed in claim 1, wherein the image dataset of the target object in the step 1 is specifically:

3. The method for constructing the poor-texture target object truth value database based on the machine learning according to claim 1, wherein the direction chamfer distance error function in the step 3 is specifically as follows:

4. The method for constructing the poor-texture target object truth database based on the machine learning as claimed in claim 3, wherein the discrete sampling specifically comprises: discretizing the edge direction according to a certain angle interval.

5. The method for constructing the poor-texture target object truth database based on the machine learning according to claim 3, wherein the step 4 specifically comprises:

6. The machine learning-based poor texture target object truth value database construction method according to claim 5, wherein the difference function of the differences between the pixel points is specifically as follows:

where x and y are pixel values of any two pixels in the image.

7. The machine learning-based poor texture target object truth value database construction method according to claim 5, wherein the initial pose information of the target object in the image comprises a bounding box left vertex pixel coordinate, a bounding box center point pixel coordinate and a rough classification result of a random forest classification detector of the target object map in the image.

8. The method for constructing the poor-texture target object truth database based on the machine learning according to claim 1, wherein the target tracking sub-method specifically comprises the following steps:

9. The method for constructing the poor-texture target object truth database based on the machine learning according to claim 8, wherein the specific method for optimizing the objective function is as follows:

10. the method for constructing a poor-texture target object truth database based on machine learning according to claim 1, wherein the target object truth values comprise: the position of the target object, the posture of the target object, the Mask of the target object, the two-dimensional bounding box of the target object and the three-dimensional bounding box of the target object.