CN113012298B - Curved MARK three-dimensional registration augmented reality method based on region detection - Google Patents

Curved MARK three-dimensional registration augmented reality method based on region detection

Info

Publication number
CN113012298B
CN113012298B CN202011563089.4A CN202011563089A
Authority
CN
China
Prior art keywords
mark
point
pose
camera
natural texture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011563089.4A
Other languages
Chinese (zh)
Other versions
CN113012298A (en
Inventor
张明敏
陈忠庆
潘志庚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202011563089.4A priority Critical patent/CN113012298B/en
Publication of CN113012298A publication Critical patent/CN113012298A/en
Application granted granted Critical
Publication of CN113012298B publication Critical patent/CN113012298B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/005General purpose rendering architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75Determining position or orientation of objects or cameras using feature-based methods involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30204Marker
    • G06T2207/30208Marker matrix

Abstract

The invention discloses a curved MARK three-dimensional registration augmented reality method based on region detection. The curved MARK can be partially occluded without affecting the final effect. The method overcomes the problems that a traditional planar MARK cannot be bent, which destroys the visual consistency of cylindrical objects in an augmented reality scene, and that natural texture MARKs suffer from low robustness and poor real-time performance.

Description

Curved MARK three-dimensional registration augmented reality method based on region detection
Technical Field
The invention belongs to the intersection of computer vision and computer graphics, and particularly relates to a curved MARK three-dimensional registration augmented reality method based on region detection.
Background
With the continuing development and maturation of the internet and the iterative advance of multimedia technology, augmented reality (AR) has become increasingly common in daily life and learning. Augmented reality is a highly practical technology combining computer graphics and computer vision: it can overlay virtual objects, video, text and other information onto a real scene so that users obtain more information from the scene and understand it more deeply and clearly.
Augmented reality has wide applications in daily life, such as teaching demonstrations, tour navigation, virtual shopping and workshop guidance. In teaching, augmented reality brings students safer and more interesting experiments and improves their interest and practical ability. Many experiments are neglected in teaching because they are dangerous or hard to observe; this has promoted the application of augmented reality in teaching, as students can use the interaction between superimposed virtual objects and real objects to complete dangerous experiments and observe more detailed experimental effects, which greatly improves both their hands-on ability and their theoretical understanding. For the tourism industry, augmented reality also lets users obtain more direct and vivid explanations on mobile terminals, enhancing the interest and interactivity of navigation.
An augmented reality system mainly involves technologies such as three-dimensional registration, user interaction and virtual-real fusion, among which three-dimensional registration plays a decisive role in the development and popularization of augmented reality systems; its main function is to estimate the relative pose of the camera in the scene so that virtual objects can be superimposed on the real scene. At present, three-dimensional registration still falls short of user expectations in terms of real-time performance, robustness, stability and visual quality, so deep exploration of three-dimensional registration has profound significance for the development of augmented reality, and its research and development has become a hot topic in the field.
In an augmented reality system, to produce the visual effect of virtual-real fusion, the registration alignment of the virtual and real environments must first be ensured. The most common approach is to let the virtual and real environments share the same spatial coordinate system, so that virtual objects can be rendered in the scene and virtual-real interaction achieved. Augmented reality systems generally use cameras as the main sensors, and the positions at which virtual objects should be rendered are obtained by estimating the relative camera pose in real time through three-dimensional registration.
The most commonly used three-dimensional registration technique today is vision-based registration, with planar MARKs being the most common. However, attaching a planar MARK to a curved surface such as a cylinder damages the appearance of the MARK and greatly reduces the user's immersion, so three-dimensional registration based on a curved MARK is of real significance for the development of augmented reality. MARKs fall mainly into artificial MARKs and MARKs based on natural texture; artificial MARKs such as Hamming codes and two-dimensional codes cannot tolerate occlusion and cannot yield a correct pose after being bent, so a MARK based on natural texture becomes the only viable choice for realizing a curved MARK.
Disclosure of Invention
The invention aims to apply a curved MARK three-dimensional registration technique to the field of augmented reality, and provides a curved MARK three-dimensional registration augmented reality method based on region detection. The region where the MARK is located is obtained through a neural network model and three-dimensional registration is carried out on the curved MARK; at the same time the MARK may be partially occluded without harming the aesthetics or the real-time performance of the augmented reality.
Based on a region detection technique, the method obtains the region where the curved MARK is located in the scene, and builds the three-dimensional model formed by bending the MARK onto the cylinder according to the radius of the cylindrical object and the coordinates of the planar MARK feature points. The coordinates of the curved MARK feature points are acquired in the scene, and the relative pose between the camera and the MARK is recovered through a PnP algorithm, so that the virtual object can be rendered into the scene.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a curved MARK three-dimensional registration augmented reality method based on region detection comprises the following steps:
step (1), calibrating a camera:
acquiring internal parameters and distortion parameters of the RGB monocular camera by using the Zhang Zhengyou camera calibration method;
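By way of illustration only (not part of the original disclosure), the following Python sketch shows how step (1) could be carried out with OpenCV's implementation of the Zhang Zhengyou method; the checkerboard geometry, square size and image paths are assumptions.

import glob
import cv2
import numpy as np

# Assumed checkerboard: 9 x 6 inner corners, 25 mm squares; the image folder is illustrative.
pattern = (9, 6)
square_mm = 25.0
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_mm

obj_points, img_points, img_size = [], [], None
for path in glob.glob("calib/*.jpg"):
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    img_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# K holds fx, fy, cx, cy; dist holds the distortion parameters used later when undistorting.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points, img_size, None, None)
print("intrinsic matrix:\n", K, "\ndistortion:", dist.ravel())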
step (2), constructing a data set:
more than 300 pictures of the natural texture MARK to be identified are taken at different angles, different distances, different illumination, and under partially occluded and unoccluded conditions; 80% of the pictures are used as the training set and the remaining 20% as the verification set. The border (bounding box) and class (classes) of the natural texture MARK in each picture are calibrated with labelImg software to generate a corresponding xml format file: during calibration a rectangular frame is drawn around the area where the natural texture MARK is located, and the class corresponding to the natural texture MARK is then labelled;
step (3), the Yolov5 neural network model is used as the natural texture MARK target detection model; it is trained on the training set constructed in step (2), its accuracy is verified on the verification set, and the trained model is extracted. The trained natural texture MARK target detection model can then recognize the bounding box of the MARK in a scene and identify its specific class;
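As a hedged sketch of how the detector of steps (2)-(3) might be trained: the snippet below assumes the labelImg xml annotations have already been converted to YOLO txt labels, that training is launched from inside a Yolov5 repository checkout, and that there is a single class named 'marker'; the paths, class list and hyper-parameters are illustrative rather than values taken from the patent.

import subprocess
from pathlib import Path

# Assumed YOLO-format dataset layout (images + txt labels converted from the labelImg xml files).
data_yaml = """\
train: datasets/marker/images/train
val: datasets/marker/images/val
nc: 1
names: ['marker']
"""
Path("marker_data.yaml").write_text(data_yaml)

# Launched from inside a Yolov5 checkout; image size, batch size and epoch count are illustrative.
subprocess.run(
    ["python", "train.py", "--img", "640", "--batch", "16", "--epochs", "300",
     "--data", "marker_data.yaml", "--weights", "yolov5s.pt"],
    check=True,
)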
step (4), printing the MARK and pasting it onto a cylindrical object, while measuring the radius r of the cylinder and the width marker_w and height marker_h of the natural texture MARK picture; extracting the MARK picture feature points with the Fast algorithm and calculating the three-dimensional coordinates of each feature point relative to the MARK centre point. For a feature point with coordinates (x, y) in the MARK picture, the included angle θ between the feature point and the line to the cylinder centre, with pixel2mm denoting the pixel-to-millimetre conversion scale, is solved from equation (1):

    θ = ((x − marker_w / 2) · pixel2mm) / r    (1)
the corresponding three-dimensional coordinates are (the coordinate units here are all millimetres):

    3d_x = r · sin θ    (2)
    3d_y = (y − marker_h / 2) · pixel2mm    (3)
    3d_z = r · (1 − cos θ)    (4)
storing the three-dimensional coordinates of all the feature points in a dictionary, wherein the keys of the dictionary are the coordinates (x, y) of a MARK picture, and the values are the coordinates obtained by the formulas (2), (3) and (4);
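A minimal Python sketch of building the step (4) dictionary, under the geometric reading of formulas (1)-(4) given above (the flat MARK unrolled onto the cylinder by arc length); the Fast threshold, the rounding of the keys and the exact axis convention are assumptions.

import math
import cv2

def build_marker_dictionary(marker_gray, r_mm, pixel2mm):
    """Map each Fast feature point (x, y) of the flat MARK picture to a 3D point
    (in millimetres) on the cylinder of radius r_mm, relative to the MARK centre."""
    marker_h, marker_w = marker_gray.shape[:2]
    fast = cv2.FastFeatureDetector_create(threshold=20)   # threshold t is an assumption
    keypoints = fast.detect(marker_gray, None)

    coords_3d = {}
    for kp in keypoints:
        x, y = kp.pt
        # formula (1): arc length from the MARK centre column divided by the cylinder radius
        theta = (x - marker_w / 2.0) * pixel2mm / r_mm
        coords_3d[(round(x), round(y))] = (
            r_mm * math.sin(theta),              # formula (2): lateral offset
            (y - marker_h / 2.0) * pixel2mm,     # formula (3): along the cylinder axis
            r_mm * (1.0 - math.cos(theta)),      # formula (4): depth towards the cylinder axis
        )
    return coords_3d

# Usage with assumed measurements (radius 35 mm, 0.2 mm per pixel):
# marker = cv2.imread("marker.png", cv2.IMREAD_GRAYSCALE)
# marker_dict = build_marker_dictionary(marker, r_mm=35.0, pixel2mm=0.2)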
step (5), for a scene picture obtained by the camera, the natural texture MARK in the scene picture is extracted with the natural texture MARK target detection model to generate a region of interest (ROI region); all feature points in the ROI region are extracted with the Fast algorithm and their descriptors are computed with the ORB algorithm; on the basis of the Hamming distance between descriptors, the feature points extracted from the ROI region are matched with the original MARK feature points using RANSAC and the K nearest neighbour classification algorithm (KNN), obtaining the 30 best feature point matching pairs; the three-dimensional coordinates of the MARK picture feature points are obtained from the dictionary of step (4), and the relative pose of the camera and the curved MARK is estimated with the PnP algorithm, implemented as follows:
a point of the world coordinate system X_w = (x_w, y_w, z_w, 1) and its projection coordinates on the image plane X_i = (x_i, y_i, 1) are related by the following formula, where fx, fy, cx and cy are the camera internal parameters calibrated with the Zhang Zhengyou method, r_ij denotes a rotation entry and t_i a translation entry:

        [x_i]     [fx  0  cx]   [r11 r12 r13 t1]   [x_w]
    λ · [y_i]  =  [ 0  fy cy] · [r21 r22 r23 t2] · [y_w]    (5)
        [ 1 ]     [ 0   0  1]   [r31 r32 r33 t3]   [z_w]
                                                   [ 1 ]
the formula is simplified as follows:

    λ · X_i = K · M · X_w    (6)
wherein λ denotes a scale factor, the matrix K is the camera internal parameter matrix, and the matrix M is the model-view matrix. From the 30 feature point matching pairs, 4 matching pairs are selected at random: 3 of the pairs are used to compute 4 groups of different solutions, the remaining pair is substituted into the formula, and the solution with the smallest reprojection error is taken as the final solution; this process is optimized with a random sample consensus (RANSAC) algorithm;
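The following sketch illustrates step (5) with OpenCV primitives, assuming the reference MARK keypoints/descriptors and the dictionary of step (4) are already available; cv2.solvePnPRansac stands in for the 3-plus-1 point selection with RANSAC described above, and the detector and matcher parameters are assumptions.

import cv2
import numpy as np

def estimate_pose(roi_gray, roi_offset, ref_kps, ref_des, coords_3d, K, dist):
    """Match ORB descriptors inside the detected ROI against the reference MARK
    and recover the relative camera pose with PnP + RANSAC."""
    fast = cv2.FastFeatureDetector_create()
    orb = cv2.ORB_create()
    kps = fast.detect(roi_gray, None)
    kps, des = orb.compute(roi_gray, kps)
    if des is None:
        return None

    # KNN matching on the Hamming distance, kept to the 30 best matches after a ratio test.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(des, ref_des, k=2)
    good = [m[0] for m in pairs if len(m) == 2 and m[0].distance < 0.8 * m[1].distance]
    good = sorted(good, key=lambda m: m.distance)[:30]

    obj_pts, img_pts = [], []
    for m in good:
        rx, ry = ref_kps[m.trainIdx].pt
        key = (round(rx), round(ry))            # same rounding as when the dictionary was built
        if key in coords_3d:
            obj_pts.append(coords_3d[key])
            x, y = kps[m.queryIdx].pt
            img_pts.append((x + roi_offset[0], y + roi_offset[1]))   # back to full-frame pixels
    if len(obj_pts) < 4:
        return None

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.float32(obj_pts), np.float32(img_pts), K, dist)
    return (rvec, tvec) if ok else None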
step (6), while the MARK moves, the motion state of the feature points is tracked with an optical flow method and the number of feature points that moved between the two frames is judged; when this number is at most ten percent of the total number of feature points, the pose of the marker is considered unchanged relative to the previous frame, and when it exceeds ten percent, the pose of the marker is considered to have changed relative to the previous frame and step (5) is followed again to obtain the current pose of the natural texture MARK for three-dimensional registration, implemented as follows:
let I and J be the grayscale images of the previous frame and the current frame; for any point A in the image with coordinate vector (x, y)^T, its gray value is

    I(A) = I(x, y),   J(A) = J(x, y)    (7)

For a point u = [u_x, u_y]^T on the previous frame I, the purpose of feature point tracking is to find its position v = u + d = [u_x + d_x, u_y + d_y]^T in the current frame image J, where d = [d_x, d_y]^T is the image velocity at point A, i.e. the optical flow at A. Because of the aperture problem, similarity is defined in a two-dimensional neighbourhood sense: let ω_x and ω_y be two integers; the residual function minimized with respect to the velocity vector d is defined as follows:
    ε(d) = ε(d_x, d_y) = Σ_{x = u_x − ω_x}^{u_x + ω_x} Σ_{y = u_y − ω_y}^{u_y + ω_y} ( I(x, y) − J(x + d_x, y + d_y) )²    (8)

This defines the similarity over an image neighbourhood of size (2ω_x + 1) × (2ω_y + 1); solving for d gives the corresponding position of point u in image J. ω_x and ω_y take the value 2, 3, 4, 5, 6 or 7.
The position of each feature point computed in the current frame is compared with its position in the previous frame to decide whether the feature points of the two adjacent camera frames moved, and the number of moved feature points is counted. If this number is at most ten percent of the total number of feature points, the object is considered not to have moved relative to the previous frame and the previous pose is acquired directly; if it exceeds ten percent, the object is considered to have moved relative to the previous frame and the relative pose of the camera with respect to the object is recalculated;
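A sketch of the step (6) decision rule, assuming the previous-frame feature points are tracked with OpenCV's pyramidal Lucas-Kanade optical flow; the 15 x 15 window corresponds to ω = 7 in the residual above, while the 1-pixel motion tolerance is an assumption.

import cv2
import numpy as np

def marker_moved(prev_gray, cur_gray, prev_pts, move_tol_px=1.0):
    """Track the previous-frame feature points (float32 array of shape (N, 1, 2)) with
    pyramidal LK optical flow and report whether more than 10% of them moved."""
    cur_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, cur_gray, prev_pts, None, winSize=(15, 15), maxLevel=3)
    ok = status.ravel() == 1
    if not np.any(ok):
        return True, None                       # track lost: re-estimate the pose
    shifts = np.linalg.norm(cur_pts[ok] - prev_pts[ok], axis=-1)
    moved = int(np.count_nonzero(shifts > move_tol_px))
    return moved > 0.1 * len(prev_pts), cur_pts

# If marker_moved(...) returns False, the previous pose is reused; otherwise step (5) is repeated.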
step (7), in the process of estimating the relative camera pose in step (5), the 6D pose of the MARK is predicted and corrected with Kalman filtering.
First, the displacement of the camera with respect to the natural texture MARK is defined as (t_x, t_y, t_z) and the rotation angles as (ψ, θ, φ). The first derivatives of the coordinates are (t_x′, t_y′, t_z′) and the second derivatives are (t_x″, t_y″, t_z″), where the first derivative represents the speed at which the natural texture MARK moves and the second derivative the acceleration at which it moves; the first derivatives of the rotation angles are (ψ′, θ′, φ′) and the second derivatives are (ψ″, θ″, φ″), where the first derivative represents the speed at which the MARK rotates and the second derivative the acceleration at which the natural texture MARK rotates. The Kalman filter used for prediction and correction keeps the following state:
    Kalman = (t_x, t_y, t_z, t_x′, t_y′, t_z′, t_x″, t_y″, t_z″, ψ, θ, φ, ψ′, θ′, φ′, ψ″, θ″, φ″)    (9)
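A sketch of the 18-state constant-acceleration Kalman filter implied by formula (9), built with cv2.KalmanFilter; the state ordering follows (9), while the frame interval and the noise covariances are assumptions.

import cv2
import numpy as np

def make_pose_kalman(dt=1.0 / 30.0):
    """18 states: translation, its velocity and acceleration, then the rotation angles,
    their velocity and acceleration; 6 measurements: (tx, ty, tz, psi, theta, phi)."""
    kf = cv2.KalmanFilter(18, 6)
    F = np.eye(18, dtype=np.float32)
    for base in (0, 9):                          # translation block, rotation block
        for i in range(3):
            p, v, a = base + i, base + 3 + i, base + 6 + i
            F[p, v] = dt                         # position <- velocity
            F[p, a] = 0.5 * dt * dt              # position <- acceleration
            F[v, a] = dt                         # velocity <- acceleration
    kf.transitionMatrix = F
    H = np.zeros((6, 18), dtype=np.float32)
    H[0:3, 0:3] = np.eye(3)                      # measured translation
    H[3:6, 9:12] = np.eye(3)                     # measured rotation angles
    kf.measurementMatrix = H
    kf.processNoiseCov = np.eye(18, dtype=np.float32) * 1e-4
    kf.measurementNoiseCov = np.eye(6, dtype=np.float32) * 1e-2
    return kf

# Per frame: predicted = kf.predict(); after PnP succeeds,
# kf.correct(np.float32([tx, ty, tz, psi, theta, phi]).reshape(6, 1)).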
and (8) eliminating the frame with the wrong pose estimation by using a sliding window.
Whether the currently estimated camera pose is correct is judged from the camera pose coordinates of the first two and the last two frames preceding the current frame, eliminating frames whose camera pose is wrongly estimated because the natural texture MARK is blurred while moving. Let the displacement in the 6D pose estimate of the current frame be (x_t, y_t, z_t) (quantities with different meanings are denoted by different symbols); compute the average camera displacement (x′, y′, z′) of the first two frames and (x″, y″, z″) of the last two frames. If the current displacement satisfies:

    x″ − d_t < x_t < x′ + d_t   or   x′ − d_t < x_t < x″ + d_t
    y″ − d_t < y_t < y′ + d_t   or   y′ − d_t < y_t < y″ + d_t    (10)
    z″ − d_t < z_t < z′ + d_t   or   z′ − d_t < z_t < z″ + d_t
the current pose is regarded as a valid pose; otherwise the current frame is regarded as a blurred frame and the last valid pose continues to be used, where d_t is the translation threshold, d_t = 3.
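A sketch of the step (8) sliding-window check: a window of the last four accepted translations is kept, and the current translation must lie between the averages of the older and newer pairs, widened by d_t = 3 on each side; the bootstrap behaviour while the window is still filling is an assumption.

from collections import deque
import numpy as np

class PoseWindow:
    """Sliding window over the last four valid camera translations (step (8))."""

    def __init__(self, d_t=3.0):
        self.d_t = d_t
        self.window = deque(maxlen=4)
        self.last_valid = None

    def filter(self, t_cur):
        t_cur = np.asarray(t_cur, dtype=np.float64)
        if len(self.window) < 4:                           # not enough history yet: accept
            self.window.append(t_cur)
            self.last_valid = t_cur
            return t_cur
        older = np.mean(list(self.window)[:2], axis=0)     # (x', y', z')
        newer = np.mean(list(self.window)[2:], axis=0)     # (x'', y'', z'')
        lo = np.minimum(older, newer) - self.d_t
        hi = np.maximum(older, newer) + self.d_t
        if np.all((t_cur > lo) & (t_cur < hi)):            # the interval test of formula (10)
            self.window.append(t_cur)
            self.last_valid = t_cur
            return t_cur
        return self.last_valid                             # blurred frame: reuse the last valid pose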
step (9), after the relative pose between the camera and the natural texture MARK has been obtained through steps (5), (6), (7) and (8), the virtual object to be three-dimensionally registered is translated and rotated accordingly, and the virtual object is rendered into the scene through OpenGL and OpenCV to achieve the augmented reality effect.
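For step (9), the PnP result has to be turned into a model-view matrix for OpenGL; the helper below is only a sketch of one common way of doing this: the OpenCV-to-OpenGL axis flip and the column-major flattening are the usual conventions, and the function name is illustrative.

import cv2
import numpy as np

def pose_to_modelview(rvec, tvec):
    """Convert an OpenCV (rvec, tvec) pose into a column-major 4x4 OpenGL model-view matrix."""
    R, _ = cv2.Rodrigues(rvec)                 # 3x3 rotation from the Rodrigues vector
    view = np.eye(4, dtype=np.float64)
    view[:3, :3] = R
    view[:3, 3] = tvec.ravel()
    # OpenCV's camera looks along +z with y pointing down; OpenGL looks along -z with y up.
    cv_to_gl = np.diag([1.0, -1.0, -1.0, 1.0])
    modelview = cv_to_gl @ view
    return modelview.T.ravel()                 # column-major order, e.g. for glLoadMatrixd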
The invention has the beneficial effects that:
the method comprises the steps of attaching a two-dimensional natural texture MARK to a cylinder to form a curved MARK, processing the curved MARK through a neural network model to obtain the region where the curved MARK is located in the current scene, calculating the relative pose between a camera and an object through feature point matching, and rendering a virtual object into an augmented reality scene. Partial occlusion can be done for the curved MARK without affecting the final effect. The method solves the problems that the traditional plane MARK can not be bent, so that the consistency of the cylindrical object to the augmented reality scene is damaged, the robustness of the natural texture MARK is low, the real-time performance is low and the like.
Drawings
FIG. 1 is a picture of a natural texture MARK according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating feature points detected in a MARK picture according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the calculation of three-dimensional coordinates of feature points on a MARK according to an embodiment of the present invention;
FIG. 4 is a comparison diagram of feature points between two adjacent frames according to an embodiment of the present invention;
FIG. 5 is an effect diagram of a virtual object rendered to an assigned pose in a scene according to an embodiment of the present invention;
FIG. 6 is a flowchart of a method according to an embodiment of the present invention.
Detailed Description
The method of the present invention is further described below with reference to the accompanying drawings.
The experimental environment is a monocular RGB video camera (640 x 480) and a cylindrical object; a natural texture MARK picture is printed and attached to the cylindrical object, and the part of the cylinder carrying the MARK always faces the monocular camera during the experiment.
As shown in fig. 1, a picture with sharp corners and many irregular natural texture features is used as the MARK; symmetric pictures are avoided, and the selected picture should contain a large number of features with obvious differences between them.
As shown in fig. 2, all feature points in the MARK picture are calculated by using Fast algorithm, and the specific steps are as follows:
step (a), select a pixel Q in the MARK picture; to judge whether this pixel is a feature point, first denote its brightness value as I_q;
step (b), take the pixel point Q as the centre and construct a Bresenham circle with a radius of 3; the circle contains 16 pixels;
step (c), on this circle of 16 pixels, if the pixel values of 9 contiguous pixel points are all greater than I_q + t or all less than I_q − t, the pixel point Q is considered a feature point, where t is a set threshold;
Step (d), in order to improve the judgment efficiency of the angular points to eliminate pixels of non-angular points in the image, checking corresponding pixels according to four positions of 1, 9, 5 and 13, and when a pixel point Q is an angular point, at least 3 pixel values of the pixel points of the four positions are all larger than Iq+ t is greater than or less than IqAnd (c) if the pixel values of the pixel points at the four positions do not meet the condition, judging and screening all the pixel points which are not the angular points if the pixel values of the pixel points at the four positions are not the angular points, and judging and screening the rest pixel points to obtain the final angular points by performing the operation judgment in the step (c).
As in fig. 3, the three-dimensional coordinates of each feature point relative to the MARK centre point are calculated. After the MARK is attached to the cylindrical object, a three-dimensional model is obtained. Once the radius r of the cylinder is known, for a feature point with coordinates (x, y) in the MARK picture, the included angle θ between the feature point and the line to the cylinder centre is computed from the pixel-to-millimetre conversion scale pixel2mm as in formula (1):

    θ = ((x − marker_w / 2) · pixel2mm) / r
the corresponding three-dimensional coordinates are (the coordinate units here are all millimetres, as in formulas (2)-(4)):

    3d_x = r · sin θ
    3d_y = (y − marker_h / 2) · pixel2mm
    3d_z = r · (1 − cos θ)
The three-dimensional coordinates of all feature points are stored in a dictionary whose keys are the MARK picture coordinates (x, y) and whose values are the obtained three-dimensional coordinates (3d_x, 3d_y, 3d_z).
Training pictures are taken at different angles, distances and illumination levels, and under partially occluded and unoccluded conditions; 300 pictures are taken in total, 240 of which are used as the training set and the rest as the verification set. The border (bounding box) and class (classes) of each picture are calibrated with labelImg to generate an xml file, and the pictures together with the annotated xml files are placed in the corresponding paths of the Yolov5 model code (step (3)).
The natural texture MARK in the scene is detected with the Yolov5 target detection model, which extracts in real time the region where the MARK is located together with its confidence. If the confidence is less than 20 the region is not considered to contain the MARK; if it is greater than or equal to 20, the bounding box of the MARK in the scene is obtained, and a masking operation is applied to the image of this region with OPENCV so that the RGB values of all pixels outside the MARK region become (0, 0, 0).
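A sketch of this detection-and-masking step, assuming a Yolov5 model trained as in step (3) is loaded through torch.hub; the weight path is illustrative, and the confidence cut-off of 0.20 is an assumed reading of the threshold of 20 mentioned above.

import cv2
import numpy as np
import torch

# Custom-trained detector; the weight path is illustrative.
model = torch.hub.load("ultralytics/yolov5", "custom", path="runs/train/exp/weights/best.pt")

def extract_marker_roi(frame_bgr, conf_thresh=0.20):
    """Run the MARK detector, keep the best box above the confidence threshold and
    black out every pixel outside it, as described for the masking operation."""
    det = model(frame_bgr[:, :, ::-1]).xyxy[0].cpu().numpy()   # rows: x1, y1, x2, y2, conf, class
    det = det[det[:, 4] >= conf_thresh]
    if len(det) == 0:
        return None, None
    x1, y1, x2, y2 = det[det[:, 4].argmax(), :4].astype(int)
    masked = np.zeros_like(frame_bgr)                          # pixels outside the ROI become (0, 0, 0)
    masked[y1:y2, x1:x2] = frame_bgr[y1:y2, x1:x2]
    return masked, (x1, y1, x2, y2)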
As shown in fig. 4, all feature points in the scene are tracked by the optical flow method, implemented as follows:
let I and J be the grayscale images of the previous frame and the current frame; for any point A in the image with coordinate vector (x, y)^T, its gray value is

    I(A) = I(x, y),   J(A) = J(x, y)

For a point u = [u_x, u_y]^T on the previous frame I, the purpose of feature point tracking is to find its position v = u + d = [u_x + d_x, u_y + d_y]^T in the current frame image J, where d = [d_x, d_y]^T is the image velocity at point A, i.e. the optical flow at A. Because of the aperture problem, similarity is defined in a two-dimensional neighbourhood sense: let ω_x and ω_y be two integers; the residual function minimized with respect to the velocity vector d is defined as follows:
    ε(d) = ε(d_x, d_y) = Σ_{x = u_x − ω_x}^{u_x + ω_x} Σ_{y = u_y − ω_y}^{u_y + ω_y} ( I(x, y) − J(x + d_x, y + d_y) )²

This defines the similarity over an image neighbourhood of size (2ω_x + 1) × (2ω_y + 1); solving for d gives the corresponding position of point u in image J. Typical values for ω_x and ω_y are 2, 3, 4, 5, 6 and 7.
The position of each feature point computed in the current frame is compared with its position in the previous frame to decide whether the feature points of the two adjacent camera frames moved, and the number of moved feature points is counted. If this number is at most ten percent of the total number of feature points, the object is considered not to have moved relative to the previous frame and the previous pose is acquired directly; if it exceeds ten percent, the object is considered to have moved relative to the previous frame and the relative pose of the camera with respect to the object is recalculated;
and calculating the obtained descriptor of the feature point by using an ORB algorithm, and specifically comprising the following steps of:
step (e), take the key point O as the centre of a circle and draw a circle with a radius of r pixels;
step (f), take N point pairs inside the circle, where N is 512;
step (g), define the operation M, where I_A denotes the gray value of A and I_B the gray value of B:

    M(A, B) = 1 if I_A > I_B, and 0 otherwise
step (h), apply the operation of step (g) to the selected point pairs to obtain a descriptor made up of 0s and 1s.
The ORB implementation in OPENCV uses an image pyramid to address the fact that, although the descriptor is insensitive to illumination, it has no scale consistency. For rotation consistency, the main direction of each feature point is computed with the grayscale centroid method: the grayscale centroid is computed within the circular region of radius r around the feature point, and the direction vector from the centre to the centroid is taken as the main direction.
The feature point descriptors of the natural texture MARK in the scene are matched against the original MARK descriptors by similarity. The Hamming distance is used to measure the similarity between two descriptors: let d_k be the Hamming distance between the rBRIEF descriptors of feature points A and B, D_A the descriptor of feature point A, D_B the descriptor of feature point B, and i the bit index within the descriptor:

    d_k = Σ_i ( D_A(i) ⊕ D_B(i) )
After the feature point pairs matching the natural texture MARK in the scene to the reference MARK are obtained, outliers are rejected with a ratio test: for a feature point p of the natural texture MARK in the scene, let d1 and d2 be the distances to its two closest feature points in the reference image; when d1/d2 > ratio (ratio is preferably 0.8), p is regarded as an outlier and rejected. A random sample consensus (RANSAC) algorithm is then applied to the valid feature points (inliers) that passed the ratio test to further eliminate possible outliers. During matching, cross validation (i.e. feature points p and q must be each other's best match) and the nearest neighbour algorithm further screen out wrongly matched pairs, and finally the camera pose is computed with the PnP algorithm. Fig. 5 shows the augmented reality effect of rendering the virtual object into the scene: the virtual object completely covers the cup carrying the natural texture MARK, so that the cup in the scene is replaced.
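A sketch of the outlier-rejection chain just described: Hamming-distance KNN matching, the ratio test with ratio 0.8, and cross validation (mutual best match); the matcher construction is illustrative, and further RANSAC filtering is assumed to happen inside the subsequent PnP step.

import cv2

def filter_matches(des_scene, des_ref, ratio=0.8):
    """Ratio test plus cross validation on Hamming-distance ORB matches; returns the 30 best."""
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)
    fwd = bf.knnMatch(des_scene, des_ref, k=2)          # scene -> reference
    bwd = bf.knnMatch(des_ref, des_scene, k=2)          # reference -> scene
    best_back = {m[0].queryIdx: m[0].trainIdx for m in bwd if m}
    good = []
    for m in fwd:
        if len(m) < 2:
            continue
        a, b = m
        if a.distance >= ratio * b.distance:            # ratio test: drop ambiguous matches
            continue
        if best_back.get(a.trainIdx) == a.queryIdx:     # cross validation: mutual best match
            good.append(a)
    return sorted(good, key=lambda x: x.distance)[:30]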
As shown in fig. 6, which is a flow chart of the practical application of the method of the present invention, the steps are as follows:
step (1), acquiring the natural texture MARK region (ROI) in the scene by using Yolov5;
step (2), comparing the feature points with those of the previous frame by the optical flow method; if the object is judged not to have moved, directly acquiring the camera pose of the previous frame and executing step (4), otherwise executing step (3);
step (3), extracting feature points of the ROI region with the Fast algorithm and matching them with the MARK picture feature points; the 30 best-matching feature points are found with the K nearest neighbour algorithm (KNN), their three-dimensional coordinates are recovered from the dictionary, and the MARK pose is calculated with the PnP (Perspective-n-Point) and RANSAC (random sample consensus) algorithms; the MARK pose is compared with the average pose of the sliding window to judge whether the current pose is valid; if it is valid, the sliding window is updated and step (4) is executed, otherwise the camera pose of the previous frame is used;
step (4), rendering the virtual object at the acquired pose through OPENGL and OPENCV to perform augmented reality.

Claims (6)

1. A curved MARK three-dimensional registration augmented reality method based on region detection is characterized by comprising the following steps:
step (1), calibrating a camera:
acquiring internal parameters and distortion parameters of the RGB monocular camera by using the Zhang Zhengyou camera calibration method;
step (2), constructing a data set:
respectively taking more than 300 pictures of the natural texture MARK to be identified at different angles, different distances, different illumination, and under partially occluded and unoccluded conditions, wherein 80% of the pictures are used as a training set and the remaining 20% as a verification set; calibrating the border (bounding box) and class (classes) of the natural texture MARK in each picture with labelImg software to generate a corresponding xml format file, framing the area where the natural texture MARK is located with a rectangular frame during calibration, and then labelling the class corresponding to the natural texture MARK;
step (3), the Yolov5 neural network model is used as the natural texture MARK target detection model; it is trained on the training set constructed in step (2), its accuracy is verified on the verification set, and the trained model is extracted; the trained natural texture MARK target detection model can then recognize the bounding box of the MARK in a scene and identify its specific class;
step (4), printing the MARK, pasting the MARK on a cylindrical object, measuring the radius r of the cylinder and the width marker_w and height marker_h of the natural texture MARK picture, extracting the MARK picture feature points with the Fast algorithm, calculating the three-dimensional coordinates of each feature point relative to the MARK centre point, and, for a feature point with coordinates (x, y) in the MARK picture, calculating the included angle θ between the feature point and the line joining it to the cylinder centre; with pixel2mm denoting the pixel-to-millimetre conversion scale, θ is solved from equation (1):

    θ = ((x − marker_w / 2) · pixel2mm) / r    (1)
the corresponding three-dimensional coordinates are, in units of mm:

    3d_x = r · sin θ    (2)
    3d_y = (y − marker_h / 2) · pixel2mm    (3)
    3d_z = r · (1 − cos θ)    (4)
storing the three-dimensional coordinates of all the feature points in a dictionary, wherein the keys of the dictionary are the coordinates (x, y) of a MARK picture, and the values are the coordinates obtained by the formulas (2), (3) and (4);
step (5), for a scene picture obtained by the camera, the natural texture MARK in the scene picture is extracted with the natural texture MARK target detection model to generate a region of interest (ROI region); all feature points in the ROI region are extracted with the Fast algorithm and their descriptors are computed with the ORB algorithm; on the basis of the Hamming distance between descriptors, the feature points extracted from the ROI region are matched with the original MARK feature points using RANSAC and the K nearest neighbour classification algorithm (KNN), obtaining the 30 best feature point matching pairs; the three-dimensional coordinates of the MARK picture feature points are obtained from the dictionary of step (4), and the relative pose of the camera and the curved MARK is estimated with the PnP algorithm, implemented as follows:
a point of the world coordinate system X_w = (x_w, y_w, z_w, 1) and its projection coordinates on the image plane X_i = (x_i, y_i, 1) are related by the following formula, where fx, fy, cx and cy are the camera internal parameters calibrated with the Zhang Zhengyou method, r_ij denotes a rotation entry and t_i a translation entry:

        [x_i]     [fx  0  cx]   [r11 r12 r13 t1]   [x_w]
    λ · [y_i]  =  [ 0  fy cy] · [r21 r22 r23 t2] · [y_w]    (5)
        [ 1 ]     [ 0   0  1]   [r31 r32 r33 t3]   [z_w]
                                                   [ 1 ]
the formula is simplified as follows:

    λ · X_i = K · M · X_w    (6)
wherein λ denotes a scale factor and the matrix K is the camera internal parameter matrix; from the 30 feature point matching pairs, 4 matching pairs are selected at random, 3 of the pairs are used to compute 4 groups of different solutions, the remaining pair is substituted into the formula, the solution with the smallest reprojection error is taken as the final solution, and this process is optimized with a random sample consensus (RANSAC) algorithm;
step (6), while the MARK moves, the motion state of the feature points is tracked with an optical flow method and the number of feature points that moved between the two frames is judged; when this number is at most ten percent of the total number of feature points, the pose of the marker is considered unchanged relative to the previous frame, and when it exceeds ten percent, the pose of the marker is considered to have changed relative to the previous frame, and the pose of the current natural texture MARK obtained following step (5) is used for three-dimensional registration;
step (7), in the process of estimating the relative pose of the camera in the step (5), predicting and correcting the 6D pose of the MARK by using Kalman filtering;
first, the displacement of the camera with respect to the natural texture MARK is defined as (t_x, t_y, t_z) and the rotation angles as (ψ, θ, φ); the first derivatives of the coordinates are (t_x′, t_y′, t_z′) and the second derivatives are (t_x″, t_y″, t_z″), where the first derivative represents the speed at which the natural texture MARK moves and the second derivative the acceleration at which it moves; the first derivatives of the rotation angles are (ψ′, θ′, φ′) and the second derivatives are (ψ″, θ″, φ″), where the first derivative represents the speed at which the MARK rotates and the second derivative the acceleration at which the natural texture MARK rotates; the Kalman filter used for estimation and correction keeps the following state:
    Kalman = (t_x, t_y, t_z, t_x′, t_y′, t_z′, t_x″, t_y″, t_z″, ψ, θ, φ, ψ′, θ′, φ′, ψ″, θ″, φ″)    (7)
step (8), eliminating the frame with the wrong pose estimation by using a sliding window;
whether the currently estimated camera pose is correct is judged from the camera pose coordinates of the first two and the last two frames preceding the current frame, eliminating frames whose camera pose is wrongly estimated because the natural texture MARK is blurred while moving; let the displacement in the 6D pose estimate of the current frame be (x_t, y_t, z_t), and compute the average camera displacement (x′, y′, z′) of the first two frames and (x″, y″, z″) of the last two frames; if the current displacement satisfies:

    x″ − d_t < x_t < x′ + d_t   or   x′ − d_t < x_t < x″ + d_t
    y″ − d_t < y_t < y′ + d_t   or   y′ − d_t < y_t < y″ + d_t    (8)
    z″ − d_t < z_t < z′ + d_t   or   z′ − d_t < z_t < z″ + d_t
the current pose is regarded as a valid pose; otherwise the current frame is regarded as a blurred frame and the last valid pose continues to be used, where d_t is the translation threshold, d_t = 3;
step (9), after the relative pose between the camera and the natural texture MARK has been obtained through steps (5), (6), (7) and (8), the virtual object to be three-dimensionally registered is translated and rotated accordingly, and the virtual object is rendered into the scene through OpenGL and OpenCV to achieve the augmented reality effect.
2. The curved MARK three-dimensional registration augmented reality method based on region detection according to claim 1, wherein a picture with sharp corners and many irregular natural texture features is used as the MARK, symmetric pictures are not selected, and the selected picture contains a large number of features with obvious differences between them.
3. The curved MARK three-dimensional registration augmented reality method based on region detection according to claim 1, wherein all feature points in the MARK picture are calculated with the Fast algorithm, comprising the following steps:
step (a), select a pixel Q in the MARK picture; to judge whether this pixel is a feature point, first set its brightness value as I_q;
step (b), take the pixel point Q as the centre and construct a Bresenham circle with a radius of 3; the circle contains 16 pixels;
step (c), on this circle of 16 pixels, if the pixel values of 9 contiguous pixel points are all greater than I_q + t or all less than I_q − t, the pixel point Q is considered a feature point, where t is a set threshold;
step (d), to reject non-corner pixels in the image more efficiently, the pixels at the four positions 1, 9, 5 and 13 are checked first; when the pixel point Q is a corner, at least 3 of these four pixels must all be greater than I_q + t or all less than I_q − t, and pixels whose four positions do not satisfy this condition are discarded as non-corners; the remaining pixel points are then judged with the operation of step (c) and screened to obtain the final corners.
4. The method for the three-dimensional registration of the curved MARK augmented reality based on the region detection as claimed in claim 1 or 3, wherein the descriptor of the feature point calculated by the ORB algorithm is obtained by the following steps:
step (e), take the key point O as the centre of a circle and draw a circle with a radius of r pixels;
step (f), take N point pairs inside the circle, where N is 512;
step (g), define the operation M, where I_A denotes the gray value of A and I_B the gray value of B:

    M(A, B) = 1 if I_A > I_B, and 0 otherwise
step (h), apply the operation of step (g) to the selected point pairs to obtain a descriptor made up of 0s and 1s.
5. The method for the curved MARK three-dimensional registration augmented reality based on the region detection as claimed in claim 4, wherein the step (6) is implemented as follows:
let I and J be the grayscale images of the previous frame and the current frame; for any point A in the image with coordinate vector (x, y)^T, its gray value is

    I(A) = I(x, y),   J(A) = J(x, y)

For a point u = [u_x, u_y]^T on the previous frame I, the purpose of feature point tracking is to find its position v = u + d = [u_x + d_x, u_y + d_y]^T in the current frame image J, where d = [d_x, d_y]^T is the image velocity at point A, i.e. the optical flow at A; because of the aperture problem, similarity is defined in a two-dimensional neighbourhood sense: let ω_x and ω_y be two integers; the residual function minimized with respect to the velocity vector d is defined as follows:
    ε(d) = ε(d_x, d_y) = Σ_{x = u_x − ω_x}^{u_x + ω_x} Σ_{y = u_y − ω_y}^{u_y + ω_y} ( I(x, y) − J(x + d_x, y + d_y) )²

This defines the similarity over an image neighbourhood of size (2ω_x + 1) × (2ω_y + 1); solving for d gives the corresponding position of point u in image J;
the position of each feature point computed in the current frame is compared with its position in the previous frame to decide whether the feature points of the two adjacent camera frames moved, and the number of moved feature points is counted; if this number is at most ten percent of the total number of feature points, the object is considered not to have moved relative to the previous frame and the previous pose is acquired directly; if it exceeds ten percent, the object is considered to have moved relative to the previous frame and the relative pose of the camera with respect to the object is recalculated.
6. The curved MARK three-dimensional registration augmented reality method based on region detection according to claim 5, wherein ω_x and ω_y take the value 2, 3, 4, 5, 6 or 7.
CN202011563089.4A 2020-12-25 2020-12-25 Curved MARK three-dimensional registration augmented reality method based on region detection Active CN113012298B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011563089.4A CN113012298B (en) 2020-12-25 2020-12-25 Curved MARK three-dimensional registration augmented reality method based on region detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011563089.4A CN113012298B (en) 2020-12-25 2020-12-25 Curved MARK three-dimensional registration augmented reality method based on region detection

Publications (2)

Publication Number Publication Date
CN113012298A CN113012298A (en) 2021-06-22
CN113012298B true CN113012298B (en) 2022-04-08

Family

ID=76383738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011563089.4A Active CN113012298B (en) 2020-12-25 2020-12-25 Curved MARK three-dimensional registration augmented reality method based on region detection

Country Status (1)

Country Link
CN (1) CN113012298B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288657A (en) * 2019-05-23 2019-09-27 华中师范大学 A kind of augmented reality three-dimensional registration method based on Kinect

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8175617B2 (en) * 2009-10-28 2012-05-08 Digimarc Corporation Sensor-based mobile search, related methods and systems
US8121618B2 (en) * 2009-10-28 2012-02-21 Digimarc Corporation Intuitive computing methods and systems
CN102142055A (en) * 2011-04-07 2011-08-03 上海大学 True three-dimensional design method based on augmented reality interactive technology
US10430985B2 (en) * 2014-03-14 2019-10-01 Magic Leap, Inc. Augmented reality systems and methods utilizing reflections

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288657A (en) * 2019-05-23 2019-09-27 华中师范大学 A kind of augmented reality three-dimensional registration method based on Kinect

Also Published As

Publication number Publication date
CN113012298A (en) 2021-06-22

Similar Documents

Publication Publication Date Title
US10528847B2 (en) Method of providing image feature descriptors
CN110288657B (en) Augmented reality three-dimensional registration method based on Kinect
CN109685913B (en) Augmented reality implementation method based on computer vision positioning
CN107292965A (en) A kind of mutual occlusion processing method based on depth image data stream
JP6011102B2 (en) Object posture estimation method
JP7178396B2 (en) Method and computer system for generating data for estimating 3D pose of object included in input image
Barandiaran et al. Real-time optical markerless tracking for augmented reality applications
CN111401266B (en) Method, equipment, computer equipment and readable storage medium for positioning picture corner points
EP2622576A1 (en) Method and apparatus for solving position and orientation from correlated point features in images
CN106952312B (en) Non-identification augmented reality registration method based on line feature description
CN110956661A (en) Method for calculating dynamic pose of visible light and infrared camera based on bidirectional homography matrix
CN111709980A (en) Multi-scale image registration method and device based on deep learning
CN111797688A (en) Visual SLAM method based on optical flow and semantic segmentation
CN111998862B (en) BNN-based dense binocular SLAM method
CN115393519A (en) Three-dimensional reconstruction method based on infrared and visible light fusion image
CN115830135A (en) Image processing method and device and electronic equipment
CN113436251B (en) Pose estimation system and method based on improved YOLO6D algorithm
CN113240656B (en) Visual positioning method and related device and equipment
CN108447092B (en) Method and device for visually positioning marker
CN111179271B (en) Object angle information labeling method based on retrieval matching and electronic equipment
JP6016242B2 (en) Viewpoint estimation apparatus and classifier learning method thereof
CN113012298B (en) Curved MARK three-dimensional registration augmented reality method based on region detection
CN113723432B (en) Intelligent identification and positioning tracking method and system based on deep learning
CN114972451A (en) Rotation-invariant SuperGlue matching-based remote sensing image registration method
CN115147344A (en) Three-dimensional detection and tracking method for parts in augmented reality assisted automobile maintenance

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant