WO2020133080A1 - Object positioning method and apparatus, computer device, and storage medium - Google Patents

Object positioning method and apparatus, computer device, and storage medium

Info

Publication number
WO2020133080A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
feature
coordinate system
point
coordinates
Prior art date
Application number
PCT/CN2018/124409
Other languages
French (fr)
Chinese (zh)
Inventor
熊友军
郭奎
庞建新
Original Assignee
深圳市优必选科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市优必选科技有限公司
Priority to PCT/CN2018/124409
Publication of WO2020133080A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01B MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B 11/00 Measuring arrangements characterised by the use of optical techniques

Definitions

  • The invention relates to the field of computer processing, and in particular to an object positioning method, apparatus, computer device, and storage medium.
  • The positioning of an arbitrary object in space falls within the field of AR (Augmented Reality).
  • Object positioning determines the positional relationship between the spatial coordinate system of the target object and the camera coordinate system.
  • Current monocular visual target positioning methods are divided into marker-point methods and markerless methods, according to whether marker points are used.
  • Marker-point positioning determines the position of the target object by locating the marker points, which has limitations in practical applications, while markerless positioning relies on the features of the target object itself and is easily affected by external environmental factors, giving low stability and low accuracy.
  • An embodiment of the present invention provides an object positioning method.
  • The method includes:
  • the bag of words is learned and established based on marker points, and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates;
  • An embodiment of the present invention provides an object positioning apparatus. The apparatus includes:
  • a first acquisition module, used to acquire a target image obtained by shooting the target object to be located;
  • a first extraction module, used to perform feature extraction on the target object in the target image to obtain a two-dimensional feature corresponding to each feature point;
  • a searching module, used to search the bag of words for target features matching each of the two-dimensional features and to determine the three-dimensional point coordinates of the corresponding feature points according to the target features, where the bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates;
  • a determining module, used to obtain the two-dimensional point coordinates of each feature point in the current camera coordinate system and to determine the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • An embodiment of the present invention provides a computer device including a memory and a processor.
  • The memory stores a computer program.
  • When the computer program is executed by the processor, the processor is caused to perform the following steps:
  • the bag of words is learned and established based on marker points, and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates;
  • An embodiment of the present invention provides a computer-readable storage medium that stores a computer program.
  • When the computer program is executed by a processor, the processor is caused to perform the following steps:
  • the bag of words is learned and established based on marker points, and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates;
  • With the above object positioning method, apparatus, computer device, and storage medium, a bag of words is first established based on marker points; in the actual positioning process, the target object can be located quickly and accurately based only on the two-dimensional features of the extracted feature points.
  • The object positioning method is not only simple to operate but also highly stable and accurate.
  • FIG. 1 is an application environment diagram of an object positioning method in an embodiment
  • FIG. 3 is a flowchart of a method for creating a bag of words in an embodiment
  • FIG. 4 is a schematic diagram of marking points in an embodiment
  • FIG. 5 is a schematic diagram of setting areas in an embodiment
  • FIG. 6 is a flowchart of a method for three-dimensional reconstruction of feature points in an embodiment
  • FIG. 7 is a schematic flowchart of positioning a target object in an embodiment
  • FIG. 8 is a structural block diagram of an object positioning device in an embodiment
  • FIG. 9 is an internal structure diagram of a computer device in an embodiment.
  • FIG. 1 is an application environment diagram of an object positioning method in an embodiment.
  • The object positioning method is applied to an object positioning system.
  • The object positioning system includes a terminal 110 and a server 120.
  • The terminal 110 obtains a target image by calling a camera to shoot the target object to be positioned, and uploads the target image to the server 120.
  • The server 120 performs feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point, searches the bag of words for target features matching each two-dimensional feature, and determines the three-dimensional point coordinates of the corresponding feature points according to the target features.
  • The bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates.
  • The server 120 then obtains the two-dimensional point coordinates of each feature point in the current camera coordinate system and determines the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • The determined positional relationship is sent to the terminal 110.
  • In another embodiment, the above object positioning method can be applied directly to the terminal 110.
  • The terminal 110 calls the camera to shoot the target object to be located to obtain a target image, and performs feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point.
  • It then searches the bag of words for target features matching each two-dimensional feature and determines the three-dimensional point coordinates of the corresponding feature points according to the target features; the bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates.
  • Finally, the terminal obtains the two-dimensional point coordinates of each feature point in the current camera coordinate system and determines the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • As shown in FIG. 2, an object positioning method is proposed.
  • The object positioning method can be applied to both a terminal and a server.
  • In this embodiment, application to the terminal is taken as an example; the method specifically includes the following steps:
  • Step 202: Acquire a target image obtained by shooting the target object to be located.
  • The target object refers to the object to be located.
  • The target image refers to an image containing the target object, obtained by shooting the target object.
  • Specifically, the terminal shoots the target object by calling a camera to obtain the target image.
  • Step 204: Perform feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point.
  • Feature points are points on the target object in the target image, and the method of selecting feature points can be customized according to actual needs. In one embodiment, only the more salient points in the image are selected as feature points, for example the contour points of the target object; of course, all pixel points constituting the target object may also be used as feature points.
  • The target image obtained by shooting the target object is two-dimensional, so extracting features at the feature points of the target object yields two-dimensional features.
  • A two-dimensional feature is the feature corresponding to a feature point on the target object. Different feature points correspond to different two-dimensional features, so the two-dimensional feature can serve as the identifying signature of a feature point.
  • In one embodiment, the extracted two-dimensional feature is an ORB (Oriented FAST and Rotated BRIEF) feature, and the FAST (Features from Accelerated Segment Test) algorithm can be used to detect feature points.
  • In other embodiments, the extracted features may be HOG features or, of course, DOG features.
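  • As a concrete illustration, a minimal sketch of this extraction step is given below, using OpenCV's ORB implementation (an assumption for illustration; the patent does not name a specific library):

```python
import cv2

def extract_2d_features(target_image):
    """Detect feature points on the target object and compute their
    ORB descriptors (the 'two-dimensional features')."""
    gray = cv2.cvtColor(target_image, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=1000)  # ORB uses FAST internally for detection
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    # Each keypoint's pixel location is the 2D point coordinate; each
    # descriptor row is the 2D feature used for the bag-of-words lookup.
    points_2d = [kp.pt for kp in keypoints]
    return points_2d, descriptors
```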
  • Step 206: Search the bag of words for target features matching each two-dimensional feature, and determine the three-dimensional point coordinates of the corresponding feature points according to the target features. The bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates.
  • The bag of words is established by learning based on marker points; marker points are reference points used to assist in positioning the target object.
  • The bag of words stores the correspondence between the two-dimensional features of the feature points obtained after learning and the corresponding three-dimensional point coordinates. After the two-dimensional features of the feature points are determined, the target features matching them are found in the bag of words, and the corresponding three-dimensional point coordinates are then determined from the target features.
  • A target feature is a feature found in the bag of words that matches the given two-dimensional feature.
  • Since the two-dimensional features of the feature points and the corresponding three-dimensional point coordinates are stored in advance, once the two-dimensional features are extracted, the corresponding three-dimensional point coordinates can be found in the bag of words quickly, which improves the speed of object positioning.
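  • A sketch of this lookup is given below, assuming (hypothetically) that the bag of words is stored as an array of learned descriptors with a parallel array of three-dimensional point coordinates; the patent does not prescribe a storage format:

```python
import numpy as np
import cv2

def lookup_3d_points(descriptors, bag_descriptors, bag_points_3d, max_distance=50):
    """Match each extracted 2D feature against the stored bag of words and
    return the 3D point coordinates of the matched (target) features."""
    # Hamming norm suits binary ORB descriptors; cross-check filters asymmetric matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(descriptors, bag_descriptors)
    matches = [m for m in matches if m.distance < max_distance]  # reject weak matches
    query_idx = [m.queryIdx for m in matches]  # indices into the current image's features
    points_3d = np.float32([bag_points_3d[m.trainIdx] for m in matches])
    return query_idx, points_3d
```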
  • Step 208: Acquire the two-dimensional point coordinates of each feature point in the current camera coordinate system, and determine the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • The target image is obtained by shooting the target object under the current camera coordinate system.
  • From the two-dimensional point coordinates and the corresponding three-dimensional point coordinates, the positional relationship of the target object relative to the current camera coordinate system can be calculated through the camera's perspective projection model.
  • The positional relationship is generally expressed by a rotation matrix R and a translation matrix T.
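  • Recovering R and T from matched 2D-3D correspondences is the classic Perspective-n-Point (PnP) problem; the sketch below uses OpenCV's generic solver as one possible realization of the perspective projection model, not necessarily the patent's exact method:

```python
import numpy as np
import cv2

def locate_object(points_2d, points_3d, camera_matrix, dist_coeffs=None):
    """Recover the rotation matrix R and translation T of the target object
    relative to the current camera coordinate system from matched
    2D point coordinates and 3D point coordinates."""
    if dist_coeffs is None:
        dist_coeffs = np.zeros(5)  # assume an undistorted (or pre-rectified) image
    ok, rvec, tvec = cv2.solvePnP(
        np.float32(points_3d), np.float32(points_2d),
        camera_matrix, dist_coeffs)
    if not ok:
        raise RuntimeError("PnP failed: not enough reliable 2D-3D correspondences")
    R, _ = cv2.Rodrigues(rvec)  # convert rotation vector to a 3x3 rotation matrix
    return R, tvec
```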
  • The above object positioning method first learns the target object based on marker points, and stores the correspondence between the two-dimensional features of the learned feature points and the three-dimensional point coordinates in the bag of words.
  • In actual positioning, the two-dimensional features of the feature points of the target object are extracted, the three-dimensional point coordinates corresponding to each two-dimensional feature are found in the bag of words, and the positional relationship of the target object relative to the current camera coordinate system is then determined according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • The object positioning method establishes the bag of words based on marker points; in the actual positioning process, the target object can be located quickly and accurately based only on the two-dimensional features of the extracted feature points.
  • The object positioning method is not only simple to operate but also highly stable and accurate.
  • In one embodiment, before separately searching the bag of words for target features matching each two-dimensional feature, the method further includes establishing the bag of words, which includes the following steps:
  • Step 302: Acquire multiple video images containing the marker points and the target object, obtained by shooting the marker points and the target object.
  • A marker point is a reference point used to assist in positioning the target object.
  • In one embodiment, marker points are affixed to a drawing; FIG. 4 is a schematic diagram of the marker points in one embodiment, where the marker points are dots.
  • The second dot in FIG. 4 can be used as the origin, the direction from dot 2 to dot 3 as the X axis, the direction from dot 2 to dot 1 as the Y axis, and the cross product of the X axis and the Y axis as the Z axis.
  • The six specially arranged marker points shown in FIG. 4 allow the target object to be located more reliably.
  • The coordinates of the centers of the six dots are set in advance as 1(0,1,0), 2(0,0,0), 3(1,0,0), 4(-1,-1,0), 5(0,-1,0), and 6(1,-1,0). The target object is placed in the set target area, and the drawing with the marker points is placed in the target area. FIG. 5 is a schematic diagram of the set target area; the target area is a rectangular parallelepiped, and the coordinates of its eight vertices can be expressed in terms of P1 and three adjustable offsets:
  • P1 is a fixed value, determined by the drawing;
  • offset_x, offset_y, and offset_z can be freely adjusted according to the target object being learned.
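  • The sketch below encodes this setup, assuming (hypothetically) that the cuboid target area is axis-aligned in the marker coordinate system and spanned by the three offsets from the fixed corner P1; the exact vertex expressions do not survive in the text, so this is an illustrative reconstruction:

```python
import numpy as np

# Marker (dot) center coordinates in the marker coordinate system, as set in advance.
MARKER_POINTS_3D = np.float32([
    [0, 1, 0],    # dot 1
    [0, 0, 0],    # dot 2 (origin)
    [1, 0, 0],    # dot 3
    [-1, -1, 0],  # dot 4
    [0, -1, 0],   # dot 5
    [1, -1, 0],   # dot 6
])

def target_area_vertices(p1, offset_x, offset_y, offset_z):
    """Eight vertices of the cuboid target area, assuming an axis-aligned
    box spanned by the three offsets from the fixed corner P1."""
    p1 = np.float32(p1)
    dx = np.float32([offset_x, 0, 0])
    dy = np.float32([0, offset_y, 0])
    dz = np.float32([0, 0, offset_z])
    base = [p1, p1 + dx, p1 + dx + dy, p1 + dy]  # P1..P4 (bottom face)
    top = [v + dz for v in base]                 # P1'..P4' (top face)
    return np.float32(base + top)
```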
  • In one embodiment, the camera is a monocular camera.
  • Step 304: Determine the conversion relationship between the camera coordinate system corresponding to each video image and the marker coordinate system.
  • The conversion relationship refers to the positional relationship between the camera coordinate system and the marker point coordinate system.
  • The positional relationship can be represented by R and T, where R and T denote the rotation matrix and the translation matrix, respectively.
  • The conversion relationship is calculated using the following formula:
  • C_i = (RT)_i · M
  • where C_i is the two-dimensional coordinate in the i-th camera coordinate system, (RT)_i is the rotation-translation matrix, and M is the three-dimensional point coordinate in the corresponding marker point coordinate system.
  • The rotation matrix R and the translation matrix T between the camera coordinate system and the marker coordinate system are calculated by the above formula.
  • Step 306: Calculate, according to the conversion relationship, the transformation relationship between the camera coordinate system corresponding to each video image and the reference coordinate system.
  • The reference coordinate system is the coordinate system selected as the reference; the camera coordinate system corresponding to the first video frame can be chosen as the reference coordinate system. Once the conversion relationship between each camera coordinate system and the marker coordinate system is known, the conversion relationship between each camera coordinate system and the reference coordinate system can be calculated, that is, the positional relationships between the camera coordinate systems.
  • In one embodiment, the camera coordinate system of the first frame of the video is used as the reference coordinate system, and the transformation between each camera coordinate system and the reference coordinate system can be calculated through the transformation relationships between adjacent camera coordinate systems. For example, taking the coordinate system of C_1 as the reference coordinate system and knowing the transformations between adjacent coordinate systems such as C_1 and C_2, and C_2 and C_3, the transformation between each C_i and C_1 can be determined.
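  • Because every frame's pose relative to the shared marker coordinate system is known from Step 304, the transform between any camera frame and the reference frame can be obtained by composing homogeneous transforms; a minimal sketch, assuming R and T map marker coordinates into each camera frame:

```python
import numpy as np

def to_homogeneous(R, T):
    """Pack a rotation matrix and translation into a 4x4 homogeneous transform."""
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = np.asarray(T).ravel()
    return M

def camera_to_reference(R_i, T_i, R_ref, T_ref):
    """Transform taking points in camera frame i to the reference camera frame
    (the frame of the first video image), via the shared marker coordinate
    system: X_ref = T_ref_from_marker @ inv(T_i_from_marker) @ X_i."""
    T_marker_to_ref = to_homogeneous(R_ref, T_ref)
    T_marker_to_i = to_homogeneous(R_i, T_i)
    return T_marker_to_ref @ np.linalg.inv(T_marker_to_i)
```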
  • Step 308: Convert the coordinates of the feature points of the target object in each video image to the reference coordinate system according to the transformation relationship, obtaining the two-dimensional coordinates of the feature points of each video image in the reference coordinate system.
  • Here the transformation relationship refers to the positional transformation applied when converting coordinate points from a camera coordinate system to the reference coordinate system.
  • Step 310: Perform feature extraction on the target object in the video images to obtain the two-dimensional feature corresponding to each feature point.
  • That is, the two-dimensional feature corresponding to each feature point is extracted from each video image.
  • The two-dimensional feature can be the ORB feature, and correspondingly the FAST (Features from Accelerated Segment Test) algorithm can be used to detect and extract the feature points.
  • Step 312: Perform three-dimensional reconstruction of the target object according to the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points of each video image in the reference coordinate system, to obtain the corresponding three-dimensional point coordinates.
  • For the same feature point in different video images, the extracted two-dimensional features are the same, while different feature points correspond to different two-dimensional features; therefore, feature matching can determine which points in different video images correspond to the same feature point, forming matched feature points.
  • With the two-dimensional coordinates of the matched feature points in the reference coordinate system known, and combined with the camera's internal parameters, each feature point can be three-dimensionally reconstructed to obtain its three-dimensional point coordinates. Performing three-dimensional reconstruction on every feature point in this way completes the three-dimensional reconstruction of the target object.
  • Step 314: Associate and store the two-dimensional features of the feature points in the target object and the corresponding three-dimensional point coordinates, completing the establishment of the bag of words.
  • The three-dimensional point coordinates are reconstructed with respect to the reference coordinate system, so the stored three-dimensional point coordinates are likewise expressed in the reference coordinate system.
  • The two-dimensional features of the target object are associated with the corresponding three-dimensional point coordinates and stored, completing the establishment of the bag of words.
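  • A sketch of this association step, reusing the flat array layout assumed in the lookup sketch above (descriptor row i paired with three-dimensional point i); persisting with NumPy's .npz format is an illustrative choice, not the patent's:

```python
import numpy as np

def build_bag_of_words(descriptors, points_3d, path="bag_of_words.npz"):
    """Store each 2D feature (descriptor) together with its reconstructed
    3D point coordinates; row i of both arrays describes the same feature point."""
    descriptors = np.asarray(descriptors, dtype=np.uint8)  # binary ORB descriptors
    points_3d = np.float32(points_3d)
    assert len(descriptors) == len(points_3d)
    np.savez(path, descriptors=descriptors, points_3d=points_3d)

def load_bag_of_words(path="bag_of_words.npz"):
    data = np.load(path)
    return data["descriptors"], data["points_3d"]
```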
  • By positioning the target object with the aid of marker points, the above method of establishing the bag of words can determine the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates accurately, quickly, and stably.
  • In one embodiment, three-dimensionally reconstructing the target object according to the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points of each video image in the reference coordinate system, to obtain the three-dimensional point coordinates corresponding to each feature point after reconstruction, includes:
  • Step 312A: Match the feature points in different video images according to their two-dimensional features, determine the corresponding feature points across the different video images, and obtain the different two-dimensional coordinates, in the reference coordinate system, corresponding to the same feature point in the different video images.
  • The two-dimensional features corresponding to the same feature point in different video images are the same, so the same feature points in different video images can be determined by feature matching, after which the two-dimensional coordinates of that feature point in each video image, expressed in the reference coordinate system, are obtained separately. For example, if the two-dimensional feature of point A in the first video image is the same as the two-dimensional feature of point B in the second video image, then A and B correspond to the same feature point, and the two-dimensional coordinates of point A and of point B in the reference coordinate system are obtained separately.
  • Step 312B: Obtain the internal parameter matrix of the camera and the transformation relationships between the camera coordinate systems corresponding to the different video images.
  • The internal parameter matrix refers to the camera's intrinsic parameters; once the internal and external parameters of the camera are obtained, the three-dimensional coordinates of a spatial point can be calculated. The internal parameter matrix is fixed and can be obtained directly.
  • The external parameters refer to the positional relationships between the camera coordinate systems corresponding to the different video images, and the transformation relationship here refers to those positional relationships.
  • Step 312C: Perform three-dimensional reconstruction of the corresponding feature points according to the internal parameter matrix, the transformation relationship, and the different two-dimensional coordinates corresponding to the same feature point, obtaining the three-dimensional point coordinates of the feature points in the reference coordinate system.
  • Here the transformation relationship refers to the relative positional relationship between the camera coordinate systems, which can likewise be represented by a rotation matrix R and a translation matrix T.
  • Three-dimensional reconstruction of a feature point can be performed given the camera internal parameter matrix for each camera coordinate system, the different two-dimensional coordinates corresponding to the same matched point, and the transformation relationship. Specifically, suppose there are two video images, a first and a second, and the coordinate system corresponding to the first video image is taken as the reference coordinate system. The projection matrices of the cameras at the two positions are then:
  • P_1 = K_1·[I | 0], P_2 = K_2·[R | T]
  • where I is the identity matrix, K_1 and K_2 are the internal parameter matrices of the cameras, R is the relative rotation matrix between the two camera coordinate systems, and T is the translation between the two cameras.
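  • Given these two projection matrices, each matched feature point can be triangulated; the sketch below uses OpenCV's linear triangulation as one standard realization (the patent does not name a specific solver):

```python
import numpy as np
import cv2

def reconstruct_points(K1, K2, R, T, pts1, pts2):
    """Triangulate matched 2D observations from two video images into 3D
    point coordinates in the reference (first camera) coordinate system."""
    P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])     # P1 = K1 [I | 0]
    P2 = K2 @ np.hstack([R, np.asarray(T).reshape(3, 1)])  # P2 = K2 [R | T]
    pts1 = np.float32(pts1).T  # shape (2, N), as triangulatePoints expects
    pts2 = np.float32(pts2).T
    points_4d = cv2.triangulatePoints(P1, P2, pts1, pts2)
    return (points_4d[:3] / points_4d[3]).T  # dehomogenize to (N, 3)
```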
  • In one embodiment, determining the conversion relationship between the camera coordinate system corresponding to each video image and the marker point coordinate system includes: acquiring the three-dimensional point coordinates of the marker points in the marker point coordinate system; recognizing the marker points in the video image to determine the two-dimensional coordinates of the marker points in the camera coordinate system; and calculating the conversion relationship between the camera coordinate system and the marker point coordinate system according to the two-dimensional coordinates of the marker points in the camera coordinate system and their three-dimensional point coordinates in the marker point coordinate system.
  • The three-dimensional point coordinates of the marker points in the marker point coordinate system can be set in advance. The marker points in the video image are recognized to obtain their two-dimensional coordinates in the camera coordinate system. Once the two-dimensional coordinates of the marker points in the camera coordinate system and their three-dimensional point coordinates in the marker point coordinate system are determined, the conversion relationship between the camera coordinate system and the marker coordinate system can be calculated from the camera projection matrix equation. Specifically, the camera projection matrix equation is as follows:
  • s·[u, v, 1]^T = K·[R | T]·[X_W, Y_W, Z_W, 1]^T
  • where K is the camera internal parameter matrix, whose focal terms are α_x = f/dX and α_y = f/dY;
  • s is the scaling factor, dX and dY are the physical dimensions of a pixel, and f is the focal length;
  • R is the rotation matrix and T is the translation matrix;
  • (u, v) is the two-dimensional point coordinate in the video image, and (X_W, Y_W, Z_W) is its corresponding spatial physical coordinate. Since s, dX, dY, and f are known quantities, R and T can be calculated from multiple sets of two-dimensional point coordinates and three-dimensional point coordinates.
  • The number of point pairs required is determined by the number of unknown degrees of freedom in the rotation matrix and the translation matrix: if there are 4 unknown degrees of freedom, at least 4 pairs of coordinates are needed to solve for the corresponding rotation matrix and translation matrix.
  • In one embodiment, before feature extraction is performed on the target object in the video images, the method further includes: determining the segmentation position corresponding to the target object in each video image, and extracting the target object from the corresponding video image according to the segmentation position. After the target object has been extracted from a video image, the method proceeds to feature extraction on the target object in that video image to obtain the two-dimensional feature corresponding to each feature point.
  • That is, the target object in the video image needs to be extracted first.
  • To do so, the segmentation position corresponding to the target object in the video image is determined.
  • In one embodiment, the target object is placed in a rectangular parallelepiped whose vertices, as shown in FIG. 5, are P1, P2, P3, P4, P1', P2', P3', and P4'.
  • The vertices are projected onto the image plane according to the camera's perspective projection matrix, and the polygonal area obtained after projection is the segmentation position of the target object.
  • The target object can then be extracted according to the segmentation position, after which the feature extraction step is entered, as in the sketch below.
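  • A minimal sketch of this segmentation step, assuming the camera pose relative to the marker coordinate system from Step 304 and reusing the hypothetical target_area_vertices() helper defined earlier:

```python
import numpy as np
import cv2

def segment_target(image, vertices_3d, rvec, tvec, camera_matrix, dist_coeffs):
    """Project the eight cuboid vertices into the image and mask out
    everything outside their convex hull (the segmentation position)."""
    pts_2d, _ = cv2.projectPoints(vertices_3d, rvec, tvec, camera_matrix, dist_coeffs)
    polygon = cv2.convexHull(pts_2d.reshape(-1, 2).astype(np.int32))
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, polygon, 255)
    return cv2.bitwise_and(image, image, mask=mask)  # extracted target object region
```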
  • In one embodiment, acquiring the two-dimensional point coordinates of each feature point in the current camera coordinate system and determining the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates includes: converting the three-dimensional point coordinates to the coordinate system corresponding to the target object to obtain target three-dimensional coordinates; and calculating the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates in the current camera coordinate system and the target three-dimensional coordinates.
  • Since the three-dimensional point coordinates are obtained with respect to the reference coordinate system when the bag of words is established, in order to obtain the positional relationship between the target object coordinate system and the current camera coordinate system, the three-dimensional point coordinates need to be converted from the reference coordinate system to the target object coordinate system, yielding the target three-dimensional coordinates. Specifically, the origin is moved onto the target object: the feature points of the target object are centered, that is, the center of all points is computed and subtracted from every point. The positional relationship of the target object relative to the current camera coordinate system can then be calculated directly from the two-dimensional coordinates in the current camera coordinate system and the corresponding target three-dimensional coordinates.
  • In one embodiment, converting the three-dimensional point coordinates to the coordinate system corresponding to the target object to obtain the target three-dimensional coordinates includes: obtaining the three-dimensional point coordinates corresponding to each feature point in the target object; averaging the three-dimensional point coordinates of all feature points to obtain the average three-dimensional point coordinate; and subtracting the average three-dimensional point coordinate from the three-dimensional point coordinates of each feature point to obtain the corresponding target three-dimensional coordinates.
  • The target three-dimensional coordinates are the coordinates transferred into the target object coordinate system.
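  • In practice this centering is a one-line operation; a minimal sketch:

```python
import numpy as np

def to_object_coordinates(points_3d):
    """Convert reference-frame 3D point coordinates into the target object's
    coordinate system by subtracting the centroid of all feature points."""
    points_3d = np.float32(points_3d)
    centroid = points_3d.mean(axis=0)  # average 3D point coordinate
    return points_3d - centroid        # target 3D coordinates
```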
  • FIG. 7 is a schematic flowchart of positioning a target object. In the first step, the drawing containing the marker points is placed on a flat surface. In the second step, the target object is placed in the target placement area of the drawing. In the third step, video images containing the marker points and the target object are obtained through the camera. In the fourth step, the target object in the video images is segmented and the target object image is extracted. In the fifth step, feature extraction is performed on the target object image to obtain the two-dimensional features of the feature points. In the sixth step, the target object is three-dimensionally reconstructed according to the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points of each video image in the reference coordinate system, to obtain the corresponding three-dimensional point coordinates.
  • In the seventh step, the two-dimensional features of the feature points are associated with the corresponding three-dimensional point coordinates and stored, completing the establishment of the bag of words.
  • In the eighth step, the drawing is removed, the target object is placed on a flat surface, the target image containing the target object is captured by the camera, and feature extraction is performed on the target image to obtain the two-dimensional features of the feature points.
  • In the ninth step, the target features matching the two-dimensional features are found in the bag of words, and the corresponding three-dimensional point coordinates are obtained.
  • In the tenth step, the two-dimensional point coordinates of the feature points in the current camera coordinate system are obtained, and the pose of the target object relative to the current camera coordinate system is determined according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • As shown in FIG. 8, an object positioning apparatus is provided, which includes:
  • the first acquisition module 802, used to acquire a target image obtained by shooting the target object to be located; the first extraction module 804, used to perform feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point; and the search module 806, used to search the bag of words for target features matching each of the two-dimensional features and to determine the three-dimensional point coordinates of the corresponding feature points according to the target features.
  • The determination module 808 is used to obtain the two-dimensional point coordinates of each feature point in the current camera coordinate system, and to determine the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • In one embodiment, the above object positioning apparatus further includes: a second acquisition module, configured to acquire multiple video images containing the marker points and the target object, obtained by shooting the marker points and the target object; a conversion relationship determination module, used to determine the conversion relationship between the camera coordinate system corresponding to each video image and the marker point coordinate system; a calculation module, used to calculate, according to the conversion relationship, the transformation relationship between the camera coordinate system corresponding to each video image and the reference coordinate system; a transformation module, used to transform the coordinates of the feature points of the target object in each video image to the reference coordinate system according to the transformation relationship, obtaining the two-dimensional coordinates of the feature points of each video image in the reference coordinate system; a second extraction module, used to perform feature extraction on the target object in the video images to obtain the two-dimensional feature corresponding to each feature point; and a three-dimensional reconstruction module, used to three-dimensionally reconstruct the target object according to the two-dimensional features corresponding to each feature point and the two-dimensional coordinates of the feature points of each video image in the reference coordinate system, obtaining the corresponding three-dimensional point coordinates.
  • The three-dimensional reconstruction module is further used to match the feature points in different video images according to the two-dimensional features of the feature points, determine the corresponding feature points in the different video images, and obtain the different two-dimensional coordinates, in the reference coordinate system, corresponding to the same feature point; to obtain the internal parameter matrix of the camera and the transformation relationships between the camera coordinate systems corresponding to the different video images; and to three-dimensionally reconstruct the corresponding feature points according to the internal parameter matrix, the transformation relationship, and the different two-dimensional coordinates corresponding to the same feature point, obtaining the three-dimensional point coordinates of the feature points in the reference coordinate system.
  • The conversion relationship determination module is further used to obtain the three-dimensional point coordinates of the marker points in the marker point coordinate system; to recognize the marker points in the video image and determine the two-dimensional coordinates of the marker points in the camera coordinate system; and to calculate the conversion relationship between the camera coordinate system and the marker point coordinate system according to the two-dimensional coordinates of the marker points in the camera coordinate system and their three-dimensional point coordinates in the marker point coordinate system.
  • In one embodiment, the above object positioning apparatus further includes: a segmentation module, used to determine the segmentation position corresponding to the target object in each video image and to extract the target object from the corresponding video image according to the segmentation position; after the target object has been extracted from a video image, the feature extraction module is notified to proceed with feature extraction on the target object in that video image to obtain the two-dimensional feature corresponding to each feature point.
  • The determination module is further configured to convert the three-dimensional point coordinates to the coordinate system corresponding to the target object to obtain the target three-dimensional coordinates, and to calculate the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates in the current camera coordinate system and the target three-dimensional coordinates.
  • The determination module is further used to obtain the three-dimensional point coordinates corresponding to each feature point in the target object, average the three-dimensional point coordinates of all feature points to obtain the average three-dimensional point coordinate, and subtract the average three-dimensional point coordinate from the three-dimensional point coordinates of each feature point to obtain the corresponding target three-dimensional coordinates.
  • FIG. 9 shows an internal structure diagram of a computer device in an embodiment.
  • The computer device can be a terminal or a server.
  • The computer device includes a processor, a memory, and a network interface connected through a system bus.
  • The memory includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium of the computer device stores an operating system and may also store a computer program.
  • When the computer program is executed by the processor, the processor is caused to implement the object positioning method.
  • A computer program may also be stored in the internal memory; when that computer program is executed by the processor, the processor is caused to execute the object positioning method.
  • The network interface is used to communicate with the outside world.
  • Those skilled in the art will understand that the structure shown in FIG. 9 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • In one embodiment, the object positioning method provided by the present application may be implemented in the form of a computer program, and the computer program may run on the computer device shown in FIG. 9.
  • The various program modules constituting the object positioning apparatus can be stored in the memory of the computer device.
  • In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to perform the following steps: acquiring a target image obtained by shooting the target object to be located; performing feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point; searching the bag of words for target features matching each of the two-dimensional features and determining the three-dimensional point coordinates of the corresponding feature points according to the target features, where the bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates; and obtaining the two-dimensional point coordinates of each feature point in the current camera coordinate system and determining the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to perform the following steps: acquiring a target image obtained by shooting the target object to be located; performing feature extraction on the target object in the target image to obtain the two-dimensional features corresponding to each feature point; searching the bag of words for target features matching each of the two-dimensional features and determining the three-dimensional point coordinates of the corresponding feature points according to the target features, where the bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates; and obtaining the two-dimensional point coordinates of each feature point in the current camera coordinate system and determining the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous chain (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Image Analysis (AREA)

Abstract

An object positioning method, comprising: obtaining a target image obtained by photographing a target object to be positioned (202); performing feature extraction on the target object in the target image to obtain a two-dimensional feature corresponding to each feature point (204); searching a bag of words for a target feature matching each two-dimensional feature, and determining the three-dimensional point coordinates of the corresponding feature point according to the target feature, the bag of words being established by learning on the basis of marker points and storing the correspondence between the two-dimensional features and the three-dimensional point coordinates of the feature points in the target object (206); and obtaining the two-dimensional point coordinates of each feature point in a current camera coordinate system, and determining the positional relationship of the target object with respect to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates (208). The object positioning method is simple in operation and high in stability and accuracy. Also provided are an object positioning apparatus, a computer device, and a storage medium.

Description

Object positioning method, apparatus, computer device, and storage medium
Technical Field
The present invention relates to the field of computer processing, and in particular to an object positioning method, apparatus, computer device, and storage medium.
Background
The positioning of an arbitrary object in space falls within the field of AR (Augmented Reality). Object positioning determines the positional relationship between the spatial coordinate system of the target object and the camera coordinate system. Current monocular visual target positioning methods are divided into marker-point methods and markerless methods, according to whether marker points are used. Marker-point positioning determines the position of the target object by locating the marker points, which has limitations in practical applications, while markerless positioning relies on the features of the target object itself and is easily affected by external environmental factors, giving low stability and low accuracy.
Therefore, in view of the above problems, there is an urgent need for an object positioning solution with a wide application range and high stability and accuracy.
Summary of the Invention
Based on this, it is necessary to provide, in view of the above problems, an object positioning method, apparatus, computer device, and storage medium with a wide application range and high stability and accuracy.
In a first aspect, an embodiment of the present invention provides an object positioning method. The method includes:
acquiring a target image obtained by shooting the target object to be located;
performing feature extraction on the target object in the target image to obtain a two-dimensional feature corresponding to each feature point;
searching a bag of words for target features matching each of the two-dimensional features, and determining the three-dimensional point coordinates of the corresponding feature points according to the target features, where the bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates;
acquiring the two-dimensional point coordinates of each feature point in the current camera coordinate system, and determining the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
In a second aspect, an embodiment of the present invention provides an object positioning apparatus. The apparatus includes:
a first acquisition module, used to acquire a target image obtained by shooting the target object to be located;
a first extraction module, used to perform feature extraction on the target object in the target image to obtain a two-dimensional feature corresponding to each feature point;
a searching module, used to search a bag of words for target features matching each of the two-dimensional features and to determine the three-dimensional point coordinates of the corresponding feature points according to the target features, where the bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates;
a determining module, used to obtain the two-dimensional point coordinates of each feature point in the current camera coordinate system and to determine the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
In a third aspect, an embodiment of the present invention provides a computer device including a memory and a processor. The memory stores a computer program which, when executed by the processor, causes the processor to perform the following steps:
acquiring a target image obtained by shooting the target object to be located;
performing feature extraction on the target object in the target image to obtain a two-dimensional feature corresponding to each feature point;
searching a bag of words for target features matching each of the two-dimensional features, and determining the three-dimensional point coordinates of the corresponding feature points according to the target features, where the bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates;
acquiring the two-dimensional point coordinates of each feature point in the current camera coordinate system, and determining the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
acquiring a target image obtained by shooting the target object to be located;
performing feature extraction on the target object in the target image to obtain a two-dimensional feature corresponding to each feature point;
searching a bag of words for target features matching each of the two-dimensional features, and determining the three-dimensional point coordinates of the corresponding feature points according to the target features, where the bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates;
acquiring the two-dimensional point coordinates of each feature point in the current camera coordinate system, and determining the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
With the above object positioning method, apparatus, computer device, and storage medium, a bag of words is first established based on marker points; in the actual positioning process, the target object can be located quickly and accurately based only on the two-dimensional features of the extracted feature points. The object positioning method is not only simple to operate but also highly stable and accurate.
Brief Description of the Drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from the structures shown in these drawings without creative effort.
FIG. 1 is an application environment diagram of an object positioning method in an embodiment;
FIG. 2 is a flowchart of an object positioning method in an embodiment;
FIG. 3 is a flowchart of a method for establishing a bag of words in an embodiment;
FIG. 4 is a schematic diagram of marker points in an embodiment;
FIG. 5 is a schematic diagram of the set target area in an embodiment;
FIG. 6 is a flowchart of a method for three-dimensional reconstruction of feature points in an embodiment;
FIG. 7 is a schematic flowchart of positioning a target object in an embodiment;
FIG. 8 is a structural block diagram of an object positioning apparatus in an embodiment;
FIG. 9 is an internal structure diagram of a computer device in an embodiment.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it.
图1为一个实施例中物体定位方法的应用环境图。参照图1,该物体定位方法应用于物体定位系统。该物体定位系统包括终端110和服务器120,终端110通过调用摄像头对待定位的目标物体拍摄得到目标图像,将目标图像上传到服务器120,服务器120对目标图像中的目标物体进行特征提取,得到每个特征点对应的二维特征,在词袋中分别查找与每个二维特征匹配的目标特征,根据目标特征确定相应的特征点对应三维点坐标,词袋是基于标志点进行学习建立的,存储了目标物体中的特征点的二维特征和三维点坐标之间的对应关系;获取每个特征点在当前相机坐标系中的二维点坐标,根据二维点坐标和三维点坐标确定目标物体相对于当前相机坐标系的位置关系,将确定的位置关系发送到终端110。FIG. 1 is an application environment diagram of an object positioning method in an embodiment. Referring to FIG. 1, the object positioning method is applied to an object positioning system. The object positioning system includes a terminal 110 and a server 120. The terminal 110 obtains a target image by calling a camera to capture the target object to be positioned, and uploads the target image to the server 120. The server 120 performs feature extraction on the target object in the target image to obtain each For the two-dimensional features corresponding to the feature points, find the target features matching each two-dimensional feature in the bag of words, and determine the corresponding feature points corresponding to the three-dimensional point coordinates according to the target features. The bag of words is learned based on the marker points and stored The correspondence between the two-dimensional features and the three-dimensional point coordinates of the feature points in the target object is obtained; the two-dimensional point coordinates of each feature point in the current camera coordinate system are obtained, and the target object is determined according to the two-dimensional point coordinates and the three-dimensional point coordinates The determined positional relationship is sent to the terminal 110 with respect to the positional relationship of the current camera coordinate system.
在另一个实施例中,上述物体定位方法可以直接应用于终端110,终端110调用摄像头对待定位的目标物体进行拍摄得到目标图像,对目标图像中的目标物体进行特征提取,得到每个特征点对应的二维特征,在词袋中分别查找与每个二维特征匹配的目标特征,根据目标特征确定相应的特征点对应三维点坐标,词袋是基于标志点进行学习建立的,存储了目标物体中的特征点的二维特征和三维点坐标之间的对应关系;获取每个特征点在当前相机坐标系中的二维点坐标,根据二维点坐标和三维点坐标确定目标物体相对于当前相机坐标系的位置关系。In another embodiment, the above object positioning method can be directly applied to the terminal 110. The terminal 110 calls the camera to take a picture of the target object to be located to obtain a target image, and extracts the feature of the target object in the target image to obtain the correspondence of each feature point The two-dimensional features of the search for the target features that match each two-dimensional feature in the bag of words, and determine the corresponding feature points corresponding to the coordinates of the three-dimensional points according to the target features. The bag of words is learned based on the marker points and stores the target object. Correspondence between the two-dimensional features and the three-dimensional point coordinates of the feature points in the; obtain the two-dimensional point coordinates of each feature point in the current camera coordinate system, determine the target object relative to the current The positional relationship of the camera coordinate system.
As shown in FIG. 2, an object positioning method is proposed. The method can be applied to either a terminal or a server; in this embodiment, application on a terminal is taken as an example. The method specifically includes the following steps:
Step 202: Acquire a target image obtained by photographing the target object to be positioned.
Here, the target object is the object to be positioned, and the target image is an image containing the target object obtained by photographing it. Specifically, the terminal invokes a camera to photograph the target object and obtains the target image.
Step 204: Perform feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point.
A feature point is a point on the target object in the target image, and the method of selecting feature points can be customized according to actual needs. In one embodiment, only the more salient points in the image are selected as feature points, for example the contour points of the target object; alternatively, all the pixels that make up the target object may be used as feature points. The target image obtained by photographing the target object is two-dimensional, so feature extraction on the feature points of the target object yields two-dimensional features. A two-dimensional feature is the feature corresponding to a feature point on the target object; different feature points correspond to different two-dimensional features, so the two-dimensional feature can serve as the identifying signature of a feature point.
In one embodiment, the extracted two-dimensional features are ORB (Oriented FAST and Rotated BRIEF) features, and the FAST (Features from Accelerated Segment Test) algorithm can be used to detect the feature points. In other embodiments, the extracted features may be HOG features or DOG features.
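By way of illustration, this extraction step might look as follows in Python with OpenCV; the embodiment does not prescribe a particular library, and the file name and parameter values here are assumptions:

```python
import cv2

# Hypothetical input image of the target object (grayscale for ORB).
image = cv2.imread("target.png", cv2.IMREAD_GRAYSCALE)

# ORB uses FAST internally to detect keypoints, then computes rotated BRIEF descriptors.
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(image, None)

# 'keypoints' carry the 2D pixel positions of the feature points;
# 'descriptors' (N x 32, uint8) are the binary two-dimensional features.
```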
Step 206: Search the bag of words for the target feature matching each two-dimensional feature, and determine the three-dimensional point coordinates of the corresponding feature point according to the matched target feature. The bag of words is built by learning based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and their three-dimensional point coordinates.
Here, the bag of words is built by learning based on marker points, where a marker point is a reference point used to assist in positioning the target object. The bag of words stores the correspondence, obtained through learning, between the two-dimensional feature of each feature point and the corresponding three-dimensional point coordinates. After the two-dimensional feature of a feature point is determined, the target feature matching that two-dimensional feature is looked up in the bag of words, and the corresponding three-dimensional point coordinates are then determined from the target feature. The target feature is the feature found in the bag of words that matches the two-dimensional feature. Because the two-dimensional features of the feature points and the corresponding three-dimensional point coordinates are stored in advance, once a feature point's two-dimensional feature has been extracted, the corresponding three-dimensional point coordinates can be quickly looked up in the bag of words, which speeds up object positioning.
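The embodiment does not fix a concrete data structure for the bag of words. One minimal sketch, assuming the learned descriptors are simply stored row-aligned with their three-dimensional point coordinates, is a brute-force Hamming-distance match (file names are hypothetical):

```python
import cv2
import numpy as np

# Hypothetical stored bag of words: one learned ORB descriptor per feature point,
# row-aligned with the 3D point coordinates reconstructed during learning.
stored_descriptors = np.load("bag_descriptors.npy")  # (M, 32) uint8
stored_points_3d = np.load("bag_points3d.npy")       # (M, 3) float32

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(descriptors, stored_descriptors)  # query descriptors from the target image

# Each match links a query feature point to the 3D point stored for its target feature.
points_2d = np.float32([keypoints[m.queryIdx].pt for m in matches])
points_3d = np.float32([stored_points_3d[m.trainIdx] for m in matches])
```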
Step 208: Acquire the two-dimensional point coordinates of each feature point in the current camera coordinate system, and determine the positional relationship of the target object relative to the current camera coordinate system from the two-dimensional point coordinates and the three-dimensional point coordinates.
Here, the target image is obtained by photographing the target object in the current camera coordinate system. Once the two-dimensional point coordinates of the target object's feature points have been acquired, the positional relationship of the target object relative to the current camera coordinate system can be computed from the two-dimensional point coordinates and the corresponding three-dimensional point coordinates through the camera's perspective projection model. The positional relationship is generally expressed by a rotation matrix R and a translation matrix T.
In one embodiment, because the coordinate system of the obtained three-dimensional point coordinates may not coincide with the coordinate system of the target object, after the three-dimensional point coordinates are obtained they are further converted into the target object's coordinate system to obtain target three-dimensional coordinates, and the positional relationship of the target object relative to the current camera coordinate system is then obtained from the two-dimensional point coordinates and the target three-dimensional coordinates. In one embodiment, the camera perspective projection can be expressed by the formula C = f(RT) * M, where C is the two-dimensional coordinate of an image feature point, M is the three-dimensional point coordinate of the corresponding feature point, and f(RT) is a function with R and T as variables. With C and M known, the rotation matrix R and the translation matrix T can be solved.
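One standard way to solve for R and T from such 2D-3D correspondences is a perspective-n-point solver; the patent describes the projection relation but does not name a solver, so the sketch below, using OpenCV's solvePnP with made-up intrinsics standing in for a real calibration, is only an illustration:

```python
import cv2
import numpy as np

# Hypothetical camera intrinsics; in practice these come from calibration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)  # assume negligible lens distortion

# points_3d / points_2d: the matched correspondences from the bag-of-words lookup above.
ok, rvec, tvec = cv2.solvePnP(points_3d, points_2d, K, dist)
R, _ = cv2.Rodrigues(rvec)  # rotation matrix R; tvec is the translation T
```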
With the above object positioning method, the target object is first learned based on marker points, and the learned correspondence between the two-dimensional features of the feature points in the target object and their three-dimensional point coordinates is stored in a bag of words. When positioning the target object, the two-dimensional features of the target object's feature points are extracted, the three-dimensional point coordinates corresponding to each two-dimensional feature are looked up in the bag of words, and the positional relationship of the target object relative to the current camera coordinate system is then determined from the two-dimensional point coordinates and the three-dimensional point coordinates. The method builds the bag of words with the help of marker points, yet during actual positioning only the two-dimensional features of the extracted feature points are needed to locate the target object quickly and accurately. The method is simple to operate and offers high stability and accuracy.
As shown in FIG. 3, in one embodiment, before the target features matching each two-dimensional feature are searched for in the bag of words, the method further includes building the bag of words, which includes the following steps:
Step 302: Acquire multiple video images, each containing the marker points and the target object, obtained by photographing the marker points and the target object.
Here, a marker point is a reference point used to assist in positioning the target object. Marker points are usually affixed to a sheet of drawing paper. FIG. 4 is a schematic diagram of the marker points in one embodiment, in which the marker points are circular dots. The coordinates of the marker points are preset. Referring to FIG. 4, the second dot can be taken as the origin, the direction from dot 2 to dot 3 as the X axis, the direction from dot 2 to dot 1 as the Y axis, and the cross product of the X and Y axes as the Z axis, with the center coordinates of each dot preset. The particular six-dot marker layout shown in FIG. 4 allows the target object to be positioned more reliably.
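As a sketch of how such a frame could be built from three detected dot centers (the function name is illustrative, not from the patent):

```python
import numpy as np

def marker_axes(p1, p2, p3):
    """Build the marker coordinate frame described above: dot 2 is the origin,
    dot 2 -> dot 3 gives the X axis, dot 2 -> dot 1 gives the Y axis, and
    Z is the cross product of X and Y."""
    x = (p3 - p2) / np.linalg.norm(p3 - p2)
    y = (p1 - p2) / np.linalg.norm(p1 - p2)
    z = np.cross(x, y)
    return np.stack([x, y, z], axis=1), p2  # axes as columns, plus the origin
```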
In one embodiment, the centers of the six circles are preset as 1(0,1,0), 2(0,0,0), 3(1,0,0), 4(-1,-1,0), 5(0,-1,0), and 6(1,-1,0). The target object is placed in a set target region, and the sheet with the marker points is placed in that region. FIG. 5 is a schematic diagram of the set target region; the region is a rectangular cuboid whose eight vertex coordinates can be expressed as follows:
P1(x', y', 0), P1'(x', y', offset_z), P2(x', y'+offset_y, 0), P2'(x', y'+offset_y, offset_z), P3(x'+offset_x, y', 0), P3'(x'+offset_x, y', offset_z), P4(x'+offset_x, y'+offset_y, 0), P4'(x'+offset_x, y'+offset_y, offset_z). Here P1 is a fixed value determined by the drawing sheet, while offset_x, offset_y, and offset_z can be adjusted freely according to the target object being learned. The marker points are also placed in this target region, and the camera is moved around the marker points and the target object to take the shots; each shot ensures that the marker points and the target object appear in the camera's field of view at the same time, yielding multiple video images that contain both. In one embodiment, the camera is a monocular camera.
Step 304: Determine the conversion relationship between the camera coordinate system corresponding to each video image and the marker point coordinate system.
Here, as the camera moves during shooting, the camera coordinate system changes continuously, so the conversion relationship between the camera coordinate system of each video image and the marker point coordinate system must be computed. The conversion relationship is the positional relationship between the camera coordinate system and the marker point coordinate system, and can be expressed by R and T, which denote the rotation matrix and the translation matrix respectively.
In one embodiment, the conversion relationship is computed as follows. Let the conversion relationship between the camera and the marker point coordinate system be (RT)_i, so that C_i = (RT)_i M, where C_i is a point in the camera coordinate system, M is a point in the marker point coordinate system, and (RT)_i is the rotation-translation matrix. Here C_i is a two-dimensional coordinate in the camera coordinate system and M is the corresponding three-dimensional point coordinate in the marker point coordinate system. The rotation matrix R and translation matrix T between the camera coordinate system and the marker point coordinate system are obtained from this formula.
Step 306: Compute, from the conversion relationships, the transformation relationship between the camera coordinate system corresponding to each video image and the reference coordinate system.
Here, the reference coordinate system is a coordinate system chosen as the reference; the camera coordinate system corresponding to the first video frame can be chosen as the reference coordinate system. Once the conversion relationship between each camera coordinate system and the marker point coordinate system is known, the transformation relationship between each camera coordinate system and the reference coordinate system, i.e., the positional relationship between camera coordinate systems, can be computed. In one embodiment, from the conversion relationship C_i = (RT)_i M between the camera and the marker point coordinate system, the coordinate transformation between two adjacent camera positions follows as

C_{i+1} = (RT)_{i+1} (RT)_i^{-1} C_i.

In one embodiment, the camera coordinate system of the first video frame is taken as the reference coordinate system, and the transformation relationship between each camera coordinate system and the reference coordinate system can be computed by chaining the transformations between adjacent camera coordinate systems. For example, taking the coordinate system of C_1 as the reference coordinate system, the transformations between adjacent coordinate systems such as C_1 and C_2, C_2 and C_3, and so on are known, so the transformation between each C_i and C_1 can be determined.
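A compact way to realize this chaining, assuming each (R_i, T_i) pair maps marker coordinates into the i-th camera frame, is to work with 4x4 homogeneous transforms; this is a sketch, not the patent's prescribed implementation:

```python
import numpy as np

def to_homogeneous(R, T):
    """Pack a rotation matrix and translation vector into a 4x4 transform."""
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = np.ravel(T)
    return M

def camera_i_to_reference(R1, T1, Ri, Ti):
    """Carry a point from camera frame i into the reference frame (camera 1),
    matching C_1 = (RT)_1 (RT)_i^{-1} C_i."""
    return to_homogeneous(R1, T1) @ np.linalg.inv(to_homogeneous(Ri, Ti))
```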
Step 308: Convert the coordinates of the feature points of the target object in each video image into the reference coordinate system according to the transformation relationships, obtaining the two-dimensional coordinates of the feature points of each video image in the reference coordinate system.
Here, the transformation relationship is the positional transformation that carries a coordinate point in a camera coordinate system into the reference coordinate system. After the transformation between each video image's camera coordinate system and the reference coordinate system has been computed, the coordinates of the target object's feature points are all converted into the reference coordinate system, yielding the two-dimensional coordinate point corresponding to each feature point of each video image after conversion.
Step 310: Perform feature extraction on the target object in the video images to obtain the two-dimensional feature corresponding to each feature point.
Here, the two-dimensional feature corresponding to each feature point is extracted from the video images. The two-dimensional features may be ORB features, and correspondingly the FAST (Features from Accelerated Segment Test) algorithm may be used to detect the feature points before the features are extracted.
Step 312: Perform three-dimensional reconstruction of the target object from the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points of each video image in the reference coordinate system, obtaining the three-dimensional point coordinates corresponding to each feature point after reconstruction.
Here, although different video images are based on different camera coordinate systems, the two-dimensional feature extracted for the same feature point on the target object is the same, and different feature points have different two-dimensional features. Feature matching can therefore determine which points in different video images correspond to the same feature point, forming matched feature points. Once the two-dimensional coordinates of the matched feature points in the reference coordinate system are known, the feature point can be reconstructed in three dimensions in combination with the camera's internal parameters, yielding its three-dimensional point coordinates. Performing this reconstruction for every feature point completes the three-dimensional reconstruction of the target object.
Step 314: Store the two-dimensional features of the feature points in the target object in association with the corresponding three-dimensional point coordinates, completing the building of the bag of words.
Here, the three-dimensional point coordinates are reconstructed relative to the reference coordinate system, so the stored three-dimensional point coordinates are also determined in the reference coordinate system. After the three-dimensional point coordinates corresponding to the feature points on the target object have been determined, the feature points of the target object are stored in association with their three-dimensional point coordinates, completing the building of the bag of words. By positioning the target object with the help of marker points, this way of building the bag of words determines the correspondence between the two-dimensional features of the feature points in the target object and their three-dimensional point coordinates accurately, quickly, and stably.
As shown in FIG. 6, in one embodiment, performing three-dimensional reconstruction of the target object from the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points of each video image in the reference coordinate system, to obtain the three-dimensional point coordinates corresponding to each feature point after reconstruction, includes:
Step 312A: Match the feature points of different video images according to their two-dimensional features, determine which points in different video images correspond to the same feature point, and acquire the different two-dimensional coordinates that the same feature point has in the reference coordinate system across the different video images.
Here, the same feature point has the same two-dimensional feature in different video images, so feature matching can determine which points in different video images correspond to the same feature point; the two-dimensional coordinates of that feature point in the reference coordinate system are then acquired separately for each video image. For example, if point A in the first video image and point B in the second video image have the same two-dimensional feature, then A and B correspond to the same feature point, and the two-dimensional coordinates of A and of B in the reference coordinate system are acquired separately.
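A sketch of this cross-image matching with OpenCV's brute-force matcher; the ratio test below is a common robustness heuristic added here for illustration, not a step named by the patent:

```python
import cv2

# descriptors_a/keypoints_a and descriptors_b/keypoints_b: ORB output for two video images.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
knn = matcher.knnMatch(descriptors_a, descriptors_b, k=2)

# Keep a pair only when the best match is clearly better than the second best,
# so both views are likely seeing the same feature point.
good = [p[0] for p in knn if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
pts_a = [keypoints_a[m.queryIdx].pt for m in good]
pts_b = [keypoints_b[m.trainIdx].pt for m in good]
```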
Step 312B: Acquire the camera's internal parameter matrix and the transformation relationships between the camera coordinate systems corresponding to the different video images.
Here, the internal parameter matrix is the camera's intrinsic parameter matrix; once the camera's internal and external parameters are available, the three-dimensional coordinates of a spatial point can be computed. The internal parameter matrix is fixed and can be acquired directly. The external parameters are the positional relationships between the camera coordinate systems corresponding to different video images, and the transformation relationship here is exactly that positional relationship.
Step 312C: Perform three-dimensional reconstruction of the corresponding feature point from the internal parameter matrix, the transformation relationship, and the different two-dimensional coordinates corresponding to the same feature point, obtaining the three-dimensional point coordinates of the feature point in the reference coordinate system.
Here, the transformation relationship is the relative positional relationship between camera coordinate systems, likewise expressed by a rotation matrix R and a translation matrix T. With the internal parameter matrix of each camera coordinate system known, together with the different two-dimensional coordinates and the transformation relationship corresponding to the same matched point, the feature point can be reconstructed in three dimensions. Specifically, suppose there are two video images, a first and a second, and take the coordinate system corresponding to the first video image as the reference coordinate system; the projection matrices of the cameras at the two positions are then:
M_1 = K_1 [I | 0],  M_2 = K_2 [R | T]
where I is the identity matrix, K_1 and K_2 are the camera's internal parameter matrices at the two positions, R is the relative rotation matrix between the two camera coordinate systems, and T is the translation matrix between the two cameras. Suppose x and x' are a pair of matched points in the two video images, i.e., they correspond to the same feature point, and let X be the coordinates of the corresponding spatial point; then the relations between them can be expressed as x = M_1 X and x' = M_2 X. Solving these relations yields the three-dimensional point coordinates of the feature point in the reference coordinate system.
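For illustration, the two projection matrices and the triangulation can be set up with OpenCV as below; K1, K2, R, T, pts1, and pts2 are assumed inputs in the sense just defined:

```python
import cv2
import numpy as np

# Projection matrices as in the text: M1 = K1 [I | 0], M2 = K2 [R | T].
M1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = K2 @ np.hstack([R, np.reshape(T, (3, 1))])

# pts1, pts2: matched 2D points in the two video images, shaped (2, N), float.
X_h = cv2.triangulatePoints(M1, M2, pts1, pts2)  # homogeneous 4 x N result
X = (X_h[:3] / X_h[3]).T  # 3D point coordinates in the reference coordinate system
```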
In one embodiment, determining the conversion relationship between the camera coordinate system corresponding to each video image and the marker point coordinate system includes: acquiring the three-dimensional point coordinates of the marker points in the marker point coordinate system; recognizing the marker points in the video image and determining their two-dimensional coordinates in the camera coordinate system; and computing the conversion relationship between the camera coordinate system and the marker point coordinate system from the marker points' two-dimensional coordinates in the camera coordinate system and their three-dimensional point coordinates in the marker point coordinate system.
Here, the three-dimensional point coordinates can be preset in the marker point coordinate system. Recognition is performed on the marker points in the video image to obtain their two-dimensional coordinates in the camera coordinate system. Once the marker points' two-dimensional coordinates in the camera coordinate system and their three-dimensional point coordinates in the marker point coordinate system have been determined, the conversion relationship between the camera coordinate system and the marker point coordinate system can be computed from the camera projection matrix equation. Specifically, the camera projection matrix equation is as follows:
$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \begin{bmatrix} \alpha_x & 0 & u_0 & 0 \\ 0 & \alpha_y & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} R & T \\ 0^{T} & 1 \end{bmatrix}
\begin{bmatrix} X_W \\ Y_W \\ Z_W \\ 1 \end{bmatrix}
$$
where s is the scale factor, dX and dY are the physical dimensions of a pixel, f is the focal length, R is the rotation matrix, T is the translation matrix, α_x = f/dX and α_y = f/dY, (u_0, v_0) is the principal point, (u, v) are the two-dimensional point coordinates in the video image, and (X_W, Y_W, Z_W) are the corresponding spatial physical coordinates. Since s, dX, dY, and f are known quantities, R and T can be computed from several pairs of two-dimensional and three-dimensional point coordinates. The number of pairs is determined by the number of unknown degrees of freedom in the rotation and translation matrices; if there are four unknown degrees of freedom, at least four coordinate pairs are needed to compute the corresponding rotation matrix and translation matrix.
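With the six preset circle centers from the embodiment above, one way to recover the marker-to-camera conversion for a video image is again a PnP solve; the patent only gives the projection equation, so the use of solvePnP here is an assumption:

```python
import cv2
import numpy as np

# Preset circle centers in the marker point coordinate system (from the embodiment above).
marker_points_3d = np.float32([[0, 1, 0], [0, 0, 0], [1, 0, 0],
                               [-1, -1, 0], [0, -1, 0], [1, -1, 0]])

# marker_points_2d: detected circle centers in this video image, (6, 2) float32;
# K and dist are the camera intrinsics and distortion as in the earlier sketches.
ok, rvec, tvec = cv2.solvePnP(marker_points_3d, marker_points_2d, K, dist)
R_i, _ = cv2.Rodrigues(rvec)  # (RT)_i: marker point coordinate system -> camera i
```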
In one embodiment, after the step of photographing the marker points and the target object to obtain multiple video images containing both, the method further includes: determining the segmentation position corresponding to the target object in each video image and extracting the target object from the corresponding video image according to the segmentation position; once the target object has been extracted from the video image, the method proceeds to the step of performing feature extraction on the target object in the video image to obtain the two-dimensional feature corresponding to each feature point.
Here, to filter out non-target interference elsewhere in the scene, the target object must be extracted from the video image. First, the segmentation position corresponding to the target object in the video image is determined. In one embodiment, the target object is placed inside a rectangular cuboid whose vertices, as shown in FIG. 5, are P1, P2, P3, P4, P1', P2', P3', and P4'. Segmentation projects these eight vertices onto the image plane according to the camera's perspective projection matrix, and the polygonal region obtained after projection is the segmentation position of the target object. After the segmentation position has been determined, the target object can be extracted according to it, and the feature extraction step follows.
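A sketch of this segmentation, assuming the marker-to-camera pose (rvec, tvec) and intrinsics (K, dist) from the sketches above, with the eight box vertices expressed in the marker point coordinate system:

```python
import cv2
import numpy as np

# box_vertices: the eight points P1..P4' in the marker point coordinate system, (8, 3) float32.
img_pts, _ = cv2.projectPoints(box_vertices, rvec, tvec, K, dist)
hull = cv2.convexHull(img_pts.reshape(-1, 2).astype(np.int32))

# Keep only the pixels inside the projected polygonal region.
mask = np.zeros(image.shape[:2], dtype=np.uint8)
cv2.fillConvexPoly(mask, hull, 255)
target_only = cv2.bitwise_and(image, image, mask=mask)
```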
In one embodiment, acquiring the two-dimensional point coordinates of each feature point in the current camera coordinate system and determining the positional relationship of the target object relative to the current camera coordinate system from the two-dimensional point coordinates and the three-dimensional point coordinates includes: converting the three-dimensional point coordinates into the coordinate system corresponding to the target object to obtain target three-dimensional coordinates; and computing the positional relationship of the target object relative to the current camera coordinate system from the two-dimensional point coordinates in the current camera coordinate system and the target three-dimensional coordinates.
Here, because the acquired three-dimensional point coordinates are based on the reference coordinate system used when the bag of words was built, obtaining the positional relationship between the target object coordinate system and the current camera coordinate system requires converting the three-dimensional point coordinates from the reference coordinate system into the target object coordinate system, yielding the target three-dimensional coordinates. Specifically, the origin of the acquired three-dimensional point coordinates is moved onto the target object: the center of the target object's feature points is computed, and the center is then subtracted from every point. The positional relationship of the target object relative to the current camera coordinate system can then be computed directly from the two-dimensional coordinates in the current camera coordinate system and the corresponding target three-dimensional coordinates.
In one embodiment, converting the three-dimensional point coordinates into the coordinate system corresponding to the target object to obtain the target three-dimensional coordinates includes: acquiring the three-dimensional point coordinates corresponding to each feature point in the target object; averaging the three-dimensional point coordinates of all feature points to obtain an average three-dimensional point coordinate; and subtracting the average three-dimensional point coordinate from the three-dimensional point coordinates of each feature point to obtain the corresponding target three-dimensional coordinates.
Here, to convert the three-dimensional point coordinates into the coordinate system corresponding to the target object, the three-dimensional point coordinates of every feature point on the target object are acquired, the three-dimensional point coordinates of all feature points are averaged to obtain the average three-dimensional point coordinate, and finally the average is subtracted from each feature point's three-dimensional point coordinates to obtain the corresponding target three-dimensional coordinates. The target three-dimensional coordinates are the coordinates after transfer into the target object coordinate system.
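This centering step is a two-line operation in NumPy; points_3d here stands for all feature-point coordinates in the reference coordinate system:

```python
import numpy as np

center = points_3d.mean(axis=0)     # the average three-dimensional point coordinate
target_coords = points_3d - center  # origin moved onto the target object
```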
FIG. 7 shows a schematic flowchart of positioning a target object in one embodiment. In the first step, the sheet containing the marker points is placed on a flat surface. In the second step, the target object is placed in the target placement region of the sheet. In the third step, video images containing the marker points and the target object are captured with a camera. In the fourth step, the target object in the video images is segmented and the target object image is extracted. In the fifth step, feature extraction is performed on the target object image to obtain the two-dimensional features of the feature points. In the sixth step, the target object is reconstructed in three dimensions from the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points of each video image in the reference coordinate system, obtaining the three-dimensional point coordinates corresponding to each feature point after reconstruction. In the seventh step, the two-dimensional features of the feature points are stored in association with the corresponding three-dimensional point coordinates, completing the building of the bag of words. In the eighth step, the sheet is removed, the target object is placed on a flat surface, a target image containing the target object is captured with the camera, and feature extraction is performed on the target image to obtain the two-dimensional features of the feature points. In the ninth step, the target features corresponding to those two-dimensional features are matched in the bag of words, and the corresponding three-dimensional point coordinates are acquired. In the tenth step, the two-dimensional point coordinates of the feature points in the current camera coordinate system are acquired, and the pose of the target object relative to the current camera coordinate system is determined from the two-dimensional point coordinates and the three-dimensional point coordinates.
As shown in FIG. 8, in one embodiment, an object positioning apparatus is proposed, which includes:
a first acquisition module 802, configured to acquire a target image obtained by photographing the target object to be positioned; a first extraction module 804, configured to perform feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point; a search module 806, configured to search the bag of words for the target feature matching each two-dimensional feature and to determine the three-dimensional point coordinates of the corresponding feature point according to the matched target feature, the bag of words being built by learning based on marker points and storing the correspondence between the two-dimensional features of the feature points in the target object and their three-dimensional point coordinates; and a determination module 808, configured to acquire the two-dimensional point coordinates of each feature point in the current camera coordinate system and to determine the positional relationship of the target object relative to the current camera coordinate system from the two-dimensional point coordinates and the three-dimensional point coordinates.
In one embodiment, the above object positioning apparatus further includes: a second acquisition module, configured to acquire multiple video images, containing the marker points and the target object, obtained by photographing the marker points and the target object; a conversion relationship determination module, configured to determine the conversion relationship between the camera coordinate system corresponding to each video image and the marker point coordinate system; a computation module, configured to compute, from the conversion relationships, the transformation relationship between the camera coordinate system corresponding to each video image and the reference coordinate system; a conversion module, configured to convert the coordinates of the target object's feature points in each video image into the reference coordinate system according to the transformation relationships, obtaining the two-dimensional coordinates of the feature points of each video image in the reference coordinate system; a second extraction module, configured to perform feature extraction on the target object in the video images to obtain the two-dimensional feature corresponding to each feature point; a three-dimensional reconstruction module, configured to reconstruct the target object in three dimensions from the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points of each video image in the reference coordinate system, obtaining the three-dimensional point coordinates corresponding to each feature point after reconstruction; and a storage module, configured to store the two-dimensional features of the feature points in the target object in association with the corresponding three-dimensional point coordinates, completing the building of the bag of words.
In one embodiment, the three-dimensional reconstruction module is further configured to match the feature points of different video images according to their two-dimensional features, determine which points in different video images correspond to the same feature point, and acquire the different two-dimensional coordinates that the same feature point has in the reference coordinate system across the different video images; to acquire the camera's internal parameter matrix and the transformation relationships between the camera coordinate systems corresponding to the different video images; and to reconstruct the corresponding feature point in three dimensions from the internal parameter matrix, the transformation relationship, and the different two-dimensional coordinates corresponding to the same feature point, obtaining the three-dimensional point coordinates of the feature point in the reference coordinate system.
In one embodiment, the three-dimensional reconstruction module is further configured to acquire the three-dimensional point coordinates of the marker points in the marker point coordinate system; to recognize the marker points in the video image and determine their two-dimensional coordinates in the camera coordinate system; and to compute the conversion relationship between the camera coordinate system and the marker point coordinate system from the marker points' two-dimensional coordinates in the camera coordinate system and their three-dimensional point coordinates in the marker point coordinate system.
In one embodiment, the above object positioning apparatus further includes a segmentation module, configured to determine the segmentation position corresponding to the target object in each video image and to extract the target object from the corresponding video image according to the segmentation position; once the target object has been extracted from the video image, the segmentation module notifies the feature extraction module to perform feature extraction on the target object in the video image and obtain the two-dimensional feature corresponding to each feature point.
In one embodiment, the determination module is further configured to convert the three-dimensional point coordinates into the coordinate system corresponding to the target object to obtain target three-dimensional coordinates, and to compute the positional relationship of the target object relative to the current camera coordinate system from the two-dimensional point coordinates in the current camera coordinate system and the target three-dimensional coordinates.
In one embodiment, the determination module is further configured to acquire the three-dimensional point coordinates corresponding to each feature point in the target object; to average the three-dimensional point coordinates of all feature points to obtain an average three-dimensional point coordinate; and to subtract the average three-dimensional point coordinate from the three-dimensional point coordinates of each feature point to obtain the corresponding target three-dimensional coordinates.
FIG. 9 shows an internal structure diagram of a computer device in an embodiment. The computer device may be a terminal or a server. As shown in FIG. 9, the computer device includes a processor, a memory, and a network interface connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the object positioning method. A computer program may also be stored in the internal memory, and when executed by the processor it causes the processor to perform the object positioning method. The network interface is used to communicate with the outside. Those skilled in the art will understand that the structure shown in FIG. 9 is merely a block diagram of a partial structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, the object positioning method provided by this application can be implemented in the form of a computer program, and the computer program can run on the computer device shown in FIG. 9. The program modules that make up the object positioning apparatus, such as the first acquisition module 802, the first extraction module 804, the search module 806, and the determination module 808, can be stored in the memory of the computer device.
A computer device includes a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following steps: acquiring a target image obtained by photographing the target object to be positioned; performing feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point; searching the bag of words for the target feature matching each two-dimensional feature and determining the three-dimensional point coordinates of the corresponding feature point according to the matched target feature, the bag of words being built by learning based on marker points and storing the correspondence between the two-dimensional features of the feature points in the target object and their three-dimensional point coordinates; and acquiring the two-dimensional point coordinates of each feature point in the current camera coordinate system and determining the positional relationship of the target object relative to the current camera coordinate system from the two-dimensional point coordinates and the three-dimensional point coordinates.
A computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to perform the following steps: acquiring a target image obtained by photographing the target object to be positioned; performing feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point; searching the bag of words for the target feature matching each two-dimensional feature and determining the three-dimensional point coordinates of the corresponding feature point according to the matched target feature, the bag of words being built by learning based on marker points and storing the correspondence between the two-dimensional features of the feature points in the target object and their three-dimensional point coordinates; and acquiring the two-dimensional point coordinates of each feature point in the current camera coordinate system and determining the positional relationship of the target object relative to the current camera coordinate system from the two-dimensional point coordinates and the three-dimensional point coordinates.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium, and when executed it may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not therefore be understood as limiting the patent scope of this application. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (10)

  1. An object positioning method, characterized in that the method comprises:
    acquiring a target image obtained by photographing a target object to be positioned;
    performing feature extraction on the target object in the target image to obtain a two-dimensional feature corresponding to each feature point;
    searching a bag of words for a target feature matching each of the two-dimensional features, and determining three-dimensional point coordinates of the corresponding feature point according to the target feature, wherein the bag of words is built by learning based on marker points and stores a correspondence between the two-dimensional features of the feature points in the target object and three-dimensional point coordinates; and
    acquiring two-dimensional point coordinates of each feature point in a current camera coordinate system, and determining a positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  2. The method according to claim 1, characterized in that before the searching the bag of words for the target feature matching each of the two-dimensional features, the method further comprises: building the bag of words, wherein building the bag of words comprises the following steps: acquiring multiple video images, containing the marker points and the target object, obtained by photographing the marker points and the target object;
    determining a conversion relationship between a camera coordinate system corresponding to each video image and a marker point coordinate system;
    computing, according to the conversion relationship, a transformation relationship between the camera coordinate system corresponding to each video image and a reference coordinate system;
    converting coordinates of the feature points of the target object in each video image into the reference coordinate system according to the transformation relationship, to obtain two-dimensional coordinates of the feature points in each video image in the reference coordinate system;
    performing feature extraction on the target object in the video images to obtain a two-dimensional feature corresponding to each feature point;
    performing three-dimensional reconstruction of the target object according to the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points in each video image in the reference coordinate system, to obtain three-dimensional point coordinates corresponding to each feature point after three-dimensional reconstruction; and
    storing the two-dimensional features of the feature points in the target object in association with the corresponding three-dimensional point coordinates, to complete the building of the bag of words.
  3. The method according to claim 2, characterized in that the performing three-dimensional reconstruction of the target object according to the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points in each video image in the reference coordinate system, to obtain the three-dimensional point coordinates corresponding to each feature point after three-dimensional reconstruction, comprises:
    matching feature points in different video images according to the two-dimensional features of the feature points, determining a same feature point corresponding across the different video images, and acquiring different two-dimensional coordinates, in the reference coordinate system, of the same feature point in the different video images;
    acquiring an internal parameter matrix of the camera and transformation relationships between the camera coordinate systems corresponding to the different video images; and
    performing three-dimensional reconstruction of the corresponding feature point according to the internal parameter matrix, the transformation relationship, and the different two-dimensional coordinates corresponding to the same feature point, to obtain the three-dimensional point coordinates of the feature point in the reference coordinate system.
  4. The method according to claim 2, characterized in that the determining the conversion relationship between the camera coordinate system corresponding to each video image and the marker point coordinate system comprises:
    acquiring three-dimensional point coordinates of the marker points in the marker point coordinate system;
    recognizing the marker points in the video image, and determining two-dimensional coordinates of the marker points in the camera coordinate system; and
    computing the conversion relationship between the camera coordinate system and the marker point coordinate system according to the two-dimensional coordinates of the marker points in the camera coordinate system and the three-dimensional point coordinates in the marker point coordinate system.
  5. The method according to claim 2, wherein after the step of photographing the marker points and the target object to obtain multiple video images containing the marker points and the target object, the method further includes:
    determining a segmentation position corresponding to the target object in each video image, extracting the target object from the corresponding video image according to the segmentation position, and after the target object has been extracted, proceeding to the step of performing feature extraction on the target object in the video image to obtain the two-dimensional feature corresponding to each feature point.
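Claim 5 leaves the segmentation method open. As one possibility, a bounding rectangle around the object (e.g. from a detector; the `rect` input here is a hypothetical placeholder) can seed OpenCV's GrabCut to cut the target out of the frame before feature extraction:

```python
import cv2
import numpy as np

def extract_target(frame, rect):
    """Segment the target object inside a bounding box rect = (x, y, w, h)."""
    mask = np.zeros(frame.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(frame, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
    # Keep definite and probable foreground, zero out the background.
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0)
    return frame * fg.astype(np.uint8)[:, :, None]
```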
  6. The method according to claim 1, wherein the acquiring the two-dimensional point coordinates of each feature point in the current camera coordinate system, and determining the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates, includes:
    converting the three-dimensional point coordinates into the coordinate system corresponding to the target object, to obtain target three-dimensional coordinates;
    calculating the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates in the current camera coordinate system and the target three-dimensional coordinates.
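Claim 6 is again a Perspective-n-Point problem, now between the current image's two-dimensional point coordinates and the target three-dimensional coordinates looked up in the bag of words. A sketch under the same OpenCV assumption as above:

```python
import cv2
import numpy as np

def locate_target(pts_2d, target_pts_3d, K, dist):
    """Positional relationship of the target relative to the current camera."""
    ok, rvec, tvec = cv2.solvePnP(np.asarray(target_pts_3d, float),
                                  np.asarray(pts_2d, float), K, dist)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec  # object coordinate system -> current camera coordinate system
```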
  7. The method according to claim 6, wherein the converting the three-dimensional point coordinates into the coordinate system corresponding to the target object, to obtain the target three-dimensional coordinates, includes:
    acquiring the three-dimensional point coordinates corresponding to each feature point of the target object;
    averaging the three-dimensional point coordinates corresponding to all the feature points, to obtain average three-dimensional point coordinates;
    subtracting the average three-dimensional point coordinates from the three-dimensional point coordinates corresponding to each feature point, to obtain the corresponding target three-dimensional coordinates.
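The conversion in claim 7 reduces to subtracting the centroid, which places the object's coordinate origin at the mean of its feature points; a one-function sketch:

```python
import numpy as np

def center_points(pts_3d):
    """Shift 3-D points into the target object's own coordinate system."""
    pts = np.asarray(pts_3d, float)
    average = pts.mean(axis=0)   # average three-dimensional point coordinates
    return pts - average         # target three-dimensional coordinates
```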
  8. An object positioning device, wherein the device includes:
    a first acquisition module, used to acquire a target image obtained by photographing a target object to be positioned;
    a first extraction module, used to perform feature extraction on the target object in the target image to obtain a two-dimensional feature corresponding to each feature point;
    a searching module, used to search the bag of words for a target feature matching each of the two-dimensional features and to determine, according to the target feature, the three-dimensional point coordinates corresponding to the corresponding feature point, wherein the bag of words is established by learning based on marker points and stores the correspondence between the two-dimensional features of the feature points of the target object and the three-dimensional point coordinates;
    a determining module, used to acquire the two-dimensional point coordinates of each feature point in the current camera coordinate system, and to determine the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  9. A computer device, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
PCT/CN2018/124409 2018-12-27 2018-12-27 Object positioning method and apparatus, computer device, and storage medium WO2020133080A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/124409 WO2020133080A1 (en) 2018-12-27 2018-12-27 Object positioning method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/124409 WO2020133080A1 (en) 2018-12-27 2018-12-27 Object positioning method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020133080A1 (en) 2020-07-02

Family

ID=71126144

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/124409 WO2020133080A1 (en) 2018-12-27 2018-12-27 Object positioning method and apparatus, computer device, and storage medium

Country Status (1)

Country Link
WO (1) WO2020133080A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101839692A (en) * 2010-05-27 2010-09-22 西安交通大学 Method for measuring three-dimensional position and stance of object with single camera
CN102645173A (en) * 2011-02-16 2012-08-22 张文杰 Multi-vision-based bridge three-dimensional deformation monitoring method
CN102368810A (en) * 2011-09-19 2012-03-07 长安大学 Semi-automatic aligning video fusion system and method thereof
CN102722886A (en) * 2012-05-21 2012-10-10 浙江捷尚视觉科技有限公司 Video speed measurement method based on three-dimensional calibration and feature point matching
US20160350904A1 (en) * 2014-03-18 2016-12-01 Huawei Technologies Co., Ltd. Static Object Reconstruction Method and System

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950431A (en) * 2020-08-07 2020-11-17 北京猎户星空科技有限公司 Object searching method and device
CN111950431B (en) * 2020-08-07 2024-03-26 北京猎户星空科技有限公司 Object searching method and device
CN112716509A (en) * 2020-12-24 2021-04-30 上海联影医疗科技股份有限公司 Motion control method and system for medical equipment
CN112716509B (en) * 2020-12-24 2023-05-02 上海联影医疗科技股份有限公司 Motion control method and system for medical equipment
CN113705390A (en) * 2021-08-13 2021-11-26 北京百度网讯科技有限公司 Positioning method, positioning device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110568447B (en) Visual positioning method, device and computer readable medium
CN110176032B (en) Three-dimensional reconstruction method and device
CN111383270B (en) Object positioning method, device, computer equipment and storage medium
RU2609434C2 (en) Detection of objects arrangement and location
CN109993793B (en) Visual positioning method and device
WO2019042426A1 (en) Augmented reality scene processing method and apparatus, and computer storage medium
CN109472828B (en) Positioning method, positioning device, electronic equipment and computer readable storage medium
WO2020133080A1 (en) Object positioning method and apparatus, computer device, and storage medium
CN114119864A (en) Positioning method and device based on three-dimensional reconstruction and point cloud matching
CN111107337B (en) Depth information complementing method and device, monitoring system and storage medium
CN114862973B (en) Space positioning method, device and equipment based on fixed point location and storage medium
CN115423863B (en) Camera pose estimation method and device and computer readable storage medium
CN113269671A (en) Bridge apparent panorama generation method based on local and global features
CN110567441A (en) Particle filter-based positioning method, positioning device, mapping and positioning method
CN116295279A (en) Unmanned aerial vehicle remote sensing-based building mapping method and unmanned aerial vehicle
CN111724432B (en) Object three-dimensional detection method and device
KR20230049969A (en) Method and apparatus for global localization
KR101598399B1 (en) System for combining images using coordinate information of roadview image
Pollok et al. A visual SLAM-based approach for calibration of distributed camera networks
Ventura et al. Structure and motion in urban environments using upright panoramas
JP2016527574A (en) A method for registering data using a set of primitives
KR101673144B1 (en) Stereoscopic image registration method based on a partial linear method
CN111179342A (en) Object pose estimation method and device, storage medium and robot
KR20170001448A (en) Apparatus for measuring position of camera using stereo camera and method using the same
CN113674353B (en) Accurate pose measurement method for space non-cooperative target

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18944665; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18944665; Country of ref document: EP; Kind code of ref document: A1)