WO2020133080A1 - Object positioning method and apparatus, computer device, and storage medium - Google Patents

Object positioning method and apparatus, computer device, and storage medium

Info

Publication number
WO2020133080A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
feature
coordinate system
point
coordinates
Prior art date
Application number
PCT/CN2018/124409
Other languages
French (fr)
Chinese (zh)
Inventor
熊友军
郭奎
庞建新
Original Assignee
深圳市优必选科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市优必选科技有限公司
Priority to PCT/CN2018/124409
Publication of WO2020133080A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01B MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B 11/00 Measuring arrangements characterised by the use of optical techniques

Definitions

  • The invention relates to the field of computer processing, and in particular to an object positioning method, apparatus, computer device, and storage medium.
  • The positioning of an arbitrary object in space falls within the field of AR (Augmented Reality).
  • Object positioning determines the positional relationship between the spatial coordinate system of the target object and the camera coordinate system.
  • Current monocular visual target positioning methods are divided into marker-point methods and markerless methods, according to whether marker points are used.
  • Marker-point positioning determines the position of the target object by locating the marker points, which has limitations in practical applications, while markerless positioning relies on the features of the target object itself and is easily affected by external environmental factors, giving low stability and low accuracy.
  • An embodiment of the present invention provides an object positioning method.
  • The method includes:
  • the bag of words is learned and established based on marker points, and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates;
  • An embodiment of the present invention provides an object positioning apparatus. The apparatus includes:
  • a first acquisition module, used to acquire a target image obtained by shooting the target object to be located;
  • a first extraction module, used to perform feature extraction on the target object in the target image to obtain a two-dimensional feature corresponding to each feature point;
  • a searching module, used to search the bag of words for target features matching each of the two-dimensional features and to determine the three-dimensional point coordinates of the corresponding feature points according to the target features, where the bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates;
  • a determining module, used to obtain the two-dimensional point coordinates of each feature point in the current camera coordinate system and to determine the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • An embodiment of the present invention provides a computer device including a memory and a processor.
  • The memory stores a computer program.
  • When the computer program is executed by the processor, the processor is caused to perform the following steps:
  • the bag of words is learned and established based on marker points, and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates;
  • An embodiment of the present invention provides a computer-readable storage medium that stores a computer program.
  • When the computer program is executed by a processor, the processor is caused to perform the following steps:
  • the bag of words is learned and established based on marker points, and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates;
  • With the above object positioning method, apparatus, computer device, and storage medium, a bag of words is first established based on marker points; in the actual positioning process, the target object can be located quickly and accurately based only on the two-dimensional features of the extracted feature points.
  • The object positioning method is not only simple to operate but also highly stable and accurate.
  • FIG. 1 is an application environment diagram of an object positioning method in an embodiment
  • FIG. 3 is a flowchart of a method for creating a bag of words in an embodiment
  • FIG. 4 is a schematic diagram of marking points in an embodiment
  • FIG. 5 is a schematic diagram of setting areas in an embodiment
  • FIG. 6 is a flowchart of a method for three-dimensional reconstruction of feature points in an embodiment
  • FIG. 7 is a schematic flowchart of positioning a target object in an embodiment
  • FIG. 8 is a structural block diagram of an object positioning device in an embodiment
  • FIG. 9 is an internal structure diagram of a computer device in an embodiment.
  • FIG. 1 is an application environment diagram of an object positioning method in an embodiment.
  • The object positioning method is applied to an object positioning system.
  • The object positioning system includes a terminal 110 and a server 120.
  • The terminal 110 obtains a target image by calling a camera to shoot the target object to be positioned, and uploads the target image to the server 120.
  • The server 120 performs feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point, searches the bag of words for target features matching each two-dimensional feature, and determines the three-dimensional point coordinates of the corresponding feature points according to the target features.
  • The bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates.
  • The server 120 then obtains the two-dimensional point coordinates of each feature point in the current camera coordinate system and determines the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • The determined positional relationship is sent to the terminal 110.
  • In another embodiment, the above object positioning method can be applied directly to the terminal 110.
  • The terminal 110 calls the camera to shoot the target object to be located to obtain a target image, and performs feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point.
  • It then searches the bag of words for target features matching each two-dimensional feature and determines the three-dimensional point coordinates of the corresponding feature points according to the target features; the bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates.
  • Finally, the terminal obtains the two-dimensional point coordinates of each feature point in the current camera coordinate system and determines the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • As shown in FIG. 2, an object positioning method is proposed.
  • The object positioning method can be applied to both a terminal and a server.
  • In this embodiment, application to the terminal is taken as an example; the method specifically includes the following steps:
  • Step 202: Acquire a target image obtained by shooting the target object to be located.
  • The target object refers to the object to be located.
  • The target image refers to an image containing the target object, obtained by shooting the target object.
  • Specifically, the terminal shoots the target object by calling a camera to obtain the target image.
  • Step 204: Perform feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point.
  • Feature points are points on the target object in the target image, and the method of selecting feature points can be customized according to actual needs. In one embodiment, only the more salient points in the image are selected as feature points, for example the contour points of the target object; of course, all pixel points constituting the target object may also be used as feature points.
  • The target image obtained by shooting the target object is two-dimensional, so extracting features at the feature points of the target object yields two-dimensional features.
  • A two-dimensional feature is the feature corresponding to a feature point on the target object. Different feature points correspond to different two-dimensional features, so the two-dimensional feature can serve as the identifying signature of a feature point.
  • In one embodiment, the extracted two-dimensional feature is an ORB (Oriented FAST and Rotated BRIEF) feature, and the FAST (Features from Accelerated Segment Test) algorithm can be used to detect feature points.
  • In other embodiments, the extracted features may be HOG features or, of course, DOG features.
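  • As a concrete illustration, a minimal sketch of this extraction step is given below, using OpenCV's ORB implementation (an assumption for illustration; the patent does not name a specific library):

```python
import cv2

def extract_2d_features(target_image):
    """Detect feature points on the target object and compute their
    ORB descriptors (the 'two-dimensional features')."""
    gray = cv2.cvtColor(target_image, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=1000)  # ORB uses FAST internally for detection
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    # Each keypoint's pixel location is the 2D point coordinate; each
    # descriptor row is the 2D feature used for the bag-of-words lookup.
    points_2d = [kp.pt for kp in keypoints]
    return points_2d, descriptors
```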
  • Step 206: Search the bag of words for target features matching each two-dimensional feature, and determine the three-dimensional point coordinates of the corresponding feature points according to the target features. The bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates.
  • The bag of words is established by learning based on marker points; marker points are reference points used to assist in positioning the target object.
  • The bag of words stores the correspondence between the two-dimensional features of the feature points obtained after learning and the corresponding three-dimensional point coordinates. After the two-dimensional features of the feature points are determined, the target features matching them are found in the bag of words, and the corresponding three-dimensional point coordinates are then determined from the target features.
  • A target feature is a feature found in the bag of words that matches the given two-dimensional feature.
  • Since the two-dimensional features of the feature points and the corresponding three-dimensional point coordinates are stored in advance, once the two-dimensional features are extracted, the corresponding three-dimensional point coordinates can be found in the bag of words quickly, which improves the speed of object positioning.
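  • A sketch of this lookup is given below, assuming (hypothetically) that the bag of words is stored as an array of learned descriptors with a parallel array of three-dimensional point coordinates; the patent does not prescribe a storage format:

```python
import numpy as np
import cv2

def lookup_3d_points(descriptors, bag_descriptors, bag_points_3d, max_distance=50):
    """Match each extracted 2D feature against the stored bag of words and
    return the 3D point coordinates of the matched (target) features."""
    # Hamming norm suits binary ORB descriptors; cross-check filters asymmetric matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(descriptors, bag_descriptors)
    matches = [m for m in matches if m.distance < max_distance]  # reject weak matches
    query_idx = [m.queryIdx for m in matches]  # indices into the current image's features
    points_3d = np.float32([bag_points_3d[m.trainIdx] for m in matches])
    return query_idx, points_3d
```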
  • Step 208: Acquire the two-dimensional point coordinates of each feature point in the current camera coordinate system, and determine the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • The target image is obtained by shooting the target object under the current camera coordinate system.
  • From the two-dimensional point coordinates and the corresponding three-dimensional point coordinates, the positional relationship of the target object relative to the current camera coordinate system can be calculated through the camera's perspective projection model.
  • The positional relationship is generally expressed by a rotation matrix R and a translation matrix T.
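  • Recovering R and T from matched 2D-3D correspondences is the classic Perspective-n-Point (PnP) problem; the sketch below uses OpenCV's generic solver as one possible realization of the perspective projection model, not necessarily the patent's exact method:

```python
import numpy as np
import cv2

def locate_object(points_2d, points_3d, camera_matrix, dist_coeffs=None):
    """Recover the rotation matrix R and translation T of the target object
    relative to the current camera coordinate system from matched
    2D point coordinates and 3D point coordinates."""
    if dist_coeffs is None:
        dist_coeffs = np.zeros(5)  # assume an undistorted (or pre-rectified) image
    ok, rvec, tvec = cv2.solvePnP(
        np.float32(points_3d), np.float32(points_2d),
        camera_matrix, dist_coeffs)
    if not ok:
        raise RuntimeError("PnP failed: not enough reliable 2D-3D correspondences")
    R, _ = cv2.Rodrigues(rvec)  # convert rotation vector to a 3x3 rotation matrix
    return R, tvec
```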
  • The above object positioning method first learns the target object based on marker points, and stores the correspondence between the two-dimensional features of the learned feature points and the three-dimensional point coordinates in the bag of words.
  • In actual positioning, the two-dimensional features of the feature points of the target object are extracted, the three-dimensional point coordinates corresponding to each two-dimensional feature are found in the bag of words, and the positional relationship of the target object relative to the current camera coordinate system is then determined according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • The object positioning method establishes the bag of words based on marker points; in the actual positioning process, the target object can be located quickly and accurately based only on the two-dimensional features of the extracted feature points.
  • The object positioning method is not only simple to operate but also highly stable and accurate.
  • In one embodiment, before separately searching the bag of words for target features matching each two-dimensional feature, the method further includes establishing the bag of words, which includes the following steps:
  • Step 302: Acquire multiple video images containing the marker points and the target object, obtained by shooting the marker points and the target object.
  • A marker point is a reference point used to assist in positioning the target object.
  • In one embodiment, marker points are affixed to a drawing; FIG. 4 is a schematic diagram of the marker points in one embodiment, where the marker points are dots.
  • The second dot in FIG. 4 can be used as the origin, the direction from dot 2 to dot 3 as the X axis, the direction from dot 2 to dot 1 as the Y axis, and the cross product of the X axis and the Y axis as the Z axis.
  • The six specially arranged marker points shown in FIG. 4 allow the target object to be located more reliably.
  • The coordinates of the centers of the six dots are set in advance as 1(0,1,0), 2(0,0,0), 3(1,0,0), 4(-1,-1,0), 5(0,-1,0), and 6(1,-1,0). The target object is placed in the set target area, and the drawing with the marker points is placed in the target area. FIG. 5 is a schematic diagram of the set target area; the target area is a rectangular parallelepiped, and the coordinates of its eight vertices can be expressed in terms of P1 and three adjustable offsets:
  • P1 is a fixed value, determined by the drawing;
  • offset_x, offset_y, and offset_z can be freely adjusted according to the target object being learned.
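  • The sketch below encodes this setup, assuming (hypothetically) that the cuboid target area is axis-aligned in the marker coordinate system and spanned by the three offsets from the fixed corner P1; the exact vertex expressions do not survive in the text, so this is an illustrative reconstruction:

```python
import numpy as np

# Marker (dot) center coordinates in the marker coordinate system, as set in advance.
MARKER_POINTS_3D = np.float32([
    [0, 1, 0],    # dot 1
    [0, 0, 0],    # dot 2 (origin)
    [1, 0, 0],    # dot 3
    [-1, -1, 0],  # dot 4
    [0, -1, 0],   # dot 5
    [1, -1, 0],   # dot 6
])

def target_area_vertices(p1, offset_x, offset_y, offset_z):
    """Eight vertices of the cuboid target area, assuming an axis-aligned
    box spanned by the three offsets from the fixed corner P1."""
    p1 = np.float32(p1)
    dx = np.float32([offset_x, 0, 0])
    dy = np.float32([0, offset_y, 0])
    dz = np.float32([0, 0, offset_z])
    base = [p1, p1 + dx, p1 + dx + dy, p1 + dy]  # P1..P4 (bottom face)
    top = [v + dz for v in base]                 # P1'..P4' (top face)
    return np.float32(base + top)
```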
  • In one embodiment, the camera is a monocular camera.
  • Step 304: Determine the conversion relationship between the camera coordinate system corresponding to each video image and the marker coordinate system.
  • The conversion relationship refers to the positional relationship between the camera coordinate system and the marker point coordinate system.
  • The positional relationship can be represented by R and T, where R and T denote the rotation matrix and the translation matrix, respectively.
  • The conversion relationship is calculated using the following formula:
  • C_i = (RT)_i · M
  • where C_i is the two-dimensional coordinate in the i-th camera coordinate system, (RT)_i is the rotation-translation matrix, and M is the three-dimensional point coordinate in the corresponding marker point coordinate system.
  • The rotation matrix R and the translation matrix T between the camera coordinate system and the marker coordinate system are calculated by the above formula.
  • Step 306: Calculate, according to the conversion relationship, the transformation relationship between the camera coordinate system corresponding to each video image and the reference coordinate system.
  • The reference coordinate system is the coordinate system selected as the reference; the camera coordinate system corresponding to the first video frame can be chosen as the reference coordinate system. Once the conversion relationship between each camera coordinate system and the marker coordinate system is known, the conversion relationship between each camera coordinate system and the reference coordinate system can be calculated, that is, the positional relationships between the camera coordinate systems.
  • In one embodiment, the camera coordinate system of the first frame of the video is used as the reference coordinate system, and the transformation between each camera coordinate system and the reference coordinate system can be calculated through the transformation relationships between adjacent camera coordinate systems. For example, taking the coordinate system of C_1 as the reference coordinate system and knowing the transformations between adjacent coordinate systems such as C_1 and C_2, and C_2 and C_3, the transformation between each C_i and C_1 can be determined.
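  • Because every frame's pose relative to the shared marker coordinate system is known from Step 304, the transform between any camera frame and the reference frame can be obtained by composing homogeneous transforms; a minimal sketch, assuming R and T map marker coordinates into each camera frame:

```python
import numpy as np

def to_homogeneous(R, T):
    """Pack a rotation matrix and translation into a 4x4 homogeneous transform."""
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = np.asarray(T).ravel()
    return M

def camera_to_reference(R_i, T_i, R_ref, T_ref):
    """Transform taking points in camera frame i to the reference camera frame
    (the frame of the first video image), via the shared marker coordinate
    system: X_ref = T_ref_from_marker @ inv(T_i_from_marker) @ X_i."""
    T_marker_to_ref = to_homogeneous(R_ref, T_ref)
    T_marker_to_i = to_homogeneous(R_i, T_i)
    return T_marker_to_ref @ np.linalg.inv(T_marker_to_i)
```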
  • Step 308: Convert the coordinates of the feature points of the target object in each video image to the reference coordinate system according to the transformation relationship, obtaining the two-dimensional coordinates of the feature points of each video image in the reference coordinate system.
  • Here the transformation relationship refers to the positional transformation applied when converting coordinate points from a camera coordinate system to the reference coordinate system.
  • Step 310: Perform feature extraction on the target object in the video images to obtain the two-dimensional feature corresponding to each feature point.
  • That is, the two-dimensional feature corresponding to each feature point is extracted from each video image.
  • The two-dimensional feature can be the ORB feature, and correspondingly the FAST (Features from Accelerated Segment Test) algorithm can be used to detect and extract the feature points.
  • Step 312: Perform three-dimensional reconstruction of the target object according to the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points of each video image in the reference coordinate system, to obtain the corresponding three-dimensional point coordinates.
  • For the same feature point in different video images, the extracted two-dimensional features are the same, while different feature points correspond to different two-dimensional features; therefore, feature matching can determine which points in different video images correspond to the same feature point, forming matched feature points.
  • With the two-dimensional coordinates of the matched feature points in the reference coordinate system known, and combined with the camera's internal parameters, each feature point can be three-dimensionally reconstructed to obtain its three-dimensional point coordinates. Performing three-dimensional reconstruction on every feature point in this way completes the three-dimensional reconstruction of the target object.
  • Step 314: Associate and store the two-dimensional features of the feature points in the target object and the corresponding three-dimensional point coordinates, completing the establishment of the bag of words.
  • The three-dimensional point coordinates are reconstructed with respect to the reference coordinate system, so the stored three-dimensional point coordinates are likewise expressed in the reference coordinate system.
  • The two-dimensional features of the target object are associated with the corresponding three-dimensional point coordinates and stored, completing the establishment of the bag of words.
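  • A sketch of this association step, reusing the flat array layout assumed in the lookup sketch above (descriptor row i paired with three-dimensional point i); persisting with NumPy's .npz format is an illustrative choice, not the patent's:

```python
import numpy as np

def build_bag_of_words(descriptors, points_3d, path="bag_of_words.npz"):
    """Store each 2D feature (descriptor) together with its reconstructed
    3D point coordinates; row i of both arrays describes the same feature point."""
    descriptors = np.asarray(descriptors, dtype=np.uint8)  # binary ORB descriptors
    points_3d = np.float32(points_3d)
    assert len(descriptors) == len(points_3d)
    np.savez(path, descriptors=descriptors, points_3d=points_3d)

def load_bag_of_words(path="bag_of_words.npz"):
    data = np.load(path)
    return data["descriptors"], data["points_3d"]
```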
  • By positioning the target object with the aid of marker points, the above method of establishing the bag of words can determine the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates accurately, quickly, and stably.
  • In one embodiment, three-dimensionally reconstructing the target object according to the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points of each video image in the reference coordinate system, to obtain the three-dimensional point coordinates corresponding to each feature point after reconstruction, includes:
  • Step 312A: Match the feature points in different video images according to their two-dimensional features, determine the corresponding feature points across the different video images, and obtain the different two-dimensional coordinates, in the reference coordinate system, corresponding to the same feature point in the different video images.
  • The two-dimensional features corresponding to the same feature point in different video images are the same, so the same feature points in different video images can be determined by feature matching, after which the two-dimensional coordinates of that feature point in each video image, expressed in the reference coordinate system, are obtained separately. For example, if the two-dimensional feature of point A in the first video image is the same as the two-dimensional feature of point B in the second video image, then A and B correspond to the same feature point, and the two-dimensional coordinates of point A and of point B in the reference coordinate system are obtained separately.
  • Step 312B: Obtain the internal parameter matrix of the camera and the transformation relationships between the camera coordinate systems corresponding to the different video images.
  • The internal parameter matrix refers to the camera's intrinsic parameters; once the internal and external parameters of the camera are obtained, the three-dimensional coordinates of a spatial point can be calculated. The internal parameter matrix is fixed and can be obtained directly.
  • The external parameters refer to the positional relationships between the camera coordinate systems corresponding to the different video images, and the transformation relationship here refers to those positional relationships.
  • Step 312C: Perform three-dimensional reconstruction of the corresponding feature points according to the internal parameter matrix, the transformation relationship, and the different two-dimensional coordinates corresponding to the same feature point, obtaining the three-dimensional point coordinates of the feature points in the reference coordinate system.
  • Here the transformation relationship refers to the relative positional relationship between the camera coordinate systems, which can likewise be represented by a rotation matrix R and a translation matrix T.
  • Three-dimensional reconstruction of a feature point can be performed given the camera internal parameter matrix for each camera coordinate system, the different two-dimensional coordinates corresponding to the same matched point, and the transformation relationship. Specifically, suppose there are two video images, a first and a second, and the coordinate system corresponding to the first video image is taken as the reference coordinate system. The projection matrices of the cameras at the two positions are then:
  • P_1 = K_1·[I | 0], P_2 = K_2·[R | T]
  • where I is the identity matrix, K_1 and K_2 are the internal parameter matrices of the cameras, R is the relative rotation matrix between the two camera coordinate systems, and T is the translation between the two cameras.
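  • Given these two projection matrices, each matched feature point can be triangulated; the sketch below uses OpenCV's linear triangulation as one standard realization (the patent does not name a specific solver):

```python
import numpy as np
import cv2

def reconstruct_points(K1, K2, R, T, pts1, pts2):
    """Triangulate matched 2D observations from two video images into 3D
    point coordinates in the reference (first camera) coordinate system."""
    P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])     # P1 = K1 [I | 0]
    P2 = K2 @ np.hstack([R, np.asarray(T).reshape(3, 1)])  # P2 = K2 [R | T]
    pts1 = np.float32(pts1).T  # shape (2, N), as triangulatePoints expects
    pts2 = np.float32(pts2).T
    points_4d = cv2.triangulatePoints(P1, P2, pts1, pts2)
    return (points_4d[:3] / points_4d[3]).T  # dehomogenize to (N, 3)
```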
  • In one embodiment, determining the conversion relationship between the camera coordinate system corresponding to each video image and the marker point coordinate system includes: acquiring the three-dimensional point coordinates of the marker points in the marker point coordinate system; recognizing the marker points in the video image to determine the two-dimensional coordinates of the marker points in the camera coordinate system; and calculating the conversion relationship between the camera coordinate system and the marker point coordinate system according to the two-dimensional coordinates of the marker points in the camera coordinate system and their three-dimensional point coordinates in the marker point coordinate system.
  • The three-dimensional point coordinates of the marker points in the marker point coordinate system can be set in advance. The marker points in the video image are recognized to obtain their two-dimensional coordinates in the camera coordinate system. Once the two-dimensional coordinates of the marker points in the camera coordinate system and their three-dimensional point coordinates in the marker point coordinate system are determined, the conversion relationship between the camera coordinate system and the marker coordinate system can be calculated from the camera projection matrix equation. Specifically, the camera projection matrix equation is as follows:
  • s·[u, v, 1]^T = K·[R | T]·[X_W, Y_W, Z_W, 1]^T
  • where K is the camera internal parameter matrix, whose focal terms are α_x = f/dX and α_y = f/dY;
  • s is the scaling factor, dX and dY are the physical dimensions of a pixel, and f is the focal length;
  • R is the rotation matrix and T is the translation matrix;
  • (u, v) is the two-dimensional point coordinate in the video image, and (X_W, Y_W, Z_W) is its corresponding spatial physical coordinate. Since s, dX, dY, and f are known quantities, R and T can be calculated from multiple sets of two-dimensional point coordinates and three-dimensional point coordinates.
  • The number of point pairs required is determined by the number of unknown degrees of freedom in the rotation matrix and the translation matrix: if there are 4 unknown degrees of freedom, at least 4 pairs of coordinates are needed to solve for the corresponding rotation matrix and translation matrix.
  • In one embodiment, before feature extraction is performed on the target object in the video images, the method further includes: determining the segmentation position corresponding to the target object in each video image, and extracting the target object from the corresponding video image according to the segmentation position. After the target object has been extracted from a video image, the method proceeds to feature extraction on the target object in that video image to obtain the two-dimensional feature corresponding to each feature point.
  • That is, the target object in the video image needs to be extracted first.
  • To do so, the segmentation position corresponding to the target object in the video image is determined.
  • In one embodiment, the target object is placed in a rectangular parallelepiped whose vertices, as shown in FIG. 5, are P1, P2, P3, P4, P1', P2', P3', and P4'.
  • The vertices are projected onto the image plane according to the camera's perspective projection matrix, and the polygonal area obtained after projection is the segmentation position of the target object.
  • The target object can then be extracted according to the segmentation position, after which the feature extraction step is entered, as in the sketch below.
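  • A minimal sketch of this segmentation step, assuming the camera pose relative to the marker coordinate system from Step 304 and reusing the hypothetical target_area_vertices() helper defined earlier:

```python
import numpy as np
import cv2

def segment_target(image, vertices_3d, rvec, tvec, camera_matrix, dist_coeffs):
    """Project the eight cuboid vertices into the image and mask out
    everything outside their convex hull (the segmentation position)."""
    pts_2d, _ = cv2.projectPoints(vertices_3d, rvec, tvec, camera_matrix, dist_coeffs)
    polygon = cv2.convexHull(pts_2d.reshape(-1, 2).astype(np.int32))
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, polygon, 255)
    return cv2.bitwise_and(image, image, mask=mask)  # extracted target object region
```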
  • In one embodiment, acquiring the two-dimensional point coordinates of each feature point in the current camera coordinate system and determining the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates includes: converting the three-dimensional point coordinates to the coordinate system corresponding to the target object to obtain target three-dimensional coordinates; and calculating the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates in the current camera coordinate system and the target three-dimensional coordinates.
  • Since the three-dimensional point coordinates are obtained with respect to the reference coordinate system when the bag of words is established, in order to obtain the positional relationship between the target object coordinate system and the current camera coordinate system, the three-dimensional point coordinates need to be converted from the reference coordinate system to the target object coordinate system, yielding the target three-dimensional coordinates. Specifically, the origin is moved onto the target object: the feature points of the target object are centered, that is, the center of all points is computed and subtracted from every point. The positional relationship of the target object relative to the current camera coordinate system can then be calculated directly from the two-dimensional coordinates in the current camera coordinate system and the corresponding target three-dimensional coordinates.
  • In one embodiment, converting the three-dimensional point coordinates to the coordinate system corresponding to the target object to obtain the target three-dimensional coordinates includes: obtaining the three-dimensional point coordinates corresponding to each feature point in the target object; averaging the three-dimensional point coordinates of all feature points to obtain the average three-dimensional point coordinate; and subtracting the average three-dimensional point coordinate from the three-dimensional point coordinates of each feature point to obtain the corresponding target three-dimensional coordinates.
  • The target three-dimensional coordinates are the coordinates transferred into the target object coordinate system.
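  • In practice this centering is a one-line operation; a minimal sketch:

```python
import numpy as np

def to_object_coordinates(points_3d):
    """Convert reference-frame 3D point coordinates into the target object's
    coordinate system by subtracting the centroid of all feature points."""
    points_3d = np.float32(points_3d)
    centroid = points_3d.mean(axis=0)  # average 3D point coordinate
    return points_3d - centroid        # target 3D coordinates
```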
  • FIG. 7 is a schematic flowchart of positioning a target object. In the first step, the drawing containing the marker points is placed on a flat surface. In the second step, the target object is placed in the target placement area of the drawing. In the third step, video images containing the marker points and the target object are obtained through the camera. In the fourth step, the target object in the video images is segmented and the target object image is extracted. In the fifth step, feature extraction is performed on the target object image to obtain the two-dimensional features of the feature points. In the sixth step, the target object is three-dimensionally reconstructed according to the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points of each video image in the reference coordinate system, to obtain the corresponding three-dimensional point coordinates.
  • In the seventh step, the two-dimensional features of the feature points are associated with the corresponding three-dimensional point coordinates and stored, completing the establishment of the bag of words.
  • In the eighth step, the drawing is removed, the target object is placed on a flat surface, the target image containing the target object is captured by the camera, and feature extraction is performed on the target image to obtain the two-dimensional features of the feature points.
  • In the ninth step, the target features matching the two-dimensional features are found in the bag of words, and the corresponding three-dimensional point coordinates are obtained.
  • In the tenth step, the two-dimensional point coordinates of the feature points in the current camera coordinate system are obtained, and the pose of the target object relative to the current camera coordinate system is determined according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • As shown in FIG. 8, an object positioning apparatus is provided, which includes:
  • the first acquisition module 802, used to acquire a target image obtained by shooting the target object to be located; the first extraction module 804, used to perform feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point; and the search module 806, used to search the bag of words for target features matching each of the two-dimensional features and to determine the three-dimensional point coordinates of the corresponding feature points according to the target features.
  • The determination module 808 is used to obtain the two-dimensional point coordinates of each feature point in the current camera coordinate system, and to determine the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • In one embodiment, the above object positioning apparatus further includes: a second acquisition module, configured to acquire multiple video images containing the marker points and the target object, obtained by shooting the marker points and the target object; a conversion relationship determination module, used to determine the conversion relationship between the camera coordinate system corresponding to each video image and the marker point coordinate system; a calculation module, used to calculate, according to the conversion relationship, the transformation relationship between the camera coordinate system corresponding to each video image and the reference coordinate system; a transformation module, used to transform the coordinates of the feature points of the target object in each video image to the reference coordinate system according to the transformation relationship, obtaining the two-dimensional coordinates of the feature points of each video image in the reference coordinate system; a second extraction module, used to perform feature extraction on the target object in the video images to obtain the two-dimensional feature corresponding to each feature point; and a three-dimensional reconstruction module, used to three-dimensionally reconstruct the target object according to the two-dimensional features corresponding to each feature point and the two-dimensional coordinates of the feature points of each video image in the reference coordinate system, obtaining the corresponding three-dimensional point coordinates.
  • The three-dimensional reconstruction module is further used to match the feature points in different video images according to the two-dimensional features of the feature points, determine the corresponding feature points in the different video images, and obtain the different two-dimensional coordinates, in the reference coordinate system, corresponding to the same feature point; to obtain the internal parameter matrix of the camera and the transformation relationships between the camera coordinate systems corresponding to the different video images; and to three-dimensionally reconstruct the corresponding feature points according to the internal parameter matrix, the transformation relationship, and the different two-dimensional coordinates corresponding to the same feature point, obtaining the three-dimensional point coordinates of the feature points in the reference coordinate system.
  • The conversion relationship determination module is further used to obtain the three-dimensional point coordinates of the marker points in the marker point coordinate system; to recognize the marker points in the video image and determine the two-dimensional coordinates of the marker points in the camera coordinate system; and to calculate the conversion relationship between the camera coordinate system and the marker point coordinate system according to the two-dimensional coordinates of the marker points in the camera coordinate system and their three-dimensional point coordinates in the marker point coordinate system.
  • In one embodiment, the above object positioning apparatus further includes: a segmentation module, used to determine the segmentation position corresponding to the target object in each video image and to extract the target object from the corresponding video image according to the segmentation position; after the target object has been extracted from a video image, the feature extraction module is notified to proceed with feature extraction on the target object in that video image to obtain the two-dimensional feature corresponding to each feature point.
  • The determination module is further configured to convert the three-dimensional point coordinates to the coordinate system corresponding to the target object to obtain the target three-dimensional coordinates, and to calculate the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates in the current camera coordinate system and the target three-dimensional coordinates.
  • The determination module is further used to obtain the three-dimensional point coordinates corresponding to each feature point in the target object, average the three-dimensional point coordinates of all feature points to obtain the average three-dimensional point coordinate, and subtract the average three-dimensional point coordinate from the three-dimensional point coordinates of each feature point to obtain the corresponding target three-dimensional coordinates.
  • FIG. 9 shows an internal structure diagram of a computer device in an embodiment.
  • The computer device can be a terminal or a server.
  • The computer device includes a processor, a memory, and a network interface connected through a system bus.
  • The memory includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium of the computer device stores an operating system and may also store a computer program.
  • When the computer program is executed by the processor, the processor is caused to implement the object positioning method.
  • A computer program may also be stored in the internal memory; when that computer program is executed by the processor, the processor is caused to execute the object positioning method.
  • The network interface is used to communicate with the outside world.
  • Those skilled in the art will understand that the structure shown in FIG. 9 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • In one embodiment, the object positioning method provided by the present application may be implemented in the form of a computer program, and the computer program may run on the computer device shown in FIG. 9.
  • The various program modules constituting the object positioning apparatus can be stored in the memory of the computer device.
  • In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to perform the following steps: acquiring a target image obtained by shooting the target object to be located; performing feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point; searching the bag of words for target features matching each of the two-dimensional features and determining the three-dimensional point coordinates of the corresponding feature points according to the target features, where the bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates; and obtaining the two-dimensional point coordinates of each feature point in the current camera coordinate system and determining the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to perform the following steps: acquiring a target image obtained by shooting the target object to be located; performing feature extraction on the target object in the target image to obtain the two-dimensional features corresponding to each feature point; searching the bag of words for target features matching each of the two-dimensional features and determining the three-dimensional point coordinates of the corresponding feature points according to the target features, where the bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates; and obtaining the two-dimensional point coordinates of each feature point in the current camera coordinate system and determining the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous chain (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Image Analysis (AREA)

Abstract

An object positioning method, comprising: obtaining a target image obtained by photographing a target object to be positioned (202); performing feature extraction on the target object in the target image to obtain a two-dimensional feature corresponding to each feature point (204); searching a bag of words for a target feature matching each two-dimensional feature, and determining the three-dimensional point coordinates of the corresponding feature point according to the target feature, the bag of words being established by learning on the basis of marker points and storing the correspondence between the two-dimensional features and the three-dimensional point coordinates of the feature points in the target object (206); and obtaining the two-dimensional point coordinates of each feature point in a current camera coordinate system, and determining the positional relationship of the target object with respect to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates (208). The object positioning method is simple in operation and high in stability and accuracy. Also provided are an object positioning apparatus, a computer device, and a storage medium.

Description

Object positioning method, apparatus, computer device, and storage medium
Technical Field
The present invention relates to the field of computer processing, and in particular to an object positioning method, apparatus, computer device, and storage medium.
Background
The positioning of an arbitrary object in space falls within the field of AR (Augmented Reality). Object positioning determines the positional relationship between the spatial coordinate system of the target object and the camera coordinate system. Current monocular visual target positioning methods are divided into marker-point methods and markerless methods, according to whether marker points are used. Marker-point positioning determines the position of the target object by locating the marker points, which has limitations in practical applications, while markerless positioning relies on the features of the target object itself and is easily affected by external environmental factors, giving low stability and low accuracy.
Therefore, in view of the above problems, there is an urgent need for an object positioning solution with a wide application range and high stability and accuracy.
Summary of the Invention
Based on this, it is necessary to provide, in view of the above problems, an object positioning method, apparatus, computer device, and storage medium with a wide application range and high stability and accuracy.
In a first aspect, an embodiment of the present invention provides an object positioning method. The method includes:
acquiring a target image obtained by shooting the target object to be located;
performing feature extraction on the target object in the target image to obtain a two-dimensional feature corresponding to each feature point;
searching a bag of words for target features matching each of the two-dimensional features, and determining the three-dimensional point coordinates of the corresponding feature points according to the target features, where the bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates;
acquiring the two-dimensional point coordinates of each feature point in the current camera coordinate system, and determining the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
In a second aspect, an embodiment of the present invention provides an object positioning apparatus. The apparatus includes:
a first acquisition module, used to acquire a target image obtained by shooting the target object to be located;
a first extraction module, used to perform feature extraction on the target object in the target image to obtain a two-dimensional feature corresponding to each feature point;
a searching module, used to search a bag of words for target features matching each of the two-dimensional features and to determine the three-dimensional point coordinates of the corresponding feature points according to the target features, where the bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates;
a determining module, used to obtain the two-dimensional point coordinates of each feature point in the current camera coordinate system and to determine the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
In a third aspect, an embodiment of the present invention provides a computer device including a memory and a processor. The memory stores a computer program which, when executed by the processor, causes the processor to perform the following steps:
acquiring a target image obtained by shooting the target object to be located;
performing feature extraction on the target object in the target image to obtain a two-dimensional feature corresponding to each feature point;
searching a bag of words for target features matching each of the two-dimensional features, and determining the three-dimensional point coordinates of the corresponding feature points according to the target features, where the bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates;
acquiring the two-dimensional point coordinates of each feature point in the current camera coordinate system, and determining the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
acquiring a target image obtained by shooting the target object to be located;
performing feature extraction on the target object in the target image to obtain a two-dimensional feature corresponding to each feature point;
searching a bag of words for target features matching each of the two-dimensional features, and determining the three-dimensional point coordinates of the corresponding feature points according to the target features, where the bag of words is learned and established based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and the three-dimensional point coordinates;
acquiring the two-dimensional point coordinates of each feature point in the current camera coordinate system, and determining the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
With the above object positioning method, apparatus, computer device, and storage medium, a bag of words is first established based on marker points; in the actual positioning process, the target object can be located quickly and accurately based only on the two-dimensional features of the extracted feature points. The object positioning method is not only simple to operate but also highly stable and accurate.
Brief Description of the Drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from the structures shown in these drawings without creative effort.
FIG. 1 is an application environment diagram of an object positioning method in an embodiment;
FIG. 2 is a flowchart of an object positioning method in an embodiment;
FIG. 3 is a flowchart of a method for establishing a bag of words in an embodiment;
FIG. 4 is a schematic diagram of marker points in an embodiment;
FIG. 5 is a schematic diagram of the set target area in an embodiment;
FIG. 6 is a flowchart of a method for three-dimensional reconstruction of feature points in an embodiment;
FIG. 7 is a schematic flowchart of positioning a target object in an embodiment;
FIG. 8 is a structural block diagram of an object positioning apparatus in an embodiment;
FIG. 9 is an internal structure diagram of a computer device in an embodiment.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it.
图1为一个实施例中物体定位方法的应用环境图。参照图1,该物体定位方法应用于物体定位系统。该物体定位系统包括终端110和服务器120,终端110通过调用摄像头对待定位的目标物体拍摄得到目标图像,将目标图像上传到服务器120,服务器120对目标图像中的目标物体进行特征提取,得到每个特征点对应的二维特征,在词袋中分别查找与每个二维特征匹配的目标特征,根据目标特征确定相应的特征点对应三维点坐标,词袋是基于标志点进行学习建立的,存储了目标物体中的特征点的二维特征和三维点坐标之间的对应关系;获取每个特征点在当前相机坐标系中的二维点坐标,根据二维点坐标和三维点坐标确定目标物体相对于当前相机坐标系的位置关系,将确定的位置关系发送到终端110。FIG. 1 is an application environment diagram of an object positioning method in an embodiment. Referring to FIG. 1, the object positioning method is applied to an object positioning system. The object positioning system includes a terminal 110 and a server 120. The terminal 110 obtains a target image by calling a camera to capture the target object to be positioned, and uploads the target image to the server 120. The server 120 performs feature extraction on the target object in the target image to obtain each For the two-dimensional features corresponding to the feature points, find the target features matching each two-dimensional feature in the bag of words, and determine the corresponding feature points corresponding to the three-dimensional point coordinates according to the target features. The bag of words is learned based on the marker points and stored The correspondence between the two-dimensional features and the three-dimensional point coordinates of the feature points in the target object is obtained; the two-dimensional point coordinates of each feature point in the current camera coordinate system are obtained, and the target object is determined according to the two-dimensional point coordinates and the three-dimensional point coordinates The determined positional relationship is sent to the terminal 110 with respect to the positional relationship of the current camera coordinate system.
在另一个实施例中,上述物体定位方法可以直接应用于终端110,终端110调用摄像头对待定位的目标物体进行拍摄得到目标图像,对目标图像中的目标物体进行特征提取,得到每个特征点对应的二维特征,在词袋中分别查找与每个二维特征匹配的目标特征,根据目标特征确定相应的特征点对应三维点坐标,词袋是基于标志点进行学习建立的,存储了目标物体中的特征点的二维特征和三维点坐标之间的对应关系;获取每个特征点在当前相机坐标系中的二维点坐标,根据二维点坐标和三维点坐标确定目标物体相对于当前相机坐标系的位置关系。In another embodiment, the above object positioning method can be directly applied to the terminal 110. The terminal 110 calls the camera to take a picture of the target object to be located to obtain a target image, and extracts the feature of the target object in the target image to obtain the correspondence of each feature point The two-dimensional features of the search for the target features that match each two-dimensional feature in the bag of words, and determine the corresponding feature points corresponding to the coordinates of the three-dimensional points according to the target features. The bag of words is learned based on the marker points and stores the target object. Correspondence between the two-dimensional features and the three-dimensional point coordinates of the feature points in the; obtain the two-dimensional point coordinates of each feature point in the current camera coordinate system, determine the target object relative to the current The positional relationship of the camera coordinate system.
As shown in FIG. 2, an object positioning method is proposed. The method can be applied to either a terminal or a server; in this embodiment, application on a terminal is taken as an example. The method specifically includes the following steps:
Step 202: Acquire a target image obtained by photographing the target object to be positioned.
Here, the target object is the object to be positioned, and the target image is an image containing the target object obtained by photographing it. Specifically, the terminal invokes a camera to photograph the target object and obtains the target image.
Step 204: Perform feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point.
A feature point is a point on the target object in the target image, and the method of selecting feature points can be customized according to actual needs. In one embodiment, only the more salient points in the image are selected as feature points, for example the contour points of the target object; alternatively, all the pixels that make up the target object may be used as feature points. The target image obtained by photographing the target object is two-dimensional, so feature extraction on the feature points of the target object yields two-dimensional features. A two-dimensional feature is the feature corresponding to a feature point on the target object; different feature points correspond to different two-dimensional features, so the two-dimensional feature can serve as the identifying signature of a feature point.
In one embodiment, the extracted two-dimensional features are ORB (Oriented FAST and Rotated BRIEF) features, and the FAST (Features from Accelerated Segment Test) algorithm can be used to detect the feature points. In other embodiments, the extracted features may be HOG features or DOG features.
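By way of illustration, this extraction step might look as follows in Python with OpenCV; the embodiment does not prescribe a particular library, and the file name and parameter values here are assumptions:

```python
import cv2

# Hypothetical input image of the target object (grayscale for ORB).
image = cv2.imread("target.png", cv2.IMREAD_GRAYSCALE)

# ORB uses FAST internally to detect keypoints, then computes rotated BRIEF descriptors.
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(image, None)

# 'keypoints' carry the 2D pixel positions of the feature points;
# 'descriptors' (N x 32, uint8) are the binary two-dimensional features.
```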
Step 206: Search the bag of words for the target feature matching each two-dimensional feature, and determine the three-dimensional point coordinates of the corresponding feature point according to the matched target feature. The bag of words is built by learning based on marker points and stores the correspondence between the two-dimensional features of the feature points in the target object and their three-dimensional point coordinates.
Here, the bag of words is built by learning based on marker points, where a marker point is a reference point used to assist in positioning the target object. The bag of words stores the correspondence, obtained through learning, between the two-dimensional feature of each feature point and the corresponding three-dimensional point coordinates. After the two-dimensional feature of a feature point is determined, the target feature matching that two-dimensional feature is looked up in the bag of words, and the corresponding three-dimensional point coordinates are then determined from the target feature. The target feature is the feature found in the bag of words that matches the two-dimensional feature. Because the two-dimensional features of the feature points and the corresponding three-dimensional point coordinates are stored in advance, once a feature point's two-dimensional feature has been extracted, the corresponding three-dimensional point coordinates can be quickly looked up in the bag of words, which speeds up object positioning.
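The embodiment does not fix a concrete data structure for the bag of words. One minimal sketch, assuming the learned descriptors are simply stored row-aligned with their three-dimensional point coordinates, is a brute-force Hamming-distance match (file names are hypothetical):

```python
import cv2
import numpy as np

# Hypothetical stored bag of words: one learned ORB descriptor per feature point,
# row-aligned with the 3D point coordinates reconstructed during learning.
stored_descriptors = np.load("bag_descriptors.npy")  # (M, 32) uint8
stored_points_3d = np.load("bag_points3d.npy")       # (M, 3) float32

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(descriptors, stored_descriptors)  # query descriptors from the target image

# Each match links a query feature point to the 3D point stored for its target feature.
points_2d = np.float32([keypoints[m.queryIdx].pt for m in matches])
points_3d = np.float32([stored_points_3d[m.trainIdx] for m in matches])
```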
Step 208: Acquire the two-dimensional point coordinates of each feature point in the current camera coordinate system, and determine the positional relationship of the target object relative to the current camera coordinate system from the two-dimensional point coordinates and the three-dimensional point coordinates.
Here, the target image is obtained by photographing the target object in the current camera coordinate system. Once the two-dimensional point coordinates of the target object's feature points have been acquired, the positional relationship of the target object relative to the current camera coordinate system can be computed from the two-dimensional point coordinates and the corresponding three-dimensional point coordinates through the camera's perspective projection model. The positional relationship is generally expressed by a rotation matrix R and a translation matrix T.
In one embodiment, because the coordinate system of the obtained three-dimensional point coordinates may not coincide with the coordinate system of the target object, after the three-dimensional point coordinates are obtained they are further converted into the target object's coordinate system to obtain target three-dimensional coordinates, and the positional relationship of the target object relative to the current camera coordinate system is then obtained from the two-dimensional point coordinates and the target three-dimensional coordinates. In one embodiment, the camera perspective projection can be expressed by the formula C = f(RT) * M, where C is the two-dimensional coordinate of an image feature point, M is the three-dimensional point coordinate of the corresponding feature point, and f(RT) is a function with R and T as variables. With C and M known, the rotation matrix R and the translation matrix T can be solved.
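One standard way to solve for R and T from such 2D-3D correspondences is a perspective-n-point solver; the patent describes the projection relation but does not name a solver, so the sketch below, using OpenCV's solvePnP with made-up intrinsics standing in for a real calibration, is only an illustration:

```python
import cv2
import numpy as np

# Hypothetical camera intrinsics; in practice these come from calibration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)  # assume negligible lens distortion

# points_3d / points_2d: the matched correspondences from the bag-of-words lookup above.
ok, rvec, tvec = cv2.solvePnP(points_3d, points_2d, K, dist)
R, _ = cv2.Rodrigues(rvec)  # rotation matrix R; tvec is the translation T
```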
With the above object positioning method, the target object is first learned based on marker points, and the learned correspondence between the two-dimensional features of the feature points in the target object and their three-dimensional point coordinates is stored in a bag of words. When positioning the target object, the two-dimensional features of the target object's feature points are extracted, the three-dimensional point coordinates corresponding to each two-dimensional feature are looked up in the bag of words, and the positional relationship of the target object relative to the current camera coordinate system is then determined from the two-dimensional point coordinates and the three-dimensional point coordinates. The method builds the bag of words with the help of marker points, yet during actual positioning only the two-dimensional features of the extracted feature points are needed to locate the target object quickly and accurately. The method is simple to operate and offers high stability and accuracy.
As shown in FIG. 3, in one embodiment, before the target features matching each two-dimensional feature are searched for in the bag of words, the method further includes building the bag of words, which includes the following steps:
Step 302: Acquire multiple video images, each containing the marker points and the target object, obtained by photographing the marker points and the target object.
Here, a marker point is a reference point used to assist in positioning the target object. Marker points are usually affixed to a sheet of drawing paper. FIG. 4 is a schematic diagram of the marker points in one embodiment, in which the marker points are circular dots. The coordinates of the marker points are preset. Referring to FIG. 4, the second dot can be taken as the origin, the direction from dot 2 to dot 3 as the X axis, the direction from dot 2 to dot 1 as the Y axis, and the cross product of the X and Y axes as the Z axis, with the center coordinates of each dot preset. The particular six-dot marker layout shown in FIG. 4 allows the target object to be positioned more reliably.
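As a sketch of how such a frame could be built from three detected dot centers (the function name is illustrative, not from the patent):

```python
import numpy as np

def marker_axes(p1, p2, p3):
    """Build the marker coordinate frame described above: dot 2 is the origin,
    dot 2 -> dot 3 gives the X axis, dot 2 -> dot 1 gives the Y axis, and
    Z is the cross product of X and Y."""
    x = (p3 - p2) / np.linalg.norm(p3 - p2)
    y = (p1 - p2) / np.linalg.norm(p1 - p2)
    z = np.cross(x, y)
    return np.stack([x, y, z], axis=1), p2  # axes as columns, plus the origin
```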
In one embodiment, the centers of the six circles are preset as 1(0,1,0), 2(0,0,0), 3(1,0,0), 4(-1,-1,0), 5(0,-1,0), and 6(1,-1,0). The target object is placed in a set target region, and the sheet with the marker points is placed in that region. FIG. 5 is a schematic diagram of the set target region; the region is a rectangular cuboid whose eight vertex coordinates can be expressed as follows:
P1(x', y', 0), P1'(x', y', offset_z), P2(x', y'+offset_y, 0), P2'(x', y'+offset_y, offset_z), P3(x'+offset_x, y', 0), P3'(x'+offset_x, y', offset_z), P4(x'+offset_x, y'+offset_y, 0), P4'(x'+offset_x, y'+offset_y, offset_z). Here P1 is a fixed value determined by the drawing sheet, while offset_x, offset_y, and offset_z can be adjusted freely according to the target object being learned. The marker points are also placed in this target region, and the camera is moved around the marker points and the target object to take the shots; each shot ensures that the marker points and the target object appear in the camera's field of view at the same time, yielding multiple video images that contain both. In one embodiment, the camera is a monocular camera.
Step 304: Determine the conversion relationship between the camera coordinate system corresponding to each video image and the marker point coordinate system.
Here, as the camera moves during shooting, the camera coordinate system changes continuously, so the conversion relationship between the camera coordinate system of each video image and the marker point coordinate system must be computed. The conversion relationship is the positional relationship between the camera coordinate system and the marker point coordinate system, and can be expressed by R and T, which denote the rotation matrix and the translation matrix respectively.
In one embodiment, the conversion relationship is computed as follows. Let the conversion relationship between the camera and the marker point coordinate system be (RT)_i, so that C_i = (RT)_i M, where C_i is a point in the camera coordinate system, M is a point in the marker point coordinate system, and (RT)_i is the rotation-translation matrix. Here C_i is a two-dimensional coordinate in the camera coordinate system and M is the corresponding three-dimensional point coordinate in the marker point coordinate system. The rotation matrix R and translation matrix T between the camera coordinate system and the marker point coordinate system are obtained from this formula.
Step 306: Compute, from the conversion relationships, the transformation relationship between the camera coordinate system corresponding to each video image and the reference coordinate system.
Here, the reference coordinate system is a coordinate system chosen as the reference; the camera coordinate system corresponding to the first video frame can be chosen as the reference coordinate system. Once the conversion relationship between each camera coordinate system and the marker point coordinate system is known, the transformation relationship between each camera coordinate system and the reference coordinate system, i.e., the positional relationship between camera coordinate systems, can be computed. In one embodiment, from the conversion relationship C_i = (RT)_i M between the camera and the marker point coordinate system, the coordinate transformation between two adjacent camera positions follows as

C_{i+1} = (RT)_{i+1} (RT)_i^{-1} C_i.

In one embodiment, the camera coordinate system of the first video frame is taken as the reference coordinate system, and the transformation relationship between each camera coordinate system and the reference coordinate system can be computed by chaining the transformations between adjacent camera coordinate systems. For example, taking the coordinate system of C_1 as the reference coordinate system, the transformations between adjacent coordinate systems such as C_1 and C_2, C_2 and C_3, and so on are known, so the transformation between each C_i and C_1 can be determined.
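A compact way to realize this chaining, assuming each (R_i, T_i) pair maps marker coordinates into the i-th camera frame, is to work with 4x4 homogeneous transforms; this is a sketch, not the patent's prescribed implementation:

```python
import numpy as np

def to_homogeneous(R, T):
    """Pack a rotation matrix and translation vector into a 4x4 transform."""
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = np.ravel(T)
    return M

def camera_i_to_reference(R1, T1, Ri, Ti):
    """Carry a point from camera frame i into the reference frame (camera 1),
    matching C_1 = (RT)_1 (RT)_i^{-1} C_i."""
    return to_homogeneous(R1, T1) @ np.linalg.inv(to_homogeneous(Ri, Ti))
```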
Step 308: Convert the coordinates of the feature points of the target object in each video image into the reference coordinate system according to the transformation relationships, obtaining the two-dimensional coordinates of the feature points of each video image in the reference coordinate system.
Here, the transformation relationship is the positional transformation that carries a coordinate point in a camera coordinate system into the reference coordinate system. After the transformation between each video image's camera coordinate system and the reference coordinate system has been computed, the coordinates of the target object's feature points are all converted into the reference coordinate system, yielding the two-dimensional coordinate point corresponding to each feature point of each video image after conversion.
Step 310: Perform feature extraction on the target object in the video images to obtain the two-dimensional feature corresponding to each feature point.
Here, the two-dimensional feature corresponding to each feature point is extracted from the video images. The two-dimensional features may be ORB features, and correspondingly the FAST (Features from Accelerated Segment Test) algorithm may be used to detect the feature points before the features are extracted.
Step 312: Perform three-dimensional reconstruction of the target object from the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points of each video image in the reference coordinate system, obtaining the three-dimensional point coordinates corresponding to each feature point after reconstruction.
Here, although different video images are based on different camera coordinate systems, the two-dimensional feature extracted for the same feature point on the target object is the same, and different feature points have different two-dimensional features. Feature matching can therefore determine which points in different video images correspond to the same feature point, forming matched feature points. Once the two-dimensional coordinates of the matched feature points in the reference coordinate system are known, the feature point can be reconstructed in three dimensions in combination with the camera's internal parameters, yielding its three-dimensional point coordinates. Performing this reconstruction for every feature point completes the three-dimensional reconstruction of the target object.
Step 314: Store the two-dimensional features of the feature points in the target object in association with the corresponding three-dimensional point coordinates, completing the building of the bag of words.
Here, the three-dimensional point coordinates are reconstructed relative to the reference coordinate system, so the stored three-dimensional point coordinates are also determined in the reference coordinate system. After the three-dimensional point coordinates corresponding to the feature points on the target object have been determined, the feature points of the target object are stored in association with their three-dimensional point coordinates, completing the building of the bag of words. By positioning the target object with the help of marker points, this way of building the bag of words determines the correspondence between the two-dimensional features of the feature points in the target object and their three-dimensional point coordinates accurately, quickly, and stably.
As shown in FIG. 6, in one embodiment, performing three-dimensional reconstruction of the target object from the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points of each video image in the reference coordinate system, to obtain the three-dimensional point coordinates corresponding to each feature point after reconstruction, includes:
Step 312A: Match the feature points of different video images according to their two-dimensional features, determine which points in different video images correspond to the same feature point, and acquire the different two-dimensional coordinates that the same feature point has in the reference coordinate system across the different video images.
Here, the same feature point has the same two-dimensional feature in different video images, so feature matching can determine which points in different video images correspond to the same feature point; the two-dimensional coordinates of that feature point in the reference coordinate system are then acquired separately for each video image. For example, if point A in the first video image and point B in the second video image have the same two-dimensional feature, then A and B correspond to the same feature point, and the two-dimensional coordinates of A and of B in the reference coordinate system are acquired separately.
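A sketch of this cross-image matching with OpenCV's brute-force matcher; the ratio test below is a common robustness heuristic added here for illustration, not a step named by the patent:

```python
import cv2

# descriptors_a/keypoints_a and descriptors_b/keypoints_b: ORB output for two video images.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
knn = matcher.knnMatch(descriptors_a, descriptors_b, k=2)

# Keep a pair only when the best match is clearly better than the second best,
# so both views are likely seeing the same feature point.
good = [p[0] for p in knn if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
pts_a = [keypoints_a[m.queryIdx].pt for m in good]
pts_b = [keypoints_b[m.trainIdx].pt for m in good]
```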
Step 312B: Acquire the camera's internal parameter matrix and the transformation relationships between the camera coordinate systems corresponding to the different video images.
Here, the internal parameter matrix is the camera's intrinsic parameter matrix; once the camera's internal and external parameters are available, the three-dimensional coordinates of a spatial point can be computed. The internal parameter matrix is fixed and can be acquired directly. The external parameters are the positional relationships between the camera coordinate systems corresponding to different video images, and the transformation relationship here is exactly that positional relationship.
Step 312C: Perform three-dimensional reconstruction of the corresponding feature point from the internal parameter matrix, the transformation relationship, and the different two-dimensional coordinates corresponding to the same feature point, obtaining the three-dimensional point coordinates of the feature point in the reference coordinate system.
Here, the transformation relationship is the relative positional relationship between camera coordinate systems, likewise expressed by a rotation matrix R and a translation matrix T. With the internal parameter matrix of each camera coordinate system known, together with the different two-dimensional coordinates and the transformation relationship corresponding to the same matched point, the feature point can be reconstructed in three dimensions. Specifically, suppose there are two video images, a first and a second, and take the coordinate system corresponding to the first video image as the reference coordinate system; the projection matrices of the cameras at the two positions are then:
M_1 = K_1 [I | 0],  M_2 = K_2 [R | T]
where I is the identity matrix, K_1 and K_2 are the camera's internal parameter matrices at the two positions, R is the relative rotation matrix between the two camera coordinate systems, and T is the translation matrix between the two cameras. Suppose x and x' are a pair of matched points in the two video images, i.e., they correspond to the same feature point, and let X be the coordinates of the corresponding spatial point; then the relations between them can be expressed as x = M_1 X and x' = M_2 X. Solving these relations yields the three-dimensional point coordinates of the feature point in the reference coordinate system.
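For illustration, the two projection matrices and the triangulation can be set up with OpenCV as below; K1, K2, R, T, pts1, and pts2 are assumed inputs in the sense just defined:

```python
import cv2
import numpy as np

# Projection matrices as in the text: M1 = K1 [I | 0], M2 = K2 [R | T].
M1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = K2 @ np.hstack([R, np.reshape(T, (3, 1))])

# pts1, pts2: matched 2D points in the two video images, shaped (2, N), float.
X_h = cv2.triangulatePoints(M1, M2, pts1, pts2)  # homogeneous 4 x N result
X = (X_h[:3] / X_h[3]).T  # 3D point coordinates in the reference coordinate system
```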
In one embodiment, determining the conversion relationship between the camera coordinate system corresponding to each video image and the marker point coordinate system includes: acquiring the three-dimensional point coordinates of the marker points in the marker point coordinate system; recognizing the marker points in the video image and determining their two-dimensional coordinates in the camera coordinate system; and computing the conversion relationship between the camera coordinate system and the marker point coordinate system from the marker points' two-dimensional coordinates in the camera coordinate system and their three-dimensional point coordinates in the marker point coordinate system.
Here, the three-dimensional point coordinates can be preset in the marker point coordinate system. Recognition is performed on the marker points in the video image to obtain their two-dimensional coordinates in the camera coordinate system. Once the marker points' two-dimensional coordinates in the camera coordinate system and their three-dimensional point coordinates in the marker point coordinate system have been determined, the conversion relationship between the camera coordinate system and the marker point coordinate system can be computed from the camera projection matrix equation. Specifically, the camera projection matrix equation is as follows:
$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \begin{bmatrix} \alpha_x & 0 & u_0 & 0 \\ 0 & \alpha_y & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} R & T \\ 0^{T} & 1 \end{bmatrix}
\begin{bmatrix} X_W \\ Y_W \\ Z_W \\ 1 \end{bmatrix}
$$
where s is the scale factor, dX and dY are the physical dimensions of a pixel, f is the focal length, R is the rotation matrix, T is the translation matrix, α_x = f/dX and α_y = f/dY, (u_0, v_0) is the principal point, (u, v) are the two-dimensional point coordinates in the video image, and (X_W, Y_W, Z_W) are the corresponding spatial physical coordinates. Since s, dX, dY, and f are known quantities, R and T can be computed from several pairs of two-dimensional and three-dimensional point coordinates. The number of pairs is determined by the number of unknown degrees of freedom in the rotation and translation matrices; if there are four unknown degrees of freedom, at least four coordinate pairs are needed to compute the corresponding rotation matrix and translation matrix.
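With the six preset circle centers from the embodiment above, one way to recover the marker-to-camera conversion for a video image is again a PnP solve; the patent only gives the projection equation, so the use of solvePnP here is an assumption:

```python
import cv2
import numpy as np

# Preset circle centers in the marker point coordinate system (from the embodiment above).
marker_points_3d = np.float32([[0, 1, 0], [0, 0, 0], [1, 0, 0],
                               [-1, -1, 0], [0, -1, 0], [1, -1, 0]])

# marker_points_2d: detected circle centers in this video image, (6, 2) float32;
# K and dist are the camera intrinsics and distortion as in the earlier sketches.
ok, rvec, tvec = cv2.solvePnP(marker_points_3d, marker_points_2d, K, dist)
R_i, _ = cv2.Rodrigues(rvec)  # (RT)_i: marker point coordinate system -> camera i
```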
In one embodiment, after the step of photographing the marker points and the target object to obtain multiple video images containing both, the method further includes: determining the segmentation position corresponding to the target object in each video image and extracting the target object from the corresponding video image according to the segmentation position; once the target object has been extracted from the video image, the method proceeds to the step of performing feature extraction on the target object in the video image to obtain the two-dimensional feature corresponding to each feature point.
Here, to filter out non-target interference elsewhere in the scene, the target object must be extracted from the video image. First, the segmentation position corresponding to the target object in the video image is determined. In one embodiment, the target object is placed inside a rectangular cuboid whose vertices, as shown in FIG. 5, are P1, P2, P3, P4, P1', P2', P3', and P4'. Segmentation projects these eight vertices onto the image plane according to the camera's perspective projection matrix, and the polygonal region obtained after projection is the segmentation position of the target object. After the segmentation position has been determined, the target object can be extracted according to it, and the feature extraction step follows.
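A sketch of this segmentation, assuming the marker-to-camera pose (rvec, tvec) and intrinsics (K, dist) from the sketches above, with the eight box vertices expressed in the marker point coordinate system:

```python
import cv2
import numpy as np

# box_vertices: the eight points P1..P4' in the marker point coordinate system, (8, 3) float32.
img_pts, _ = cv2.projectPoints(box_vertices, rvec, tvec, K, dist)
hull = cv2.convexHull(img_pts.reshape(-1, 2).astype(np.int32))

# Keep only the pixels inside the projected polygonal region.
mask = np.zeros(image.shape[:2], dtype=np.uint8)
cv2.fillConvexPoly(mask, hull, 255)
target_only = cv2.bitwise_and(image, image, mask=mask)
```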
In one embodiment, acquiring the two-dimensional point coordinates of each feature point in the current camera coordinate system and determining the positional relationship of the target object relative to the current camera coordinate system from the two-dimensional point coordinates and the three-dimensional point coordinates includes: converting the three-dimensional point coordinates into the coordinate system corresponding to the target object to obtain target three-dimensional coordinates; and computing the positional relationship of the target object relative to the current camera coordinate system from the two-dimensional point coordinates in the current camera coordinate system and the target three-dimensional coordinates.
Here, because the acquired three-dimensional point coordinates are based on the reference coordinate system used when the bag of words was built, obtaining the positional relationship between the target object coordinate system and the current camera coordinate system requires converting the three-dimensional point coordinates from the reference coordinate system into the target object coordinate system, yielding the target three-dimensional coordinates. Specifically, the origin of the acquired three-dimensional point coordinates is moved onto the target object: the center of the target object's feature points is computed, and the center is then subtracted from every point. The positional relationship of the target object relative to the current camera coordinate system can then be computed directly from the two-dimensional coordinates in the current camera coordinate system and the corresponding target three-dimensional coordinates.
In one embodiment, converting the three-dimensional point coordinates into the coordinate system corresponding to the target object to obtain the target three-dimensional coordinates includes: acquiring the three-dimensional point coordinates corresponding to each feature point in the target object; averaging the three-dimensional point coordinates of all feature points to obtain an average three-dimensional point coordinate; and subtracting the average three-dimensional point coordinate from the three-dimensional point coordinates of each feature point to obtain the corresponding target three-dimensional coordinates.
Here, to convert the three-dimensional point coordinates into the coordinate system corresponding to the target object, the three-dimensional point coordinates of every feature point on the target object are acquired, the three-dimensional point coordinates of all feature points are averaged to obtain the average three-dimensional point coordinate, and finally the average is subtracted from each feature point's three-dimensional point coordinates to obtain the corresponding target three-dimensional coordinates. The target three-dimensional coordinates are the coordinates after transfer into the target object coordinate system.
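This centering step is a two-line operation in NumPy; points_3d here stands for all feature-point coordinates in the reference coordinate system:

```python
import numpy as np

center = points_3d.mean(axis=0)     # the average three-dimensional point coordinate
target_coords = points_3d - center  # origin moved onto the target object
```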
FIG. 7 shows a schematic flowchart of positioning a target object in one embodiment. In the first step, the sheet containing the marker points is placed on a flat surface. In the second step, the target object is placed in the target placement region of the sheet. In the third step, video images containing the marker points and the target object are captured with a camera. In the fourth step, the target object in the video images is segmented and the target object image is extracted. In the fifth step, feature extraction is performed on the target object image to obtain the two-dimensional features of the feature points. In the sixth step, the target object is reconstructed in three dimensions from the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points of each video image in the reference coordinate system, obtaining the three-dimensional point coordinates corresponding to each feature point after reconstruction. In the seventh step, the two-dimensional features of the feature points are stored in association with the corresponding three-dimensional point coordinates, completing the building of the bag of words. In the eighth step, the sheet is removed, the target object is placed on a flat surface, a target image containing the target object is captured with the camera, and feature extraction is performed on the target image to obtain the two-dimensional features of the feature points. In the ninth step, the target features corresponding to those two-dimensional features are matched in the bag of words, and the corresponding three-dimensional point coordinates are acquired. In the tenth step, the two-dimensional point coordinates of the feature points in the current camera coordinate system are acquired, and the pose of the target object relative to the current camera coordinate system is determined from the two-dimensional point coordinates and the three-dimensional point coordinates.
As shown in FIG. 8, in one embodiment, an object positioning apparatus is proposed, which includes:
a first acquisition module 802, configured to acquire a target image obtained by photographing the target object to be positioned; a first extraction module 804, configured to perform feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point; a search module 806, configured to search the bag of words for the target feature matching each two-dimensional feature and to determine the three-dimensional point coordinates of the corresponding feature point according to the matched target feature, the bag of words being built by learning based on marker points and storing the correspondence between the two-dimensional features of the feature points in the target object and their three-dimensional point coordinates; and a determination module 808, configured to acquire the two-dimensional point coordinates of each feature point in the current camera coordinate system and to determine the positional relationship of the target object relative to the current camera coordinate system from the two-dimensional point coordinates and the three-dimensional point coordinates.
In one embodiment, the above object positioning apparatus further includes: a second acquisition module, configured to acquire multiple video images, containing the marker points and the target object, obtained by photographing the marker points and the target object; a conversion relationship determination module, configured to determine the conversion relationship between the camera coordinate system corresponding to each video image and the marker point coordinate system; a computation module, configured to compute, from the conversion relationships, the transformation relationship between the camera coordinate system corresponding to each video image and the reference coordinate system; a conversion module, configured to convert the coordinates of the target object's feature points in each video image into the reference coordinate system according to the transformation relationships, obtaining the two-dimensional coordinates of the feature points of each video image in the reference coordinate system; a second extraction module, configured to perform feature extraction on the target object in the video images to obtain the two-dimensional feature corresponding to each feature point; a three-dimensional reconstruction module, configured to reconstruct the target object in three dimensions from the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points of each video image in the reference coordinate system, obtaining the three-dimensional point coordinates corresponding to each feature point after reconstruction; and a storage module, configured to store the two-dimensional features of the feature points in the target object in association with the corresponding three-dimensional point coordinates, completing the building of the bag of words.
In one embodiment, the three-dimensional reconstruction module is further configured to match the feature points of different video images according to their two-dimensional features, determine which points in different video images correspond to the same feature point, and acquire the different two-dimensional coordinates that the same feature point has in the reference coordinate system across the different video images; to acquire the camera's internal parameter matrix and the transformation relationships between the camera coordinate systems corresponding to the different video images; and to reconstruct the corresponding feature point in three dimensions from the internal parameter matrix, the transformation relationship, and the different two-dimensional coordinates corresponding to the same feature point, obtaining the three-dimensional point coordinates of the feature point in the reference coordinate system.
In one embodiment, the three-dimensional reconstruction module is further configured to acquire the three-dimensional point coordinates of the marker points in the marker point coordinate system; to recognize the marker points in the video image and determine their two-dimensional coordinates in the camera coordinate system; and to compute the conversion relationship between the camera coordinate system and the marker point coordinate system from the marker points' two-dimensional coordinates in the camera coordinate system and their three-dimensional point coordinates in the marker point coordinate system.
In one embodiment, the above object positioning apparatus further includes a segmentation module, configured to determine the segmentation position corresponding to the target object in each video image and to extract the target object from the corresponding video image according to the segmentation position; once the target object has been extracted from the video image, the segmentation module notifies the feature extraction module to perform feature extraction on the target object in the video image and obtain the two-dimensional feature corresponding to each feature point.
In one embodiment, the determination module is further configured to convert the three-dimensional point coordinates into the coordinate system corresponding to the target object to obtain target three-dimensional coordinates, and to compute the positional relationship of the target object relative to the current camera coordinate system from the two-dimensional point coordinates in the current camera coordinate system and the target three-dimensional coordinates.
In one embodiment, the determination module is further configured to acquire the three-dimensional point coordinates corresponding to each feature point in the target object; to average the three-dimensional point coordinates of all feature points to obtain an average three-dimensional point coordinate; and to subtract the average three-dimensional point coordinate from the three-dimensional point coordinates of each feature point to obtain the corresponding target three-dimensional coordinates.
FIG. 9 shows an internal structure diagram of a computer device in an embodiment. The computer device may be a terminal or a server. As shown in FIG. 9, the computer device includes a processor, a memory, and a network interface connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the object positioning method. A computer program may also be stored in the internal memory, and when executed by the processor it causes the processor to perform the object positioning method. The network interface is used to communicate with the outside. Those skilled in the art will understand that the structure shown in FIG. 9 is merely a block diagram of a partial structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, the object positioning method provided by this application can be implemented in the form of a computer program, and the computer program can run on the computer device shown in FIG. 9. The program modules that make up the object positioning apparatus, such as the first acquisition module 802, the first extraction module 804, the search module 806, and the determination module 808, can be stored in the memory of the computer device.
A computer device includes a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following steps: acquiring a target image obtained by photographing the target object to be positioned; performing feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point; searching the bag of words for the target feature matching each two-dimensional feature and determining the three-dimensional point coordinates of the corresponding feature point according to the matched target feature, the bag of words being built by learning based on marker points and storing the correspondence between the two-dimensional features of the feature points in the target object and their three-dimensional point coordinates; and acquiring the two-dimensional point coordinates of each feature point in the current camera coordinate system and determining the positional relationship of the target object relative to the current camera coordinate system from the two-dimensional point coordinates and the three-dimensional point coordinates.
A computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to perform the following steps: acquiring a target image obtained by photographing the target object to be positioned; performing feature extraction on the target object in the target image to obtain the two-dimensional feature corresponding to each feature point; searching the bag of words for the target feature matching each two-dimensional feature and determining the three-dimensional point coordinates of the corresponding feature point according to the matched target feature, the bag of words being built by learning based on marker points and storing the correspondence between the two-dimensional features of the feature points in the target object and their three-dimensional point coordinates; and acquiring the two-dimensional point coordinates of each feature point in the current camera coordinate system and determining the positional relationship of the target object relative to the current camera coordinate system from the two-dimensional point coordinates and the three-dimensional point coordinates.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium, and when executed it may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not therefore be understood as limiting the patent scope of this application. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (10)

  1. An object positioning method, characterized in that the method comprises:
    acquiring a target image obtained by photographing a target object to be positioned;
    performing feature extraction on the target object in the target image to obtain a two-dimensional feature corresponding to each feature point;
    searching a bag of words for a target feature matching each of the two-dimensional features, and determining three-dimensional point coordinates of the corresponding feature point according to the target feature, wherein the bag of words is built by learning based on marker points and stores a correspondence between the two-dimensional features of the feature points in the target object and three-dimensional point coordinates; and
    acquiring two-dimensional point coordinates of each feature point in a current camera coordinate system, and determining a positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  2. The method according to claim 1, characterized in that before the searching the bag of words for the target feature matching each of the two-dimensional features, the method further comprises: building the bag of words, wherein building the bag of words comprises the following steps: acquiring multiple video images, containing the marker points and the target object, obtained by photographing the marker points and the target object;
    determining a conversion relationship between a camera coordinate system corresponding to each video image and a marker point coordinate system;
    computing, according to the conversion relationship, a transformation relationship between the camera coordinate system corresponding to each video image and a reference coordinate system;
    converting coordinates of the feature points of the target object in each video image into the reference coordinate system according to the transformation relationship, to obtain two-dimensional coordinates of the feature points in each video image in the reference coordinate system;
    performing feature extraction on the target object in the video images to obtain a two-dimensional feature corresponding to each feature point;
    performing three-dimensional reconstruction of the target object according to the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points in each video image in the reference coordinate system, to obtain three-dimensional point coordinates corresponding to each feature point after three-dimensional reconstruction; and
    storing the two-dimensional features of the feature points in the target object in association with the corresponding three-dimensional point coordinates, to complete the building of the bag of words.
  3. The method according to claim 2, characterized in that the performing three-dimensional reconstruction of the target object according to the two-dimensional feature corresponding to each feature point and the two-dimensional coordinates of the feature points in each video image in the reference coordinate system, to obtain the three-dimensional point coordinates corresponding to each feature point after three-dimensional reconstruction, comprises:
    matching feature points in different video images according to the two-dimensional features of the feature points, determining a same feature point corresponding across the different video images, and acquiring different two-dimensional coordinates, in the reference coordinate system, of the same feature point in the different video images;
    acquiring an internal parameter matrix of the camera and transformation relationships between the camera coordinate systems corresponding to the different video images; and
    performing three-dimensional reconstruction of the corresponding feature point according to the internal parameter matrix, the transformation relationship, and the different two-dimensional coordinates corresponding to the same feature point, to obtain the three-dimensional point coordinates of the feature point in the reference coordinate system.
  4. The method according to claim 2, characterized in that the determining the conversion relationship between the camera coordinate system corresponding to each video image and the marker point coordinate system comprises:
    acquiring three-dimensional point coordinates of the marker points in the marker point coordinate system;
    recognizing the marker points in the video image, and determining two-dimensional coordinates of the marker points in the camera coordinate system; and
    computing the conversion relationship between the camera coordinate system and the marker point coordinate system according to the two-dimensional coordinates of the marker points in the camera coordinate system and the three-dimensional point coordinates in the marker point coordinate system.
  5. The method according to claim 2, wherein after the step of photographing the marker points and the target object to obtain multiple video images containing the marker points and the target object, the method further includes:
    determining a segmentation position corresponding to the target object in each video image, extracting the target object from the corresponding video image according to the segmentation position, and after the target object has been extracted, proceeding to the step of performing feature extraction on the target object in the video image to obtain the two-dimensional feature corresponding to each feature point.
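Claim 5 leaves the segmentation method open. As one possibility, a bounding rectangle around the object (e.g. from a detector; the `rect` input here is a hypothetical placeholder) can seed OpenCV's GrabCut to cut the target out of the frame before feature extraction:

```python
import cv2
import numpy as np

def extract_target(frame, rect):
    """Segment the target object inside a bounding box rect = (x, y, w, h)."""
    mask = np.zeros(frame.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(frame, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
    # Keep definite and probable foreground, zero out the background.
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0)
    return frame * fg.astype(np.uint8)[:, :, None]
```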
  6. The method according to claim 1, wherein the acquiring the two-dimensional point coordinates of each feature point in the current camera coordinate system, and determining the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates, includes:
    converting the three-dimensional point coordinates into the coordinate system corresponding to the target object, to obtain target three-dimensional coordinates;
    calculating the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates in the current camera coordinate system and the target three-dimensional coordinates.
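Claim 6 is again a Perspective-n-Point problem, now between the current image's two-dimensional point coordinates and the target three-dimensional coordinates looked up in the bag of words. A sketch under the same OpenCV assumption as above:

```python
import cv2
import numpy as np

def locate_target(pts_2d, target_pts_3d, K, dist):
    """Positional relationship of the target relative to the current camera."""
    ok, rvec, tvec = cv2.solvePnP(np.asarray(target_pts_3d, float),
                                  np.asarray(pts_2d, float), K, dist)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec  # object coordinate system -> current camera coordinate system
```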
  7. The method according to claim 6, wherein the converting the three-dimensional point coordinates into the coordinate system corresponding to the target object, to obtain the target three-dimensional coordinates, includes:
    acquiring the three-dimensional point coordinates corresponding to each feature point of the target object;
    averaging the three-dimensional point coordinates corresponding to all the feature points, to obtain average three-dimensional point coordinates;
    subtracting the average three-dimensional point coordinates from the three-dimensional point coordinates corresponding to each feature point, to obtain the corresponding target three-dimensional coordinates.
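The conversion in claim 7 reduces to subtracting the centroid, which places the object's coordinate origin at the mean of its feature points; a one-function sketch:

```python
import numpy as np

def center_points(pts_3d):
    """Shift 3-D points into the target object's own coordinate system."""
    pts = np.asarray(pts_3d, float)
    average = pts.mean(axis=0)   # average three-dimensional point coordinates
    return pts - average         # target three-dimensional coordinates
```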
  8. An object positioning device, wherein the device includes:
    a first acquisition module, used to acquire a target image obtained by photographing a target object to be positioned;
    a first extraction module, used to perform feature extraction on the target object in the target image to obtain a two-dimensional feature corresponding to each feature point;
    a searching module, used to search the bag of words for a target feature matching each of the two-dimensional features and to determine, according to the target feature, the three-dimensional point coordinates corresponding to the corresponding feature point, wherein the bag of words is established by learning based on marker points and stores the correspondence between the two-dimensional features of the feature points of the target object and the three-dimensional point coordinates;
    a determining module, used to acquire the two-dimensional point coordinates of each feature point in the current camera coordinate system, and to determine the positional relationship of the target object relative to the current camera coordinate system according to the two-dimensional point coordinates and the three-dimensional point coordinates.
  9. A computer device, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
PCT/CN2018/124409 2018-12-27 2018-12-27 Object positioning method and apparatus, computer device, and storage medium WO2020133080A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/124409 WO2020133080A1 (en) 2018-12-27 2018-12-27 Object positioning method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/124409 WO2020133080A1 (en) 2018-12-27 2018-12-27 Object positioning method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020133080A1 (en) 2020-07-02

Family

ID=71126144

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/124409 WO2020133080A1 (en) 2018-12-27 2018-12-27 Object positioning method and apparatus, computer device, and storage medium

Country Status (1)

Country Link
WO (1) WO2020133080A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101839692A (en) * 2010-05-27 2010-09-22 西安交通大学 Method for measuring three-dimensional position and stance of object with single camera
CN102645173A (en) * 2011-02-16 2012-08-22 张文杰 Multi-vision-based bridge three-dimensional deformation monitoring method
CN102368810A (en) * 2011-09-19 2012-03-07 长安大学 Semi-automatic aligning video fusion system and method thereof
CN102722886A (en) * 2012-05-21 2012-10-10 浙江捷尚视觉科技有限公司 Video speed measurement method based on three-dimensional calibration and feature point matching
US20160350904A1 (en) * 2014-03-18 2016-12-01 Huawei Technologies Co., Ltd. Static Object Reconstruction Method and System

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950431A (en) * 2020-08-07 2020-11-17 北京猎户星空科技有限公司 Object searching method and device
CN111950431B (en) * 2020-08-07 2024-03-26 北京猎户星空科技有限公司 Object searching method and device
CN112716509A (en) * 2020-12-24 2021-04-30 上海联影医疗科技股份有限公司 Motion control method and system for medical equipment
CN112716509B (en) * 2020-12-24 2023-05-02 上海联影医疗科技股份有限公司 Motion control method and system for medical equipment
CN113705390A (en) * 2021-08-13 2021-11-26 北京百度网讯科技有限公司 Positioning method, positioning device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110568447B (en) Visual positioning method, device and computer readable medium
CN110176032B (en) Three-dimensional reconstruction method and device
CN111383270B (en) Object positioning method, device, computer equipment and storage medium
RU2609434C2 (en) Detection of objects arrangement and location
CN109993793B (en) Visual positioning method and device
WO2019042426A1 (en) Augmented reality scene processing method and apparatus, and computer storage medium
CN109472828B (en) Positioning method, positioning device, electronic equipment and computer readable storage medium
WO2020133080A1 (en) Object positioning method and apparatus, computer device, and storage medium
CN114119864A (en) Positioning method and device based on three-dimensional reconstruction and point cloud matching
CN111107337B (en) Depth information complementing method and device, monitoring system and storage medium
CN114862973B (en) Space positioning method, device and equipment based on fixed point location and storage medium
CN115423863B (en) Camera pose estimation method and device and computer readable storage medium
CN113269671A (en) Bridge apparent panorama generation method based on local and global features
CN110567441A (en) Particle filter-based positioning method, positioning device, mapping and positioning method
CN116295279A (en) Unmanned aerial vehicle remote sensing-based building mapping method and unmanned aerial vehicle
CN111724432B (en) Object three-dimensional detection method and device
KR20230049969A (en) Method and apparatus for global localization
KR101598399B1 (en) System for combining images using coordinate information of roadview image
Pollok et al. A visual SLAM-based approach for calibration of distributed camera networks
Ventura et al. Structure and motion in urban environments using upright panoramas
JP2016527574A (en) A method for registering data using a set of primitives
KR101673144B1 (en) Stereoscopic image registration method based on a partial linear method
CN111179342A (en) Object pose estimation method and device, storage medium and robot
KR20170001448A (en) Apparatus for measuring position of camera using stereo camera and method using the same
CN113674353B (en) Accurate pose measurement method for space non-cooperative target

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18944665; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18944665; Country of ref document: EP; Kind code of ref document: A1)