WO2022028253A1 - Positioning model optimization method, positioning method, positioning device, and storage medium - Google Patents

Positioning model optimization method, positioning method, positioning device, and storage medium

Info

Publication number
WO2022028253A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional point
positioning
descriptors
predetermined threshold
scene
Prior art date
Application number
PCT/CN2021/107975
Other languages
English (en)
French (fr)
Inventor
罗琳捷
刘晶
陈志立
王国晖
杨骁
杨建朝
连晓晨
Original Assignee
罗琳捷
字节跳动有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 罗琳捷, 字节跳动有限公司
Priority to US18/000,375 (published as US20230222749A1)
Publication of WO2022028253A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00: Manipulating 3D models or images for computer graphics
    • G06T19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75: Determining position or orientation of objects or cameras using feature-based methods involving models
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10028: Range image; Depth image; 3D point clouds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30244: Camera pose
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00: Indexing scheme for image generation or computer graphics
    • G06T2210/56: Particle system, point based geometry or rendering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00: Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20: Indexing scheme for editing of 3D models
    • G06T2219/2004: Aligning objects, relative positioning of parts

Definitions

  • the present disclosure relates to the field of positioning, and more particularly, to a positioning model optimization method, an image-based positioning method and a positioning device, and a computer-readable storage medium.
  • Positioning technologies based on GPS (Global Positioning System), Wi-Fi wireless networks, etc. have many problems, such as susceptibility to interference and a limited scope of application.
  • Image-based positioning methods can provide better positioning accuracy by performing three-dimensional positioning relative to known scenes, so as to better serve Augmented Reality (AR) applications.
  • a 3D localization model obtained by 3D reconstruction of a series of images of a scene contains a large number of 3D points and corresponding 2D feature points in each image and their descriptors.
  • To locate an image to be queried, the two-dimensional feature points and their descriptors are first extracted from the image to be queried and then matched against the descriptors in the localization model. If a matching descriptor is found, the corresponding three-dimensional point is determined, thereby positioning the image to be queried.
  • the number of 3D points and corresponding descriptors in the localization model is positively correlated with the size of the scene and the number of input images, and directly affects the efficiency of the localization algorithm.
  • Traditional image-based localization methods rely on powerful computing power to process large numbers of 3D points and descriptors and are therefore usually implemented only on the server side. Server-side localization methods in turn depend on network connections and high-speed bandwidth, which considerably limits various real-time AR applications on mobile devices.
  • the present invention provides a positioning model optimization method, an image-based positioning method and positioning device, and a computer-readable storage medium.
  • A method for optimizing a positioning model, comprising: inputting a positioning model of a scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; for each three-dimensional point in the three-dimensional point cloud, determining a plurality of adjacent points of the three-dimensional point, and if the distance relationship between each of the plurality of adjacent points and the three-dimensional point is less than a predetermined threshold, outputting the three-dimensional point and the plurality of descriptors corresponding to the three-dimensional point into the optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
  • An image-based positioning method, comprising: inputting an image to be queried; locating the image to be queried by using an optimized positioning model of the scene to which the image to be queried belongs; and outputting the pose of the camera that captured the image to be queried.
  • The optimized positioning model of the scene is obtained by the following method: inputting the positioning model of the scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; for each three-dimensional point in the three-dimensional point cloud, determining a plurality of adjacent points of the three-dimensional point, and if the distance relationship between each of the plurality of adjacent points and the three-dimensional point is less than a predetermined threshold, outputting the three-dimensional point and the plurality of descriptors corresponding to the three-dimensional point into the optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
  • An image-based positioning device, comprising: an input unit configured to input an image to be queried; a positioning unit configured to locate the image to be queried by using an optimized positioning model of the scene to which the image to be queried belongs; and an output unit configured to output the pose of the camera that captured the image to be queried.
  • The positioning device further includes an optimization unit configured to: receive an input positioning model of the scene, where the positioning model includes a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; for each three-dimensional point in the three-dimensional point cloud, determine a plurality of adjacent points of the three-dimensional point, and if the distance relationship between each of the plurality of adjacent points and the three-dimensional point is less than a predetermined threshold, output the three-dimensional point and the plurality of descriptors corresponding to the three-dimensional point into the optimized positioning model of the scene; and output the optimized positioning model of the scene.
  • An image-based positioning device, comprising one or more processors and one or more memories, wherein the memories have computer-readable code stored therein, and the computer-readable code, when executed by the one or more processors, causes the one or more processors to perform the method of any of the above aspects.
  • A computer-readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, cause the processor to perform the method of any of the above aspects.
  • In the positioning model optimization method, a plurality of neighboring points is determined for each three-dimensional point in the three-dimensional point cloud, and when the distance relationship between each of the neighboring points and the three-dimensional point is smaller than a predetermined threshold, the three-dimensional point and its corresponding descriptors are output to the optimized positioning model of the scene. This can effectively reduce the number of redundant three-dimensional points in the positioning model, speed up positioning, and improve positioning efficiency, so that positioning calculation on mobile devices can be optimized and various real-time AR applications based on three-dimensional scene positioning become possible on mobile devices.
  • FIG. 1 shows a flowchart of a positioning model optimization method 100 according to an embodiment of the present disclosure
  • FIG. 2 shows a flowchart of an image-based positioning method 200 according to an embodiment of the present disclosure
  • FIG. 3 shows a schematic structural diagram of an image-based positioning device 300 according to an embodiment of the present disclosure
  • FIG. 4 shows a schematic structural diagram of an exemplary electronic device 400 according to an embodiment of the present disclosure.
  • FIG. 5 shows a schematic diagram of an exemplary computer-readable storage medium 500 according to an embodiment of the present disclosure.
  • The term "including" and variations thereof are open-ended inclusions, i.e., "including but not limited to".
  • The term "based on" means "based at least in part on".
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • FIG. 1 shows a flowchart of a positioning model optimization method 100 according to an embodiment of the present disclosure.
  • a localization model of the scene is input, and the localization model includes a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud.
  • the scene can be any geographic scene such as a building, a city, etc., for example.
  • its positioning model may be a three-dimensional positioning model obtained by performing three-dimensional reconstruction on the scene. For example, a series of images of the scene can be captured in advance, and then a three-dimensional localization model of the scene can be obtained by performing three-dimensional reconstruction of the scene based on the images.
  • the 3D positioning model includes a 3D point cloud formed by a large number of 3D points, and each 3D point in the 3D point cloud corresponds to a series of 2D feature points located on each image and descriptors of these 2D feature points.
  • the descriptor may be, for example, a parameter describing the relationship between the two-dimensional feature point and its surrounding contents, so that the matching of the feature point can be realized by using the descriptor.
  • the descriptors can be binary feature descriptors that describe small blocks of pixels around the feature points.
  • Commonly used descriptors include the BRISK (Binary Robust Invariant Scalable Keypoints) descriptor, the BRIEF (Binary Robust Independent Elementary Features) descriptor, etc.
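  • As an illustration of how binary descriptors can be compared, the following minimal sketch computes the Hamming distance between two packed binary descriptors. The Hamming distance is a common choice for binary descriptors in general; it is not prescribed by this disclosure, and the 32-byte descriptor length and the function name `hamming_distance` are assumptions made only for this example.

```python
import numpy as np

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Count the differing bits between two packed binary descriptors (uint8 arrays)."""
    # XOR leaves a 1 wherever the bits differ; unpackbits expands each byte into 8 bits.
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Two hypothetical 32-byte (256-bit) descriptors; the values are illustrative only.
d1 = np.arange(32, dtype=np.uint8)
d2 = d1.copy()
d2[0] ^= 0b00000111  # flip three bits in the first byte

print(hamming_distance(d1, d2))  # 3
```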
  • In step S120, for each three-dimensional point in the three-dimensional point cloud, a plurality of adjacent points of the three-dimensional point are determined, and if the distance relationship between each of the plurality of adjacent points and the three-dimensional point is less than a predetermined threshold, the three-dimensional point and the plurality of descriptors corresponding to it are output to the optimized positioning model of the scene. Otherwise, that is, if the distance relationships between the plurality of adjacent points and the three-dimensional point are not all smaller than the predetermined threshold, the three-dimensional point and its corresponding descriptors are not output.
  • the 3D points in the 3D point cloud are clustered, and the number of redundant 3D points and their corresponding descriptors is reduced, and the specific reduction extent depends on the size of the predetermined threshold.
  • Determining multiple neighboring points of a given three-dimensional point may include: determining, as the neighboring points, multiple points in the three-dimensional point cloud whose distances from the three-dimensional point are less than a first predetermined threshold. For example, for a given three-dimensional point p, if the distance between a point p' in its neighborhood and p is less than the first predetermined threshold, p' is determined to be a nearest neighbor of p, and so on, to determine the n nearest neighbors of p.
  • The size of the first predetermined threshold determines the range of clustering.
  • For example, if the first predetermined threshold is 5, the points whose distance from the three-dimensional point p is less than 5 are the nearest neighbors of p.
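  • The neighbor determination described above can be sketched as a brute-force radius search. This is a minimal illustration, assuming Euclidean distance on the 3D coordinates; the function name `radius_neighbors` and the sample coordinates are hypothetical, and a spatial index such as a k-d tree would normally replace the linear scan for large point clouds.

```python
import numpy as np

def radius_neighbors(points: np.ndarray, index: int, first_threshold: float) -> np.ndarray:
    """Return the indices of all points closer than first_threshold to points[index]."""
    dists = np.linalg.norm(points - points[index], axis=1)
    mask = dists < first_threshold
    mask[index] = False  # a point is not counted as its own neighbor
    return np.flatnonzero(mask)

# Hypothetical point cloud: distances from point 0 are 3, 4 and 6.
points = np.array([[0.0, 0.0, 0.0],
                   [3.0, 0.0, 0.0],
                   [0.0, 4.0, 0.0],
                   [6.0, 0.0, 0.0]])

# With a first predetermined threshold of 5, points 1 and 2 are neighbors of point 0.
print(radius_neighbors(points, 0, first_threshold=5.0))  # [1 2]
```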
  • In one example, the statement that the distance relationship between each of the plurality of adjacent points and the three-dimensional point is smaller than the predetermined threshold may mean that the distance between each of the plurality of adjacent points and the three-dimensional point is smaller than a second predetermined threshold.
  • If this condition holds, the 3D point p and the multiple descriptors corresponding to p are output to the optimized localization model of the scene; otherwise, the 3D point p and its corresponding descriptors are not output.
  • the distance of each neighboring point p' from the three-dimensional point p may be the distance between the three-dimensional coordinates (x', y', z') of p' and the three-dimensional coordinates (x, y, z) of p.
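  • A minimal sketch of the coordinate-based check follows, assuming the Euclidean distance between (x', y', z') and (x, y, z). The function name `all_neighbors_within` and the sample values are hypothetical. Note that in this sketch a point with no neighbors passes the check vacuously, which is an assumption; the text does not address that case.

```python
import numpy as np

def all_neighbors_within(point: np.ndarray, neighbors: np.ndarray,
                         second_threshold: float) -> bool:
    """True if every neighbor lies closer than second_threshold to the point."""
    dists = np.linalg.norm(neighbors - point, axis=1)
    return bool(np.all(dists < second_threshold))

p = np.array([0.0, 0.0, 0.0])
neighbors = np.array([[1.0, 0.0, 0.0],   # distance 1 from p
                      [0.0, 2.0, 0.0]])  # distance 2 from p

print(all_neighbors_within(p, neighbors, second_threshold=2.5))  # True
print(all_neighbors_within(p, neighbors, second_threshold=1.5))  # False
```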
  • In another example, the statement that the distance relationship between each of the plurality of adjacent points and the three-dimensional point is less than a predetermined threshold may mean that the distance between the plurality of descriptors of each of the plurality of adjacent points and the plurality of descriptors of the three-dimensional point is less than a third predetermined threshold.
  • Each 3D point corresponds to 2D feature points in a series of images, and each 2D feature point has a corresponding descriptor. That is, each 3D point corresponds to multiple descriptors, and each of the nearest neighbors of a 3D point also corresponds to multiple descriptors.
  • A three-dimensional point p may have m corresponding descriptors $d_1, \ldots, d_m$, and a neighboring point p' of the three-dimensional point p may have m' corresponding descriptors $d'_1, \ldots, d'_{m'}$, where m and m' depend on the number of 2D feature points used to reconstruct the 3D points p and p'.
  • The descriptors of the two points can be compared, for example through the distance between the average value of the descriptors of p and the average value of the descriptors of p', and the result can be compared with the third predetermined threshold.
  • The distance between the average values of the descriptors of the three-dimensional point p and its neighbor p' may be, for example, their Euclidean distance.
  • Although the distance between the descriptors of a three-dimensional point and those of its neighboring points is described above by taking the distance between the average values of the descriptors as an example, the present disclosure is not limited thereto. For example, the distance can also be represented by the distance between weighted values of the descriptors of the 3D point and of its neighbors, or a certain number of descriptors of the 3D point and of its neighbors can be selected respectively, and based on the distances between these descriptors, an average value, a weighted value, or a maximum/minimum value is determined for comparison with the third predetermined threshold.
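  • The descriptor comparisons mentioned above can be sketched as follows. This is a minimal illustration, assuming each point's descriptors are stored as rows of a numeric array and compared with the Euclidean distance; the function names and the two-dimensional sample descriptors are hypothetical, and the pairwise variant covers only the average and maximum/minimum options, not the weighted one.

```python
import numpy as np

def mean_descriptor_distance(desc_p, desc_q) -> float:
    """Euclidean distance between the average descriptors of two 3D points."""
    mean_p = np.asarray(desc_p, dtype=np.float64).mean(axis=0)
    mean_q = np.asarray(desc_q, dtype=np.float64).mean(axis=0)
    return float(np.linalg.norm(mean_p - mean_q))

def pairwise_descriptor_distance(desc_p, desc_q, reduce: str = "mean") -> float:
    """Reduce all m x m' pairwise descriptor distances to their mean, min or max."""
    a = np.asarray(desc_p, dtype=np.float64)
    b = np.asarray(desc_q, dtype=np.float64)
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # shape (m, m')
    reducers = {"mean": np.mean, "min": np.min, "max": np.max}
    return float(reducers[reduce](dists))

# Hypothetical descriptors: p has m = 2 descriptors, p' has m' = 2 descriptors.
desc_p = np.array([[0.0, 0.0], [2.0, 0.0]])   # average is (1, 0)
desc_q = np.array([[1.0, 3.0], [1.0, 5.0]])   # average is (1, 4)

print(mean_descriptor_distance(desc_p, desc_q))  # 4.0
```

  • Either result would then be compared with the third predetermined threshold to decide whether the neighbor satisfies the descriptor condition.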
  • In yet another example, the statement that the distance relationship between each of the plurality of adjacent points and the three-dimensional point is smaller than the predetermined threshold may mean that the distance between each of the plurality of adjacent points and the three-dimensional point is smaller than the second predetermined threshold, and the distance between the plurality of descriptors of each of the plurality of adjacent points and the plurality of descriptors of the three-dimensional point is less than the third predetermined threshold. That is to say, the 3D point and its corresponding descriptors are output to the optimized localization model of the scene only when the distance relationship between each of the neighbors of the 3D point and the 3D point satisfies both threshold conditions at the same time; otherwise, the 3D point and its corresponding descriptors are not output.
  • the second predetermined threshold and the third predetermined threshold in the above example may be selected according to actual needs, which are not limited in the present disclosure.
  • The second predetermined threshold may be smaller than the first predetermined threshold. That is, a larger clustering range containing the multiple adjacent points of the three-dimensional point may first be determined through a larger first predetermined threshold, and the second predetermined threshold and/or the third predetermined threshold may then be used to further judge whether to output the three-dimensional point and its corresponding descriptors.
  • Alternatively, the first predetermined threshold may be equal to the second predetermined threshold. In that case, determining the multiple neighbors of a three-dimensional point and judging whether the distance between each of those neighbors and the three-dimensional point is smaller than the second predetermined threshold may be performed simultaneously.
  • the optimized positioning model of the scene is output.
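  • The three steps of method 100 (input the model, filter each point, output the optimized model) can be sketched end to end as below. This is a minimal sketch under stated assumptions, not the definitive implementation: the `LocalizationModel` container, the function name `optimize_model`, the threshold names `t1`, `t2`, `t3`, the brute-force neighbor search, and the use of the average-descriptor Euclidean distance are all choices made for illustration. Points with no neighbors are kept here, which is likewise an assumption.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LocalizationModel:
    points: np.ndarray   # (N, 3) array of 3D coordinates
    descriptors: list    # descriptors[i] is an (m_i, D) array for point i

def optimize_model(model: LocalizationModel, t1: float, t2: float, t3: float) -> LocalizationModel:
    """Keep a point only if all its neighbors pass both distance checks."""
    # Average descriptor of each point, used for the descriptor-distance check.
    means = np.stack([np.asarray(d, dtype=np.float64).mean(axis=0)
                      for d in model.descriptors])
    kept = []
    for i, p in enumerate(model.points):
        dists = np.linalg.norm(model.points - p, axis=1)
        mask = dists < t1           # first threshold: who counts as a neighbor
        mask[i] = False
        nbrs = np.flatnonzero(mask)
        coord_ok = np.all(dists[nbrs] < t2)                                  # second threshold
        desc_ok = np.all(np.linalg.norm(means[nbrs] - means[i], axis=1) < t3)  # third threshold
        if coord_ok and desc_ok:
            kept.append(i)
    kept = np.array(kept, dtype=int)
    return LocalizationModel(model.points[kept], [model.descriptors[i] for i in kept])

# Hypothetical model with four points on the x axis.
model = LocalizationModel(
    points=np.array([[0.0, 0, 0], [1.0, 0, 0], [10.0, 0, 0], [13.0, 0, 0]]),
    descriptors=[np.array([[0.0, 0.0], [0.0, 0.0]]), np.array([[0.5, 0.0]]),
                 np.array([[0.0, 0.0]]), np.array([[5.0, 5.0]])],
)
optimized = optimize_model(model, t1=5.0, t2=2.0, t3=1.0)
print(optimized.points[:, 0])  # [0. 1.]
```

  • In this example the last two points are 3 units apart, so each is a neighbor of the other under t1 = 5 but fails the second threshold of 2, and neither is output.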
  • the optimized positioning model of the output scene can perform positioning calculation quickly and efficiently, so in addition to the traditional server-side positioning calculation, it can also be applied to the positioning calculation on mobile terminals such as mobile phones and portable computers.
  • With the positioning model optimization method described above, a 3D point and its corresponding descriptors are output to the optimized positioning model of the scene only when the distance relationship between each of its neighboring points and the 3D point is smaller than the predetermined threshold. This can effectively reduce the number of redundant 3D points in the positioning model, speed up positioning, and improve positioning efficiency, so that positioning calculation on mobile devices can be optimized and various real-time AR applications based on 3D scene positioning become possible on mobile devices.
  • FIG. 2 shows a flowchart of an image-based positioning method 200 according to an embodiment of the present disclosure.
  • an image to be queried is input.
  • the image to be queried is, for example, an image captured by a photographing device such as a camera.
  • the image to be queried is located by using the optimized positioning model of the scene to which the image to be queried belongs.
  • The optimized positioning model of the scene can be obtained, for example, by the following method: inputting the positioning model of the scene, where the positioning model includes a three-dimensional point cloud and multiple descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; for each three-dimensional point in the three-dimensional point cloud, determining a plurality of adjacent points of the three-dimensional point, and if the distance relationship between each of the plurality of adjacent points and the three-dimensional point is less than a predetermined threshold, outputting the three-dimensional point and the multiple descriptors corresponding to it into the optimized localization model of the scene; and outputting the optimized localization model of the scene.
  • its positioning model may be a three-dimensional positioning model obtained by performing three-dimensional reconstruction on the scene.
  • a series of images of the scene can be captured in advance, and then a three-dimensional localization model of the scene can be obtained by performing three-dimensional reconstruction of the scene based on the images.
  • the 3D positioning model includes a 3D point cloud formed by a large number of 3D points, and each 3D point in the 3D point cloud corresponds to a series of 2D feature points located on each image and descriptors of these 2D feature points.
  • the descriptor may be, for example, a parameter describing the relationship between the two-dimensional feature point and its surrounding contents, so that the matching of the feature point can be realized by using the descriptor.
  • the descriptors can be binary feature descriptors that describe small blocks of pixels around the feature points. Commonly used descriptors include BRISK descriptors, BRIEF descriptors, and the like.
  • Determining multiple neighboring points of a given three-dimensional point may include: determining, as the neighboring points, multiple points in the three-dimensional point cloud whose distances from the three-dimensional point are less than a first predetermined threshold. For example, for a given three-dimensional point p, if the distance between a point p' in its neighborhood and p is less than the first predetermined threshold, p' is determined to be a nearest neighbor of p, and so on, to determine the n nearest neighbors of p.
  • The size of the first predetermined threshold determines the range of clustering.
  • For example, if the first predetermined threshold is 5, the points whose distance from the three-dimensional point p is less than 5 are the nearest neighbors of p.
  • In one example, the statement that the distance relationship between each of the plurality of adjacent points and the three-dimensional point is smaller than the predetermined threshold may mean that the distance between each of the plurality of adjacent points and the three-dimensional point is smaller than a second predetermined threshold.
  • If this condition holds, the 3D point p and the multiple descriptors corresponding to p are output to the optimized localization model of the scene; otherwise, the 3D point p and its corresponding descriptors are not output.
  • the distance of each neighboring point p' from the three-dimensional point p may be the distance between the three-dimensional coordinates (x', y', z') of p' and the three-dimensional coordinates (x, y, z) of p.
  • In another example, the statement that the distance relationship between each of the plurality of adjacent points and the three-dimensional point is less than a predetermined threshold may mean that the distance between the plurality of descriptors of each of the plurality of adjacent points and the plurality of descriptors of the three-dimensional point is less than a third predetermined threshold.
  • Each 3D point corresponds to 2D feature points in a series of images, and each 2D feature point has a corresponding descriptor. That is, each 3D point corresponds to multiple descriptors, and each of the nearest neighbors of a 3D point also corresponds to multiple descriptors.
  • A three-dimensional point p may have m corresponding descriptors $d_1, \ldots, d_m$, and a neighboring point p' of the three-dimensional point p may have m' corresponding descriptors $d'_1, \ldots, d'_{m'}$, where m and m' depend on the number of 2D feature points used to reconstruct the 3D points p and p'.
  • The descriptors of the two points can be compared, for example through the distance between the average value of the descriptors of p and the average value of the descriptors of p', and the result can be compared with the third predetermined threshold.
  • The distance between the average values of the descriptors of the three-dimensional point p and its neighbor p' may be, for example, their Euclidean distance.
  • Although the distance between the descriptors of a three-dimensional point and those of its neighboring points is described above by taking the distance between the average values of the descriptors as an example, the present disclosure is not limited thereto. For example, the distance can also be represented by the distance between weighted values of the descriptors of the 3D point and of its neighbors, or a certain number of descriptors of the 3D point and of its neighbors can be selected respectively, and based on the distances between these descriptors, an average value, a weighted value, or a maximum/minimum value is determined for comparison with the third predetermined threshold.
  • In yet another example, the statement that the distance relationship between each of the plurality of adjacent points and the three-dimensional point is smaller than the predetermined threshold may mean that the distance between each of the plurality of adjacent points and the three-dimensional point is smaller than the second predetermined threshold, and the distance between the plurality of descriptors of each of the plurality of adjacent points and the plurality of descriptors of the three-dimensional point is less than the third predetermined threshold. That is to say, the 3D point and its corresponding descriptors are output to the optimized localization model of the scene only when the distance relationship between each of the neighbors of the 3D point and the 3D point satisfies both threshold conditions at the same time; otherwise, the 3D point and its corresponding descriptors are not output.
  • the second predetermined threshold and the third predetermined threshold in the above example may be selected according to actual needs, which are not limited in the present disclosure.
  • The second predetermined threshold may be smaller than the first predetermined threshold. That is, a larger clustering range containing the multiple neighboring points of the three-dimensional point may first be determined by using a larger first predetermined threshold, and the second predetermined threshold and/or the third predetermined threshold may then be used to further determine whether to output the three-dimensional point and its corresponding descriptors.
  • Alternatively, the first predetermined threshold may be equal to the second predetermined threshold. In that case, determining the multiple neighbors of a three-dimensional point and judging whether the distance between each of those neighbors and the three-dimensional point is smaller than the second predetermined threshold may be performed simultaneously.
  • the pose of the camera that shoots the image to be queried is output.
  • The pose of the camera includes, for example, the position and orientation of the camera when the image to be queried was captured.
  • the output pose of the camera may be a 6-DOF variable describing the three-dimensional coordinates and rotation direction of the camera.
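  • One possible way to hold such a 6-DOF pose is three position coordinates plus a three-component rotation vector (axis-angle). The sketch below converts that representation into a 4x4 homogeneous transform using Rodrigues' rotation formula. The axis-angle parameterization, the function name `pose_to_matrix`, and the reading of the matrix as a camera-to-world transform are assumptions for illustration; the disclosure does not fix a particular representation.

```python
import numpy as np

def pose_to_matrix(position: np.ndarray, rotation_vector: np.ndarray) -> np.ndarray:
    """Build a 4x4 transform from a 3D position and an axis-angle rotation vector."""
    theta = np.linalg.norm(rotation_vector)  # rotation angle in radians
    R = np.eye(3)
    if theta > 1e-12:
        k = rotation_vector / theta          # unit rotation axis
        K = np.array([[0.0, -k[2], k[1]],
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])   # cross-product matrix of k
        # Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
        R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = position
    return T

# Hypothetical pose: camera at (1, 2, 3), rotated 90 degrees about the z axis.
T = pose_to_matrix(np.array([1.0, 2.0, 3.0]), np.array([0.0, 0.0, np.pi / 2]))
print(np.round(T @ np.array([1.0, 0.0, 0.0, 1.0]), 6))  # [1. 3. 3. 1.]
```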
  • In the positioning method described above, a plurality of neighboring points of each three-dimensional point in the three-dimensional point cloud are determined, and when the distance relationship between each of the neighboring points and the three-dimensional point is less than a predetermined threshold, the 3D point and its corresponding descriptors are output to the optimized positioning model of the scene, which is then used to locate the image to be queried. This can effectively reduce the number of redundant 3D points in the positioning model, speed up positioning, and improve positioning efficiency, so that positioning calculation on mobile devices can be optimized and various real-time AR applications based on 3D scene positioning become possible on mobile devices.
  • FIG. 3 shows a schematic structural diagram of an image-based positioning device 300 according to an embodiment of the present disclosure. Since the functions of the positioning device 300 correspond to the details of the positioning method 200 described above with reference to FIG. 2, a detailed description of the same content is omitted here for simplicity.
  • the positioning device 300 includes: an input unit 310 configured to input an image to be queried; a positioning unit 320 configured to locate the image to be queried using an optimized positioning model of the scene to which the image to be queried belongs; and an output unit 330, which is configured to output the pose of the camera that captures the image to be queried.
  • The positioning device 300 may further include an optimization unit 340 configured to: receive an input positioning model of the scene, where the positioning model includes a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; for each three-dimensional point in the three-dimensional point cloud, determine a plurality of adjacent points of the three-dimensional point, and if the distance relationship between each of the plurality of adjacent points and the three-dimensional point is less than a predetermined threshold, output the three-dimensional point and the plurality of descriptors corresponding to it into the optimized localization model of the scene; and output the optimized localization model of the scene.
  • the positioning device 300 may also include other components, however, since these components are not related to the content of the embodiment of the present disclosure, their illustration and description are omitted here.
  • In the positioning device described above, a plurality of adjacent points of each three-dimensional point in the three-dimensional point cloud are determined, and when the distance relationship between each of the adjacent points and the three-dimensional point is less than a predetermined threshold, the 3D point and its corresponding descriptors are output to the optimized positioning model of the scene, which is then used to locate the image to be queried. This can effectively reduce the number of redundant 3D points in the positioning model, speed up positioning, and improve positioning efficiency, so that positioning calculation on mobile devices can be optimized and various real-time AR applications based on 3D scene positioning become possible on mobile devices.
  • FIG. 4 shows a schematic structural diagram of an exemplary electronic device 400 according to an embodiment of the present disclosure.
  • An exemplary electronic device 400 according to an embodiment of the present disclosure includes one or more processors and one or more memories, wherein the memories store computer-readable code that, when executed by the one or more processors, causes the one or more processors to perform the method described above.
  • the electronic device 400 in the embodiment of the present disclosure may include, but is not limited to, such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), Mobile terminals such as in-vehicle terminals (eg, in-vehicle navigation terminals) and the like, and stationary terminals such as digital TVs, desktop computers, and the like.
  • PDA personal digital assistant
  • PAD tablet computer
  • PMP portable multimedia player
  • Mobile terminals such as in-vehicle terminals (eg, in-vehicle navigation terminals) and the like, and stationary terminals such as digital TVs, desktop computers, and the like.
  • FIG. 4 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
  • As shown in FIG. 4, the electronic device 400 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 401 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the electronic device 400.
  • the processing device 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404.
  • An input/output (I/O) interface 405 is also connected to bus 404 .
  • In general, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 407 including, for example, a liquid crystal display (LCD), speaker, and vibrator; storage devices 408 including, for example, a magnetic tape and a hard disk; and a communication device 409. The communication device 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data.
  • Although FIG. 4 shows the electronic device 400 with various devices, it should be understood that not all of the illustrated devices need to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable storage medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication device 409, or from the storage device 408, or from the ROM 402.
  • When the computer program is executed by the processing device 401, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • FIG. 5 shows a schematic diagram of an exemplary computer-readable storage medium 500 according to an embodiment of the present disclosure.
  • As shown in FIG. 5, the computer-readable storage medium 500 has computer-readable instructions 501 stored thereon; when executed by a processor, the computer-readable instructions 501 cause the processor to perform the positioning model optimization method and the positioning method described in the foregoing embodiments.
  • the above-mentioned computer-readable storage medium of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above.
  • Computer readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), fiber optics, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with computer-readable program code embodied thereon.
  • Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable storage medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
  • the above-mentioned computer-readable storage medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
  • Computer program code for performing operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • each block in the flowchart or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented in dedicated hardware-based systems that perform the specified functions or operations , or can be implemented in a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure may be implemented in a software manner, and may also be implemented in a hardware manner. Among them, the name of the unit does not constitute a limitation of the unit itself under certain circumstances.
  • exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.
  • a computer-readable storage medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • a positioning model optimization method comprising: inputting a positioning model of a scene, the positioning model comprising a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; for each three-dimensional point in the three-dimensional point cloud, determining a plurality of neighbor points of the three-dimensional point and, if the distance relationship between each of the plurality of neighbor points and the three-dimensional point is less than a predetermined threshold, outputting the three-dimensional point and the plurality of descriptors corresponding to the three-dimensional point into the optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
  • determining a plurality of adjacent points of the three-dimensional point includes: determining a plurality of points in the three-dimensional point cloud whose distance from the three-dimensional point is less than a first predetermined threshold as a plurality of adjacent points .
  • A4 The positioning model optimization method according to solution A1, wherein the distance relationship between each of the plurality of neighbor points and the three-dimensional point being less than a predetermined threshold comprises: the distance from each of the plurality of neighbor points to the three-dimensional point is less than a second predetermined threshold, and the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the three-dimensional point is less than a third predetermined threshold.
  • the positioning model optimization method according to solution A3 or A4, wherein the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the three-dimensional point being less than the third predetermined threshold comprises: the distance between the mean of the plurality of descriptors of each of the plurality of neighbor points and the mean of the plurality of descriptors of the three-dimensional point is less than the third predetermined threshold.
  • An image-based positioning method comprising: inputting an image to be queried; locating the image to be queried using an optimized positioning model of the scene to which the image to be queried belongs; and outputting the pose of the camera that captured the image to be queried, wherein the optimized positioning model of the scene is obtained by: inputting a positioning model of the scene, the positioning model comprising a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; for each three-dimensional point in the three-dimensional point cloud, determining a plurality of neighbor points of the three-dimensional point and, if the distance relationship between each of the plurality of neighbor points and the three-dimensional point is less than a predetermined threshold, outputting the three-dimensional point and the plurality of descriptors corresponding to the three-dimensional point into the optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
  • determining the multiple neighbor points of the three-dimensional point includes: determining multiple points in the three-dimensional point cloud whose distance from the three-dimensional point is less than a first predetermined threshold as multiple neighbor points.
  • the positioning method according to solution B3 or B4, wherein the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the three-dimensional point being less than the third predetermined threshold comprises: the distance between the mean of the plurality of descriptors of each of the plurality of neighbor points and the mean of the plurality of descriptors of the three-dimensional point is less than the third predetermined threshold.
  • An image-based positioning device comprising: an input unit configured to input an image to be queried; a positioning unit configured to locate the image to be queried using an optimized positioning model of the scene to which the image to be queried belongs; and an output unit configured to output the pose of the camera that captured the image to be queried, wherein the positioning device further includes an optimization unit configured to: receive an input positioning model of the scene, where the positioning model includes a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; for each three-dimensional point in the three-dimensional point cloud, determine a plurality of neighbor points of the three-dimensional point and, if the distance relationship between each of the plurality of neighbor points and the three-dimensional point is less than a predetermined threshold, output the three-dimensional point and the plurality of descriptors corresponding to it into the optimized positioning model of the scene; and output the optimized positioning model of the scene.
  • An image-based positioning apparatus comprising: one or more processors; and one or more memories, wherein the memories store computer-readable code that, when run by the one or more processors, causes the one or more processors to perform the method described in any one of solutions A1 to A6 and B1 to B6 above.
  • a computer-readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, cause the processor to perform the method described in any one of solutions A1 to A6 and B1 to B6 above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

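The present disclosure provides a positioning model optimization method, an image-based positioning method and positioning device, and a computer-readable storage medium. The positioning model optimization method includes: inputting a positioning model of a scene, the positioning model including a 3D point cloud and a plurality of descriptors corresponding to each 3D point in the 3D point cloud; for each 3D point in the 3D point cloud, determining a plurality of neighbor points of the 3D point and, if the distance relationship between each of the plurality of neighbor points and the 3D point is less than a predetermined threshold, outputting the 3D point and the plurality of descriptors corresponding to the 3D point into an optimized positioning model of the scene; and outputting the optimized positioning model of the scene. By clustering 3D points, the positioning model optimization method according to the present disclosure can effectively reduce the number of redundant 3D points in the positioning model, which speeds up positioning and improves positioning efficiency.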

Description

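Positioning model optimization method, positioning method, positioning device, and storage medium
Priority Information
This application claims priority to Chinese patent application No. 202010767042.3, entitled "Positioning model optimization method, positioning method and positioning device", filed with the China Patent Office on August 3, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of positioning and, more specifically, to a positioning model optimization method, an image-based positioning method and positioning device, and a computer-readable storage medium.
Background
Conventional positioning technologies are usually based on GPS (Global Positioning System) or Wi-Fi wireless networks, but they suffer from problems such as susceptibility to interference and a limited range of application. Compared with common positioning methods such as GPS, image-based positioning methods can provide better positioning accuracy by performing 3D positioning relative to a known scene, and can therefore better serve augmented reality (AR) applications.
In general, in an image-based positioning method, the 3D positioning model obtained by 3D reconstruction from a series of images of a scene contains a large number of 3D points together with the corresponding 2D feature points in each image and their descriptors. For an image to be queried that needs to be located, the 2D feature points of the image and their descriptors are first extracted and then matched against the descriptors in the positioning model; once the matching descriptors are determined, the corresponding 3D points are determined as well, and the image to be queried is thereby located. The number of 3D points and corresponding descriptors in the positioning model is therefore positively correlated with the size of the scene and the number of input images, and it directly affects the efficiency of the positioning algorithm. Conventional image-based positioning methods rely on powerful computing capability to process large numbers of 3D points and descriptors, so they can usually be implemented only on the server side; server-side positioning in turn depends on a network connection and high bandwidth, which considerably restricts real-time AR applications on mobile devices.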
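The matching step described above can be sketched as a brute-force nearest-descriptor search. This is only an illustrative sketch under stated assumptions: descriptors are taken to be short bit lists compared by Hamming distance (a common choice for binary descriptors, not something the text specifies), and the names `hamming`, `match_descriptor`, and `model_descriptors` are hypothetical. The loop visits every descriptor in the model, which is why the model size directly affects the positioning cost.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def match_descriptor(query, model_descriptors):
    """Return (point_index, distance) of the model descriptor closest to query.

    model_descriptors is a list of (point_index, descriptor) pairs; a 3D point
    may appear several times because it owns several descriptors.
    """
    best_point, best_dist = None, None
    for point_index, descriptor in model_descriptors:
        d = hamming(query, descriptor)
        if best_dist is None or d < best_dist:
            best_point, best_dist = point_index, d
    return best_point, best_dist


# Toy model (assumed data): 3D point 0 has two descriptors, 3D point 1 has one.
model_descriptors = [
    (0, [1, 0, 1, 1]),
    (0, [1, 0, 0, 1]),
    (1, [0, 1, 1, 0]),
]
match = match_descriptor([0, 1, 1, 1], model_descriptors)
```

In this toy data the query descriptor differs from the descriptor of 3D point 1 in a single bit, so that point is returned as its match.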
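Summary
To overcome the defects in the prior art, the present invention provides a positioning model optimization method, an image-based positioning method and positioning device, and a computer-readable storage medium.
According to one aspect of the present disclosure, a positioning model optimization method is provided, including: inputting a positioning model of a scene, the positioning model including a 3D point cloud and a plurality of descriptors corresponding to each 3D point in the 3D point cloud; for each 3D point in the 3D point cloud, determining a plurality of neighbor points of the 3D point and, if the distance relationship between each of the plurality of neighbor points and the 3D point is less than a predetermined threshold, outputting the 3D point and the plurality of descriptors corresponding to the 3D point into an optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
According to another aspect of the present disclosure, an image-based positioning method is provided, including: inputting an image to be queried; locating the image to be queried using an optimized positioning model of the scene to which the image to be queried belongs; and outputting the pose of the camera that captured the image to be queried. The optimized positioning model of the scene is obtained by: inputting a positioning model of the scene, the positioning model including a 3D point cloud and a plurality of descriptors corresponding to each 3D point in the 3D point cloud; for each 3D point in the 3D point cloud, determining a plurality of neighbor points of the 3D point and, if the distance relationship between each of the plurality of neighbor points and the 3D point is less than a predetermined threshold, outputting the 3D point and the plurality of descriptors corresponding to the 3D point into the optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
According to another aspect of the present disclosure, an image-based positioning device is provided, including: an input unit configured to input an image to be queried; a positioning unit configured to locate the image to be queried using an optimized positioning model of the scene to which the image to be queried belongs; and an output unit configured to output the pose of the camera that captured the image to be queried. The positioning device further includes an optimization unit configured to: receive an input positioning model of the scene, the positioning model including a 3D point cloud and a plurality of descriptors corresponding to each 3D point in the 3D point cloud; for each 3D point in the 3D point cloud, determine a plurality of neighbor points of the 3D point and, if the distance relationship between each of the plurality of neighbor points and the 3D point is less than a predetermined threshold, output the 3D point and the plurality of descriptors corresponding to the 3D point into the optimized positioning model of the scene; and output the optimized positioning model of the scene.
According to another aspect of the present disclosure, an image-based positioning apparatus is provided, including one or more processors and one or more memories, wherein the memories store computer-readable code that, when run by the one or more processors, causes the one or more processors to perform the method of any one of the above aspects.
According to another aspect of the present disclosure, a computer-readable storage medium is provided, having computer-readable instructions stored thereon that, when executed by a processor, cause the processor to perform the method of any one of the above aspects.
As described in detail below, the positioning model optimization method, the image-based positioning method and positioning device, and the computer-readable storage medium according to the present disclosure determine a plurality of neighbor points for each 3D point in the 3D point cloud and output the 3D point and its corresponding descriptors into the optimized positioning model of the scene when the distance relationship between each of the neighbor points and the 3D point is less than a predetermined threshold. This effectively reduces the number of redundant 3D points in the positioning model, speeds up positioning, and improves positioning efficiency, so that positioning computation on mobile devices can be optimized and real-time AR applications based on 3D scene positioning become feasible on mobile devices.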
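Brief Description of the Drawings
The above and other objects, features, and advantages of the present disclosure will become more apparent from the more detailed description of embodiments of the present disclosure in conjunction with the accompanying drawings. The drawings are provided for further understanding of the embodiments of the present disclosure, form a part of the specification, serve together with the embodiments to explain the present disclosure, and do not limit the present disclosure. In the drawings, the same reference numerals generally denote the same components or steps.
FIG. 1 shows a flowchart of a positioning model optimization method 100 according to an embodiment of the present disclosure;
FIG. 2 shows a flowchart of an image-based positioning method 200 according to an embodiment of the present disclosure;
FIG. 3 shows a schematic structural diagram of an image-based positioning device 300 according to an embodiment of the present disclosure;
FIG. 4 shows a schematic structural diagram of an exemplary electronic device 400 according to an embodiment of the present disclosure; and
FIG. 5 shows a schematic diagram of an exemplary computer-readable storage medium 500 according to an embodiment of the present disclosure.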
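Detailed Description
To make the objects, technical solutions, and advantages of the present disclosure more apparent, example embodiments according to the present disclosure are described in detail below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure, and it should be understood that the present disclosure is not limited by the example embodiments described here.
Embodiments of the present disclosure are described in more detail below with reference to the drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps recited in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, method embodiments may include additional steps and/or omit some of the steps shown. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.
Note that concepts such as "first" and "second" mentioned in the present disclosure are used only to distinguish different devices, modules, or units, and are not intended to limit the order or interdependence of the functions performed by these devices, modules, or units.
Note that the modifiers "one" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art will understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
A positioning model optimization method according to an embodiment of the present disclosure is described below with reference to FIG. 1. FIG. 1 shows a flowchart of a positioning model optimization method 100 according to an embodiment of the present disclosure.
As shown in FIG. 1, in step S110, a positioning model of a scene is input; the positioning model includes a 3D point cloud and a plurality of descriptors corresponding to each 3D point in the 3D point cloud. The scene may be any geographic scene, such as a building or a city. According to one example of an embodiment of the present disclosure, the positioning model of a given scene may be a 3D positioning model obtained by 3D reconstruction of the scene. For example, a series of images of the scene may be captured in advance, and image-based 3D reconstruction may then be performed to obtain the 3D positioning model of the scene. The 3D positioning model includes a 3D point cloud formed by a large number of 3D points, and each 3D point in the point cloud corresponds to a series of 2D feature points located in the individual images and to the descriptors of these 2D feature points. Here, a descriptor may be, for example, a parameter describing the relationship between a 2D feature point and its surrounding content, so that feature points can be matched using descriptors. For example, a descriptor may be a binary feature descriptor that describes a small pixel patch around the feature point. Commonly used descriptors include BRISK (Binary Robust Invariant Scalable Keypoints) descriptors and BRIEF (Binary Robust Independent Elementary Features) descriptors.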
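One way to picture the positioning model input in step S110 is as a list of 3D points, each carrying the descriptors of the 2D feature points it was reconstructed from. The layout below is a minimal sketch, not the format used by the disclosure; the class names `Point3D` and `PositioningModel` and the 4-bit toy descriptors are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Point3D:
    """One 3D point of the point cloud plus the descriptors attached to it."""
    xyz: Tuple[float, float, float]
    # One descriptor per 2D feature point used to reconstruct this 3D point.
    descriptors: List[List[int]] = field(default_factory=list)


@dataclass
class PositioningModel:
    """Positioning model of a scene: a 3D point cloud with per-point descriptors."""
    points: List[Point3D] = field(default_factory=list)


# Toy model (assumed data): the first point was seen in two images, the second in one.
model = PositioningModel(points=[
    Point3D(xyz=(0.0, 0.0, 0.0), descriptors=[[1, 0, 1, 1], [1, 0, 0, 1]]),
    Point3D(xyz=(1.0, 2.0, 2.0), descriptors=[[0, 1, 1, 0]]),
])
```

Real binary descriptors such as BRISK or BRIEF are much longer than four bits; the short lists only keep the example readable.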
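Next, in step S120, for each 3D point in the 3D point cloud, a plurality of neighbor points of the 3D point is determined, and if the distance relationship between each of the neighbor points and the 3D point is less than a predetermined threshold, the 3D point and the plurality of descriptors corresponding to it are output into the optimized positioning model of the scene. Otherwise, if the distance relationships between the neighbor points and the 3D point are not all less than the predetermined threshold, the 3D point and its corresponding descriptors are not output. Through this process, the 3D points in the point cloud are clustered, and the number of redundant 3D points and their corresponding descriptors is reduced; the extent of the reduction depends on the magnitude of the predetermined threshold.
According to one example of an embodiment of the present disclosure, determining the plurality of neighbor points of a 3D point may include: determining a plurality of points in the 3D point cloud whose distance to the 3D point is less than a first predetermined threshold as the plurality of neighbor points. For example, for a 3D point p, if the distance between a point p' in its neighborhood and p is less than the first predetermined threshold, p' is determined to be a neighbor point of p; proceeding in the same way, n neighbor points of p are determined. The first predetermined threshold determines the extent of the clustering. For example, if the first predetermined threshold is 5, points whose distance to p is less than 5 are neighbor points of p, and whether to output p and its corresponding descriptors is then decided from the distance relationships between these neighbor points and p. It should be understood that the first predetermined threshold is not limited to the example value here; any suitable value may be chosen according to actual needs.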
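The neighbor determination with the first predetermined threshold can be sketched as a brute-force radius search. This is a minimal sketch that assumes Euclidean distance between 3D coordinates; the function name `find_neighbors` and the toy coordinates are hypothetical, and a spatial index would normally replace the linear scan for a large point cloud.

```python
import math


def find_neighbors(points, index, first_threshold):
    """Indices of all points closer than first_threshold to points[index]."""
    p = points[index]
    return [
        i for i, q in enumerate(points)
        if i != index and math.dist(p, q) < first_threshold
    ]


# Toy point cloud (assumed data), using the example threshold of 5 from the text.
cloud = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (6.0, 0.0, 0.0)]
neighbors = find_neighbors(cloud, 0, first_threshold=5.0)
```

With these coordinates, the points at distances 3 and 4 from the first point are its neighbors, while the point at distance 6 is not.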
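According to one example of an embodiment of the present disclosure, the distance relationship between each of the plurality of neighbor points and the 3D point being less than a predetermined threshold may mean: the distance from each of the plurality of neighbor points to the 3D point is less than a second predetermined threshold. For example, for the plurality of neighbor points of a 3D point p, the distance between each neighbor point p' and p is computed; if the distance between every neighbor point p' and p is less than the second predetermined threshold, the 3D point p and the plurality of descriptors corresponding to p are output into the optimized positioning model of the scene; otherwise, p and its corresponding descriptors are not output. For example, the distance between each neighbor point p' and the 3D point p may be the distance between the 3D coordinates (x', y', z') of p' and the 3D coordinates (x, y, z) of p.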
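The spatial criterion with the second predetermined threshold can be sketched as follows. The sketch assumes the coordinate distance is Euclidean; `coordinate_distance` and `passes_spatial_test` are hypothetical names. How a point with no neighbors should be handled is not stated in the text; here `all()` over an empty list returns `True`, so such a point would be kept, which is an assumption of this sketch.

```python
import math


def coordinate_distance(p, q):
    """Euclidean distance between (x, y, z) and (x', y', z')."""
    (x, y, z), (xq, yq, zq) = p, q
    return math.sqrt((xq - x) ** 2 + (yq - y) ** 2 + (zq - z) ** 2)


def passes_spatial_test(p, neighbor_coords, second_threshold):
    """True only if every neighbor lies closer than second_threshold to p."""
    return all(coordinate_distance(p, q) < second_threshold for q in neighbor_coords)


p = (0.0, 0.0, 0.0)
neighbor_coords = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
kept_loose = passes_spatial_test(p, neighbor_coords, second_threshold=3.0)
kept_tight = passes_spatial_test(p, neighbor_coords, second_threshold=2.0)
```

With the threshold 3 both neighbors are close enough and p would be output; with the threshold 2 the neighbor at distance exactly 2 fails the strict comparison, so p would not be output.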
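Alternatively, according to another example of an embodiment of the present disclosure, the distance relationship between each of the plurality of neighbor points and the 3D point being less than a predetermined threshold may mean: the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the 3D point is less than a third predetermined threshold. As described above, each 3D point corresponds to 2D feature points in a series of images, and each 2D feature point has a corresponding descriptor; that is, each 3D point corresponds to a plurality of descriptors, and each neighbor point of each 3D point likewise corresponds to a plurality of descriptors. For example, the 3D point p may have m corresponding descriptors d_1, ..., d_m, and a neighbor point p' of p may have m' corresponding descriptors d'_1, ..., d'_m', where m and m' depend on the number of 2D feature points used to reconstruct p and p'. According to one example of an embodiment of the present disclosure, whether the distance between the descriptors of each neighbor point and the descriptors of the 3D point is less than the third predetermined threshold may be judged by comparing the means of the descriptors of the respective points. For example, if, for every neighbor point p', the distance between the mean of its m' descriptors d'_1, ..., d'_m' and the mean of the m descriptors d_1, ..., d_m of p is less than the third predetermined threshold, the 3D point p and its corresponding descriptors d_1, ..., d_m are output into the optimized positioning model of the scene; otherwise, p and its corresponding descriptors are not output. Here, if the descriptors are multi-dimensional binary vectors, for example 128-dimensional vectors, the distance between the descriptor means of p and its neighbor point p' may be, for example, their Euclidean distance.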
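The descriptor criterion based on descriptor means can be sketched as below. The sketch assumes descriptors are equal-length lists of 0/1 values and uses the Euclidean distance between the two mean vectors, as the text suggests; the function names and the 4-dimensional toy descriptors are assumptions for illustration.

```python
import math


def mean_descriptor(descriptors):
    """Component-wise mean of a non-empty list of equal-length descriptors."""
    count = len(descriptors)
    return [sum(column) / count for column in zip(*descriptors)]


def mean_descriptor_distance(descs_p, descs_q):
    """Euclidean distance between the mean descriptors of two 3D points."""
    return math.dist(mean_descriptor(descs_p), mean_descriptor(descs_q))


# p has m = 2 descriptors, its neighbor p' has m' = 1 descriptor (assumed data).
descs_p = [[1, 0, 1, 1], [1, 0, 0, 1]]
descs_q = [[1, 0, 1, 1]]
distance = mean_descriptor_distance(descs_p, descs_q)
```

The mean of p's descriptors is [1, 0, 0.5, 1], which differs from p' only in the third component, so the distance is 0.5; p would pass the test for this neighbor whenever the third predetermined threshold exceeds 0.5.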
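It should be understood that, although the distance between the descriptors of a 3D point and those of its neighbor point has been described above using the distance between their descriptor means as an example, the present disclosure is not limited to this. For example, the distance between the descriptors of a 3D point and those of its neighbor point may also be expressed by the distance between weighted values of their descriptors; alternatively, a certain number of descriptors may be selected from the 3D point and from its neighbor point respectively, and a mean, a weighted value, or a maximum/minimum may be determined from the distances between these descriptors and compared with the third predetermined threshold.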
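The alternative of aggregating pairwise descriptor distances can be sketched as follows. This is only one possible reading of the paragraph above: all descriptor pairs are compared with Euclidean distance and then reduced with a caller-chosen function. The name `pairwise_descriptor_distance`, the `reduce` parameter, and the 2-dimensional toy descriptors are assumptions for illustration.

```python
import math
import statistics
from itertools import product


def pairwise_descriptor_distance(descs_p, descs_q, reduce=max):
    """Reduce all pairwise distances between two descriptor sets to one value.

    reduce may be max, min, statistics.mean, or any function of a list of floats.
    """
    distances = [math.dist(a, b) for a, b in product(descs_p, descs_q)]
    return reduce(distances)


descs_p = [[0, 0], [3, 4]]
descs_q = [[0, 0]]
largest = pairwise_descriptor_distance(descs_p, descs_q, reduce=max)
smallest = pairwise_descriptor_distance(descs_p, descs_q, reduce=min)
average = pairwise_descriptor_distance(descs_p, descs_q, reduce=statistics.mean)
```

Using the maximum makes the test strict (every descriptor pair must be similar), while the minimum only requires one similar pair; which aggregate suits a given scene is a tuning decision.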
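According to one example of an embodiment of the present disclosure, the distance relationship between each of the plurality of neighbor points and the 3D point being less than a predetermined threshold may mean: the distance from each of the plurality of neighbor points to the 3D point is less than the second predetermined threshold, and the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the 3D point is less than the third predetermined threshold. In other words, a 3D point and its corresponding descriptors are output into the optimized positioning model of the scene only when the distance relationship between each of its neighbor points and the 3D point satisfies both threshold conditions; otherwise, the 3D point and its corresponding descriptors are not output.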
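Putting the pieces together, one pass of the optimization with the combined criterion might look like the sketch below. It assumes brute-force neighbor search, Euclidean coordinate distance, and Euclidean distance between descriptor means; `optimize_model`, the parameter names `r1`, `r2`, `r3` (first, second, and third predetermined thresholds), and the toy data are hypothetical. Points without neighbors are kept here, which the text does not specify.

```python
import math


def mean_descriptor(descriptors):
    """Component-wise mean of a non-empty list of equal-length descriptors."""
    count = len(descriptors)
    return [sum(column) / count for column in zip(*descriptors)]


def optimize_model(points, r1, r2, r3):
    """Keep each (xyz, descriptors) entry whose neighbors all pass both tests."""
    optimized = []
    for i, (xyz, descs) in enumerate(points):
        # Neighbors: every other point closer than the first threshold r1.
        neighbors = [
            (q_xyz, q_descs)
            for j, (q_xyz, q_descs) in enumerate(points)
            if j != i and math.dist(xyz, q_xyz) < r1
        ]
        mean_p = mean_descriptor(descs)
        if all(
            math.dist(xyz, q_xyz) < r2
            and math.dist(mean_p, mean_descriptor(q_descs)) < r3
            for q_xyz, q_descs in neighbors
        ):
            optimized.append((xyz, descs))
    return optimized


# Two tight pairs of points (assumed data): the first pair has similar
# descriptors, the second pair has dissimilar ones.
points = [
    ((0.0, 0.0, 0.0), [[1, 0, 1, 1], [1, 0, 0, 1]]),
    ((1.0, 0.0, 0.0), [[1, 0, 1, 1]]),
    ((20.0, 0.0, 0.0), [[1, 1, 1, 1]]),
    ((21.0, 0.0, 0.0), [[0, 0, 0, 0]]),
]
optimized = optimize_model(points, r1=5.0, r2=2.0, r3=1.0)
```

In this toy data the first pair passes both tests and is output, while the second pair fails the descriptor test (mean distance 2, threshold 1) and is not output.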
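It should be noted that the second predetermined threshold and the third predetermined threshold in the above examples may be chosen according to actual needs, and the present disclosure does not limit them. In addition, according to one example of an embodiment of the present disclosure, the second predetermined threshold may be smaller than the first predetermined threshold; that is, a larger clustering range containing the neighbor points of a 3D point may first be determined using the larger first predetermined threshold, and whether to output the 3D point and its corresponding descriptors may then be further decided using the second predetermined threshold and/or the third predetermined threshold. Alternatively, according to another example of an embodiment of the present disclosure, the first predetermined threshold may be equal to the second predetermined threshold; that is, determining the neighbor points of a 3D point and judging whether the distance between each of the neighbor points and the 3D point is less than the second predetermined threshold may be performed at the same time.
After the input positioning model of the scene has been optimized, next, in step S130, the optimized positioning model of the scene is output. The output optimized positioning model allows positioning computation to be performed quickly and efficiently, so in addition to conventional server-side positioning computation, it can also be applied to positioning computation on mobile terminals such as mobile phones and portable computers.
With the positioning model optimization method according to the above embodiment of the present disclosure, a plurality of neighbor points is determined for each 3D point in the 3D point cloud, and the 3D point and its corresponding descriptors are output into the optimized positioning model of the scene when the distance relationship between each of the neighbor points and the 3D point is less than a predetermined threshold. This effectively reduces the number of redundant 3D points in the positioning model, speeds up positioning, and improves positioning efficiency, so that positioning computation on mobile devices can be optimized and real-time AR applications based on 3D scene positioning become feasible on mobile devices.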
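An image-based positioning method according to an embodiment of the present disclosure is described below with reference to FIG. 2. FIG. 2 shows a flowchart of an image-based positioning method 200 according to an embodiment of the present disclosure. As shown in FIG. 2, in step S210, an image to be queried is input. The image to be queried is, for example, an image captured by a capture device such as a camera. Next, in step S220, the image to be queried is located using the optimized positioning model of the scene to which it belongs. The optimized positioning model of the scene may be obtained, for example, by: inputting a positioning model of the scene, the positioning model including a 3D point cloud and a plurality of descriptors corresponding to each 3D point in the 3D point cloud; for each 3D point in the 3D point cloud, determining a plurality of neighbor points of the 3D point and, if the distance relationship between each of the plurality of neighbor points and the 3D point is less than a predetermined threshold, outputting the 3D point and the plurality of descriptors corresponding to the 3D point into the optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
According to one example of an embodiment of the present disclosure, the positioning model of a given scene may be a 3D positioning model obtained by 3D reconstruction of the scene. For example, a series of images of the scene may be captured in advance, and image-based 3D reconstruction may then be performed to obtain the 3D positioning model of the scene. The 3D positioning model includes a 3D point cloud formed by a large number of 3D points, and each 3D point in the point cloud corresponds to a series of 2D feature points located in the individual images and to the descriptors of these 2D feature points. Here, a descriptor may be, for example, a parameter describing the relationship between a 2D feature point and its surrounding content, so that feature points can be matched using descriptors. For example, a descriptor may be a binary feature descriptor that describes a small pixel patch around the feature point. Commonly used descriptors include BRISK descriptors and BRIEF descriptors.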
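According to one example of an embodiment of the present disclosure, determining the plurality of neighbor points of a 3D point may include: determining a plurality of points in the 3D point cloud whose distance to the 3D point is less than a first predetermined threshold as the plurality of neighbor points. For example, for a 3D point p, if the distance between a point p' in its neighborhood and p is less than the first predetermined threshold, p' is determined to be a neighbor point of p; proceeding in the same way, n neighbor points of p are determined. The first predetermined threshold determines the extent of the clustering. For example, if the first predetermined threshold is 5, points whose distance to p is less than 5 are neighbor points of p, and whether to output p and its corresponding descriptors is then decided from the distance relationships between these neighbor points and p. It should be understood that the first predetermined threshold is not limited to the example value here; any suitable value may be chosen according to actual needs.
According to one example of an embodiment of the present disclosure, the distance relationship between each of the plurality of neighbor points and the 3D point being less than a predetermined threshold may mean: the distance from each of the plurality of neighbor points to the 3D point is less than a second predetermined threshold. For example, for the plurality of neighbor points of a 3D point p, the distance between each neighbor point p' and p is computed; if the distance between every neighbor point p' and p is less than the second predetermined threshold, the 3D point p and the plurality of descriptors corresponding to p are output into the optimized positioning model of the scene; otherwise, p and its corresponding descriptors are not output. For example, the distance between each neighbor point p' and the 3D point p may be the distance between the 3D coordinates (x', y', z') of p' and the 3D coordinates (x, y, z) of p.
Alternatively, according to another example of an embodiment of the present disclosure, the distance relationship between each of the plurality of neighbor points and the 3D point being less than a predetermined threshold may mean: the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the 3D point is less than a third predetermined threshold. As described above, each 3D point corresponds to 2D feature points in a series of images, and each 2D feature point has a corresponding descriptor; that is, each 3D point corresponds to a plurality of descriptors, and each neighbor point of each 3D point likewise corresponds to a plurality of descriptors. For example, the 3D point p may have m corresponding descriptors d_1, ..., d_m, and a neighbor point p' of p may have m' corresponding descriptors d'_1, ..., d'_m', where m and m' depend on the number of 2D feature points used to reconstruct p and p'. According to one example of an embodiment of the present disclosure, whether the distance between the descriptors of each neighbor point and the descriptors of the 3D point is less than the third predetermined threshold may be judged by comparing the means of the descriptors of the respective points. For example, if, for every neighbor point p', the distance between the mean of its m' descriptors d'_1, ..., d'_m' and the mean of the m descriptors d_1, ..., d_m of p is less than the third predetermined threshold, the 3D point p and its corresponding descriptors d_1, ..., d_m are output into the optimized positioning model of the scene; otherwise, p and its corresponding descriptors are not output. Here, if the descriptors are multi-dimensional binary vectors, for example 128-dimensional vectors, the distance between the descriptor means of p and its neighbor point p' may be, for example, their Euclidean distance.
It should be understood that, although the distance between the descriptors of a 3D point and those of its neighbor point has been described above using the distance between their descriptor means as an example, the present disclosure is not limited to this. For example, the distance between the descriptors of a 3D point and those of its neighbor point may also be expressed by the distance between weighted values of their descriptors; alternatively, a certain number of descriptors may be selected from the 3D point and from its neighbor point respectively, and a mean, a weighted value, or a maximum/minimum may be determined from the distances between these descriptors and compared with the third predetermined threshold.
According to one example of an embodiment of the present disclosure, the distance relationship between each of the plurality of neighbor points and the 3D point being less than a predetermined threshold may mean: the distance from each of the plurality of neighbor points to the 3D point is less than the second predetermined threshold, and the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the 3D point is less than the third predetermined threshold. In other words, a 3D point and its corresponding descriptors are output into the optimized positioning model of the scene only when the distance relationship between each of its neighbor points and the 3D point satisfies both threshold conditions; otherwise, the 3D point and its corresponding descriptors are not output.
It should be noted that the second predetermined threshold and the third predetermined threshold in the above examples may be chosen according to actual needs, and the present disclosure does not limit them. According to one example of an embodiment of the present disclosure, the second predetermined threshold may be smaller than the first predetermined threshold; that is, a larger clustering range containing the neighbor points of a 3D point may first be determined using the larger first predetermined threshold, and whether to output the 3D point and its corresponding descriptors may then be further decided using the second predetermined threshold and/or the third predetermined threshold. Alternatively, according to another example of an embodiment of the present disclosure, the first predetermined threshold may be equal to the second predetermined threshold; that is, determining the neighbor points of a 3D point and judging whether the distance between each of the neighbor points and the 3D point is less than the second predetermined threshold may be performed at the same time.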
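After the image to be queried has been located using the optimized positioning model of the scene, next, in step S230, the pose of the camera that captured the image to be queried is output. The pose of the camera includes, for example, the position and orientation of the camera when the image to be queried was captured; for example, the output camera pose may be a 6-degree-of-freedom variable describing the 3D coordinates and the rotation of the camera.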
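A 6-degree-of-freedom pose can be represented in several ways; the sketch below uses three position coordinates plus a three-component rotation vector (rotation axis scaled by the angle in radians). This parameterization, the class name `CameraPose`, and the `rotate` helper based on Rodrigues' rotation formula are assumptions chosen for illustration; the text does not prescribe a representation.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class CameraPose:
    position: Vec3          # 3 degrees of freedom: camera location in the scene
    rotation_vector: Vec3   # 3 degrees of freedom: axis * angle (radians)

    def rotate(self, v: Vec3) -> Vec3:
        """Rotate v by this pose's rotation using Rodrigues' formula."""
        rx, ry, rz = self.rotation_vector
        theta = math.sqrt(rx * rx + ry * ry + rz * rz)
        if theta == 0.0:
            return v
        k = (rx / theta, ry / theta, rz / theta)
        c, s = math.cos(theta), math.sin(theta)
        dot = k[0] * v[0] + k[1] * v[1] + k[2] * v[2]
        cross = (
            k[1] * v[2] - k[2] * v[1],
            k[2] * v[0] - k[0] * v[2],
            k[0] * v[1] - k[1] * v[0],
        )
        return tuple(v[i] * c + cross[i] * s + k[i] * dot * (1.0 - c) for i in range(3))


# A camera at (1, 2, 3) rotated by 90 degrees about the z axis (assumed values).
pose = CameraPose(position=(1.0, 2.0, 3.0), rotation_vector=(0.0, 0.0, math.pi / 2))
rotated_x_axis = pose.rotate((1.0, 0.0, 0.0))
```

Up to floating-point rounding, rotating the x axis by 90 degrees about z gives the y axis, which is a quick sanity check on the rotation part of the pose.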
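In the positioning method according to the above embodiment of the present disclosure, a plurality of neighbor points is determined for each 3D point in the 3D point cloud, and the 3D point and its corresponding descriptors are output when the distance relationship between each of the neighbor points and the 3D point is less than a predetermined threshold, thereby optimizing the positioning model of the scene; the optimized positioning model is then used to locate the image to be queried. This effectively reduces the number of redundant 3D points in the positioning model, speeds up positioning, and improves positioning efficiency, so that positioning computation on mobile devices can be optimized and real-time AR applications based on 3D scene positioning become feasible on mobile devices.
An image-based positioning device according to an embodiment of the present disclosure is described below with reference to FIG. 3. FIG. 3 shows a schematic structural diagram of an image-based positioning device 300 according to an embodiment of the present disclosure. Since the functions of the positioning device 300 are the same as the details of the positioning method 200 described above with reference to FIG. 2, a detailed description of the same content is omitted here for simplicity. As shown in FIG. 3, the positioning device 300 includes: an input unit 310 configured to input an image to be queried; a positioning unit 320 configured to locate the image to be queried using the optimized positioning model of the scene to which it belongs; and an output unit 330 configured to output the pose of the camera that captured the image to be queried. In addition, the positioning device 300 may further include an optimization unit 340 configured to: receive an input positioning model of the scene, the positioning model including a 3D point cloud and a plurality of descriptors corresponding to each 3D point in the 3D point cloud; for each 3D point in the 3D point cloud, determine a plurality of neighbor points of the 3D point and, if the distance relationship between each of the plurality of neighbor points and the 3D point is less than a predetermined threshold, output the 3D point and the plurality of descriptors corresponding to the 3D point into the optimized positioning model of the scene; and output the optimized positioning model of the scene. Besides these four units, the positioning device 300 may also include other components; however, since these components are unrelated to the content of the embodiments of the present disclosure, their illustration and description are omitted here.
In the positioning device according to the above embodiment of the present disclosure, a plurality of neighbor points is determined for each 3D point in the 3D point cloud, and the 3D point and its corresponding descriptors are output when the distance relationship between each of the neighbor points and the 3D point is less than a predetermined threshold, thereby optimizing the positioning model of the scene; the optimized positioning model is then used to locate the image to be queried. This effectively reduces the number of redundant 3D points in the positioning model, speeds up positioning, and improves positioning efficiency, so that positioning computation on mobile devices can be optimized and real-time AR applications based on 3D scene positioning become feasible on mobile devices.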
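In addition, the positioning device according to an embodiment of the present disclosure may also be implemented by means of the architecture of the exemplary electronic device shown in FIG. 4. FIG. 4 shows a schematic structural diagram of an exemplary electronic device 400 according to an embodiment of the present disclosure. The exemplary electronic device 400 includes at least one or more processors and one or more memories, wherein the memories store computer-readable code that, when run by the one or more processors, causes the one or more processors to perform the method described above.
Specifically, the electronic device 400 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), and an in-vehicle terminal (e.g., an in-vehicle navigation terminal), as well as stationary terminals such as a digital TV and a desktop computer. It should be understood that the electronic device shown in FIG. 4 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 4, the electronic device 400 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 401 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the electronic device 400. The processing device 401, the ROM 402, and the RAM 403 are connected to one another through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
In general, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 407 including, for example, a liquid crystal display (LCD), speaker, and vibrator; storage devices 408 including, for example, a magnetic tape and a hard disk; and a communication device 409. The communication device 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 4 shows the electronic device 400 with various devices, it should be understood that not all of the illustrated devices need to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable storage medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 409, or installed from the storage device 408, or installed from the ROM 402. When the computer program is executed by the processing device 401, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
In addition, the present disclosure also provides a computer-readable storage medium. FIG. 5 shows a schematic diagram of an exemplary computer-readable storage medium 500 according to an embodiment of the present disclosure. As shown in FIG. 5, the computer-readable storage medium 500 has computer-readable instructions 501 stored thereon; when executed by a processor, the computer-readable instructions 501 cause the processor to perform the positioning model optimization method and the positioning method described in the above embodiments.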
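It should be noted that the above computer-readable storage medium of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in conjunction with an instruction execution system, apparatus, or device. Program code contained on a computer-readable storage medium may be transmitted using any suitable medium, including but not limited to: electrical wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The above computer-readable storage medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).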
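The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
The functions described above herein may be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a computer-readable storage medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.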
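Some features preferably implemented by some embodiments are now disclosed in a solution-based format.
A1. A positioning model optimization method, including: inputting a positioning model of a scene, the positioning model including a 3D point cloud and a plurality of descriptors corresponding to each 3D point in the 3D point cloud; for each 3D point in the 3D point cloud, determining a plurality of neighbor points of the 3D point and, if the distance relationship between each of the plurality of neighbor points and the 3D point is less than a predetermined threshold, outputting the 3D point and the plurality of descriptors corresponding to the 3D point into an optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
A2. The positioning model optimization method according to solution A1, wherein determining the plurality of neighbor points of the 3D point includes: determining a plurality of points in the 3D point cloud whose distance to the 3D point is less than a first predetermined threshold as the plurality of neighbor points.
A3. The positioning model optimization method according to solution A1, wherein the distance relationship between each of the plurality of neighbor points and the 3D point being less than a predetermined threshold includes: the distance from each of the plurality of neighbor points to the 3D point is less than a second predetermined threshold, or the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the 3D point is less than a third predetermined threshold.
A4. The positioning model optimization method according to solution A1, wherein the distance relationship between each of the plurality of neighbor points and the 3D point being less than a predetermined threshold includes: the distance from each of the plurality of neighbor points to the 3D point is less than a second predetermined threshold, and the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the 3D point is less than a third predetermined threshold.
A5. The positioning model optimization method according to solution A3 or A4, wherein the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the 3D point being less than the third predetermined threshold includes: the distance between the mean of the plurality of descriptors of each of the plurality of neighbor points and the mean of the plurality of descriptors of the 3D point is less than the third predetermined threshold.
A6. The positioning model optimization method according to solution A1, wherein the positioning model of the scene is a 3D positioning model obtained by 3D reconstruction of the scene.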
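Some features preferably implemented by some embodiments are now disclosed in a solution-based format.
B1. An image-based positioning method, including: inputting an image to be queried; locating the image to be queried using an optimized positioning model of the scene to which the image to be queried belongs; and outputting the pose of the camera that captured the image to be queried, wherein the optimized positioning model of the scene is obtained by: inputting a positioning model of the scene, the positioning model including a 3D point cloud and a plurality of descriptors corresponding to each 3D point in the 3D point cloud; for each 3D point in the 3D point cloud, determining a plurality of neighbor points of the 3D point and, if the distance relationship between each of the plurality of neighbor points and the 3D point is less than a predetermined threshold, outputting the 3D point and the plurality of descriptors corresponding to the 3D point into the optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
B2. The positioning method according to solution B1, wherein determining the plurality of neighbor points of the 3D point includes: determining a plurality of points in the 3D point cloud whose distance to the 3D point is less than a first predetermined threshold as the plurality of neighbor points.
B3. The positioning method according to solution B1, wherein the distance relationship between each of the plurality of neighbor points and the 3D point being less than a predetermined threshold includes: the distance from each of the plurality of neighbor points to the 3D point is less than a second predetermined threshold, or the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the 3D point is less than a third predetermined threshold.
B4. The positioning method according to solution B1, wherein the distance relationship between each of the plurality of neighbor points and the 3D point being less than a predetermined threshold includes: the distance from each of the plurality of neighbor points to the 3D point is less than a second predetermined threshold, and the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the 3D point is less than a third predetermined threshold.
B5. The positioning method according to solution B3 or B4, wherein the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the 3D point being less than the third predetermined threshold includes: the distance between the mean of the plurality of descriptors of each of the plurality of neighbor points and the mean of the plurality of descriptors of the 3D point is less than the third predetermined threshold.
B6. The positioning method according to solution B1, wherein the positioning model of the scene is a 3D positioning model obtained by 3D reconstruction of the scene.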
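Some features preferably implemented by some embodiments are now disclosed in a solution-based format.
C1. An image-based positioning device, including: an input unit configured to input an image to be queried; a positioning unit configured to locate the image to be queried using an optimized positioning model of the scene to which the image to be queried belongs; and an output unit configured to output the pose of the camera that captured the image to be queried, wherein the positioning device further includes an optimization unit configured to: receive an input positioning model of the scene, the positioning model including a 3D point cloud and a plurality of descriptors corresponding to each 3D point in the 3D point cloud; for each 3D point in the 3D point cloud, determine a plurality of neighbor points of the 3D point and, if the distance relationship between each of the plurality of neighbor points and the 3D point is less than a predetermined threshold, output the 3D point and the plurality of descriptors corresponding to the 3D point into the optimized positioning model of the scene; and output the optimized positioning model of the scene.
Some features preferably implemented by some embodiments are now disclosed in a solution-based format.
D1. An image-based positioning apparatus, including: one or more processors; and one or more memories, wherein the memories store computer-readable code that, when run by the one or more processors, causes the one or more processors to perform the method of any one of solutions A1 to A6 and B1 to B6 above.
E1. A computer-readable storage medium having computer-readable instructions stored thereon that, when executed by a processor, cause the processor to perform the method of any one of solutions A1 to A6 and B1 to B6 above.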
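The above description is only a description of preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
In addition, although operations are depicted in a specific order, this should not be understood as requiring that these operations be performed in the specific order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.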

Claims (15)

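  1. A positioning model optimization method, comprising:
    inputting a positioning model of a scene, the positioning model comprising a 3D point cloud and a plurality of descriptors corresponding to each 3D point in the 3D point cloud;
    for each 3D point in the 3D point cloud, determining a plurality of neighbor points of the 3D point and, if the distance relationship between each of the plurality of neighbor points and the 3D point is less than a predetermined threshold, outputting the 3D point and the plurality of descriptors corresponding to the 3D point into an optimized positioning model of the scene; and
    outputting the optimized positioning model of the scene.
  2. The positioning model optimization method according to claim 1, wherein determining the plurality of neighbor points of the 3D point comprises:
    determining a plurality of points in the 3D point cloud whose distance to the 3D point is less than a first predetermined threshold as the plurality of neighbor points.
  3. The positioning model optimization method according to claim 1, wherein the distance relationship between each of the plurality of neighbor points and the 3D point being less than a predetermined threshold comprises:
    the distance from each of the plurality of neighbor points to the 3D point being less than a second predetermined threshold, or the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the 3D point being less than a third predetermined threshold.
  4. The positioning model optimization method according to claim 1, wherein the distance relationship between each of the plurality of neighbor points and the 3D point being less than a predetermined threshold comprises:
    the distance from each of the plurality of neighbor points to the 3D point being less than a second predetermined threshold, and the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the 3D point being less than a third predetermined threshold.
  5. The positioning model optimization method according to claim 3 or 4, wherein the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the 3D point being less than the third predetermined threshold comprises:
    the distance between the mean of the plurality of descriptors of each of the plurality of neighbor points and the mean of the plurality of descriptors of the 3D point being less than the third predetermined threshold.
  6. The positioning model optimization method according to claim 1, wherein the positioning model of the scene is a 3D positioning model obtained by 3D reconstruction of the scene.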
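  7. An image-based positioning method, comprising:
    inputting an image to be queried;
    locating the image to be queried using an optimized positioning model of the scene to which the image to be queried belongs; and
    outputting the pose of the camera that captured the image to be queried,
    wherein the optimized positioning model of the scene is obtained by:
    inputting a positioning model of the scene, the positioning model comprising a 3D point cloud and a plurality of descriptors corresponding to each 3D point in the 3D point cloud;
    for each 3D point in the 3D point cloud, determining a plurality of neighbor points of the 3D point and, if the distance relationship between each of the plurality of neighbor points and the 3D point is less than a predetermined threshold, outputting the 3D point and the plurality of descriptors corresponding to the 3D point into the optimized positioning model of the scene; and
    outputting the optimized positioning model of the scene.
  8. The positioning method according to claim 7, wherein determining the plurality of neighbor points of the 3D point comprises:
    determining a plurality of points in the 3D point cloud whose distance to the 3D point is less than a first predetermined threshold as the plurality of neighbor points.
  9. The positioning method according to claim 7, wherein the distance relationship between each of the plurality of neighbor points and the 3D point being less than a predetermined threshold comprises:
    the distance from each of the plurality of neighbor points to the 3D point being less than a second predetermined threshold, or the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the 3D point being less than a third predetermined threshold.
  10. The positioning method according to claim 7, wherein the distance relationship between each of the plurality of neighbor points and the 3D point being less than a predetermined threshold comprises:
    the distance from each of the plurality of neighbor points to the 3D point being less than a second predetermined threshold, and the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the 3D point being less than a third predetermined threshold.
  11. The positioning method according to claim 9 or 10, wherein the distance between the plurality of descriptors of each of the plurality of neighbor points and the plurality of descriptors of the 3D point being less than the third predetermined threshold comprises:
    the distance between the mean of the plurality of descriptors of each of the plurality of neighbor points and the mean of the plurality of descriptors of the 3D point being less than the third predetermined threshold.
  12. The positioning method according to claim 7, wherein the positioning model of the scene is a 3D positioning model obtained by 3D reconstruction of the scene.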
  13. 一种基于图像的定位设备,包括:
    输入单元,被配置为输入待查询图像;
    定位单元,被配置为利用所述待查询图像所属的场景的优化定位模型对所述待查询图像进行定位;以及
    输出单元,被配置为输出拍摄所述待查询图像的相机的位姿,
    其中,所述定位设备还包括优化单元,所述优化单元被配置为:
    接收输入的场景的定位模型,所述定位模型包括三维点云和所述三维点云中的每个三维点对应的多个描述符;
    对于所述三维点云中的每个三维点,确定所述三维点的多个近邻点,并且如果所述多个近邻点中的每个近邻点与所述三维点的距离关系小于预定阈值,输出所述三维点以及与所述三维点对应的多个描述符到所述场景的优化定位模型中;以及
    输出所述场景的优化定位模型。
  14. 一种基于图像的定位装置,包括:
    一个或多个处理器;和
    一个或多个存储器,其中,所述存储器中存储有计算机可读代码,所述计算机可读代码在由所述一个或多个处理器运行时,使得所述一个或多个处理器执行如权利要求1-12中任一项所述的方法。
  15. 一种计算机可读存储介质,其上存储有计算机可读指令,所述计算机可读指令在被处理器执行时,使得所述处理器执行如权利要求1-12中任一项所述的方法。
PCT/CN2021/107975 2020-08-03 2021-07-22 Positioning model optimization method, positioning method, positioning device, and storage medium WO2022028253A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/000,375 US20230222749A1 (en) 2020-08-03 2021-07-22 Positioning model optimization method, positioning method, positioning device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010767042.3 2020-08-03
CN202010767042.3A CN111862351B (zh) 2020-08-03 2020-08-03 Positioning model optimization method, positioning method, and positioning device

Publications (1)

Publication Number Publication Date
WO2022028253A1 (zh) 2022-02-10

Family

ID=72952671

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/107975 WO2022028253A1 (zh) Positioning model optimization method, positioning method, positioning device, and storage medium 2020-08-03 2021-07-22

Country Status (3)

Country Link
US (1) US20230222749A1 (zh)
CN (1) CN111862351B (zh)
WO (1) WO2022028253A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
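CN111862351B (zh) * 2020-08-03 2024-01-19 ByteDance Inc. Positioning model optimization method, positioning method, and positioning device
CN112750164B (zh) * 2021-01-21 2023-04-18 Lemon Inc. Method for constructing a lightweight positioning model, positioning method, and electronic device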

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
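CN105184789A (zh) * 2015-08-31 2015-12-23 Institute of Automation, Chinese Academy of Sciences Camera positioning system and method based on point cloud reduction
CN105844696A (zh) * 2015-12-31 2016-08-10 Tsinghua University Image positioning method and device based on three-dimensional reconstruction with a ray model
US20180330504A1 (en) * 2017-05-14 2018-11-15 International Business Machines Corporation Systems and methods for determining a camera pose of an image
CN111275810A (zh) * 2020-01-17 2020-06-12 Wuyi University K-nearest-neighbor point cloud filtering method based on image processing, device, and storage medium
CN111862351A (zh) * 2020-08-03 2020-10-30 ByteDance Inc. Positioning model optimization method, positioning method, and positioning device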

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
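CN103745498B (zh) * 2014-01-16 2017-01-04 Institute of Automation, Chinese Academy of Sciences Image-based fast positioning method
CN103942824B (zh) * 2014-05-15 2017-01-11 Xiamen University Method for extracting linear features from a three-dimensional point cloud
CN104484882B (zh) * 2014-12-24 2017-09-29 Harbin Institute of Technology Method for detecting power lines in urban areas based on airborne LiDAR data
CN106940186B (zh) * 2017-02-16 2019-09-24 Huazhong University of Science and Technology Robot autonomous localization and navigation method and system
CN109087342A (zh) * 2018-07-12 2018-12-25 Wuhan Chizi Technology Co., Ltd. Global registration method and system for three-dimensional point clouds based on feature matching
CN110163903B (zh) * 2019-05-27 2022-02-25 Baidu Online Network Technology (Beijing) Co., Ltd. Three-dimensional image acquisition and image positioning method, apparatus, device, and storage medium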

Also Published As

Publication number Publication date
CN111862351A (zh) 2020-10-30
CN111862351B (zh) 2024-01-19
US20230222749A1 (en) 2023-07-13

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21854176

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 09/06/2023)

122 EP: PCT application non-entry in European phase

Ref document number: 21854176

Country of ref document: EP

Kind code of ref document: A1