WO2022028254A1 - Positioning model optimization method, positioning method and positioning device - Google Patents

Positioning model optimization method, positioning method and positioning device

Info

Publication number
WO2022028254A1
Authority
WO
WIPO (PCT)
Prior art keywords
positioning
point
scene
dimensional
model
Prior art date
Application number
PCT/CN2021/107976
Other languages
English (en)
French (fr)
Inventor
罗琳捷
刘晶
陈志立
王国晖
杨骁
杨建朝
连晓晨
Original Assignee
罗琳捷
字节跳动有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 罗琳捷 and 字节跳动有限公司
Priority to US18/040,463, published as US20230290094A1
Publication of WO2022028254A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/20Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20Indexing scheme for editing of 3D models
    • G06T2219/2004Aligning objects, relative positioning of parts

Definitions

  • the present disclosure relates to the field of positioning, and more particularly, to a positioning model optimization method, a positioning method and a positioning device.
  • Traditional positioning technologies are usually based on GPS (Global Positioning System), Wi-Fi wireless networks, and the like, but these technologies suffer from many problems such as susceptibility to interference and a limited scope of application.
  • Compared with common positioning methods such as GPS, image-based positioning methods can provide better positioning accuracy by performing three-dimensional positioning relative to a known scene, and can therefore better serve Augmented Reality (AR) applications.
  • a 3D localization model obtained by 3D reconstruction of a series of images of a scene contains a large number of 3D points and corresponding 2D feature points in each image and their descriptors.
  • For an image to be queried, the two-dimensional feature points and their descriptors are first extracted from the image and then matched against the descriptors in the localization model. Once a matching descriptor is found, the corresponding three-dimensional point is determined, thereby localizing the image to be queried.
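  • As a concrete illustration of this extract-and-match step, the following Python sketch uses OpenCV (one possible library, not mandated by the disclosure) to extract binary feature descriptors from a query image and match them against a model's descriptor set; the function name and the model's descriptor layout are hypothetical.

```python
# Minimal sketch of the query-side matching step, assuming OpenCV (cv2)
# is available; the disclosure does not prescribe a specific library
# or descriptor type.
import cv2

def match_query_to_model(query_img_path, model_descriptors):
    """Extract 2D feature points from a query image and match their
    descriptors against those stored in the localization model.

    model_descriptors: uint8 array of shape (M, D), one row per model
    descriptor (hypothetical storage layout); each matched row maps
    back to a 3D point of the model.
    """
    img = cv2.imread(query_img_path, cv2.IMREAD_GRAYSCALE)
    brisk = cv2.BRISK_create()  # BRISK: one of the binary descriptors named in the text
    keypoints, query_desc = brisk.detectAndCompute(img, None)
    # Hamming distance is the natural metric for binary descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(query_desc, model_descriptors)
    return keypoints, matches
```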
  • the number of 3D points and corresponding descriptors in the localization model is positively correlated with the size of the scene and the number of input images, and directly affects the efficiency of the localization algorithm.
  • Traditional image-based localization methods rely on powerful computing capability to process large numbers of 3D points and descriptors, and are therefore usually implemented only on the server side.
  • Server-side localization, however, depends on network connections and high-speed bandwidth, which considerably limits the various real-time AR applications on mobile terminals.
  • The purpose of the present disclosure is to address at least one of the above technical defects, in particular the technical problem that server-side positioning methods in the prior art depend on network connections and high-speed bandwidth, which considerably restricts real-time AR applications on mobile terminals.
  • To overcome these defects, the present disclosure provides a positioning model optimization method, an image-based positioning method, a positioning device, and a computer-readable storage medium.
  • According to one aspect of the present disclosure, a method for optimizing a positioning model is provided, comprising: inputting a positioning model of a scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; calculating the saliency of each three-dimensional point in the three-dimensional point cloud, and if the saliency is greater than a predetermined threshold, outputting the three-dimensional point and the plurality of descriptors corresponding to it into an optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
  • According to another aspect, an image-based positioning method is provided, comprising: inputting an image to be queried; locating the image to be queried by using an optimized positioning model of the scene to which the image belongs; and outputting the pose of the camera that captured the image to be queried.
  • The optimized positioning model of the scene is obtained as follows: inputting the positioning model of the scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; calculating the saliency of each three-dimensional point in the point cloud, and if the saliency is greater than a predetermined threshold, outputting the three-dimensional point and its corresponding descriptors into the optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
  • According to another aspect, an image-based positioning device is provided, comprising: an input unit configured to input an image to be queried; a positioning unit configured to locate the image to be queried using an optimized positioning model of the scene to which the image belongs; and an output unit configured to output the pose of the camera that captured the image to be queried.
  • In some embodiments, the positioning device further includes an optimization unit configured to: receive an input positioning model of the scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the point cloud; calculate the saliency of each three-dimensional point in the point cloud, and if the saliency is greater than a predetermined threshold, output the three-dimensional point and its corresponding descriptors into the optimized positioning model of the scene; and output the optimized positioning model of the scene.
  • According to another aspect, an image-based positioning apparatus is provided, comprising one or more processors and one or more memories, wherein the memories store computer-readable code which, when executed by the one or more processors, causes the one or more processors to perform the method of any of the above aspects.
  • According to another aspect, a computer-readable storage medium is provided, having computer-readable instructions stored thereon which, when executed by a processor, cause the processor to perform the method of any one of the above aspects.
  • According to the positioning model optimization method, image-based positioning method, positioning device, and computer-readable storage medium of the present disclosure, the saliency of each three-dimensional point in the point cloud of an input scene's positioning model is calculated, and the three-dimensional points whose saliency exceeds a predetermined threshold, together with their corresponding descriptors, are output into an optimized positioning model of the scene. This effectively reduces the number of three-dimensional points in the positioning model, speeds up positioning, and improves positioning efficiency, so that positioning computation can run on mobile devices and various real-time AR applications based on 3D scene positioning become possible on such devices.
  • FIG. 1 shows a flowchart of a positioning model optimization method 100 according to an embodiment of the present disclosure
  • FIG. 2 shows a flowchart of an image-based positioning method 200 according to an embodiment of the present disclosure
  • FIG. 3 shows a schematic structural diagram of an image-based positioning device 300 according to an embodiment of the present disclosure
  • FIG. 4 shows a schematic structural diagram of an exemplary electronic device 400 according to an embodiment of the present disclosure.
  • FIG. 5 shows a schematic diagram of an exemplary computer-readable storage medium 500 according to an embodiment of the present disclosure.
  • the term "including" and variations thereof are open-ended, i.e., "including but not limited to".
  • the term “based on” is “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • FIG. 1 shows a flowchart of a positioning model optimization method 100 according to an embodiment of the present disclosure.
  • a localization model of the scene is input, and the localization model includes a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud.
  • The scene can be, for example, any geographic scene such as a building or a city.
  • For a given scene, its positioning model may be a three-dimensional positioning model obtained by performing three-dimensional reconstruction of the scene. For example, a series of images of the scene can be captured in advance, and a three-dimensional localization model of the scene can then be obtained by performing image-based three-dimensional reconstruction of the scene.
  • the 3D positioning model includes a 3D point cloud formed by a large number of 3D points, and each 3D point in the 3D point cloud corresponds to a series of 2D feature points located on each image and descriptors of these 2D feature points.
  • the descriptor may be, for example, a parameter describing the relationship between the two-dimensional feature point and its surrounding contents, so that the matching of the feature point can be realized by using the descriptor.
  • the descriptors can be binary feature descriptors that describe small blocks of pixels around the feature points. Commonly used descriptors include BRISK (Binary Robust Invariant Scalable Keypoints) descriptor, BRIEF (Binary Robust Independent Elementary Features) descriptor, etc.
  • In step S120, the saliency of each 3D point in the 3D point cloud is calculated, and if the saliency of the 3D point is greater than a predetermined threshold, the 3D point and the plurality of descriptors corresponding to it are output into the optimized positioning model of the scene. That is, for each 3D point in the point cloud, if its saliency is greater than the predetermined threshold, the 3D point and its corresponding descriptors are output into the optimized positioning model of the scene; if its saliency is less than or equal to the predetermined threshold, the 3D point and its corresponding descriptors are not output. Therefore, compared with the input positioning model of the scene, the number of 3D points and descriptors in the optimized positioning model is greatly reduced; the exact reduction depends on the predetermined threshold.
  • Calculating the saliency of each three-dimensional point in the three-dimensional point cloud may include: determining the trajectory formed by the two-dimensional feature points onto which the three-dimensional point projects in different images of the scene; and calculating the length of the trajectory as the saliency of the three-dimensional point.
  • the localization model is obtained by 3D reconstruction of a series of images of the scene. Accordingly, each 3D point in the 3D point cloud of the localization model corresponds to a series of 2D feature points on each image. These two-dimensional feature points can be regarded as forming a trajectory in space, and the length of the trajectory represents the saliency of the three-dimensional point.
  • For example, if the trajectory of a 3D point is longer, the 3D point was reconstructed from more 2D feature points, that is, its projection appears in more images, and the 3D point is therefore more salient; if the trajectory of a 3D point is shorter, the 3D point was reconstructed from fewer 2D feature points, its projection appears in fewer images, and the 3D point is less salient.
  • It should be noted that, although in this example the saliency of a 3D point is computed as the length of the trajectory formed by its projected 2D feature points on different images, the present disclosure is not limited thereto, and the saliency of a 3D point may also be computed in other ways.
  • each element in the matrix representing the trajectory may be a vector composed of position coordinates and descriptors of each two-dimensional feature point corresponding to the three-dimensional point.
  • For example, the trajectory formed by the 2D feature points of a 3D point p in the point cloud on different images of the scene can be represented by a matrix {f1, f2, ..., fn}, where n is the number of 2D feature points used to reconstruct the 3D point p, and the element fi consists of the position coordinates (xi, yi) of the 2D feature point of p on the i-th image and its corresponding descriptor di. For example, assuming the descriptor di is a 128-dimensional vector, the element fi is a 130-dimensional vector composed of the two-dimensional position coordinate vector (xi, yi) and the 128-dimensional descriptor vector di; the trajectory {f1, f2, ..., fn} is then a 130×n matrix, and the length of this matrix is computed as the saliency of the 3D point p.
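  • To make the matrix representation concrete, the following NumPy sketch (an illustration under the stated assumptions of 2D coordinates plus a 128-dimensional descriptor per observation, not the patent's implementation) assembles the trajectory matrix {f1, f2, ..., fn} of one 3D point and takes its column count n as the saliency:

```python
import numpy as np

def track_matrix(observations):
    """Build the 130 x n trajectory matrix of one 3D point.

    observations: list of (x, y, descriptor) tuples, one per image in
    which the 3D point was observed; each descriptor is assumed to be
    a 128-dim vector, so each column fi = (xi, yi, di) has 130 entries.
    """
    cols = [np.concatenate(([x, y], np.asarray(d, dtype=np.float64)))
            for (x, y, d) in observations]
    return np.stack(cols, axis=1)  # shape (130, n)

def saliency(observations):
    """Saliency = trajectory length = number of columns n.

    As the text notes, other lengths are possible, e.g. the Euclidean
    norm of the trajectory matrix.
    """
    return track_matrix(observations).shape[1]
```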
  • the length of a trajectory of a three-dimensional point in a three-dimensional point cloud may be the number of rows or columns of a matrix representing the trajectory.
  • For example, in the above example, the matrix representing the trajectory of the 3D point p is {f1, f2, ..., fn}, so the length of the trajectory can be the number of columns n of the matrix, and accordingly the saliency of p is n.
  • In another example, the matrix representing the trajectory of p may be {f1, f2, ..., fn}^T, in which case the length of the trajectory is the number of rows n of the matrix, and the saliency of p is again n. It should be noted that, although the number of rows or columns of the trajectory matrix is used here as an example of the trajectory length of a 3D point, the present disclosure is not limited thereto, and the trajectory length may also be any other suitable value, such as the Euclidean norm of the matrix representing the trajectory.
  • After the saliency of each 3D point in the point cloud has been calculated, it is compared with a predetermined threshold. For example, in the above example, the saliency of the 3D point p is n: if n is greater than the predetermined threshold, the 3D point p and its corresponding descriptors are output into the optimized positioning model; if n is less than or equal to the predetermined threshold, the 3D point p and its corresponding descriptors are not output.
  • the selection of the predetermined threshold may depend on the desired positioning speed and efficiency of the optimized positioning model.
  • For example, when the saliency of a 3D point is the number of rows or columns of its trajectory matrix, the predetermined threshold can be set to 5: for a 3D point p, if its saliency n is greater than 5, the point p and its corresponding descriptors are output into the optimized positioning model; if n is less than or equal to 5, the point p and its corresponding descriptors are not output.
  • With a predetermined threshold of 5, the number of 3D points in the optimized positioning model can be reduced by at least half, thereby at least doubling the positioning speed. It should be understood that, although a threshold of 5 is used as an example here, the present disclosure is not limited thereto, and any other suitable value may be chosen as the predetermined threshold according to actual needs.
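  • A compact sketch of the whole optimization step described above might look as follows; the data layout (a dict per 3D point) is hypothetical, and the threshold of 5 is simply the example value from the text.

```python
def optimize_localization_model(points, threshold=5):
    """Keep only the salient 3D points of a localization model.

    points: iterable of dicts with keys 'xyz' (3D coordinates),
    'descriptors' (the point's descriptor list), and 'track_len'
    (the number of 2D observations, used here as the saliency).
    Returns the optimized model as a list of retained points.
    """
    optimized = []
    for p in points:
        if p['track_len'] > threshold:  # strictly greater, as in the text
            optimized.append(p)         # keep the point and all its descriptors
        # points with track_len <= threshold are simply not output
    return optimized
```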
  • the optimized positioning model of the scene is output.
  • The output optimized positioning model of the scene enables fast and efficient positioning computation, and can therefore be applied not only to traditional server-side positioning but also to positioning computation on mobile terminals such as mobile phones and portable computers.
  • With the positioning model optimization method according to the above embodiment of the present disclosure, the saliency of each 3D point in the point cloud of the input scene's positioning model is calculated, and the 3D points whose saliency exceeds a predetermined threshold, together with their corresponding descriptors, are output into the optimized positioning model of the scene. This effectively reduces the number of 3D points in the positioning model, speeds up positioning, and improves positioning efficiency, so that positioning computation can be optimized for mobile devices, making various real-time AR applications based on 3D scene positioning possible on such devices.
  • FIG. 2 shows a flowchart of an image-based positioning method 200 according to an embodiment of the present disclosure.
  • an image to be queried is input.
  • the image to be queried is, for example, an image captured by a photographing device such as a camera.
  • the image to be queried is located by using the optimized positioning model of the scene to which the image to be queried belongs.
  • The optimized positioning model of the scene can be obtained, for example, as follows: inputting the positioning model of the scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the point cloud; calculating the saliency of each three-dimensional point in the point cloud, and if the saliency is greater than a predetermined threshold, outputting the three-dimensional point and its corresponding descriptors into the optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
  • For a given scene, its positioning model may be a three-dimensional positioning model obtained by performing three-dimensional reconstruction of the scene.
  • a series of images of the scene can be captured in advance, and then a three-dimensional localization model of the scene can be obtained by performing three-dimensional reconstruction of the scene based on the images.
  • the 3D positioning model includes a 3D point cloud formed by a large number of 3D points, and each 3D point in the 3D point cloud corresponds to a series of 2D feature points located on each image and descriptors of these 2D feature points.
  • the descriptor may be, for example, a parameter describing the relationship between the two-dimensional feature point and its surrounding contents, so that the matching of the feature point can be realized by using the descriptor.
  • the descriptors can be binary feature descriptors that describe small blocks of pixels around the feature points. Commonly used descriptors include BRISK descriptors, BRIEF descriptors, and the like.
  • Calculating the saliency of each three-dimensional point in the three-dimensional point cloud may include: determining the trajectory formed by the two-dimensional feature points onto which the three-dimensional point projects in different images of the scene; and calculating the length of the trajectory as the saliency of the three-dimensional point.
  • the localization model is obtained by 3D reconstruction of a series of images of the scene. Accordingly, each 3D point in the 3D point cloud of the localization model corresponds to a series of 2D feature points on each image. These two-dimensional feature points can be regarded as forming a trajectory in space, and the length of the trajectory represents the saliency of the three-dimensional point.
  • For example, if the trajectory of a 3D point is longer, the 3D point was reconstructed from more 2D feature points, that is, its projection appears in more images, and the 3D point is therefore more salient; if the trajectory of a 3D point is shorter, the 3D point was reconstructed from fewer 2D feature points, its projection appears in fewer images, and the 3D point is less salient.
  • It should be noted that, although in this example the saliency of a 3D point is computed as the length of the trajectory formed by its projected 2D feature points on different images, the present disclosure is not limited thereto, and the saliency of a 3D point may also be computed in other ways.
  • each element in the matrix representing the trajectory may be a vector composed of position coordinates and descriptors of each two-dimensional feature point corresponding to the three-dimensional point.
  • For example, the trajectory formed by the 2D feature points of a 3D point p in the point cloud on different images of the scene can be represented by a matrix {f1, f2, ..., fn}, where n is the number of 2D feature points used to reconstruct the 3D point p, and the element fi consists of the position coordinates (xi, yi) of the 2D feature point of p on the i-th image and its corresponding descriptor di. For example, assuming the descriptor di is a 128-dimensional vector, the element fi is a 130-dimensional vector composed of the two-dimensional position coordinate vector (xi, yi) and the 128-dimensional descriptor vector di; the trajectory {f1, f2, ..., fn} is then a 130×n matrix, and the length of this matrix is computed as the saliency of the 3D point p.
  • the length of a trajectory of a three-dimensional point in a three-dimensional point cloud may be the number of rows or columns of a matrix representing the trajectory.
  • For example, in the above example, the matrix representing the trajectory of the 3D point p is {f1, f2, ..., fn}, so the length of the trajectory can be the number of columns n of the matrix, and accordingly the saliency of p is n.
  • In another example, the matrix representing the trajectory of p may be {f1, f2, ..., fn}^T, in which case the length of the trajectory is the number of rows n of the matrix, and the saliency of p is again n. It should be noted that, although the number of rows or columns of the trajectory matrix is used here as an example of the trajectory length of a 3D point, the present disclosure is not limited thereto, and the trajectory length may also be any other suitable value, such as the Euclidean norm of the matrix representing the trajectory.
  • After the saliency of each 3D point in the point cloud has been calculated, it is compared with a predetermined threshold. For example, in the above example, the saliency of the 3D point p is n: if n is greater than the predetermined threshold, the 3D point p and its corresponding descriptors are output into the optimized positioning model; if n is less than or equal to the predetermined threshold, the 3D point p and its corresponding descriptors are not output.
  • the selection of the predetermined threshold may depend on the desired positioning speed and efficiency of the optimized positioning model.
  • For example, when the saliency of a 3D point is the number of rows or columns of its trajectory matrix, the predetermined threshold can be set to 5: for a 3D point p, if its saliency n is greater than 5, the point p and its corresponding descriptors are output into the optimized positioning model; if n is less than or equal to 5, the point p and its corresponding descriptors are not output.
  • With a predetermined threshold of 5, the number of 3D points in the optimized positioning model can be reduced by at least half, thereby at least doubling the positioning speed. It should be understood that, although a threshold of 5 is used as an example here, the present disclosure is not limited thereto, and any other suitable value may be chosen as the predetermined threshold according to actual needs.
  • the pose of the camera that shoots the image to be queried is output.
  • the pose of the camera includes, for example, the position and pose of the camera when the image to be queried is captured.
  • the output pose of the camera may be a 6-DOF variable describing the three-dimensional coordinates and rotation direction of the camera.
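  • For illustration, once 2D-3D correspondences between the query image and the optimized model have been established, such a 6-DOF pose can be recovered with a standard PnP solver. The following OpenCV sketch is one possible realization, not the method prescribed by the disclosure; the function name and argument layout are assumptions.

```python
import cv2
import numpy as np

def estimate_pose(points_3d, points_2d, camera_matrix):
    """Recover the 6-DOF camera pose from matched 2D-3D points.

    points_3d: (N, 3) array of model points; points_2d: (N, 2) array
    of their matched pixel positions; camera_matrix: 3x3 intrinsics.
    Returns a 3x3 rotation matrix and a translation vector.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        camera_matrix, None)  # None: no lens distortion assumed
    if not ok:
        raise RuntimeError("pose estimation failed")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 matrix
    return R, tvec
```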
  • In the positioning method according to the above embodiment, the positioning model of the scene is optimized by calculating the saliency of each 3D point in its point cloud and outputting the 3D points whose saliency exceeds a predetermined threshold, together with their corresponding descriptors; the optimized positioning model is then used to locate the image to be queried. This effectively reduces the number of 3D points in the positioning model, speeds up positioning, and improves positioning efficiency, so that positioning computation can be optimized for mobile devices, making various real-time AR applications based on 3D scene positioning possible on such devices.
  • FIG. 3 shows a schematic structural diagram of an image-based positioning device 300 according to an embodiment of the present disclosure. Since the functions of the positioning device 300 are the same as the details of the positioning method 200 described above with reference to FIG. 2 , the detailed description of the same content is omitted here for simplicity.
  • the positioning device 300 includes: an input unit 310 configured to input an image to be queried; a positioning unit 320 configured to locate the image to be queried using an optimized positioning model of the scene to which the image to be queried belongs; and an output unit 330, which is configured to output the pose of the camera that captures the image to be queried.
  • The positioning device 300 may further include an optimization unit 340 configured to: receive an input positioning model of the scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the point cloud; calculate the saliency of each 3D point in the point cloud, and if the saliency is greater than a predetermined threshold, output the 3D point and its corresponding descriptors into the optimized positioning model of the scene; and output the optimized positioning model of the scene.
  • the positioning device 300 may also include other components, however, since these components are not related to the content of the embodiment of the present disclosure, their illustration and description are omitted here.
  • In the positioning device according to the above embodiment, the positioning model of the scene is optimized by calculating the saliency of each 3D point in its point cloud and outputting the 3D points whose saliency exceeds a predetermined threshold, together with their corresponding descriptors; the optimized positioning model is then used to locate the image to be queried. This effectively reduces the number of 3D points in the positioning model, speeds up positioning, and improves positioning efficiency, so that positioning computation can be optimized for mobile devices, making various real-time AR applications based on 3D scene positioning possible on such devices.
  • FIG. 4 shows a schematic structural diagram of an exemplary electronic device 400 according to an embodiment of the present disclosure.
  • An exemplary electronic device 400 according to an embodiment of the present disclosure includes one or more processors and one or more memories, wherein the memories store computer-readable code which, when executed by the one or more processors, causes the one or more processors to perform the methods described above.
  • The electronic device 400 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as stationary terminals such as digital TVs and desktop computers.
  • It should be understood that the electronic device shown in FIG. 4 is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
  • As shown in FIG. 4, the electronic device 400 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 401, which may perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 402 or a program loaded from a storage device 408 into random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the electronic device 400.
  • the processing device 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404.
  • An input/output (I/O) interface 405 is also connected to the bus 404.
  • In general, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 407 including, for example, a liquid crystal display (LCD), speakers, and vibrators; storage devices 408 including, for example, magnetic tape and hard disks; and a communication device 409. The communication device 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data.
  • Although FIG. 4 shows the electronic device 400 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable storage medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication device 409, or from the storage device 408, or from the ROM 402.
  • When the computer program is executed by the processing device 401, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.
  • FIG. 5 shows a schematic diagram of an exemplary computer-readable storage medium 500 according to an embodiment of the present disclosure.
  • The computer-readable storage medium 500 has computer-readable instructions 501 stored thereon which, when executed by a processor, cause the processor to perform the positioning model optimization method and the positioning method described in the foregoing embodiments.
  • the above-mentioned computer-readable storage medium of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with computer-readable program code embodied thereon.
  • Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • Alternatively, a computer-readable signal medium may be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable storage medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
  • the above-mentioned computer-readable storage medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • The remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • Each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • The units described in the embodiments of the present disclosure may be implemented in software or in hardware; in some cases, the name of a unit does not constitute a limitation of the unit itself.
  • exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.
  • a computer-readable storage medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • A positioning model optimization method, comprising: inputting a positioning model of a scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; calculating the saliency of each three-dimensional point in the point cloud, and if the saliency is greater than a predetermined threshold, outputting the three-dimensional point and the plurality of descriptors corresponding to it into an optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
  • each element in the matrix representing the trajectory is a vector composed of the position coordinates and descriptors of each two-dimensional feature point corresponding to the three-dimensional point.
  • An image-based positioning method, comprising: inputting an image to be queried; locating the image to be queried using an optimized positioning model of the scene to which the image belongs; and outputting the pose of the camera that captured the image to be queried, wherein the optimized positioning model of the scene is obtained as follows: inputting the positioning model of the scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the point cloud; calculating the saliency of each three-dimensional point in the point cloud, and if the saliency is greater than a predetermined threshold, outputting the three-dimensional point and its corresponding descriptors into the optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
  • each element in the matrix representing the trajectory is a vector consisting of the position coordinates and descriptors of each two-dimensional feature point corresponding to the three-dimensional point.
  • An image-based positioning device, comprising: an input unit configured to input an image to be queried; a positioning unit configured to locate the image to be queried using an optimized positioning model of the scene to which the image belongs; and an output unit configured to output the pose of the camera that captured the image to be queried, wherein the positioning device further comprises an optimization unit configured to: receive an input positioning model of the scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the point cloud; calculate the saliency of each three-dimensional point in the point cloud, and if the saliency is greater than a predetermined threshold, output the three-dimensional point and its corresponding descriptors into the optimized positioning model of the scene; and output the optimized positioning model of the scene.
  • An image-based positioning apparatus, comprising: one or more processors; and one or more memories, wherein the memories store computer-readable code which, when executed by the one or more processors, causes the one or more processors to perform the method described in any one of the above solutions A1 to A5 and B1 to B5.
  • A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor, cause the processor to perform the method described in any one of the above solutions A1 to A5 and B1 to B5.

Abstract

Provided are a positioning model optimization method, a positioning method, and a positioning device. The positioning model optimization method comprises: inputting a positioning model of a scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; calculating the saliency of each three-dimensional point in the three-dimensional point cloud, and if the saliency is greater than a predetermined threshold, outputting the three-dimensional point and the plurality of descriptors corresponding to the three-dimensional point into an optimized positioning model of the scene; and outputting the optimized positioning model of the scene.

Description

Positioning model optimization method, positioning method and positioning device
CROSS-REFERENCE TO RELATED APPLICATIONS
The present disclosure claims priority to Chinese Patent Application No. 202010767049.5, entitled "Positioning model optimization method, positioning method and positioning device", filed by 字节跳动有限公司 (ByteDance Co., Ltd.) on August 3, 2020, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the field of positioning, and more particularly, to a positioning model optimization method, a positioning method, and a positioning device.
BACKGROUND
Traditional positioning technologies are usually based on GPS (Global Positioning System), Wi-Fi wireless networks, and the like, but these technologies suffer from many problems such as susceptibility to interference and a limited scope of application. Compared with common positioning methods such as GPS, image-based positioning methods can provide better positioning accuracy by performing three-dimensional positioning relative to a known scene, and can therefore better serve Augmented Reality (AR) applications.
Typically, in image-based positioning methods, the three-dimensional positioning model obtained by three-dimensional reconstruction from a series of images of a scene contains a large number of three-dimensional points, along with the corresponding two-dimensional feature points in each image and their descriptors. For an image to be queried, the two-dimensional feature points and their descriptors are first extracted from the image and then matched against the descriptors in the positioning model; once a matching descriptor is determined, the corresponding three-dimensional point is determined, thereby localizing the image to be queried. Consequently, the number of three-dimensional points and corresponding descriptors in the positioning model is positively correlated with the size of the scene and the number of input images, and directly affects the efficiency of the positioning algorithm. Traditional image-based positioning methods rely on powerful computing capability to process large numbers of three-dimensional points and descriptors and are therefore usually implemented only on the server side; server-side positioning, however, depends on network connections and high-speed bandwidth, which considerably limits real-time AR applications on mobile terminals.
TECHNICAL PROBLEM
The purpose of the present disclosure is to address at least one of the above technical defects, in particular the technical problem that server-side positioning methods in the prior art depend on network connections and high-speed bandwidth, which considerably restricts real-time AR applications on mobile terminals.
TECHNICAL SOLUTION
To overcome the defects in the prior art, the present disclosure provides a positioning model optimization method, an image-based positioning method, a positioning device, and a computer-readable storage medium.
According to one aspect of the present disclosure, a positioning model optimization method is provided, comprising: inputting a positioning model of a scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; calculating the saliency of each three-dimensional point in the three-dimensional point cloud, and if the saliency is greater than a predetermined threshold, outputting the three-dimensional point and the plurality of descriptors corresponding to the three-dimensional point into an optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
According to another aspect of the present disclosure, an image-based positioning method is provided, comprising: inputting an image to be queried; locating the image to be queried using an optimized positioning model of the scene to which the image belongs; and outputting the pose of the camera that captured the image to be queried. The optimized positioning model of the scene is obtained as follows: inputting the positioning model of the scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; calculating the saliency of each three-dimensional point in the three-dimensional point cloud, and if the saliency is greater than a predetermined threshold, outputting the three-dimensional point and the plurality of descriptors corresponding to the three-dimensional point into the optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
According to another aspect of the present disclosure, an image-based positioning device is provided, comprising: an input unit configured to input an image to be queried; a positioning unit configured to locate the image to be queried using an optimized positioning model of the scene to which the image belongs; and an output unit configured to output the pose of the camera that captured the image to be queried. The positioning device further comprises an optimization unit configured to: receive an input positioning model of the scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; calculate the saliency of each three-dimensional point in the three-dimensional point cloud, and if the saliency is greater than a predetermined threshold, output the three-dimensional point and the plurality of descriptors corresponding to the three-dimensional point into the optimized positioning model of the scene; and output the optimized positioning model of the scene.
According to another aspect of the present disclosure, an image-based positioning apparatus is provided, comprising one or more processors and one or more memories, wherein the memories store computer-readable code which, when executed by the one or more processors, causes the one or more processors to perform the method of any of the above aspects.
According to another aspect of the present disclosure, a computer-readable storage medium is provided, having computer-readable instructions stored thereon which, when executed by a processor, cause the processor to perform the method of any of the above aspects.
ADVANTAGEOUS EFFECTS
According to the positioning model optimization method, image-based positioning method, positioning device, and computer-readable storage medium of the present disclosure, the saliency of each three-dimensional point in the point cloud of an input scene's positioning model is calculated, and the three-dimensional points whose saliency exceeds a predetermined threshold, together with their corresponding descriptors, are output into an optimized positioning model of the scene. This effectively reduces the number of three-dimensional points in the positioning model, speeds up positioning, and improves positioning efficiency, so that positioning computation can be optimized for mobile devices, making various real-time AR applications based on three-dimensional scene positioning possible on such devices.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other objects, features, and advantages of the present disclosure will become more apparent from the following detailed description of embodiments taken in conjunction with the accompanying drawings. The drawings are provided for a further understanding of the embodiments, constitute a part of the specification, and serve to explain the present disclosure together with the embodiments without limiting it. In the drawings, the same reference numerals generally denote the same components or steps.
FIG. 1 shows a flowchart of a positioning model optimization method 100 according to an embodiment of the present disclosure;
FIG. 2 shows a flowchart of an image-based positioning method 200 according to an embodiment of the present disclosure;
FIG. 3 shows a schematic structural diagram of an image-based positioning device 300 according to an embodiment of the present disclosure;
FIG. 4 shows a schematic structural diagram of an exemplary electronic device 400 according to an embodiment of the present disclosure; and
FIG. 5 shows a schematic diagram of an exemplary computer-readable storage medium 500 according to an embodiment of the present disclosure.
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
EMBODIMENTS OF THE PRESENT DISCLOSURE
To make the objects, technical solutions, and advantages of the present disclosure more apparent, exemplary embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure, and it should be understood that the present disclosure is not limited by the exemplary embodiments described herein.
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. Furthermore, method embodiments may include additional steps and/or omit the steps shown. The scope of the present disclosure is not limited in this respect.
As used herein, the term "including" and variations thereof are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different devices, modules, or units, and are not used to limit the order of the functions performed by these devices, modules, or units or their interdependence.
It should be noted that the modifiers "a/an" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
A positioning model optimization method according to an embodiment of the present disclosure is described below with reference to FIG. 1. FIG. 1 shows a flowchart of a positioning model optimization method 100 according to an embodiment of the present disclosure.
As shown in FIG. 1, in step S110, a positioning model of a scene is input, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the point cloud. The scene may be, for example, any geographic scene such as a building or a city. According to an example of an embodiment of the present disclosure, for a given scene, its positioning model may be a three-dimensional positioning model obtained by three-dimensional reconstruction of the scene. For example, a series of images of the scene may be captured in advance, and an image-based three-dimensional reconstruction of the scene is then performed to obtain the three-dimensional positioning model. The three-dimensional positioning model includes a three-dimensional point cloud formed by a large number of three-dimensional points, and each three-dimensional point corresponds to a series of two-dimensional feature points located on the individual images as well as the descriptors of these two-dimensional feature points. Here, a descriptor may be, for example, a parameter describing the relationship between a two-dimensional feature point and its surrounding content, so that feature point matching can be performed using descriptors. For example, a descriptor may be a binary feature descriptor describing the small pixel patch around a feature point; commonly used descriptors include the BRISK (Binary Robust Invariant Scalable Keypoints) descriptor, the BRIEF (Binary Robust Independent Elementary Features) descriptor, and so on.
Next, in step S120, the saliency of each three-dimensional point in the point cloud is calculated, and if the saliency of the three-dimensional point is greater than a predetermined threshold, the three-dimensional point and the plurality of descriptors corresponding to it are output into an optimized positioning model of the scene. That is, for each three-dimensional point in the point cloud, if its saliency is greater than the predetermined threshold, the point and its corresponding descriptors are output into the optimized positioning model of the scene; if its saliency is less than or equal to the predetermined threshold, the point and its corresponding descriptors are not output. As a result, compared with the input positioning model of the scene, the number of three-dimensional points and descriptors in the optimized positioning model is greatly reduced; the exact reduction depends on the predetermined threshold.
According to an example of an embodiment of the present disclosure, calculating the saliency of each three-dimensional point in the point cloud may include: determining the trajectory formed by the two-dimensional feature points onto which the three-dimensional point projects in different images of the scene; and calculating the length of the trajectory as the saliency of the three-dimensional point. As described above, the positioning model is obtained by three-dimensional reconstruction from a series of images of the scene; accordingly, each three-dimensional point in the point cloud corresponds to a series of two-dimensional feature points on the individual images. These two-dimensional feature points can be regarded as forming a trajectory in space, and the length of the trajectory represents the saliency of the three-dimensional point. For example, if the trajectory of a three-dimensional point is longer, the point was reconstructed from more two-dimensional feature points, that is, its projection appears in more images, and the point is therefore more salient; if the trajectory is shorter, the point was reconstructed from fewer two-dimensional feature points, its projection appears in fewer images, and the point is less salient. It should be noted that, although in this example the saliency of a three-dimensional point is computed as the length of the trajectory formed by its projected two-dimensional feature points in different images, the present disclosure is not limited thereto, and the saliency may also be computed in other ways.
According to an example of an embodiment of the present disclosure, each element of the matrix representing the trajectory may be a vector composed of the position coordinates and the descriptor of a two-dimensional feature point corresponding to the three-dimensional point. For example, the trajectory formed by the two-dimensional feature points of a three-dimensional point p in different images of the scene can be represented by a matrix {f1, f2, ..., fn}, where n is the number of two-dimensional feature points used to reconstruct p, and the element fi consists of the position coordinates (xi, yi) of the two-dimensional feature point of p on the i-th image and its corresponding descriptor di. For example, assuming the descriptor di is a 128-dimensional vector, the element fi is a 130-dimensional vector composed of the two-dimensional position coordinate vector (xi, yi) and the 128-dimensional descriptor vector di; the trajectory {f1, f2, ..., fn} is then a 130×n matrix, and the length of this matrix is computed as the saliency of the three-dimensional point p.
According to an example of an embodiment of the present disclosure, the length of the trajectory of a three-dimensional point may be the number of rows or columns of the matrix representing the trajectory. For example, in the above example the matrix representing the trajectory of p is {f1, f2, ..., fn}, so the length of the trajectory can be the number of columns n of the matrix, and accordingly the saliency of p is n. It will be appreciated that, in another example, the matrix representing the trajectory of p may be {f1, f2, ..., fn}^T, in which case the length of the trajectory is the number of rows n, and the saliency of p is again n. It should be noted that, although the number of rows or columns of the trajectory matrix is used here as an example of trajectory length, the present disclosure is not limited thereto, and the trajectory length may also be any other suitable value, such as the Euclidean norm of the matrix representing the trajectory.
After the saliency of each three-dimensional point in the point cloud has been calculated, it is compared with a predetermined threshold. For example, in the above example, the saliency of the three-dimensional point p is n: if n is greater than the predetermined threshold, the point p and its corresponding descriptors are output into the optimized positioning model; if n is less than or equal to the threshold, they are not output. The choice of the predetermined threshold may depend on the desired positioning speed and efficiency of the optimized positioning model. For example, when the saliency of a three-dimensional point is the number of rows or columns of its trajectory matrix, the predetermined threshold may be set to 5: for a three-dimensional point p, if its saliency n is greater than 5, the point and its corresponding descriptors are output into the optimized positioning model; if n is less than or equal to 5, they are not output. With a threshold of 5, the number of three-dimensional points in the optimized positioning model can be reduced by at least half, thereby at least doubling the positioning speed. It should be understood that, although a threshold of 5 is used as an example, the present disclosure is not limited thereto, and any other suitable value may be chosen as the predetermined threshold according to actual needs.
After the input positioning model of the scene has been optimized, in step S130 the optimized positioning model of the scene is output. The output optimized positioning model enables fast and efficient positioning computation and can therefore be applied not only to traditional server-side positioning but also to positioning computation on mobile terminals such as mobile phones and portable computers.
With the positioning model optimization method according to the above embodiment of the present disclosure, the saliency of each three-dimensional point in the point cloud of the input scene's positioning model is calculated, and the three-dimensional points whose saliency exceeds a predetermined threshold, together with their corresponding descriptors, are output into the optimized positioning model of the scene. This effectively reduces the number of three-dimensional points in the positioning model, speeds up positioning, and improves positioning efficiency, so that positioning computation can be optimized for mobile devices, making various real-time AR applications based on three-dimensional scene positioning possible on such devices.
An image-based positioning method according to an embodiment of the present disclosure is described below with reference to FIG. 2. FIG. 2 shows a flowchart of an image-based positioning method 200 according to an embodiment of the present disclosure. As shown in FIG. 2, in step S210, an image to be queried is input; the image is, for example, captured by a photographing device such as a camera. Next, in step S220, the image to be queried is located using the optimized positioning model of the scene to which it belongs. The optimized positioning model of the scene may be obtained, for example, as follows: inputting the positioning model of the scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the point cloud; calculating the saliency of each three-dimensional point in the point cloud, and if the saliency is greater than a predetermined threshold, outputting the three-dimensional point and its corresponding descriptors into the optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
According to an example of an embodiment of the present disclosure, for a given scene, its positioning model may be a three-dimensional positioning model obtained by three-dimensional reconstruction of the scene. For example, a series of images of the scene may be captured in advance, and an image-based three-dimensional reconstruction is then performed to obtain the three-dimensional positioning model. The three-dimensional positioning model includes a three-dimensional point cloud formed by a large number of three-dimensional points, and each three-dimensional point corresponds to a series of two-dimensional feature points located on the individual images as well as the descriptors of these two-dimensional feature points. Here, a descriptor may be, for example, a parameter describing the relationship between a two-dimensional feature point and its surrounding content, so that feature point matching can be performed using descriptors. For example, a descriptor may be a binary feature descriptor describing the small pixel patch around a feature point; commonly used descriptors include the BRISK descriptor, the BRIEF descriptor, and so on.
According to an example of an embodiment of the present disclosure, calculating the saliency of each three-dimensional point in the point cloud may include: determining the trajectory formed by the two-dimensional feature points onto which the three-dimensional point projects in different images of the scene; and calculating the length of the trajectory as the saliency of the three-dimensional point. As described above, the positioning model is obtained by three-dimensional reconstruction from a series of images of the scene; accordingly, each three-dimensional point in the point cloud corresponds to a series of two-dimensional feature points on the individual images. These two-dimensional feature points can be regarded as forming a trajectory in space, and the length of the trajectory represents the saliency of the three-dimensional point. For example, if the trajectory of a three-dimensional point is longer, the point was reconstructed from more two-dimensional feature points, that is, its projection appears in more images, and the point is therefore more salient; if the trajectory is shorter, the point was reconstructed from fewer two-dimensional feature points, its projection appears in fewer images, and the point is less salient. It should be noted that, although in this example the saliency of a three-dimensional point is computed as the length of the trajectory formed by its projected two-dimensional feature points in different images, the present disclosure is not limited thereto, and the saliency may also be computed in other ways.
According to an example of an embodiment of the present disclosure, each element of the matrix representing the trajectory may be a vector composed of the position coordinates and the descriptor of a two-dimensional feature point corresponding to the three-dimensional point. For example, the trajectory formed by the two-dimensional feature points of a three-dimensional point p in different images of the scene can be represented by a matrix {f1, f2, ..., fn}, where n is the number of two-dimensional feature points used to reconstruct p, and the element fi consists of the position coordinates (xi, yi) of the two-dimensional feature point of p on the i-th image and its corresponding descriptor di. For example, assuming the descriptor di is a 128-dimensional vector, the element fi is a 130-dimensional vector composed of the two-dimensional position coordinate vector (xi, yi) and the 128-dimensional descriptor vector di; the trajectory {f1, f2, ..., fn} is then a 130×n matrix, and the length of this matrix is computed as the saliency of the three-dimensional point p.
According to an example of an embodiment of the present disclosure, the length of the trajectory of a three-dimensional point may be the number of rows or columns of the matrix representing the trajectory. For example, in the above example the matrix representing the trajectory of p is {f1, f2, ..., fn}, so the length of the trajectory can be the number of columns n of the matrix, and accordingly the saliency of p is n. It will be appreciated that, in another example, the matrix representing the trajectory of p may be {f1, f2, ..., fn}^T, in which case the length of the trajectory is the number of rows n, and the saliency of p is again n. It should be noted that, although the number of rows or columns of the trajectory matrix is used here as an example of trajectory length, the present disclosure is not limited thereto, and the trajectory length may also be any other suitable value, such as the Euclidean norm of the matrix representing the trajectory.
After the saliency of each three-dimensional point in the point cloud has been calculated, it is compared with a predetermined threshold. For example, in the above example, the saliency of the three-dimensional point p is n: if n is greater than the predetermined threshold, the point p and its corresponding descriptors are output into the optimized positioning model; if n is less than or equal to the threshold, they are not output. The choice of the predetermined threshold may depend on the desired positioning speed and efficiency of the optimized positioning model. For example, when the saliency of a three-dimensional point is the number of rows or columns of its trajectory matrix, the predetermined threshold may be set to 5: for a three-dimensional point p, if its saliency n is greater than 5, the point and its corresponding descriptors are output into the optimized positioning model; if n is less than or equal to 5, they are not output. With a threshold of 5, the number of three-dimensional points in the optimized positioning model can be reduced by at least half, thereby at least doubling the positioning speed. It should be understood that, although a threshold of 5 is used as an example, the present disclosure is not limited thereto, and any other suitable value may be chosen as the predetermined threshold according to actual needs.
After the image to be queried has been located using the optimized positioning model of the scene, in step S230 the pose of the camera that captured the image is output. The camera pose includes, for example, the position and orientation of the camera when the image was captured; for example, the output pose may be a six-degree-of-freedom (6-DOF) variable describing the three-dimensional coordinates and rotation of the camera.
In the positioning method according to the above embodiment of the present disclosure, the positioning model of the scene is optimized by calculating the saliency of each three-dimensional point in its point cloud and outputting the three-dimensional points whose saliency exceeds a predetermined threshold, together with their corresponding descriptors; the optimized positioning model is then used to locate the image to be queried. This effectively reduces the number of three-dimensional points in the positioning model, speeds up positioning, and improves positioning efficiency, so that positioning computation can be optimized for mobile devices, making various real-time AR applications based on three-dimensional scene positioning possible on such devices.
An image-based positioning device according to an embodiment of the present disclosure is described below with reference to FIG. 3. FIG. 3 shows a schematic structural diagram of an image-based positioning device 300 according to an embodiment of the present disclosure. Since the functions of the positioning device 300 are the same as the details of the positioning method 200 described above with reference to FIG. 2, a detailed description of the same content is omitted here for simplicity. As shown in FIG. 3, the positioning device 300 comprises: an input unit 310 configured to input an image to be queried; a positioning unit 320 configured to locate the image to be queried using the optimized positioning model of the scene to which it belongs; and an output unit 330 configured to output the pose of the camera that captured the image. In addition, the positioning device 300 may further comprise an optimization unit 340 configured to: receive an input positioning model of the scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the point cloud; calculate the saliency of each three-dimensional point in the point cloud, and if the saliency is greater than a predetermined threshold, output the three-dimensional point and its corresponding descriptors into the optimized positioning model of the scene; and output the optimized positioning model of the scene. Besides these four units, the positioning device 300 may also include other components; however, since these components are unrelated to the content of the embodiments of the present disclosure, their illustration and description are omitted here.
In the positioning device according to the above embodiment of the present disclosure, the positioning model of the scene is optimized by calculating the saliency of each three-dimensional point in its point cloud and outputting the three-dimensional points whose saliency exceeds a predetermined threshold, together with their corresponding descriptors; the optimized positioning model is then used to locate the image to be queried. This effectively reduces the number of three-dimensional points in the positioning model, speeds up positioning, and improves positioning efficiency, so that positioning computation can be optimized for mobile devices, making various real-time AR applications based on three-dimensional scene positioning possible on such devices.
In addition, the positioning device according to an embodiment of the present disclosure may also be implemented by means of the architecture of the exemplary electronic device shown in FIG. 4. FIG. 4 shows a schematic structural diagram of an exemplary electronic device 400 according to an embodiment of the present disclosure. The exemplary electronic device 400 includes at least one or more processors and one or more memories, wherein the memories store computer-readable code which, when executed by the one or more processors, causes the one or more processors to perform the methods described above.
Specifically, the electronic device 400 in embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. It should be understood that the electronic device shown in FIG. 4 is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
As shown in FIG. 4, the electronic device 400 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 401, which may perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 402 or a program loaded from a storage device 408 into random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the electronic device 400. The processing device 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
In general, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 407 including, for example, a liquid crystal display (LCD), speakers, and vibrators; storage devices 408 including, for example, magnetic tape and hard disks; and a communication device 409. The communication device 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 4 shows the electronic device 400 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable storage medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication device 409, or installed from the storage device 408, or installed from the ROM 402. When the computer program is executed by the processing device 401, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.
In addition, the present disclosure also provides a computer-readable storage medium. FIG. 5 shows a schematic diagram of an exemplary computer-readable storage medium 500 according to an embodiment of the present disclosure. As shown in FIG. 5, the computer-readable storage medium 500 has computer-readable instructions 501 stored thereon which, when executed by a processor, cause the processor to perform the positioning model optimization method and positioning method described in the foregoing embodiments.
It should be noted that the above computer-readable storage medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to electrical wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The above computer-readable storage medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (e.g., through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings; for example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware; in some cases, the name of a unit does not constitute a limitation of the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a computer-readable storage medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium, and may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the above. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
Some features preferably implemented by some embodiments are now disclosed in a solution-based format.
A1. A positioning model optimization method, comprising: inputting a positioning model of a scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; calculating the saliency of each three-dimensional point in the three-dimensional point cloud, and if the saliency is greater than a predetermined threshold, outputting the three-dimensional point and the plurality of descriptors corresponding to the three-dimensional point into an optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
A2. The positioning model optimization method according to solution A1, wherein calculating the saliency of each three-dimensional point in the three-dimensional point cloud comprises: determining the trajectory formed by the two-dimensional feature points onto which the three-dimensional point projects in different images of the scene; and calculating the length of the trajectory as the saliency of the three-dimensional point.
A3. The positioning model optimization method according to solution A2, wherein each element of the matrix representing the trajectory is a vector composed of the position coordinates and the descriptor of each two-dimensional feature point corresponding to the three-dimensional point.
A4. The positioning model optimization method according to solution A2, wherein the length of the trajectory is the number of rows or columns of the matrix representing the trajectory.
A5. The positioning model optimization method according to solution A1, wherein the positioning model of the scene is a three-dimensional positioning model obtained by three-dimensional reconstruction of the scene.
Some features preferably implemented by some embodiments are now disclosed in a solution-based format.
B1. An image-based positioning method, comprising: inputting an image to be queried; locating the image to be queried using an optimized positioning model of the scene to which the image belongs; and outputting the pose of the camera that captured the image, wherein the optimized positioning model of the scene is obtained as follows: inputting the positioning model of the scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud; calculating the saliency of each three-dimensional point in the three-dimensional point cloud, and if the saliency is greater than a predetermined threshold, outputting the three-dimensional point and the plurality of descriptors corresponding to the three-dimensional point into the optimized positioning model of the scene; and outputting the optimized positioning model of the scene.
B2. The positioning method according to solution B1, wherein calculating the saliency of each three-dimensional point in the three-dimensional point cloud comprises: determining the trajectory formed by the two-dimensional feature points onto which the three-dimensional point projects in different images of the scene; and calculating the length of the trajectory as the saliency of the three-dimensional point.
B3. The positioning method according to solution B2, wherein each element of the matrix representing the trajectory is a vector composed of the position coordinates and the descriptor of each two-dimensional feature point corresponding to the three-dimensional point.
B4. The positioning method according to solution B2, wherein the length of the trajectory is the number of rows or columns of the matrix representing the trajectory.
B5. The positioning method according to solution B1, wherein the positioning model of the scene is a three-dimensional positioning model obtained by three-dimensional reconstruction of the scene.
Some features preferably implemented by some embodiments are now disclosed in a solution-based format.
C1. An image-based positioning device, comprising: an input unit configured to input an image to be queried; a positioning unit configured to locate the image to be queried using an optimized positioning model of the scene to which the image belongs; and an output unit configured to output the pose of the camera that captured the image, wherein the positioning device further comprises an optimization unit configured to: receive an input positioning model of the scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the point cloud; calculate the saliency of each three-dimensional point in the point cloud, and if the saliency is greater than a predetermined threshold, output the three-dimensional point and the plurality of descriptors corresponding to the three-dimensional point into the optimized positioning model of the scene; and output the optimized positioning model of the scene.
Some features preferably implemented by some embodiments are now disclosed in a solution-based format.
D1. An image-based positioning apparatus, comprising: one or more processors; and one or more memories, wherein the memories store computer-readable code which, when executed by the one or more processors, causes the one or more processors to perform the method according to any one of the above solutions A1 to A5 and B1 to B5.
E1. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor, cause the processor to perform the method according to any one of the above solutions A1 to A5 and B1 to B5.
The above description is only of preferred embodiments of the present disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by specific combinations of the above technical features, but should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Furthermore, although operations are depicted in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment; conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above; rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims (13)

  1. A positioning model optimization method, comprising:
    inputting a positioning model of a scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud;
    calculating the saliency of each three-dimensional point in the three-dimensional point cloud, and if the saliency is greater than a predetermined threshold, outputting the three-dimensional point and the plurality of descriptors corresponding to the three-dimensional point into an optimized positioning model of the scene; and
    outputting the optimized positioning model of the scene.
  2. The positioning model optimization method according to claim 1, wherein calculating the saliency of each three-dimensional point in the three-dimensional point cloud comprises:
    determining the trajectory formed by the two-dimensional feature points onto which the three-dimensional point projects in different images of the scene; and
    calculating the length of the trajectory as the saliency of the three-dimensional point.
  3. The positioning model optimization method according to claim 2, wherein each element of the matrix representing the trajectory is a vector composed of the position coordinates and the descriptor of each two-dimensional feature point corresponding to the three-dimensional point.
  4. The positioning model optimization method according to claim 2, wherein the length of the trajectory is the number of rows or columns of the matrix representing the trajectory.
  5. The positioning model optimization method according to claim 1, wherein the positioning model of the scene is a three-dimensional positioning model obtained by three-dimensional reconstruction of the scene.
  6. An image-based positioning method, comprising:
    inputting an image to be queried;
    locating the image to be queried using an optimized positioning model of the scene to which the image belongs; and
    outputting the pose of the camera that captured the image to be queried,
    wherein the optimized positioning model of the scene is obtained as follows:
    inputting the positioning model of the scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud;
    calculating the saliency of each three-dimensional point in the three-dimensional point cloud, and if the saliency is greater than a predetermined threshold, outputting the three-dimensional point and the plurality of descriptors corresponding to the three-dimensional point into the optimized positioning model of the scene; and
    outputting the optimized positioning model of the scene.
  7. The positioning method according to claim 6, wherein calculating the saliency of each three-dimensional point in the three-dimensional point cloud comprises:
    determining the trajectory formed by the two-dimensional feature points onto which the three-dimensional point projects in different images of the scene; and
    calculating the length of the trajectory as the saliency of the three-dimensional point.
  8. The positioning method according to claim 7, wherein each element of the matrix representing the trajectory is a vector composed of the position coordinates and the descriptor of each two-dimensional feature point corresponding to the three-dimensional point.
  9. The positioning method according to claim 7, wherein the length of the trajectory is the number of rows or columns of the matrix representing the trajectory.
  10. The positioning method according to claim 6, wherein the positioning model of the scene is a three-dimensional positioning model obtained by three-dimensional reconstruction of the scene.
  11. An image-based positioning device, comprising:
    an input unit configured to input an image to be queried;
    a positioning unit configured to locate the image to be queried using an optimized positioning model of the scene to which the image belongs; and
    an output unit configured to output the pose of the camera that captured the image to be queried,
    wherein the positioning device further comprises an optimization unit configured to:
    receive an input positioning model of the scene, the positioning model including a three-dimensional point cloud and a plurality of descriptors corresponding to each three-dimensional point in the three-dimensional point cloud;
    calculate the saliency of each three-dimensional point in the three-dimensional point cloud, and if the saliency is greater than a predetermined threshold, output the three-dimensional point and the plurality of descriptors corresponding to the three-dimensional point into the optimized positioning model of the scene; and
    output the optimized positioning model of the scene.
  12. An image-based positioning apparatus, comprising:
    one or more processors; and
    one or more memories, wherein the memories store computer-readable code which, when executed by the one or more processors, causes the one or more processors to perform the method according to any one of claims 1-10.
  13. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor, cause the processor to perform the method according to any one of claims 1-10.
PCT/CN2021/107976 2020-08-03 2021-07-22 Positioning model optimization method, positioning method and positioning device WO2022028254A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/040,463 US20230290094A1 (en) 2020-08-03 2021-07-22 Positioning model optimization method, positioning method, and positioning device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010767049.5A Positioning model optimization method, positioning method and positioning device
CN202010767049.5 2020-08-03

Publications (1)

Publication Number Publication Date
WO2022028254A1 true WO2022028254A1 (zh) 2022-02-10

Family

ID=72952766

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/107976 WO2022028254A1 (zh) Positioning model optimization method, positioning method and positioning device

Country Status (3)

Country Link
US (1) US20230290094A1 (zh)
CN (1) CN111862352A (zh)
WO (1) WO2022028254A1 (zh)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111862352A (zh) 2020-08-03 2020-10-30 字节跳动有限公司 Positioning model optimization method, positioning method and positioning device
CN112750164B (zh) * 2021-01-21 2023-04-18 脸萌有限公司 Construction method of a lightweight positioning model, positioning method, and electronic device
CN114998600B (zh) * 2022-06-17 2023-07-25 北京百度网讯科技有限公司 Image processing method, model training method, apparatus, device and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093499A (zh) * 2012-12-26 2013-05-08 深圳先进技术研究院 Urban three-dimensional model data organization method suitable for network transmission
CN104715504A (zh) * 2015-02-12 2015-06-17 四川大学 Robust dense three-dimensional reconstruction method for large scenes
CN105184789A (zh) * 2015-08-31 2015-12-23 中国科学院自动化研究所 Camera positioning system and method based on point cloud reduction
US20190139319A1 (en) * 2017-11-06 2019-05-09 Adobe Systems Incorporated Automatic 3d camera alignment and object arrangment to match a 2d background image
CN110163903A (zh) * 2019-05-27 2019-08-23 百度在线网络技术(北京)有限公司 Three-dimensional image acquisition and image positioning method, apparatus, device and storage medium
CN111862352A (zh) * 2020-08-03 2020-10-30 字节跳动有限公司 Positioning model optimization method, positioning method and positioning device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101826206B (zh) * 2010-03-31 2011-12-28 北京交通大学 Camera self-calibration method
CN108090960B (zh) * 2017-12-25 2019-03-05 北京航空航天大学 Object reconstruction method based on geometric constraints
CN109658497B (zh) * 2018-11-08 2023-04-14 北方工业大学 Three-dimensional model reconstruction method and apparatus
CN110070608B (zh) * 2019-04-11 2023-03-31 浙江工业大学 Method for automatically removing redundant points in image-based three-dimensional reconstruction


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIANHANG ZHENG; CHANGYOU CHEN; JUNSONG YUAN; BO LI; KUI REN: "PointCloud Saliency Maps", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 28 November 2018 (2018-11-28), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081124199 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115661493A (zh) * 2022-12-28 2023-01-31 航天云机(北京)科技有限公司 Object pose determination method and apparatus, device and storage medium

Also Published As

Publication number Publication date
CN111862352A (zh) 2020-10-30
US20230290094A1 (en) 2023-09-14

Similar Documents

Publication Publication Date Title
WO2022028254A1 (zh) Positioning model optimization method, positioning method and positioning device
CN110321958B (zh) Neural network model training method and video similarity determination method
WO2022083383A1 (zh) Image processing method and apparatus, electronic device, and computer-readable storage medium
CN110413812B (zh) Neural network model training method and apparatus, electronic device, and storage medium
CN110222775B (zh) Image processing method and apparatus, electronic device, and computer-readable storage medium
WO2022033111A1 (zh) Image information extraction method, training method and apparatus, medium, and electronic device
WO2022028253A1 (zh) Positioning model optimization method, positioning method, positioning device, and storage medium
CN112258512A (zh) Point cloud segmentation method, apparatus, device, and storage medium
WO2020211573A1 (zh) Method and apparatus for processing images
WO2022105622A1 (zh) Image segmentation method and apparatus, readable medium, and electronic device
CN114399588B (zh) Three-dimensional lane line generation method and apparatus, electronic device, and computer-readable medium
WO2024051536A1 (zh) Live-streaming special effect rendering method, apparatus, device, readable storage medium, and product
WO2023029893A1 (zh) Texture mapping method, apparatus, device, and storage medium
CN110188782B (zh) Image similarity determination method and apparatus, electronic device, and readable storage medium
WO2022171036A1 (zh) Video target tracking method and apparatus, storage medium, and electronic device
WO2023138468A1 (zh) Virtual object generation method, apparatus, device, and storage medium
WO2023109564A1 (zh) Video image processing method and apparatus, electronic device, and storage medium
WO2023138441A1 (zh) Video generation method, apparatus, device, and storage medium
WO2023138467A1 (zh) Virtual object generation method, apparatus, device, and storage medium
WO2022194145A1 (zh) Shooting position determination method, apparatus, device, and medium
CN116596748A (zh) Image stylization processing method, apparatus, device, storage medium, and program product
CN113963000B (zh) Image segmentation method, apparatus, electronic device, and program product
WO2022052889A1 (zh) Image recognition method and apparatus, electronic device, and computer-readable medium
CN113778078A (zh) Positioning information generation method, apparatus, electronic device, and computer-readable medium
WO2023284479A1 (zh) Plane estimation method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21854246

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09.06.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21854246

Country of ref document: EP

Kind code of ref document: A1