WO2021027692A1 - Method for constructing visual feature library, visual positioning method, device, and storage medium - Google Patents

Method for constructing visual feature library, visual positioning method, device, and storage medium

Info

Publication number
WO2021027692A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
library
feature
feature points
processed
Prior art date
Application number
PCT/CN2020/107597
Other languages
English (en)
French (fr)
Inventor
杜斯亮
康泽慧
方伟
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021027692A1 publication Critical patent/WO2021027692A1/zh
Priority to US17/665,793 priority Critical patent/US20220156968A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751: Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/10: Terrestrial scenes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/64: Three-dimensional objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10004: Still image; Photographic image
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10028: Range image; Depth image; 3D point clouds

Definitions

  • This application relates to the field of visual positioning, and more specifically, to a method for constructing a visual feature library, a visual positioning method, a device, and a storage medium.
  • Visual positioning is widely used in fields such as autonomous driving and augmented reality. It generally uses a pre-established visual feature library to calculate the pose information of a camera from a single image taken by that camera.
  • The traditional scheme generally performs feature extraction and feature matching on the collected images, obtains the descriptors and three-dimensional (3D) positions of the matched feature points, and saves those descriptors and 3D positions in the visual feature library.
  • Because the traditional solution must perform feature matching first, only the information of successfully matched feature points in the collected images can be saved in the visual feature library. As a result, for a given number of library-building images, the traditional solution extracts relatively little feature point information from them, so the resulting visual feature library contains relatively little information about the feature points of the library-building images.
  • This application provides a method for constructing a visual feature library, a visual positioning method, a device, and a storage medium.
  • In this application, the ray corresponding to each feature point of the library-building image is intersected with the 3D model, so that, for a given number of library-building images, a larger number of feature points can be extracted from them. The constructed visual feature library therefore contains more feature point information of the library-building images, which facilitates better visual positioning based on the visual feature library later on.
  • A method for constructing a visual feature library is provided, which includes: obtaining a library-building image; performing feature extraction on the library-building image to obtain feature points of the library-building image and descriptors of those feature points; intersecting the ray corresponding to each feature point of the library-building image with the 3D model to determine the 3D position of that feature point; and constructing a visual feature library that includes the descriptors of the feature points of the library-building image and the 3D positions of those feature points.
  • The 3D position of a feature point of the library-building image is the 3D position of the intersection of the corresponding ray with the 3D model, where the ray corresponding to a feature point starts at the projection center of the library-building image and passes through that feature point.
  • the above-mentioned library-building image and the 3D model are located in the same coordinate system, and the projection center of the library-building image is the position where the first photographing unit shoots the library-building image.
  • the above-mentioned library image and 3D model may be located in the same world coordinate system.
  • the above-mentioned first photographing unit is a photographing unit that photographs the library-building image, and the first photographing unit may specifically be a camera.
  • the above-mentioned library image is one image or multiple images.
  • the above-mentioned library image is captured by a camera or other image capturing equipment, and the library image is used to build a visual feature library.
  • the above-mentioned library image may be a panoramic image, a wide-angle image, etc.
  • the above-mentioned acquiring the image for building the database includes: acquiring the image for building the database from a camera or an image capturing device.
  • a communication connection (either wired communication or wireless communication) may be established with the camera or image capturing device to obtain the library building image.
  • The feature points of the above-mentioned library-building image include multiple feature points.
  • the 3D position of the feature point of the library image is obtained by intersecting the ray with the 3D model.
  • Because the 3D positions are obtained by ray intersection, for a given number of library-building images a greater amount of feature point information can be obtained from them, so that the constructed visual feature library contains a greater amount of feature point information.
  • Because the visual feature library constructed in this application contains a greater amount of feature point information for a fixed number of library-building images, a better visual positioning effect can be achieved when the visual feature library is subsequently used for visual positioning.
  • Because the visual feature library constructed by the method of the present application contains a greater amount of feature point information for a fixed number of library-building images, the method can be applied to scenes with large radiation differences and weak textures, where accurate visual positioning is otherwise difficult; in these scenes, visual positioning based on the visual feature library obtained by the method of the embodiments of the present application can achieve a better positioning effect.
  • Performing feature extraction on the library-building image to obtain the feature points of the library-building image and the descriptors of those feature points includes: using a feature extraction algorithm to perform feature extraction on the library-building image to obtain the feature points of the library-building image and the descriptors of those feature points.
  • The aforementioned feature extraction algorithm is an algorithm for extracting the feature points of the library-building image and the descriptors of those feature points.
  • one or more of the following feature extraction algorithms can be used.
  • ORB (oriented FAST and rotated BRIEF) algorithm: the ORB algorithm is a fast feature point extraction and description algorithm.
  • SIFT (scale-invariant feature transform) algorithm.
  • D2-Net algorithm: the D2-Net algorithm comes from the paper "A Trainable CNN for Joint Detection and Description of Local Features", where CNN stands for convolutional neural network.
  • the above feature extraction algorithm can be called a feature extraction operator.
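  • For illustration, the following sketch extracts ORB and SIFT feature points and descriptors from a library-building image with OpenCV; the file name and parameter values are placeholders rather than values from this application.

```python
import cv2

# A library-building image (placeholder file name); grayscale is sufficient for detection.
img = cv2.imread("library_image.jpg", cv2.IMREAD_GRAYSCALE)

# ORB: fast binary feature point extraction and description.
orb = cv2.ORB_create(nfeatures=5000)
kp_orb, desc_orb = orb.detectAndCompute(img, None)

# SIFT: scale-invariant feature transform.
sift = cv2.SIFT_create()
kp_sift, desc_sift = sift.detectAndCompute(img, None)

# The 2D coordinates of each feature point can be read directly from the keypoints.
coords_orb = [kp.pt for kp in kp_orb]
```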
  • the visual feature library further includes semantic information of the feature points of the library image and the confidence of the semantic information of the feature points of the library image.
  • The semantic information of a feature point of the library-building image is the same as the semantic information of the region in which that feature point is located, and the confidence of the semantic information of a feature point is the same as the confidence of the semantic information of the region in which that feature point is located.
  • the semantic information of each region of the library image and the confidence of the semantic information of each region are obtained by semantic segmentation of the library image.
  • the above semantic information may include pedestrians, roads, vehicles, trees, buildings, sky, glass, and so on.
  • the above-mentioned semantic information may also include furniture, electrical appliances and so on.
  • The confidence of the semantic information may also be referred to as the semantic confidence.
  • In subsequent visual positioning, the semantic information and confidence corresponding to different feature points can then be taken into account to determine the importance of different feature points, which enables more precise visual positioning and improves positioning accuracy.
  • the visual feature library further includes a descriptor of the library image, wherein the descriptor of the library image is synthesized from the descriptors of the feature points of the library image .
  • Since the library-building image can have multiple feature points, synthesizing the descriptors of the feature points of the library-building image actually means synthesizing the descriptors of multiple feature points of the library-building image.
  • the descriptors of the feature points of the above-mentioned library image can be called local descriptors, and the descriptor of the library image can be called global descriptors.
  • When the visual feature library includes the descriptor of the library-building image, visual positioning based on the visual feature library can first coarsely screen the library according to the descriptor of the image to be processed and select the N (N is a positive integer) library-building images whose descriptors are closest, which speeds up the subsequent matching.
  • Performing feature extraction on the library-building image to obtain its feature points and the descriptors of those feature points includes: performing scene simulation on the library-building image to generate scene images in multiple scenes; and performing feature extraction on the scene images in those scenes to obtain the feature points of the library-building image and the descriptors of those feature points.
  • the foregoing multiple scenes include at least two of day, night, rainy, snowy, and cloudy.
  • the lighting conditions of the foregoing multiple scenes are different.
  • the lighting conditions of each scene may be different from the lighting conditions of other scenes.
  • different light conditions may specifically refer to different light intensity.
  • Scene images in multiple scenes can also be called multiple scene images, and each scene image is obtained by performing a scene simulation on the library image.
  • The finally constructed visual feature library thus contains the information of feature points extracted from different scene images, which makes the information in the visual feature library richer and facilitates more effective visual positioning based on it.
  • Because the visual feature library contains feature points of multiple scene images, the target scene image closest to the scene in which the image to be processed was taken can be determined from those scene images, and the matching feature points of the feature points of the image to be processed can then be determined from the target scene image. This yields more accurate matching feature points for the image to be processed and improves the success rate of visual positioning.
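  • The application does not prescribe how scene simulation is performed; purely as an illustrative stand-in, the sketch below derives brighter, darker, and flatter-contrast variants of a library-building image to mimic day, night, and cloudy conditions.

```python
import cv2

def simulate_scenes(img):
    """Generate illustrative scene variants of a library-building image.
    Simple brightness/contrast changes stand in here for day, night and
    cloudy conditions; a real system could use more elaborate relighting."""
    day = cv2.convertScaleAbs(img, alpha=1.1, beta=15)     # slightly brighter
    night = cv2.convertScaleAbs(img, alpha=0.4, beta=-40)  # much darker
    cloudy = cv2.convertScaleAbs(img, alpha=0.8, beta=0)   # flatter contrast
    return {"day": day, "night": night, "cloudy": cloudy}
```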
  • Performing feature extraction on the library-building image to obtain its feature points and the descriptors of those feature points includes: performing segmentation processing on the library-building image to obtain multiple slice images; and performing feature extraction on the multiple slice images to obtain the feature points of the library-building image and the descriptors of those feature points.
  • the above-mentioned library-building image may be a panoramic image.
  • When the library-building image is a panoramic image, segmenting the panorama and extracting features from the resulting slice images makes it easier, during subsequent visual positioning, to accurately determine the matching points of the feature points of the image to be processed (the image that requires visual positioning), thereby improving the accuracy of visual positioning.
  • When the library-building image is a panoramic image, the imaging mode of the panoramic projection differs from that of an image taken by a user. By segmenting the library-building image, slice images of different perspectives can be obtained, which eliminates the difference between the imaging modes of the library-building image and the user-taken image, so that the matching feature points of a user-taken image can be determined more accurately when it is visually positioned according to the visual feature library.
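  • A simplified sketch of the segmentation step is given below: overlapping column crops of an equirectangular panorama, assuming the panorama is a NumPy image array (a full implementation would re-project each slice to a pinhole view).

```python
import numpy as np

def slice_panorama(pano, n_slices=8, overlap=0.25):
    """Split a panoramic library-building image into n_slices horizontal slices.
    Neighbouring slices share part of their content, and the last slice wraps
    around the 360-degree seam."""
    h, w = pano.shape[:2]
    step = w // n_slices
    win = int(step * (1 + overlap))                       # slice width including the overlap
    slices = []
    for i in range(n_slices):
        cols = np.arange(i * step, i * step + win) % w    # wrap around the seam
        slices.append(pano[:, cols])
    return slices
```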
  • The above method further includes: receiving an image to be processed from user equipment; performing feature extraction on the image to be processed to obtain feature points of the image to be processed and descriptors of those feature points; intersecting the ray corresponding to each feature point of the image to be processed with the 3D model to determine the 3D position of that feature point; and updating the visual feature library, where the updated visual feature library includes the feature points of the image to be processed and the 3D positions of those feature points.
  • The 3D position of a feature point of the image to be processed is the 3D position of the intersection of the corresponding ray with the 3D model, where the ray corresponding to a feature point of the image to be processed starts at the projection center of the image to be processed and passes through that feature point.
  • the image to be processed and the 3D model are located in the same coordinate system, and the projection center of the image to be processed is the position where the second photographing unit shoots the image to be processed.
  • the aforementioned image to be processed may be an image taken by the user equipment.
  • the image to be processed and the 3D model may be located in the same world coordinate system.
  • The above-mentioned second photographing unit is the photographing unit that shoots the image to be processed, and the second photographing unit may specifically be a camera.
  • Updating the library in this way makes the information contained in the visual feature library more up to date.
  • Before the visual feature library is updated, the above method further includes: determining that the semantic information of the image to be processed is different from the semantic information of a reference image, where the reference image is the image in the visual feature library whose position is closest to that of the image to be processed.
  • When the semantic information of the reference image in the visual feature library differs from that of the image to be processed, the image content of the corresponding object may have changed; updating the visual feature library in time in this case improves its real-time performance.
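  • The application does not specify how the difference in semantic information is judged; one possible check, sketched below under that assumption, compares the semantic label histograms of the image to be processed and the reference image.

```python
from collections import Counter

def semantics_changed(labels_query, labels_reference, threshold=0.2):
    """Return True if the semantic label distributions of the image to be
    processed and the reference image differ by more than `threshold`
    (as a fraction of total mass), suggesting the scene content may have changed."""
    cq, cr = Counter(labels_query), Counter(labels_reference)
    total = max(sum(cq.values()), sum(cr.values()), 1)
    diff = sum(abs(cq[k] - cr[k]) for k in set(cq) | set(cr))
    return diff / total > threshold
```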
  • The above method further includes: obtaining modeling data, the modeling data including modeling images and point cloud data; performing feature extraction on the modeling images to obtain the feature points of the modeling images; performing feature matching on the feature points of any two images among the library-building images and the modeling images, and chaining the matched feature points to obtain same-name feature point sequences;
  • performing adjustment on the library-building images and the modeling images to obtain the pose of the library-building images and the pose of the modeling images; and constructing the 3D model according to the poses of the modeling images and the point cloud data.
  • The feature points obtained by the above matching are feature points in different images that correspond to the same point in the real world.
  • Chaining the matched feature points may specifically mean connecting the feature points that correspond to the same real-world point across the library-building images and modeling images, so as to obtain a sequence connected by multiple feature points (a same-name feature point sequence).
  • The positions of the feature points in the library-building images and the modeling images can be corrected according to the same-name feature point sequences and preset control points, so that the obtained poses of the library-building images and modeling images are more accurate, which facilitates the subsequent construction of a more accurate visual feature library.
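  • A minimal sketch of the pairwise matching step follows (Lowe's ratio test on L2-comparable descriptors); chaining matched observations across image pairs then yields the same-name feature point sequences described above. The ratio value is a common default, not a value from this application.

```python
import cv2

def match_pair(desc_a, desc_b, ratio=0.8):
    """Match the descriptors of two images and keep matches passing Lowe's
    ratio test. Each surviving match links two observations of the same
    real-world point; observations linked across many image pairs form a
    same-name feature point sequence."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(desc_a, desc_b, k=2)
    return [p[0] for p in knn
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
```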
  • the above modeling image can be an image taken by a drone (in an outdoor environment, a drone can be used to obtain a modeling image), or a scanned image (in an indoor environment, a scanner can be used to scan to obtain a modeling image) .
  • the above modeled image is an image used to build a 3D model.
  • The library-building images and the modeling images are adjusted so as to align them, which makes the 3D positions of the feature points of the library-building images in the visual feature library more accurate and allows more accurate positioning to be performed subsequently based on the visual feature library.
  • the above-mentioned library image is a panoramic image.
  • When the library-building image is a panoramic image, it contains more information, and more feature points can be extracted from it in the process of building the visual feature library.
  • Performing feature extraction on the library-building image to obtain its feature points and the descriptors of those feature points includes: performing scene simulation on the library-building image to obtain scene images in multiple scenes; performing segmentation processing on the scene images in the multiple scenes to obtain multiple slice images; and performing feature extraction on the multiple slice images to obtain the feature points of the library-building image and the descriptors of those feature points.
  • the foregoing multiple scenes include at least two of day, night, rainy, snowy, and cloudy.
  • part of the image content of adjacent slice images is the same.
  • For example, scene simulation is performed on the library-building image to obtain scene images in three scenes, namely a first scene image, a second scene image, and a third scene image.
  • The first scene image, the second scene image, and the third scene image are each segmented to obtain multiple slice images. Assuming each scene image is segmented into 8 slice images, segmenting the three scene images yields 24 slice images in total; feature extraction is then performed on these 24 slice images to obtain the feature points and descriptors of the library-building image.
  • Performing feature extraction on the library-building image to obtain its feature points and the descriptors of those feature points includes: performing segmentation processing on the library-building image to obtain multiple slice images; performing scene simulation on each of the multiple slice images to obtain scene images in multiple scenes; and performing feature extraction on the scene images in the multiple scenes to obtain the feature points of the library-building image and the descriptors of those feature points.
  • Among the above-mentioned multiple slice images, part of the image content of adjacent slice images is the same, and the above-mentioned multiple scenes include at least two of day, night, rainy, snowy, and cloudy.
  • For example, the library-building image is segmented to obtain 8 slice images, and scene simulation is then performed on each of these 8 slice images.
  • If scene simulation of each slice image yields scene images in 4 scenes, performing scene simulation on the 8 slice images produces 32 scene images in total; feature extraction is then performed on these 32 scene images to obtain the feature points and descriptors of the library-building image.
  • Feature extraction using multiple feature extraction algorithms on the library-building image can also mean first performing segmentation processing and/or scene simulation on the library-building image and then performing feature extraction on the resulting images, so as to obtain the feature points of the library-building image and the descriptors of those feature points.
  • Performing feature extraction on the library-building image to obtain its feature points and the descriptors of those feature points includes: performing segmentation processing on the library-building image to obtain multiple slice images; and using multiple feature extraction algorithms to separately perform feature extraction on each of the multiple slice images to obtain the feature points of the library-building image and the descriptors of those feature points.
  • For example, the library-building image is segmented to obtain 12 slice images, and 3 feature extraction algorithms are used to extract features from each of the 12 slice images, so as to obtain the feature points of the library-building image and the descriptors of those feature points.
  • Performing feature extraction on the library-building image to obtain its feature points and the descriptors of those feature points includes: performing scene simulation on the library-building image to generate scene images in multiple scenes; and using multiple feature extraction algorithms to perform feature extraction on the scene images in the multiple scenes to obtain the feature points of the library-building image and the descriptors of those feature points.
  • the aforementioned multiple scenes include at least two of day, night, rainy, snowy, and cloudy.
  • For example, scene simulation is performed on the library-building image to obtain scene images in 4 scenes, and 3 feature extraction algorithms are used to extract features from the scene images in these 4 scenes respectively, so as to obtain the feature points of the library-building image and the descriptors of those feature points.
  • Performing feature extraction on the library-building image to obtain its feature points and the descriptors of those feature points includes: performing scene simulation on the library-building image to obtain scene images in multiple scenes; performing segmentation processing on the scene images in the multiple scenes to obtain multiple slice images; and using multiple feature extraction algorithms to separately perform feature extraction on the multiple slice images to obtain the feature points of the library-building image and the descriptors of those feature points.
  • the foregoing multiple scenes include at least two of day, night, rainy, snowy, and cloudy.
  • part of the image content of adjacent slice images is the same.
  • For example, scene simulation is performed on the library-building image to obtain scene images in three scenes, namely a first scene image, a second scene image, and a third scene image.
  • The first scene image, the second scene image, and the third scene image are each segmented, with each scene image yielding 8 slice images, so segmenting the three scene images yields 24 slice images in total.
  • Three feature extraction algorithms are then used to extract features from the 24 slice images respectively, so as to obtain the feature points and descriptors of the library-building image.
  • Performing feature extraction on the library-building image to obtain its feature points and the descriptors of those feature points includes: performing segmentation processing on the library-building image to obtain multiple slice images; performing scene simulation on each of the multiple slice images to obtain scene images in multiple scenes; and using multiple feature extraction algorithms to separately perform feature extraction on the scene images in the multiple scenes to obtain the feature points of the library-building image and the descriptors of those feature points.
  • Among the above-mentioned multiple slice images, part of the image content of adjacent slice images is the same, and the above-mentioned multiple scenes include at least two of day, night, rainy, snowy, and cloudy.
  • For example, the library-building image is segmented to obtain 8 slice images, scene simulation is performed on each of the 8 slice images to obtain scene images in 4 scenes (32 scene images in total), and multiple feature extraction algorithms are then used to extract features from these 32 scene images respectively, so as to obtain the feature points and descriptors of the library-building image.
  • A visual positioning method is provided, which includes: acquiring an image to be processed; performing feature extraction on the image to be processed to obtain feature points of the image to be processed and descriptors of those feature points; determining, from the visual feature library and according to the descriptors of the feature points of the image to be processed, the matching feature points of the feature points of the image to be processed; and determining, according to the 3D positions of the matching feature points, the pose information of the photographing unit when it shot the image to be processed.
  • the aforementioned visual feature library includes the descriptor of the feature point of the library image and the 3D position of the feature point of the library image, and the visual feature library satisfies at least one of the following conditions:
  • the feature points of the database image include multiple sets of feature points, and the description methods of any two sets of feature points in the multiple sets of feature points are different;
  • the visual feature library includes the descriptor of the library image, and the descriptor of the library image is synthesized from the descriptors of the feature points of the library image;
  • the feature points of the library-building image are feature points of scene images in multiple scenes, where the scene images are obtained by performing scene simulation on the library-building image and the multiple scenes include at least two of day, night, rain, snow, and cloudy;
  • the feature points of the library-building image and the descriptors of those feature points are obtained by performing feature extraction on multiple slice images, where the multiple slice images are obtained by segmenting the library-building image and part of the image content of adjacent slice images is the same;
  • the visual feature library includes the semantic information of the feature points of the library image and the confidence of the semantic information of the feature points of the library image.
  • the visual feature database in this application contains more information than the visual feature database of the traditional solution. Therefore, the visual feature database in this application can better perform visual positioning and improve the effect of visual positioning.
  • Because the visual feature library in the present application includes more information, when visual positioning is performed on the image to be processed according to the visual feature library, the matching feature points of the feature points of the image to be processed can be determined more accurately, so that the image to be processed can be positioned more precisely.
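  • A condensed sketch of this positioning flow with OpenCV follows (feature extraction, descriptor matching against the library, then RANSAC PnP); the SIFT and ratio-test choices and the variable names are illustrative assumptions rather than details of this application.

```python
import cv2
import numpy as np

def localize(query_img, lib_pts3d, lib_desc, K):
    """Estimate the pose of the photographing unit for an image to be processed,
    given the 3D positions (lib_pts3d, Nx3) and descriptors (lib_desc, NxD)
    stored in the visual feature library and the camera intrinsic matrix K."""
    sift = cv2.SIFT_create()
    kp, desc = sift.detectAndCompute(query_img, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(desc, lib_desc, k=2)
    good = [p[0] for p in knn if len(p) == 2 and p[0].distance < 0.8 * p[1].distance]

    pts2d = np.float32([kp[m.queryIdx].pt for m in good])
    pts3d = np.float32([lib_pts3d[m.trainIdx] for m in good])

    # RANSAC PnP: recover the rotation (Rodrigues vector) and translation of the camera.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    return ok, rvec, tvec
```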
  • the visual feature library in the second aspect may be constructed according to the method for constructing the visual feature library in the first aspect.
  • the multiple sets of feature points and the descriptors of the multiple sets of feature points may be obtained by feature extraction of the library image according to multiple feature extraction algorithms.
  • the multiple feature extraction algorithms can be any two algorithms among ORB algorithm, SIFT algorithm, SuperPoint algorithm, D2-net, and line feature.
  • the visual feature library contains more related information of the feature points, which is convenient for subsequent better visual positioning based on the visual feature library.
  • The descriptor of the above-mentioned library-building image may be a descriptor that describes the overall characteristics of the library-building image; it may be obtained by synthesizing the descriptors of the feature points of the library-building image during the process of building the visual feature library, where the feature points of the library-building image here may refer to all the feature points extracted from the library-building image.
  • The feature points of the library-building image include multiple groups of feature points, and determining the matching feature points of the feature points of the image to be processed from the visual feature library according to the descriptors of the feature points of the image to be processed includes: determining a target group of feature points from the multiple groups of feature points according to the description manner of the feature points of the image to be processed; and determining the matching feature points of the feature points of the image to be processed from the target group of feature points.
  • The above-mentioned target group of feature points is the group of feature points, among the multiple groups, whose descriptors use the same description manner as the descriptors of the feature points of the image to be processed.
  • Because the feature points contained in the visual feature library carry more information, selecting from the multiple groups of feature points the target group whose description manner matches that of the feature points of the image to be processed allows feature points that better match the feature points of the image to be processed to be selected subsequently from the target group, thereby improving the effect of visual positioning.
  • The above-mentioned visual feature library includes descriptors of the library-building images, and determining the matching feature points of the feature points of the image to be processed from the visual feature library according to the descriptors of the feature points of the image to be processed includes: determining N images from the library-building images according to the descriptor of the image to be processed; and determining the matching feature points of the feature points of the image to be processed from the feature points of those N images.
  • the descriptor of the image to be processed is synthesized from the descriptors of the feature points of the image to be processed.
  • The library-building images consist of the N (N is a positive integer) images and M (M is a positive integer) remaining images, where the distance between the descriptor of the image to be processed and the descriptor of any one of the N images is less than or equal to the distance between the descriptor of the image to be processed and the descriptor of any one of the remaining M images.
  • When the visual feature library includes the descriptors of the library-building images, the library can first be coarsely screened according to the descriptor of the image to be processed to select the N images whose descriptors are relatively close, and the matching feature points of the feature points of the image to be processed are then determined from the feature points of those N images, which accelerates the visual positioning process and improves its efficiency.
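  • The coarse screening step can be as simple as a nearest-neighbour search over global descriptors, for example (a minimal sketch, assuming the descriptors are L2-comparable vectors stored as rows of a matrix):

```python
import numpy as np

def top_n_images(query_desc, library_descs, n):
    """Return the indices of the N library-building images whose global
    descriptors are closest (L2 distance) to that of the image to be processed."""
    d = np.linalg.norm(library_descs - query_desc, axis=1)
    return np.argsort(d)[:n]
```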
  • The feature points of the above-mentioned library-building image are feature points of scene images in multiple scenes, and determining the matching feature points of the feature points of the image to be processed from the visual feature library according to the descriptors of the feature points of the image to be processed includes: determining a target scene image from the scene images in the multiple scenes; and determining the matching feature points of the feature points of the image to be processed from the feature points of the target scene image according to the descriptors of the feature points of the image to be processed.
  • The target scene image is the scene image, among the scene images in the multiple scenes, whose corresponding scene is closest to the scene in which the image to be processed was taken.
  • Because the target scene image closest to the scene in which the image to be processed was taken can be determined from the multiple scene images, and the matching feature points of the feature points of the image to be processed are then determined from that target scene image, more accurate matching feature points can be found for the image to be processed, which improves the success rate of visual positioning.
  • the above-mentioned visual feature library includes the semantic information of the feature points of the library image and the confidence of the semantic information of the feature points of the library image.
  • Determining, according to the 3D positions of the matching feature points, the pose information of the photographing unit when it shot the image to be processed includes: weighting the 3D positions of the matching feature points according to the confidence of the semantic information of the matching feature points; and determining the pose information of the photographing unit when it shot the image to be processed according to the weighted result.
  • the matching feature point with a higher degree of confidence corresponds to a greater weight.
  • Because the visual feature library contains the semantic information of the feature points of the library-building image and the confidence of that semantic information, the semantic information and confidence corresponding to different feature points can be taken into account during visual positioning to determine the importance of different feature points, which enables more precise visual positioning and improves positioning accuracy.
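  • One possible way to realize such weighting, sketched below under the assumption that an initial pose (for example from RANSAC PnP) is available, is to minimize a reprojection error in which each matching feature point's residual is scaled by the confidence of its semantic information.

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def refine_pose_weighted(pts3d, pts2d, conf, K, rvec0, tvec0):
    """Refine an initial pose (rvec0, tvec0) by minimizing the reprojection
    error of the matching feature points (pts3d: Nx3, pts2d: Nx2, float arrays),
    with residuals weighted by the confidence of each point's semantic information."""
    w = np.sqrt(np.asarray(conf, dtype=np.float64))  # higher confidence -> larger weight

    def residuals(x):
        rvec, tvec = x[:3].reshape(3, 1), x[3:].reshape(3, 1)
        proj, _ = cv2.projectPoints(pts3d, rvec, tvec, K, None)
        return ((proj.reshape(-1, 2) - pts2d) * w[:, None]).ravel()

    x0 = np.hstack([np.ravel(rvec0), np.ravel(tvec0)])
    res = least_squares(residuals, x0)
    return res.x[:3], res.x[3:]  # refined rotation (Rodrigues vector) and translation
```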
  • the above-mentioned library image is a panoramic image.
  • the visual feature library in the second aspect is constructed according to the method for constructing the visual feature library in the first aspect.
  • The visual feature library constructed by the method of the first aspect contains more information about the feature points of the library-building images; therefore, for a given number of library-building images, performing visual positioning with the visual feature library constructed by the method of the first aspect can improve the effect of visual positioning.
  • A device for constructing a visual feature library is provided, which includes modules for executing the method in the foregoing first aspect or any one of its implementations.
  • A visual positioning device is provided, which includes modules for executing the method in the foregoing second aspect or any one of its implementations.
  • A device for constructing a visual feature library is provided, which includes a memory and a processor; the memory is used to store a program, and when the program is executed, the processor is used to execute the method in the foregoing first aspect or any one of its implementations.
  • The processor may obtain the library-building image through (by calling) a communication interface (in this case the library-building image is obtained from another device through the communication interface) or from the memory (in this case the library-building image is stored in the memory), then perform a series of processing on the library-building image, and finally construct the visual feature library.
  • A visual positioning device is provided, which includes a memory and a processor; the memory is used to store a program, and when the program is executed, the processor is used to execute the method in the foregoing second aspect or any one of its implementations.
  • The processor may obtain the image to be processed through (by calling) a camera or obtain the image to be processed from the memory, then perform a series of processing on the image to be processed, and finally achieve visual positioning.
  • the device for constructing the visual feature database of the third aspect or the fifth aspect may be a server, a cloud device, or a computer device with certain computing capabilities.
  • the visual positioning device of the fourth aspect or the sixth aspect may specifically be a mobile phone, a computer, a personal digital assistant, a wearable device, a vehicle-mounted device, an Internet of Things device, a virtual reality device, an augmented reality device, and so on.
  • A computer-readable storage medium is provided. The computer-readable storage medium is used to store program code, and when the program code is executed by a computer, the computer executes the method in the foregoing first aspect or any one of its implementations.
  • A computer-readable storage medium is provided. The computer-readable storage medium is used to store program code, and when the program code is executed by a computer, the computer executes the method in the foregoing second aspect or any one of its implementations.
  • In a ninth aspect, a chip is provided, which includes a processor configured to execute the method in the foregoing first aspect or any one of its implementations.
  • the chip of the above-mentioned ninth aspect may be located in a server, or in a cloud device, or in a computer device with a certain computing capability capable of constructing a visual feature library.
  • In a tenth aspect, a chip is provided, which includes a processor configured to execute the method in the foregoing second aspect or any one of its implementations.
  • the chip of the tenth aspect described above may be located in a terminal device, which may be a mobile phone, a computer, a personal digital assistant, a wearable device, a vehicle-mounted device, an Internet of Things device, a virtual reality device, an augmented reality device, and so on.
  • a terminal device which may be a mobile phone, a computer, a personal digital assistant, a wearable device, a vehicle-mounted device, an Internet of Things device, a virtual reality device, an augmented reality device, and so on.
  • A computer program is provided for causing a computer or a terminal device to execute the method in the foregoing first aspect or any one of its implementations.
  • In a twelfth aspect, a computer program (or a computer program product) is provided for causing a computer or a terminal device to execute the method in the foregoing second aspect or any one of its implementations.
  • FIG. 1 is a schematic flowchart of a method for constructing a visual feature library according to an embodiment of the present application
  • Figure 2 is a schematic diagram of feature extraction of library-built images
  • FIG. 3 is a schematic diagram of the process of determining the 3D position of the feature point of the library image
  • Figure 4 is a schematic diagram of performing semantic segmentation on a library image and obtaining semantic information and confidence of the library image
  • FIG. 5 is a schematic diagram of the process of obtaining the descriptor of the library image
  • FIG. 6 is a schematic diagram of the process of obtaining the descriptor of the library image
  • FIG. 7 is a schematic diagram of various scene images obtained by performing scene simulation on the library image
  • FIG. 8 is a schematic diagram of segmenting a library image to obtain a slice image
  • FIG. 9 is a schematic diagram of scene simulation and segmentation processing on the library image
  • FIG. 10 is a schematic diagram of segmentation processing and scene simulation of the library image
  • FIG. 11 is a schematic flowchart of a visual positioning method according to an embodiment of the present application.
  • FIG. 12 is a schematic diagram of the method for constructing a visual feature library according to an embodiment of the application applied to a specific product form
  • FIG. 13 is a schematic block diagram of a device for constructing a visual feature library according to an embodiment of the present application.
  • FIG. 14 is a schematic block diagram of a visual positioning device according to an embodiment of the present application.
  • FIG. 15 is a schematic diagram of the hardware structure of the apparatus for constructing a visual feature library according to an embodiment of the present application.
  • FIG. 16 is a schematic diagram of the hardware structure of the visual positioning device provided by an embodiment of the present application.
  • FIG. 17 is a schematic diagram of the hardware structure of a terminal device according to an embodiment of the present application.
  • Visual positioning uses images or videos taken by a terminal device, together with a pre-established 3D map and a series of algorithms such as feature extraction, feature matching, and perspective-n-point (PnP) projection, to estimate the position and posture of the terminal device's shooting unit.
  • Visual positioning can be applied in the fields of augmented reality, unmanned driving and intelligent mobile robots.
  • visual positioning can be specifically used for 3D navigation, 3D advertising, and virtual doll interaction.
  • virtual 3D navigation icons, etc. can be accurately placed in appropriate positions in the real scene to achieve precise positioning.
  • the accurate position of the vehicle can be obtained through visual positioning.
  • the position and posture of the intelligent mobile robot can be obtained in real time through visual positioning, and then the actions of the intelligent mobile robot can be controlled.
  • Fig. 1 is a schematic flowchart of a method for constructing a visual feature library according to an embodiment of the present application.
  • the method shown in Fig. 1 can be implemented with a visual feature library construction device.
  • the device for constructing the visual feature library may specifically be a server, a cloud device, or a computer device with a certain computing capability (the computing capability can satisfy the construction of the visual feature library).
  • the method shown in FIG. 1 includes steps 1001 to 1004, which are respectively described in detail below.
  • the aforementioned database-building image may be an image used to build a visual feature database, and the database-building image may be one image or multiple images.
  • When there are multiple library-building images, the processing of the library-building image in the embodiments of the present application can be understood as processing any one of the library-building images.
  • the above-mentioned library-building image may be obtained by shooting with a camera, and the library-building image may be a panoramic image or a non-panoramic image (for example, a wide-angle image).
  • the above-mentioned library building image may also be called a library building image.
  • the visual feature library construction device can obtain the library image from the camera by communicating with the camera.
  • the device for constructing the visual feature library can directly obtain the library image from the memory.
  • There may be multiple feature points of the library-building image, and the feature points of the library-building image are obtained by performing feature extraction on the library-building image; for consistency, they are uniformly referred to as the feature points of the library-building image.
  • One or more feature extraction algorithms can be used to perform feature extraction on the library-building image to obtain the feature points of the library-building image and the descriptors of those feature points.
  • the above-mentioned feature extraction algorithm is an algorithm for extracting feature points in an image and a descriptor of the feature points of the image.
  • The available feature extraction algorithms include the ORB algorithm, the SIFT algorithm, the SuperPoint algorithm, the D2-Net algorithm, and the line feature algorithm; a feature extraction algorithm can also be called a feature extraction operator.
  • One or more of the ORB algorithm, the SIFT algorithm, the SuperPoint algorithm, the D2-Net algorithm, and the line feature algorithm may be used to perform feature extraction on the library-building image.
  • When the visual feature library contains multiple types of feature points and feature point descriptors, the image to be processed can be compared against the multiple types of feature points in the visual feature library during visual positioning, so that the matching feature points of the feature points of the image to be processed can be determined more accurately, which improves the effect of visual positioning.
  • three feature extraction algorithms can be used to perform feature extraction on the library image to obtain three types of feature points and descriptors of the three types of feature points.
  • For example, the three feature extraction algorithms may be the ORB algorithm, the SIFT algorithm, and the SuperPoint algorithm.
  • The first type of feature points, the second type of feature points, and the third type of feature points can then be the feature points obtained by performing feature extraction on the library-building image with the ORB algorithm, the SIFT algorithm, and the SuperPoint algorithm respectively.
  • The descriptors of the first type of feature points, the descriptors of the second type of feature points, and the descriptors of the third type of feature points are likewise obtained with the corresponding feature extraction algorithm, and the 2D coordinates of each type of feature point can be read directly from the library-building image.
  • The 3D position of a feature point of the library-building image is the 3D position of the intersection of the ray corresponding to that feature point with the 3D model, where the ray corresponding to a feature point of the library-building image starts at the projection center of the library-building image and passes through that feature point.
  • The library-building image and the 3D model are located in the same coordinate system, and the projection center of the library-building image is the position at which the first photographing unit (the photographing unit that shoots the library-building image) shoots the library-building image.
  • As shown in FIG. 3, let the coordinates of a feature point P of the library-building image in the image coordinate system of the library-building image be $[x_p \;\; y_p]^T$. Through a coordinate transformation, P is transformed into the camera coordinate system; its coordinates in the camera coordinate system are given by formula (1):

$$P_c = \begin{bmatrix} x_p - x_o \\ y_p - y_o \\ f \end{bmatrix} \qquad (1)$$

  • Here $[x_o \;\; y_o \;\; f]$ are the camera intrinsic parameters: $f$ is the focal length of the camera and $(x_o, y_o)$ is the position of the principal point of the camera.
  • The feature point is then converted from the camera coordinate system to the world coordinate system as $P_w = R\,P_c + X_S$, where the parameters of the rotation matrix $R$ are determined according to the positional relationship between the camera coordinate system and the world coordinate system, and $X_S$ denotes the coordinates of the projection center of the camera in the world coordinate system.
  • Converting the feature point P into the world coordinate system is equivalent to converting the library-building image into the world coordinate system, and the 3D model shown in FIG. 3 (the hexahedron in FIG. 3 represents the 3D model) is itself located in the world coordinate system; therefore, the library-building image and the 3D model are both located in the world coordinate system.
  • A ray is then constructed that starts at the origin of the camera coordinate system (the projection center) and passes through the feature point P.
  • The 3D position of the intersection of this ray with the 3D model is the 3D position of the feature point P: if the coordinates of the intersection are $[X \;\; Y \;\; Z]^T$, then the 3D position of the feature point P obtained by ray intersection is $[X \;\; Y \;\; Z]^T$.
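  • A sketch of the ray intersection step using the trimesh library is given below; the mesh stands in for the 3D model, and K (camera intrinsic matrix), R (camera-to-world rotation), and C (projection center in world coordinates) are assumed to be known.

```python
import numpy as np
import trimesh

def feature_point_3d(mesh, K, R, C, pixel):
    """Cast a ray from the projection center C through the feature point's pixel
    coordinates and return the 3D position of the nearest intersection with the
    3D model, or None if the ray misses the model."""
    # Pixel -> direction in the camera coordinate system, then rotate into the world frame.
    d_cam = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    d_world = R @ d_cam
    d_world /= np.linalg.norm(d_world)

    locations, _, _ = mesh.ray.intersects_location(
        ray_origins=[C], ray_directions=[d_world])
    if len(locations) == 0:
        return None
    # Keep the hit closest to the projection center.
    return locations[np.argmin(np.linalg.norm(locations - C, axis=1))]
```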
  • Construct a visual feature library which includes the descriptors of the feature points of the library image and the 3D positions of the feature points of the library image.
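  • For concreteness, the record kept per feature point (and per library-building image) could look like the sketch below; the field names are illustrative, and the semantic fields and global descriptor correspond to the optional information discussed later.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FeaturePointRecord:
    descriptor: List[float]                    # local descriptor of the feature point
    position_3d: tuple                         # (X, Y, Z) from the ray / 3D-model intersection
    semantic_label: Optional[str] = None       # e.g. "road", "building" (optional)
    semantic_confidence: Optional[float] = None

@dataclass
class LibraryImageRecord:
    feature_points: List[FeaturePointRecord] = field(default_factory=list)
    global_descriptor: Optional[List[float]] = None  # synthesized from the local descriptors
```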
  • the 3D position of the feature point of the library image is obtained by intersecting the ray with the 3D model.
  • Because the 3D positions of the feature points of the library-building image are obtained by ray intersection, for a given number of library-building images a greater amount of feature point information can be obtained from them, so that the constructed visual feature library contains a greater amount of feature point information.
  • Because the visual feature library constructed in this application contains a greater amount of feature point information, a better visual positioning effect can be achieved when the visual feature library is subsequently used for visual positioning.
  • Because the visual feature library constructed by the method of the present application contains a greater amount of feature point information for a fixed number of library-building images, the method can be applied to scenes with large radiation differences and weak textures, where accurate visual positioning is otherwise difficult; in these scenes, visual positioning based on the visual feature library obtained by the method of the embodiments of the present application can achieve a better positioning effect.
  • In addition to the above information, the visual feature library may also include the following kinds of information.
  • In the process of constructing the visual feature library, one or more of these kinds of information can be written into (saved to) the visual feature library.
  • step A and step B can be used to determine the semantic information and confidence of the feature points of the library image.
  • Step A Perform semantic segmentation on the library image to obtain the semantic segmentation result of the library image
  • Step B Generate the semantic information of the feature points of the library image and the confidence of the semantic information of the feature points of the library image according to the semantic segmentation result of the library image.
  • the semantic segmentation result of the library image obtained in step A includes the semantic information of each region of the library image and the confidence level of the semantic information of each region.
  • the semantic information of the region in which a feature point of the library image is located can be determined as the semantic information of that feature point, and the confidence of the semantic information of that region can be determined as the confidence of the semantic information of the feature point.
  • the library image can be divided into six regions corresponding to the semantics of the images of the six regions: pedestrians, roads, trees, buildings, sky, and glass.
  • specifically, the image region in which a feature point of the library image is located can be determined according to the 2D coordinates of that feature point, and the semantic information of that region is taken as the semantic information of the feature point; in this way, the feature point information of the library image includes not only the 2D coordinates and the descriptor of the feature point but also the semantic information of the feature point and the confidence of that semantic information.
  • for example, if the image region in which a feature point is located is determined from the feature point's 2D coordinates to have the semantics "road", then it can be determined that the semantics of the feature point is also "road".
  • the process shown in Figure 4 above performs semantic segmentation directly on the library image to finally obtain the semantic category of each feature point of the library image (a specific form of semantic information) and the confidence of that semantic category.
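One way the per-point lookup described above could look in code, assuming a segmentation network has already produced a per-pixel label map and confidence map (the class list and array layout below are illustrative assumptions, not specified by the text):

```python
import numpy as np

# illustrative class list; the document's example regions are used here
CLASS_NAMES = ["pedestrian", "road", "tree", "building", "sky", "glass"]

def label_feature_points(keypoints_xy, label_map, confidence_map):
    """keypoints_xy  : (N, 2) array of 2D feature point coordinates (x, y)
       label_map     : (H, W) array of per-pixel class indices
       confidence_map: (H, W) array of per-pixel confidence in [0, 1]
    """
    semantics = []
    for x, y in np.round(keypoints_xy).astype(int):
        cls = int(label_map[y, x])            # region label at the keypoint
        semantics.append({
            "semantic": CLASS_NAMES[cls],
            "confidence": float(confidence_map[y, x]),
        })
    return semantics
```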
  • the descriptor of the library image can be obtained by synthesizing the descriptors of the feature points of the library image.
  • by writing the synthesized descriptor of the library image into the visual feature library, the information contained in the visual feature library becomes richer.
  • a feature extraction algorithm (which can be one of the ORB algorithm, SIFT algorithm, SuperPoint algorithm, D2-Net algorithm, and a line feature algorithm) is used to extract features from the library image to obtain the descriptors of the feature points.
  • the descriptors of the feature points are then synthesized to obtain the descriptor of the library image, and the descriptors of the feature points and the descriptor of the library image are written into the visual feature library.
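The text does not fix how the feature point descriptors are "synthesized" into an image-level descriptor; the sketch below uses L2-normalized mean pooling purely as one illustrative choice (VLAD or bag-of-words aggregation would also fit the description):

```python
import numpy as np

def synthesize_image_descriptor(local_descriptors):
    """Aggregate the descriptors of an image's feature points into a single
    global descriptor for that image.

    local_descriptors: (N, D) array, one row per feature point.
    """
    global_desc = local_descriptors.mean(axis=0)
    norm = np.linalg.norm(global_desc)
    return global_desc / norm if norm > 0 else global_desc
```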
  • the three different feature extraction algorithms can be any three of the ORB algorithm, SIFT algorithm, SuperPoint algorithm, D2-Net algorithm, and a line feature algorithm.
  • the descriptors of the first type of feature points can be merged to obtain the first type of descriptor of the library image;
  • the descriptors of the second type of feature points can be merged to obtain the second type of descriptor of the library image;
  • the descriptors of the third type of feature points can be merged to obtain the third type of descriptor of the library image.
  • different types of descriptors of the feature points and different types of descriptors of the library image can be written into the visual feature library.
  • the descriptors of the library-building images can be saved in the image retrieval library in the visual feature library, which is convenient for searching during subsequent visual positioning.
  • step 1002 specifically includes:
  • the multiple scenes in the above step 1002a may include at least two of day, night, rainy, snowy, and cloudy.
  • scene images in different scenes can be obtained, and information can then be extracted from these different scene images, so that the finally generated visual feature library contains more information.
  • scene images of daytime, night, rainy, and snowy scenes can be obtained.
  • FIG. 7 only shows scene images in some scenes.
  • scene images in other scenes, for example cloudy scenes, can also be obtained by performing scene simulation on the library image.
  • the process shown in FIG. 7 above is to directly perform scene simulation on the library image to obtain multiple scene images.
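A rough sketch of what scene simulation could look like, using simple photometric edits with OpenCV purely as an illustration (production systems would more likely use learned day-to-night or weather translation models; the specific transforms below are assumptions):

```python
import numpy as np
import cv2  # OpenCV, assumed available

def simulate_scenes(library_image_bgr):
    """Derive images under several lighting / weather conditions from one
    library image, as a stand-in for real scene simulation."""
    img = library_image_bgr.astype(np.float32)
    scenes = {"day": library_image_bgr}
    # night: darken the image
    scenes["night"] = np.clip(img * 0.35, 0, 255).astype(np.uint8)
    # cloudy: lower contrast around the mean brightness
    mean = img.mean()
    scenes["cloudy"] = np.clip((img - mean) * 0.6 + mean, 0, 255).astype(np.uint8)
    # rainy: blur the low-contrast image slightly, as a crude rain stand-in
    scenes["rainy"] = cv2.GaussianBlur(scenes["cloudy"], (5, 5), 0)
    return scenes
```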
  • the library image can be segmented to obtain slice images from different viewing angles, thereby eliminating the difference between the imaging mode of the library image and that of images captured by users, so that when an image captured by a user is visually positioned according to the visual feature library, the matching feature points of that image can be determined more accurately.
  • step 1002 specifically includes:
  • the database image in the above step 1002c may specifically be a panoramic image or a wide-angle image.
  • the library image can be segmented (this is also called projection processing) to obtain slice image 1 to slice image K, where different slice images correspond to different viewing angles (viewing angle 1 to viewing angle K); among slice image 1 to slice image K, slice image i and slice image i+1 are adjacent slice images, where 1 ≤ i < K and K is a positive integer.
  • K can be set according to the requirements of building a visual feature library.
  • in this way, the viewing angle range of the slice images obtained by segmentation can be made closer to that of the images taken by users.
  • the value of K may be 8, 12, 16, or the like.
  • part of the image content of adjacent slice images obtained by the segmentation process shown in FIG. 8 is the same.
  • for example, slice image 1 and slice image 2 are adjacent, and part of the image content of slice image 1 and slice image 2 is the same.
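A sketch of the segmentation (projection) processing for the case where the library image is an equirectangular panorama; the field of view, output size, and overlap spacing below are illustrative assumptions:

```python
import numpy as np
import cv2

def slice_panorama(pano_bgr, K_slices=8, fov_deg=90, out_size=(640, 640)):
    """Cut an equirectangular panorama into K perspective 'slice images'
    whose yaw angles overlap (with the default values the yaw step of 45
    degrees is smaller than the 90-degree field of view), so that adjacent
    slices share part of their image content."""
    H, W = pano_bgr.shape[:2]
    h, w = out_size
    f = 0.5 * w / np.tan(np.radians(fov_deg) / 2)   # pinhole focal length
    xs, ys = np.meshgrid(np.arange(w) - w / 2, np.arange(h) - h / 2)
    dirs = np.stack([xs, ys, np.full_like(xs, f, dtype=np.float64)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    slices = []
    for k in range(K_slices):
        yaw = 2 * np.pi * k / K_slices
        Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                       [0, 1, 0],
                       [-np.sin(yaw), 0, np.cos(yaw)]])
        d = dirs @ Ry.T
        lon = np.arctan2(d[..., 0], d[..., 2])       # [-pi, pi]
        lat = np.arcsin(np.clip(d[..., 1], -1, 1))   # [-pi/2, pi/2]
        map_x = ((lon / (2 * np.pi) + 0.5) * W).astype(np.float32)
        map_y = ((lat / np.pi + 0.5) * H).astype(np.float32)
        slices.append(cv2.remap(pano_bgr, map_x, map_y, cv2.INTER_LINEAR))
    return slices
```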
  • the segmentation process shown in FIG. 8 directly segments the library image to obtain multiple slice images.
  • it is also possible to perform scene simulation on the library image first (the scene simulation process can be as shown in FIG. 7) and then segment each scene image to obtain multiple slice images.
  • for the visual feature library obtained by performing the series of processing described above on the library image, crowdsourced updating can also be carried out in order to keep the information contained in the visual feature library more up to date.
  • crowdsourced updating here means that an image to be processed can be received from a user device, a series of processing can be performed on the image to be processed, and the descriptors of the feature points of the image to be processed and the descriptor of the image to be processed can be written into the visual feature library; in this way the visual feature library is updated so that it contains more information.
  • the feature extraction of the library image is performed to obtain the feature points of the library image and the descriptors of the feature points of the library image, including:
  • 1002f Perform feature extraction on multiple slice images to obtain feature points of the library-built image and descriptors of the feature points of the library-built image.
  • the foregoing multiple scenes include at least two of day, night, rainy, snowy, and cloudy.
  • part of the image content of adjacent slice images is the same.
  • for example, scene simulation is performed on the library image to obtain scene images in three scenes, namely a first scene image, a second scene image, and a third scene image.
  • the first scene image, the second scene image, and the third scene image are then each segmented to obtain multiple slice images.
  • if each scene image is segmented into 8 slice images, 24 slice images are obtained in total; feature extraction is then performed on these 24 slice images to obtain the feature points and descriptors of the library image.
  • the image in the daytime scene is segmented to obtain 8 sliced images with viewing angles 1 to 8.
  • the images in these four scenes are segmented, and finally slice image 1 to slice image 32 are obtained.
  • feature extraction can then be performed on the 32 slice images to obtain the feature points of the library image and the descriptors of those feature points.
  • the number of slice images obtained by segmentation may be different.
  • the numbers of slice images obtained by segmenting the scene images in the four scenes of day, night, rain, and snow are 8, 8, 12, and 12 respectively (these numbers are only examples; other quantities are possible).
  • in step 1002, it is also possible to perform segmentation processing on the library image first to obtain slice images, and then perform scene simulation on each slice image (of course, scene simulation may also be performed on only some of the slice images).
  • the feature extraction of the library image is performed to obtain the feature points of the library image and the descriptors of the feature points of the library image, including:
  • among the above-mentioned multiple slice images, part of the image content of adjacent slice images is the same, and the above-mentioned multiple scenes include at least two of day, night, rainy, snowy, and cloudy.
  • for example, the library image is segmented to obtain 8 slice images, and scene simulation is then performed on these 8 slice images.
  • if scene simulation is performed on each slice image to obtain scene images under 4 kinds of scenes, then performing scene simulation on the 8 slice images separately yields 32 scene images; feature extraction is then performed on these 32 scene images so as to obtain the feature points and descriptors of the library image.
  • the above-mentioned feature extraction on the library image using multiple feature extraction algorithms may first perform segmentation processing and/or scene simulation on the library image and then perform feature extraction on the resulting images, so as to obtain the feature points of the library image and the descriptors of those feature points.
  • for example, the library image can be segmented first to obtain slice image 1 to slice image 8, and scene simulation can then be performed on these 8 slice images to obtain scene images for the day, night, and rainy scenes.
  • specifically, as shown in FIG. 10, performing scene simulation on slice image 1 to slice image 8 yields scene image 1 to scene image 24.
  • feature extraction can then be performed on scene image 1 to scene image 24 to obtain the feature points of the library image and the descriptors of those feature points; a pipeline sketch follows.
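A sketch tying these steps together in one possible order (slice first, then simulate scenes, then run several extractors), reusing the illustrative slice_panorama and simulate_scenes helpers sketched earlier and OpenCV's ORB/SIFT as example extractors; the set of algorithms and the data layout are assumptions:

```python
import cv2

def build_feature_entries(library_image):
    """Slice the library image, simulate scenes for each slice, then run
    several feature extraction algorithms on every resulting image."""
    extractors = {"orb": cv2.ORB_create(), "sift": cv2.SIFT_create()}
    entries = []
    for slice_img in slice_panorama(library_image):
        for scene_name, scene_img in simulate_scenes(slice_img).items():
            gray = cv2.cvtColor(scene_img, cv2.COLOR_BGR2GRAY)
            for algo, det in extractors.items():
                kps, descs = det.detectAndCompute(gray, None)
                if descs is None:
                    continue
                for kp, desc in zip(kps, descs):
                    entries.append({"xy": kp.pt, "descriptor": desc,
                                    "algorithm": algo, "scene": scene_name})
    return entries
```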
  • in step 1002, it is also possible to perform segmentation processing or scene simulation on the library image, and then use multiple feature extraction algorithms to perform feature extraction.
  • the feature extraction of the library image is performed to obtain the feature points of the library image and the descriptors of the feature points of the library image, including:
  • 1002k Use multiple feature extraction algorithms to perform feature extraction on each of the multiple slice images to obtain feature points of the library image and descriptors of the feature points of the library image.
  • for example, the library image is segmented to obtain 12 slice images, and 3 feature extraction algorithms are then used to extract features from each of the 12 slice images, so as to obtain the feature points of the library image and the descriptors of those feature points.
  • the feature extraction of the library image is performed to obtain the feature points of the library image and the descriptors of the feature points of the library image, including:
  • 1002h Use multiple feature extraction algorithms to perform feature extraction on scene images in multiple scenarios, respectively, to obtain feature points of the library-built image and descriptors of the feature points of the library-built image.
  • the aforementioned multiple scenes include at least two of day, night, rainy, snowy, and cloudy.
  • for example, scene simulation is performed on the library image to obtain scene images in 4 scenes, and 3 feature extraction algorithms are then used to extract features from the scene images in these 4 scenes respectively, so as to obtain the feature points of the library image and the descriptors of those feature points.
  • in step 1002, it is also possible to perform both segmentation processing and scene simulation on the library image (segmentation first and then scene simulation, or scene simulation first and then segmentation), and then use multiple feature extraction algorithms to perform feature extraction.
  • the feature extraction of the library image is performed to obtain the feature points of the library image and the descriptors of the feature points of the library image, including:
  • 1002w Use multiple feature extraction algorithms to perform feature extraction on multiple slice images to obtain feature points of the library image and descriptors of the feature point of the library image.
  • the foregoing multiple scenes include at least two of day, night, rainy, snowy, and cloudy.
  • part of the image content of adjacent slice images is the same.
  • for example, scene simulation is performed on the library image to obtain scene images in three scenes, namely a first scene image, a second scene image, and a third scene image.
  • segmentation processing is then performed on the first scene image, the second scene image, and the third scene image, with each scene image segmented into 8 slice images.
  • by segmenting the first scene image, the second scene image, and the third scene image, 24 slice images can therefore be obtained.
  • three feature extraction algorithms are used to extract the features of the 24 slice images respectively to obtain the feature points and descriptors of the library image.
  • the feature extraction of the library image is performed to obtain the feature points of the library image and the descriptors of the feature points of the library image, including:
  • 1002z Use multiple feature extraction algorithms to perform feature extraction on scene images in multiple scenes to obtain feature points of the library image and descriptors of the feature points of the library image.
  • among the above-mentioned multiple slice images, part of the image content of adjacent slice images is the same, and the above-mentioned multiple scenes include at least two of day, night, rainy, snowy, and cloudy.
  • the database image is segmented to obtain 8 sliced images.
  • scene simulation is performed on each of the 8 sliced images to obtain scene images under 4 kinds of scenes.
  • a total of 32 images are thus obtained, and 3 feature extraction algorithms are then used to extract features from these 32 images respectively, so as to obtain the feature points and descriptors of the library image.
  • the method shown in FIG. 1 further includes the following steps:
  • the updated visual feature library includes the feature points of the image to be processed and the 3D position of the feature points of the image to be processed.
  • the 3D position of a feature point of the image to be processed is the 3D position of the intersection of the ray corresponding to that feature point and the 3D model; the ray corresponding to a feature point of the image to be processed starts at the projection center of the image to be processed and passes through that feature point; the image to be processed and the 3D model are located in the same coordinate system, and the projection center of the image to be processed is the position at which the second photographing unit captured the image to be processed.
  • in this way, the information contained in the updated visual feature library is kept more up to date.
  • the process of step 2001 to step 2003 is the same as the process described in steps 1001 to 1003 above, and will not be described in detail here.
  • the above-mentioned reference image is the image closest to the position of the image to be processed in the visual feature library
  • the position of the reference image and the position of the image to be processed can be determined by the 3D position of the respective feature points.
  • specifically, the 3D positions of the feature points of each image among the library images can be compared with the 3D positions of the feature points of the image to be processed, and the image whose feature point 3D positions are closest to (overlap most with) those of the image to be processed is selected as the reference image.
  • when the semantic information of the reference image in the visual feature library is different from that of the image to be processed, it means that the image content of the object corresponding to the image to be processed may have changed.
  • in this case the visual feature library is updated in time, which improves how current the visual feature library is.
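A sketch of this update check, under the assumption that each image's entry stores the 3D positions of its feature points and a set of semantic labels (the data layout is illustrative, not specified above):

```python
import numpy as np

def needs_update(to_process_entry, library_entries):
    """Pick the library image whose feature-point 3D positions are closest to
    those of the image to be processed (the 'reference image'), and report
    whether its semantic information differs, i.e. whether an update is due."""
    q = np.asarray(to_process_entry["points_3d"]).mean(axis=0)
    centers = [np.asarray(e["points_3d"]).mean(axis=0) for e in library_entries]
    ref = library_entries[int(np.argmin([np.linalg.norm(q - c) for c in centers]))]
    return set(ref["semantics"]) != set(to_process_entry["semantics"]), ref
```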
  • FIG. 11 is a schematic flowchart of a visual positioning method according to an embodiment of the present application.
  • the method shown in FIG. 11 can use the visual feature library constructed by the method shown in FIG. 1 for visual positioning.
  • the method shown in Fig. 11 can be executed by a visual positioning device, which specifically can be a mobile phone, a computer, a personal digital assistant, a wearable device, a vehicle-mounted device, an Internet of Things device, a virtual reality device, an augmented reality device, and so on.
  • the method shown in FIG. 11 includes steps 3001 to 3004, and steps 3001 to 3004 are described in detail below.
  • the foregoing image to be processed may be an image taken by a visual positioning device.
  • the image to be processed may be an image taken by a mobile phone.
  • For the specific implementation process of the foregoing step 3002, reference may be made to the above description of step 1002; to avoid unnecessary repetition, a detailed description is omitted here.
  • the matching feature point of a feature point of the image to be processed can be determined from the visual feature library according to the descriptor of that feature point; among the feature points in the visual feature library, the matching feature point is the one whose descriptor is closest to the descriptor of the feature point of the image to be processed.
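A sketch of the descriptor matching step, using brute-force nearest-neighbour search as an illustration (a real system would more likely use an approximate index and a ratio test; all names below are assumptions):

```python
import numpy as np

def match_feature_points(query_descs, library_descs):
    """For every feature point of the image to be processed, pick the library
    feature point with the closest descriptor (L2 distance).

    query_descs  : (Q, D) descriptors of the image to be processed
    library_descs: (L, D) descriptors stored in the visual feature library
    """
    dists = np.linalg.norm(query_descs[:, None, :] - library_descs[None, :, :],
                           axis=-1)
    return np.argmin(dists, axis=1)   # index of each matching library point
```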
  • the aforementioned visual feature library includes the descriptors of the feature points of the library image and the 3D positions of the feature points of the library image, and the aforementioned visual feature library satisfies at least one of the following conditions:
  • the feature points of the library image include multiple sets of feature points, and the description methods of any two sets of feature points in the multiple sets of feature points are different;
  • the visual feature library includes the descriptor of the library image, and the descriptor of the library image is synthesized from the descriptors of the feature points of the library image;
  • the feature points of the library image are the feature points of multiple scene images, the multiple scene images are obtained by performing scene simulation on the library image, and the multiple scenes include at least two of day, night, rainy, snowy, and cloudy;
  • the feature points of the library image and the descriptors of those feature points are obtained by performing feature extraction on multiple slice images, the multiple slice images are obtained by segmenting the library image, and among the multiple slice images part of the image content of adjacent slice images is the same;
  • the visual feature library includes the semantic information of the feature points of the library image and the confidence of the semantic information of the feature points of the library image.
  • compared with traditional schemes, the visual feature library in the embodiment of the present application contains richer information; therefore, when visual positioning is performed according to the visual feature library of the embodiment of the present application, a better visual positioning effect can be obtained, making visual positioning more accurate.
  • when determining the pose information of the photographing unit at the time the image to be processed was taken, the 3D position of the matching feature point can be taken as the 3D position of the corresponding feature point of the image to be processed, and the pose information of the photographing unit when it took the image to be processed is then determined from the positions of the feature points of the image to be processed.
  • the pose information when the photographing unit shoots the image to be processed can be derived from the 3D positions of the multiple feature points of the image to be processed.
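A sketch of deriving the pose information from the 2D feature points and the 3D positions of their matching feature points, using OpenCV's PnP + RANSAC solver as one common (but not mandated) choice:

```python
import cv2
import numpy as np

def estimate_pose(points_2d, points_3d, K):
    """points_2d: (N, 2) pixel coordinates of feature points of the image to
    be processed; points_3d: (N, 3) 3D positions of their matching feature
    points; K: 3x3 intrinsic matrix of the photographing unit."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        K, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)       # rotation of the world-to-camera transform
    return R, tvec                   # pose information of the photographing unit
```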
  • for convenience of description, the term "feature points of the image to be processed" is adopted uniformly here.
  • depending on the information contained in the visual feature library, the visual positioning process of the embodiment of the present application may differ; the following describes the visual positioning process when the visual feature library contains different kinds of information.
  • Case 1 The feature points of the library image include multiple groups of feature points.
  • determining the matching feature points of the feature points of the image to be processed in step 3003 specifically includes:
  • the description manner of the descriptors of the feature points of the image to be processed is the same as the description manner of the feature points of the target group.
  • the above-mentioned multiple sets of feature points are respectively obtained by using different feature extraction algorithms to perform feature extraction on the library image.
  • when the feature points of the library image include multiple sets of feature points, the visual feature library contains more feature point information.
  • by selecting, from the multiple sets of feature points, the target group whose descriptor description manner is the same as that of the feature points of the image to be processed, matching feature points that better match the feature points of the image to be processed can subsequently be selected from the target group, which improves the effect of visual positioning.
  • Case 2 The visual feature library includes the descriptor of the library image.
  • determining the matching feature points of the feature points of the image to be processed in the above step 3003 specifically includes:
  • the descriptor of the image to be processed is synthesized from the descriptors of the feature points of the image to be processed, and the distance between the descriptor of the image to be processed and the descriptor of any one of the above N images is less than or equal to the distance between the descriptor of the image to be processed and the descriptor of any one of the remaining M images among the library images.
  • the library images consist of these N images and the remaining M images (N and M are positive integers).
  • the visual feature library when the visual feature library includes the descriptors of the built image, the visual feature library can be roughly screened according to the descriptors of the image to be processed, and N images with relatively close descriptors can be selected, and then Determining the matching feature points of the feature points of the image to be processed from the feature points of the N images can accelerate the process of visual positioning and improve the efficiency of visual positioning.
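A sketch of the coarse screening step, assuming the global descriptors of the library images are stored as rows of a matrix (n = 10 is only an example value):

```python
import numpy as np

def coarse_screen(query_global_desc, library_global_descs, n=10):
    """Compare the global descriptor of the image to be processed with the
    stored descriptors of the library images and keep the N closest ones;
    fine matching is then restricted to the feature points of those images."""
    dists = np.linalg.norm(library_global_descs - query_global_desc, axis=1)
    return np.argsort(dists)[:n]     # indices of the N nearest library images
```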
  • determining the matching feature points of the feature points of the image to be processed in step 3003 specifically includes:
  • a matching feature point of the feature point of the image to be processed is determined from the feature points of the target scene image.
  • the scene corresponding to the target scene image is the closest to the scene when the image to be processed was taken.
  • the target scene image whose scene is closest to the scene at the time the image to be processed was taken can be determined from the multiple scene images, and the matching feature points of the feature points of the image to be processed can then be determined from that target scene image; this allows more accurate matching feature points to be determined for the feature points of the image to be processed, thereby improving the success rate of visual positioning.
  • the visual feature library includes the semantic information of the feature points of the library image and the confidence of the semantic information of the feature points of the library image.
  • determining, in step 3004, the pose information of the photographing unit when it photographed the image to be processed specifically includes:
  • the weight corresponding to the matching feature point with higher confidence is greater.
  • when the visual feature library contains the semantic information of the feature points of the library image and the confidence of that semantic information, the semantic information and confidence corresponding to different feature points can be taken into account during visual positioning to determine how important each feature point is, enabling more precise visual positioning and improving its accuracy.
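A sketch of one way the confidence weighting could enter the pose estimate: refine an initial PnP solution by minimizing confidence-weighted reprojection error (SciPy's least_squares is an assumed, illustrative choice, not something specified by the text):

```python
import numpy as np
from scipy.optimize import least_squares
import cv2

def refine_pose_weighted(points_2d, points_3d, weights, K, rvec0, tvec0):
    """Starting from an initial pose (e.g. an unweighted PnP solve), minimize
    reprojection error with each matching feature point scaled by the
    confidence of its semantic information, so that high-confidence points
    influence the pose more."""
    obj = np.asarray(points_3d, dtype=np.float64)
    img = np.asarray(points_2d, dtype=np.float64)

    def residuals(x):
        rvec, tvec = x[:3], x[3:]
        proj, _ = cv2.projectPoints(obj, rvec, tvec, K, None)
        err = proj.reshape(-1, 2) - img                  # (N, 2) pixel errors
        return (np.sqrt(weights)[:, None] * err).ravel() # weighted residuals

    x0 = np.concatenate([np.ravel(rvec0), np.ravel(tvec0)])
    sol = least_squares(residuals, x0)
    return sol.x[:3], sol.x[3:]      # refined rvec, tvec of the camera pose
```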
  • FIG. 12 is a schematic diagram of the method for constructing a visual feature library according to an embodiment of the application applied to a specific product form.
  • the panoramic camera is used to shoot to obtain the library image
  • the drone or laser scanner is used to scan to obtain the modeling data.
  • the internal parameters of the panoramic camera, the drone, and the laser scanner have been calibrated in advance.
  • drones can be used to obtain modeling data in outdoor scenarios
  • laser scanners can be used to obtain modeling data in indoor scenarios.
  • the library image and the modeling data can be processed by the modules in the server to finally obtain the 3D positions of the feature points of the library image and the descriptor of the library image; the descriptor of the library image is saved in the image retrieval library, and the 3D positions of the feature points of the library image are saved in the 3D feature library.
  • the software module in the server can be used to process the database image and modeling data.
  • the modeling data includes modeling images and point cloud data.
  • the data alignment module can be used to align the library image and the modeling image, and 3D modeling can then be performed in combination with the modeling data to obtain a 3D model;
  • the semantic information of the feature points of the library image and the confidence of that semantic information can be determined using the semantic recognition module;
  • scene simulation of the library image can be performed by the scene simulation module to obtain scene images in various scenarios.
  • the feature extraction module can be used to perform feature extraction on the library image to obtain the feature points of the library image and the descriptor of the feature point of the library image.
  • the 3D position acquisition module can be used to determine the 3D position of the feature point of the library image.
  • Table 1 respectively shows the success rate of visual positioning using the visual feature library constructed by the existing scheme and the scheme of the application.
  • the first column is the construction scheme of the visual feature library: the traditional scheme is a construction scheme based on structure from motion (SFM), and the scheme of this application is the visual feature library construction method of the embodiments of this application; the second column is the corresponding visual positioning scheme, including ORB positioning (a visual positioning scheme that extracts features with the ORB feature extraction algorithm) and rootSIFT positioning (a visual positioning scheme that extracts features with the rootSIFT feature extraction algorithm); the third column is the success rate of visual positioning.

Table 1:

| Construction scheme of the visual feature library | Visual positioning scheme | Success rate |
| --- | --- | --- |
| Traditional scheme (SFM-based) | ORB positioning | 61% |
| Scheme of this application | ORB positioning | 93% |
| Traditional scheme (SFM-based) | rootSIFT positioning | 71% |
| Scheme of this application | rootSIFT positioning | 98% |

  • it can be seen that the success rate of visual positioning using the visual feature library obtained by the solution of this application is higher than that using the visual feature library obtained by the traditional solution: with ORB positioning, 93% versus 61%; with rootSIFT positioning, 98% versus 71%.
  • the construction method of the visual feature library and the visual positioning method of the embodiment of the present application are described in detail above in conjunction with the accompanying drawings.
  • the construction device and visual positioning device of the visual feature library of the embodiments of the present application are introduced below in conjunction with the accompanying drawings. It should be understood that the visual feature library construction device introduced below can execute the method for constructing a visual feature library of the embodiments of the present application, and the visual positioning device introduced below can execute the visual positioning method of the embodiments of the present application. When introducing these two devices, repeated descriptions are appropriately omitted.
  • FIG. 13 is a schematic block diagram of an apparatus for constructing a visual feature library according to an embodiment of the present application.
  • the device 5000 shown in FIG. 13 includes an acquisition unit 5001, a feature extraction unit 5002, a position determination unit 5003, and a construction unit 5004.
  • the device 5000 shown in FIG. 13 may be specifically used to execute the method shown in FIG. 1. Specifically, the acquisition unit 5001 is used to perform step 1001, the feature extraction unit 5002 is used to perform step 1002, the position determination unit 5003 is used to perform step 1003, and the construction unit 5004 is used to perform step 1004.
  • Fig. 14 is a schematic block diagram of a visual positioning device according to an embodiment of the present application.
  • the device 6000 shown in FIG. 14 includes an acquisition unit 6001, a feature extraction unit 6002, a feature matching unit 6003, and a visual positioning unit 6004.
  • the device 6000 shown in FIG. 14 may be specifically used to execute the method shown in FIG. 11. Specifically, the acquisition unit 6001 is used to perform step 3001, the feature extraction unit 6002 is used to perform step 3002, the feature matching unit 6003 is used to perform step 3003, and the visual positioning unit 6004 is used to perform step 3004.
  • FIG. 15 is a schematic diagram of the hardware structure of the apparatus for constructing a visual feature library according to an embodiment of the present application.
  • the device 7000 for constructing a visual feature library shown in FIG. 15 includes a memory 7001, a processor 7002, a communication interface 7003, and a bus 7004. Among them, the memory 7001, the processor 7002, and the communication interface 7003 implement communication connections between each other through the bus 7004.
  • the above-mentioned processor 7002 can obtain the library image by (calling) the communication interface 7003 (in this case the library image is obtained from another device through the communication interface) or obtain the library image from the memory 7001 (in this case the library image is stored in the memory 7001); the processor 7002 then performs a series of processing on the library image and finally constructs the visual feature library.
  • the memory 7001 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 7001 may store a program. When the program stored in the memory 7001 is executed by the processor 7002, the processor 7002 is configured to execute each step of the method for constructing a visual feature library in the embodiment of the present application.
  • the processor 7002 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs so as to implement the method for constructing the visual feature library in the method embodiments of the present application.
  • the processor 7002 may also be an integrated circuit chip with signal processing capabilities.
  • each step of the method for constructing a visual feature library of the present application can be completed by an integrated logic circuit of hardware in the processor 7002 or instructions in the form of software.
  • the aforementioned processor 7002 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a storage medium that is mature in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 7001; the processor 7002 reads the information in the memory 7001 and, in combination with its hardware, completes the functions required by the units included in the device for constructing the visual feature library, or executes the method for constructing the visual feature library of the method embodiments of the present application.
  • the communication interface 7003 uses a transceiving device such as but not limited to a transceiver to implement communication between the visual feature library construction device 7000 and other devices or communication networks. For example, the information of the neural network to be constructed and the training data needed in the process of constructing the neural network can be obtained through the communication interface 7003.
  • the bus 7004 may include a path for transmitting information between the various components of the visual feature library construction device 7000 (for example, the memory 7001, the processor 7002, and the communication interface 7003).
  • the acquisition unit 5001 in the device 5000 for constructing a visual feature database may be equivalent to the communication interface 7003 in the device 7000 for constructing a visual feature database, and is used to obtain a database image.
  • the feature extraction unit 5002, the position determination unit 5003, and the construction unit 5004 in the visual feature database construction device 5000 are equivalent to the processor 7002 in the visual feature database construction device 7000, and are used to perform a series of processing on the library image Finally build a visual feature library.
  • FIG. 16 is a schematic diagram of the hardware structure of the visual positioning device provided by an embodiment of the present application.
  • the visual positioning device 8000 shown in FIG. 16 includes a memory 8001, a processor 8002, a communication interface 8003, and a bus 8004. Among them, the memory 8001, the processor 8002, and the communication interface 8003 implement communication connections between each other through the bus 8004.
  • the above-mentioned processor 8002 can obtain the image to be processed by (calling) the camera (not shown in FIG. 16) or obtain the image to be processed from the memory 8001; the processor 8002 then performs a series of processing on the image to be processed, finally realizing visual positioning.
  • the foregoing memory 8001 may be used to store a program, and the processor 8002 is used to execute the program stored in the memory 8001.
  • the processor 8002 is used to execute each step of the visual positioning method in the embodiment of the present application.
  • the acquisition unit 6001 in the above-mentioned visual positioning device 6000 may be equivalent to the communication interface 8003 in the visual positioning device 8000 for acquiring the image to be processed.
  • the feature extraction unit 6002, feature matching unit 6003, and visual positioning unit 6004 in the above-mentioned visual positioning device 6000 are equivalent to the processor 8002 in the visual positioning device 8000, and are used to perform a series of processing on the image to be processed and determine that the photographing unit is photographed to be processed The pose information of the image.
  • the device 5000 for constructing a visual feature library shown in FIG. 13 and the device 7000 for constructing a visual feature library shown in FIG. 15 may specifically be a server, a cloud device, or a computer device with a certain computing capability.
  • the above-mentioned visual positioning device 6000 shown in FIG. 14 and the visual positioning device 8000 shown in FIG. 16 may specifically be mobile phones, computers, personal digital assistants, wearable devices, in-vehicle devices, Internet of Things devices, virtual reality devices, augmented reality devices, etc. Wait.
  • the visual positioning method in the embodiment of the present application may be executed by a terminal device.
  • the structure of the terminal device will be described in detail below with reference to FIG. 17.
  • FIG. 17 is a schematic diagram of the hardware structure of a terminal device according to an embodiment of the present application.
  • the terminal device shown in FIG. 17 can execute the visual positioning method of the embodiment of the present application.
  • the terminal device shown in FIG. 17 can execute each step of the visual positioning method shown in FIG. 11. Specifically, the image to be processed can be obtained through the camera 3060 (the camera can perform the above step 3001), and the processor can then process the image to be processed to achieve visual positioning (the processor can perform the above steps 3002 to 3004).
  • the terminal device shown in FIG. 17 includes a communication module 3010, a sensor 3020, a user input module 3030, an output module 3040, a processor 3050, a camera 3060, a memory 3070, and a power supply 3080. These modules are described in detail below.
  • the communication module 3010 may include at least one module that enables communication between the terminal device and other devices (for example, cloud devices).
  • the communication module 3010 may include one or more of a wired network interface, a broadcast receiving module, a mobile communication module, a wireless Internet module, a local area communication module, and a location (or positioning) information module.
  • the sensor 3020 can sense some operations of the user, and the sensor 3020 can include a distance sensor, a touch sensor, and so on.
  • the sensor 3020 can sense operations such as the user touching the screen or approaching the screen.
  • the user input module 3030 is used to receive input digital information, character information, or contact touch operation/non-contact gestures, and receive signal input related to user settings and function control of the system.
  • the user input module 3030 includes a touch panel and/or other input devices.
  • the output module 3040 includes a display panel for displaying information input by the user, information provided to the user, or various menu interfaces of the system.
  • the output module 3040 can display visual positioning results.
  • the display panel may be configured in the form of a liquid crystal display (LCD) or an organic light-emitting diode (OLED).
  • the touch panel can cover the display panel to form a touch display screen.
  • the output module 3040 may also include an audio output module, an alarm, and a haptic module.
  • the camera 3060 is used to capture images.
  • the images captured by the camera 3060 can be sent to the processor for visual positioning.
  • the processor processes the images captured by the camera (the specific processing procedure can be as shown in steps 3001 to 3004) to obtain the pose information of the camera 3060 when the image was taken.
  • the power supply 3080 can receive external power and internal power under the control of the processor 3050, and provide power required by the various modules of the entire terminal device during operation.
  • the processor 3050 may indicate one or more processors.
  • the processor 3050 may include one or more central processing units, or include a central processing unit and a graphics processor, or include an application processor and a coprocessor (For example, micro control unit or neural network processor).
  • the processor 3050 includes multiple processors, the multiple processors may be integrated on the same chip, or each may be an independent chip.
  • a processor may include one or more physical cores, where the physical core is the smallest processing module.
  • the memory 3070 stores computer programs, and the computer programs include an operating system program 3071 and an application program 3072.
  • Typical operating systems include Microsoft's Windows and Apple's macOS for desktop or notebook systems, as well as Android-based systems developed by Google and other systems used in mobile terminals.
  • if the visual positioning method in the embodiment of the present application is implemented by software, it can be considered to be implemented by the application program 3072.
  • the memory 3070 may be one or more of the following types: flash memory, hard disk type memory, micro multimedia card type memory, card type memory (such as SD or XD memory), random access memory (random access memory) , RAM), static random access memory (static RAM, SRAM), read-only memory (read only memory, ROM), electrically erasable programmable read-only memory (electrically erasable programmable read-only memory, EEPROM), programmable Read-only memory (programmable ROM, PROM), magnetic memory, magnetic disk or optical disk.
  • the memory 3070 may also be a network storage device on the Internet, and the system may perform operations such as updating or reading the memory 3070 on the Internet.
  • the processor 3050 is used to read the computer program in the memory 3070 and then execute the method defined by the computer program; for example, the processor 3050 reads the operating system program 3071 to run the operating system on the system and implement various functions of the operating system, or reads one or more application programs 3072 to run applications on the system.
  • the aforementioned memory 3070 may store a computer program (a program corresponding to the visual positioning method of this embodiment of the application), and when the processor 3050 executes that computer program, the processor 3050 can execute the visual positioning method of the embodiment of the present application.
  • the memory 3070 also stores other data 3073 besides computer programs.
  • the memory 3070 can store the load characteristics of the frame drawing thread involved in the resource scheduling method of the present application, the load prediction value of the frame drawing thread, and so on.
  • connection relationship of each module in FIG. 17 is only an example, and each module in FIG. 17 may also have other connection relationships.
  • all modules in the terminal device are connected through a bus.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this application essentially or the part that contributes to the existing technology or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: U disk, mobile hard disk, read-only memory (read-only memory, ROM), random access memory (random access memory, RAM), magnetic disk or optical disk and other media that can store program code .


Abstract

The present application provides a method for constructing a visual feature library, a visual positioning method, an apparatus, and a storage medium. The method for constructing a visual feature library includes: acquiring a library image and performing feature extraction on the library image to obtain feature points of the library image and descriptors of the feature points of the library image; intersecting the rays corresponding to the feature points of the library image with a 3D model and determining the 3D positions of the intersection points of the rays and the 3D model as the 3D positions of the feature points of the library image; and then writing the descriptors of the feature points of the library image and the library image into the visual feature library, thereby completing the construction of the visual feature library. For a fixed number of library images, the present application can extract information of a greater number of feature points from the library images, so that the constructed visual feature library contains information of a greater number of feature points of the library images, facilitating better subsequent visual positioning based on the visual feature library.

Description

Method for constructing a visual feature library, visual positioning method, apparatus, and storage medium
This application claims priority to Chinese Patent Application No. 201910736102.2, filed with the Chinese Patent Office on August 9, 2019 and entitled "Method for constructing a visual feature library, visual positioning method, apparatus, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of visual positioning, and more specifically, to a method for constructing a visual feature library, a visual positioning method, an apparatus, and a storage medium.
Background
Visual positioning is widely used in many fields, such as autonomous driving and augmented reality. Visual positioning generally uses a pre-built visual feature library to estimate the pose information of a camera from a single image captured by that camera.
To achieve relatively accurate visual positioning, a visual feature library containing sufficient information generally needs to be constructed. A traditional solution generally performs feature extraction and feature matching on collected images, then obtains the descriptors and three-dimensional (3 dimensions, 3D) positions of the matched feature points, and saves the descriptors and 3D positions of the matched feature points into the visual feature library.
When constructing the visual feature library, the traditional solution needs to perform feature matching first, and only the information of the feature points that are successfully matched in the collected images can be saved into the visual feature library. As a result, for a fixed number of library images, the traditional solution can only collect information of a relatively small number of feature points from the library images, so that the finally constructed visual feature library contains relatively little information about the feature points of the library images.
Summary
The present application provides a method for constructing a visual feature library, a visual positioning method, an apparatus, and a storage medium. By intersecting the rays corresponding to the feature points of the library image with a 3D model, information of a greater number of feature points can be extracted from the library images for a fixed number of library images, so that the constructed visual feature library contains information of a greater number of feature points of the library images, facilitating better subsequent visual positioning based on the visual feature library.
第一方面,提供了一种视觉特征库的构建方法,该方法包括:获取建库图像;对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子;将建库图像的特征点对应的射线与3D模型相交,以确定建库图像的特征点的3D位置;构建视觉特征库,视觉特征库包括建库图像的特征点的描述子和建库图像的特征点的3D位置。
其中,上述建库图像的特征点的3D位置为射线与3D模型相交的交点的3D位置,建库图像的特征点对应的射线是以建库图像的投影中心为起点,并经过建库图像的特征点的射线。
可选地,上述建库图像与3D模型位于同一坐标系中,建库图像的投影中心为第一拍摄单元拍摄建库图像时所处的位置。
上述建库图像与3D模型可以位于同一世界坐标系中。
上述第一拍摄单元是拍摄建库图像的拍摄单元,该第一拍摄单元具体可以是摄像头。
上述建库图像为一张图像或者多张图像。
上述建库图像是通过相机或者其他图像拍摄设备拍摄得到的,该建库图像用于构建视觉特征库。
上述建库图像可以是全景图像,广角图像等。
可选地,上述获取建库图像,包括:从相机或者图像拍摄设备获取建库图像。
当建库图像是由相机或者图像拍摄设备拍摄得到时,可以与相机或者图像拍摄设备建立通信连接(可以是有线通信也可以是无线通信),以获取建库图像。
应理解,上述建库图像的特征包括多个特征点。
本申请中,通过射线与3D模型相交的方式来获取建库图像特征点的3D位置,与传统方案中仅能够获取图像之间匹配上的特征点的3D位置相比,可以在建库图像数量一定的情况下,从建库图像中获取到更多数量的特征点的信息,使得构建得到的视觉特征库包含更多数量的特征点的信息。
进一步的,由于在建库图像数量一定的情况下,本申请构建得到的视觉特征库包含更多数量的特征点的信息,使得后续利用该视觉特征库进行视觉定位时取得更好的视觉定位效果。
此外,由于在建库图像数量一定的情况下,本申请的视觉特征库的构建方法构建得到的视觉特征库包含更多数量的特征点的信息,使得本申请的视觉特征库的构建方法能够适用于辐射差异较大、弱纹理等较难进行准确进行视觉定位的场景中,在这些场景中采用本申请实施例的视觉特征库的构建方法得到的视觉特征库进行视觉定位能够取得更好的视觉定位效果。
结合第一方面,在第一方面的某些实现方式中,对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子,包括:采用特征提取算法对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子。
上述特征提取算法是用于提取建库图像的特征点和建库图像的特征点的描述算子的算法。
具体地,在对建库图像进行特征提取时,可以采用以下特征提取算法中的一种或者多种。
ORB(英文全称为oriented FAST and rotated BRIEF,中文译文是定向快并且旋转简单)算法,ORB算法是一种快速特征点提取和描述的算法;
SIFT(英文全称为scale-invariant feature transform,中文译文为尺度不变的特征变换)算法;
SuperPoint(中文译文为超级点)算法;
D2-Net算法,D2-Net算法是论文(A Trainable CNN for Joint Detection and Description of Local Features,中文译文为用于联合检测和本地特征描述的可训练CNN,其中,CNN表示卷积神经网络)提出的一种特征提取算法;
线特征算法;
上述特征提取算法可以称为特征提取算子。
应理解,当采用多种特征提取算法对建库图像进行特征提取时能够提取到多种类型的 建库图像的特征点和建库图像的特征点的描述子。
本申请中,当采用多种特征提取算法对建库图像进行特征提取时,能够从建库图像中获取更多种类的特征点和特征点的描述子,使得最终构建得到的视觉特征库中能够包含更多种类的特征点的,能够提高后续根据该视觉特征库进行视觉定位的效果。
结合第一方面,在第一方面的某些实现方式中,视觉特征库还包括建库图像的特征点的语义信息和建库图像的特征点的语义信息的置信度。
其中,建库图像的特征点的语义信息与建库图像的特征点所在区域的语义信息相同,建库图像的特征点的语义信息的置信度与建库图像的特征点所在区域的语义信息的置信度相同,建库图像的每个区域的语义信息和每个区域的语义信息的置信度是对建库图像进行语义分割得到的。
上述语义信息可以包括行人、道路、车辆、树、建筑物、天空和玻璃等等。当上述建库图像是室内的图像的话,上述语义信息还可以包括家具,电器等等。
上述语义信息的置信度可以称为语义信息的可信度。
本申请中,当视觉特征库中包含建库图像的特征点的语义信息和所述建库图像的特征点的语义信息的置信度时,能够在后续进行视觉定位时考虑到不同特征点对应的语义信息和置信度以确定不同特征点在进行视觉定位时的重要程度,能够进行更精准的视觉定位,提高视觉定位的准确度。
结合第一方面,在第一方面的某些实现方式中,视觉特征库还包括建库图像的描述子,其中,建库图像的描述子是由建库图像的特征点的描述子合成得到的。
由于建库图像的特征点可以是多个特征点,因此,对建库图像的特征点的描述子进行合成,实际上是对建库图像中的多个特征点的描述子进行合成。
上述建库图像的特征点的描述子可以称为局部描述子,建库图像的描述子可以称为全局描述子。
本申请中,当视觉特征库中包括建库图像的描述子时,便于后续再进行视觉定位时提高确定匹配特征点的过程,加快视觉定位的过程。
具体地,当视觉特征库中包括建库图像的描述子时,在根据该视觉特征库进行视觉定位时能够先根据待处理图像的描述子从视觉特征库中进行粗略的筛选,先选择出描述子比较接近的N(N为正整数)张图像,然后再从该N张图像的特征点中确定出待处理图像的特征点的匹配特征点,能够加速视觉定位的过程,提高视觉定位的效率。
结合第一方面,在第一方面的某些实现方式中,对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子,包括:对建库图像进行场景模拟,生成多种场景下的场景图像;对上述多种场景下的场景图像进行特征提取,以得到建库图像的特征点以及建库图像的特征点的描述子。
可选地,上述多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种。
可选地,上述多种场景的光照条件不同。
也就是说,对于上述多种场景来说,每种场景的光照条件可以与其他场景的光照条件都不相同。另外,光照条件不同具体可以是指光照强度不同。
多种场景下的场景图像还可以称为多种场景图像,每种场景图像是对建库图像进行一种场景模拟得到的。
本申请中,通过对建库图像进行场景模拟,进而对场景模拟后得到的多种场景图像进行特征提取,使得最终构建得到的视觉特征库中包含从不同场景图像中提取得到的特征点的信息,使得视觉特征库中包含的信息更加丰富,便于后续根据该视觉特征库进行更有效的视觉定位。
具体地,在进行视觉定位时,如果视觉特征库中包含多种场景图像的特征点,那么,可以先从多种场景图像中确定出与待处理图像拍摄时的场景最接近的目标场景图像,然后再从该目标场景图像中确定待处理图像的特征点的匹配特征点,可以为待处理图像的特征点确定更准确的匹配特征点,进而提高视觉定位的成功率。
结合第一方面,在第一方面的某些实现方式中,对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子,包括:对建库图像进行切分处理,以得到多张切片图像;对多张切片图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子。
其中,在上述多张切片图像中,相邻切片图像的部分图像内容相同,上述建库图像可以是全景图像。
当建库图像是全景图像时,通过对全景图像进行切分,并对切分得到的切片图像进行特征提取,便于后续在进行视觉定位时较为准确的确定出待处理图像(需要进行视觉定位的图像)的特征点的匹配点,进而提高视觉定位的准确率。
具体地,当建库图像是全景图像时,由于全景投影的成像方式与用户拍摄的图像的成像方式不同,通过以对建库图像进行切分处理,能够得到不同视角的切片图像,从而消除建库图像与用户拍摄的图像的成像方式的差异,使得在根据视觉特征库对用户拍摄的图像进行视觉定位时,能够更准确的确定出用户拍摄的图像的特征点的匹配特征点。
结合第一方面,在第一方面的某些实现方式中,上述方法还包括:接收来自用户设备的待处理图像;对待处理图像进行特征提取,以得到待处理图像的特征点和待处理图像的特征点的描述子;将待处理图像的特征点对应的射线与3D模型相交,以确定待处理图像的特征点的3D位置;更新视觉特征库,更新后的视觉特征库包括待处理图像的特征点和待处理图像的特征点的3D位置。
其中,待处理图像的特征点的3D位置为待处理图像的特征点对应的射线与3D模型相交的交点的3D位置,待处理图像的特征点对应的射线是以待处理图像的投影中心为起点,并经过待处理图像的特征点的射线;
可选地,待处理图像与3D模型位于同一坐标系中,待处理图像的投影中心为第二拍摄单元拍摄待处理图像时所处的位置。
上述待处理图像可以是用户设备拍摄的图像。上述待处理图像与3D模型可以位于同一世界坐标系中。
另外,上述第二拍摄单元是拍摄建库图像的拍摄单元,该第二拍摄单元具体可以是摄像头。
本申请中,通过获取来自用户设备的待处理图像,并在确定之后待处理图像的特征点的3D位置后对视觉特征库进行更新,使得更新后的视觉特征库包含的信息的实时性更强。
结合第一方面,在第一方面的某些实现方式中,在更新视觉特征库之前,上述方法还包括:确定待处理图像的语义信息与参照图像的语义信息不同,其中,参照图像是视觉特 征库中与待处理图像的位置最接近的图像。
本申请中,当视觉特征库中的参照图像与待处理图像的语义信息不同时,说明待处理图像对应的物体的图像内容可能发生了变化,此时通过对视觉特征库进行更新,能够图像信息反映的语义信息不够准确的情况下及时对视觉特征库进行更新,提高视觉特征库的实时性。
结合第一方面,在第一方面的某些实现方式中,上述方法还包括:获取建模数据,建模数据包括建模图像和点云数据;对建模图像进行特征提取,以得到建模图像的特征点;对建库图像和建模图像中的任意两张图像的特征点进行特征匹配,对匹配得到的特征点进行串点,以得到同名特征点序列;根据同名特征点序列对建库图像和建模图像进行平差处理,以得到建库图像的位姿和建模图像的位姿;根据建模图像的位姿和点云数据,构建3D模型。
上述匹配得到的特征点是不同的图像中对应真实世界同一地物点的特征点。上述对匹配得到的特征点进行串点具体可以是将建库图像和建模图像中对应真实世界同一地物点的特征点连接起来,以得到由多个特征点连接的序列(同名特征点序列)。
得到同名特征点序列之后,可以根据同名特征点序列以及预先设定好的控制点对建库图像和建模图像中的特征点进行位置校正,使得得到的建库图像的位姿和建模图像的位姿更加准确,便于后续构建更加准确的视觉特征库。
上述建模图像可以是无人机拍摄得到的图像(室外环境下可以采用无人机拍摄得到建模图像),也可以是扫描得到的图像(室内环境下可以采用扫描仪扫描得到建模图像)。上述建模图像是用于建立3D模型的图像。
本申请中,通过对建库图像和建模图像进行平差处理,使得建库图像和建模图像对齐,使得视觉特征库中的建库图像的特征点的3D位置更加准确,便于后续根据该视觉特征库进行更准确的定位。
结合第一方面,在第一方面的某些实现方式中,上述建库图像为全景图像。
当建库图像为全景图像时,建库图像包含的信息更多,能够在构建视觉特征库的过程中从建库图像中提取到更多的特征点。
在对建库图像进行特征提取时,可以先对建库图像进行场景模拟,以得到多种场景下的场景图像,然后对每种场景下的场景图像进行切分处理(当然也可以只对其中的部分场景图像进行切分处理),以得到多张切片图像。
结合第一方面,在第一方面的某些实现方式中,上述对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子,包括:对建库图像进行场景模拟,以得到多种场景下的场景图像;对该多种场景下的场景图像分别进行切分处理,以得到多张切片图像;对多张切片图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子。
其中,上述多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种,在上述多张切片图像中,相邻切片图像的部分图像内容相同。
具体地,假设对建库图像进行场景模拟,以得到三种场景下的场景图像分别为第一场景图像、第二场景图像和第三场景图像,接下来,分别对第一场景图像、第二场景图像和第三场景图像进行切分处理,以得到多张切片图像。假设对每个场景图像进行切分得到8 张切片图像,那么,通过对第一场景图像、第二场景图像和第三场景图像进行切分处理,可以得到24张切片图像,接下来对这24张切片图像进行特征提取,从而得到建库图像的特征点和描述子。
在对建库图像进行特征提取时,也可以先对建库图像进行切分处理,以得到切片图像,然后对每个切片图像进行场景模拟(当然也可以只对其中的部分切片图像进行场景模拟)。
结合第一方面,在第一方面的某些实现方式中,上述对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子,包括:对建库图像进行切分处理,以得到多张切片图像;对所述多张切片图像中的每张切片图像进行场景模拟,以得到多种场景下的场景图像;对多种场景下的场景图像进行特征提取,以得到建库图像的特征点以及建库图像的特征点的描述子。
其中,在上述多张切片图像中,相邻切片图像的部分图像内容相同,上述多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种。
具体地,假设对建库图像进行切分处理,以得到8张切片图像,接下来,再对这8张切片图像进行场景模拟。假设对每个切片图像进行场景模拟,以得到4种场景下的场景图像,那么,对这8张切片图像分别进行场景模拟,可以得到32张场景图像,接下来对这32张场景图像进行特征提取,从而得到建库图像的特征点和描述子。
上述采用多种特征提取算法对建库图像进行特征提取可以是先对建库图像进行切分处理和/或场景模拟,然后对得到的图像进行特征提取,从而得到建库图像的特征点和所述建库图像的特征点的描述子。
结合第一方面,在第一方面的某些实现方式中,上述对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子,包括:对建库图像进行切分处理,以得到多张切片图像;采用多种特征提取算法分别对多张切片图像中的每张切片图像进行特征提取,以得到建库图像的特征点和所述建库图像的特征点的描述子。
其中,在上述多张切片图像中,相邻切片图像的部分图像内容相同。
例如,对建库图像进行切分处理,以得到了12张切片图像,接下来,采用3种特征提取算法分别对12张切片图像中的每种切片图像进行特征提取,从而得到建库图像的特征点和所述建库图像的特征点的描述子。
结合第一方面,在第一方面的某些实现方式中,上述对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子,包括:对建库图像进行场景模拟,生成多种场景下的场景图像;采用多种特征提取算法分别对多种场景下的场景图像进行特征提取,以得到建库图像的特征点以及所述建库图像的特征点的描述子。
其中,上述多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种。
例如,对建库图像进行场景模拟,以得到了4种场景下的场景图像,接下来,采用3种特征提取算法分别对这4种场景下的场景图像进行特征提取,从而得到建库图像的特征点和所述建库图像的特征点的描述子。
结合第一方面,在第一方面的某些实现方式中,上述对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子,包括:对建库图像进行场景模拟,以得到多种场景下的场景图像;对该多种场景下的场景图像分别进行切分处理,以得到多张切片图像;采用多种特征提取算法分别对多张切片图像进行特征提取,以得到建库图像的 特征点和建库图像的特征点的描述子。
其中,上述多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种,在上述多张切片图像中,相邻切片图像的部分图像内容相同。
例如,对建库图像进行场景模拟,以得到三种场景下的场景图像分别为第一场景图像、第二场景图像和第三场景图像,接下来,分别对第一场景图像、第二场景图像和第三场景图像进行切分处理,对每个场景图像进行切分得到8张切片图像,那么,通过对第一场景图像、第二场景图像和第三场景图像进行切分处理,可以得到24张切片图像,接下来再采用3种特征提取算法分别对这24张切片图像进行特征提取,从而得到建库图像的特征点和描述子。
结合第一方面,在第一方面的某些实现方式中,上述对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子,包括:对建库图像进行切分处理,以得到多张切片图像;对所述多张切片图像中的每张切片图像进行场景模拟,以得到多种场景下的场景图像;采用多种特征提取算法分别对多种场景下的场景图像进行特征提取,以得到建库图像的特征点以及建库图像的特征点的描述子。
其中,在上述多张切片图像中,相邻切片图像的部分图像内容相同,上述多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种。
例如,对建库图像进行切分处理,以得到8张切片图像,接下来,再对这8张切片图像中的每个切片图像进行场景模拟,以得到4种场景下的场景图像,共得到32张图像,接下来,再采用3种特征提取算法分别对这32张图像进行特征提取,从而得到建库图像的特征点和描述子。
第二方面,提供了一种视觉定位方法,该方法包括:获取待处理图像;对待处理图像进行特征提取,以得到待处理图像的特征点和待处理图像的特征点的描述子;根据待处理图像的特征点的描述子,从视觉特征库中确定出待处理图像的特征点的匹配特征点;根据匹配特征点的3D位置,确定拍摄单元拍摄待处理图像时的位姿信息。
上述视觉特征库包括建库图像的特征点的描述子和建库图像的特征点的3D位置,视觉特征库满足下列条件中的至少一种:
建库图像的特征点包括多组特征点,该多组特征点中的任意两组特征点的描述子的描述方式不同;
视觉特征库包括建库图像的描述子,建库图像的描述子是由建库图像的特征点的描述子合成得到的;
建库图像的特征点为多种场景图像的特征点,多种场景图像是对建库图像进行场景模拟得到的,多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种;
建库图像的特征点和建库图像的特征点的描述子是对多张切片图像进行特征提取得到的,多张切片图像是对建库图像进行切分处理得到的,其中,在多张切片图像中,相邻切片图像的部分图像内容相同;
视觉特征库包括建库图像的特征点的语义信息和建库图像的特征点的语义信息的置信度。
本申请中的视觉特征库与传统方案的视觉特征库相比包含更多的信息,因此,本申请中能够根据该视觉特征库更好地进行视觉定位,提高视觉定位的效果。
具体地,由于本申请中的视觉特征库中包括更多的信息,因此,根据该视觉特征库对待处理图像进行视觉定位时,能够更精准的确定待处理图像的特征点的匹配特征点,进而能够实现对待处理图像更精准的定位。
上述第二方面中的视觉特征库可以是根据上述第一方面中的视觉特征库的构建方法构建得到的。
上述多组特征点以及多组特征点的描述子可以是根据多种特征提取算法对建库图像进行特征提取得到的。该多种特征提取算法可以是ORB算法、SIFT算法、SuperPoint算法、D2-net算法以及线特征算法中的任意两种或两种以上的算法。
当上述视觉特征库中包含多组特征点时,使得视觉特征库中包含更多的特征点的相关信息,便于后续根据视觉特征库更好地进行视觉定位。
上述建库图像的描述子可以是描述建库图像的整体特征的描述子,上述建库图像的描述子可以是在构建视觉特征库的过程中通过对建库图像的特征点的描述子进行合成得到的,这里的建库图像的特征点可以是指从建库图像中提取到的所有的特征点。
结合第二方面,在第二方面的某些实现方式中,上述建库图像的特征点包括多组特征点,上述根据待处理图像的特征点的描述子,从视觉特征库中确定出待处理图像的特征点的匹配特征点,包括:根据待处理图像的特征点的描述子的描述方式,从多组特征点中确定出目标组特征点;根据待处理图像的特征点的描述子,从目标组特征点中确定出待处理图像的特征点的匹配特征点。
其中，上述目标组特征点是多组特征点中描述子的描述方式与待处理图像的特征点的描述子的描述方式相同的一组特征点。
当建库图像的特征点包括多组特征点时,视觉特征库中包含的特征点的信息更多,通过从多组特征点中选择出与待处理图像的特征点的描述子的描述方式相同的目标特征点,能够在后续从目标特征点中选择出与待处理图像的特征点更匹配的匹配特征点,提高视觉定位的效果。
结合第二方面,在第二方面的某些实现方式中,上述视觉特征库包括建库图像的描述子,上述根据待处理图像的特征点的描述子,从视觉特征库中确定出待处理图像的特征点的匹配特征点,包括:根据待处理图像的描述子从建库图像中确定出N张图像;从上述N张图像的特征点中确定出待处理图像的特征点的匹配特征点。
其中,上述待处理图像的描述子是由待处理图像的特征点的描述子合成得到的,建库图像由N(N为正整数)张图像和M(M为正整数)张图像组成,待处理图像的描述子与上述N张图像中的任意一张图像的描述子的距离小于或者等于待处理图像的描述子与建库图像中剩余的M张图像中的任意一张图像的描述子的距离。
当视觉特征库中包括建库图像的描述子时,能够先根据待处理图像的描述子从视觉特征库中进行粗略的筛选,选择出描述子比较接近的N张图像,然后再从该N张图像的特征点中确定出待处理图像的特征点的匹配特征点,能够加速视觉定位的过程,提高视觉定位的效率。
结合第二方面,在第二方面的某些实现方式中,上述建库图像的特征点为多种场景下的场景图像的特征点,上述根据待处理图像的特征点的描述子,从视觉特征库中确定出待处理图像的特征点的匹配特征点,包括:从多种场景下的场景图像中确定目标场景图像; 根据待处理图像的特征点的描述子,从目标场景图像的特征点中确定出待处理图像的特征点的匹配特征点。
其中,目标场景图像是多种场景下的场景图像中对应的场景与拍摄待处理图像时的场景最接近的场景图像。
当视觉特征库中包含多种场景图像的特征点时,可以先从多种场景图像中确定出与待处理图像拍摄时的场景最接近的目标场景图像,然后再从该目标场景图像中确定待处理图像的特征点的匹配特征点,可以为待处理图像的特征点确定更准确的匹配特征点,进而提高视觉定位的成功率。
结合第二方面,在第二方面的某些实现方式中,上述视觉特征库包括建库图像的特征点的语义信息和建库图像的特征点的语义信息的置信度,上述根据匹配特征点的3D位置,确定拍摄单元拍摄待处理图像时的位姿信息,包括:根据匹配特征点的语义信息的置信度,对匹配特征点的3D位置进行加权处理;根据加权处理结果确定拍摄单元拍摄待处理图像时的位姿信息。
其中,在对匹配特征点的3D位置进行加权处理时,置信度越高的匹配特征点对应的权重越大。
本申请中,当视觉特征库中包含建库图像的特征点的语义信息和所述建库图像的特征点的语义信息的置信度时,能够在进行视觉定位时考虑到不同特征点对应的语义信息和置信度确定不同特征点在进行视觉定位时的重要程度,能够进行更精准的视觉定位,提高视觉定位的准确度。
结合第二方面,在第二方面的某些实现方式中,上述建库图像为全景图像。
可选地,上述第二方面中的视觉特征库是根据上述第一方面中的视觉特征库的构建方法构建得到的。
在建库图像数量一定的情况下,上述第一方面的方法构建得到的视觉特征库包含更多数量的建库图像的特征点的信息,因此,在建库图像数量一定的情况下,采用第一方面的方法构建得到的视觉特征库进行视觉定位能够提高视觉定位的效果。
第三方面,提供了一种视觉特征库的构建装置,该装置包括用于执行上述第一方面及第一方面中的任意一种实现方式中的方法的模块。
第四方面,提供了一种视觉定位装置,该装置包括用于执行上述第二方面及第二方面中的任意一种实现方式中的方法的模块。
第五方面,提供了一种视觉特征库的构建装置,包括存储器和处理器,所述存储器用于存储程序,所述处理器用于执行程序,当所述程序被执行时,所述处理器用于执行上述第一方面及第一方面中的任意一种实现方式中的方法。
在上述处理器执行第一方面及第一方面中的任意一种实现方式中的方法时,处理器可以通过(调用)通信接口来获取建库图像(此时可以通过通信接口从其他装置获取建库图像)或者从存储器中获取建库图像(此时建库图像存储在存储器中),然后通过处理器对建库图像进行一系列处理,最终构建得到视觉特征库。
第六方面,提供了一种视觉定位装置,包括存储器和处理器,所述存储器用于存储程序,所述处理器用于执行程序,当所述程序被执行时,所述处理器用于执行上述第二方面及第二方面中的任意一种实现方式中的方法。
在上述处理器执行第二方面及第二方面中的任意一种实现方式中的方法时,处理器可以通过(调用)摄像头来获取待处理图像或者从存储器中获取待处理图像,然后通过处理器对待处理图像进行一系列处理,最终实现视觉定位。
上述第三方面或者第五方面的视觉特征库的构建装置可以是服务器、云端设备或者具有一定运算能力的计算机设备。
上述第四方面或者第六方面的视觉定位装置具体可以是手机,电脑,个人数字助理,可穿戴设备,车载设备,物联网设备、虚拟现实设备、增强现实设备等等。
第七方面，提供了一种计算机可读存储介质，所述计算机可读存储介质用于存储程序代码，当所述程序代码被计算机执行时，所述计算机用于执行上述第一方面及第一方面中的任意一种实现方式中的方法。
第八方面，提供了一种计算机可读存储介质，所述计算机可读存储介质用于存储程序代码，当所述程序代码被计算机执行时，所述计算机用于执行上述第二方面及第二方面中的任意一种实现方式中的方法。
第九方面,提供了一种芯片,所述芯片包括处理器,所述处理器用于执行上述第一方面及第一方面中的任意一种实现方式中的方法。
上述第九方面的芯片可以位于服务器中，或者位于云端设备中，或者位于具有一定运算能力(运算能力能够满足视觉特征库的构建)的计算机设备中。
第十方面,提供了一种芯片,所述芯片包括处理器,所述处理器用于执行上述第二方面及第二方面中的任意一种实现方式中的方法。
上述第十方面的芯片可以位于终端设备中,该终端设备可以是手机,电脑,个人数字助理,可穿戴设备,车载设备,物联网设备、虚拟现实设备、增强现实设备等等。
第十一方面,提供了一种用于使得计算机或者终端设备执行上述第一方面及第一方面中的任意一种实现方式中的方法的计算机程序(或称计算机程序产品)。
第十二方面,提供了一种用于使得计算机或者终端设备执行上述第二方面及第二方面中的任意一种实现方式中的方法的计算机程序(或称计算机程序产品)。
附图说明
图1是本申请实施例的视觉特征库的构建方法的示意性流程图;
图2是对建库图像进行特征提取的示意图;
图3是确定建库图像的特征点的3D位置的过程的示意图;
图4是对建库图像进行语义分割并得到建库图像的语义信息和置信度的示意图;
图5是获得建库图像的描述子的过程的示意图;
图6是获得建库图像的描述子的过程的示意图;
图7是对建库图像进行场景模拟得到多种场景图像的示意图;
图8是对建库图像进行切分得到切片图像的示意图;
图9是对建库图像进行场景模拟和切分处理的示意图;
图10是对建库图像进行切分处理和场景模拟的示意图;
图11是本申请实施例的视觉定位方法的示意性流程图;
图12为本申请实施例的视觉特征库的构建方法应用在具体的产品形态上的示意图;
图13是本申请实施例的视觉特征库的构建装置的示意性框图;
图14是本申请实施例的视觉定位装置的示意性框图;
图15是本申请实施例的视觉特征库的构建装置的硬件结构示意图;
图16是本申请实施例提供的视觉定位装置的硬件结构示意图;
图17是本申请实施例的终端设备的硬件结构示意图。
具体实施方式
下面将结合附图,对本申请中的技术方案进行描述。
视觉定位是利用终端设备拍摄的图像或视频以及预先建立好的3D地图，通过特征提取、特征匹配和透视N点投影(perspective-n-point,PnP)等一系列算法，来估计出终端设备的拍摄单元所处的位置和姿态。视觉定位可以应用在增强现实、无人驾驶以及智能移动机器人领域。
其中,在增强现实领域,视觉定位具体可以用于3D导航、3D广告投放和虚拟人偶交互等。例如,可以将虚拟的3D导航图标等精确地安置在真实场景的适当位置,以实现精准定位。
在自动驾驶领域,可以通过视觉定位获取车辆的准确位置。在智能移动机器人领域,可以通过视觉定位来实时获得智能移动机器人的位置和姿态,进而控制智能移动机器人的动作。
进行精准的视觉定位的关键在于构建出包含足够精确信息的视觉特征库。下面先结合附图对本申请实施例的视觉特征库的构建方法进行详细介绍，再对视觉定位方法进行介绍。
图1是本申请实施例的视觉特征库的构建方法的示意性流程图。图1所示的方法可以由视觉特征库的构建装置来执行。该视觉特征库的构建装置具体可以是服务器、云端设备或者具有一定运算能力(运算能力能够满足视觉特征库的构建)的计算机设备。
图1所示的方法包括步骤1001至1004,下面分别对这些步骤进行详细的介绍。
1001、获取建库图像;
上述建库图像可以是用于构建视觉特征库的图像,该建库图像既可以是一张图像,也可以是多张图像。当建库图像为多张图像时,本申请实施例中对建库图像的处理过程可以视为对建库图像的任意一张图像的处理。
上述建库图像可以是用相机拍摄得到的，该建库图像可以是全景图像，也可以是非全景图像(例如，广角图像)。上述建库图像还可以称为建库影像。
在上述步骤1001中,当建库图像存储在相机内部时,视觉特征库的构建装置可以通过与相机通信的方式从相机获得建库图像,当建库图像存储在视觉特征库的构建装置内的存储器中时,视觉特征库的构建装置可以直接从存储器中获取建库图像。
1002、对建库图像进行特征提取,以得到所述建库图像的特征点和所述建库图像的特征点的描述子。
应理解,在本申请中,建库图像的特征点可以是多个,通过对建库图像进行特征提取得到的是建库图像的多个特征点,本申请中为了描述方便,统一采用建库图像的特征点这一名称。
在步骤1002中，可以采用特征提取算法对建库图像进行特征提取，以得到建库图像的特征点和建库图像的特征点的描述子。
在上述步骤1002中,可以采用一种或者多种特征提取算法对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子。
其中,上述特征提取算法是用于提取图像中的特征点以及图像的特征点的描述子的算法。在本申请中,可用的特征提取算法可以包括以下几种:
ORB算法;
SIFT算法;
SuperPoint算法;
D2-Net算法;
线特征算法;
上述特征提取算法还可以称为特征提取算子。
在上述步骤1002中,可以采用ORB算法,SIFT算法、SuperPoint算法、D2-net算法以及线特征算法中的一种或者多种对建库图像进行特征提取。
本申请中,当采用多种特征提取算法对建库图像进行特征提取时,能够从建库图像中获取更多种类的特征点和特征点的描述子,使得最终构建得到的视觉特征库中能够包含更多种类的特征点,可以提高后续根据该视觉特征库进行视觉定位的效果。
具体地，当视觉特征库中包含多种类型的特征点和特征点的描述子时，在根据该视觉特征库对待处理图像进行视觉定位时，能够从视觉特征库的多种类型的特征点中更准确地确定出与待处理图像的特征点相匹配的匹配特征点，可以提高视觉定位的效果。
下面结合图2对建库图像的特征提取过程进行说明。
如图2所示,可以采用三种特征提取算法对建库图像进行特征提取,以得到三类特征点和三类特征点的描述子。
上述三种特征提取算法例如可以分别是ORB算法、SIFT算法和SuperPoint算法。
其中，第一类特征点、第二类特征点和第三类特征点可以是分别根据ORB算法、SIFT算法和SuperPoint算法对建库图像进行特征提取后得到的特征点，第一类特征点的描述子、第二类特征点的描述子和第三类特征点的描述子也是根据相应的特征提取算法得到的，每一类特征点的2D坐标可以是根据建库图像直接得到的。
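下面给出一个采用多种特征提取算法对同一张建库图像提取多类特征点及其描述子的示意性代码草图，这里以OpenCV(假设版本不低于4.4)中的ORB与SIFT为例；SuperPoint、D2-Net等基于学习的算法需要加载相应的网络模型，此处从略。该草图仅用于说明思路，并非对实际实现的限定。

```python
import cv2

def extract_multi_type_features(image_path):
    # 读取建库图像(灰度)
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)

    # 第一类特征点：ORB 算法
    orb = cv2.ORB_create(nfeatures=2000)
    kp_orb, desc_orb = orb.detectAndCompute(img, None)

    # 第二类特征点：SIFT 算法
    sift = cv2.SIFT_create(nfeatures=2000)
    kp_sift, desc_sift = sift.detectAndCompute(img, None)

    # 每一类特征点记录其2D坐标、描述子以及描述方式(算法类型)
    features = {
        "ORB":  {"pts": [k.pt for k in kp_orb],  "desc": desc_orb},
        "SIFT": {"pts": [k.pt for k in kp_sift], "desc": desc_sift},
    }
    return features
```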
1003、将建库图像的特征点对应的射线与3D模型相交,以确定(得到)建库图像的特征点的3D位置。
在上述步骤1003中,建库图像的特征点的3D位置是建库图像的特征点对应的射线与3D模型相交的交点的3D位置,建库图像的特征点对应的射线是以建库图像的投影中心为起点并经过所述建库图像的特征点的射线。
上述建库图像与上述3D模型位于同一坐标系中,上述建库图像的投影中心为第一拍摄单元拍摄建库图像时(第一拍摄单元)所处的位置。应理解,这里的第一拍摄单元是拍摄建库图像的拍摄单元。
下面结合图3对确定建库图像的特征点的3D位置的过程进行详细描述。
如图3所示，建库图像的特征点P在建库图像的图像坐标系中的坐标为[x_p y_p]^T，通过坐标变换将特征点P变换到相机坐标系下，以得到特征点P在相机坐标系中的坐标，如公式(1)所示：

[X_c Y_c Z_c]^T = [x_p - x_o, y_p - y_o, f]^T    (1)

其中，[x_o y_o f]为相机内参，具体地，f为相机焦距，(x_o, y_o)为相机主点位置。

接下来，再将特征点P在相机坐标系下的坐标转换到世界坐标系中，以得到特征点P在世界坐标系中的坐标，如公式(2)所示：

[X_w Y_w Z_w]^T = R·[X_c Y_c Z_c]^T + [X_S Y_S Z_S]^T    (2)

其中，R是将特征点P从相机坐标系下转换到世界坐标系的旋转矩阵，该旋转矩阵的参数可以根据相机坐标系与世界坐标系的位置关系来确定，[X_S Y_S Z_S]^T是相机投影中心在世界坐标系下的坐标。

经过上述坐标转换过程，将特征点P转换到了世界坐标系中，此时相当于将建库图像转换到了世界坐标系中，而图3所示的3D模型(图3中位于世界坐标系中的六面体表示3D模型)本身就位于世界坐标系中，因此，建库图像和3D模型均位于世界坐标系中。接下来，以相机坐标系的原点(即投影中心)为起点，构建一条经过特征点P的射线，该射线与3D模型相交的交点的3D位置就是特征点P的3D位置。如图3所示，该交点的位置坐标为[X_A Y_A Z_A]^T，因此，通过射线相交得到特征点P的3D位置为[X_A Y_A Z_A]^T。
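结合公式(1)和公式(2)，下面给出一个把特征点变换为世界坐标系下的射线并与3D模型求交的示意性代码草图。这里假设3D模型为三角网格并借助trimesh库做射线求交，库的选择与各参数均为说明用途的假设。

```python
import numpy as np
import trimesh

def feature_point_3d_position(xp, yp, xo, yo, f, R, C, mesh):
    """xp, yp: 特征点在图像坐标系中的坐标
    xo, yo, f: 相机主点与焦距(相机内参)
    R: 相机坐标系到世界坐标系的旋转矩阵(3x3)
    C: 相机投影中心在世界坐标系下的坐标(长度为3的向量)
    mesh: 世界坐标系下的3D模型(trimesh.Trimesh)"""
    # 公式(1)：特征点在相机坐标系下的坐标
    p_cam = np.array([xp - xo, yp - yo, f], dtype=float)

    # 公式(2)：转换到世界坐标系，得到以投影中心C为起点的射线方向
    direction = R @ p_cam
    direction /= np.linalg.norm(direction)

    # 射线与3D模型求交，取距投影中心最近的交点作为特征点的3D位置
    locations, _, _ = mesh.ray.intersects_location(
        ray_origins=np.asarray(C).reshape(1, 3),
        ray_directions=direction.reshape(1, 3))
    if len(locations) == 0:
        return None  # 射线未命中3D模型
    dists = np.linalg.norm(locations - np.asarray(C), axis=1)
    return locations[np.argmin(dists)]
```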
1004、构建视觉特征库,该视觉特征库包括建库图像的特征点的描述子和建库图像的特征点的3D位置。
本申请中,通过射线与3D模型相交的方式来获取建库图像特征点的3D位置,与传统方案中仅能够获取图像之间匹配上的特征点的3D位置相比,可以在建库图像数量一定的情况下,从建库图像中获取到更多数量的特征点的信息,使得构建得到的视觉特征库包含更多数量的特征点的信息。
由于在建库图像数量一定的情况下，本申请构建得到的视觉特征库包含更多数量的特征点的信息，使得后续利用该视觉特征库进行视觉定位时能够取得更好的视觉定位效果。
此外，由于在建库图像数量一定的情况下，本申请的视觉特征库的构建方法构建得到的视觉特征库包含更多数量的特征点的信息，使得本申请的视觉特征库的构建方法能够适用于辐射差异较大、弱纹理等较难准确进行视觉定位的场景中，在这些场景中采用本申请实施例的视觉特征库的构建方法得到的视觉特征库进行视觉定位能够取得更好的视觉定位效果。
在本申请中，视觉特征库除了包含建库图像的特征点的描述子和建库图像的特征点的3D位置之外，还可以包括以下两种信息。
(1)建库图像的特征点的语义信息和建库图像的特征点的语义信息的置信度;
(2)建库图像的描述子。
具体地，可以在本申请的视觉特征库的构建方法中，生成上述两种信息中的一种或者两种之后，将其写入(保存到)视觉特征库中。
下面结合附图对上述两种信息的生成过程进行详细的描述。
(1)建库图像的特征点的语义信息及置信度。
在本申请中，可以通过步骤A和步骤B来确定建库图像的特征点的语义信息及置信度。
步骤A:对建库图像进行语义分割,以得到建库图像的语义分割结果;
步骤B:根据建库图像的语义分割结果生成建库图像的特征点的语义信息和建库图像的特征点的语义信息的置信度。
其中,步骤A得到的建库图像的语义分割结果包括建库图像的每个区域的语义信息和每个区域的语义信息的置信度。接下来,在步骤B中,可以将建库图像的特征点所在区域的语义信息确定为建库图像的特征点的语义信息,将建库图像的特征点所在区域的语义信息的置信度确定为建库图像的特征点的语义信息的置信度。
如图4所示，通过对建库图像进行语义分割，可以将建库图像划分成6个区域，这6个区域的图像的语义分别是行人、道路、树、建筑物、天空和玻璃。
接下来,可以对建库图像的特征点进行语义识别,在进行语义识别时,具体可以根据建库图像的特征点的2D坐标确定建库图像的特征点所在的图像区域,特征点所在的图像区域的语义信息就是该特征点的语义信息,从而得到建库图像的特征点的信息不仅包括特征点的2D坐标和特征点的描述子,还包括特征点的语义信息和特征点的语义信息的置信度。
例如,对于建库图像中的某个特征点来说,通过该特征点的2D坐标确定该特征点所在的图像区域的语义为道路,那么,就可以确定该特征点的语义也为道路。
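下面给出根据语义分割结果为建库图像的特征点赋予语义信息及其置信度的示意性代码草图，假设语义分割输出与建库图像同尺寸的类别图label_map和置信度图conf_map(具体的语义分割网络不作限定)。

```python
import numpy as np

def attach_semantics(keypoints_2d, label_map, conf_map):
    """keypoints_2d: 特征点的2D坐标列表 [(x, y), ...]
    label_map: H x W 的语义类别图(每个像素一个类别id)
    conf_map:  H x W 的置信度图(每个像素一个0~1的置信度)"""
    results = []
    h, w = label_map.shape
    for x, y in keypoints_2d:
        # 特征点所在图像区域的语义信息就是该特征点的语义信息
        col = int(round(min(max(x, 0), w - 1)))
        row = int(round(min(max(y, 0), h - 1)))
        results.append({
            "xy": (x, y),
            "semantic": int(label_map[row, col]),
            "confidence": float(conf_map[row, col]),
        })
    return results
```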
应理解，上述图4所示的过程是对建库图像直接进行语义分割，以最终得到建库图像的特征点的语义类别(语义信息的具体表现形式)和建库图像的特征点的语义类别的置信度。事实上，在本申请中，还可以先对建库图像进行切分(切分的过程可以如图8所示)，然后对得到的切片图像进行语义分割，以最终得到建库图像的特征点的语义信息和建库图像的特征点的语义信息的置信度。
另外，在本申请中，还可以先对建库图像进行场景模拟，在得到多种场景下的场景图像之后，再对这些场景图像进行语义分割，从而最终得到建库图像的特征点的语义信息和建库图像的特征点的语义信息的置信度。
(2)建库图像的描述子。
在本申请中,可以通过对建库图像的特征点的描述子进行合成,来得到建库图像的描述子。
通过合成建库图像的特征点的描述子,能够得到建库图像的描述子,可以将建库图像的描述子写入到视觉特征库中,使得视觉特征库中包含的信息更加丰富。
如图5所示，采用一种特征提取算法(可以是ORB算法、SIFT算法、SuperPoint算法、D2-net算法以及线特征算法中的一种)对建库图像进行特征提取，以得到特征点的描述子，接下来，对该特征点的描述子进行合成，以得到建库图像的描述子，然后将特征点的描述子和建库图像的描述子都写入到视觉特征库中。
在提取建库图像的特征点时，还可以采用多种特征提取算法对建库图像进行特征提取，以得到多种类型的特征点以及特征点的描述子，接下来，可以对每一类特征点的描述子进行合并得到建库图像的描述子。
如图6所示,采用三种不同的特征提取算法(可以是ORB算法,SIFT算法、SuperPoint算法、D2-net算法以及线特征算法中的任意三种)对建库图像进行特征提取,以得到第一类特征点及其描述子,第二类特征点及其描述子,第三类特征点及其描述子。接下来,可以对第一类特征点的描述子进行合并处理得到建库图像的第一类描述子,对第二类特征点的描述子进行合并处理得到建库图像的第二类描述子,对第三类特征点的描述子进行合并处理得到建库图像的第三类描述子。接下来,可以将特征点的不同类型的描述子以及建库图像的不同类型的描述子写入到视觉特征库中。
另外,在保存建库图像的描述子时,可以将建库图像的描述子保存到视觉特征库中的图像检索库中,便于后续进行视觉定位时进行查找。
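本申请不限定由特征点的描述子合成建库图像的描述子的具体方式。下面给出一种常见做法的示意性代码草图(对各特征点描述子归一化后取均值并再归一化)，仅作为一种可能的实现参考。

```python
import numpy as np

def synthesize_image_descriptor(point_descriptors):
    """point_descriptors: N x D 的矩阵，每行是一个特征点的描述子"""
    desc = np.asarray(point_descriptors, dtype=np.float32)
    # 逐描述子做 L2 归一化
    desc /= (np.linalg.norm(desc, axis=1, keepdims=True) + 1e-12)
    # 求均值得到图像级描述子，再整体归一化
    image_desc = desc.mean(axis=0)
    image_desc /= (np.linalg.norm(image_desc) + 1e-12)
    return image_desc  # 可写入图像检索库，用于后续视觉定位时的粗检索
```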
为了模拟不同场景下的图像，在本申请中，在对建库图像进行特征提取时，可以先对建库图像进行场景模拟，以得到不同场景的场景图像，然后再对不同场景的场景图像进行特征提取，从而获取到更多场景下的图像的特征点和描述子。
可选地,上述步骤1002具体包括:
1002a、对建库图像进行场景模拟，生成多种场景下的场景图像；
1002b、对多种场景下的场景图像进行特征提取，以得到建库图像的特征点以及建库图像的特征点的描述子。
其中,上述步骤1002a中的多种场景可以包括白天、夜晚、雨天、雪天以及阴天中的至少两种。
通过对建库图像进行场景模拟,能够得到不同场景下的场景图像,进而得到从不同场景图像中提取到的信息,使得最终生成的视觉特征库包含的信息更加丰富。
例如,如图7所示,通过对建库图像进行场景模拟,可以得到白天、夜晚、雨天以及雪天场景的场景图像。应理解,图7仅示出了部分场景下的场景图像,在本申请中还可以通过对建库图像进行场景模拟得到其他场景下的场景图像,例如,阴天和多云等场景下的场景图像。
应理解,上述图7所示的过程是对建库图像直接进行场景模拟,以得到多种场景图像。事实上,在本申请中,还可以先对建库图像进行切分(切分的过程可以如图8所示),然后对得到的切片图像进行场景模拟,以得到多种场景图像。
由于全景投影的成像方式与用户拍摄的图像的成像方式不同,当建库图像为全景图像时,可以对建库图像进行切分处理,以得到不同视角的切片图像,从而消除建库图像与用户拍摄的图像的成像方式的差异,使得在根据视觉特征库对用户拍摄的图像进行视觉定位时,能够更准确的确定出用户拍摄的图像的特征点的匹配特征点。
可选地,上述步骤1002具体包括:
1002c、对建库图像进行切分处理,以得到多张切片图像;
1002d、对多张切片图像进行特征提取，以得到建库图像的特征点和建库图像的特征点的描述子。
其中，在步骤1002c得到的多张切片图像中，相邻切片图像的部分图像内容相同。上述步骤1002c中的建库图像具体可以是全景图像或者广角图像。
下面结合图8对图像的切分过程进行描述。
如图8所示,可以对建库图像进行切分处理(也可以称为投影处理),以得到切片图像1至切片图像K,不同的切片图像对应的视角不同(切片图像1至切片图像K对应的视角分别为视角1至视角K),在切片图像1至切片图像K中,切片图像i和切片图像i+1为相邻的切片图像,其中,1≤i<K,K为正整数。
上述K的数值可以根据构建视觉特征库的需求来设定。
在设置K的数值时可以使得用户拍摄的图像与切分得到的切分图像的视角范围比较接近。上述K的数值具体可以是8,12,16等数值。
其中,图8所示的切分过程切分得到的相邻切片图像的部分图像内容相同。例如,切片图像1和切片图像2相邻,切片图像1和切片图像2的部分图像内容相同。
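下面给出按不同视角对全景(等距柱状投影)图像进行切分、并使相邻切片图像的部分图像内容相同的示意性代码草图。为简化说明，这里直接按经度范围裁剪，未做球面到透视平面的重投影；切片数量K和视场角等参数均为假设值。

```python
import numpy as np

def slice_panorama(pano, k=8, fov_deg=60.0):
    """pano: H x W x 3 的等距柱状投影全景图像
    k: 切片数量; fov_deg: 每张切片覆盖的水平视场角
    当 fov_deg > 360/k 时，相邻切片图像的部分图像内容相同"""
    h, w = pano.shape[:2]
    slices = []
    for i in range(k):
        yaw = i * 360.0 / k                     # 视角i对应的偏航角
        half = fov_deg / 2.0
        c0 = int((yaw - half) / 360.0 * w) % w  # 起始列(允许环绕)
        c1 = int((yaw + half) / 360.0 * w) % w  # 结束列
        if c0 < c1:
            slices.append(pano[:, c0:c1])
        else:  # 跨越全景图左右边界时拼接
            slices.append(np.concatenate([pano[:, c0:], pano[:, :c1]], axis=1))
    return slices
```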
应理解,图8所示的切分过程是对建库图像进行直接切分,以得到多个切片图像。事实上,在本申请中,还可以先对建库图像进行场景模拟(进行场景模拟的过程可以如图7所示),然后再对每个场景图像进行切分,以得到多个切片图像。
在本申请中，在对建库图像进行一系列处理得到视觉特征库之后，为了使得视觉特征库中包含的信息的实时性更强，还可以进行众包更新。这里的众包更新是指可以接收来自用户设备的待处理图像，并对待处理图像进行一系列处理，将待处理图像的特征点的描述子以及待处理图像的描述子也写入到视觉特征库中，从而实现对视觉特征库的更新，使得视觉特征库包含更多的信息。
在对建库图像进行特征提取时,可以先对建库图像进行场景模拟,以得到多种场景下的场景图像,然后对每种场景下的场景图像进行切分处理(当然也可以只对其中的部分场景图像进行切分处理),以得到多张切片图像。
具体地,上述步骤1002中对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子,包括:
1002e、对建库图像进行场景模拟,以得到多种场景下的场景图像;对该多种场景下的场景图像分别进行切分处理,以得到多张切片图像;
1002f、对多张切片图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子。
其中,上述多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种,在上述多张切片图像中,相邻切片图像的部分图像内容相同。
假设对建库图像进行场景模拟,以得到三种场景下的场景图像分别为第一场景图像、第二场景图像和第三场景图像,接下来,分别对第一场景图像、第二场景图像和第三场景图像进行切分处理,以得到多张切片图像。假设对每个场景图像进行切分得到8张切片图像,那么,通过对第一场景图像、第二场景图像和第三场景图像进行切分处理,可以得到24张切片图像,接下来对这24张切片图像进行特征提取,从而得到建库图像的特征点和描述子。
例如，如图9所示，可以先对建库图像进行场景模拟，以得到白天、夜晚、雨天和雪天这四种场景下的场景图像，然后再将每种场景图像切分成8个切片图像。例如，对白天场景下的图像进行切分，以得到视角分别为视角1至视角8的8张切片图像。对这四种场景下的图像进行切分处理，最终得到了切片图像1至切片图像32，接下来，可以对这32张切片图像进行特征提取，以得到建库图像的特征点和建库图像的特征点的描述子。
应理解,在图9所示的过程中,在对每种场景下的图像进行切分时,切分得到的切片图像的数量可以不同。例如,在对白天、夜晚、雨天和雪天这四种场景下的场景图像进行切分得到的切片图像的数量分别为8、8、12和12(这里的数量仅为举例,还可以是其他数量)。
在上述步骤1002中,也可以先对建库图像进行切分处理,以得到切片图像,然后对每个切片图像进行场景模拟(当然也可以只对其中的部分切片图像进行场景模拟)。
具体地,上述步骤1002中对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子,包括:
1002r、对建库图像进行切分处理,以得到多张切片图像;
1002s、对多张切片图像中的每张切片图像进行场景模拟,以得到多种场景下的场景图像;
1002t、对多种场景下的场景图像进行特征提取,以得到建库图像的特征点以及建库图像的特征点的描述子。
其中,在上述多张切片图像中,相邻切片图像的部分图像内容相同,上述多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种。
假设对建库图像进行切分处理,以得到8张切片图像,接下来,再对这8张切片图像进行场景模拟。假设对每个切片图像进行场景模拟,以得到4种场景下的场景图像,那么,对这8张切片图像分别进行场景模拟,可以得到32张场景图像,接下来对这32张场景图像进行特征提取,从而得到建库图像的特征点和描述子。
上述采用多种特征提取算法对建库图像进行特征提取可以是先对建库图像进行切分处理和/或场景模拟,然后对得到的图像进行特征提取,从而得到建库图像的特征点和所述建库图像的特征点的描述子。
例如，如图10所示，可以先对建库图像进行切分处理，以得到切片图像1至切片图像8，然后再对这8个切片图像分别进行场景模拟，以得到白天、夜晚和雨天场景下的场景图像。具体地，如图10所示，通过对切片图像1至切片图像8进行场景模拟，以得到了场景图像1至场景图像24，接下来，可以对场景图像1至场景图像24进行特征提取，以得到建库图像的特征点和建库图像的特征点的描述子。
应理解，图10所示的切片图像的数量以及场景均为示例说明，事实上，在对建库图像进行切分处理时还可以得到其他数量的切片图像，并且对这些切片图像进行场景模拟时，也可以得到其他场景下的图像。
在上述步骤1002中,还可以先分别对建库图像进行切分处理或场景模拟,然后再采用多种特征提取算法进行特征提取。
具体地,上述步骤1002中对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子,包括:
1002j、对建库图像进行切分处理,以得到多张切片图像;
1002k、采用多种特征提取算法分别对多张切片图像中的每张切片图像进行特征提取,以得到建库图像的特征点和所述建库图像的特征点的描述子。
其中,在上述多张切片图像中,相邻切片图像的部分图像内容相同。
例如，对建库图像进行切分处理，以得到了12张切片图像，接下来，采用3种特征提取算法分别对12张切片图像中的每张切片图像进行特征提取，从而得到建库图像的特征点和所述建库图像的特征点的描述子。
具体地,上述步骤1002中对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子,包括:
1002g、对建库图像进行场景模拟,生成多种场景下的场景图像;
1002h、采用多种特征提取算法分别对多种场景下的场景图像进行特征提取,以得到建库图像的特征点以及所述建库图像的特征点的描述子。
其中,上述多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种。
例如,对建库图像进行场景模拟,以得到了4种场景下的场景图像,接下来,采用3种特征提取算法分别对这4种场景下的场景图像进行特征提取,从而得到建库图像的特征点和所述建库图像的特征点的描述子。
在上述步骤1002中,还可以先分别对建库图像进行切分处理和场景模拟(先进行切分处理后进行场景模拟,或者先进行场景模拟后进行切分处理),然后再采用多种特征提取算法进行特征提取。
具体地,上述步骤1002中对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子,包括:
1002u、对建库图像进行场景模拟,以得到多种场景下的场景图像;
1002v、对该多种场景下的场景图像分别进行切分处理,以得到多张切片图像;
1002w、采用多种特征提取算法分别对多张切片图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子。
其中,上述多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种,在上述多张切片图像中,相邻切片图像的部分图像内容相同。
例如,对建库图像进行场景模拟,以得到三种场景下的场景图像分别为第一场景图像、第二场景图像和第三场景图像,接下来,分别对第一场景图像、第二场景图像和第三场景图像进行切分处理,对每个场景图像进行切分得到8张切片图像,那么,通过对第一场景图像、第二场景图像和第三场景图像进行切分处理,可以得到24张切片图像,接下来再采用3种特征提取算法分别对这24张切片图像进行特征提取,从而得到建库图像的特征点和描述子。
具体地,上述步骤1002中对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子,包括:
1002x、对建库图像进行切分处理,以得到多张切片图像;
1002y、对多张切片图像中的每张切片图像进行场景模拟,以得到多种场景下的场景图像;
1002z、采用多种特征提取算法分别对多种场景下的场景图像进行特征提取,以得到建库图像的特征点以及建库图像的特征点的描述子。
其中,在上述多张切片图像中,相邻切片图像的部分图像内容相同,上述多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种。
例如,对建库图像进行切分处理,以得到8张切片图像,接下来,再对这8张切片图像中的每个切片图像进行场景模拟,以得到4种场景下的场景图像,共得到32张图像,接下来,再采用3种特征提取算法分别对这32张图像进行特征提取,从而得到建库图像的特征点和描述子。
可选地,图1所示的方法还包括以下步骤:
2001、接收来自用户设备的待处理图像;
2002、对待处理图像进行特征提取,以得到待处理图像的特征点和待处理图像的特征点的描述子;
2003、将待处理图像的特征点对应的射线与3D模型相交,以确定待处理图像的特征点的3D位置;
2004、更新视觉特征库,更新后的视觉特征库包括待处理图像的特征点和待处理图像的特征点的3D位置。
其中,待处理图像的特征点的3D位置为待处理图像的特征点对应的射线与3D模型相交的交点的3D位置,待处理图像的特征点对应的射线是以待处理图像的投影中心为起点,并经过待处理图像的特征点的射线,待处理图像与3D模型位于同一坐标系中,待处理图像的投影中心为第二拍摄单元拍摄待处理图像时所处的位置。
本申请中,通过获取来自用户设备的待处理图像,并在确定之后待处理图像的特征点的3D位置后对视觉特征库进行更新,使得更新后的视觉特征库包含的信息的实时性更强。
上述步骤2001至步骤2003的处理过程与上文中的步骤1001至1003描述的过程相同,这里不再详细描述。
可选地，在执行步骤2004之前，还可以先执行步骤2005和2006：
2005、从视觉特征库中确定出参照图像；
2006、确定待处理图像的语义信息与参照图像的语义信息不同。
上述参照图像是视觉特征库中与待处理图像的位置最接近的图像;
参照图像的位置和待处理图像的位置可以由各自的特征点的3D位置来确定,在从视觉特征库中确定出参照图像时,具体可以将建库图像中的每张图像的特征点的3D位置与待处理图像的3D位置进行比较,从中选择出一个特征点的3D位置与待处理图像的3D位置最接近(特征点的3D位置与待处理图像的3D位置重合最多)的图像作为参照图像。
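下面给出从视觉特征库中确定参照图像的一种示意性代码草图：把每张建库图像的特征点的3D位置与待处理图像的特征点的3D位置做最近邻比较，取重合最多的一张作为参照图像。其中的距离阈值等参数均为说明用途的假设。

```python
import numpy as np
from scipy.spatial import cKDTree

def select_reference_image(query_pts3d, library, dist_thresh=0.2):
    """query_pts3d: 待处理图像特征点的3D位置 (N x 3)
    library: {建库图像id: 该图像特征点的3D位置 (M x 3)}
    返回3D位置与待处理图像重合最多的建库图像id，作为参照图像"""
    tree = cKDTree(np.asarray(query_pts3d))
    best_id, best_overlap = None, -1
    for img_id, pts in library.items():
        d, _ = tree.query(np.asarray(pts), k=1)
        overlap = int((d < dist_thresh).sum())  # 重合的特征点数量
        if overlap > best_overlap:
            best_id, best_overlap = img_id, overlap
    return best_id
```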
本申请中，当视觉特征库中的参照图像与待处理图像的语义信息不同时，说明待处理图像对应的物体的图像内容可能发生了变化，此时通过对视觉特征库进行更新，能够在视觉特征库反映的语义信息不够准确的情况下及时对其进行更新，提高视觉特征库的实时性。
上文结合附图对本申请实施例的视觉特征库的构建方法进行了详细描述,应理解,本申请实施例的视觉特征库的构建方法构建得到的视觉特征库可以用于进行视觉定位。下面结合附图对本申请实施例的视觉定位方法进行详细的介绍。
图11是本申请实施例的视觉定位方法的示意性流程图。图11所示的方法可以采用图1所示的方法构建得到的视觉特征库进行视觉定位。图11所示的方法可以由视觉定位设备来执行，该视觉定位设备具体可以是手机，电脑，个人数字助理，可穿戴设备，车载设备，物联网设备、虚拟现实设备、增强现实设备等等。
图11所示的方法包括步骤3001至步骤3004,下面对步骤3001至步骤3004进行详细的介绍。
3001、获取待处理图像。
上述待处理图像可以是视觉定位设备拍摄的图像,例如,该待处理图像可以是手机拍摄的图像。
3002、对待处理图像进行特征提取,以得到待处理图像的特征点和待处理图像的特征点的描述子。
上述步骤3002的具体实现过程可以参见上文中对步骤1002的描述,为了避免不必要的重复和冗余,这里不再进行详细描述。
3003、根据待处理图像的特征点的描述子,从视觉特征库中确定出待处理图像的特征点的匹配特征点。
具体地,在步骤3003中,可以根据待处理图像的特征点的描述子从视觉特征库中确定出待处理图像的特征点的匹配特征点,该匹配特征点的描述子是视觉特征库中与待处理图像的特征点的描述子最接近的。
3004、根据匹配特征点的3D位置,确定拍摄单元拍摄待处理图像时的位姿信息。
上述视觉特征库包括建库图像的特征点的描述子和建库图像的特征点的3D位置,上述视觉特征库满足下列条件中的至少一种:
建库图像的特征点包括多组特征点,多组特征点中的任意两组特征点的描述子的描述方式不同;
视觉特征库包括建库图像的描述子,建库图像的描述子是由建库图像的特征点的描述子合成得到的;
建库图像的特征点为多种场景图像的特征点,多种场景图像是对建库图像进行场景模拟得到的,多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种;
建库图像的特征点和建库图像的特征点的描述子是对多张切片图像进行特征提取得到的,多张切片图像是对建库图像进行切分处理得到的,其中,在多张切片图像中,相邻切片图像的部分图像内容相同;
视觉特征库包括建库图像的特征点的语义信息和建库图像的特征点的语义信息的置信度。
本申请实施例中的视觉特征库与传统方案的视觉特征库相比,包含更丰富的信息,因此,在根据本申请实施例中的视觉特征库进行视觉定位时,能够得到更好的视觉定位效果,可以使得视觉定位的效果更加准确。
上述步骤3004中根据匹配特征点的3D位置,确定拍摄单元拍摄待处理图像时的位姿信息时,可以先将匹配特征点的3D位置确定为待处理图像的特征点的位置,然后根据待处理图像的特征点的位置来确定拍摄单元拍摄待处理图像时的位姿信息。
应理解,在本申请中,待处理图像的特征点可以是多个,通过待处理图像的多个特征点的3D位置能够推导出拍摄单元拍摄待处理图像时的位姿信息。本申请中为了描述方便,统一采用待处理图像的特征点这一名称。
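下面给出步骤3003和步骤3004的一个示意性代码草图：先按描述子最近邻(配合比值检验)从视觉特征库中确定匹配特征点，再利用匹配特征点的3D位置通过PnP求解拍摄单元拍摄待处理图像时的位姿。这里借助OpenCV的solvePnPRansac，匹配策略与各参数均为说明用途的假设。

```python
import cv2
import numpy as np

def estimate_pose(query_kp2d, query_desc, db_desc, db_pts3d, K):
    """query_kp2d: 待处理图像特征点的2D坐标 (N x 2)
    query_desc:  待处理图像特征点的描述子 (N x D)
    db_desc:     视觉特征库中建库图像特征点的描述子 (M x D)
    db_pts3d:    对应的建库图像特征点的3D位置 (M x 3)
    K:           拍摄单元的内参矩阵 (3 x 3)"""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(np.asarray(query_desc, dtype=np.float32),
                               np.asarray(db_desc, dtype=np.float32), k=2)
    pts2d, pts3d = [], []
    for pair in matches:
        if len(pair) < 2:
            continue
        m, n = pair
        # 比值检验：最近邻明显优于次近邻时才认为找到匹配特征点
        if m.distance < 0.8 * n.distance:
            pts2d.append(query_kp2d[m.queryIdx])
            pts3d.append(db_pts3d[m.trainIdx])
    if len(pts3d) < 4:
        return None  # 匹配特征点过少，无法求解位姿
    # 由2D-3D对应关系求解拍摄待处理图像时的位姿(PnP + RANSAC)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.array(pts3d, dtype=np.float32),
        np.array(pts2d, dtype=np.float32), K, None)
    return (rvec, tvec) if ok else None
```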
当视觉特征库中包含不同的信息时，本申请实施例的视觉定位过程可能会有所不同，下面对视觉特征库中包含不同信息时的视觉定位过程进行详细描述。
情况一:建库图像的特征点包括多组特征点。
在情况一中,上述步骤3003中确定待处理图像的特征点的匹配特征点具体包括:
3003a、根据待处理图像的特征点的描述子的描述方式,从多组特征点中确定出目标组特征点;
3003b、根据待处理图像的特征点的描述子,从目标组特征点中确定出待处理图像的特征点的匹配特征点。
其中,上述待处理图像的特征点的描述子的描述方式与目标组特征点的描述方式相同。上述多组特征点是分别采用不同的特征提取算法对建库图像进行特征提取得到的。
本申请中,当建库图像的特征点包括多组特征点时,视觉特征库中包含的特征点的信息更多,通过从多组特征点中选择出与待处理图像的特征点的描述子的描述方式相同的目标特征点,能够在后续从目标特征点中选择出与待处理图像的特征点更匹配的匹配特征点,提高视觉定位的效果。
情况二:视觉特征库包括建库图像的描述子。
在情况二中,上述步骤3003中确定待处理图像的特征点的匹配特征点具体包括:
3003c、根据待处理图像的描述子从建库图像中确定出N张图像;
3003d、从N张图像的特征点中确定出待处理图像的特征点的匹配特征点。
其中，上述待处理图像的描述子是由待处理图像的特征点的描述子合成得到的，上述待处理图像的描述子与上述N张图像中的任意一张图像的描述子的距离小于或者等于待处理图像的描述子与建库图像中剩余的M张图像中的任意一张图像的描述子的距离，上述建库图像由N张图像和M张图像组成。
本申请中,当视觉特征库中包括建库图像的描述子时,能够先根据待处理图像的描述子从视觉特征库中进行粗略的筛选,选择出描述子比较接近的N张图像,然后再从该N张图像的特征点中确定出待处理图像的特征点的匹配特征点,能够加速视觉定位的过程,提高视觉定位的效率。
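下面给出情况二中根据待处理图像的描述子从图像检索库中粗筛出N张建库图像的示意性代码草图，这里假设以L2距离衡量描述子之间的距离。

```python
import numpy as np

def retrieve_top_n(query_image_desc, db_image_descs, n=10):
    """query_image_desc: 待处理图像的描述子 (D,)
    db_image_descs: 建库图像的描述子矩阵 (K x D)
    返回描述子距离最小的N张建库图像的索引"""
    dists = np.linalg.norm(db_image_descs - query_image_desc, axis=1)
    order = np.argsort(dists)   # 按距离从小到大排序
    return order[:n]            # 后续只在这N张图像的特征点中确定匹配特征点
```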
情况三:建库图像的特征点为多种场景图像的特征点。
在情况三中,上述步骤3003中确定待处理图像的特征点的匹配特征点具体包括:
3003e、从多种场景图像中确定目标场景图像;
3003f、根据待处理图像的特征点的描述子,从目标场景图像的特征点中确定出待处理图像的特征点的匹配特征点。
其中,在步骤3003e所示的多种场景图像中,目标场景图像所对应的场景与拍摄待处理图像时的场景最接近。
本申请中,当视觉特征库中包含多种场景图像的特征点时,可以先从多种场景图像中确定出与待处理图像拍摄时的场景最接近的目标场景图像,然后再从该目标场景图像中确定待处理图像的特征点的匹配特征点,可以为待处理图像的特征点确定更准确的匹配特征点,进而提高视觉定位的成功率。
情况四:视觉特征库包括所述建库图像的特征点的语义信息和所述建库图像的特征点的语义信息的置信度。
在情况四中,上述步骤3004中确定拍摄单元拍摄待处理图像时的位姿信息具体包括:
3004a、根据所述匹配特征点的语义信息的置信度,对所述匹配特征点的3D位置进行加权处理;
3004b、根据加权处理结果确定所述拍摄单元拍摄所述待处理图像时的位姿信息。
其中,在上述步骤3004a进行加权处理的过程中,置信度越高的匹配特征点对应的权重越大。
本申请中,当视觉特征库中包含建库图像的特征点的语义信息和所述建库图像的特征点的语义信息的置信度时,能够在进行视觉定位时考虑到不同特征点对应的语义信息和置信度确定不同特征点在进行视觉定位时的重要程度,能够进行更精准的视觉定位,提高视觉定位的准确度。
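下面给出情况四中利用语义信息的置信度对匹配特征点加权的一种示意性代码草图。由于OpenCV的solvePnP不直接支持逐点权重，这里采用一种简化做法：先用全部匹配特征点求位姿初值，再只用置信度较高的匹配特征点做精化，以近似体现“置信度越高权重越大”。该做法仅用于说明思路，并非本申请方案的限定实现。

```python
import cv2
import numpy as np

def weighted_pose(pts3d, pts2d, confidences, K, conf_thresh=0.7):
    """pts3d/pts2d: 匹配特征点的3D位置与待处理图像中对应的2D坐标
    confidences: 各匹配特征点语义信息的置信度(0~1)"""
    pts3d = np.asarray(pts3d, dtype=np.float32)
    pts2d = np.asarray(pts2d, dtype=np.float32)
    conf = np.asarray(confidences, dtype=np.float32)

    # 第一步：用全部匹配特征点求位姿初值
    ok, rvec, tvec, _ = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    if not ok:
        return None

    # 第二步：只用置信度较高的匹配特征点做精化，近似体现置信度加权
    keep = conf >= conf_thresh
    if keep.sum() >= 4:
        rvec, tvec = cv2.solvePnPRefineLM(pts3d[keep], pts2d[keep], K, None,
                                          rvec, tvec)
    return rvec, tvec
```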
图12为本申请实施例的视觉特征库的构建方法应用在具体的产品形态上的示意图。
如图12所示,通过全景相机进行拍摄,以得到建库图像,通过无人机或者激光扫描仪进行扫描得到建模数据,其中,全景相机、无人机以及激光扫描仪的内部参数已经标定好。另外,在室外场景下可以采用无人机获取建模数据,而在室内场景下可以采用激光扫描仪来获取建模数据。
在得到了建库图像和建模数据之后,可以通过服务器中的各个模块对建库图像和建模数据进行处理,最终得到建库图像的特征点的3D位置和建库图像的描述子,然后将建库图像的描述子保存在图像检索库中,将建库图像的特征点的3D位置保存在3D特征库中。
在对建库图像和建模数据进行处理时,可以利用服务器中的软件模块实现对建库图像和建模数据的处理。具体地,建模数据中包括建模图像和点云数据,可以利用数据对齐模块对建库图像和建模图像进行数据对齐,然后再利用3D模块结合建模数据进行3D建模得到3D模型;可以根据语义识别模块确定建库图像的特征点的语义信息和建库图像的特征点的语义信息的置信度;可以根据场景模拟模块对建库图像进行场景模拟,以得到多种场景下的场景图像;可以采用特征提取模块对建库图像进行特征提取,以得到建库图像的特征点和建库图像的特征点的描述子。可以采用3D位置获取模块确定建库图像的特征点的3D位置。
下面结合具体的测试结果对本申请实施例的视觉特征库的构建方法的效果进行说明。表1分别示出了利用现有方案和本申请方案构建得到的视觉特征库进行视觉定位的成功率。如表1所示，第一列为视觉特征库的构建方案，其中，传统方案是基于从运动中恢复结构(structure from motion,SFM)的视觉特征库的构建方案，本申请方案为本申请实施例的视觉特征库的构建方法，第二列为相应的视觉定位方案，包括ORB定位(采用ORB特征提取算法提取特征的视觉定位方案)和rootSIFT定位(采用rootSIFT特征提取算法提取特征的视觉定位方案)，第三列为视觉定位的成功率。
由表1可知，无论采用ORB定位还是rootSIFT定位，在基于本申请方案得到的视觉特征库进行视觉定位的成功率都要高于基于传统方案得到的视觉特征库进行视觉定位的成功率，其中，在采用ORB定位时，基于本申请方案得到的视觉特征库进行视觉定位的成功率为93%，远远大于基于传统方案得到的视觉特征库进行视觉定位的成功率61%，在采用rootSIFT定位时，基于本申请方案得到的视觉特征库进行视觉定位的成功率为98%，也远远大于基于传统方案得到的视觉特征库进行视觉定位的成功率71%。
表1
视觉特征库构建方案 视觉定位方案 视觉定位成功率
传统方案 ORB定位 61%
本申请方案 ORB定位 93%
传统方案 rootSIFT定位 71%
本申请方案 rootSIFT定位 98%
上文结合附图对本申请实施例的视觉特征库的构建方法和视觉定位方法进行了详细介绍,下面结合附图对本申请实施例的视觉特征库的构建装置和视觉定位装置进行介绍,应理解,下文中介绍的视觉特征库的构建装置能够执行本申请实施例的视觉特征库的构建方法,下文中介绍的视觉定位装置能够执行本申请实施例的视觉定位方法。下面在介绍这两种装置时适当省略重复的描述。
图13是本申请实施例的视觉特征库的构建装置的示意性框图。图13所示的装置5000包括获取单元5001、特征提取单元5002、位置确定单元5003以及构建单元5004。
其中,图13所示的装置5000具体可以用于执行图1所示的方法。具体地,获取单元5001用于执行步骤1001,特征提取单元5002用于执行步骤1002,位置确定单元5003用于执行步骤1003,构建单元5004用于执行步骤1004。
图14是本申请实施例的视觉定位装置的示意性框图。图14所示的装置6000包括获取单元6001、特征提取单元6002、特征匹配单元6003以及视觉定位单元6004。
其中,图14所示的装置6000具体可以用于执行图11所示的方法。具体地,获取单元6001用于执行步骤3001,特征提取单元6002用于执行步骤3002,特征匹配单元6003用于执行步骤3003,视觉定位单元6004用于执行步骤3004。
图15是本申请实施例的视觉特征库的构建装置的硬件结构示意图。
图15所示的视觉特征库的构建装置7000包括存储器7001、处理器7002、通信接口7003以及总线7004。其中,存储器7001、处理器7002、通信接口7003通过总线7004实现彼此之间的通信连接。
上述处理器7002可以通过(调用)通信接口7003来获取建库图像(此时可以通过通信接口从其他装置获取建库图像)或者从存储器7001中获取建库图像(此时建库图像存储在存储器7001中),然后通过处理器7002对建库图像进行一系列处理,最终构建得到视觉特征库。
下面对上述装置7000中的各个模块和单元进行详细介绍。
存储器7001可以是只读存储器(read only memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(random access memory,RAM)。存储器7001可以存储程序,当存储器7001中存储的程序被处理器7002执行时,处理器7002用于执行本申请实施例的视觉特征库的构建方法的各个步骤。
处理器7002可以采用通用的中央处理器(central processing unit,CPU),微处理器,应用专用集成电路(application specific integrated circuit,ASIC),图形处理器(graphics processing unit,GPU)或者一个或多个集成电路,用于执行相关程序,以实现本申请方法实施例的视觉特征库的构建方法。
处理器7002还可以是一种集成电路芯片，具有信号的处理能力。在实现过程中，本申请的视觉特征库的构建方法的各个步骤可以通过处理器7002中的硬件的集成逻辑电路或者软件形式的指令完成。
上述处理器7002还可以是通用处理器、数字信号处理器(digital signal processing,DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中公开的各方法、步骤及逻辑框图。
通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器7001,处理器7002读取存储器7001中的信息,结合其硬件完成本视觉特征库的构建装置中包括的单元所需执行的功能,或者执行本申请方法实施例的视觉特征库的构建方法。
通信接口7003使用例如但不限于收发器一类的收发装置，来实现视觉特征库的构建装置7000与其他设备或通信网络之间的通信。例如，可以通过通信接口7003获取建库图像以及构建视觉特征库过程中需要的建模数据。
总线7004可包括在视觉特征库的构建装置7000各个部件(例如,存储器7001、处理器7002、通信接口7003)之间传送信息的通路。
上述视觉特征库的构建装置5000中的获取单元5001可以相当于视觉特征库的构建装置7000中的通信接口7003,用于获取建库图像。
上述视觉特征库的构建装置5000中的特征提取单元5002、位置确定单元5003以及构建单元5004相当于视觉特征库的构建装置7000中的处理器7002,用于对建库图像进行一系列的处理后最终构建得到视觉特征库。
图16是本申请实施例提供的视觉定位装置的硬件结构示意图。图16所示的视觉定位装置8000包括存储器8001、处理器8002、通信接口8003以及总线8004。其中,存储器8001、处理器8002、通信接口8003通过总线8004实现彼此之间的通信连接。
上述处理器8002可以通过(调用)摄像头(图16中未示出)来获取待处理图像或者从存储器8001中获取待处理图像,然后通过处理器8002对待处理图像进行一系列处理,最终实现视觉定位。
上文中对视觉特征库的构建装置7000中的各个模块的限定和解释同样也适用于视觉定位装置8000,这里不再详细描述。
上述存储器8001可以用于存储程序,处理器8002用于执行存储器8001存储的程序,当存储器8001存储的程序被执行时,处理器8002用于执行本申请实施例的视觉定位方法的各个步骤。
上述视觉定位装置6000中的获取单元6001可以相当于视觉定位装置8000中的通信接口8003,用于获取待处理图像。
上述视觉定位装置6000中的特征提取单元6002、特征匹配单元6003以及视觉定位单元6004相当于视觉定位装置8000中的处理器8002,用于对待处理图像进行一系列的处理后确定拍摄单元拍摄待处理图像时的位姿信息。
上述图13所示的视觉特征库的构建装置5000和图15所示的视觉特征库的构建装置7000具体可以是服务器、云端设备或者具有一定运算能力的计算机设备。
上述图14所示的视觉定位装置6000和图16所示的视觉定位装置8000具体可以是手机,电脑,个人数字助理,可穿戴设备,车载设备,物联网设备、虚拟现实设备、增强现实设备等等。
本申请实施例的视觉定位方法可以由终端设备来执行,下面结合图17对终端设备的结构进行详细的描述。
图17是本申请实施例的终端设备的硬件结构示意图。图17所示的终端设备可以执行本申请实施例的视觉定位方法。
图17所示的终端设备可以执行图11所示的视觉定位方法的各个步骤。具体地，可以通过摄像头3060获取待处理图像(摄像头可以执行上述步骤3001)，接下来再通过处理器对待处理图像进行处理，从而实现视觉定位(处理器可以执行上述步骤3002至3004)。
图17所示的终端设备包括通信模块3010、传感器3020、用户输入模块3030、输出模块3040、处理器3050、摄像头3060、存储器3070以及电源3080。下面分别对这些模块进行详细的介绍。
通信模块3010可以包括至少一个能使该终端设备与其他设备(例如,云端设备)之间进行通信的模块。例如,通信模块3010可以包括有线网络接口、广播接收模块、移动通信模块、无线因特网模块、局域通信模块和位置(或定位)信息模块等其中的一个或多个。
传感器3020可以感知用户的一些操作,传感器3020可以包括距离传感器,触摸传感器等等。传感器3020可以感知用户触摸屏幕或者靠近屏幕等操作。
用户输入模块3030,用于接收输入的数字信息、字符信息或接触式触摸操作/非接触式手势,以及接收与系统的用户设置以及功能控制有关的信号输入等。用户输入模块3030包括触控面板和/或其他输入设备。
输出模块3040包括显示面板,用于显示由用户输入的信息、提供给用户的信息或系统的各种菜单界面等。该输出模块3040可以显示视觉定位结果。
可选的,可以采用液晶显示器(liquid crystal display,LCD)或有机发光二极管(organic light-emitting diode,OLED)等形式来配置显示面板。在其他一些实施例中,触控面板可覆盖显示面板上,形成触摸显示屏。另外,输出模块3040还可以包括音频输出模块、告警器以及触觉模块等。
摄像头3060，用于拍摄图像，摄像头3060拍摄的图像可以送入到处理器中进行视觉定位，处理器通过对摄像头拍摄的图像进行处理(具体处理过程可以如步骤3001至3004所示)，从而得到摄像头3060拍摄图像时的位姿信息。
电源3080可以在处理器3050的控制下接收外部电力和内部电力,并且提供整个终端设备各个模块运行时需要的电力。
处理器3050可以指一个或多个处理器，例如，处理器3050可以包括一个或多个中央处理器，或者包括一个中央处理器和一个图形处理器，或者包括一个应用处理器和一个协处理器(例如微控制单元或神经网络处理器)。当处理器3050包括多个处理器时，这多个处理器可以集成在同一块芯片上，也可以各自为独立的芯片。一个处理器可以包括一个或多个物理核，其中物理核为最小的处理模块。
存储器3070存储计算机程序，该计算机程序包括操作系统程序3071和应用程序3072等。典型的操作系统如微软公司的Windows，苹果公司的MacOS等用于台式机或笔记本的系统，又如谷歌公司开发的基于Linux的安卓(Android)系统等用于移动终端的系统。当本申请实施例的视觉定位方法通过软件的方式实现时，可以认为是通过应用程序3072来具体实现的。
存储器3070可以是以下类型中的一种或多种:闪速(flash)存储器、硬盘类型存储器、微型多媒体卡型存储器、卡式存储器(例如SD或XD存储器)、随机存取存储器(random access memory,RAM)、静态随机存取存储器(static RAM,SRAM)、只读存储器(read only memory,ROM)、电可擦除可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、可编程只读存储器(programmable ROM,PROM)、磁存储器、磁盘或光盘。在其他一些实施例中,存储器3070也可以是因特网上的网络存储设备,系统可以对在因特网上的存储器3070执行更新或读取等操作。
处理器3050用于读取存储器3070中的计算机程序，然后执行计算机程序定义的方法，例如处理器3050读取操作系统程序3071从而在该系统运行操作系统以及实现操作系统的各种功能，或读取一种或多种应用程序3072，从而在该系统上运行应用。
例如，上述存储器3070可以存储一种计算机程序(该计算机程序是本申请实施例的视觉定位方法对应的程序)，当处理器3050执行该计算机程序时，处理器3050能够执行本申请实施例的视觉定位方法。
存储器3070还存储有除计算机程序之外的其他数据3073，例如，存储器3070可以存储本申请的视觉定位方法中涉及的数据，例如待处理图像的特征点的描述子、匹配特征点的3D位置等等。
图17中各个模块的连接关系仅为一种示例,图17中的各个模块还可以是其他的连接关系,例如,终端设备中所有模块通过总线连接。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络 单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (30)

  1. 一种视觉特征库的构建方法,其特征在于,包括:
    获取建库图像;
    对所述建库图像进行特征提取,以得到所述建库图像的特征点和所述建库图像的特征点的描述子;
    将所述建库图像的特征点对应的射线与3D模型相交,以确定所述建库图像的特征点的3D位置,其中,所述建库图像的特征点的3D位置为所述建库图像的特征点对应的射线与所述3D模型相交的交点的3D位置,所述建库图像的特征点对应的射线是以所述建库图像的投影中心为起点并经过所述建库图像的特征点的射线;
    构建视觉特征库,所述视觉特征库包括所述建库图像的特征点的描述子和所述建库图像的特征点的3D位置。
  2. 如权利要求1所述的构建方法,其特征在于,对所述建库图像进行特征提取,以得到所述建库图像的特征点和所述建库图像的特征点的描述子,包括:
    采用多种特征提取算法对所述建库图像进行特征提取,以得到所述建库图像的特征点和所述建库图像的特征点的描述子。
  3. 如权利要求1或2所述的构建方法,其特征在于,所述视觉特征库还包括所述建库图像的特征点的语义信息和所述建库图像的特征点的语义信息的置信度,其中,所述建库图像的特征点的语义信息与所述建库图像的特征点所在区域的语义信息相同,所述建库图像的特征点的语义信息的置信度与所述建库图像的特征点所在区域的语义信息的置信度相同,所述建库图像的每个区域的语义信息和所述每个区域的语义信息的置信度是对所述建库图像进行语义分割得到的。
  4. 如权利要求1-3中任一项所述的构建方法,其特征在于,所述视觉特征库还包括建库图像的描述子,其中,所述建库图像的描述子是由所述建库图像的特征点的描述子合成得到的。
  5. 如权利要求1-4中任一项所述的构建方法,其特征在于,对所述建库图像进行特征提取,以得到所述建库图像的特征点和所述建库图像的特征点的描述子,包括:
    对所述建库图像进行场景模拟,生成多种场景下的场景图像,其中,所述多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种;
    对所述多种场景下的场景图像进行特征提取,以得到所述建库图像的特征点以及所述建库图像的特征点的描述子。
  6. 如权利要求1-5中任一项所述的构建方法,其特征在于,对所述建库图像进行特征提取,以得到所述建库图像的特征点和所述建库图像的特征点的描述子,包括:
    对所述建库图像进行切分处理,以得到多张切片图像,在所述多张切片图像中,相邻切片图像的部分图像内容相同;
    对所述多张切片图像进行特征提取,以得到所述建库图像的特征点和所述建库图像的特征点的描述子。
  7. 如权利要求1-6中任一项所述的构建方法,其特征在于,所述方法还包括:
    接收来自用户设备的待处理图像;
    对所述待处理图像进行特征提取,以得到所述待处理图像的特征点和所述待处理图像的特征点的描述子;
    将所述待处理图像的特征点对应的射线与所述3D模型相交,以确定所述待处理图像的特征点的3D位置,其中,所述待处理图像的特征点的3D位置为所述待处理图像的特征点对应的射线与所述3D模型相交的交点的3D位置,所述待处理图像的特征点对应的射线是以所述待处理图像的投影中心为起点,并经过所述待处理图像的特征点的射线,所述待处理图像与所述3D模型位于同一坐标系中,所述待处理图像的投影中心为第二拍摄单元拍摄所述待处理图像时所处的位置;
    更新所述视觉特征库,所述更新后的视觉特征库包括所述待处理图像的特征点和所述待处理图像的特征点的3D位置。
  8. 如权利要求7所述的构建方法,其特征在于,在更新所述视觉特征库之前,所述方法还包括:
    确定所述待处理图像的语义信息与参照图像的语义信息不同,其中,所述参照图像是所述视觉特征库中与所述待处理图像的位置最接近的图像。
  9. 如权利要求1-8中任一项所述的构建方法,其特征在于,所述方法还包括:
    获取建模数据,所述建模数据包括建模图像和点云数据;
    对所述建模图像进行特征提取,以得到所述建模图像的特征点;
    对所述建库图像和所述建模图像中的任意两张图像的特征点进行特征匹配,对匹配得到的特征点进行串点,以得到同名特征点序列;
    根据所述同名特征点序列对所述建库图像和所述建模图像进行平差处理,以得到所述建库图像的位姿和所述建模图像的位姿;
    根据所述建模图像的位姿和点云数据,构建所述3D模型。
  10. 如权利要求1-9中任一项所述的构建方法,其特征在于,所述建库图像为全景图像。
  11. 一种视觉定位方法,其特征在于,包括:
    获取待处理图像;
    对所述待处理图像进行特征提取,以得到所述待处理图像的特征点和所述待处理图像的特征点的描述子;
    根据所述待处理图像的特征点的描述子,从视觉特征库中确定出所述待处理图像的特征点的匹配特征点,所述视觉特征库包括建库图像的特征点的描述子和所述建库图像的特征点的3D位置,所述视觉特征库满足下列条件中的至少一种:
    所述建库图像的特征点包括多组特征点,所述多组特征点中的任意两组特征点的描述子的描述方式不同;
    所述视觉特征库包括所述建库图像的描述子,所述建库图像的描述子是由所述建库图像的特征点的描述子合成得到的;
    所述建库图像的特征点为多种场景下的场景图像的特征点,所述多种场景下的场景图像是对所述建库图像进行场景模拟得到的,所述多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种;
    所述建库图像的特征点和所述建库图像的特征点的描述子是对多张切片图像进行特征提取得到的,所述多张切片图像是对所述建库图像进行切分处理得到的,在所述多张切片图像中,相邻切片图像的部分图像内容相同;
    所述视觉特征库包括所述建库图像的特征点的语义信息和所述建库图像的特征点的语义信息的置信度;
    根据所述匹配特征点的3D位置,确定拍摄单元拍摄所述待处理图像时的位姿信息。
  12. 如权利要求11所述的视觉定位方法,其特征在于,所述建库图像的特征点包括多组特征点,所述根据所述待处理图像的特征点的描述子,从视觉特征库中确定出所述待处理图像的特征点的匹配特征点,包括:
    根据所述待处理图像的特征点的描述子的描述方式,从所述多组特征点中确定出目标组特征点,所述目标组特征点的描述方式与所述待处理图像的特征点的描述子的描述方式相同;
    根据所述待处理图像的特征点的描述子,从所述目标组特征点中确定出所述待处理图像的特征点的匹配特征点。
  13. 如权利要求11或12所述的视觉定位方法,其特征在于,所述视觉特征库包括所述建库图像的描述子,所述根据所述待处理图像的特征点的描述子,从视觉特征库中确定出所述待处理图像的特征点的匹配特征点,包括:
    根据所述待处理图像的描述子从所述建库图像中确定出N张图像,其中,所述待处理图像的描述子由所述待处理图像的特征点的描述子合成得到的,所述待处理图像的描述子与所述N张图像中的任意一张图像的描述子的距离小于或者等于所述待处理图像的描述子与所述建库图像中剩余的M张图像中的任意一张图像的描述子的距离,所述建库图像由N张图像和M张图像组成;
    从N张图像的特征点中确定出待处理图像的特征点的匹配特征点。
  14. 如权利要求11-13中任一项所述的视觉定位方法，其特征在于，所述建库图像的特征点为多种场景下的场景图像的特征点，所述根据所述待处理图像的特征点的描述子，从视觉特征库中确定出所述待处理图像的特征点的匹配特征点，包括：
    从所述多种场景下的场景图像中确定目标场景图像,其中,在所述多种场景下的场景图像中,所述目标场景图像对应的场景与拍摄所述待处理图像时的场景最接近;
    根据所述待处理图像的特征点的描述子,从所述目标场景图像的特征点中确定出所述待处理图像的特征点的匹配特征点。
  15. 如权利要求11-14中任一项所述的视觉定位方法,其特征在于,所述视觉特征库包括所述建库图像的特征点的语义信息和所述建库图像的特征点的语义信息的置信度,所述根据所述匹配特征点的3D位置,确定拍摄单元拍摄所述待处理图像时的位姿信息,包括:
    根据所述匹配特征点的语义信息的置信度,对所述匹配特征点的3D位置进行加权处理,并根据加权处理结果确定所述拍摄单元拍摄所述待处理图像时的位姿信息,其中,置信度越高的所述匹配特征点对应的权重越大。
  16. 一种视觉特征库的构建装置,其特征在于,包括:
    获取单元,用于获取建库图像;
    特征提取单元,用于对所述建库图像进行特征提取,以得到所述建库图像的特征点和所述建库图像的特征点的描述子;
    位置确定单元,将所述建库图像的特征点对应的射线与3D模型相交,以确定所述建库图像的特征点的3D位置,其中,所述建库图像的特征点的3D位置为所述建库图像的特征点对应的射线与所述3D模型相交的交点的3D位置,所述建库图像的特征点对应的射线是以所述建库图像的投影中心为起点并经过所述建库图像的特征点的射线;
    构建单元,用于构建视觉特征库,所述视觉特征库包括所述建库图像的特征点的描述子和所述建库图像的特征点的3D位置。
  17. 如权利要求16所述的构建装置,其特征在于,所述特征提取单元用于:
    采用多种特征提取算法对所述建库图像进行特征提取,以得到所述建库图像的特征点和所述建库图像的特征点的描述子。
  18. 如权利要求16或17所述的构建装置,其特征在于,所述特征提取单元用于:
    对所述建库图像进行场景模拟,生成多种场景下的场景图像,其中,所述多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种;
    对所述多种场景下的场景图像进行特征提取,以得到所述建库图像的特征点以及所述建库图像的特征点的描述子。
  19. 如权利要求16-18中任一项所述的构建装置,其特征在于,所述特征提取单元用于:
    对所述建库图像进行切分处理,以得到多张切片图像,在所述多张切片图像中,相邻切片图像的部分图像内容相同;
    对所述多张切片图像进行特征提取,以得到所述建库图像的特征点和所述建库图像的特征点的描述子。
  20. 如权利要求16-19中任一项所述的构建装置,其特征在于,所述获取单元还用于:
    接收来自用户设备的待处理图像;
    所述特征提取单元用于对所述待处理图像进行特征提取,以得到所述待处理图像的特征点和所述待处理图像的特征点的描述子;
    所述位置确定单元用于将所述待处理图像的特征点对应的射线与所述3D模型相交,以确定所述待处理图像的特征点的3D位置,其中,所述待处理图像的特征点的3D位置为所述待处理图像的特征点对应的射线与所述3D模型相交的交点的3D位置,所述待处理图像的特征点对应的射线是以所述待处理图像的投影中心为起点,并经过所述待处理图像的特征点的射线,所述待处理图像与所述3D模型位于同一坐标系中,所述待处理图像的投影中心为第二拍摄单元拍摄所述待处理图像时所处的位置;
    所述构建单元用于更新所述视觉特征库,所述更新后的视觉特征库包括所述待处理图像的特征点和所述待处理图像的特征点的3D位置。
  21. 如权利要求20所述的构建装置,其特征在于,所述构建单元还用于:
    在更新所述视觉特征库之前,确定所述待处理图像的语义信息与参照图像的语义信息不同,其中,所述参照图像是所述视觉特征库中与所述待处理图像的位置最接近的图像。
  22. 如权利要求16-21中任一项所述的构建装置,其特征在于,所述获取单元还用于:
    获取建模数据,所述建模数据包括建模图像和点云数据;
    所述特征提取单元用于:
    对所述建模图像进行特征提取,以得到所述建模图像的特征点;
    对所述建库图像和所述建模图像中的任意两张图像的特征点进行特征匹配,对匹配得到的特征点进行串点,以得到同名特征点序列;
    根据所述同名特征点序列对所述建库图像和所述建模图像进行平差处理,以得到所述建库图像的位姿和所述建模图像的位姿;
    所述构建单元用于根据所述建模图像的位姿和点云数据,构建所述3D模型。
  23. 一种视觉定位装置,其特征在于,包括:
    获取单元,用于获取待处理图像;
    特征提取单元,用于对所述待处理图像进行特征提取,以得到所述待处理图像的特征点和所述待处理图像的特征点的描述子;
    特征匹配单元,用于根据所述待处理图像的特征点的描述子,从视觉特征库中确定出所述待处理图像的特征点的匹配特征点,所述视觉特征库包括建库图像的特征点的描述子和所述建库图像的特征点的3D位置,所述视觉特征库满足下列条件中的至少一种:
    所述建库图像的特征点包括多组特征点,所述多组特征点中的任意两组特征点的描述子的描述方式不同;
    所述视觉特征库包括所述建库图像的描述子,所述建库图像的描述子是由所述建库图像的特征点的描述子合成得到的;
    所述建库图像的特征点为多种场景下的场景图像的特征点,所述多种场景下的场景图像是对所述建库图像进行场景模拟得到的,所述多种场景包括白天、夜晚、雨天、雪天以及阴天中的至少两种;
    所述建库图像的特征点和所述建库图像的特征点的描述子是对多张切片图像进行特征提取得到的,所述多张切片图像是对所述建库图像进行切分处理得到的,在所述多张切片图像中,相邻切片图像的部分图像内容相同;
    所述视觉特征库包括所述建库图像的特征点的语义信息和所述建库图像的特征点的语义信息的置信度;
    视觉定位单元,用于根据所述匹配特征点的3D位置,确定拍摄单元拍摄所述待处理图像时的位姿信息。
  24. 如权利要求23所述的视觉定位装置,其特征在于,所述建库图像的特征点包括多组特征点,所述特征匹配单元用于:
    根据所述待处理图像的特征点的描述子的描述方式,从所述多组特征点中确定出目标组特征点,所述目标组特征点的描述方式与所述待处理图像的特征点的描述子的描述方式相同;
    根据所述待处理图像的特征点的描述子,从所述目标组特征点中确定出所述待处理图像的特征点的匹配特征点。
  25. 如权利要求23或24所述的视觉定位装置,其特征在于,所述视觉特征库包括所述建库图像的描述子,所述特征匹配单元用于:
    根据所述待处理图像的描述子从所述建库图像中确定出N张图像，其中，所述待处理图像的描述子由所述待处理图像的特征点的描述子合成得到的，所述待处理图像的描述子与所述N张图像中的任意一张图像的描述子的距离小于或者等于所述待处理图像的描述子与所述建库图像中剩余的M张图像中的任意一张图像的描述子的距离，所述建库图像由N张图像和M张图像组成；
    从N张图像的特征点中确定出待处理图像的特征点的匹配特征点。
  26. 如权利要求23-25中任一项所述的视觉定位装置，其特征在于，所述建库图像的特征点为多种场景下的场景图像的特征点，所述特征匹配单元用于：
    从所述多种场景下的场景图像中确定目标场景图像,其中,在所述多种场景下的场景图像中,所述目标场景图像对应的场景与拍摄所述待处理图像时的场景最接近;
    根据所述待处理图像的特征点的描述子,从所述目标场景图像的特征点中确定出所述待处理图像的特征点的匹配特征点。
  27. 如权利要求23-26中任一项所述的视觉定位装置,其特征在于,所述视觉特征库包括所述建库图像的特征点的语义信息和所述建库图像的特征点的语义信息的置信度,所述视觉定位单元用于:
    根据所述匹配特征点的语义信息的置信度,对所述匹配特征点的3D位置进行加权处理,并根据加权处理结果确定所述拍摄单元拍摄所述待处理图像时的位姿信息,其中,置信度越高的所述匹配特征点对应的权重越大。
  28. 一种视觉特征库的构建装置,其特征在于,包括:
    存储器,用于存储程序;
    处理器,用于执行所述存储器存储的程序,当所述存储器存储的程序被所述处理器执行时,所述处理器执行如权利要求1-10中任一项所述的方法。
  29. 一种视觉定位装置,其特征在于,包括:
    存储器,用于存储程序;
    处理器,用于执行所述存储器存储的程序,当所述存储器存储的程序被所述处理器执行时,所述处理器执行如权利要求11-15中任一项所述的方法。
  30. 一种计算机可读存储介质,其特征在于,所述计算机可读介质存储用于设备执行的程序代码,该程序代码包括用于执行如权利要求1-10中任一项所述的构建方法或者权利要求11-15中任一项所述的视觉定位方法。
PCT/CN2020/107597 2019-08-09 2020-08-07 视觉特征库的构建方法、视觉定位方法、装置和存储介质 WO2021027692A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/665,793 US20220156968A1 (en) 2019-08-09 2022-02-07 Visual feature database construction method, visual positioning method and apparatus, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910736102.2A CN112348885A (zh) 2019-08-09 2019-08-09 视觉特征库的构建方法、视觉定位方法、装置和存储介质
CN201910736102.2 2019-08-09

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/665,793 Continuation US20220156968A1 (en) 2019-08-09 2022-02-07 Visual feature database construction method, visual positioning method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2021027692A1 true WO2021027692A1 (zh) 2021-02-18

Family

ID=74367059

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/107597 WO2021027692A1 (zh) 2019-08-09 2020-08-07 视觉特征库的构建方法、视觉定位方法、装置和存储介质

Country Status (3)

Country Link
US (1) US20220156968A1 (zh)
CN (1) CN112348885A (zh)
WO (1) WO2021027692A1 (zh)

Cited By (1)

Publication number Priority date Publication date Assignee Title
WO2022247126A1 (zh) * 2021-05-24 2022-12-01 浙江商汤科技开发有限公司 视觉定位方法、装置、设备、介质及程序

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
CN113987228A (zh) * 2018-06-20 2022-01-28 华为技术有限公司 一种数据库构建方法、一种定位方法及其相关设备
WO2022252337A1 (zh) * 2021-06-04 2022-12-08 华为技术有限公司 3d地图的编解码方法及装置
CN115937722A (zh) * 2021-09-30 2023-04-07 华为技术有限公司 一种设备定位方法、设备及系统
CN114266830B (zh) * 2021-12-28 2022-07-15 北京建筑大学 地下大空间高精度定位方法
CN117710467B (zh) * 2024-02-06 2024-05-28 天津云圣智能科技有限责任公司 无人机定位方法、设备及飞行器

Citations (9)

Publication number Priority date Publication date Assignee Title
US20080247635A1 (en) * 2006-03-20 2008-10-09 Siemens Power Generation, Inc. Method of Coalescing Information About Inspected Objects
CN102075686A (zh) * 2011-02-10 2011-05-25 北京航空航天大学 一种鲁棒的实时在线摄像机跟踪方法
CN102867057A (zh) * 2012-09-17 2013-01-09 北京航空航天大学 一种基于视觉定位的虚拟向导构建方法
CN103954970A (zh) * 2014-05-08 2014-07-30 天津市勘察院 一种地形要素采集方法
CN105225240A (zh) * 2015-09-25 2016-01-06 哈尔滨工业大学 一种基于视觉特征匹配与拍摄角度估计的室内定位方法
CN105389569A (zh) * 2015-11-17 2016-03-09 北京工业大学 一种人体姿态估计方法
CN106447585A (zh) * 2016-09-21 2017-02-22 武汉大学 城市地区和室内高精度视觉定位系统及方法
CN107742311A (zh) * 2017-09-29 2018-02-27 北京易达图灵科技有限公司 一种视觉定位的方法及装置
CN108109164A (zh) * 2017-12-08 2018-06-01 联想(北京)有限公司 一种信息处理方法及电子设备

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN107223269B (zh) * 2016-12-29 2021-09-28 达闼机器人有限公司 三维场景定位方法和装置
CN111373393B (zh) * 2017-11-24 2022-05-31 华为技术有限公司 图像检索方法和装置以及图像库的生成方法和装置
CN109141433A (zh) * 2018-09-20 2019-01-04 江阴市雷奥机器人技术有限公司 一种机器人室内定位系统及定位方法

Patent Citations (9)

Publication number Priority date Publication date Assignee Title
US20080247635A1 (en) * 2006-03-20 2008-10-09 Siemens Power Generation, Inc. Method of Coalescing Information About Inspected Objects
CN102075686A (zh) * 2011-02-10 2011-05-25 北京航空航天大学 一种鲁棒的实时在线摄像机跟踪方法
CN102867057A (zh) * 2012-09-17 2013-01-09 北京航空航天大学 一种基于视觉定位的虚拟向导构建方法
CN103954970A (zh) * 2014-05-08 2014-07-30 天津市勘察院 一种地形要素采集方法
CN105225240A (zh) * 2015-09-25 2016-01-06 哈尔滨工业大学 一种基于视觉特征匹配与拍摄角度估计的室内定位方法
CN105389569A (zh) * 2015-11-17 2016-03-09 北京工业大学 一种人体姿态估计方法
CN106447585A (zh) * 2016-09-21 2017-02-22 武汉大学 城市地区和室内高精度视觉定位系统及方法
CN107742311A (zh) * 2017-09-29 2018-02-27 北京易达图灵科技有限公司 一种视觉定位的方法及装置
CN108109164A (zh) * 2017-12-08 2018-06-01 联想(北京)有限公司 一种信息处理方法及电子设备

Cited By (1)

Publication number Priority date Publication date Assignee Title
WO2022247126A1 (zh) * 2021-05-24 2022-12-01 浙江商汤科技开发有限公司 视觉定位方法、装置、设备、介质及程序

Also Published As

Publication number Publication date
US20220156968A1 (en) 2022-05-19
CN112348885A (zh) 2021-02-09

Similar Documents

Publication Publication Date Title
WO2021027692A1 (zh) 视觉特征库的构建方法、视觉定位方法、装置和存储介质
WO2020119527A1 (zh) 人体动作识别方法、装置、终端设备及存储介质
US8442307B1 (en) Appearance augmented 3-D point clouds for trajectory and camera localization
CN109960742B (zh) 局部信息的搜索方法及装置
CN112085840B (zh) 语义分割方法、装置、设备及计算机可读存储介质
CN111046125A (zh) 一种视觉定位方法、系统及计算机可读存储介质
CN113989450B (zh) 图像处理方法、装置、电子设备和介质
US11830103B2 (en) Method, apparatus, and computer program product for training a signature encoding module and a query processing module using augmented data
WO2022179581A1 (zh) 一种图像处理方法及相关设备
WO2023280038A1 (zh) 一种三维实景模型的构建方法及相关装置
WO2021093679A1 (zh) 视觉定位方法和装置
CN108711144A (zh) 增强现实方法及装置
EP4307219A1 (en) Three-dimensional target detection method and apparatus
WO2022237821A1 (zh) 生成交通标志线地图的方法、设备和存储介质
CN114565916A (zh) 目标检测模型训练方法、目标检测方法以及电子设备
CN113378605B (zh) 多源信息融合方法及装置、电子设备和存储介质
CN113378756A (zh) 一种三维人体语义分割方法、终端设备及存储介质
CN111179309A (zh) 一种跟踪方法及设备
WO2021179751A1 (zh) 图像处理方法和系统
WO2024093641A1 (zh) 多模态融合的高精地图要素识别方法、装置、设备及介质
CN116858215B (zh) 一种ar导航地图生成方法及装置
CN112288878B (zh) 增强现实预览方法及预览装置、电子设备及存储介质
CN115578432B (zh) 图像处理方法、装置、电子设备及存储介质
US20230169680A1 (en) Beijing baidu netcom science technology co., ltd.
CN113379748A (zh) 一种点云全景分割方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20852507

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20852507

Country of ref document: EP

Kind code of ref document: A1