CN111859003B - Visual positioning method and device, electronic equipment and storage medium

Visual positioning method and device, electronic equipment and storage medium

Info

Publication number
CN111859003B
Authority
CN
China
Prior art keywords: database, feature, query image, feature points
Prior art date
Legal status
Active
Application number
CN202010710996.0A
Other languages
Chinese (zh)
Other versions
CN111859003A (en)
Inventor
金诚 (Jin Cheng)
冯友计 (Feng Youji)
章国锋 (Zhang Guofeng)
Current Assignee
Zhejiang Shangtang Technology Development Co Ltd
Original Assignee
Zhejiang Shangtang Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Shangtang Technology Development Co Ltd
Priority to CN202010710996.0A
Publication of CN111859003A
Priority to PCT/CN2020/139166 (published as WO2022016803A1)
Priority to TW110116124A (published as TW202205206A)
Application granted
Publication of CN111859003B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The disclosure relates to a visual positioning method and apparatus, an electronic device, and a storage medium. The method includes: extracting feature vectors of feature points of a query image; searching for database feature points that match the feature points of the query image according to those feature vectors, where the database feature points represent feature points of database images; and determining a visual positioning result of the query image according to the matched database feature points.

Description

Visual positioning method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a visual positioning method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of information technology, positioning technology is becoming increasingly important in people's lives. Conventional positioning technologies mainly include positioning based on GPS (Global Positioning System), positioning based on wireless local area networks or Bluetooth, positioning based on ultra-wideband, and the like. These conventional technologies all have certain limitations. GPS signals have poor penetration, so effective and accurate positioning is difficult in densely built-up or indoor environments; even in open scenes, high-precision positioning requires costly professional GPS equipment, which makes consumer-grade applications difficult. Positioning based on wireless local area networks or Bluetooth requires related equipment to be deployed in the target area in advance; the deployment process is cumbersome, the reliability and precision are poor, and the positioning range is small. Ultra-wideband positioning can achieve relatively high precision, but it requires at least three receivers and a clear line of sight between the transmitter and the receivers, which limits its application scenarios; for large scenes, the number of receivers must be multiplied, and system reliability suffers. In addition, all of the above positioning technologies can only obtain position information, and it is difficult for them to obtain reliable attitude information.
In recent years, as visual positioning technology has matured, vision-based positioning methods have been increasingly applied. Compared with traditional positioning technologies, visual positioning acquires information in a simple way and does not require major changes to the scene being positioned. Moreover, visual positioning can recover not only position information but also attitude information, so the positioning result can serve not only conventional position queries but also more intelligent applications such as augmented reality. How to improve the speed of visual positioning is a technical problem that urgently needs to be solved.
Disclosure of Invention
The present disclosure provides a visual positioning technical solution.
According to an aspect of the present disclosure, there is provided a visual positioning method, including:
extracting a feature vector of a feature point of the query image;
searching for database feature points that match the feature points of the query image according to the feature vectors of the feature points of the query image, where the database feature points represent feature points of database images;
and determining a visual positioning result of the query image according to the matched database feature points.
By extracting the feature vectors of the feature points of the query image, searching for the database feature points that match those feature points according to the feature vectors, and determining the visual positioning result of the query image according to the matched database feature points, visual positioning no longer needs a retrieval step to obtain a local map: feature point matching is performed directly, and the visual positioning result is determined from the database feature points matched with the feature points of the query image. The positioning process is therefore more direct and effective, consumes less memory, takes less time, and is more reliable.
In a possible implementation manner, the extracting a feature vector of a feature point of a query image includes:
performing transformation processing on the query image to obtain at least one transformed image corresponding to the query image;
and performing feature extraction on at least two images among the query image and the at least one transformed image to obtain feature vectors of the feature points of the query image.
In this implementation, the query image is transformed, and feature extraction is performed using at least two images among the query image and the at least one transformed image. The resulting feature vectors of the feature points of the query image therefore reflect richer and more comprehensive information in the query image, and are more robust to environmental changes such as illumination, which helps improve the accuracy of visual positioning.
In a possible implementation manner, the performing feature extraction on at least two images of the query image and the at least one transformed image to obtain a feature vector of a feature point of the query image includes:
inputting at least two images among the query image and the at least one transformed image into a first neural network, and outputting feature maps of the at least two images via the first neural network;
performing grouped convolution on the feature maps of the at least two images to obtain at least two grouped convolution results;
and performing feature fusion on the at least two grouped convolution results to obtain feature vectors of the feature points of the query image.
In this implementation, performing grouped convolution on the feature maps of the at least two images yields deep features of the query image, which improves the robustness of subsequent feature point matching and thus the reliability of visual positioning.
In a possible implementation manner, the searching, according to the feature vector of the feature point of the query image, for the database feature point matching the feature point of the query image includes:
decomposing the feature vectors of the feature points of the query image to obtain a plurality of sub-feature vectors of the feature points of the query image, wherein the dimension of the sub-feature vector of the feature points of the query image is smaller than the dimension of the feature vector of the feature points of the query image;
searching a database class center matched with a plurality of sub-feature vectors of the feature points of the query image, wherein the database class center represents the class center of the sub-feature vectors of the feature points of the database;
and determining a first group of database feature points matched with the feature points of the query image according to the database class center matched with the plurality of sub-feature vectors of the feature points of the query image.
According to the implementation mode, the feature vectors of the feature points of the query image are decomposed into the sub-feature vectors with lower dimensions and then are matched, so that the speed of determining the database feature points matched with the feature points of the query image can be improved.
In one possible implementation, before the searching for the database class center matching with the plurality of sub-feature vectors of the feature point of the query image, the method further includes:
extracting feature vectors of a plurality of database feature points;
for any database feature point in the plurality of database feature points, decomposing the feature vector of the database feature point to obtain a plurality of sub-feature vectors of the database feature point, wherein the dimension of the sub-feature vector of the database feature point is smaller than that of the feature vector of the database feature point;
clustering the sub-feature vectors of the plurality of database feature points to obtain a database class center;
and establishing a corresponding relation between the database feature points and a database class center for any one of the database feature points.
In the implementation mode, only the database class center and the corresponding relation between the database feature points and the database class center need to be stored, and feature vectors (high-dimensional vectors) of the database feature points do not need to be stored, so that the storage space can be saved, the calculation memory can be saved, and the searching speed can be increased.
In one possible implementation manner, the determining, according to a database class center matched with a plurality of sub-feature vectors of feature points of the query image, a first set of database feature points matched with the feature points of the query image includes:
determining candidate database feature points corresponding to the feature points of the query image according to the database class center matched with the plurality of sub-feature vectors of the feature points of the query image;
and performing geometric verification on the candidate database feature points, and determining a first group of database feature points matched with the feature points of the query image.
In the implementation mode, candidate database feature points corresponding to the feature points of the query image are determined according to a database class center matched with a plurality of sub-feature vectors of the feature points of the query image, geometric verification is performed on the candidate database feature points, and a first group of database feature points matched with the feature points of the query image are determined, so that the first group of database feature points matched with the feature points of the query image can be determined quickly and accurately.
In one possible implementation, the geometrically validating the candidate database feature points and determining a first set of database feature points matching the feature points of the query image includes:
determining a similarity transformation matrix between the candidate database feature points and the corresponding feature points of the query image;
determining, among a plurality of preset matrix intervals, the matrix interval to which each similarity transformation matrix belongs;
determining, as target matrix intervals, those matrix intervals in which the number of similarity transformation matrices meets a first number condition;
and determining a first group of database feature points matched with the feature points of the query image according to the candidate database feature points corresponding to the similarity transformation matrix in the target matrix interval.
This implementation performs geometric verification by matrix-interval voting, so that the feature points matching the feature points of the query image can be determined quickly, which improves the speed of visual positioning.
In a possible implementation manner, the determining, according to candidate database feature points corresponding to a similarity transformation matrix in the target matrix interval, a first group of database feature points that match feature points of the query image includes:
determining the database images to which the candidate database feature points corresponding to the similarity transformation matrices in the target matrix interval belong;
and determining the first group of database feature points from the candidate database feature points in those database images whose number of candidate database feature points meets a second number condition.
In this implementation, the candidate database feature points are filtered to obtain the first group of database feature points, so that the visual positioning result of the query image is determined based on the first group of database feature points, which helps improve the accuracy of the determined visual positioning result.
In a possible implementation manner, the searching, according to the feature vector of the feature point of the query image, for the database feature point matched with the feature point of the query image further includes:
determining three-dimensional coordinates corresponding to the first group of database feature points;
determining a second group of database feature points corresponding to the three-dimensional coordinates;
and determining a visual positioning result of the query image according to the first group of database feature points and the second group of database feature points.
The implementation mode can increase the number of the associated point pairs through reverse search, thereby improving the robustness of visual positioning.
According to an aspect of the present disclosure, there is provided a visual positioning apparatus including:
the first extraction module is used for extracting a feature vector of a feature point of the query image;
the searching module is used for searching for database feature points that match the feature points of the query image according to the feature vectors of the feature points of the query image, where the database feature points represent feature points of database images;
and the determining module is used for determining the visual positioning result of the query image according to the matched database feature points.
In one possible implementation manner, the first extraction module is configured to:
performing transformation processing on the query image to obtain at least one transformed image corresponding to the query image;
and performing feature extraction on at least two images among the query image and the at least one transformed image to obtain feature vectors of the feature points of the query image.
In one possible implementation manner, the first extraction module is configured to:
inputting at least two images among the query image and the at least one transformed image into a first neural network, and outputting feature maps of the at least two images via the first neural network;
performing grouped convolution on the feature maps of the at least two images to obtain at least two grouped convolution results;
and performing feature fusion on the at least two grouped convolution results to obtain feature vectors of the feature points of the query image.
In one possible implementation manner, the search module is configured to:
decomposing the feature vectors of the feature points of the query image to obtain a plurality of sub-feature vectors of the feature points of the query image, wherein the dimension of the sub-feature vector of the feature points of the query image is smaller than the dimension of the feature vector of the feature points of the query image;
searching a database class center matched with a plurality of sub-feature vectors of the feature points of the query image, wherein the database class center represents the class center of the sub-feature vectors of the feature points of the database;
and determining a first group of database feature points matched with the feature points of the query image according to the database class center matched with the plurality of sub-feature vectors of the feature points of the query image.
In one possible implementation, the apparatus further includes:
the second extraction module is used for extracting the feature vectors of the plurality of database feature points;
the decomposition module is used for decomposing the feature vectors of the database feature points to obtain a plurality of sub-feature vectors of the database feature points for any one of the database feature points, wherein the dimension of the sub-feature vector of the database feature point is smaller than that of the feature vector of the database feature point;
the clustering module is used for clustering the sub-feature vectors of the plurality of database feature points to obtain a database class center;
and the establishing module is used for establishing the corresponding relation between the database characteristic points and the database class center for any database characteristic point in the plurality of database characteristic points.
In one possible implementation manner, the search module is configured to:
determining candidate database feature points corresponding to the feature points of the query image according to the database class center matched with the plurality of sub-feature vectors of the feature points of the query image;
and performing geometric verification on the candidate database feature points, and determining a first group of database feature points matched with the feature points of the query image.
In one possible implementation manner, the search module is configured to:
determining a similarity transformation matrix between the candidate database feature points and the corresponding feature points of the query image;
determining, among a plurality of preset matrix intervals, the matrix interval to which each similarity transformation matrix belongs;
determining, as target matrix intervals, those matrix intervals in which the number of similarity transformation matrices meets a first number condition;
and determining a first group of database feature points matched with the feature points of the query image according to the candidate database feature points corresponding to the similarity transformation matrix in the target matrix interval.
In one possible implementation manner, the search module is configured to:
determining the database images to which the candidate database feature points corresponding to the similarity transformation matrices in the target matrix interval belong;
and determining the first group of database feature points from the candidate database feature points in those database images whose number of candidate database feature points meets a second number condition.
In one possible implementation manner, the search module is configured to:
determining three-dimensional coordinates corresponding to the first group of database feature points;
determining a second group of database feature points corresponding to the three-dimensional coordinates;
and determining a visual positioning result of the query image according to the first group of database feature points and the second group of database feature points.
According to an aspect of the present disclosure, there is provided an electronic device including: one or more processors; a memory for storing executable instructions; wherein the one or more processors are configured to invoke the memory-stored executable instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiment of the present disclosure, the feature vectors of the feature points of the query image are extracted, the database feature points matching those feature points are searched for according to the feature vectors, and the visual positioning result of the query image is determined according to the matched database feature points. Visual positioning thus no longer needs a retrieval step to obtain a local map: feature point matching is performed directly, and the visual positioning result is determined from the matched database feature points, so the positioning process is more direct and effective, consumes less memory, takes less time, and is more reliable.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of a visual positioning method provided by an embodiment of the present disclosure.
Fig. 2 is a schematic diagram illustrating a query image being transformed to obtain a plurality of transformed images corresponding to the query image.
Fig. 3 shows a schematic diagram of inputting a query image and a plurality of transformed images into the vanilla CNN, and outputting feature maps of the query image and the plurality of transformed images via the vanilla CNN.
Fig. 4 is a schematic diagram illustrating feature maps of a query image and a plurality of transformed images being subjected to grouped convolution through two convolutional neural networks, with the grouped convolution results subjected to bilinear pooling to obtain feature vectors of the feature points.
Fig. 5 shows a block diagram of a visual positioning apparatus provided by an embodiment of the present disclosure.
Fig. 6 illustrates a block diagram of an electronic device 800 provided by an embodiment of the disclosure.
Fig. 7 shows a block diagram of an electronic device 1900 provided by an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
In the related art, visual positioning first retrieves several database images closest to the query image to obtain a local map, and then performs feature point matching between the query image and the local map to obtain the visual positioning result. This pipeline consumes a large amount of computing memory, and the visual positioning speed is slow.
To solve such technical problems, embodiments of the present disclosure extract the feature vectors of the feature points of the query image, search for the database feature points that match those feature points according to the feature vectors, and determine the visual positioning result of the query image according to the matched database feature points. Visual positioning thus no longer needs a retrieval step to obtain a local map: feature point matching is performed directly, and the visual positioning result is determined from the matched database feature points, so the positioning process is more direct and effective, consumes less memory, takes less time, and is more reliable.
Fig. 1 shows a flowchart of a visual positioning method provided by an embodiment of the present disclosure. The visual positioning method may be executed by a visual positioning apparatus, for example by a terminal device, a cloud server, or another processing device. The terminal device may be a robot, a User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, or a wearable device. In some possible implementations, the visual positioning method may be implemented by a processor invoking computer-readable instructions stored in a memory. As shown in Fig. 1, the visual positioning method includes steps S11 to S13.
In step S11, a feature vector of the feature point of the query image is extracted.
In the embodiment of the present disclosure, the feature points of the query image may correspond to pixels of the query image; that is, the position of any feature point in the query image can be uniquely determined from its corresponding pixel. The feature vectors of the extracted feature points therefore provide pixel-level description information. The number of feature points of the query image may be less than or equal to the number of pixels of the query image, and in one example it is less; for example, the number of feature points of the query image may range from 500 to 2500.
In a possible implementation, extracting the feature vectors of the feature points of the query image includes: performing transformation processing on the query image to obtain at least one transformed image corresponding to the query image; and performing feature extraction on at least two images among the query image and the at least one transformed image to obtain the feature vectors of the feature points of the query image.
As an example of this implementation, the transformation process may be at least one of rotation, scaling, mirroring, warping, and the like. Fig. 2 is a schematic diagram illustrating a query image being transformed to obtain a plurality of transformed images corresponding to the query image.
As an example of this implementation, a query image may be subjected to transformation processing to obtain a plurality of transformed images corresponding to the query image, and feature extraction may be performed on the query image and the plurality of transformed images corresponding to the query image to obtain feature vectors of feature points of the query image.
In this implementation, the query image is transformed, and feature extraction is performed using at least two images among the query image and the at least one transformed image. The resulting feature vectors of the feature points of the query image therefore reflect richer and more comprehensive information in the query image, and are more robust to environmental changes such as illumination, which helps improve the accuracy of visual positioning.
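As a concrete illustration (not taken from the patent), the transformation step could be sketched with OpenCV as follows; the rotation angles, scale factors, and the function name are assumptions:

```python
import cv2
import numpy as np

def make_transformed_images(query_img: np.ndarray) -> list:
    """Hypothetical transformation step: build rotated, scaled and mirrored
    copies of the query image (the text mentions rotation, scaling,
    mirroring and warping as possible transformations)."""
    h, w = query_img.shape[:2]
    out = []
    for angle in (90, 180, 270):  # example rotation angles (assumed)
        m = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
        out.append(cv2.warpAffine(query_img, m, (w, h)))
    for scale in (0.5, 2.0):      # example scale factors (assumed)
        out.append(cv2.resize(query_img, None, fx=scale, fy=scale))
    out.append(cv2.flip(query_img, 1))  # horizontal mirror
    return out
```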
As an example of this implementation, performing feature extraction on at least two images among the query image and the at least one transformed image to obtain the feature vectors of the feature points of the query image includes: inputting the at least two images into a first neural network, and outputting feature maps of the at least two images via the first neural network; performing grouped convolution on the feature maps of the at least two images to obtain at least two grouped convolution results; and performing feature fusion on the at least two grouped convolution results to obtain the feature vectors of the feature points of the query image.
In this example, the first neural network may be a convolutional neural network, such as vanilla CNN or the like. Fig. 3 shows a schematic diagram of inputting a query image and a plurality of transformed images into the vanilla CNN, and outputting feature maps of the query image and the plurality of transformed images via the vanilla CNN.
According to this example, dense image description features can be obtained; that is, feature vectors of a large number of feature points (e.g., 500-2500 feature points) can be extracted. The feature vectors extracted in this example may be referred to as GIFT (Group Invariant Feature Transform) features. By performing grouped convolution on the feature maps of the at least two images, deep features of the query image can be obtained, which improves the robustness of subsequent feature point matching and thus the reliability of visual positioning.
In one example, performing grouped convolution on the feature maps of the at least two images to obtain at least two grouped convolution results includes: dividing the feature maps of the at least two images into a first feature map group and a second feature map group; inputting the first feature map group into a second neural network, and outputting the grouped convolution result of the first feature map group via the second neural network; and inputting the second feature map group into a third neural network, and outputting the grouped convolution result of the second feature map group via the third neural network.
The first feature map group comprises a part of feature maps in the feature maps of the at least two images, and the second feature map group comprises another part of feature maps in the feature maps of the at least two images.
In this example, the second neural network and the third neural network may both be convolutional neural networks. The grouped convolution results of the first and second feature map groups are produced by the second and third neural networks respectively, so that deep feature structure can be captured, the obtained feature information is more comprehensive, and the overall matching is more robust.
In this example, the number of neural networks used for the grouped convolution may also be three or more.
In one example, performing feature fusion on the at least two grouped convolution results to obtain the feature vectors of the feature points of the query image includes: performing a bilinear pooling operation on the at least two grouped convolution results to obtain the feature vectors of the feature points of the query image.
In other examples, the fusion may also be performed by concatenation (concat) or similar means, which is not limited in the embodiments of the present disclosure.
Fig. 4 is a schematic diagram illustrating feature maps of a query image and a plurality of transformed images being subjected to grouped convolution through two convolutional neural networks, with the grouped convolution results subjected to bilinear pooling to obtain feature vectors of the feature points. As shown in Fig. 4, each convolutional layer of the second neural network and the third neural network may be followed by an activation function, for example a ReLU (Rectified Linear Unit).
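A minimal PyTorch sketch of how the two branch networks and the bilinear-pooling fusion could be wired is given below; the channel sizes and layer depths are assumptions, since the patent does not specify the architecture:

```python
import torch
import torch.nn as nn

class GroupFusion(nn.Module):
    """Sketch of the grouped convolution + bilinear pooling step. The two
    branches play the role of the second and third neural networks; channel
    sizes are illustrative assumptions."""
    def __init__(self, in_ch: int = 64, out_ch: int = 16):
        super().__init__()
        self.branch_a = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())
        self.branch_b = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())

    def forward(self, group_a: torch.Tensor, group_b: torch.Tensor):
        # group_a / group_b: (N, in_ch, H, W) feature maps from the first
        # network, split into the first and second feature map groups
        fa = self.branch_a(group_a)   # grouped convolution result 1
        fb = self.branch_b(group_b)   # grouped convolution result 2
        # bilinear pooling: per-pixel outer product of the two results,
        # giving one out_ch*out_ch-dimensional descriptor per feature point
        desc = torch.einsum('nchw,ndhw->ncdhw', fa, fb)
        n, c, d, h, w = desc.shape
        return desc.reshape(n, c * d, h, w)
```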
In step S12, a database feature point matching the feature point of the query image is found according to the feature vector of the feature point of the query image, where the database feature point represents the feature point of the database image.
In one possible implementation manner, the database feature points matching the feature points of the query image may be determined according to the similarity between feature vectors of the feature points.
In a possible implementation manner, the searching, according to the feature vector of the feature point of the query image, for the database feature point matching the feature point of the query image includes: decomposing the feature vectors of the feature points of the query image to obtain a plurality of sub-feature vectors of the feature points of the query image, wherein the dimension of the sub-feature vector of the feature points of the query image is smaller than the dimension of the feature vector of the feature points of the query image; searching a database class center matched with a plurality of sub-feature vectors of the feature points of the query image, wherein the database class center represents the class center of the sub-feature vectors of the feature points of the database; and determining a first group of database feature points matched with the feature points of the query image according to the database class center matched with the plurality of sub-feature vectors of the feature points of the query image.
For example, suppose the feature vector of a feature point of the query image is [s1, s2, s3, s4, s5, s6, s7, s8, s9]. Decomposing it into 3 sub-feature vectors yields [s1, s2, s3], [s4, s5, s6], and [s7, s8, s9]. Of course, in practical applications the dimensionality of the feature vectors may be much higher; the embodiments of the present disclosure do not limit the dimensionality of the feature vectors, the number of sub-feature vectors, or the dimensionality of the sub-feature vectors. After the sub-feature vectors [s1, s2, s3], [s4, s5, s6], and [s7, s8, s9] are obtained, the database class center closest to each sub-feature vector is searched for: for example, the class center closest to [s1, s2, s3] is database class center 2, the one closest to [s4, s5, s6] is database class center 5, and the one closest to [s7, s8, s9] is database class center 8. The database feature points matching the feature point of the query image can then be determined according to the correspondence between database feature points and database class centers.
According to the implementation mode, the feature vectors of the feature points of the query image are decomposed into the sub-feature vectors with lower dimensions and then are matched, so that the speed of determining the database feature points matched with the feature points of the query image can be improved.
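A minimal NumPy sketch of the decomposition step, using the 9-dimensional example above (the function name is illustrative):

```python
import numpy as np

def decompose(feature_vec: np.ndarray, n_sub: int) -> np.ndarray:
    """Split a D-dimensional feature vector into n_sub sub-feature vectors
    of dimension D / n_sub (D is assumed divisible by n_sub)."""
    return feature_vec.reshape(n_sub, -1)

# the 9-dimensional example from the text: [s1..s9] -> three 3-dim sub-vectors
subs = decompose(np.arange(1, 10, dtype=np.float32), 3)
print(subs)  # [[1. 2. 3.] [4. 5. 6.] [7. 8. 9.]]
```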
As an example of this implementation, before the finding a database class center that matches a plurality of sub-feature vectors of feature points of the query image, the method further comprises: extracting feature vectors of a plurality of database feature points; for any database feature point in the plurality of database feature points, decomposing the feature vector of the database feature point to obtain a plurality of sub-feature vectors of the database feature point, wherein the dimension of the sub-feature vector of the database feature point is smaller than that of the feature vector of the database feature point; clustering the sub-feature vectors of the plurality of database feature points to obtain a database class center; and establishing a corresponding relation between the database feature points and a database class center for any one of the database feature points.
In this example, the database may contain a plurality of database images, where a database image represents an image in the database. For each database image, feature vectors of the database feature points in that image may be extracted in a manner similar to the extraction of feature vectors of feature points of the query image described above. For example, the number of database feature points extracted from each database image may range from 500 to 2500.
In this example, for any database feature point, the feature vector of that point may be decomposed into a plurality of sub-feature vectors. For example, a feature vector [1,3,2,3,4,5,3,2,1] can be decomposed into the 3 sub-feature vectors [1,3,2], [3,4,5], and [3,2,1]. Of course, in practical applications the dimensionality of the feature vectors of database feature points may be much higher.
In this example, the sub-feature vectors of the plurality of database feature points may be clustered by using a method such as K-means, KD tree, or vocabulary tree, so as to obtain a database class center.
In this example, after the database class center is obtained, the correspondence between the database feature point and the database class center is recorded.
In one example, the correspondence between the database class center and the database feature points corresponding to all the sub-feature vectors in the class to which the database class center belongs may be recorded. For example, if the sub-feature vectors of the database feature points 5,6, and 7 belong to the class to which the database class center 1 belongs, the correspondence between the database class center 1 and the database feature points 5,6, and 7 may be recorded, and may be written as (1:5,6,7), for example.
In another example, the corresponding relationship between the database feature point and each database class center corresponding to the database feature point may be recorded. For example, if the database feature point 1 corresponds to the database class center 2, the database class center 5, and the database class center 8, the corresponding relationship between the database feature point 1 and the database class center 2, the database class center 5, and the database class center 8 may be recorded, and may be (1:2,5,8), for example.
In this example, only the database class center and the corresponding relationship between the database feature point and the database class center need to be stored, and the feature vector (high-dimensional vector) of the database feature point does not need to be stored, so that the storage space can be saved, the calculation memory can be saved, and the search speed can be increased.
In one example, the indexer can be built from all database class centers.
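The offline index construction could follow a product-quantization-style pattern, as in the sketch below; the per-sub-space codebooks and the K-means parameters are assumptions (the text says only that the sub-feature vectors are clustered, e.g., by K-means, and the point-to-center correspondence recorded):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_index(db_vecs: np.ndarray, n_sub: int, n_centers: int):
    """db_vecs: (P, D) feature vectors of all database feature points.
    Returns the database class centers for each sub-space and, for every
    database feature point, the id of its class center in each sub-space.
    Only these two arrays need to be stored -- not the high-dimensional
    feature vectors themselves."""
    p, d = db_vecs.shape
    subs = db_vecs.reshape(p, n_sub, d // n_sub)
    centers, assign = [], []
    for j in range(n_sub):
        km = KMeans(n_clusters=n_centers, n_init=10).fit(subs[:, j, :])
        centers.append(km.cluster_centers_)  # database class centers
        assign.append(km.labels_)            # point -> class-center correspondence
    return np.stack(centers), np.stack(assign, axis=1)
```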
As an example of this implementation, the determining, according to the database class center matched with the plurality of sub-feature vectors of the feature point of the query image, a first set of database feature points matched with the feature point of the query image includes: determining candidate database feature points corresponding to the feature points of the query image according to the database class center matched with the plurality of sub-feature vectors of the feature points of the query image; and performing geometric verification on the candidate database feature points, and determining a first group of database feature points matched with the feature points of the query image.
In this example, candidate database feature points corresponding to the feature points of the query image may be determined according to the correspondence between the database feature points and the database class centers matched with the plurality of sub-feature vectors of the feature points of the query image.
In one example, the candidate database feature points corresponding to a feature point of the query image may be determined using a Cartesian product. For example, the feature vector of feature point A of the query image corresponds to 3 sub-feature vectors A1, A2, and A3; the database class center matching A1 is P1, the one matching A2 is P2, and the one matching A3 is P3. The database feature points corresponding to all sub-feature vectors in the classes of database class centers P1, P2, and P3 may then all be determined as candidate database feature points for feature point A. For example, if the database feature points corresponding to the sub-feature vectors in the class of P1 include D1, D2, D5, and D6, those of P2 include D1, D7, D8, and D9, and those of P3 include D3, D4, and D10, then D1-D10 may each be determined as a candidate database feature point corresponding to feature point A of the query image.
In another example, the feature vector of feature point A of the query image corresponds to 3 sub-feature vectors A1, A2, and A3; the database class center matching A1 is P1, the one matching A2 is P2, and the one matching A3 is P3. The database feature points corresponding to all three of the database class centers P1, P2, and P3 (i.e., their intersection) may then be determined as the candidate database feature points corresponding to feature point A. For example, if the database feature points corresponding to the sub-feature vectors in the class of P1 include D1, D2, D3, D5, and D6, those of P2 include D1, D2, D3, D7, D8, and D9, and those of P3 include D1, D3, D4, and D10, then D1 and D3 may be determined as the candidate database feature points corresponding to feature point A of the query image.
In this example, the number of candidate database feature points corresponding to each feature point of the query image may be one or more, for example, may be 25 at most. That is, for any feature point of the query image, a plurality of database feature points that are closest to the feature point may be used as candidate database feature points corresponding to the feature point.
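Continuing the index sketch above, the query-side lookup might match each sub-feature vector to its nearest database class center and gather candidate database feature points; the intersection variant from the second example is shown (taking the union instead would follow the first example):

```python
import numpy as np

def candidate_points(query_vec, centers, assign, n_sub):
    """centers / assign come from build_index above. Returns the ids of the
    candidate database feature points for one query feature point."""
    subs = query_vec.reshape(n_sub, -1)
    cand = None
    for j in range(n_sub):
        # nearest database class center for the j-th sub-feature vector
        nearest = int(np.argmin(np.linalg.norm(centers[j] - subs[j], axis=1)))
        pts = set(np.flatnonzero(assign[:, j] == nearest))
        cand = pts if cand is None else cand & pts  # intersection variant
    return sorted(cand)
```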
In this example, the candidate database feature points corresponding to the feature points of the query image are determined according to the database class centers matched with the plurality of sub-feature vectors of the feature points of the query image, and then the candidate database feature points are subjected to geometric verification to determine the first group of database feature points matched with the feature points of the query image, so that the first group of database feature points matched with the feature points of the query image can be determined quickly and accurately.
In one example, the geometrically validating the candidate database feature points to determine a first set of database feature points that match feature points of the query image includes: determining a similarity transformation matrix between the candidate database characteristic points and the corresponding characteristic points of the query image; determining a matrix interval to which the similarity transformation matrix belongs in a plurality of preset matrix intervals; determining matrix intervals, the number of which meets a first number condition, of the similarity transformation matrixes in the matrix intervals as target matrix intervals; and determining a first group of database feature points matched with the feature points of the query image according to the candidate database feature points corresponding to the similarity transformation matrix in the target matrix interval.
In this example, the feature point of the query image corresponding to any candidate database feature point is the feature point of the query image that matches that candidate database feature point. For any candidate database feature point, a similarity transformation matrix between the candidate database feature point and the corresponding query-image feature point may be constructed according to the coordinates, scale, and rotation angle of the two feature points. For example, the similarity transformation matrix includes 4 elements (shown as a formula image in the original patent; for a 2D similarity transform these are typically determined by the relative scale, rotation angle, and translation between the two points). The preset matrix interval contains the value ranges of all elements of the similarity transformation matrix (the interval expression is likewise shown as a formula image in the original patent).
According to the values of the elements in the similarity transformation matrix and the value ranges of the elements in the preset matrix intervals, the matrix intervals to which the similarity transformation matrices belong can be determined, and therefore the number of the similarity transformation matrices in each matrix interval can be determined.
For example, the first quantity condition may be that the number of the similar transformation matrices is greater than or equal to a second preset value, and for example, if the number of the similar transformation matrices in a certain matrix interval is greater than or equal to the second preset value, the matrix interval may be determined as a target matrix interval.
For another example, the first number condition may be membership in the M matrix intervals containing the largest numbers of similarity transformation matrices, where M is a positive integer. According to the number of similarity transformation matrices in each matrix interval, the M matrix intervals with the most similarity transformation matrices can be identified and each determined as a target matrix interval.
For another example, the first number condition may combine both criteria: the number of similarity transformation matrices in the interval is greater than or equal to the second preset value, and the interval belongs to the M matrix intervals with the most similarity transformation matrices. If a matrix interval satisfies both, it may be determined as a target matrix interval.
In the above example, geometric verification is performed by matrix interval voting, so that feature points matched with feature points of the query image can be quickly determined, and thus, the visual positioning speed can be increased.
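A minimal sketch of matrix-interval voting, assuming the similarity transformation is summarized by 4 scalar parameters per candidate match (the exact parametrization and interval boundaries are not specified in the text):

```python
import numpy as np

def vote_matches(sim_params: np.ndarray, bins: int = 8, top_m: int = 3):
    """Matrix-interval voting sketch. sim_params: (K, 4) array holding the
    4 elements of the similarity transformation matrix for each candidate
    match. Returns indices of matches in the top_m most-voted intervals."""
    lo = sim_params.min(axis=0)
    hi = sim_params.max(axis=0)
    # quantize each element into `bins` preset intervals per dimension
    cells = np.floor((sim_params - lo) / (hi - lo + 1e-9) * bins).astype(int)
    uniq, counts = np.unique(cells, axis=0, return_counts=True)
    # target matrix intervals: the M intervals holding the most matrices
    targets = uniq[np.argsort(-counts)[:top_m]]
    keep = [i for i, c in enumerate(cells)
            if any((c == t).all() for t in targets)]
    return keep
```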
For example, determining the first group of database feature points matching the feature points of the query image according to the candidate database feature points corresponding to the similarity transformation matrices in the target matrix interval includes: determining the database images to which those candidate database feature points belong; and determining the first group of database feature points from the candidate database feature points in the database images whose number of candidate database feature points meets a second number condition. In this way, the candidate database feature points are filtered to obtain the first group of database feature points, so that the visual positioning result of the query image is determined based on the first group, which helps improve its accuracy.
For example, the second number condition may be that the number of candidate database feature points is greater than or equal to a first preset value, e.g., 12.
For example, if the candidate database feature points in any database image satisfy the second number condition, the candidate database feature points in that database image may be determined to belong to the first group of database feature points.
For another example, if the number of database images whose candidate database feature points satisfy the second number condition is greater than N, the N database images containing the most candidate database feature points may be selected from them, and the candidate database feature points in those N images determined to belong to the first group of database feature points, where N is a positive integer, e.g., N equals 30. If the number of such database images is less than or equal to N, the candidate database feature points in all of them may be determined to belong to the first group of database feature points.
If the number of the database images of which the candidate database feature points satisfy the second number condition is 0, it may be determined that the visual positioning result of the query image is positioning failure.
As another example of this implementation, geometric verification may also be performed using a method such as RANSAC (RANdom SAmple Consensus).
As an example of this implementation manner, the searching, according to the feature vector of the feature point of the query image, for the database feature point matching the feature point of the query image further includes: determining three-dimensional coordinates corresponding to the first group of database feature points; determining a second group of database characteristic points corresponding to the three-dimensional coordinates; and determining a visual positioning result of the query image according to the first group of database feature points and the second group of database feature points.
In this example, the three-dimensional coordinates corresponding to the first set of database feature points may be determined according to a correspondence between the database feature points and the three-dimensional coordinates. One three-dimensional coordinate may correspond to a plurality of database feature points. According to the corresponding relation between the three-dimensional coordinates and the database feature points, all the database feature points corresponding to the three-dimensional coordinates can be determined, and the database feature points other than the first group of database feature points in the database feature points corresponding to the three-dimensional coordinates can be determined to belong to the second group of database feature points.
This example increases the number of associated point pairs through reverse search, and can therefore improve the robustness of visual localization. An associated point pair here refers to a matched database feature point together with its corresponding three-dimensional coordinates.
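A minimal sketch of the reverse search, assuming the reconstruction provides a mapping from database feature points to 3D points and back (the mapping names are illustrative):

```python
def reverse_search(first_group, point_to_3d, threed_to_points):
    """Reverse-search sketch. point_to_3d maps a database feature point id to
    the id of its 3D point; threed_to_points maps a 3D point id back to all
    database feature points observing it. Both mappings are assumed to come
    from the offline reconstruction of the database images."""
    second_group = set()
    for p in first_group:
        for q in threed_to_points[point_to_3d[p]]:
            second_group.add(q)
    return second_group - set(first_group)  # points beyond the first group
```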
In step S13, a visual positioning result of the query image is determined according to the matched database feature points.
In a possible implementation manner, the visual positioning result of the query image may include pose information corresponding to the query image. The pose information may include one or both of position information and pose information. The position information may be represented by coordinates, and the posture information may be represented by an angle. In one example, the visual localization result of the query image may include pose information for six degrees of freedom of the query image.
In the embodiment of the present disclosure, after the matched database feature points are obtained, the visual positioning result of the query image may be determined using a method such as PnP (Perspective-n-Point). For example, the visual positioning result of the query image can be determined using methods such as EPnP (Efficient Perspective-n-Point), P3P (Perspective-3-Point), or DLS (Direct Least-Squares).
In one example, if the number of inliers after solving PnP is less than a third preset value, it may be determined that the visual positioning of the query image has failed. For example, the third preset value may be equal to 12. If the number of inliers is greater than or equal to the third preset value, the pose information corresponding to the query image can be obtained. Here the inliers denote the feature points that are correctly matched when solving the pose; for example, the inlier mask produced by the RANSAC algorithm may be used to determine the inliers. In one example, the pose information corresponding to the query image can be refined by a nonlinear optimizer to obtain the final visual positioning result.
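As one possible concrete realization of this step (a sketch only, not the specific solver prescribed by the disclosure), OpenCV's RANSAC-based PnP solver can recover the pose and report the inliers. The inlier threshold of 12 follows the third preset value mentioned above; the Levenberg-Marquardt refinement stands in for the nonlinear optimizer and is an assumption.

```python
import numpy as np
import cv2

def solve_pose(pts3d, pts2d, K, min_inliers=12):
    """pts3d: Nx3 coordinates of the matched database feature points;
    pts2d: Nx2 corresponding feature points in the query image;
    K: 3x3 camera intrinsic matrix."""
    pts3d = np.asarray(pts3d, dtype=np.float64)
    pts2d = np.asarray(pts2d, dtype=np.float64)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    if not ok or inliers is None or len(inliers) < min_inliers:
        return None  # positioning failure: too few correctly matched points
    # nonlinear refinement of the six-degree-of-freedom pose on the inliers
    rvec, tvec = cv2.solvePnPRefineLM(
        pts3d[inliers[:, 0]], pts2d[inliers[:, 0]], K, None, rvec, tvec)
    return rvec, tvec
```

The returned rotation vector and translation vector together encode the six-degree-of-freedom pose of the query image.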
In one application scenario, a user device may collect a query image and send a visual positioning request carrying the query image to a cloud server; the cloud server processes the image using the visual positioning method provided by the embodiment of the present disclosure to obtain the visual positioning result of the query image, and returns the visual positioning result to the user device. The user device may be a mobile phone or another device with a photographing function. The visual positioning request may also carry camera intrinsic parameter information of the user device, which may include, for example, the focal length and the principal point location.
The embodiments of the present disclosure can be applied to various application scenarios, such as positioning and navigation systems, high-precision maps, and augmented reality products. For example, the embodiments of the present disclosure can provide visual positioning and navigation services in large indoor scenes such as shopping malls, airports, and museums, solving the problem that effective positioning cannot be achieved indoors because no GPS signal is available. For another example, in outdoor scenes, high-precision maps can be enhanced, higher-precision positioning can be achieved in combination with GPS signals, and visual positioning services can be provided where the outdoor GPS signal is weak. For yet another example, since the six-degree-of-freedom position and orientation information of the user device can be obtained quickly, the embodiments of the present disclosure are also applicable to augmented reality applications.
An application scenario of the embodiment of the present disclosure is described below, taking a robot performing visual positioning in a shopping mall as an example. In this scenario, a number of photos of the mall may be taken as database images. For any database image, the database image can be transformed to obtain a plurality of transformed images corresponding to it; feature extraction is performed on the database image and its transformed images to obtain the feature vectors of the feature points of the database image, i.e., the feature vectors of the database feature points. For any database feature point, the feature vector of the database feature point may be decomposed to obtain a plurality of sub-feature vectors. The sub-feature vectors of all database feature points are clustered to obtain a plurality of database class centers, and for any database feature point, a correspondence between the database feature point and the database class centers is established.
When the robot needs to be positioned, the currently acquired image can be used as the query image. The query image is transformed to obtain a plurality of transformed images corresponding to it; feature extraction is performed on the query image and its transformed images to obtain the feature vectors of the feature points of the query image. The feature vectors of the feature points of the query image are decomposed to obtain a plurality of sub-feature vectors; the database class centers matching these sub-feature vectors are searched for, and a first group of database feature points matching the feature points of the query image is determined according to the matched database class centers. The three-dimensional coordinates corresponding to the first group of database feature points are determined, and a second group of database feature points corresponding to the three-dimensional coordinates is determined. The visual positioning result of the query image, i.e., the current position information and the current orientation information of the robot in the mall, is then determined according to the first group of database feature points and the second group of database feature points.
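The offline database-building stage of this scenario can be sketched as follows; this is a minimal illustration, in which the number of sub-vectors, the number of class centers, and the choice of k-means clustering are assumptions rather than values fixed by the disclosure.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_database_index(db_features, num_splits=4, num_centers=256):
    """db_features: (N, D) array of database feature vectors, with D
    divisible by num_splits. Returns the database class centers and, for
    each database feature point, its class center per sub-vector."""
    db_features = np.asarray(db_features, dtype=np.float32)
    n, d = db_features.shape
    sub = db_features.reshape(n, num_splits, d // num_splits)
    centers, assignments = [], []
    for s in range(num_splits):
        # cluster the s-th sub-feature vectors of all database feature points
        km = KMeans(n_clusters=num_centers, n_init=4).fit(sub[:, s, :])
        centers.append(km.cluster_centers_)   # the database class centers
        assignments.append(km.labels_)        # feature point -> class center
    return centers, assignments
```

The stored assignments realize the correspondence between database feature points and database class centers described above.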
It can be understood that the above-mentioned method embodiments of the present disclosure can be combined with each other to form combined embodiments without departing from the principles and logic; due to space limitations, the details are not repeated in the present disclosure.
It will be understood by those skilled in the art that, in the methods of the present disclosure, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides a visual positioning apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the visual positioning methods provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method section, which are not repeated here.
Fig. 5 shows a block diagram of a visual positioning apparatus provided by an embodiment of the present disclosure. As shown in fig. 5, the visual positioning device includes: a first extraction module 51, configured to extract a feature vector of a feature point of a query image; the searching module 52 is configured to search, according to the feature vector of the feature point of the query image, a database feature point that matches the feature point of the query image, where the database feature point represents a feature point of a database image; and the determining module 53 is configured to determine a visual positioning result of the query image according to the matched database feature points.
In one possible implementation, the first extraction module 51 is configured to: transform the query image to obtain at least one transformed image corresponding to the query image; and perform feature extraction on at least two images of the query image and the at least one transformed image to obtain feature vectors of the feature points of the query image.
In one possible implementation, the first extraction module 51 is configured to: input at least two images of the query image and the at least one transformed image into a first neural network respectively, and output feature maps of the at least two images via the first neural network; perform grouped convolution on the feature maps of the at least two images to obtain at least two grouped convolution results; and perform feature fusion on the at least two grouped convolution results to obtain feature vectors of the feature points of the query image.
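As an illustration of the grouped convolution and feature fusion performed by the first extraction module 51, the following PyTorch sketch is given; the channel count, the group count, and the mean-fusion rule are assumptions, since the disclosure does not fix them.

```python
import torch
import torch.nn as nn

class GroupFusion(nn.Module):
    def __init__(self, channels=128, groups=4):
        super().__init__()
        # one grouped convolution applied to each image's feature map
        self.gconv = nn.Conv2d(channels, channels, kernel_size=3,
                               padding=1, groups=groups)

    def forward(self, feature_maps):
        # feature_maps: list of (B, C, H, W) maps output by the first
        # neural network, one per image among the query image and its
        # transformed images
        results = [self.gconv(fm) for fm in feature_maps]  # grouped convolution results
        fused = torch.stack(results, dim=0).mean(dim=0)    # feature fusion
        return fused  # per-location feature vectors of the query image
```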
In one possible implementation, the searching module 52 is configured to: decompose the feature vectors of the feature points of the query image to obtain a plurality of sub-feature vectors of the feature points of the query image, where the dimension of a sub-feature vector of a feature point of the query image is smaller than the dimension of the feature vector of the feature point of the query image; search for database class centers matching the plurality of sub-feature vectors of the feature points of the query image, where a database class center represents a class center of sub-feature vectors of database feature points; and determine a first group of database feature points matching the feature points of the query image according to the database class centers matching the plurality of sub-feature vectors of the feature points of the query image.
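A matching sketch of the query-time lookup performed by the searching module 52, paired with the offline index sketched earlier; the rule that a candidate must agree on every matched class center is an illustrative assumption.

```python
import numpy as np

def find_candidates(query_vec, centers, assignments, num_splits=4):
    """query_vec: (D,) feature vector of one query feature point;
    centers/assignments: the index returned by build_database_index."""
    sub = np.asarray(query_vec).reshape(num_splits, -1)
    votes = np.zeros(len(assignments[0]), dtype=int)
    for s in range(num_splits):
        # nearest database class center for the s-th sub-feature vector
        best = int(np.argmin(np.linalg.norm(centers[s] - sub[s], axis=1)))
        # database feature points whose s-th sub-vector shares this center
        votes += (assignments[s] == best).astype(int)
    # keep the points whose sub-vectors agree on all matched class centers
    return np.where(votes == num_splits)[0]
```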
In one possible implementation, the apparatus further includes: the second extraction module is used for extracting the feature vectors of the plurality of database feature points; the decomposition module is used for decomposing the feature vectors of the database feature points to obtain a plurality of sub-feature vectors of the database feature points for any one of the database feature points, wherein the dimension of the sub-feature vector of the database feature point is smaller than that of the feature vector of the database feature point; the clustering module is used for clustering the sub-feature vectors of the plurality of database feature points to obtain a database class center; and the establishing module is used for establishing the corresponding relation between the database characteristic points and the database class center for any database characteristic point in the plurality of database characteristic points.
In one possible implementation, the searching module 52 is configured to: determine candidate database feature points corresponding to the feature points of the query image according to the database class centers matching the plurality of sub-feature vectors of the feature points of the query image; and perform geometric verification on the candidate database feature points to determine a first group of database feature points matching the feature points of the query image.
In one possible implementation, the searching module 52 is configured to: determine similarity transformation matrices between the candidate database feature points and the corresponding feature points of the query image; determine, among a plurality of preset matrix intervals, the matrix interval to which each similarity transformation matrix belongs; determine the matrix interval in which the number of similarity transformation matrices satisfies a first quantity condition as the target matrix interval; and determine a first group of database feature points matching the feature points of the query image according to the candidate database feature points corresponding to the similarity transformation matrices in the target matrix interval.
In one possible implementation, the searching module 52 is configured to: determine the database images to which the candidate database feature points belong, where the candidate database feature points denote the candidate database feature points corresponding to the similarity transformation matrices in the target matrix interval; and determine the first group of database feature points according to the candidate database feature points in the database images whose candidate database feature points satisfy the second quantity condition.
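As a rough illustration of this interval-voting style of geometric verification (a sketch under assumptions, not the exact similarity matrices of the disclosure), each match can be summarized by the scale ratio and rotation difference of its feature points, binned into preset intervals, and only the matches in the most-voted interval kept:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Keypoint:
    scale: float   # detection scale of the feature point
    angle: float   # orientation in degrees

def verify_by_voting(matches, scale_bins=8, angle_bins=12):
    """matches: list of (query_kp, db_kp) Keypoint pairs. Keeps only the
    matches whose transform falls in the most-voted interval."""
    s = np.array([np.log2(db.scale / q.scale) for q, db in matches])
    a = np.array([(db.angle - q.angle) % 360.0 for q, db in matches])
    # assign every similarity transform to one of the preset intervals
    si = np.clip(np.digitize(s, np.linspace(-4, 4, scale_bins - 1)), 0, scale_bins - 1)
    ai = np.minimum((a * angle_bins / 360.0).astype(int), angle_bins - 1)
    hist = np.zeros((scale_bins, angle_bins), dtype=int)
    for i, j in zip(si, ai):
        hist[i, j] += 1                      # vote for the interval
    target = np.unravel_index(np.argmax(hist), hist.shape)  # target interval
    return [m for m, i, j in zip(matches, si, ai) if (i, j) == target]
```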
In one possible implementation, the searching module 52 is configured to: determine three-dimensional coordinates corresponding to the first group of database feature points; determine a second group of database feature points corresponding to the three-dimensional coordinates; and determine the visual positioning result of the query image according to the first group of database feature points and the second group of database feature points.
In the embodiment of the present disclosure, the feature vectors of the feature points of the query image are extracted, database feature points matching the feature points of the query image are searched for according to these feature vectors, and the visual positioning result of the query image is determined according to the matched database feature points. In visual positioning there is therefore no need to retrieve a local map: feature point matching is performed directly, and the visual positioning result of the query image is determined from the matched database feature points. This makes the positioning process more direct and effective, consumes less memory, can reduce the time consumed by visual positioning, and makes the positioning process more reliable.
In some embodiments, the functions of, or the modules included in, the apparatus provided by the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments; for their specific implementation, refer to the descriptions of the method embodiments above, which are not repeated here for brevity.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-described method. The computer-readable storage medium may be a non-volatile computer-readable storage medium, or may be a volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product including computer-readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the visual positioning method provided in any of the above embodiments.
The disclosed embodiments also provide another computer program product for storing computer readable instructions, which when executed, cause a computer to perform the operations of the visual positioning method provided by any of the above embodiments.
An embodiment of the present disclosure further provides an electronic device, including: one or more processors; a memory for storing executable instructions; wherein the one or more processors are configured to invoke the memory-stored executable instructions to perform the above-described method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 6 illustrates a block diagram of an electronic device 800 provided by an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or a similar terminal.
Referring to fig. 6, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing state assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800; it may also detect a change in the position of the electronic device 800 or of one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and changes in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as Wi-Fi, 2G, 3G, 4G/LTE, 5G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 7 shows a block diagram of an electronic device 1900 provided by an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 7, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources, represented by a memory 1932, for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 1922 is configured to execute the instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows, Mac OS, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., light pulses through a fiber optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit capable of executing the computer-readable program instructions, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), is personalized with the state information of the computer-readable program instructions, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
Having described embodiments of the present disclosure, the foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

1. A visual positioning method, comprising:
extracting a feature vector of a feature point of the query image;
searching database characteristic points matched with the characteristic points of the query image according to the characteristic vectors of the characteristic points of the query image, wherein the database characteristic points represent the characteristic points of the database image;
determining a visual positioning result of the query image according to the matched database feature points;
wherein, the searching the database feature points matched with the feature points of the query image according to the feature vectors of the feature points of the query image comprises:
decomposing the feature vectors of the feature points of the query image to obtain a plurality of sub-feature vectors of the feature points of the query image, wherein the dimension of the sub-feature vector of the feature points of the query image is smaller than the dimension of the feature vector of the feature points of the query image;
searching a database class center matched with a plurality of sub-feature vectors of the feature points of the query image, wherein the database class center represents the class center of the sub-feature vectors of the feature points of the database;
and determining a first group of database feature points matched with the feature points of the query image according to the database class center matched with the plurality of sub-feature vectors of the feature points of the query image.
2. The method of claim 1, wherein extracting feature vectors of feature points of the query image comprises:
transforming the query image to obtain at least one transformed image corresponding to the query image;
and performing feature extraction on at least two images of the query image and the at least one transformed image to obtain feature vectors of feature points of the query image.
3. The method according to claim 2, wherein the performing feature extraction on at least two images of the query image and the at least one transformed image to obtain a feature vector of a feature point of the query image comprises:
inputting at least two images of the query image and the at least one transformed image into a first neural network respectively, and outputting feature maps of the at least two images via the first neural network;
performing grouped convolution on the feature maps of the at least two images to obtain at least two grouped convolution results;
and performing feature fusion on the at least two grouped convolution results to obtain a feature vector of the feature point of the query image.
4. The method of claim 1, wherein prior to said finding a database class center that matches a plurality of sub-feature vectors of feature points of the query image, the method further comprises:
extracting feature vectors of a plurality of database feature points;
for any database feature point in the plurality of database feature points, decomposing the feature vector of the database feature point to obtain a plurality of sub-feature vectors of the database feature point, wherein the dimension of the sub-feature vector of the database feature point is smaller than that of the feature vector of the database feature point;
clustering the sub-feature vectors of the plurality of database feature points to obtain a database class center;
and establishing a corresponding relation between the database feature points and a database class center for any one of the database feature points.
5. The method of claim 1, wherein the determining a first group of database feature points matched with the feature points of the query image according to the database class center matched with the plurality of sub-feature vectors of the feature points of the query image comprises:
determining candidate database feature points corresponding to the feature points of the query image according to the database class center matched with the plurality of sub-feature vectors of the feature points of the query image;
and performing geometric verification on the candidate database feature points, and determining a first group of database feature points matched with the feature points of the query image.
6. The method of claim 5, wherein the performing geometric verification on the candidate database feature points and determining a first group of database feature points matched with the feature points of the query image comprises:
determining a similarity transformation matrix between the candidate database characteristic points and the corresponding characteristic points of the query image;
determining a matrix interval to which the similarity transformation matrix belongs in a plurality of preset matrix intervals;
determining matrix intervals, the number of which meets a first number condition, of the similarity transformation matrixes in the matrix intervals as target matrix intervals;
and determining a first group of database feature points matched with the feature points of the query image according to the candidate database feature points corresponding to the similarity transformation matrix in the target matrix interval.
7. The method according to claim 6, wherein the determining a first group of database feature points matched with the feature points of the query image according to the candidate database feature points corresponding to the similarity transformation matrix in the target matrix interval comprises:
determining a database image to which candidate database feature points belong, wherein the candidate database feature points represent the candidate database feature points corresponding to the similarity transformation matrix in the target matrix interval;
and determining the first group of database feature points according to the candidate database feature points in the database image whose candidate database feature points satisfy the second quantity condition.
8. The method according to any one of claims 1 to 7, wherein the searching for the database feature point matching the feature point of the query image according to the feature vector of the feature point of the query image further comprises:
determining three-dimensional coordinates corresponding to the first group of database feature points;
determining a second group of database characteristic points corresponding to the three-dimensional coordinates;
and determining a visual positioning result of the query image according to the first group of database feature points and the second group of database feature points.
9. A visual positioning device, comprising:
the first extraction module is used for extracting a feature vector of a feature point of the query image;
the searching module is used for searching database characteristic points matched with the characteristic points of the query image according to the characteristic vectors of the characteristic points of the query image, wherein the database characteristic points represent the characteristic points of the database image;
the determining module is used for determining a visual positioning result of the query image according to the matched database feature points;
wherein the searching module is configured to:
decomposing the feature vectors of the feature points of the query image to obtain a plurality of sub-feature vectors of the feature points of the query image, wherein the dimension of the sub-feature vector of the feature points of the query image is smaller than the dimension of the feature vector of the feature points of the query image;
searching a database class center matched with a plurality of sub-feature vectors of the feature points of the query image, wherein the database class center represents the class center of the sub-feature vectors of the feature points of the database;
and determining a first group of database feature points matched with the feature points of the query image according to the database class center matched with the plurality of sub-feature vectors of the feature points of the query image.
10. An electronic device, comprising:
one or more processors;
a memory for storing executable instructions;
wherein the one or more processors are configured to invoke the memory-stored executable instructions to perform the method of any one of claims 1 to 8.
11. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 8.
CN202010710996.0A 2020-07-22 2020-07-22 Visual positioning method and device, electronic equipment and storage medium Active CN111859003B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010710996.0A CN111859003B (en) 2020-07-22 2020-07-22 Visual positioning method and device, electronic equipment and storage medium
PCT/CN2020/139166 WO2022016803A1 (en) 2020-07-22 2020-12-24 Visual positioning method and apparatus, electronic device, and computer readable storage medium
TW110116124A TW202205206A (en) 2020-07-22 2021-05-04 Visual positioning method, electronic equipment and computer-readable storage medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010710996.0A CN111859003B (en) 2020-07-22 2020-07-22 Visual positioning method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111859003A CN111859003A (en) 2020-10-30
CN111859003B true CN111859003B (en) 2021-12-28

Family

ID=73002310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010710996.0A Active CN111859003B (en) 2020-07-22 2020-07-22 Visual positioning method and device, electronic equipment and storage medium

Country Status (3)

Country Link
CN (1) CN111859003B (en)
TW (1) TW202205206A (en)
WO (1) WO2022016803A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859003B (en) * 2020-07-22 2021-12-28 Zhejiang Shangtang Technology Development Co Ltd Visual positioning method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104596519A * 2015-02-17 2015-05-06 Harbin Institute of Technology RANSAC algorithm-based visual localization method
CN104820718A * 2015-05-22 2015-08-05 Harbin Institute of Technology Image classification and retrieval method based on geographic location features and global visual features
CN110296686A * 2019-05-21 2019-10-01 Beijing Baidu Netcom Science and Technology Co Ltd Vision-based localization method, apparatus and device
CN110390356A * 2019-07-03 2019-10-29 Guangdong Oppo Mobile Telecommunications Corp Ltd Visual dictionary generation method and device, and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101259957B1 (en) * 2012-11-16 2013-05-02 Enswers Co., Ltd. System and method for providing supplementary information using image matching
CN111859003B (en) * 2020-07-22 2021-12-28 Zhejiang Shangtang Technology Development Co Ltd Visual positioning method and device, electronic equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104596519A * 2015-02-17 2015-05-06 Harbin Institute of Technology RANSAC algorithm-based visual localization method
CN104820718A * 2015-05-22 2015-08-05 Harbin Institute of Technology Image classification and retrieval method based on geographic location features and global visual features
CN110296686A * 2019-05-21 2019-10-01 Beijing Baidu Netcom Science and Technology Co Ltd Vision-based localization method, apparatus and device
CN110390356A * 2019-07-03 2019-10-29 Guangdong Oppo Mobile Telecommunications Corp Ltd Visual dictionary generation method and device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs; Liu Yuan et al.; arXiv:1911.05932v1; 2019-11-14; pp. 1-12 *

Also Published As

Publication number Publication date
TW202205206A (en) 2022-02-01
CN111859003A (en) 2020-10-30
WO2022016803A1 (en) 2022-01-27

Similar Documents

Publication Publication Date Title
CN110688951B (en) Image processing method and device, electronic equipment and storage medium
US11120078B2 (en) Method and device for video processing, electronic device, and storage medium
CN110059652B (en) Face image processing method, device and storage medium
US20200327353A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN111538855B (en) Visual positioning method and device, electronic equipment and storage medium
JP2022540072A (en) POSITION AND ATTITUDE DETERMINATION METHOD AND DEVICE, ELECTRONIC DEVICE, AND STORAGE MEDIUM
CN112001321A (en) Network training method, pedestrian re-identification method, network training device, pedestrian re-identification device, electronic equipment and storage medium
CN109145150B (en) Target matching method and device, electronic equipment and storage medium
CN111340048B (en) Image processing method and device, electronic equipment and storage medium
CN110781813B (en) Image recognition method and device, electronic equipment and storage medium
CN110532956B (en) Image processing method and device, electronic equipment and storage medium
CN111563138B (en) Positioning method and device, electronic equipment and storage medium
CN111523485A (en) Pose recognition method and device, electronic equipment and storage medium
CN113326768A (en) Training method, image feature extraction method, image recognition method and device
CN114332503A (en) Object re-identification method and device, electronic equipment and storage medium
CN111652107A (en) Object counting method and device, electronic equipment and storage medium
CN113139484B (en) Crowd positioning method and device, electronic equipment and storage medium
CN111859003B (en) Visual positioning method and device, electronic equipment and storage medium
CN113283343A (en) Crowd positioning method and device, electronic equipment and storage medium
CN113345000A (en) Depth detection method and device, electronic equipment and storage medium
CN111311588B (en) Repositioning method and device, electronic equipment and storage medium
CN111611414B (en) Vehicle searching method, device and storage medium
CN112948411B (en) Pose data processing method, interface, device, system, equipment and medium
CN114842404A (en) Method and device for generating time sequence action nomination, electronic equipment and storage medium
CN114648649A (en) Face matching method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40032335

Country of ref document: HK

GR01 Patent grant