RU2759773C1

RU2759773C1 - Method and system for determining the location of the user

Info

Publication number: RU2759773C1
Application number: RU2020134607A
Authority: RU
Inventors: Артем Алексеевич Шафаростов; Алексей Алексеевич Рыбаков; Максим Александрович Козлов; Илья Викторович Батурин; Александр Сергеевич Евграшин
Original assignee: Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк)
Priority date: 2020-10-21
Filing date: 2020-10-21
Publication date: 2021-11-17
Also published as: WO2022086358A1

Abstract

FIELD: information technology. SUBSTANCE: a method for determining the location of a user's mobile device comprises the steps of: receiving a user request to determine the user's location, the request containing at least one image from the camera of the user's mobile device and the GNSS coordinates of the mobile device; determining the expected area in which the user is located, based on the received GNSS coordinates of the mobile device, by finding a three-dimensional model of the environment; determining a global descriptor for said image; determining at least one similar image in the identified three-dimensional model of the environment based on the obtained global descriptor; detecting feature points in the image of the user request and calculating two-dimensional local descriptors for them; comparing the local descriptors; based on the result of the comparison, determining a subset of feature points for which three-dimensional points of the three-dimensional model of the environment have a reprojection; determining the three-dimensional coordinates and the rotation of the camera of the mobile device relative to the coordinate system of the three-dimensional model of the environment. EFFECT: increased accuracy and speed of determining the location of the user's mobile device. 13 cl, 6 dwg

Description

TECHNICAL FIELD

[0001] The present technical solution relates generally to the field of digital data processing and, in particular, to methods and systems for determining the location of a user in visual positioning systems (VPS).

BACKGROUND

[0002] Determining a user's location with mobile computing devices has long been a priority area of development, given the rapid growth of the consumer digital electronics sector. The most commonly used approaches are software applications (for example, Yandex.Maps, Google Maps, Navitel, etc.) that offer various digital maps and position the user by obtaining the user's GNSS (Global Navigation Satellite System) coordinates, for example, by means of the GPS/GLONASS receivers that are widely built into mobile devices.

[0003] These approaches have a significant drawback: they cannot analyze and determine the user's location relative to real objects of the surrounding space by capturing those objects with the camera of the mobile device.

[0004] To solve this problem, visual positioning systems known from the prior art can be used. They process images of the surrounding space captured by the camera of a mobile device in order to determine the user's location, taking into account the user's position and viewing direction.

[0005] These approaches are based on localizing the camera of the mobile device. In particular, there are structure-based methods (see, for example, Active Search: Torsten Sattler, Bastian Leibe, Leif Kobbelt, Improving Image-Based Localization by Active Correspondence Search, ECCV 2012, 2012, pp. 752-765) and deep learning methods (PoseNet: A. Kendall, M. Grimes, R. Cipolla, PoseNet: A convolutional network for real-time 6-DOF camera relocalization, In Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2938-2946).

[0006] Structure-based methods achieve fairly high accuracy but are computationally demanding, so the required calculations take longer. Deep learning methods, by contrast, perform the required calculations fairly quickly, but their positioning quality is insufficient for accurately localizing the camera of a mobile device.

[0007] Absolute localization of a camera in a known environment is a key task in augmented reality projects and an auxiliary task in navigation systems for mobile robots (the SLAM problem). The main requirements for such an approach are as follows:

• consistently high accuracy,

• minimal requirements for computing resources and data storage (the ability to solve the task directly on mobile devices),

• high speed of solving the task.

[0008] To meet these requirements, a new approach is proposed that localizes the camera of a mobile device in a known environment in order to determine the user's location accurately and quickly based on information obtained from the camera of the mobile device.

SUMMARY OF THE INVENTION

[0009] The present technical solution addresses the technical problem of quickly and accurately determining a user's location in a known environment using information captured by the camera of the user's mobile device.

[0010] The technical result coincides with the solution of the technical problem and consists in increasing the accuracy and speed of determining the location of the user's mobile device.

[0011] The claimed result is achieved by a computer-implemented method for determining the location of a user, performed by a processor and comprising the steps of:

a) receiving a user request to determine the user's location, the request containing at least one image obtained from the camera of the user's mobile device, wherein the image contains at least part of one of the objects of the surrounding space;

b) determining a global descriptor for said image contained in the user request;

c) based on the obtained global descriptor, determining:

the area in which the user is located, by finding a three-dimensional model of the environment stored in a database of pre-built three-dimensional models of the environment, wherein each three-dimensional model contains at least a set of images of the environment from different viewpoints, and each image has feature points for which two-dimensional local descriptors have been calculated;

at least one similar image in the identified three-dimensional model of the environment, wherein the similar image is determined using a nearest neighbor search algorithm;

d) detecting feature points in the image of the user request and calculating two-dimensional local descriptors for them;

e) comparing the local descriptors obtained in step d) with the local descriptors of the at least one image of the three-dimensional model of the environment identified in step c);

f) based on the comparison in step e), determining a subset of feature points for which three-dimensional points of the three-dimensional model of the environment have a reprojection;

g) determining the three-dimensional coordinates and rotation of the camera of the mobile device relative to the coordinate system of the three-dimensional model of the environment, based on the reprojections identified for the subset of feature points in step f).
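Steps d) to f) can be illustrated with a minimal sketch. It assumes that local descriptors are plain float vectors, that each reference keypoint may or may not be linked to a 3D point of the model, and that ambiguous matches are rejected with a nearest-to-second-nearest ratio test. The ratio test, the function names, and the toy data are assumptions made for illustration; the source does not specify how the comparison is performed.

```python
import math
from typing import Dict, List, Sequence, Tuple

Descriptor = Sequence[float]
Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def l2(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query: List[Descriptor], reference: List[Descriptor],
                      ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Step e): pair each query descriptor with its nearest reference descriptor.

    A pair is kept only if the best distance is clearly smaller than the
    second-best one (ratio test, an assumption not stated in the source).
    """
    matches: List[Tuple[int, int]] = []
    for qi, q in enumerate(query):
        ranked = sorted((l2(q, r), ri) for ri, r in enumerate(reference))
        if not ranked:
            continue
        best_dist, best_ri = ranked[0]
        if len(ranked) == 1 or best_dist < ratio * ranked[1][0]:
            matches.append((qi, best_ri))
    return matches


def correspondences_2d3d(query_keypoints: List[Point2D],
                         matches: List[Tuple[int, int]],
                         ref_point3d: Dict[int, Point3D]
                         ) -> List[Tuple[Point2D, Point3D]]:
    """Step f): keep matched query keypoints whose reference keypoint has a 3D point."""
    pairs = []
    for qi, ri in matches:
        point3d = ref_point3d.get(ri)
        if point3d is not None:
            pairs.append((query_keypoints[qi], point3d))
    return pairs


# Toy data: 2-D descriptors instead of real 128-D SIFT vectors.
query_desc = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query_kp = [(10.0, 20.0), (30.0, 40.0), (50.0, 60.0)]
ref_desc = [[0.9, 0.1], [0.1, 0.9], [5.0, 5.0]]
ref_3d = {0: (1.0, 2.0, 3.0)}  # only reference keypoint 0 has a 3D point

matches = match_descriptors(query_desc, ref_desc)
pairs = correspondences_2d3d(query_kp, matches, ref_3d)
print(matches)
print(pairs)
```

The resulting 2D-3D pairs are the kind of input that a perspective-n-point solver expects for step g), for example OpenCV's `cv2.solvePnPRansac`; the source does not name a particular solver.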

[0012] In one particular embodiment of the method, the two-dimensional local descriptors are determined using the scale-invariant feature transform (SIFT) algorithm.

[0013] In another particular embodiment of the method, three-dimensional coordinates are calculated for each feature point.

[0014] In another particular embodiment of the method, at least one global descriptor is determined for each image forming the three-dimensional model of the environment.

[0015] In another particular embodiment of the method, the global descriptors are determined using at least one artificial neural network (ANN).

[0016] In another particular embodiment of the method, a search index is built from the global descriptors.

[0017] In another particular embodiment of the method, the search index is stored in the database in binary format as an element of the three-dimensional model of the environment.

[0018] In another particular embodiment of the method, the user request additionally contains data obtained from the inertial measurement unit of the user's mobile device.

[0019] In another particular embodiment of the method, correspondences are determined for the three-dimensional model between points in the two-dimensional images and three-dimensional points of the model.

[0020] In another particular embodiment of the method, based on the corresponding points, a set of points of the three-dimensional model is selected for estimating the position of the camera of the mobile device.

[0021] In another particular embodiment of the method, the GNSS coordinates of the mobile device are additionally used.

[0022] In another particular embodiment of the method, each three-dimensional model additionally contains GNSS coordinates.

[0023] The claimed invention is also implemented by a system for determining the location of a user, comprising a server and a user's mobile device, in which

the user's mobile device is configured to:

- obtain at least one image from the camera of the mobile device, wherein the image contains at least part of one of the objects of the surrounding space;

- generate a user request for location determination containing said at least one image;

the server is configured to:

- determine a global descriptor for said image contained in the user request;

- based on the obtained global descriptor, determine a three-dimensional model of the environment stored in a database of pre-built three-dimensional models of the environment, wherein each model contains a set of images of the environment from different viewpoints, and each image has feature points for which two-dimensional local descriptors have been calculated;

at least one similar image in the found three-dimensional model of the environment, wherein the images are determined using a nearest neighbor search algorithm;

- determine, based on the obtained global descriptor of the image of the user request

- compare local descriptors with the local descriptors of at least one image of the three-dimensional model of the environment;

- detect feature points in the image of the user request and calculate two-dimensional local descriptors for them;

- based on the comparison of the local descriptors, determine a subset of feature points for which three-dimensional points of the three-dimensional model of the environment have a reprojection;

- determine the three-dimensional coordinates and rotation of the camera of the mobile device relative to the coordinate system of the three-dimensional model of the environment, based on the reprojections identified for the subset of feature points.

[0024] In one particular embodiment of the system, the user's mobile device is selected from the group: smartphone, tablet, portable game console, laptop, augmented reality device.

[0025] In one particular embodiment of the system, data exchange between the user's mobile device and the server takes place over the Internet.

[0026] In one particular embodiment of the system, the user's mobile device additionally takes the GNSS coordinates of the device into account when generating the request.

[0027] In one particular embodiment of the system, the search for models of the three-dimensional space takes into account the GNSS coordinates of the user's mobile device.
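One way to take GNSS coordinates into account during the model search is to shortlist only those models whose stored coordinates lie within some radius of the device. The sketch below is illustrative only: the `ModelEntry` record, the 200 m radius, and the use of the haversine great-circle distance are assumptions, not details given in the source.

```python
import math
from dataclasses import dataclass
from typing import List

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass
class ModelEntry:
    """Hypothetical database record: a 3D model tagged with GNSS coordinates."""
    model_id: str
    lat_deg: float
    lon_deg: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidate_models(models: List[ModelEntry], lat: float, lon: float,
                     radius_m: float = 200.0) -> List[ModelEntry]:
    """Return models within radius_m of the device, nearest first."""
    scored = [(haversine_m(lat, lon, m.lat_deg, m.lon_deg), m) for m in models]
    scored.sort(key=lambda item: item[0])
    return [m for dist, m in scored if dist <= radius_m]


# Made-up coordinates for illustration.
models = [
    ModelEntry("B", 55.7560, 37.6180),
    ModelEntry("C", 59.9343, 30.3351),
    ModelEntry("A", 55.7558, 37.6173),
]
nearby = candidate_models(models, lat=55.7559, lon=37.6175)
print([m.model_id for m in nearby])
```

Only the shortlisted models would then be compared by global descriptor, which keeps the descriptor search small when the database covers many locations.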

BRIEF DESCRIPTION OF THE DRAWINGS

[0028] FIG. 1 illustrates a flowchart of the claimed method.

[0029] FIG. 2A illustrates an example of a 3D model of an outdoor environment.

[0030] FIG. 2B illustrates an example of a 3D model of an indoor environment.

[0031] FIG. 3A illustrates the principle of data exchange between the user and the server.

[0032] FIG. 3B illustrates an example of matching feature points between images of the environment model and a frame from the camera of the user's device.

[0033] FIG. 4 illustrates a general view of a computing device for implementing the claimed invention.

DETAILED DESCRIPTION

[0034] As shown in FIG. 1, the claimed method (100) for determining the user's location is carried out as a sequence of steps. The method is based on building three-dimensional (3D) models of the environment, which are then used to localize users by analyzing images captured with mobile devices.

[0035] FIG. 2A and FIG. 2B show examples of a 3D model of the environment (20) built from a group of images (201)-(203) taken from different viewpoints. The environment model is built as follows. Software for creating three-dimensional reconstructions, for example, software built on frameworks such as ARKit, COLMAP, OpenMVG, PyTorch, TensorFlow, etc., is used to create a sequence of images in which all elements of the environment being described are visible from different viewpoints, for example, buildings, groups of buildings, streets, interior layouts of buildings, objects of the environment, architectural elements, furniture, interior items, etc. The program also saves the camera pose for each image (this information is used for subsequent validation of the calculations). This information is the 6-DoF pose of the user, namely the three-dimensional position and the angles of rotation in three-dimensional space (x, y, z).
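A 6-DoF pose of this kind can be held in a small record of three coordinates and three rotation angles. The sketch below is a minimal illustration; the `Pose6DoF` name, the roll/pitch/yaw naming, the rotation order R = Rz(yaw) Ry(pitch) Rx(roll), and the camera-to-model direction of the transform are all assumptions, since the source only states that a position and rotation angles about (x, y, z) are stored.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Pose6DoF:
    """Camera pose: position (x, y, z) plus rotation angles in radians."""
    x: float
    y: float
    z: float
    roll: float   # rotation about the x axis
    pitch: float  # rotation about the y axis
    yaw: float    # rotation about the z axis

    def rotation_matrix(self) -> List[List[float]]:
        """R = Rz(yaw) * Ry(pitch) * Rx(roll); the order is an assumed convention."""
        cr, sr = math.cos(self.roll), math.sin(self.roll)
        cp, sp = math.cos(self.pitch), math.sin(self.pitch)
        cy, sy = math.cos(self.yaw), math.sin(self.yaw)
        return [
            [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr],
        ]

    def camera_to_model(self, p: Vec3) -> Vec3:
        """Map a point from camera coordinates into model coordinates."""
        R = self.rotation_matrix()
        t = (self.x, self.y, self.z)
        out = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
        return (out[0], out[1], out[2])


# A camera at (1, 2, 3) turned 90 degrees about the vertical (z) axis.
pose = Pose6DoF(x=1.0, y=2.0, z=3.0, roll=0.0, pitch=0.0, yaw=math.pi / 2)
point_in_model = pose.camera_to_model((1.0, 0.0, 0.0))
print(point_in_model)
```

With this convention, a point one unit ahead along the camera's x axis lands approximately at (1, 3, 3) in model coordinates.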

[0036] The images (201)-(203) are captured according to a defined procedure: the camera is rotated in successive steps in the horizontal plane and then moved to the next capture point. This improves the accuracy of the calculations.

[0037] The resulting 3D model (20) is a point cloud generated during said capture. In each image, feature points (2011)-(2013) are detected, and two-dimensional (2D) local descriptors are determined for them using the scale-invariant feature transform (SIFT) algorithm. Other algorithms, such as ORB, SURF, LIFT, BRIEF, SuperPoint, etc., may also be used.

[0038] The calculated local descriptors are saved to a database in a specified format, for example, that of COLMAP (a structure-from-motion system). COLMAP is used to build the point cloud of the three-dimensional model (20), from which the reprojection of the two-dimensional points (2011)-(2013) into the three-dimensional coordinate system of the model (20) can be calculated. Various distinguishable parts of environment objects can serve as the two-dimensional feature points (2011)-(2013), for example, parts of buildings, structural elements (arches, signs, columns, statues, etc.), and so on.
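The link between a 3D point of the model and a 2D feature point in an image can be illustrated with a pinhole camera projection. This is a minimal sketch under stated assumptions: an ideal pinhole camera without lens distortion, a model-to-camera transform given as a rotation matrix `R` and translation `t`, and hypothetical intrinsic parameters `fx`, `fy`, `cx`, `cy`. The source does not specify the camera model.

```python
import math
from typing import List, Optional, Sequence, Tuple


def project_point(point_model: Sequence[float],
                  R: List[List[float]], t: Sequence[float],
                  fx: float, fy: float, cx: float, cy: float
                  ) -> Optional[Tuple[float, float]]:
    """Project a 3D model point into pixel coordinates of a pinhole camera.

    Returns None when the point lies behind the camera (no valid projection).
    """
    # Transform from model coordinates into camera coordinates.
    xc = [sum(R[i][j] * point_model[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    # Perspective division followed by the intrinsic scaling and offset.
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)


def reprojection_error(observed: Tuple[float, float],
                       projected: Tuple[float, float]) -> float:
    """Pixel distance between a detected feature point and the projected 3D point."""
    return math.hypot(observed[0] - projected[0], observed[1] - projected[1])


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_point((1.0, 0.5, 2.0), identity, (0.0, 0.0, 0.0),
                      fx=800.0, fy=800.0, cx=320.0, cy=240.0)
error = reprojection_error((722.0, 440.0), pixel)
print(pixel, error)
```

A small reprojection error for a given camera pose indicates that the 2D feature point and the 3D model point are consistent with each other.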

[0039] For each image, global descriptors are computed using a machine learning model, for example, the NetVLAD artificial neural network pre-trained on the Pittsburgh 30k dataset. A global descriptor is a vector that captures the main information encoded in the image.

[0040] A search index is built from the global descriptors using one of the available tools, for example, FAISS, Nmslib, Falconn, etc. The search index is a data structure that contains information about the images and is used in image retrieval systems. The search index is saved in binary format as an element of the 3D model of the environment (20).
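To make the idea of a binary search index concrete, the sketch below serializes image identifiers and their global descriptors into a byte string and reads them back. The layout (a `GIDX` signature, a count, a dimension, then one record per image) is entirely hypothetical; libraries such as FAISS use their own serialization formats, and the source does not describe the layout it uses.

```python
import struct
from typing import List, Tuple

MAGIC = b"GIDX"  # hypothetical 4-byte file signature


def dump_index(image_ids: List[int], descriptors: List[List[float]]) -> bytes:
    """Pack image ids and float32 descriptors into a flat binary blob."""
    if len(image_ids) != len(descriptors):
        raise ValueError("one descriptor per image id is required")
    dim = len(descriptors[0]) if descriptors else 0
    out = bytearray(MAGIC)
    out += struct.pack("!II", len(descriptors), dim)  # header: count, dimension
    for image_id, vec in zip(image_ids, descriptors):
        if len(vec) != dim:
            raise ValueError("all descriptors must have the same dimension")
        out += struct.pack(f"!I{dim}f", image_id, *vec)
    return bytes(out)


def load_index(blob: bytes) -> Tuple[List[int], List[List[float]]]:
    """Inverse of dump_index."""
    if blob[:4] != MAGIC:
        raise ValueError("not an index blob")
    count, dim = struct.unpack_from("!II", blob, 4)
    record = struct.Struct(f"!I{dim}f")
    ids: List[int] = []
    vecs: List[List[float]] = []
    offset = 12  # 4 bytes of signature + 8 bytes of header
    for _ in range(count):
        image_id, *vec = record.unpack_from(blob, offset)
        ids.append(image_id)
        vecs.append(list(vec))
        offset += record.size
    return ids, vecs


# Toy 2-D descriptors; real global descriptors have far more dimensions.
image_ids = [201, 202, 203]
descriptors = [[0.5, 0.25], [1.0, 0.0], [0.0, -2.0]]
blob = dump_index(image_ids, descriptors)
restored = load_index(blob)
print(len(blob), restored)
```

Storing descriptors as float32 halves the size compared with Python's native doubles, at the cost of some precision; the toy values here were chosen so that they survive the round trip exactly.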

[0041] The execution of method (100) is now described with reference to FIG. 3A-3B. At step (101), the user (30) uses the mobile device (300) to initiate a location request, which is transmitted to the server (302) over a data network (305), for example, the Internet. The user request contains one or more images (301) captured by the camera of the device (300). In general, the image (301) contains at least part of one of the objects of the surrounding space, for example, a building, an element of interior decoration or architecture, etc. The image (301) may additionally be accompanied by the GNSS coordinates of the mobile device, obtained by the corresponding antenna module of the device (300) (GPS, GLONASS, BeiDou, etc.). The user request may also contain information from the inertial sensors of the mobile device (300), for example, gyroscope readings.
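The content of such a request can be sketched as a small data structure with a JSON encoding for transport. The field names, the JSON and base64 encoding, and the `LocalizationRequest` class are assumptions for illustration; the source specifies only what the request may contain (one or more images, optional GNSS coordinates, optional inertial data), not its wire format.

```python
import base64
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LocalizationRequest:
    """Hypothetical payload sent from the mobile device to the server."""
    images: List[bytes]                                      # encoded camera frames
    gnss_lat_lon: Optional[Tuple[float, float]] = None       # degrees
    gyro_rad_s: Optional[Tuple[float, float, float]] = None  # gyroscope reading

    def __post_init__(self) -> None:
        if not self.images:
            raise ValueError("a request needs at least one image")

    def to_json(self) -> str:
        """Serialize the request; image bytes are base64-encoded for JSON."""
        return json.dumps({
            "images": [base64.b64encode(b).decode("ascii") for b in self.images],
            "gnss": list(self.gnss_lat_lon) if self.gnss_lat_lon is not None else None,
            "gyro": list(self.gyro_rad_s) if self.gyro_rad_s is not None else None,
        })

    @classmethod
    def from_json(cls, text: str) -> "LocalizationRequest":
        """Rebuild a request on the server side."""
        raw = json.loads(text)
        return cls(
            images=[base64.b64decode(s) for s in raw["images"]],
            gnss_lat_lon=tuple(raw["gnss"]) if raw["gnss"] is not None else None,
            gyro_rad_s=tuple(raw["gyro"]) if raw["gyro"] is not None else None,
        )


# Placeholder bytes stand in for a real JPEG frame.
request = LocalizationRequest(images=[b"\xff\xd8placeholder"],
                              gnss_lat_lon=(55.75, 37.61),
                              gyro_rad_s=(0.01, -0.02, 0.0))
decoded = LocalizationRequest.from_json(request.to_json())
print(decoded == request)
```

Keeping the GNSS and inertial fields optional mirrors the description, in which only the image is mandatory.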

[0042] На этапе (102) на сервере (302) выполняется определение глобального дескриптора для полученного пользовательского изображения (301). Определение глобального дескриптора для изображения (301), которое содержит полностью или частично объект (или его часть) окружения, может осуществляться с помощью упомянутой ранее искусственной нейронной сети NetVlad или любого другого способа, пригодного для выполнения данной функции.[0042] In step (102), the server (302) determines the global descriptor for the obtained user image (301). Determination of the global descriptor for the image (301), which contains in whole or in part an object (or part of it) of the environment, can be carried out using the previously mentioned artificial neural network NetVlad or any other method suitable for performing this function.

[0043] At step (103), based on the obtained global descriptor, the server (302) determines the expected area in which the user (30) is located. This determination is made by comparing the obtained global descriptor with the pre-formed three-dimensional environment models (20) stored in the database (303).

[0044] For each model (20), a search index is built from the global descriptors of the images (201)-(203) it contains, which are taken from different viewpoints and have local 2D descriptors calculated at the corresponding feature points (2011)-(2013).

[0045] At step (104), once one or more suitable models (20) have been found by analyzing their similarity to the global descriptor of the user image (301), a search is performed for one or more of the images (201)-(203) forming said 3D environment model (20), for the purpose of subsequently identifying similar objects of the environment. A set of similar images of the corresponding 3D environment model (20) is identified algorithmically. The images are determined by computing a proximity measure (L2 norm) with a method based on nearest-neighbor image search, for example, FAISS (Facebook AI Similarity Search).
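
The nearest-neighbor search by L2 norm described in step (104) can be illustrated with a minimal sketch. It is not the FAISS library and not necessarily how the claimed system is implemented: a brute-force scan in plain Python stands in for an indexed search, and the function names, the toy three-dimensional descriptors, and the value of `k` are assumptions made for illustration only.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_images(query, index, k=2):
    """Return the k (image_id, distance) pairs closest to the query descriptor.

    `index` maps an image id to its global descriptor. A brute-force scan is
    used here as a stand-in for an indexed nearest-neighbor library.
    """
    scored = [(image_id, l2_distance(query, desc)) for image_id, desc in index.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Toy descriptors for the model images (201)-(203); real global descriptors
# have far more dimensions. The values are invented for the example.
index = {201: [0.0, 1.0, 0.0], 202: [1.0, 0.0, 0.0], 203: [0.9, 0.1, 0.0]}
matches = nearest_images([1.0, 0.0, 0.0], index, k=2)
print(matches)
```

A brute-force scan is linear in the number of stored images, so a production index would presumably replace it; the ranking logic stays the same.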

[0046] At step (105), feature points (3011)-(3013) are computed for the image (301) of the user request, and two-dimensional local descriptors are calculated for these points using the SIFT algorithm mentioned above.

[0047] Next, at step (106), the two-dimensional local descriptors calculated at step (105) are compared with the descriptors of the images (201)-(203) of the three-dimensional environment model (20). As shown in FIG. 3B, based on the results of this comparison of local descriptors, a subset of two-dimensional points on at least one identified image (201)-(203) of the three-dimensional environment model (20) is determined at step (107).

[0048] At step (107), the correspondence between the descriptors of the images (201) of the environment model (20) stored in the database (303) and those of the user image (301) is analyzed. At this step, a subset of the feature points contained in the user image (301) is determined, based on the comparison with the images (201)-(203), for which a reprojection of three-dimensional points exists on the three-dimensional environment model (20). Reprojection is understood as the mapping of a point in space from different viewpoints, which makes it possible to determine more accurately the similar images that contain these points (see, for example, Zhang. Camera Models and Image Reprojection // ECE661 Computer Vision Homework 6. 2008).

[0049] Since 2D-3D correspondences between the points of the three-dimensional model (20) and their two-dimensional projections (2011)-(2013) on a particular image (201)-(203) are calculated in advance for the images (201)-(203) forming the 3D environment model (20), comparing the two-dimensional descriptors makes it possible to determine the three-dimensional coordinates (reprojections) of the feature points (3011)-(3013) on the user image (301).
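
The way precomputed 2D-3D correspondences let matched descriptors carry three-dimensional coordinates over to the user image can be sketched as follows. This is an illustrative sketch only: the data layout, the function names, and the acceptance threshold `max_sq_dist` are assumptions that do not come from the source, and two-element lists stand in for real 128-dimensional SIFT descriptors.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lift_to_3d(query_features, model_features, max_sq_dist=0.05):
    """Build 2D-3D correspondences for the query image.

    query_features: list of (pixel_xy, descriptor) from the user image.
    model_features: non-empty list of (descriptor, point_xyz) stored with the model.
    max_sq_dist: hypothetical acceptance threshold (an assumption, not from the source).
    """
    correspondences = []
    for pixel_xy, descriptor in query_features:
        # Nearest model descriptor for this query feature point.
        best = min(model_features, key=lambda f: squared_l2(descriptor, f[0]))
        if squared_l2(descriptor, best[0]) <= max_sq_dist:
            # The matched model point lends its 3D coordinates to the pixel.
            correspondences.append((pixel_xy, best[1]))
    return correspondences

# Invented toy data: two model points with known 3D coordinates.
model_features = [
    ([0.1, 0.9], (1.0, 2.0, 5.0)),
    ([0.8, 0.2], (-1.0, 0.5, 4.0)),
]
query_features = [
    ((320.0, 240.0), [0.12, 0.88]),  # close to the first model descriptor
    ((100.0, 50.0), [0.5, 0.5]),     # no sufficiently similar descriptor
]
pairs = lift_to_3d(query_features, model_features)
print(pairs)
```

Only the first query point receives 3D coordinates; the second is dropped because no stored descriptor is close enough, which mirrors the selection of a subset of feature points in step (107).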

[0050] At step (108), the position of the camera of the device (300) is determined using the perspective-n-point (PnP) algorithm. The PnP algorithm solves the problem of estimating the pose of a calibrated camera from a set of n three-dimensional points in space and their corresponding 2D projections on the image. The P3P+RANSAC algorithm known from the prior art may be used for this purpose (X.S. Gao, X.-R. Hou, J. Tang, H.-F. Chang, Complete Solution Classification for the Perspective-Three-Point Problem, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 25, issue 8, pp 930-943, 2003).

[0051] Using the PnP algorithm, the three-dimensional coordinates and the rotation of the camera of the mobile device (300) relative to the coordinate system of the three-dimensional environment model (20) are determined. This determination is based on the reprojections identified for the subset of feature points (3011)-(3013) of the user image (301). This makes it possible to refine the location of the user (30) and obtain the corresponding data on the server (302).
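
A full P3P solver is beyond a short example, but the scoring step that RANSAC applies to each pose hypothesis can be sketched: project the 3D model points with a candidate pose and count how many land near their matched pixels. The sketch assumes a pinhole camera without lens distortion and the convention X_cam = R * X_world + t; the intrinsics, the 3-pixel threshold, and all names are illustrative assumptions, not details from the source.

```python
import math

def project(point, R, t, fx, fy, cx, cy):
    """Project a 3D model point to pixel coordinates with a pinhole camera."""
    # Camera-frame coordinates under the assumed convention X_cam = R * X_world + t.
    xc, yc, zc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if zc <= 0:
        return None  # the point lies behind the camera
    return (fx * xc / zc + cx, fy * yc / zc + cy)

def count_inliers(correspondences, R, t, intrinsics, max_error_px=3.0):
    """Count 2D-3D pairs whose reprojection error is within max_error_px."""
    fx, fy, cx, cy = intrinsics
    inliers = 0
    for (u, v), point in correspondences:
        projected = project(point, R, t, fx, fy, cx, cy)
        if projected is None:
            continue
        if math.hypot(projected[0] - u, projected[1] - v) <= max_error_px:
            inliers += 1
    return inliers

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
intrinsics = (500.0, 500.0, 320.0, 240.0)  # fx, fy, cx, cy (invented values)
correspondences = [
    ((320.0, 240.0), (0.0, 0.0, 5.0)),  # projects to the principal point
    ((420.0, 240.0), (1.0, 0.0, 5.0)),  # 500 * 1 / 5 + 320 = 420
    ((10.0, 10.0), (0.0, 1.0, 5.0)),    # wrong match: projects to (320, 340)
]
inlier_count = count_inliers(correspondences, identity, [0.0, 0.0, 0.0], intrinsics)
print(inlier_count)
```

In a RANSAC loop, a minimal solver would propose many poses from random point triples and the pose with the highest inlier count would be kept.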

[0052] The camera pose produced by method (100) is understood as [R, t], the rotation and translation of the camera of the device (300) relative to the coordinate system of the point cloud forming the three-dimensional model (20), where

[R, t] is a combined matrix in which

R => {α, β, γ} are the angles of rotation of the virtual camera about the axes Ox, Oy, Oz,

t => {x, y, z} is the offset of the virtual camera position, predicted from the user image.
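
As a sketch of how the angles {α, β, γ} and the offset {x, y, z} can be combined into one matrix, the code below builds a 3x4 matrix [R | t]. The source does not state the order in which the three rotations are composed, so the order R = Rz(γ) · Ry(β) · Rx(α) used here is an assumption, as are the function names and the use of radians.

```python
import math

def rotation_from_angles(alpha, beta, gamma):
    """Rotation matrix R = Rz(gamma) * Ry(beta) * Rx(alpha), angles in radians.

    The composition order is an assumption; the source only names the angles.
    """
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb, cb * sa, cb * ca],
    ]

def pose_matrix(angles, t):
    """Return the 3x4 combined matrix [R | t] as a list of rows."""
    R = rotation_from_angles(*angles)
    return [row + [t[i]] for i, row in enumerate(R)]

# A quarter turn about Oz combined with an offset of (1, 2, 3).
pose = pose_matrix((0.0, 0.0, math.pi / 2), (1.0, 2.0, 3.0))
for row in pose:
    print(row)
```

Whichever composition order a real implementation uses must match the one used when the model (20) was built, otherwise the recovered angles would not be comparable.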

[0053] One example of the use of the present solution is the placement of AR content when the camera of the device (300) is pointed at objects of the surrounding world. Because the AR content is anchored in the coordinate system of the point cloud of the environment model (20), knowing the position and rotation of the camera of the device (300) makes it possible, through 2D-3D reprojection, to display the AR content on the mobile device (300).

[0054] The obtained GNSS coordinates of the mobile device (300) may additionally be taken into account. With the GNSS coordinates received at step (101), it becomes possible to refine the location of the object in the world coordinate system. Each point cloud forming a 3D model (20) is also referenced to positions in the world coordinate system, so its approximate longitude and latitude are known. Using this prior knowledge of the location from GNSS, the model (20) in which the subsequent search by global descriptors (104) will take place is selected at step (103). The model (20) is selected by finding the shortest distance between the longitude and latitude of the user's device (300) and the longitude and latitude of the model (20) itself. After the candidates are found (104) and the two-dimensional descriptor matching is obtained at step (105), the GNSS position is also used during the PnP optimization process at step (108), in which a constraint is added on the possible position in space, equal to a radius (several meters) around the GNSS position of the device (300). This improves the convergence of the method and refines the estimated position.
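
The two uses of GNSS described above, choosing the nearest model and bounding the solved position, might be sketched as follows. The source only says "shortest distance", so the haversine great-circle formula is an assumption, as are the model names, the coordinates, the 5-meter radius, and the premise that the GNSS fix has already been converted into the model's metric coordinate system for the radius check.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def select_model(device_latlon, models):
    """Return the id of the model whose reference point is closest to the device.

    models maps a model id to its (latitude, longitude).
    """
    return min(models, key=lambda mid: haversine_m(*device_latlon, *models[mid]))

def within_radius(candidate_xyz, gnss_xyz, radius_m=5.0):
    """Check the positional constraint: candidate within radius_m of the GNSS fix."""
    return math.dist(candidate_xyz, gnss_xyz) <= radius_m

# Hypothetical models and device fix, invented for the example.
models = {"lobby": (55.7558, 37.6173), "atrium": (55.7600, 37.6200)}
chosen = select_model((55.7559, 37.6175), models)
print(chosen, within_radius((1.0, 2.0, 0.0), (0.0, 0.0, 0.0)))
```

A pose hypothesis that fails `within_radius` could be rejected before or during the PnP optimization, which is one simple way to apply the constraint.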

[0055] The claimed method (100) for determining the position of the user (30) can be used as a correction when applying the SLAM (simultaneous localization and mapping) algorithm indoors. SLAM is used in mobile autonomous vehicles to build a map of an unknown space, or to update a map of a previously known space, while simultaneously tracking the current location and the path traveled.

[0056] In general, the use of method (100) indoors together with a SLAM algorithm proceeds as follows. Frames are captured using ARKit/ARCore and sent with their corresponding positions to the server (302), while the positions are stored for subsequent correction. Method (100) is applied to each frame to predict a position. Taking into account the positions received from the indoor device, incorrectly predicted positions are discarded, and the best position is selected by the number of inliers (points that fit the model) after the PnP step and sent to the indoor device, for example, the device (300) of the user (30).
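
The filtering and selection described above can be sketched under simplifying assumptions. The source does not say how predicted and device-reported positions are compared, so this sketch assumes both are already expressed in the same coordinate system and rejects a prediction when the two differ by more than a hypothetical threshold; the dictionary keys, frame names, and values are invented for illustration.

```python
import math

def select_best_pose(predictions, max_disagreement_m=1.0):
    """Pick the prediction with the most PnP inliers among the consistent ones.

    Each prediction is a dict with:
      'server_xyz': position predicted by the localization method,
      'device_xyz': position reported by the device for the same frame
                    (assumed to be in the same coordinate system),
      'inliers':    number of inliers after the PnP step.
    Returns None when every prediction is rejected.
    """
    consistent = [
        p for p in predictions
        if math.dist(p["server_xyz"], p["device_xyz"]) <= max_disagreement_m
    ]
    if not consistent:
        return None
    return max(consistent, key=lambda p: p["inliers"])

predictions = [
    {"frame": "a", "server_xyz": (1.0, 0.0, 0.0), "device_xyz": (1.2, 0.0, 0.0), "inliers": 40},
    {"frame": "b", "server_xyz": (7.0, 0.0, 0.0), "device_xyz": (1.0, 0.0, 0.0), "inliers": 90},
    {"frame": "c", "server_xyz": (2.0, 0.5, 0.0), "device_xyz": (2.0, 0.0, 0.0), "inliers": 55},
]
best = select_best_pose(predictions)
print(best["frame"])
```

Frame "b" has the most inliers but disagrees with the device track by 6 meters, so it is discarded and frame "c" wins.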

[0057] The current position of the user device (300) is then replaced with the new position received from the server (302), taking into account the stored position and the device's own displacement during the time spent waiting for the response from the server (302). This improves the accuracy of determining the position of the user's device (300) indoors.
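
One way to read this correction is that the device adds the motion it made while waiting to the position returned by the server. The sketch below handles translation only and assumes the axes of the local tracking frame are aligned with those of the model; a real system would also have to account for rotation between the two frames. The function name and the numbers are illustrative.

```python
def corrected_position(server_xyz, stored_xyz, current_xyz):
    """Apply the server fix while keeping motion made during the wait.

    server_xyz:  position returned by the server for the frame that was sent.
    stored_xyz:  local position saved when that frame was captured.
    current_xyz: local position at the moment the response arrives.
    """
    # Displacement accumulated by local tracking while the request was in flight.
    drift = [c - s for c, s in zip(current_xyz, stored_xyz)]
    return [p + d for p, d in zip(server_xyz, drift)]

# The device moved 0.5 m along x and 0.2 m along z while waiting.
new_position = corrected_position(
    server_xyz=(10.0, 2.0, 0.0),
    stored_xyz=(1.0, 0.0, 0.0),
    current_xyz=(1.5, 0.0, 0.2),
)
print(new_position)
```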

[0058] FIG. 4 shows a general view of a computing device (400). The user device (300), the server (302), and other devices not shown, which may take part in the overall information architecture of the claimed solution, can be implemented on the basis of the device (400).

[0059] In general, the computing device (400) contains one or more processors (401), memory such as RAM (402) and ROM (403), input/output interfaces (404), input/output devices (405), and a networking device (406), all connected by a common data bus.

[0060] The processor (401) (or several processors, or a multi-core processor) can be selected from the range of devices widely used at present, for example, those from Intel™, AMD™, Apple™, Samsung Exynos™, MediaTek™, Qualcomm Snapdragon™, and the like.

[0061] The RAM (402) is random-access memory intended for storing machine-readable instructions executed by the processor (401) to perform the necessary logical data processing operations. The RAM (402) typically contains executable instructions of the operating system and of the corresponding software components (applications, software modules, etc.).

[0062] The ROM (403) is one or more persistent data storage devices, for example, a hard disk drive (HDD), a solid-state drive (SSD), flash memory (EEPROM, NAND, etc.), optical storage media (CD-R/RW, DVD-R/RW, Blu-ray Disc, MD), etc.

[0063] Various types of I/O interfaces (404) are used to connect the components of the device (400) and external peripheral devices. The choice of interfaces depends on the specific implementation of the computing device and may include, without limitation: PCI, AGP, PS/2, IrDA, FireWire, LPT, COM, SATA, IDE, Lightning, USB (2.0, 3.0, 3.1, micro, mini, Type-C), TRS/audio jack (2.5, 3.5, 6.35), HDMI, DVI, VGA, DisplayPort, RJ45, RS232, etc.

[0064] Various I/O devices (405) are used for user interaction with the computing device (400), for example, a keyboard, display (monitor), touch display, touchpad, joystick, mouse, light pen, stylus, touch panel, trackball, speakers, microphone, augmented reality devices, optical sensors, tablet, indicator lights, projector, camera, biometric identification devices (retina scanner, fingerprint scanner, voice recognition module), etc.

[0065] The networking device (406) enables the device (400) to transmit data over an internal or external computer network, for example, an intranet, the Internet, a LAN, etc. The one or more devices (406) may include, without limitation: an Ethernet card, GSM modem, GPRS modem, LTE modem, 5G modem, satellite communication module, NFC module, Bluetooth and/or BLE module, Wi-Fi module, etc.

[0066] Satellite navigation facilities, for example, GPS, GLONASS, BeiDou, or Galileo, may additionally be used as part of the device (400).

[0067] The presented materials disclose preferred embodiments of the technical solution and should not be construed as limiting other, particular embodiments that do not go beyond the scope of the claimed legal protection and that are obvious to those skilled in the relevant art.

Claims

1. A computer-implemented method for determining the location of a user's mobile device, performed using a processor and comprising the steps of:

a) receiving a user request to determine the user's location, the request containing at least one image obtained from the camera of the user's mobile device, the image containing at least part of one of the objects of the surrounding space, as well as the GNSS coordinates of the mobile device;

b) determining the expected area of the user's location based on the received GNSS coordinates of the mobile device by finding a three-dimensional model of the environment stored in a database of pre-formed three-dimensional models of the environment, each model containing GNSS coordinates and a set of images of the corresponding environment from different viewpoints, each image having feature points for which two-dimensional local descriptors are calculated;

c) determining a global descriptor for said image contained in the user request;

d) determining, based on the obtained global descriptor of the user request image, at least one similar image in the three-dimensional model of the environment determined in step b), the images being determined using a nearest-neighbor search algorithm;

e) computing feature points on the user request image and computing two-dimensional local descriptors for them;

f) comparing the local descriptors obtained in step e) with the local descriptors of the at least one image of the three-dimensional model of the environment identified in step d);

g) determining, from the comparison in step f), a subset of feature points for which a reprojection of three-dimensional points exists on the three-dimensional model of the environment;

h) determining the three-dimensional coordinates and rotation of the camera of the mobile device relative to the coordinate system of the three-dimensional model of the environment based on the reprojections identified for the subset of feature points in step g).

2. The method according to claim 1, characterized in that the two-dimensional local descriptors are determined using the scale-invariant feature transform (SIFT) algorithm.

3. The method according to claim 1, characterized in that three-dimensional coordinates are calculated for each feature point.

4. The method according to claim 3, characterized in that for each image forming a three-dimensional model of the environment, at least one global descriptor is defined.

5. The method according to claim 4, characterized in that the global descriptors are determined using at least one artificial neural network (ANN).

6. The method according to claim 4, characterized in that a search index is formed on the basis of the global descriptors.

7. The method according to claim 6, characterized in that the search index is stored in the database in a binary format as an element of a three-dimensional model of the environment.

8. The method according to claim 1, characterized in that the user request further comprises data received from the inertial measuring device of the user's mobile device.

9. The method according to claim 4, characterized in that, for the three-dimensional model, points of correspondence between the two-dimensional images and the points of the three-dimensional model are determined.

10. The method according to claim 9, characterized in that, based on the corresponding points, a set of points on the three-dimensional model is selected to estimate the position of the camera of the mobile device.

11. A system for determining the location of a user's mobile device, comprising a server and a user's mobile device, in which

the user's mobile device is configured for:

- obtaining at least one image from the camera of the mobile device, the image containing at least part of one of the objects of the surrounding space;

- generating a user request to determine the location, containing said at least one image obtained from the camera of the user's mobile device, as well as the GNSS coordinates of the mobile device;

the server is configured for:

- determining the expected area of the user's location based on the received GNSS coordinates of the mobile device by finding a three-dimensional model of the environment stored in a database of pre-formed three-dimensional models of the environment, each model containing GNSS coordinates and a set of images of the corresponding environment from different viewpoints, each image having feature points for which two-dimensional local descriptors are calculated;

- determining a global descriptor for said image contained in the user request;

- determining, based on the obtained global descriptor of the user request image, at least two similar images in the found three-dimensional model of the environment, the images being determined using a nearest-neighbor search algorithm;

- comparing local descriptors with the local descriptors of at least one image of the three-dimensional model of the environment;

- computing feature points on the user request image and two-dimensional local descriptors for them;

- determining, based on the results of the comparison of local descriptors, a subset of feature points for which a reprojection of three-dimensional points exists on the three-dimensional model of the environment;

- determining the three-dimensional coordinates and rotation of the camera of the mobile device relative to the coordinate system of the three-dimensional model of the environment based on the reprojections identified for the subset of feature points.

12. The system according to claim 11, characterized in that the user's mobile device is selected from the group: smartphone, tablet, portable game console, laptop, augmented reality device.

13. The system according to claim 11, characterized in that the exchange of data between the user's mobile device and the server is carried out via the Internet.