CN114359392A - Visual positioning method, device, chip system and storage medium

Publication number: CN114359392A
Authority: CN (China)
Prior art keywords: electronic equipment, feature point, target, feature, electronic device
Legal status: Granted
Application number: CN202210254346.9A
Other languages: Chinese (zh)
Other versions: CN114359392B
Inventors: 刘小伟 (Liu Xiaowei), 王国毅 (Wang Guoyi), 周俊伟 (Zhou Junwei)
Current assignee: Shanghai Glory Smart Technology Development Co., Ltd.
Original assignee: Honor Device Co., Ltd.
Application filed by Honor Device Co., Ltd.; priority to CN202210254346.9A; application granted; publication of CN114359392A and of granted patent CN114359392B. Current legal status: Active.

Abstract

The application provides a visual positioning method, a visual positioning apparatus, a chip system and a storage medium. The visual positioning method comprises the following steps: when the electronic equipment enters a target area, it acquires a coarse positioning in the target area; the electronic equipment determines a viewing cone region corresponding to the electronic equipment according to the azimuth information, the viewing cone region comprising at least one preset feature point; the electronic equipment performs plane projection of the preset feature points along the azimuth axis direction and selects target feature points from the result of the plane projection; the electronic equipment acquires a feature point descriptor vector of each target feature point along the azimuth axis direction, the descriptor vector along the azimuth axis direction being obtained from the feature point descriptor vectors in at least one direction; finally, the electronic equipment matches the feature point coordinates and feature point descriptor vectors of the target feature points with the image currently acquired by the electronic equipment to acquire the current pose of the electronic equipment. The technical scheme shown in the application is applicable to indoor scenes with repeated textures and the like, reduces the time consumed by positioning in such scenes, and improves the user experience.

Description

Visual positioning method, device, chip system and storage medium
Technical Field
The present application relates to the field of terminal technologies, and in particular, to a visual positioning method, an apparatus, a chip system, and a storage medium.
Background
Visual positioning is a technology that completes positioning tasks through machine vision and has been a research hotspot in the field of Extended Reality (XR) in recent years. Existing visual positioning technology is applied in the field of Augmented Reality: electronic device manufacturers realize part of the XR functions on an electronic device by using its image acquisition apparatus together with a Visual Positioning Service (VPS), for example, live-action navigation using Augmented Reality (AR) technology.
In the prior art, in the process of performing live-action navigation by using a VPS, multiple feature matching needs to be performed on an image acquired in real time and a map stored in a cloud end to obtain accurate positioning.
However, in scenes with repeated textures, such as indoors (e.g., a long indoor corridor, rooms with a repeated structure, or walls whose front and back surfaces are identical), accurate positioning is still difficult to obtain even after the VPS has matched the real-time acquired image against the cloud-stored map many times, which results in long time consumption and poor user experience.
Disclosure of Invention
The application provides a visual positioning method, a visual positioning apparatus, a chip system and a storage medium, which can solve the problem that, in scenes with repeated textures such as indoors, the VPS still finds it difficult to obtain accurate positioning after matching the images acquired in real time against the cloud-stored map many times, resulting in long time consumption and poor user experience; the method reduces the positioning time in such scenes and improves the user experience.
In a first aspect, the application shows a visual positioning method, comprising: when the electronic equipment enters a target area, acquiring the coarse positioning of the electronic equipment in the target area, the coarse positioning being used for determining the azimuth information of the electronic equipment, where the azimuth information comprises the position coordinates and the azimuth axis of the electronic equipment; the electronic equipment determining a viewing cone region corresponding to the electronic equipment according to the azimuth information, where the viewing cone region comprises at least one preset feature point, each preset feature point corresponds to feature point coordinates and a feature point descriptor vector in at least one direction, and the viewing cone region is the perspective observation range of the electronic equipment at the current viewing angle; the electronic equipment performing plane projection of the preset feature points along the azimuth axis direction and selecting at least one target feature point from the result of the plane projection, where a target feature point is a point that is not occluded in the result of the plane projection; the electronic equipment acquiring a feature point descriptor vector of each target feature point along the azimuth axis direction, the descriptor vector along the azimuth axis direction being obtained from the feature point descriptor vector in at least one direction; and the electronic equipment matching the feature point coordinates and feature point descriptor vectors of the target feature points with the image currently acquired by the electronic equipment to acquire the current pose of the electronic equipment. By adopting this embodiment, using coarse positioning to assist the visual positioning service, the electronic equipment can obtain its current accurate positioning through only one feature matching in scenes with repeated textures such as indoors, which reduces the positioning time in such scenes and improves the user experience.
In an alternative implementation, the electronic equipment obtains its coarse positioning in the target area as follows: the electronic equipment acquires the coarse positioning of the electronic equipment in the target area according to ultra-wideband (UWB) positioning. By adopting this embodiment, the coarse positioning of the electronic equipment can be confined to a small positioning range with high precision.
In an alternative implementation, the electronic equipment determines the viewing cone region corresponding to the electronic equipment according to the azimuth information as follows: the electronic equipment obtains a feature map of the target area, where the feature map comprises at least one sub-map; the electronic equipment determines the sub-map where it is located from the feature map; and the electronic equipment determines the viewing cone region corresponding to the electronic equipment in that sub-map according to the coarse positioning. By adopting this embodiment, the electronic equipment only needs to match the viewing cone region against one sub-map, which reduces the time consumed by positioning.
In an alternative implementation, the electronic equipment performs plane projection of the preset feature points along the azimuth axis direction and selects at least one target feature point from the result of the plane projection as follows: the electronic equipment acquires a first vector between its position coordinates and the feature point coordinates of each preset feature point; the electronic equipment acquires the product of each first vector and the feature point descriptor vector along the azimuth axis direction; the electronic equipment acquires the inverse cosine value of this product for each preset feature point; and if the inverse cosine value is smaller than a preset threshold, the electronic equipment determines the preset feature point as a candidate feature point. By adopting this embodiment, the electronic equipment screens the preset feature points, which reduces the number of matches and the time consumed by positioning.
In an alternative implementation, the electronic equipment performs plane projection of the preset feature points along the azimuth axis direction and selects at least one target feature point from the result of the plane projection as follows: the electronic equipment performs plane projection of each candidate feature point along the azimuth axis direction in order of the distance between the candidate feature point and the electronic equipment, from far to near; during the plane projection, if a candidate feature point projected earlier is occluded by a candidate feature point projected later, the candidate feature point projected earlier is deleted; and the electronic equipment determines the candidate feature points that are not occluded during the plane projection as the target feature points. By adopting this embodiment, the electronic equipment screens the candidate feature points, which reduces the number of matches and the time consumed by positioning.
In an alternative implementation, the electronic equipment obtains the feature point descriptor vector of a target feature point along the azimuth axis direction as follows: the electronic equipment acquires the feature point descriptor vector of the target feature point in at least one direction; and the electronic equipment determines, from the feature point descriptor vector in at least one direction, the feature point descriptor vector of the target feature point along the azimuth axis direction according to a linear interpolation method. By adopting this embodiment, the electronic equipment acquires the feature point descriptor vector along the azimuth axis direction, which reduces the number of matches and the time consumed by positioning.
In an alternative implementation, the electronic equipment obtains the feature point descriptor vector of a target feature point along the azimuth axis direction as follows: the electronic equipment acquires the feature point descriptor vector of the target feature point in at least one direction; and the electronic equipment determines, from the feature point descriptor vector in at least one direction, the feature point descriptor vector of the target feature point along the azimuth axis direction according to a deep-learning-model interpolation method. By adopting this embodiment, the electronic equipment acquires the feature point descriptor vector along the azimuth axis direction, which reduces the number of matches and the time consumed by positioning.
In an alternative implementation, the electronic equipment matches the feature point coordinates and feature point descriptor vectors of the target feature points with the image currently acquired by the electronic equipment to obtain its current pose as follows: the electronic equipment obtains image feature points in the currently acquired image, the image feature points being extracted from the currently acquired image by the electronic equipment according to a feature extraction algorithm; the electronic equipment generates matching point pairs from the image feature points and the target feature points; and the electronic equipment executes a 2D-3D matching algorithm on the matching point pairs to obtain the matching result of the image feature points and the target feature points. By adopting this embodiment, the electronic equipment generates matching point pairs from the image feature points and the target feature points so that accurate positioning is obtained through a single matching, which reduces the time consumed by positioning.
In an alternative implementation, the electronic equipment matches the feature point coordinates and feature point descriptor vectors of the target feature points with the image currently acquired by the electronic equipment to obtain its current pose as follows: the electronic equipment executes the pose estimation algorithm PNP and the random sample consensus algorithm RANSAC on the matching result to acquire the current pose of the electronic equipment. By adopting this embodiment, the electronic equipment analyzes the matching result through these algorithms to obtain an accurate pose.
The visual positioning method shown above can solve the problem that, in scenes with repeated textures such as indoors, accurate positioning is still difficult to obtain after the VPS matches the images acquired in real time against the cloud-stored map many times, which results in long time consumption and poor user experience; it reduces the time consumed by positioning in such scenes and improves the user experience. Compared with the traditional VPS, which needs to perform matching 30 to 50 times, the technical scheme shown in the application needs only 1 matching, which greatly improves the matching speed; at the same time it does not need an image retrieval module as the traditional VPS does, and therefore will not be misled by similar image structures during image retrieval.
In a second aspect, the present application also shows a visual positioning apparatus having the functionality to implement the above method. The functions can be realized by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions. In one possible design, the apparatus includes a processor configured to support the apparatus in performing the corresponding functions of the method. The apparatus may also include a memory, coupled to the processor, that stores the program instructions and data necessary for the apparatus.
In a third aspect, the present application provides a chip system, which is applied to an electronic device. The chip system includes one or more interface circuits and one or more processors; the interface circuits and the processors are interconnected through lines. The interface circuit is used to receive signals from a memory of the electronic device and send the signals to the processor, the signals including the computer instructions stored in the memory. When the processor executes the computer instructions, the electronic device performs the method as described in the first aspect and any of its possible designs.
In a fourth aspect, the present application provides a computer storage medium having instructions stored therein, where the instructions, when executed on a computer, cause the computer to perform some or all of the steps of the visual positioning method in the first aspect and the various possible implementations of the first aspect.
It is to be understood that, for the beneficial effects achievable by the visual positioning apparatus of the second aspect, the chip system of the third aspect, and the computer storage medium of the fourth aspect provided above, reference may be made to the beneficial effects of the first aspect and any possible design thereof, and details are not repeated here.
Drawings
Fig. 1 is a schematic view of a scenario provided in an embodiment of the present application;
FIG. 2 is a flow chart of a method of providing visual positioning services in accordance with an embodiment of the present application;
fig. 3 is a schematic diagram of an indoor scene provided in an embodiment of the present application;
fig. 4 is a schematic hardware structure diagram of an electronic device 100 according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a software structure of the electronic device 100 according to an embodiment of the present application;
FIG. 6 is a flowchart of a visual positioning method provided by an embodiment of the present application;
FIG. 7 is a schematic illustration of a UWB positioning range provided by an embodiment of the present application;
FIG. 8 is a schematic view of a viewing cone region provided in an embodiment of the present application;
FIG. 9 is a schematic illustration of a sub-map provided in an embodiment of the present application;
FIG. 10 is a schematic diagram of a point cloud provided in an embodiment of the present application;
FIG. 11 is a diagram of a first vector and a feature point descriptor vector along an azimuth axis according to an embodiment of the present application;
FIG. 12 is a schematic plan view of a projection provided in accordance with another embodiment of the present application;
FIG. 13 is a schematic diagram of a linear interpolation scheme provided by another embodiment of the present application;
FIG. 14 is a diagram of a visual positioning hardware device provided by an embodiment of the present application;
fig. 15 is a schematic diagram of a visual positioning software device according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be described below clearly with reference to the drawings in the embodiments of the present application.
In the description of this application, "/" means "or" unless otherwise stated; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. Further, "at least one" means one or more, and "a plurality" means two or more. The terms "first", "second", and the like are used to distinguish between objects and do not limit the number or execution order of those objects.
It is noted that, in the present application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or description. Any embodiment or design described herein as "exemplary" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs; rather, the use of such words is intended to present related concepts in a concrete fashion.
In order to facilitate the technical solutions of the embodiments of the present application to be understood by the skilled person, the technical terms related to the embodiments of the present application are explained below.
1. Visual Positioning Service (VPS): a service that performs positioning using image information. Based on a 3D map, the service matches the image information acquired by the image acquisition device of an electronic device against the 3D map so as to identify the azimuth information of the electronic device with high accuracy.
2. Ultra Wide Band (UWB): a wireless carrier communication technology that uses nanosecond-level non-sinusoidal narrow pulses to transmit data and occupies a wide spectrum range. UWB has the advantages of low system complexity, low power spectral density of the transmitted signal, insensitivity to channel fading, a low probability of interception, and high positioning accuracy, and is suitable for high-speed wireless access in dense multipath places such as indoor environments.
3. Extended Reality (XR): an environment that combines the real world and a virtual world, created by computer technology to enable human-computer interaction; it is a general term for technologies such as Augmented Reality (AR), Virtual Reality (VR), and Mixed Reality (MR).
First, an application scenario of the embodiment of the present application will be described with reference to the drawings.
Fig. 1 is a schematic view of a scenario to which the embodiment of the present application is applied. As shown in fig. 1, when a user enters an unfamiliar scene area, the electronic device may provide live-action navigation to the user through AR technology so that the user can accurately identify the route. During live-action navigation, the electronic device needs to recognize the current scene so as to position itself according to that scene. The electronic device can accurately position itself, even without a network and without satellite positioning, according to a built-in feature map.
Fig. 2 is a flow chart of a visual positioning service method. As shown in fig. 2, after the electronic device collects the image information of the current scene, the image information is input into an image retrieval module, which retrieves the matching area of the image information in the cloud feature map. After the matching area is retrieved, feature points are extracted from the image information and matched with the feature points in the feature map to obtain a matching result, and the pose is calculated from the matching result to obtain the current pose of the electronic device.
It should be noted that visual positioning services are not applicable in some scenarios, such as indoor long corridors, rooms with repeated structures, and walls whose front and back surfaces are completely consistent. In such scenes, since there are many regions with repeated textures, the electronic device needs to match the acquired image information of the current scene against the feature map many times. During the matching process, the electronic device may take a long time to obtain an accurate pose, or the process may be terminated due to matching failure, which affects the user's experience of XR technology.
Fig. 3 is a schematic view of an indoor scene. As shown in fig. 3, the indoor corridors share the same decoration style: when the electronic device collects image information of the first path, the collected image information is similar to the image information of the second path. When feature points are extracted from the image information collected on the first path, the electronic device either needs to perform multiple matchings to confirm that the current image information belongs to the first path rather than the second path, or mistakes the image information of the first path for that of the second path, which causes a path navigation error and affects the user's experience of XR technology.
In the above technical scheme, in scenes with repeated textures such as indoors, the electronic equipment needs to match many times when positioning with the VPS, and the positioning precision is reduced or positioning fails; the VPS method therefore suffers from long time consumption and poor user experience in such scenes.
In order to solve the problems in the prior art, the embodiment of the application shows a visual positioning method. The method can be applied to electronic equipment.
The electronic device 100 in the present application may be a mobile terminal or a fixed terminal having a touch screen, such as a tablet computer (PAD), a Personal Digital Assistant (PDA), a handheld device having a wireless communication function, a computing device, a vehicle-mounted device, a wearable device, a Virtual Reality (VR) terminal device, an Augmented Reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in self driving, a wireless terminal in remote medical, a wireless terminal in smart grid, a wireless terminal in transportation safety, a wireless terminal in smart city, or a wireless terminal in smart home. The form of the terminal device is not particularly limited in the embodiment of the present application.
Fig. 4 shows a hardware configuration diagram of the electronic device 100.
The electronic device 100 may include a processor 110, a memory 120, an antenna 130, a mobile communication module 140, and a sensor module 150. Among other things, processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution. The sensor module 150 may include a gyro sensor 150A, an air pressure sensor 150B, a magnetic sensor 150C, an acceleration sensor 150D, a gravity sensor 150E, and the like.
A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
The wireless communication function of the electronic device 100 may be implemented by the antenna 130, the mobile communication module 140, the modem processor, the baseband processor, and the like. The antenna 130 includes at least one antenna panel; each antenna panel may be used to transmit and receive electromagnetic wave signals, and the antenna 130 may be used to cover a single communication band or multiple communication bands. In other embodiments, the antenna 130 may be used in conjunction with a tuning switch.
The mobile communication module 140 may provide a solution including 2G/3G/4G/5G wireless communication applied to the electronic device 100. The mobile communication module 140 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 140 may receive the electromagnetic wave from the antenna 130, filter, amplify, etc. the received electromagnetic wave, and transmit the filtered electromagnetic wave to the modem processor for demodulation. The mobile communication module 140 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 130 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 140 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 140 may be disposed in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device or displays images or videos through a display screen. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 140 or other functional modules, independent of the processor 110.
In some embodiments, the antenna 130 and the mobile communication module 140 of the electronic device 100 are coupled such that the electronic device 100 may communicate with networks and other devices through wireless communication techniques. The wireless communication technology may include the fifth generation mobile communication technology new radio (5G NR), global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), and the like.
The memory 120 may be used to store computer-executable program code, which includes instructions. The memory 120 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The stored data area may store data (e.g., audio data, a phonebook, etc.) created during use of the user device 100, and the like. Further, the memory 120 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like. The processor 110 executes various functional applications of the user device 100 and data processing by executing instructions stored in the memory 120 and/or instructions stored in a memory provided in the processor.
The gyro sensor 150A may be used to determine the motion attitude of the electronic device 100. In some embodiments, the angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 150A. The gyro sensor 150A may also be used for photographing anti-shake. For example, when the shutter is pressed, the gyro sensor 150A detects the shake angle of the electronic device 100, calculates the distance the lens module needs to compensate according to the shake angle, and lets the lens counteract the shake of the electronic device 100 through a reverse movement, thereby achieving anti-shake. The gyro sensor 150A may also be used in navigation and somatosensory gaming scenarios.
The air pressure sensor 150B is used to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude from the barometric pressure value measured by the air pressure sensor 150B, to aid positioning and navigation.
The acceleration sensor 150D may detect the magnitude of the acceleration of the electronic device 100 in various directions (typically along three axes), and may detect the magnitude and direction of gravity when the electronic device 100 is stationary. It can also be used to identify the posture of the electronic device, and is applied in landscape/portrait switching, pedometers, and other applications.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. The embodiment of the present application takes an Android system with a layered architecture as an example, and exemplarily illustrates a software structure of the electronic device 100.
Fig. 5 shows a software structure diagram of the electronic device 100.
The layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into four layers, an application layer, an application framework layer, an Android runtime (Android runtime) and system library, and a kernel layer from top to bottom.
The application layer may include a series of application packages.
As shown in fig. 5, the application package may include Applications (APP) such as camera, gallery, mailbox, bluetooth, memo, music, video, file management, etc.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 5, the application framework layers may include a window manager, a view system, a drag manager, a content provider, a resource manager, a notification manager, and the like. The functional modules of the application framework layer may be integrated into the processor 110 illustrated in fig. 4, and the functions of the application framework layer in this embodiment may be implemented by the hardware processor 110 illustrated in fig. 4.
The window manager is used for managing window programs. Illustratively, the window manager may obtain the size of the display screen 184, determine whether there is a status bar, lock the screen, capture the screen, etc. The window manager may also manage the distribution of each APP in the application layer and the window layout of each APP, so that the display screen 184 can display two APP windows. In addition, the window manager can identify the file types supported by an APP, so that it can determine whether an APP supports the file type of the object the user is dragging.
The view system includes visual interface elements such as interface elements that display text, interface elements that display images, and the like. The view system may be used to build a display interface for an APP. The display interface may be composed of one or more views. For example, a display interface including various types of APP icons, and the like. The view system may also construct a snapshot of the dragged object. The snapshot includes, for example, a size, an identifier, and the like of the snapshot, and the identifier may include a layer, a mark, and the like.
The drag manager may determine the location touched by the user and the snapshot of the corresponding object based on the detection signal reported by the touch sensor 160B. Further, the drag manager may control the corresponding snapshot to move on the display screen 180 along with the position touched by the user, so as to implement the drag function.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The resource manager provides various resources for the application, such as localized strings, icons, images, layout files, video files, and the like.
The notification manager enables the application to display notification information in the status bar, can be used to convey notification-type messages, can disappear automatically after a short dwell, and does not require user interaction. Such as a notification manager used to inform download completion, message alerts, etc. The notification manager may also be a notification that appears in the form of a chart or scroll bar text at the top status bar of the system, such as a notification of a background running application, or a notification that appears on the screen in the form of a dialog window. For example, prompting text information in the status bar, sounding a prompt tone, vibrating the electronic device, flashing an indicator light, etc.
The Android Runtime comprises a core library and a virtual machine. The Android runtime is responsible for scheduling and managing an Android system.
The core library comprises two parts: one part contains the functions that the java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the java files of the application layer and the application framework layer as binary files. The virtual machine performs functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: surface managers (surface managers), media libraries (media libraries), three-dimensional graphics processing libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), accessory management services, Bluetooth apk, BT stack, and the like.
Wherein, the Bluetooth apk is a Bluetooth installation Package (Bluetooth Android Package); the BT stack is a Bluetooth protocol stack.
The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The accessory management service is used to manage accessories (e.g., wireless keyboard, stylus, headset, mouse, etc.) of the electronic device, such as pairing, connecting, disconnecting, and data transfer with the accessories.
The Bluetooth apk is mainly responsible for the management of the Bluetooth state of the electronic equipment, is connected with the BT stack and provides various Bluetooth services.
BT stack provides all the actual operations of bluetooth, including: switching on and off Bluetooth, managing Bluetooth, searching management, managing links, realizing various profiles and the like.
The kernel layer is a layer between hardware and software. The kernel layer at least comprises a display driver, a camera driver, an audio driver, and a sensor driver, which are not limited in this embodiment of the application.
Fig. 6 is a flowchart of a visual positioning method according to an embodiment of the present application. As shown in fig. 6, the visual positioning method of the embodiment of the present application includes the following steps:
step S101, when the electronic equipment enters a target area, acquiring the rough positioning of the electronic equipment in the target area; the rough positioning is used for determining the azimuth information of the electronic equipment; the orientation information includes position coordinates and an orientation axis of the electronic device.
It should be noted that the electronic device can be positioned in various ways, such as Global Navigation Satellite System (GNSS) positioning, base station positioning, and hybrid positioning. Through GNSS, the electronic equipment can achieve positioning with an error within 5 meters; with base station positioning, it must judge its location according to the strength of the received signals; and with hybrid positioning, it can combine Wi-Fi and base stations. The coarse positioning in the embodiment of the present application requires the position coordinates and the azimuth axis of the electronic equipment to be confined to a small range.
As an alternative implementation, the electronic device obtains a coarse positioning in the target area, including: and the electronic equipment acquires the coarse positioning of the electronic equipment in the target area according to the UWB positioning.
Fig. 7 is a schematic diagram of a UWB positioning range provided in the embodiment of the present application. As shown in fig. 7, when UWB base stations are installed in four directions indoors, the electronic device can, through the connection between its built-in UWB chip and the UWB base stations, control its self-positioning error such that the position coordinates lie in a spherical region with a radius of 20 cm and the azimuth axis lies in a conical region with a cone half-angle of 3 degrees. Within this precision error range, the electronic device can obtain its coarse positioning in the target area. The azimuth axis is the axis perpendicular to the electronic device through its center point.
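As an illustration of how such a coarse position can be computed from the four base stations, the sketch below performs a simple least-squares multilateration from measured ranges. It is a minimal sketch with assumed anchor coordinates and function names, not the patent's method, and omits error modelling entirely:

```python
import numpy as np

def uwb_coarse_position(anchors: np.ndarray, ranges: np.ndarray) -> np.ndarray:
    """Least-squares position estimate from UWB anchor coordinates and ranges.

    Linearizes the range equations ||x - a_i||^2 = r_i^2 against the first
    anchor and solves the resulting linear system; needs at least four
    non-coplanar anchors for a unique 3D solution.
    """
    a0, r0 = anchors[0], ranges[0]
    A = 2.0 * (anchors[1:] - a0)
    b = (r0 ** 2 - ranges[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1) - np.sum(a0 ** 2))
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position

# Hypothetical base stations in four corners of a room (meters).
anchors = np.array([[0.0, 0.0, 2.5], [8.0, 0.0, 2.5],
                    [8.0, 6.0, 2.5], [0.0, 6.0, 1.0]])
true_pos = np.array([3.0, 2.0, 1.2])
ranges = np.linalg.norm(anchors - true_pos, axis=1)  # ideal, noise-free ranges
print(uwb_coarse_position(anchors, ranges))          # -> [3.  2.  1.2]
```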
Step S102, the electronic equipment determines a viewing cone area corresponding to the electronic equipment according to the azimuth information; the viewing cone region comprises at least one preset characteristic point; each preset feature point corresponds to a feature point coordinate and a feature point descriptor vector in at least one direction; the viewing cone region is a perspective observation range of the electronic device at a current viewing angle.
Fig. 8 is a schematic view of a viewing cone region according to an embodiment of the present application. As shown in fig. 8, the viewing cone region has the shape of a quadrangular frustum and is the perspective observation range of the electronic device at the current viewing angle. The boundary points of the field of view of the image acquisition device of the electronic equipment correspond to four extension lines in the real world; every object behind a displayed point on such a line is occluded, and the range so delimited is the perspective observation range at the current viewing angle.
As an alternative implementation, the electronic device determines a viewing cone region corresponding to the electronic device according to the orientation information, including: the method comprises the steps that electronic equipment obtains a feature map of a target area, wherein the feature map comprises at least one sub-map; the electronic equipment determines a sub-map where the electronic equipment is located from the feature map; and the electronic equipment determines a view cone area corresponding to the electronic equipment in the sub-map according to the rough positioning.
Fig. 9 is a schematic diagram of a sub-map according to an embodiment of the present application. As shown in fig. 9, the feature map of the target area is divided into four sub-maps: a first, a second, a third, and a fourth sub-map. The electronic equipment determines from the coarse positioning that it is located in the fourth sub-map, and further determines the viewing cone region corresponding to the electronic equipment within that sub-map. By adopting this embodiment, the electronic equipment only needs to match the viewing cone region against one sub-map instead of traversing the entire feature map, which reduces the time consumed by positioning; a sketch of this culling step follows below.
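To make the view-cone step concrete, the following sketch culls a sub-map's point cloud down to the points inside the viewing cone. For simplicity it approximates the quadrangular frustum of fig. 8 by a circular cone around the azimuth axis; the half-angle and depth limit are illustrative assumptions, not values from the patent:

```python
import numpy as np

def points_in_view_cone(points: np.ndarray, position: np.ndarray,
                        azimuth_axis: np.ndarray,
                        half_angle_deg: float = 35.0,
                        max_depth: float = 30.0) -> np.ndarray:
    """Boolean mask of sub-map points inside the cone-approximated frustum."""
    axis = azimuth_axis / np.linalg.norm(azimuth_axis)
    rel = points - position                  # vectors from device to map points
    depth = rel @ axis                       # distance along the azimuth axis
    norms = np.linalg.norm(rel, axis=1)
    cos_ang = np.divide(depth, norms, out=np.zeros_like(depth), where=norms > 0)
    return (depth > 0) & (depth <= max_depth) & \
           (cos_ang >= np.cos(np.radians(half_angle_deg)))
```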
Step S103, the electronic equipment performs plane projection on the preset feature points along the azimuth axis direction, and selects at least one target feature point from the result of the plane projection, wherein the target feature point is a point which is not shielded in the result of the plane projection.
It should be noted that the preset feature points in the viewing cone region are part of the point cloud of the feature region. A point cloud is a set of point data on the appearance surface of a product obtained by a measuring instrument, as used in reverse engineering.
Fig. 10 is a schematic point cloud diagram according to an embodiment of the present disclosure. As shown in fig. 10, the point cloud is a cube with six surfaces; the point cloud in the feature map is obtained with a three-dimensional coordinate measuring machine, the distance between points is large, and the point cloud is a sparse point cloud. It should be noted that by acquiring the preset feature points from a sparse point cloud, the electronic device can obtain a good matching effect while reducing the matching time. However, the feature map is not limited to constructing the preset feature points from a sparse point cloud: the point cloud in the feature map may also be obtained with a three-dimensional laser scanner or a photographic scanner, in which case the number of points is large and the points are densely spaced. By acquiring the preset feature points from such a dense point cloud, the electronic device can obtain a more accurate matching effect.
As an alternative implementation, the electronic device performs plane projection of the preset feature points along the azimuth axis direction and selects at least one target feature point from the result of the plane projection as follows: the electronic equipment acquires a first vector between its position coordinates and the feature point coordinates of each preset feature point; the electronic equipment acquires the product of each first vector and the feature point descriptor vector along the azimuth axis direction; the electronic equipment acquires the inverse cosine value of this product for each preset feature point; and if the inverse cosine value is smaller than a preset threshold, the electronic equipment determines the preset feature point as a candidate feature point.
It should be noted that the preset feature points of the viewing cone region are all the feature points in that region of space. Taking the point cloud of the cube shown in fig. 10 as an example, the feature map provides all feature points on all surfaces of the cube, and the electronic device can obtain all feature points in the viewing cone region at the current viewing angle. However, an image acquired by the image acquisition device of the electronic device captures only the image information of the outermost layer of the viewing cone region at the current viewing angle. Therefore, the electronic device needs to establish the common-view relationship between the image information acquired by the image acquisition device and the preset feature points, so as to match the image information with the corresponding preset feature points.
Fig. 11 is a schematic diagram of a first vector and a feature point descriptor vector along the azimuth axis according to an embodiment of the present application. As shown in fig. 11, the position coordinate point $p_0$ has coordinates $(x_0, y_0, z_0)$, and a preset feature point $p_i$ has coordinates $(x_i, y_i, z_i)$, where $i$ is an integer greater than 0. The electronic device obtains the first vector between the position coordinates and the feature point coordinates of each preset feature point as $\vec{v}_i = (x_i - x_0, y_i - y_0, z_i - z_0)$; the feature point descriptor vector of the preset feature point along the azimuth axis direction is $\vec{d}_i = (a, b, c, \ldots)$. The electronic device obtains the product of each first vector and the descriptor vector along the azimuth axis direction, $\vec{v}_i \cdot \vec{d}_i$ (with both vectors normalized), and then obtains the inverse cosine of this product for each preset feature point, $\arccos(\vec{v}_i \cdot \vec{d}_i)$. After the electronic device obtains the inverse cosine value of each preset feature point, the preset feature points whose inverse cosine value satisfies the preset threshold can be determined as common-view points, so that the image information is matched against these common-view points. The preset threshold is the sum of the field-of-view angle $\theta$ observable by the image acquisition device of the electronic equipment and twice the measurement angle error $\delta$ of the coarse positioning, i.e. $\theta + 2\delta$, where $\theta$ is determined by the acquisition capability of the image acquisition device of the electronic equipment and, taking UWB positioning as the coarse positioning, $\delta$ is determined by the precision of the UWB positioning.
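A sketch of the angle test just described, assuming (as the equations suggest) that each preset feature point stores a 3D direction for its descriptor so that the arccos of the normalized dot product is a geometric angle; the function and parameter names are hypothetical:

```python
import numpy as np

def select_candidate_points(p0: np.ndarray, points: np.ndarray,
                            descriptor_dirs: np.ndarray,
                            fov_deg: float, coarse_err_deg: float) -> np.ndarray:
    """Keep preset feature points whose first-vector/descriptor angle is
    below the threshold theta = fov_deg + 2 * coarse_err_deg (in degrees)."""
    v = points - p0                                       # first vectors v_i
    v = v / np.linalg.norm(v, axis=1, keepdims=True)      # assumes p_i != p0
    d = descriptor_dirs / np.linalg.norm(descriptor_dirs, axis=1, keepdims=True)
    dots = np.clip(np.sum(v * d, axis=1), -1.0, 1.0)
    angles = np.degrees(np.arccos(dots))                  # arccos(v_i . d_i)
    return angles < (fov_deg + 2.0 * coarse_err_deg)
```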
As an alternative implementation, the electronic device performs plane projection of the preset feature points along the azimuth axis direction and selects at least one target feature point from the result of the plane projection as follows: the electronic equipment performs plane projection of each candidate feature point along the azimuth axis direction in order of the distance between the candidate feature point and the electronic equipment, from far to near; during the plane projection, if a candidate feature point projected earlier is occluded by a candidate feature point projected later, the candidate feature point projected earlier is deleted; and the electronic equipment determines the candidate feature points that are not occluded during the plane projection as the target feature points.
Fig. 12 is a schematic plan projection diagram according to an embodiment of the present application. As shown in fig. 12, the candidate feature points in the viewing cone region occlude one another. Since the image information acquired by the electronic device is the image information of the outer surface of the object at the current viewing angle, it is sufficient to match the image information with the feature points of the plane projection that lie on the side near the electronic device; the occluded candidate feature points need to be deleted in order to reduce the number of matches.
It should be noted that occlusion in this application is not limited to complete coverage: if a candidate feature point projected earlier falls within the occlusion range of a candidate feature point projected later, the earlier-projected point can be considered occluded by the later-projected one. The occlusion range can be set according to the actual situation.
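The far-to-near projection with occlusion deletion can be sketched as below: candidates are projected onto the plane perpendicular to the azimuth axis, and a farther point is dropped whenever a nearer one lands in the same projected cell (equivalent to deleting earlier-projected points occluded by later ones). The grid cell size stands in for the configurable occlusion range mentioned above and is an assumption:

```python
import numpy as np

def filter_occluded(points: np.ndarray, position: np.ndarray,
                    azimuth_axis: np.ndarray, cell: float = 0.05) -> np.ndarray:
    """Keep, per projected grid cell, only the candidate nearest to the device."""
    axis = azimuth_axis / np.linalg.norm(azimuth_axis)
    # Orthonormal basis (u, w) of the projection plane perpendicular to the axis.
    helper = (np.array([1.0, 0.0, 0.0]) if abs(axis[0]) < 0.9
              else np.array([0.0, 1.0, 0.0]))
    u = np.cross(axis, helper)
    u /= np.linalg.norm(u)
    w = np.cross(axis, u)
    rel = points - position
    dist = np.linalg.norm(rel, axis=1)
    uv = np.stack([rel @ u, rel @ w], axis=1)      # 2D plane projection
    nearest_per_cell = {}
    for idx in np.argsort(dist):                   # visit nearest points first
        key = tuple(np.floor(uv[idx] / cell).astype(int))
        nearest_per_cell.setdefault(key, idx)      # nearer point keeps the cell
    return points[sorted(nearest_per_cell.values())]
```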
Step S104, the electronic equipment acquires a feature point descriptor vector of the target feature point along the azimuth axis direction; the feature point descriptor sub-vector along the azimuth axis direction is obtained from the feature point descriptor sub-vector in at least one direction.
As an alternative implementation, the electronic device obtains the feature point descriptor vector of the target feature point along the azimuth axis direction as follows: the electronic equipment acquires the feature point descriptor vector of the target feature point in at least one direction; and the electronic equipment determines, from the feature point descriptor vector in at least one direction, the feature point descriptor vector of the target feature point along the azimuth axis direction according to a linear interpolation method.
It should be noted that the feature map of the target area is a 3D map obtained by accumulating the collected feature images frame by frame. The multi-frame feature images capture every angle of the target area, so for each preset feature point on each frame image there exists a descriptor vector for the direction corresponding to the angle of that frame.
Fig. 13 is a schematic diagram of a linear interpolation method according to an embodiment of the present application. As shown in fig. 13, the feature point descriptor vector of the target feature point along the azimuth axis direction can be obtained by linearly interpolating between the first feature point descriptor vector and the second feature point descriptor vector of the target feature point. The descriptor vector along the azimuth axis direction is $\vec{d} = (a, b, c, \ldots)$, the first feature point descriptor vector is $\vec{d}_1 = (a_1, b_1, c_1, \ldots)$, and the second feature point descriptor vector is $\vec{d}_2 = (a_2, b_2, c_2, \ldots)$; then $\vec{d} = \lambda_1 \vec{d}_1 + \lambda_2 \vec{d}_2$, where the weights $\lambda_1$ and $\lambda_2$ are determined from the angles $\theta_1$ and $\theta_2$ between the azimuth axis and the directions corresponding to $\vec{d}_1$ and $\vec{d}_2$. It should be noted that a feature point descriptor vector is a vector of multiple dimensions; for example, it may be a 256-dimensional vector.
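A minimal sketch of this interpolation step: the patent does not spell out how $\lambda_1$ and $\lambda_2$ are derived from $\theta_1$ and $\theta_2$, so the inverse-angle weighting below (the closer stored direction gets the larger weight) is an assumption for illustration:

```python
import numpy as np

def interpolate_descriptor(d1: np.ndarray, d2: np.ndarray,
                           theta1: float, theta2: float) -> np.ndarray:
    """Linearly blend two stored descriptors into the azimuth-axis descriptor.

    theta1/theta2 are the angles between the azimuth axis and the viewing
    directions of d1/d2 (not both zero); weights sum to 1, so
    d = l1*d1 + l2*d2 as in the formula above.
    """
    total = theta1 + theta2
    l1, l2 = theta2 / total, theta1 / total   # smaller angle -> larger weight
    return l1 * np.asarray(d1) + l2 * np.asarray(d2)
```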
In the embodiment of the application, the linear interpolation can adopt at least one of nearest neighbor interpolation, bilinear interpolation, or bicubic interpolation; such linear interpolation methods use the same interpolation kernel throughout the interpolation process, regardless of the position of the point to be interpolated. With nearest neighbor interpolation, the electronic device obtains the fastest computation; with bilinear interpolation, the interpolation effect is slightly inferior to bicubic interpolation but the computation is faster than bicubic; with bicubic interpolation, the interpolation effect is good but the computation speed is low.
As an alternative implementation, the electronic device obtains the feature point descriptor vector of the target feature point along the azimuth axis direction as follows: the electronic equipment acquires the feature point descriptor vector of the target feature point in at least one direction; and the electronic equipment determines, from the feature point descriptor vector in at least one direction, the feature point descriptor vector of the target feature point along the azimuth axis direction according to a deep-learning-model interpolation method. The deep-learning-model interpolation can adopt a method based on edge information or a method based on wavelet coefficients.
And step S105, the electronic equipment matches the feature point coordinates and the feature point descriptor vectors of the target feature points with the currently acquired image of the electronic equipment to acquire the current pose of the electronic equipment.
As an alternative implementation, the electronic device matches the feature point coordinates and feature point descriptor vectors of the target feature points with the image currently acquired by the electronic device to obtain its current pose as follows: the electronic equipment obtains image feature points in the currently acquired image, the image feature points being extracted from the currently acquired image by the electronic equipment according to a feature extraction algorithm; the electronic equipment generates matching point pairs from the image feature points and the target feature points; and the electronic equipment executes a 2D-3D matching algorithm on the matching point pairs to obtain the matching result of the image feature points and the target feature points.
The 2D-3D matching algorithm is commonly used to match a 2D image against a 3D point cloud. Since the technical scheme shown in the application screens the target feature points out of the feature map (namely, the 3D point cloud), the image feature points in the image currently acquired by the electronic device (namely, the 2D image) can be matched with the target feature points according to this algorithm to obtain the matching result.
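As a hedged illustration of how such matching point pairs could be formed, the following Python sketch pairs 2D image feature points with 3D target feature points by nearest-neighbour search over descriptor vectors. The Lowe-style ratio test and all identifiers are assumptions of this example, not part of the described scheme.

```python
import numpy as np

def match_2d_3d(img_desc, img_pts2d, map_desc, map_pts3d, ratio=0.8):
    """Pair 2D image feature points with 3D target feature points by
    nearest-neighbour search over descriptor vectors.

    img_desc  : (N, D) descriptors extracted from the current image
    img_pts2d : (N, 2) pixel coordinates of the image feature points
    map_desc  : (M, D) azimuth-axis descriptors of the target feature points
    map_pts3d : (M, 3) feature point coordinates of the target feature points
    """
    pairs_2d, pairs_3d = [], []
    for i, d in enumerate(img_desc):
        dist = np.linalg.norm(map_desc - d, axis=1)  # L2 distance to every target descriptor
        j1, j2 = np.argsort(dist)[:2]                # best and second-best candidates
        if dist[j1] < ratio * dist[j2]:              # ratio test rejects ambiguous matches
            pairs_2d.append(img_pts2d[i])
            pairs_3d.append(map_pts3d[j1])
    return np.asarray(pairs_2d), np.asarray(pairs_3d)
```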
As an alternative implementation, the electronic device matches the feature point coordinates and the feature point descriptor vectors of the target feature points with the image currently acquired by the electronic device to obtain its current pose, further including: the electronic device executes the pose estimation algorithm PNP and the random sampling algorithm RANSAC on the matching result to obtain the current pose of the electronic device.
Among them, PNP (Perspective-n-Point) is the problem of estimating the pose of a calibrated electronic device given a set of n 3D points in the world and their corresponding 2D projections in the image. The pose of the electronic device has 6 degrees of freedom, consisting of the rotation (roll, pitch and yaw) and the translation of the electronic device relative to the 3D world. In the embodiment of the present application, the target feature points are the 3D points whose projections appear in the 2D image, so the 6-degree-of-freedom pose of the camera, in the form of a rotation and a translation relative to the world, can be determined from the matching point pairs formed by the target feature points and the image feature points.
RANSAC is an iterative method for estimating the parameters of a model from a data set that contains outliers. In vision it can be applied, for example, to identify straight lines or other parameterized shapes in a point cloud; by combining PNP with RANSAC, the matching point pairs consistent with a single pose can be identified, so that the current pose of the electronic device is determined quickly.
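A minimal sketch of this PNP + RANSAC step, using OpenCV's solvePnPRansac on the matching point pairs from the previous sketch, is shown below. The intrinsic matrix values, reprojection threshold and iteration count are placeholder assumptions, not values from the embodiment.

```python
import cv2
import numpy as np

# pairs_2d, pairs_3d: matching point pairs from the previous sketch.
# K is the camera intrinsic matrix; the values below are placeholders.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    pairs_3d.astype(np.float32),  # 3D target feature points (map coordinates)
    pairs_2d.astype(np.float32),  # their 2D projections in the current image
    K, None,                      # no lens distortion assumed
    reprojectionError=8.0,        # RANSAC inlier threshold in pixels (assumed)
    iterationsCount=100)

if ok:
    R, _ = cv2.Rodrigues(rvec)    # 3-DoF rotation matrix; together with the
    print(R, tvec.ravel())        # 3-DoF translation tvec this is the 6-DoF pose
```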
The visual positioning method described above addresses the problem that, in scenes with repeated textures such as indoor scenes, a VPS still struggles to obtain an accurate fix even after matching the real-time image against the cloud-stored map many times, which is time-consuming and degrades the user experience; the method reduces the positioning latency in such scenes and improves the user experience. Whereas a traditional VPS typically needs to perform matching 30-50 times, the technical scheme shown in the application needs only one matching pass, which greatly increases the matching speed; at the same time it requires no image retrieval module of the kind used by a traditional VPS, and therefore cannot be misled by similar image structures during image retrieval.
In the embodiments provided in the present application, the aspects of the visual positioning method are introduced from the perspective of the electronic device itself and of the interaction between the electronic device and the network, satellites, and the like. It is understood that, in order to realize the above functions, the electronic device comprises corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or as combinations of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Fig. 14 is a schematic diagram of a visual positioning hardware device according to an embodiment of the present application.
In some embodiments, the electronic device may implement the corresponding functions by the hardware apparatus shown in fig. 14. As shown in fig. 14, the visual positioning device may include: a transceiver 1401, a memory 1402, and a processor 1403.
As a possible implementation, the processor 1403 may include one or more processing units, such as an application processor, a modem processor, a graphics processor, an image signal processor, a controller, a video codec, a digital signal processor, a baseband processor, and/or a neural network processor. The different processing units may be separate devices or may be integrated into one or more processors. The memory 1402 is coupled to the processor 1403 and stores various software programs and/or sets of instructions. In some embodiments, the memory 1402 may include volatile memory and/or non-volatile memory. The transceiver 1401 may include, for example, radio frequency circuitry, a mobile communication module, a wireless communication module, and the like, and implements the wireless communication functions of the electronic device.
As a possible implementation, the software program and/or sets of instructions in the memory 1402, when executed by the processor 1403, cause the electronic device to perform the following method steps: when entering a target area, acquiring the coarse positioning of the electronic device in the target area, the coarse positioning being used for determining the azimuth information of the electronic device, the azimuth information comprising the position coordinates and the azimuth axis of the electronic device; determining a viewing cone area corresponding to the electronic device according to the azimuth information, the viewing cone area comprising at least one preset feature point, each preset feature point corresponding to a feature point coordinate and a feature point descriptor vector in at least one direction, the viewing cone area being the perspective observation range of the electronic device at the current viewing angle; performing plane projection on the preset feature points along the azimuth axis direction and selecting at least one target feature point from the result of the plane projection, the target feature point being a point that is not occluded in the result of the plane projection; acquiring a feature point descriptor vector of the target feature point along the azimuth axis direction, the feature point descriptor vector along the azimuth axis direction being obtained from the feature point descriptor vectors in at least one direction; and matching the feature point coordinates and the feature point descriptor vectors of the target feature points with the image currently acquired by the electronic device to obtain the current pose of the electronic device.
As a possible implementation, the software program and/or sets of instructions in the memory 1402, when executed by the processor 1403, cause the electronic device to perform the following method steps: obtaining a coarse localization at a target area, comprising: and acquiring the rough positioning of the electronic equipment in the target area according to the ultra-wideband UWB positioning.
As a possible implementation, the software program and/or sets of instructions in the memory 1402, when executed by the processor 1403, cause the electronic device to perform the following method steps: determining a viewing cone region corresponding to the electronic device according to the orientation information, including: acquiring a feature map of the target area, wherein the feature map comprises at least one sub-map; determining, from the feature map, the sub-map where the electronic device is located; and determining, in that sub-map, the viewing cone area corresponding to the electronic device according to the coarse positioning.
As a possible implementation, the software program and/or sets of instructions in the memory 1402, when executed by the processor 1403, cause the electronic device to perform the following method steps: carrying out plane projection on the preset feature points along the direction of the azimuth axis, and selecting at least one target feature point from the result of the plane projection, wherein the method comprises the following steps: acquiring a first vector between the position coordinate and the feature point coordinate of each preset feature point; obtaining the product of each first vector and the feature point descriptor vector along the direction of the azimuth axis; acquiring an inverse cosine value of the product of each preset feature point; and if the inverse cosine value is smaller than the preset threshold value, determining the preset feature point as a candidate feature point.
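For illustration, the following Python sketch implements the candidate-selection angle test just described, under the assumption that the per-point "feature point descriptor vector along the azimuth axis direction" used in the product can be treated as a 3-D unit direction vector; that interpretation, the function name and the threshold value are all assumptions of this example.

```python
import numpy as np

def select_candidates(position, pts3d, point_dirs, thresh=np.deg2rad(60)):
    """Angle test from the steps above: a preset feature point becomes a
    candidate when the inverse cosine of the (normalized) product of its
    first vector and its direction vector is below a preset threshold.

    position   : coarse position coordinate of the electronic device, shape (3,)
    pts3d      : (M, 3) preset feature point coordinates
    point_dirs : (M, 3) per-point unit direction vectors (assumed representation)
    """
    first = pts3d - position                          # "first vector" per point
    first /= np.linalg.norm(first, axis=1, keepdims=True)
    cosines = np.sum(first * point_dirs, axis=1)      # product of the two vectors
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))   # inverse cosine value
    return np.where(angles < thresh)[0]               # indices of candidate points
```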
As a possible implementation, the software program and/or sets of instructions in the memory 1402, when executed by the processor 1403, cause the electronic device to perform the following method steps: carrying out plane projection on the preset feature points along the direction of the azimuth axis, and selecting at least one target feature point from the result of the plane projection, wherein the method comprises the following steps: sequentially carrying out plane projection on each candidate characteristic point along the azimuth axis direction according to the sequence of the distance of each candidate characteristic point from far to near; in the process of plane projection, if the candidate feature point projected first is shielded on the candidate feature point projected later, deleting the candidate feature point projected first; and determining the candidate feature points which are not shielded in the plane projection process as target feature points.
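A minimal sketch of this far-to-near occlusion filtering is given below: candidates are projected onto a plane perpendicular to the azimuth axis, and a nearer point projected later replaces a farther point landing in the same grid cell. The grid-cell discretization and cell size are assumptions of the example, not part of the embodiment.

```python
import numpy as np

def unoccluded_targets(position, cand_pts, axis, cell=0.05):
    """Project candidate feature points far-to-near onto a plane perpendicular
    to the azimuth axis; keep only the points that are not occluded."""
    axis = axis / np.linalg.norm(axis)
    u = np.cross(axis, [0.0, 0.0, 1.0])          # build a 2D basis (u, v) on the plane
    if np.linalg.norm(u) < 1e-6:                 # axis parallel to z: use another helper
        u = np.cross(axis, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(axis, u)

    rel = cand_pts - position
    depth = rel @ axis                           # distance along the azimuth axis
    survivors = {}                               # grid cell -> surviving point index
    for i in np.argsort(-depth):                 # far to near
        key = (int(np.floor(rel[i] @ u / cell)),
               int(np.floor(rel[i] @ v / cell)))
        survivors[key] = i                       # nearer point overwrites the farther one
    return sorted(survivors.values())            # indices of the target feature points
```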
As a possible implementation, the software program and/or sets of instructions in the memory 1402, when executed by the processor 1403, cause the electronic device to perform the following method steps: acquiring a feature point descriptor vector of the target feature point along the azimuth axis direction, including: acquiring a feature point descriptor vector of at least one direction of the target feature point; and determining the feature point descriptor vector in at least one direction as the feature point descriptor vector of the target feature point along the azimuth axis direction according to a linear interpolation method.
As a possible implementation, the software program and/or sets of instructions in the memory 1402, when executed by the processor 1403, cause the electronic device to perform the following method steps: obtaining a feature point descriptor vector of the target feature point along the azimuth axis direction, including: acquiring a feature point descriptor vector of at least one direction of the target feature point; and determining the feature point descriptor vector in at least one direction as the feature point descriptor vector of the target feature point along the azimuth axis direction according to a deep learning model interpolation method.
As a possible implementation, the software program and/or sets of instructions in the memory 1402, when executed by the processor 1403, cause the electronic device to perform the following method steps: matching the feature point coordinates and the feature point descriptor vectors of the target feature points with the currently acquired image of the electronic equipment to acquire the current pose of the electronic equipment, wherein the matching comprises the following steps: acquiring image characteristic points in the currently acquired image, wherein the image characteristic points are extracted from the currently acquired image by the electronic equipment according to a characteristic extraction algorithm; generating a matching point pair according to the image characteristic point and the target characteristic point; and executing a 2D-3D matching algorithm on the matching point pairs to obtain a matching result of the image characteristic points and the target characteristic points.
As a possible implementation, the software program and/or sets of instructions in the memory 1402, when executed by the processor 1403, cause the electronic device to perform the following method steps: matching the feature point coordinates and the feature point descriptor vectors of the target feature points with the currently acquired image of the electronic equipment to acquire the current pose of the electronic equipment, wherein the matching process comprises the following steps: and executing a pose estimation algorithm PNP and a random sampling algorithm RANSAC on the matching result to acquire the current pose of the electronic equipment.
In addition, in some embodiments, the electronic device may implement the corresponding functionality through software modules. As shown in fig. 15, the visual positioning device includes: a coarse positioning module 1501, a viewing cone region determining module 1502, a projection module 1503, a feature point descriptor vector obtaining module 1504, and a matching module 1505.
Wherein: the rough positioning module 1501 is configured to obtain rough positioning of the electronic device in the target area when the electronic device enters the target area; the rough positioning is used for determining the azimuth information of the electronic equipment; the azimuth information comprises position coordinates and azimuth axes of the electronic equipment; a viewing cone region determining module 1502, configured to determine a viewing cone region corresponding to the electronic device according to the orientation information; the viewing cone region comprises at least one preset characteristic point; each preset feature point corresponds to a feature point coordinate and a feature point descriptor vector in at least one direction; the viewing cone area is a perspective observation range of the electronic equipment at the current viewing angle; the projection module 1503 is used for performing plane projection on the preset feature points along the azimuth axis direction, and selecting at least one target feature point from the result of the plane projection, wherein the target feature point is a non-shielded point in the result of the plane projection; a feature point descriptor vector obtaining module 1504, configured to obtain a feature point descriptor vector of a target feature point along an azimuth axis direction; obtaining a characteristic point descriptor vector along the azimuth axis direction according to a characteristic point descriptor vector in at least one direction; the matching module 1505 is configured to match the feature point coordinates and the feature point descriptor vectors of the target feature points with the currently acquired image of the electronic device, so as to obtain the current pose of the electronic device.
Embodiments of the present application also provide a computer-readable storage medium having stored therein instructions, which when executed on a computer, cause the computer to perform the method of the above-mentioned aspects.
Embodiments of the present application also provide a computer program product containing instructions which, when executed on a computer, cause the computer to perform the method of the above aspects.
The embodiment of the application also provides a chip system. The chip system comprises a processor for enabling the apparatus to perform the functions referred to in the above aspects, e.g. to generate or process the information referred to in the above methods. In one possible design, the chip system further includes a memory for storing the program instructions and data necessary for the visual positioning apparatus. The chip system may be constituted by a chip, or may include a chip and other discrete devices.
The controller/processor for executing the functions of the above visual positioning apparatus in this embodiment of the present application may be a Central Processing Unit (CPU), a general purpose processor, an Application Processor (AP), a modem processor, a controller, a Digital Signal Processor (DSP), a baseband processor, a neural-Network Processing Unit (NPU), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. A processor may also be a combination of computing devices, for example one or more microprocessors, or a DSP together with a microprocessor.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied in hardware or in software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in a radio access network device. Of course, the processor and the storage medium may also reside as discrete components in the electronic device.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the application are produced in whole or in part when the computer program instructions are loaded and executed on a computer. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above embodiments are provided to explain the purpose, technical solutions and advantages of the present application in further detail, and it should be understood that the above embodiments are merely illustrative of the present application and are not intended to limit the scope of the present application, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present application should be included in the scope of the present application.

Claims (12)

1. A visual positioning method, characterized in that the method comprises:
when electronic equipment enters a target area, acquiring the rough positioning of the electronic equipment in the target area; the coarse positioning is used for determining the azimuth information of the electronic equipment; the orientation information comprises position coordinates and an orientation axis of the electronic device;
the electronic equipment determines a viewing cone area corresponding to the electronic equipment according to the azimuth information; the viewing cone region comprises at least one preset characteristic point; each preset feature point corresponds to a feature point coordinate and a feature point descriptor vector in at least one direction; the viewing cone area is a perspective observation range of the electronic equipment at a current viewing angle;
the electronic equipment performs plane projection on the preset feature points along the azimuth axis direction, and selects at least one target feature point from the result of the plane projection, wherein the target feature point is a point which is not shielded in the result of the plane projection;
the electronic equipment acquires a feature point descriptor vector of the target feature point along the azimuth axis direction; the characteristic point descriptor sub-vector along the azimuth axis direction is obtained according to the characteristic point descriptor sub-vector in at least one direction;
the electronic equipment matches the feature point coordinates and the feature point descriptor vectors of the target feature points with the image currently acquired by the electronic equipment to acquire the current pose of the electronic equipment.
2. The visual positioning method of claim 1, wherein the electronic device obtains a coarse position in the target area, comprising:
and the electronic equipment acquires the rough positioning of the electronic equipment in the target area according to the ultra-wideband UWB positioning.
3. The visual positioning method of claim 1, wherein the electronic device determines a viewing cone region corresponding to itself according to the orientation information, comprising:
the electronic equipment acquires a feature map of the target area, wherein the feature map comprises at least one sub-map;
the electronic equipment determines a sub-map where the electronic equipment is located from the feature map;
and the electronic equipment determines a viewing cone area corresponding to the electronic equipment in the sub-map according to the rough positioning.
4. The visual positioning method of claim 1, wherein the electronic device performs planar projection on the preset feature points along an azimuth axis direction, and selects at least one target feature point from the result of the planar projection, comprising:
the electronic equipment acquires a first vector between the position coordinate and the feature point coordinate of each preset feature point;
the electronic equipment acquires the product of each first vector and a feature point descriptor vector along the azimuth axis direction;
the electronic equipment acquires an inverse cosine value of the product of each preset feature point;
and if the inverse cosine value is smaller than a preset threshold value, the electronic equipment determines the preset feature point as a candidate feature point.
5. The visual positioning method of claim 4, wherein the electronic device performs planar projection on the preset feature points along an azimuth axis direction, and selecting at least one target feature point from the result of the planar projection comprises:
the electronic equipment sequentially performs plane projection on each candidate characteristic point along the azimuth axis direction according to the sequence of the distance between each candidate characteristic point and the electronic equipment from far to near;
in the plane projection process of the electronic equipment, if the candidate feature point projected first is shielded on the candidate feature point projected later, deleting the candidate feature point projected first;
and the electronic equipment determines the candidate characteristic points which are not shielded in the plane projection process as target characteristic points.
6. The visual positioning method of claim 1, wherein the electronic device obtains a feature point descriptor vector of the target feature point along an azimuth axis direction, and comprises:
the electronic equipment acquires a feature point descriptor vector of at least one direction of the target feature point;
and the electronic equipment determines the characteristic point descriptor vector of the at least one direction as the characteristic point descriptor vector of the target characteristic point along the azimuth axis direction according to a linear interpolation method.
7. The visual positioning method of claim 1, wherein the electronic device obtains a feature point descriptor vector of the target feature point along an azimuth axis direction, and comprises:
the electronic equipment acquires a feature point descriptor vector of at least one direction of the target feature point;
and the electronic equipment determines the feature point descriptor vector of the at least one direction as the feature point descriptor vector of the target feature point along the azimuth axis direction according to a deep learning model interpolation method.
8. The visual positioning method of claim 1, wherein the electronic device matches the feature point coordinates and the feature point descriptor vectors of the target feature points with the currently acquired image of the electronic device to obtain the current pose of the electronic device, and the method comprises:
the electronic equipment acquires image feature points in the currently acquired image, wherein the image feature points are extracted from the currently acquired image by the electronic equipment according to a feature extraction algorithm;
the electronic equipment generates a matching point pair according to the image characteristic point and the target characteristic point;
and the electronic equipment executes a 2D-3D matching algorithm on the matching point pairs to obtain a matching result of the image characteristic points and the target characteristic points.
9. The visual positioning method of claim 8, wherein the electronic device matches the feature point coordinates and the feature point descriptor vectors of the target feature points with the currently acquired image of the electronic device to obtain the current pose of the electronic device, and the method comprises:
and the electronic equipment executes a pose estimation algorithm PNP and a random sampling algorithm RANSAC on the matching result to acquire the current pose of the electronic equipment.
10. A visual positioning device, comprising:
the system comprises a rough positioning module, a rough positioning module and a rough positioning module, wherein the rough positioning module is used for acquiring rough positioning of electronic equipment in a target area when the electronic equipment enters the target area; the coarse positioning is used for determining the azimuth information of the electronic equipment; the orientation information comprises position coordinates and an orientation axis of the electronic device;
the viewing cone area determining module is used for determining a viewing cone area corresponding to the electronic equipment according to the azimuth information; the viewing cone region comprises at least one preset characteristic point; each preset feature point corresponds to a feature point coordinate and a feature point descriptor vector in at least one direction; the viewing cone area is a perspective observation range of the electronic equipment at a current viewing angle;
the projection module is used for carrying out plane projection on the preset characteristic points along the azimuth axis direction, and selecting at least one target characteristic point from the result of the plane projection, wherein the target characteristic point is a point which is not shielded in the result of the plane projection;
the characteristic point descriptor vector acquisition module is used for acquiring a characteristic point descriptor vector of the target characteristic point along the azimuth axis direction; the characteristic point descriptor sub-vector along the azimuth axis direction is obtained according to the characteristic point descriptor sub-vector in at least one direction;
and the matching module is used for matching the feature point coordinates and the feature point descriptor vectors of the target feature points with the currently acquired image of the electronic equipment so as to acquire the current pose of the electronic equipment.
11. A chip system, comprising: a memory and a processor, the processor and the memory coupled; wherein the memory includes program instructions which, when executed by the processor, cause the system-on-chip to perform the method of any one of claims 1-9.
12. A computer-readable storage medium, having stored thereon computer program instructions, which, when executed, implement the method of any one of claims 1-9.
CN202210254346.9A 2022-03-16 2022-03-16 Visual positioning method, device, chip system and storage medium Active CN114359392B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210254346.9A CN114359392B (en) 2022-03-16 2022-03-16 Visual positioning method, device, chip system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210254346.9A CN114359392B (en) 2022-03-16 2022-03-16 Visual positioning method, device, chip system and storage medium

Publications (2)

Publication Number Publication Date
CN114359392A true CN114359392A (en) 2022-04-15
CN114359392B CN114359392B (en) 2022-07-26

Family

ID=81094579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210254346.9A Active CN114359392B (en) 2022-03-16 2022-03-16 Visual positioning method, device, chip system and storage medium

Country Status (1)

Country Link
CN (1) CN114359392B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664812A (en) * 2022-11-30 2023-08-29 荣耀终端有限公司 Visual positioning method, visual positioning system and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150243080A1 (en) * 2012-09-21 2015-08-27 Navvis Gmbh Visual localisation
CN107111880A (en) * 2015-01-29 2017-08-29 高通股份有限公司 Disposal is blocked for computer vision
CN112102389A (en) * 2014-11-21 2020-12-18 苹果公司 Method and system for determining spatial coordinates of a 3D reconstruction of at least a part of a physical object
CN112347870A (en) * 2020-10-23 2021-02-09 歌尔光学科技有限公司 Image processing method, device and equipment of head-mounted equipment and storage medium
US20210110615A1 (en) * 2019-10-15 2021-04-15 Magic Leap, Inc. Cross reality system supporting multiple device types
CN112789672A (en) * 2018-09-10 2021-05-11 感知机器人有限公司 Control and navigation system, attitude optimization, mapping and positioning technology
US20210343087A1 (en) * 2020-04-29 2021-11-04 Magic Leap, Inc. Cross reality system for large scale environments

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150243080A1 (en) * 2012-09-21 2015-08-27 Navvis Gmbh Visual localisation
CN112102389A (en) * 2014-11-21 2020-12-18 苹果公司 Method and system for determining spatial coordinates of a 3D reconstruction of at least a part of a physical object
CN107111880A (en) * 2015-01-29 2017-08-29 高通股份有限公司 Disposal is blocked for computer vision
CN112789672A (en) * 2018-09-10 2021-05-11 感知机器人有限公司 Control and navigation system, attitude optimization, mapping and positioning technology
US20220050461A1 (en) * 2018-09-10 2022-02-17 Perceptual Robotics Limited Control and navigation systems, pose optimization, mapping, and localization techniques
US20210110615A1 (en) * 2019-10-15 2021-04-15 Magic Leap, Inc. Cross reality system supporting multiple device types
US20210343087A1 (en) * 2020-04-29 2021-11-04 Magic Leap, Inc. Cross reality system for large scale environments
CN112347870A (en) * 2020-10-23 2021-02-09 歌尔光学科技有限公司 Image processing method, device and equipment of head-mounted equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Li Yujie et al., "A Survey of Vision-Based 3D Object Detection Algorithms", Computer Engineering and Applications, no. 01, 31 December 2020 (2020-12-31)
Wang Zemin et al., "A New SLAM Method for Dynamic Scenes Based on LK Optical Flow", Journal of Geomatics Science and Technology, no. 02, 23 July 2018 (2018-07-23)


Also Published As

Publication number Publication date
CN114359392B (en) 2022-07-26

Similar Documents

Publication Publication Date Title
US11170741B2 (en) Method and apparatus for rendering items in a user interface
WO2021128777A1 (en) Method, apparatus, device, and storage medium for detecting travelable region
US9514717B2 (en) Method and apparatus for rendering items in a user interface
US20170323478A1 (en) Method and apparatus for evaluating environmental structures for in-situ content augmentation
CN106133795B (en) Method and apparatus for visualizing geo-located media content in 3D rendering applications
JP7305249B2 (en) Method for determining motion information of image feature points, task execution method and device
US11276183B2 (en) Relocalization method and apparatus in camera pose tracking process, device, and storage medium
US9558559B2 (en) Method and apparatus for determining camera location information and/or camera pose information according to a global coordinate system
CN104350736B (en) The augmented reality of neighbouring position information is arranged
US9710554B2 (en) Methods, apparatuses and computer program products for grouping content in augmented reality
JP7026819B2 (en) Camera positioning method and equipment, terminals and computer programs
US20110287811A1 (en) Method and apparatus for an augmented reality x-ray
US20140300775A1 (en) Method and apparatus for determining camera location information and/or camera pose information according to a global coordinate system
EP2974509B1 (en) Personal information communicator
US10848669B2 (en) Electronic device and method for displaying 360-degree image in the electronic device
US20160125655A1 (en) A method and apparatus for self-adaptively visualizing location based digital information
WO2011080385A1 (en) Method and apparatus for decluttering a mapping display
CN107771310A (en) Head-mounted display apparatus and its processing method
CN107656961A (en) A kind of method for information display and device
CN114359392B (en) Visual positioning method, device, chip system and storage medium
CN114466308B (en) Positioning method and electronic equipment
WO2021088497A1 (en) Virtual object display method, global map update method, and device
CN114332648B (en) Position identification method and electronic equipment
CN111369684B (en) Target tracking method, device, equipment and storage medium
KR102383913B1 (en) Method and apparatus for transmitting and receiving information by electronic device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230914

Address after: 201306 building C, No. 888, Huanhu West 2nd Road, Lingang New District, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee after: Shanghai Glory Smart Technology Development Co.,Ltd.

Address before: 518040 unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Xiangmihu street, Futian District, Shenzhen, Guangdong Province

Patentee before: Honor Device Co.,Ltd.

TR01 Transfer of patent right