CN113223007A - Visual odometer implementation method and device and electronic equipment - Google Patents


Info

Publication number
CN113223007A
Authority
CN
China
Prior art keywords
image
camera
determining
change information
pose
Prior art date
Legal status
Pending
Application number
CN202110715562.4A
Other languages
Chinese (zh)
Inventor
Hu Kun (胡鲲)
Lu Wei (卢维)
Wang Zheng (王政)
Li Ming (李铭)
Current Assignee
Zhejiang Huaray Technology Co Ltd
Original Assignee
Zhejiang Huaray Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Huaray Technology Co Ltd
Priority claimed from CN202110715562.4A
Publication of CN113223007A
Legal status: Pending


Classifications

    • G06T 7/10 Image analysis; Segmentation; Edge detection
    • G01C 22/00 Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers, using pedometers
    • G06T 7/66 Image analysis; Analysis of geometric attributes of image moments or centre of gravity
    • G06T 7/73 Image analysis; Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/751 Image or video pattern matching; Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06T 2207/10016 Image acquisition modality; Video; Image sequence

Abstract

The application provides a method and an apparatus for implementing a visual odometer, an electronic device, and a computer-readable storage medium. The method includes: dividing a first image acquired by a downward-looking camera to obtain at least two image blocks; acquiring first feature points of each image block and principal direction information of the first feature points; matching each first feature point with a second feature point included in the previous frame image of the first image based on a locality-sensitive hashing algorithm to obtain corresponding feature point pairs; determining, based on the feature point pairs, first pose change information of the camera between the first image and the previous frame image of the first image, the first pose change information being related to a scale ratio parameter of the camera; and determining the pose of the camera when acquiring the first image based on the first pose change information. The method and apparatus are not affected by external illumination or reflective environments, and can reduce the implementation time of the visual odometer and improve its accuracy.

Description

Visual odometer implementation method and device and electronic equipment
Technical Field
The present disclosure relates to image processing technologies, and in particular, to a method and an apparatus for implementing a visual odometer, and an electronic device.
Background
A visual odometer calculates the pose change between adjacent frame images acquired by a camera by using vision-related algorithms, and then obtains the pose change of the robot by using the calibrated relation between the camera and the robot. The visual odometer can also perform visual incremental mapping of the scene, or autonomously establish a visual map, so as to realize Visual Simultaneous Localization and Mapping (VSLAM).
In the related art, the first scheme is to capture images of the environment through a camera carried on the robot body, and then solve the motion between adjacent frame images captured by the camera through the pyramid optical flow method, so as to obtain the pose change of the camera within a certain time. The second scheme is to extract feature points from two adjacent frames of images shot by the camera, calculate descriptors of the feature points, find corresponding matching point pairs by matching the feature points in the two adjacent frames, and then solve the pose change of the camera between frames; the continuous inter-frame pose relations are accumulated to form the data of the visual odometer. The third scheme is to install an obliquely downward-facing camera at the front end of the robot, extract ground feature points in the image by using the ground point cloud in the laser odometer, realize absolute-scale camera motion estimation based on homography transformation, and then use the camera motion estimation to correct ego-motion point cloud distortion and optimize the pose in the laser odometer, forming the data of the visual odometer.
However, the first scheme is based on the assumption that the image gray scale does not change, so the computed pose change of the camera is easily affected by external illumination. In the second scheme, the descriptor usually needs 512 dimensions to achieve the best effect, but a 512-dimensional descriptor requires more matching time and more storage space. In the third scheme, the camera is mounted obliquely downward on the robot, so the external parameters of the obliquely mounted camera need to be calibrated and are easily disturbed by structural changes, which alters the calibration relation between the camera and the laser radar and affects the accuracy of the visual odometer; moreover, the obliquely downward mounting structure is susceptible to light reflected from the front side in scenes where reflection is severe.
Disclosure of Invention
The embodiments of the present application provide a method and an apparatus for implementing a visual odometer and an electronic device, which are not affected by external illumination or reflective environments, reduce the implementation time of the visual odometer, and improve its accuracy.
The technical scheme of the embodiment of the application is realized as follows:
in a first aspect, an embodiment of the present application provides an implementation method of a visual odometer, including:
dividing a first image acquired by a downward-looking camera to obtain at least two image blocks;
acquiring first characteristic points of each image block and main direction information of the first characteristic points;
matching each first feature point with a second feature point included in the previous frame image of the first image based on a locality-sensitive hashing algorithm to obtain corresponding feature point pairs;
determining, based on the feature point pairs, first pose change information of the camera between the first image and the previous frame image of the first image; the first pose change information is related to a scale ratio parameter of the camera;
determining the pose of the camera when acquiring the first image based on the first pose change information.
In some embodiments, the obtaining the first feature points of each image block and the principal direction information of each first feature point includes:
respectively executing the following operations for each image block:
determining a centroid of the image block based on image moments of the image block;
determining a main direction of the image block based on a centroid of the image block and a geometric center of the image block;
and determining main direction information of each first characteristic point in the image block based on the main direction of the image block.
In some embodiments, the obtaining the first feature points of each image block and the principal direction information of each first feature point includes:
respectively executing the following operations aiming at each pixel point in each image block:
determining first gray information of the pixel points;
determining second gray information of pixel points whose distance from the pixel point is equal to a distance threshold;
judging whether the pixel points are angular points or not based on the first gray information and the second gray information;
and if the pixel point is the angular point, determining the pixel point as the first characteristic point.
In some embodiments, the matching, based on a locality-sensitive hashing algorithm, each of the first feature points with a second feature point included in a previous frame of the first image to obtain a corresponding feature point pair includes:
respectively carrying out Hash transformation on the descriptor of the first characteristic point and the descriptor of the second characteristic point;
performing dimension reduction processing on the first feature points after the hash transformation and the second feature points after the hash transformation;
calculating the distance between the first feature point after the dimension reduction processing and the second feature point after the dimension reduction processing;
determining the feature point pairs based on the calculation result.
In some embodiments, the determining, based on the feature point pairs, first pose change information of the camera between the first image and the previous frame image of the first image includes:
mapping a first characteristic point corresponding to the characteristic point pair to a second characteristic point corresponding to the characteristic point pair to obtain a homography matrix;
determining a reference rotational offset and a reference translational offset based on the homography matrix;
and multiplying the reference rotation offset and the reference translation offset by the scale ratio parameter respectively, to obtain the rotation offset and the translation offset between the first image acquired by the camera and the previous frame image of the first image.
In some embodiments, before the determining, based on the feature point pairs, the first pose change information of the camera between the first image and the previous frame image of the first image, the method further includes:
determining second pose change information of the camera between acquiring the first frame image and acquiring the second frame image;
determining mileage change information occurring between the time the camera acquires the first frame image and the time the camera acquires the second frame image;
determining the scale ratio parameter based on the mileage change information and the second pose change information.
In some embodiments, the determining the scale ratio parameter based on the mileage change information and the second pose change information includes:
calculating a first ratio of the mileage change on a first coordinate axis included in the mileage change information to the translation amount on the first coordinate axis included in the second pose change information, and a second ratio of the mileage change on a second coordinate axis included in the mileage change information to the translation amount on the second coordinate axis included in the second pose change information;
determining half of the sum of the first ratio and the second ratio as the scale ratio parameter.
In some embodiments, the determining the pose of the camera at the time of acquiring the first image based on the first pose change information comprises:
and determining the pose of the camera when the first image is acquired based on the first pose change information and the pose of the last frame of image of the first image.
In a second aspect, an embodiment of the present application provides an apparatus for implementing a visual odometer, including:
the image segmentation module is used for segmenting a first image acquired by the downward-looking camera to obtain at least two image blocks;
the information acquisition module is used for acquiring first characteristic points of each image block and main direction information of each first characteristic point;
a characteristic point pair determining module, configured to match each first characteristic point with a second characteristic point included in a previous frame of the first image based on a locality sensitive hashing algorithm, to obtain a corresponding characteristic point pair;
a pose determination module for determining first pose change information between the first image acquired by the camera and a last frame image of the first image based on the feature point pairs; the first pose change information is related to a scale ratio parameter of the camera; determining the pose of the camera when acquiring the first image based on the first pose change information.
In some embodiments, the information obtaining module is configured to perform the following operations for the image blocks respectively:
determining a centroid of the image block based on image moments of the image block;
determining a main direction of the image block based on a centroid of the image block and a geometric center of the image block;
and determining main direction information of each first characteristic point in the image block based on the main direction of the image block.
In some embodiments, the information obtaining module is configured to perform the following operations for each pixel point in each of the image blocks respectively:
determining first gray information of the pixel points;
determining second gray information of pixel points whose distance from the pixel point is equal to a distance threshold;
judging whether the pixel points are angular points or not based on the first gray information and the second gray information;
and if the pixel point is the angular point, determining the pixel point as the first characteristic point.
In some embodiments, the feature point pair determining module is configured to perform hash transformation on the descriptor of the first feature point and the descriptor of the second feature point respectively;
performing dimension reduction processing on the first feature points after the hash transformation and the second feature points after the hash transformation;
calculating the distance between the first feature point after the dimension reduction processing and the second feature point after the dimension reduction processing;
determining the feature point pairs based on the calculation result.
In some embodiments, the pose determination module is configured to map a first feature point corresponding to the feature point pair to a second feature point corresponding to the feature point pair, so as to obtain a homography matrix;
determining a reference rotational offset and a reference translational offset based on the homography matrix;
and multiplying the reference rotation offset and the reference translation offset by the scale ratio parameter respectively, to obtain the rotation offset and the translation offset between the first image acquired by the camera and the previous frame image of the first image.
In some embodiments, the pose determination module is further configured to determine second pose change information of the camera between acquiring the first frame image and acquiring the second frame image;
determine mileage change information occurring between the time the camera acquires the first frame image and the time the camera acquires the second frame image;
and determine the scale ratio parameter based on the mileage change information and the second pose change information.
In some embodiments, the pose determination module is configured to calculate a first ratio of the mileage change on a first coordinate axis included in the mileage change information to the translation amount on the first coordinate axis included in the second pose change information, and a second ratio of the mileage change on a second coordinate axis included in the mileage change information to the translation amount on the second coordinate axis included in the second pose change information;
and determine half of the sum of the first ratio and the second ratio as the scale ratio parameter.
In some embodiments, the pose determination module is configured to determine the pose of the camera at the time the first image was acquired based on the first pose change information and the pose of the last frame of the first image.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
and a processor, configured to implement the implementation method of the visual odometer provided by the embodiments of the present application when executing the executable instructions stored in the memory.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the implementation method of the visual odometer provided by the embodiments of the present application.
According to the implementation method of the visual odometer provided by the embodiment of the present application, a first image collected by a downward-looking camera is segmented to obtain at least two image blocks; first feature points of each image block and principal direction information of the first feature points are acquired; each first feature point is matched with a second feature point included in the previous frame image of the first image based on a locality-sensitive hashing algorithm to obtain corresponding feature point pairs; based on the feature point pairs, first pose change information of the camera between the first image and the previous frame image of the first image is determined, the first pose change information being related to a scale ratio parameter of the camera; and the pose of the camera when acquiring the first image is determined based on the first pose change information. Therefore, the implementation method of the visual odometer provided by the embodiment of the present application determines the feature point pairs based on the locality-sensitive hashing algorithm, and only 256-dimensional descriptors are needed, which reduces the dimensionality of the search space, shortens the matching time of the feature point pairs, and thus reduces the implementation time of the visual odometer. In addition, the implementation method determines the first pose change information in combination with the scale ratio parameter of the camera, which can improve the accuracy of the visual odometer. By adopting the downward-looking camera structure, the implementation method of the visual odometer provided by the embodiment of the present application is suitable for scenes with severe light reflection.
Drawings
FIG. 1 is a schematic diagram of an architecture of a system for implementing a visual odometer according to an embodiment of the present application;
fig. 2 is a schematic architecture diagram of a terminal device provided in an embodiment of the present application;
FIG. 3 is a schematic flow chart diagram illustrating an alternative method for implementing a visual odometer according to an embodiment of the present disclosure;
FIG. 4 is a schematic view of an alternative processing flow for segmenting a first image according to an embodiment of the present application;
fig. 5 is a schematic processing flow diagram for acquiring a first feature point of an image block according to an embodiment of the present application;
fig. 6 is a schematic view of an alternative processing flow for determining the principal direction information of the first feature point according to the embodiment of the present application;
FIG. 7 is a schematic diagram of an alternative process for determining first posture change information according to an embodiment of the present disclosure;
fig. 8 is a schematic processing flow diagram for determining first pose change information between the first image acquired by the camera and a last frame image of the first image, according to the feature point pair.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first \ second \ third" are only used to distinguish similar objects and do not denote a particular order; it is understood that "first \ second \ third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the application described herein can be practiced in an order other than that shown or described herein. In the following description, the term "plurality" means at least two.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
1) Descriptors, information used to describe feature points, typically describe geometric features around a point based on point coordinates, normal vectors, and curvatures.
2) The centroid, which may also be referred to as the center of gravity of the image.
3) Image moments, which are commonly used to describe the segmented image blocks. Partial properties of the image block, including area (or overall brightness), and information about the geometric center and orientation can be obtained from the image moments.
4) And the visual odometer is used for estimating the motion of the camera according to the image shot by the camera.
5) Database (Database): similar to an electronic file cabinet, namely a place for storing electronic files, a user can perform operations of adding, inquiring, updating, deleting and the like on data in the files. A database is also to be understood as a collection of data that are stored together in a manner that can be shared with a plurality of users, with as little redundancy as possible, independent of the application. In embodiments of the present application, the database may store data for model training.
The embodiment of the application provides a method and a device for realizing a visual odometer, electronic equipment and a computer readable storage medium, which can reduce the realization time of the visual odometer and improve the realization precision of the visual odometer. An exemplary application of the electronic device provided in the embodiment of the present application is described below, and the electronic device provided in the embodiment of the present application may be implemented as various types of terminal devices, and may also be implemented as a server.
Referring to fig. 1, fig. 1 is an architectural diagram of a system 100 for implementing a visual odometer according to an embodiment of the present application, in which a terminal device 400 is connected to a server 200 through a network 300, and the server 200 is connected to a database 500, where the network 300 may be a wide area network or a local area network, or a combination of the two.
In some embodiments, taking the case where the electronic device implementing the method of the visual odometer is a terminal device as an example, the implementation method of the visual odometer provided in the embodiments of the present application may be implemented by the terminal device alone. For example, the terminal device 400 runs a client 410, and the client 410 may be a client for executing the implementation method of the visual odometer.
The client 410 acquires a first image acquired by the downward-looking camera, and then divides the first image to obtain at least two image blocks; acquires first feature points of each image block and principal direction information of the first feature points; matches each first feature point with a second feature point included in the previous frame image of the first image based on a locality-sensitive hashing algorithm to obtain corresponding feature point pairs; determines, based on the feature point pairs, first pose change information of the camera between the first image and the previous frame image of the first image, the first pose change information being related to a scale ratio parameter of the camera; and determines the pose of the camera when acquiring the first image based on the first pose change information.
In some embodiments, taking the case where the electronic device implementing the method of the visual odometer is a server as an example, the implementation method of the visual odometer provided in the embodiments of the present application may be implemented cooperatively by the server and the terminal device.
The server 200 acquires the first image captured by the downward-looking camera from the client 410. Then, the server 200 divides the first image to obtain at least two image blocks; acquires first feature points of each image block and principal direction information of the first feature points; matches each first feature point with a second feature point included in the previous frame image of the first image based on a locality-sensitive hashing algorithm to obtain corresponding feature point pairs; determines, based on the feature point pairs, first pose change information of the camera between the first image and the previous frame image of the first image, the first pose change information being related to a scale ratio parameter of the camera; and determines the pose of the camera when acquiring the first image based on the first pose change information. The server 200 then sends the pose of the camera when acquiring the first image to the client 410. The process in which the server 200 divides the first image into image blocks can be implemented by using a pre-trained image block segmentation model; the server 200 acquires sample images and the image blocks included in the sample images from the database 500, and trains the model with the features of the sample images at different dimensions as granularity, so that the model has the capability of dividing an input image into image blocks.
In some embodiments, the terminal device 400 or the server 200 may implement the implementation method of the visual odometer provided by the embodiments of the present application by running a computer program, for example, the computer program may be a native program or a software module in an operating system; can be a local (Native) Application program (APP), i.e. a program that needs to be installed in an operating system to run; or may be an applet, i.e. a program that can be run only by downloading it to the browser environment; but also an applet that can be embedded into any APP. In general, the computer programs described above may be any form of application, module or plug-in.
In some embodiments, the server 200 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a Cloud server providing basic Cloud computing services such as a Cloud service, a Cloud database, Cloud computing, a Cloud function, Cloud storage, a web service, Cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform, where Cloud Technology (Cloud Technology) refers to a hosting Technology for unifying resources of hardware, software, a network, and the like in a wide area network or a local area network to implement computing, storage, processing, and sharing of data. The terminal device 400 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited.
The electronic device provided in the embodiment of the present application is illustrated below by taking a terminal device as an example; it can be understood that, for the case where the electronic device is a server, parts of the structure shown in fig. 2 (such as the user interface, the presentation module, and the input processing module) may be omitted by default. Referring to fig. 2, fig. 2 is a schematic structural diagram of a terminal device 400 provided in an embodiment of the present application. The terminal device 400 shown in fig. 2 includes: at least one processor 460, a memory 450, at least one network interface 420, and a user interface 430. The various components in the terminal device 400 are coupled together by a bus system 440. It is understood that the bus system 440 is used to enable connection and communication among these components. In addition to a data bus, the bus system 440 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as bus system 440 in fig. 2.
The Processor 460 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 430 includes one or more output devices 431, including one or more speakers and/or one or more visual displays, that enable the presentation of media content. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 450 optionally includes one or more storage devices physically located remote from processor 460.
The memory 450 includes either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 450 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 450 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 451, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
a network communication module 452 for communicating to other computing devices via one or more (wired or wireless) network interfaces 420, exemplary network interfaces 420 including: bluetooth, wireless compatibility authentication (WiFi), and Universal Serial Bus (USB), etc.;
a presentation module 453 for enabling presentation of information (e.g., user interfaces for operating peripherals and displaying content and information) via one or more output devices 431 (e.g., display screens, speakers, etc.) associated with user interface 430;
an input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.
In some embodiments, the apparatus provided by the embodiments of the present application may be implemented in software, and fig. 2 shows an implementation apparatus 455 of the visual odometer, which may be software in the form of programs and plug-ins, and the like, stored in the memory 450, and may include the following software modules: an image segmentation module 4551, an information acquisition module 4552, a feature point pair determination module 4553 and a pose determination module 4554, which are logical and thus may be arbitrarily combined or further separated according to the functions implemented. The functions of the respective modules will be explained below.
The embodiment of the present application provides an implementation method of a visual odometer, which can at least solve the above problems in the related art.
The implementation method of the visual odometer provided by the embodiment of the present application will be described below in conjunction with an exemplary application and implementation of the electronic device provided by the embodiment of the present application.
Referring to fig. 3, fig. 3 is a schematic flow chart of an alternative implementation method of the visual odometer according to the embodiment of the present application, which will be described with reference to the steps shown in fig. 3.
Step S101, a first image collected by a downward-looking camera is divided to obtain at least two image blocks.
In some embodiments, the manner in which the downward-looking camera captures the first image may be vertical capture or vertical photography; as an example, the downward-looking camera is mounted below the chassis of an electronic device (e.g., a robot), and the downward-looking camera is perpendicular to the photographed surface when capturing the first image. By adopting the downward-looking camera structure, the implementation method of the visual odometer provided by the embodiment of the present application is suitable for scenes with severe light reflection, and can track the camera better when the camera moves or rotates on a large scale, preventing the tracking from falling into a local optimum. The external parameters of the downward-looking camera do not need to be calibrated, which can improve the accuracy of the visual odometer. Because the downward-looking camera can be mounted below the chassis of the robot, its working environment is more stable and uniform than that of an exposed forward-looking camera, which facilitates the image processing of the visual algorithm and reduces the interference of illumination and dynamic environments on the camera.
In some embodiments, feature points characterize some salient aspect of the image. ORB (Oriented FAST and Rotated BRIEF) feature points have local invariance and strong noise immunity, and can be used in visual SLAM systems of various scales. The acquired image at the current position is identified based on a quadtree structure algorithm and the rBRIEF algorithm, so as to extract the corresponding texture information in the image, where the texture information includes the first feature points.
In some embodiments, an alternative processing flow diagram for segmenting the acquired first image may be as shown in fig. 4, and at least includes the following steps:
s101a, selecting the number of initial root nodes according to the aspect ratio of the first image.
In some embodiments, images with different aspect ratios correspond to different numbers of initial root nodes, and the number of initial root nodes may be set to 1 or 2.
And S101b, performing 'splitting' operation on the determined initial root node according to the width and the height of the first image to obtain an image block.
In some embodiments, a point of interest in the acquired first image is first selected, and after determining the point of interest in the first image, it may be determined whether a response value of the point of interest is greater than a response threshold, and if the response value of the point of interest is greater than the response threshold, the point of interest is retained.
In some embodiments, the number of interest points with larger response values retained in each image block obtained by splitting may be set to 4. The splitting operation continues within the split image blocks, and stops when the total number of interest points in the split image blocks reaches the preset number. In another alternative embodiment, the splitting operation is stopped when the interest points in the split image blocks no longer meet the response threshold.
In particular implementation, the splitting operation may be performed on the acquired first image through a quadtree structure. That is, the acquired first image is split into 4 image blocks, and then the 4 image blocks are subjected to splitting operation to obtain 16 image blocks, and as long as the splitting condition is satisfied, the image splitting operation can be continuously performed.
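The following is a minimal sketch (in Python, not part of the patent text) of the quadtree-style splitting described above: a block keeps splitting into four children while it still contains more interest points than a threshold and is larger than a minimum size. The interest-point list, the thresholds and the block sizes are illustrative assumptions.

```python
def quadtree_split(x, y, w, h, points, max_points=4, min_size=32):
    # Keep only the interest points that fall inside this block.
    inside = [(px, py) for (px, py) in points if x <= px < x + w and y <= py < y + h]
    # Stop splitting: few enough points, or the block is already small.
    if len(inside) <= max_points or w <= min_size or h <= min_size:
        return [(x, y, w, h, inside)]
    hw, hh = w // 2, h // 2
    children = [(x, y, hw, hh), (x + hw, y, w - hw, hh),
                (x, y + hh, hw, h - hh), (x + hw, y + hh, w - hw, h - hh)]
    blocks = []
    for cx, cy, cw, ch in children:
        blocks += quadtree_split(cx, cy, cw, ch, inside, max_points, min_size)
    return blocks

# Usage: split a 640x480 image whose interest points were detected beforehand.
points = [(10, 20), (300, 200), (310, 210), (600, 400), (50, 400)]
for bx, by, bw, bh, pts in quadtree_split(0, 0, 640, 480, points):
    print(f"block ({bx},{by},{bw},{bh}) holds {len(pts)} points")
```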
Step S102, acquiring first characteristic points of each image block and main direction information of each first characteristic point.
In some embodiments, the first feature point of each image block may be extracted by the oFAST method.
In specific implementation, the collected first image may be subjected to downsampling processing of different levels by using a pyramid principle to obtain an image pyramid of the first image, and then first feature point detection is performed on each layer of the image pyramid, so that multi-size features are obtained, and the obtained first feature points have scale invariance.
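A minimal sketch of the pyramid idea, assuming OpenCV and numpy are available: the frame is repeatedly downsampled, and feature detection would then run on every level so the extracted points gain scale invariance. The level count and scale factor are illustrative, not values taken from the patent.

```python
import cv2
import numpy as np

def build_pyramid(image, n_levels=4, scale=0.8):
    # Level 0 is the original frame; each further level is a scaled-down copy.
    levels = [image]
    for _ in range(1, n_levels):
        prev = levels[-1]
        h, w = prev.shape[:2]
        levels.append(cv2.resize(prev, (max(1, int(w * scale)), max(1, int(h * scale)))))
    return levels

frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # stand-in image
for i, lvl in enumerate(build_pyramid(frame)):
    print("level", i, "size", lvl.shape)
```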
As an example, a first feature point may be extracted by FAST corner feature; as shown in fig. 5, the process flow of acquiring the first feature point of the image block may at least include:
step S102a, determining first gray information of the pixel point.
In some embodiments, the first gray information may refer to the gray value of a pixel point; it is assumed that the gray value at a pixel point P in the image block is I_p.
Step S102b, determining second gray scale information of the pixel point whose distance from the pixel point is equal to the distance threshold.
In some embodiments, in each layer of the image pyramid, the gray information of the 12 pixels lying on a circle of radius 3 centered on the pixel point P is determined as the second gray information.
Step S102c, determining whether the pixel point is an angular point based on the first gray information and the second gray information.
In some embodiments, a threshold T is set; as an example, T = I_p × 20%.
Step S102d, if the pixel point is an angular point, determining that the pixel point is the first feature point.
In some embodiments, if the gray values of N consecutive pixels on the circle are all greater than or equal to I_p + T, or all less than I_p - T, the pixel point P is judged to be a corner point, and the pixel point P is a first feature point.
After the first image is divided for one time, feature point extraction is carried out on the image blocks obtained by division to obtain first feature points. And judging whether the response value of the first characteristic point in each image block is greater than a response threshold value or not, and further determining whether the segmentation operation needs to be continuously executed or not. The feature points extracted after the first image is segmented through the quadtree structure are distributed in the image uniformly, so that the global information of the segmented image can be fully utilized, and the specific position information of the current position can be determined more accurately. According to the method and the device, the first characteristic point is determined according to the relation between the gray values of different pixel points, so that the pose of the camera is not easily influenced by external illumination when the pose is determined.
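A minimal sketch of the FAST-style corner test described above: the center pixel's gray value I_p is compared against pixels on a radius-3 circle, and the pixel is called a corner when enough consecutive circle pixels are all brighter than I_p + T or all darker than I_p - T. The circle offsets, run length and demo image are illustrative assumptions.

```python
import numpy as np

# 16 offsets of the standard radius-3 Bresenham circle.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_corner(img, x, y, n_required=12, t_ratio=0.2):
    ip = int(img[y, x])
    t = ip * t_ratio                      # threshold T = I_p * 20% as in the text
    ring = [int(img[y + dy, x + dx]) for dx, dy in CIRCLE]
    ring = ring + ring                    # wrap around so runs can cross index 0
    brighter = darker = 0
    for v in ring:
        brighter = brighter + 1 if v >= ip + t else 0
        darker = darker + 1 if v <= ip - t else 0
        if brighter >= n_required or darker >= n_required:
            return True
    return False

img = np.full((20, 20), 50, dtype=np.uint8)
img[8:11, 8:11] = 200                     # small bright blob on a dark background
print(is_corner(img, 9, 9))               # the blob centre triggers the test
```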
In some embodiments, the optional process flow of determining the principal direction information of the first feature point, as shown in fig. 6, may include at least:
step S102e, determining the centroid of the image block based on the image moments of the image block.
In some embodiments, the image moments of an image block may be represented by the following formula:
m_pq = Σ_x Σ_y x^p · y^q · I(x, y)   (1)
wherein m denotes the image moment; x and y denote the coordinates of a pixel point; I(x, y) is the gray value at (x, y); p and q represent the magnitude relation between the pixel value of a pixel point and the pixel value of the central pixel point: if the pixel value of the pixel point is larger than that of the central pixel point, the value of p or q is 1; if the pixel value of the pixel point is smaller than that of the central pixel point, the value of p or q is 0.
The centroid of the image block can be represented by the following formula:
(formula (2); the equation image is not reproduced here)
wherein C represents the centroid of the image block, and the four matrices appearing in formula (2) represent, respectively, the lower-left, upper-left, upper-right and lower-right corners in the quadtree structure.
Step S102f, determining a main direction of the image block based on the centroid of the image block and the geometric center of the image block.
In some embodiments, the centroid C of an image block is connected to the geometric center O of the image block, resulting in a principal direction vector OC of the image block.
Step S102g, determining principal direction information of each first feature point in the image block based on the principal direction of the image block.
In some embodiments, the principal direction information of the first feature points extracted in the image block may be represented as the orientation of the vector OC:
θ = atan2(C_y - O_y, C_x - O_x)   (3)
wherein (O_x, O_y) and (C_x, C_y) are the coordinates of the geometric center O and the centroid C of the image block, respectively.
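A minimal sketch of steps S102e to S102g, using the standard image moments m00, m10, m01 of the whole block to locate the gray centroid C and taking the direction from the geometric center O to C as the block's main direction. The patent derives the centroid from quadtree sub-blocks; this simplified version and its test block are assumptions for illustration.

```python
import numpy as np

def block_orientation(block):
    ys, xs = np.mgrid[0:block.shape[0], 0:block.shape[1]]
    i = block.astype(np.float64)
    m00 = i.sum()
    m10 = (xs * i).sum()                              # first-order moment in x
    m01 = (ys * i).sum()                              # first-order moment in y
    cx, cy = m10 / m00, m01 / m00                     # centroid C
    ox, oy = (block.shape[1] - 1) / 2, (block.shape[0] - 1) / 2  # geometric centre O
    theta = np.arctan2(cy - oy, cx - ox)              # direction of vector OC
    return (cx, cy), theta

block = np.zeros((31, 31), dtype=np.uint8)
block[:, 16:] = 255                                   # brighter right half pulls C right
print(block_orientation(block))
```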
step S103, matching each first feature point with a second feature point included in a previous frame of image of the first image based on a locality-sensitive hashing algorithm, to obtain a corresponding feature point pair.
In some embodiments, it may be determined whether the first image is a first frame image captured by a camera, and if the first image is the first frame image captured by the camera, the information acquired in step S102 is stored; then, re-execution of step S101 is performed. If the first image is not the first frame image acquired by the camera, respectively carrying out Hash transformation on the descriptor of the first characteristic point and the descriptor of the second characteristic point; performing dimension reduction processing on the first feature points after the hash transformation and the second feature points after the hash transformation; calculating the distance between the first feature point after the dimension reduction processing and the second feature point after the dimension reduction processing; determining the feature point pairs based on the calculation result.
The locality-sensitive hashing algorithm guarantees that if two variables are very close to each other in a high-dimensional space, their distance remains approximately small after both are transformed by the same designed hash function. Therefore, in the embodiment of the present application, after hash transformation is performed on the descriptor of the first feature point and the descriptor of the second feature point, dimension reduction processing is performed on them; a proximity search is then performed between the first feature points after dimension reduction and the second feature points after dimension reduction, the distance between a first feature point and a second feature point is calculated, and the second feature point closest to the first feature point is found to obtain a feature point pair. In this way, the dimensionality and time required for the feature point pair matching search can be reduced.
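A minimal sketch of locality-sensitive hashing for binary descriptors, under assumptions not taken from the patent (bucket key length, distance threshold, random data): each 256-bit descriptor is reduced to a short key by sampling a fixed subset of its bits, candidate matches are looked up only in the same bucket, and the full Hamming distance is computed only for those candidates.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
HASH_BITS = rng.choice(256, size=16, replace=False)   # the "dimension reduction"

def lsh_key(desc):
    return tuple(desc[HASH_BITS])                     # 16-bit key instead of 256 bits

def match(descs_prev, descs_cur, max_dist=40):
    buckets = defaultdict(list)
    for j, d in enumerate(descs_prev):
        buckets[lsh_key(d)].append(j)
    pairs = []
    for i, d in enumerate(descs_cur):
        best, best_dist = None, max_dist + 1
        for j in buckets.get(lsh_key(d), []):         # search only the same bucket
            dist = int(np.count_nonzero(d != descs_prev[j]))   # Hamming distance
            if dist < best_dist:
                best, best_dist = j, dist
        if best is not None:
            pairs.append((i, best))
    return pairs

prev = rng.integers(0, 2, (100, 256), dtype=np.uint8)
cur = prev.copy()
cur[:, :5] ^= 1                                       # slightly perturbed copies
print(len(match(prev, cur)), "candidate feature point pairs")
```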
Step S104, determining, based on the feature point pairs, first pose change information of the camera between the first image and the previous frame image of the first image.
In some embodiments, the optional process flow of determining the first posture change information may include, as shown in fig. 7, at least:
step S104a, selecting N pixel points and the first feature point to form N feature point pairs within a set range centered on the first feature point by using the rBRIEF algorithm, and performing binary assignment by comparing the gray values to generate a code combination of 0 or 1.
Specifically, a region with a size of 31 × 31 may be selected with a first feature point as a center, and N pixel points are selected in this region. The mode of selecting the N pixel points is selected according to the positions obtained by training, namely the N pixel points are located at the N positions in the region obtained by training. Where N may be 256, then the rBRIEF descriptor dimension is 256. And pairing the selected N pixel points with the first characteristic point serving as the center to obtain N characteristic point pairs. In a specific embodiment, in the feature point pair, by comparing the gray values of the first feature point as the center point and the selected 256 pixel points, the pixel point whose gray value is smaller than the gray value of the first feature point in the image block is defaulted to 0, and the pixel point whose gray value is greater than the gray value of the first feature point in the image block is defaulted to 1, that is, 256 descriptors whose gray values are not 0, that is, 1 are generated.
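A minimal sketch of the binary descriptor just described: 256 pixels are sampled inside a 31x31 region centered on the first feature point, and each bit records whether that pixel is brighter than the center pixel. A random sampling pattern stands in here for the trained rBRIEF positions mentioned above; the test image is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
PATTERN = rng.integers(-15, 16, size=(256, 2))        # (dx, dy) offsets within 31x31

def binary_descriptor(img, x, y):
    centre = img[y, x]
    bits = np.zeros(256, dtype=np.uint8)
    for k, (dx, dy) in enumerate(PATTERN):
        # Bit k is 1 when the sampled pixel is brighter than the central feature point.
        bits[k] = 1 if img[y + dy, x + dx] > centre else 0
    return bits

img = np.random.default_rng(2).integers(0, 256, (64, 64), dtype=np.uint8)
print(binary_descriptor(img, 32, 32)[:16])            # first 16 bits of the descriptor
```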
In step S104b, the weighted sum of the coded combinations of 0 or 1 determines the centroid of the N feature point pairs.
Specifically, based on the obtained coding combination of 0 or 1, 0 or 1 at pixel points at different positions is subjected to weighted summation to obtain gray centroid points of N feature point pairs.
Step S104c, connecting the first feature point and the centroid and determining the direction angle of the first feature point.
Specifically, the first feature point is connected with the centroid, so that a connecting line of the first feature point and the centroid has a direction. In a specific embodiment, the direction angle θ of the connecting line of the feature point and the centroid is determined by the position coordinates of the N feature point pairs. The specific direction angle θ can be obtained by the following formula.
θ = arctan( Σ_{i=1..N} (y_Ni - y_A) / Σ_{i=1..N} (x_Ni - x_A) )   (4)
wherein N is the number of feature point pairs; y_Ni is the ordinate of the pixel point in the i-th feature point pair; y_A is the ordinate of the first feature point in the feature point pairs; x_Ni is the abscissa of the pixel point in the i-th feature point pair; and x_A is the abscissa of the first feature point in the feature point pairs.
Step S104d, the pixel points are rotated and sampled according to the direction angle to obtain the feature point pairs in the rotating state, and whether the feature point pairs are matched with the pre-stored texture information in the texture information base is determined.
Specifically, the obtained 256 pixel points are rotated through 360° with the obtained direction angle θ as the angular step for sampling, so as to obtain a lookup table of rotated descriptors. That is to say, feature point pairs at a plurality of angles are obtained through rotation, the rotated feature point pairs are compared with the feature point pairs in the pre-stored texture information in the texture information base, the feature point pairs matching the pre-stored texture information are determined, and the rotation angle of the feature point pairs is determined according to the rotation direction of the mutually matched feature point pairs. The pre-stored texture information in the texture information base may include feature points corresponding to images historically collected by the camera.
In some embodiments, the pose change information between two adjacent frames of images acquired by the camera can be represented as (Δx, Δy, Δθ), wherein Δx is the displacement, along the x axis, of the camera pose when the camera acquires the next frame image relative to the camera pose when the camera acquires the previous frame image; Δy is the displacement, along the y axis, of the camera pose when the camera acquires the next frame image relative to the camera pose when the camera acquires the previous frame image; and Δθ is the angular rotation of the camera pose when the camera acquires the next frame image relative to the camera pose when the camera acquires the previous frame image.
In some embodiments, when the pose is calculated based on the first frame image and the second frame image acquired by the camera, the pose change between the first frame image and the second frame image is only a transformation at the pixel level, and is not the actual change amount in the physical coordinate system. Therefore, the scale ratio parameter of the camera can be obtained by combining the data acquired by the wheel odometer in the same time period (the time interval between the camera acquiring the first frame image and the second frame image). The scale ratio parameter may be expressed as S, in units of millimeters per pixel. As an example, if the camera pose change corresponding to the first frame image and the second frame image is (Δx, Δy, Δθ), and the data of the wheel odometer changes by (x_w, y_w, θ_w) in the same period, then the scale ratio parameter S of the camera can be expressed as:
S = (x_w / Δx + y_w / Δy) / 2   (5)
For the other frame images except the first frame image acquired by the camera, the obtained camera pose change information is combined with the scale ratio parameter S by default to convert the pixel-level change into the actual physical distance, where the unit of the actual physical distance may be millimeters.
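A minimal sketch of the scale-ratio calibration, assuming the per-axis ratio form of formula (5) above: during the interval between the first two frames the wheel odometer reports a metric displacement while the camera reports a pixel-level displacement, and S (millimeters per pixel) is the mean of the two per-axis ratios. All numbers below are made up for illustration.

```python
def scale_ratio(wheel_dx_mm, wheel_dy_mm, cam_dx_px, cam_dy_px):
    # S = 0.5 * (x_w / dx + y_w / dy), following the reconstructed formula (5)
    return 0.5 * (wheel_dx_mm / cam_dx_px + wheel_dy_mm / cam_dy_px)

S = scale_ratio(wheel_dx_mm=120.0, wheel_dy_mm=45.0, cam_dx_px=240.0, cam_dy_px=92.0)
print("scale S =", S, "mm per pixel")

# Later pixel-level pose changes are converted to millimetres with the same S.
dx_px, dy_px = 30.0, -12.5
print("metric motion:", dx_px * S, dy_px * S, "mm")
```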
Step S105, determining the pose of the camera when the camera collects the first image based on the first pose change information.
In some embodiments, after the rotation angles of the feature point pairs are determined in step S104, the specific position information of the current position of the electronic device corresponding to the camera and the current pose of the electronic device are determined according to the pre-stored texture information matched with the feature points.
In some embodiments, the pose accumulator accumulates the pose variation between adjacent frames of images acquired by the camera to obtain the pose of the current frame image relative to the first frame image acquired by the camera, so as to realize the effect of the visual odometer. As an example, if the pose of the robot when the camera acquires the first frame image is (x_1, y_1, θ_1), and the relative pose change amount of the robot when the camera acquires the second frame image is (Δx, Δy, Δθ), then the absolute pose of the robot when the camera acquires the second frame image can be calculated as follows:
x_2 = x_1 + Δx·cos θ_1 - Δy·sin θ_1   (6)
y_2 = y_1 + Δx·sin θ_1 + Δy·cos θ_1   (7)
θ_2 = θ_1 + Δθ   (8)
Similarly, for any frame k, the absolute pose of the camera in the world coordinate system can be obtained by accumulating the pose when the camera acquires the previous frame image and the pose variation between the two adjacent frames acquired by the camera:
x_k = x_{k-1} + Δx_k·cos θ_{k-1} - Δy_k·sin θ_{k-1}   (9)
y_k = y_{k-1} + Δx_k·sin θ_{k-1} + Δy_k·cos θ_{k-1}   (10)
θ_k = θ_{k-1} + Δθ_k   (11)
wherein (x_k, y_k, θ_k) is the output result of the visual odometer.
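A minimal sketch of the accumulation step, assuming the 2D pose composition written in formulas (6) to (11) above: each relative motion (dx, dy, dtheta) is expressed in the previous frame and rotated into the world frame before being added. The motion values are made up for illustration.

```python
import math

def accumulate(pose, delta):
    x, y, th = pose
    dx, dy, dth = delta
    # Rotate the frame-local translation into the world frame, then add.
    x_new = x + dx * math.cos(th) - dy * math.sin(th)
    y_new = y + dx * math.sin(th) + dy * math.cos(th)
    return (x_new, y_new, th + dth)

pose = (0.0, 0.0, 0.0)                    # pose at the first frame
for delta in [(10.0, 0.0, 0.1), (10.0, 1.0, 0.05), (9.5, -0.5, 0.0)]:
    pose = accumulate(pose, delta)
print("visual odometer output:", pose)
```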
In some embodiments, the process of determining, based on the feature point pairs, the first pose change information of the camera between the first image and the previous frame image of the first image may be as shown in fig. 8, and includes:
step S1, mapping the first feature point corresponding to the feature point pair to the second feature point corresponding to the feature point pair, so as to obtain a homography matrix.
In some embodiments, the second feature point may be a feature point corresponding to pre-stored texture information.
Let the ground plane equation be n^T·P = d; then:
n^T·P / d = 1   (12)
wherein P is the spatial coordinate of the first feature point in the world coordinate system, n is the direction vector of the line connecting the first feature point and the optical center of the camera, d is the vertical distance between the optical center of the camera and the ground, and T is the transposition symbol.
Meanwhile, according to the imaging model of the image collector, the feature points of the obtained current frame and the feature points in the pre-stored texture information satisfy the following relations:

s_2·p_2 = K·P        (13)

s_1·p_1 = K·(R·P + t) = K·(R − t·n^T/d)·P = s_2·K·(R − t·n^T/d)·K^(−1)·p_2        (14)
wherein s_1 and s_2 are respectively the scale factors; in the application scenario of this embodiment, s_1 and s_2 are parameters that need to be calibrated. K is the intrinsic parameter matrix of the camera and can be obtained by the common Zhang Zhengyou calibration method. p_1 and p_2 respectively represent the pixel points of the second feature point and the first feature point in their respective images; R|t is the rotation and translation conversion relation matrix|vector between the two. Thus, it is possible to obtain:
s_1·p_1 = s_2·H·p_2 = s_2·[h_1  h_2  h_3]·p_2        (15)

wherein p_1 and p_2 are respectively the pixel coordinates of the second feature point and the first feature point in their respective images, and h_1, h_2 and h_3 are respectively the three column vectors of the homography matrix H.
The first feature points corresponding to the feature point pairs matched with the pre-stored texture information are mapped to the feature point set in the pre-stored texture information to obtain a matrix. Specifically, H is the homography matrix that maps the first feature points of the current frame to the feature point set of the pre-stored texture information. Ideally, according to the orthogonality of the rotation matrix, the first two column vectors of H can be directly normalized and cross-multiplied to obtain the rotation matrix:
R = [r_1  r_2  r_3],  r_1 = h_1/‖h_1‖,  r_2 = h_2/‖h_2‖,  r_3 = r_1 × r_2        (16)

wherein r_1, r_2 and r_3 are respectively the three column vectors of the rotation matrix R.
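The ideal case of equation (16) can be sketched as follows: the helper below normalizes the first two columns of the homography and completes the third column with a cross product. It is a simplified illustration of the noise-free case only; the function and variable names are assumptions.

import numpy as np

def rotation_from_homography(H):
    """Recover a rotation matrix from the first two columns of H (ideal case)."""
    h1, h2 = H[:, 0], H[:, 1]
    r1 = h1 / np.linalg.norm(h1)
    r2 = h2 / np.linalg.norm(h2)
    r3 = np.cross(r1, r2)           # third column from orthogonality
    return np.column_stack((r1, r2, r3))

print(rotation_from_homography(np.eye(3)))   # identity homography gives identity rotation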
Step S2, a reference rotational offset and a reference translational offset are determined based on the homography matrix.
In some embodiments, the rotation matrix and the translation vector are obtained by SVD decomposition of the matrix.

Specifically, because the output of the visual algorithm does not directly match the input parameters of the navigation and positioning algorithm and errors exist in the actual calculation, the rotation matrix and the translation vector can be obtained by decomposing the matrix through SVD, with the orthogonality of the rotation matrix as the theoretical basis that ensures the correctness and interpretability of the result.
H = U·S·V^T        (17)

R = U·V^T        (18)

t = h_3 / ‖h_1‖        (19)
U, S and V can be obtained by performing SVD on the homography matrix H, wherein R is the rotation matrix of the detected first feature point relative to the second feature point, and t is the translation vector of the first feature point relative to the second feature point. Then, the rotation offset and the translation offset of the electronic device equipped with the downward-looking camera during operation are determined according to the rotation matrix and the translation vector.
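One common way to carry out this SVD step, consistent with the description above though not necessarily identical to the patent's exact formulas, is to project the noisy matrix onto the nearest proper rotation; a hedged NumPy sketch:

import numpy as np

def nearest_rotation(M):
    """Return the rotation matrix closest to M (in Frobenius norm) via SVD."""
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:        # enforce det(R) = +1 for a proper rotation
        U[:, -1] *= -1.0
        R = U @ Vt
    return R

noisy = np.array([[0.99, -0.12, 0.0],
                  [0.11,  1.02, 0.0],
                  [0.00,  0.00, 1.0]])
print(nearest_rotation(noisy))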
Step S3, multiplying the reference rotational offset and the reference translational offset by the scale ratio parameter, respectively, to obtain a rotational translation amount and a translational offset between the first image acquired by the camera and a previous frame image of the first image.
In some embodiments, the pixel-level pose transformation relationship R|t from the feature points of the current frame to the feature points corresponding to the preset texture information is obtained according to the above steps. In an optional embodiment, because the scale information in the optical axis direction of the camera is normalized during the simplification, only 8 equations, i.e. 4 feature point pairs, are needed to solve R|t. If more than 4 matched feature point pairs are available, the algorithm uses the RANSAC method for optimization and, taking the reprojection error of the homography matrix as the criterion, finds the optimal 4 matched pairs for the solution. Multiplying the translation vector t by the calibrated scale coefficient s yields the translation vector at the actual physical scale, which reflects the translation offset of the electronic device equipped with the downward-looking camera near the target position corresponding to the preset texture information; the corresponding angle reflects the rotation offset of the electronic device near that target position. The two quantities together guide the electronic device to adjust its operation parameters so that it can navigate the path to the next target position, which further improves the accuracy of the visual odometer.
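The RANSAC-based selection of matched pairs and the scale conversion described above can be illustrated with OpenCV, which is only one possible implementation and is not named in the original text; the point coordinates, the reprojection threshold and the scale coefficient below are example values.

import cv2
import numpy as np

# Matched pixel coordinates: current frame -> pre-stored texture (example values).
src = np.array([[10, 12], [40, 15], [38, 60], [12, 58], [25, 30]], dtype=np.float32)
dst = np.array([[11, 13], [41, 16], [39, 61], [13, 59], [26, 31]], dtype=np.float32)

H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
t_px = H[:2, 2] / H[2, 2]      # pixel-level translation part (assumed convention)
s = 0.85                       # calibrated scale coefficient, mm per pixel (example)
t_mm = t_px * s
print(int(inliers.sum()), "inliers; translation in mm:", t_mm)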
In some embodiments, when the texture information in the running environment is affected by special situations such as stains or gradual changes, the texture information base needs to be updated.
In specific implementation, the confidence of the acquired environmental image information is determined according to the number of the screened feature point pairs and the translation vector. Wherein the confidence is calculated by the following formula:
c = f(a, b, m, ‖t‖)        (20)
wherein a is an adjustable parameter and depends on the richness degree of the environment texture; b is a fixed parameter related to the size of the field of view of the image acquisition device; m is the number of the characteristic point pairs after screening;
‖t‖ indicates the actual translation distance between the detected feature points and the feature points in the pre-stored texture information.
The more matching point pairs there are between the environment image of the current frame and the feature points in the preset texture information, and the smaller the offset distance in the calculation result, the higher the image matching similarity and the better the matching result.
Setting a preset confidence coefficient, and judging whether the confidence coefficient corresponding to the environment image of the current frame exceeds the preset confidence coefficient or not; if the confidence corresponding to the environment image of the current frame exceeds the preset confidence, fusing the acquired environment image with the pre-stored texture information to update the texture information base; and if the confidence corresponding to the environment image of the current frame does not exceed the preset confidence, not updating the texture information base.
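A hedged sketch of this confidence-gated update follows. The exact form of equation (20) is not reproduced; conf() below is only an assumed expression that increases with the number m of screened feature point pairs and decreases with the translation distance, as the surrounding text describes, and the texture-library interface is hypothetical.

def conf(m, t_norm, a=1.0, b=50.0):
    """Assumed confidence form: grows with m, shrinks with the offset distance."""
    return a * m / (1.0 + t_norm / b)

def maybe_update_texture_library(library, frame, m, t_norm, preset_confidence=20.0):
    """Fuse the current environment image into the library only if trusted enough."""
    if conf(m, t_norm) > preset_confidence:
        library.fuse(frame)        # hypothetical texture-library API
        return True
    return False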
In the embodiment of the present application, the electronic device mounted with the downward-looking camera may be a robot having a mobile function.
Continuing with the exemplary structure of the implementation device 455 of the visual odometer provided by the embodiment of the present application as a software module, in some embodiments, as shown in fig. 2, the software module stored in the implementation device 455 of the visual odometer in the memory 450 may include: the image segmentation module 4551 is configured to segment a first image acquired by the downward-view camera to obtain at least two image blocks; an information obtaining module 4552, configured to obtain first feature points of each image block and main direction information of each first feature point; a feature point pair determining module 4553, configured to match each first feature point with a second feature point included in a previous frame of the first image based on a locality sensitive hashing algorithm, to obtain a corresponding feature point pair; a pose determination module 4554 configured to determine, based on the feature point pairs, first pose change information between the first image acquired by the camera and a last frame image of the first image; the first pose change information is related to a scale ratio parameter of the camera; determining the pose of the camera when acquiring the first image based on the first pose change information.
In some embodiments, the information obtaining module 4552 is configured to perform the following operations for the image blocks respectively: determining a centroid of the image block based on image moments of the image block; determining a main direction of the image block based on a centroid of the image block and a geometric center of the image block; and determining main direction information of each first characteristic point in the image block based on the main direction of the image block.
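The centroid and principal-direction computation for an image block can be sketched with grey-level image moments as below; the moment definitions follow the standard intensity-centroid formulation, and the patch size and names are illustrative rather than quoted from the patent.

import numpy as np

def block_principal_direction(block):
    """block: 2-D grey image patch; returns the principal direction in radians."""
    ys, xs = np.mgrid[0:block.shape[0], 0:block.shape[1]]
    m00 = block.sum()
    cx = (xs * block).sum() / m00                      # centroid from image moments
    cy = (ys * block).sum() / m00
    gx = (block.shape[1] - 1) / 2.0                    # geometric center of the block
    gy = (block.shape[0] - 1) / 2.0
    return float(np.arctan2(cy - gy, cx - gx))         # direction: center -> centroid

patch = np.random.default_rng(0).integers(1, 255, (31, 31)).astype(float)
print(block_principal_direction(patch))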
In some embodiments, the information obtaining module 4552 is configured to perform the following operations for each pixel point in each image block respectively: determining first gray information of the pixel point; determining second gray information of the pixel points whose distance from the pixel point is equal to the distance threshold;
judging whether the pixel points are angular points or not based on the first gray information and the second gray information; and if the pixel point is the angular point, determining the pixel point as the first characteristic point.
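A much-reduced corner test in the spirit of this gray-comparison step is sketched below: the grey value of a candidate pixel is compared with pixels lying at a fixed distance from it, and the pixel is treated as a corner if enough of them differ strongly. The ring offsets and both thresholds are illustrative values, not the patent's.

import numpy as np

RING = [(-3, 0), (-2, 2), (0, 3), (2, 2), (3, 0), (2, -2), (0, -3), (-2, -2)]

def is_corner(img, y, x, diff_thresh=25, count_thresh=6):
    """img: 2-D grey image; (y, x) must be at least 3 pixels away from the border."""
    center = int(img[y, x])                                   # first gray information
    ring = [int(img[y + dy, x + dx]) for dy, dx in RING]      # second gray information
    strong = sum(abs(g - center) > diff_thresh for g in ring)
    return strong >= count_thresh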
In some embodiments, the feature point pair determining module 4553 is configured to perform hash transform on the descriptor of the first feature point and the descriptor of the second feature point respectively; performing dimension reduction processing on the first feature points after the hash transformation and the second feature points after the hash transformation; calculating the distance between the first feature point after the dimension reduction processing and the second feature point after the dimension reduction processing; determining the feature point pairs based on the calculation result.
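A generic locality-sensitive hashing sketch for this matching step is shown below: descriptors are projected onto random hyperplanes to obtain short binary codes (the dimension reduction), and pairs are formed by the smallest Hamming distance. This is a standard LSH scheme given for illustration; the specific hash transform, code length and distance threshold used in the patent may differ.

import numpy as np

def make_planes(descriptor_dim, n_bits=32, seed=42):
    """Random hyperplanes shared by both descriptor sets."""
    return np.random.default_rng(seed).normal(size=(descriptor_dim, n_bits))

def lsh_codes(descriptors, planes):
    """Hash float descriptors into binary codes (one row per descriptor)."""
    return (descriptors @ planes > 0).astype(np.uint8)

def match_by_hamming(codes_a, codes_b, max_dist=8):
    """Pair each code in A with its nearest code in B if the distance is small enough."""
    pairs = []
    for i, code in enumerate(codes_a):
        dists = np.count_nonzero(codes_b != code, axis=1)   # Hamming distances
        j = int(dists.argmin())
        if dists[j] <= max_dist:
            pairs.append((i, j))
    return pairs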
In some embodiments, the pose determining module 4554 is configured to map the first feature points corresponding to the feature point pairs to the second feature points corresponding to the feature point pairs, so as to obtain a homography matrix; determining a reference rotational offset and a reference translational offset based on the homography matrix; and multiplying the reference rotation offset and the reference translation offset by the scale proportion parameter respectively to obtain the rotation translation amount and the translation offset between the first image acquired by the camera and the last frame image of the first image.
In some embodiments, the pose determination module 4554 is further configured to determine second pose change information between the second frame image acquired by the camera and the first frame image acquired by the camera; determine mileage change information occurring between the acquisition of the second frame image and the acquisition of the first frame image by the camera; and determine the scale ratio parameter based on the mileage change information and the second pose change information.
In some embodiments, the pose determination module 4554 is configured to calculate a first ratio of the mileage change information of the first coordinate axis included in the mileage change information to the translation amount of the first coordinate axis included in the second pose change information, and a second ratio of the mileage change information of the second coordinate axis included in the mileage change information to the translation amount of the second coordinate axis included in the second pose change information; and determine half of the sum of the first ratio and the second ratio as the scale ratio parameter.
In some embodiments, the pose determination module 4554 is configured to determine the pose of the camera at the time of acquiring the first image based on the first pose change information and the pose of the last frame of image of the first image.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the implementation method of the visual odometer, which is described in the embodiment of the present application.
Embodiments of the present application provide a computer-readable storage medium storing executable instructions, which when executed by a processor, will cause the processor to perform a method provided by embodiments of the present application, for example, a method for implementing a visual odometer as shown in fig. 3 to 8.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (11)

1. A method of implementing a visual odometer, the method comprising:
dividing a first image acquired by a downward-looking camera to obtain at least two image blocks;
acquiring first characteristic points of each image block and main direction information of the first characteristic points;
matching each first characteristic point with a second characteristic point included in a last frame of image of the first image based on a locality sensitive hashing algorithm to obtain a corresponding characteristic point pair;
determining, based on the feature point pairs, first pose change information between the first image acquired by the camera and a last frame image of the first image; the first pose change information is related to a scale ratio parameter of the camera;
determining the pose of the camera when acquiring the first image based on the first pose change information.
2. The method according to claim 1, wherein the obtaining the first feature points of each image block and the principal direction information of each first feature point comprises:
respectively executing the following operations for each image block:
determining a centroid of the image block based on image moments of the image block;
determining a main direction of the image block based on a centroid of the image block and a geometric center of the image block;
and determining main direction information of each first characteristic point in the image block based on the main direction of the image block.
3. The method according to claim 1, wherein the obtaining the first feature points of each image block and the principal direction information of each first feature point comprises:
respectively executing the following operations aiming at each pixel point in each image block:
determining first gray information of the pixel points;
determining second gray information of the pixel points whose distance from the pixel point is equal to the distance threshold;
judging whether the pixel points are angular points or not based on the first gray information and the second gray information;
and if the pixel point is the angular point, determining the pixel point as the first characteristic point.
4. The method according to claim 1, wherein the matching each of the first feature points with a second feature point included in a previous frame of the first image based on a locality-sensitive hashing algorithm to obtain a corresponding feature point pair comprises:
respectively carrying out Hash transformation on the descriptor of the first characteristic point and the descriptor of the second characteristic point;
performing dimension reduction processing on the first feature points after the hash transformation and the second feature points after the hash transformation;
calculating the distance between the first feature point after the dimension reduction processing and the second feature point after the dimension reduction processing;
determining the feature point pairs based on the calculation result.
5. The method of claim 1, wherein the determining, based on the feature point pairs, first pose change information between the first image acquired by the camera and a last frame image of the first image comprises:
mapping a first characteristic point corresponding to the characteristic point pair to a second characteristic point corresponding to the characteristic point pair to obtain a homography matrix;
determining a reference rotational offset and a reference translational offset based on the homography matrix;
and multiplying the reference rotation offset and the reference translation offset by the scale proportion parameter respectively to obtain the rotation translation amount and the translation offset between the first image acquired by the camera and the last frame image of the first image.
6. The method of claim 5, wherein prior to the determining, based on the feature point pairs, of the first pose change information between the first image acquired by the camera and a last frame image of the first image, the method further comprises:
determining second pose change information between the second frame image acquired by the camera and the first frame image acquired by the camera;
determining mileage change information occurring when the camera acquires the second frame image and when the camera acquires the first frame image;
determining the scale ratio parameter based on the mileage change information and the second pose change information.
7. The method of claim 6, wherein the determining the scale ratio parameter based on the mileage change information and the second pose change information comprises:
calculating a first ratio of the mileage change information of a first coordinate axis included in the mileage change information to the translation amount of the first coordinate axis included in the second pose change information, and a second ratio of the mileage change information of a second coordinate axis included in the mileage change information to the translation amount of the second coordinate axis included in the second pose change information;
determining half of the sum of the first ratio and the second ratio as the scale ratio parameter.
8. The method of claim 1, wherein the determining the pose of the camera at the time the first image was acquired based on the first pose change information comprises:
and determining the pose of the camera when the first image is acquired based on the first pose change information and the pose of the last frame of image of the first image.
9. An apparatus for implementing a visual odometer, the apparatus comprising:
the image segmentation module is used for segmenting a first image acquired by the downward-looking camera to obtain at least two image blocks;
the information acquisition module is used for acquiring first characteristic points of each image block and main direction information of each first characteristic point;
a characteristic point pair determining module, configured to match each first characteristic point with a second characteristic point included in a previous frame of the first image based on a locality sensitive hashing algorithm, to obtain a corresponding characteristic point pair;
a pose determination module for determining first pose change information between the first image acquired by the camera and a last frame image of the first image based on the feature point pairs; the first pose change information is related to a scale ratio parameter of the camera; determining the pose of the camera when acquiring the first image based on the first pose change information.
10. An electronic device, comprising:
a memory for storing executable instructions;
a processor for implementing the method of any one of claims 1 to 8 when executing the executable instructions stored in the memory.
11. A computer-readable storage medium storing executable instructions for implementing the method of any one of claims 1 to 8 when executed by a processor.
CN202110715562.4A 2021-06-28 2021-06-28 Visual odometer implementation method and device and electronic equipment Pending CN113223007A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110715562.4A CN113223007A (en) 2021-06-28 2021-06-28 Visual odometer implementation method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN113223007A true CN113223007A (en) 2021-08-06

Family

ID=77081282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110715562.4A Pending CN113223007A (en) 2021-06-28 2021-06-28 Visual odometer implementation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113223007A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967691A (en) * 2016-10-20 2018-04-27 株式会社理光 A kind of visual odometry calculates method and apparatus
CN110044374A (en) * 2018-01-17 2019-07-23 南京火眼猴信息科技有限公司 A kind of method and odometer of the monocular vision measurement mileage based on characteristics of image
CN108955718A (en) * 2018-04-10 2018-12-07 中国科学院深圳先进技术研究院 A kind of visual odometry and its localization method, robot and storage medium
CN108615248A (en) * 2018-04-27 2018-10-02 腾讯科技(深圳)有限公司 Method for relocating, device, equipment and the storage medium of camera posture tracing process
CN109029417A (en) * 2018-05-21 2018-12-18 南京航空航天大学 Unmanned plane SLAM method based on mixing visual odometry and multiple dimensioned map
CN108734736A (en) * 2018-05-22 2018-11-02 腾讯科技(深圳)有限公司 Camera posture method for tracing, device, equipment and storage medium
CN109579844A (en) * 2018-12-04 2019-04-05 电子科技大学 Localization method and system
CN109887029A (en) * 2019-01-17 2019-06-14 江苏大学 A kind of monocular vision mileage measurement method based on color of image feature
CN111754579A (en) * 2019-03-28 2020-10-09 杭州海康威视数字技术股份有限公司 Method and device for determining external parameters of multi-view camera
CN110335337A (en) * 2019-04-28 2019-10-15 厦门大学 A method of based on the end-to-end semi-supervised visual odometry for generating confrontation network
CN110555435A (en) * 2019-09-10 2019-12-10 深圳一块互动网络技术有限公司 Point-reading interaction realization method
CN110992487A (en) * 2019-12-10 2020-04-10 南京航空航天大学 Rapid three-dimensional map reconstruction device and reconstruction method for hand-held airplane fuel tank
CN111797906A (en) * 2020-06-15 2020-10-20 北京三快在线科技有限公司 Method and device for positioning based on vision and inertial mileage

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG Xinhan: "微装机器人", Beijing: National Defense Industry Press, pages 91-92 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113706620A (en) * 2021-10-22 2021-11-26 杭州迦智科技有限公司 Positioning method, positioning device and movable platform based on reference object
CN113706620B (en) * 2021-10-22 2022-03-22 杭州迦智科技有限公司 Positioning method, positioning device and movable platform based on reference object

Similar Documents

Publication Publication Date Title
CN105637530B (en) A kind of method and system of the 3D model modification using crowdsourcing video
US20200175700A1 (en) Joint Training Technique for Depth Map Generation
WO2022241874A1 (en) Infrared thermal imaging monocular vision ranging method and related assembly
CN112102411A (en) Visual positioning method and device based on semantic error image
WO2022179581A1 (en) Image processing method and related device
US20210312650A1 (en) Method and apparatus of training depth estimation network, and method and apparatus of estimating depth of image
CN113808253A (en) Dynamic object processing method, system, device and medium for scene three-dimensional reconstruction
CN113139626B (en) Template matching method and device, electronic equipment and computer-readable storage medium
CN112967341A (en) Indoor visual positioning method, system, equipment and storage medium based on live-action image
WO2023083030A1 (en) Posture recognition method and related device
CN114565728A (en) Map construction method, pose determination method, related device and equipment
CN111428805B (en) Method for detecting salient object, model, storage medium and electronic device
CN116977674A (en) Image matching method, related device, storage medium and program product
CN110111364B (en) Motion detection method and device, electronic equipment and storage medium
WO2022179603A1 (en) Augmented reality method and related device thereof
CN113516697B (en) Image registration method, device, electronic equipment and computer readable storage medium
CN113223007A (en) Visual odometer implementation method and device and electronic equipment
CN113639782A (en) External parameter calibration method and device for vehicle-mounted sensor, equipment and medium
CN113673288B (en) Idle parking space detection method and device, computer equipment and storage medium
CN111680564B (en) All-weather pedestrian re-identification method, system, equipment and storage medium
CN112258647A (en) Map reconstruction method and device, computer readable medium and electronic device
CN116580151A (en) Human body three-dimensional model construction method, electronic equipment and storage medium
CN110956131A (en) Single-target tracking method, device and system
CN117693768A (en) Semantic segmentation model optimization method and device
CN113205530A (en) Shadow area processing method and device, computer readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination