CN112880675B - Pose smoothing method and device for visual positioning, terminal and mobile robot - Google Patents

Pose smoothing method and device for visual positioning, terminal and mobile robot

Info

Publication number
CN112880675B
CN112880675B
Authority
CN
China
Prior art keywords
frame
target frame
pose
information
cost function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110088173.3A
Other languages
Chinese (zh)
Other versions
CN112880675A (en)
Inventor
陈建楠
王超
姚秀军
桂晨光
马福强
王峰
崔丽华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd filed Critical Jingdong Technology Information Technology Co Ltd
Priority to CN202110088173.3A priority Critical patent/CN112880675B/en
Publication of CN112880675A publication Critical patent/CN112880675A/en
Application granted granted Critical
Publication of CN112880675B publication Critical patent/CN112880675B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20Instruments for performing navigational calculations
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34Route searching; Route guidance
    • G01C21/3407Route searching; Route guidance specially adapted for specific applications
    • G01C21/3415Dynamic re-routing, e.g. recalculating the route when the user deviates from calculated route or after detecting real-time traffic data or accidents
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0219Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory ensuring the processing of the whole working surface
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/08Control of attitude, i.e. control of roll, pitch, or yaw
    • G05D1/0891Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for land vehicles

Abstract

The embodiment of the disclosure discloses a pose smoothing method and device for visual positioning, a terminal and a mobile robot. One embodiment of the method comprises: acquiring inter-frame movement information corresponding to a target frame, wherein the inter-frame movement information is used for representing the pose change from a key frame which precedes the target frame in time to the target frame; acquiring associated frame information corresponding to the target frame, wherein the associated frame information comprises positioning information corresponding to a key frame matched with the target frame in a preset map; respectively generating a time constraint observation and a space constraint observation based on the inter-frame movement information and the comparison between the target frame and the corresponding key frame information; and generating the pose corresponding to the target frame as the smoothed pose by using an optimization method according to the time constraint observation and the space constraint observation. The embodiment can obtain a smoother positioning pose.

Description

Pose smoothing method and device for visual positioning, terminal and mobile robot
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a pose smoothing method, a pose smoothing device, a terminal and a mobile robot for visual positioning.
Background
A mobile robot is generally equipped with various sensors for positioning, such as a laser radar, a camera, an IMU (Inertial Measurement Unit), an encoder, and the like. Cameras are low in cost and rich in information, so they are widely used for robot self-positioning. Positioning techniques based on sensors such as cameras are known as VSLAM (Visual Simultaneous Localization and Mapping).
In the prior art, VSLAM generally comprises two stages: mapping and positioning. Mapping is a complete SLAM process, usually comprising front-end visual odometry, back-end optimization and global optimization. After the SLAM process ends, the visual map is saved for subsequent positioning. During visual positioning, the robot pose in the visual map can be obtained in real time by matching against the visual map. However, because global positioning based on the visual map does not rely on pose propagation across consecutive frames, the obtained positioning result is not smooth enough.
Disclosure of Invention
The embodiment of the disclosure provides a pose smoothing method and device for visual positioning, a terminal and a mobile robot.
In a first aspect, an embodiment of the present disclosure provides a pose smoothing method for visual localization, including: acquiring inter-frame movement information corresponding to the target frame, wherein the inter-frame movement information is used for representing the pose change from a key frame which precedes the target frame in time to the target frame; acquiring associated frame information corresponding to a target frame, wherein the associated frame information comprises positioning information corresponding to a key frame matched with the target frame in a preset map; respectively generating time constraint observation and space constraint observation based on the comparison between the inter-frame movement information and the target frame and the corresponding key frame information; and generating the corresponding pose of the target frame as the smoothed pose by utilizing an optimization method according to the time constraint observation and the space constraint observation.
In some embodiments, the target frame includes an image frame within a sliding window of a preset size.
In some embodiments, the cost function used by the optimization method is generated based on a first cost function for characterizing errors of the spatially constrained observations and a second cost function for characterizing errors of the temporally constrained observations.
In some embodiments, the cost function is generated based on the first cost function and the second cost function processed by the preset robust kernel function.
In some embodiments, the acquiring associated frame information corresponding to the target frame includes: selecting at least one piece of historical positioning information matched with the target frame from a preset historical positioning information set, wherein the historical positioning information is obtained based on global positioning of a preset map; based on the feature association between the selected at least one historical positioning information and the target frame, generating associated frame information corresponding to the target frame using a local Bundle Adjustment (local BA) optimization algorithm.
In a second aspect, an embodiment of the present disclosure provides a pose smoothing apparatus for visual localization, the apparatus including: the first acquisition unit is configured to acquire inter-frame movement information corresponding to the target frame, wherein the inter-frame movement information is used for representing the pose change from a key frame which precedes the target frame in time to the target frame; the second acquisition unit is configured to acquire associated frame information corresponding to the target frame, wherein the associated frame information comprises information for positioning corresponding to a key frame matched with the target frame in a preset map; a generating unit configured to generate a time-constrained observation and a space-constrained observation based on the inter-frame movement information and the comparison of the target frame with the corresponding key frame information, respectively; and the smoothing unit is configured to generate the pose corresponding to the target frame as the smoothed pose by using an optimization method according to the time constraint observation and the space constraint observation.
In some embodiments, the target frame includes an image frame within a sliding window of a preset size.
In some embodiments, the cost function used by the optimization method is generated based on a first cost function for characterizing errors of the spatially constrained observations and a second cost function for characterizing errors of the temporally constrained observations.
In some embodiments, the cost function is generated based on the first cost function and the second cost function processed by the preset robust kernel function.
In some embodiments, the second acquiring unit includes: the selecting module is configured to select at least one piece of historical positioning information matched with the target frame from a preset historical positioning information set, wherein the historical positioning information is obtained based on global positioning of a preset map; a generating module configured to generate associated frame information corresponding to the target frame using a local bundle adjustment optimization algorithm based on the feature association between the selected at least one historical positioning information and the target frame.
In a third aspect, an embodiment of the present disclosure provides a terminal, including: one or more processors; a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a mobile robot including: the terminal as described in the third aspect; a camera configured to capture an image; an odometer; an inertial sensor; a mobile device.
In a fifth aspect, embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, which when executed by a processor implements the method as described in any of the implementations of the first aspect.
According to the pose smoothing method and device for visual positioning, the terminal and the mobile robot provided by the embodiments of the disclosure, a time constraint observation is established from the time-related inter-frame movement information, and a space constraint observation is established from the key frames matched in a preset map, so that the pose result is solved under the two constraint conditions, yielding a smoother positioning pose.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a pose smoothing method for visual localization according to the present disclosure;
FIG. 3 is a schematic diagram of one application scenario of a pose smoothing method for visual localization according to an embodiment of the present disclosure;
FIG. 4 is a flow diagram of yet another embodiment of a pose smoothing method for visual localization according to the present disclosure;
FIG. 5 is a schematic structural diagram of one embodiment of a pose smoothing device for visual localization according to the present disclosure;
fig. 6 is a schematic block diagram of a terminal suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and the features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary architecture 100 to which the pose smoothing method for visual localization or the pose smoothing apparatus for visual localization of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, 103 interact with a server 105 via a network 104 to receive or send messages or the like. Various navigation applications, such as map applications, may be installed on the terminal devices 101, 102, 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices that support positioning, including but not limited to smartphones, inspection robots, etc. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above. It may be implemented as a plurality of software or software modules (e.g., software or software modules used to provide distributed services) or as a single software or software module. And is not particularly limited herein.
The server 105 may be a server providing various services, such as a background server providing support for the positioning of the terminal devices 101, 102, 103. For example, the background server may issue a preset map to the terminal device.
It should be noted that the preset map may also be directly stored locally in the terminal devices 101, 102, and 103, and the terminal devices 101, 102, and 103 may directly extract the locally stored preset map for positioning, in this case, the network 104 and the server 105 may not exist.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules for providing distributed services) or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the pose smoothing method for visual positioning provided by the embodiment of the present disclosure is generally executed by the terminal devices 101, 102, 103, and accordingly, the pose smoothing apparatus for visual positioning is generally disposed in the terminal devices 101, 102, 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a pose smoothing method for visual localization according to the present disclosure is shown. The pose smoothing method for visual positioning comprises the following steps of:
step 201, inter-frame movement information corresponding to the target frame is obtained.
In this embodiment, the executing body (such as the terminals 101, 102, 103 shown in fig. 1) of the pose smoothing method for visual positioning may acquire the inter-frame movement information corresponding to the target frame through a wired connection manner or a wireless connection manner. The target frame generally refers to an image frame corresponding to a pose to be smoothed. The inter-frame movement information is generally used to characterize the pose change from the key frame that precedes the target frame in time to the target frame. The pose may include, for example, 6 degrees of freedom, including 3 degrees of freedom that characterize displacement (Translation) and 3 degrees of freedom that characterize spatial Rotation (Rotation).
As an example, suppose the mobile robot captures three image frames during movement. When the 2nd frame image is taken as the target frame, the inter-frame movement information can be used to represent the pose change from the capture of the 1st frame image to the capture of the 2nd frame image. In this case, the execution body may acquire the inter-frame movement information from a preset information set for storing inter-frame movement information. When the 3rd frame image is taken as the target frame, the inter-frame movement information can be used to represent the pose change from the capture of the 2nd frame image to the capture of the 3rd frame image. In this case, the execution body may acquire the inter-frame movement information from various odometers, which may include, but are not limited to, at least one of the following: visual odometers, wheel odometers.
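By way of illustration (not part of the original disclosure), the following minimal Python sketch computes such an inter-frame pose change from two odometry poses, assuming poses are given as 4x4 homogeneous camera-in-world transforms; the function name and sample values are illustrative only:

```python
import numpy as np

def relative_motion(T_w_prev, T_w_curr):
    """Pose change from the previous (key) frame to the current frame,
    both given as 4x4 homogeneous camera-in-world transforms."""
    return np.linalg.inv(T_w_prev) @ T_w_curr

# Two consecutive odometry poses, one frame apart (illustrative numbers).
T_prev = np.eye(4)
T_curr = np.eye(4)
T_curr[:3, 3] = [0.10, 0.00, 0.02]       # 10 cm forward, 2 cm up
delta = relative_motion(T_prev, T_curr)  # 6-DoF inter-frame movement
```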
In some optional implementations of this embodiment, the target frame may include an image frame within a sliding window of a preset size. As an example, the size of the sliding window may be set to 5.
Based on this optional implementation, only a fixed number of historical frames is maintained, rather than the whole trajectory from the mapping process. This keeps memory usage stable while smoothing the pose of the target frame and avoids heavy memory consumption, thereby improving the real-time performance of the method and device.
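As a brief illustration of this bounded-memory behaviour (an assumed sketch, not the patent's implementation), a fixed-size deque realizes such a sliding window:

```python
from collections import deque

WINDOW_SIZE = 5                     # e.g. the window size of 5 noted above

window = deque(maxlen=WINDOW_SIZE)  # frames outside the window are dropped

def on_new_frame(frame):
    """Memory stays bounded: only the last WINDOW_SIZE frames are kept."""
    window.append(frame)
```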
Step 202, obtaining associated frame information corresponding to the target frame.
In this embodiment, the execution main body may acquire the associated frame information corresponding to the target frame through a wired or wireless connection. The associated frame information may include information for positioning corresponding to a key frame matched with the target frame in a preset map, and the positioning information may include various kinds of information used for positioning. The preset map generally refers to a map consistent with the area in which positioning takes place; it may generally include the pose, 2-dimensional feature point coordinates, descriptors, and 3-dimensional map point coordinates corresponding to each key frame from the mapping process.
As an example, the execution subject may match the target frame with key frames in the preset map, where the matching may generally include image feature point matching. Generally, the execution subject may select the key frame with the highest similarity from the preset map. Then, the execution subject may perform feature point matching between the selected key frame and the target frame, thereby generating associated features. According to the generated associated features, the execution subject may then solve a PnP (Perspective-n-Point) problem, for example using a RANSAC (Random Sample Consensus) algorithm, to generate a matching pose as the associated frame information.
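A hedged sketch of this matching step using OpenCV's `solvePnPRansac`; the helper name and the parameter values are illustrative assumptions:

```python
import cv2
import numpy as np

def match_pose_to_map(pts3d_map, pts2d_target, K):
    """Solve PnP with RANSAC between 3-D map points of the matched key
    frame and their 2-D feature correspondences in the target frame."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d_map.astype(np.float32), pts2d_target.astype(np.float32),
        K, distCoeffs=None, iterationsCount=100, reprojectionError=3.0)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> rotation matrix
    return R, tvec, inliers      # matching pose (and inlier associations)
```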
Step 203, respectively generating time constraint observation and space constraint observation based on the inter-frame movement information and the comparison between the target frame and the corresponding key frame information.
In this embodiment, the execution subject may generate a time constraint observation based on the inter-frame movement information acquired in step 201. The time constraint observation can be used to indicate the pose change between the previous frame and the next frame (for example, the j-th frame and the i-th frame) in the positioning process. As an example, the time constraint observation may include the following formulas:

$$\Delta t_{ij}=(R^{w}_{i})^{-1}\,(t^{w}_{j}-t^{w}_{i}) \qquad (1)$$

$$\Delta\phi_{ij}=\phi_{j}-\phi_{i} \qquad (2)$$

$$\Delta\theta_{ij}=\theta_{j}-\theta_{i} \qquad (3)$$

$$\Delta\psi_{ij}=\psi_{j}-\psi_{i} \qquad (4)$$

where $t^{w}_{i}$ and $t^{w}_{j}$ can be used to characterize the translations of the i-th and j-th frame images in the world coordinate system w, $(R^{w}_{i})^{-1}$ can be used to characterize the inverse of the rotation matrix between the world coordinates and the image coordinates of the i-th frame image, and $\Delta t_{ij}$ can be used to characterize the translation increment from the j-th frame image to the i-th frame image. In the same way, $\phi_{j}$, $\theta_{j}$ and $\psi_{j}$ can be used to characterize the roll angle, pitch angle and yaw angle of the j-th frame image, $\phi_{i}$, $\theta_{i}$ and $\psi_{i}$ those of the i-th frame image, and $\Delta\phi_{ij}$, $\Delta\theta_{ij}$ and $\Delta\psi_{ij}$ the roll, pitch and yaw increments from the j-th frame image to the i-th frame image.
It should be noted that, when the execution subject of the pose smoothing method for visual localization can obtain observed values of the roll angle and the pitch angle from an inertial measurement unit, the constraints of equations (2) and (3) above may be omitted.
In this embodiment, the execution subject may generate a space constraint observation based on the comparison of the target frame with the key frame information obtained in step 202. The space constraint observation can be used to indicate the pose match between the current frame (for example, the i-th frame) and the matched key frame (for example, key frame l) in the preset map during positioning. As an example, the space constraint observation may include the following formulas:

$$\Delta t_{il}=(R^{w}_{l})^{-1}\,(t^{w}_{i}-t^{w}_{l}) \qquad (5)$$

$$\Delta\phi_{il}=\phi_{i}-\phi_{l} \qquad (6)$$

$$\Delta\theta_{il}=\theta_{i}-\theta_{l} \qquad (7)$$

$$\Delta\psi_{il}=\psi_{i}-\psi_{l} \qquad (8)$$

where $t^{w}_{i}$ can be used to characterize the translation of the i-th frame image in the world coordinate system w, $t^{w}_{l}$ the translation of the key frame l, $(R^{w}_{l})^{-1}$ the inverse of the rotation matrix between the world coordinates and the image coordinates of the key frame l, and $\Delta t_{il}$ the translation increment from the i-th frame image to the key frame l. In the same way, $\phi_{l}$, $\theta_{l}$ and $\psi_{l}$ can be used to characterize the roll angle, pitch angle and yaw angle of the key frame l, $\phi_{i}$, $\theta_{i}$ and $\psi_{i}$ those of the i-th frame image, and $\Delta\phi_{il}$, $\Delta\theta_{il}$ and $\Delta\psi_{il}$ the roll, pitch and yaw increments from the i-th frame image to the key frame l.
It should be noted that, when the execution subject of the pose smoothing method for visual localization can obtain observed values of the roll angle and the pitch angle from an inertial measurement unit, the constraints of equations (6) and (7) above may be omitted.
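The increments in formulas (1)-(4) and (5)-(8) can be computed from two world-frame poses as in the sketch below, assuming an x-y-z roll/pitch/yaw Euler convention (which the text does not specify); for the time constraint, frame a is the i-th frame and frame b the j-th frame, while for the spatial constraint, a is the key frame l and b the i-th frame:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def constraint_observation(t_w_a, R_w_a, t_w_b, R_w_b):
    """Increments of formulas (1)-(4) / (5)-(8) between frames a and b,
    given their world-frame translations and rotation matrices."""
    dt = R_w_a.T @ (t_w_b - t_w_a)     # rotation inverse == transpose
    rpy_a = Rotation.from_matrix(R_w_a).as_euler("xyz")  # roll, pitch, yaw
    rpy_b = Rotation.from_matrix(R_w_b).as_euler("xyz")
    return dt, rpy_b - rpy_a           # translation and angle increments
```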
Step 204, generating the pose corresponding to the target frame as the smoothed pose by using an optimization method according to the time constraint observation and the space constraint observation.
In the present embodiment, the execution subject may generate the pose corresponding to the target frame as the smoothed pose using various optimization methods according to the time constraint observation and the space constraint observation generated in step 203. As an example, the executing entity may convert the maximum likelihood estimation problem corresponding to the two kinds of observations into a least-squares problem and solve for the smoothed pose that minimizes the objective function, where the objective function is built from the observation errors. The optimization method may include, but is not limited to, at least one of the following: the Newton method, the gradient descent method, the Gauss-Newton method, and the Levenberg-Marquardt (LM) method.
In some alternative implementations of the present embodiment, the cost function used by the optimization method may be generated based on the first cost function and the second cost function. Wherein, the first cost function can be used for representing the error of the space constraint observation. The second cost function described above may be used to characterize the error of the time-constrained observation. Alternatively, the cost function used in the optimization method may also be obtained by weighted summation of the first cost function and the second cost function.
As an example, the cost function may be as shown in equation (9):

$$\min_{\mathcal{X}}\ \sum_{(i,j)\in S}\left\|r_{i,j}\right\|^{2}+\sum_{(i,l)\in R}\left\|r_{i,l}\right\|^{2} \qquad (9)$$

where $r_{i,j}$ and $r_{i,l}$ can be used to characterize the residual between the j-th frame image and the i-th frame image and the residual between the i-th frame image and the corresponding key frame l, respectively. The set S may be used to characterize the image frame pairs that satisfy the time constraint, and the set R the image frames that satisfy the space constraint. The optimization variables generally correspond to the degrees of freedom of the time constraint and space constraint observations. As an example, when the observations include 6 degrees of freedom, the cost function has 6 optimization variables. As another example, when the observations include 4 degrees of freedom (the roll angle φ and the pitch angle θ can be observed directly), the cost function has 4 optimization variables, namely t (comprising the 3 translational directions) and the yaw angle ψ.

As an example, with 4 optimization variables the residuals may be written, consistently with formulas (1)-(8), as formulas (9-1) and (9-2):

$$r_{i,j}=\begin{bmatrix}(R^{w}_{i})^{-1}(t^{w}_{j}-t^{w}_{i})-\Delta t_{ij}\\ (\psi_{j}-\psi_{i})-\Delta\psi_{ij}\end{bmatrix} \qquad (9\text{-}1)$$

$$r_{i,l}=\begin{bmatrix}(R^{w}_{l})^{-1}(t^{w}_{i}-t^{w}_{l})-\Delta t_{il}\\ (\psi_{i}-\psi_{l})-\Delta\psi_{il}\end{bmatrix} \qquad (9\text{-}2)$$

where $(R^{w}_{i})^{-1}$ and $(R^{w}_{l})^{-1}$ denote the inverses of the rotation matrices corresponding to the i-th frame image and the key frame l, respectively. The meaning of the remaining variables is consistent with the foregoing description and is not repeated here.
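A minimal sketch of solving cost function (9) in the 4-degree-of-freedom case with SciPy's `least_squares`; the data layout and the illustrative numbers are assumptions, not part of the disclosure:

```python
import numpy as np
from scipy.optimize import least_squares

def yaw_rot(psi):
    """Rotation about z by yaw psi (roll/pitch assumed directly observed)."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def residuals(x, time_obs, space_obs):
    """Stacked residuals r_ij (set S) and r_il (set R) of formula (9)."""
    p = x.reshape(-1, 4)                       # per frame: [tx, ty, tz, yaw]
    r = []
    for i, j, dt, dyaw in time_obs:            # temporal terms, cf. (9-1)
        r.append(yaw_rot(p[i, 3]).T @ (p[j, :3] - p[i, :3]) - dt)
        r.append([p[j, 3] - p[i, 3] - dyaw])
    for i, t_l, psi_l, dt, dyaw in space_obs:  # spatial terms, cf. (9-2)
        r.append(yaw_rot(psi_l).T @ (p[i, :3] - t_l) - dt)
        r.append([p[i, 3] - psi_l - dyaw])
    return np.concatenate(r)

# Two frames and one map match, with illustrative numbers.
x0 = np.zeros(8)
time_obs = [(0, 1, np.array([0.10, 0.0, 0.0]), 0.01)]
space_obs = [(1, np.array([0.05, 0.0, 0.0]), 0.0,
              np.array([0.06, 0.0, 0.0]), 0.01)]
sol = least_squares(residuals, x0, args=(time_obs, space_obs), loss="huber")
smoothed = sol.x.reshape(-1, 4)                # smoothed [t, yaw] per frame
```

The `loss="huber"` argument already applies a robust kernel of the kind discussed in the next optional implementation.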
Optionally, the executing body may further generate the smoothed pose by using a graph optimization algorithm. In these implementations, the nodes in the graph may be used to characterize the image frame, and the edges in the graph may include temporally constrained edges used to characterize temporal constraints and spatially constrained edges used to characterize spatial constraints.
Optionally, based on the above optional implementation, the cost function may be generated based on the first cost function and the second cost function processed by a preset robust kernel function, where the preset robust kernel function may include, but is not limited to, at least one of the following: a Cauchy kernel function, a Huber kernel function.
Based on the above alternative implementation, the present solution can reduce the impact of erroneous spatial constraint matching (usually referred to as relocalization) results.
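One possible realization of the graph optimization with a robust kernel is sketched below using the GTSAM library; this is an assumption-laden illustration, with the `window` records (`guess`, `odom`, `map_match`) as hypothetical stand-ins for the data of this embodiment. Temporally constrained edges become between-factors built from inter-frame movement, spatially constrained edges become priors built from map matches, and a Huber (or Cauchy) kernel wraps the map-match noise model:

```python
import numpy as np
import gtsam
from types import SimpleNamespace

# Hypothetical per-frame records: an initial pose guess, the odometry
# increment from the previous frame, and an optional map-matched pose.
window = [
    SimpleNamespace(guess=np.eye(4), odom=None, map_match=np.eye(4)),
    SimpleNamespace(guess=np.eye(4), odom=np.eye(4), map_match=None),
]

graph = gtsam.NonlinearFactorGraph()
initial = gtsam.Values()

# Pose3 sigmas are ordered [rot_x, rot_y, rot_z, t_x, t_y, t_z].
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.02] * 3 + [0.05] * 3))
map_base = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.05] * 3 + [0.10] * 3))
map_noise = gtsam.noiseModel.Robust.Create(        # Huber kernel damps bad
    gtsam.noiseModel.mEstimator.Huber.Create(1.345), map_base)  # relocalizations

for i, f in enumerate(window):
    initial.insert(i, gtsam.Pose3(f.guess))
    if f.odom is not None:        # temporally constrained edge
        graph.add(gtsam.BetweenFactorPose3(i - 1, i, gtsam.Pose3(f.odom),
                                           odom_noise))
    if f.map_match is not None:   # spatially constrained edge
        graph.add(gtsam.PriorFactorPose3(i, gtsam.Pose3(f.map_match),
                                         map_noise))

smoothed = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
```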
With continuing reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of a pose smoothing method for visual localization according to an embodiment of the present disclosure. In the application scenario of fig. 3, the mobile robot 301 may obtain inter-frame movement information corresponding to the current frame 3011 from the odometer. The inter-frame movement information may be used to represent the pose change from the previous frame 3010 to the current frame 3011. Thereafter, the mobile robot 301 may acquire associated frame information corresponding to the current frame 3011 from the preset map 302. The associated frame information may be, for example, positioning information corresponding to the key frame 3023 in the preset map 302. Optionally, the mobile robot 301 may further acquire associated frame information corresponding to the previous frame 3010 (for example, positioning information corresponding to the key frame 3022) and associated frame information corresponding to the frame before that (for example, positioning information corresponding to the key frame 3021) from the preset map 302. Based on the comparison between the inter-frame movement information and the positioning information corresponding to the current frame 3011 and the key frame 3023, the mobile robot 301 can generate the time constraint observation 303 and the space constraint observation 304. Based on the time constraint observation 303 and the space constraint observation 304, the mobile robot 301 may generate the pose corresponding to the current frame 3011 as the smoothed pose using an optimization method.
At present, the prior art generally considers only the matching result between the current frame and the key frames in the preset map for positioning, so the obtained positioning result is often not smooth enough. In the method provided by the embodiment of the disclosure, a time constraint observation is established from the time-related inter-frame movement information, and a space constraint observation is established from the key frames matched in the preset map, so that the pose result is solved under the two constraint conditions, yielding a smoother positioning pose.
With further reference to fig. 4, a flow 400 of yet another embodiment of a pose smoothing method for visual localization is illustrated. The process 400 of the pose smoothing method for visual positioning includes the following steps:
step 401, acquiring inter-frame movement information corresponding to a target frame.
Step 402, selecting at least one historical positioning information matched with the target frame from a preset historical positioning information set.
In this embodiment, an executing subject (e.g., the terminals 101, 102, 103 shown in fig. 1) of the pose smoothing method for visual localization may extract at least one piece of historical localization information matching the target frame from a preset set of historical localization information in various ways. The historical positioning information can be obtained based on the global positioning of the preset map. As an example, the execution subject may select at least one piece of historical positioning information matching the target frame by calculating image similarity or determining a matching degree through feature point association.
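As one concrete (assumed) realization of the matching-degree test via feature point association, ORB descriptors of the target frame can be matched against the images attached to the historical positioning records; the record layout and the match threshold are illustrative:

```python
import cv2

orb = cv2.ORB_create(nfeatures=1000)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def match_score(target_img, candidate_img):
    """Matching degree via cross-checked ORB feature associations."""
    _, d1 = orb.detectAndCompute(target_img, None)
    _, d2 = orb.detectAndCompute(candidate_img, None)
    if d1 is None or d2 is None:
        return 0
    return len(bf.match(d1, d2))

def select_matching_history(target_img, history, min_matches=50):
    """Keep historical fixes whose key-frame image matches the target frame."""
    return [h for h in history if match_score(target_img, h.image) >= min_matches]
```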
Step 403, generating associated frame information corresponding to the target frame by using a local bundle adjustment optimization algorithm based on the feature association between the selected at least one piece of historical positioning information and the target frame.
In this embodiment, based on the feature association between the at least one piece of historical positioning information selected in step 402 and the target frame, the executing entity may establish a local window around the key frames corresponding to the selected historical positioning information, and then generate associated frame information corresponding to the target frame using a local bundle adjustment optimization algorithm.
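A compact sketch of local bundle adjustment over such a window, jointly refining poses and map points by minimizing reprojection error with SciPy; the pose parameterization (axis-angle plus translation) and data layout are assumptions:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(pose6, pts3d, K):
    """Project world points with pose6 = [axis-angle(3), translation(3)]
    (world-to-camera). K is the 3x3 intrinsic matrix (zero skew assumed)."""
    R = Rotation.from_rotvec(pose6[:3]).as_matrix()
    pc = pts3d @ R.T + pose6[3:]
    uv = pc[:, :2] / pc[:, 2:3]
    return uv * np.array([K[0, 0], K[1, 1]]) + np.array([K[0, 2], K[1, 2]])

def local_ba(poses0, pts0, K, obs):
    """Local BA in a window. obs is a list of
    (frame_index, point_index, observed_uv) feature associations."""
    nf, npts = len(poses0), len(pts0)

    def resid(x):
        poses = x[:nf * 6].reshape(nf, 6)
        pts = x[nf * 6:].reshape(npts, 3)
        return np.concatenate([
            project(poses[f], pts[p:p + 1], K)[0] - uv for f, p, uv in obs])

    x0 = np.concatenate([np.ravel(poses0), np.ravel(pts0)])
    sol = least_squares(resid, x0, loss="huber")
    return sol.x[:nf * 6].reshape(nf, 6), sol.x[nf * 6:].reshape(npts, 3)
```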
Step 404, respectively generating a time constraint observation and a space constraint observation based on the inter-frame movement information and the comparison between the target frame and the corresponding key frame information.
Step 405, generating the pose corresponding to the target frame as the smoothed pose by using an optimization method according to the time constraint observation and the space constraint observation.
Step 401, step 404, and step 405 are respectively consistent with step 201, step 203, and step 204 and their optional implementations in the foregoing embodiment; the above descriptions of step 201, step 203, and step 204 and their optional implementations also apply to step 401, step 404, and step 405 and are not repeated here.
As can be seen from fig. 4, the process 400 of the pose smoothing method for visual positioning in this embodiment embodies the steps of generating the associated frame information by using local beam adjustment optimization. Therefore, the scheme described in this embodiment can optimize the global pose by using the keyframe associated with the preset map, so as to improve the accuracy of the global pose and further contribute to improving the positioning accuracy of the pose after smoothing.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a pose smoothing apparatus for visual localization, which corresponds to the method embodiment shown in fig. 2 or fig. 4, and which can be applied in various electronic devices.
As shown in fig. 5, the pose smoothing apparatus 500 for visual localization provided by the present embodiment includes a first acquisition unit 501, a second acquisition unit 502, a generation unit 503, and a smoothing unit 504. The first obtaining unit 501 is configured to obtain inter-frame movement information corresponding to a target frame, where the inter-frame movement information is used to represent a pose change from a key frame that precedes the target frame in time to the target frame; a second obtaining unit 502 configured to obtain associated frame information corresponding to the target frame, where the associated frame information includes information for positioning corresponding to a key frame matched with the target frame in a preset map; a generating unit 503 configured to generate a time-constrained observation and a space-constrained observation based on the inter-frame movement information and the comparison of the target frame with the corresponding key frame information, respectively; and a smoothing unit 504 configured to generate the pose corresponding to the target frame as a smoothed pose using an optimization method based on the time-constrained observation and the space-constrained observation.
In the present embodiment, in the pose smoothing device 500 for visual localization: for specific processing of the first obtaining unit 501, the second obtaining unit 502, the generating unit 503 and the smoothing unit 504 and technical effects thereof, reference may be made to relevant descriptions of step 201, step 202, step 203 and step 204 in the corresponding embodiment of fig. 2, and details are not repeated here.
In some optional implementations of this embodiment, the target frame may include an image frame within a sliding window of a preset size.
In some optional implementations of the embodiment, the cost function used by the optimization method may be generated based on a first cost function and a second cost function, where the first cost function may be used to characterize the error of the spatial constraint observation. The second cost function described above can be used to characterize the error of the time-constrained observation.
In some optional implementations of the embodiment, the cost function may be generated based on the first cost function and the second cost function processed by the preset robust kernel function.
In some optional implementation manners of this embodiment, the second obtaining unit 502 may include: a selecting module configured to select at least one piece of historical positioning information matched with the target frame from a preset historical positioning information set; a generating module configured to generate associated frame information corresponding to the target frame using a local bundle adjustment optimization algorithm based on the feature association between the selected at least one historical positioning information and the target frame. The historical positioning information can be obtained based on global positioning of the preset map.
According to the device provided by the above embodiment of the present disclosure, the generating unit 503 establishes a time constraint observation from the time-related inter-frame movement information acquired by the first acquiring unit 501, and a space constraint observation from the key frame information matched in the preset map acquired by the second acquiring unit 502, so that the smoothing unit 504 obtains a pose result under the two constraint conditions, yielding a smoother positioning pose.
Reference is now made to fig. 6, which illustrates a schematic structural diagram of an electronic device (e.g., the terminal device in fig. 1) 600 suitable for implementing embodiments of the present application. The terminal device in the embodiment of the present application may include, but is not limited to, a mobile terminal such as a mobile phone, a mobile robot, and the like. The terminal device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the use range of the embodiment of the present application.
As shown in fig. 6, electronic device 600 may include a processing device (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage device 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; a storage device 608 including, for example, an SD card or the like; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or installed from the storage means 608, or installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present application.
It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (Radio Frequency), etc., or any suitable combination of the foregoing.
Embodiments of the present disclosure also provide a mobile robot, which may include the electronic device described in the above embodiments, a camera for capturing images, an odometer, an inertial sensor, and a mobile device. The camera may include an optical image camera and a depth image camera. The mobile device may comprise, for example, a movable chassis, a crawler, etc.
The computer readable medium may be included in the terminal device; or may exist separately without being assembled into the terminal device. The computer readable medium carries one or more programs which, when executed by the terminal device, cause the terminal device to: acquiring inter-frame movement information corresponding to the target frame, wherein the inter-frame movement information is used for representing the pose change from a key frame which precedes the target frame by time to the target frame; acquiring associated frame information corresponding to a target frame, wherein the associated frame information comprises positioning information corresponding to a key frame matched with the target frame in a preset map; respectively generating time constraint observation and space constraint observation based on the comparison between the inter-frame movement information and the target frame and the corresponding key frame information; and generating the corresponding pose of the target frame as the smoothed pose by using an optimization method according to the time constraint observation and the space constraint observation.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" language or Python. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a first acquisition unit, a second acquisition unit, a generation unit, and a smoothing unit. For example, the first acquiring unit may be further described as a unit that acquires inter-frame movement information corresponding to the target frame, where the inter-frame movement information is used to represent a pose change from a key frame that temporally precedes the target frame to the target frame.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above features, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, a solution formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (13)

1. A pose smoothing method for visual localization, comprising:
acquiring inter-frame movement information corresponding to a target frame, wherein the inter-frame movement information is used for representing pose change from a key frame which precedes the target frame in time to the target frame;
acquiring associated frame information corresponding to the target frame, wherein the associated frame information comprises positioning information corresponding to a key frame matched with the target frame in a preset map;
respectively generating time constraint observation and space constraint observation based on the inter-frame movement information and the comparison between the target frame and the corresponding associated frame information;
and generating the corresponding pose of the target frame as the smoothed pose by utilizing an optimization method according to the time constraint observation and the space constraint observation.
2. The method of claim 1, wherein the target frame comprises an image frame within a sliding window of a preset size.
3. The method of claim 1, wherein the cost function used by the optimization method is generated based on a first cost function for characterizing errors of spatially constrained observations and a second cost function for characterizing errors of temporally constrained observations.
4. The method of claim 3, wherein the cost function is generated based on the first cost function and the second cost function processed by a preset robust kernel function.
5. The method according to one of claims 1 to 4, wherein the obtaining of the associated frame information corresponding to the target frame comprises:
selecting at least one piece of historical positioning information matched with the target frame from a preset historical positioning information set, wherein the historical positioning information is obtained based on global positioning of the preset map;
and generating associated frame information corresponding to the target frame by using a local bundle adjustment optimization algorithm based on the feature association between the selected at least one piece of historical positioning information and the target frame.
6. A pose smoothing device for visual localization, comprising:
the first acquisition unit is configured to acquire inter-frame movement information corresponding to a target frame, wherein the inter-frame movement information is used for representing a pose change from a key frame which precedes the target frame in time to the target frame;
the second acquisition unit is configured to acquire associated frame information corresponding to the target frame, wherein the associated frame information comprises information for positioning corresponding to a key frame matched with the target frame in a preset map;
a generating unit configured to generate a time-constrained observation and a space-constrained observation based on the inter-frame movement information and a comparison of the target frame with corresponding associated frame information, respectively;
and the smoothing unit is configured to generate the corresponding pose of the target frame as the smoothed pose by utilizing an optimization method according to the time constraint observation and the space constraint observation.
7. The apparatus of claim 6, wherein the target frame comprises an image frame within a sliding window of a preset size.
8. The apparatus of claim 6, wherein the cost function used by the optimization method is generated based on a first cost function characterizing errors of the spatially constrained observations and a second cost function characterizing errors of the temporally constrained observations.
9. The apparatus of claim 8, wherein the cost function is generated based on the first cost function and the second cost function processed by a preset robust kernel function.
10. The apparatus according to one of claims 6 to 9, wherein the second obtaining unit comprises:
a selecting module configured to select at least one piece of historical positioning information matched with the target frame from a preset historical positioning information set, wherein the historical positioning information is obtained based on global positioning of the preset map;
a generating module configured to generate associated frame information corresponding to the target frame using a local bundle adjustment optimization algorithm based on the feature association between the selected at least one historical positioning information and the target frame.
11. A terminal, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A mobile robot, comprising:
the terminal of claim 11;
a camera configured to acquire an image;
an odometer;
an inertial sensor;
and a mobile device.
13. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN202110088173.3A 2021-01-22 2021-01-22 Pose smoothing method and device for visual positioning, terminal and mobile robot Active CN112880675B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110088173.3A CN112880675B (en) 2021-01-22 2021-01-22 Pose smoothing method and device for visual positioning, terminal and mobile robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110088173.3A CN112880675B (en) 2021-01-22 2021-01-22 Pose smoothing method and device for visual positioning, terminal and mobile robot

Publications (2)

Publication Number Publication Date
CN112880675A CN112880675A (en) 2021-06-01
CN112880675B true CN112880675B (en) 2023-04-07

Family

ID=76050250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110088173.3A Active CN112880675B (en) 2021-01-22 2021-01-22 Pose smoothing method and device for visual positioning, terminal and mobile robot

Country Status (1)

Country Link
CN (1) CN112880675B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105856230A (en) * 2016-05-06 2016-08-17 简燕梅 ORB key frame closed-loop detection SLAM method capable of improving consistency of position and pose of robot
CN110763251A (en) * 2019-10-18 2020-02-07 华东交通大学 Method and system for optimizing visual inertial odometer
WO2020259481A1 (en) * 2019-06-27 2020-12-30 Oppo广东移动通信有限公司 Positioning method and apparatus, electronic device, and readable storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10565728B2 (en) * 2018-06-01 2020-02-18 Tusimple, Inc. Smoothness constraint for camera pose estimation
CN110657803B (en) * 2018-06-28 2021-10-29 深圳市优必选科技有限公司 Robot positioning method, device and storage device
CN111489393B (en) * 2019-01-28 2023-06-02 速感科技(北京)有限公司 VSLAM method, controller and mobile device
CN110322500B (en) * 2019-06-28 2023-08-15 Oppo广东移动通信有限公司 Optimization method and device for instant positioning and map construction, medium and electronic equipment
CN111882494A (en) * 2020-06-28 2020-11-03 广州文远知行科技有限公司 Pose graph processing method and device, computer equipment and storage medium
CN111780764B (en) * 2020-06-30 2022-09-02 杭州海康机器人技术有限公司 Visual positioning method and device based on visual map
CN112068154B (en) * 2020-09-14 2022-12-13 中科院软件研究所南京软件技术研究院 Laser mapping positioning method and device, storage medium and electronic equipment

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105856230A (en) * 2016-05-06 2016-08-17 简燕梅 ORB key frame closed-loop detection SLAM method capable of improving consistency of position and pose of robot
WO2020259481A1 (en) * 2019-06-27 2020-12-30 Oppo广东移动通信有限公司 Positioning method and apparatus, electronic device, and readable storage medium
CN110763251A (en) * 2019-10-18 2020-02-07 华东交通大学 Method and system for optimizing visual inertial odometer

Also Published As

Publication number Publication date
CN112880675A (en) 2021-06-01

Similar Documents

Publication Publication Date Title
CN110322500B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
CN107888828B (en) Space positioning method and device, electronic device, and storage medium
CN111325796B (en) Method and apparatus for determining pose of vision equipment
CN110197615B (en) Method and device for generating map
CN110349212B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
EP3872764B1 (en) Method and apparatus for constructing map
CN109754464B (en) Method and apparatus for generating information
CN111260774A (en) Method and device for generating 3D joint point regression model
CN114399588A (en) Three-dimensional lane line generation method and device, electronic device and computer readable medium
CN113610702B (en) Picture construction method and device, electronic equipment and storage medium
CN112818898B (en) Model training method and device and electronic equipment
CN113034582A (en) Pose optimization device and method, electronic device and computer readable storage medium
CN116182878B (en) Road curved surface information generation method, device, equipment and computer readable medium
CN112270242A (en) Track display method and device, readable medium and electronic equipment
CN109816791B (en) Method and apparatus for generating information
CN112880675B (en) Pose smoothing method and device for visual positioning, terminal and mobile robot
CN110853098A (en) Robot positioning method, device, equipment and storage medium
CN116079697A (en) Monocular vision servo method, device, equipment and medium based on image
CN115393423A (en) Target detection method and device
CN115393428A (en) Positioning parameter calibration method and device for mobile robot
CN115082516A (en) Target tracking method, device, equipment and medium
CN111768443A (en) Image processing method and device based on mobile camera
US20240005552A1 (en) Target tracking method and apparatus, device, and medium
CN113256715B (en) Positioning method and device for robot
KR102463890B1 (en) Method and apparatus for generating position information, device, media and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 601, 6 / F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Information Technology Co.,Ltd.

Address before: 601, 6 / F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Shuke Haiyi Information Technology Co.,Ltd.

GR01 Patent grant