CN117710591A - Space map processing method, device, electronic equipment, medium and program product

Publication number: CN117710591A
Application number: CN202211086404.8A
Authority: CN (China)
Inventor: 彭树学
Applicant/Assignee: Beijing Zitiao Network Technology Co Ltd
Legal status: Pending
Keywords: image, visual map, three-dimensional visual map, correction, pose
Classification: Image Processing (AREA)

Abstract

The present disclosure relates to a spatial map processing method, apparatus, electronic device, medium, and program product. The method includes: constructing a three-dimensional visual map based on images in a plurality of image sets acquired by a multi-view camera, where the multiple frames of images in one image set are acquired by one camera of the multi-view camera, and the three-dimensional visual map includes each frame of image in each image set, the pose of each frame of image, and the three-dimensional points on each frame of image; determining correction parameters based on the three-dimensional visual map and target parameters; and correcting the pose of each image and each three-dimensional point in the three-dimensional visual map according to the correction parameters to obtain a corrected three-dimensional visual map.

Description

Space map processing method, device, electronic equipment, medium and program product
Technical Field
The present disclosure relates to the field of computer vision, and in particular, to a spatial map processing method, apparatus, electronic device, storage medium, and program product.
Background
In the prior art, robot positioning and navigation are generally based on three-dimensional reconstruction from acquired images by means of algorithms such as SfM (Structure from Motion). However, because SfM and similar algorithms are purely visual mapping methods, the constructed three-dimensional visual map may not match the actual environment.
Disclosure of Invention
To solve or at least partially solve the above technical problems, the present disclosure provides a spatial map processing method, apparatus, electronic device, storage medium, and program product.
In a first aspect of an embodiment of the present disclosure, there is provided a spatial map processing method, including: constructing a three-dimensional visual map based on images in a plurality of image sets acquired by a multi-view camera, the multi-frame images in one image set being acquired by one of the multi-view cameras, the three-dimensional visual map comprising each frame of image in each image set, and a pose of each frame of image and a three-dimensional point on each frame of image; determining correction parameters based on the three-dimensional visual map and the target parameters; and correcting the pose of each image and each three-dimensional point in the three-dimensional visual map according to the correction parameters to obtain the corrected three-dimensional visual map.
Optionally, the target parameter includes an external parameter of the multi-view camera, and the correction parameter is a target correction ratio; the determining of the correction parameters based on the three-dimensional visual map and the target parameters includes: determining the target correction ratio based on the distance between each frame of image shot synchronously in each two image sets in the plurality of image sets and the external parameters of the multi-camera in the three-dimensional visual map; correcting the pose of each image and each three-dimensional point in the three-dimensional visual map according to the correction parameters to obtain a corrected three-dimensional visual map, wherein the method comprises the following steps: and according to the target correction proportion, carrying out scale alignment correction on the pose of each image and each three-dimensional point in the three-dimensional visual map to obtain the corrected three-dimensional visual map.
Optionally, the determining the target correction ratio based on the distance between each frame of image in the three-dimensional visual map, which is synchronously shot in each two image sets in the plurality of image sets, and the external parameters of the multi-camera includes: determining a plurality of correction proportions based on the distance between each frame of image in the three-dimensional visual map, which is shot synchronously in each two image sets in the plurality of image sets, and the external parameters of the cameras corresponding to each two image sets; an average or median of the plurality of correction ratios is determined as the target correction ratio.
Optionally, the performing scale alignment correction on the pose of each image and each three-dimensional point in the three-dimensional visual map according to the target correction proportion to obtain a corrected three-dimensional visual map, including: based on the target correction proportion, carrying out preliminary correction on the pose of each image and each three-dimensional point in the three-dimensional visual map to obtain the three-dimensional visual map after preliminary correction; acquiring pose information of each image in each image set in the preliminarily corrected three-dimensional visual map to obtain a plurality of pose information sets, wherein each pose information set corresponds to one image set; constructing a scale optimization function based on a plurality of pose information sets and external parameters of the multi-camera, wherein the scale optimization function is used for representing the sum of differences between measured values and real values of the relative pose relation of each two adjacent cameras in the multi-camera, the measured values are determined according to the plurality of pose information sets, and the real values are determined according to the external parameters of the multi-camera; optimizing the scale optimization function through a nonlinear optimization algorithm, so as to obtain the corrected pose information of each image in the primarily corrected three-dimensional visual map through calculation when the function value of the scale optimization function is smaller than or equal to the scale error threshold value; and correcting the pose of each image in the three-dimensional visual map based on the corrected pose information of each image to obtain the corrected three-dimensional visual map.
Optionally, the constructing a scale optimization function based on the plurality of pose information sets and the extrinsic parameters of the multi-view camera includes: constructing a first function according to the plurality of pose information sets and external parameters of the multi-camera, wherein the first function is used for representing the sum of differences between measured values and actual values of the relative pose relation of every two adjacent cameras in the multi-camera; constructing a second function according to the pose information sets and the internal references of the multi-view camera, wherein the second function is used for representing residual items of a mapping tool used for constructing the three-dimensional visual map; the scale optimization function is determined based on the first function and the second function.
Optionally, the target parameter includes a plurality of IMU accelerations, each IMU acceleration being measured during the acquisition of the plurality of image sets by the multi-view camera, and the correction parameter includes an optimized rotation matrix; the determining of the correction parameters based on the three-dimensional visual map and the target parameters includes: calculating an acceleration vector corresponding to each frame of image based on at least one position information set, wherein each position information set comprises the position information of each image in one image set in the three-dimensional visual map; performing linear interpolation processing on the plurality of IMU accelerations to obtain an IMU acceleration vector corresponding to each frame of image; calculating a gravity acceleration vector based on the acceleration vector corresponding to each frame of image and the IMU acceleration vector corresponding to each frame of image; constructing a gravity direction optimization function based on the gravity acceleration vector, wherein the gravity direction optimization function is used for representing the difference between the gravity acceleration vector rotated by the rotation matrix and an actual gravity acceleration vector that points in the gravity direction and whose modulus is the gravitational acceleration; and optimizing the gravity direction optimization function through a nonlinear optimization algorithm, so that the optimized rotation matrix is obtained through calculation when the function value of the gravity direction optimization function is smaller than or equal to a direction error threshold value. Correcting the pose of each image and each three-dimensional point in the three-dimensional visual map according to the correction parameters to obtain the corrected three-dimensional visual map includes: performing gravity direction alignment correction on the pose of each image and each three-dimensional point in the three-dimensional visual map according to the optimized rotation matrix, to obtain the corrected three-dimensional visual map.
Optionally, the gravity direction optimization function is specifically configured to characterize a difference between the normalized vector rotated by the rotation matrix and a target vector, where the normalized vector is obtained by normalizing the gravity acceleration vector, and the target vector is a unit vector pointing in the gravity direction.
In a second aspect of the embodiments of the present disclosure, there is provided a spatial map processing apparatus including: the system comprises a construction module, a determination module and a correction module; the construction module is used for constructing a three-dimensional visual map based on images in a plurality of image sets acquired by a multi-view camera, wherein a multi-frame image in one image set is acquired by one camera in the multi-view camera, and the three-dimensional visual map comprises each frame of image in each image set, the pose of each frame of image and three-dimensional points on each frame of image; the determining module is used for determining correction parameters based on the three-dimensional visual map and the target parameters obtained by the constructing module; the correction module is used for correcting the pose of each image and each three-dimensional point in the three-dimensional visual map according to the correction parameters determined by the determination module, and obtaining the corrected three-dimensional visual map.
Optionally, the target parameter includes an external parameter of the multi-view camera, and the correction parameter is a target correction ratio; the determining module is specifically configured to determine the target correction ratio based on a distance between each frame of image in the three-dimensional visual map, which is captured synchronously in each two image sets in the plurality of image sets, and an external parameter of the multi-camera; the correcting module is specifically configured to perform scale alignment correction on the pose of each image and each three-dimensional point in the three-dimensional visual map according to the target correction proportion, so as to obtain the corrected three-dimensional visual map.
Optionally, the determining module is specifically configured to determine a plurality of correction scales based on a distance between each frame of image in the three-dimensional visual map, which is captured synchronously in each of the two image sets, and an external parameter of the camera corresponding to each of the two image sets; an average or median of the plurality of correction ratios is determined as the target correction ratio.
Optionally, the correction module is specifically configured to perform preliminary correction on the pose of each image and each three-dimensional point in the three-dimensional visual map based on the target correction proportion, so as to obtain the three-dimensional visual map after preliminary correction; acquiring pose information of each image in each image set in the preliminarily corrected three-dimensional visual map to obtain a plurality of pose information sets, wherein each pose information set corresponds to one image set; constructing a scale optimization function based on a plurality of pose information sets and external parameters of the multi-camera, wherein the scale optimization function is used for representing the sum of differences between measured values and real values of the relative pose relation of each two adjacent cameras in the multi-camera, the measured values are determined according to the plurality of pose information sets, and the real values are determined according to the external parameters of the multi-camera; optimizing the scale optimization function through a nonlinear optimization algorithm, so as to obtain the corrected pose information of each image in the primarily corrected three-dimensional visual map through calculation when the function value of the scale optimization function is smaller than or equal to the scale error threshold value; and correcting the pose of each image in the three-dimensional visual map based on the corrected pose information of each image to obtain the corrected three-dimensional visual map.
Optionally, the correction module is specifically configured to construct a first function according to the multiple pose information sets and external parameters of the multiple cameras, where the first function is used to characterize a sum of differences between measured values and actual values of relative pose relationships of each two adjacent cameras in the multiple cameras; constructing a second function according to the pose information sets and the internal references of the multi-view camera, wherein the second function is used for representing residual items of a mapping tool used for constructing the three-dimensional visual map; the scale optimization function is determined based on the first function and the second function.
Optionally, the target parameter includes a plurality of IMU accelerations, each IMU acceleration being measured during the acquisition of the plurality of image sets by the multi-view camera, and the correction parameter includes an optimized rotation matrix; the determining module is specifically configured to calculate an acceleration vector corresponding to each frame of image based on at least one position information set, wherein each position information set comprises the position information of each image in one image set in the three-dimensional visual map; perform linear interpolation processing on the plurality of IMU accelerations to obtain an IMU acceleration vector corresponding to each frame of image; calculate a gravity acceleration vector based on the acceleration vector corresponding to each frame of image and the IMU acceleration vector corresponding to each frame of image; construct a gravity direction optimization function based on the gravity acceleration vector, wherein the gravity direction optimization function is used for representing the difference between the gravity acceleration vector rotated by the rotation matrix and an actual gravity acceleration vector that points in the gravity direction and whose modulus is the gravitational acceleration; and optimize the gravity direction optimization function through a nonlinear optimization algorithm, so that the optimized rotation matrix is obtained through calculation when the function value of the gravity direction optimization function is smaller than or equal to a direction error threshold value; the correction module is specifically configured to perform gravity direction alignment correction on the pose of each image and each three-dimensional point in the three-dimensional visual map according to the optimized rotation matrix, so as to obtain the corrected three-dimensional visual map.
Optionally, the gravity direction optimization function is specifically configured to characterize a difference between the normalized vector rotated by the rotation matrix and a target vector, where the normalized vector is obtained by normalizing the gravity acceleration vector, and the target vector is a unit vector pointing in the gravity direction.
A third aspect of an embodiment of the present disclosure provides an electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program implementing the spatial map processing method according to the first aspect when executed by the processor.
In a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the spatial map processing method according to the first aspect.
A fifth aspect of embodiments of the present disclosure provides a computer program product comprising a computer program which, when run on a processor, causes the processor to implement the spatial map processing method according to the first aspect.
A sixth aspect of embodiments of the present disclosure provides a chip comprising a processor and a communication interface coupled to the processor for executing program instructions to implement the spatial map processing method according to the first aspect.
Compared with the prior art, the technical solutions provided by the embodiments of the present disclosure have the following advantages: in an embodiment of the present disclosure, a three-dimensional visual map is constructed based on images in a plurality of image sets acquired by a multi-view camera (the multiple frames of images in one image set are acquired by one camera of the multi-view camera; the three-dimensional visual map includes each frame of image in each image set, the pose of each frame of image, and the three-dimensional points on each frame of image); correction parameters are determined based on the three-dimensional visual map and target parameters; and the pose of each image and each three-dimensional point in the three-dimensional visual map are corrected according to the correction parameters, so that a corrected three-dimensional visual map that matches the actual environment is obtained.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings that are required for the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic flow chart of a spatial map processing method according to an embodiment of the disclosure;
fig. 2 is a block diagram of a spatial map processing device according to an embodiment of the present disclosure;
fig. 3 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the disclosure.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, where appropriate, such that embodiments of the disclosure may be practiced in sequences other than those illustrated and described herein, and that the objects identified by "first," "second," etc. are generally of the same type and are not limited to the number of objects, e.g., the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
The electronic device in the embodiment of the disclosure may be a mobile electronic device or a non-mobile electronic device. The mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (ultra-mobile personal computer, UMPC), a netbook or a personal digital assistant (personal digital assistant, PDA), etc.; the non-mobile electronic device may be a personal computer (personal computer, PC), a Television (TV), a teller machine, a self-service machine, or the like; the embodiments of the present disclosure are not particularly limited.
The execution body of the spatial map processing method provided in the embodiment of the present disclosure may be the above-mentioned electronic device (including mobile electronic device and non-mobile electronic device), or may be a functional module and/or a functional entity capable of implementing the spatial map processing method in the electronic device, which may be specifically determined according to actual use requirements, and the embodiment of the present disclosure is not limited.
The spatial map processing method provided by the embodiment of the disclosure is described in detail below through specific embodiments and application scenarios thereof with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present disclosure provides a spatial map processing method, which may include steps 101 to 103 described below.
101. And constructing a three-dimensional visual map based on the images in the plurality of image sets acquired by the multi-view camera.
Wherein a plurality of frame images in one image set are acquired by one camera of the multi-view camera, the three-dimensional visual map comprises each frame image in each image set, and the pose of each frame image and the three-dimensional point on each frame image.
Wherein each frame of image in each image set is acquired simultaneously by a different one of the multiple cameras.
The multi-view camera may be an ordinary camera including at least two cameras, a Virtual Reality (VR) camera, or a panoramic camera, which may be determined according to the actual situation and is not limited herein.
It should be noted that, in the embodiment of the present disclosure, the construction method and the construction tool used for constructing the three-dimensional visual map are not limited; for example, the construction method may be a sparse reconstruction method or a dense reconstruction method, and the construction tool may be the COLMAP tool, which may be determined according to the actual situation.
COLMAP is an open-source three-dimensional reconstruction framework: a general-purpose Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline with graphical and command-line interfaces.
Illustratively, taking the construction method being the sparse reconstruction method and the construction tool being the COLMAP tool as an example, the above step 101 may be implemented through steps 101a to 101d described below.
101a, feature point extraction and local descriptor calculation: feature point extraction and local descriptor computation are performed on all images in the plurality of image sets using the deep-learning-based SuperPoint network.
Methods such as SIFT, ORB, SURF, or HF-Net could also be used in step 101a; SuperPoint is used here because of its better illumination, scale, and rotation invariance.
101b, global matching connection: the deep-learning-based NetVLAD network is used to compute a global descriptor for every image; after the global descriptors are obtained, the similarity of every two images is computed as the dot product of their global descriptors, and a matching relationship is established between the more similar images.
101c, local descriptor matching: for all image pairs with an established matching relationship, local descriptors are matched using the learning-based SuperGlue network to obtain the final descriptor matching pairs.
101d, SfM mapping: after the feature points and matching relationships are obtained for all images, sparse reconstruction is performed to obtain an offline three-dimensional reconstruction result, i.e., the three-dimensional visual map.
Optionally, the following step 101e may be further included before the step 101 b.
101e, odometry initialization: if pre-existing odometry data is available, the odometry pose of each image can be used as an initial value for mapping.
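The following is a minimal sketch of the global matching step (101b), assuming the NetVLAD global descriptors have already been computed and L2-normalized; the function name and the top-k selection strategy are illustrative assumptions, not the patent's prescribed implementation:

```python
import numpy as np

def select_match_pairs(global_descs: np.ndarray, top_k: int = 20):
    """Select candidate image pairs from L2-normalized global descriptors
    (e.g., NetVLAD outputs), shape (num_images, descriptor_dim)."""
    sim = global_descs @ global_descs.T      # cosine similarity via dot product
    np.fill_diagonal(sim, -np.inf)           # exclude self-matches
    pairs = set()
    for i in range(sim.shape[0]):
        for j in np.argsort(-sim[i])[:top_k]:
            pairs.add((min(i, int(j)), max(i, int(j))))
    return sorted(pairs)                     # pairs passed on to local matching (101c)
```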
102. Based on the three-dimensional visual map and the target parameters, correction parameters are determined.
103. And correcting the pose of each image and each three-dimensional point in the three-dimensional visual map according to the correction parameters to obtain the corrected three-dimensional visual map.
Wherein the target parameter comprises at least one of: external parameters of the multi-view camera, and a plurality of inertial measurement unit (IMU) accelerations.
Wherein each IMU acceleration is measured during the acquisition of the plurality of image sets by the multi-camera, i.e., each IMU acceleration is measured by the inertial navigation system during the acquisition of the plurality of image sets by the multi-camera.
At present, because the algorithms such as SFM are all purely visual mapping methods, the problem that the scale of the constructed three-dimensional visual map is not matched with the scale of the actual environment is caused, and/or the problem that the gravity direction of the constructed three-dimensional visual map is not matched with the gravity direction in the actual environment is caused.
It will be appreciated that if the target parameters include external parameters of the multi-view camera, the above-mentioned corrections (step 102 and step 103) include scale alignment corrections, thus correcting the problem that the scale of the constructed three-dimensional visual map does not match the actual environmental scale; if the target parameter includes the plurality of IMU accelerations, the correction includes a gravity direction alignment correction, so as to correct a problem that a gravity direction of the constructed three-dimensional visual map does not match a gravity direction in an actual environment.
Optionally, the target parameter comprises an external parameter of the multi-view camera, and the correction comprises a scale alignment correction; the above step 102 may be specifically realized by the following step 102a, and the above step 103 may be specifically realized by the following step 103 a.
102a, determining the target correction ratio based on the distance between each frame of image shot synchronously in each two image sets in the plurality of image sets in the three-dimensional visual map and the external parameters of the multi-camera.
It can be understood that the distance, in the three-dimensional visual map, between each frame of image captured synchronously in every two image sets of the plurality of image sets can be calculated from the position coordinates of those images in the three-dimensional visual map, using the formula

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$

where $(x_1, y_1, z_1)$ are the position coordinates of one image in the three-dimensional visual map and $(x_2, y_2, z_2)$ are the position coordinates of the other image. The relative distance of every two cameras in the three-dimensional visual map is then determined based on the distance, in the three-dimensional visual map, of each frame of image captured synchronously in every two image sets of the plurality of image sets (acquired by the corresponding two cameras of the multi-view camera).
Alternatively, the above step 102a may be specifically implemented by the following steps 102a1 and 102a 2.
102a1, determining a plurality of correction ratios based on the distance between each frame of image in the three-dimensional visual map, which is shot synchronously in each of the plurality of image sets, and the external parameters of the cameras corresponding to each of the plurality of image sets.
102a2, determining an average value or a median of the plurality of correction ratios as the target correction ratio.
According to the external parameters of the multi-camera, calculating the relative distance between every two cameras in practice; then determining at least one scaling according to the relative distance between each two cameras in the three-dimensional visual map and the actual relative distance between each two cameras, wherein each scaling can be the ratio of the relative distance between each two cameras in the three-dimensional visual map and the actual relative distance between each two cameras, or each scaling can be the ratio of the actual relative distance between each two cameras and the relative distance between each two cameras in the three-dimensional visual map; finally, according to the at least one scaling, determining a scaling of the three-dimensional visual map, namely, a target correction scaling, wherein the scaling of the three-dimensional visual map is an average value of the at least one scaling or an intermediate value in the at least one scaling, and can be specifically determined according to actual conditions, and the method is not limited herein.
In the embodiment of the disclosure, various methods for calculating the target correction ratio are provided, and the target correction ratio may be calculated by other methods, specifically may be determined according to actual situations, and is not limited herein.
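As a rough sketch of steps 102a1 and 102a2 under the first convention above (each correction ratio = distance in the map / actual distance), assuming per-frame camera positions in the map and the metric baseline from the extrinsics are available; all names are illustrative:

```python
import numpy as np

def target_correction_ratio(map_pos_a, map_pos_b, true_baseline, use_median=False):
    """map_pos_a, map_pos_b: (M, 3) map positions of frames captured
    synchronously by two cameras of the rig; true_baseline: their metric
    distance, computed from the multi-view camera extrinsics."""
    d_map = np.linalg.norm(np.asarray(map_pos_a) - np.asarray(map_pos_b), axis=1)
    ratios = d_map / true_baseline           # one correction ratio per frame pair
    return float(np.median(ratios) if use_median else np.mean(ratios))
```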
And 103a, carrying out scale alignment correction on the pose of each image and each three-dimensional point in the three-dimensional visual map according to the target correction proportion to obtain the corrected three-dimensional visual map.
It can be understood that if each scaling is a ratio of a relative distance between two cameras in the three-dimensional visual map to a relative distance between each two cameras in practice, based on the target correction ratio, performing scale alignment correction on pose of each image and each three-dimensional point in the three-dimensional visual map to obtain the corrected three-dimensional visual map, specifically: dividing the pose of each image and each three-dimensional point in the three-dimensional visual map by a target correction proportion to obtain a corrected three-dimensional visual map; if each scaling ratio can be a ratio of a relative distance between each two cameras in practice and a relative distance between two cameras in a three-dimensional visual map, performing scale alignment correction on pose of each image and each three-dimensional point in the three-dimensional visual map based on the target correction ratio to obtain the corrected three-dimensional visual map, specifically including: and multiplying the pose of each image and each three-dimensional point in the three-dimensional visual map by a target correction proportion to obtain the corrected three-dimensional visual map.
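A minimal sketch of the corresponding correction, assuming the ratio was computed as map distance over actual distance (so division brings the map to metric scale); rotations are unaffected by scaling:

```python
import numpy as np

def apply_scale_correction(cam_positions, points3d, ratio):
    """Divide all map-frame camera positions and 3D points by the target
    correction ratio; camera orientations need no change."""
    return np.asarray(cam_positions) / ratio, np.asarray(points3d) / ratio
```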
In the embodiment of the disclosure, the problem that the scale of the constructed three-dimensional visual map is not matched with the scale of the actual environment can be well corrected through the step 102a and the step 103a, so that the scale accuracy of the final three-dimensional visual map is improved, and an accurate basis can be provided for subsequent positioning and navigation based on the three-dimensional visual map.
Alternatively, the above step 103a may be specifically realized by the following steps 103a1 to 103a 5.
103a1, performing preliminary correction on the pose of each image and each three-dimensional point in the three-dimensional visual map based on the target correction proportion, and obtaining the three-dimensional visual map after preliminary correction.
The description of the step 103a1 may refer to the related description in the step 103a, and will not be repeated here.
103a2, obtaining pose information of each image in each image set in the preliminarily corrected three-dimensional visual map to obtain a plurality of pose information sets, wherein each pose information set corresponds to one image set.
It will be appreciated that each pose information set corresponds to one image set and includes the pose information, in the three-dimensional visual map, of each frame of image in that image set.
103a3, constructing a scale optimization function based on the plurality of pose information sets and external parameters of the multi-camera.
The scale optimization function is used for representing the sum of differences between measured values and true values of the relative pose relationship of every two adjacent cameras in the multi-camera, the measured values are determined according to the plurality of pose information sets, and the true values are determined according to external parameters of the multi-camera.
Taking the i-th camera and the j-th camera as an example (i ≠ j), the measured value of the relative pose relationship of the i-th camera and the j-th camera is: the product of the inverse matrix of the pose, in the three-dimensional visual map, of the w-th frame image in the image set corresponding to the i-th camera and the pose, in the three-dimensional visual map, of the w-th frame image in the image set corresponding to the j-th camera, where i and j each take values in 1, 2, 3, …, N, w takes values in 1, 2, 3, …, M, N is the number of cameras in the multi-view camera, and M is the number of images in each image set. The true value can be determined from the extrinsic parameters between the i-th camera and the j-th camera of the multi-view camera.
Illustratively, the scale optimization function is formulated as follows:

$$E = \sum_{w=1}^{M} \sum_{i,j} \left\| \log\!\left( T_{ij}^{-1} \left( T_{wi}^{-1} T_{wj} \right) \right) \right\|^{2}$$

or

$$E = \sum_{w=1}^{M} \sum_{i,j} \left\| \log\!\left( \left( T_{wi}^{-1} T_{wj} \right)^{-1} T_{ij} \right) \right\|^{2}$$

wherein $T_{wi}$ represents the pose, in the three-dimensional visual map, of the w-th frame image in the image set corresponding to the i-th camera, $T_{wi}^{-1}$ represents the inverse matrix of $T_{wi}$, $T_{wj}$ represents the pose, in the three-dimensional visual map, of the w-th frame image in the image set corresponding to the j-th camera, $T_{ij}$ represents the true value of the relative pose relationship of the i-th camera and the j-th camera, $T_{ij}^{-1}$ represents the inverse matrix of $T_{ij}$, $\left( T_{wi}^{-1} T_{wj} \right)^{-1}$ represents the inverse of $T_{wi}^{-1} T_{wj}$, and $\log$ is the logarithmic mapping.
Alternatively, j is i+1, or j is i-1.
It should be noted that, the formula of the scale optimization function may also be a modification based on the above formula, and may specifically be determined according to practical situations, which is not limited herein.
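One residual term of such a function might look like the sketch below, which approximates the logarithmic mapping by separate rotation-log and translation parts (a common simplification; the patent does not fix a parameterization) and assumes poses stored as 4x4 homogeneous matrices:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def relative_pose_residual(T_wi, T_wj, T_ij):
    """6-vector residual between the measured relative pose inv(T_wi) @ T_wj
    and the true extrinsic T_ij; zero when the two coincide."""
    T_meas = np.linalg.inv(T_wi) @ T_wj           # measured value
    T_err = np.linalg.inv(T_ij) @ T_meas          # identity if consistent
    rot_err = Rotation.from_matrix(T_err[:3, :3]).as_rotvec()
    return np.concatenate([rot_err, T_err[:3, 3]])
```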
103a4, optimizing the scale optimization function through a nonlinear optimization algorithm, so as to obtain the corrected pose information of each image in the primarily corrected three-dimensional visual map through calculation when the function value of the scale optimization function is smaller than or equal to the scale error threshold value.
Wherein the nonlinear optimization algorithm may include any one of the following: the Gauss-Newton algorithm, a first-order gradient descent algorithm, a second-order gradient descent algorithm, the LM (Levenberg-Marquardt) algorithm, or the Dogleg algorithm, which may be determined according to the actual situation and is not limited herein.
The scale error threshold may be determined according to practical situations, and is not limited herein.
103a5, correcting the pose of each image in the three-dimensional visual map based on the corrected pose information of each image, so as to obtain the corrected three-dimensional visual map.
It will be appreciated that the pose of each image in the three-dimensional visual map may be updated to the corrected pose information for each image, resulting in the corrected three-dimensional visual map.
In the embodiment of the disclosure, the pose of each image in the three-dimensional map is optimized based on the scale optimization function by constructing the scale optimization function, so that the problem that the scale of the constructed three-dimensional visual map is not matched with the scale of the actual environment is further corrected, the scale accuracy of the final three-dimensional visual map is further improved, and an accurate basis can be provided for subsequent positioning and navigation based on the three-dimensional visual map.
Optionally, the calibration parameters further include internal parameters of the multi-view camera; the above step 103a3 can be specifically realized by the following steps 103a3a to 103a3 c.
103a3a, constructing a first function according to the plurality of pose information sets and external parameters of the multi-camera.
Wherein the first function is used to characterize a sum of differences between measured and actual values of the relative pose relationship of each adjacent two of the multiple cameras.
103a3b, constructing a second function based on the plurality of pose information sets and the internal parameters of the multi-camera.
Wherein the second function is used to characterize residual terms of a mapping tool used to construct the three-dimensional visual map.
103a3c, determining the scale optimization function based on the first function and the second function.
Wherein the scale optimization function is the sum of the first function and the second function, i.e., the sum of the residual term of the mapping tool used for constructing the three-dimensional visual map (determined according to the plurality of pose information sets and the internal parameters of the multi-view camera) and the first function described above.
Illustratively, the scale optimization function is formulated as follows:

$$E' = \sum_{w=1}^{M} \sum_{i,j} \left\| \log\!\left( T_{ij}^{-1} \left( T_{wi}^{-1} T_{wj} \right) \right) \right\|^{2} + other$$

or

$$E' = \sum_{w=1}^{M} \sum_{i,j} \left\| \log\!\left( \left( T_{wi}^{-1} T_{wj} \right)^{-1} T_{ij} \right) \right\|^{2} + other$$

wherein $other$ represents the residual term.
Illustratively, taking the construction tool being the COLMAP tool and $other$ being the residual term corresponding to the COLMAP tool as an example, the formula of the residual term is as follows:

$$other = \sum_{i} \sum_{w} \sum_{p,q} \left\| \rho \, K \, T_{iw} \, P_{wpq} - pix_{pq} \right\|^{2}$$

wherein $\rho$ represents the inverse depth, $K$ represents the internal parameters, $T_{iw}$ represents the pose, in the three-dimensional visual map, of the w-th frame image in the image set corresponding to the i-th camera, $P_{wpq}$ represents the coordinates, in the three-dimensional map, of the 3D point corresponding to the pixel point with coordinates $(p, q)$ in the w-th frame image of the image set corresponding to the i-th camera, and $pix_{pq}$ represents the pixel corresponding to the pixel point with coordinates $(p, q)$ in the w-th frame image of the image set corresponding to the i-th camera.
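Written out in code, a reprojection residual of this form could be sketched as follows (pinhole model; the names and the world-to-camera convention for T_iw are assumptions for illustration):

```python
import numpy as np

def reprojection_residual(K, T_iw, P_w, pix):
    """K: 3x3 intrinsics; T_iw: 4x4 world-to-camera pose; P_w: 3D point in
    the map frame; pix: observed pixel (p, q)."""
    P_c = (T_iw @ np.append(P_w, 1.0))[:3]   # point in the camera frame
    rho = 1.0 / P_c[2]                       # inverse depth
    uv = (K @ P_c) * rho                     # perspective projection
    return uv[:2] - np.asarray(pix, float)
```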
In the embodiment of the disclosure, the scale optimization function is the sum of the residual error item and the scale optimization function, so that the residual error item and the scale optimization function can be simultaneously optimized, the optimization efficiency is improved, the pose of each image and each three-dimensional point in the three-dimensional map are optimized based on the scale optimization function, the problem that the scale of the constructed three-dimensional visual map is not matched with the actual environment scale can be further corrected, the scale accuracy of the final three-dimensional visual map is further improved, and an accurate basis can be provided for subsequent positioning and navigation based on the three-dimensional visual map.
Optionally, in the embodiment of the present disclosure, the residual term and the scale optimization function may be optimized respectively, specifically, the pose of each image in the three-dimensional map may be optimized based on the scale optimization function first, then each three-dimensional point in the three-dimensional map may be optimized based on the residual term, which may be determined specifically according to the actual use situation, and the present disclosure is not limited herein.
Optionally, the target parameter includes a plurality of IMU accelerations, each IMU acceleration being measured during the process of acquiring the plurality of image sets by the multi-camera, and the correction parameter includes an optimized rotation matrix, and the correction is a gravity direction alignment correction; the above step 102 may be specifically realized by the following steps 102b to 102f, and the above step 103 may be specifically realized by the following step 103b.
102b, calculating an acceleration vector corresponding to each frame of image based on at least one position information set.
Wherein each set of location information includes location information of individual images in one set of images in the three-dimensional visual map. That is, each position information in each set of position information is position information (three-dimensional coordinate information) of each image in one set of images in the three-dimensional visual map.
It can be understood that the difference is made according to the position information of each two adjacent frames of images in each image set in the three-dimensional visual map, and then the difference is divided by the time interval of the acquisition of the two adjacent frames of images, so as to obtain the speed vector between each two adjacent frames of images under the world coordinate system; and then, dividing the difference between two adjacent velocity vectors by the time interval of acquisition of two adjacent frames of images to obtain the acceleration vector of each frame of image under the world coordinate.
If each image set includes M frames of images, M-2 acceleration vectors can be obtained by the above calculation, and if the acceleration vector calculated from the position information of the three continuous frames of images in the three-dimensional visual map is recorded as the acceleration vector corresponding to the middle frame of image, the first frame of image and the last frame of image have no corresponding acceleration vector.
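A minimal finite-difference sketch of step 102b, assuming per-frame timestamps and map positions are available (names are illustrative):

```python
import numpy as np

def accelerations_from_positions(positions, timestamps):
    """positions: (M, 3) map positions of one image set; returns (M-2, 3)
    acceleration vectors, acc[k] belonging to frame k+1 (the middle frame)."""
    p = np.asarray(positions, float)
    t = np.asarray(timestamps, float)
    vel = np.diff(p, axis=0) / np.diff(t)[:, None]         # (M-1, 3) velocities
    t_mid = 0.5 * (t[:-1] + t[1:])                         # velocity timestamps
    return np.diff(vel, axis=0) / np.diff(t_mid)[:, None]  # (M-2, 3)
```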
102c, performing linear interpolation processing on the plurality of IMU accelerations to obtain IMU acceleration vectors corresponding to each frame of image.
The inertial measurement unit (Inertial measurement unit, IMU) is a device that measures the three-axis attitude angle (or angular rate) and acceleration of an object.
Because the frame rate of the acquired images differs from the frequency at which the IMU accelerations are measured, the IMU acceleration vector corresponding to each frame of image can be obtained from the plurality of IMU accelerations by linear interpolation.
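A sketch of this interpolation with NumPy, assuming timestamped IMU samples; per-axis np.interp implements exactly the linear interpolation described:

```python
import numpy as np

def imu_acc_at_frames(imu_t, imu_acc, frame_t):
    """imu_t: (K,) IMU timestamps; imu_acc: (K, 3) accelerations;
    frame_t: (M,) image timestamps. Returns (M, 3) interpolated vectors."""
    imu_acc = np.asarray(imu_acc, float)
    return np.stack(
        [np.interp(frame_t, imu_t, imu_acc[:, axis]) for axis in range(3)],
        axis=1,
    )
```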
102d, calculating to obtain a gravity acceleration vector based on the acceleration vector corresponding to each frame of image and the IMU acceleration vector corresponding to each frame of image.
The method comprises the steps of firstly calculating a gravity acceleration vector corresponding to each frame of image based on the acceleration vector corresponding to each frame of image and the IMU acceleration vector corresponding to each frame of image, wherein the gravity acceleration vector corresponding to each frame of image is the difference between the IMU acceleration vector corresponding to each frame of image and the acceleration vector corresponding to each frame of image. And then adding and averaging the gravity acceleration vectors corresponding to each frame of image to determine a final gravity acceleration vector.
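A sketch of step 102d, under the text's implicit assumption that both sets of vectors are expressed in the same coordinate frame:

```python
import numpy as np

def estimate_gravity_vector(imu_acc_frames, motion_acc_frames):
    """Per-frame gravity estimate = IMU acceleration vector minus motion
    acceleration vector; the final estimate is their mean over all frames."""
    g = np.asarray(imu_acc_frames, float) - np.asarray(motion_acc_frames, float)
    return g.mean(axis=0)
```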
102e, constructing a gravity direction optimization function based on the gravity acceleration vector.
The gravity direction optimization function is used to characterize the difference between the gravity acceleration vector rotated by the rotation matrix and an actual gravity acceleration vector that points in the gravity direction and whose modulus is the gravitational acceleration.
The actual gravity acceleration vector, which points in the gravity direction and whose modulus is the gravitational acceleration, is determined according to the gravitational acceleration at the current position, and is not limited herein.
If the gravity acceleration vector is a per-camera gravity acceleration vector, the gravity direction optimization function is the sum of the differences between each camera's gravity acceleration vector rotated by the rotation matrix and the actual gravity acceleration vector; if the gravity acceleration vector is a single overall gravity acceleration vector, the gravity direction optimization function is the difference between that gravity acceleration vector rotated by the rotation matrix and the actual gravity acceleration vector. This may be determined according to the actual situation and is not limited herein.
For example, if the gravity acceleration vector is a per-camera gravity acceleration vector, the formula of the gravity direction optimization function is as follows:

$$E_g = \sum_{i=1}^{N} \left\| R \, v_i - [0, 0, g]^{T} \right\|^{2}$$

wherein $R$ represents the rotation matrix, a rotation matrix with two degrees of freedom that can be parameterized as $R = \exp([a, b, 0])$, where $a$ and $b$ are any non-zero numbers; $v_i$ represents the gravity acceleration vector corresponding to the i-th camera; $g$ represents the actual gravitational acceleration at the current position; and $[0, 0, g]^{T}$ represents the transpose of the actual gravity acceleration vector, which points in the gravity direction and whose modulus is the gravitational acceleration.
102f, optimizing the gravity direction optimization function through a nonlinear optimization algorithm, so as to obtain the optimized rotation matrix through solution when the function value of the gravity direction optimization function is smaller than or equal to a direction error threshold value.
The gravity direction optimization function may be determined according to practical situations, and is not limited herein.
The direction error threshold may be determined according to practical situations, and is not limited herein.
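Steps 102e and 102f might be sketched as below, with SciPy's least_squares standing in for the nonlinear optimization algorithm; the two-degree-of-freedom parameterization R = exp([a, b, 0]) follows the formula above, and all names are illustrative:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def optimize_gravity_rotation(grav_vecs, g=9.81):
    """Find R = exp([a, b, 0]) minimizing sum_i || R v_i - [0, 0, g]^T ||^2."""
    grav_vecs = np.asarray(grav_vecs, float)
    target = np.array([0.0, 0.0, g])

    def residuals(x):
        R = Rotation.from_rotvec([x[0], x[1], 0.0]).as_matrix()
        return ((grav_vecs @ R.T) - target).ravel()   # rows are R @ v_i - target

    sol = least_squares(residuals, x0=np.zeros(2))
    return Rotation.from_rotvec([sol.x[0], sol.x[1], 0.0]).as_matrix()
```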
103b, according to the optimized rotation matrix, carrying out gravity direction alignment correction on the pose of each image and each three-dimensional point in the three-dimensional visual map to obtain the corrected three-dimensional visual map.
It can be appreciated that the pose of each image and each three-dimensional point in the three-dimensional visual map are rotated by the rotation matrix to correct the direction of gravity.
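A minimal sketch of applying the optimized rotation to the whole map, assuming camera poses are stored as (rotation, position) pairs in the map frame:

```python
import numpy as np

def apply_gravity_alignment(R, cam_rotations, cam_positions, points3d):
    """Rotate camera positions and 3D points by R, and left-multiply each
    camera orientation, so the map's vertical axis aligns with gravity."""
    cam_positions = np.asarray(cam_positions, float) @ R.T
    points3d = np.asarray(points3d, float) @ R.T
    cam_rotations = [R @ Rc for Rc in cam_rotations]
    return cam_rotations, cam_positions, points3d
```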
In the embodiment of the disclosure, the problem that the gravity direction of the constructed three-dimensional visual map is not matched with the gravity direction of the actual environment can be well corrected through the steps 102b to 102f and the step 103b, so that the gravity direction accuracy of the final three-dimensional visual map is improved, and an accurate basis can be provided for subsequent positioning and navigation based on the three-dimensional visual map.
Optionally, the gravity direction optimization function is specifically configured to characterize a difference between the normalized vector rotated by the rotation matrix and a target vector, where the normalized vector is obtained by normalizing the gravity acceleration vector, and the target vector is a unit vector pointing in the gravity direction.
Illustratively, the formula for the gravity direction optimization function described above is as follows:

$$E_g' = \sum_{i=1}^{N} \left\| R \, v_i' - [0, 0, 1]^{T} \right\|^{2}$$

wherein $v_i'$ represents the normalized vector corresponding to the gravity acceleration vector of the i-th camera, $[0, 0, 1]$ represents the target vector, $[0, 0, 1]^{T}$ represents the transpose of the target vector, and $R$ can again be parameterized as $R = \exp([a, b, 0])$.
In the embodiment of the disclosure, the gravity acceleration vector is normalized (the modulus is 1), so that the actual gravity acceleration of the current position is not required to be considered, the gravity direction optimization function is simplified, the optimization process is simplified, and the optimization efficiency is improved.
Alternatively, the gravitational direction alignment correction may be performed after the dimensional alignment correction is performed, and specifically, the above-described step 103a5 may be implemented through steps 201 to 207 described below.
201. And correcting the pose of each image in the three-dimensional visual map based on the corrected pose information of each image to obtain a three-dimensional visual map after secondary correction.
202. And calculating an acceleration vector corresponding to each frame of image based on at least one position information set.
Wherein each set of positional information includes positional information of each image in one set of images in the three-dimensional visual map after the correction.
203. And carrying out linear interpolation processing on the plurality of IMU accelerations to obtain IMU acceleration vectors corresponding to each frame of image.
204. And calculating to obtain a gravity acceleration vector based on the acceleration vector corresponding to each frame of image and the IMU acceleration vector corresponding to each frame of image.
205. Based on the gravitational acceleration vector, a gravitational direction optimization function is constructed.
The gravity direction optimization function is used to characterize the difference between the gravity acceleration vector rotated by the rotation matrix and an actual gravity acceleration vector that points in the gravity direction and whose modulus is the gravitational acceleration.
206. And optimizing the gravity direction optimization function through a nonlinear optimization algorithm, so that when the function value of the gravity direction optimization function is smaller than or equal to a direction error threshold value, the optimized rotation matrix is obtained through calculation.
207. And according to the optimized rotation matrix, performing gravity direction alignment correction on the pose of each image and each three-dimensional point in the secondarily corrected three-dimensional visual map, to obtain the final corrected three-dimensional visual map.
The description of the above step 201 may refer to the description of the above step 103a5, which is not repeated herein, and the descriptions of the above steps 202 to 207 may refer to the description of the above steps 102b to 102f and the description of the above step 103b, which is not repeated herein.
In the embodiment of the disclosure, the problem that the scale of the constructed three-dimensional visual map is not matched with the scale of the actual environment can be corrected, the problem that the gravity direction of the constructed three-dimensional visual map is not matched with the gravity direction of the actual environment can be corrected, the gravity direction accuracy of the final three-dimensional visual map is improved, and an accurate basis can be provided for subsequent positioning and navigation based on the three-dimensional visual map.
Fig. 2 is a block diagram of a spatial map processing apparatus according to an embodiment of the present disclosure, and as shown in fig. 2, includes: a construction module 21, a determination module 22 and a correction module 23; the construction module 21 is configured to construct a three-dimensional visual map based on images in a plurality of image sets acquired by a multi-view camera, where a multi-frame image in one image set is acquired by one of the multi-view cameras, and the three-dimensional visual map includes each frame of image in each image set, and a pose of the each frame of image and a three-dimensional point on the each frame of image; the determining module 22 is configured to determine a correction parameter based on the three-dimensional visual map and the target parameter obtained by the constructing module; the correction module 23 is configured to correct the pose of each image and each three-dimensional point in the three-dimensional visual map according to the correction parameter determined by the determination module, so as to obtain the corrected three-dimensional visual map.
Optionally, the target parameter includes an external parameter of the multi-view camera, and the correction parameter is a target correction ratio; the determining module 22 is specifically configured to determine the target correction ratio based on a distance between each frame of image captured synchronously in each two image sets in the plurality of image sets in the three-dimensional visual map and an external parameter of the multi-camera; the correction module 23 is specifically configured to perform scale alignment correction on the pose of each image and each three-dimensional point in the three-dimensional visual map according to the target correction ratio, so as to obtain the corrected three-dimensional visual map.
Optionally, the determining module 22 is specifically configured to determine a plurality of correction scales based on a distance in the three-dimensional visual map between each frame of image captured synchronously in each of the two image sets and an external parameter of the camera corresponding to each of the two image sets; an average or median of the plurality of correction ratios is determined as the target correction ratio.
Optionally, the correction module 23 is specifically configured to perform preliminary correction on the pose of each image and each three-dimensional point in the three-dimensional visual map based on the target correction ratio, so as to obtain the three-dimensional visual map after preliminary correction; acquiring pose information of each image in each image set in the preliminarily corrected three-dimensional visual map to obtain a plurality of pose information sets, wherein each pose information set corresponds to one image set; constructing a scale optimization function based on a plurality of pose information sets and external parameters of the multi-camera, wherein the scale optimization function is used for representing the sum of differences between measured values and real values of the relative pose relation of each two adjacent cameras in the multi-camera, the measured values are determined according to the plurality of pose information sets, and the real values are determined according to the external parameters of the multi-camera; optimizing the scale optimization function through a nonlinear optimization algorithm, so as to obtain the corrected pose information of each image in the primarily corrected three-dimensional visual map through calculation when the function value of the scale optimization function is smaller than or equal to the scale error threshold value; and correcting the pose of each image in the three-dimensional visual map based on the corrected pose information of each image to obtain the corrected three-dimensional visual map.
Optionally, the correction module 23 is specifically configured to: construct a first function according to the plurality of pose information sets and the external parameters of the multi-view camera, the first function being used to characterize the sum of the differences between the measured values and the true values of the relative pose relationship of every two adjacent cameras in the multi-view camera; construct a second function according to the plurality of pose information sets and the internal parameters of the multi-view camera, the second function being used to characterize the residual term of the mapping tool used to construct the three-dimensional visual map; and determine the scale optimization function based on the first function and the second function.
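A sketch of how the first and second functions could be stacked into one residual for a nonlinear least-squares solver. The single scale variable and the caller-supplied reprojection stub are assumptions: the disclosure fixes neither the parameterization nor the exact residual form of the mapping tool.

    import numpy as np
    from scipy.optimize import least_squares

    def scale_residuals(x, rel_t_measured, rel_t_true, reproj_residual):
        s = x[0]
        # First function: difference between the measured relative translation of
        # every adjacent camera pair (from the pose information sets, scaled by s)
        # and the true relative translation given by the extrinsics.
        first = (s * rel_t_measured - rel_t_true).ravel()
        # Second function: residual term of the mapping tool (e.g. reprojection
        # error); a caller-supplied stub here, since its exact form is tool-specific.
        second = reproj_residual(s)
        return np.concatenate([first, second])

    # Hypothetical usage with (N, 3) translation arrays and a no-op second term:
    # result = least_squares(scale_residuals, x0=[1.0],
    #                        args=(rel_t_measured, rel_t_true,
    #                              lambda s: np.zeros(0)))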
Optionally, the target parameter includes a plurality of IMU accelerations, each measured while the multi-view camera acquires the plurality of image sets, and the correction parameter includes an optimized rotation matrix. The determination module 22 is specifically configured to: calculate an acceleration vector corresponding to each frame of image based on at least one position information set, where each position information set includes the position information, in the three-dimensional visual map, of each image in one image set; perform linear interpolation on the plurality of IMU accelerations to obtain an IMU acceleration vector corresponding to each frame of image; calculate a gravity acceleration vector based on the acceleration vector and the IMU acceleration vector corresponding to each frame of image; construct a gravity direction optimization function based on the gravity acceleration vector, the gravity direction optimization function being used to characterize the difference between the gravity acceleration vector after rotation by a rotation matrix and an actual gravitational acceleration vector that points in the gravity direction and whose modulus is the gravitational acceleration; and optimize the gravity direction optimization function through a nonlinear optimization algorithm, so as to obtain the optimized rotation matrix when the function value of the gravity direction optimization function is less than or equal to a direction error threshold. The correction module 23 is specifically configured to perform gravity direction alignment correction on the pose of each image and each three-dimensional point in the three-dimensional visual map according to the optimized rotation matrix, so as to obtain the corrected three-dimensional visual map.
Optionally, the gravity direction optimization function is specifically used to characterize the difference between a normalized vector after rotation by the rotation matrix and a target vector, where the normalized vector is obtained by normalizing the gravity acceleration vector, and the target vector is a unit vector pointing in the gravity direction.
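Putting the gravity-alignment pieces together as a sketch: accelerations by double-differentiating map positions over the image timestamps, per-axis linear interpolation of the IMU samples, and a rotation solved so that the normalized gravity vector maps onto a unit gravity direction, here assumed to be (0, 0, -1). Estimating the gravity vector as, say, the mean difference between interpolated IMU accelerations and map-derived accelerations is likewise an assumption; the disclosure does not fix these details.

    import numpy as np
    from scipy.optimize import least_squares
    from scipy.spatial.transform import Rotation

    def image_accelerations(positions, t):
        # Second-order finite differences of the (N, 3) map positions over the
        # image timestamps t give one acceleration vector per frame.
        velocity = np.gradient(positions, t, axis=0)
        return np.gradient(velocity, t, axis=0)

    def interpolate_imu(imu_acc, imu_t, img_t):
        # Per-axis linear interpolation of the (K, 3) IMU accelerations to the
        # image timestamps, yielding one IMU acceleration vector per frame.
        return np.stack([np.interp(img_t, imu_t, imu_acc[:, k]) for k in range(3)],
                        axis=1)

    def optimized_rotation(gravity_vec):
        # Solve for a rotation that takes the normalized gravity vector onto the
        # unit vector pointing in the (assumed) gravity direction.
        g_norm = gravity_vec / np.linalg.norm(gravity_vec)
        target = np.array([0.0, 0.0, -1.0])
        residual = lambda rvec: Rotation.from_rotvec(rvec).apply(g_norm) - target
        solution = least_squares(residual, x0=np.zeros(3))
        return Rotation.from_rotvec(solution.x).as_matrix()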
In the embodiments of the present disclosure, each of the above modules may implement the spatial map processing method provided in the method embodiments and achieve the same technical effects; to avoid repetition, the details are not described again here.
Fig. 3 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure. It exemplarily illustrates an electronic device capable of implementing any of the spatial map processing methods in the embodiments of the present disclosure, and should not be construed as specifically limiting those embodiments.
As shown in fig. 3, the electronic device 300 may include a processor (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various suitable actions and processes in accordance with a program stored in a read-only memory (ROM) 302 or loaded from a storage device 308 into a random access memory (RAM) 303. The RAM 303 also stores various programs and data required for the operation of the electronic device 300. The processor 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 308 including, for example, magnetic tape, hard disk, etc.; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While an electronic device 300 having various means is shown, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via a communication device 309, or installed from a storage device 308, or installed from a ROM 302. The computer program, when executed by the processor 301, may perform the functions defined in any of the spatial map processing methods provided by the embodiments of the present disclosure.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wire, fiber optic cable, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, the client and the server may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: construct a three-dimensional visual map based on images in a plurality of image sets acquired by a multi-view camera, where the multiple frames of images in one image set are acquired by one camera of the multi-view camera, and the three-dimensional visual map includes each frame of image in each image set, the pose of each frame of image, and the three-dimensional points on each frame of image; determine correction parameters based on the three-dimensional visual map and the target parameters; and correct the pose of each image and each three-dimensional point in the three-dimensional visual map according to the correction parameters to obtain the corrected three-dimensional visual map.
In an embodiment of the present disclosure, computer program code for performing the operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by software or by hardware. In some cases, the name of a unit does not constitute a limitation of the unit itself.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a computer-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a computer-readable storage medium would include one or more wire-based electrical connections, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing description is merely an illustration of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the specific combinations of the features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by substituting the above features with technical features having similar functions disclosed in the present disclosure (but not limited thereto).
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (11)

1. A method of spatial map processing, the method comprising:
constructing a three-dimensional visual map based on images in a plurality of image sets acquired by a multi-view camera, wherein the multi-frame images in one image set are acquired by one camera in the multi-view camera, and the three-dimensional visual map comprises each frame of image in each image set, the pose of each frame of image and three-dimensional points on each frame of image;
determining correction parameters based on the three-dimensional visual map and the target parameters;
and correcting the pose of each image and each three-dimensional point in the three-dimensional visual map according to the correction parameters to obtain the corrected three-dimensional visual map.
2. The method of claim 1, wherein the target parameter comprises an external parameter of the multi-view camera, the correction parameter being a target correction ratio;
and the determining a correction parameter based on the three-dimensional visual map and the target parameter comprises:
determining the target correction ratio based on the distance, in the three-dimensional visual map, between the frames of images captured synchronously in every two image sets of the plurality of image sets and on the external parameters of the multi-view camera;
and the correcting the pose of each image and each three-dimensional point in the three-dimensional visual map according to the correction parameters to obtain the corrected three-dimensional visual map comprises:
performing scale alignment correction on the pose of each image and each three-dimensional point in the three-dimensional visual map according to the target correction ratio, to obtain the corrected three-dimensional visual map.
3. The method of claim 2, wherein the determining the target correction ratio based on the distance, in the three-dimensional visual map, between the frames of images captured synchronously in every two image sets of the plurality of image sets and the external parameters of the multi-view camera comprises:
determining a plurality of correction ratios based on the distance, in the three-dimensional visual map, between the frames of images captured synchronously in every two image sets of the plurality of image sets and the external parameters of the cameras corresponding to those two image sets; and
determining the average or median of the plurality of correction ratios as the target correction ratio.
4. The method of claim 2 or 3, wherein the performing scale alignment correction on the pose of each image and each three-dimensional point in the three-dimensional visual map according to the target correction ratio to obtain the corrected three-dimensional visual map comprises:
based on the target correction proportion, carrying out preliminary correction on the pose of each image and each three-dimensional point in the three-dimensional visual map to obtain the three-dimensional visual map after preliminary correction;
acquiring pose information of each image in each image set in the preliminarily corrected three-dimensional visual map to obtain a plurality of pose information sets, wherein each pose information set corresponds to one image set;
constructing a scale optimization function based on a plurality of pose information sets and external parameters of the multi-camera, wherein the scale optimization function is used for representing the sum of differences between measured values and real values of the relative pose relation of each two adjacent cameras in the multi-camera, the measured values are determined according to the plurality of pose information sets, and the real values are determined according to the external parameters of the multi-camera;
optimizing the scale optimization function through a nonlinear optimization algorithm, so as to obtain the corrected pose information of each image in the preliminarily corrected three-dimensional visual map when the function value of the scale optimization function is less than or equal to a scale error threshold;
and correcting the pose of each image in the three-dimensional visual map based on the corrected pose information of each image to obtain the corrected three-dimensional visual map.
5. The method of claim 4, wherein constructing a scale optimization function based on the plurality of pose information sets and external parameters of the multi-view camera comprises:
constructing a first function according to the pose information sets and external parameters of the multi-camera, wherein the first function is used for representing the sum of differences between measured values and true values of the relative pose relation of every two adjacent cameras in the multi-camera;
constructing a second function according to the pose information sets and internal parameters of the multi-camera, wherein the second function is used for representing residual items of a mapping tool used for constructing the three-dimensional visual map;
and determining the scale optimization function according to the first function and the second function.
6. The method of claim 1, wherein the target parameters comprise a plurality of IMU accelerations, each IMU acceleration being measured during acquisition of the plurality of image sets by the multi-camera, the correction parameters comprising an optimized rotation matrix;
the determining a correction parameter based on the three-dimensional visual map and the target parameter comprises the following steps:
calculating an acceleration vector corresponding to each frame of image based on at least one position information set, wherein each position information set comprises position information of each image in one image set in the three-dimensional visual map;
performing linear interpolation processing on the plurality of IMU accelerations to obtain IMU acceleration vectors corresponding to each frame of image;
based on the acceleration vector corresponding to each frame of image and the IMU acceleration vector corresponding to each frame of image, calculating to obtain a gravity acceleration vector;
constructing a gravity direction optimization function based on the gravity acceleration vector, wherein the gravity direction optimization function is used for characterizing the difference between the gravity acceleration vector after rotation by a rotation matrix and an actual gravitational acceleration vector that points in the gravity direction and whose modulus is the gravitational acceleration;
optimizing the gravity direction optimization function through a nonlinear optimization algorithm, so as to obtain the optimized rotation matrix when the function value of the gravity direction optimization function is less than or equal to a direction error threshold;
and the correcting the pose of each image and each three-dimensional point in the three-dimensional visual map according to the correction parameters to obtain the corrected three-dimensional visual map comprises:
performing gravity direction alignment correction on the pose of each image and each three-dimensional point in the three-dimensional visual map according to the optimized rotation matrix, to obtain the corrected three-dimensional visual map.
7. The method of claim 6, wherein the gravity direction optimization function is specifically configured to characterize a difference between a normalized vector rotated by the rotation matrix and a target vector, the normalized vector being obtained by normalizing the gravity acceleration vector, and the target vector being a unit vector pointing in a gravity direction.
8. A spatial map processing apparatus, characterized by comprising: a construction module, a determination module, and a correction module;
the construction module is used for constructing a three-dimensional visual map based on images in a plurality of image sets acquired by the multi-view camera, wherein the multi-frame images in one image set are acquired by one camera in the multi-view camera, and the three-dimensional visual map comprises each frame of image in each image set, the pose of each frame of image and three-dimensional points on each frame of image;
The determining module is used for determining correction parameters based on the three-dimensional visual map and the target parameters obtained by the constructing module;
and the correction module is used for correcting the pose of each image and each three-dimensional point in the three-dimensional visual map according to the correction parameters determined by the determination module, so as to obtain the corrected three-dimensional visual map.
9. An electronic device, comprising: a memory and a processor, the memory for storing a computer program; the processor is configured to execute the spatial map processing method of any one of claims 1 to 7 when the computer program is invoked.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which, when being executed by a processor, implements the spatial map processing method of any one of claims 1 to 7.
11. A computer program product, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the spatial map processing method of any of claims 1 to 7.