CN113450253B - Image processing method, image processing device, electronic equipment and computer readable storage medium - Google Patents

Image processing method, image processing device, electronic equipment and computer readable storage medium

Info

Publication number
CN113450253B
Authority
CN
China
Prior art keywords
image
initial
grid
target
images
Prior art date
Legal status
Active
Application number
CN202110553182.5A
Other languages
Chinese (zh)
Other versions
CN113450253A (en)
Inventor
Inventor not disclosed
Current Assignee
Beijing Chengshi Wanglin Information Technology Co Ltd
Original Assignee
Beijing Chengshi Wanglin Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Chengshi Wanglin Information Technology Co Ltd
Priority to CN202110553182.5A
Publication of CN113450253A
Priority to PCT/CN2022/087633 (WO2022242395A1)
Application granted
Publication of CN113450253B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4038Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G06T3/08
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/32Indexing scheme for image data processing or generation, in general involving image mosaicing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Abstract

An image processing method, an image processing apparatus, an electronic device, and a medium. The image processing method comprises the following steps: acquiring a plurality of initial images; performing feature point matching on the plurality of initial images to obtain at least one matched image pair; for each matched image pair, selecting one of the first initial image and the second initial image in the matched image pair as a gridded image, and performing gridding processing on the gridded image to divide the gridded image into a plurality of grids; for each matched image pair, calculating a mapping matrix corresponding to each grid in the plurality of grids based on the matched feature points between the first initial image and the second initial image; mapping each initial image in the plurality of initial images into a target pixel canvas based on the mapping matrix corresponding to each grid to obtain a plurality of target images; and fusing the target images to obtain a stitched image. The method not only reduces the image-quality requirements and the computational load of image stitching, but also makes the stitched image more natural.

Description

Image processing method, image processing device, electronic equipment and computer readable storage medium
Technical Field
Embodiments of the present disclosure relate to an image processing method, apparatus, electronic device, and computer-readable storage medium.
Background
Currently, image stitching (image mosaicing) is attracting increasing attention and has become a research hot spot in photography, computer vision, image processing, and computer graphics. Image stitching generally forms a seamless, high-definition image by aligning a series of images; the resulting image has a higher resolution and a larger field of view than any single image. Image stitching has a wide range of application scenarios, such as aerial photography by unmanned aerial vehicles and remote sensing imagery.
Disclosure of Invention
At least one embodiment of the present disclosure provides an image processing method, including: acquiring a plurality of initial images; performing feature point matching on the plurality of initial images to obtain at least one matched image pair, wherein each matched image pair in the at least one matched image pair comprises a first initial image and a second initial image, and the first initial image and the second initial image are different initial images with matched feature points in between; for each of the matched image pairs, selecting one of the first initial image and the second initial image in the matched image pair as a gridded image, and performing gridding processing on the gridded image to divide the gridded image into a plurality of grids; calculating a mapping matrix corresponding to each grid in the plurality of grids for each matched image pair based on matched feature points between the first initial image and the second initial image; mapping each initial image in the plurality of initial images to a target pixel canvas to obtain a plurality of target images based on the mapping matrix corresponding to each grid; and fusing the target images to obtain a spliced image.
For example, in an image processing method provided by an embodiment of the present disclosure, for each matching image pair, calculating the mapping matrix corresponding to each mesh in the multiple meshes based on feature points matched between the first initial image and the second initial image includes: for each grid, determining the weight of each pair of feature points in the matched feature points between the first initial image and the second initial image to the grid, wherein each pair of feature points comprises a first feature point and a second feature point, and the first feature point and the second feature point are the matched feature points in the first initial image and the second initial image respectively; determining first image coordinates of the characteristic points in the first initial image, and determining second image coordinates of the characteristic points in the second initial image; and for each grid, determining a mapping matrix corresponding to the grid based on the weight of each pair of feature points to the grid and the first image coordinate and the second image coordinate.
For example, in an image processing method provided by an embodiment of the present disclosure, for each grid, determining a mapping matrix corresponding to the grid based on the weight of each pair of feature points to the grid and the first image coordinate and the second image coordinate includes: for each grid, constructing a singular value decomposition matrix corresponding to the grid based on the first image coordinates, the second image coordinates and weights for the grid; and carrying out singular value decomposition on the singular value decomposition matrix to obtain a mapping matrix corresponding to the grid.
For example, in an image processing method provided by an embodiment of the present disclosure, determining, for each mesh, a weight of each pair of feature points in the feature points matched between the first initial image and the second initial image to the mesh includes: determining the distance from the characteristic points in the gridding image to the grid according to the gridding image; and determining the weight of each pair of feature points to the grid based on the distance of each feature point in the gridded image to the grid.
For example, in an image processing method provided by an embodiment of the present disclosure, mapping each of the plurality of initial images into the target pixel canvas based on the mapping matrix corresponding to each grid to obtain the plurality of target images includes: determining target pose information corresponding to each grid based on the mapping matrix corresponding to each grid; determining projection information of each grid in the curved surface projection based on the target pose information corresponding to each grid; and for each initial image, mapping the initial image into the target pixel canvas to generate the target image based on the projection information of each mesh corresponding to the matching image pair in which the initial image is located in the curved surface projection.
For example, in an image processing method provided by an embodiment of the present disclosure, determining target pose information corresponding to each grid based on a mapping matrix corresponding to each grid includes: determining a reference image from the plurality of initial images; determining first pose information of each grid relative to the reference image based on the mapping matrix corresponding to each grid and the reference image; and determining target pose information for each grid based on the first pose information of the grid relative to the reference image.
For example, in an image processing method provided by an embodiment of the present disclosure, dividing the plurality of initial images into a plurality of image groups, where each image group includes at least one of the at least one matching image pair, and there is a matching feature point between different matching image pairs in each image group, and there is no matching feature point between different image groups, and determining first pose information of each mesh with respect to the reference image based on the mapping matrix corresponding to each mesh and the reference image includes: for each grid, in the case that the initial image to which the grid belongs and the reference image belong to the same image group, determining first pose information of the grid relative to the reference image based on a mapping matrix corresponding to the grid and the reference image; in the case that the initial image to which the mesh belongs and the reference image do not belong to the same image group, determining a pose relationship between the image group in which the initial image to which the mesh belongs and the image group in which the reference image belongs, and determining first pose information of the mesh with respect to the reference image based on the pose relationship and a mapping matrix corresponding to the mesh.
For example, in an image processing method provided by an embodiment of the present disclosure, the plurality of initial images are obtained by an image capturing device, the image capturing device includes a sensor, and the method further includes: acquiring construction pose information constructed by the sensor, wherein the construction pose information comprises a pose adopted by the image acquisition device in the process of acquiring each initial image; and determining the pose relationship between the image group of the initial image to which the grid belongs and the image group of the reference image includes: determining the pose relationship between the image group of the initial image to which the grid belongs and the image group of the reference image based on the construction pose information respectively corresponding to at least one initial image included in the image group of the initial image to which the grid belongs and in the image group of the reference image.
For example, in an image processing method provided by an embodiment of the present disclosure, determining target pose information of each grid based on first pose information of each grid with respect to the reference image includes: and carrying out data fusion on the first pose information of each grid relative to the reference image and the constructed pose information to obtain target pose information of each grid.
For example, in an image processing method provided by an embodiment of the present disclosure, performing data fusion on the first pose information of each mesh with respect to the reference image and the constructed pose information to obtain the target pose information of each mesh includes: converting the constructed pose information into a coordinate system taking the reference image as a reference to obtain second pose information; performing data fusion on the second pose information and the first pose information to obtain a fusion screening result; and determining the target pose information based on the fusion screening result.
For example, in an image processing method provided by an embodiment of the present disclosure, determining the target pose information based on the fusion filtering result includes: and processing the fusion screening result by using a parameter optimization method to obtain the target pose information.
For example, in an image processing method provided by an embodiment of the present disclosure, the sensor includes a first sensor and a second sensor, and acquiring the construction pose information constructed by the sensor includes: acquiring first pose data, wherein the first pose data is a first pose, constructed by the first sensor, adopted by the image acquisition device to acquire each initial image; acquiring second pose data, wherein the second pose data is a second pose, constructed by the second sensor, adopted by the image acquisition device to acquire each initial image; and performing data fusion on the first pose data and the second pose data to obtain the construction pose information adopted for each initial image.
For example, in an image processing method provided by an embodiment of the present disclosure, determining projection information of each mesh in the curved surface projection based on target pose information corresponding to each mesh includes: converting each grid into a world coordinate system based on the target pose information corresponding to each grid to obtain world coordinate information of each grid in the world coordinate system; according to the world coordinate information of each grid, determining curved surface coordinate information of each grid in the curved surface projection; and converting the curved surface coordinate information of each mesh into the projection information.
For example, in the image processing method provided by an embodiment of the present disclosure, for each initial image, based on projection information of each mesh corresponding to the matching image pair in which the initial image is located in the curved projection, mapping the initial image into the target pixel canvas to generate the target image, includes: determining a size of the target pixel canvas; and for each initial image, determining the position of a pixel point in each mesh in the target pixel canvas based on the size of the target pixel canvas and the projection information of each mesh corresponding to the matching image pair in which the initial image is located in the curved surface projection, so as to map each mesh into the target pixel canvas to generate the target image.
For example, in the image processing method provided in an embodiment of the present disclosure, the projection information includes the longitude and latitude of a pixel point in the initial image, and the position of the pixel point in each grid in the target pixel canvas is calculated by an equirectangular mapping of the following form:

c = I_W · (θ / (2π) + 1/2)

r = I_H · (1/2 - φ / π)

wherein c is the column of the pixel point in the target pixel canvas, r is the row of the pixel point in the target pixel canvas, θ is the longitude in the projection information, φ is the latitude in the projection information, I_W is the width of the target pixel canvas, and I_H is the height of the target pixel canvas.
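As an illustration of this mapping, the following sketch converts a longitude/latitude pair to a canvas position, assuming the equirectangular form written above (the exact constants of the patented formula cannot be recovered from the text, so this form is an assumption):

```python
import numpy as np

def lonlat_to_canvas(theta, phi, canvas_w, canvas_h):
    """Map longitude theta and latitude phi (radians) to a (row, col)
    position on an equirectangular target pixel canvas."""
    c = canvas_w * (theta / (2.0 * np.pi) + 0.5)   # column grows with longitude
    r = canvas_h * (0.5 - phi / np.pi)             # row 0 at the top (phi = pi/2)
    return r, c

# Example: a point straight ahead (theta = 0, phi = 0) lands at the canvas centre.
print(lonlat_to_canvas(0.0, 0.0, canvas_w=4096, canvas_h=2048))  # (1024.0, 2048.0)
```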
For example, in an image processing method provided by an embodiment of the present disclosure, fusing the plurality of target images to obtain the stitched image includes: and performing weighted fusion on the plurality of target images to obtain the spliced image.
For example, in an image processing method provided by an embodiment of the present disclosure, performing weighted fusion on the multiple target images to obtain the stitched image includes: determining an overlapping area and a non-overlapping area of each target image in the plurality of target images based on the positions of the plurality of initial images in the target pixel canvas respectively, wherein the overlapping area is an area where a plurality of matched feature points exist between each target image and a target image except the target image in the plurality of target images, and the non-overlapping area is an area except the overlapping area in the target image; determining the weight corresponding to the overlapping area; and performing weighted fusion on the plurality of target images based on the weights corresponding to the overlapping areas.
For example, in an image processing method provided by an embodiment of the present disclosure, determining a weight corresponding to the overlap region includes: determining the Manhattan distance from the pixel points in the overlapping area to the center of the initial image; and determining the weight corresponding to the overlapping region based on the Manhattan distance.
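The following sketch illustrates one way such a distance-based weight could be built; the linear falloff of the Manhattan distance to the image centre is an assumption, since the disclosure only states that the weight is derived from that distance:

```python
import numpy as np

def center_weight_map(height, width):
    """Per-pixel weight that decreases with the Manhattan (L1) distance
    from the pixel to the image centre: 1.0 at the centre, ~0 at the corners."""
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    manhattan = np.abs(rows - cy) + np.abs(cols - cx)
    return 1.0 - manhattan / (manhattan.max() + 1e-12)

# In an overlapping area, two warped target images could then be blended as
#   blended = (w1 * img1 + w2 * img2) / (w1 + w2)
# using each image's own weight map restricted to the overlap.
```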
For example, in an image processing method provided by an embodiment of the present disclosure, performing feature point matching on the plurality of initial images to obtain the at least one matched image pair includes: for each initial image, determining a neighboring initial image of the initial image, wherein the shooting point of the neighboring initial image and the shooting point of the initial image are adjacent to each other; feature point matching is performed on the initial image and the adjacent initial image to obtain the at least one matched image pair.
For example, in an image processing method provided in an embodiment of the present disclosure, the method further includes: and carrying out illumination homogenization treatment on the plurality of target images to enable the illumination intensity of the plurality of target images to be uniform.
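A minimal gain-compensation sketch for such illumination homogenization; the disclosure does not specify the method, so the per-image mean-luminance scaling below is only an illustrative assumption:

```python
import numpy as np

def equalize_illumination(images):
    """Scale each 8-bit image so its mean luminance matches the mean over all images."""
    means = [float(img.mean()) for img in images]
    global_mean = float(np.mean(means))
    return [np.clip(img * (global_mean / m), 0, 255).astype(np.uint8)
            for img, m in zip(images, means)]
```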
For example, in an image processing method provided by an embodiment of the present disclosure, a plurality of initial images are obtained by an image capturing device, and the method further includes: acquiring a shooting pose of an image acquisition device; determining at least one target shooting area in a shooting environment based on the shooting pose; and displaying prompt information based on the at least one target shooting area to prompt a user to acquire a plurality of initial images in the at least one target shooting area.
For example, in an image processing method provided by an embodiment of the present disclosure, displaying prompt information based on at least one shooting area includes: displaying at least one acquisition guide area on the basis of at least one target shooting area, wherein the at least one acquisition guide area corresponds to the at least one target shooting area respectively; displaying prompt information, wherein the prompt information indicates a reference shooting point currently aligned with the image acquisition device; and under the condition that the prompt message falls into a target acquisition guide area in the at least one acquisition guide area, the reference shooting point currently aligned with the image acquisition device is a shooting point in the target shooting area corresponding to the target acquisition guide area.
For example, in an image processing method provided in an embodiment of the present disclosure, the method further includes: and in response to the movement of the image acquisition device, controlling the prompt information to at least surround the acquisition guide area for one circle in the same direction as the movement direction of the image acquisition device so as to acquire a plurality of initial images.
For example, in the image processing method provided by an embodiment of the present disclosure, for each of a plurality of initial images acquired according to the prompt information, each initial image has an overlapping area with an adjacent initial image, and the adjacent initial image is an initial image acquired at a shooting point adjacent to a shooting point corresponding to each initial image.
For example, in an image processing method provided in an embodiment of the present disclosure, the method further includes: converting the stitched image into a three-dimensional image; and outputting the three-dimensional image to show the three-dimensional image as a panoramic image.
At least one embodiment of the present disclosure provides an image processing apparatus including: an acquisition unit configured to acquire a plurality of initial images; a feature point matching unit configured to perform feature point matching on the plurality of initial images to obtain at least one matching image pair, wherein each matching image pair in the at least one matching image pair includes a first initial image and a second initial image, and the first initial image and the second initial image are different initial images between which matched feature points exist; a gridding unit configured to select one of the first initial image and the second initial image in the matching image pair as a gridded image for each matching image pair, and perform gridding processing on the gridded image to divide the gridded image into a plurality of grids; a computing unit configured to compute, for each of the matched image pairs, a mapping matrix corresponding to each of the plurality of meshes based on feature points matched between the first initial image and the second initial image; a mapping unit configured to map each of the plurality of initial images into a target pixel canvas to obtain a plurality of target images based on the mapping matrix corresponding to each grid; and the fusion unit is configured to fuse the target images to obtain a spliced image.
At least one embodiment of the present disclosure provides an electronic device comprising a processor; a memory including one or more computer program modules; one or more computer program modules are stored in the memory and configured to be executed by the processor, the one or more computer program modules including instructions for implementing the image processing method provided by any of the embodiments of the present disclosure.
At least one embodiment of the present disclosure provides a computer-readable storage medium for storing non-transitory computer-readable instructions, which when executed by a computer, can implement an image processing method provided by any embodiment of the present disclosure.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure and are not limiting to the present disclosure.
Fig. 1 illustrates a flowchart of an image processing method according to at least one embodiment of the present disclosure;
FIG. 2A illustrates a schematic diagram of a matched image pair provided by at least one embodiment of the present disclosure;
fig. 2B illustrates a schematic diagram of the division of the first initial image 210 into a plurality of grids provided by at least one embodiment of the present disclosure;
fig. 3A illustrates a flowchart of a method of step S40 in fig. 1 according to at least one embodiment of the present disclosure;
FIG. 3B is a schematic diagram provided by at least one embodiment of the present disclosure to illustrate determining the weight of each pair of feature points for a grid;
fig. 4 illustrates a flowchart of a method of step S50 in fig. 1 according to at least one embodiment of the present disclosure;
fig. 5A illustrates a flowchart of a method of step S51 in fig. 4 according to at least one embodiment of the present disclosure;
fig. 5B illustrates a flowchart of a method of step S512 provided by at least one embodiment of the present disclosure;
FIG. 5C is a schematic diagram illustrating two image sets provided by at least one embodiment of the present disclosure;
fig. 6A illustrates a flowchart of a method of step S52 in fig. 4 according to at least one embodiment of the present disclosure;
FIG. 6B illustrates a schematic view of a spherical projection provided by some embodiments of the present disclosure;
fig. 7A illustrates a flowchart of a method of step S53 in fig. 4 according to at least one embodiment of the present disclosure;
figures 7B and 7C illustrate a schematic diagram of a method of determining a size of a target pixel canvas provided by at least one embodiment of the present disclosure;
FIGS. 7D and 7E illustrate schematic diagrams of a target image 401 and a target image 402, respectively, generated by mapping a first initial image and a second initial image into a target pixel canvas, respectively;
FIG. 8A is a flowchart illustrating a method for performing weighted fusion on a plurality of target images to obtain a stitched image according to at least one embodiment of the present disclosure;
FIGS. 8B and 8C are schematic diagrams illustrating weighted fusion of a plurality of target images to obtain a stitched image according to at least one embodiment of the present disclosure;
fig. 8D is a schematic diagram showing a stitched image obtained by stitching the target image 401 and the target image 402;
fig. 9 is a flowchart illustrating another image processing method according to at least one embodiment of the disclosure;
fig. 10A illustrates a flow chart of another image processing method provided by at least one embodiment of the present disclosure;
fig. 10B is a schematic diagram illustrating a method for determining a target shooting area according to at least one embodiment of the present disclosure;
FIG. 10C is a diagram illustrating a scenario in which a prompt message is displayed according to at least one embodiment of the disclosure;
FIG. 10D is a schematic diagram illustrating an effect of generating a stitched image according to at least one embodiment of the present disclosure;
FIG. 10E is a schematic diagram illustrating another effect of generating a stitched image provided by at least one embodiment of the present disclosure;
fig. 11 illustrates a schematic block diagram of an image processing apparatus 1100 provided in at least one embodiment of the present disclosure;
fig. 12A is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure;
fig. 12B illustrates a schematic block diagram of another electronic device provided by at least one embodiment of the present disclosure; and
fig. 13 illustrates a schematic diagram of a computer-readable storage medium provided by at least one embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings of the embodiments of the present disclosure. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without any inventive step, are within the scope of protection of the disclosure.
Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. Also, the use of the terms "a," "an," or "the" and similar referents do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
At present, image stitching algorithms are embedded in a lot of image processing software, and although the image stitching algorithms can help a user to stitch a plurality of images to a certain extent, the image stitching algorithms have high requirements on image quality, and images used by the user often cannot meet the quality requirements, so that the effect of stitched images generated by the image processing software is poor. Moreover, most of the current image stitching algorithms require depth data, so that the calculation process is complex, and therefore, the mobile terminal often cannot realize image stitching. In the related art, it is common to upload image data such as depth data of a plurality of images to be stitched to a server, and to stitch the plurality of images by using the server, which is time-consuming and labor-consuming.
At least one embodiment of the present disclosure provides an image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium. The image processing method comprises the following steps: acquiring a plurality of initial images; performing feature point matching on the plurality of initial images to obtain at least one matched image pair, wherein each matched image pair in the at least one matched image pair comprises a first initial image and a second initial image, and the first initial image and the second initial image are different initial images with matched feature points in between; for each of the matched image pairs, selecting one of the first initial image and the second initial image in the matched image pair as a gridded image, and performing gridding processing on the gridded image to divide the gridded image into a plurality of grids; calculating a mapping matrix corresponding to each grid in the plurality of grids for each matched image pair based on matched feature points between the first initial image and the second initial image; mapping each initial image in the plurality of initial images to a target pixel canvas to obtain a plurality of target images based on the mapping matrix corresponding to each grid; and fusing the target images to obtain a spliced image. The image processing method can ensure that the image splicing does not depend on depth data, reduces the requirements and the calculation amount of the image splicing on the image quality, is beneficial to realizing the image splicing at a mobile terminal, not only ensures that the image splicing is simpler to realize and has lower cost, but also ensures that the image splicing effect is better and more natural.
Fig. 1 shows a flowchart of an image processing method according to at least one embodiment of the present disclosure.
As shown in FIG. 1, the method may include steps S10-S60.
Step S10: a plurality of initial images are acquired.
Step S20: feature point matching is performed on the plurality of initial images to obtain at least one matched image pair.
Step S30: for each matching image pair, one of the first initial image and the second initial image in the matching image pair is selected as a gridded image, and the gridded image is subjected to gridding processing to divide the gridded image into a plurality of grids.
Step S40: and calculating a mapping matrix corresponding to each grid in the plurality of grids according to the matched characteristic points between the first initial image and the second initial image for each matched image pair.
Step S50: and mapping each initial image in the plurality of initial images into a target pixel canvas to obtain a plurality of target images based on the mapping matrix corresponding to each grid.
Step S60: and fusing the target images to obtain a spliced image.
According to the image processing method, the initial image can be gridded, and the mapping matrix corresponding to each grid is determined, so that the initial image can be mapped into the target pixel canvas according to the mapping matrix corresponding to each grid to generate the target image, the accuracy of mapping the initial image into the target pixel canvas can be improved, the quality of the spliced image is improved, and the spliced image is more natural. In addition, the image processing method does not need depth information corresponding to a plurality of initial images, does not need the plurality of initial images to meet higher image quality requirements, reduces the requirements and the computation amount of image splicing on the image quality, enables the realization of the image splicing to be simpler, does not depend on higher hardware conditions (such as a depth camera, a panoramic camera and the like), and is lower in time cost. In addition, because the image processing method has a small calculation amount, the image processing method can be directly realized at a mobile terminal without uploading to a server, namely, the image processing method is realized without depending on a network and can be suitable for any indoor or outdoor shooting scene.
For step S10, a plurality of initial images may be captured on site by the image capture device, or read from a local (e.g., mobile) storage device, for example. The manner in which the plurality of initial images are acquired is not limited by this disclosure.
The plurality of initial images may be any images that the user wants to stitch together. There may or may not be an overlapping region between the plurality of initial images. An overlapping region refers to image content that appears in each of at least two initial images. For example, if the plurality of initial images includes a first initial image and a second initial image, the first initial image includes first image content, and the second initial image also includes the first image content, then the pixel region corresponding to the first image content in the first initial image and the pixel region corresponding to the first image content in the second initial image form an overlapping region.
For example, the plurality of initial images may be a plurality of images obtained by the user photographing in a plurality of directions at one or more photographing places, respectively. The shooting location may be any location where shooting can be performed, such as a room, an attraction, a street, a mall, and the like. For another example, the plurality of initial images may be a plurality of images selected by the user from an image library stored in the local mobile terminal, or a plurality of images downloaded by the user from the network.
For step S20, each pair of matching images in at least one pair of matching images includes a first initial image and a second initial image, which are different initial images between which there are matching feature points.
In some embodiments of the present disclosure, in step S20, the feature point matching is performed on a plurality of initial images, and besides at least one matching image pair can be obtained, at least one single initial image without matching feature points with any other initial image can be obtained.
In some embodiments of the present disclosure, for example, feature point matching pairs between the plurality of initial images are obtained quickly by performing feature point matching on the plurality of initial images using the GMS (Grid-based Motion Statistics for Fast, Ultra-robust Feature Correspondence) feature point matching method, and at least one matched image pair is then determined by screening the plurality of feature point matching pairs. For example, the plurality of feature point matching pairs are filtered using RANSAC (Random Sample Consensus) to determine the at least one matched image pair.
It is understood that other methods of feature point matching may be used by those skilled in the art, and the present disclosure is not limited to the method of feature point matching.
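A sketch of the GMS-plus-RANSAC screening described above, using OpenCV; this assumes the opencv-contrib build, whose xfeatures2d module exposes matchGMS, and the detector choice and thresholds are illustrative rather than taken from the disclosure:

```python
import cv2
import numpy as np

def match_pair(img1, img2, min_inliers=30):
    """Return RANSAC-filtered point correspondences between two images,
    or None if they do not form a matched image pair."""
    orb = cv2.ORB_create(nfeatures=5000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING).match(des1, des2)
    # GMS keeps matches whose neighbourhoods move consistently (fast, robust).
    gms = cv2.xfeatures2d.matchGMS((img1.shape[1], img1.shape[0]),
                                   (img2.shape[1], img2.shape[0]),
                                   kp1, kp2, matches, withRotation=True)
    if len(gms) < 4:
        return None
    pts1 = np.float32([kp1[m.queryIdx].pt for m in gms])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in gms])
    # RANSAC screens out remaining outlier correspondences.
    _, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)
    inliers = mask.ravel().astype(bool)
    if inliers.sum() < min_inliers:
        return None
    return pts1[inliers], pts2[inliers]
```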
In some embodiments of the present disclosure, the step S20 performs feature point matching on a plurality of initial images to obtain at least one matched image pair, including: for each initial image, determining an adjacent initial image of the initial image, wherein the shooting point of the adjacent initial image and the shooting point of the initial image are adjacent to each other; and performing feature point matching on the initial image and the adjacent initial image to obtain at least one matched image pair.
For example, when the image capturing apparatus captures an initial image at each of the shooting points 1, 2, …, i (i ≥ 2), and shooting point 2 and shooting point 3 are adjacent to shooting point 1, the initial image captured at shooting point 2 and the initial image captured at shooting point 3 may each be matched with the initial image captured at shooting point 1.
The probability that matched feature points exist in a plurality of initial images shot by adjacent shooting points is high, and therefore, the overall matching speed of a system for executing the image processing method provided by the disclosure can be improved by performing feature point matching on the plurality of initial images shot by the adjacent shooting points.
Fig. 2A illustrates a schematic diagram of a matched image pair provided by at least one embodiment of the present disclosure.
As shown in fig. 2A, for example, the matching image pair may comprise a first initial image 210 and a second initial image 220. The plurality of feature points 211-214 in the first initial image 210 are respectively matched with the plurality of feature points 221-224 in the second initial image, and the feature points 211-214 and the feature points 221-224 are matched in a one-to-one correspondence manner.
It is to be understood that fig. 2A is merely a schematic illustration. For example, although only 4 pairs of matched feature points are shown in fig. 2A, in practice the number of matched feature points between the first initial image 210 and the second initial image 220 is usually far more than 4.
For step S30: for example, in the scene shown in FIG. 2A, either the first initial image 210 is selected as the gridded image or the second initial image 220 is selected as the gridded image. Hereinafter, unless otherwise specified, the first initial image 210 is taken as a gridded image, and the gridding process is performed on the first initial image 210 as an example to describe the embodiment of the present disclosure.
It should be noted that, different pairs of matching images may select any one of the first initial image and the second initial image as the gridding image.
Fig. 2B illustrates a schematic diagram of dividing the first initial image 210 into a plurality of grids according to at least one embodiment of the present disclosure.
As shown in fig. 2B, the first initial image 210 is divided into a plurality of grids, for example, 20 grids. The size of each grid may or may not be the same. The size of each grid can be set according to actual requirements, and the smaller the grid is, the higher the quality of the finally obtained spliced image is. For example, the grid may be rectangular, such as square, and embodiments of the present disclosure are not limited in this respect. In some examples, when the shape of the mesh is rectangular, the aspect ratio of each mesh may be the same or substantially the same as the aspect ratio of the gridded image.
In some embodiments of the present disclosure, the first initial image may be uniformly divided into a plurality of meshes, or may be non-uniformly divided into a plurality of meshes. For example, the mesh size is smaller for a matching region in the first initial image 210 in which feature points matching the second initial image 220 are distributed, and the mesh size is larger in a region other than the matching region in the first initial image.
It should be understood that although the schematic diagram shown in fig. 2B shows 4 matched feature points distributed in different grids, there may be any number of matched feature points in each grid. In addition, in fig. 2B, the division of the first initial image into 20 grids is also only a schematic representation, and the disclosure does not limit the number of the grids into which the gridded image is divided, and in fact, the size or the number of the grids can be set by a person skilled in the art according to actual needs. That is, fig. 2B is only an exemplary illustration and is not intended to limit the disclosure.
In some embodiments of the present disclosure, the gridded image is gridded, for example, by rows and columns of pixels in the gridded image. For example, a grid may comprise a 3 x 3 matrix of pixels.
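A short sketch of such a gridding step, returning the pixel bounds of each cell; the uniform division and the 4 x 5 layout are assumptions made for illustration (the disclosure also allows non-uniform grids):

```python
def grid_image(height, width, rows=4, cols=5):
    """Split an image of the given size into rows x cols rectangular grids,
    returned as (top, left, bottom, right) pixel bounds."""
    cells = []
    for i in range(rows):
        for j in range(cols):
            top, bottom = i * height // rows, (i + 1) * height // rows
            left, right = j * width // cols, (j + 1) * width // cols
            cells.append((top, left, bottom, right))
    return cells

# Example: a 20-cell division as in Fig. 2B.
cells = grid_image(height=480, width=640, rows=4, cols=5)
assert len(cells) == 20
```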
For step S40: for example, for a matching image pair composed of the first initial image 210 and the second initial image 220, assuming that there are 4 matching feature points between the first initial image 210 and the second initial image 220, a mapping matrix corresponding to each of 20 grids included in the first initial image is calculated according to the 4 matching feature points. For example, the 4 matched feature points are used to calculate a mapping matrix corresponding to grid 1, a mapping matrix corresponding to grid 2, … …, and a mapping matrix corresponding to grid 20.
Fig. 3A illustrates a flowchart of a method of step S40 in fig. 1 according to at least one embodiment of the present disclosure.
Fig. 3B shows a schematic diagram provided by at least one embodiment of the present disclosure for explaining determining the weight of each pair of feature points for the grid.
As shown in fig. 3A, step S40 may include steps S41 to S43.
Step S41: for each mesh, determining a weight of each pair of feature points of the matched feature points between the first initial image and the second initial image to the mesh.
Each pair of feature points comprises a first feature point and a second feature point, and the first feature point and the second feature point are respectively matched feature points in the first initial image and the second initial image.
As shown in fig. 3B, for example, the feature point 211 in the first initial image 210 and the feature point 221 in the second initial image 220 match, and the feature point 211 and the feature point 221 are a pair of feature points. Similarly, the feature point 212 and the feature point 222 are a pair of feature points, the feature point 213 and the feature point 223 are a pair of feature points, and the feature point 214 and the feature point 224 are a pair of feature points. In step S41, for each mesh, the weights of the above-described 4 pairs of feature point pairs (feature points 211 and 221, feature points 212 and 222, feature points 213 and 223, and feature points 214 and 224) for the mesh, respectively, are determined.
For example, for mesh 1, the weight of a pair of feature points 211 and 221 for mesh 1, the weight of a pair of feature points 212 and 222 for mesh 1, the weight of a pair of feature points 213 and 223 for mesh 1, and the weight of a pair of feature points 214 and 224 for mesh 1 are determined, respectively. Similarly, the weights of each pair of feature points to the grids are also determined for each of the other grids (i.e., grid 2 to grid 20), and are not described in detail herein.
In some embodiments of the present disclosure, for example, step S41 may include determining a distance from the grid to the grid for each feature point in the grid image from the grid image, and determining a weight for the grid for each pair of feature points based on the distance from the grid for each feature point in the grid image.
In some embodiments of the present disclosure, for example, for each feature point, the distance from the feature point to the mesh may be calculated as the distance from the feature point to a preset point of the mesh (e.g., the top-left vertex, the top-right vertex, the center, etc.). In some embodiments of the present disclosure, the distance may be, for example, a Manhattan distance.
For example, in the scenario shown in FIG. 3B, for grid 1, from the gridded image (i.e., the first initial image 210), the Manhattan distance from each of the feature points 211-214 in the first initial image 210 to the top left vertex 301 of grid 1 is determined, thereby determining the weight of each pair of feature points for the grid from the Manhattan distance.
For example, the weight of a pair of feature points 211 and 221 for grid 1 is determined based on the manhattan distance of feature point 211 to the top left vertex 301 of grid 1. The weight of a pair of feature points 212 and 222 to grid 1 is determined based on the manhattan distance of the feature point 212 to the top left vertex 301 of grid 1. The weight of a pair of feature points, feature point 213 and feature point 223, for grid 1 is determined based on the manhattan distance of feature point 213 to the top left vertex 301 of grid 1. The weight of a pair of feature points, feature point 214 and feature point 224, for grid 1 is determined based on the manhattan distance of feature point 214 to the top left vertex 301 of grid 1. Similarly, the weight of each pair of feature points to other grids may be determined.
In some embodiments of the present disclosure, the weight of each pair of feature points to the grid may be derived from the distance from the corresponding feature point in the gridded image to the grid. For example, the step of determining the weight may include: normalizing the distance from each feature point in the gridded image to the grid, and taking the normalized result as the weight for the grid.
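A sketch of one possible per-grid weighting, assuming the Manhattan distance to the grid's top-left vertex and a Gaussian falloff so that closer feature points receive larger weights (the exact weighting function is not fixed by the disclosure):

```python
import numpy as np

def grid_weights(feature_pts, grid_top_left, sigma=0.2):
    """Weight of each matched feature point for one grid, from the Manhattan
    distance between the point (in the gridded image) and the grid's top-left
    vertex; nearer points get larger weights."""
    pts = np.asarray(feature_pts, dtype=np.float64)      # (N, 2) as (x, y)
    tlx, tly = grid_top_left
    d = np.abs(pts[:, 0] - tlx) + np.abs(pts[:, 1] - tly)
    d = d / (d.max() + 1e-12)                            # normalize to [0, 1]
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))        # Gaussian falloff
```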
Step S42: first image coordinates of the feature points in the first initial image are determined, and second image coordinates of the feature points in the second initial image are determined.
For example, in the scenario shown in FIG. 3B, first image coordinates of the feature points 211-214 in the first initial image 210 in the first initial image are determined, respectively, and second image coordinates of the feature points 221-224 in the second initial image are determined, respectively.
In some embodiments of the present disclosure, the coordinate systems of the first initial image and the second initial image may be the same coordinate system, for example, an image coordinate system established by taking the center of the image as the origin, the row direction of the pixels in the image as the X axis, and the column direction of the pixels in the image as the Y axis.
Step S43: and for each grid, determining a mapping matrix corresponding to the grid based on the weight of each pair of feature points to the grid and the first image coordinate and the second image coordinate.
For example, in the scenario shown in fig. 3B, for grid 1, the mapping matrix corresponding to grid 1 is determined based on the weights of 4 pairs of feature points for grid 1, and the first image coordinates of feature points 211 to 214 and the second image coordinates of feature points 221 to 224. Similarly, a mapping matrix corresponding to any grid may be determined based on the weights of the 4 pairs of feature points for any grid, and the first image coordinates of the feature points 211 to 214 and the second image coordinates of the feature points 221 to 224, which is not described herein again.
In some embodiments of the present disclosure, step S43 may include, for each mesh, constructing a singular value decomposition matrix corresponding to the mesh based on the first image coordinates, the second image coordinates, and the weights for the mesh, and performing singular value decomposition on the singular value decomposition matrix to obtain a mapping matrix corresponding to the mesh.
In some embodiments of the present disclosure, the mapping matrix corresponding to the grid may be, for example, a homography matrix representing a mapping relationship between the first initial image and the second initial image.
In some embodiments of the present disclosure, the homography matrix H may be, for example, a 3 x 3 matrix.
H = [ h1  h2  h3
      h4  h5  h6
      h7  h8  h9 ]
For example, if a feature point in the first initial image is represented as P (u, v), a feature point in the second initial image is represented as Q (x, y), and the feature point P and the feature point Q match, the following conversion relationship exists:
s · (u, v, 1)^T = H · (x, y, 1)^T, that is,

u = (h1·x + h2·y + h3) / (h7·x + h8·y + h9)

v = (h4·x + h5·y + h6) / (h7·x + h8·y + h9)

where s is a non-zero scale factor.
therefore, as can be seen from the above conversion relationship, 9 elements of the homography matrix H can be solved by at least 4 pairs of matched feature points, thereby obtaining a mapping matrix between the second initial image and the first initial image. The construction of the singular value decomposition matrix corresponding to the grid based on the first image coordinates, the second image coordinates, and the weights for the grid is described below by taking 4 pairs of matched feature points as an example.
When the two initial images comprise four pairs of matched feature points (x_i, y_i) and (u_i, v_i) (i = 1, 2, 3, 4), the following system of equations can be obtained:

A·h = 0

wherein h = (h1, h2, h3, h4, h5, h6, h7, h8, h9)^T,

A = [ X_1; Y_1; X_2; Y_2; X_3; Y_3; X_4; Y_4 ],

X_i = (x_i, y_i, 1, 0, 0, 0, -u_i·x_i, -u_i·y_i, -u_i)

Y_i = (0, 0, 0, x_i, y_i, 1, -v_i·x_i, -v_i·y_i, -v_i)

That is,

A = [ x_1  y_1  1    0    0    0   -u_1·x_1  -u_1·y_1  -u_1
      0    0    0    x_1  y_1  1   -v_1·x_1  -v_1·y_1  -v_1
      x_2  y_2  1    0    0    0   -u_2·x_2  -u_2·y_2  -u_2
      0    0    0    x_2  y_2  1   -v_2·x_2  -v_2·y_2  -v_2
      x_3  y_3  1    0    0    0   -u_3·x_3  -u_3·y_3  -u_3
      0    0    0    x_3  y_3  1   -v_3·x_3  -v_3·y_3  -v_3
      x_4  y_4  1    0    0    0   -u_4·x_4  -u_4·y_4  -u_4
      0    0    0    x_4  y_4  1   -v_4·x_4  -v_4·y_4  -v_4 ]

Taking the distance between each pair of feature points and the grid as the weight of that pair of feature points for the grid gives W·A·h = 0, wherein W is the weight matrix formed by the weights of the pairs of feature points, for example

W = diag(w_1, w_1, w_2, w_2, w_3, w_3, w_4, w_4)

where w_i is the weight of the i-th pair of feature points for the grid.
Therefore, in the embodiment of the present disclosure, the singular value decomposition matrix corresponding to the grid, constructed based on the first image coordinates, the second image coordinates, and the weights for the grid, may be the matrix obtained by left-multiplying the matrix A by the weight matrix (i.e., WA).
In some embodiments of the present disclosure, feature points close to the grid are given a greater weight, and feature points far from the grid are given a smaller weight, so that the computed homography matrix H is more accurate. This is because feature points closer to the grid are more likely to lie on the same plane as the grid, and a homography matrix H calculated from feature points lying on one plane is more accurate. For example, each pair of feature points is given a different weight because the position of each pair of feature points is different.
In some embodiments of the present disclosure, performing singular value decomposition (SVD) on the singular value decomposition matrix yields the homography matrix H corresponding to the grid.
Although the singular value decomposition matrix is constructed with 4 pairs of matched feature points in the above example of constructing the singular value decomposition matrix, in practice, one skilled in the art may construct the singular value decomposition matrix using any number of matched feature points. For example, the singular value decomposition matrix may be constructed using all the matched feature points, and the homography matrix H may be solved using SVD, which is not limited by the present disclosure.
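The per-grid weighted DLT solve described above can be sketched as follows: build A from the matched coordinates, left-multiply by the diagonal weight matrix W, and take the right singular vector of WA associated with the smallest singular value as the nine entries of H. All matched pairs are used here rather than exactly four, and the function and variable names are illustrative:

```python
import numpy as np

def weighted_homography(src_pts, dst_pts, weights):
    """Estimate the homography H mapping (x, y) in src to (u, v) in dst,
    with one weight per matched pair (weighted DLT solved by SVD)."""
    rows = []
    for (x, y), (u, v), w in zip(src_pts, dst_pts, weights):
        rows.append(w * np.array([x, y, 1, 0, 0, 0, -u * x, -u * y, -u]))
        rows.append(w * np.array([0, 0, 0, x, y, 1, -v * x, -v * y, -v]))
    WA = np.vstack(rows)                       # (2N, 9) weighted design matrix
    _, _, vt = np.linalg.svd(WA)
    H = vt[-1].reshape(3, 3)                   # right singular vector, smallest sigma
    return H / H[2, 2]                         # fix the usual h9 = 1 scale
```

Combined with the weights from the previous step, each grid then has its own mapping matrix, which is what step S50 uses to map the initial image into the target pixel canvas.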
Fig. 4 illustrates a flowchart of a method of step S50 in fig. 1 according to at least one embodiment of the present disclosure.
As shown in fig. 4, the method may include steps S51 to S53.
Step S51: and determining target pose information corresponding to each grid based on the mapping matrix corresponding to each grid.
For example, the target pose information respectively corresponding to each grid may refer to a pose adopted when the image acquisition device acquires an initial image to which the grid belongs. The pose adopted by the image acquisition device when acquiring each initial image can be recovered through the mapping matrix corresponding to each grid. In the embodiment of the disclosure, an error exists between the target pose information restored by the mapping matrix corresponding to each grid and the actual pose respectively adopted when the image acquisition device acquires each initial image, and the smaller the error is, the higher the quality of the finally obtained image mosaic is.
Step S51 is explained in detail in the following embodiments and is not repeated here.
Step S52: and determining projection information of each grid in the curved surface projection based on the target pose information corresponding to each grid.
In some embodiments of the present disclosure, the curved surface projection may include, for example, a spherical projection, a cylindrical projection, etc., and the present disclosure does not limit the type of the curved surface projection. For example, in an application scene in which a plurality of initial images are stitched to obtain a panoramic image, the curved surface projection may be a spherical surface projection, so that it can be ensured that top and bottom information of the stitched panoramic image is not lost under a certain condition, and a 3D effect can be perfectly displayed. The top and bottom are for example relative to the image capturing device, e.g. the top may refer to the area on the side of the image capturing device remote from the ground and the bottom may refer to the area on the side of the image capturing device close to the ground. For example, in an application scenario in which a plurality of initial images are stitched to obtain a panoramic image of a room, the curved surface projection may be a spherical surface projection, so that it can be ensured that the stitched panoramic image does not lose information of a roof and a floor under a certain condition. The "certain condition" may be, for example, that the angle of view of the image pickup device is large.
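As an illustration of how a grid point could be taken into such a spherical projection, the sketch below converts a 3-D point to longitude and latitude; the axis conventions (camera looking along +z, y pointing up) are assumptions, since the disclosure does not fix them:

```python
import numpy as np

def world_to_lonlat(p):
    """Convert a 3-D point (in the world frame, already transformed by the
    grid's target pose) to longitude/latitude on the unit sphere."""
    p = np.asarray(p, dtype=float)
    x, y, z = p / np.linalg.norm(p)
    theta = np.arctan2(x, z)      # longitude in (-pi, pi]
    phi = np.arcsin(y)            # latitude in [-pi/2, pi/2]
    return theta, phi
```

The resulting (θ, φ) can then be placed on the target pixel canvas with the equirectangular mapping sketched earlier.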
Step S52 is likewise explained in detail in the following embodiments and is not repeated here.
Step S53: and for each initial image, mapping the initial image into a target pixel canvas to generate a target image based on the projection information of each mesh corresponding to the matching image pair in which the initial image is positioned in the curved surface projection.
In some embodiments of the present disclosure, the projection information includes longitude and latitude of a pixel point in the initial image.
As shown in fig. 3B, for the initial image 210, according to the projection information of each mesh in the initial image 210 in the surface projection, the initial image 210 is mapped into the same target pixel canvas to generate a target image corresponding to the initial image 210.
Fig. 5A illustrates a flowchart of a method of step S51 in fig. 4 according to at least one embodiment of the present disclosure.
As shown in fig. 5A, step S51 may include steps S511 to S513.
Step S511: a reference image is determined from a plurality of initial images.
For example, one initial image is selected from the plurality of initial images as a reference image, and accordingly, the pose of the reference image is taken as the reference pose.
For example, the plurality of initial images may include an initial image 1 to an initial image N, and one initial image is selected from the initial image 1 to the initial image N as a reference image.
In some embodiments of the present disclosure, for example, the plurality of initial images may be sorted, and the initial image ranked first (or the initial image ranked last) may be selected as the reference image. For example, the ordering may follow the acquisition order of the initial images. Sorting the initial images in each matching image group by acquisition order facilitates the subsequent calculation of the target pose information from the mapping matrices. In other embodiments of the present disclosure, for example, a selection criterion for the reference image may be set, and an initial image is then selected from the plurality of initial images as the reference image according to the selection criterion. For example, the selection criterion may be the number of matched feature points, the size of the overlapping area, the size of the connected area, and the like. For example, one initial image is selected as the reference image from the plurality of initial images based on the feature points matched between different initial images in each matching image pair. For example, if the number of matched feature points between initial image 1 and initial image 2 is 100, the number between initial image 2 and initial image 3 is 200, and the number between initial image 3 and initial image 4 is 50, initial image 2 may be selected as the reference image.
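As an illustration of the matched-feature-point criterion (a sketch only; the function name and the data layout are assumptions, not part of the disclosure), the reference image can be chosen as the image whose matching pairs contribute the most matched feature points in total:

```python
from collections import defaultdict

def select_reference_image(match_counts):
    """Pick the reference image as the image with the most matched feature points.

    match_counts: dict mapping a pair of image indices to the number of feature
    points matched between the two images, e.g. {(1, 2): 100, (2, 3): 200, (3, 4): 50}.
    """
    totals = defaultdict(int)
    for (i, j), count in match_counts.items():
        totals[i] += count
        totals[j] += count
    # With the example above, image 2 scores 100 + 200 = 300 and is selected,
    # matching the choice of initial image 2 described in the text.
    return max(totals, key=totals.get)
```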
Step S512: and determining first pose information of each grid relative to the reference image based on the mapping matrix corresponding to each grid and the reference image.
As shown in fig. 3B, first pose information of grid 1 with respect to the reference image (i.e., second initial image 220) is determined, for example, based on the mapping matrix to which grid 1 corresponds. The first pose information may include, for example, a rotation matrix and a translation vector. The rotation matrix and translation vector of grid 1 with respect to the second initial image 220 may be recovered by the mapping matrix of grid 1.
For another example, in addition to the matching feature points between the first initial image 210 and the second initial image 220, there are also matching feature points between the first initial image 210 and the initial image N, that is, the first initial image 210 and the initial image N are also a matching image pair. In this embodiment, for example, the initial image N may be gridded to determine a mapping matrix for each grid in the initial image N relative to the first initial image 210, so that the rotation matrix and the translation vector of each grid in the initial image N relative to the first initial image 210 may be recovered from the mapping matrix. The first pose information of the initial image N relative to the reference image may then be determined by combining this rotation matrix and translation vector with the rotation matrix and translation vector between the first initial image 210 and the reference image (i.e., the second initial image 220).
Similarly, first pose information of the images in any one of the matched image pairs with respect to the reference image may be determined.
In some embodiments of the present disclosure, the plurality of initial images may be divided into a plurality of image groups, each image group including at least one of the at least one pair of matching images, and there may be matching feature points between different pairs of matching images in each image group, and there may be no matching feature points between different image groups.
For example, the initial images within one image group are associated with one another, while different image groups are not associated. For example, there are matching feature points between initial image 1 and initial image 2, between initial image 2 and initial image 3, and between initial image 4 and initial image 5, but neither initial image 4 nor initial image 5 has matching feature points with any of initial image 1, initial image 2, and initial image 3; that is, initial image 1 and initial image 3 are associated through initial image 2, while initial image 4 and initial image 5 are not associated with them. Accordingly, initial image 1, initial image 2, and initial image 3 can form one image group, and initial image 4 and initial image 5 can form another image group.
For example, there is a matching feature point between each initial image in an image group and at least one other initial image in the same image group. For example, the at least one image group includes a first image group including N initial images, and for each of the N initial images there are matching feature points between that initial image and at least one of the other N-1 initial images in the first image group. For example, the at least one image group further includes a second image group, and there is no matching feature point between the first image group and the second image group.
In some embodiments of the present disclosure, for example in step S20, the plurality of initial images may form a single image group, that is, the plurality of initial images are all associated, and for any one of the plurality of initial images there is another initial image with which it shares matching feature points. For example, in some embodiments of the present disclosure, the N initial images are initial image 1 to initial image N, there are matching feature points between initial image 1 and initial image 2, between initial image 2 and initial image 3, and so on up to initial image N-1 and initial image N, and there are also matching feature points between initial image N and initial image 1.
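By way of illustration only (not part of the disclosed method; the function name is hypothetical and images are indexed from 0), the grouping described above amounts to computing the connected components of the graph whose edges are the matching image pairs, for example with a small union-find structure:

```python
def group_images(num_images, matching_pairs):
    """Split images 0..num_images-1 into image groups.

    Two images belong to the same group if they are connected through a chain
    of matching image pairs (pairs that share matched feature points).
    """
    parent = list(range(num_images))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for i, j in matching_pairs:
        union(i, j)

    groups = {}
    for img in range(num_images):
        groups.setdefault(find(img), []).append(img)
    return list(groups.values())

# The example above (images 1-3 chained, images 4-5 forming a separate pair),
# written with 0-based indices, yields [[0, 1, 2], [3, 4]]:
print(group_images(5, [(0, 1), (1, 2), (3, 4)]))
```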
Fig. 5B shows a flowchart of a method of step S512 provided by at least one embodiment of the present disclosure.
As shown in fig. 5B, step S512 may include step S5121 and step S5122.
Step S5121: for each grid, in the case that the initial image to which the grid belongs and the reference image belong to the same image group, first pose information of the grid relative to the reference image is determined based on the mapping matrix corresponding to the grid and the reference image.
Step S5121 and the following steps are described below with reference to fig. 5C, taking two image groups as an example.
Fig. 5C illustrates a schematic diagram of two image groups provided by at least one embodiment of the present disclosure.
As shown in fig. 5C, the two image groups are an image group 510 and an image group 520, respectively; the image group 510 includes an initial image 311, an initial image 312, an initial image 313, and so on, and the image group 520 includes an initial image 321 and so on.
For step S5121, suppose, for example, that the initial image 311 in the image group 510 is selected as the reference image.
For example, the initial image 312 and the initial image 311 are a matching image pair, and the initial image 312 is a gridded image. For each mesh in the initial image 312, the mapping matrix corresponding to the mesh may be obtained by the method provided in step S40, so that the first pose information (i.e., the rotation matrix and the translation vector) of each mesh in the initial image 312 relative to the reference image (i.e., the initial image 311) may be recovered from the mapping matrix. For example, the initial image 312 and the initial image 313 are also a matching image pair, and the initial image 313 is a gridded image. For each mesh in the initial image 313, the mapping matrix corresponding to the mesh may be obtained by the method provided in step S40, so that the pose information of each mesh in the initial image 313 relative to the initial image 312 may be recovered from the mapping matrix, and the first pose information of each mesh in the initial image 313 relative to the initial image 311 may then be determined by composing this pose information with the first pose information of each mesh in the initial image 312 relative to the initial image 311. By analogy, the first pose information of each grid in the image group 510 can be obtained.
In some embodiments of the present disclosure, the first pose information may be recovered, for example, by decomposing the mapping matrix. For example, the mapping matrix is decomposed numerically or analytically to obtain the rotation matrix and the translation vector between the two initial images in a matching image pair. Each initial image in the image group is then converted into first pose information relative to the reference pose according to the plurality of matching image pairs.
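For example, when the mapping matrix is a homography and the camera intrinsic matrix is known, the numerical decomposition mentioned above can be carried out with OpenCV's cv2.decomposeHomographyMat. The sketch below is illustrative only (the function name is hypothetical, and selecting the physically valid candidate, e.g. by checking that matched feature points end up in front of both cameras, is left as an additional step):

```python
import cv2
import numpy as np

def homography_to_pose_candidates(H, K):
    """Decompose a grid's homography into candidate (R, t) pairs.

    H: 3x3 homography between a grid and the other image of the matching pair.
    K: 3x3 camera intrinsic matrix of the image acquisition device.
    Returns a list of (rotation matrix, translation vector) candidates.
    """
    H = np.asarray(H, dtype=np.float64)
    K = np.asarray(K, dtype=np.float64)
    num, rotations, translations, _normals = cv2.decomposeHomographyMat(H, K)
    return [(rotations[i], translations[i].ravel()) for i in range(num)]
```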
Step S5122: in the case that the initial image to which the grid belongs and the reference image do not belong to the same image group, determining a pose relationship between the image group in which the initial image to which the grid belongs and the image group in which the reference image belongs, and determining first pose information of the grid relative to the reference image based on the pose relationship and a mapping matrix corresponding to the grid.
In some embodiments of the present disclosure, for example, determining the pose relationship between the image group in which the initial image to which the grid belongs is located and the image group in which the reference image is located includes: determining the pose relationship between the two image groups based on the construction pose information corresponding to at least one initial image included in each of the two image groups. The construction pose information comprises the pose, constructed by the sensor, that the image acquisition device adopts in the process of acquiring each initial image.
In some embodiments of the present disclosure, the plurality of initial images are obtained by an image capturing device, the image capturing device includes a sensor, and the image processing method may further include, on the basis of the foregoing embodiments: and acquiring construction pose information constructed by the sensor, wherein the construction pose information comprises a pose adopted in the process of acquiring each initial image by the image acquisition device so as to determine the pose relationship among different image groups according to the construction pose information.
For example, the image capture device may be a mobile terminal that includes a camera. In the embodiment of the present disclosure, the sensor may include hardware such as a camera and a gyroscope installed at the mobile terminal, and may further include software for processing data of the camera and the gyroscope.
In some embodiments of the present disclosure, for example, the sensor may be a simultaneous localization and mapping sensor, which constructs the pose adopted by the image acquisition device in the process of acquiring each initial image through a simultaneous localization and mapping (SLAM) algorithm. For example, the sensor may be an Inertial Measurement Unit (IMU), and the IMU may construct the pose adopted by the image acquisition device in the process of acquiring each initial image through the 9-axis sensors in the mobile terminal (including a 3-axis accelerometer, a 3-axis gyroscope, and a 3-axis magnetometer).
In other embodiments of the present disclosure, the sensor includes a first sensor and a second sensor, and acquiring the construction pose information corresponding to each of the plurality of initial images includes: acquiring first pose data and second pose data, and fusing the first pose data and the second pose data to obtain the construction pose information corresponding to each of the plurality of initial images. The first pose data is the pose, constructed by the first sensor, adopted by the image acquisition device to acquire each initial image, and the second pose data is the pose, constructed by the second sensor, adopted by the image acquisition device to acquire each initial image. Obtaining the construction pose information by fusing the first pose data and the second pose data can improve the accuracy and stability of the construction pose information, and thus improve the quality of the stitched image.
In some embodiments of the present disclosure, for example, the first sensor may be the aforementioned simultaneous localization and mapping sensor, and the second sensor may be the aforementioned IMU.
It should be understood that taking the first sensor as the simultaneous localization and mapping sensor and the second sensor as the IMU is only one embodiment provided by the present disclosure; those skilled in the art may use any sensors capable of acquiring pose information to obtain the first pose data and the second pose data.
In some embodiments of the present disclosure, fusing the first pose data and the second pose data to obtain the construction pose information corresponding to each of the plurality of initial images includes: fusing the first pose data and the second pose data with an extended Kalman filter to obtain the construction pose information corresponding to each of the plurality of initial images. Performing Extended Kalman Filtering (EKF) on the first pose data and the second pose data makes the fusion of the two efficient.
Of course, those skilled in the art may also adopt data fusion methods other than the EKF, such as an algebraic method or an image regression method, or may directly tightly couple the pose data constructed by different sensors. The present disclosure does not limit the data fusion method for the first pose data and the second pose data.
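As a rough illustration of such fusion (a deliberately simplified stand-in, not the EKF or the other methods named above; the function name, the 6-vector pose parameterization, and the per-component variances are assumptions), two pose estimates of the same image can be combined with a Kalman-style gain:

```python
import numpy as np

def fuse_pose_estimates(pose_a, var_a, pose_b, var_b):
    """Fuse two pose estimates of the same image with a Kalman-style gain.

    pose_a, pose_b: 6-vectors (3 translation components + 3 rotation-vector
    components) built by two different sensors for the same initial image.
    var_a, var_b: per-component variances expressing how much each sensor is
    trusted; a smaller variance gives that sensor a larger weight.
    """
    pose_a, pose_b = np.asarray(pose_a, float), np.asarray(pose_b, float)
    var_a, var_b = np.asarray(var_a, float), np.asarray(var_b, float)
    gain = var_a / (var_a + var_b)          # Kalman gain per component
    fused = pose_a + gain * (pose_b - pose_a)
    fused_var = (1.0 - gain) * var_a        # variance of the fused estimate
    return fused, fused_var
```

Component-wise averaging of rotation vectors is only reasonable when the two orientation estimates are close; a full EKF would instead propagate a state and covariance through predict and update steps.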
For example, if the different image groups include a first image group including initial image 1 to initial image N and a second image group including initial image X1 to initial image XN, the pose relationship between any initial image i of initial image 1 to initial image N and any initial image Xk of initial image X1 to initial image XN can be determined from the construction pose information. The pose relationship between the initial image i and the initial image Xk is then taken as the pose relationship between the first image group and the second image group.
For example, in the scenario shown in fig. 5C, the pose relationship between the initial image 311 and the initial image 321 may be determined in step S5122 from the construction pose information, and this pose relationship is taken as the pose relationship between the image group 510 and the image group 520.
In some embodiments of the present disclosure, determining the pose relationship between different image groups based on the construction pose information corresponding to at least one initial image included in each of the different image groups may include: determining, from the construction pose information, the construction poses of two initial images that belong to different image groups (for example, a first construction pose and a second construction pose, respectively), and then calculating a homography matrix between the first construction pose and the second construction pose, so as to recover, from the homography matrix, the rotation matrix and the translation vector of the second construction pose relative to the first construction pose, or the rotation matrix and the translation vector of the first construction pose relative to the second construction pose.
For the individual initial images mentioned above, the pose relationship between such an initial image and the reference image can likewise be determined from the construction pose information.
In step S5122, after determining the pose relationship between the image group in which the initial image to which the mesh belongs and the image group in which the reference image belongs, the first pose information of the mesh with respect to the reference image may be determined based on the pose relationship and the mapping matrix to which the mesh corresponds.
For example, in the scenario shown in fig. 5C, for the initial image 321 in the image group 520, since the reference image is the initial image 311 in the image group 510, it is necessary to determine the pose relationship between the image group 510 and the image group 520, and then convert the mapping matrix corresponding to a mesh in the initial image 321 into first pose information relative to the reference image (i.e., the initial image 311) according to that pose relationship. The pose relationship between the image group 510 and the image group 520 may be determined, for example, based on the construction pose information described above; for example, the pose relationship between the initial image 311 and the initial image 321 is determined from the construction pose information and taken as the pose relationship between the image group 510 and the image group 520.
Step S513: target pose information for each grid is determined based on first pose information for each grid relative to a reference image.
In some embodiments of the present disclosure, the first pose information of each mesh relative to the reference image is fused with the construction pose information to obtain the target pose information of each mesh.
For example, the first pose information of each mesh with respect to the reference image and the constructed pose information corresponding to the initial image to which the mesh belongs are data-fused to obtain the target pose information of each mesh.
In some embodiments of the present disclosure, fusing the first pose information of each grid relative to the reference image with the construction pose information to obtain the target pose information of each grid includes: converting the construction pose information into a coordinate system that takes the reference image as the reference, to obtain second pose information; fusing the second pose information with the first pose information to obtain a fusion screening result; and determining the target pose information based on the fusion screening result.
In some embodiments of the present disclosure, the construction pose information and the first pose information are data-fused to obtain a fusion filtering result, for example, using EKF.
In some embodiments of the present disclosure, the data fusion may further comprise data screening. For example, the second pose information and the first pose information may be scored, and the fusion screening result may be selected from the second pose information and the first pose information according to the scores. For example, if the score of the first pose information is higher than the score of the second pose information, the first pose information may be used as the fusion screening result. For example, for a matching image pair including an initial image A and an initial image B, the initial image B is mapped into the initial image A according to the second pose information to obtain a first mapping result, and the initial image B is mapped into the initial image A according to the first pose information to obtain a second mapping result; the differences between each mapping result and the initial image A are then compared, and the first pose information and the second pose information are scored according to these differences, the score being inversely related to the difference between the corresponding mapping result and the initial image.
In some embodiments of the present disclosure, for each matching image pair, the more accurate of the first pose information and the second pose information can be selected as the data screening result using the method described above.
In some embodiments of the present disclosure, for example, determining target pose information based on the fusion screening results may include: and processing the fusion screening result by using a parameter optimization method to obtain target pose information.
The parameter optimization method is used for further optimizing the fusion screening result, so that the accuracy of the target pose information can be further improved, the image splicing quality can be improved, and the spliced image obtained by image splicing is more natural.
The parameter optimization method may be, for example, a Bundle Adjustment (BA). Of course, those skilled in the art may also optimize the fusion screening result by using other parameter optimization methods to obtain the target pose information.
Fig. 6A illustrates a flowchart of a method of step S52 in fig. 4 according to at least one embodiment of the present disclosure.
As shown in fig. 6A, step S52 may include steps S521 to S523.
Step S521: and converting each grid into a world coordinate system based on the target pose information corresponding to each grid to obtain the world coordinate information of each grid in the world coordinate system.
In some embodiments of the present disclosure, for example, for each grid, the target pose information may be utilized to convert each pixel point in the grid from the pixel plane coordinate system to the world coordinate system, so as to obtain world coordinate information of the grid in the world coordinate system. The world coordinate information of the grid in the world coordinate system is the world coordinate of each pixel point in the grid in the world coordinate system. Each pixel point in the grid is converted from a pixel plane coordinate system to a world coordinate system, and the conversion can be carried out according to a camera calibration method.
For example, 4 coordinate systems are used in the camera calibration process, respectively, a world coordinate system, a camera coordinate system, an image physical coordinate system, and a pixel plane coordinate system. According to the conversion relation among the 4 coordinate systems in the camera calibration method, the following conversion relation can be obtained between the coordinates of the pixel points in the pixel plane coordinate system and the world coordinates of the pixel points in the world coordinate system:
$$Z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M_1 M_2 \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$

$$M_1 = \begin{bmatrix} f/dx & 0 & u_0 \\ 0 & f/dy & v_0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad M_2 = \begin{bmatrix} R & t \end{bmatrix}$$

wherein $R$ is the rotation matrix, $t$ is the translation vector, $(u, v)$ are the pixel coordinates of a pixel point in the pixel plane coordinate system, $Z_C$ is the Z-axis coordinate of the pixel point in the camera coordinate system, $dx$ and $dy$ characterize the physical size of a single pixel in the x direction and in the y direction, respectively, $(u_0, v_0)$ are the coordinates of the origin of the image physical coordinate system in the pixel plane coordinate system, $f$ is the focal length of the camera, $(X_w, Y_w, Z_w)$ are the world coordinates of the pixel point in the world coordinate system, $M_1$ is the camera intrinsic matrix, and $M_2$ is the camera extrinsic matrix.
The camera extrinsic matrix $M_2$ is determined by the pose (i.e., the position and orientation) of the camera in three-dimensional space.
In some embodiments of the present disclosure, for each pixel point in the grid, the pixel coordinate of each pixel point in the pixel plane coordinate system may be converted into a world coordinate in the world coordinate system according to the above conversion relationship. For example, for a grid, the target pose information (i.e., R and t) corresponding to the grid and the pixel coordinates of each pixel point in the grid are substituted into the transformation relation, so that the world coordinates of each pixel point in the world coordinate system are calculated by using the transformation relation.
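As an illustration of this conversion (a sketch only, assuming the convention X_cam = R·X_world + t and a known depth along the camera Z axis; the function name is hypothetical), a single pixel of a grid can be back-projected as follows:

```python
import numpy as np

def pixel_to_world(u, v, z_c, K, R, t):
    """Convert pixel coordinates (u, v) into world coordinates.

    K: 3x3 intrinsic matrix M1 (focal length and principal point).
    R, t: the grid's target pose (rotation matrix and translation vector),
          assuming the convention X_cam = R @ X_world + t.
    z_c: depth of the point along the camera Z axis; for the unit-sphere
         projection used below, any positive depth gives the same direction.
    """
    pixel = np.array([u, v, 1.0])
    x_cam = z_c * np.linalg.inv(K) @ pixel             # camera coordinates
    x_world = R.T @ (x_cam - np.asarray(t, float))     # world coordinates
    return x_world
```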
Step S522: and determining the curved surface coordinate information of each mesh in the curved surface projection according to the world coordinate information of each mesh.
As described above, the curved surface projection may include, for example, a spherical projection, a cylindrical projection, etc., and the present disclosure does not limit the type of the curved surface projection. The method of step S522 is explained below by taking spherical projection as an example.
For example, for each mesh, the world coordinates of each pixel point in the mesh are converted to spherical coordinates in a spherical projection. The curved surface coordinate information of each grid in the spherical projection is the spherical coordinate of each pixel point in each grid in the spherical projection.
Fig. 6B illustrates a schematic view of a spherical projection provided by some embodiments of the present disclosure.
For example, the origin of the spherical coordinate system is at the center of the sphere, and the spherical projection surface is a unit sphere.
As shown in fig. 6B, a pixel point $(X_w, Y_w, Z_w)$ in the world coordinate system has spherical coordinates $(x', y', z')$ in the spherical projection.
In the spherical projection shown in fig. 6B, normalizing the pixel point from world coordinates to spherical coordinates can be calculated according to the following formula:
$$x' = \frac{X_w}{\sqrt{X_w^2 + Y_w^2 + Z_w^2}}, \qquad y' = \frac{Y_w}{\sqrt{X_w^2 + Y_w^2 + Z_w^2}}, \qquad z' = \frac{Z_w}{\sqrt{X_w^2 + Y_w^2 + Z_w^2}}$$
step S523: and converting the curved surface coordinate information of each mesh in the curved surface projection into projection information.
In some embodiments of the present disclosure, the projection information includes the longitude and latitude of the pixel points in the initial image.
In some embodiments of the present disclosure, in order to represent the three-dimensional spherical coordinates as planar coordinates, two variables $\theta$ and $\varphi$ are established. $\theta$ is the angle between the Z axis and the projection, onto the ZX plane, of the ray from the origin to the point $(x', y', z')$, with the clockwise direction taken as negative; $\varphi$ is the angle between the Y axis and the ray, with the clockwise direction taken as negative.
The conversion relationship between the spherical coordinates and the planar coordinates is as follows:
$$\theta = \arctan\frac{x'}{z'}, \qquad \varphi = \arccos\left(y'\right)$$
From the above equations, $\theta$ and $\varphi$ can be obtained, where $\theta$ is the longitude in the projection information and $\varphi$ is the latitude in the projection information.
The embodiment described in fig. 6A converts the pose information into projection information of the initial image in the curved surface projection, so that the influence of the translation between the plurality of initial images can be at least partially eliminated, and the initial images do not need to be enlarged, compressed, or otherwise resampled, so that the definition of the images is preserved.
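Putting steps S521 to S523 together (a sketch under the sign conventions assumed above; the function name is hypothetical), a world-coordinate point is turned into longitude and latitude as follows:

```python
import numpy as np

def world_to_lonlat(x_w, y_w, z_w):
    """Project a world-coordinate point onto the unit sphere and return
    (longitude theta, latitude phi) as defined above."""
    p = np.array([x_w, y_w, z_w], dtype=float)
    x, y, z = p / np.linalg.norm(p)           # spherical coordinates (x', y', z')
    theta = np.arctan2(x, z)                  # angle to the Z axis within the ZX plane
    phi = np.arccos(np.clip(y, -1.0, 1.0))    # angle to the Y axis
    return theta, phi
```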
Fig. 7A illustrates a flowchart of a method of step S53 in fig. 4 according to at least one embodiment of the present disclosure.
As shown in fig. 7A, step S53 may include step S61 and step S62.
Step S61: the size of the target pixel canvas is determined.
In some embodiments of the present disclosure, the size of the target pixel canvas may be set by one skilled in the art according to actual needs.
For example, step S61 determining the size of the target pixel canvas may include: acquiring acquisition parameters of an image acquisition device, determining the number of a plurality of initial images, determining a rotation angle between every two adjacent initial images in the plurality of initial images, determining an overlapping area between every two adjacent initial images based on a field angle and the rotation angle, and determining the size of a target pixel canvas based on the number, the overlapping area and the image size.
In some embodiments of the present disclosure, the acquisition parameters include an image size of an image generated by the image acquisition device and a field angle of the image acquisition device.
The image size and the angle of view of the image generated by the image capturing device are determined by the image capturing device itself. For example, the image size generated by the image capture device may be h height x w width, w and h each being an integer greater than 0. For example, w is 1024 and h is 512. The field angle fov of the image capture device may be, for example, 60 °.
For example, the acquisition parameters of the image acquisition device may be obtained by directly reading a parameter table built into the image acquisition device. Alternatively, the acquisition parameters of the image acquisition device may be preset by those skilled in the art.
In some embodiments of the present disclosure, the rotation angle between each two adjacent initial images may be determined by the above-mentioned object pose information, that is, by the rotation matrix R.
Figures 7B and 7C illustrate schematic diagrams of a method of determining the size of a target pixel canvas provided by at least one embodiment of the present disclosure.
The above embodiment is described below by taking two adjacent initial images, a first initial image and a second initial image, as an example. For example, as shown in fig. 7B, the field angle fov of the camera is 60°, and if the camera rotates by 30° from the moment the first initial image is captured to the moment the second initial image is captured, the field of view when capturing the first initial image and the field of view when capturing the second initial image overlap (the shaded portion in fig. 7B). From the field angle fov of 60° and the rotation angle of 30°, it can be estimated that the overlapping portion occupies 1/2 of the camera's field of view, and therefore the overlapping area between the two adjacent initial images occupies 1/2 of each initial image. As shown in fig. 7C, in this embodiment, the overlapping area between the first initial image and the second initial image is the area A (i.e., the area filled with horizontal lines). That is, the image content corresponding to the area A appears both in the first initial image and in the second initial image; the pixel region corresponding to the area A occupies approximately 1/2 of the entire pixel region of the first initial image, and likewise approximately 1/2 of the entire pixel region of the second initial image.
For example, if the overlapping area of the first initial image and the second initial image is area a and the image size of one image is h height × w width, the size of the target pixel canvas determined by the first initial image and the second initial image may be h height × 3w/2 width or slightly larger than h height × 3w/2 width.
For example, if the number of initial images is N, the field angle fov of the camera is 60°, and the rotation angle of the camera between every two adjacent initial images is 30°, each initial image after the first contributes approximately w/2 of new (non-overlapping) width, so the size of the target pixel canvas may be h high × (N+1)w/2 wide, or slightly larger.
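The following sketch estimates the canvas size from the acquisition parameters (an illustration only; the function name and the open-chain assumption, i.e. that the images do not wrap around a full circle, are not part of the disclosure):

```python
def target_canvas_size(num_images, image_w, image_h, fov_deg, rotation_deg):
    """Estimate the target pixel canvas size for a chain of initial images.

    Adjacent images overlap by (fov - rotation) / fov of an image width, so
    every image after the first adds only the non-overlapping part. With
    fov=60 and rotation=30 this reproduces the 3w/2 two-image example above.
    """
    overlap_ratio = max(0.0, (fov_deg - rotation_deg) / fov_deg)
    new_ratio = 1.0 - overlap_ratio
    canvas_w = int(round(image_w * (1 + (num_images - 1) * new_ratio)))
    return canvas_w, image_h

print(target_canvas_size(2, 1024, 512, 60, 30))   # -> (1536, 512), i.e. 3w/2 wide
```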
Step S62: and for each initial image, determining the position of a pixel point in each grid in the target pixel canvas based on the size of the target pixel canvas and the projection information of each grid corresponding to the matching image pair where the initial image is located in the curved surface projection, so that each grid is mapped into the target pixel canvas to generate the target image.
In some embodiments of the present disclosure, for each initial image, the position of a pixel point in the target pixel canvas is calculated by the following formulas:
$$c = \left(\frac{\theta}{2\pi} + \frac{1}{2}\right) I_W, \qquad r = \frac{\varphi}{\pi} I_H$$
wherein $c$ is the column of the pixel point in the target pixel canvas, $r$ is the row of the pixel point in the target pixel canvas, $\theta$ is the longitude of the pixel point, $\varphi$ is the latitude of the pixel point, $I_W$ is the width of the target pixel canvas, and $I_H$ is the height of the target pixel canvas.
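A small sketch of this mapping (assuming, as above, that θ lies in [-π, π) and φ in [0, π]; the function name is hypothetical):

```python
import numpy as np

def lonlat_to_canvas(theta, phi, canvas_w, canvas_h):
    """Map longitude/latitude to a (row, column) position in the target pixel canvas."""
    c = (theta / (2.0 * np.pi) + 0.5) * canvas_w   # column from longitude
    r = (phi / np.pi) * canvas_h                   # row from latitude
    return int(round(r)), int(round(c))
```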
Figures 7D and 7E show schematic diagrams of a target image 401 and a target image 402 generated by mapping a first initial image and a second initial image, respectively, into a target pixel canvas.
As shown in FIG. 7D, the target image 401 includes a target pixel canvas 411 and a first initial image 421 located within the target pixel canvas 411.
The target pixel canvas 411 has a size of $I_H \times I_W$. The location in the target pixel canvas to which the first initial image 421 is mapped may be determined according to the above calculation of c and r, so that the first initial image 421 is mapped to that location to generate the target image 401.
As shown in FIG. 7E, the target image 402 includes a target pixel canvas 431 and a second initial image 441 located within the target pixel canvas 431.
The target pixel canvas 431 and the target pixel canvas 411 are both the same size and shape. For example, the target pixel canvas 431 and the target pixel canvas 411 are each sized according to the method described above with reference to FIG. 7A. The location in the target pixel canvas to which the second initial image 441 is mapped may be determined from the calculation of c and r above, thereby mapping the second initial image 441 to the location to generate the target image 402.
Similarly, other initial images of the plurality of initial images may be mapped into the target pixel canvas respectively according to the method described above to obtain a plurality of target images.
In some embodiments of the present disclosure, the fusing the plurality of target images to obtain the stitched image in step S70 includes: and carrying out weighted fusion on the plurality of target images to obtain a spliced image.
Fig. 8A is a flowchart illustrating a method for performing weighted fusion on a plurality of target images to obtain a stitched image according to at least one embodiment of the present disclosure.
Fig. 8B and 8C are schematic diagrams illustrating weighted fusion of a plurality of target images to obtain a stitched image according to at least one embodiment of the present disclosure.
An embodiment of performing weighted fusion on a plurality of target images to obtain a stitched image is described below with reference to fig. 8A, 8B, and 8C.
As shown in fig. 8A, the method may include steps S71 to S73.
For step S71: overlapping and non-overlapping regions of each of the plurality of target images are determined based on the position of the plurality of initial images in the target pixel canvas, respectively.
In some embodiments of the present disclosure, the overlapping region is a region where there are a plurality of matching feature points between each target image and other target images than the target image in the plurality of target images, and the non-overlapping region is a region other than the overlapping region in the target images.
Fig. 8B and 8C show the overlapping areas of the target image 401 and the target image 402, respectively. As shown in fig. 8B, since a plurality of feature points in the region W in the target image 401 in fig. 8B match a plurality of feature points in the target image 402, the region W in fig. 8B is an overlapping region. Similarly, since a plurality of feature points in the region V in the target image 402 in fig. 8C match a plurality of feature points in the target image 401, the region V in fig. 8C is an overlapping region.
For step S72: the corresponding weights of the pixel points in the overlap region are determined.
For example, for the target image 401, the weight corresponding to each pixel point in the overlap region W is determined, and for the target image 402, the weight corresponding to each pixel point in the overlap region V is determined.
In some embodiments of the present disclosure, step S72 may include determining a manhattan distance of a pixel point in the overlap region to the initial image center; and determining weights corresponding to the pixel points in the overlap region based on the manhattan distance.
As shown in fig. 8B, for a pixel point Q in the overlapping area W of the target image 401, step S72 determines the Manhattan distance S1 between the pixel point Q and the center O1 of the initial image 421. Similarly, in the target image 402, the feature point matching the pixel point Q is the feature point P, and step S72 determines the Manhattan distance S2 between the pixel point P and the center O2 of the initial image 441. The Manhattan distance S1 and the Manhattan distance S2 are then normalized, and the normalized results are taken as the weights of the pixel point Q and the pixel point P, respectively.
For step S73: and performing weighted fusion on the plurality of target images based on the weights corresponding to the pixel points in the overlapping area to obtain a spliced image.
For example, for the target image 401 and the target image 402, weighted fusion is performed on pixel values corresponding to the pixel point Q and the pixel point P, so as to obtain a pixel value of a pixel point (a pixel point at a position corresponding to the pixel point Q and the pixel point P) in the stitched image.
Fig. 8D is a schematic diagram showing a stitched image obtained by stitching the target image 401 and the target image 402.
For example, for a non-overlapping region in each of the plurality of target images, the pixel value of a pixel in the non-overlapping region is directly used as the pixel value of the pixel in the stitched image. For example, the pixel value of each pixel point in the non-overlapping region (i.e., the region on the left side of the region W in the initial image 421) other than the overlapping region W in the target image 401 is directly used as the pixel value of the pixel point in the stitched image. That is, as shown in fig. 8D, the region H1 in the stitched image is obtained from the non-overlapping region in the target image 401, and the pixel values of the pixels in the region H1 are the pixel values in the non-overlapping region in the target image 401. Similarly, the region H3 in the stitched image is obtained from the non-overlapping region in the target image 402, and the pixel values of the pixels in the region H3 are the pixel values in the non-overlapping region in the target image 402.
For example, for the overlapped region, the pixel values of the pixels in the overlapped region in the two target images are subjected to weighted average calculation, and the result of the weighted average calculation can be used as the pixel value of the pixel in the spliced image. As shown in fig. 8D, the pixel values of the pixels in the region H2 in the stitched image are obtained by performing weighted average calculation based on the pixel values of the pixels in the overlapping region of the target image 401 and the target image 402.
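A sketch of this weighted fusion (illustrative only; the exact normalization of the two Manhattan distances is not spelled out above, so the choice below, which gives a pixel more weight the closer it lies to the center of its own initial image, is an assumption, as are the function names):

```python
import numpy as np

def manhattan_weight(point, image_center):
    """Un-normalized weight of a pixel in an overlapping region: a smaller
    Manhattan distance to its own image center gives a larger weight."""
    d = abs(point[0] - image_center[0]) + abs(point[1] - image_center[1])
    return 1.0 / (1.0 + d)

def blend_overlap(pixel_q, center_q, value_q, pixel_p, center_p, value_p):
    """Weighted fusion of two matched pixels from two target images."""
    w_q = manhattan_weight(pixel_q, center_q)
    w_p = manhattan_weight(pixel_p, center_p)
    total = w_q + w_p   # normalization so that the two weights sum to 1
    return (w_q * np.asarray(value_q, float) + w_p * np.asarray(value_p, float)) / total
```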
For step S70, a person skilled in the art may also use other image fusion methods to fuse a plurality of target images to obtain a stitched image. For example, a plurality of target images may be fused by poisson fusion to obtain a stitched image.
It should be understood that fig. 8D is only an example of a stitched image obtained by stitching two initial images of a plurality of initial images, and actually, the region in the target pixel canvas in fig. 8D other than the stitched image obtained by stitching the two initial images also includes the result of stitching other initial images.
In some embodiments of the present disclosure, the image processing method may further comprise cropping the stitched image. For example, the stitched image in FIG. 8D is cropped to crop off excess portions of the target pixel canvas, i.e., to crop off portions of the target pixel canvas that are not occupied by the stitched image.
Fig. 9 shows a flowchart of another image processing method provided in at least one embodiment of the present disclosure.
As shown in fig. 9, the image processing method further includes a step S70 in addition to the steps S10 to S60 described in fig. 1. Step S70 may be performed, for example, after step S50 and before step S60.
Step S70: and carrying out illumination homogenization treatment on the plurality of target images to enable the illumination intensity of the plurality of target images to be uniform.
In some embodiments of the present disclosure, the illumination homogenization process is performed on multiple target images, for example, using a High-Dynamic Range (HDR) technique. Of course, other image processing methods may be used by those skilled in the art to perform illumination homogenization on multiple target images.
In some embodiments of the present disclosure, the plurality of actually photographed initial images are affected by the exposure of the camera, so that the illumination of the image sequence is not uniform. If no illumination processing is performed, the synthesized stitched image will show obvious illumination seams, that is, non-uniform illumination, which may affect the user experience. Performing illumination homogenization on the plurality of target images makes the image stitching more natural and improves the user experience.
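As a very rough stand-in for the HDR-based processing (gain compensation by matching mean brightness; this is not the disclosed technique, and the function name is hypothetical):

```python
import numpy as np

def equalize_illumination(target_images):
    """Scale every 8-bit target image so that its mean brightness matches the
    mean brightness of the whole sequence. In practice the mean should be
    computed over the region actually covered by the initial image rather
    than over the empty part of the target pixel canvas.
    """
    images = [np.asarray(img, dtype=np.float32) for img in target_images]
    means = [img.mean() for img in images]
    global_mean = float(np.mean(means))
    balanced = []
    for img, m in zip(images, means):
        gain = global_mean / m if m > 0 else 1.0
        balanced.append(np.clip(img * gain, 0, 255).astype(np.uint8))
    return balanced
```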
Fig. 10A illustrates a flowchart of another image processing method provided in at least one embodiment of the present disclosure.
As shown in fig. 10A, the image processing method may further include steps S110 to S130 on the basis of the steps described in fig. 1 or fig. 9. Steps S110 to S130 may be executed before step S10, for example.
Step S110: and acquiring the shooting pose of the image acquisition device.
In some embodiments of the present disclosure, the shooting pose of the image capture device may be user-selected. For example, the user picks up the mobile terminal in a shooting pose to shoot, at which point the mobile terminal can perform shooting initialization, and after the shooting initialization, determines the shooting pose of the mobile terminal. For example, the mobile terminal may be a mobile phone or a tablet computer, or may be other suitable mobile terminal devices.
Step S120: and determining at least one target shooting area in the shooting environment based on the shooting pose.
In some embodiments of the present disclosure, the at least one target photographing region may be preset. For example, at least one target image capture area is determined according to the environment to be captured and the field angle of the image capture device. For example, if the field angle of the image capturing device is 60 ° and the field angles of the two adjacent images captured by the camera overlap by 30 °, 12 target capturing areas can be designed to guide the user to capture a panoramic image of a circle around the capturing position of the user.
For another example, in a scene of shooting a panoramic image, a spherical area is determined according to the current position of the mobile terminal, the center of the sphere being the current position, and at least one circular trajectory parallel to the horizontal plane is selected from the spherical area so that the target shooting area is determined according to the circular trajectory. The circular trajectory may be the circle corresponding to the equator of the spherical area, a circle corresponding to the Tropic of Cancer or the Tropic of Capricorn, or a circle at another latitude, and each circular trajectory is parallel to the horizontal plane.
It should be noted that, in the acquisition process, the spherical region may not be displayed in the graphical user interface of the mobile terminal, and the center of the sphere of the spherical region is the position point of the mobile terminal in the physical space.
Fig. 10B is a schematic diagram illustrating a method for determining a target shooting area according to at least one embodiment of the present disclosure.
As shown in fig. 10B, a spherical area is determined with the current position of the mobile terminal as the center of the sphere. For example, two circular trajectories parallel to the horizontal plane, corresponding to the Tropic of Cancer and the Tropic of Capricorn, are selected from the spherical area. The two circular trajectories are used as reference lines of two target shooting areas, and each reference line then radiates outwards by a preset distance to form the two target shooting areas; that is, two regions in the spherical area are selected as target shooting areas. The first target shooting area, located on the first side of the camera (the side away from the ground), is used for acquiring an initial image of the portion of the physical space relatively above the user; the second target shooting area, located on the second side of the camera (the side close to the ground), is used for acquiring an initial image of the portion of the physical space relatively below the user, so that the panoramic information of the physical space where the user is located can be completely acquired. This allows panoramic information to be acquired with the mobile terminal, overcomes the dependence on fixed equipment, effectively reduces the acquisition cost, and simplifies the acquisition process. It is to be understood that there may be one, two, three, or more target shooting areas, and the present disclosure does not limit the number of target shooting areas.
Step S130: and displaying prompt information based on the at least one target shooting area to prompt a user to acquire a plurality of initial images at a plurality of shooting points respectively.
The prompt may be, for example, an icon. For example, a first icon and a second icon may be displayed on the interface of the mobile terminal, where the first icon indicates the current position at which the image acquisition device is actually aimed, and the second icon indicates the target position, determined according to the target shooting area, at which the image acquisition device needs to be aimed. The first icon changes with the change in the shooting pose of the image acquisition device until the first icon and the second icon are aligned (i.e., overlapped or partially overlapped), at which point it is determined that the image acquisition device is aimed at the target position and the initial image may be captured.
In some embodiments of the present disclosure, displaying a prompt message based on the at least one target shooting area includes: displaying at least one acquisition guide area on the basis of at least one target shooting area, wherein the at least one acquisition guide area corresponds to the at least one target shooting area respectively; and displaying prompt information, wherein the prompt information indicates the reference shooting point currently aligned with the image acquisition device. And under the condition that the prompt information falls into a target acquisition guide area in at least one acquisition guide area, the reference shooting point currently aligned with the image acquisition device is the shooting point in the target shooting area corresponding to the target acquisition guide area.
Fig. 10C is a schematic view illustrating a scene for displaying prompt information according to at least one embodiment of the present disclosure.
As shown in fig. 10C, the scene includes the electronic device 1000, the electronic device 1000 is the aforementioned mobile terminal, and the electronic device 1000 is performing initial image acquisition. During the acquisition process, an acquisition guide area and an acquisition sight are displayed on a graphical user interaction interface of the electronic device 1000.
For example, the acquisition guidance area may be an area that guides the user to perform panoramic information acquisition. As shown in fig. 10C, the acquisition guide area may be identified by a plurality of guide loops, guide rings, or the like. With the movement of the mobile terminal, the mobile terminal may display one acquisition guide area or a plurality of acquisition guide areas in the acquisition interface.
For example, the prompt information is an acquisition sight, and the acquisition sight may be an identifier located in the acquisition interface for aiming and positioning the acquisition guide area to indicate a reference shooting point at which the image acquisition device is currently aligned.
The acquisition sight can be in any shape and size as an aiming and positioning mark, and only has the function of prompting. As shown in fig. 10C, for example, the acquisition sight may be a circle consisting of two circles of different radii.
And under the condition that the acquisition sight falls into a target acquisition guide area in at least one acquisition guide area, the reference shooting point currently aligned by the image acquisition device is a shooting point in the target shooting area corresponding to the target acquisition guide area.
The collection sight can move on a collection object presented on the collection interface, and a user or an image collection device can determine whether to start collecting an initial image by judging whether the collection sight all falls into a collection guide area or whether the center of the collection sight falls into the collection guide area.
In some embodiments of the present disclosure, the image processing method further comprises: and in response to the movement of the image acquisition device, controlling the prompt information to at least surround the acquisition guide area for one circle in the same direction as the movement direction of the image acquisition device so as to acquire a plurality of initial images.
For example, the acquisition sight is controlled to travel at least one full circle around the acquisition guide area in the same direction as the movement direction of the image acquisition device, so as to acquire a plurality of initial images.
In some embodiments of the present disclosure, when the user controls the mobile terminal to move and/or rotate so that the acquisition sight falls into the acquisition guide area, and then moves the mobile terminal in a fixed direction, the mobile terminal can control the acquisition sight to move within the acquisition guide area in the same direction as the motion of the mobile terminal, and the mobile terminal performs image acquisition in real time to obtain a plurality of initial images. In this way, the acquisition sight and the acquisition guide area guide the user in acquiring panoramic information, which effectively helps the user acquire information at the shooting points in the shooting environment. The acquisition of panoramic information can thus be realized with the mobile terminal, which overcomes the dependence on fixed equipment, effectively reduces the acquisition cost, and simplifies the acquisition process.
In some embodiments of the present disclosure, for each of the plurality of initial images acquired according to the prompt information, there is an overlapping area between each of the plurality of initial images and an adjacent initial image, where the adjacent initial image is an initial image acquired at a shooting point adjacent to a shooting point corresponding to each of the plurality of initial images.
For example, for each initial image, there may be at least one overlap region between the initial image and another initial image different from the initial image in the plurality of initial images. Therefore, a plurality of initial images shot by the user according to the prompt message can form a closed loop, namely, an overlapping area exists between every two adjacent initial images, and an overlapping area exists between the first initial image and the last initial image, so that the panoramic image can be shot.
In some embodiments of the present disclosure, the image processing method may further include, on the basis of the foregoing steps: the stitched image is converted into a three-dimensional image, and the three-dimensional image is output to be presented as a panoramic image.
The stitched image may be converted to a three-dimensional image using, for example, three-dimensional modeling software.
In some embodiments of the present disclosure, since the image processing method may be directly performed at the mobile terminal without uploading the acquired image to the server, the three-dimensional image may also be displayed at the mobile terminal in real time for the user to view.
In some embodiments of the present disclosure, the above-described image processing method may be applied to a scene of a room, a car, or the like, to generate a panoramic view of the room or a panoramic view of the car.
For example, in some embodiments of the present disclosure, an image processing method may include the following steps. First, an image sequence is acquired, the image sequence including a plurality of initial images obtained by image-capturing the accommodating space. Then, pose information corresponding to each of the plurality of initial images is obtained; projection information of each initial image in the curved surface projection is determined based on the pose information corresponding to the initial images; the initial images are respectively mapped into the target pixel canvas according to the projection information to generate target images; and the target images can then be fused to obtain a panoramic image of the accommodating space. That is, the method described above with reference to fig. 1 is performed.
In some embodiments of the present disclosure, the accommodation space may be, for example, a room, a car, or the like.
In some embodiments of the present disclosure, the image of the accommodating space may be acquired in the accommodating space, or may be acquired outside the accommodating space.
If the image of the accommodating space is collected in the accommodating space, a panoramic image of the inside of the accommodating space can be obtained according to the method described above. For example, by image-capturing the interior of a room, a panoramic view of the interior of the room may be generated according to the method described above, so that a three-dimensional image of the interior of the room may be presented. For example, by performing image acquisition inside the vehicle, a panoramic view inside the vehicle may be generated according to the above-described method, so that a three-dimensional image inside the vehicle may be displayed.
If the image of the accommodating space is collected outside the accommodating space, a panoramic view of the shape of the accommodating space can be obtained according to the method described above. For example, by capturing images of the vehicle outside the vehicle, a panoramic view of the exterior of the vehicle may be generated according to the methods described above.
Fig. 10D illustrates an effect diagram of generating a stitched image according to at least one embodiment of the present disclosure.
As shown in fig. 10D, for example, the user has acquired a plurality of initial images in a room using the mobile terminal. In FIG. 10D, the initial images 810-830 are schematically shown.
For example, the plurality of initial images 810-830 are acquired by moving the mobile terminal (i.e., the image acquisition device) so that the acquisition sight moves in the same direction as the mobile terminal and travels at least once around the acquisition guide area.
For example, the poses adopted by the camera of the mobile terminal when acquiring the initial images 810-830 are obtained. Based on the pose information corresponding to each of the initial images 810-830, the longitude and latitude of the pixel points of each initial image in the curved surface projection are determined, and the initial images 810-830 are respectively mapped into the target pixel canvas according to these longitudes and latitudes to generate target images 811-813. The target images 811-813 are then fused to obtain a stitched image 814, i.e., a panoramic view of the room.
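A hedged sketch of this mapping step is given below, assuming a pinhole camera with an assumed focal length and the pose given as a camera-to-world rotation; project_to_canvas is the hypothetical helper referenced in the pipeline sketch above, not code from the patent.

```python
import numpy as np

def project_to_canvas(image, rotation, canvas_w, canvas_h, focal=None):
    """Map one initial image onto an equirectangular target pixel canvas."""
    h, w = image.shape[:2]
    focal = focal if focal is not None else 0.9 * w   # assumed focal length in pixels

    # Pixel grid of the initial image -> viewing rays in camera coordinates.
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    rays = np.stack([(xs - w / 2) / focal, (ys - h / 2) / focal, np.ones_like(xs, float)], axis=-1)
    rays = rays @ rotation.T                            # rotate rays into world coordinates
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # World rays -> longitude/latitude in the curved surface (spherical) projection.
    lon = np.arctan2(rays[..., 0], rays[..., 2])        # longitude in [-pi, pi]
    lat = np.arcsin(-rays[..., 1])                      # latitude, positive above the horizon

    # Longitude/latitude -> column/row of the target pixel canvas.
    cols = ((lon + np.pi) / (2 * np.pi) * canvas_w).astype(int) % canvas_w
    rows = np.clip(((np.pi / 2 - lat) / np.pi * canvas_h).astype(int), 0, canvas_h - 1)

    # Forward scatter for brevity; a real implementation would inverse-map with interpolation.
    canvas = np.zeros((canvas_h, canvas_w, 3), dtype=float)
    weight = np.zeros((canvas_h, canvas_w), dtype=float)
    canvas[rows, cols] = image
    weight[rows, cols] = 1.0
    return canvas, weight
```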
The stitched image 814 is converted into a three-dimensional image of the room, so that the user can experience the effect of viewing the room in Virtual Reality (VR).
Fig. 10E is a schematic diagram illustrating another effect of generating a stitched image according to at least one embodiment of the present disclosure.
As shown in fig. 10E, for example, the user has acquired a plurality of initial images inside the car using the mobile terminal. FIG. 10E schematically shows the initial images 901-903.
For example, the plurality of initial images 901-903 are acquired by moving the mobile terminal (i.e., the image acquisition device) so that the acquisition sight moves in the same direction as the mobile terminal and travels at least once around the acquisition guide area.
For example, the poses adopted by the camera of the mobile terminal when acquiring the initial images 901-903 are obtained. Based on the pose information corresponding to each of the initial images 901-903, the longitude and latitude of the pixel points of each initial image in the curved surface projection are determined, and the initial images 901-903 are respectively mapped into the target pixel canvas according to these longitudes and latitudes to generate target images 911-913. The target images 911-913 are then fused to obtain a stitched image 914, i.e., a panoramic view of the interior of the vehicle.
The stitched image 914 is converted into a three-dimensional image of the vehicle interior, so that the user can experience the effect of viewing the car in Virtual Reality (VR).
Fig. 11 shows a schematic block diagram of an image processing apparatus 1100 provided in at least one embodiment of the present disclosure.
As shown in fig. 11, the image processing apparatus 1100 includes an acquisition unit 1110, a feature point matching unit 1120, a gridding unit 1130, a calculation unit 1140, a mapping unit 1150, and a fusion unit 1160.
The acquisition unit 1110 is configured to acquire a plurality of initial images. The acquisition unit 1110 may perform step S10 described in fig. 1, for example.
The feature point matching unit 1120 is configured to feature point match the plurality of initial images to obtain at least one matched image pair. For example, each of the at least one pair of matching images includes a first initial image and a second initial image, which are different initial images between which there are matching feature points. The feature point matching unit 1120 may perform, for example, step S20 described in fig. 1.
The gridding unit 1130 is configured to select, for each of the pairs of matched images, one of the first initial image and the second initial image of the pair of matched images as a gridded image, and perform a gridding process on the gridded image to divide the gridded image into a plurality of grids. The gridding unit 1130 may perform, for example, step S30 described in fig. 1.
The calculation unit 1140 is configured to calculate, for each of the matched image pairs, a mapping matrix corresponding to each of the meshes based on feature points matched between the first initial image and the second initial image. The calculation unit 1140 may, for example, perform step S40 described in fig. 1.
The mapping unit 1150 is configured to map the plurality of initial images into the target pixel canvas according to the projection information of each initial image in the curved surface projection, so as to generate a plurality of target images. The mapping unit 1150 may perform, for example, step S50 described in fig. 1.
The fusion unit 1160 is configured to fuse the plurality of target images to obtain a stitched image. The fusing unit 1160 may perform, for example, step S60 described in fig. 1.
For example, the acquisition unit 1110, the feature point matching unit 1120, the gridding unit 1130, the calculation unit 1140, the mapping unit 1150, and the fusion unit 1160 may be implemented as hardware, software, firmware, or any feasible combination thereof. For example, they may be dedicated or general-purpose circuits, chips, or devices, or a combination of a processor and a memory. The embodiments of the present disclosure do not limit the specific implementation forms of the above units.
It should be noted that, in the embodiment of the present disclosure, each unit of the image processing apparatus 1100 corresponds to each step of the aforementioned image processing method, and for specific functions of the image processing apparatus 1100, reference may be made to the related description about the image processing method, which is not described herein again. The components and configuration of the image processing apparatus 1100 shown in fig. 11 are exemplary only, and not limiting, and the image processing apparatus 1100 may further include other components and configurations as necessary.
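For illustration only, the unit structure of fig. 11 can be sketched as a simple composition of callables; the class and method names below are hypothetical and not part of the disclosure.

```python
class ImageProcessingApparatus:
    """Sketch of the unit composition of the image processing apparatus 1100."""

    def __init__(self, acquire, match, make_grids, compute_matrices, map_images, fuse):
        self.acquire = acquire                    # acquisition unit 1110
        self.match = match                        # feature point matching unit 1120
        self.make_grids = make_grids              # gridding unit 1130
        self.compute_matrices = compute_matrices  # calculation unit 1140
        self.map_images = map_images              # mapping unit 1150
        self.fuse = fuse                          # fusion unit 1160

    def run(self):
        images = self.acquire()
        pairs = self.match(images)
        grids = {pair: self.make_grids(pair) for pair in pairs}
        matrices = {pair: self.compute_matrices(pair, grids[pair]) for pair in pairs}
        targets = self.map_images(images, matrices)
        return self.fuse(targets)
```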
At least one embodiment of the present disclosure also provides an electronic device comprising a processor and a memory, the memory including one or more computer program modules. The one or more computer program modules are stored in the memory and configured to be executed by the processor, and include instructions for implementing the image processing method described above. With this electronic device, image stitching does not depend on depth data, the image-quality requirements and the computational load of stitching are reduced, and stitching can be carried out at the mobile terminal, which makes the stitching simpler and cheaper to implement and yields a better, more natural stitching result.
Fig. 12A is a schematic block diagram of an electronic device provided in some embodiments of the present disclosure. As shown in fig. 12A, the electronic device 1200 includes a processor 1210 and a memory 1220. The memory 1220 is used to store non-transitory computer-readable instructions (e.g., one or more computer program modules). The processor 1210 is configured to execute non-transitory computer readable instructions, which when executed by the processor 1210 may perform one or more of the steps of the image processing method described above. The memory 1220 and the processor 1210 may be interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, processor 1210 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or other form of processing unit having data processing capabilities and/or program execution capabilities. For example, the Central Processing Unit (CPU) may be an X86 or ARM architecture or the like. The processor 1210 may be a general-purpose processor or a special-purpose processor that may control other components in the electronic device 1200 to perform desired functions.
For example, the memory 1220 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory, or the like. Non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, an Erasable Programmable Read Only Memory (EPROM), a portable Compact Disc Read Only Memory (CD-ROM), USB memory, flash memory, and the like. One or more computer program modules may be stored on the computer-readable storage medium and executed by the processor 1210 to implement various functions of the electronic device 1200. Various applications and data, as well as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
It should be noted that, in the embodiment of the present disclosure, reference may be made to the above description on the image processing method for specific functions and technical effects of the electronic device 1200, and details are not described here again.
Fig. 12B is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure. The electronic device 1300 is, for example, suitable for implementing the image processing method provided by the embodiments of the present disclosure. The electronic device 1300 may be a terminal device or the like. It should be noted that the electronic device 1300 shown in fig. 12B is only one example, and does not bring any limitation to the functions and the scope of the use of the embodiments of the present disclosure.
As shown in fig. 12B, the electronic device 1300 may include a processing device (e.g., central processing unit, graphics processor, etc.) 1310 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 1320 or a program loaded from a storage device 1380 into a Random Access Memory (RAM) 1330. In the RAM 1330, various programs and data necessary for the operation of the electronic apparatus 1300 are also stored. The processing device 1310, the ROM 1320, and the RAM 1330 are connected to each other by a bus 1340. An input/output (I/O) interface 1350 is also connected to bus 1340.
Generally, the following devices may be connected to I/O interface 1350: input devices 1360 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, and the like; an output device 1370 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, etc.; a storage 1380 including, for example, magnetic tape, hard disk, etc.; and a communication device 1390. The communication means 1390 may allow the electronic device 1300 to communicate wirelessly or by wire with other electronic devices to exchange data. While fig. 12B illustrates an electronic device 1300 having various means, it is to be understood that not all illustrated means are required to be implemented or provided, and that the electronic device 1300 may alternatively be implemented or provided with more or less means.
For example, according to an embodiment of the present disclosure, the above-described image processing method may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program comprising program code for performing the image processing method described above. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 1390, or installed from the storage device 1380, or installed from the ROM 1320. When executed by the processing device 1310, the computer program may implement the functions defined in the image processing method provided by the embodiments of the present disclosure.
At least one embodiment of the present disclosure also provides a computer-readable storage medium for storing non-transitory computer-readable instructions that, when executed by a computer, may implement the image processing method described above. With this computer-readable storage medium, image stitching does not depend on depth data, the image-quality requirements and the computational load of stitching are reduced, and stitching can be carried out at the mobile terminal, which makes the stitching simpler and cheaper to implement and yields a better, more natural stitching result.
Fig. 13 is a schematic diagram of a storage medium according to some embodiments of the present disclosure. As shown in fig. 13, storage medium 1400 is used to store non-transitory computer readable instructions 1410. For example, the non-transitory computer readable instructions 1410, when executed by a computer, may perform one or more steps in accordance with the image processing method described above.
For example, the storage medium 1400 may be applied to the electronic device 1200 described above. For example, the storage medium 1400 may be the memory 1220 in the electronic device 1200 shown in fig. 12A. For example, the related description about the storage medium 1400 can refer to the corresponding description of the memory 1220 in the electronic device 1200 shown in fig. 12A, and is not repeated here.
The following points need to be explained:
(1) The drawings of the embodiments of the present disclosure relate only to the structures involved in the embodiments of the present disclosure; for other structures, reference may be made to common designs.
(2) Without conflict, embodiments of the present disclosure and features of the embodiments may be combined with each other to arrive at new embodiments.
The above description is only a specific embodiment of the present disclosure, but the scope of the present disclosure is not limited thereto, and the scope of the present disclosure should be subject to the scope of the claims.

Claims (26)

1. An image processing method comprising:
acquiring a plurality of initial images;
performing feature point matching on the plurality of initial images to obtain at least one matched image pair, wherein each matched image pair in the at least one matched image pair comprises a first initial image and a second initial image, and the first initial image and the second initial image are different initial images with matched feature points in between;
for each of the matched image pairs, selecting one of the first initial image and the second initial image in the matched image pair as a gridded image, and performing gridding processing on the gridded image to divide the gridded image into a plurality of grids;
calculating a mapping matrix corresponding to each grid in the plurality of grids for each matched image pair based on matched feature points between the first initial image and the second initial image;
mapping each initial image in the plurality of initial images to a target pixel canvas to obtain a plurality of target images based on the mapping matrix corresponding to each grid; and
fusing the plurality of target images to obtain a stitched image,
wherein, for each matched image pair, calculating the mapping matrix corresponding to each grid of the plurality of grids based on the feature points matched between the first initial image and the second initial image comprises:
for each grid, determining the weight of each pair of feature points matched between the first initial image and the second initial image to the grid, wherein each pair of feature points comprises a first feature point and a second feature point, and the first feature point and the second feature point are the matched feature points in the first initial image and the second initial image respectively;
determining first image coordinates of the feature points in the first initial image, and determining second image coordinates of the feature points in the second initial image; and
for each grid, determining a mapping matrix corresponding to the grid based on the weight of each pair of feature points to the grid and the first image coordinate and the second image coordinate.
2. The method of claim 1, wherein determining, for the each grid, a mapping matrix to which the grid corresponds based on the weights of the pairs of feature points for the grid and the first and second image coordinates comprises:
for each grid, constructing a singular value decomposition matrix corresponding to the grid based on the first image coordinates, the second image coordinates and weights for the grid; and
performing singular value decomposition on the singular value decomposition matrix to obtain the mapping matrix corresponding to the grid.
3. The method of claim 1, wherein determining, for the each mesh, a weight for the mesh for each pair of feature points in the matched feature points between the first initial image and the second initial image comprises:
determining the distance from each feature point in the gridded image to the grid according to the gridded image; and
determining a weight of each pair of feature points for the grid based on a distance of each feature point in the gridded image to the grid.
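For illustration only, a minimal Python sketch of the per-grid weighted estimation described in claims 1-3 is given below. The Gaussian weighting of feature-point distances and the parameter sigma are assumptions, not the claimed formula; only the overall structure (distance-based weights per grid followed by a singular value decomposition) follows the claims.

```python
import numpy as np

def grid_homography(points_src, points_dst, grid_center, sigma=50.0):
    """Estimate the mapping matrix of one grid from matched feature points.

    points_src, points_dst: Nx2 arrays of matched coordinates in the gridded
    image and in the other image of the matched image pair.
    grid_center: (x, y) center of the grid in the gridded image.
    """
    # Weight of each feature-point pair for this grid: larger when the point
    # in the gridded image lies closer to the grid.
    dists = np.linalg.norm(points_src - np.asarray(grid_center), axis=1)
    weights = np.exp(-(dists ** 2) / (2 * sigma ** 2))

    # Weighted direct linear transform: build the singular value decomposition matrix.
    rows = []
    for (x, y), (u, v), w in zip(points_src, points_dst, weights):
        rows.append(w * np.array([-x, -y, -1, 0, 0, 0, u * x, u * y, u]))
        rows.append(w * np.array([0, 0, 0, -x, -y, -1, v * x, v * y, v]))
    A = np.stack(rows)

    # The mapping matrix is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]
```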
4. The method of claim 1, wherein mapping each of the plurality of initial images into the target pixel canvas to obtain the plurality of target images based on the mapping matrix corresponding to the each mesh comprises:
determining target pose information corresponding to each grid based on the mapping matrix corresponding to each grid;
determining projection information of each grid in the curved surface projection based on the target pose information corresponding to each grid; and
for each initial image, mapping the initial image into the target pixel canvas to generate the target image based on the projection information, in the curved surface projection, of each grid corresponding to the matched image pair in which the initial image is located.
5. The method of claim 4, wherein determining the target pose information for each grid based on the mapping matrix for each grid comprises:
determining a reference image from the plurality of initial images;
determining first pose information of each grid relative to the reference image based on the mapping matrix corresponding to each grid and the reference image; and
determining target pose information for each grid based on first pose information for the grid relative to the reference image.
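As an illustration of claim 5 only, a sketch under the assumption that per-grid mapping matrices can be chained along the matching path toward the reference image; the function name is hypothetical.

```python
import numpy as np

def pose_relative_to_reference(chain_of_matrices):
    """First pose information of a grid relative to the reference image.

    chain_of_matrices: list of 3x3 mapping matrices along the path
    grid -> ... -> reference image, ordered from the grid outward.
    """
    H = np.eye(3)
    for M in chain_of_matrices:
        H = M @ H               # accumulate mappings toward the reference image
    return H / H[2, 2]          # normalized pose of the grid w.r.t. the reference
```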
6. The method of claim 5, wherein the plurality of initial images are divided into a plurality of image groups, each image group including at least one of the at least one pair of matching images, and wherein there are matching feature points between different pairs of matching images in each image group and there are no matching feature points between different image groups,
determining first pose information of each grid relative to the reference image based on the mapping matrix corresponding to each grid and the reference image, including:
for each grid, in the case that the initial image to which the grid belongs and the reference image belong to the same image group, determining first pose information of the grid relative to the reference image based on a mapping matrix corresponding to the grid and the reference image;
in the case that the initial image to which the grid belongs and the reference image do not belong to the same image group, determining a pose relationship between the image group of the initial image to which the grid belongs and the image group of the reference image, and determining first pose information of the grid relative to the reference image based on the pose relationship and the mapping matrix corresponding to the grid.
7. The method of claim 6, wherein the plurality of initial images are obtained by an image acquisition device comprising a sensor, the method further comprising:
acquiring construction pose information constructed by the sensor, wherein the construction pose information comprises a pose adopted in the process of acquiring each initial image by the image acquisition device;
wherein determining the pose relationship between the image group of the initial image to which the grid belongs and the image group of the reference image comprises:
determining the pose relationship between the image group of the initial image to which the grid belongs and the image group of the reference image based on the construction pose information corresponding to the at least one initial image respectively included in the image group of the initial image to which the grid belongs and in the image group of the reference image.
8. The method of claim 7, wherein determining the target pose information for each grid based on the first pose information for each grid with respect to the reference image comprises:
performing data fusion on the first pose information of each grid relative to the reference image and the construction pose information to obtain the target pose information of each grid.
9. The method of claim 8, wherein performing data fusion on the first pose information of each grid relative to the reference image and the construction pose information to obtain the target pose information of each grid comprises:
converting the construction pose information into a coordinate system taking the reference image as a reference to obtain second pose information;
performing data fusion on the second pose information and the first pose information to obtain a fusion screening result; and
determining the target pose information based on the fusion screening result.
10. The method of claim 9, wherein determining the target pose information based on the fusion screening result comprises:
processing the fusion screening result by using a parameter optimization method to obtain the target pose information.
11. The method of claim 7, wherein the sensor comprises a first sensor and a second sensor,
acquiring the construction pose information constructed by the sensor, including:
acquiring first pose data, wherein the first pose data is a first pose adopted by the image acquisition device constructed by the first sensor to acquire each initial image;
acquiring second pose data, wherein the second pose data is a second pose adopted by the image acquisition device constructed by the second sensor to acquire each initial image; and
performing data fusion on the first pose data and the second pose data to obtain the construction pose information adopted for acquiring each initial image.
12. The method of claim 4, wherein determining the projection information of each grid in the curved surface projection based on the target pose information corresponding to each grid comprises:
converting each grid into a world coordinate system based on the target pose information corresponding to each grid to obtain world coordinate information of each grid in the world coordinate system;
according to the world coordinate information of each grid, determining curved surface coordinate information of each grid in the curved surface projection; and
converting the curved surface coordinate information of each grid into the projection information.
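A possible sketch of the conversion in claim 12, assuming the curved surface projection is a unit sphere centred on the camera (the spherical convention and the function name are assumptions):

```python
import numpy as np

def world_to_curved_surface(points_world):
    """Convert world coordinates of grid points to longitude/latitude on a unit sphere.

    points_world: Nx3 array of (x, y, z) world coordinates of grid corner points.
    Returns an Nx2 array of (theta, phi) = (longitude, latitude).
    """
    p = points_world / np.linalg.norm(points_world, axis=1, keepdims=True)
    theta = np.arctan2(p[:, 0], p[:, 2])   # longitude in [-pi, pi]
    phi = np.arcsin(p[:, 1])               # latitude in [-pi/2, pi/2]
    return np.stack([theta, phi], axis=1)
```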
13. The method of claim 4, wherein, for each initial image, mapping the initial image into the target pixel canvas to generate the target image based on the projection information, in the curved surface projection, of each grid corresponding to the matched image pair in which the initial image is located comprises:
determining a size of the target pixel canvas; and
for each initial image, determining the position of a pixel point in each grid in the target pixel canvas based on the size of the target pixel canvas and the projection information, in the curved surface projection, of each grid corresponding to the matched image pair in which the initial image is located, so as to map each grid into the target pixel canvas to generate the target image.
14. The method of claim 13, wherein the projection information comprises the longitude and the latitude of pixel points in the initial image, and the position of a pixel point in each grid in the target pixel canvas is calculated by the following formulas:
c = W × (θ + π) / (2π)
r = H × (π/2 − φ) / π
wherein c is the column of the pixel point in the target pixel canvas, r is the row of the pixel point in the target pixel canvas, θ is the longitude in the projection information, φ is the latitude in the projection information, W is the width of the target pixel canvas, and H is the height of the target pixel canvas.
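For illustration, assuming the conventional equirectangular layout (longitude increasing left to right across the canvas width and the top row at latitude π/2), these formulas can be written as a small helper; the function name is hypothetical.

```python
import numpy as np

def lonlat_to_canvas(theta, phi, canvas_w, canvas_h):
    """Column c and row r in the target pixel canvas of a pixel whose
    longitude is theta and latitude is phi in the curved surface projection."""
    c = canvas_w * (theta + np.pi) / (2 * np.pi)
    r = canvas_h * (np.pi / 2 - phi) / np.pi
    return c, r
```

For example, a pixel with θ = 0 and φ = 0 (straight ahead on the horizon) lands at the centre of the canvas, (c, r) = (W/2, H/2).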
15. The method of claim 1, wherein fusing the plurality of target images to obtain the stitched image comprises:
performing weighted fusion on the plurality of target images to obtain the stitched image.
16. The method of claim 15, wherein the weighted fusion of the plurality of target images to obtain the stitched image comprises:
determining an overlapping area and a non-overlapping area of each target image in the plurality of target images based on the positions of the plurality of initial images in the target pixel canvas respectively, wherein the overlapping area is an area where a plurality of matched feature points exist between each target image and a target image except the target image in the plurality of target images, and the non-overlapping area is an area except the overlapping area in the target image;
determining the weight corresponding to the overlapping area; and
performing weighted fusion on the plurality of target images based on the weights corresponding to the overlapping areas.
17. The method of claim 16, wherein determining the weight corresponding to the overlapping area comprises:
determining the Manhattan distance from the pixel points in the overlapping area to the center of the initial image; and
determining the weight corresponding to the overlapping area based on the Manhattan distance.
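For illustration only, a sketch of a Manhattan-distance weighting in the spirit of claims 16-17; the linear falloff and the normalization are assumptions, not the claimed formula.

```python
import numpy as np

def overlap_weight(rows, cols, center_row, center_col, height, width):
    """Weight for overlap-region pixels based on Manhattan distance to the image centre.

    rows, cols: arrays of pixel coordinates in the overlap region (in the source image).
    Returns weights in (0, 1]; pixels closer to the centre receive larger weights.
    """
    manhattan = np.abs(rows - center_row) + np.abs(cols - center_col)
    max_dist = height / 2 + width / 2            # largest possible Manhattan distance
    return 1.0 - manhattan / (max_dist + 1e-6)   # assumed linear falloff
```

Two overlapping target images I1 and I2 can then be blended pixel-wise as (w1·I1 + w2·I2)/(w1 + w2).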
18. The method of claim 1, wherein feature point matching the plurality of initial images to obtain the at least one matched image pair comprises:
for each initial image, determining a neighboring initial image of the initial image, wherein the shooting point of the neighboring initial image and the shooting point of the initial image are adjacent to each other;
and performing feature point matching on the initial image and the neighboring initial image to obtain the at least one matched image pair.
19. The method of claim 1, further comprising:
performing illumination homogenization processing on the plurality of target images so that the illumination intensity of the plurality of target images is uniform.
20. The method of claim 1, wherein the plurality of initial images are obtained by an image acquisition device,
the method further comprises the following steps:
acquiring a shooting pose of the image acquisition device;
determining at least one target shooting area in a shooting environment based on the shooting pose; and
displaying prompt information based on the at least one target shooting area to prompt a user to acquire the plurality of initial images in the at least one target shooting area.
21. The method of claim 20, wherein displaying the prompt information based on the at least one target shooting area comprises:
displaying at least one acquisition guide area based on the at least one target shooting area, wherein the at least one acquisition guide area corresponds to the at least one target shooting area respectively; and
displaying the prompt information, wherein the prompt information indicates a reference shooting point at which the image acquisition device is currently aligned;
wherein, in a case where the prompt information falls into a target acquisition guide area of the at least one acquisition guide area, the reference shooting point at which the image acquisition device is currently aligned is a shooting point in the target shooting area corresponding to the target acquisition guide area.
22. The method of claim 20, wherein, for each initial image of the plurality of initial images acquired according to the prompt information, the initial image has an overlap region with an adjacent initial image, the adjacent initial image being an initial image acquired at a shooting point adjacent to the shooting point corresponding to the initial image.
23. The method of claim 21, further comprising:
converting the stitched image into a three-dimensional image; and
outputting the three-dimensional image to present the three-dimensional image as a panoramic image.
24. An image processing apparatus comprising:
an acquisition unit configured to acquire a plurality of initial images;
a feature point matching unit configured to perform feature point matching on the plurality of initial images to obtain at least one matching image pair, wherein each matching image pair in the at least one matching image pair includes a first initial image and a second initial image, and the first initial image and the second initial image are different initial images between which matched feature points exist;
a gridding unit configured to select one of the first initial image and the second initial image in the matched image pair as a gridded image for each of the matched image pairs, and perform gridding processing on the gridded image to divide the gridded image into a plurality of grids;
a computing unit configured to compute, for each of the matched image pairs, a mapping matrix corresponding to each of the plurality of meshes based on feature points matched between the first initial image and the second initial image;
a mapping unit configured to map each of the plurality of initial images into a target pixel canvas to obtain a plurality of target images based on the mapping matrix corresponding to each grid; and
a fusion unit configured to fuse the plurality of target images to obtain a stitched image,
wherein the computing unit is configured to:
for each grid, determining the weight of each pair of feature points matched between the first initial image and the second initial image to the grid, wherein each pair of feature points comprises a first feature point and a second feature point, and the first feature point and the second feature point are the matched feature points in the first initial image and the second initial image respectively;
determining first image coordinates of the feature points in the first initial image, and determining second image coordinates of the feature points in the second initial image; and
for each grid, determining a mapping matrix corresponding to the grid based on the weight of each pair of feature points to the grid and the first image coordinate and the second image coordinate.
25. An electronic device, comprising:
a processor;
a memory including one or more computer program modules;
wherein the one or more computer program modules are stored in the memory and configured to be executed by the processor, the one or more computer program modules comprising instructions for implementing the image processing method of any of claims 1-23.
26. A computer readable storage medium storing non-transitory computer readable instructions which, when executed by a computer, may implement the image processing method of any one of claims 1-23.
CN202110553182.5A 2021-05-20 2021-05-20 Image processing method, image processing device, electronic equipment and computer readable storage medium Active CN113450253B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110553182.5A CN113450253B (en) 2021-05-20 2021-05-20 Image processing method, image processing device, electronic equipment and computer readable storage medium
PCT/CN2022/087633 WO2022242395A1 (en) 2021-05-20 2022-04-19 Image processing method and apparatus, electronic device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110553182.5A CN113450253B (en) 2021-05-20 2021-05-20 Image processing method, image processing device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113450253A CN113450253A (en) 2021-09-28
CN113450253B true CN113450253B (en) 2022-05-20

Family

ID=77809882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110553182.5A Active CN113450253B (en) 2021-05-20 2021-05-20 Image processing method, image processing device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113450253B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022242395A1 (en) * 2021-05-20 2022-11-24 北京城市网邻信息技术有限公司 Image processing method and apparatus, electronic device and computer-readable storage medium
CN116579965B (en) * 2023-05-22 2024-01-19 北京拙河科技有限公司 Multi-image fusion method and device
CN117575902B (en) * 2024-01-16 2024-03-29 四川新视创伟超高清科技有限公司 Large scene monitoring image splicing method and splicing system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10002640B2 (en) * 2014-02-28 2018-06-19 Microsoft Technology Licensing, Llc Hyper-lapse video through time-lapse and stabilization
CN103905741B (en) * 2014-03-19 2017-01-11 合肥安达创展科技股份有限公司 Ultra-high-definition panoramic video real-time generation and multi-channel synchronous play system
CN110223222B (en) * 2018-03-02 2023-12-05 株式会社理光 Image stitching method, image stitching device, and computer-readable storage medium

Also Published As

Publication number Publication date
CN113450253A (en) 2021-09-28

Similar Documents

Publication Publication Date Title
CN113450253B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
WO2022242395A1 (en) Image processing method and apparatus, electronic device and computer-readable storage medium
CN113240615B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN106023302B (en) Mobile communication terminal, server and method for realizing three-dimensional reconstruction
CN107113376B (en) A kind of image processing method, device and video camera
US8768098B2 (en) Apparatus, method, and medium for generating panoramic image using a series of images captured in various directions
CN103873758B (en) The method, apparatus and equipment that panorama sketch generates in real time
EP3228982A1 (en) Surveying system
JP6398472B2 (en) Image display system, image display apparatus, image display method, and program
CN113450254B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
JP2007072537A (en) 360-degree image photographing device
US20210385381A1 (en) Image synthesis system
CN109520500A (en) One kind is based on the matched accurate positioning of terminal shooting image and streetscape library acquisition method
CN107563959B (en) Panorama generation method and device
CN116126024A (en) Control method, device, equipment and storage medium of mobile robot
CN104159036A (en) Display method and shooting equipment of image direction information
CN108344401A (en) Localization method, device and computer readable storage medium
US20090059018A1 (en) Navigation assisted mosaic photography
CN104168407A (en) Panorama photographing method
CN109448105B (en) Three-dimensional human body skeleton generation method and system based on multi-depth image sensor
CN109712249B (en) Geographic element augmented reality method and device
KR102389762B1 (en) Space Formation and Recognition System through Digital Twin-linked Augmented Reality Camera and the method thereof
JP2019101563A (en) Information processing apparatus, information processing system, information processing method, and program
CN114549666B (en) AGV-based panoramic image splicing calibration method
CN113822936A (en) Data processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant