CN115439331A - Corner point correction method and three-dimensional model generation method and device in meta universe - Google Patents


Info

Publication number
CN115439331A
Authority
CN
China
Prior art keywords
corner
target object
points
point
position information
Prior art date
Legal status
Granted
Application number
CN202211075976.6A
Other languages
Chinese (zh)
Other versions
CN115439331B (en
Inventor
王海君
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211075976.6A priority Critical patent/CN115439331B/en
Publication of CN115439331A publication Critical patent/CN115439331A/en
Application granted granted Critical
Publication of CN115439331B publication Critical patent/CN115439331B/en
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
        • G06T 3/00: Geometric image transformation in the plane of the image
        • G06T 3/60: Rotation of a whole image or part thereof
        • G06T 3/608: Skewing or deskewing, e.g. by two-pass or three-pass rotation
        • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
        • G06T 7/00: Image analysis
        • G06T 7/70: Determining position or orientation of objects or cameras
        • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
        • G06V 10/00: Arrangements for image or video recognition or understanding
        • G06V 10/70: Arrangements using pattern recognition or machine learning
        • G06V 10/762: Arrangements using clustering, e.g. of similar faces in social networks
        • G06V 10/763: Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
        • G06T 2207/10: Image acquisition modality
        • G06T 2207/10028: Range image; Depth image; 3D point clouds
        • G06T 2207/20: Special algorithmic details
        • G06T 2207/20112: Image segmentation details
        • G06T 2207/20164: Salient point detection; Corner detection

Abstract

The invention provides a corner point correction method and a three-dimensional model generation method and device in the metaverse, and relates to the field of artificial intelligence, in particular to the technical fields of virtual reality, augmented reality, the metaverse, computer vision, deep learning, and the like. The specific implementation scheme of the corner point correction method is as follows: determining the position information of the spatial point corresponding to each target object corner point according to the position information of the target object corner points included in each of at least two images, so as to obtain at least two pieces of spatial position information; clustering the target object corner points in the at least two images according to the at least two pieces of spatial position information to obtain at least one corner point group; determining reference position information of the target object corner points in each corner point group according to the position information of the spatial points corresponding to the target object corner points in that group; and correcting the positions of the target object corner points in the images according to their reference position information.

Description

Corner point correction method and three-dimensional model generation method and device in meta universe
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to the technical fields of virtual reality, augmented reality, the metaverse, computer vision, deep learning, and the like, and specifically to a corner point correction method and a three-dimensional model generation method, apparatus, device, and medium.
Background
In the case of too large a scene or too many occlusions, a single image cannot represent the complete scene information. A complete three-dimensional model of the scene can typically be obtained by stitching the point cloud data of multiple images. For a single target object in the scene, because its three-dimensional model is stitched from the point cloud data of multiple images, and different images are acquired from different poses, different parts of the target object's three-dimensional model may be misaligned.
Disclosure of Invention
The present disclosure aims to provide a corner point correction method and a three-dimensional model generation method, apparatus, device, and medium that solve the misalignment problem.
According to an aspect of the present disclosure, there is provided a method for correcting a corner point, including: determining the position information of spatial points corresponding to the corner points of the target object according to the position information of the corner points of the target object included in each of the at least two images to obtain at least two pieces of spatial position information; clustering the target object corner points in the at least two images according to the at least two pieces of spatial position information to obtain at least one corner point group; determining reference position information of the target object corner points in each corner point group according to the position information of the space points corresponding to the target object corner points in each corner point group; and correcting the position of the corner point of the target object in the image according to the reference position information of the corner point of the target object.
According to another aspect of the present disclosure, there is provided a method of generating a three-dimensional model, including: determining point cloud data corresponding to each panoramic image in at least two panoramic images including a target object; and aggregating point cloud data corresponding to at least two panoramic images to obtain a three-dimensional scene model comprising a target object, wherein the target object corner in each panoramic image is the corner corrected by adopting the corner correction method provided by the disclosure.
According to another aspect of the present disclosure, there is provided an apparatus for correcting a corner point, including: the position information determining module is used for determining the position information of a spatial point corresponding to a target object corner point according to the position information of the target object corner point included in each of the at least two images to obtain at least two pieces of spatial position information; the angular point clustering module is used for clustering the angular points of the target object in the at least two images according to the at least two pieces of spatial position information to obtain at least one angular point group; the reference position determining module is used for determining the reference position information of the target object corner points in each corner point group according to the position information of the space points corresponding to the target object corner points in each corner point group; and the position correction module is used for correcting the position of the corner point of the target object in the image according to the reference position information of the corner point of the target object.
According to another aspect of the present disclosure, there is provided a three-dimensional model generation apparatus including: the system comprises a point cloud data determining module, a point cloud data determining module and a processing module, wherein the point cloud data determining module is used for determining point cloud data corresponding to each panoramic image in at least two panoramic images comprising a target object; and the model obtaining module is used for aggregating point cloud data corresponding to at least two panoramic images to obtain a three-dimensional scene model comprising a target object, wherein the target object corner in each panoramic image is the corner corrected by the corner correcting device provided by the disclosure.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for corner correction and/or the method for three-dimensional model generation provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the correction method of corner points and/or the generation method of a three-dimensional model provided by the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising computer programs/instructions which, when executed by a processor, implement the corner correction method and/or the three-dimensional model generation method provided by the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
Fig. 1 is a schematic view of an application scenario of the corner point correction method and/or three-dimensional model generation method and apparatus according to an embodiment of the present disclosure;
Fig. 2 is a schematic flow chart of a corner point correction method according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of the principle of clustering a plurality of target object corner points according to an embodiment of the present disclosure;
Fig. 4 is a schematic illustration of the principle of determining direction features according to an embodiment of the present disclosure;
Fig. 5 is a schematic diagram of the principle of clustering a plurality of target object corner points according to position features according to an embodiment of the present disclosure;
Fig. 6 is a schematic flow chart of a three-dimensional model generation method according to an embodiment of the present disclosure;
Fig. 7 is a block diagram of a corner point correction apparatus according to an embodiment of the present disclosure;
Fig. 8 is a block diagram of a three-dimensional model generation apparatus according to an embodiment of the present disclosure; and
Fig. 9 is a block diagram of an electronic device for implementing the corner point correction method and/or three-dimensional model generation method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In three-dimensional reconstruction techniques, three-dimensional layout information for a scene may be generated from a single image. However, due to the influence of perspective, in the case of too large scene or too many obstacles, a single image cannot represent complete scene information. In order to generate a complete three-dimensional model, a three-dimensional model of a partial region may be generated for each image, then the spatial relationship between the three-dimensional models of the partial regions is determined by the positioning data, and the three-dimensional models of the partial regions are stitched according to the spatial relationship. When the three-dimensional models in two different coordinate systems are spliced, the three-dimensional models in the two different coordinate systems can be transformed into the same coordinate system according to the transformation relation between the two different coordinate systems, and then the three-dimensional models are spliced.
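As a minimal sketch of this stitching step (the function name and the rigid-transform parameters R and t are illustrative, not from the patent; in practice R and t come from the positioning data), a point cloud can be moved into the common coordinate system as follows:

```python
import numpy as np

def to_common_frame(points, R, t):
    """Transform an (N, 3) point cloud into a common coordinate
    system using rotation matrix R and translation vector t."""
    return points @ R.T + t

# Example: a cloud in a frame rotated 90 degrees about the Z axis
# and offset by (1, 0, 0) relative to the common frame.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([1.0, 0.0, 0.0])
cloud = np.array([[1.0, 0.0, 0.0]])
merged = to_common_frame(cloud, R, t)
```

Once every local point cloud is in the same frame, stitching reduces to concatenating the transformed arrays.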
For a single object of larger size in a scene, such as a wall, its three-dimensional model may be obtained by stitching the three-dimensional models of a plurality of local regions. Because the cameras corresponding to different images have different poses, the three-dimensional target object stitched from the local-region models may be misaligned. This can reduce the fidelity of the three-dimensional scene model and affect the user's sense of realism in virtual reality, augmented reality, and/or metaverse applications.
To solve the misalignment problem, the misaligned model is usually adjusted manually, or the parts that overlap due to misalignment are deleted. However, these approaches depend heavily on manual work, are inefficient, and their results vary greatly with the operator.
On this basis, the present disclosure aims to provide a method for correcting the corner points of a target object, which avoids misalignment in the stitched three-dimensional model by correcting the positions of the target object corner points. Based on this corner point correction method, the present disclosure also provides a three-dimensional model generation method.
An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.
Fig. 1 is a schematic view of an application scenario of a corner correction method, a three-dimensional model generation method, and an apparatus according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 of this embodiment may include an electronic device 110, and the electronic device 110 may be various electronic devices with processing functions, including but not limited to a smart phone, a tablet computer, a laptop computer, a desktop computer, a server, and so on.
The electronic device 110 may, for example, process the input image 120, specifically, correct a corner of a target object such as a wall in the image 120, generate point cloud data according to the image after the corner correction, and represent a three-dimensional model corresponding to the image 120 by using the generated point cloud data.
In an embodiment, the electronic device 110 may process a plurality of images acquired for one scene, resulting in a plurality of sets of point cloud data generated based on the plurality of images. The electronic device 110 may also obtain a three-dimensional model of the scene (i.e., the three-dimensional scene model 130) by aggregating multiple sets of point cloud data. The plurality of images may be images captured by the image capturing device at different angles and different positions.
In an embodiment, the application scenario 100 may further include a server 150. The electronic device 110 may be communicatively coupled to the server 150 via a network. The network may include wired or wireless communication links.
For example, the electronic device 110 may correct only corner points of a target object such as a wall in the image 120 and send the corner point corrected image to the server 150 as the corrected image 140. Accordingly, the server 150 may, for example, generate point cloud data from the corrected image 140 after receiving the corrected image 140, and obtain the three-dimensional scene model 130 by stitching the point cloud data of the plurality of corrected images.
Illustratively, the electronic device 110 may also send the image 120 directly to the server 150, process the image 120 by the server 150, and generate the three-dimensional scene model 130. After obtaining three-dimensional scene model 130, server 150 may also send three-dimensional scene model 130 to electronic device 110, for example, for presentation by the electronic device.
It is understood that the server 150 may be a background management server supporting the running of the client application in the electronic device 110, a virtual server, or the like, which is not limited by the disclosure.
It should be noted that the correction method of the corner point and/or the generation method of the three-dimensional model provided by the present disclosure may be executed by the electronic device 110, and may also be executed by the server 150. Accordingly, the correction device for corner points and/or the generation device for three-dimensional model provided by the present disclosure may be disposed in the electronic device 110, and may also be disposed in the server 150.
It should be understood that the number and type of electronic devices 110 and servers 150 in fig. 1 are merely illustrative. There may be any number and type of electronic devices 110 and servers 150, as desired for an implementation.
The method for correcting a corner provided by the present disclosure will be described in detail below with reference to fig. 2 to 5.
Fig. 2 is a schematic flow chart of a corner point correction method according to an embodiment of the present disclosure.
As shown in fig. 2, the method 200 for correcting a corner point of this embodiment may include operations S210 to S240.
In operation S210, according to the position information of the target object corner included in each of the at least two images, the position information of the spatial point corresponding to the target object corner is determined, so as to obtain at least two pieces of spatial position information.
According to an embodiment of the present disclosure, an image capture device captures a scene including a target object at different poses, resulting in at least two images. Alternatively, the embodiment may randomly select at least two images from a plurality of images including the target object acquired by the image acquisition apparatus in different poses.
According to an embodiment of the present disclosure, each image can be processed with a deep neural network model to obtain the position information of the target object corner points in that image. A target object corner point is a corner of the target object. The deep neural network model may be, for example, a semantic segmentation model composed of an encoder and a decoder, such as a U-Net model or a Mask R-CNN model. Specifically, each image may be input into the deep neural network model, which outputs a semantic segmentation result; the pixel positions of the target object corner points can then be located from the semantic segmentation result and used as their position information.
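As a hedged illustration of the post-processing step (the helper name and the labelled-mask format are assumptions, not from the patent), locating corner pixel positions from a segmentation output might look like:

```python
import numpy as np

def corner_positions(label_mask):
    """Pixel positions of target object corners from a labelled
    segmentation mask: one (row, col) centroid per non-zero label.
    Label 0 is treated as background."""
    positions = {}
    for lbl in np.unique(label_mask):
        if lbl == 0:
            continue
        rows, cols = np.nonzero(label_mask == lbl)
        positions[int(lbl)] = (float(rows.mean()), float(cols.mean()))
    return positions

# Toy mask with two single-pixel corner detections.
mask = np.zeros((4, 4), dtype=int)
mask[1, 1] = 1
mask[2, 3] = 2
pts = corner_positions(mask)
```

A real model would output soft per-pixel scores, so a thresholding and connected-component step would precede the centroid computation.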
In one embodiment, the deep neural network model may be a HoHoNet (360 Indoor Holistic Understanding with Latent Horizontal Features) model. The input of the model is an image, and the output is the boundary lines between adjacent surfaces of the target object. The embodiment can then deduce the pixel positions of the target object corner points from the boundary lines using solid geometry.
It can be understood that the position information of the target object corner in the image may be detected in real time or detected in advance, which is not limited by the present disclosure. The target object corner may be, for example, an intersection of three mutually adjacent surfaces of the target object, or an intersection of three mutually adjacent surfaces of the target object and another object. For example, if the target object is a wall, the corner point is a wall corner point, which is not limited in this disclosure.
The embodiment can convert the position information of each target object corner point according to the conversion relation between the pixel coordinate system and the space three-dimensional coordinate system, so as to obtain the position information of the space point corresponding to each target object corner point. The spatial three-dimensional coordinate system may be a camera coordinate system, or may also be a spatial rectangular coordinate system using any point in the spatial points as a coordinate origin, and the position information of the spatial point may be represented by a coordinate value of the spatial point in the spatial three-dimensional coordinate system, which is not limited in this disclosure.
In an embodiment, at least two images may be panoramic images, and then the embodiment may convert the position information of each corner point of the target object according to a conversion relationship between the spherical coordinate system and the spatial three-dimensional coordinate system.
For example, let p_i(x, y) denote the pixel position of a target object corner point and P_i(X, Y, Z) the coordinate value of its corresponding space point. The pixel position is first transformed into the spherical coordinate system by formulas (1) to (2). Formula (1) appears only as an image in the source; under the standard equirectangular convention for a W × H panorama it maps the pixel position to spherical angles, e.g.:

θ_x = 2πx/W − π, θ_y = πy/H − π/2  (1)

r = c_h · |cot θ_y|  (2)

where c_h is the height of the shooting viewpoint of the image acquisition device above the ground, and r is the horizontal distance, in the camera coordinate system, from the space point corresponding to the target object corner point to the shooting viewpoint. r can also be understood as the depth of the target object corner point, which may be obtained by processing each image with a monocular depth estimation model. The resolution of the image is W × H, with W = 2H.

If the target object corner point is a ground pixel point, the coordinate value of its corresponding space point can be calculated with trigonometric functions by formula (3). Formula (3) likewise appears only as an image in the source; one form consistent with formulas (1) and (2) is:

X = r · sin θ_x, Y = r · cos θ_x, Z = −c_h  (3)
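The projection of a ground-level corner pixel to a camera-frame space point can be sketched as follows. The function name, axis ordering, and sign conventions are assumptions, since the published formulas (1) and (3) are only available as images in the source:

```python
import numpy as np

def ground_corner_to_space(x, y, W, H, c_h):
    """Project a ground-level corner pixel (x, y) of a W x H
    equirectangular panorama (W = 2H) to a camera-frame point,
    assuming the viewpoint sits at height c_h above the floor."""
    theta_x = 2.0 * np.pi * x / W - np.pi   # longitude
    theta_y = np.pi * y / H - np.pi / 2.0   # latitude
    r = c_h * abs(1.0 / np.tan(theta_y))    # horizontal distance, eq. (2)
    X = r * np.sin(theta_x)
    Y = r * np.cos(theta_x)
    Z = -c_h                                # floor is c_h below the camera
    return np.array([X, Y, Z])

# Pixel at the image centre column, three quarters down a 1024 x 512
# panorama, with the camera 1.5 m above the floor.
p = ground_corner_to_space(512, 384, 1024, 512, 1.5)
```

Here theta_y = π/4, so the corner lies at horizontal distance r = c_h on the floor directly ahead of the camera.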
in operation S220, the target object corner in the at least two images is clustered according to the at least two spatial position information, so as to obtain at least one corner group.
According to an embodiment of the present disclosure, a clustering algorithm can be used to cluster the at least two pieces of spatial position information corresponding to the target object corner points in the at least two images, obtaining at least one information group. The target space points represented by the information in each information group can then be determined, and all the target object corner points corresponding to those target space points form a corner point group. The clustering algorithm may be, for example, the K-means clustering algorithm or a density-based clustering algorithm, which is not limited in this disclosure.
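As a minimal stand-in for this clustering step (a greedy distance-threshold grouping rather than the K-means or density-based algorithms named above; the function name and radius are illustrative):

```python
import numpy as np

def group_corners(space_points, radius=0.2):
    """Group corner space points: a point closer than `radius`
    to the first member of an existing group joins that group;
    otherwise it starts a new group. Returns index lists."""
    groups = []
    for idx in range(len(space_points)):
        p = space_points[idx]
        for g in groups:
            if np.linalg.norm(p - space_points[g[0]]) < radius:
                g.append(idx)
                break
        else:
            groups.append([idx])
    return groups

# Two detections of the same physical corner plus one distant corner.
pts = np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0], [2.0, 0.0, 0.0]])
groups = group_corners(pts)
```

Each resulting index group corresponds to one corner point group; the indices identify which images' detections it contains.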
In operation S230, reference position information of the target object corner in each corner group is determined according to the position information of the spatial point corresponding to the target object corner in each corner group.
According to the embodiment of the present disclosure, given the misalignment, the spatial points corresponding to all the target object corner points in each corner point group obtained by clustering can be considered to be substantially the same spatial point. On this basis, the clustering center of the information group corresponding to each corner point group may be used as the reference spatial information of that shared spatial point, and this position information may then be converted into the pixel coordinate system of the image containing each target object corner point in the group, according to the conversion relationship between the pixel coordinate system and the spatial three-dimensional coordinate system, to obtain the reference position information of each target object corner point.
In an embodiment, an average value of all information in the information group corresponding to each corner group may also be used as the reference space information, that is, an average value of position information of all space points corresponding to all target object corners in each corner group may be used as the reference space information. The reference position information of each target object corner obtained by conversion is substantially the position information corresponding to the reference space information in the image where each target object corner is located.
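The averaging variant of this step can be sketched as follows (function name illustrative):

```python
import numpy as np

def reference_position(group_points):
    """Reference spatial information of a corner point group:
    the mean of the space points of all corners in the group
    (the alternative is the cluster centre from the clustering
    step)."""
    return np.mean(group_points, axis=0)

# Two slightly misaligned detections of the same corner.
ref = reference_position(np.array([[0.9, 2.0, 0.0],
                                   [1.1, 2.0, 0.0]]))
```

The reference point is then projected back into each image with the inverse of the pixel-to-space conversion to obtain per-image reference position information.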
In operation S240, the position of the target object corner in the image is corrected according to the reference position information of the target object corner.
The embodiment can realize the correction of the position of each target object corner in the image by taking the reference position information of each target object corner as the position information of each target object corner in the image where the target object corner is located.
By projecting the target object corner points in the at least two images into three-dimensional space and clustering them according to the positions of the projected space points, this embodiment can cluster the same physical corner point of the target object appearing in the at least two images into one corner point group. Finally, the reference position of each target object corner point is determined from the positions to which all the corner points in the same group project in three-dimensional space, which unifies the positions of the same target object corner point across the at least two images, corrects the images, and solves the misalignment problem when stitching the point cloud data corresponding to multiple images. Compared with manually correcting the corner points in the model, both correction precision and efficiency can be improved.
Assuming that each image includes at least two target object corner points, the operation of clustering the target object corner points according to the spatial position information is further expanded and detailed below with reference to fig. 3.
Fig. 3 is a schematic diagram of a principle of clustering a plurality of target object corner points according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, when clustering a plurality of target object corner points, in addition to the position information of the spatial point corresponding to each corner point, the relative position relationship between the at least two spatial points corresponding to the at least two target object corner points in the same image can also be considered. This further ensures that the spatial points corresponding to all the target object corner points in one clustered corner point group are the same spatial point, which helps improve the precision of the determined reference position information and therefore the correction effect.
For example, as shown in fig. 3, in the embodiment 300, when clustering a plurality of target object corner points, at least two target object corner points 320 of a target object in a first image 311 of at least two images may be determined first for the first image 311. For each target object corner 321 of the at least two target object corners 320, a neighboring corner 322 of this each target object corner 321 is determined. The neighboring corner point 322 may be a corner point adjacent to the position of each target object corner point 321 in at least two corner points included in the target object in the first image 311. For example, the neighboring corner point 322 may be a corner point located within a predetermined range of each target object corner point 321 in the first image, or may be a corner point located at a predetermined azimuth of each target object corner point 321 in the first image and closest to each target object corner point 321.
After determining the neighboring corner points, the embodiment may determine the position feature 340 of the spatial point corresponding to each target corner point 321 according to the position information 331 of the spatial point corresponding to each target corner point 321 and the position information 332 of the spatial point corresponding to the neighboring corner point 322. For example, the position information of the spatial point corresponding to each target object corner 321 and the distance between the spatial point corresponding to each target object corner 321 and the spatial point corresponding to the neighboring corner 322 may be combined into the position feature 340. Thus, for at least two target object corner points included in the first image, the position features of the corresponding at least two spatial points can be obtained.
For each of the other images 312 in the image set except the first image 311, the operation for the first image 311 is performed, and then a plurality of target object corner points 350 may be obtained for at least two images in total, where the plurality of target object corner points 350 includes at least two target object corner points 320. Accordingly, the position features of the spatial points corresponding to each of the plurality of target object corner points 350 may be obtained, and a plurality of position features of the plurality of spatial points may be obtained in total.
This embodiment may cluster the plurality of target object corner points 350 according to the position features of the plurality of spatial points (i.e. the plurality of position features), thereby obtaining at least one corner point group 360. It is to be understood that, in general, the number of the at least one corner point group 360 may be equal to the number of actual corners of the target object, which is not limited by the present disclosure.
In an embodiment, it is taken into account that the relative orientation between two neighboring corner points of the target object is not influenced by a change of the pose of the image acquisition device. That is, the relative orientation between the two spatial points corresponding to two adjacent corner points is the same across different images. When the position feature of the spatial point corresponding to each target object corner point is determined, a direction feature of that spatial point may first be determined according to the position information of that spatial point and the position information of the spatial point corresponding to the neighboring corner point; the direction feature represents the relative orientation, in three-dimensional space, of each target object corner point and its neighboring corner point. In this embodiment, the direction feature and the position information of the spatial point corresponding to each target object corner point may form a tuple, and the tuple is used as the position feature. For example, if there are N neighboring corner points, the tuple is an (N + 1)-tuple comprising N direction features and the spatial position information of the one target object corner point, the N direction features corresponding one-to-one to the N neighboring corner points.
According to this embodiment, the position feature of the spatial point corresponding to each target object corner point is determined from both the direction feature and the position information, so that the position feature reflects the position of the spatial point more comprehensively. This improves the clustering precision, and further ensures that all the target object corner points in each corner point group obtained by clustering correspond to the same spatial point in three-dimensional space.
FIG. 4 is a schematic diagram of a principle of determining a directional feature according to an embodiment of the present disclosure.
As shown in fig. 4, in the embodiment 400, it is assumed that the target object corner points in the image 410 include at least corner points 401 to 406.
For example, for the corner point 403, the determined neighboring corner points may include any one of the corner points 404, 401, and 405, or any plurality of the corner points 404, 401, and 405, which is not limited by this disclosure.
In an embodiment, the determined neighboring corner points may comprise two corner points located in two orientations on the two sides of each target object corner point. For example, for the corner point 403, the determined neighboring corner points may comprise the corner point 401 adjacent to the corner point 403 in a first orientation, and the corner point 405 adjacent to the corner point 403 in a second orientation, where the first orientation and the second orientation are located on the two sides of the corner point 403. By selecting neighboring corner points in orientations on both sides of a corner point, the position feature of the spatial point corresponding to the corner point 403 can be expressed more accurately.
In the embodiment 400, for example, a direction vector between the spatial point corresponding to each target object corner point and the spatial point corresponding to a neighboring corner point may be used as a direction feature of the spatial point corresponding to that target object corner point. For example, for the corner point 403, the neighboring corner points are set to include the corner points 401 and 405. In this embodiment, the vector pointing from the spatial point corresponding to the corner point 403 to the spatial point corresponding to the corner point 401 is determined as one direction vector according to the position information of the two spatial points, and the vector pointing from the spatial point corresponding to the corner point 403 to the spatial point corresponding to the corner point 405 is determined, likewise, as another direction vector. The two direction vectors may form the direction feature of the spatial point corresponding to the corner point 403.
For example, if the first orientation is set as the left orientation and the second orientation is set as the right orientation, the position feature of the spatial point corresponding to the corner point 403 may be represented by the triplet [left direction vector, spatial point position, right direction vector], where the left direction vector is the vector from the spatial point corresponding to the corner point 403 to the spatial point corresponding to the corner point 401, the spatial point position is the position information of the spatial point corresponding to the corner point 403, and the right direction vector is the vector from the spatial point corresponding to the corner point 403 to the spatial point corresponding to the corner point 405.
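The triplet above can be sketched as follows, assuming the spatial positions of the corner points (such as 401, 403, and 405) are already known; all names are illustrative assumptions:

```python
import numpy as np

def triplet_feature(p_corner, p_left, p_right):
    """Build the [left direction vector, spatial position, right direction
    vector] triplet for one corner point from the spatial positions of the
    corner itself and its left/right neighboring corner points."""
    p_corner, p_left, p_right = (np.asarray(p, dtype=float)
                                 for p in (p_corner, p_left, p_right))
    left_vec = p_left - p_corner    # points from the corner's spatial point toward its left neighbor
    right_vec = p_right - p_corner  # points from the corner's spatial point toward its right neighbor
    return left_vec, p_corner, right_vec
```

For instance, `triplet_feature([0, 1, 0], [-2, 1, 0], [1, 3, 0])` yields the left vector `[-2, 0, 0]`, the position `[0, 1, 0]`, and the right vector `[1, 2, 0]`.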
Fig. 5 is a schematic diagram illustrating a principle of clustering a plurality of target object corner points according to position features, according to an embodiment of the present disclosure.
According to the embodiment of the present disclosure, the plurality of target object corner points may be clustered according to the feature distances between the position features. Specifically, the feature distances between the position features may be calculated first: for every two corner points of the plurality of target object corner points, the distance between the position features of the two corresponding spatial points is determined, thereby obtaining the feature distances between the position features of the plurality of spatial points.
For example, the feature distance may be represented by a Euclidean distance, a cosine distance, or the like, which is not limited by the present disclosure. This embodiment may assign two corner points, whose two position features have a feature distance smaller than a distance threshold, to the same corner point group.
As shown in fig. 5, in the embodiment 500, the position feature includes the direction feature of the spatial point corresponding to each target object corner point and the position information of that spatial point. In this embodiment, when the feature distance is determined, one feature sub-distance may be calculated for the direction feature and another for the position information, and the feature distance may then be determined from the two feature sub-distances. By determining the feature sub-distances separately for the different kinds of information in the position feature, each can be computed with a calculation method suited to it, thereby improving the accuracy of the determined feature distance.
For example, for a first corner point 510, the determined first position feature 520 of the corresponding spatial point may include a first direction feature 521 and first position information 522. For a second corner point 530, the determined second position feature 540 of the corresponding spatial point may include a second direction feature 541 and second position information 542. This embodiment may first calculate the cosine distance between the first direction feature 521 and the second direction feature 541 to obtain a first feature sub-distance 551, and calculate the Euclidean distance between the first position information 522 and the second position information 542 to obtain a second feature sub-distance 552. Finally, the weighted sum of the first feature sub-distance 551 and the second feature sub-distance 552 is taken as the feature distance 560 between the first position feature 520 and the second position feature 540. The weighting coefficients may be set according to actual requirements; for example, the two feature sub-distances may be directly added to obtain the feature distance, which is not limited in this disclosure.
In this embodiment, if the feature distance 560 is smaller than the preset distance threshold, it may be determined that the first corner point 510 and the second corner point 530 belong to the same corner point group. In the obtained corner point group, the feature distances between the position features of the two spatial points corresponding to any two target object corner points may all be smaller than the distance threshold; alternatively, the feature distance between the position feature of the spatial point corresponding to each target object corner point and the position feature serving as the cluster center may be smaller than the distance threshold, which is not limited by the present disclosure.
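The weighted combination of the cosine sub-distance (for direction features) and the Euclidean sub-distance (for position information), together with the threshold test, might be sketched as follows. The weights and threshold are illustrative assumptions, as are all names:

```python
import numpy as np

def cosine_distance(u, v):
    """1 - cosine similarity between two direction-feature vectors."""
    u, v = np.asarray(u, float).ravel(), np.asarray(v, float).ravel()
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def feature_distance(feat_a, feat_b, w_dir=1.0, w_pos=1.0):
    """feat = (direction_feature, position_info); the feature distance is
    a weighted sum of the two feature sub-distances."""
    dir_a, pos_a = feat_a
    dir_b, pos_b = feat_b
    d_dir = cosine_distance(dir_a, dir_b)                          # first feature sub-distance
    d_pos = np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_b))  # second feature sub-distance
    return w_dir * d_dir + w_pos * d_pos

def same_group(feat_a, feat_b, threshold=0.5):
    """Two corner points whose feature distance is below the threshold are
    assigned to the same corner point group."""
    return feature_distance(feat_a, feat_b) < threshold
```

Setting `w_dir = w_pos = 1.0` corresponds to directly adding the two sub-distances, as mentioned above.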
Based on the corner correction method provided by the present disclosure, the present disclosure also provides a three-dimensional model generation method, which will be described in detail below with reference to fig. 6.
Fig. 6 is a flow chart diagram of a method of generating a three-dimensional model according to an embodiment of the disclosure.
As shown in fig. 6, the method 600 for generating a three-dimensional model of this embodiment may include operations S610 to S620.
In operation S610, point cloud data corresponding to each of at least two panoramic images including a target object is determined.
In operation S620, point cloud data corresponding to at least two panoramic images are aggregated to obtain a three-dimensional scene model including a target object.
According to an embodiment of the present disclosure, each of the at least two panoramic images may be an image obtained by correcting the corner points of the target object in the initial image by using the above-described method for correcting the corner points.
This embodiment may employ a point cloud generation network to generate the point cloud data corresponding to each panoramic image. For example, a sparse point cloud network may be employed to generate sparse point cloud data from each image, after which the sparse point cloud data are input into a dense model (Dense Module) to generate a dense point cloud. The sparse point cloud network may include an encoder and a decoder, where the encoder is composed of a convolutional network and the decoder is composed of a deconvolution network and a convolutional network. The dense model may process the sparse point cloud data through a feature extraction operation and a feature expansion operation.
According to the embodiment of the disclosure, when point cloud data corresponding to each panoramic image is determined, for example, a monocular depth estimation model may be used to process each panoramic image to obtain a depth map of each panoramic image. And then, according to each panoramic image and the depth map, point cloud data corresponding to each panoramic image is determined.
The monocular depth estimation model is a generative model whose input is an image and whose output is an image containing depth information. The monocular depth estimation model may be constructed based on deep learning techniques, and may be a supervised monocular depth estimation model or a self-supervised monocular depth estimation model. A supervised monocular depth estimation model needs a real depth map as supervision and therefore relies on a high-precision depth sensor to capture real depth information. A self-supervised monocular depth estimation model may instead use constraints between consecutive frames to predict depth information. The self-supervised monocular depth estimation model may include, for example, the model framework MLDA-Net (Multi-Level Dual Attention-Based Network for Self-Supervised Monocular Depth Estimation). The model framework MLDA-Net takes a low-resolution color image as input and estimates the corresponding depth information in a self-supervised manner; the framework uses a multi-level feature extraction (MLFE) strategy to extract rich hierarchical representations from different perceptual scopes for high-quality depth prediction. The model framework obtains effective features with a dual attention strategy that enhances global and local structural information by combining global and local attention modules. The model framework utilizes a re-weighting strategy when calculating the loss function, re-weighting the output depth information of different levels so that the final output depth information is effectively supervised.
In the overall structure of the model framework, the input comprises a plurality of input data of different scales, with the scale being a selectable parameter. The input data are processed by two convolutional networks to extract features, and the features are then integrated and further extracted by an attention network GA. The convolutional networks and the attention network constitute the encoding network structure. After the features are extracted, the model framework inputs them into a second network structure, which mainly performs feature extraction and upsampling based on two attention modules, and finally outputs depth maps of different scales corresponding to the input maps of different scales.
It is to be understood that the model framework of the monocular depth estimation model described above is merely an example to facilitate understanding of the present disclosure, and the present disclosure is not limited thereto. For example, the monocular depth estimation model may be constructed based on a markov random field, a Monodepth algorithm, an SVS (Single View Stereo Matching) algorithm, or the like.
After obtaining the depth map, the embodiment may map the depth map and each panoramic image into three-dimensional point cloud data based on three-dimensional geometric principles. The three-dimensional point cloud data may include three-dimensional coordinates, color information, reflection intensity information, and the like. The color information may be represented by, for example, the RGB values of the corresponding pixels in each panoramic image, and the reflection intensity information of the point cloud data corresponding to each pixel may be calculated from a ray tracing algorithm and the RGB values of the corresponding pixels in each panoramic image.
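As one possible sketch of this three-dimensional geometric mapping, assuming an equirectangular panorama (the disclosure does not fix a particular projection model), a depth map and its image can be unprojected to a colored point cloud:

```python
import numpy as np

def panorama_to_point_cloud(rgb, depth):
    """Unproject an equirectangular panorama (h, w, 3) and its depth map
    (h, w) to a colored point cloud of rows (x, y, z, r, g, b).
    A simplified geometric sketch under the equirectangular assumption."""
    h, w = depth.shape
    # pixel -> spherical angles: longitude in [-pi, pi), latitude in (-pi/2, pi/2]
    lon = (np.arange(w) / w - 0.5) * 2.0 * np.pi
    lat = (0.5 - np.arange(h) / h) * np.pi
    lon, lat = np.meshgrid(lon, lat)
    # spherical -> Cartesian, scaled by per-pixel depth
    x = depth * np.cos(lat) * np.sin(lon)
    y = depth * np.sin(lat)
    z = depth * np.cos(lat) * np.cos(lon)
    xyz = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    return np.concatenate([xyz, colors], axis=1)  # shape (h*w, 6)
```

Each output point lies at exactly its depth-map distance from the camera center, and carries the RGB color of its source pixel; reflection intensity is omitted here since its computation depends on the chosen ray tracing algorithm.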
According to the embodiment, the monocular depth estimation model is adopted to generate the depth map corresponding to each image, and the point cloud data is determined according to the depth map, so that the accuracy and the efficiency of determining the point cloud data can be improved.
According to the embodiment of the present disclosure, when the point cloud data corresponding to the at least two panoramic images are aggregated, for example, one of the panoramic images may be used as a pose reference image, and each of the other images may form an image pair with the pose reference image. Then, the relative pose of each image pair is determined, and the point cloud data corresponding to each other image are converted, according to the relative pose, into the three-dimensional coordinate system in which the point cloud data corresponding to the pose reference image are located. The coordinate-converted point cloud data and the point cloud data corresponding to the pose reference image are aggregated to obtain the three-dimensional scene model including the target object.
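A minimal sketch of the coordinate conversion and aggregation step, assuming the relative pose of each image pair is already available as a rotation matrix R and translation vector t (pose estimation itself is outside this snippet, and all names are illustrative):

```python
import numpy as np

def transform_to_reference(points, R, t):
    """Convert an (n, 3) point cloud into the coordinate system of the
    pose reference image, given the relative pose (R, t) of its source
    image with respect to the reference image."""
    return points @ np.asarray(R, float).T + np.asarray(t, float)

def aggregate(reference_cloud, other_clouds, poses):
    """Concatenate the reference cloud with the pose-aligned other clouds
    to form the point cloud of the whole scene."""
    aligned = [transform_to_reference(cloud, R, t)
               for cloud, (R, t) in zip(other_clouds, poses)]
    return np.concatenate([reference_cloud, *aligned], axis=0)
```

With an identity rotation and translation `[1, 0, 0]`, for example, a cloud of all-ones points is shifted to `[2, 1, 1]` before being appended to the reference cloud.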
In the embodiment of the disclosure, because the panoramic images used to generate the three-dimensional model are corrected by the corner point correction method, the generated three-dimensional scene model is free of misalignment, which makes the three-dimensional scene model more real and vivid and helps improve the user experience.
Based on the method for correcting the corner provided by the present disclosure, the present disclosure also provides a device for correcting the corner, which will be described in detail below with reference to fig. 7.
Fig. 7 is a block diagram of a structure of a correction apparatus for a corner point according to an embodiment of the present disclosure.
As shown in fig. 7, the corner correction apparatus 700 of this embodiment may include a location information determination module 710, a corner clustering module 720, a reference location determination module 730, and a location correction module 740.
The position information determining module 710 is configured to determine, according to position information of a target object corner included in each of the at least two images, position information of a spatial point corresponding to the target object corner, so as to obtain at least two pieces of spatial position information. In an embodiment, the location information determining module 710 may be configured to perform the operation S210 described above, which is not described herein again.
The corner clustering module 720 is configured to cluster the target object corners in the at least two images according to the at least two spatial position information, so as to obtain at least one corner group. In an embodiment, the corner clustering module 720 may be configured to perform the operation S220 described above, which is not described herein again.
The reference position determining module 730 is configured to determine reference position information of the corner of the target object in each corner group according to the position information of the spatial point corresponding to the corner of the target object in each corner group. In an embodiment, the reference position determining module 730 may be configured to perform the operation S230 described above, which is not described herein again.
The position correction module 740 is configured to correct the position of the corner point of the target object in the image according to the reference position information of the corner point of the target object. In an embodiment, the position correction module 740 may be configured to perform the operation S240 described above, which is not described herein again.
According to an embodiment of the present disclosure, each image comprises at least two target object corner points. The corner clustering module 720 may include a position feature determining submodule and a clustering submodule. The position feature determining submodule is configured to determine the position feature of the spatial point corresponding to each target object corner point according to the position information of the spatial point corresponding to each target object corner point in each image and the position information of the spatial point corresponding to a neighboring corner point, where the neighboring corner point is a corner point, among the at least two target object corner points included in each image, whose position is adjacent to that of each target object corner point. The clustering submodule is configured to cluster the plurality of target object corner points according to the position features of the plurality of spatial points corresponding to the plurality of target object corner points in the at least two images, to obtain the at least one corner point group.
According to an embodiment of the present disclosure, the location feature determination submodule may include a direction feature determination unit and a location feature determination unit. The direction feature determination unit is used for determining the direction feature of the space point corresponding to each target object corner point according to the position information of the space point corresponding to each target object corner point and the position information of the space point corresponding to the adjacent corner point. And the position characteristic determining unit is used for determining the position characteristic according to the direction characteristic and the position information of the space point corresponding to each target object corner point.
According to an embodiment of the present disclosure, the clustering submodule includes a feature distance determining unit and a clustering unit. The feature distance determining unit is configured to determine a feature distance between the location features of two spatial points corresponding to each two corner points of the plurality of target object corner points. The clustering unit is used for clustering the target object corner points according to the characteristic distance between the position characteristics of the space points.
According to an embodiment of the present disclosure, the location features include the direction features and location information of the spatial points corresponding to each corner point. The characteristic distance determining unit comprises a first distance determining subunit, a second distance determining subunit and a third distance determining subunit. The first distance determining subunit is configured to determine a distance between positions indicated by two pieces of position information in two position features of the two spatial points, and obtain a first feature sub-distance. The second distance determining subunit is configured to determine a difference between two directional features of the two position features, and obtain a second feature sub-distance. The third distance determining subunit is configured to determine a feature distance between the position features of the two spatial points according to the first feature sub-distance and the second feature sub-distance. The direction features indicate direction vectors between a space point corresponding to each target object corner point and a space point corresponding to an adjacent corner point.
According to an embodiment of the present disclosure, the neighboring corner points comprise a corner point adjacent to each target object corner point in the first orientation and a corner point adjacent to each target object corner point in the second orientation. The first azimuth and the second azimuth are located on two sides of each target object corner point.
According to an embodiment of the present disclosure, the reference position determining module 730 may include a mean determining submodule and a reference position determining submodule. The mean determining submodule is configured to determine the mean of the position information of the spatial points corresponding to all the target object corner points in each corner point group as the reference space information. The reference position determining submodule is configured to determine, in the image where each target object corner point in each corner point group is located, the position information corresponding to the reference space information as the reference position information of that target object corner point.
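The mean computed by the mean determining submodule can be sketched as follows (a minimal illustration; the function name is an assumption):

```python
import numpy as np

def reference_spatial_info(group_points):
    """Reference space information for one corner point group: the mean of
    the spatial points corresponding to all target object corner points in
    the group. group_points is an (n, 3) array-like of 3D positions."""
    return np.asarray(group_points, dtype=float).mean(axis=0)
```

Projecting this mean back into each image (via that image's pose) then yields the reference position information used to correct the corner point there.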
Based on the method for generating the three-dimensional model provided by the present disclosure, the present disclosure also provides a device for generating the three-dimensional model, which will be described in detail below with reference to fig. 8.
Fig. 8 is a block diagram of a three-dimensional model generation apparatus according to an embodiment of the present disclosure.
As shown in fig. 8, the generation apparatus 800 of the three-dimensional model of this embodiment may include a point cloud data determination module 810 and a model obtaining module 820.
The point cloud data determining module 810 is configured to determine, for at least two panoramic images including the target object, point cloud data corresponding to the at least two panoramic images. Wherein the target object corner in each panoramic image is a corner corrected by the above-described correction means for corners. In an embodiment, the point cloud data determining module 810 may be configured to perform the operation S610 described above, which is not described herein again.
The model obtaining module 820 is configured to aggregate point cloud data corresponding to at least two panoramic images to obtain a three-dimensional scene model including a target object. In an embodiment, the model obtaining module 820 may be configured to perform the operation S620 described above, which is not described herein again.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated. In the technical solution of the present disclosure, the user's authorization or consent is obtained before the user's personal information is obtained or collected.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 9 illustrates a schematic block diagram of an example electronic device 900 that may be used to implement the corner point correction method and/or the three-dimensional model generation method of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the apparatus 900 includes a computing unit 901 which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The calculation unit 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 901 performs the respective methods and processes described above, such as the correction method of corner points and/or the generation method of a three-dimensional model. For example, in some embodiments, the method of correction of corner points and/or the method of generation of the three-dimensional model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 900 via ROM 902 and/or communications unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the above-described corner point correction method and/or the three-dimensional model generation method may be performed. Alternatively, in other embodiments, the calculation unit 901 may be configured by any other suitable means (e.g. by means of firmware) to perform a correction method of corner points and/or a generation method of a three-dimensional model.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that addresses the drawbacks of difficult management and weak service extensibility found in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. A method for corner correction, comprising:
determining position information of a spatial point corresponding to a target object corner point according to position information of the target object corner point included in each of at least two images to obtain at least two pieces of spatial position information;
clustering the target object corner points in the at least two images according to the at least two pieces of spatial position information to obtain at least one corner point group;
determining reference position information of the target object corner points in each corner point group according to the position information of the space points corresponding to the target object corner points in each corner point group; and
correcting the position of the target object corner point in the image according to the reference position information of the target object corner point.
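The method of claim 1 proceeds in four steps: back-project each image corner to a spatial point, cluster the spatial points, average each cluster into reference spatial information, and write the reference back to every image observation. The following is an illustrative sketch only, not the claimed method itself: it assumes the per-image 3D spatial coordinates have already been triangulated, and it uses a naive distance-threshold grouping, since the claim does not prescribe a particular clustering algorithm.

```python
def correct_corners(observations, radius=0.1):
    """Sketch of the corner-correction flow of claim 1.

    observations: list of (image_id, corner_id, xyz), where xyz is the
    spatial point already determined for that image's corner.
    Returns the corrected spatial coordinates per (image_id, corner_id).
    """
    # Step 2 (claim 1): greedy clustering of spatial points by distance.
    groups = []  # each group is a list of observation indices
    for i, (_, _, p) in enumerate(observations):
        for g in groups:
            q = observations[g[0]][2]
            if sum((a - b) ** 2 for a, b in zip(p, q)) <= radius ** 2:
                g.append(i)
                break
        else:
            groups.append([i])

    # Steps 3-4: average each group into reference spatial information
    # and assign it back to every member observation.
    corrected = {}
    for g in groups:
        pts = [observations[i][2] for i in g]
        mean = tuple(sum(c) / len(pts) for c in zip(*pts))
        for i in g:
            img, cid, _ = observations[i]
            # In practice the reference would be re-projected into each
            # image; here the corrected spatial point is returned directly.
            corrected[(img, cid)] = mean
    return corrected
```

Observations of the same physical corner seen from different images thus converge to a single averaged position.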
2. The method of claim 1, wherein at least two target object corners are included in each image; the clustering the target object corner points in the at least two images according to the at least two spatial position information to obtain at least one corner point group comprises:
determining the position characteristics of the space points corresponding to the corner points of each target object according to the position information of the space points corresponding to the corner points of each target object in each image and the position information of the space points corresponding to the adjacent corner points; the adjacent corner points are corner points which are adjacent to the position of each target object corner point in at least two target object corner points included in each image; and
clustering the target object corner points according to the position features of the spatial points corresponding to the target object corner points in the at least two images to obtain the at least one corner point group.
3. The method according to claim 2, wherein the determining, according to the position information of the spatial point corresponding to each target object corner in each image and the position information of the spatial point corresponding to an adjacent corner, the position characteristic of the spatial point corresponding to each target object corner comprises:
determining the direction characteristics of the space points corresponding to each target object corner point according to the position information of the space points corresponding to each target object corner point and the position information of the space points corresponding to the adjacent corner points; and
determining the position features according to the direction features and the position information of the spatial points corresponding to each target object corner point.
4. The method of claim 2, wherein the clustering the plurality of target object corner points according to the location features of the plurality of spatial points corresponding to the plurality of target object corner points in the at least two images comprises:
determining a feature distance between the position features of two space points corresponding to each two corner points of the plurality of target object corner points; and
clustering the target object corner points according to the feature distances among the position features of the spatial points.
5. The method according to claim 4, wherein the position features comprise direction features and position information of spatial points corresponding to each target object corner point; the determining the feature distance between the position features of the two spatial points corresponding to each two corner points of the plurality of target object corner points comprises:
determining the distance between the positions indicated by the two position information in the two position characteristics of the two space points to obtain a first characteristic sub-distance;
determining the difference between two direction features in the two position features to obtain a second feature sub-distance; and
determining a feature distance between the location features of the two spatial points according to the first feature sub-distance and the second feature sub-distance,
wherein the direction features indicate direction vectors between the spatial point corresponding to each target object corner point and the spatial points corresponding to the adjacent corner points.
6. The method of any one of claims 2-5, wherein:
said neighbouring corner points comprising corner points neighbouring said each target object corner point in a first orientation and corner points neighbouring said each target object corner point in a second orientation,
wherein the first orientation and the second orientation are located on both sides of each target object corner point.
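Claims 3 through 6 build, for each corner point, a position feature from the point's own coordinates together with direction vectors toward its two adjacent corner points (one on each side), and claim 5 combines a positional sub-distance with a directional sub-distance into one feature distance. A sketch under two stated assumptions not fixed by the claims: the direction vectors are normalized to unit length, and the two sub-distances are combined by a weighted sum.

```python
import math

def position_feature(p, left, right):
    """Position feature (claims 3 and 6): the point's coordinates plus
    unit direction vectors toward the adjacent corner points in the
    first and second orientations, one on each side."""
    def unit(a, b):
        v = [b[i] - a[i] for i in range(3)]
        n = math.sqrt(sum(x * x for x in v)) or 1.0  # guard zero-length
        return tuple(x / n for x in v)
    return {"pos": tuple(p), "dirs": (unit(p, left), unit(p, right))}

def feature_distance(f1, f2, w=1.0):
    """Feature distance (claim 5): a first sub-distance between the two
    positions and a second sub-distance between the two direction
    features, combined by a weighted sum (the weighting is illustrative)."""
    d_pos = math.dist(f1["pos"], f2["pos"])          # first sub-distance
    d_dir = sum(math.dist(a, b)                      # second sub-distance
                for a, b in zip(f1["dirs"], f2["dirs"]))
    return d_pos + w * d_dir
```

Two corners with the same neighbor geometry but different positions thus differ only in the positional term, which is what lets the clustering of claim 4 separate distinct physical corners.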
7. The method of claim 1, wherein the determining the reference position information of the target object corner points in each corner point group according to the position information of the spatial points corresponding to the target object corner points in each corner point group comprises:
determining the mean value of the position information of all the space points corresponding to all the target object corner points in each corner point group as reference space information; and
determining position information corresponding to the reference space information in the image where each target object corner point in each corner point group is located, as the reference position information of each target object corner point.
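Claim 7 averages the spatial points of a corner point group and then maps that reference spatial information back into each image. The mapping depends on each image's camera model, which the claim leaves open; the sketch below assumes a simple pinhole camera with known intrinsics K and pose (R, t) per image, purely for illustration.

```python
def project_to_image(point, K, R, t):
    """Map reference spatial information (a 3D point) to image
    coordinates: transform into the camera frame, apply the
    intrinsics, then perform the perspective division."""
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i]
           for i in range(3)]
    uvw = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return (uvw[0] / uvw[2], uvw[1] / uvw[2])
```

The projected coordinates serve as the reference position information that replaces the originally detected corner position in that image.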
8. A method of generating a three-dimensional model, comprising:
determining point cloud data corresponding to each panoramic image in at least two panoramic images including a target object; and
aggregating point cloud data corresponding to the at least two panoramic images to obtain a three-dimensional scene model including the target object,
wherein the target object corner in each panoramic image is a corner corrected by the method of any one of claims 1 to 7.
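Claim 8 aggregates the per-panorama point clouds into a single three-dimensional scene model; correcting the corners first (claims 1 to 7) makes the per-image clouds agree at shared structure. A minimal aggregation sketch, assuming each panorama's pose (R, t) in a common frame is already known — pose estimation is outside the claim.

```python
def aggregate_point_clouds(clouds, poses):
    """Transform each panorama's point cloud into the shared scene
    frame and concatenate; poses are per-panorama (R, t) pairs."""
    scene = []
    for points, (R, t) in zip(clouds, poses):
        for p in points:
            scene.append(tuple(
                sum(R[i][j] * p[j] for j in range(3)) + t[i]
                for i in range(3)))
    return scene
```

A real pipeline would typically also deduplicate overlapping points or fuse them into a mesh, but the claim itself only requires the aggregation step.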
9. A device for correcting a corner point, comprising:
the position information determining module is used for determining the position information of a spatial point corresponding to a target object corner according to the position information of the target object corner included in each of at least two images to obtain at least two pieces of spatial position information;
the angular point clustering module is used for clustering the angular points of the target object in the at least two images according to the at least two pieces of spatial position information to obtain at least one angular point group;
the reference position determining module is used for determining the reference position information of the target object corner points in each corner group according to the position information of the space points corresponding to the target object corner points in each corner group; and
the position correction module is used for correcting the position of the target object corner point in the image according to the reference position information of the target object corner point.
10. The apparatus of claim 9, wherein each image comprises at least two target object corners; the corner clustering module comprises:
the position feature determination submodule is used for determining the position feature of the space point corresponding to each target object corner point according to the position information of the space point corresponding to each target object corner point in each image and the position information of the space point corresponding to the adjacent corner point; wherein the adjacent corner is a corner adjacent to the position of each target corner in at least two target corners included in each image; and
the clustering submodule is used for clustering the target object corner points according to the position features of the spatial points corresponding to the target object corner points in the at least two images to obtain the at least one corner point group.
11. The apparatus of claim 10, wherein the location feature determination sub-module comprises:
a direction feature determining unit, configured to determine, according to the position information of the space point corresponding to each target object corner point and the position information of the space point corresponding to the adjacent corner point, a direction feature of the space point corresponding to each target object corner point; and
the position feature determining unit is used for determining the position features according to the direction features and the position information of the spatial points corresponding to each target object corner point.
12. The apparatus of claim 10, wherein the clustering submodule comprises:
the characteristic distance determining unit is used for determining the characteristic distance between the position characteristics of two space points corresponding to each two corner points in the plurality of target object corner points; and
the clustering unit is used for clustering the target object corner points according to the feature distances among the position features of the spatial points.
13. The apparatus according to claim 12, wherein the position feature includes a direction feature and position information of a spatial point corresponding to each target object corner point; the feature distance determination unit includes:
the first distance determining subunit is configured to determine a distance between positions indicated by two pieces of position information in two position features of the two spatial points, to obtain a first feature sub-distance;
a second distance determining subunit, configured to determine a difference between two directional features in the two position features, to obtain a second feature sub-distance; and
a third distance determining subunit, configured to determine a feature distance between the location features of the two spatial points according to the first feature sub-distance and the second feature sub-distance,
wherein the direction features indicate direction vectors between the spatial point corresponding to each target object corner point and the spatial points corresponding to the adjacent corner points.
14. The apparatus of any one of claims 10-13, wherein:
said neighbouring corner points comprising corner points neighbouring said each target object corner point in a first orientation and corner points neighbouring said each target object corner point in a second orientation,
wherein the first orientation and the second orientation are located on two sides of each target object corner point.
15. The apparatus of claim 9, wherein the reference location determination module comprises:
the mean value determining submodule is used for determining the mean value of the position information of all the space points corresponding to all the target object corner points in each corner point group and taking the mean value as reference space information; and
the reference position determining submodule is used for determining position information corresponding to the reference space information in the image where each target object corner point in each corner point group is located, as the reference position information of each target object corner point.
16. An apparatus for generating a three-dimensional model, comprising:
the system comprises a point cloud data determining module, a point cloud data determining module and a processing module, wherein the point cloud data determining module is used for determining point cloud data corresponding to each panoramic image in at least two panoramic images comprising a target object; and
a model obtaining module for aggregating point cloud data corresponding to the at least two panoramic images to obtain a three-dimensional scene model including the target object,
wherein the target object corner in each panoramic image is a corner corrected using the apparatus of any one of claims 9 to 15.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to any one of claims 1-8.
19. A computer program product comprising a computer program/instructions stored on at least one of a readable storage medium and an electronic device, which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
CN202211075976.6A 2022-09-02 2022-09-02 Corner correction method and generation method and device of three-dimensional model in meta universe Active CN115439331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211075976.6A CN115439331B (en) 2022-09-02 2022-09-02 Corner correction method and generation method and device of three-dimensional model in meta universe

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211075976.6A CN115439331B (en) 2022-09-02 2022-09-02 Corner correction method and generation method and device of three-dimensional model in meta universe

Publications (2)

Publication Number Publication Date
CN115439331A true CN115439331A (en) 2022-12-06
CN115439331B CN115439331B (en) 2023-07-07

Family

ID=84246778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211075976.6A Active CN115439331B (en) 2022-09-02 2022-09-02 Corner correction method and generation method and device of three-dimensional model in meta universe

Country Status (1)

Country Link
CN (1) CN115439331B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160021357A1 (en) * 2014-07-21 2016-01-21 Moaath Alrajab Determining three dimensional information using a single camera
CN108765487A (en) * 2018-06-04 2018-11-06 百度在线网络技术(北京)有限公司 Rebuild method, apparatus, equipment and the computer readable storage medium of three-dimensional scenic
CN112097732A (en) * 2020-08-04 2020-12-18 北京中科慧眼科技有限公司 Binocular camera-based three-dimensional distance measurement method, system, equipment and readable storage medium
CN113191174A (en) * 2020-01-14 2021-07-30 北京京东乾石科技有限公司 Article positioning method and device, robot and computer readable storage medium
WO2022088799A1 (en) * 2020-10-29 2022-05-05 陈志立 Three-dimensional reconstruction method, three-dimensional reconstruction apparatus and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
VERMA, R. et al.: "An efficient clustering algorithm to simultaneously detect multiple planes in a point cloud", pages 154 - 159 *
YANG, Qingke: "Research on building facade modeling from combined airborne and vehicle-borne LiDAR point clouds" (in Chinese), no. 7, pages 008 - 96 *

Also Published As

Publication number Publication date
CN115439331B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
CN111127655B (en) House layout drawing construction method and device, and storage medium
US11270460B2 (en) Method and apparatus for determining pose of image capturing device, and storage medium
WO2020001168A1 (en) Three-dimensional reconstruction method, apparatus, and device, and storage medium
US20140240310A1 (en) Efficient approach to estimate disparity map
US20200410688A1 (en) Image Segmentation Method, Image Segmentation Apparatus, Image Segmentation Device
US20220358675A1 (en) Method for training model, method for processing video, device and storage medium
CN113077548B (en) Collision detection method, device, equipment and storage medium for object
US20240029297A1 (en) Visual positioning method, storage medium and electronic device
CN112862877A (en) Method and apparatus for training image processing network and image processing
CN115439543B (en) Method for determining hole position and method for generating three-dimensional model in meta universe
US20220198743A1 (en) Method for generating location information, related apparatus and computer program product
CN114792355A (en) Virtual image generation method and device, electronic equipment and storage medium
CN112085842B (en) Depth value determining method and device, electronic equipment and storage medium
CN114723894B (en) Three-dimensional coordinate acquisition method and device and electronic equipment
US20230089845A1 (en) Visual Localization Method and Apparatus
CN115439331B (en) Corner correction method and generation method and device of three-dimensional model in meta universe
CN114998433A (en) Pose calculation method and device, storage medium and electronic equipment
CN114842066A (en) Image depth recognition model training method, image depth recognition method and device
CN114066980A (en) Object detection method and device, electronic equipment and automatic driving vehicle
CN113763468A (en) Positioning method, device, system and storage medium
CN112465692A (en) Image processing method, device, equipment and storage medium
CN115375740A (en) Pose determination method, three-dimensional model generation method, device, equipment and medium
CN114820908B (en) Virtual image generation method and device, electronic equipment and storage medium
CN115908723B (en) Polar line guided multi-view three-dimensional reconstruction method based on interval perception
CN115578432B (en) Image processing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant