CN112055192A - Image processing method, image processing apparatus, electronic device, and storage medium

Info

Publication number
CN112055192A
Authority
CN
China
Prior art keywords
dimensional
contour
panoramic
panoramic image
sub
Prior art date
Legal status
Granted
Application number
CN202010774831.XA
Other languages
Chinese (zh)
Other versions
CN112055192B (en)
Inventor
Inventor not announced
Current Assignee
Beijing Urban Network Neighbor Information Technology Co Ltd
Original Assignee
Beijing Urban Network Neighbor Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Urban Network Neighbor Information Technology Co Ltd
Priority to CN202010774831.XA
Publication of CN112055192A
Application granted
Publication of CN112055192B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/20 Image signal generators
    • H04N 13/275 Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals

Abstract

The invention discloses an image processing method, an image processing apparatus, an electronic device, and a storage medium, and relates to the technical field of image data processing. The method comprises the following steps: acquiring a video stream and a timestamp corresponding to at least one panoramic image taken of each three-dimensional sub-object in at least one three-dimensional object; acquiring the position of each video frame in the video stream according to the geometric relationship between the video frames in the video stream; taking the position corresponding to the timestamp among the positions of the video frames as the position of the panoramic camera when the at least one panoramic image of each three-dimensional sub-object was taken, wherein each panoramic image is taken of one three-dimensional sub-object; and splicing the plane contours of the at least one panoramic image of each three-dimensional sub-object in three-dimensional space based on the position of the panoramic camera to obtain a multi-object plane contour of the at least one three-dimensional object. The embodiment of the invention can efficiently process the acquired data to provide preparation data for object modeling.

Description

Image processing method, image processing apparatus, electronic device, and storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to an image processing method, an image processing apparatus, an electronic device, and a storage medium.
Background
In the field of object modeling, how to make the generated object model have high resolution and/or high accuracy is a goal that is strongly pursued in the industry.
Object modeling enables a user to browse the 2D and/or 3D structure of a three-dimensional object without leaving home (for example, over a network), and 3D modeling of an object can produce an immersive, as-if-on-the-scene effect, which is a very important application in the field of virtual reality.
In the field of object modeling, especially 2D and 3D modeling, the technical solutions at home and abroad are mainly divided into two categories: manual fabrication and automated modeling.
For the manual production method, a large amount of manual work is required: the three-dimensional structure of the object must be identified and the multiple object models spliced together by hand. Manually producing a set of 3D models of three-dimensional objects takes a long time, so a large volume of three-dimensional object data requires many people to produce manually; the personnel cost is too high, and practical application is difficult.
For the automatic 3D modeling method, professional 3D scanning equipment is mostly used at present: the three-dimensional point cloud of a single object is acquired directly, and the point clouds are then spliced to generate a 3D model. However, the image acquisition component of such professional 3D scanning equipment is not precise, so the captured images have low resolution, and the generated three-dimensional model in turn has low resolution. Moreover, such 3D scanning equipment is often expensive and can hardly meet the requirements of consumer-grade applications.
Therefore, how to obtain high-resolution captured images and how to efficiently process the captured images to provide preparation data for object modeling are technical problems that the present invention aims to solve.
Disclosure of Invention
In order to solve one of the above problems, the present invention provides an image processing method, an apparatus, an electronic device, and a storage medium.
According to an embodiment of the present invention, there is provided an image processing method including:
acquiring a video stream and a time stamp corresponding to at least one panoramic image shot by each three-dimensional sub-object in at least one three-dimensional object;
acquiring the position of each video frame in the video stream according to the geometric relationship between the video frames in the video stream;
taking a position corresponding to the timestamp in the position of each video frame as a position of a panoramic camera when at least one panoramic image of each three-dimensional sub-object is shot, wherein each panoramic image is shot for one three-dimensional sub-object;
and splicing the plane contour of the at least one panoramic image of each three-dimensional sub-object in the three-dimensional space based on the position of the panoramic camera to obtain the multi-object plane contour of the at least one three-dimensional object.
According to another embodiment of the present invention, there is provided an image processing apparatus including: a processor; and a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform one of the image processing methods described above.
According to yet another embodiment of the present invention, there is provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing one of the image processing methods described above when executing the computer program.
According to yet another embodiment of the invention, there is provided a non-transitory machine-readable storage medium having stored thereon executable code which, when executed by a processor, causes the processor to perform one of the image processing methods described above.
According to the image processing method, the image processing apparatus, the electronic device, and the storage medium, the position of the panoramic camera when each panoramic image is taken is obtained from the video stream shot for the three-dimensional object and the timestamp at which the panoramic image is taken, so that the plane contours of the three-dimensional sub-objects extracted from the panoramic images are spliced according to the position of the panoramic camera to obtain the multi-object plane contour of the three-dimensional object for subsequent object modeling; the collected data are thus processed efficiently to provide preparation data for object modeling.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
FIG. 1 is a flow chart illustrating steps of an image processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating steps of another image processing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a planar contour in three-dimensional space of a plurality of panoramic images of a three-dimensional object provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a plan profile of a three-dimensional object in three-dimensional space provided by an embodiment of the invention;
FIG. 5 is a flowchart illustrating steps of a method for extracting a plane contour according to an embodiment of the present invention;
FIG. 6 is a flow chart illustrating steps of another plane contour extraction method according to an embodiment of the present invention;
FIG. 7 is a flowchart illustrating steps of a method for obtaining a plane contour according to an embodiment of the present invention;
fig. 8 is a flowchart illustrating steps of a method for acquiring coordinates of three-dimensional points according to an embodiment of the present invention;
FIG. 9 is a flowchart illustrating steps of a method for determining a three-dimensional object according to an embodiment of the present invention;
FIG. 10 is a flow chart illustrating steps of a method for modeling an object according to an embodiment of the present invention;
fig. 11 is a block diagram showing a configuration of an image processing apparatus according to an embodiment of the present invention;
fig. 12 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. It should be noted that the numbers, serial numbers and reference numbers in the present application are only presented for convenience of description, and no limitation is made to the steps, the sequence and the like of the present invention unless the specific sequence of the steps is explicitly indicated in the specification.
In the field of object modeling, especially 2D and 3D modeling, the technical solutions at home and abroad are mainly divided into two categories: manual fabrication and automated modeling.
For the manual production method, a large amount of manual work is required: the three-dimensional structure of the object must be identified and the multiple object models spliced together by hand. Manually producing a set of 3D models of three-dimensional objects takes a long time, so a large volume of three-dimensional object data requires many people to produce manually; the personnel cost is too high, and practical application is difficult.
For the automatic 3D modeling method, professional 3D scanning equipment is mostly used at present: the three-dimensional point cloud of a single object is acquired directly, and the point clouds are then spliced to generate a 3D model. However, the image acquisition component of such professional 3D scanning equipment is not precise, so the captured images have low resolution, and the generated three-dimensional model in turn has low resolution. Moreover, such 3D scanning equipment is often expensive and can hardly meet the requirements of consumer-grade applications.
The invention provides an image processing method, an image processing apparatus, an electronic device, and a storage medium.
Firstly, in the invention, a common panoramic camera is adopted to shoot a high-resolution panoramic image and a video stream of each three-dimensional sub-object in the three-dimensional object, thereby overcoming the defect of low resolution of the image captured by the 3D scanning camera described in the background technology.
Then, using the plurality of captured panoramic images, the plane contour of each single panoramic image in three-dimensional space (which may be referred to as a "single-image plane contour") can be extracted; the position of each video frame is acquired according to the geometric relationship between the video frames in the video stream, and the position of the video frame corresponding to the timestamp at which a three-dimensional sub-object was photographed is taken as the position of the panoramic camera.
Furthermore, through scale normalization, the scale of the single-image plane contours and the scale of the positions of the panoramic camera can be unified, and normalized single-image plane contours are generated; this provides high-resolution and sufficient data preparation for subsequent object modeling and reduces the difficulty of the subsequent processing work.
Still further, an accurate single-object plane contour can be obtained by single-object fusion of the single image plane contours belonging to the same three-dimensional sub-object.
Further, the single-image plane contours of the three-dimensional sub-objects of the three-dimensional object can be spliced in three-dimensional space to obtain a multi-object model (in this case, a 2D model).
In addition, the multi-object model can be corrected to obtain a more accurate model, so that the model display effect is better.
Finally, a complete, high resolution and accurate 3D model is obtained by 3D model generation.
Hereinafter, for the convenience of understanding and description, the respective processes of the present invention will be described in detail by taking house modeling as an example.
Fig. 1 presents a flow chart of schematic steps of an image processing method according to an exemplary embodiment of the present invention.
Step 101, obtaining a video stream and a timestamp corresponding to at least one panoramic image shot by each three-dimensional sub-object in at least one three-dimensional object.
In the embodiment of the present invention, the three-dimensional object is a spatial object that needs to be subjected to image processing, and a three-dimensional sub-object refers to a spatial object contained in the three-dimensional object. For example, a house contains rooms: if the house is the three-dimensional object, each room is a three-dimensional sub-object. The video stream contains video data of each three-dimensional sub-object in the three-dimensional object, and the timestamp refers to the time at which the panoramic image of each three-dimensional sub-object is taken.
And 102, acquiring the position of each video frame in the video stream according to the geometric relationship among the video frames in the video stream.
In the embodiment of the invention, the moving shooting track of the panoramic camera can be determined according to the geometric relation between the video frames in the video stream, so that the positions of the video frames in the video stream in the moving shooting track can be determined.
And 103, taking the position corresponding to the time stamp in the position of each video frame as the position of the panoramic camera when at least one panoramic image of each three-dimensional sub-object is shot, wherein each panoramic image is shot for one three-dimensional sub-object.
In the embodiment of the invention, after the position of each video frame is determined, the position of the panoramic camera when a panoramic image is taken can be obtained from the position of the video frame whose timestamp corresponds to that panoramic image. At least one panoramic image needs to be taken of each three-dimensional sub-object in the three-dimensional object, and the number of panoramic images taken can be determined according to actual requirements, which is not limited herein.
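As an illustrative sketch (not part of the claims), the mapping from a panoramic image's timestamp to a camera position can be implemented as a nearest-timestamp lookup over the video frames; all names below are hypothetical.

from bisect import bisect_left

def camera_position_for_panorama(frame_timestamps, frame_positions, pano_timestamp):
    """Return the position of the video frame whose timestamp is closest to the
    timestamp recorded when the panoramic image was taken.

    frame_timestamps : sorted list of per-frame timestamps (seconds)
    frame_positions  : list of per-frame camera positions obtained in step 102
    pano_timestamp   : timestamp recorded for the panoramic image
    """
    i = bisect_left(frame_timestamps, pano_timestamp)
    if i == 0:
        return frame_positions[0]
    if i == len(frame_timestamps):
        return frame_positions[-1]
    # pick the neighbouring frame whose timestamp is nearer to the panorama's
    before, after = frame_timestamps[i - 1], frame_timestamps[i]
    if after - pano_timestamp < pano_timestamp - before:
        return frame_positions[i]
    return frame_positions[i - 1]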
And 104, splicing the plane contour of the at least one panoramic image of each three-dimensional sub-object in the three-dimensional space based on the position of the panoramic camera to obtain a multi-object plane contour of the at least one three-dimensional object.
In the embodiment of the invention, the single-object plane contour of each three-dimensional sub-object is extracted from at least one panoramic image of each three-dimensional sub-object, then the single-object plane contour of each three-dimensional sub-object is subjected to scale normalization according to the position of the panoramic camera when each panoramic image is shot, the single-object plane contour of each three-dimensional sub-object is unified into a three-dimensional coordinate system, and finally the single-object plane contours of each three-dimensional sub-object after the scale normalization can be spliced to obtain the multi-object plane contour of the three-dimensional object for the use of subsequent three-dimensional modeling.
In practical application, after the single-object plane contour of each room is extracted from the panoramic image of each room in the house, the single-object plane contours of each room can be normalized and spliced according to the position of the panoramic camera when the panoramic image of each room is shot, so that the multi-object plane contour of each room in the house is obtained.
According to the image processing method provided by the invention, the position of the panoramic camera when each panoramic image is taken is obtained from the video stream shot for the three-dimensional object and the timestamp at which the panoramic image is taken, so that the plane contours of the three-dimensional sub-objects extracted from the panoramic images are spliced according to the position of the panoramic camera to obtain the multi-object plane contour of the three-dimensional object for subsequent object modeling; the acquired data are thus processed efficiently to provide preparation data for object modeling.
Fig. 2 presents a flow chart of the steps of an image processing method according to another exemplary embodiment of the present invention.
In the three-dimensional object modeling method according to an exemplary embodiment of the present invention, based on at least one panoramic image taken of one three-dimensional sub-object (for example, one room; one panoramic image corresponds to only one room, but a plurality of panoramic images may be taken in one room, that is, one room may correspond to a plurality of panoramic images), the plane contour in three-dimensional space of each panoramic image of the three-dimensional sub-object is extracted; the extracted plane contours are then normalized to obtain the plane contours required for modeling; next, splicing of the plane contours is realized through coordinate transformation, thereby obtaining a multi-object plane contour (a 2D model); and then a 3D model of the three-dimensional object may be generated from the multi-object plane contour.
Step 201, shooting each three-dimensional sub-object in at least one three-dimensional object to be processed, and acquiring at least one panoramic image of each three-dimensional sub-object and a video stream corresponding to the panoramic image, wherein the panoramic image is shot in the process of shooting the video stream of each three-dimensional sub-object.
In the embodiment of the present invention, in the process of shooting the three-dimensional object, each three-dimensional sub-object in the three-dimensional object may be shot continuously or intermittently, as long as the video stream obtained reflects the geometric relationship between the three-dimensional sub-objects. For example, the video stream captured for room A includes video frames of the adjacent rooms B and C, i.e., it reflects the geometric relationship between room A and rooms B and C. A panoramic image of each room of the house may be taken by the panoramic camera while each room is being filmed.
Step 202, recording the time stamp of at least one panoramic image for shooting each three-dimensional sub-object.
In the embodiment of the present invention, the timestamp at which a panoramic image is taken needs to correspond to a timestamp of the video stream. If the panoramic image is taken during the video capture of the three-dimensional sub-object, the timestamp at which the three-dimensional sub-object is photographed can be recorded directly against the video stream; if the panoramic image is taken after the video capture of the three-dimensional sub-object, the timestamp of the video stream that corresponds to the panoramic image needs to be recorded.
In practical applications, mobile video shooting can be performed in sequence in each room in a house through a panoramic camera, and each time one room is shot, a panoramic image of the room can be shot, and the video shooting is still continued at the moment, so that the time stamp of a video stream when the panoramic image is shot can be recorded, so that the time stamp of the video stream can reflect the time point when the panoramic image is shot.
According to the embodiment of the invention, the video stream is shot and the corresponding timestamp of each panoramic image is recorded in the process of carrying out panoramic shooting on each three-dimensional sub-object in the three-dimensional object, so that the position of the panoramic camera determined subsequently is more accurate, and the accuracy of three-dimensional object modeling is improved.
Step 203, obtaining the position of each video frame in the video stream according to the geometric relationship between the video frames in the video stream.
This step can refer to the detailed description of step 102, which is not repeated here.
And 204, taking the position corresponding to the time stamp in the position of each video frame as the position of the panoramic camera when at least one panoramic image of each three-dimensional sub-object is shot, wherein each panoramic image is shot for one three-dimensional sub-object.
This step can refer to the detailed description of step 103, which is not repeated herein.
Step 205, extracting a planar contour of the at least one panoramic image of each three-dimensional sub-object in a three-dimensional space with respect to the at least one panoramic image of each three-dimensional sub-object.
In an embodiment of the present invention, a three-dimensional sub-object (e.g., a certain room of an indoor scene) may correspond to a plurality of panoramic images, and the three-dimensional sub-object may be considered to be composed of a bottom surface (i.e., a bottom), a top surface (i.e., a top), and a wall surface (i.e., a support). Then, the planar contour of the panoramic image in the three-dimensional space refers to the contour of the bottom surface of the three-dimensional sub-object corresponding to the panoramic image, i.e. the planar contour of the panoramic image in the three-dimensional space is a two-dimensional contour. As shown in fig. 3, three panoramic images are captured for a certain three-dimensional sub-object at different positions (i.e., a position a, a position B, and a position C), and the plane contour of each panoramic image corresponds to a different rectangular frame (or contours of other patterns), for example, the plane contour of the rectangle corresponding to the panoramic image captured at the position a is the largest, and the plane contour of the rectangle corresponding to the panoramic image captured at the position C is the smallest. Note that three black dots in the three panoramic images in fig. 4 indicate the positions of the panoramic camera when the panoramic images are captured.
It should be noted that at least one panoramic image may be taken for the same three-dimensional sub-object at the position of each panoramic camera.
Optionally, referring to fig. 5, the step 205 includes:
sub-step 2051, obtaining the three-dimensional point coordinates of the matching feature points on each panoramic image using the geometric relationship of at least two panoramic images taken for each three-dimensional sub-object of the at least one three-dimensional object to be processed.
Sub-step 2052, for each of the at least two panoramic images of each three-dimensional sub-object, generating a planar contour of the panoramic image in the three-dimensional space based on a contour surrounded by edge pixels among pixels whose contour features belong to a specific category on the panoramic image, where the specific category at least includes a top, a bottom, and a support of a three-dimensional object in the panoramic image.
In the embodiment of the present invention, the extraction of the plane contour may be achieved in various ways. An example will be briefly given below to explain the plane contour extraction method.
Taking at least one three-dimensional object as an indoor house as an example, each room may be regarded as one three-dimensional sub-object, and at least one panoramic image is taken for each room, so one panoramic image corresponds to one room, but each room may correspond to a plurality of panoramic images.
In this case, since the ceiling is always above the camera, the uppermost pixel points in the panoramic image are always on the ceiling. Moreover, most of the pixel points belonging to the ceiling have similar features, so all the pixel points belonging to the ceiling can ultimately be obtained according to the feature similarity of the pixel points.
For example, all the pixel points in the first row of the panoramic image are regarded as pixel points belonging to the ceiling. For each pixel point in the second row, its feature similarity with the pixel point in the same column of the first row is calculated (the feature may be color, gray scale, etc., and the feature similarity of two pixel points may be, for example, the absolute value of the difference between their features, such as the difference in gray scale or the difference in color). If the feature similarity is within a certain threshold (for example, the threshold may be set to 10 if gray-scale values of 0-255 are used), the pixel point also belongs to the ceiling; the similarity between the third row and the second row in that column is then calculated, then between the fourth row and the third row, and so on, until the feature similarity exceeds the threshold, at which point the pixel position is an edge pixel of the ceiling.
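A minimal sketch of this column-wise similarity scan, assuming a gray-scale panorama stored as a NumPy array; the function name and the default threshold of 10 are illustrative only.

import numpy as np

def ceiling_edge_rows(gray_pano, threshold=10):
    """For every column, walk downward from the first row and stop when the
    absolute gray-level difference between consecutive rows exceeds the
    threshold; the row reached is taken as the ceiling edge in that column."""
    height, width = gray_pano.shape
    edge_rows = np.zeros(width, dtype=int)
    for col in range(width):
        row = 0
        while row + 1 < height:
            diff = abs(int(gray_pano[row + 1, col]) - int(gray_pano[row, col]))
            if diff > threshold:
                break
            row += 1
        edge_rows[col] = row  # last row still considered part of the ceiling
    return edge_rows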
The edge pixels of the ceiling form the edge of the ceiling, and therefore, the plane outline of the ceiling can be formed by projecting the edge pixels to the three-dimensional space.
The projection of the pixel points into three-dimensional space will be described below.
Assume that the width of a panoramic image is W and the height is H, and assume that an obtained edge pixel point c of the ceiling has the coordinates (p_c, q_c) in the image coordinate system. Since the panoramic image is obtained by spherical projection, the point is expressed as (θ_c, φ_c) in the spherical coordinate system, where θ_c ∈ [−π, π] is the longitude and φ_c ∈ [−π/2, π/2] is the latitude.
The relationship between the spherical coordinate system and the image coordinate system can be obtained by the following formula 1:
θ_c = 2π · (p_c / W − 1/2),    φ_c = π · (1/2 − q_c / H)    (Formula 1)
Because the ceiling can be regarded as a plane, the pixel points at the edge of the ceiling have a uniform height h_c from the camera (here h_c can take any assumed value, such as 100), which can be referred to as the "assumed height of the ceiling from the camera" h_c. To avoid misunderstanding, it should be noted here that the assumed height h_c of the ceiling from the camera is not the ordinate in the image coordinate system but the height in the three-dimensional coordinate system (i.e., the value on the y-axis of the three-dimensional coordinate system).
The following formula 2 projects the coordinates (θ_c, φ_c) of a ceiling-edge pixel point c in the spherical coordinate system onto the three-dimensional point coordinates (x_c, y_c, z_c) on the three-dimensional plane:
x_c = h_c · cot(φ_c) · sin(θ_c),    y_c = h_c,    z_c = h_c · cot(φ_c) · cos(θ_c)    (Formula 2)
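The projection described by formulas 1 and 2 can be sketched as follows; the sign convention for the latitude, the default h_c = 100, and the function name are assumptions of this sketch.

import numpy as np

def project_ceiling_edge(edge_rows, image_width, image_height, h_c=100.0):
    """Project ceiling-edge pixels (p_c, q_c) to 3D points on the plane y = h_c.

    Formula 1 (assumed convention): theta = 2*pi*(p/W - 1/2), phi = pi*(1/2 - q/H),
    so the top image row maps to latitude +pi/2 (straight up).
    Formula 2: x = h_c*cot(phi)*sin(theta), y = h_c, z = h_c*cot(phi)*cos(theta).
    """
    assert len(edge_rows) == image_width
    cols = np.arange(image_width)
    theta = 2.0 * np.pi * (cols / image_width - 0.5)
    phi = np.pi * (0.5 - np.asarray(edge_rows) / image_height)
    radius = h_c / np.tan(phi)          # horizontal distance from the camera axis
    x = radius * np.sin(theta)
    y = np.full(image_width, h_c)
    z = radius * np.cos(theta)
    return np.stack([x, y, z], axis=1)  # one 3D contour point per image column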
In this document, the term "image coordinate system" refers to a coordinate system where image pixels are located, and is mainly used to describe the locations of the pixels in the image. Therefore, the panoramic image coordinate system refers to a coordinate system where the pixel points of the panoramic image are located, and is mainly used for describing the positions where the pixel points are located in the panoramic image.
Note that the above gives only one example of generating a plane contour in a three-dimensional space of a panoramic image based on the similarity of ceiling feature points on the panoramic image, and the method that can be used by the present invention is not limited to this example.
Since the ceiling can be regarded as a plane, and since the panoramic camera is generally supported by a tripod, the height of the panoramic camera is generally fixed when the house is photographed, and therefore, it can be regarded that each pixel point of the ceiling edge obtained from the processed panoramic image has a uniform height from the camera, which can be referred to as "the height of the camera from the ceiling".
Here, since the panoramic camera is generally supported by a tripod and has a fixed height, it can be considered that the height of the camera from the ceiling and the height of the camera from the floor are fixed.
For the plane contour in three-dimensional space obtained in this step (the ceiling plane contour), a height value can be assumed for each three-dimensional point on the contour; for example, the height of the camera from the ceiling is assumed to be h_c (which may be referred to as the "assumed height of the camera from the ceiling" h_c), and the assumed height may be an arbitrary value, such as 100 (the actual height of the real camera from the ceiling may be found by subsequent processing, and the height acquired subsequently may be referred to as the "extracted height of the camera from the ceiling"). To avoid confusion, the assumed height h_c of the camera from the ceiling will hereinafter be referred to as the "assumed height of the camera from the ceiling" h_c.
In the above embodiments, the plane contour of the image can be generated automatically based on the panoramic image, without manual work and without the use of expensive 3D scanning equipment.
Optionally, the sub-step 2052 includes: determining the edge pixel points among the pixel points of which the contour features belong to the specific category on the at least two panoramic images of each three-dimensional sub-object based on the feature similarity between the pixel points on the at least two panoramic images of each three-dimensional sub-object; the feature similarity of the two pixels is an absolute value of a difference between features of the two pixels, and the feature of each pixel comprises gray scale and color.
Optionally, referring to fig. 6, the step 205 may include:
Sub-step 2053, obtaining the three-dimensional point coordinates of the matching feature points on each panoramic image, using the geometric relationship of at least two panoramic images taken for each three-dimensional sub-object of said at least one three-dimensional object to be processed.
Sub-step 2054, for each of at least two panoramic images of said each three-dimensional sub-object, extracting a planar contour of said panoramic image in said three-dimensional space by means of a deep learning model for extracting an image contour.
In the embodiment of the invention, the at least two panoramic images of each three-dimensional sub-object are input into the trained deep learning model to obtain the category of the contour feature corresponding to each pixel point in the at least two panoramic images of each three-dimensional sub-object; edge pixel points at the edge are extracted, from the pixel points whose contour features belong to a specific category in each of the at least two panoramic images of each three-dimensional sub-object, as specific-category edge pixel points; it is assumed that all the specific-category edge pixel points on each of the at least two panoramic images of each three-dimensional sub-object have the same height h_c, which serves as the assumed contour height of the contour of the specific category; the specific-category edge pixel points on each panoramic image are projected onto a three-dimensional plane to obtain the specific-category three-dimensional points corresponding to each panoramic image; and the plane contour of each panoramic image in the three-dimensional space is then formed based on the specific-category three-dimensional points corresponding to each panoramic image.
Optionally, the deep learning model is obtained by:
step S1, generating a contour feature training data set of the three-dimensional object of the type of interest by manually labeling contour features of the three-dimensional object of the type of interest on a plurality of panoramic images as training images;
step S2, training the deep learning model by using the contour feature training data set of the three-dimensional object of the type of interest, thereby obtaining a trained deep learning model, wherein an output of the deep learning model includes a category of contour features of the three-dimensional object of the type of interest.
In the embodiment of the present invention, the indoor house is taken as an example, and the specific content of the step will be described in detail.
For example, for most scenes, the ceiling of a room is a plane, which can be used to represent a plan view of the room, and therefore, in the present invention, a plane contour of a panoramic image is obtained by extracting a ceiling contour through a deep learning model as a semantic segmentation model.
Here, semantic segmentation refers to classifying each pixel point in an image into categories. Therefore, the semantic segmentation model of the invention can be regarded as a deep learning model for classifying pixel points on an image.
Those skilled in the art will appreciate that machine learning can be divided into shallow learning and deep learning. Shallow learning generally uses few layers of hidden nodes, while deep learning generally uses many; for example, a deep learning model typically has 5, 6, or even 10 or more layers of hidden nodes.
In the semantic segmentation model, the classification of the pixel points is usually defined in advance. For example, for an indoor house scene, the pixel points may be generally defined as a ceiling, a floor, a wall, a door, a cabinet, a sofa, and so on. For outdoor scenes, for example, the class of pixel points may be defined as, for example, sky, road, trees, buildings, and so on.
Most of the traditional semantic segmentation technology adopts a classifier and graph model method. Common conventional classifiers include Support Vector Machines (SVMs), Random Forest (Random Forest), and other classification algorithms. The input of the classifier is usually artificially designed local features, and the commonly used features are RGB, gray scale, SIFT and the like. And the classifier judges the category of each pixel point in the image one by one. Commonly used graph modeling techniques include Markov random fields (Markov random fields), Conditional random fields (Conditional random fields), which act to enhance the consistency of the classes of neighboring pixels.
With the application of deep learning techniques in semantic segmentation, deep learning methods have greatly surpassed traditional semantic segmentation techniques.
Common deep learning models for semantic segmentation are mainly based on the CNN (convolutional neural network) framework. Since semantic segmentation requires outputting the category of every pixel (if the size of the input image is H × W, the output is also H × W), an upsampling method needs to be introduced on top of the conventional CNN to increase the resolution of the final output (the simplest upsampling method is nearest-neighbor sampling). Depending on the upsampling approach, common semantic segmentation models include DeepLab, UPerNet, PSPNet, and the like.
According to this technique, a large number of images shot by an ordinary camera are collected, and each pixel point is manually annotated with a semantic label; for example, outdoor scenes are labeled as sky, road surface, trees, buildings, and so on. When the deep network is trained, the samples are fed into the deep semantic segmentation model, an estimated probability matrix is output, and a cross-entropy loss objective function is adopted to reduce the error between the estimated values and the ground-truth labels until the error no longer changes, at which point model training is finished.
And inputting the input image to be processed into the trained deep learning model to obtain an output probability matrix, and calculating the dimension corresponding to the maximum probability output value at each position to serve as the class value corresponding to the pixel. For example, the size of the input image is H × W, and the size of the probability matrix output by the model is H × W × C, where C represents the number of classes. Each pixel point in the image corresponds to a C-dimensional probability vector (the sum of the probability vectors is 1), and the position of the maximum value is the category label corresponding to the pixel point.
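As a small illustration of the class-assignment step just described (the segmentation network itself is omitted), assuming the model output is an H × W × C probability matrix; the function name is illustrative.

import numpy as np

def labels_from_probabilities(prob):
    """prob: H x W x C matrix in which each pixel's C-dimensional probability
    vector sums to 1. Returns an H x W map whose entries are the index of the
    most probable class (e.g. ceiling / floor / wall) for each pixel."""
    assert prob.ndim == 3
    return np.argmax(prob, axis=2)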
In the invention, the semantic segmentation of the panoramic image is realized by labeling the panoramic data, which is different from the traditional method for performing semantic segmentation by using a common image.
Specifically, in the present invention, for example, training data including a ceiling, a wall, and a floor may be generated by a method of manually labeling a boundary line between the ceiling and the wall and a boundary line between the floor and the wall on a panoramic image of an indoor house (since the boundary line between the ceiling, the boundary line between the wall, and the floor are automatically generated in the subsequent model generation process, it is not necessary to manually label these pieces of information here).
Then, the deep learning model whose output includes three categories of ceiling, floor, and wall is trained by using the training data. That is, the classification corresponding to each pixel point in the panoramic image, i.e., one of the three classifications of the ceiling, floor, and wall surface, can be output from the trained deep learning model for each panoramic image.
Next, those pixels that are at the edge (which may be referred to as "ceiling edge pixels") among the pixels whose category belongs to "ceiling" are extracted.
Assuming that the pixel points at the edge of the ceiling have the same height information (because the pixel points belong to the ceiling), then, projecting the pixel points onto a three-dimensional plane to obtain corresponding three-dimensional points, and forming a plane outline of the ceiling in a three-dimensional space based on the three-dimensional points.
Specifically, in this step, for example, the method for projecting the pixel points (i.e., the pixel points on the edge of the ceiling) onto the three-dimensional plane may refer to the description about the projection of the pixel points onto the three-dimensional space.
Note that the above gives only one example of generating a plane contour in a three-dimensional space of the panoramic image by the deep learning model, and the method that can be used by the present invention is not limited to this example.
Optionally, referring to fig. 7, the sub-step 2054 includes:
and a substep 20541, inputting the at least two panoramic images of each three-dimensional sub-object into the trained deep learning model, and obtaining a category of the contour feature corresponding to each pixel point in the at least two panoramic images of each three-dimensional sub-object.
And a substep 20542 of extracting edge pixel points at the edge from the pixel points of which the types of the contour features belong to the specific types from each of the at least two panoramic images of each three-dimensional sub-object as specific type edge pixel points.
Sub-step 20543, assuming that all the specific-category edge pixel points on each of said at least two panoramic images of each three-dimensional sub-object have the same height h_c, which serves as the assumed contour height of the contour of the specific category; projecting the specific-category edge pixel points on each panoramic image onto a three-dimensional plane to obtain the specific-category three-dimensional points corresponding to each panoramic image; and then forming the plane contour of each panoramic image in the three-dimensional space based on the specific-category three-dimensional points corresponding to each panoramic image, wherein the specific category includes the top of a three-dimensional object in the panoramic image.
Optionally, with reference to fig. 8, the sub-step 2051 or 2053 comprises:
sub-step 20501, performing feature point matching between at least two panoramic images of each three-dimensional sub-object by using the geometric relationship between at least two panoramic images shot for each three-dimensional sub-object of the at least one three-dimensional object to be processed, and recording mutually matched feature points in at least two panoramic images of each three-dimensional sub-object as matching feature points.
In the embodiment of the present invention, in the image processing technology, the image feature point refers to a point where the image gray value changes drastically or a point where the curvature is large on the edge of the image (i.e., the intersection of two edges). The image feature points can reflect the essential features of the image and can identify the target object in the image.
How to efficiently and accurately match the same object in two images from different perspectives is the first step in many computer vision applications. Although the image exists in the form of a gray matrix in the computer, the same object in the two images cannot be accurately found by using the gray of the image. This is because the gray scale is affected by the light, and when the image viewing angle changes, the gray scale value of the same object will also change. Therefore, it is desirable to find a feature that can remain unchanged when the camera moves and rotates (the angle of view changes), and use the unchanged feature to find the same object in images from different angles of view.
Therefore, in order to perform image matching well, it is necessary to select representative regions in the image, for example: corners, edges, and some blocks in the image, among which corner points have the highest recognizability. In many computer vision processes, corner points are usually extracted as feature points to match images; examples of usable methods include SfM (Structure from Motion), SLAM (Simultaneous Localization and Mapping), and the like.
However, simple corner points do not fully meet the requirements; for example, the camera may detect a corner point from far away but not from close up, or a corner point may change when the camera rotates. For this reason, researchers in computer vision have designed many more stable feature points that do not change with camera movement, rotation, or illumination; examples of usable methods include SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), and the like.
The feature points of an image are composed of two parts: a Keypoint (Keypoint) and a Descriptor (Descriptor). The key points refer to the positions of the feature points in the image, and some feature points also have direction and scale information; a descriptor is typically a vector that describes the information of the pixels around a keypoint. In general, in matching, two feature points can be considered as the same feature point as long as their descriptors are close to each other in the vector space.
Matching of feature points typically requires the following three steps: 1) extracting key points in the image; 2) calculating descriptors of the feature points according to the obtained positions of the key points; 3) and matching according to the descriptors of the feature points.
Alternatively, the related processing of feature point matching in this step may be implemented using, for example, the open source computer vision library OpenCV. For brevity and without obscuring the subject matter of the present invention, further details of the processing of this section are not provided herein.
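A hedged sketch of the three matching steps using OpenCV as mentioned above; SIFT availability depends on the OpenCV build, and the ratio test used to filter matches is common practice added here for illustration, not something prescribed by the text.

import cv2

def match_keypoints(img1, img2, ratio=0.75):
    """1) detect keypoints, 2) compute SIFT descriptors, 3) match descriptors;
    returns the matched keypoint coordinate pairs."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = []
    for knn in matcher.knnMatch(des1, des2, k=2):
        if len(knn) < 2:
            continue
        m, n = knn
        if m.distance < ratio * n.distance:      # Lowe's ratio test
            pairs.append((kp1[m.queryIdx].pt, kp2[m.trainIdx].pt))
    return pairs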
After feature point matching between these panoramic images is performed, feature points (also referred to as "matching feature points") that match each other in these panoramic images are recorded, and recording of the matching feature points may be performed, for example, as follows.
For example, if a feature point a on the image 1 matches a feature point b on the image 2, the feature point b on the image 2 matches a feature point c on the image 3, and the feature point c on the image 3 matches a feature point d on the image 4, a piece of feature point matching data (a, b, c, d) (also referred to as a "feature point tracking trajectory") may be recorded. Thereby, the input panoramic images are recorded with respect to the mutually matched feature points.
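The recording of matching feature points into trajectories such as (a, b, c, d) can be sketched by chaining pairwise matches between consecutive images; the data layout below is an assumption for illustration.

def build_tracks(pairwise_matches):
    """pairwise_matches maps (image_i, image_j) -> list of (point_in_i, point_in_j)
    for consecutive image pairs. Chains the matches into feature-point tracking
    trajectories such as (a, b, c, d) spanning images 1..4."""
    tracks = []
    open_tracks = {}                     # (image index, point) -> track index
    for (img_a, img_b), matches in sorted(pairwise_matches.items()):
        new_open = {}
        for pa, pb in matches:
            key = (img_a, pa)
            if key in open_tracks:       # extend an existing trajectory
                idx = open_tracks[key]
                tracks[idx].append(pb)
            else:                        # start a new trajectory
                tracks.append([pa, pb])
                idx = len(tracks) - 1
            new_open[(img_b, pb)] = idx
        open_tracks.update(new_open)
    return tracks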
Sub-step 20502, for each panoramic image of the at least two panoramic images of each three-dimensional sub-object, obtaining the three-dimensional point coordinates of the matching feature points on each panoramic image by reducing the reprojection error of the matching feature points on each panoramic image.
Image re-projection refers to generating a new image by projecting a reference image of an arbitrary viewpoint, that is, image re-projection can change the line-of-sight direction of an already generated image.
Specifically, in the present invention, image reprojection refers to projecting the three-dimensional point coordinates corresponding to a feature point p1 on image 1 into another image 2 using the current camera parameters; the position difference between the resulting projected point q2 on image 2 and the feature point p2 in image 2 that matches p1 constitutes the reprojection error. Here, the matching feature point p2 in image 2 is the actual position, while the projected point q2 obtained by reprojection is the estimated position; the camera position is solved by minimizing the position difference between the projected point q2 and the matching feature point p2, that is, by making the projected point q2 and the matching feature point p2 coincide as much as possible.
The variables contained in the objective function for optimizing (reducing) the re-projection error comprise the three-dimensional coordinates of the camera position and the feature points, and the three-dimensional coordinates of the camera position and the feature points are obtained in the process of gradually reducing (optimizing) the re-projection error.
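A simplified sketch of the reprojection error for a single correspondence, under the assumed conventions that the panorama follows formula 1 and that a camera pose {R, t} maps camera coordinates to world coordinates; neither convention, nor the function names, is fixed by the text.

import numpy as np

def reproject(point_3d, R, t, width, height):
    """Project a world-space 3D point into a panorama whose pose {R, t} maps
    camera coordinates to world coordinates (assumed convention)."""
    X = R.T @ (point_3d - t)                   # world -> camera coordinates
    theta = np.arctan2(X[0], X[2])             # longitude
    phi = np.arcsin(X[1] / np.linalg.norm(X))  # latitude
    p = (theta / (2 * np.pi) + 0.5) * width    # inverse of formula 1 (assumed signs)
    q = (0.5 - phi / np.pi) * height
    return np.array([p, q])

def reprojection_error(point_3d, matched_pixel, R, t, width, height):
    """Distance between the estimated projection q2 and the matched feature point p2."""
    return np.linalg.norm(reproject(point_3d, R, t, width, height) - matched_pixel)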
Optionally, in the present invention, the reprojection error may be reduced by combining a gradient descent algorithm and a Delaunay Triangulation algorithm (Delaunay Triangulation), so as to achieve the purpose of optimization.
When the gradient descent algorithm is used, the three-dimensional point coordinates of the matched characteristic points are taken as a constant, and the position of the camera is taken as a variable, and conversely, when the Delaunay triangle algorithm is used, the three-dimensional point coordinates of the matched characteristic points are taken as a variable, and the position of the camera is taken as a constant.
Alternatively, in the present invention, a progressive (incremental) solution may be used to improve the accuracy of the solved camera positions and three-dimensional point coordinates, i.e., during the solution process one image is added at a time, and its camera position and the three-dimensional point coordinates of its matching feature points are solved. Methods for progressive solution include, for example, iSfM (incremental SfM).
Additionally, and further optionally, bundle adjustment may be employed to further reduce the reprojection error. Specifically, after the process of reducing the reprojection error to obtain the camera position and the three-dimensional point coordinates has been performed for each panoramic image, bundle adjustment can finally be used to optimize all camera positions and all three-dimensional point coordinates simultaneously. Alternatively, during the process of reducing the reprojection error to obtain the camera position and the three-dimensional point coordinates, a bundle adjustment step may be added after the camera position and the three-dimensional point coordinates have been acquired for any panoramic image, so as to optimize the camera positions and three-dimensional point coordinates acquired so far.
Here, bundle adjustment refers to a method of optimizing all camera positions and all three-dimensional point coordinates simultaneously, which differs from the progressive solution, in which only the current camera position and the three-dimensional point coordinates on the current image are optimized.
In addition, in addition to the progressive solution described above, a global solution method may be employed.
And 206, normalizing the plane contour of the at least one panoramic image of each three-dimensional sub-object in the three-dimensional space based on the position of the panoramic camera to obtain the normalized plane contour of each three-dimensional sub-object.
In the embodiment of the present invention, the purpose of the scale normalization is to unify the scale of the position of the panoramic camera and the scale of the plane contour of each panoramic image, which may be a real scale such as meters. That is, the coordinates of the position of the panoramic camera and the size of the plane profile are under a uniform unit of measure. For example, the position of the panoramic camera includes the coordinates and orientation of the camera.
Optionally, referring to fig. 9, the step 206 includes:
Sub-step 2061, sorting the height values among all the obtained three-dimensional point coordinates on the at least one panoramic image of each three-dimensional sub-object from small to large, and taking the median or mean of the height values ranked at the front as the extracted contour height h_c' of the contour of the specific category; and generating, from the plane contours in three-dimensional space of the at least two panoramic images of each three-dimensional sub-object, normalized plane contours in three-dimensional space of the at least two panoramic images of each three-dimensional sub-object, using the assumed contour height h_c of the contour of the specific category and the extracted contour height h_c' of the contour of the specific category.
Sub-step 2062, in the case where the position of the panoramic camera includes a predetermined camera determination height h_c', generating, from the plane contours in three-dimensional space of the at least two panoramic images of each three-dimensional sub-object, normalized plane contours in three-dimensional space of the at least two panoramic images of each three-dimensional sub-object, using the assumed camera height h_c and the camera determination height h_c', wherein the assumed contour height h_c of the contour of the specific category is an arbitrarily assumed height (a code sketch of sub-steps 2061 and 2062 is given after sub-step 2064 below).
A sub-step 2063 of determining, for at least two panoramic images taken for said at least one three-dimensional object to be processed, one by one, whether a plurality of panoramic images belong to the same three-dimensional object, by: and if the two panoramic images have more than a specific proportion of matched characteristic points, determining that the two panoramic images belong to the same three-dimensional object.
And a sub-step 2064 of, if it is determined that the plurality of panoramic images belong to the same three-dimensional object, taking a union of all plane contours of the same three-dimensional object as a plane contour of the three-dimensional object in a three-dimensional space for each plane contour of the same three-dimensional object obtained from the plurality of panoramic images.
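A sketch of the scale normalization of sub-steps 2061 and 2062, assuming the contour was built with the assumed height h_c; the fraction of sorted height values kept when estimating h_c' is a free parameter chosen here for illustration.

import numpy as np

def extracted_contour_height(point_heights, top_fraction=0.3):
    """Sort the height values of the sparse 3D points from small to large and
    take the median of the values ranked at the front as the extracted
    contour height h_c'. (Keeping 30% of the values is an arbitrary choice.)"""
    ordered = np.sort(np.asarray(point_heights, dtype=float))
    kept = ordered[: max(1, int(len(ordered) * top_fraction))]
    return float(np.median(kept))

def normalize_contour(contour_points, h_c_assumed, h_c_extracted):
    """Rescale a plane contour generated with the assumed height h_c so that its
    scale matches the scale in which the camera positions (and h_c') are expressed."""
    return np.asarray(contour_points) * (h_c_extracted / h_c_assumed)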
And step 207, splicing the normalized plane contour of each three-dimensional sub-object based on the position of the panoramic camera to obtain a multi-object plane contour of the at least one three-dimensional object.
In the embodiment of the present invention, a corresponding planar contour in a three-dimensional space is obtained from a panoramic image, which may be referred to as a "single-object planar contour".
For example, taking each room as a three-dimensional object of the type of interest as an example, as described above, since a plurality of panoramic images of the same room may be included in a captured panoramic image, in this case, the same room will correspond to a plurality of plane contours in a three-dimensional space, and therefore, in a multi-room plane contour obtained by a subsequent multi-room stitching process, a phenomenon may occur in which plane contours obtained from different panoramic images of one or more rooms do not coincide, resulting in overlapping or confusing stitched contours. Therefore, it is considered to perform fusion of the contour of the same room (object) first (which may be referred to as "single-object fusion") to avoid such a phenomenon. Moreover, single object fusion (e.g., fusion of the same room contour) can also eliminate the single object contour incomplete phenomenon.
For the above-mentioned situation that single object fusion is required, the following exemplary method will be given below by taking a room as a three-dimensional sub-object as an example.
First, it is determined whether two panoramic images belong to the same room.
Here, a feature point matching-based approach may be adopted, and if there are more than a certain proportion (a certain proportion, for example, 50%, etc.) of matching feature points between two panoramic images, it may be determined that the two panoramic images belong to the same room.
Then, if a plurality of panoramic images belong to the same room, that is, for plane contours of the same room obtained from different panoramic images, a union of these plane contours is taken as a single room plane contour in a three-dimensional space (one room contour, avoiding the case of a plurality of single image contours of one room), thereby realizing fusion of the same room contour in the three-dimensional space.
The proportion of the matching feature points can be set in the following way: suppose that image 1 has n1 feature points, image 2 has n2 feature points, and the number of matching feature points between the two images is n; then the proportion of matching feature points may be n/min(n1, n2).
Alternatively, it may be set that if the proportion of matching feature points between the two panoramic images is greater than, for example, 50%, the two images are regarded as the same room.
Here, the setting of the proportion of the matching feature points and the actual size of the proportion may be tested or determined empirically according to actual circumstances, and the present invention is not limited thereto.
As described above, in the present invention, for the above-mentioned at least two panoramic images, it can be determined whether a plurality of panoramic images belong to the same room by means of single-room fusion as follows: if there are more than a certain proportion of matching feature points between two panoramic images, it can be determined that the two panoramic images belong to the same room.
If it is determined that the plurality of panoramic images belong to the same room, for plane profiles of the same room obtained from the plurality of panoramic images, a union of the plane profiles is taken as a plane profile of the room.
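For ease of understanding, a minimal Python sketch of this same-room decision is given below; the function name, the argument layout and the 50% default threshold are assumptions made for illustration only.

```python
def same_room(num_matches, n1, n2, ratio_threshold=0.5):
    """Decide whether two panoramic images belong to the same room.

    num_matches:     number of mutually matched feature points between the two images
    n1, n2:          number of feature points detected in image 1 and image 2
    ratio_threshold: assumed proportion (e.g. 0.5) above which the two images
                     are regarded as showing the same room
    """
    if min(n1, n2) == 0:
        return False
    ratio = num_matches / min(n1, n2)   # proportion of matching feature points: n / min(n1, n2)
    return ratio > ratio_threshold


# Example: 320 matches between images with 500 and 600 feature points -> 0.64 > 0.5 -> same room
print(same_room(320, 500, 600))
```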
In addition, after the contours of the same room are fused, the resulting contour edges may still contain noise; for example, edge lines may not be straight, and adjacent edge lines may not be perpendicular to each other. Therefore, the invention may further perform right-angle polygon fitting on the contour of each room to obtain a more reasonable room plane contour.
Through the optimization processing specially performed for the single object, such as single object fusion and/or right-angle polygon fitting, a more accurate single object plane contour can be obtained, the subsequent generation of 2D and 3D models is facilitated, and the resolution and the accuracy of the models are improved.
Note that this step is not a necessary step for two-dimensional or three-dimensional modeling of three-dimensional objects, but is a preferred way of processing that can improve the accuracy of the model.
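One possible form of the right-angle polygon fitting mentioned above is sketched below: each edge of the room contour is snapped to the nearest axis direction and consecutive edges are re-intersected. This is only an illustrative choice of fitting method under the assumption that the contour is already roughly axis-aligned; the invention does not prescribe a particular algorithm, and all names here are hypothetical.

```python
import numpy as np

def fit_right_angle_polygon(vertices):
    """Snap an ordered 2D room contour (x, z) to axis-aligned edges.

    Each edge is classified as 'h' (constant z) or 'v' (constant x) according to
    its dominant direction; consecutive edges are then intersected so that the
    corners become right angles.
    """
    v = np.asarray(vertices, dtype=float)
    n = len(v)
    edges = []
    for k in range(n):
        p, q = v[k], v[(k + 1) % n]
        if abs(q[0] - p[0]) >= abs(q[1] - p[1]):
            edges.append(('h', (p[1] + q[1]) / 2.0))   # dominant x extent: keep the mean z
        else:
            edges.append(('v', (p[0] + q[0]) / 2.0))   # dominant z extent: keep the mean x
    fitted = []
    for k in range(n):
        a_kind, a_val = edges[k - 1]   # edge ending at vertex k
        b_kind, b_val = edges[k]       # edge starting at vertex k
        if a_kind == b_kind:
            x, z = v[k]                # collinear edges: project the vertex onto the common line
            fitted.append((x, a_val) if a_kind == 'h' else (a_val, z))
        else:
            x = a_val if a_kind == 'v' else b_val
            z = a_val if a_kind == 'h' else b_val
            fitted.append((x, z))
    return fitted
```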
Optionally, step 207 includes:
assuming that the plane contours in the three-dimensional space of all the panoramic images are N in total, the p-th three-dimensional point of the n-th plane contour is represented as P_p^n;

the position of the panoramic camera when the panoramic image corresponding to the n-th plane contour was captured is expressed as {Rn, tn}, where Rn is a rotation matrix serving as the rotation parameters of the position of the panoramic camera and tn is a translation vector serving as the translation parameters of the position of the panoramic camera, N being an integer greater than 1 and n being an integer greater than or equal to 1;

the position of the panoramic camera when the panoramic image corresponding to the i-th plane contour was captured is selected as a reference coordinate system, and the three-dimensional points of the other plane contours can be unified under this reference coordinate system through the following formula (3):

$$P_p^{n \to i} = R_i^{-1}\left(R_n \cdot P_p^n + t_n - t_i\right) \qquad (3)$$
converting all three-dimensional points of the scale-normalized plane contour except the ith plane contour through the formula to unify the three-dimensional points of all the plane contours to the same coordinate system, thereby splicing the plane contours of the three-dimensional sub-objects into the multi-object plane contour.
In the embodiment of the present invention, a room is taken as an example of a three-dimensional sub-object, and the specific operation thereof will be described in detail below.
Assume there are N room contours in total, and denote the p-th three-dimensional point of the n-th room contour as P_p^n. The camera position of this room is denoted as {Rn, tn}, where Rn is a rotation matrix representing the rotation parameters of the camera position and tn is a translation vector representing the translation parameters of the camera position.
At this time, the camera position of the first room can be selected as the reference coordinate system, because the currently obtained room outlines are the outline positions in the respective coordinate systems, and need to be unified into one coordinate system, so that one reference coordinate system needs to be selected. Specifically, the coordinate system in which the camera position of the first room is located may be selected as the reference coordinate system. Then, the contour three-dimensional points of other rooms can be unified into the coordinate system by the above equation 3.
All dimension-normalized contour three-dimensional points (for example, three-dimensional points on a ceiling edge, a wall surface edge and a floor edge) except the first room are converted through a formula 3, so that the three-dimensional points of all rooms can be unified to the same coordinate system (namely, a reference coordinate system of the first room), and therefore splicing of the multi-room plane contour can be achieved.
Here, the coordinate system of any one room may be selected as the reference coordinate system (the reference index i in formula (3) may designate any room; the example above simply takes i = 1, i.e. the first room), and the invention is not limited in this respect, because what is required is a relative positional relationship rather than an absolute one. In other words, when the camera position at the time the panoramic image corresponding to the i-th plane contour was captured is selected as the reference coordinate system, the three-dimensional points of the other plane contours can be unified under that reference coordinate system by formula (3).
Converting all three-dimensional points of the dimension-normalized plane contour except the ith plane contour through the formula to unify the three-dimensional points of all the plane contours to the same coordinate system, thereby splicing the plane contours of all the three-dimensional sub-objects into a multi-object plane contour.
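For ease of understanding, a minimal sketch of this coordinate unification (formula (3)) is given below. It assumes that each contour's three-dimensional points are expressed in the coordinate system of the camera that captured the corresponding panoramic image and that {Rn, tn} is the camera-to-world pose of that camera; the function name and data layout are illustrative assumptions.

```python
import numpy as np

def unify_contours(contours, rotations, translations, ref=0):
    """Transform all single-room contours into the coordinate system of one reference camera.

    contours:     list of (P, 3) arrays of 3D contour points, each in its own camera frame
    rotations:    list of 3x3 rotation matrices Rn (camera-to-world)
    translations: list of 3-vectors tn (camera-to-world)
    ref:          index i of the camera whose coordinate system is used as reference

    Applies P^(n->i) = Ri^-1 (Rn P + tn - ti) to every point of every contour.
    """
    R_i, t_i = rotations[ref], translations[ref]
    unified = []
    for pts, R_n, t_n in zip(contours, rotations, translations):
        world = pts @ R_n.T + t_n            # Rn * P + tn, applied row-wise
        unified.append((world - t_i) @ R_i)  # Ri^-1 * (.), using Ri^-1 = Ri^T for a rotation
    return unified
```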
Here, the multi-object planar contour obtained after the multi-object stitching of this step may be output as a 2D model (e.g., a 2D house graph) of the at least one (including a plurality of) three-dimensional sub-objects.
Step 208, calculating the distance between the adjacent edges of two single-object outlines in the multi-object outline, and if the distance is not zero and is less than a specific threshold, shifting the two single-object outlines so that the distance between the adjacent edges becomes zero.
In the embodiment of the invention, after the multi-object contour is spliced, the multi-object contour can be further corrected to obtain a more accurate multi-object contour.
Taking a room as an example of a three-dimensional sub-object of the type of interest, due to the limited precision of single-image plane contour extraction and of camera position extraction, the contours of adjacent three-dimensional sub-objects (such as the rooms of a set of indoor houses) may, after stitching, exhibit an overlapping region or a gap; the contours can therefore be further corrected for these two cases.
The correction method may be, for example, as follows. First, the distance between adjacent edges of two contours (which should theoretically coincide, that is, should theoretically be one coinciding edge of the multi-room contour) is calculated, and if the distance is smaller than a certain threshold, it can be determined that the two edges are in an adjacent relationship, and at this time, the contour can be shifted accordingly so that the distance between the adjacent edges becomes 0 (i.e., the adjacent edges become coinciding edges), thereby correcting the overlap or gap between the adjacent edges.
For the above threshold, for example, an average length L of the adjacent edges that should be an overlapped edge may be calculated, and a certain proportion of the average length may be used as the threshold, for example, 0.2 × L may be used as the distance threshold.
Note that the above is merely an exemplary threshold value given for ease of understanding, and in fact, the present invention does not impose additional limitations on the threshold value, which can be determined experimentally and empirically.
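A minimal sketch of this correction rule is given below for a pair of parallel, axis-aligned edges, each represented by its constant coordinate and its length; this representation and the 0.2 × L default are assumptions made for the example.

```python
def correct_adjacent_edges(edge_a, edge_b, threshold_ratio=0.2):
    """Snap two nearly coincident parallel wall edges together.

    Each edge is given as (coord, length): the constant coordinate of the
    axis-aligned edge and its length.  If the distance between the two edges
    is non-zero but below threshold_ratio * mean length, both are shifted to
    their common mid position so that the distance becomes zero.
    Returns the (possibly corrected) constant coordinates of the two edges.
    """
    coord_a, len_a = edge_a
    coord_b, len_b = edge_b
    distance = abs(coord_a - coord_b)
    mean_length = (len_a + len_b) / 2.0
    if 0.0 < distance < threshold_ratio * mean_length:
        middle = (coord_a + coord_b) / 2.0   # adjacent edges become coinciding edges
        return middle, middle
    return coord_a, coord_b
```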
Thus, the multi-room contour after the above single-room contour fusion and multi-room contour modification can be used as a complete and accurate 2D floor plan (2D model of the house) of the set of houses.
And step 209, converting the multi-object plane contour in the three-dimensional space obtained by splicing into a multi-object 3D model.
Optionally, the bottom plane contour is obtained in the following way: in all three-dimensional point coordinates on the top plane contour of each three-dimensional object, the height value, i.e. the extracted contour height hc' of the camera from the contour of the specific category at the top of the corresponding three-dimensional object, is replaced by the extraction height hf' of the camera from the bottom of the corresponding three-dimensional object, while the length and width values in all three-dimensional point coordinates on the top plane contour of each three-dimensional object are kept unchanged, so that the bottom plane contour of each three-dimensional object is obtained correspondingly, wherein

the extracted contour height hc' of the camera from the contour of the specific category at the top of the corresponding three-dimensional object is obtained in the following way: the height values in all three-dimensional point coordinates on the at least one panoramic image of each three-dimensional sub-object, obtained when acquiring the position of the panoramic camera, are sorted from small to large, and the median or mean of the height values ranked at the top is taken as the extracted contour height hc' of the camera from the contour of the specific category at the top of the corresponding three-dimensional object, and

the extraction height hf' of the camera from the bottom of the corresponding three-dimensional object is obtained in the following way: the height values in all three-dimensional point coordinates on the at least one panoramic image of each three-dimensional sub-object, obtained when acquiring the position of the panoramic camera, are sorted from small to large, and the median or mean of the height values ranked at the bottom is taken as the extraction height hf' of the camera from the bottom of the corresponding three-dimensional object.
Optionally, referring to fig. 10, the step 209 includes:
substep 2091, performing three-dimensional point interpolation on the top plane contour in the multi-object plane contours obtained by splicing, and projecting all three-dimensional point coordinates on each top plane contour obtained to a corresponding panoramic image coordinate system to obtain a top texture;
Substep 2092, performing three-dimensional point interpolation on the bottom plane contour in the multi-object plane contours obtained by splicing, and projecting all three-dimensional point coordinates on each obtained bottom plane contour into a corresponding panoramic image coordinate system to obtain bottom textures;
substep 2093, connecting three-dimensional vertexes on the same plane position between the top outline and the bottom outline to form a plane outline of the supporting part, performing three-dimensional point interpolation inside the plane outline of the supporting part, and projecting all three-dimensional point coordinates of the obtained plane outline of each supporting part into a corresponding panoramic image coordinate system so as to obtain a supporting part texture;
substep 2094 generates a 3D texture model of the entire three-dimensional object based on the top texture, the bottom texture, and the support portion texture.
In the embodiment of the present invention, for convenience of understanding and description, the house modeling will be described as an example below.
For the multi-object plane contour (e.g., multi-room plane contour) obtained in the previous step, three-dimensional point interpolation is performed internally, and then all three-dimensional point coordinates are projected into the corresponding panoramic image so as to acquire the ceiling texture (color value).
Here, a method of interpolating three-dimensional points will be exemplified. For example, assume the ceiling contour of the obtained multi-room plane contour is a rectangle of length H and width W; the length and the width can each be divided into N intervals, so that a total of N × N interpolation points can be obtained. A vertex of the rectangle may then be selected as the origin (assume its three-dimensional point coordinates are (x, y, z)), and the N × N points can be written in turn as (x + H/N, y, z), (x + 2H/N, y, z), …, (x, y, z + W/N), (x, y, z + 2W/N), …, (x + H/N, y, z + W/N), …. After this three-dimensional point interpolation, dense three-dimensional point coordinates inside the contour are obtained.
It should be noted that a specific example of three-dimensional point interpolation is given above for the sake of understanding, and in fact, the three-dimensional point interpolation method applicable to the present invention may be many and is not limited to this example.
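As an illustration of the rectangular interpolation just described, the following sketch lays the N × N grid out on the horizontal x-z plane at a fixed height y; the function name and arguments are assumptions for the example.

```python
import numpy as np

def interpolate_rectangle(origin, length_h, width_w, n):
    """Generate an n x n grid of interpolation points inside a rectangular ceiling contour.

    origin:   (x, y, z) of one vertex of the rectangle, y being the fixed height
    length_h: extent of the rectangle along the x axis (length H)
    width_w:  extent of the rectangle along the z axis (width W)
    n:        number of intervals per side, giving n * n interpolation points
    """
    x0, y0, z0 = origin
    xs = x0 + np.arange(1, n + 1) * (length_h / n)
    zs = z0 + np.arange(1, n + 1) * (width_w / n)
    grid_x, grid_z = np.meshgrid(xs, zs, indexing="ij")
    points = np.stack([grid_x, np.full_like(grid_x, y0), grid_z], axis=-1)
    return points.reshape(-1, 3)     # dense 3D points inside the contour
```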
In addition, for example, a specific projection method may be as follows. Assume that the coordinates of an interpolated three-dimensional point are (x_i, y_i, z_i) and that its longitude and latitude projected on the panoramic image are (θ_i, φ_i); the projection can then be expressed by the following formula (4):

$$\theta_i = \arctan\!\left(\frac{x_i}{z_i}\right), \qquad \varphi_i = \arcsin\!\left(\frac{y_i}{\sqrt{x_i^2 + y_i^2 + z_i^2}}\right) \qquad (4)$$
after the latitude and longitude are obtained by the formula, the coordinate of the three-dimensional point on the panoramic image plane can be obtained according to the formula 1, and the color value of the point can be used as the texture of the three-dimensional point.
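A sketch of the projection-and-sampling step is given below. It follows the spherical projection written as formula (4) above (using arctan2 for numerical robustness) and assumes an equirectangular panorama stored as an H_img × W_img RGB array with a linear longitude/latitude-to-pixel mapping; the document's formula 1 for that last mapping is not reproduced here, so the mapping used below is an assumption.

```python
import numpy as np

def sample_texture(points, panorama):
    """Project interpolated 3D points onto a panoramic image and read their color values.

    points:   (P, 3) array of 3D points (x, y, z) in the camera coordinate system
    panorama: (H_img, W_img, 3) equirectangular panoramic image
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    lon = np.arctan2(x, z)                                   # longitude theta
    lat = np.arcsin(y / np.linalg.norm(points, axis=1))      # latitude phi
    h_img, w_img = panorama.shape[:2]
    # assumed linear longitude/latitude -> pixel mapping of an equirectangular panorama
    u = ((lon + np.pi) / (2 * np.pi) * (w_img - 1)).astype(int)
    v = ((lat + np.pi / 2) / np.pi * (h_img - 1)).astype(int)
    return panorama[v, u]                                    # color values used as texture
```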
For most scenes, the contour of the ceiling and the contour of the floor may be assumed to be parallel and the same. Therefore, the corrected ceiling plane profile of each room obtained above, together with the above-obtained extracted height hf' of the camera from the floor, can be used to generate three-dimensional points of the multi-room floor plane profile, also according to equation 2.
Here, the shape of the plane contour of the floor is assumed to be the same as that of the ceiling, i.e., the three-dimensional coordinates x and z of the horizontal plane are the same, and only the height, i.e., the y value in the vertical direction, differs (e.g., the plane contour of the ceiling is above the camera and the floor is below the camera, so the heights are different). Therefore, the y value (the extraction height hc' of the camera from the ceiling) in the three-dimensional point coordinates of the obtained ceiling contour may be replaced with the extraction height hf' of the camera from the floor.
Similarly to the three-dimensional point interpolation of the planar contour of the ceiling, for the planar contour of the floor, the three-dimensional point interpolation is internally performed and then projected into the corresponding panoramic image using equation 4 so as to obtain the texture of the floor.
Then, three-dimensional vertices at the same plane position between the ceiling profile and the floor profile are connected to form plane profiles of a plurality of wall surfaces, and similarly, three-dimensional point interpolation is performed on the interiors of the plane profiles, and then the three-dimensional point interpolation is projected into the corresponding panoramic image by using formula 4 so as to obtain the texture of the wall surface.
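The wall-contour construction can be sketched as follows, assuming the ceiling and floor corner points are stored in the same order so that vertices at the same plane position correspond by index; the names and data layout are illustrative only.

```python
def wall_contours(ceiling_pts, floor_pts):
    """Form one rectangular wall contour per ceiling edge.

    ceiling_pts, floor_pts: lists of 3D corner points in the same order, so that
    ceiling_pts[k] lies directly above floor_pts[k] (same x and z, different height).
    Each wall is returned as its four corner points, ready for three-dimensional
    point interpolation and projection just like the ceiling and floor contours.
    """
    walls = []
    n = len(ceiling_pts)
    for k in range(n):
        c0, c1 = ceiling_pts[k], ceiling_pts[(k + 1) % n]
        f0, f1 = floor_pts[k], floor_pts[(k + 1) % n]
        walls.append([c0, c1, f1, f0])   # connect vertices at the same plane position
    return walls
```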
Thus, a 3D texture model of the complete house may be generated.
The embodiment of the invention can effectively improve the resolution and the accuracy of the generated model.
Moreover, it should be noted that, for the sake of understanding and description, the method for modeling based on images of the present invention is described by taking house modeling as an example, and actually, the present invention should not be limited to the application scenario of house modeling, but can be applied to various scenarios for modeling based on images, such as the scenario of modeling vehicles to realize VR (virtual reality) driving, and the present invention actually provides an innovative comprehensive image processing scheme.
Fig. 11 is a block diagram of an image processing apparatus according to the present invention:
a first obtaining module 301, configured to obtain a video stream corresponding to at least one panoramic image captured for each three-dimensional sub-object in at least one three-dimensional object and a timestamp;
a second obtaining module 302, configured to obtain a position of each video frame in the video stream according to a geometric relationship between video frames in the video stream;
a third obtaining module 303, configured to take a position corresponding to the timestamp in the position of each video frame as a position of the panoramic camera when at least one panoramic image of each three-dimensional sub-object is taken, wherein each panoramic image is taken for one three-dimensional sub-object;
A processing module 304 configured to stitch planar contours of the at least one panoramic image of each three-dimensional sub-object in a three-dimensional space based on the position of the panoramic camera to obtain a multi-object planar contour of the at least one three-dimensional object.
Optionally, the first obtaining module 301 is further configured to:
shooting each three-dimensional sub-object in at least one three-dimensional object to be processed to obtain at least one panoramic image of each three-dimensional sub-object and a video stream corresponding to the panoramic image, wherein the panoramic image is shot in the process of shooting the video stream of each three-dimensional sub-object;
and recording the time stamp of at least one panoramic image for shooting each three-dimensional sub-object.
Optionally, the processing module 304 is further configured to:
for at least one panoramic image of each three-dimensional sub-object, extracting a plane contour of the at least one panoramic image of each three-dimensional sub-object in a three-dimensional space;
normalizing the plane contour of at least one panoramic image of each three-dimensional sub-object in a three-dimensional space based on the position of the panoramic camera to obtain the normalized plane contour of each three-dimensional sub-object;
And splicing the normalized plane contour of each three-dimensional sub-object based on the position of the panoramic camera to obtain a multi-object plane contour of the at least one three-dimensional object.
Optionally, the processing module 304 is further configured to:
acquiring three-dimensional point coordinates of matched feature points on each panoramic image by using the geometric relationship of at least two panoramic images shot for each three-dimensional sub-object in the at least one three-dimensional object to be processed;
generating a planar contour of the panoramic image in the three-dimensional space based on a contour surrounded by edge pixels among pixels whose contour features belong to a specific category on the panoramic image for each of at least two panoramic images of each three-dimensional sub-object,
wherein the particular categories include at least a top, a bottom, a support of a three-dimensional object in the panoramic image.
Optionally, the processing module 304 is further configured to:
determining the edge pixel points among the pixel points of which the contour features belong to the specific category on the at least two panoramic images of each three-dimensional sub-object based on the feature similarity between the pixel points on the at least two panoramic images of each three-dimensional sub-object;
The feature similarity of the two pixels is an absolute value of a difference between features of the two pixels, and the feature of each pixel comprises gray scale and color.
Optionally, the processing module 304 is further configured to:
acquiring three-dimensional point coordinates of matched feature points on each panoramic image by using the geometric relationship of at least two panoramic images shot for each three-dimensional sub-object in the at least one three-dimensional object to be processed;
for each of the at least two panoramic images of each three-dimensional sub-object, extracting a planar contour of the panoramic image in the three-dimensional space through a deep learning model for extracting an image contour.
Optionally, the deep learning model is trained by:
generating a contour feature training data set of a three-dimensional object of a type of interest by artificially labeling contour features of the three-dimensional object of the type of interest on a plurality of panoramic images serving as training images;
training the deep learning model using the contour feature training dataset of the three-dimensional object of the type of interest, thereby resulting in a trained deep learning model,
Wherein an output of the deep learning model contains a category of contour features of the three-dimensional object of the type of interest.
Optionally, the extracting, for each of the at least two panoramic images of each three-dimensional sub-object, a planar contour of the panoramic image in the three-dimensional space through a deep learning model for extracting an image contour includes:
inputting the at least two panoramic images of each three-dimensional sub-object into the trained deep learning model to obtain the category of the contour feature corresponding to each pixel point in the at least two panoramic images of each three-dimensional sub-object;
extracting edge pixel points at the edge from pixel points of which the types of the contour features belong to specific types from each panoramic image of at least two panoramic images of each three-dimensional sub-object as specific type edge pixel points;
assuming that all the specific-category edge pixel points on each panoramic image of the at least two panoramic images of each three-dimensional sub-object have the same height hc, which is taken as the assumed contour height of the contour of the specific category, projecting the specific-category edge pixel points on each panoramic image onto a three-dimensional plane to obtain the specific-category three-dimensional points corresponding to each panoramic image, and then forming the plane contour of each panoramic image in the three-dimensional space based on the specific-category three-dimensional points corresponding to each panoramic image;
Wherein the particular category includes a top of a three-dimensional object in the panoramic image.
Optionally, the processing module 304 is further configured to:
performing feature point matching between at least two panoramic images of each three-dimensional sub-object by using the geometric relationship of at least two panoramic images shot for each three-dimensional sub-object of the at least one three-dimensional object to be processed, and recording mutually matched feature points in the at least two panoramic images of each three-dimensional sub-object as matching feature points; and
for each panoramic image of the at least two panoramic images of each three-dimensional sub-object, obtaining the three-dimensional point coordinates of the matching feature points on each panoramic image by reducing the reprojection error of the matching feature points on each panoramic image.
Optionally, the processing module 304 is further configured to:
sorting, from small to large, the height values in all three-dimensional point coordinates on the at least one acquired panoramic image of each three-dimensional sub-object, and taking the median or mean of the height values ranked at the top as the extracted contour height hc' of the contour of the specific category; and using the assumed contour height hc of the contour of the specific category and the extracted contour height hc' of the contour of the specific category, generating the normalized plane contours in the three-dimensional space of the at least two panoramic images of each three-dimensional sub-object from the plane contours in the three-dimensional space of the at least two panoramic images of each three-dimensional sub-object; or,

in the case where a camera determined height hc' of the position of the panoramic camera is predetermined,

using the assumed camera height hc and the camera determined height hc', generating the normalized plane contours in the three-dimensional space of the at least two panoramic images of each three-dimensional sub-object from the plane contours in the three-dimensional space of the at least two panoramic images of each three-dimensional sub-object,

wherein the assumed contour height hc of the contour of the specific category is an arbitrarily assumed height.
Optionally, the processing module 304 is further configured to:
for at least two panoramic images shot for the at least one three-dimensional object to be processed, determining one by one whether a plurality of panoramic images belong to the same three-dimensional object by: if more than specific proportion of matching feature points exist between the two panoramic images, the two panoramic images are determined to belong to the same three-dimensional object; and
And if the plurality of panoramic images belong to the same three-dimensional object, taking a union set of all plane contours of the same three-dimensional object obtained from the plurality of panoramic images as the plane contour of the three-dimensional object in the three-dimensional space.
Optionally, the processing module 304 is further configured to: splice to obtain the multi-object plane contour in the three-dimensional space based on the plane contour in the three-dimensional space of each single three-dimensional sub-object.
Optionally, the processing module 304 is further configured to:
assuming that the plane contours in the three-dimensional space of all the panoramic images are N in total, the p-th three-dimensional point of the n-th plane contour is represented as P_p^n;

the position of the panoramic camera when the panoramic image corresponding to the n-th plane contour was captured is expressed as {Rn, tn}, where Rn is a rotation matrix serving as the rotation parameters of the position of the panoramic camera and tn is a translation vector serving as the translation parameters of the position of the panoramic camera, N being an integer greater than 1 and n being an integer greater than or equal to 1;

the position of the panoramic camera when the panoramic image corresponding to the i-th plane contour was captured is selected as a reference coordinate system, and the three-dimensional points of the other plane contours can be unified under the reference coordinate system through the following formula:

$$P_p^{n \to i} = R_i^{-1}\left(R_n \cdot P_p^n + t_n - t_i\right)$$
Converting all three-dimensional points of the scale-normalized plane contour except the ith plane contour through the formula to unify the three-dimensional points of all the plane contours to the same coordinate system, thereby splicing the plane contours of the three-dimensional sub-objects into the multi-object plane contour.
Optionally, the processing module 304 is further configured to:
and calculating the distance between the adjacent edges of two single-object outlines in the multi-object outline, and if the distance is nonzero and less than a specific threshold value, offsetting the two single-object outlines so as to enable the distance between the adjacent edges to be zero.
Optionally, the apparatus further comprises:
a construction module 305 configured to convert the multi-object planar contour in three-dimensional space resulting from the stitching into a multi-object 3D model.
Optionally, the building module 305 is further configured to:
performing three-dimensional point interpolation on the top plane contour in the spliced multi-object plane contours, and projecting all three-dimensional point coordinates on each top plane contour to a corresponding panoramic image coordinate system to obtain top textures;
performing three-dimensional point interpolation on the inside of the bottom plane contour in the spliced multi-object plane contours, and projecting all three-dimensional point coordinates on each obtained bottom plane contour into a corresponding panoramic image coordinate system to obtain bottom textures;
Connecting three-dimensional vertexes on the same plane position between the top outline and the bottom outline to form a plane outline of the supporting part, performing three-dimensional point interpolation inside the plane outline of the supporting part, and projecting all three-dimensional point coordinates of the obtained plane outline of each supporting part into a corresponding panoramic image coordinate system so as to obtain supporting part textures;
and generating a 3D texture model of the whole three-dimensional object based on the top texture, the bottom texture and the supporting part texture.
Optionally, the bottom plane contour is obtained in the following way: in all three-dimensional point coordinates on the top plane contour of each three-dimensional object, the height value, i.e. the extracted contour height hc' of the camera from the contour of the specific category at the top of the corresponding three-dimensional object, is replaced by the extraction height hf' of the camera from the bottom of the corresponding three-dimensional object, while the length and width values in all three-dimensional point coordinates on the top plane contour of each three-dimensional object are kept unchanged, so that the bottom plane contour of each three-dimensional object is obtained correspondingly, wherein

the extracted contour height hc' of the camera from the contour of the specific category at the top of the corresponding three-dimensional object is obtained in the following way: the height values in all three-dimensional point coordinates on the at least one panoramic image of each three-dimensional sub-object, obtained when acquiring the position of the panoramic camera, are sorted from small to large, and the median or mean of the height values ranked at the top is taken as the extracted contour height hc' of the camera from the contour of the specific category at the top of the corresponding three-dimensional object, and

the extraction height hf' of the camera from the bottom of the corresponding three-dimensional object is obtained in the following way: the height values in all three-dimensional point coordinates on the at least one panoramic image of each three-dimensional sub-object, obtained when acquiring the position of the panoramic camera, are sorted from small to large, and the median or mean of the height values ranked at the bottom is taken as the extraction height hf' of the camera from the bottom of the corresponding three-dimensional object.
The invention provides an image processing apparatus which, by acquiring the position of the panoramic camera at the time a panoramic image was shot from the video stream shot for the three-dimensional object and the time stamp at which the panoramic image was shot, can splice the plane contours of the three-dimensional sub-objects extracted from the panoramic images according to the position of the panoramic camera, so as to obtain the multi-object plane contour of the three-dimensional object for subsequent object modeling, thereby efficiently processing the collected data to provide the preparation data for object modeling.
Fig. 12 presents a schematic block diagram of an electronic device in accordance with the present invention.
Referring to fig. 12, the electronic device 1 comprises a memory 10 and a processor 20.
The processor 20 may be a multi-core processor or may include a plurality of processors. In some embodiments, processor 20 may comprise a general-purpose host processor and one or more special purpose coprocessors such as a Graphics Processor (GPU), Digital Signal Processor (DSP), or the like. In some embodiments, processor 20 may be implemented using custom circuits, such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).
The memory 10 has stored thereon executable code which, when executed by the processor 20, causes the processor 20 to perform one of the image processing methods described above. The memory 10 may include various types of storage units, such as a system memory, a read-only memory (ROM), and a permanent storage device. The ROM may store static data or instructions required by the processor 20 or other modules of the computer. The permanent storage device may be a readable and writable storage device, and may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, a mass storage device (e.g., a magnetic or optical disk, or flash memory) is employed as the permanent storage device. In other embodiments, the permanent storage device may be a removable storage device (e.g., a floppy disk or an optical drive). The system memory may be a readable and writable memory device or a volatile readable and writable memory device, such as a dynamic random access memory. The system memory may store instructions and data that some or all of the processors require at runtime. Further, the memory 10 may comprise any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), and magnetic and/or optical disks may also be employed. In some embodiments, the memory 10 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card, etc.), a magnetic floppy disk, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted by wireless or wired means.
Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for carrying out the above-mentioned steps defined in the above-mentioned method of the invention.
Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowcharts, block diagrams, etc. in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (20)

1. An image processing method, characterized in that the method comprises:
acquiring a video stream and a time stamp corresponding to at least one panoramic image shot by each three-dimensional sub-object in at least one three-dimensional object;
acquiring the position of each video frame in the video stream according to the geometric relationship between the video frames in the video stream;
taking a position corresponding to the timestamp in the position of each video frame as a position of a panoramic camera when at least one panoramic image of each three-dimensional sub-object is shot, wherein each panoramic image is shot for one three-dimensional sub-object;
And splicing the plane contour of the at least one panoramic image of each three-dimensional sub-object in the three-dimensional space based on the position of the panoramic camera to obtain the multi-object plane contour of the at least one three-dimensional object.
2. The method of claim 1, wherein obtaining the video stream and the time stamp corresponding to the at least one panoramic image captured for each of the at least one three-dimensional sub-object comprises:
shooting each three-dimensional sub-object in at least one three-dimensional object to be processed to obtain at least one panoramic image of each three-dimensional sub-object and a video stream corresponding to the panoramic image, wherein the panoramic image is shot in the process of shooting the video stream of each three-dimensional sub-object;
and recording the time stamp of at least one panoramic image for shooting each three-dimensional sub-object.
3. The method of claim 1, wherein the stitching the planar contour of the at least one panoramic image of each three-dimensional sub-object in three-dimensional space based on the position of the panoramic camera to obtain a multi-object planar contour of the at least one three-dimensional object comprises:
For at least one panoramic image of each three-dimensional sub-object, extracting a plane contour of the at least one panoramic image of each three-dimensional sub-object in a three-dimensional space;
normalizing the plane contour of at least one panoramic image of each three-dimensional sub-object in a three-dimensional space based on the position of the panoramic camera to obtain the normalized plane contour of each three-dimensional sub-object;
and splicing the normalized plane contour of each three-dimensional sub-object based on the position of the panoramic camera to obtain a multi-object plane contour of the at least one three-dimensional object.
4. The method of claim 3, wherein said extracting, for the at least one panoramic image of each three-dimensional sub-object, a planar contour of the at least one panoramic image of each three-dimensional sub-object in three-dimensional space comprises:
acquiring three-dimensional point coordinates of matched feature points on each panoramic image by using the geometric relationship of at least two panoramic images shot for each three-dimensional sub-object in the at least one three-dimensional object to be processed;
generating a planar contour of the panoramic image in the three-dimensional space based on a contour surrounded by edge pixels among pixels whose contour features belong to a specific category on the panoramic image for each of at least two panoramic images of each three-dimensional sub-object,
Wherein the particular categories include at least a top, a bottom, a support of a three-dimensional object in the panoramic image.
5. The method of claim 4, wherein the generating, for each of the at least two panoramic images for each three-dimensional sub-object, a planar contour of the panoramic image in the three-dimensional space based on contours on the panoramic image bounded by edge pixels among pixels whose contour features belong to a particular class comprises:
determining the edge pixel points among the pixel points of which the contour features belong to the specific category on the at least two panoramic images of each three-dimensional sub-object based on the feature similarity between the pixel points on the at least two panoramic images of each three-dimensional sub-object;
the feature similarity of the two pixels is an absolute value of a difference between features of the two pixels, and the feature of each pixel comprises gray scale and color.
6. The method of claim 3, wherein said extracting, for the at least one panoramic image of each three-dimensional sub-object, a planar contour of the at least one panoramic image of each three-dimensional sub-object in three-dimensional space comprises:
Acquiring three-dimensional point coordinates of matched feature points on each panoramic image by using the geometric relationship of at least two panoramic images shot for each three-dimensional sub-object in the at least one three-dimensional object to be processed;
for each of the at least two panoramic images of each three-dimensional sub-object, extracting a planar contour of the panoramic image in the three-dimensional space through a deep learning model for extracting an image contour.
7. The method of claim 6, wherein the deep learning model is trained by:
generating a contour feature training data set of a three-dimensional object of a type of interest by artificially labeling contour features of the three-dimensional object of the type of interest on a plurality of panoramic images serving as training images;
training the deep learning model using the contour feature training dataset of the three-dimensional object of the type of interest, thereby resulting in a trained deep learning model,
wherein an output of the deep learning model contains a category of contour features of the three-dimensional object of the type of interest.
8. The method of claim 7, wherein said extracting, for each of said at least two panoramic images for each three-dimensional sub-object, a planar contour of said panoramic image in said three-dimensional space through a deep learning model for extracting image contours, comprises:
Inputting the at least two panoramic images of each three-dimensional sub-object into the trained deep learning model to obtain the category of the contour feature corresponding to each pixel point in the at least two panoramic images of each three-dimensional sub-object;
extracting edge pixel points at the edge from pixel points of which the types of the contour features belong to specific types from each panoramic image of at least two panoramic images of each three-dimensional sub-object as specific type edge pixel points;
assuming that all the specific-category edge pixel points on each panoramic image of the at least two panoramic images of each three-dimensional sub-object have the same height hc, which is taken as the assumed contour height of the contour of the specific category, projecting the specific-category edge pixel points on each panoramic image onto a three-dimensional plane to obtain the specific-category three-dimensional points corresponding to each panoramic image, and then forming the plane contour of each panoramic image in the three-dimensional space based on the specific-category three-dimensional points corresponding to each panoramic image;
wherein the particular category includes a top of a three-dimensional object in the panoramic image.
9. The method of claim 4 or 6, wherein the obtaining three-dimensional point coordinates of matching feature points on each panoramic image using the geometric relationship for at least two panoramic images taken for each three-dimensional sub-object of the at least one three-dimensional object to be processed comprises:
Performing feature point matching between at least two panoramic images of each three-dimensional sub-object by using the geometric relationship of at least two panoramic images shot for each three-dimensional sub-object of the at least one three-dimensional object to be processed, and recording mutually matched feature points in the at least two panoramic images of each three-dimensional sub-object as matching feature points; and
for each panoramic image of the at least two panoramic images of each three-dimensional sub-object, obtaining the three-dimensional point coordinates of the matching feature points on each panoramic image by reducing the reprojection error of the matching feature points on each panoramic image.
10. The method of claim 9, wherein normalizing the planar contour of the at least one panoramic image of each three-dimensional sub-object in three-dimensional space based on the position of the panoramic camera to obtain a normalized planar contour of each three-dimensional sub-object comprises:
sorting, from small to large, the height values in all three-dimensional point coordinates on the at least one acquired panoramic image of each three-dimensional sub-object, and taking the median or mean of the height values ranked at the top as the extracted contour height hc' of the contour of the specific category; and using the assumed contour height hc of the contour of the specific category and the extracted contour height hc' of the contour of the specific category, generating the normalized plane contours in the three-dimensional space of the at least two panoramic images of each three-dimensional sub-object from the plane contours in the three-dimensional space of the at least two panoramic images of each three-dimensional sub-object; or,

in the case where a camera determined height hc' of the position of the panoramic camera is predetermined,

using the assumed camera height hc and the camera determined height hc', generating the normalized plane contours in the three-dimensional space of the at least two panoramic images of each three-dimensional sub-object from the plane contours in the three-dimensional space of the at least two panoramic images of each three-dimensional sub-object,

wherein the assumed contour height hc of the contour of the specific category is an arbitrarily assumed height.
11. The method of claim 3, wherein after said normalizing the planar contour of the at least one panoramic image of each three-dimensional sub-object in three-dimensional space based on the position of the panoramic camera to obtain the normalized planar contour of each three-dimensional sub-object, further comprising:
For at least two panoramic images shot for the at least one three-dimensional object to be processed, determining one by one whether a plurality of panoramic images belong to the same three-dimensional object by: if more than specific proportion of matching feature points exist between the two panoramic images, the two panoramic images are determined to belong to the same three-dimensional object; and
and if the plurality of panoramic images belong to the same three-dimensional object, taking a union set of all plane contours of the same three-dimensional object obtained from the plurality of panoramic images as the plane contour of the three-dimensional object in the three-dimensional space.
12. The method of claim 3, wherein said normalizing the plane contour of the at least one panoramic image of each three-dimensional sub-object in the three-dimensional space based on the position of the panoramic camera, to obtain the normalized plane contour of each three-dimensional sub-object, further comprises:
and splicing to obtain the multi-object plane contour in the three-dimensional space based on the plane contour in the three-dimensional space of each single three-dimensional sub-object.
13. The method of claim 3, wherein said stitching the normalized planar contour of each three-dimensional sub-object based on the position of the panoramic camera to obtain a multi-object planar contour of the at least one three-dimensional object comprises:
Assuming that the plane contours in the three-dimensional space of all the panoramic images are N in total, the p-th three-dimensional point of the n-th plane contour is represented as P_p^n;

the position of the panoramic camera when the panoramic image corresponding to the n-th plane contour was captured is expressed as {Rn, tn}, where Rn is a rotation matrix serving as the rotation parameters of the position of the panoramic camera and tn is a translation vector serving as the translation parameters of the position of the panoramic camera, N being an integer greater than 1 and n being an integer greater than or equal to 1,

the position of the panoramic camera when the panoramic image corresponding to the i-th plane contour was captured is selected as a reference coordinate system, and the three-dimensional points of the other plane contours can be unified under the reference coordinate system through the following formula:

$$P_p^{n \to i} = R_i^{-1}\left(R_n \cdot P_p^n + t_n - t_i\right)$$
converting all three-dimensional points of the scale-normalized plane contour except the ith plane contour through the formula to unify the three-dimensional points of all the plane contours to the same coordinate system, thereby splicing the plane contours of the three-dimensional sub-objects into the multi-object plane contour.
14. The method of claim 13, further comprising, after said stitching the planar contours of the at least one panoramic image of each three-dimensional sub-object in three-dimensional space based on the position of the panoramic camera to obtain a multi-object planar contour of the at least one three-dimensional object:
And calculating the distance between the adjacent edges of two single-object outlines in the multi-object outline, and if the distance is nonzero and less than a specific threshold value, offsetting the two single-object outlines so as to enable the distance between the adjacent edges to be zero.
15. The method of claim 3, wherein after said stitching the normalized planar contour of each three-dimensional sub-object based on the position of the panoramic camera to obtain a multi-object planar contour of the at least one three-dimensional object, further comprising:
and converting the multi-object plane contour in the three-dimensional space obtained by splicing into a multi-object 3D model.
16. The method of claim 15, wherein said converting the stitched multi-object planar contour in three-dimensional space into a multi-object 3D model comprises:
performing three-dimensional point interpolation on the top plane contour in the spliced multi-object plane contours, and projecting all three-dimensional point coordinates on each top plane contour to a corresponding panoramic image coordinate system to obtain top textures;
performing three-dimensional point interpolation on the inside of the bottom plane contour in the spliced multi-object plane contours, and projecting all three-dimensional point coordinates on each obtained bottom plane contour into a corresponding panoramic image coordinate system to obtain bottom textures;
Connecting three-dimensional vertexes on the same plane position between the top outline and the bottom outline to form a plane outline of the supporting part, performing three-dimensional point interpolation inside the plane outline of the supporting part, and projecting all three-dimensional point coordinates of the obtained plane outline of each supporting part into a corresponding panoramic image coordinate system so as to obtain supporting part textures;
and generating a 3D texture model of the whole three-dimensional object based on the top texture, the bottom texture and the supporting part texture.
17. The method of claim 16, wherein the bottom plane contour is obtained in the following way: in all three-dimensional point coordinates on the top plane contour of each three-dimensional object, the height value, i.e. the extracted contour height hc' of the camera from the contour of the specific category at the top of the corresponding three-dimensional object, is replaced by the extraction height hf' of the camera from the bottom of the corresponding three-dimensional object, while the length and width values in all three-dimensional point coordinates on the top plane contour of each three-dimensional object are kept unchanged, so that the bottom plane contour of each three-dimensional object is obtained correspondingly, wherein

the extracted contour height hc' of the camera from the contour of the specific category at the top of the corresponding three-dimensional object is obtained in the following way: the height values in all three-dimensional point coordinates on the at least one panoramic image of each three-dimensional sub-object, obtained when acquiring the position of the panoramic camera, are sorted from small to large, and the median or mean of the height values ranked at the top is taken as the extracted contour height hc' of the camera from the contour of the specific category at the top of the corresponding three-dimensional object, and

the extraction height hf' of the camera from the bottom of the corresponding three-dimensional object is obtained in the following way: the height values in all three-dimensional point coordinates on the at least one panoramic image of each three-dimensional sub-object, obtained when acquiring the position of the panoramic camera, are sorted from small to large, and the median or mean of the height values ranked at the bottom is taken as the extraction height hf' of the camera from the bottom of the corresponding three-dimensional object.
18. An image processing apparatus characterized by comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the image processing method of any one of claims 1 to 17.
19. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the image processing method of any one of claims 1 to 17 when executing the computer program.
20. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the image processing method of any one of claims 1 to 17.
CN202010774831.XA 2020-08-04 2020-08-04 Image processing method, image processing apparatus, electronic device, and storage medium Active CN112055192B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010774831.XA CN112055192B (en) 2020-08-04 2020-08-04 Image processing method, image processing apparatus, electronic device, and storage medium

Publications (2)

Publication Number Publication Date
CN112055192A (en) 2020-12-08
CN112055192B (en) 2022-10-11

Family

ID=73601408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010774831.XA Active CN112055192B (en) 2020-08-04 2020-08-04 Image processing method, image processing apparatus, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN112055192B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114529621A (en) * 2021-12-30 2022-05-24 北京城市网邻信息技术有限公司 Household type graph generation method and device, electronic equipment and medium
CN114529621B (en) * 2021-12-30 2022-11-22 北京城市网邻信息技术有限公司 Household type graph generation method and device, electronic equipment and medium
CN114792357A (en) * 2022-03-23 2022-07-26 北京城市网邻信息技术有限公司 Panorama resource generation method and device, electronic equipment and storage medium
CN116485634A (en) * 2023-04-10 2023-07-25 北京城市网邻信息技术有限公司 Point cloud display diagram generation method and device, electronic equipment and storage medium
CN116485634B (en) * 2023-04-10 2024-04-02 北京城市网邻信息技术有限公司 Point cloud display diagram generation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112055192B (en) 2022-10-11

Similar Documents

Publication Publication Date Title
CN110675314B (en) Image processing method, image processing apparatus, three-dimensional object modeling method, three-dimensional object modeling apparatus, image processing apparatus, and medium
US11869148B2 (en) Three-dimensional object modeling method, image processing method, image processing device
CN110490916B (en) Three-dimensional object modeling method and apparatus, image processing device, and medium
US11727587B2 (en) Method and system for scene image modification
US20240029200A1 (en) Method and system for image generation
CN111862302B (en) Image processing method, image processing apparatus, object modeling method, object modeling apparatus, image processing apparatus, object modeling apparatus, and medium
CN103503025B (en) Model parameter is determined based on the model of object is carried out conversion
Urban et al. Multicol-slam-a modular real-time multi-camera slam system
Zhang et al. A UAV-based panoramic oblique photogrammetry (POP) approach using spherical projection
CN112055192B (en) Image processing method, image processing apparatus, electronic device, and storage medium
US10846844B1 (en) Collaborative disparity decomposition
US20160189419A1 (en) Systems and methods for generating data indicative of a three-dimensional representation of a scene
US8494254B2 (en) Methods and apparatus for image rectification for stereo display
Bu et al. Semi-direct tracking and mapping with RGB-D camera for MAV
US20230106339A1 (en) 2d and 3d floor plan generation
Ortiz-Cayon et al. Automatic 3d car model alignment for mixed image-based rendering
Bartczak et al. Extraction of 3D freeform surfaces as visual landmarks for real-time tracking
Bazin et al. An original approach for automatic plane extraction by omnidirectional vision
Morell-Gimenez et al. A survey of 3D rigid registration methods for RGB-D cameras
Rolin et al. View synthesis for pose computation
Zhihui et al. Catadioptric omni-directional stereo vision and its applications in moving objects detection
Rolin et al. Enhancing pose estimation through efficient patch synthesis
CA3131587A1 (en) 2d and 3d floor plan generation
da Silveira et al. 3D Scene Geometry Estimation from 360° Imagery: A Survey
Guan Hybrid methods for robust image matching and its application in augmented reality

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant