CN117808987B - Indoor scene three-dimensional reconstruction method and device, electronic equipment and storage medium


Info

Publication number
CN117808987B
CN117808987B (application CN202410218859.3A)
Authority
CN
China
Prior art keywords
component
semantic
scene
fixed
points
Prior art date
Legal status
Active
Application number
CN202410218859.3A
Other languages
Chinese (zh)
Other versions
CN117808987A
Inventor
魏辉
卢丽华
张晓辉
李茹杨
赵雅倩
Current Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Metabrain Intelligent Technology Co Ltd
Priority claimed from CN202410218859.3A
Publication of CN117808987A
Application granted
Publication of CN117808987B
Status: Active
Anticipated expiration


Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention provides an indoor scene three-dimensional reconstruction method and device, an electronic device, and a storage medium, relating to the field of computer technology. Indoor scene point cloud data are acquired and a plurality of semantic component units are extracted from them; a scene layout diagram is generated according to the position information and size information of the semantic component units; each semantic component unit is divided into a corresponding scene component semantic category, a fixed-component entity structure is generated according to the component category corresponding to each fixed component in the scene layout, and a preset object model asset library is searched with the object category corresponding to each auxiliary object as an index to obtain the object model corresponding to that auxiliary object. The invention can reduce the amount of data processed during reconstruction and can improve the fineness of the geometric surfaces of fixed components and auxiliary objects.

Description

Indoor scene three-dimensional reconstruction method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for three-dimensional reconstruction of an indoor scene, an electronic device, and a storage medium.
Background
Three-dimensional reconstruction refers to the technique of scanning and computing a real physical scene with various sensor devices to generate a corresponding digitized model. It has many applications in digital content creation, industrial design, mapping, smart cities, virtual reality/metaverse, and other fields. By reconstructed object, the field is divided into object-level and scene-level three-dimensional reconstruction, and scene-level reconstruction can be further divided into indoor scenes, outdoor scenes, and so on. Compared with outdoor scenes, three-dimensional reconstruction of indoor scenes is more complicated and difficult: there are a large number of objects, complex layouts, and mutual occlusions, which pose a great challenge to indoor scene three-dimensional reconstruction. In the related art, point cloud data of an indoor scene are acquired, a triangular mesh model is generated from the point cloud data, and the triangular mesh model is then segmented to obtain a three-dimensional reconstruction result.
Disclosure of Invention
The invention provides an indoor scene three-dimensional reconstruction method and device, an electronic device, and a storage medium, which are used to overcome the defects of traditional indoor scene three-dimensional reconstruction methods: a large amount of redundant data in the reconstruction process, high memory occupation, and components of the reconstructed model obtained by triangular mesh optimization, whose geometric surfaces are therefore uneven and whose details are rough.
The invention provides a three-dimensional reconstruction method of an indoor scene, which comprises the following steps:
Acquiring indoor scene point cloud data through a sensor, and extracting a plurality of semantic member units from the point cloud data;
the deployment equipment calculates the position information and the size information of each semantic component unit, and generates a scene layout chart according to the position information and the size information of the semantic component units;
Dividing each semantic component unit into a corresponding scene component semantic category, wherein the scene component semantic categories comprise fixed components and auxiliary objects; generating a triangular mesh model of each fixed component according to the parameter information of that fixed component in the scene layout, and carrying out texture refinement on the triangular mesh model to obtain a fixed-component entity structure; acquiring, according to the scene layout, the fixed component to which each auxiliary object is attached, and searching a preset object model asset library using, as an index, the object category of the auxiliary object verified against the attached fixed component, to obtain the object model corresponding to the auxiliary object;
And assembling the solid structure of the fixed component and the object model corresponding to the auxiliary object in the scene layout diagram to obtain the three-dimensional model of the indoor scene.
According to the three-dimensional reconstruction method of the indoor scene provided by the invention, the plurality of semantic member units are extracted from the point cloud data, and the three-dimensional reconstruction method comprises the following steps:
And identifying a plurality of semantic component units in the point cloud data through a point cloud network structure, wherein each semantic component unit comprises a structural semantic tag, and the structural semantic tag corresponds to a specific category in the component category or a specific category of the object category.
According to the three-dimensional reconstruction method of the indoor scene provided by the invention, after a plurality of semantic member units are extracted from the point cloud data, the three-dimensional reconstruction method further comprises the following steps:
optimizing each semantic component unit specifically comprises the following steps:
Initializing an optimized point set corresponding to each semantic component unit, wherein the optimized point set is used for storing the points contained in the optimized semantic component units;
randomly selecting three non-collinear points from all points of the semantic component unit, and calculating a semantic component unit plane according to coordinates of the three non-collinear points;
calculating the distances from other points in the semantic member unit to the plane of the semantic member unit respectively, counting the number of points with the distance smaller than a preset error threshold, if the number of points with the distance smaller than the preset error threshold is greater than a preset number record variable, setting the preset number record variable as the number of points with the current distance smaller than the preset error threshold, and storing the points with the distance smaller than the preset error threshold into a corresponding optimized point set;
and re-selecting three non-collinear points in the semantic component unit, calculating a semantic component unit plane, and updating the corresponding optimized point set according to the distances from the other points to the semantic component unit plane, until a preset number of optimization iterations is reached; the points in the optimized point set are then taken as the points of the optimized semantic component unit to obtain the optimized semantic component unit.
According to the three-dimensional reconstruction method of the indoor scene provided by the invention, when the semantic component units are fixed components, the calculating of the position information and the size information of each semantic component unit comprises the following steps:
respectively calculating the maximum coordinate position and the minimum coordinate position of each optimized fixed component in the three-dimensional coordinate axis direction;
and calculating the position coordinates of the central point of the optimized fixing member and the geometric dimension of the optimized fixing member according to the maximum coordinate position and the minimum coordinate position of each coordinate axis direction.
According to the three-dimensional reconstruction method for indoor scene provided by the invention, when the semantic component units are auxiliary objects, the calculating of the position information and the size information of each semantic component unit comprises the following steps:
calculating the maximum coordinate position and the minimum coordinate position of the axis alignment bounding box of the accessory object in the three-dimensional coordinate axis direction;
and calculating the center point position coordinates of the auxiliary object according to the maximum coordinate position and the minimum coordinate position of the three-dimensional coordinate axis direction, and acquiring the geometric dimension of the auxiliary object according to the dimension of the axis alignment bounding box.
According to the three-dimensional reconstruction method for the indoor scene, provided by the invention, the scene layout is of a layered structure, and the number of layers of the scene layout is determined according to the spatial structure of the indoor scene and the types and the number of semantic member units included in the indoor scene;
Each layer of the scene layout map corresponds to one or more component categories, and each layer of the scene layout map is expressed as a cell grid comprising a plurality of cells; each cell comprises a semantic label, a center position, a geometric dimension and an identifier, and the identifier is used to assist in indicating the position information of the fixed components of the one or more corresponding component categories.
According to the three-dimensional reconstruction method for the indoor scene provided by the invention, the component categories comprise at least one of floors, walls, doors, windows, ceilings, beams and columns.
The three-dimensional reconstruction method of the indoor scene provided by the invention further comprises the following steps:
constructing a mapping relation between the value of the identifier and the position information of the door and window type components;
Acquiring the value of an identifier in a unit cell of a door and window type component, and acquiring auxiliary indication information according to the matching of the value of the identifier and the mapping relation between the value of the identifier and the position information of the door and window type component;
And acquiring the position information corresponding to the door and window type components according to the auxiliary indication information.
According to the three-dimensional reconstruction method of the indoor scene, the ceilings comprise simple-shaped ceilings and complex-shaped ceilings;
a simple-shaped ceiling is represented by the central ceiling cell, with the other ceiling cells empty;
a complex-shaped ceiling is represented by a combination of a plurality of ceiling cells.
According to the three-dimensional reconstruction method for the indoor scene, provided by the invention, the cells of different layers of the scene layout diagram are used for representing the position information of different fixed members or representing the distribution of the same fixed member at different positions in space.
According to the three-dimensional reconstruction method of the indoor scene, provided by the invention, in the construction process of the scene layout diagram, the central position of the cell is iteratively updated;
the center position of the current cell is the average value of the center position calculated in the last iteration and the center position calculated in the current iteration.
According to the three-dimensional reconstruction method of the indoor scene, provided by the invention, in the construction process of the scene layout diagram, the geometric dimension of the cell is iteratively updated;
The geometric dimension of the current cell is the union of the geometric dimension calculated in the previous iteration and the geometric dimension calculated in the current iteration.
According to the three-dimensional reconstruction method of the indoor scene provided by the invention, the scene layout diagram is generated according to the position information and the size information of the fixed component and the accessory object, and the method comprises the following steps:
Projecting the points in the optimized point set of the fixed component corresponding to each component category onto the XY plane, and respectively calculating the maximum and minimum coordinate positions of the projected points in the X and Y directions;
Calculating the vertex coordinates of the fixed component in the scene layout according to the maximum and minimum coordinate positions in the X and Y directions;
acquiring a coordinate range of a cell corresponding to the fixed member according to the vertex coordinates of the fixed member in the scene layout;
Determining a center point of the fixed member according to the coordinate range of the corresponding cell of the fixed member;
If the fixed component is a wall component: calculating the distances from the projection points of the wall component to the center points of the candidate cells, and selecting the cell with the smallest distance as the wall component cell; if there is no data in the wall component cell, filling the semantic label, center position, geometric dimension and identifier of the wall component into the wall component cell; if the wall component cell already has data, averaging the newly computed center position with the stored center position to obtain the new center position, and taking the union of the newly computed geometric dimension and the stored geometric dimension as the new geometric dimension;
If the fixed component is a door or window component: calculating the distances from the projection points of the door or window component to the center points of the candidate cells, and selecting the cell with the smallest distance as the door/window component cell; if there is no data in the door/window component cell, filling the semantic label, center position, geometric dimension and identifier of the door or window component into it; if the door/window component cell already has data, filling the cell of the higher-priority layer of the scene layout according to a preset priority, taking the average of the newly computed center position and the stored center position as the new center position, taking the union of the newly computed geometric dimension and the stored geometric dimension as the new geometric dimension, and indicating through the identifier the wall on which the door or window component is located.
According to the three-dimensional reconstruction method of the indoor scene provided by the invention, the solid structure of the fixed component is generated according to the component category corresponding to each fixed component in the scene layout diagram, and the method comprises the following steps:
Calculating the space coordinates of all vertexes of the fixed component according to the central position and the geometric dimension in each fixed component cell in the scene layout;
Performing triangular mesh dissection by taking the space coordinates of all vertexes as key points to obtain a triangular mesh model of the fixed component;
Inputting the component category corresponding to the fixed component into a texture generation neural network model, and obtaining a texture map of the fixed component, wherein the texture generation neural network model is obtained based on the component category and the corresponding texture map training;
And generating a solid structure of the fixing member according to the triangular mesh model of the fixing member and the texture map of the fixing member.
According to the three-dimensional reconstruction method of the indoor scene, which is provided by the invention, the texture generation neural network model comprises a coding module and a multi-layer perceptron;
the coding module is used for coding the component categories to obtain implicit characteristics of each component category;
the multi-layer perceptron is used to perform perceptual learning on the implicit features of each component category and generate the texture map corresponding to that component category.
According to the three-dimensional reconstruction method of the indoor scene provided by the invention, the asset library of the preset object model is constructed according to the semantic category of the auxiliary object, and after the object model corresponding to the auxiliary object is obtained, the three-dimensional reconstruction method further comprises the following steps:
and replacing the auxiliary objects in the scene layout by using the object models corresponding to the acquired auxiliary objects.
According to the three-dimensional reconstruction method of the indoor scene provided by the invention, before replacing the auxiliary object in the scene layout by the object model corresponding to the acquired auxiliary object, the three-dimensional reconstruction method further comprises the following steps:
and scaling the object model indexed from the preset object model asset library according to the axis alignment bounding box size of the auxiliary object.
The invention also provides a three-dimensional reconstruction device of the indoor scene, which comprises:
The acquisition module is used for acquiring indoor scene point cloud data through a sensor and extracting a plurality of semantic member units from the point cloud data;
The computing module is used for computing the position information and the size information of each semantic component unit by deployment equipment and generating a scene layout chart according to the position information and the size information of the semantic component units;
The generation module is used for dividing each semantic component unit into a corresponding scene component semantic category, wherein the scene component semantic categories comprise fixed components and auxiliary objects; generating a triangular mesh model of each fixed component according to the parameter information of that fixed component in the scene layout diagram, and carrying out texture refinement on the triangular mesh model to obtain a fixed-component entity structure; acquiring, according to the scene layout diagram, the fixed component to which each auxiliary object is attached, and searching a preset object model asset library using, as an index, the category of the auxiliary object verified against the attached fixed component, to obtain the object model corresponding to the auxiliary object;
And the assembling module is used for assembling the solid structure of the fixed component and the object model corresponding to the auxiliary object in the scene layout diagram to obtain the three-dimensional model of the indoor scene.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the three-dimensional reconstruction method of the indoor scene when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the three-dimensional reconstruction method of indoor scene as described in any of the above.
According to the indoor scene three-dimensional reconstruction method and device, the electronic device and the storage medium of the invention, indoor scene point cloud data are acquired through a sensor, and a plurality of semantic component units are extracted from the point cloud data; the deployment device calculates the position information and size information of each semantic component unit and generates a scene layout diagram accordingly; each semantic component unit is divided into a corresponding scene component semantic category comprising fixed components and auxiliary objects; a triangular mesh model of each fixed component is generated according to its parameter information in the scene layout and texture-refined to obtain a fixed-component entity structure; the fixed component to which each auxiliary object is attached is acquired from the scene layout, and a preset object model asset library is searched using, as an index, the object category of the auxiliary object verified against the attached fixed component, to obtain the corresponding object model; the fixed-component entity structures and the object models corresponding to the auxiliary objects are assembled in the scene layout diagram to obtain the three-dimensional model of the indoor scene. By generating a scene layout diagram, refining specific components according to the scene and generating the three-dimensional model from those components, the amount of data processed during reconstruction is reduced; by generating fixed components from their component categories and retrieving auxiliary objects from the asset library, the fineness of the geometric surfaces of the fixed components and auxiliary objects is improved.
Drawings
In order to more clearly illustrate the invention or the technical solutions in the related art, the following description will briefly explain the drawings used in the embodiments or the related art description, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for those skilled in the art.
Fig. 1 is a schematic flow chart of an indoor scene three-dimensional reconstruction method provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a scene layout provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a two-layer scene layout diagram according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a texture-generating neural network model according to an embodiment of the present invention;
fig. 5 is a deployment schematic diagram of an indoor scene three-dimensional reconstruction method provided by an embodiment of the invention;
Fig. 6 is a schematic functional structural diagram of an indoor scene three-dimensional reconstruction device according to an embodiment of the present invention;
fig. 7 is a schematic functional structure of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a flowchart of an indoor scene three-dimensional reconstruction method provided by an embodiment of the present invention, where, as shown in fig. 1, the indoor scene three-dimensional reconstruction method provided by the embodiment of the present invention includes:
step 101, acquiring indoor scene point cloud data through a sensor, and extracting a plurality of semantic member units from the point cloud data;
In the embodiment of the invention, the indoor scene to be reconstructed can be scanned directly by radar, a scanner or other equipment to obtain the corresponding three-dimensional point cloud, or image or depth information can be acquired with a camera and the three-dimensional point cloud of the scene computed by photogrammetry or multi-view stereo geometry. The invention does not limit the technique used to acquire the indoor scene point cloud data.
Step 102, the deployment equipment calculates the position information and the size information of each semantic component unit, and generates a scene layout chart according to the position information and the size information of the semantic component units;
Step 103, dividing each semantic component unit into a corresponding scene component semantic category, wherein the scene component semantic categories comprise fixed components and auxiliary objects; generating a triangular mesh model of each fixed component according to the parameter information of that fixed component in the scene layout diagram, and carrying out texture refinement on the triangular mesh model to obtain a fixed-component entity structure; acquiring, according to the scene layout diagram, the fixed component to which each auxiliary object is attached, and searching a preset object model asset library using, as an index, the category of the auxiliary object verified against the attached fixed component, to obtain the object model corresponding to the auxiliary object;
In the embodiment of the invention, the design of the semantic category set S is shown in Table 1; the whole set S is divided into the two classes of fixed components and auxiliary objects. Fixed components include the immovable structural objects of the scene itself, such as walls, floors, ceilings and beams; auxiliary objects include the additional, changeable objects in the scene, such as lights, tables, chairs and air conditioners. It should be noted that those skilled in the art may add semantic categories according to the actual situation.
Table 1. Scene component semantic category table
In the embodiment of the invention, the parameter information of each fixed component in the scene layout is, for example, the central position and the geometric dimension of the fixed component, and the triangular mesh model of the fixed component is generated according to the central position and the geometric dimension of each fixed component.
In the embodiment of the invention, obtaining the fixed component to which an auxiliary object is attached according to the scene layout diagram comprises determining the attached fixed component from the position of the auxiliary object, and verifying the category of the auxiliary object against the attached fixed component before retrieving the model library, which improves search accuracy. For example, if the attached fixed component is a ceiling but the identified auxiliary object is a chair, the chair cannot be mounted on a ceiling, so the object category identification is inaccurate and must be performed again. Combining the scene layout diagram with the object model library in this way further improves both the accuracy and the efficiency of the search.
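A minimal sketch of this attachment-based verification is given below; the compatibility table and the category names in it are illustrative assumptions, not values from the patent:

```python
# Illustrative compatibility table between attached fixed components and
# auxiliary object categories (an assumption for this sketch).
ATTACHMENT_COMPATIBILITY = {
    "ceiling": {"lamp", "air_conditioner"},
    "floor":   {"table", "chair", "sofa"},
    "wall":    {"air_conditioner", "painting"},
}

def verify_object_category(object_category: str, attached_component: str) -> bool:
    """Return True if the recognized category can plausibly attach to the component."""
    allowed = ATTACHMENT_COMPATIBILITY.get(attached_component, set())
    return object_category in allowed

# A chair recognized on a ceiling fails the check, triggering re-identification.
assert not verify_object_category("chair", "ceiling")
```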
And 104, assembling the solid structure of the fixed component and the object model corresponding to the accessory object in the scene layout diagram to obtain the three-dimensional model of the indoor scene.
The traditional indoor scene three-dimensional reconstruction method comprises the steps of obtaining point cloud data of an indoor scene, generating a triangular mesh model according to the point cloud data, and then dividing the triangular mesh model to obtain a three-dimensional reconstruction result.
According to the indoor scene three-dimensional reconstruction method provided by the embodiment of the invention, the indoor scene point cloud data is acquired, and a plurality of semantic member units are extracted from the point cloud data; calculating the position information and the size information of each semantic member unit, and generating a scene layout chart according to the position information and the size information of the semantic member units; dividing each semantic component unit into corresponding scene component semantic categories, wherein each scene component semantic category comprises fixed components and auxiliary objects, generating a fixed component entity structure according to the component category corresponding to each fixed component in the scene layout, and searching a preset object model asset library by taking the object category corresponding to the auxiliary object as an index to acquire an object model corresponding to the auxiliary object; and assembling the solid structure of the fixed component and the object model corresponding to the auxiliary object in the scene layout diagram to obtain a three-dimensional model of the indoor scene, generating the scene layout diagram, refining specific components according to the scene, generating the three-dimensional model according to the specific components, reducing the data processing amount in the reconstruction process, acquiring the fixed component according to the component category, acquiring the auxiliary object according to the asset library, and improving the fineness of the geometric surfaces of the fixed component and the auxiliary object.
Based on any one of the above embodiments, the three-dimensional reconstruction method for indoor scene provided by the embodiment of the present invention includes:
step 201, identifying a plurality of semantic component units in the point cloud data through a point cloud network structure, wherein each semantic component unit comprises a structural semantic tag, and the structural semantic tag corresponds to a specific category in the component categories or a specific category of the object categories.
In the embodiment of the invention, all semantic component units { Pi } in the point cloud are identified through PointNet, PointNet++ or similar network structures; each semantic component unit Pi consists of a semantic tag and all the points it contains, and the identified structural semantic tag corresponds to a category in the semantic category set S. PointNet takes the point cloud data directly as input and can output either an overall class label for the whole cloud or a label for every point.
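Assuming a segmentation network of this kind outputs one label per point, the semantic component units can be assembled as in the following sketch:

```python
import numpy as np

def group_semantic_units(points: np.ndarray, labels: np.ndarray) -> dict:
    """Group a point cloud (N, 3) by per-point semantic labels (N,) into
    semantic component units {label: points of that unit}."""
    return {int(lab): points[labels == lab] for lab in np.unique(labels)}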
Step 202, optimizing each semantic component unit;
in the embodiment of the invention, each semantic member unit is optimized, and the method specifically comprises the following steps:
Initializing an optimized point set corresponding to each semantic component unit, wherein the optimized point set is used for storing the points contained in the optimized semantic component units;
randomly selecting three non-collinear points from all points of the semantic component unit, and calculating a semantic component unit plane according to coordinates of the three non-collinear points;
calculating the distances from other points in the semantic member unit to the plane of the semantic member unit respectively, counting the number of points with the distance smaller than a preset error threshold, if the number of points with the distance smaller than the preset error threshold is greater than a preset number record variable, setting the preset number record variable as the number of points with the current distance smaller than the preset error threshold, and storing the points with the distance smaller than the preset error threshold into a corresponding optimized point set;
and re-selecting three non-collinear points in the semantic component unit, calculating a semantic component unit plane, and updating the corresponding optimized point set according to the distances from the other points to the semantic component unit plane, until a preset number of optimization iterations is reached; the points in the optimized point set are then taken as the points of the optimized semantic component unit to obtain the optimized semantic component unit.
Taking a certain component Pi as an example when optimizing and refining the identified semantic component point cloud, the points included in the component Pi are pi1, pi2, ..., pin, and the optimization and refinement of Pi comprises the following specific steps:
a) Initializing a preset error threshold d used to control the precision of the optimization, where d is determined by the actual accuracy requirement; initializing a count variable c = 0; setting the number of optimization iterations K, where a larger K gives higher optimization precision; initializing the point set Q of the optimized component Pi to empty.
b) Randomly selecting three non-collinear points from all the points of Pi; the plane A on which they lie can be calculated from their coordinates.
c) Calculating the distances from the other points in Pi to the plane A, and counting the number x of points whose distance is smaller than the preset threshold d; if x > c, setting c = x and updating the set Q to the points whose distance to plane A is smaller than d.
d) Repeating steps b) and c) until the preset number of optimization iterations K is reached. The points in the set Q are then the points of the component Pi after optimization.
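A minimal sketch of steps a) to d), assuming the component points are given as an (n, 3) array; the default threshold and iteration count are illustrative:

```python
import numpy as np

def refine_component(points, d=0.01, K=100, rng=None):
    """Random-sampling plane refinement of one semantic component (steps a-d).
    points: (n, 3) array; d: error threshold; K: number of iterations."""
    rng = rng or np.random.default_rng()
    c, Q = 0, points                      # best inlier count, optimized point set
    for _ in range(K):
        # b) pick three points and build the plane A through them
        idx = rng.choice(len(points), size=3, replace=False)
        p0, p1, p2 = points[idx]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:                  # collinear sample, draw again
            continue
        normal /= norm
        # c) distances of all points to plane A; count inliers within d
        dist = np.abs((points - p0) @ normal)
        inliers = points[dist < d]
        if len(inliers) > c:              # keep the best plane seen so far
            c, Q = len(inliers), inliers
    return Q                              # points of the optimized component
```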
In the embodiment of the invention, the identified semantic component point cloud is optimized and refined, and the optimized component point cloud is obtained through random sampling and repeated iterative optimization, so that the accuracy of the component point cloud is higher than that of the original component point cloud.
Step 203, calculating the position information and the size information of each optimized semantic member unit, and generating a scene layout according to the position information and the size information of the semantic member units;
Step 204, dividing each optimized semantic component unit into corresponding scene component semantic categories, wherein each scene component semantic category comprises fixed components and auxiliary objects, generating a fixed component entity structure according to the component category corresponding to each fixed component in the scene layout, and searching a preset object model asset library by taking the object category corresponding to the auxiliary object as an index to obtain an object model corresponding to the auxiliary object;
Step 205, assembling the solid structure of the fixed member and the object model corresponding to the auxiliary object in the scene layout diagram to obtain the three-dimensional model of the indoor scene.
In the embodiment of the invention, the space layout of the scene is represented by dividing the objects in the scene into two types of fixed members and auxiliary objects and combining the multi-layer layout diagram, so that the space structural representation of the reconstructed target scene is realized. Aiming at the problem of poor quality of model details in the traditional reconstruction method, the quality of the reconstructed model can be improved by combining a scene layout diagram and different object classifications, generating fixed component objects based on mesh subdivision and accessory objects based on asset library replacement, and realizing the structured three-dimensional reconstruction of the scene by means of assembling the scene diagram.
In the embodiment of the present invention, when the semantic member units are fixed members, the calculating the position information and the size information of each semantic member unit includes:
respectively calculating the maximum coordinate position and the minimum coordinate position of each optimized fixed component in the three-dimensional coordinate axis direction;
and calculating the position coordinates of the central point of the optimized fixing member and the geometric dimension of the optimized fixing member according to the maximum coordinate position and the minimum coordinate position of each coordinate axis direction.
In the embodiment of the invention, for fixed components, the maximum and minimum coordinate positions (xmin, xmax), (ymin, ymax), (zmin, zmax) of each component along the X, Y and Z coordinate axes are calculated; for auxiliary objects, the maximum and minimum coordinate positions of the axis-aligned bounding box along the X, Y and Z axes are calculated. The geometric dimension of a fixed component is (xmax-xmin) × (ymax-ymin) × (zmax-zmin).
In the embodiment of the present invention, when the semantic component units are attached objects, the calculating the location information and the size information of each semantic component unit includes:
calculating the maximum coordinate position and the minimum coordinate position of the axis alignment bounding box of the accessory object in the three-dimensional coordinate axis direction;
and calculating the center point position coordinates of the auxiliary object according to the maximum coordinate position and the minimum coordinate position of the three-dimensional coordinate axis direction, and acquiring the geometric dimension of the auxiliary object according to the dimension of the axis alignment bounding box.
In the embodiment of the invention, the center point position coordinate Oi of each component and the geometric dimension of the component can be calculated from the maximum and minimum coordinate positions of the component along the X, Y and Z coordinate axes; the geometric dimension of an auxiliary object is the size of its axis-aligned bounding box.
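Both computations reduce to the axis-aligned extents of the point set; a minimal sketch:

```python
import numpy as np

def center_and_size(points: np.ndarray):
    """Center point Oi and geometric size of a component from its axis-aligned
    extents. points: (n, 3) array."""
    lo, hi = points.min(axis=0), points.max(axis=0)  # (xmin, ymin, zmin), (xmax, ...)
    center = (lo + hi) / 2.0
    size = hi - lo                                   # (xmax-xmin, ymax-ymin, zmax-zmin)
    return center, size
```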
Based on any one of the above embodiments, the scene layout is mainly used for representing a spatial layout of a scene fixing member, the scene layout is a layered structure, and the number of layers of the scene layout is determined according to the spatial structure of the indoor scene and the types and the number of semantic member units included in the indoor scene;
Each layer of the scene layout map corresponds to one or more component categories, and each layer of the scene layout map is expressed as a cell grid comprising a plurality of cells; each cell comprises a semantic label, a center position, a geometric dimension and an identifier, and the identifier is used to assist in indicating the position information of the fixed components of the one or more corresponding component categories.
In an embodiment of the invention, the component categories include at least one of floor layers, walls, doors, windows, ceilings, beams and columns. The ceilings include simple shape ceilings and complex shape ceilings; the simple-shaped ceilings are represented by central ceiling cells, the other ceiling cells being empty; the complex-shaped ceiling is represented by a plurality of ceiling cell combinations.
As shown in fig. 2, the first layer represents a floor layer, wall, door, window; the second layer represents the ceiling, door, window. The layout representation method is extensible, for example, when the fixed component exists in the category of components such as beams, columns and the like, the representation can be performed by adding a new layout layer. The following description will be given by taking two-layer representation as an example.
Each layer of the layout is represented as a 3x3 grid of cells, the data for each cell being defined as follows:
(semantic tag, center position, geometric dimension, identifier)
The semantic tag is the result obtained in the first step, and the center position and geometric dimension are the results obtained in the second step. The identifier is used to assist in indicating the position information of door and window components. For example, for the door component in the upper-left cell of the first layer, an identifier of 0 means the door is located on the wall represented by the adjacent cell to the right, and an identifier of 1 means the door is located on the wall of the adjacent cell below. The identifier of wall, ceiling and similar components is empty.
The second layer is primarily used to represent ceiling components: multiple ceiling cells can be combined to represent a complex-shaped ceiling, while a simple planar ceiling can be represented by the central ceiling cell alone, with the other ceiling cells left empty.
The door and window cells of the second layer are used to assist the door and window representation of the first layer. When door or window components exist on two adjacent walls, the components can be distributed across the two layout layers; for example, when there are doors on both the left and upper walls, this can be expressed as shown in fig. 3.
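The cell data structure and the two-layer 3x3 grid can be sketched as follows (a minimal illustration; the field names are assumptions):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class LayoutCell:
    """One cell of a 3x3 layout layer: (semantic tag, center, size, identifier).
    The identifier is None for walls and ceilings; for doors and windows, 0
    indicates the wall of the right-hand neighbor cell, 1 the wall of the cell
    below, following the convention illustrated above."""
    semantic_tag: Optional[str] = None
    center: Optional[Tuple[float, float, float]] = None
    size: Optional[Tuple[float, float, float]] = None
    identifier: Optional[int] = None

# A two-layer layout: layer 1 holds floor/walls/doors/windows, layer 2 holds
# ceilings plus overflow door/window cells.
layout = [[[LayoutCell() for _ in range(3)] for _ in range(3)] for _ in range(2)]
```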
Based on any one of the above embodiments, the three-dimensional reconstruction method for indoor scene provided by the present invention further includes:
step 301, constructing a mapping relation between the value of the identifier and the position information of the door and window type components;
Step 302, acquiring the value of the identifier in the cell of a door or window component, and obtaining the auxiliary indication information by matching the identifier value against the mapping between identifier values and door/window position information;
Step 303, acquiring the position information corresponding to the door or window component according to the auxiliary indication information.
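A minimal sketch of steps 301 to 303, assuming the two identifier values illustrated above:

```python
# Hypothetical mapping between identifier values and door/window position
# information (step 301), following the convention in the description.
IDENTIFIER_TO_POSITION = {
    0: "wall represented by the right-hand neighbor cell",
    1: "wall represented by the neighbor cell below",
}

def resolve_door_window_position(identifier: int) -> str:
    """Steps 302-303: match the identifier value against the mapping to obtain
    the auxiliary indication information, i.e. the wall the door/window is on."""
    return IDENTIFIER_TO_POSITION[identifier]
```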
In the embodiment of the invention, the cells of different layers of the scene layout are used for representing the position information of different fixed components or representing the distribution of the same fixed component at different positions in space.
In the embodiment of the invention, in the construction process of the scene layout diagram, the central position of the cell is iteratively updated;
the center position of the current cell is the average value of the center position calculated in the last iteration and the center position calculated in the current iteration.
In the embodiment of the invention, in the construction process of the scene layout diagram, the geometric dimension of the cell is updated iteratively;
The geometric dimension of the current cell is the union of the geometric dimension calculated in the previous iteration and the geometric dimension calculated in the current iteration.
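These two update rules can be written compactly as follows (a sketch; the component-wise maximum is used as the union of the two sizes):

```python
import numpy as np

def update_cell(cell_center, cell_size, new_center, new_size):
    """Iterative cell update during layout construction: the center is the
    average of the stored and newly computed centers; the size is their union,
    taken here as the component-wise maximum."""
    center = (np.asarray(cell_center) + np.asarray(new_center)) / 2.0
    size = np.maximum(cell_size, new_size)
    return center, size
```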
Based on any of the above embodiments, generating a scene layout according to the position information and size information of the fixed components and auxiliary objects includes:
Projecting the points in the optimized point set of the fixed component corresponding to each component category onto the XY plane, and respectively calculating the maximum and minimum coordinate positions of the projected points in the X and Y directions;
Calculating the vertex coordinates of the fixed component in the scene layout according to the maximum and minimum coordinate positions in the X and Y directions;
acquiring a coordinate range of a cell corresponding to the fixed member according to the vertex coordinates of the fixed member in the scene layout;
Determining a center point of the fixed member according to the coordinate range of the corresponding cell of the fixed member;
If the fixed component is a wall component: calculating the distances from the projection points of the wall component to the center points of the candidate cells, and selecting the cell with the smallest distance as the wall component cell; if there is no data in the wall component cell, filling the semantic label, center position, geometric dimension and identifier of the wall component into the wall component cell; if the wall component cell already has data, averaging the newly computed center position with the stored center position to obtain the new center position, and taking the union of the newly computed geometric dimension and the stored geometric dimension as the new geometric dimension;
If the fixed component is a door or window component: calculating the distances from the projection points of the door or window component to the center points of the candidate cells, and selecting the cell with the smallest distance as the door/window component cell; if there is no data in the door/window component cell, filling the semantic label, center position, geometric dimension and identifier of the door or window component into it; if the door/window component cell already has data, filling the cell of the higher-priority layer of the scene layout according to a preset priority, taking the average of the newly computed center position and the stored center position as the new center position, taking the union of the newly computed geometric dimension and the stored geometric dimension as the new geometric dimension, and indicating through the identifier the wall on which the door or window component is located.
Specific examples of generating a scene layout according to fig. 3 include:
a) Initializing the layout: projecting the points in the optimized point cloud sets of all components onto the XY plane, and finding the maximum and minimum coordinate positions of the projected points, (Xmin, Xmax) in the X direction and (Ymin, Ymax) in the Y direction; the four vertex coordinates of the layout are then (Xmin, Ymin), (Xmin, Ymax), (Xmax, Ymin) and (Xmax, Ymax). From these four vertex positions, the coordinate range of each sub-cell in the layout can be calculated.
b) Projecting the center points of all wall components onto the XY plane, calculating the distances from each projection point to the four wall sub-cells, selecting the nearest sub-cell, and filling the sub-cell at the corresponding position of layer 1 of the layout with the component's data. If the sub-cell already has data, the cell data are processed as follows:
Center position: the average of the center position of the currently processed component and the center position stored in the cell is taken as the new center position;
Geometric dimension: the geometric dimension (W1, L1, H1) of the currently processed component is combined with the geometric dimension (W2, L2, H2) stored in the cell as the new geometric dimension, i.e. (max(W1, W2), max(L1, L2), max(H1, H2)).
c) The projection calculation for ceiling components is similar to that for wall components; ceiling components are represented by the 5 sub-cells of layer 2.
d) The projection calculation for door and window components is similar to that for wall components. The layer-1 layout diagram is filled preferentially; when the sub-cell at the corresponding position of layer 1 already has data, the sub-cell at the corresponding position of the layer-2 layout diagram is filled instead, and the identifier indicates the wall on which the door or window component is located.
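A simplified sketch of this filling procedure is given below, reusing the LayoutCell class sketched earlier. Assigning a component to the sub-cell containing its projected center is a simplification of the nearest-center selection described above, and the layer-assignment rules are condensed for illustration:

```python
import numpy as np

def fill_layout(layout, components, xy_bounds, n=3):
    """Simplified filling of a two-layer n x n layout (cf. steps a-d above).
    components: iterable of (semantic_tag, center, size, identifier) tuples;
    xy_bounds: ((Xmin, Ymin), (Xmax, Ymax)) from the projected point sets."""
    (Xmin, Ymin), (Xmax, Ymax) = xy_bounds

    def cell_index(p):
        # sub-cell containing the XY projection of point p
        i = min(int((p[0] - Xmin) / (Xmax - Xmin) * n), n - 1)
        j = min(int((p[1] - Ymin) / (Ymax - Ymin) * n), n - 1)
        return i, j

    for tag, center, size, ident in components:
        layer = 1 if tag == "ceiling" else 0
        i, j = cell_index(center)
        cell = layout[layer][i][j]
        if tag in ("door", "window") and cell.semantic_tag is not None:
            cell = layout[1][i][j]                    # overflow to layer 2
        if cell.semantic_tag is None:
            cell.semantic_tag, cell.center = tag, tuple(center)
            cell.size, cell.identifier = tuple(size), ident
        else:                                         # merge with stored data
            cell.center = tuple((np.asarray(cell.center) + np.asarray(center)) / 2.0)
            cell.size = tuple(np.maximum(cell.size, size))
```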
Based on any of the above embodiments, generating a fixed component entity structure according to a component class corresponding to each fixed component in the scene layout includes:
Step 401, calculating the space coordinates of all vertexes of the fixed component according to the central position and the geometric dimension in each fixed component cell in the scene layout;
Step 402, performing triangular mesh dissection by taking the space coordinates of all vertexes as key points to obtain a triangular mesh model of the fixed component;
in the embodiment of the invention, triangulation can be performed by adopting a Delaunay equal subdivision algorithm.
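As one possible realization of steps 401 and 402, the following sketch builds the eight box vertices of a fixed component from its cell data and triangulates them with SciPy; using the convex hull to obtain the surface triangles is an assumption for illustration, and other Delaunay-type subdivisions would serve equally:

```python
import numpy as np
from scipy.spatial import ConvexHull

def mesh_fixed_component(center, size):
    """Step 401: compute the spatial coordinates of all vertices of the fixed
    component from the cell's center position and geometric dimension.
    Step 402: triangulate these vertices (here via the convex hull, which
    yields a triangular surface mesh of the box)."""
    c, half = np.asarray(center), np.asarray(size) / 2.0
    corners = np.array([[sx, sy, sz] for sx in (-1, 1)
                                      for sy in (-1, 1)
                                      for sz in (-1, 1)])
    vertices = c + corners * half      # the 8 vertices of the component box
    hull = ConvexHull(vertices)
    return vertices, hull.simplices    # triangular faces
```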
Step 403, inputting a component category corresponding to the fixed component into a texture generation neural network model, and obtaining a texture map of the fixed component, wherein the texture generation neural network model is obtained by training based on the component category and the corresponding texture map;
in an embodiment of the invention, the training loss function L of the texture generation neural network model is lost by an image And/>Semantic category loss is two components, image loss is the generation of texture map/>And reference truth map/>An average of pixel-by-pixel value errors between;
semantic category loss is the coding feature of the input text Semantically encoded features associated with generating texture mapsThe error in which the semantically encoded features of the texture map are generated can be derived from a common image encoding network model. As shown in the following formula:
wherein n is the number of images, m is the number of semantic categories, For image identification,/>Identifying for the semantic category.
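As a concrete illustration of this two-part loss, the following is a minimal PyTorch sketch; the choice of the L1 norm and the tensor shapes are assumptions, since the patent does not fix the norm:

```python
import torch

def texture_loss(pred_tex: torch.Tensor, gt_tex: torch.Tensor,
                 text_feat: torch.Tensor, tex_feat: torch.Tensor) -> torch.Tensor:
    """Image loss (pixel-wise error between generated and ground-truth texture
    maps) plus semantic category loss (error between the text encoding and the
    encoding of the generated map)."""
    image_loss = torch.mean(torch.abs(pred_tex - gt_tex))
    semantic_loss = torch.mean(torch.abs(text_feat - tex_feat))
    return image_loss + semantic_loss
```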
In an embodiment of the invention, a texture generation neural network model comprises a coding module and a multi-layer perceptron;
the coding module is used for coding the component categories to obtain implicit characteristics of each component category;
the multi-layer perceptron is used to perform perceptual learning on the implicit features of each component category and generate the texture map corresponding to that component category.
In the embodiment of the invention, the texture of the fixed component is generated automatically by the texture generation neural network model, whose structure is shown in fig. 4. The input component category, such as "wall", is encoded into an implicit feature by the encoder, and the texture map of the wall component is then obtained by the texture generation network, a multi-layer perceptron (MLP). Compared with methods based on triangular mesh optimization, generating the texture map with this neural network model makes the geometric surface of the fixed component finer.
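A minimal sketch of such an encoder-plus-MLP texture generator follows; the embedding dimension, layer widths, texture resolution and category count are all illustrative assumptions, not values from the patent:

```python
import torch
import torch.nn as nn

class TextureGenerator(nn.Module):
    """Sketch of the network in fig. 4: a category embedding (the 'coding
    module') followed by an MLP that maps the implicit feature to a flattened
    RGB texture map."""
    def __init__(self, num_categories=7, feat_dim=128, tex_size=64):
        super().__init__()
        self.encoder = nn.Embedding(num_categories, feat_dim)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, tex_size * tex_size * 3), nn.Sigmoid(),
        )
        self.tex_size = tex_size

    def forward(self, category_id: torch.Tensor) -> torch.Tensor:
        feat = self.encoder(category_id)   # implicit feature of the category
        tex = self.mlp(feat)               # perceptual learning / generation
        return tex.view(-1, 3, self.tex_size, self.tex_size)

# texture = TextureGenerator()(torch.tensor([0]))  # e.g. category 0 = "wall"
```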
And 404, generating a solid structure of the fixing member according to the triangular mesh model of the fixing member and the texture map of the fixing member.
Based on any one of the above embodiments, the preset object model asset library is constructed according to the semantic category of the subordinate object, and after obtaining the object model corresponding to the subordinate object, the method further includes:
and replacing the auxiliary objects in the scene layout by using the object models corresponding to the acquired auxiliary objects.
In the embodiment of the invention, the object models in the preset object model asset library are refined models. The object model corresponding to the obtained auxiliary object is used to replace the auxiliary object in the scene layout, so the auxiliary object does not need to be reconstructed, and even small auxiliary objects obtain a finer geometric surface in the reconstructed model.
In the embodiment of the present invention, before replacing the auxiliary object in the scene layout by the object model corresponding to the acquired auxiliary object, the method further includes:
and scaling the object model indexed from the preset object model asset library according to the axis alignment bounding box size of the auxiliary object.
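A minimal sketch of this size adaptation, assuming the asset model is given as a vertex array:

```python
import numpy as np

def scale_to_bounding_box(model_vertices: np.ndarray, target_size) -> np.ndarray:
    """Scale a retrieved asset model so its axis-aligned bounding box matches
    the auxiliary object's bounding box size before replacement in the layout."""
    lo, hi = model_vertices.min(axis=0), model_vertices.max(axis=0)
    scale = np.asarray(target_size) / np.maximum(hi - lo, 1e-12)
    center = (lo + hi) / 2.0
    return (model_vertices - center) * scale + center
```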
As shown in fig. 5, after the sensor collects the point cloud data, the data are transferred to the deployment device. On the deployment device, the point cloud is reconstructed and optimized, the point cloud semantic categories and bounding boxes are identified, and the different semantic categories are processed separately: for fixed components, a scene layout is generated and optimized and the entity structures are finally generated; for auxiliary objects, the object category is further identified and the retrieved object model is resized to obtain an object model of suitable size. The object models and entity structures are then assembled to generate the indoor scene three-dimensional reconstruction model.
According to the indoor scene three-dimensional reconstruction method provided by the embodiment of the invention, the random-sampling point cloud optimization applied to each semantic component improves the geometric precision of every semantic component of the point cloud, addressing the low quality of acquired or computed three-dimensional point clouds. The layout-diagram-based scene organization method divides the objects in the scene into two classes, fixed components and auxiliary objects, and represents the scene's spatial layout with a multi-layer layout diagram, realizing a spatially structured representation of the reconstructed target scene and addressing the large number of objects and complex layout in scene reconstruction. By combining the scene layout diagram with the different object classes, generating fixed components based on mesh subdivision, replacing auxiliary objects from the asset library, and assembling by means of the scene diagram, a structured, high-quality three-dimensional reconstruction of the scene is realized, solving the poor detail quality of models produced by traditional reconstruction methods.
The three-dimensional reconstruction device for indoor scene provided by the invention is described below, and the three-dimensional reconstruction device for indoor scene described below and the three-dimensional reconstruction method for indoor scene described above can be referred to correspondingly.
Fig. 6 is a schematic functional structural diagram of an indoor scene three-dimensional reconstruction device according to an embodiment of the present invention; as shown in fig. 6, the indoor scene three-dimensional reconstruction device according to an embodiment of the present invention includes:
An obtaining module 601, configured to obtain indoor scene point cloud data, and extract a plurality of semantic component units from the point cloud data;
a calculation module 602, configured to calculate position information and size information of each semantic member unit, and generate a scene layout according to the position information and the size information of the semantic member unit;
The generating module 603 is configured to divide each semantic component unit into corresponding scene component semantic categories, where the scene component semantic categories include fixed components and auxiliary objects, generate a fixed component entity structure according to a component category corresponding to each fixed component in the scene layout, and search a preset object model asset library by using an object category corresponding to the auxiliary object as an index, so as to obtain an object model corresponding to the auxiliary object;
and an assembling module 604, configured to assemble the fixed member entity structure and the object model corresponding to the auxiliary object in the scene layout diagram, so as to obtain the three-dimensional model of the indoor scene.
According to the indoor scene three-dimensional reconstruction device provided by the embodiment of the invention, the indoor scene point cloud data are acquired, and a plurality of semantic member units are extracted from the point cloud data; the position information and the size information of each semantic member unit are calculated, and a scene layout diagram is generated according to the position information and the size information of the semantic member units; each semantic member unit is divided into its corresponding scene component semantic category, the scene component semantic categories comprising fixed components and auxiliary objects; a triangular mesh model of each fixed component is generated according to the parameter information of that fixed component in the scene layout diagram, and texture refinement is performed on the triangular mesh model to obtain a fixed component entity structure; the fixed component to which an auxiliary object is attached is acquired according to the scene layout diagram, and a preset object model asset library is searched using, as an index, the object category of the auxiliary object verified by the attached fixed component, so as to obtain the object model corresponding to the auxiliary object; and the fixed component entity structures and the object models corresponding to the auxiliary objects are assembled in the scene layout diagram to obtain the three-dimensional model of the indoor scene. In this way, the scene layout diagram is generated, specific components are refined according to the scene, and the three-dimensional model is generated from those specific components, which reduces the amount of data processed during reconstruction; the fixed components are obtained according to component category and the auxiliary objects are obtained from the asset library, which improves the fineness of the geometric surfaces of the fixed components and auxiliary objects.
In an embodiment of the present invention, the acquisition module 601 is configured to:
And identifying a plurality of semantic component units in the point cloud data through a point cloud network structure, wherein each semantic component unit comprises a structural semantic tag, and the structural semantic tag corresponds to a specific category among the component categories or a specific category among the object categories.
In an embodiment of the present invention, the device further includes an optimization module configured to optimize each semantic component unit after the plurality of semantic component units are extracted from the point cloud data, specifically:
initializing an optimized point set corresponding to each semantic component unit, the optimized point set being used for storing the points contained in the optimized semantic component unit;
randomly selecting three non-collinear points from all points of the semantic component unit, and calculating a semantic component unit plane according to the coordinates of the three non-collinear points;
calculating the distances from the other points in the semantic component unit to the semantic component unit plane, and counting the number of points whose distance is smaller than a preset error threshold; if the number of points whose distance is smaller than the preset error threshold is greater than a preset number record variable, setting the preset number record variable to the current number of such points, and storing the points whose distance is smaller than the preset error threshold into the corresponding optimized point set; and
re-selecting three non-collinear points in the semantic component unit, calculating a semantic component unit plane, and updating the corresponding optimized point set according to the distances from the other points to the semantic component unit plane, until a preset number of optimization iterations is reached; taking the points in the optimized point set as the points of the optimized semantic component unit to obtain the optimized semantic component unit.
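The optimization described above is, in effect, a random-sampling plane fit in the spirit of RANSAC. A minimal sketch follows, assuming each semantic component unit is given as an (N, 3) NumPy array of points; the iteration count and error threshold are illustrative values, not parameters fixed by the invention.

import numpy as np

def optimize_unit_points(points: np.ndarray, n_iters: int = 100, eps: float = 0.01) -> np.ndarray:
    rng = np.random.default_rng(0)
    best_count = 0        # the "preset number record variable"
    best_set = points     # fall back to the raw points if no plane is accepted
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:  # the sampled points were (nearly) collinear; resample
            continue
        normal /= norm
        dist = np.abs((points - p0) @ normal)  # point-to-plane distances
        inliers = dist < eps
        if inliers.sum() > best_count:
            best_count = int(inliers.sum())
            best_set = points[inliers]         # update the optimized point set
    return best_set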
In an embodiment of the invention, when the semantic member units are fixed components, the computing module 602 is configured to:
respectively calculating the maximum coordinate position and the minimum coordinate position of each optimized fixed component in the three-dimensional coordinate axis direction;
and calculating the position coordinates of the central point of the optimized fixing member and the geometric dimension of the optimized fixing member according to the maximum coordinate position and the minimum coordinate position of each coordinate axis direction.
In an embodiment of the invention, when the semantic member units are auxiliary objects, the computing module 602 is further configured to:
calculating the maximum coordinate position and the minimum coordinate position of the axis alignment bounding box of the accessory object in the three-dimensional coordinate axis direction;
and calculating the center point position coordinates of the auxiliary object according to the maximum coordinate position and the minimum coordinate position of the three-dimensional coordinate axis direction, and acquiring the geometric dimension of the auxiliary object according to the dimension of the axis alignment bounding box.
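Both the fixed component case and the auxiliary object case reduce to per-axis minimum and maximum coordinates. A minimal sketch, assuming the unit's points (or the corners of its axis-aligned bounding box) are given as an (N, 3) NumPy array:

import numpy as np

def center_and_size(points: np.ndarray):
    mins = points.min(axis=0)     # minimum coordinate position per axis
    maxs = points.max(axis=0)     # maximum coordinate position per axis
    center = (mins + maxs) / 2.0  # center point position coordinates
    size = maxs - mins            # geometric size (axis-aligned extents)
    return center, size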
In the embodiment of the invention, the scene layout is a layered structure, and the number of layers of the scene layout is determined according to the spatial structure of the indoor scene and the types and the number of semantic member units included in the indoor scene;
Each layer of the scene layout map corresponds to one or more component categories, and each layer of the scene layout map is expressed as a unit grid; the unit grid comprises a plurality of cells, each cell comprises a semantic label, a central position, a geometric dimension and an identifier, and the identifier is used to assist in indicating the position information of the fixed component corresponding to one or more component categories.
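One possible in-memory form of such a layer is sketched below; the class and field names are assumptions made for illustration, not data structures defined by the invention.

from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class LayoutCell:
    semantic_label: Optional[str] = None  # e.g. "wall", "door", "ceiling"
    center: Optional[np.ndarray] = None   # central position of the component
    size: Optional[np.ndarray] = None     # geometric dimension
    identifier: int = 0                   # auxiliary position indicator (e.g. host wall)

class LayoutLayer:
    def __init__(self, rows: int, cols: int):
        # one grid of cells per layer; empty cells keep their default (None) fields
        self.cells = [[LayoutCell() for _ in range(cols)] for _ in range(rows)]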
In an embodiment of the invention, the component categories include at least one of floors, walls, doors, windows, ceilings, beams, and columns.
In an embodiment of the present invention, the device further includes an auxiliary indication module configured to:
constructing a mapping relation between the value of the identifier and the position information of the door and window type components;
Acquiring the value of the identifier in the cell of a door and window type component, and obtaining auxiliary indication information by matching the value of the identifier against the mapping relation between identifier values and the position information of door and window type components;
And acquiring the position information corresponding to the door and window type components according to the auxiliary indication information.
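A minimal sketch of this lookup follows; the concrete identifier values and wall names are illustrative assumptions, not values defined by the invention.

# mapping from identifier values to door/window position information (illustrative)
WALL_ID_TO_POSITION = {1: "north wall", 2: "east wall", 3: "south wall", 4: "west wall"}

def locate_door_window(cell) -> str:
    # match the identifier stored in the door/window cell against the prebuilt mapping
    return WALL_ID_TO_POSITION.get(cell.identifier, "unknown")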
In embodiments of the present invention, ceilings include simple-shape ceilings and complex-shape ceilings;
the simple-shaped ceilings are represented by central ceiling cells, the other ceiling cells being empty;
The complex-shaped ceiling is represented by a combination of a plurality of ceiling cells.
In the embodiment of the invention, the cells of different layers of the scene layout diagram are used for representing the position information of different fixed components or representing the distribution of different positions of the same fixed component in space.
In the embodiment of the invention, in the construction process of the scene layout diagram, the central position of the cell is iteratively updated;
the center position of the current cell is the average value of the center position calculated in the last iteration and the center position calculated in the current iteration.
In the embodiment of the invention, in the construction process of the scene layout diagram, the geometric dimension of the cell is updated iteratively;
The geometric dimension of the current cell is the union of the geometric dimension calculated in the previous iteration and the geometric dimension calculated in the current iteration.
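Both update rules can be sketched together as follows, assuming each cell stores its geometric dimension as the minimum and maximum corners of an axis-aligned box, so that the union of two dimensions is an element-wise min/max; this representation is an assumption made for illustration.

import numpy as np

def update_cell(cell, new_center, new_min, new_max):
    if cell.center is None:  # first observation: fill the cell directly
        cell.center, cell.min_corner, cell.max_corner = new_center, new_min, new_max
        return
    cell.center = (cell.center + new_center) / 2.0          # average of old and new centers
    cell.min_corner = np.minimum(cell.min_corner, new_min)  # union of the two boxes
    cell.max_corner = np.maximum(cell.max_corner, new_max)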
In an embodiment of the present invention, the generating a scene layout according to the position information and the size information of the fixed member and the auxiliary object includes:
Projecting points in the fixed component optimization point set corresponding to each component category to the XY plane, and respectively calculating the maximum coordinate position and the minimum coordinate position of the projected points in the X and Y directions;
Calculating vertex coordinates of the fixed component in the scene layout according to the maximum and minimum coordinate positions in the X and Y directions;
acquiring a coordinate range of a cell corresponding to the fixed member according to the vertex coordinates of the fixed member in the scene layout;
Determining a center point of the fixed member according to the coordinate range of the corresponding cell of the fixed member;
If the fixed component is a wall component, calculating the distances from the other projection points of the wall component to the center points of wall component cells, and selecting the cell corresponding to the center point with the smallest distance as the wall component cell; if no data exists in the wall component cell, filling the semantic label, central position, geometric dimension and identifier of the wall component into the wall component cell; if the wall component cell already has data, taking the average of the new central position and the central position stored in the wall component cell as the new central position, and taking the union of the new geometric dimension and the geometric dimension stored in the wall component cell as the new geometric dimension;
If the fixed component is a door and window type component, calculating the distances from the other projection points of the door and window type component to the center points of door and window component cells, and selecting the cell corresponding to the center point with the smallest distance as the door and window component cell; if no data exists in the door and window component cell, filling the semantic label, central position, geometric dimension and identifier of the door and window type component into the door and window component cell; if the door and window component cell already has data, filling the cell in the layer with the higher priority in the scene layout according to a preset priority, taking the average of the new central position and the central position stored in the door and window component cell as the new central position, taking the union of the new geometric dimension and the geometric dimension stored in the door and window component cell as the new geometric dimension, and indicating the wall where the door and window type component is located through the identifier.
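The projection and cell-selection steps can be sketched as follows, assuming a square layout grid of known cell size and non-negative scene coordinates; the fill-or-merge handling of an occupied cell then follows the center-averaging and dimension-union updates sketched earlier. The grid resolution is an illustrative assumption.

import numpy as np

def project_to_cell(points: np.ndarray, cell_size: float = 0.5):
    xy = points[:, :2]                           # project the optimized points to the XY plane
    mins, maxs = xy.min(axis=0), xy.max(axis=0)  # min/max coordinate positions in X and Y
    center = (mins + maxs) / 2.0                 # center point of the fixed component
    row, col = (center // cell_size).astype(int) # grid cell containing the center
    return (int(row), int(col)), center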
In an embodiment of the present invention, the generating module 603 is configured to:
Calculating the space coordinates of all vertexes of the fixed component according to the central position and the geometric dimension in each fixed component cell in the scene layout;
Performing triangular mesh dissection by taking the space coordinates of all vertexes as key points to obtain a triangular mesh model of the fixed component;
Inputting the component category corresponding to the fixed component into a texture generation neural network model, and obtaining a texture map of the fixed component, wherein the texture generation neural network model is obtained based on the component category and the corresponding texture map training;
And generating a solid structure of the fixing member according to the triangular mesh model of the fixing member and the texture map of the fixing member.
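As an illustration of the meshing step, the sketch below meshes a fixed component as an axis-aligned box built from its cell's central position and geometric dimension, splitting each rectangular face into two triangles. This is a simplified stand-in for the general triangular mesh dissection described above; consistent face winding is not handled.

import itertools
import numpy as np

def box_mesh(center: np.ndarray, size: np.ndarray):
    half = size / 2.0
    # the 8 box vertices: every combination of per-axis min/max coordinates
    corners = np.array(list(itertools.product(*zip(center - half, center + half))))
    faces = []
    for axis in range(3):            # x, y, z
        for side in (0, 1):          # min face, max face
            quad = [i for i in range(8) if (i >> (2 - axis)) & 1 == side]
            a, b, c, d = quad        # 4 corner indices of one rectangular face
            faces += [(a, b, c), (b, d, c)]  # two triangles per face
    return corners, np.array(faces)  # (8, 3) vertices, (12, 3) triangle indices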
In an embodiment of the invention, the texture generation neural network model comprises a coding module and a multi-layer perceptron;
the coding module is used for coding the component categories to obtain implicit characteristics of each component category;
the multi-layer perceptron is used for performing perception learning on the implicit features of each component category, and generating the texture map corresponding to the component category.
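A minimal PyTorch sketch of this architecture follows; the feature width, hidden size, and texture resolution are illustrative assumptions, not values specified by the invention.

import torch
import torch.nn as nn

class TextureNet(nn.Module):
    def __init__(self, n_categories: int, feat_dim: int = 64, tex_hw: int = 32):
        super().__init__()
        self.tex_hw = tex_hw
        self.encoder = nn.Embedding(n_categories, feat_dim)  # implicit feature per category
        self.mlp = nn.Sequential(                            # multi-layer perceptron decoder
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, tex_hw * tex_hw * 3), nn.Sigmoid(),  # RGB values in [0, 1]
        )

    def forward(self, category_id: torch.Tensor) -> torch.Tensor:
        feat = self.encoder(category_id)                  # (B, feat_dim) implicit features
        tex = self.mlp(feat)                              # (B, H * W * 3)
        return tex.view(-1, self.tex_hw, self.tex_hw, 3)  # (B, H, W, 3) texture map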
In an embodiment of the present invention, the generating module 603 is further configured to:
and replacing the auxiliary objects in the scene layout by using the object models corresponding to the acquired auxiliary objects.
In the embodiment of the present invention, before replacing the auxiliary object in the scene layout by the object model corresponding to the acquired auxiliary object, the method further includes:
and scaling the object model indexed from the preset object model asset library according to the axis alignment bounding box size of the auxiliary object.
According to the indoor scene three-dimensional reconstruction device provided by the embodiment of the invention, the point cloud optimization method based on semantic members and random sampling improves the geometric accuracy of each semantic member of the point cloud, addressing the low quality of acquired or computed three-dimensional point clouds. The layout-diagram-based scene organization method divides the objects in the scene into two types, fixed components and accessory objects, and represents the scene spatial layout with a multi-layer layout diagram, realizing a spatially structured representation of the reconstructed target scene and addressing the large number of objects and complex layout in scene reconstruction. By combining the scene layout diagram with the two object classifications, fixed component objects are generated by mesh dissection, accessory objects are obtained by asset library replacement, and the results are assembled by means of the scene diagram, realizing structured, high-quality three-dimensional reconstruction of the scene and addressing the poor detail quality of models produced by conventional reconstruction methods.
Fig. 7 illustrates a physical schematic diagram of a deployment device. As shown in fig. 7, the deployment device may include: a processor 710, a communication interface (Communications Interface) 720, a memory 730, and a communication bus 740, wherein the processor 710, the communication interface 720, and the memory 730 communicate with each other via the communication bus 740. The processor 710 may call logic instructions in the memory 730 to perform an indoor scene three-dimensional reconstruction method, the method comprising: acquiring indoor scene point cloud data, and extracting a plurality of semantic member units from the point cloud data; calculating the position information and the size information of each semantic member unit, and generating a scene layout diagram according to the position information and the size information of the semantic member units; dividing each semantic member unit into its corresponding scene component semantic category, the scene component semantic categories comprising fixed components and auxiliary objects, generating a triangular mesh model of each fixed component according to the parameter information of that fixed component in the scene layout diagram, performing texture refinement on the triangular mesh model to obtain a fixed component entity structure, acquiring the fixed component to which an auxiliary object is attached according to the scene layout diagram, and searching a preset object model asset library using, as an index, the object category of the auxiliary object verified by the attached fixed component, so as to obtain the object model corresponding to the auxiliary object; and assembling the fixed component entity structures and the object models corresponding to the auxiliary objects in the scene layout diagram to obtain the three-dimensional model of the indoor scene. In the embodiment of the invention, the deployment device may be a cloud computing device, an edge computing device, or a terminal device.
In another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the indoor scene three-dimensional reconstruction method provided by the above methods, the method comprising: acquiring indoor scene point cloud data, and extracting a plurality of semantic member units from the point cloud data; calculating the position information and the size information of each semantic member unit, and generating a scene layout diagram according to the position information and the size information of the semantic member units; dividing each semantic member unit into its corresponding scene component semantic category, the scene component semantic categories comprising fixed components and auxiliary objects, generating a triangular mesh model of each fixed component according to the parameter information of that fixed component in the scene layout diagram, performing texture refinement on the triangular mesh model to obtain a fixed component entity structure, acquiring the fixed component to which an auxiliary object is attached according to the scene layout diagram, and searching a preset object model asset library using, as an index, the object category of the auxiliary object verified by the attached fixed component, so as to obtain the object model corresponding to the auxiliary object; and assembling the fixed component entity structures and the object models corresponding to the auxiliary objects in the scene layout diagram to obtain the three-dimensional model of the indoor scene.
The apparatus embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (19)

1. The three-dimensional reconstruction method of an indoor scene is characterized by comprising the following steps:
Acquiring indoor scene point cloud data through a sensor, and extracting a plurality of semantic member units from the point cloud data;
the deployment equipment calculates the position information and the size information of each semantic component unit, and generates a scene layout chart according to the position information and the size information of the semantic component units;
Each layer of the scene layout map corresponds to one or more component categories, and is expressed as a unit grid, wherein the unit grid comprises a plurality of unit cells, and each unit cell comprises a semantic label, a central position, a geometric dimension and an identifier;
Generating a scene layout according to the position information and the size information of the fixed member and the accessory object, including:
Projecting points in the fixed component optimization point set corresponding to each component category to the XY plane, and respectively calculating the maximum coordinate position and the minimum coordinate position of the projected points in the X and Y directions;
Calculating vertex coordinates of the fixed component in the scene layout according to the maximum and minimum coordinate positions in the X and Y directions;
acquiring a coordinate range of a cell corresponding to the fixed member according to the vertex coordinates of the fixed member in the scene layout;
Determining a center point of the fixed member according to the coordinate range of the corresponding cell of the fixed member;
Dividing each semantic component unit into corresponding scene component semantic categories, wherein each scene component semantic category comprises a fixed component and an auxiliary object, generating a triangular mesh model of the fixed component according to parameter information of each fixed component in the scene layout, performing texture refinement on the triangular mesh model to obtain a fixed component entity structure, acquiring the fixed component to which the auxiliary object is attached according to the scene layout, and searching a preset object model asset library using, as an index, the object category of the auxiliary object verified by the attached fixed component, so as to obtain an object model corresponding to the auxiliary object;
Generating a grid model of the fixed component according to the parameter information of each fixed component in the scene layout, and performing texture refinement on the grid model to obtain a solid structure of the fixed component, wherein the method comprises the following steps: calculating the space coordinates of all vertexes of the fixed component according to the central position and the geometric dimension in each fixed component cell in the scene layout; performing triangular mesh dissection by taking the space coordinates of all vertexes as key points to obtain a triangular mesh model of the fixed component; inputting the component category corresponding to the fixed component into a texture generation neural network model, and obtaining a texture map of the fixed component, wherein the texture generation neural network model is obtained based on the component category and the corresponding texture map training; performing texture refinement on the triangular mesh model of the fixing member by using the texture map of the fixing member to generate a solid structure of the fixing member;
And assembling the solid structure of the fixed component and the object model corresponding to the auxiliary object in the scene layout diagram to obtain the three-dimensional model of the indoor scene.
2. The method of claim 1, wherein extracting a plurality of semantic member units from the point cloud data comprises:
And identifying a plurality of semantic component units in the point cloud data through a point cloud network structure, wherein each semantic component unit comprises a structural semantic tag, and the structural semantic tag corresponds to a specific category among the component categories or a specific category among the object categories.
3. The method for three-dimensional reconstruction of an indoor scene according to claim 1 or 2, wherein after extracting a plurality of semantic member units from the point cloud data, the method further comprises:
optimizing each semantic component unit specifically comprises the following steps:
Initializing an optimized point set corresponding to each semantic component unit, wherein the optimized point set is used for storing the points contained in the optimized semantic component units;
randomly selecting three non-collinear points from all points of the semantic component unit, and calculating a semantic component unit plane according to coordinates of the three non-collinear points;
calculating the distances from other points in the semantic member unit to the plane of the semantic member unit respectively, counting the number of points with the distance smaller than a preset error threshold, if the number of points with the distance smaller than the preset error threshold is greater than a preset number record variable, setting the preset number record variable as the number of points with the current distance smaller than the preset error threshold, and storing the points with the distance smaller than the preset error threshold into a corresponding optimized point set;
and re-selecting three non-collinear points in the semantic component units, calculating a semantic component unit plane, updating a corresponding optimized point set according to the distance from other points to the semantic component unit plane until the preset optimization times are reached, and taking the points in the optimized set as the points of the optimized semantic component units to obtain the optimized semantic component units.
4. The indoor scene three-dimensional reconstruction method according to claim 3, wherein when the semantic member units are fixed members, the calculating the position information and the size information of each semantic member unit comprises:
respectively calculating the maximum coordinate position and the minimum coordinate position of each optimized fixed component in the three-dimensional coordinate axis direction;
and calculating the position coordinates of the central point of the optimized fixing member and the geometric dimension of the optimized fixing member according to the maximum coordinate position and the minimum coordinate position of each coordinate axis direction.
5. The indoor scene three-dimensional reconstruction method according to claim 3, wherein when the semantic member units are subordinate objects, the calculating the position information and the size information of each semantic member unit includes:
calculating the maximum coordinate position and the minimum coordinate position of the axis alignment bounding box of the accessory object in the three-dimensional coordinate axis direction;
and calculating the center point position coordinates of the auxiliary object according to the maximum coordinate position and the minimum coordinate position of the three-dimensional coordinate axis direction, and acquiring the geometric dimension of the auxiliary object according to the dimension of the axis alignment bounding box.
6. The three-dimensional reconstruction method of an indoor scene according to claim 1, wherein the scene layout is a layered structure, and the number of layers of the scene layout is determined according to the spatial structure of the indoor scene and the kind and number of semantic member units included in the indoor scene;
the identifier is used to assist in indicating the position information of the fixed component corresponding to one or more of the component categories.
7. The method of three-dimensional reconstruction of an indoor scene according to claim 6, wherein the component categories comprise at least one of floors, walls, doors, windows, ceilings, beams, and columns.
8. The method for three-dimensional reconstruction of an indoor scene according to claim 7, further comprising:
constructing a mapping relation between the value of the identifier and the position information of the door and window type components;
Acquiring the value of the identifier in the cell of a door and window type component, and obtaining auxiliary indication information by matching the value of the identifier against the mapping relation between identifier values and the position information of door and window type components;
And acquiring the position information corresponding to the door and window type components according to the auxiliary indication information.
9. The method of three-dimensional reconstruction of an indoor scene according to claim 7, wherein the ceilings comprise simple-shaped ceilings and complex-shaped ceilings;
the simple-shaped ceilings are represented by central ceiling cells, the other ceiling cells being empty;
The complex-shaped ceiling is represented by a combination of a plurality of ceiling cells.
10. The method for three-dimensional reconstruction of an indoor scene as claimed in claim 6, wherein the cells of different layers of the scene layout are used for representing the position information of different fixed components or for representing the distribution of the same fixed component at different positions in space.
11. The method according to claim 6, wherein in the scene layout construction process, the central positions of the cells are iteratively updated;
the center position of the current cell is the average value of the center position calculated in the last iteration and the center position calculated in the current iteration.
12. The method for three-dimensional reconstruction of an indoor scene as claimed in claim 6, wherein, in the scene layout diagram construction process, the geometric dimension of the cell is updated iteratively; the geometric dimension of the current cell is the union of the geometric dimension calculated in the previous iteration and the geometric dimension calculated in the current iteration.
13. The method of claim 7, wherein generating a scene layout map from the position information and the size information of the fixed component and the auxiliary object further comprises:
If the fixed component is a wall component, calculating the distances from the other projection points of the wall component to the center points of wall component cells, and selecting the cell corresponding to the center point with the smallest distance as the wall component cell; if no data exists in the wall component cell, filling the semantic label, central position, geometric dimension and identifier of the wall component into the wall component cell; if the wall component cell already has data, taking the average of the new central position and the central position stored in the wall component cell as the new central position, and taking the union of the new geometric dimension and the geometric dimension stored in the wall component cell as the new geometric dimension;
If the fixed component is a door and window type component, calculating the distances from the other projection points of the door and window type component to the center points of door and window component cells, and selecting the cell corresponding to the center point with the smallest distance as the door and window component cell; if no data exists in the door and window component cell, filling the semantic label, central position, geometric dimension and identifier of the door and window type component into the door and window component cell; if the door and window component cell already has data, filling the cell in the layer with the higher priority in the scene layout according to a preset priority, taking the average of the new central position and the central position stored in the door and window component cell as the new central position, taking the union of the new geometric dimension and the geometric dimension stored in the door and window component cell as the new geometric dimension, and indicating the wall where the door and window type component is located through the identifier.
14. The method of claim 1, wherein the texture generating neural network model comprises an encoding module and a multi-layer perceptron;
the coding module is used for coding the component categories to obtain implicit characteristics of each component category;
the multi-layer perceptron is used for performing perception learning on the implicit features of each component category, and generating the texture map corresponding to the component category.
15. The method for three-dimensional reconstruction of an indoor scene according to claim 1, wherein the preset object model asset library is constructed according to semantic categories of auxiliary objects, and the method further comprises, after obtaining the object model corresponding to the auxiliary object:
and replacing the auxiliary objects in the scene layout by using the object models corresponding to the acquired auxiliary objects.
16. The method of claim 15, further comprising, before replacing the auxiliary object in the scene layout with the object model corresponding to the acquired auxiliary object:
and scaling the object model indexed from the preset object model asset library according to the axis alignment bounding box size of the auxiliary object.
17. An indoor scene three-dimensional reconstruction device, characterized by comprising:
The acquisition module is used for acquiring indoor scene point cloud data through a sensor and extracting a plurality of semantic member units from the point cloud data;
The computing module is used for computing the position information and the size information of each semantic component unit by the deployment equipment and generating a scene layout chart according to the position information and the size information of the semantic component units; each layer of the scene layout map corresponds to one or more component categories, and is expressed as a unit grid, wherein the unit grid comprises a plurality of unit cells, and each unit cell comprises a semantic label, a central position, a geometric dimension and an identifier; generating a scene layout according to the position information and the size information of the fixed component and the accessory object, including: projecting points in the fixed component optimization point set corresponding to each component category to the XY plane, and respectively calculating the maximum coordinate position and the minimum coordinate position of the projected points in the X and Y directions; calculating vertex coordinates of the fixed component in the scene layout according to the maximum and minimum coordinate positions in the X and Y directions; acquiring a coordinate range of a cell corresponding to the fixed component according to the vertex coordinates of the fixed component in the scene layout; determining a center point of the fixed component according to the coordinate range of the corresponding cell of the fixed component; the generation module is used for dividing each semantic component unit into corresponding scene component semantic categories, wherein each scene component semantic category comprises a fixed component and an auxiliary object, generating a triangular mesh model of the fixed component according to parameter information of each fixed component in the scene layout diagram, performing texture refinement on the triangular mesh model to obtain a fixed component entity structure, acquiring the fixed component to which the auxiliary object is attached according to the scene layout diagram, and searching a preset object model asset library using, as an index, the object category of the auxiliary object verified by the attached fixed component, so as to obtain an object model corresponding to the auxiliary object;
generating a grid model of the fixed component according to the parameter information of each fixed component in the scene layout, and performing texture refinement on the grid model to obtain a solid structure of the fixed component, wherein the method comprises the following steps: calculating the space coordinates of all vertexes of the fixed component according to the central position and the geometric dimension in each fixed component cell in the scene layout; performing triangular mesh dissection by taking the space coordinates of all vertexes as key points to obtain a triangular mesh model of the fixed component; inputting the component category corresponding to the fixed component into a texture generation neural network model, and obtaining a texture map of the fixed component, wherein the texture generation neural network model is obtained based on the component category and the corresponding texture map training; performing texture refinement on the triangular mesh model of the fixing member by using the texture map of the fixing member to generate a solid structure of the fixing member; and the assembling module is used for assembling the solid structure of the fixed component and the object model corresponding to the auxiliary object in the scene layout diagram to obtain the three-dimensional model of the indoor scene.
18. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the indoor scene three-dimensional reconstruction method according to any one of claims 1 to 16.
19. A non-transitory readable storage medium having stored thereon a computer program, which when executed by a processor implements the indoor scene three-dimensional reconstruction method according to any one of claims 1 to 16.
CN202410218859.3A 2024-02-28 2024-02-28 Indoor scene three-dimensional reconstruction method and device, electronic equipment and storage medium Active CN117808987B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410218859.3A CN117808987B (en) 2024-02-28 2024-02-28 Indoor scene three-dimensional reconstruction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410218859.3A CN117808987B (en) 2024-02-28 2024-02-28 Indoor scene three-dimensional reconstruction method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117808987A (en) 2024-04-02
CN117808987B (en) 2024-05-14

Family

ID=90426981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410218859.3A Active CN117808987B (en) 2024-02-28 2024-02-28 Indoor scene three-dimensional reconstruction method and device, electronic equipment and storage medium


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115512040A (en) * 2022-08-26 2022-12-23 中国人民解放军军事科学院国防工程研究院 Digital twinning-oriented three-dimensional indoor scene rapid high-precision reconstruction method and system
WO2023241097A1 (en) * 2022-06-16 2023-12-21 山东海量信息技术研究院 Semantic instance reconstruction method and apparatus, device, and medium
CN117315146A (en) * 2023-09-22 2023-12-29 武汉大学 Reconstruction method and storage method of three-dimensional model based on trans-scale multi-source data

Also Published As

Publication number Publication date
CN117808987A (en) 2024-04-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant