CN117994311A - Method and device for generating a room floor plan - Google Patents

Method and device for generating a room floor plan

Info

Publication number
CN117994311A
CN117994311A
Authority
CN
China
Prior art keywords
room
information
architecture model
cloud
ith
Prior art date
Legal status
Pending
Application number
CN202211350967.3A
Other languages
Chinese (zh)
Inventor
王子琪
林曼青
陈普
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202211350967.3A
Publication of CN117994311A

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The present application provides a method and a device for generating a room floor plan. The method includes: acquiring dense point cloud information of at least one room and an initial architecture model diagram of the at least one room, where the dense point cloud information of a room is used for scale optimization of the initial architecture model diagram of that room, the dense point cloud of a room is a dense set of three-dimensional points characterizing the structure of the room, and the dense point cloud information of a room is obtained from images of the room and inertial information; performing scale optimization on the initial architecture model diagram of each room based on the dense point cloud information of that room, to obtain a first architecture model diagram of each room; obtaining a standard architecture model diagram of each room based on the first architecture model diagram of that room; and generating a floor plan based on the standard architecture model diagrams respectively corresponding to the at least one room. The method can generate a high-quality multi-room floor plan at low cost and in an easily popularized way.

Description

Method and device for generating a room floor plan
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and a device for generating a room floor plan.
Background
With the steady growth of the house-viewing market, floor plans of rooms are now published online so that users can view houses online, which facilitates renting and purchasing. Existing floor plans are mainly drawn manually, which is inefficient and costly. Some schemes can generate floor plans automatically, but the schemes capable of producing high-quality multi-room floor plans require expensive equipment, and schemes that take images captured by low-cost equipment such as mobile phones or panoramic cameras as input cannot meet the requirement of automatically generating high-quality multi-room floor plans. Therefore, how to generate a high-quality multi-room floor plan at low cost and in an easily popularized way is a technical problem to be solved.
Disclosure of Invention
The present application provides a method and a device for generating a room floor plan, which can generate a high-quality multi-room floor plan at low cost and in an easily popularized way.
In a first aspect, the present application provides a method for generating a room floor plan. The method may be executed by a cloud-side generation device, for example a cloud generation module or a cloud server; it may also be executed by a physical floor plan generation device, or by a component (for example, a chip, a chip system, or a circuit) of such a physical device. The method specifically includes: acquiring dense point cloud information of at least one room and an initial architecture model diagram of the at least one room, where the dense point cloud information of a room is used for scale optimization of the initial architecture model diagram of that room, the dense point cloud of a room is a dense set of three-dimensional points characterizing the structure of the room, and the dense point cloud information of a room is obtained from images of the room and inertial information; performing scale optimization on the initial architecture model diagram of each room based on the dense point cloud information of that room, to obtain a first architecture model diagram of each room; obtaining a standard architecture model diagram of each room based on the first architecture model diagram of that room; and generating a floor plan based on the standard architecture model diagrams respectively corresponding to the at least one room.
The architecture model diagram of a room may also be referred to as the architecture model of the room or the structure model diagram of the room; through the architecture model diagram, a user can intuitively see the layout, size, and so on of the room.
In the solution of the present application, for a user's house (which may include at least one room), the floor plan generation device (which may also be a cloud-side generation device) first acquires dense point cloud information of at least one room and an initial architecture model diagram of the at least one room, where the dense point cloud information of a room corresponds to the scale optimization of the initial architecture model diagram of that room, the dense point cloud of a room is a dense set of three-dimensional points characterizing the structure of the room, and the dense point cloud information of a room is obtained from images of the room and inertial information. The floor plan generation device performs scale optimization on the initial architecture model diagram of the corresponding room based on the dense point cloud information of each room, so that the scale of each room structure becomes more accurate. Further, the floor plan generation device obtains a standard architecture model diagram of each room based on the scale-optimized architecture model diagram (that is, the first architecture model diagram) of that room, and finally generates a floor plan based on the standard architecture model diagrams respectively corresponding to the at least one room. In the solution of the present application, the dense point cloud information of the at least one room does not need to be highly accurate: it can be obtained with simple devices having a shooting function (such as a mobile phone or a camera), and the floor plan generation device can generate an accurate floor plan from this relatively low-accuracy dense point cloud information at low cost. The generation method of the floor plan provided by the present application therefore has high usability and is easy to popularize.
In one embodiment, the acquiring dense point cloud information of at least one room includes acquiring dense point cloud information of an i-th room of the at least one room, which includes: acquiring pose information and sparse point cloud information of RGB images of the i-th room and slice images of a panoramic image of the i-th room, where the sparse point cloud is a sparse set of three-dimensional points characterizing the structure of the i-th room; registering the slice images of the panoramic image based on the pose information and the sparse point cloud information of the RGB images of the i-th room, to obtain pose information and sparse point cloud information of the slice images of the panoramic image; performing stereo matching on the pose information and the sparse point cloud of the panoramic slice images, to obtain at least one depth map of the i-th room; and filtering and fusing the at least one depth map of the i-th room, to obtain the dense point cloud information of the i-th room, where each depth map of the i-th room corresponds to one image frame of the i-th room. Here i is any positive integer from 1 to N, and N is the number of rooms.
With this embodiment, the floor plan generation device (which may also be a cloud-side generation device) can effectively obtain the dense point cloud information of each room based on the sparse point cloud information and pose information of each room and the slice images of the panoramic image of each room.
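As an illustration of the filtering and fusion step only, the following Python sketch back-projects per-frame depth maps into a common world frame and keeps one point per voxel. It is a minimal sketch under stated assumptions, not the patent's algorithm: the intrinsic matrix K, the world-to-camera poses (R, t), and the voxel size are assumptions introduced for the example.

```python
import numpy as np

def depth_to_points(depth, K, R, t):
    """Back-project a depth map (H x W, metres) into world coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    valid = z > 0                                              # simple filtering: drop empty pixels
    uv1 = np.stack([u.reshape(-1), v.reshape(-1), np.ones(h * w)], axis=0)
    pts_cam = np.linalg.inv(K) @ (uv1[:, valid] * z[valid])    # camera frame
    pts_world = R.T @ (pts_cam - t.reshape(3, 1))              # world frame (pose is world-to-camera)
    return pts_world.T                                         # (N, 3)

def fuse_depth_maps(depth_maps, K, poses, voxel=0.05):
    """Concatenate per-frame points and keep one point per coarse voxel."""
    pts = np.vstack([depth_to_points(d, K, R, t) for d, (R, t) in zip(depth_maps, poses)])
    _, idx = np.unique(np.floor(pts / voxel).astype(np.int64), axis=0, return_index=True)
    return pts[idx]
```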
In one embodiment, the acquiring pose information and sparse point cloud information of the RGB images of the i-th room includes: acquiring the RGB images of the i-th room and inertial information recorded when the room is photographed; estimating initial pose information and initial point cloud information of the i-th room based on the RGB images of the i-th room and the inertial information, where the initial point cloud is a two-dimensional point set characterizing the structure of the i-th room; and optimizing the initial pose information of the i-th room to obtain the pose information of the RGB images of the i-th room, and quantizing the initial point cloud information of the i-th room to obtain the sparse point cloud information of the i-th room.
With this embodiment, the floor plan generation device (which may also be a cloud-side generation device) can effectively estimate the initial pose information and initial point cloud information of each room based on the RGB images of the room and the inertial information recorded during shooting, where the initial point cloud is a two-dimensional point set characterizing the structure of the room; the initial pose information of each room is then optimized, and the initial point cloud information of each room is quantized, so that relatively accurate pose information of the RGB images and sparse point cloud information of each room can be obtained effectively.
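For readers unfamiliar with visual pose estimation, the following Python sketch recovers the relative pose between two RGB frames from feature matches with OpenCV. It only illustrates the general idea of estimating pose from images: the patent's method additionally uses inertial information and a subsequent optimization step, which are not shown here, and the image file names and intrinsic matrix K are assumptions.

```python
import cv2
import numpy as np

img1 = cv2.imread("frame_000.jpg", cv2.IMREAD_GRAYSCALE)        # hypothetical frames
img2 = cv2.imread("frame_001.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[800.0, 0, 640], [0, 800.0, 360], [0, 0, 1]])      # assumed intrinsics

orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Essential matrix with RANSAC, then relative rotation R and translation direction t.
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
```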
In one embodiment, obtaining an initial architecture model diagram of at least one room includes obtaining an initial architecture model diagram of an i-th room of the at least one room, which includes: performing door position detection on the panoramic image of the i-th room to determine door position information of the i-th room; performing structure detection on the panoramic image of the i-th room to determine structure information of the i-th room; and obtaining the initial architecture model diagram of the i-th room based on the door position information and the structure information of the i-th room. Here i is any positive integer from 1 to N, and N is the number of rooms.
With this embodiment, the floor plan generation device (which may also be a cloud-side generation device) can effectively detect the door position information and structure information of each room based on its panoramic image, and can thus effectively obtain the initial architecture model diagram of each room; the initial architecture model diagram of each room is then optimized and adjusted in subsequent steps to obtain an accurate architecture model diagram of each room.
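The description of the drawings later mentions Mask R-CNN for door detection. As a hedged illustration of the door detection step only, the following Python sketch runs torchvision's Mask R-CNN on a panorama and keeps high-confidence detections. The COCO-pretrained weights loaded here do not include a "door" class, so a model fine-tuned on door annotations is assumed, and the file name and confidence threshold are placeholders.

```python
import torch
import torchvision

# Assumed: weights fine-tuned for door detection; "DEFAULT" loads COCO weights only.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

panorama = torchvision.io.read_image("room_panorama.jpg").float() / 255.0  # (C, H, W) in [0, 1]
with torch.no_grad():
    pred = model([panorama])[0]

keep = pred["scores"] > 0.7          # confidence threshold (assumption)
door_boxes = pred["boxes"][keep]     # candidate door bounding boxes
door_masks = pred["masks"][keep]     # corresponding instance masks
```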
In one embodiment, the method further includes: acquiring RGB images of the i-th room; and generating the panoramic image of the i-th room from the RGB images of the i-th room by using a feature-based panorama stitching method.
With this embodiment, if the user's electronic device (that is, an imaging device such as a mobile phone or a camera) only supports capturing RGB images of a room and does not support capturing a panoramic image of the room, the panoramic image of the room can still be obtained effectively from the RGB images captured by the electronic device.
This embodiment may be skipped if the user's electronic device (that is, an imaging device such as a mobile phone or a camera) supports capturing panoramic images of the room.
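A hedged sketch of generating a panorama from ordinary RGB photos: OpenCV's high-level Stitcher performs feature-based stitching, which illustrates the idea, although the patent does not state that this particular pipeline is the one used; the image paths are placeholders.

```python
import cv2

images = [cv2.imread(p) for p in ["room_01.jpg", "room_02.jpg", "room_03.jpg"]]
stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
status, panorama = stitcher.stitch(images)
if status == cv2.Stitcher_OK:
    cv2.imwrite("room_panorama.jpg", panorama)
else:
    print("stitching failed with status", status)
```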
In one embodiment, the performing scale optimization on the initial architecture model diagram of the corresponding room based on the dense point cloud information of each room to obtain a first architecture model diagram of each room includes: performing scale optimization on the initial architecture model diagram of each room based on the dense point cloud information of that room and a pre-constructed multi-sensor scale optimization model of that room, to obtain the first architecture model diagram of each room.
With this embodiment, the floor plan generation device (which may also be a cloud-side generation device) can effectively improve the scale of the initial architecture model of each room by using the dense point cloud (that is, the 3D point cloud) of the room and the pre-constructed multi-sensor scale optimization model of the room, so that an architecture model diagram of the room with a relatively accurate scale (that is, the first architecture model diagram) can be obtained.
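The pre-constructed multi-sensor scale optimization model is not detailed in this summary. The sketch below only illustrates the underlying idea of anchoring the model scale to metric measurements taken from the dense point cloud: it rescales an initial room model so that its height matches the floor-to-ceiling distance observed in the cloud. The percentile-based height estimate and the assumption that the z axis is vertical are illustrative choices, not the patent's method.

```python
import numpy as np

def rescale_room_model(model_corners, model_height, dense_points):
    """model_corners: (K, 2) wall corners of the initial model (arbitrary scale).
    model_height: model floor-to-ceiling height in the same arbitrary scale.
    dense_points: (N, 3) dense point cloud in metres, z axis assumed vertical."""
    z = dense_points[:, 2]
    observed_height = np.percentile(z, 99.0) - np.percentile(z, 1.0)  # robust vertical extent
    scale = observed_height / model_height
    return model_corners * scale, model_height * scale
```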
In one embodiment, the obtaining a standard architecture model diagram of each room based on the first architecture model diagram of each room includes obtaining a standard architecture model diagram of the i-th room based on the first architecture model diagram of the i-th room, which includes: acquiring multiple frames of the first architecture model diagram of the i-th room; and adjusting and fusing the multiple frames of the first architecture model diagram to obtain the standard architecture model diagram of the i-th room. Here i is any positive integer from 1 to N, and N is the number of rooms.
The multiple frames of panoramic images of the i-th room are panoramic images captured by the user's electronic device at different positions in the i-th room, and the floor plan generation device (which may also be a cloud-side generation device) can obtain the corresponding first architecture model diagram from the panoramic image captured at each position of the i-th room in the manner of generating the first architecture model diagram described in the steps above, which is not repeated here.
With this embodiment, the floor plan generation device (which may also be a cloud-side generation device) obtains multiple frames of the scale-optimized architecture model diagram (that is, the first architecture model diagram) of each room, and then adjusts and fuses the multiple frames for each room, so that an accurate architecture model diagram of each room can be obtained effectively.
In addition, as can be seen from the above embodiments, the solution of the present application does not impose high requirements on the point cloud obtained for each room, so the floor plan generation device (which may also be a cloud-side generation device) can obtain the RGB images or the panoramic image of each room with simple photographing equipment (such as a mobile phone or an ordinary camera); the floor plan generation device can then generate relatively low-accuracy point cloud information of each room based on the panoramic image of each room; further, the initial architecture model diagram of each room can be optimized and adjusted through the multi-sensor scale optimization and multi-frame fusion of the solution of the present application, so that an accurate architecture model diagram of each room is obtained, and the problem of an incomplete room structure caused by generating the architecture model of a room from a single panoramic frame is avoided. The solution of the present application therefore improves the accuracy of the generated room architecture model diagram while reducing equipment cost, which makes it more usable and easier to popularize.
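A minimal sketch of the multi-frame fusion idea, under the assumption that the per-frame first architecture model diagrams have already been adjusted into a common coordinate frame and that the same K corners have been associated across frames; the patent's adjustment step itself is not reproduced here, and the median is only one possible fusion choice.

```python
import numpy as np

def fuse_corner_estimates(corner_sets):
    """corner_sets: list of (K, 2) arrays, the same K room corners estimated from
    different panorama frames, already expressed in one coordinate frame."""
    stacked = np.stack(corner_sets, axis=0)   # (F, K, 2)
    return np.median(stacked, axis=0)         # robust per-corner fusion
```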
In one embodiment, generating a floor plan based on the standard architecture model diagrams respectively corresponding to the at least one room includes: calculating the adjacency relationship between the at least one room according to the distances between the rooms, based on the standard architecture model diagrams respectively corresponding to the at least one room; and stitching the standard architecture model diagrams of adjacent rooms in sequence according to the adjacency relationship, to generate the floor plan.
With this embodiment, the floor plan generation device (which may also be a cloud-side generation device) can effectively and accurately obtain the floor plan of the whole house.
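A sketch of computing room adjacency from inter-room distances, assuming each standard architecture model has been reduced to a 2D footprint polygon in a shared coordinate frame; the 0.3 m gap threshold and the use of the shapely library are assumptions for illustration, not part of the patent.

```python
from shapely.geometry import Polygon

def room_adjacency(footprints, gap=0.3):
    """footprints: dict room name -> shapely Polygon in metres, shared frame.
    Two rooms are treated as adjacent when their footprints are within `gap`."""
    names = list(footprints)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if footprints[a].distance(footprints[b]) <= gap]
```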
In one embodiment, the method further includes: performing deduplication on the standard architecture model diagrams respectively corresponding to the at least one room. This avoids the situation in which a room has several standard architecture model diagrams, which could cause image overlap or errors when the floor plan is finally generated, and it also reduces the processing workload.
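The deduplication criterion is not specified in this summary; one simple, hedged possibility is to drop standard architecture models whose footprints almost coincide, measured by the intersection-over-union of the footprint polygons (shapely polygons, as in the previous sketch).

```python
def deduplicate_models(footprints, iou_threshold=0.9):
    """Keep one representative per group of nearly identical footprint polygons."""
    kept = []
    for poly in footprints:
        duplicate = any(
            poly.intersection(k).area / poly.union(k).area > iou_threshold for k in kept
        )
        if not duplicate:
            kept.append(poly)
    return kept
```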
In a second aspect, the present application provides a device for generating a room floor plan. The device may be configured to perform the method of the first aspect; it may be a cloud-side generation device, for example a cloud generation module or a cloud server, or a physical device, or a component (for example, a chip, a chip system, or a circuit) in a physical device, or a device that can be used together with the floor plan generation device.
In a possible implementation, the floor plan generation device may include modules or units in one-to-one correspondence with the methods/operations/steps/actions described in the first aspect, where the modules or units may be hardware circuits, software, or a combination of hardware circuits and software. In a possible implementation, the floor plan generation device may include a processing module and a communication module, where the processing module is configured to invoke the communication module to perform receiving and/or sending functions.
In a possible implementation, the floor plan generation device includes a communication unit and a processing unit, where the processing unit may be configured to invoke the communication unit to perform receiving and/or sending functions. The communication unit is configured to acquire dense point cloud information of at least one room and an initial architecture model diagram of the at least one room, where the dense point cloud information of a room is used for scale optimization of the initial architecture model diagram of that room, the dense point cloud of a room is a dense set of three-dimensional points characterizing the structure of the room, and the dense point cloud information of a room is obtained from images of the room and inertial information. The processing unit is configured to: perform scale optimization on the initial architecture model diagram of the corresponding room based on the dense point cloud information of each room, to obtain a first architecture model diagram of each room; obtain a standard architecture model diagram of each room based on the first architecture model diagram of that room; and generate a floor plan based on the standard architecture model diagrams respectively corresponding to the at least one room.
In a possible implementation, when acquiring the dense point cloud information of at least one room, the communication unit is specifically configured to acquire dense point cloud information of an i-th room of the at least one room, including: acquiring pose information and sparse point cloud information of RGB images of the i-th room and slice images of a panoramic image of the i-th room, where the sparse point cloud is a sparse set of three-dimensional points characterizing the structure of the i-th room; registering the slice images of the panoramic image based on the pose information and sparse point cloud information of the RGB images of the i-th room, to obtain pose information and sparse point cloud information of the slice images of the panoramic image; performing stereo matching on the pose information and the sparse point cloud of the panoramic slice images, to obtain at least one depth map of the i-th room; and filtering and fusing the at least one depth map of the i-th room, to obtain the dense point cloud information of the i-th room, where each depth map of the i-th room corresponds to one image frame of the i-th room; i is any positive integer from 1 to N, and N is the number of rooms.
In a possible implementation, when acquiring the pose information and sparse point cloud information of the RGB images of the i-th room, the communication unit is specifically configured to: acquire the RGB images of the i-th room and inertial information recorded when the room is photographed; estimate initial pose information and initial point cloud information of the i-th room based on the RGB images of the i-th room and the inertial information, where the initial point cloud is a two-dimensional point set characterizing the structure of the i-th room; and optimize the initial pose information of the i-th room to obtain the pose information of the RGB images of the i-th room, and quantize the initial point cloud information of the i-th room to obtain the sparse point cloud information of the i-th room.
In a possible implementation, when acquiring an initial architecture model diagram of at least one room, the communication unit is specifically configured to acquire an initial architecture model diagram of an i-th room of the at least one room, including: performing door position detection on the panoramic image of the i-th room to determine door position information of the i-th room; performing structure detection on the panoramic image of the i-th room to determine structure information of the i-th room; and obtaining the initial architecture model diagram of the i-th room based on the door position information and the structure information of the i-th room; i is any positive integer from 1 to N, and N is the number of rooms.
In a possible implementation, the communication unit is further configured to acquire RGB images of the i-th room, and the processing unit generates the panoramic image of the i-th room from the RGB images of the i-th room by using a feature-based panorama stitching method.
In a possible implementation, when performing scale optimization on the initial architecture model diagram of each room based on the dense point cloud information of each room to obtain a first architecture model diagram of each room, the processing unit is specifically configured to: perform scale optimization on the initial architecture model diagram of each room based on the dense point cloud information of that room and a pre-constructed multi-sensor scale optimization model of that room, to obtain the first architecture model diagram of each room.
In a possible implementation, when obtaining a standard architecture model diagram of each room based on the first architecture model diagram of each room, the processing unit is specifically configured to obtain a standard architecture model diagram of the i-th room based on the first architecture model diagram of the i-th room, including: acquiring multiple frames of the first architecture model diagram of the i-th room; and adjusting and fusing the multiple frames of the first architecture model diagram to obtain the standard architecture model diagram of the i-th room; i is any positive integer from 1 to N, and N is the number of rooms.
In a possible implementation, when generating a floor plan based on the standard architecture model diagrams respectively corresponding to the at least one room, the processing unit is specifically configured to: calculate the adjacency relationship between the at least one room according to the distances between the rooms, based on the standard architecture model diagrams respectively corresponding to the at least one room; and stitch the standard architecture model diagrams of adjacent rooms in sequence according to the adjacency relationship, to generate the floor plan.
In a possible implementation, the processing unit is further configured to perform deduplication on the standard architecture model diagrams respectively corresponding to the at least one room.
In a third aspect, embodiments of the present application provide a computer storage medium having stored therein a software program which, when read and executed by one or more processors, implements a method as provided by any one of the possible embodiments of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method provided by any one of the possible implementations of the first aspect.
In a fifth aspect, embodiments of the present application provide a chip system, where the chip system includes a processor, and the processor is configured to perform the functions referred to in the first aspect.
In one possible design, the chip system may further include a memory for storing necessary program instructions and data. The chip system may consist of a chip, or may include a chip and other discrete devices.
In a sixth aspect, an embodiment of the present application further provides a chip system, where the chip system includes a processor and an interface, the interface is configured to obtain a program or instructions, and the processor is configured to call the program or instructions to implement, or to support the device in implementing, the functions related to the first aspect.
In one possible design, the chip system may further include a memory for storing the program instructions and data necessary for the terminal device. The chip system may consist of a chip, or may include a chip and other discrete devices.
For the technical effects achieved by any one of the possible implementations of the second aspect to the sixth aspect, reference may be made to the description of the technical effects achieved by the corresponding implementations of the first aspect, and details are not repeated here.
Drawings
Fig. 1A is a schematic diagram of an architecture to which the method for generating a room floor plan according to an embodiment of the present application can be applied;
Fig. 1B is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 2A is a flowchart of a method for generating a room floor plan according to an embodiment of the present application;
Fig. 2B is a specific flowchart of generating a room floor plan on the cloud side according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of a method according to an embodiment of the present application;
Fig. 4A is a schematic flowchart of a method according to an embodiment of the present application;
Fig. 4B is a schematic flow diagram of generating the pose and sparse point cloud of RGB images according to the present application;
Fig. 4C is a schematic flow diagram of generating the pose and sparse point cloud of the slice images of a panoramic image according to the present application;
Fig. 5A is a schematic flowchart of a method according to an embodiment of the present application;
Fig. 5B is a schematic diagram of a Mask R-CNN according to the present application;
Fig. 5C is a diagram showing the effect of door position detection based on the deep learning network Mask R-CNN;
Fig. 5D is a schematic flowchart of the deep learning network according to the present application;
Fig. 5E is a diagram showing the effect of corner detection based on the deep learning network (HoHoNet);
Fig. 5F is a schematic diagram of an initial architecture model of a room generated by a generation module according to the present application;
Fig. 6 is a schematic flowchart of a method according to an embodiment of the present application;
Fig. 7A is a schematic flowchart of a method according to an embodiment of the present application;
Fig. 7B is a schematic illustration of the effect of gravity adjustment according to the present application;
Fig. 8 is a schematic flowchart of a method according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a room floor plan generation device according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a room floor plan generation device according to an embodiment of the present application;
Fig. 11 is a schematic diagram of the device structure of a chip according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. The terminology used in the following embodiments is only for the purpose of describing particular embodiments and is not intended to limit the application. As used in the specification of the application and the appended claims, the singular forms "a", "an", and "the" are intended to include expressions such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the embodiments of the present application, "one or more" means one, two, or more than two; "and/or" describes an association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, and B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship.
Reference in this specification to "one embodiment" or "some embodiments" and the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment", "in some embodiments", "in other embodiments", and the like in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising", "including", "having", and variations thereof mean "including but not limited to", unless expressly specified otherwise. The word "exemplary" or "such as" is used to present an example, instance, or illustration, and any embodiment or design described as "exemplary" or "such as" should not be construed as preferred or more advantageous than other embodiments or designs; rather, these words are used to present the relevant concepts in a concrete fashion to facilitate understanding.
In the embodiments of the present application, "a plurality of" means two or more. It should be noted that, in the description of the embodiments of the present application, the terms "first", "second", and the like are only used to distinguish between descriptions and are not to be understood as indicating or implying relative importance or order.
The present application provides a method for generating a room floor plan. To better understand the solutions of the embodiments of the present application, the application of floor plans is first described below.
At present, online house viewing in the housing market is growing steadily, the move of house purchasing and renting online is accelerating, and the rate of VR house viewing is gradually climbing; users who view houses online increasingly prefer more detailed three-dimensional floor plans. Current floor plans are mainly drawn manually, which is inefficient and costly. In addition, the existing schemes for automatically generating floor plans have limitations and cannot meet users' requirements, as illustrated by the following two schemes:
Scheme one: a floor plan reconstruction scheme based on point cloud data. The floor plan generation device acquires a high-quality point cloud of the room (that is, the high-quality point cloud serves as the input); the device then projects the wall point cloud within the room point cloud onto a two-dimensional plane and divides the projected plane into an element point set, an element edge set, and an element face set; next, the device determines the outer contour polygon of the house based on the element point set and the element edge set; further, the device determines the target partition category of each element face located inside the outer contour polygon; finally, the device partitions the interior of the outer contour polygon according to the determined target partition categories to obtain the floor plan of the house.
This scheme can automatically generate a multi-room floor plan at high speed, but it places very high quality requirements on the point cloud of the house, which must be acquired with laser or structured-light equipment; such equipment is expensive and bulky and is therefore difficult to popularize widely.
Scheme two: a modeling scheme based on floor plans. The floor plan generation device captures at least one panoramic picture of each room of the house to be modeled with a panoramic camera and obtains the corresponding camera position and camera placement height; it then vertically rectifies the at least one panoramic picture of each room; next, the device detects each wall corner point of a first plane in the vertically rectified panoramic picture(s) of each room; further, based on the camera placement height of each room, the device projects the detected wall corner points of the first plane into a three-dimensional virtual space and forms a planar floor plan of each room from the projected corner points; finally, the device forms a planar floor plan of the house to be modeled based on the camera position information corresponding to the vertically rectified panoramic picture(s) of each room and the planar floor plan of each room.
This scheme only requires low-cost acquisition equipment such as a mobile phone or a panoramic camera. However, in the floor plan generated by this scheme, corner detection is performed on the panoramic image by a deep learning network and the detected corners are projected back into a spherical coordinate system to obtain the frame model of a single room; a corner detection deviation of a few pixels may therefore cause a large scale deviation when projecting back into the spherical coordinate system. In addition, for a complex layout there is occlusion during shooting, so not all corners of the room can be captured in a single panoramic image, and the generated room frame model is incomplete or structurally wrong.
In summary, among the floor plan generation schemes existing in the industry, automatically generating a high-quality multi-room floor plan requires expensive equipment, and schemes that take images acquired by low-cost equipment such as mobile phones or panoramic cameras as input cannot meet the requirement of automatically generating high-quality multi-room floor plans. Therefore, how to generate a high-quality multi-room floor plan at low cost and in an easily popularized way is a technical problem to be solved.
The present application provides a method for generating a floor plan, including: acquiring dense point cloud information of at least one room and an initial architecture model diagram of the at least one room, where the dense point cloud information of a room is used for scale optimization of the initial architecture model diagram of that room, and the dense point cloud of a room is a dense set of three-dimensional points characterizing the structure of the room; performing scale optimization on the initial architecture model diagram of each room based on the dense point cloud information of that room, to obtain a first architecture model diagram of each room; obtaining a standard architecture model diagram of each room based on the first architecture model diagram of that room; and generating a floor plan based on the standard architecture model diagrams respectively corresponding to the at least one room. The method provided by the present application can generate a high-quality multi-room floor plan at low cost and in an easily popularized way.
The method for generating a room floor plan provided by the embodiments of the present application can be applied to, but is not limited to, virtual reality (VR) and augmented reality (AR) house-viewing scenarios, and mainly concerns end-to-end generation of multi-room floor plans (2D + 3D).
The method provided by the present application may be implemented on the cloud side and executed by a cloud-side device, for example a generation module on the cloud or a cloud server, or it may be executed by a floor plan generation device, which may be a virtual (for example, software) or physical device. Therefore, the embodiments of the present application do not limit the entity executing the method of the present application or the specific form of that entity.
In addition, when the method of the embodiments of the present application is executed, an electronic device may capture RGB (red, green, blue) pictures (images), capture panoramic pictures (images), and/or record videos; based on the captured images, videos, and other information, the electronic device mainly provides room images or image-related information to the floor plan generation device (which may also be a cloud-side generation device) implemented in the present application. The floor plan generation device may be located on the cloud and communicatively connected to the electronic device; or the floor plan generation device may be an independent device connected to and communicating with the electronic device; or the floor plan generation device may be located in the electronic device in the form of a module or software. The embodiments of the present application do not limit the positional relationship or the connection and communication mode between the floor plan generation device and the electronic device.
Fig. 1A shows a possible architecture to which the method of the embodiments of the present application can be applied. As shown in Fig. 1A, the electronic device is independent of the floor plan generation device (which may also be a cloud-side generation device) and may be communicatively connected to it. The electronic device photographs at least one room of a house to obtain images of the at least one room and image information (such as inertial information recorded during shooting), and inputs them into the floor plan generation device; the floor plan generation device processes them to generate a floor plan of the rooms and outputs it to the electronic device, and the user can view the floor plan on the display screen of the electronic device.
In some embodiments of the present application, the electronic device may be a portable electronic device with functions such as a camera, for example a mobile phone, a tablet, a wearable device with a wireless communication function (such as a smart watch), or a vehicle-mounted device. Exemplary embodiments of portable electronic devices include, but are not limited to, devices running various operating systems. The portable electronic device may also be a laptop computer (Laptop) or the like. It should also be understood that in other embodiments of the present application, the electronic device may be a desktop computer. In some embodiments of the present application, the cloud server may be one or more desktop computers or the like.
Fig. 1B is a schematic diagram illustrating an alternative hardware structure of an electronic device 100 according to an embodiment of the present application.
As shown in fig. 1B, the electronic device 100 includes a processor 110, an internal memory 121, an external memory interface 122, an antenna 1, a mobile communication module 131, an antenna 2, a wireless communication module 132, an audio module 140, a speaker 140A, a receiver 140B, a microphone 140C, an earphone interface 140D, a display 151, a user identification module (subscriber identification module, SIM) card interface 152, a camera 153, keys 154, a sensor module 160, a universal serial bus (universal serial bus, USB) interface 170, a charge management module 180, a power management module 181, and a battery 182. In other embodiments, the electronic device may also include a motor, an indicator, and the like.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. The different processing units may be separate devices or may be integrated in one or more processors.
In some embodiments, a memory may also be provided in the processor 110 for storing instructions and data.
The internal memory 121 may be used to store one or more computer programs, including instructions. By executing the instructions stored in the internal memory 121, the processor 110 may process the room images to be provided to the floor plan generation device, generate image sequences, and so on.
The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, and may also store one or more applications (such as Gallery or Contacts), and so on. The data storage area may store data (such as images or contacts) created during use of the electronic device 100, and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic disk storage devices or flash memory devices. In some embodiments, the processor 110 may process the room images to be provided to the floor plan generation device, generate image sequences, and so on, by executing the instructions stored in the internal memory 121 and/or the instructions stored in a memory provided in the processor 110.
The external memory interface 122 may be used to connect an external memory card (e.g., a Micro SD card) to enable expansion of the memory capabilities of the electronic device. The external memory card communicates with the processor 110 via an external memory interface 122 to implement data storage functions. For example, files such as images, videos, etc. are stored in an external memory card.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device may be used to cover a single or multiple communication bands, and different antennas may also be multiplexed.
The mobile communication module 131 may provide a solution for wireless communication including 2G/3G/4G/5G or the like applied on an electronic device. The mobile communication module 131 may include filters, switches, power amplifiers, low noise amplifiers (low noise amplifier, LNAs), etc.
The wireless communication module 132 may provide wireless communication solutions applied to the electronic device, including WLAN (for example, a Wi-Fi network), Bluetooth (BT), the global navigation satellite system (GNSS), frequency modulation (FM), and the like. The wireless communication module 132 may be one or more devices integrating at least one communication processing module.
In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 131, and the antenna 2 is coupled to the wireless communication module 132, so that the electronic device 100 can communicate with networks and other devices (such as the floor plan generation device in the implementation of the present application, or the cloud-side generation device) through wireless communication technologies. The wireless communication technologies may include the global system for mobile communications (GSM), long term evolution (LTE), BT, GNSS, WLAN, and/or IR technologies, among others. The GNSS may include the global positioning system (GPS) and the like.
The electronic device 100 may implement audio functions through an audio module 140, a speaker 140A, a receiver 140B, a microphone 140C, an earphone interface 140D, an application processor, and the like.
The electronic device 100 may implement display functions through a GPU, a display screen 151, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 151 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 151 may be used to display images, videos, and the like. The display screen 151 may include a display panel. The display panel may employ a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. In some embodiments, the electronic device may include 1 or N display screens 151, where N is a positive integer greater than 1.
The electronic device may implement the photographing function through the ISP, the camera 153, the video codec, the GPU, the display screen 151, the application processor, and the like. The ISP may be used to process the data fed back by the camera 153. For example, when a photograph is taken, the shutter is opened, an optical signal is collected by the camera 153, and the camera 153 then converts the collected optical signal into an electrical signal, which is transmitted to the ISP for processing and converted into an image visible to the naked eye. The ISP can also perform algorithm optimization on the noise, brightness, and skin tone of the image, and can optimize parameters such as the exposure and color temperature of the photographed scene. In some embodiments, the ISP may be provided in the camera 153. The camera 153 may be used to capture still images or videos of a room. Typically, the camera 153 includes a lens and an image sensor. An object generates an optical image through the lens, and the optical image is projected onto the image sensor. The image sensor may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The image sensor converts the optical signal into an electrical signal and then transfers the electrical signal to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing, and the DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
In some embodiments, the electronic device may include 1 or N cameras 153, N being a positive integer greater than 1. For example, the electronic device may include two cameras 153, where one camera 153 is a front camera and the other camera 153 is a rear camera. For another example, the electronic device may further include three cameras 153, where one camera 153 is a front camera and the other two cameras 153 are rear cameras; or one camera 153 is a rear camera and the other two cameras 153 are front cameras. For another example, the electronic device includes four cameras 153, where one camera 153 is a front camera and the other three cameras 153 are rear cameras.
The keys 154 may include a power on key, a volume key, etc. The keys 154 may be mechanical keys. Or may be a touch key. The sensor module 160 may include one or more sensors. For example, a touch sensor 160A, a fingerprint sensor 160B, a pressure sensor 160C, and the like. In some embodiments, the sensor module 160 may also include a gyroscope sensor, an environmental sensor, a distance sensor, a proximity light sensor, a bone conduction sensor, an acceleration sensor, and the like.
In other embodiments, the processor 110 may also include one or more interfaces. For example, the interface may be the SIM card interface 152, or the USB interface 170. For another example, the interface may also be an inter-integrated circuit (I2C) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, or the like. It can be understood that the processor 110 of this embodiment may interface with different modules of the electronic device, thereby enabling the electronic device to perform different functions, such as photographing and processing. It should be noted that the connection mode of the interfaces in the electronic device is not limited in the embodiments of the present application.
It should be understood that the hardware configuration shown in fig. 1B is only one example. The electronic device of embodiments of the present application may have more or fewer components than shown in fig. 1B, may combine two or more components, or may have a different configuration of components. The various components shown in fig. 1B may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
In order to better understand the scheme of the embodiment of the present application, the conceptual terms related to the embodiment of the present application are explained first.
1) Room floor plan: a floor plan usually refers to the spatial layout drawing of a house, that is, a drawing describing the usage function, position, and size of each independent space; from it, the general layout of the house can be seen intuitively.
Typically, floor plans include two-dimensional (2D) floor plans and three-dimensional (3D) floor plans. A three-dimensional floor plan is an upgrade of a two-dimensional floor plan: while conveying the spatial layout, it can also convey the decoration and usage of the rooms to the user, providing much more detailed information.
2) Point cloud: the set of points on the visible surface of an object obtained by scanning with a three-dimensional scanning device may be called a point cloud. A point cloud is a set of vectors in a three-dimensional coordinate system; these vectors are typically expressed by x, y, z three-dimensional coordinates and are mainly used to represent the shape of the outer surface of an object. In addition to the geometric position information (x, y, z), a point in a point cloud may also carry an RGB color, a gray value, a depth, the reflection intensity of the object surface, and so on. The point cloud coordinate system referred to in the embodiments of the present application is the three-dimensional (x, y, z) coordinate system in which the point cloud points are located.
According to the density of points, point clouds can be roughly classified into dense point clouds and sparse point clouds. Dense point clouds can be used to finely reconstruct 3D objects such as people and rooms, and can be widely used in virtual reality and augmented reality. A 3D object reconstructed from a dense point cloud supports six degrees of freedom, whereas a 360-degree panoramic video supports only three degrees of freedom, so the former brings the user a better visual experience. A dense point cloud can also reconstruct a 3D scene with high precision and, combined with high-definition images and videos acquired by a 2D camera, can be used in applications such as autonomous driving and robot vision.
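As a concrete illustration of the definition above, a small point cloud can be stored as an (N, 6) array holding the (x, y, z) position and an RGB colour for each point; the values below are arbitrary.

```python
import numpy as np

cloud = np.array([
    # x,    y,    z,    r,   g,   b
    [0.00, 0.00, 0.00, 255, 255, 255],
    [3.20, 0.00, 0.00, 200, 180, 160],
    [3.20, 4.10, 2.60, 120, 120, 120],
])
xyz, rgb = cloud[:, :3], cloud[:, 3:]
```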
3) Pose: generally refers to the relative orientation and position of an object with respect to a camera; the pose can be changed by moving the object relative to the camera or by moving the camera relative to the object.
In SLAM (simultaneous localization and mapping), the pose is the transformation from the world coordinate system to the camera coordinate system, including a rotation and a translation. The world coordinate system is usually user-defined and is not altered after being defined; the camera coordinate system takes the optical center of the camera as its origin, and because the camera moves, the camera coordinate system moves with it.
The pose is usually represented by a Euclidean transformation in three dimensions; the transformation matrix T is most commonly represented by a rotation matrix R and a translation vector t.
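Written out (a standard formulation consistent with the description above), the pose is the rigid transform

```latex
T = \begin{bmatrix} R & t \\ 0^{\top} & 1 \end{bmatrix}, \qquad
\begin{bmatrix} p_c \\ 1 \end{bmatrix} = T \begin{bmatrix} p_w \\ 1 \end{bmatrix}
\;\Longleftrightarrow\; p_c = R\,p_w + t,
```

where R is the rotation, t the translation, p_w a point in the world coordinate system, and p_c the same point in the camera coordinate system.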
In the embodiment of the application, the pose of the image refers to the fact that a 3D object in the image is positioned based on a single RGB image, and the rigid body change from an object coordinate system to a camera coordinate system is determined so as to determine the relative direction and position of the object to the camera.
4) Corner point: corner points, i.e. points where properties are particularly prominent in some way, refer to points with representativeness and robustness in point clouds and images, such as the intersection of two edges, etc.
5) Region of interest (ROI): in image processing, a region to be processed that is outlined from the processed image in the form of a box, circle, ellipse, irregular polygon, or the like is called a region of interest. In the embodiment of the present application, the region of interest may be regarded as the region in the image where an object exists.
6) Multiple sensors: refers to a device or apparatus that collects, provides, aggregates, or combines information from multiple sensors, or from multiple types of sensors, for use.
7) A plurality of coordinate systems involved in embodiments of the present application: the process of image capturing of a room by an electronic device with a capturing function can be understood as the imaging process of a camera, essentially an optical imaging process, which usually involves four coordinate systems: world coordinate system, camera coordinate system, image coordinate system, and pixel coordinate system. Four coordinate systems are described below:
World coordinate system: the absolute coordinate system of the objective three-dimensional world, also called the objective coordinate system. Because the camera is placed in three-dimensional space, the world coordinate system is needed as a reference coordinate system to describe the position of the camera, and it is also used to describe the position of any other object placed in the three-dimensional environment; its coordinate values can be expressed as (Xw, Yw, Zw).
Camera coordinate system (optical center coordinate system): the optical center of the camera is taken as the origin of coordinates, the X axis and the Y axis are respectively parallel to the X axis and the Y axis of the image coordinate system, the optical axis of the camera is taken as the Z axis, and the coordinate values can be expressed as (Xc, Yc, Zc).
Image coordinate system: the center of the CCD image plane is taken as the origin of coordinates, the X axis and the Y axis are respectively parallel to two vertical sides of the image plane, and the coordinate values are expressed by (X, Y). The image coordinate system is a representation of the position of a pixel in an image in physical units (e.g., millimeters).
Pixel coordinate system: the top-left vertex of the CCD image plane is taken as the origin, the two axes are respectively parallel to the X axis and the Y axis of the image coordinate system, and (u, v) is used to represent the coordinate values. The image captured by the camera is first in the form of a standard electrical signal and is then converted into a digital image by analog-to-digital conversion. Each image is stored as an M × N array, and the value of each element in the image of M rows and N columns represents the gray level of that image point. Each such element is called a pixel, and the pixel coordinate system is the image coordinate system expressed in units of pixels.
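To make the relationship between these coordinate systems concrete, the following minimal sketch converts a world point into pixel coordinates through the camera, image, and pixel coordinate systems. It assumes a simple pinhole model without distortion; all function and variable names are illustrative and are not identifiers used by this application.

```python
import numpy as np

def world_to_pixel(p_world, R, t, fx, fy, u0, v0):
    """Illustrative chain: world -> camera -> image plane -> pixel coordinates.

    R, t   : camera extrinsics (rotation matrix, translation vector)
    fx, fy : focal lengths in pixels; (u0, v0): principal point in pixels.
    """
    # World coordinate system -> camera coordinate system (rigid transform)
    p_cam = R @ np.asarray(p_world, dtype=float) + t          # (Xc, Yc, Zc)

    # Camera coordinate system -> image coordinate system (perspective division)
    x = p_cam[0] / p_cam[2]
    y = p_cam[1] / p_cam[2]

    # Image coordinate system -> pixel coordinate system (intrinsics)
    u = fx * x + u0
    v = fy * y + v0
    return u, v

# Example: a point one metre in front of a camera placed at the world origin
R = np.eye(3); t = np.zeros(3)
print(world_to_pixel([0.1, 0.0, 1.0], R, t, fx=500, fy=500, u0=320, v0=240))
```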
8) Calibration of the camera and parameters of the camera:
in the embodiment of the present application, a camera may be understood as a camera device on an electronic device, which is mainly used for shooting, for example, a lens of a camera, a video camera of a mobile phone, and the like, which are collectively referred to herein as a camera.
Calibrating a camera: in image measurement processes and machine vision applications, in order to determine the correlation between the three-dimensional geometric position of a point on the surface of a spatial object and its corresponding point in the image, a geometric model of camera imaging must be established; the parameters of this geometric model are the parameters of the camera (the internal parameters of the camera, the external parameters of the camera, and the distortion parameters). Under most conditions these parameters must be obtained through experiments and calculation, and the process of solving these parameters is called camera calibration.
The calibration method of the camera generally comprises the following steps: a traditional camera calibration method, an active vision camera calibration method, a camera self-calibration method and a zero-distortion camera calibration method.
The parameters of the camera mainly include: internal parameters of the camera, external parameters of the camera, and distortion parameters.
Internal parameters of the camera: refers to parameters related to the camera's own properties, such as focal length, pixel size, etc. of the camera.
Internal matrix of the camera (camera intrinsics matrix): the camera intrinsic matrix is usually used when converting the image coordinate system into the pixel coordinate system; it reflects the attributes of the camera and can be obtained through camera calibration. The internal matrix of the camera may be denoted by K, which may be expressed as:

$$K = \begin{bmatrix} f_x & \gamma & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

where f_x denotes the focal length along the x-axis direction and f_y the focal length along the y-axis direction, both in pixels; (u_0, v_0) denotes the coordinates of the principal point (relative to the imaging plane), also in pixels; and γ denotes the coordinate-axis skew parameter, which is ideally 0.
External parameters of the camera: refers to parameters in the world coordinate system that describe the camera's motion in a static scene or rigid motion of a moving object, such as the camera's position, direction of rotation, etc., when the camera is stationary.
The external parameters of the camera include: rotation parameters ω, δ, θ corresponding to the three direction axes (i.e., the x, y, z axes) of the spatial coordinate system, and translation parameters T_x, T_y, T_z corresponding to the same three direction axes.
External reference matrix (camera extrinsic matrix) of camera: typically, the world coordinate system is converted to the camera coordinate system, and the external reference matrix of the camera includes a rotation matrix and a translation matrix, and usually the pose of the camera under the world coordinate system can be described by the rotation matrix and the translation vector of the camera, that is, through the external reference matrix of the camera, it is known how the real world point (on the world coordinate) falls on another real world point (on the camera coordinate system) after undergoing rotation and translation.
The extrinsic matrix of a camera can be expressed as:

$$\begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}$$

where R is the rotation matrix and T is the translation matrix.
Distortion parameters: distortion parameters can be understood as parameters related to distortion, typically internal parameters of a camera, including: k1, k2, k3, p1, p2; where k1, k2, k3 are radial distortion coefficients and p1, p2 are tangential distortion coefficients.
Distortion may be introduced by the limited lens manufacturing accuracy of the camera and by deviations in the assembly process, which distort the original image. Lens distortion mainly includes radial distortion and tangential distortion. Radial distortion occurs in the process of converting the camera coordinate system into the physical image coordinate system and includes barrel distortion and pincushion distortion. Tangential distortion arises because the lens is not perfectly parallel to the image plane, i.e., the plane of the photosensitive elements and the lens are not parallel, a situation typically introduced when the imager is mounted onto the camera during fabrication.
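As an illustration of how distortion coefficients of this kind are commonly applied, the sketch below implements the widely used radial/tangential (Brown–Conrady style) distortion model on normalized image coordinates. It is a generic example under that assumption, not the calibration or correction procedure of this application, and the coefficient values are arbitrary.

```python
import numpy as np

def apply_distortion(x, y, k1, k2, k3, p1, p2):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion to
    normalized image coordinates (x, y) = (Xc/Zc, Yc/Zc)."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d

# Example: a point near the image corner is displaced noticeably more than a
# point near the optical axis when barrel distortion (k1 < 0) is present.
print(apply_distortion(0.05, 0.05, k1=-0.2, k2=0.05, k3=0.0, p1=0.001, p2=0.001))
print(apply_distortion(0.60, 0.45, k1=-0.2, k2=0.05, k3=0.0, p1=0.001, p2=0.001))
```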
The technical scheme of the application is described below in connection with specific embodiments.
The method for generating the room type graph provided by the embodiment of the application can be applied to, but is not limited to, VR and AR house seeing scenes, and the scheme of the embodiment of the application mainly adopts an end-to-end multi-room type graph (2D+3D) generation mode. The method for generating the multi-room house type graph can generate an accurate room house type graph to represent room structures and house type information, so that on-line VR/AR house watching, on-line decoration design and the like can be realized.
Fig. 2A shows a flowchart of a method according to an embodiment of the present application, where the method may be performed by an entity generating device of a room house type graph, or may be performed by a generating device of a cloud, for example, by a generating module of the cloud, a cloud server, or the like. The following takes the case where the method of the embodiment of the present application is implemented in the cloud and performed by the cloud generating device as an example, where the cloud generating device is abbreviated as the cloud. Referring to fig. 2A, the specific flow of the method is as follows:
S201A: the cloud acquires information of dense point clouds of at least one room and an initial architecture model diagram of the at least one room.
Wherein the information of the dense point cloud of one room is applied to the scale optimization of the initial architecture model diagram of one room, the information of the dense point cloud of one room is a dense three-dimensional point set used for representing the structure of one room, and the information of the dense point cloud of one room is obtained by using the image and the inertia information of the one room.
In the embodiment of the application, the cloud end can acquire the image data of the at least one room and shooting information (such as inertial information during shooting) when shooting the image of the room from the electronic equipment (namely shooting device, such as mobile phone, camera and the like) of the user; if the electronic device supports shooting of panoramic images of rooms, the image data of at least one room may include RGB images of the rooms or panoramic images of the rooms; if the electronic device only supports shooting the RGB image of the room, and does not support shooting the panoramic image of the room, the image data of the at least one room only includes the RGB image of the at least one room, and the cloud end may generate the panoramic image of the at least one room according to the RGB image of the at least one room. If the electronic equipment of the user supports shooting of the panoramic image of the room, the panoramic image of the room can be directly shot by the electronic equipment and fed back to the cloud end, so that the processing process of generating the panoramic image by the cloud end through a characteristic panoramic stitching method is omitted. The cloud then determines, based on the acquired panoramic image of the room, information of the dense point cloud of the at least one room and an initial architecture model diagram of the at least one room, an implementation process that will be described in detail below.
Taking the ith room as an example, the cloud terminal generates a panoramic image of the ith room according to the RGB image of the ith room, and the process can be that the cloud terminal acquires the RGB image of the ith room; and then generating the panoramic image of the ith room by adopting a characteristic panoramic stitching method based on the RGB image of the ith room. i can take any positive integer from 1 to N, wherein N is the number of rooms; that is, the method of generating the panoramic image of the corresponding room by the cloud end according to the RGB image of any other room may refer to the method of generating the panoramic image of the ith room by the cloud end according to the RGB image of the ith room, and the repetition is not repeated.
By the mode, the cloud end can obtain the RGB image of the at least one room and can obtain the panoramic image of the at least one room. Further, the cloud end may acquire dense point cloud information of the at least one room and an initial architecture model diagram of the at least one room based on the RGB image and the panoramic image of the at least one room, and photographing information (such as inertial information when photographing) when photographing the image of the room.
In one embodiment, taking the cloud obtaining the information of the dense point cloud of the ith room as an example, the cloud may perform the following process, where i may take any positive integer from 1 to N and N is the number of rooms; that is, the process by which the cloud obtains the information of the dense point cloud of any other room may refer to the process by which the cloud obtains the information of the dense point cloud of the ith room.
Step1: the cloud acquires pose information of an RGB image of an ith room, sparse point cloud information and a slice diagram of a panoramic image of the ith room; the sparse point cloud is a sparse three-dimensional set of points used to characterize the structure of the i-th room.
To obtain the pose information of the RGB image of the ith room and the information of the sparse point cloud, the cloud may acquire the RGB image of the ith room and the inertial information recorded when the room was photographed; the cloud then estimates initial pose information of the ith room and information of an initial point cloud based on the RGB image and the inertial information of the ith room, where the initial point cloud is a two-dimensional point set used for representing the structure of the ith room; finally, the cloud optimizes the initial pose information of the ith room to obtain the pose information of the RGB image of the ith room, and quantizes the initial point cloud information of the ith room to obtain the sparse point cloud information of the ith room.
The cloud may also acquire the sequence of the RGB image and the sequence of the inertial information of the ith room from the electronic device (for example, a mobile phone and a camera) of the user, for example, the sequence of the RGB image may be generated by the processor of the electronic device based on the RGB image of the ith room, the sequence of the inertial information may be generated by the processor of the electronic device based on the inertial information when the room is photographed, and the cloud may estimate and obtain the initial pose information and the initial point cloud information of the ith room based on the sequence of the RGB image of the ith room and the sequence of the inertial information.
Step2: the cloud end registers a slice image of the panoramic image of the ith room based on pose information of the RGB image of the ith room and information of sparse point cloud, and obtains pose information of the slice image of the panoramic image and information of the sparse point cloud.
Step3: the cloud performs stereo matching based on the pose information of the slice images of the panoramic image and the sparse point cloud to obtain at least one depth map of the ith room.
Step4: the cloud performs filtering and fusion processing on the at least one depth map of the ith room to obtain the information of the dense point cloud of the ith room, where each depth map of the ith room corresponds to one image frame of the ith room.
In one embodiment, the main process of the cloud end obtaining the initial architecture model diagram of at least one room is described below by taking the initial architecture model diagram of the ith room as an example, where i is any one positive integer from 1 to N, where N is the number of rooms, that is, the process of the cloud end obtaining the initial architecture model diagram of any other room may refer to the process of the cloud end obtaining the initial architecture model diagram of the ith room:
The cloud detects the door position of the panoramic image of the ith room, and determines the door position information of the ith room; carrying out structural detection on the panoramic image of the ith room, and determining structural information of the ith room; based on the position information of the door of the ith room and the structural information of the ith room, an initial architecture model diagram of the ith room is obtained.
S202A: and the cloud end performs scale optimization on the initial architecture model graph of the corresponding room based on the information of the dense point cloud of each room to obtain a first architecture model graph of each room.
For example, the cloud may scale-optimize the initial architecture model of each room based on the information of the dense point cloud of each room and the multi-sensor scale optimization model of each room constructed in advance, to obtain a first architecture model map of each room.
S203A: the cloud end obtains a standard architecture model diagram of each room based on the first architecture model diagram of each room.
Taking the ith room as an example, the process of obtaining the standard architecture model diagram of the ith room may be as follows, where i may take any positive integer from 1 to N and N is the number of rooms; that is, the process by which the cloud obtains the standard architecture model diagram based on the first architecture model diagram of any other room may refer to the following process by which the cloud obtains the standard architecture model diagram based on the first architecture model diagram of the ith room:
The cloud acquires a multi-frame first architecture model diagram of a multi-frame ith room; and adjusting and fusing the multi-frame first architecture model graph to obtain a standard architecture model graph of the ith room.
The cloud first acquires multiple frames of the first architecture model diagram of the ith room, where each frame of the first architecture model diagram may be obtained based on one frame of the panoramic image of the ith room; each frame of the panoramic image of the ith room may correspond to one detection position, and different panoramic images may be obtained at different detection positions. The cloud then performs gravity adjustment on each frame of the first architecture model diagram and projects each frame of the first architecture model diagram from the 3D plane onto a 2D plane to obtain a 2D projection diagram of each frame of the first architecture model diagram. Next, edge aggregation is performed on the 2D projection diagrams of the multiple frames of the first architecture model diagram to obtain an edge-aggregated 2D projection diagram of the ith room. Then, based on the edge-aggregated 2D projection diagram of the ith room, each edge in the 2D projection diagram is extended and the intersection points between adjacent edges are calculated; taking the longest edge as the standard edge, Manhattan correction is performed on the adjacent edges, i.e., if the included angle between adjacent edges is not 90 degrees, the adjacent edge is rotated so that the angle is adjusted to 90 degrees, and this adjustment is performed on each edge in turn until the longest edge is reached again. Finally, the height of the architecture model of the ith room is calculated and the average height h is taken; the room structure corners in the corrected 2D projection diagram of the ith room are projected to generate the corresponding ceiling and floor at the heights z = h and z = 0 respectively, the other wall surfaces are generated successively, and finally all the wall surfaces form the 3D architecture model diagram of the ith room (i.e., the standard architecture model diagram).
According to the embodiment, for each room, the cloud end can generate a plurality of first architecture model diagrams based on panoramic images shot at a plurality of different positions of each room (namely, multi-frame panoramic images), and then adjust and fuse the plurality of first architecture model diagrams of each room, so that the problem that the architecture model of the room generated based on the RGB images is in structural deficiency due to the fact that the RGB images do not comprise room parts which are not shot by the electronic equipment can be avoided.
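The following minimal sketch illustrates the Manhattan-correction and extrusion ideas described above on a 2D corner polygon. The data representation, the snapping rule, and all names are assumptions made for illustration; they do not reproduce the exact adjustment and fusion procedure of this application.

```python
import numpy as np

def manhattan_correct(corners):
    """Snap the edges of a 2D room footprint toward a Manhattan layout.

    `corners` is an (N, 2) array of 2D room-corner coordinates after edge
    aggregation.  The longest edge defines the reference axis; every edge
    processed is replaced by its dominant component along that axis or its
    perpendicular, so those edges become parallel or perpendicular to it.
    """
    c = np.asarray(corners, dtype=float)
    n = len(c)
    edges = np.roll(c, -1, axis=0) - c
    ref = edges[np.argmax(np.linalg.norm(edges, axis=1))]
    u = ref / np.linalg.norm(ref)                 # reference axis (longest edge)
    v = np.array([-u[1], u[0]])                   # perpendicular axis

    snapped = c.copy()
    for i in range(n - 1):
        e = snapped[i + 1] - snapped[i]
        # keep only the dominant axis component of this edge
        if abs(e @ u) >= abs(e @ v):
            snapped[i + 1] = snapped[i] + (e @ u) * u
        else:
            snapped[i + 1] = snapped[i] + (e @ v) * v
    return snapped

def extrude_to_3d(corners_2d, height):
    """Lift the corrected 2D footprint to a floor (z = 0) and ceiling (z = height)."""
    floor = np.hstack([corners_2d, np.zeros((len(corners_2d), 1))])
    ceiling = np.hstack([corners_2d, np.full((len(corners_2d), 1), height)])
    return floor, ceiling
```

In practice the edge-aggregated projection of multiple frames would be corrected jointly and the corrected footprint extruded to the averaged room height, as described above.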
S204A: the cloud end generates a room type graph based on the standard architecture model graph corresponding to at least one room.
In one embodiment, the cloud may calculate, based on the standard architecture model diagrams respectively corresponding to the at least one room, the adjacency relationship between the rooms according to the distance between them; then, according to the adjacency relationship between the rooms, the standard architecture model diagrams corresponding to adjacent rooms are spliced in sequence to generate the room house type graph. In addition, before generating the room house type graph based on the standard architecture model diagrams corresponding to the at least one room, the cloud may perform de-duplication processing on these standard architecture model diagrams; this avoids image overlap or errors in the finally generated room house type graph caused by multiple standard architecture model diagrams existing for one room, and also reduces the computational workload.
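As a rough illustration of the adjacency computation and splicing described above, the sketch below treats each standard architecture model as a set of corner points, marks two rooms as adjacent when their nearest corners are closer than a threshold, and concatenates adjacent rooms into one floor-plan point set. The distance measure, threshold, and the naive splicing (without door-based alignment) are simplifying assumptions.

```python
import numpy as np

def room_adjacency(room_models, dist_thresh):
    """Treat rooms i and j as adjacent when the nearest distance between
    their corner points is below `dist_thresh` (illustrative criterion)."""
    n = len(room_models)
    adjacent = []
    for i in range(n):
        for j in range(i + 1, n):
            d = min(np.linalg.norm(p - q)
                    for p in room_models[i] for q in room_models[j])
            if d < dist_thresh:
                adjacent.append((i, j))
    return adjacent

def stitch_rooms(room_models, adjacency):
    """Splice adjacent rooms in order into one floor-plan point set (sketch).
    A real system would align rooms via door positions / panorama poses."""
    floor_plan = [room_models[0]]
    placed = {0}
    for i, j in adjacency:
        for k in (i, j):
            if k not in placed:
                floor_plan.append(room_models[k])
                placed.add(k)
    return np.vstack(floor_plan)
```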
By way of example, fig. 2B illustrates a specific flow of generating a room floor plan for cloud execution.
In fig. 2B, steps S1 to S4 are a process of obtaining, by the cloud, information of a dense point cloud of at least one room, and specifically include: s1: the electronic equipment shoots image data of a room and sends the image data to the cloud, if the image data is an RGB image of the room, S2 is executed, and if the image data is a panoramic image of the room, S2 is not executed, and S3 is directly executed; s2: the cloud end generates a panoramic image of the room according to image data (namely RGB image of the room) of the room sent by the electronic equipment (optional step); s3: the cloud end carries out visual and inertial pose estimation according to the panoramic image and the RGB image of the room and inertial information (which can be understood as the inertial information of the RGB image) during shooting to obtain the pose of the panoramic image and sparse point cloud; s4: and the cloud performs dense reconstruction based on the sparse point cloud to obtain dense point cloud of the room.
The process of steps S5-S7 in fig. 2B is a process of obtaining an initial architecture model diagram of the at least one room by the cloud, and specifically includes: if the image data of the room sent to the cloud end by the electronic device in S1 is the RGB image of the room, S5 and S6 may be executed synchronously after S2 is executed, and if the image data of the room sent to the cloud end by the electronic device in S1 is the panoramic image of the room, S5 and S6 may be executed synchronously directly after S1 is executed. S5: the cloud terminal detects the door position according to the panoramic image of the room, and determines the door position information of the room; s6: the cloud terminal detects the room structure according to the panoramic image of the room sent by the electronic equipment, and determines the structure information of the room; s7: and the cloud end generates an initial architecture model diagram of the room according to the door position information of the room and the structure information of the room.
In fig. 2B, step S8 is a process of performing scale optimization on the initial architecture model diagram of each room based on the information of the dense point cloud of each room by the cloud to obtain a first architecture model diagram of each room, and specifically includes: s8: the cloud end performs multi-sensor scale optimization on the initial architecture model diagram of the room based on the dense point cloud of the room, and the architecture model diagram of the room after the scale optimization is obtained.
In fig. 2B, step S9 is a process of obtaining a standard architecture model diagram of each room based on the first architecture model diagram of each room by the cloud, and specifically includes: s9: and the cloud end performs multi-frame fusion based on the architecture model diagram of the room after the scale optimization to obtain a standard architecture model diagram of the room.
In fig. 2B, step S10 is a process of generating a room type map by the cloud based on the standard architecture model map corresponding to the at least one room, and specifically includes: s10: the cloud end performs multi-room joint optimization based on the standard architecture model diagrams of the multiple rooms and combining the pose of the panoramic image, and a room type diagram is generated.
Through the embodiment, the generation method of the house type graph provided by the application can generate the high-quality multi-room house type graph which is easy to popularize with low cost.
The following further details the steps in the method for generating a room layout according to the embodiment of the present application shown in fig. 2A through the following embodiments.
Embodiment one
Embodiment one describes the step S201A shown in fig. 2A in detail, that is, how a panoramic image of a room is generated from the obtained RGB images of the room when the user's photographing device can only capture RGB images of the room and cannot capture a panoramic image of the room. As shown in fig. 3, taking one room (corresponding to the ith room in the above-described embodiment of the present application) as an example, the process of generating a panoramic image of the room is as follows:
s301: the cloud obtains a set of RGB images of the room.
Illustratively, the cloud may acquire a set of RGB (Red Green Blue) images obtained by a cell phone (equivalent to the user's electronic device) capturing the room.
For example, when capturing a set of RGB images of the room by a mobile phone, the user may take a photograph of the room in the following manner:
The user firstly determines the position of the central point of the camera coordinate system, and the user locates the mobile phone (namely the camera of the mobile phone can be regarded as a camera) at the position of the central point of the camera coordinate system (namely the position of the optical center); then, the user keeps the position of the mobile phone unchanged, only adjusts the orientation of the mobile phone, operates the photographing button of the mobile phone, and photographs the room in turn in 360 degrees, namely, the camera of the mobile phone faces different directions when photographing each time, so that a group of RGB images of the room are obtained through photographing for many times, the group of RGB images can cover a complete room picture of 360 degrees with the position of the mobile phone as the center, and overlapping room areas exist in the RGB images obtained by photographing for any two adjacent times. In addition, the mobile phone records and stores information during shooting through the inertial measurement unit IMU in the shooting process, such as shooting habit information and the like during shooting a room by the mobile phone, and transmits the information during shooting recorded and stored by the IMU to the cloud.
Further, the cloud end can generate a panoramic image of the room based on the RGB images of the room obtained by shooting by the group of mobile phones.
S302: and the cloud performs feature matching on the group of RGB images to determine an RGB image pair with a matching relationship.
If any two RGB images can be transformed onto the same plane by a homography, that is, if the two RGB images photographed at different angles can be transformed to the same viewing angle to achieve image stitching, then the two RGB images can be considered to be related by a homography transformation. In addition, for any two RGB images, since the shooting angles of the mobile phone are different, perspective distortion exists, which causes distortion of the captured RGB images; in this case, if the two RGB images can be corrected to be identical by an affine transformation, then the two RGB images can be regarded as related by an affine transformation.
Therefore, for any two RGB images, in the scene of the above two transformations (i.e., homography transformation and affine transformation), the cloud may detect an essential feature (i.e., a robust feature) of the RGB image, where the essential feature does not change due to different specific manifestations of the images, such as SIFT features, so that the cloud may determine whether there is a matching relationship between the two RGB images based on the essential feature.
For example, for any two RGB images, the manner in which the cloud determines whether the RGB image pair has a matching relationship may specifically be as follows:
The cloud detects and extracts a plurality of SIFT feature points from each RGB image and generates the corresponding SIFT feature vectors; it then calculates the Euclidean distance between the SIFT feature vector of a feature point in one RGB image and the SIFT feature vector of a feature point in the other RGB image, and if the Euclidean distance is smaller than a set threshold, the two SIFT feature points are determined to be a matching point pair; when the number of matching point pairs between the two RGB images reaches a preset number, a matching relationship between the two RGB images can be determined, i.e., they form an RGB image pair with a matching relationship.
The cloud detection and extraction of SIFT features and feature matching of the two RGB images can be realized by referring to the prior art.
Through the feature matching mode, the cloud end can select a plurality of pairs of RGB image pairs with matching relations from the group of RGB images, and each RGB image pair can be understood as at least two RGB images with matching relations.
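A minimal sketch of this pair-matching step, assuming OpenCV's SIFT implementation is available, is shown below; the distance threshold and the minimum number of matching point pairs are illustrative values, not parameters specified by this application.

```python
import cv2

def match_pair(img1, img2, dist_thresh=200.0, min_matches=30):
    """Extract SIFT features from two RGB images, pair descriptors by
    Euclidean distance, and declare a matching relationship when enough
    point pairs survive the distance threshold (illustrative values)."""
    g1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(g1, None)
    kp2, des2 = sift.detectAndCompute(g2, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)          # Euclidean distance between SIFT vectors
    matches = [m for m in matcher.match(des1, des2) if m.distance < dist_thresh]
    return matches if len(matches) >= min_matches else None
```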
S303: and the cloud end screens the RGB image pairs with the matching relation.
For example, when the cloud filters the RGB image pairs with the matching relationship, the filtering conditions may be:
For each RGB image pair, the cloud calculates the probability of correct matching of two RGB images in the RGB image pair, and if the probability of correct matching is lower than a preset threshold value, the matching relation between the RGB image pairs is deleted; if the probability of correct matching is not smaller than the preset threshold value, the matching relation between the RGB image pairs is reserved. By the method, the cloud can effectively screen and reserve RGB image pairs with high probability of correct matching (namely better robustness).
Illustratively, for any RGB image pair, the cloud may calculate the probability of a correct match of the two RGB images in the RGB image pair by:
The cloud may use an n-fold bernoulli test to determine the probability of a correct match for the pair of RGB image pairs, as follows:
In each test, the cloud may first divide the matching point pairs (e.g., SIFT feature point pairs) of the RGB image pair into inliers and outliers by using a random sample consensus (RANSAC) method, where inliers may be understood as non-noise points and outliers may be understood as noise points. Let n_f be the total number of matching point pairs between the RGB image pair and n_i be the number of inlier matching point pairs; n_i should satisfy condition one: n_i > α + β·n_f, where α and β are constants.
In each test, the cloud end judges whether the RGB image pair meets the first condition and records the RGB image pair.
And then, after multiple tests, the cloud counts the test times which can meet the first condition.
Finally, the cloud calculates the proportion of the number of tests meeting the first condition to the total number of tests, and takes the proportion as the probability of correct matching of the RGB image pair. If the probability of the correct match is not less than the preset threshold value, the accuracy (robustness) of the RGB image pair is higher, and the RGB image pair is reserved. If the probability of the correct match is smaller than a preset threshold value, which indicates that the accuracy of the RGB image pair is lower (the robustness is poor), the matching relationship between the RGB image pair is deleted.
When executing the step S303, the cloud end calculates the probability of determining that the two RGB images in the RGB image pair are correctly matched, and may also adopt other existing schemes, which will not be described in detail herein.
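The sketch below illustrates one way the n-fold Bernoulli-style verification described above could be organized, using OpenCV's RANSAC-based homography estimation to split matches into inliers and outliers in each trial; the constants α, β and the number of trials are illustrative assumptions.

```python
import cv2
import numpy as np

def correct_match_probability(pts1, pts2, trials=20, alpha=8.0, beta=0.3):
    """Run `trials` RANSAC tests on one RGB image pair's matched points
    (n_f x 2 arrays).  In each trial the matches are split into inliers and
    outliers; the trial succeeds when n_i > alpha + beta * n_f.  The ratio of
    successful trials is used as the probability of a correct match."""
    pts1 = np.asarray(pts1, dtype=np.float32)
    pts2 = np.asarray(pts2, dtype=np.float32)
    n_f = len(pts1)
    successes = 0
    for _ in range(trials):
        _, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)
        n_i = int(mask.sum()) if mask is not None else 0
        if n_i > alpha + beta * n_f:
            successes += 1
    return successes / trials
```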
S304: The cloud performs bundle adjustment (BA) optimization on the RGB image pairs.
Through the step S303, the cloud can screen out RGB image pairs with better robustness; however, when these RGB image pairs are used for stitching, a certain error usually exists between them (for example, the same room area appears twice, or a gap exists between the images), and if the error between the RGB image pairs is not optimized in time, the two ends of the finally stitched panoramic image cannot be closed. Therefore, in order to eliminate the errors between the RGB image pairs, the cloud may perform BA optimization on the screened RGB image pairs with better robustness before stitching them.
The specific way in which the cloud performs BA optimization on the RGB image pairs may be implemented according to an existing BA optimization scheme, which will not be described here.
S305: and the cloud end uses the RGB image pair after BA optimization to splice panoramic images, and a primarily spliced panoramic image of the room is obtained.
S306: and the cloud end straightens the preliminarily spliced panoramic image of the room.
The primarily stitched panoramic image of the room is a panoramic projection plane, and the panoramic projection plane belongs to a spherical surface, so that the cloud end expands (i.e. straightens) the primarily stitched panoramic image to be mapped into an image of the room.
In actual processing, scene distribution obtained by expanding the panoramic image along the vertical direction is more reasonable, but because the camera cannot be perfectly horizontal and not inclined during user acquisition, the situation that the panoramic expansion axis is inconsistent with the vertical direction is likely to occur, so that the scenes at the same height are in wavy distribution on the final panoramic image, the cloud can determine the accurate vertical direction first, and then expand (namely straighten) the primarily spliced panoramic image based on the accurate vertical direction, so that the accurate panoramic image of the room can be obtained.
The cloud end can obtain a gravity axis obtained by the IMU from the mobile phone, then determine the vertical direction by utilizing the gravity axis obtained by the IMU, and further expand the initially spliced panoramic image upwards along the vertical direction based on the vertical direction, so that the panoramic image of the room after expansion can be obtained.
S307: the cloud performs panorama color mixing on the panorama image of the room.
The panoramic image of the room is formed by splicing different RGB images essentially, the different RGB images are obtained by shooting by a mobile phone at different moments, so that the different RGB images have different illumination, exposure parameters and the like, and the luminosity of the panoramic image of the room is inconsistent.
Illustratively, the cloud performs color mixing on the panoramic image of the room, which may be implemented in the following manner:
firstly, the cloud end determines an optimal gain value for a panoramic image of the room, and then performs gain compensation on pixels of the panoramic image based on the optimal gain value. Then, for the overlapping area of the plurality of RGB images in the panoramic image, the cloud end can determine a weight function for each RGB image, and then based on the colors of the plurality of RGB images and the corresponding weight functions, the overlapping area is subjected to color adjustment in a weighted mixing mode, so that the overlapping area after the color adjustment is obtained.
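As a generic illustration of the weighted blending of an overlap region described above (not the exact color-mixing algorithm of this application), the sketch below combines the contributions of several source images using per-pixel weights, for example weights that decrease toward each image's border.

```python
import numpy as np

def blend_overlap(images, weights):
    """Blend an overlap region by a weight-normalised sum.

    `images` and `weights` are lists of arrays covering the same overlap
    region: each image is (H, W, 3) and each weight map is (H, W)."""
    num = np.zeros_like(images[0], dtype=float)
    den = np.zeros(images[0].shape[:2], dtype=float)
    for img, w in zip(images, weights):
        num += img.astype(float) * w[..., None]
        den += w
    return (num / np.maximum(den[..., None], 1e-6)).astype(np.uint8)
```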
When the cloud performs the step S307, that is, the specific mode of the cloud performing color mixing on the panoramic image of the room may be implemented with reference to the existing scheme.
The steps S301 to S307 are described by taking a room as an example, and other rooms in the house where the room is located can refer to the steps S301 to S307 to obtain the corresponding panoramic image, which is not described in detail herein.
With the first embodiment, when a mobile phone used by a user can only support photographing RGB images and does not support photographing panoramic images, a cloud terminal can effectively obtain (generate) a panoramic image of a room based on a set of RGB images of the room.
Embodiment two
The second embodiment is mainly described in detail with respect to step S201A shown in fig. 2A, that is, how to obtain information of dense point clouds of at least one room. As shown in fig. 4A, taking a room in a house (corresponding to the ith room in the above-mentioned scheme of the present application) as an example, the cloud obtains the information of the dense point cloud of the room as follows:
S401A: the cloud end obtains the pose and sparse point cloud of the RGB image of the room based on the RGB image of the room.
The pose of the RGB image of the room may be generally used to represent the position and orientation of the mobile phone (or the camera of the mobile phone) when the mobile phone captures and obtains the RGB image of the room, and the pose of the RGB image of the room may also be generally referred to as the pose of the mobile phone or the pose of the camera, which may be collectively referred to as the pose of the camera in this embodiment. The sparse point cloud of the room refers to a set of points on the three-dimensional space used to construct the room.
The cloud obtains the pose and sparse point cloud of the RGB image of the room based on the RGB image of the room; referring to fig. 4B, this may specifically include the following steps:
step1: the cloud acquires a group of RGB images and corresponding IMU information of the room.
Specifically, the cloud may acquire a set of RGB images and corresponding IMU information that are acquired by photographing the mobile phone, where the set of RGB images may be acquired by photographing the room multiple times by using the mobile phone in the photographing manner in the first embodiment (that is, the mobile phone is located at the same position, and photographing the room and acquiring a set of RGB images in sequence in the 360-degree direction), which is not described herein in detail.
Step2: The cloud uses the set of RGB images and the corresponding IMU information of the room, and adopts a visual and inertial simultaneous localization and mapping (SLAM) method to obtain the poses of a set of initial RGB images.
The cloud end uses the set of RGB images and the corresponding IMU information, and uses the existing visual and inertial SLAM techniques to jointly estimate and obtain the pose of the set of initial RGB images (i.e., the pose of the initial camera corresponding to the set of RGB images), and the initial sparse point cloud of the room, which may be specifically implemented according to the existing visual and inertial SLAM techniques, and will not be specifically described herein.
The pose of a group of initial RGB images obtained by the cloud through the visual and inertial SLAM method can approximately reflect the motion track (namely the rotation track) of the mobile phone. However, the initial sparse point cloud of the room obtained by the cloud via visual and inertial SLAM methods is typically too sparse and thus negligible in this embodiment.
Step3: The cloud optimizes the poses of the initial set of RGB images by adopting a structure-from-motion (SfM) method with pose priors, to obtain the optimized poses of the set of RGB images and the sparse point cloud of the room. The method specifically includes the following steps:
(a) Feature extraction:
The cloud acquires the set of RGB images obtained by shooting with the mobile phone, and detects and extracts, on each RGB image, local feature points with illumination invariance and geometric invariance (scale/rotation invariance), such as SIFT feature points; alternatively, the cloud detects and extracts binary feature points for each RGB image, such as ORB (oriented FAST and rotated BRIEF) features, which combine an improved FAST (features from accelerated segment test) detector with a binary BRIEF descriptor. In Embodiment two, the cloud detects and extracts the SIFT feature points of each RGB image.
After the cloud detects and extracts the SIFT feature points in each RGB image, corresponding SIFT feature vectors can be generated for the SIFT feature points in each RGB image.
(b) Feature matching:
The cloud terminal performs feature matching on the SIFT feature points in the group of RGB images according to the SIFT feature vectors corresponding to the group of RGB images, and the specific process may be as follows:
For any two RGB images: firstly, the cloud can calculate the Euclidean distance of SIFT feature vectors of the two RGB images according to the SIFT feature vectors of the two RGB images, and the Euclidean distance is used as a judging measure of similarity of SIFT feature points in the two RGB images (which is equivalent to a basis for judging whether the key points in the two RGB images have similarity or not); and then the cloud can compare the Euclidean distance of the SIFT feature vectors of the two RGB images with a preset distance threshold, if the Euclidean distance is larger than the preset distance threshold, the two key SIFT feature points are not the matching point pairs, and if the Euclidean distance is smaller than the preset distance threshold, the two SIFT feature points are considered to be the matching point pairs.
For example, for any two RGB images, the cloud detects and extracts 3 key SIFT feature points on the first RGB image as point 1, point 2 and point 3, and generates corresponding SIFT feature vectors as follows: feature vector 1.1, feature vector 1.2, feature vector 1.3.
The cloud detection and extraction of 3 SIFT feature points on the second RGB image are respectively as follows: point 1, point 2, point 3, and the corresponding SIFT feature vectors generated are respectively: feature vector 2.1, feature vector 2.2, feature vector 2.3.
The cloud calculates the euclidean distance between the feature vector 1.1 and the feature vector 2.1, if the euclidean distance is greater than a preset distance threshold value, the point 1 on the first image and the point 1 on the second image are not matched point pairs, and if the euclidean distance is less than the preset distance threshold value, the point 1 on the first image and the point 1 on the second image are matched point pairs.
Through the above feature matching process, the cloud may obtain some matching point pairs corresponding to the group of RGB images, but the accuracy and robustness of these matching point pairs are not very high, so the cloud may further test these matching point pairs by using the following geometric verification method to remove the wrong matching point pairs and keep the precise matching point pairs.
(c) Geometric verification:
the cloud performs geometric verification on some of the obtained matching point pairs, which can be as follows:
For any matching point pair, the cloud can firstly determine the geometric relationship of the matching point pair, then determine whether the matching point pair is an error matching point pair or not based on the geometric relationship of the matching point pair by adopting a random sampling consistency algorithm; and finally, removing the wrong matching point pair, so that the accurate matching point pair can be kept. The specific geometry verification process may be performed according to existing geometry verification schemes, which are not described in detail herein.
Through the feature extraction, feature matching and geometric verification, the cloud end can determine accurate matching point pairs (such as SIFT feature point pairs) between RGB images based on a group of RGB images shot and obtained from a mobile phone.
(d) Triangularization:
In this step, the cloud may take the pose of the initial set of RGB images obtained by vision and inertia based SLAM as the prior pose of the set of RGB images.
The cloud end can determine a plurality of pairs of image pairs (each pair at least comprises two RGB images) with a common matching relationship from the group of RGB images based on the obtained accurate matching points and the prior pose of the group of RGB images, and then triangulate the image pairs with the common matching relationship by adopting a triangulation principle to generate corresponding 3D points. The specific process of the triangularization process can be implemented according to existing triangularization implementations, and will not be described in detail herein.
The cloud performs triangularization processing on the images with the common matching relationship respectively to obtain some 3D points, wherein the 3D points can form a set of 3D points of the room (namely, sparse point cloud of the room). However, the accuracy of these 3D points obtained after the first triangularization by the cloud is not yet high enough, so the cloud can also perform BA optimization on these 3D points.
(e) Bundle adjustment (BA) optimization:
in this step, the cloud may perform BA optimization on the prior pose of the set of RGB images and the triangulated 3D points by using a method of minimizing the re-projection error.
In the step, the cloud performs BA optimization on the prior pose of the group of RGB images and the 3D points obtained after the first triangulation, so that the pose and the 3D points of the group of RGB images after optimization can be obtained, the pose and the 3D points of the group of RGB images after optimization are returned to the step (D), the triangulation is performed again, the BA optimization is performed continuously after the triangulation, and the steps of the triangulation and the BA optimization are executed in a sequential loop iteration mode; the cloud end can stop after the number of loop iterations reaches the preset number of times, the pose of a group of RGB images which are finally optimized and the 3D points which are finally optimized are obtained, and the 3D points which are finally optimized can form a sparse point cloud of the room.
The specific process of executing the geometric verification and the BA optimization by the cloud can be realized by referring to the existing geometric verification and BA technology.
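To make the "minimize the re-projection error" objective concrete, the sketch below shows a generic residual function over camera poses and 3D points, of the kind that a bundle adjustment solver minimizes; the parameter packing, the use of OpenCV's projectPoints, and the commented triangulation/BA loop are illustrative assumptions rather than the optimization actually used by this application.

```python
import numpy as np
import cv2

def reprojection_residuals(params, n_cams, n_pts, K, observations):
    """Re-projection residuals for bundle adjustment (illustrative layout).

    `params` packs camera poses (angle-axis rotation + translation, 6 values
    per camera) followed by 3D points (3 values per point); `observations`
    is a list of (cam_idx, pt_idx, u, v) pixel measurements."""
    cam = params[:n_cams * 6].reshape(n_cams, 6)
    pts = params[n_cams * 6:].reshape(n_pts, 3)
    res = []
    for ci, pi, u, v in observations:
        rvec, tvec = cam[ci, :3], cam[ci, 3:]
        proj, _ = cv2.projectPoints(pts[pi].reshape(1, 3), rvec, tvec, K, None)
        res.extend((proj.ravel() - np.array([u, v])).tolist())
    return np.asarray(res)

# Conceptual loop corresponding to the alternation described above
# (triangulate, then minimise the residuals, e.g. with
# scipy.optimize.least_squares, and repeat for a fixed number of iterations);
# the helpers triangulate / pack / unpack are hypothetical placeholders:
#   for it in range(max_iters):
#       pts3d = triangulate(matched_pairs, cam_poses)
#       x0 = pack(cam_poses, pts3d)
#       sol = least_squares(reprojection_residuals, x0,
#                           args=(n_cams, n_pts, K, observations))
#       cam_poses, pts3d = unpack(sol.x)
```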
S402A: the cloud end obtains a plurality of slice images of the panoramic image based on the panoramic image of the room.
In the second embodiment, the cloud may or may not execute the step S401A and the step S402A synchronously, and the sequence of executing the step S401A and the step S402A is not specifically limited in the present application. In addition, the cloud may acquire the panoramic image of the room obtained by shooting with the mobile phone, or may generate the panoramic image of the room according to a set of RGB images obtained by shooting with the mobile phone after acquiring the set of RGB images of the room.
Since the projection mode of the panoramic image and the projection mode of the RGB image are greatly different, the panoramic image and the RGB image cannot be directly matched, and therefore, the cloud end can process the panoramic image so that the projection mode is similar to the projection mode of the RGB image, and the panoramic image can be used in step S403A described below.
The projection mode of the panoramic image is usually an equidistant cylindrical projection (equirectangular projection, ERP) mode, and the pixels of a panoramic image projected in the ERP mode have a one-to-one correspondence with the spherical coordinate system and the rectangular coordinate system, respectively. Therefore, in the step S402A, for a panoramic image of the room, the cloud may determine the correspondence between the pixels of the ERP-projected panoramic image and the spherical coordinate system; then the cloud may determine the corresponding rectangular coordinate system according to this correspondence; finally, the cloud may divide the panoramic image of the room into a plurality of slice images based on the rectangular coordinate system, so that the projection mode (perspective projection) of each slice image is similar to the projection mode of the RGB image. The specific process can be as follows:
Assuming that the resolution of the panoramic image is width × height and taking any point on the panoramic image as an example with pixel coordinates (h, w), the correspondence between the pixel coordinates and the spherical coordinates on the panoramic image can satisfy the following formula one:

$$\theta = \frac{w}{\text{width}}\cdot 2\pi - \pi,\qquad \phi = \frac{\pi}{2} - \frac{h}{\text{height}}\cdot\pi \qquad \text{(formula one)}$$

Here θ, φ and the radius r are the spherical coordinates: assuming a point P in space, the distance from the origin O to P is the radius r, the angle between the line OP and the positive z-axis is θ, and the angle between the projection of OP onto the xy-plane and the positive x-axis is φ.
In formula one above, θ ∈ [−π, π] and φ ∈ [π/2, −π/2], and θ and φ vary in equal proportion with the pixel distance. Here r may default to an arbitrary constant.
It should be noted that, on this panoramic image, θ is continuously changed in one rotation, and the leftmost end and the rightmost end are connected, but Φ is discontinuous (the uppermost and lowermost correspond to different ends, respectively).
First, the cloud end can determine the spherical coordinates corresponding to the point according to the pixel coordinates of the point through the formula one (i.e. the corresponding relationship between the pixel coordinates on the panoramic image and the spherical coordinate system).
Then, the cloud end can determine the rectangular coordinate corresponding to the point according to the spherical coordinate corresponding to the point through the following formula II (namely, the corresponding relation between the spherical coordinate system and the rectangular coordinate system).
The corresponding relation between the spherical coordinate system and the rectangular coordinate system can satisfy the following formula II:

$$x = r\sin\theta\cos\phi,\qquad y = r\sin\theta\sin\phi,\qquad z = r\cos\theta \qquad \text{(formula II)}$$

where x represents the abscissa, y the ordinate, and z the coordinate along the z-axis in the rectangular coordinate system.
With reference to the above conversion manner for any point on the panoramic image, the cloud may convert all points on the panoramic image from the pixel coordinate system to the rectangular coordinate system to obtain the corresponding rectangular coordinates, which will not be described in detail herein.
Further, the cloud end can generate a plurality of slice images of the panoramic image based on rectangular coordinates corresponding to all points on the panoramic image. Taking a slice image (hereinafter referred to as a target slice image) of the panoramic image generated by the cloud as an example, the following may be specifically mentioned:
Assuming that the target slice is a slice map with a resolution of W × W (hfov = wfov), the conditions that the target slice should satisfy may include: i: the center point of the target slice is the origin of coordinates; ii: the right direction of the target slice corresponds to the positive direction of the x-axis of the rectangular coordinate system, and the downward direction of the target slice corresponds to the positive direction of the y-axis of the rectangular coordinate system; iii: the direction from the center of the panoramic image to the center point of the target slice is the positive direction of the z-axis of the rectangular coordinate system, and the spherical coordinates corresponding to the position of the center of the panoramic image are θ = 0, φ = 0.
The cloud maps the rectangular coordinates corresponding to all points of the panoramic image to the pixel coordinate system of the target slice through the following formula III, so that the corresponding pixel coordinates can be obtained:

$$u = f\cdot\frac{x}{z} + \frac{W}{2},\qquad v = f\cdot\frac{y}{z} + \frac{W}{2},\qquad f = \frac{W/2}{\tan(\mathrm{FoV}/2)} \qquad \text{(formula III)}$$
Through the formula I, the formula II and the formula III, the cloud end can realize pixel-by-pixel mapping of the panoramic image onto the target slice.
The cloud performs projection mapping on the panoramic image with the center points at different positions, so that a plurality of slice images of the panoramic image can be generated, and the method for generating the target slice image from the panoramic image can be specifically referred to and is not described herein.
Further, the size of each slice of the panoramic image is set close to the size of the RGB image, so that the field of view (FoV) of each slice is close to that of the RGB image; however, the FoV of each slice cannot exceed 90°. The total number n of slices of the panoramic image is related to the FoV of each slice and is set such that n > 360°/FoV, and the center positions of two adjacent slices differ by 360°/n, which ensures that all slices of the panoramic image cover one full circle of the equator of the spherical coordinates in the horizontal direction without omission.
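The following sketch shows, end to end, how a perspective slice can be cut from an equirectangular panorama in the spirit of formulas one to III: build the slice's pixel grid, turn pixels into rays, rotate the rays toward the desired slice center, convert the rays to spherical angles, and sample the panorama. The longitude/latitude convention, nearest-neighbour sampling, and all names are simplifying assumptions and may differ in detail from the formulas above.

```python
import numpy as np

def panorama_to_slice(pano, W, fov_deg, center_theta, center_phi):
    """Build a W x W perspective slice whose optical axis points at the
    direction (center_theta, center_phi), in degrees, of an equirectangular
    panorama `pano` of shape (height, width, 3)."""
    height, width = pano.shape[:2]
    f = (W / 2) / np.tan(np.radians(fov_deg) / 2)          # pinhole focal length in pixels

    # pixel grid of the slice -> rays in the slice camera frame (+z forward)
    u, v = np.meshgrid(np.arange(W), np.arange(W))
    x, y, z = (u - W / 2) / f, (v - W / 2) / f, np.ones((W, W))

    # rotate rays so that +z points at (center_theta, center_phi)
    ct, cp = np.radians(center_theta), np.radians(center_phi)
    R_phi = np.array([[1, 0, 0], [0, np.cos(cp), -np.sin(cp)], [0, np.sin(cp), np.cos(cp)]])
    R_theta = np.array([[np.cos(ct), 0, np.sin(ct)], [0, 1, 0], [-np.sin(ct), 0, np.cos(ct)]])
    rays = R_theta @ R_phi @ np.stack([x, y, z]).reshape(3, -1)

    # rays -> spherical angles -> panorama pixel coordinates (nearest neighbour)
    theta = np.arctan2(rays[0], rays[2])                         # horizontal angle
    phi = np.arcsin(rays[1] / np.linalg.norm(rays, axis=0))      # vertical angle (down positive)
    w_pix = ((theta + np.pi) / (2 * np.pi) * width).astype(int) % width
    h_pix = np.clip((phi + np.pi / 2) / np.pi * height, 0, height - 1).astype(int)
    return pano[h_pix, w_pix].reshape(W, W, -1)
```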
S403A: the cloud end obtains the pose and the sparse point cloud of the panoramic slice image based on the pose and the sparse point cloud of the RGB image and the plurality of slice images of the panoramic image.
The cloud end can obtain the pose and sparse point cloud of the panoramic slice based on the pose and sparse point cloud of the RGB image obtained in the step S401A and the plurality of slice images of the panoramic image obtained in the step S402A.
Referring to fig. 4C, the method specifically includes the following steps:
(a) Feature extraction.
(b) Feature matching.
(c) Geometric verification.
The cloud performs feature extraction, feature matching and geometric verification on each slice of the panoramic image, which can refer to Step3 in the Step S401A, and will not be described in detail here.
After geometric verification, the cloud can obtain accurate matching point pairs among the plurality of slice images.
(d) Image registration (the process of image registration may be referred to as registration).
Based on the accurate matching point pairs of the plurality of slice images, the cloud may register the plurality of slice images to the poses and the point cloud of the set of RGB images obtained in the step S401A by using a random sample consensus algorithm and the perspective-three-point (P3P) imaging principle, so as to calculate a set of poses corresponding to the plurality of slice images.
(e) Triangularization.
The cloud end can determine a plurality of pairs of slice images with a common matching relationship from the plurality of slice images based on the obtained accurate matching pair among the plurality of slice images and a group of poses corresponding to the plurality of slice images (each pair of slice images with the common matching relationship at least comprises two slice images); and then the cloud can triangulate the slice diagram pairs with the common matching relationship by adopting a triangulation principle, so as to generate corresponding 3D points. The specific process of the triangularization process can be implemented according to existing triangularization implementations, and will not be described in detail herein.
Through the step, the cloud can generate some 3D points based on a plurality of slice diagram pairs with common matching relation, and the 3D points can form sparse point cloud.
(f) Bundle adjustment.
In this step, the cloud may minimize the re-projection error to optimize the group of poses corresponding to the plurality of slice images and the 3D points generated by the triangulation. The cloud can perform this with reference to the bundle adjustment in step 3 of step S401A, so as to obtain the final optimized poses of the plurality of slice images and the final optimized 3D points; the final optimized 3D points can form the sparse point cloud of the room.
The cloud may use the final optimal pose of the plurality of slice images and the final optimized 3D points (the 3D points may constitute a sparse point cloud) to further execute step S404A described below.
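Bundle adjustment jointly optimizes all slice poses and 3D points by minimizing the re-projection error; the simplified sketch below only refines a single slice pose against fixed 3D points, to illustrate the residual that is minimized. The SciPy-based formulation and all inputs are assumptions made for illustration.

import cv2
import numpy as np
from scipy.optimize import least_squares

def refine_pose(rvec0, tvec0, pts3d, pts2d, K):
    # Refine one slice pose by minimizing the pixel re-projection error of its 2D-3D matches.
    def residuals(x):
        rvec, tvec = x[:3], x[3:]
        proj, _ = cv2.projectPoints(pts3d, rvec, tvec, K, None)
        return (proj.reshape(-1, 2) - pts2d).ravel()

    x0 = np.hstack([np.ravel(rvec0), np.ravel(tvec0)])
    sol = least_squares(residuals, x0, method="lm")   # Levenberg-Marquardt
    return sol.x[:3], sol.x[3:]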
S404A: and the cloud end obtains a dense point cloud of the room based on the pose of the slice diagram of the panoramic image and the sparse point cloud.
The cloud may perform stereo matching based on the group of poses and the sparse point cloud (i.e., the 3D points) of the plurality of slice images of the panoramic image obtained in step S403A, so as to obtain a depth map corresponding to each image frame (each image frame may correspond to one slice image), i.e., obtain a plurality of depth maps; the cloud may then filter and fuse the plurality of depth maps using the multi-view stereo (MVS) principle, thereby obtaining the final dense point cloud of the room. How the cloud specifically filters and fuses the plurality of depth maps using the MVS principle can be implemented with reference to existing MVS technical schemes, and is not described in detail here.
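As a rough illustration of how the per-frame depth maps are turned into one dense point cloud, the following sketch back-projects each depth map with the corresponding slice pose and concatenates the results; the photometric and geometric consistency filtering of a full MVS pipeline is intentionally omitted, and the camera convention x_cam = R·x_world + t is an assumption of the example.

import numpy as np

def backproject(depth, K, R, t):
    # Lift one depth map (H x W, in metres) into world-frame 3D points.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    pix = np.stack([u[valid], v[valid], np.ones(valid.sum())])   # homogeneous pixels
    cam = np.linalg.inv(K) @ pix * depth[valid]                  # camera-frame points
    return (R.T @ (cam - t.reshape(3, 1))).T                     # world-frame points, Nx3

def fuse(depth_maps, poses, K):
    # Naive fusion: concatenate the back-projected points of all depth maps.
    return np.vstack([backproject(d, K, R, t) for d, (R, t) in zip(depth_maps, poses)])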
The steps S401A-S404A are described by taking a room as an example, and other rooms in the house where the room is located can refer to the steps S401A-S404A to obtain the pose of the slice image of the panoramic image of the room and the dense point cloud of the room, which are not described in detail herein.
According to the second embodiment, the cloud can obtain the pose and sparse point cloud of the slice images of the panoramic image based on the RGB images of the room or the panoramic image of the room; further, based on the pose and sparse point cloud of the slice images of the panoramic image, the cloud can effectively and accurately obtain the dense point cloud of the room, and the pose of the slice images and the dense point cloud can be used in subsequent steps to adjust the scale of the initial architecture model of the corresponding room.
Embodiment III
The third embodiment is mainly described in detail in step S201A shown in fig. 2A, that is, how to obtain an initial architecture model of at least one room. As shown in fig. 5A, taking a room in a house (corresponding to the ith room in the above-mentioned scheme of the present application) as an example, the process of obtaining the initial architecture model of the room by the cloud end is as follows:
S501A: the cloud end detects the door position of a room based on the panoramic image of the room, and determines the position of the door in the panoramic image.
The cloud end can detect the door position of the panoramic image of the room through the Mask R-CNN deep learning network, and the door position of the room is determined.
For example, the cloud may determine the door position of the room through the Mask R-CNN deep learning network; referring to fig. 5B, the process may specifically be as follows:
First, the cloud can input the panoramic image of the room into a feature pyramid network (FPN), which extracts the multi-scale features of the panoramic image; these multi-scale features can be used to detect target objects (namely, the doors of the room) at different scales. The multi-scale features are then input from the FPN into the following two branches, respectively:
branch 1: the multi-scale features are input into a regional suggestion network (region proposal network, RPN), and after the RPN processing, the approximate existence region of the target object (gate), namely the region of interest ROI, can be primarily identified, and then input into the region of interest alignment pooling layer of branch 2 described below.
Branch 2: the multi-scale features obtained by FPN described above are input into a region of interest alignment pooling layer (ROI alignment).
The multi-scale features of the region of interest ROI obtained by the RPN of the branch 1 and the multi-scale features of the branch 2 are input into a region of interest alignment pooling layer (ROI alignment) for merging, and the region of interest alignment pooling layer may output the merged features after merging, aligning and pooling based on the multi-scale features obtained by the FPN and the region of interest ROI obtained by the RPN.
Further, the fusion characteristics output by the region of interest alignment pooling layer are respectively input into the following two branches:
Branch 1: the fusion feature is input to a Full Connection (FC), and the category and bounding box of the target object (i.e., door) can be detected after FC. In this embodiment of the present application, the class of the target object only retains the term "gate", and the bounding box refers to a rectangle that is smallest and may contain the target object, and whose two sides are parallel to the horizontal and vertical directions of the image, respectively.
Branch 2: the fusion feature is input to a convolutional layer (CONV) from which a mask (mask) corresponding to the gate can be obtained.
Through the above, the cloud can detect the door position in the panoramic image of the room through the Mask R-CNN deep learning network, so that the position of the door in the panoramic image can be effectively determined.
For example, fig. 5C shows an effect diagram of the cloud end detecting the door position based on the deep learning network Mask R-CNN, which inputs the panoramic image of the room as shown in (a) of fig. 5C, and outputs the detection result diagram of the door position of the room as shown in (b) of fig. 5C after passing through the deep learning network Mask R-CNN.
In addition, the cloud end can also use the deep learning network Mask R-CNN to detect the position of the room window in the panoramic image so as to determine the position of the room window in the panoramic image, and the detection of the door position of the room can be specifically referred to, which is not described herein again.
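A minimal Python sketch of running a Mask R-CNN detector on the panoramic image with torchvision is given below. Note that the publicly available COCO-pretrained weights do not contain a "door" class, so a model fine-tuned on door (and window) annotations is assumed; the class id and score threshold in the example are likewise assumptions.

import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Assumed: the detector has been fine-tuned so that door_class_id corresponds to "door".
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_doors(panorama_rgb, door_class_id=1, score_thr=0.7):
    # Return the bounding boxes and masks of the detected doors in the panoramic image.
    with torch.no_grad():
        pred = model([to_tensor(panorama_rgb)])[0]   # dict with boxes, labels, scores, masks
    keep = (pred["labels"] == door_class_id) & (pred["scores"] > score_thr)
    return pred["boxes"][keep], pred["masks"][keep]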
S502A: the cloud end detects the structure of the room based on the panoramic image of the room, and determines the structure of the room on the panoramic image.
The cloud can detect the wall corner points of the panoramic image of the room by using a deep learning network (e.g., HoHoNet), so as to obtain all the corner points of the room; further, the cloud can determine the structure of the room based on all the corner points of the room.
For example, the cloud may detect the corner points of the room through the deep learning network (e.g., HoHoNet); referring to fig. 5D, the process may specifically be as follows:
First, the cloud can input the panoramic image of the room into a residual neural network (ResNet), which extracts and outputs the feature pyramid of the panoramic image. Then, the feature pyramid of the panoramic image is input into a plurality of efficient height compression blocks (Efficient Height Compression Block, EHC Block), which compress and fuse the feature pyramid; after refinement and optimization through a multi-head self-attention (MHSA) structure, the potential horizontal features (latent horizontal feature, LHFeat) are output. Finally, the cloud may output the detection result, i.e., the pixel coordinates of each corner point, by applying a one-dimensional convolution (Conv1D) and an inverse discrete cosine transform (IDCT) to the latent horizontal features.
Illustratively, fig. 5E shows an effect diagram of the cloud detecting the wall corner points using the deep learning network (HoHoNet): the input is the panoramic image of the room as shown in fig. 5E (a), and the output after passing through the deep learning network (HoHoNet) is the detection result diagram of the wall corner points of the room as shown in fig. 5E (b).
S503A: the cloud generates an initial architecture model of the room based on the location of the door on the panoramic image of the room and the structure of the room.
The cloud may take the door position in the panoramic image obtained in step S501A and the wall corner points of the room in the panoramic image obtained in step S502A, and project them into the spherical coordinate system in an ERP projection manner; then, for each wall or each door, the cloud uses its four vertices as the vertices of a panel and generates two large triangular panels, i.e., one door or one wall of the room can be formed by two large triangular panels. Finally, the cloud may combine all the walls and doors to obtain the initial architecture model of the room.
For example, as shown in fig. 5F, the cloud may generate the initial architecture model of the room through the generating module: the input is the door position in the panoramic image detected in step S501A and the corner points of the room in the panoramic image detected in step S502A (the structure of the room may be determined based on the corner points), and after processing by the generating module (i.e., projection back to the spherical coordinate system and mesh generation), the output is the initial architecture model of the room (the actual output is a mesh used to describe or represent the initial architecture model of the room).
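The mesh generation described above can be illustrated with the following sketch: each wall or door, defined by its four 3D vertices, is split into two triangular faces, and the per-panel meshes are concatenated into the initial architecture model. The vertex ordering (around the quad) and the input format are assumptions of the example.

import numpy as np

def quad_to_triangles(corners):
    # corners: 4x3 array of a wall's (or door's) vertices ordered around the quad.
    # Each wall or door is represented by two large triangular faces.
    faces = np.array([[0, 1, 2],
                      [0, 2, 3]])
    return np.asarray(corners, dtype=float), faces

def build_initial_model(quads):
    # Concatenate the two-triangle meshes of every wall and door of the room.
    verts, faces = [], []
    for quad in quads:
        offset = 4 * len(verts)            # four vertices stored per previous quad
        v, f = quad_to_triangles(quad)
        verts.append(v)
        faces.append(f + offset)
    return np.vstack(verts), np.vstack(faces)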
The steps S501A-S503A are described by taking a room as an example, and other rooms in the house where the room is located can refer to the steps S501A-S503A to obtain an initial architecture model, which is not described herein in detail.
According to the third embodiment, the cloud end can effectively and accurately obtain the initial architecture model of the corresponding room based on the panoramic image of each room, the initial architecture model is used for being further adjusted in the subsequent step, and finally the accurate standard architecture model of each room can be obtained.
Fourth embodiment
In the fourth embodiment, the details of how to scale-optimize the initial architecture model map of each room based on the information of the dense point cloud of each room in step S202A shown in fig. 2A, so as to obtain the first architecture model map of each room are mainly described. As shown in fig. 6, taking a room in a house (corresponding to the ith room in the above-mentioned scheme of the present application) as an example, the process of obtaining the first architecture model of the room by the cloud end is as follows:
S601: and the cloud end removes the point cloud of the top ground from the dense point cloud of the room according to the information of the dense point cloud of the room, and the point cloud after the top ground is removed is obtained.
By way of example, with the above-described second embodiment, the cloud may obtain the information of the dense point cloud of the room, which includes the information of the plurality of point clouds since the dense point cloud of the room is constituted of the plurality of point clouds (i.e., the plurality of 3D points). The cloud may remove the point clouds of the top surface and the ground (collectively referred to as the top ground) from the dense point clouds of the room according to the information of the plurality of point clouds, which may specifically be as follows:
Firstly, the cloud can calculate the normal vector of each point cloud according to the information of each point cloud; then calculating an included angle between the normal vector of each point cloud and the xy plane, and obtaining a normal vector included angle corresponding to each point cloud; and finally, filtering (removing) the point cloud with the normal vector included angle exceeding 15 degrees from the dense point cloud of the room, so that the point cloud with the top and the ground removed can be obtained.
The step S601 may be equivalent to performing top-ground segmentation by the cloud to obtain a point cloud with the top surface and the ground removed.
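A sketch of this top-ground segmentation with Open3D is shown below; the normal-estimation radius and neighbour count are assumed parameters, while the 15-degree threshold follows the description above.

import numpy as np
import open3d as o3d

def remove_top_and_ground(points):
    # Keep only points whose normal is within 15 degrees of the xy plane
    # (near-horizontal normals, i.e. the walls); ceiling and floor points are removed.
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))
    pcd.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
    n = np.asarray(pcd.normals)
    # Angle between a unit normal and the xy plane equals |arcsin(nz)|.
    angle = np.degrees(np.abs(np.arcsin(np.clip(n[:, 2], -1.0, 1.0))))
    return points[angle <= 15.0]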
S602: and the cloud performs semantic segmentation on the point cloud with the top ground removed to obtain the point cloud of the wall surface.
The cloud can perform semantic segmentation on the point cloud with the top surface and the ground removed through an open-source neural network (RandLA-Net), so as to remove the point clouds other than the wall surfaces and retain only the point cloud of the wall surfaces of the room; this can be implemented with reference to the existing scheme and is not described in detail here.
S603: the cloud end performs wall surface matching based on the point cloud of the wall surface of the room.
For each wall in the initial architecture model of the room, the cloud performs wall matching based on the point cloud of the wall, which may be specifically as follows:
First, the cloud calculates the normal vector of the wall surface, with the direction of the normal vector facing the inside of the room. Then, the cloud can perform plane detection on the point cloud of the wall surface by using the random sample consensus (RANSAC) method, with the normal vector of the wall surface used as a normal-vector constraint for the RANSAC plane detection, so that a plurality of candidate planes corresponding to the wall surface can be obtained. Next, the cloud can calculate the loss of each plane through the following loss calculation formula IV, and select the plane with the smallest loss from the planes as the matching wall surface of the wall surface (i.e., the dense point cloud wall surface).
Loss = plane distance / plane coverage        (Formula IV)
Through the step, the cloud end can obtain the matched wall surface (the matched wall surface can be regarded as a dense point cloud wall surface) of each wall surface in the initial architecture model of the room, and one wall surface in the initial architecture model of the room and the corresponding matched wall surface can form a wall surface matched pair. Therefore, in step S604, the cloud end may perform scale optimization on all the walls in the initial architecture model of the room by using the wall matching pair.
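The wall matching step can be sketched as repeated plane detection on the wall point cloud followed by the Formula IV loss. The sketch below uses Open3D's RANSAC plane segmentation as a stand-in (it does not accept the model wall's normal as a constraint, so the normal check is applied afterwards), interprets the "plane distance" term as the mean point-to-plane distance of the inliers, and uses assumed threshold values; it is therefore only one possible reading of this step.

import numpy as np
import open3d as o3d

def match_wall(wall_points, model_normal, n_candidates=5, angle_thr_deg=15.0):
    # Detect candidate planes in the wall point cloud and keep the one with the
    # smallest loss = plane distance / plane coverage (Formula IV).
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(wall_points))
    model_normal = model_normal / np.linalg.norm(model_normal)
    best, best_loss = None, np.inf
    for _ in range(n_candidates):
        if len(pcd.points) < 50:
            break
        (a, b, c, d), inliers = pcd.segment_plane(
            distance_threshold=0.02, ransac_n=3, num_iterations=1000)
        normal = np.array([a, b, c])
        # Normal-vector constraint taken from the wall of the initial architecture model.
        cos = np.clip(abs(normal @ model_normal), -1.0, 1.0)
        if np.degrees(np.arccos(cos)) <= angle_thr_deg:
            pts = np.asarray(pcd.points)[inliers]
            dist = np.mean(np.abs(pts @ normal + d))       # "plane distance" term
            coverage = len(inliers) / len(wall_points)     # "plane coverage" term
            loss = dist / coverage
            if loss < best_loss:
                best, best_loss = (normal, d, pts), loss
        pcd = pcd.select_by_index(inliers, invert=True)    # look for the next candidate plane
    return best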
S604: the cloud performs scale optimization on the wall surfaces in the initial architecture model of the room.
Taking a wall surface in the initial architecture model of the room as an example, the cloud end performs a scale optimization process on the wall surface as follows:
First, the cloud can express the wall plane corresponding to the wall surface by the following wall plane formula.
If the moving direction of the wall surface is restricted to the positive and negative directions of its normal vector, and the normal vector is denoted as (nx, ny, nz), the wall plane satisfies:
nx·x + ny·y + nz·z + d = 0;
where (nx, ny, nz) is the normal vector of the wall surface and d is the plane offset to be solved.
Then, based on the matching wall surface (i.e., the dense point cloud wall surface) corresponding to the wall surface obtained in step S603, the cloud can calculate the distances between all points of the dense point cloud wall surface and the wall plane, and solve for the value of d in the wall plane formula that minimizes these distances. Finally, the cloud can obtain, from the wall plane formula, a new wall plane corresponding to the wall surface; the new wall plane is the new wall surface after the scale optimization.
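Because the wall surface is only allowed to move along its normal direction, minimizing the summed squared distances of the matched dense-point-cloud wall to the plane nx·x + ny·y + nz·z + d = 0 has a closed-form solution for d; the following short sketch states it under that least-squares reading, which is an assumption of the example.

import numpy as np

def optimize_wall_offset(normal, dense_wall_points):
    # Solve for d in n . p + d = 0 so that the summed squared distance of the
    # matched dense point cloud wall to the plane is minimal: d = -mean(n . p).
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)                   # unit normal of the model wall surface
    d = -np.mean(dense_wall_points @ n)         # least-squares optimal offset
    return n, d                                 # new wall plane after scale optimization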
With reference to the above process, the cloud can perform scale optimization on all the wall surfaces in the initial architecture model of the room and obtain the optimized new wall surface corresponding to each wall surface. Furthermore, the cloud can take the new wall surfaces after the scale optimization and calculate the intersection line between each new wall surface and its adjacent new wall surface. Based on all the new wall surfaces and the intersection lines between them, the cloud can finally generate the scale-optimized architecture model of the room (i.e., the first architecture model of the room).
The steps S601-S604 are described by taking a room as an example, and other rooms in the house where the room is located can refer to the steps S601-S604 to obtain an architecture model (i.e., a first architecture model) of the room after the dimension optimization, which is not described herein in detail.
The accuracy of determining the dimensions and position information of the room by using the 3D dense point clouds of the room is generally high, so by using the fourth embodiment, the cloud can accurately optimize the dimensions of the initial architecture model of the corresponding room by using the 3D dense point clouds of each room, thereby ensuring the accuracy of the dimensions of the architecture model of each room.
Fifth embodiment
In the fifth embodiment, a detailed description is mainly made of how to obtain a standard architecture model diagram of each room based on the first architecture model diagram of each room (i.e., the architecture model diagram of the room after the scale optimization) in step S203A shown in fig. 2A. As shown in fig. 7A, taking a room in a house (corresponding to the ith room in the above-mentioned scheme of the present application) as an example, the process of obtaining a standard architecture model of the room by the cloud end is as follows:
S701A: and the cloud performs gravity adjustment on the architecture model of the room to obtain all wall surface projection data.
For the room, the cloud may acquire multiple architectural models generated by multiple frames of panoramic images of the room (i.e., one frame of panoramic image corresponds to generate one architectural model of the room).
The multi-frame panoramic image of the room may be a plurality of panoramic images obtained by shooting the room multiple times with a mobile phone (each shot corresponds to one panoramic image of the room). When the user shoots the room multiple times with the mobile phone, the position of the mobile phone may change slightly between shots, so the multi-frame panoramic image can be understood as panoramic images taken at a plurality of different positions. The cloud generates the corresponding architecture model (i.e., the first architecture model) of the room based on each frame of panoramic image (i.e., one panoramic image), which can be implemented with reference to the steps in the second to fourth embodiments and is not described in detail here.
In the step S701A, the cloud performing gravity adjustment on the architecture model of the room may refer to that the cloud performs gravity adjustment on the first architecture model corresponding to the multi-frame panoramic image of the room, respectively.
Taking an architecture model of the room corresponding to a panoramic image as an example, the cloud end performs gravity adjustment on the architecture model of the room, and the steps may be as follows:
First, the cloud calculates the normal vector of the ground in the architecture model of the room, then calculates the included angle θ between the normal vector of the ground and the z-axis, and further calculates the rotation axis ra of the architecture model of the room from the cross product of the normal vector of the ground and the z-axis. Next, after rotating the architecture model of the room by the angle θ around the rotation axis ra, the gravity-aligned architecture model of the room is obtained. Finally, the cloud projects the gravity-aligned architecture model of the room onto the xy plane, so that the wall projection data of the architecture model of the room (i.e., the projection data of all walls of the room) can be obtained.
Fig. 7B shows a schematic view of the effect of gravity adjustment: before adjustment, the room top surface is at an angle to the xy plane, as shown in fig. 7B (a); after adjustment, the room top surface is parallel to the xy plane, as shown in fig. 7B (b).
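The gravity adjustment above can be sketched as follows: the included angle θ between the ground normal and the z axis is computed, the rotation axis ra is the cross product of the two, and the model is rotated by θ about ra before its walls are projected to the xy plane. The SciPy rotation utility and the input format are assumptions of the example.

import numpy as np
from scipy.spatial.transform import Rotation

def gravity_align(vertices, floor_normal):
    # vertices: Nx3 vertices of the room architecture model; floor_normal: ground normal.
    n = floor_normal / np.linalg.norm(floor_normal)
    z = np.array([0.0, 0.0, 1.0])
    theta = np.arccos(np.clip(n @ z, -1.0, 1.0))    # included angle with the z axis
    axis = np.cross(n, z)
    if np.linalg.norm(axis) < 1e-8:                 # already gravity-aligned
        return vertices, vertices[:, :2]
    axis = axis / np.linalg.norm(axis)
    aligned = Rotation.from_rotvec(theta * axis).apply(vertices)   # rotate by theta about ra
    return aligned, aligned[:, :2]                  # model and its wall projection data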
Through the above steps, the cloud end can obtain wall projection data corresponding to multiple frames of the room based on the architecture model of the room generated by the multiple frames of panoramic images of the room, and the wall projection data corresponding to the multiple frames can be used in step S702A described below.
S702A: and the cloud end performs edge aggregation based on all wall surface projection data to obtain all edges of the room after aggregation.
The cloud end performs edge aggregation based on the wall projection data corresponding to the multiple frames, so that all edges of the room after aggregation can be obtained.
Because the wall projection data corresponding to each frame includes the projection data of all the walls of the room, for the edge of any wall of the room, such as wall j, the cloud can aggregate the similarly distributed data based on the projection data of wall j corresponding to the plurality of frames, so that each aggregated edge of wall j can be obtained.
It should be noted that, when the cloud performs edge aggregation based on the wall projection data corresponding to the multiple frames, the door position can also be taken into account; in addition, walls from different frames are aggregated only if the included angle between their normal vectors is no more than 15 degrees and their distance is no more than 1 m.
S703A: and the cloud end respectively fuses all the aggregated edges to generate a standard architecture model of the room.
Illustratively, when the cloud performs this step S703A, the specific steps may be as follows:
First, the cloud extends each aggregated edge and calculates the intersection points between adjacent edges. Then, taking the longest edge as the standard edge, Manhattan correction is performed on its adjacent edges: if the included angle between an adjacent edge and the standard edge is not 90 degrees, the adjacent edge is rotated so that the angle with the standard edge is adjusted to 90 degrees. In the same way, the cloud adjusts the edges one by one until it loops back to the longest edge.
Then the cloud calculates the height of the architecture model of the room corresponding to each of the multiple frames, and calculates the average height h of these architecture models. Based on the wall projection (2D) data of the corrected architecture model of the room, the cloud can then generate the corresponding top surface and ground at the heights z = h and z = 0, respectively, and gradually generate the other wall surfaces of the room. Finally, the cloud combines all the wall surfaces, the top surface and the ground into a 3D architecture model of the room (i.e., the standard architecture model of the room).
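As an illustration of the last part of this step, the sketch below extrudes the corrected 2D wall projection (an ordered corner polygon) into wall quads between the ground z = 0 and the top surface z = h; representing the projection as an ordered polygon is an assumption made for the example.

import numpy as np

def extrude_room(corners_2d, h):
    # corners_2d: ordered Nx2 polygon of the corrected wall projection;
    # h: average height of the architecture models of the room.
    walls = []
    n = len(corners_2d)
    for i in range(n):
        a, b = corners_2d[i], corners_2d[(i + 1) % n]
        walls.append(np.array([[a[0], a[1], 0.0],      # bottom edge on the ground (z = 0)
                               [b[0], b[1], 0.0],
                               [b[0], b[1], h],        # top edge on the top surface (z = h)
                               [a[0], a[1], h]]))
    return walls                                       # one 4x3 quad per wall surface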
It should be noted that, the above steps S701A-S703A are described by taking a room as an example, and other rooms in the house where the room is located can be executed with reference to the above steps S701A-S703A, which is not described herein in detail.
Because the architecture model of the room after the scale optimization may have a problem of structural deficiency, according to the fifth embodiment, the cloud end can perform fusion processing on the architecture model of the room generated by shooting panoramic images at a plurality of different positions for each room, so that the accurate architecture model of each room can be effectively obtained.
Embodiment six
The sixth embodiment is mainly described in detail with respect to step S204A shown in fig. 2A, namely, how to generate a multi-room family pattern diagram based on the standard architecture model of the at least one room. As shown in fig. 8, the process of obtaining the house type map of the room by the cloud end is as follows:
S801: the cloud performs deduplication on the standard architecture models of all rooms.
Through the above embodiments, the cloud can obtain the standard architecture models of all the rooms (i.e., the at least one room). Because the cloud may obtain a plurality of standard architecture models when processing each room, this step removes the repeated standard architecture models of each room and retains one standard architecture model per room. This avoids the problem that a room has a plurality of standard architecture model diagrams, which would cause image overlap or errors when the room house type graph is finally generated, and it also reduces the computation workload.
S802: the cloud calculates adjacency relations among the rooms based on the standard architecture models of all the rooms.
The cloud may determine the adjacency between all rooms based on the distances between all rooms, and may be implemented specifically with reference to a method for calculating the adjacency between all rooms in the prior art, which will not be described in detail herein.
S803: and the cloud end sequentially splices adjacent rooms according to the adjacent relation among all the rooms to obtain a multi-room house type graph.
The cloud can sequentially arrange and splice adjacent rooms according to the adjacent relation among all the rooms to obtain a final multi-room family pattern of the house.
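A simplified sketch of the adjacency computation is given below: two rooms are treated as adjacent when the minimum distance between their wall vertices is below a threshold. The vertex-distance criterion and the 0.5 m threshold are assumptions of the example and not the exact adjacency computation of this application; the subsequent splicing then places adjacent rooms next to each other according to these adjacency pairs.

import numpy as np

def room_adjacency(room_models, threshold=0.5):
    # room_models: list of Nx3 vertex arrays of the standard architecture models.
    # Returns the index pairs of rooms considered adjacent.
    pairs = []
    for i in range(len(room_models)):
        for j in range(i + 1, len(room_models)):
            d = np.min(np.linalg.norm(
                room_models[i][:, None, :] - room_models[j][None, :, :], axis=-1))
            if d < threshold:
                pairs.append((i, j))
    return pairs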
Through the sixth embodiment, the cloud can accurately arrange and splice the standard architecture models of all the rooms of the house, so that a complete house model of the house, i.e., the room house type graph, can be effectively obtained.
In the embodiment of the present application, the method shown in each embodiment may be executed by the processing module of the cloud end in a unified manner, or may be executed by a separate module of the cloud end, which is not limited in particular.
In the embodiment provided by the application, the method provided by the embodiment of the application is introduced from the interaction angle among the devices. In order to implement the functions in the method provided by the embodiment of the present application, the generating device of the house type graph (may also be a generating device of the cloud end) may include a hardware structure and/or a software module, and implement the functions in the form of a hardware structure, a software module, or a combination of a hardware structure and a software module. Some of the functions described above are performed in a hardware configuration, a software module, or a combination of hardware and software modules, depending on the specific application of the solution and design constraints.
The division of the modules in the embodiment of the application is schematic, only one logic function is divided, and other division modes can be adopted in actual implementation. In addition, each functional module in the embodiments of the present application may be integrated in one processor, or may exist alone physically, or two or more modules may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules.
As shown in fig. 9, the embodiment of the present application further provides a device 900 for generating a family pattern for implementing the cloud execution function in the above method. The generating means of the house pattern graph may be, for example, a software module or a system-on-chip. In the embodiment of the application, the chip system can be formed by a chip, and can also comprise the chip and other discrete devices. The apparatus 900 for generating a house type graph may include: a communication unit 901 and a processing unit 902.
In the embodiment of the present application, the communication unit 901 may also be referred to as a transceiver unit, and may include a transmitting unit and/or a receiving unit, which are configured to perform the steps of transmitting and receiving the generating device of the family pattern in the foregoing method embodiment, respectively. The processing unit 902 may be configured to read the instructions and/or data in the storage module, so that the apparatus 900 for generating a family pattern implements the foregoing method embodiments.
Optionally, the apparatus 900 for generating a family pattern may further include a storage unit 903, where the storage unit 903 corresponds to a storage module and may be used to store instructions and/or data.
The device for generating a house type map according to the embodiment of the present application is described in detail below with reference to fig. 9 to 10. It should be understood that the descriptions of the apparatus embodiments and the descriptions of the method embodiments correspond to each other, and thus, descriptions of details not described may be referred to the above method embodiments, which are not repeated herein for brevity.
The communication unit 901 may also be referred to as a transceiver, transceiving means, etc. The processing unit may also be called a processor, a processing board, a processing module, a processing device, etc. Alternatively, the device for realizing the receiving function in the communication unit 901 may be regarded as a receiving unit, and the device for realizing the transmitting function in the communication unit 901 may be regarded as a transmitting unit, i.e., the communication unit 901 includes a receiving unit and a transmitting unit. The communication unit may also be referred to as a transceiver, transceiver circuitry, or the like. The receiving unit may also be referred to as a receiver, or receiving circuit, among others. The transmitting unit may also sometimes be referred to as a transmitter, or a transmitting circuit, etc.
When the apparatus 900 for generating a house pattern executes the function of the main body in the flow shown in fig. 2A in the above embodiment:
A communication unit 901, configured to acquire information of dense point clouds of at least one room and an initial architecture model diagram of the at least one room; wherein the information of the dense point cloud of one room is applied to the scale optimization of the initial architecture model diagram of one room, the information of the dense point cloud of one room is a dense three-dimensional point set used for representing the structure of the one room, and the information of the dense point cloud of one room is obtained by utilizing the image and inertia information of the one room; the processing unit 902 is configured to scale-optimize the initial architecture model graph of each room based on the information of the dense point cloud of each room, to obtain a first architecture model graph of each room; obtaining a standard architecture model diagram of each room based on the first architecture model diagram of each room; and generating a room type graph based on the standard architecture model graph corresponding to the at least one room respectively.
The foregoing is merely an example, and the processing unit 902 and the communication unit 901 may perform other functions, and a more detailed description may refer to a related description in the method embodiment shown in fig. 2A, which is not repeated herein.
As shown in fig. 10, which illustrates an apparatus 1000 provided by the embodiment of the present application, the apparatus for generating a family pattern shown in fig. 10 may be an implementation manner of a hardware circuit of the apparatus for generating a family pattern shown in fig. 9. The device for generating the house type graph can be applied to the flow chart shown in the foregoing, and the function of the device for generating the house type graph (for example, cloud) in the method embodiment is executed. For convenience of explanation, fig. 10 shows only the main components of the house pattern generation apparatus.
As shown in fig. 10, the apparatus 1000 for generating a house pattern includes a processor 1002 and a communication interface 1001. The processor 1002 and the communication interface 1001 are coupled to each other. It is understood that the communication interface 1001 may be a transceiver or an input/output interface, or may be an interface circuit such as a transceiver circuit. Optionally, the apparatus 1000 for generating a house pattern may further include a memory 1003 for storing instructions executed by the processor 1002 or storing input data required for the processor 1002 to execute the instructions or storing data generated after the processor 1002 executes the instructions.
When the apparatus 1000 for generating a house-hold diagram is used to implement the method shown in fig. 2A, the processor 1002 is configured to implement the functions of the processing unit 902, and the communication interface 1001 is configured to implement the functions of the communication unit 901.
The specific connection medium between the communication interface 1001, the processor 1002, and the memory 1003 is not limited in the embodiment of the present application. In the embodiment of the present application, the memory 1003, the processor 1002 and the communication interface 1001 are connected by a communication bus 1004 in fig. 10, and the communication bus 1004 is shown by a thick line in fig. 10, and the connection manner between other components is merely illustrative and not limited thereto. The communication bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 10, but not only one bus or one type of bus.
When the communication device is a chip, fig. 11 shows a simplified chip structure, and the chip 1100 includes an interface circuit 1101 and one or more processors 1102. Optionally, the chip 1100 may also include a bus.
Wherein:
The processor 1102 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the communication method described above may be performed by integrated logic circuitry of hardware in the processor 1102 or by instructions in the form of software. The processor 1102 may be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods and steps disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The interface circuit 1101 may be used for transmitting or receiving data, instructions or information, and the processor 1102 may process using the data, instructions or other information received by the interface circuit 1101 and may transmit processing completion information through the interface circuit 1101.
Optionally, the chip further includes a memory 1103, which memory 1103 may include read only memory and random access memory, and provide operating instructions and data to the processor. A portion of the memory 1103 may also include non-volatile random access memory (NVRAM).
Optionally, the memory stores executable software modules or data structures and the processor may perform corresponding operations by invoking operational instructions stored in the memory (which may be stored in an operating system).
Optionally, the chip may be used in the generating device (may also be the generating device of the cloud) of the family pattern diagram related in the embodiment of the present application. Optionally, the interface circuit 1101 may be configured to output the execution result of the processor 1102. The method for generating the house type graph according to one or more embodiments of the present application may refer to the foregoing embodiments, and will not be described herein.
It should be noted that, the functions corresponding to the interface circuit 1101 and the processor 1102 may be implemented by a hardware design, a software design, or a combination of hardware and software, which is not limited herein.
The embodiment of the present application further provides a computer readable storage medium, on which computer instructions for implementing the method performed by the first communication device in the above method embodiment are stored, and/or on which computer instructions for implementing the method performed by the generating device (which may also be the generating device of the cloud) of the room type graph in the above method embodiment are stored.
For example, when the computer program is executed by a computer, the computer may implement a method executed by a generating device (may also be a generating device of a cloud) of a room type graph in the above method embodiment.
The embodiment of the application also provides a computer program product containing instructions, which when executed by a computer, cause the computer to implement the method executed by the cloud end in the method embodiment, and/or cause the computer to implement the method executed by the generating device (may also be the generating device of the cloud end) of the house type graph in the method embodiment.
The embodiment of the application also provides a chip device, which comprises a processor, wherein the processor is used for calling the computer program or the computer instructions stored in the memory, so that the processor executes the method for generating the room house type graph of the embodiment shown in fig. 2A.
In a possible implementation, the input of the chip device corresponds to the receiving operation in the embodiment shown in fig. 2A, and the output of the chip device corresponds to the transmitting operation in the embodiment shown in fig. 2A.
Optionally, the processor is coupled to the memory through an interface.
Optionally, the chip device further comprises a memory, in which the computer program or the computer instructions are stored.
The processor mentioned in any of the above may be a general purpose central processing unit, a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of a program for generating a room floor plan of the embodiment shown in fig. 2A. The memory referred to in any of the above may be read-only memory (ROM) or other type of static storage device, random access memory (random access memory, RAM), or the like, that may store static information and instructions.
It should be noted that, for convenience and brevity, explanation and beneficial effects of related content in any of the above-provided generating devices for house types may refer to the above-provided corresponding method embodiment and each embodiment mode of the generating method for house types of rooms, which are not described herein again.
In the application, the communication devices can also comprise a hardware layer, an operating system layer running on the hardware layer and an application layer running on the operating system layer. The hardware layer may include a central processing unit (central processing unit, CPU), a memory management module (memory management unit, MMU), and a memory (also referred to as a main memory). The operating system of the operating system layer may be any one or more computer operating systems that implement business processing through processes (processes), for example, a Linux operating system, a Unix operating system, an Android operating system, an iOS operating system, or windows operating system, etc. The application layer may include applications such as a browser, address book, word processor, instant messaging software, and the like.
The division of the modules in the embodiments of the present application is schematically only one logic function division, and there may be another division manner in actual implementation, and in addition, each functional module in each embodiment of the present application may be integrated in one processor, or may exist separately and physically, or two or more modules may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules.
From the above description of embodiments, it will be apparent to those skilled in the art that embodiments of the present application may be implemented in software, hardware, firmware, or a combination thereof. When implemented in software, the functions described above may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. Taking this as an example but not limited to: computer-readable media can include RAM, ROM, electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of the medium. As used in the embodiments of the present application, disks and discs include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
In summary, the foregoing description is only exemplary embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made according to the disclosure of the present application should be included in the protection scope of the present application.

Claims (13)

1. The method for generating the room type graph is characterized by comprising the following steps of:
acquiring information of dense point clouds of at least one room and an initial architecture model diagram of the at least one room; wherein the information of the dense point cloud of one room is applied to the scale optimization of the initial architecture model diagram of one room, the information of the dense point cloud of one room is a dense three-dimensional point set used for representing the structure of the one room, and the information of the dense point cloud of one room is obtained by utilizing the image and inertia information of the one room;
performing scale optimization on the initial architecture model graph of each room based on the information of the dense point cloud of each room to obtain a first architecture model graph of each room;
Obtaining a standard architecture model diagram of each room based on the first architecture model diagram of each room;
and generating a room type graph based on the standard architecture model graph corresponding to the at least one room respectively.
2. The method of claim 1, wherein the obtaining information of the dense point cloud of at least one room comprises:
acquiring information of a dense point cloud of an ith room in the at least one room, including:
Acquiring pose information and sparse point cloud information of an RGB image of the ith room and a slice diagram of a panoramic image of the ith room; the sparse point cloud is a sparse three-dimensional point set used for representing the structure of the ith room;
Registering a slice diagram of the panoramic image based on pose information of the RGB image of the ith room and information of sparse point cloud to obtain pose information of the slice diagram of the panoramic image and information of the sparse point cloud;
performing stereo matching on the pose information of the panoramic slice image and the sparse point cloud to obtain at least one depth image of the ith room;
Filtering and fusing the at least one depth map of the ith room to obtain the information of the dense point cloud of the ith room, wherein each depth map of the ith room corresponds to one image frame of the ith room;
and i is any positive integer from 1 to N, wherein N is the number of the rooms.
3. The method according to claim 2, wherein the acquiring pose information of the RGB image of the i-th room and information of the sparse point cloud includes:
Acquiring an RGB image of the ith room and inertial information when shooting the room;
Based on the RGB image of the ith room and the inertia information, estimating and obtaining initial pose information of the ith room and information of an initial point cloud, wherein the initial point cloud is a two-dimensional point set for representing the structure of the ith room;
Optimizing the initial pose information of the ith room to obtain pose information of the RGB image of the ith room, and quantifying the initial point cloud information of the ith room to obtain sparse point cloud information of the ith room.
4. The method of claim 1, wherein obtaining an initial architectural model map of at least one room comprises:
acquiring an initial architecture model diagram of an ith room of the at least one room, comprising:
detecting the door position of the panoramic image of the ith room, and determining the door position information of the ith room; performing structure detection on the panoramic image of the ith room, and determining the structure information of the ith room;
Obtaining an initial architecture model diagram of the ith room based on the position information of the door of the ith room and the structure information of the ith room;
and i is any positive integer from 1 to N, wherein N is the number of the rooms.
5. A method according to claim 2 or 3, characterized in that the method further comprises:
acquiring an RGB image of the ith room;
And generating the panoramic image of the ith room by adopting a characteristic panoramic stitching method based on the RGB image of the ith room.
6. The method according to claim 1, wherein the performing scale optimization on the initial architecture model graph of the corresponding room based on the information of the dense point cloud of each room to obtain the first architecture model graph of each room includes:
And performing scale optimization on the initial architecture model of each room based on the information of the dense point cloud of each room and the pre-constructed multi-sensor scale optimization model of each room to obtain a first architecture model diagram of each room.
7. The method of claim 1, wherein the obtaining a standard architecture model graph for each room based on the first architecture model graph for each room comprises:
Obtaining a standard architecture model diagram of an ith room based on a first architecture model diagram of the ith room, wherein the standard architecture model diagram comprises the following steps:
Acquiring a plurality of frames of the first architecture model diagrams of the ith room;
Adjusting and fusing the multi-frame first architecture model graph to obtain a standard architecture model graph of the ith room;
and i is any positive integer from 1 to N, wherein N is the number of the rooms.
8. The method of claim 1, wherein generating a room profile based on the standard architecture model map for each of the at least one room comprises:
Calculating the adjacent relation between the at least one room according to the distance between the at least one room based on the standard architecture model diagrams respectively corresponding to the at least one room;
And according to the adjacent relation between the at least one room, the standard architecture model diagrams corresponding to the adjacent rooms are spliced in sequence, and the room type diagram is generated.
9. The method of claim 8, wherein the method further comprises:
and performing duplicate removal processing on the standard architecture model diagrams corresponding to the at least one room respectively.
10. A device for generating a room profile, characterized by comprising means or modules for performing the method according to any one of claims 1 to 9.
11. A device for generating a room profile, characterized in that the generating device comprises a processor and a storage medium storing instructions which, when executed by the processor, cause the method according to any one of claims 1 to 9 to be implemented.
12. A computer-readable storage medium comprising instructions which, when executed by a processor, cause the method of any one of claims 1 to 9 to be implemented.
13. A computer program product comprising instructions which, when executed by a processor, cause the method of any one of claims 1 to 9 to be implemented.
CN202211350967.3A 2022-10-31 2022-10-31 Method and device for generating room house type graph Pending CN117994311A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211350967.3A CN117994311A (en) 2022-10-31 2022-10-31 Method and device for generating room house type graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211350967.3A CN117994311A (en) 2022-10-31 2022-10-31 Method and device for generating room house type graph

Publications (1)

Publication Number Publication Date
CN117994311A true CN117994311A (en) 2024-05-07

Family

ID=90891551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211350967.3A Pending CN117994311A (en) 2022-10-31 2022-10-31 Method and device for generating room house type graph

Country Status (1)

Country Link
CN (1) CN117994311A (en)


Legal Events

Date Code Title Description
PB01 Publication