CN115937422A - Three-dimensional scene reconstruction and online home decoration and commodity acquisition method, equipment and medium - Google Patents
- Publication number: CN115937422A
- Application number: CN202211599959.2A
- Authority: CN (China)
- Prior art keywords: target, model, space, camera, boundary lines
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING; G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T19/00 Manipulating 3D models or images for computer graphics
- G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
- G06T3/00 Geometric image transformations in the plane of the image
- Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS; Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE; Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00 Road transport of goods or passengers
- Y02T10/10 Internal combustion engine [ICE] based vehicles
- Y02T10/40 Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Graphics (AREA)
- Software Systems (AREA)
- Architecture (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Geometry (AREA)
- Image Analysis (AREA)
Abstract
The embodiment of the application provides a method, device, and medium for three-dimensional scene reconstruction, online home decoration, and commodity acquisition. In the embodiment of the application, based on the image corresponding to a first space object, 2D visual information such as boundary lines and vanishing points existing in the image is detected; from this 2D visual information, combined with the constraint relations existing between the boundary lines and between the vanishing points, the 3D structural information corresponding to the first space object, that is, a target three-dimensional model, is obtained. The real scene information corresponding to the first space object can thus be embodied in the target three-dimensional model, which is beneficial to improving application effects based on the three-dimensional model; for example, in a home decoration scene, the home decoration collocation effect based on the three-dimensional model can be improved. Moreover, the three-dimensional model has higher stability, stronger interpretability, and a more robust effect; meanwhile, by combining the constraint relations between the boundary lines and between the vanishing points, the model body structures in the generated three-dimensional model fit each other more closely.
Description
Technical Field
The application relates to the technical field of three-dimensional reconstruction, in particular to a method, equipment and medium for reconstructing a three-dimensional scene, online home decoration and commodity acquisition.
Background
With the development of internet applications, users can perform various operations online, such as purchasing goods and online home decoration. Taking online home decoration as an example, a user can select, through a home decoration application (APP), an existing 3D sample room with the highest similarity to his or her house layout, place various furniture or decorations in the 3D sample room to check the home decoration effect, and then carry out the actual offline home decoration according to that effect.
However, the three-dimensional house model in an existing 3D sample room is produced with 3D drawing technology and lacks the user's real house scene information, so the home decoration matching effect is not ideal when the decoration needs to match the real house environment. Therefore, a solution capable of representing real house scene information in a three-dimensional house model is needed to improve the matching effect of online home decoration.
Disclosure of Invention
Aspects of the present application provide a method, device, and medium for three-dimensional scene reconstruction, online home decoration, and commodity acquisition, so as to embody real three-dimensional scene information in a three-dimensional scene model and improve an application effect based on the three-dimensional scene model, such as a home decoration matching effect.
The embodiment of the application provides a three-dimensional scene reconstruction method, which comprises the following steps: acquiring a target image corresponding to a first space object, wherein the first space object is at least a part of space object in a target physical space; detecting a plurality of boundary lines and at least two vanishing points in orthogonal main directions existing in the target image, wherein the boundary lines are the boundary lines between adjacent physical main body structures in the first space object; determining camera internal parameters of a target camera and a gravity direction under a camera coordinate system according to the vanishing points in the at least two orthogonal main directions, wherein the target camera is a camera used for shooting the target image; reconstructing an initial three-dimensional model corresponding to the first space object according to the multiple boundary lines and the gravity direction under the camera coordinate system, wherein the initial three-dimensional model comprises an adjacent model main body structure corresponding to the adjacent physical main body structure; and optimizing the initial three-dimensional model according to the constraint relation among the multiple boundary lines, the constraint relation among the vanishing points in the at least two orthogonal main directions and the camera internal parameters to obtain a target three-dimensional model.
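For orientation, the data flow of these five steps can be sketched as follows in Python. All helper functions and numeric values in the sketch are placeholders assumed purely to illustrate how the steps feed into one another; they are not the claimed implementation of any individual step.

```python
import numpy as np

# Structural sketch of steps 101-105 above. Every helper below is a trivial
# placeholder assumed only for illustration of the data flow.

def detect_boundary_lines(image):
    # step 102: boundary lines (wall/ground/ceiling lines), here one line as two endpoints
    return [((10.0, 400.0), (500.0, 380.0))]

def detect_vanishing_points(image):
    # step 102: vanishing points in at least two orthogonal principal directions
    return np.array([[900.0, 260.0], [-350.0, 240.0], [255.0, 4000.0]])

def estimate_intrinsics_and_gravity(vps):
    # step 103: camera intrinsics and gravity direction from the vanishing points
    K = np.array([[600.0, 0.0, 256.0], [0.0, 600.0, 256.0], [0.0, 0.0, 1.0]])
    return K, np.array([0.0, 1.0, 0.0])

def build_initial_model(lines, gravity):
    # step 104: model body structures as plane equations (unit normal, distance to optical centre)
    return [(np.array([0.0, 1.0, 0.0]), 1.5)]

def optimize_model(model, lines, vps, K):
    # step 105: Manhattan-constrained refinement (see the optimization sketch further below)
    return model

def reconstruct_scene(image):
    lines = detect_boundary_lines(image)
    vps = detect_vanishing_points(image)
    K, gravity = estimate_intrinsics_and_gravity(vps)
    initial_model = build_initial_model(lines, gravity)
    return optimize_model(initial_model, lines, vps, K)

print(reconstruct_scene(np.zeros((512, 512, 3))))
```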
The embodiment of the present application further provides an online home decoration method, including: responding to an image uploading operation, and acquiring a target image corresponding to a first space object, wherein the first space object is at least a part of space objects in a target physical space; responding to the placement operation of a target home decoration object on the target image, and fusing the target home decoration object into a target three-dimensional model corresponding to the first space object to obtain a target three-dimensional model fused with the target home decoration object; projecting the target three-dimensional model fused with the target home decoration object onto the target image to obtain a home decoration effect graph containing the target home decoration object; the target three-dimensional model is constructed according to the steps in the three-dimensional scene reconstruction method provided by the embodiment of the application.
An embodiment of the present application further provides a method for selecting a commodity, including: responding to selection operation on a commodity page, and determining a selected target commodity, wherein the target commodity is provided with a commodity three-dimensional model; responding to a collocation effect viewing operation, and selecting a target image corresponding to a first space object to be collocated with the target commodity; adding the commodity three-dimensional model to a target three-dimensional model corresponding to the first space object to obtain a target three-dimensional model fused with the target commodity; projecting the target three-dimensional model fused with the target commodity onto the target image to obtain a collocation effect diagram of the target commodity and the first space object; the target three-dimensional model is constructed according to the steps in the three-dimensional scene reconstruction method provided by the embodiment of the application.
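Both of the methods above end with the same step: projecting the fused three-dimensional model back onto the target image so that the added object appears in the original photograph. A minimal sketch of that pinhole projection follows; the intrinsic matrix values and the example point are assumptions for illustration, not values from the patent.

```python
import numpy as np

# Assumed camera intrinsics (focal length in pixels, principal point at the image centre).
K = np.array([[600.0,   0.0, 256.0],
              [  0.0, 600.0, 256.0],
              [  0.0,   0.0,   1.0]])

def project(points_cam: np.ndarray) -> np.ndarray:
    """Project Nx3 points in the camera coordinate system onto the image plane."""
    uvw = (K @ points_cam.T).T          # homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]     # divide by depth

# Example: one corner of a fused furniture model, 2 m in front of the camera.
corner = np.array([[0.5, 0.3, 2.0]])
print(project(corner))                  # pixel location where the corner is drawn
```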
An embodiment of the present application further provides an electronic device, including: a memory and a processor; the memory for storing a computer program; the processor is coupled with the memory and configured to execute the computer program, so as to perform the steps in the three-dimensional scene reconstruction method, the online home decoration method, or the commodity selection method provided by the embodiment of the present application.
Embodiments of the present application further provide a computer-readable storage medium storing a computer program, which, when executed by a processor, causes the processor to implement the steps in the three-dimensional scene reconstruction method, the online home decoration method, or the commodity selection method provided in the embodiments of the present application.
In the embodiment of the application, based on the image corresponding to the first space object, 2D visual information such as boundary lines and vanishing points existing in the image is detected; from this 2D visual information, combined with the constraint relations existing between the boundary lines and between the vanishing points, the 3D structural information corresponding to the first space object, that is, a target three-dimensional model, is obtained. The real scene information corresponding to the first space object can thus be embodied in the target three-dimensional model, which is beneficial to improving application effects based on the three-dimensional model; for example, in a home decoration scene, the home decoration collocation effect based on the three-dimensional model can be improved.
Furthermore, in the embodiment of the application, 2D visual information such as boundary lines and vanishing points is detected directly from the image, and the 3D structure information is then generated using this 2D visual information together with the relevant constraint relations, rather than regressing the 3D structure information directly from the image. The three-dimensional model generated by the embodiment of the application therefore has higher stability and stronger interpretability, and is more robust to heavily tilted shooting angles and cluttered scenes. Meanwhile, by combining the constraint relations between the boundary lines and between the vanishing points, the model body structures in the generated three-dimensional model fit each other more closely.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1a is a schematic structural diagram of a three-dimensional scene reconstruction system according to an exemplary embodiment of the present application;
fig. 1b is a schematic flowchart of a three-dimensional scene reconstruction method according to an exemplary embodiment of the present disclosure;
FIG. 2a is a schematic diagram of a boundary line in a first spatial object according to an exemplary embodiment of the present application;
FIG. 2b is a schematic structural diagram of a vanishing point provided in an exemplary embodiment of the present application;
fig. 3a is a schematic structural diagram of a hough transform-based boundary detection model according to an exemplary embodiment of the present application;
fig. 3b is a schematic structural diagram of a hough transform-based vanishing point detection model according to an exemplary embodiment of the present application;
FIG. 3c is a schematic diagram of the position of the intersection between the two boundary lines according to an exemplary embodiment of the present application;
FIG. 3d is a schematic structural diagram of a principal structure of a reference model provided in an exemplary embodiment of the present application;
FIG. 3e is a schematic structural diagram of another model body structure provided in an exemplary embodiment of the present application;
FIG. 4a is a schematic flow chart of an online home decoration method according to an exemplary embodiment of the present application;
fig. 4b is a schematic diagram of a process of reconstructing a three-dimensional scene according to an exemplary embodiment of the present application;
FIG. 4c is a schematic diagram illustrating an online home decoration effect based on a target three-dimensional model according to an exemplary embodiment of the present application;
FIG. 4d is a schematic diagram illustrating another on-line decoration effect based on a target three-dimensional model according to an exemplary embodiment of the present application;
fig. 4e is a schematic flowchart of a commodity selection method according to an exemplary embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a three-dimensional scene reconstruction apparatus according to an exemplary embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
In the method, 2D visual information such as boundary lines and vanishing points existing in an image corresponding to a first space object is detected; from this 2D visual information, combined with the constraint relations between the boundary lines and between the vanishing points, the 3D structure information corresponding to the first space object, namely a target three-dimensional model, is obtained. The real scene information corresponding to the first space object can thus be embodied in the target three-dimensional model, which helps improve application effects based on the three-dimensional model.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1a is a schematic structural diagram of a three-dimensional scene reconstruction system according to an exemplary embodiment of the present application. As shown in fig. 1a, the system comprises: terminal equipment 10 and server-side equipment 20. Wherein, the terminal device 10 and the server device 20 are connected in communication.
In this embodiment, the terminal device 10 may be a mobile phone, a notebook computer, a desktop computer, or the like, the server device 20 may be a physical server, a cloud server, a server array, or the like, and fig. 1a illustrates an example in which the terminal device 10 is a smart phone and the server device 20 is a physical server, but the invention is not limited thereto.
The terminal device 10 may acquire a target image corresponding to the first spatial object. For example, the terminal device may have its own camera and may capture the target image in the first spatial object through that camera; as another example, the target image corresponding to the first spatial object may be captured by a camera independent of the terminal device 10, which then provides the captured target image to the terminal device 10.
In this embodiment, the terminal device 10 may provide the acquired target image corresponding to the first space object to the server device 20, and the server device 20 generates the target three-dimensional model corresponding to the first space object. The process of generating the target three-dimensional model corresponding to the first space object by the server device 20 includes: detecting a plurality of boundary lines and at least two vanishing points in the orthogonal main direction, wherein the boundary lines are the boundary lines between adjacent physical main body structures in the first space object; determining camera internal parameters of a target camera and a gravity direction under a camera coordinate system according to at least two vanishing points in orthogonal main directions, wherein the target camera is a camera used for shooting a target image; according to the multiple boundary lines and the gravity direction under the camera coordinate system, an initial three-dimensional model corresponding to the first space object is reconstructed, wherein the initial three-dimensional model comprises an adjacent model main body structure corresponding to an adjacent physical main body structure; and optimizing the initial three-dimensional model according to the constraint relation among the multiple boundary lines, the constraint relation among the vanishing points in at least two orthogonal main directions and the camera internal parameters to obtain the target three-dimensional model. For details of each operation, reference may be made to the following embodiments, and details will not be provided here.
On the basis of the target three-dimensional model, the user can add target objects required by home decoration, such as objects like sofas, wall paintings, office chairs or desks, on the terminal device 10 aiming at the target image; the terminal device 10 may obtain the position range information of the target object in the target image, and provide the information of the target object and the position range information of the target object in the target image to the server device 20; the server device 20 fuses the three-dimensional model of the target object in the target three-dimensional model corresponding to the first space object according to the position range information of the target object in the target image, projects the target three-dimensional model fused with the target object onto the target image to obtain the target image containing the target object, returns the target image containing the target object to the terminal device 10, and displays the target image containing the target object to the user through the terminal device 10 to realize the display of the online home decoration matching effect.
It should be noted that the three-dimensional scene reconstruction method provided in this embodiment of the present application may be applied to the three-dimensional scene reconstruction system shown in fig. 1a, and is completed by the terminal device and the server device in cooperation with each other, or may be implemented by the terminal device independently. For the process of three-dimensional scene reconstruction, reference is made to the description of the method embodiments described below.
Fig. 1b is a schematic flowchart of a three-dimensional scene reconstruction method according to an exemplary embodiment of the present disclosure. As shown in fig. 1b, the method comprises:
101. acquiring a target image corresponding to a first space object, wherein the first space object is at least a part of space objects in a target physical space;
102. detecting a plurality of boundary lines and at least two vanishing points in orthogonal main directions existing in a target image, wherein the boundary lines are the boundary lines between adjacent physical main body structures in a first space object;
103. determining camera internal parameters of a target camera and a gravity direction under a camera coordinate system according to at least two vanishing points in the orthogonal main directions, wherein the target camera is a camera used for shooting a target image;
104. according to the multiple boundary lines and the gravity direction under the camera coordinate system, an initial three-dimensional model corresponding to the first space object is reconstructed, wherein the initial three-dimensional model comprises an adjacent model main body structure corresponding to an adjacent physical main body structure;
105. and optimizing the initial three-dimensional model according to the constraint relation among the multiple boundary lines, the constraint relation among the vanishing points in at least two orthogonal main directions and the camera internal parameters to obtain the target three-dimensional model.
In this embodiment, the target physical space refers to a spatial area in a specific scene, for example, the target physical space may be various areas with spatial concepts, such as a mall, a supermarket, an airport, or a house. The target physical space contains at least one space object, in other words, the at least one space object constitutes the target physical space. For example, the target physical space may be a physical room, a plurality of spatial objects included in the physical room, such as a kitchen, a bedroom, a living room, a bathroom, and the like. For another example, the target physical space may be a mall including multiple floors, where different businesses are located on different floors, and each business may be regarded as a spatial object. For convenience of distinction and description, at least a part of the space object in the target physical space is referred to as a first space object, for example, the first space object may be one space object in the target physical space, or may be a plurality of space objects in the target physical space, such as 2 or 3, etc.; for another example, the first spatial object may be a local space in a certain spatial object. Taking the target physical space as a physical house as an example, the first space object may be the whole physical house, or may be an independent space object such as a main sleeping room, a secondary sleeping room or a living room therein, or may be a local space in an independent space object such as a main sleeping room, a secondary sleeping room or a living room, or may be a local space simultaneously including multiple independent space objects, such as a partial living room space and a partial balcony space.
In this embodiment, the target image corresponding to the first spatial object may be acquired, for example, an image in the first spatial object is acquired by a camera, and the target image corresponding to the first spatial object is acquired from the camera, or a camera is installed on the terminal device, and the target image corresponding to the first spatial object is acquired by the camera on the terminal device. The target image is an environment image of the first space object and contains real environment information of the first space object.
In this embodiment, there are multiple boundary lines in the target image, and a boundary line is the intersection line between adjacent physical body structures in the first spatial object, where the physical body structures may include, but are not limited to: wall surfaces, the ceiling, the floor, and so on. The intersection between wall surfaces may be referred to as a wall line, the intersection between a wall surface and the floor may be referred to as a ground line, and the intersection between a wall surface and the ceiling may be referred to as a ceiling line. As shown in fig. 2a, the first space object is illustrated as a local space of a living room and includes a ground line C1, a wall line C2, and a ceiling line C3. The ground line C1, the wall line C2, and the ceiling line C3 in fig. 2a are examples of the boundary lines in the embodiment of the present application, but the boundary lines are not limited thereto.
In this embodiment, the first spatial object may conform to a Manhattan structure, which has three orthogonal principal directions, for example the gravity direction, the direction facing the main wall surface, and the direction facing the side wall surfaces. The gravity direction can be regarded as the direction perpendicular to the floor or the ceiling. The main wall surface is the wall surface with the smallest included angle with the optical axis of the target camera, where the target camera is the camera used to capture the target image and its optical axis is the line passing through the center of the lens; the direction facing the main wall surface is the direction perpendicular to the main wall surface, i.e. the direction of its normal vector. A side wall surface is defined relative to the main wall surface and mainly refers to a wall surface adjacent to the main wall surface; the direction facing a side wall surface is the direction perpendicular to that side wall surface. Generally, there may be several side wall surfaces in the target image, for example 2; these side wall surfaces are opposite or parallel to each other and correspond to the same direction, namely the normal vector direction of the side wall surfaces. The target image may contain physical body structures corresponding to all three orthogonal principal directions at the same time, such as a main wall surface, a side wall surface, and a floor or ceiling; it may also contain only physical body structures corresponding to two orthogonal principal directions, for example only a main wall surface and a floor, only a side wall surface and a ceiling, or only a main wall surface, a ceiling, and a floor. Whether the target image contains physical body structures corresponding to three orthogonal principal directions or to two orthogonal principal directions may be determined by the shooting angle of the target image. In short, the target image contains at least physical body structures corresponding to two orthogonal principal directions.
In this embodiment, after perspective projection, parallel straight lines in three-dimensional space intersect at a point in the two-dimensional image corresponding to that space; this point is a vanishing point (VP), and its direction represents the direction of the parallel straight lines in three-dimensional space that intersect at it. Several vanishing points exist in the target image, including vanishing points located in at least two orthogonal principal directions. For a vanishing point located in one of the orthogonal principal directions, the parallel straight lines in three-dimensional space that intersect at it are perpendicular to the physical body structure corresponding to that principal direction; because the physical body structures conform to the Manhattan structure, each physical body structure corresponds to one principal direction, so the direction of such a vanishing point corresponds to one principal direction in three-dimensional space, that is, it represents the direction of the normal vector of a physical body structure. Since the target image contains at least physical body structures corresponding to two orthogonal principal directions, the target image includes vanishing points in at least two orthogonal principal directions. As shown in fig. 2b, lines 1 are parallel to the right wall (the wall with the hanging painting in fig. 2b), and the direction of the vanishing point D1 obtained from the intersection of lines 1 represents the normal vector direction of the left wall (the wall with the window in fig. 2b); lines 2 are parallel to the left wall (the wall with the window in fig. 2b), and the direction of the vanishing point D2 obtained from the intersection of lines 2 represents the normal vector direction of the right wall (the wall with the hanging painting in fig. 2b); lines 3 are parallel to the floor (or the ceiling), and the direction of the vanishing point D3 obtained from the intersection of lines 3 represents the normal vector direction of the floor (or the ceiling), i.e. a direction parallel to the gravity direction.
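As a small worked check of this relationship (the intrinsic matrix below is assumed only for the example): under a pinhole camera, every 3D line with direction d projects to an image line passing through the single point v ~ K·d, which is the vanishing point of that direction.

```python
import numpy as np

K = np.array([[500.0,   0.0, 320.0],   # assumed focal length and principal point
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(X):
    x = K @ X
    return x[:2] / x[2]

d = np.array([1.0, 0.0, 1.0])            # common 3D direction of a family of parallel lines
vp = project(d)                           # vanishing point: v ~ K d

# Two parallel lines with direction d but different start points converge to the same VP.
for P0 in [np.array([0.0, 1.0, 2.0]), np.array([1.0, -1.0, 3.0])]:
    far_point = project(P0 + 1e6 * d)     # a point very far along the line
    print(np.allclose(far_point, vp, atol=1e-2))   # True for both lines
```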
In this embodiment, a plurality of boundary lines and vanishing points in at least two orthogonal principal directions present in the target image can be detected. When the boundary lines and the vanishing points are detected, the target image does not need to be segmented, the boundary lines and the vanishing points are directly detected on the basis of the target image, and the boundary lines and the vanishing points are not interfered by image segmentation, so that the accuracy and the precision of the detected boundary lines and the vanishing points are high.
In this embodiment, the position of the vanishing point in the target image is determined by the intrinsic parameters of the target camera and the 3D direction corresponding to the vanishing point (i.e. the main direction in which the vanishing point is located), and the intrinsic parameters of the target camera are parameters related to the characteristics of the target camera itself, such as the focal length, the pixel size, and the main point position (i.e. the camera optical center position) of the target camera. Conversely, under the condition that the vanishing points in at least two orthogonal main directions on the target image are known, the camera intrinsic parameters of the target camera and the gravity direction under the camera coordinate system can be determined according to the vanishing points in the at least two orthogonal main directions. Specifically, camera intrinsic parameters of the target camera may be determined according to vanishing points in at least two orthogonal principal directions, and a gravity direction in a camera coordinate system may be determined according to the camera intrinsic parameters and the vanishing points in the gravity direction.
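The computation in this paragraph can be illustrated with a short NumPy sketch. The assumptions below (not stated in the patent, only used to make the example concrete) are zero skew, square pixels, a principal point at the image centre, and a fabricated ground-truth camera used solely to generate consistent vanishing points: for two vanishing points of orthogonal directions, (v1 - c)·(v2 - c) = -f^2, and back-projecting the vertical vanishing point through the inverse of K gives the gravity direction in the camera coordinate system.

```python
import numpy as np

def rot_x(a):  # pitch
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):  # yaw
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

# --- assumed ground truth, only to generate a consistent example --------------
f_true, cx, cy = 500.0, 320.0, 240.0
K_true = np.array([[f_true, 0, cx], [0, f_true, cy], [0, 0, 1.0]])
R = rot_x(np.radians(10)) @ rot_y(np.radians(30))       # tilted, rotated camera

def vanishing_point(direction_cam):
    v = K_true @ direction_cam
    return v[:2] / v[2]

v1 = vanishing_point(R @ np.array([1.0, 0.0, 0.0]))     # side-wall normal direction
v2 = vanishing_point(R @ np.array([0.0, 0.0, 1.0]))     # main-wall normal direction
vg = vanishing_point(R @ np.array([0.0, 1.0, 0.0]))     # gravity direction

# --- recovery from the vanishing points alone ---------------------------------
c = np.array([cx, cy])                                   # principal point assumed at image centre
f = np.sqrt(-np.dot(v1 - c, v2 - c))                     # orthogonality of the two directions
K = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1.0]])

g = np.linalg.inv(K) @ np.array([vg[0], vg[1], 1.0])     # back-project the vertical VP
g = g / np.linalg.norm(g)                                # gravity direction in camera coordinates

print(round(f, 2), g.round(3))                           # f ~ 500, g ~ R @ [0, 1, 0]
```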
On the basis of obtaining the gravity direction in the camera coordinate system and the multiple boundary lines existing in the target image, an initial three-dimensional model corresponding to the first space object can be reconstructed according to the multiple boundary lines existing in the target image and the gravity direction in the camera coordinate system, and the initial three-dimensional model comprises adjacent model main body structures corresponding to the adjacent physical main body structures. The physical main structure is a main structure constituting a first space object, and the model main structure is a main structure constituting an initial three-dimensional model; the model body structure is an embodiment of the physical body structure in the initial three-dimensional model. The initial three-dimensional model corresponding to the first space object is constructed according to the plane equation of each physical main body structure, for example, the corresponding three-dimensional model of the house can be constructed according to the plane equation of each wall surface, ground surface or ceiling included in the house. The plane equation of the physical body structure may uniquely represent the physical body structure, and may optionally be represented by a normal vector of the physical body structure and a distance from an optical center of the camera, but is not limited thereto. After the initial three-dimensional model is obtained, the model main body structure in the initial three-dimensional model also has a plane equation, and theoretically, the plane equation of the model main body structure and the plane direction of the corresponding physical main body structure should be the same.
The general principle of reconstructing the initial three-dimensional model corresponding to the first space object according to the multiple boundary lines existing in the target image and the gravity direction under the camera coordinate system is as follows: constructing a ground plane in a three-dimensional space according to the gravity direction under the camera coordinate system, wherein the ground plane is vertical to the gravity direction under the camera coordinate system; based on a plurality of boundary lines existing in the target image, a boundary formed by the first space object on the ground plane and a model main body structure existing on the boundary can be determined; it is further possible to assume the height of the camera's optical center from the ground, i.e. the camera height, and determine the height of the model's body structure (e.g. ceiling) based on the assumed camera height and a pre-assumed scaling, resulting in an initial three-dimensional model corresponding to the first spatial object.
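A minimal sketch of this reconstruction step is given below. The intrinsics, gravity direction, pixel coordinates, and the 1.5 m camera height are all assumed example values (the patent only requires assuming some camera height to fix the scale); the sketch back-projects two pixels of a detected ground line onto the floor plane and derives the plane of the vertical wall standing on that edge.

```python
import numpy as np

# Assumed values for illustration: intrinsics, gravity direction in camera
# coordinates (from the previous step), and an assumed camera height.
K = np.array([[500.0, 0, 320.0], [0, 500.0, 240.0], [0, 0, 1.0]])
g = np.array([0.0, 0.985, 0.174])      # unit gravity direction (camera slightly pitched)
camera_height = 1.5                     # metres above the floor; assumed, fixes the scale

def backproject_to_floor(pixel):
    """Intersect the viewing ray of a ground-line pixel with the floor plane g.X = h."""
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    t = camera_height / np.dot(g, ray)  # ray parameter where the ray meets the floor
    return t * ray                       # 3D point on the floor, in camera coordinates

# Two pixels sampled on a detected ground line give one edge of the floor boundary;
# the vertical wall through that edge has a normal perpendicular to gravity.
p0, p1 = backproject_to_floor((80, 400)), backproject_to_floor((560, 430))
edge = p1 - p0
wall_normal = np.cross(edge, g)
wall_normal /= np.linalg.norm(wall_normal)
wall_distance = abs(np.dot(wall_normal, p0))   # plane offset from the camera optical centre
print(p0.round(2), p1.round(2), wall_normal.round(3), round(wall_distance, 2))
```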
In this embodiment, the initial three-dimensional model corresponding to the first space object is constructed based on the boundary lines in the target image; when there is an error in a boundary line, the perspective relationship in the initial three-dimensional model is incorrect, for example adjacent wall surfaces are not perpendicular to each other, and, because the camera height and the scaling are assumed, the generated wall height is also inaccurate. Based on this, in this embodiment, after the initial three-dimensional model is obtained, it is further optimized according to the constraint relationship between the multiple boundary lines and the constraint relationship between the vanishing points in the at least two orthogonal principal directions, in combination with the camera intrinsic parameters, to obtain the target three-dimensional model. The constraint relationship between the multiple boundary lines is that the physical structures forming the boundary lines are perpendicular to each other; this constraint may also be called the Manhattan constraint, i.e. the wall surfaces, the floor, and the ceiling of the first spatial object are mutually perpendicular. The constraint relationship between the vanishing points in the at least two orthogonal principal directions is that the physical body structures corresponding to the vanishing points in any two orthogonal principal directions are perpendicular to each other, and the normal vector direction of the physical body structure corresponding to a vanishing point is parallel to the direction of the parallel straight lines represented by that vanishing point. Based on these constraint relations, the normal vector direction of each model body structure in the initial three-dimensional model and its distance from the camera optical center can be optimized, so that model body structures whose positional relationship, height, and so on closely fit the physical body structures are obtained.
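This refinement can be viewed as a small least-squares problem over the plane parameters of the model body structures. A simplified sketch follows; the vanishing-point directions and the deliberately perturbed initial planes are assumed for illustration, and the reprojection terms that tie each plane to the detected boundary lines (and fix the distances) are left out for brevity, so only the orthogonality and alignment constraints are shown.

```python
import numpy as np
from scipy.optimize import least_squares

# Each model body structure is parameterised by its plane equation
# (unit normal, distance to the camera optical centre).
vp_dirs = [np.array([1.0, 0.0, 0.0]),     # normal direction implied by one vanishing point
           np.array([0.0, 0.0, 1.0])]     # normal direction implied by the orthogonal one
g = np.array([0.0, 1.0, 0.0])             # gravity direction (floor normal)

def unpack(x):
    planes = x.reshape(-1, 4)              # each row: nx, ny, nz, distance
    return [(row[:3] / np.linalg.norm(row[:3]), row[3]) for row in planes]

def residuals(x):
    (n_floor, _), (n_wall1, _), (n_wall2, _) = unpack(x)
    return np.array([
        np.dot(n_floor, n_wall1),                  # adjacent structures perpendicular
        np.dot(n_floor, n_wall2),
        np.dot(n_wall1, n_wall2),
        1.0 - abs(np.dot(n_floor, g)),             # floor normal parallel to gravity
        1.0 - abs(np.dot(n_wall1, vp_dirs[0])),    # wall normals parallel to VP directions
        1.0 - abs(np.dot(n_wall2, vp_dirs[1])),
    ])

x0 = np.array([0.05, 0.99, 0.10, 1.5,      # floor plane from the rough reconstruction
               0.98, 0.05, 0.20, 2.0,      # wall 1
               0.15, 0.02, 0.99, 3.0])     # wall 2
refined = unpack(least_squares(residuals, x0).x)
for n, dist in refined:
    print(n.round(3), round(dist, 2))
```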
In the embodiment of the application, based on the image corresponding to the first space object, 2D visual information such as boundary lines and vanishing points existing in the image is detected; from this 2D visual information, combined with the constraint relations existing between the boundary lines and between the vanishing points, the 3D structural information corresponding to the first space object, that is, a target three-dimensional model, is obtained. The real scene information corresponding to the first space object can thus be embodied in the target three-dimensional model, which is beneficial to improving application effects based on the three-dimensional model; for example, in a home decoration scene, the home decoration collocation effect based on the three-dimensional model can be improved. Further, in this embodiment, 2D visual information such as boundary lines and vanishing points is detected directly from the image, and the 3D structure information is then generated using this 2D visual information together with the relevant constraint relations, rather than regressing the 3D structure information directly from the image; the target three-dimensional model generated by this embodiment therefore has higher stability and stronger interpretability, and is more robust to heavily tilted shooting angles and cluttered scenes. Meanwhile, by combining the constraint relations between the boundary lines and between the vanishing points, the model body structures in the generated target three-dimensional model fit each other more closely.
In an optional embodiment of the present application, when the boundary lines and the vanishing points are detected, a plurality of boundary lines existing in the target image and vanishing points in at least two orthogonal principal directions may be detected based on hough transform. In the present embodiment, a plurality of boundary lines, such as a ground line, a wall line, a ceiling line, and the like, existing in the target image may be detected by using the hough transform, and meanwhile, vanishing points in at least two orthogonal main directions included in the target image may also be detected based on the hough transform.
Further optionally, when detecting the multiple boundary lines and the vanishing points in at least two orthogonal principal directions existing in the target image based on the Hough transform, the detection may be performed with a Hough transform-based deep neural network model. Further optionally, the Hough transform-based deep neural network model comprises: a Hough transform-based boundary line detection model and a Hough transform-based vanishing point detection model. The Hough transform-based boundary line detection model may be any neural network model capable of detecting boundary lines from an image, and the Hough transform-based vanishing point detection model may be any neural network model capable of detecting vanishing points from an image. On this basis, on the one hand, the target image can be input into the Hough transform-based boundary line detection model for boundary line detection, so as to obtain the multiple boundary lines existing in the target image; on the other hand, the target image can be input into the Hough transform-based vanishing point detection model for vanishing point detection, so as to obtain the vanishing points in at least two orthogonal principal directions in the target image. The architecture and detection process of the neural network models for boundary line detection and vanishing point detection are each described below by way of example.
Boundary line detection based on Hough transform:
in an alternative embodiment, as shown in fig. 3a, the hough transform-based boundary detection model includes: the first feature extraction network is a neural network fused with Skip Connections (Skip Connections) and multi-scale features. The multi-scale features refer to a technology for extracting a plurality of feature maps with different scales aiming at a target image, the feature maps with different scales contain different feature information, and the smaller the size of the feature map is, the larger the depth is, and the feature map belongs to deep network features; conversely, the larger the size of the feature map is, the smaller the depth is, and the feature belongs to shallow network features. The Receptive Field (Receptive Field) of the deep network features is large, the resolution of a feature map is low, the semantic information representation capability is strong, and the geometric detail information representation capability is weak; the receptive field of the shallow network features is small, the resolution of the feature map is high, the semantic information representation capability is weak, and the geometric detail information representation capability is strong. Wherein, the receptive field is the area size of the mapping of the pixel points on the characteristic map (feature map) output by each layer of the convolutional neural network on the input picture. Wherein, the jump connection is a simple and effective operation for fusing the deep network features and the shallow network features, and the jump connection can add the shallow network features and the deep network features with the same scale at element level. Based on the method, the target image can be input into a hough transform-based boundary detection model, and the feature extraction is performed on the target image by using a first feature extraction network fusing jump connection and multi-scale features, so that feature maps of multiple scales are obtained. For convenience of description and distinction, the feature maps of multiple scales finally output by the first feature extraction network are referred to as first target feature maps of multiple scales.
In order to facilitate boundary line extraction, the features in the image pixel space can be converted into features in a hough space, and the features in the hough space are subjected to secondary classification, namely whether the features in the hough space are features corresponding to the boundary lines in the image pixel space or not is distinguished. Based on this, as shown in fig. 3a, the hough transform-based boundary detection model further includes: and (4) Hough transform network. Based on the hough transform network, the first target feature maps of multiple scales may be subjected to hough transform using hough transform based on polar coordinates to map straight lines existing in the target image to points in hough space, and the converted points may be subjected to two classifications in hough space, for example, to distinguish which points in hough space are points formed by boundary lines in image pixel space and which points are not points formed by boundary lines in image pixel space.
A first target feature map of any scale belongs to the image space, that is, it uses the pixel coordinate system, which takes the center of the target image as the origin, the horizontal direction as the x-axis, the vertical direction as the y-axis, and the pixel as the unit length. Unlike the pixel coordinate system, the Hough space uses a polar coordinate system, in which a straight line in the target image can be represented by (ρ, θ), where ρ represents the distance from the line to the origin of the pixel coordinate system and θ represents the angle between the line and the x-axis of the pixel coordinate system; the coordinates of the first target feature map in the Hough space are (ρ, θ), so a straight line in the target image, represented in the polar coordinate system, becomes a point. All straight lines existing in the target image are mapped into points in the Hough space; these include points mapped from the boundary lines, i.e. points matching the boundary line features, and points mapped from non-boundary lines, i.e. points not matching the boundary line features. For convenience of distinction and description, the points in the Hough space that match the boundary line features are called target points. On this basis, multiple target points matching the boundary line features can be selected in the Hough space and remapped into the image space to obtain the multiple boundary lines existing in the target image. As shown in fig. 3a, the multiple target points may be remapped into the image space by a reverse Hough transform (RHT) to obtain the multiple boundary lines existing in the target image.
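The mapping described in this paragraph can be written out directly. The sketch below is a hand-written polar Hough accumulation over a single-channel map; the learned model performs the analogous voting on deep feature maps and classifies the resulting Hough points instead of simply taking a maximum. The map size and the synthetic line are assumed for illustration.

```python
import numpy as np

def hough_transform(feature_map: np.ndarray, num_theta: int = 180) -> np.ndarray:
    """Accumulate an HxW activation map into a (rho, theta) Hough grid.

    Every spatial cell votes, with its activation as weight, for all (rho, theta)
    bins whose line passes through it; a straight line in the image therefore
    concentrates into a single bright point in Hough space.
    """
    h, w = feature_map.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0              # origin at the image centre
    rho_max = int(np.ceil(np.hypot(h, w) / 2.0))
    thetas = np.deg2rad(np.arange(num_theta))
    acc = np.zeros((2 * rho_max + 1, num_theta))
    ys, xs = np.nonzero(feature_map)
    for y, x, v in zip(ys, xs, feature_map[ys, xs]):
        rhos = np.round((x - cx) * np.cos(thetas) + (y - cy) * np.sin(thetas)).astype(int)
        acc[rhos + rho_max, np.arange(num_theta)] += v
    return acc

# A synthetic horizontal "ground line" at row 40 becomes one dominant Hough bin.
fmap = np.zeros((64, 64))
fmap[40, :] = 1.0
acc = hough_transform(fmap)
rho_idx, theta_idx = np.unravel_index(np.argmax(acc), acc.shape)
print(rho_idx - acc.shape[0] // 2, theta_idx)   # rho ~ 8 pixels below the centre, theta = 90 degrees
```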
Further optionally, as shown in fig. 3a, the first feature extraction network comprises a plurality of downsampling modules, a plurality of upsampling modules, and a skip connection module. In an alternative embodiment, feature extraction is performed on the target image to obtain an initial feature map, which is taken as the first intermediate feature map with the maximum scale; a downsampling module is used to perform N downsampling operations on the first intermediate feature map with the maximum scale to obtain first intermediate feature maps of other scales, where N is a positive integer, for example N may be 2, 3, 4, or 6. The first intermediate feature map with the minimum scale is taken as the first target feature map with the minimum scale; an upsampling module is then used to perform N upsampling operations starting from the first target feature map with the minimum scale, and in each upsampling operation a skip connection is made with the first intermediate feature map of the same scale obtained during the downsampling processing, so as to obtain the first target feature maps of the other scales.
As shown in fig. 3a, assuming that the first intermediate feature map with the largest scale has a scale of 512 × 512 with 3 channels and that N = 3, first intermediate feature maps with scales of 256 × 256, 128 × 128, and 64 × 64 (each with 3 channels) can be obtained through 3 downsampling operations. The first intermediate feature map with the scale of 64 × 64 is then taken as the first target feature map with the minimum scale, and 3 upsampling operations are performed on it in sequence. Specifically, one upsampling operation yields a second intermediate feature map with a scale of 128 × 128, which is skip-connected with the first intermediate feature map of scale 128 × 128 to obtain the first target feature map of scale 128 × 128; another upsampling operation on the first target feature map of scale 128 × 128 yields a second intermediate feature map of scale 256 × 256, which is skip-connected with the first intermediate feature map of scale 256 × 256 to obtain the first target feature map of scale 256 × 256; and a final upsampling operation on the first target feature map of scale 256 × 256 yields a second intermediate feature map of scale 512 × 512, which is skip-connected with the first intermediate feature map of scale 512 × 512 to obtain the first target feature map of scale 512 × 512.
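A compact PyTorch sketch of this encoder-decoder with additive skip connections is shown below; the channel width, the plain convolutions, and the bilinear upsampling are assumptions made only for illustration, since the description fixes the number of scales and the element-wise fusion but not the exact layers.

```python
import torch
from torch import nn
import torch.nn.functional as F

class FirstFeatureExtractor(nn.Module):
    """Sketch of an encoder-decoder with additive skip connections (N = 3)."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.down = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, stride=2, padding=1) for _ in range(3)]
        )
        self.up = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(3)]
        )

    def forward(self, x):
        feats = [F.relu(self.stem(x))]            # first intermediate feature map (largest scale)
        for down in self.down:                     # 3 downsampling steps: 512 -> 256 -> 128 -> 64
            feats.append(F.relu(down(feats[-1])))
        out = feats[-1]                            # smallest-scale first target feature map
        targets = [out]
        for i, up in enumerate(self.up):           # 3 upsampling steps with skip connections
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
            out = F.relu(up(out)) + feats[-2 - i]  # element-wise addition with same-scale feature
            targets.append(out)
        return targets                             # multi-scale first target feature maps

maps = FirstFeatureExtractor()(torch.randn(1, 3, 512, 512))
print([tuple(m.shape[-2:]) for m in maps])         # [(64, 64), (128, 128), (256, 256), (512, 512)]
```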
As further shown in fig. 3a, the hough transform network comprises: a Hough transform module and a feature fusion module. Based on this, an embodiment of performing hough transform on a first target feature map of multiple scales using hough transform based on polar coordinates to map straight lines existing in a target image to points in hough space includes: carrying out Hough transformation on the first target feature maps X in multiple scales by utilizing a Hough transformation module to obtain second target feature maps Y in multiple scales in a Hough space based on polar coordinates, wherein the Hough transformation can be respectively carried out on the first target feature maps X in each scale to obtain the second target feature maps Y in the Hough space based on polar coordinates in the scale; carrying out scale transformation on the second target feature maps with multiple scales by using a feature fusion module to obtain multiple feature maps with the same scale, and splicing the multiple feature maps with the same scale to obtain a third target feature map Z; and performing convolution dimensionality reduction on the third target feature map to obtain a two-dimensional image in the Hough space, wherein the two-dimensional image uses a polar coordinate system, the two-dimensional image comprises a plurality of points, the coordinates of each point are (rho, theta), and each point corresponds to a straight line in the target image.
In this embodiment, the scale of the third target feature map is not limited; for example, it may be any one of the multiple scales corresponding to the second target feature maps, and preferably it is the maximum of those scales. On this basis, when the second target feature maps of multiple scales are scale-transformed, the second target feature maps of non-maximum scales may be upsampled so that their size is changed to the maximum scale.
As shown in fig. 3a, a first target feature map, a second target feature map, a third target feature map, and a two-dimensional image in the Hough space are shown, and in this embodiment, hough transform is performed on the basis of a Depth feature map, and is referred to as Depth Hough Transform (DHT). In fig. 3a, a feature fusion module is represented by circled c, and the feature fusion module further includes an upsampling unit (upsample) and a splicing unit (Concat), where the upsampling unit is configured to perform scale transformation on a second target feature map with multiple scales to obtain multiple feature maps with the same scale; and the splicing unit (Concat) is used for splicing a plurality of feature maps with the same scale to obtain a third target feature map Z.
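A sketch of this fusion step in PyTorch is given below: the per-scale Hough-space feature maps are upsampled to the largest (ρ, θ) grid, concatenated, and reduced with a 1×1 convolution into a single two-dimensional map in Hough space. The channel count, the number of scales, and the example grid sizes are assumptions for illustration.

```python
import torch
from torch import nn
import torch.nn.functional as F

class HoughFeatureFusion(nn.Module):
    """Upsample per-scale Hough feature maps, concatenate them (third target
    feature map), and reduce the channel dimension to one 2D score map."""
    def __init__(self, num_scales: int = 3, channels: int = 32):
        super().__init__()
        self.reduce = nn.Conv2d(num_scales * channels, 1, kernel_size=1)

    def forward(self, hough_maps):                    # list of (B, C, rho_i, theta_i) tensors
        target = hough_maps[0].shape[-2:]             # largest (rho, theta) grid
        upsampled = [F.interpolate(m, size=target, mode="bilinear", align_corners=False)
                     for m in hough_maps]
        fused = torch.cat(upsampled, dim=1)           # concatenated third target feature map
        return self.reduce(fused)                     # (B, 1, rho, theta): score per Hough point

maps = [torch.randn(1, 32, 184, 180), torch.randn(1, 32, 92, 90), torch.randn(1, 32, 46, 45)]
print(HoughFeatureFusion()(maps).shape)               # torch.Size([1, 1, 184, 180])
```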
Before the Hough transform-based boundary detection model is used, it may be obtained in advance through model training. In this embodiment, the combination of a deep Hough transform and a neural network model is innovatively applied to the detection of boundary lines in an image. Specifically, pictures of a large number of spatial objects (such as indoor scenes) are obtained, and the wall lines, ground lines, and ceiling lines existing in the pictures are labeled to obtain a sample data set. A Hough transform-based basic neural network is trained with this sample data set. The basic neural network comprises a first feature extraction network and a Hough transform network; the first feature extraction network comprises an upsampling module, a downsampling module, and a skip connection module, and is used for performing feature extraction on a sample image in the pixel space to obtain a multi-scale sample feature map; the Hough transform network comprises a Hough transform module and a feature fusion module, and is used for transforming the multi-scale sample feature map into the Hough space through the Hough transform to obtain a sample image in the Hough space. The sample image in the Hough space is then classified with a binary classification method; specifically, the points in the sample image are divided into first-class points corresponding to boundary lines and second-class points corresponding to other straight lines. A loss function is then computed from the labeling result and the classification result; optionally, a binary cross entropy loss (BCELoss) function can be adopted. Training continues while the loss function does not meet the requirement, until the loss function meets the requirement or the training duration or number of iterations reaches the set value, so as to obtain the Hough transform-based boundary detection model. It should be noted that fig. 3a shows the part that computes the loss function from the labeling result and the classification result, which is used only in the model training (Training only) phase; fig. 3a also shows the process of remapping the multiple target points into the image space by the reverse Hough transform to obtain the multiple boundary lines existing in the target image, which is used in the model inference phase.
Detecting vanishing points based on Hough transform:
In an alternative embodiment, as shown in fig. 3b, the Hough transform-based vanishing point detection model includes a second feature extraction network, which may be any network capable of feature extraction, for example a backbone network used in image classification, such as UNet or a Stacked Hourglass model. UNet is a variant of the fully convolutional network (FCN) whose network structure is symmetric, resembling the letter U, hence the name; the Stacked Hourglass model is a network structure that uses multi-scale features for pose recognition. The target image can be input into the Hough transform-based vanishing point detection model, and the second feature extraction network is used to perform feature extraction on the target image to obtain a fourth target feature map. As shown in fig. 3b, the target image is [512x512]x3, where [512x512] is the scale of the target image and 3 is the number of channels; the fourth target feature map is, for example, [128x128]x3, but is not limited thereto. The fourth target feature map may be mapped into a Gaussian sphere space; for example, a straight line in the target image is projected onto a Gaussian sphere with the optical center of the target camera as the sphere center. In the Gaussian sphere space, straight lines that intersect at the same point in the target image produce the strongest response at one point, and the vanishing points located in at least two orthogonal principal directions can be obtained according to the response value and angle of each point in the Gaussian sphere space. Further, as shown in fig. 3b, the Hough transform-based vanishing point detection model also includes a Gaussian-sphere-based Hough transform network. The fourth target feature map is fed into this network, where a Gaussian-sphere-based Hough transform is performed on it to map the straight lines existing in the target image to points in the Gaussian sphere space; then, at least two points whose probability values and angle values meet the requirements are selected in the Gaussian sphere space as the vanishing points in at least two orthogonal principal directions, and these points are remapped into the image space to obtain the vanishing points in the at least two orthogonal principal directions in the target image. The Gaussian sphere space contains many points, each obtained from a straight line in the target image, and each point has two attributes, a brightness value and an angle value. The brightness value represents the probability that the corresponding point is a point at which straight lines intersect (namely a vanishing point); the brighter the corresponding point on the Gaussian sphere, the higher the probability that it is a vanishing point. The angle value represents the angle between the straight line corresponding to the point and the x-axis in the image space, and whether the straight lines corresponding to the points on the Gaussian sphere are perpendicular to each other can be determined from the angle values. On this basis, at least two points in the Gaussian sphere space whose probability values and angle values meet the requirements can be selected as the vanishing points in at least two orthogonal principal directions.
Further optionally, the Gaussian-sphere-based hough transform network comprises a hough transform module and a Gaussian sphere transform module. On this basis, after the fourth target feature map is obtained by performing feature extraction on the target image, the fourth target feature map can be transformed successively into a hough space and then into a Gaussian spherical space. In the hough space, a straight line in the target image is mapped to a point (referred to as a hough point for short), but it cannot be directly distinguished whether a hough point corresponds to a point at which several straight lines intersect. The hough points are therefore further mapped into a Gaussian spherical space; in the Gaussian spherical space, the response value (i.e., brightness value) of each point (referred to as a Gaussian point for short) differs according to the number of straight lines corresponding to that Gaussian point, and the response value of a Gaussian point formed by a single straight line is smaller than that of a Gaussian point formed by the intersection of several straight lines, so that the several Gaussian points with the strongest responses can be selected as vanishing points. Specifically, the fourth target feature map is input to the hough transform module, and inside this module a hough transform is applied to the fourth target feature map to obtain a fifth target feature map in a polar-coordinate-based hough space; as shown in fig. 3b, the dimension of the fifth target feature map is [184x180] × 128, but is not limited thereto, where 184 is the maximum value of the distance dimension represented by ρ, 180 corresponds to 180° of the angle dimension θ, and 128 is the number of channels; in fig. 3b, HT denotes the hough space and (ρ, θ) are the coordinates in the hough space. Further, as shown in fig. 3b, a hough convolution (denoted HT Conv in fig. 3b) is performed on the fifth target feature map in the hough space to obtain a sixth target feature map, whose dimension is unchanged relative to the fifth target feature map. Next, as shown in fig. 3b, the sixth target feature map is input into the Gaussian sphere transform module, and inside this module a Gaussian sphere transformation is performed on the sixth target feature map to obtain a seventh target feature map in the Gaussian spherical space, where (α, β) in fig. 3b are the variables of the Gaussian spherical space. The seventh target feature map is illustrated with the dimension [32768] × 128 as an example, where 128 is the number of channels and [32768] indicates that the Gaussian sphere is discretized into 32768 points; through the Gaussian sphere transformation, a point in the hough space can be associated with a discrete point in the Gaussian spherical space. Finally, a spherical convolution is performed on the seventh target feature map in the Gaussian spherical space to obtain an eighth target feature map whose dimension is [32768] × 128, but this is not limited thereto.
Optionally, according to the probability value of each Gaussian point in the eighth target feature map, points whose probability values meet a set requirement are selected from the plurality of Gaussian points as candidate vanishing points; for example, points whose probability values exceed a set probability threshold are used, where the probability threshold may be 80%, 90%, or 95%. From these candidates, at least two vanishing points whose mutual angles are larger than a set angle are then selected as the vanishing points in at least two orthogonal principal directions. Here, the two or three vanishing points with the largest mutual angles may be selected; fig. 3b illustrates the case in which 3 vanishing points are selected from the Gaussian sphere.
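The final selection step can be illustrated with a short sketch. The following Python snippet is only an illustrative approximation rather than the patented implementation: it assumes the network has already produced a probability for every discretized sphere point together with the corresponding unit direction, and the function name, thresholds and orthogonality tolerance are chosen here for readability.

```python
import numpy as np

def select_orthogonal_vanishing_points(sphere_dirs, probs,
                                       prob_thresh=0.9, ortho_tol_deg=5.0):
    """Pick up to three mutually near-orthogonal vanishing directions.

    sphere_dirs: (N, 3) unit vectors of the discretized Gaussian sphere (assumed given).
    probs:       (N,) probability that each point is a vanishing point.
    """
    order = np.argsort(-probs)                 # strongest responses first
    selected = []
    for idx in order:
        if probs[idx] < prob_thresh:
            break
        d = sphere_dirs[idx]
        # keep the candidate only if it is near-orthogonal to every direction kept so far
        angles = [np.degrees(np.arccos(np.clip(abs(d @ s), -1.0, 1.0)))
                  for s in selected]
        if all(a > 90.0 - ortho_tol_deg for a in angles):
            selected.append(d)
        if len(selected) == 3:
            break
    return np.array(selected)                  # directions of the orthogonal principal axes
```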
In this embodiment, the determining the camera intrinsic parameters of the target camera and the gravity direction in the camera coordinate system according to the vanishing points in the at least two orthogonal principal directions includes: under the condition that at least two finite vanishing points are included in the vanishing points in the at least two orthogonal main directions, selecting two target vanishing points from the at least two finite vanishing points; determining camera intrinsic parameters of the target camera according to the constraint relation between the two target vanishing points and the camera intrinsic parameters; and converting vanishing points in the gravity direction in at least two orthogonal main directions into a camera coordinate system according to camera intrinsic parameters to obtain the gravity direction in the camera coordinate system.
When the target camera collects the target image, two parallel straight lines in three-dimensional space are, through perspective transformation, mapped to lines that intersect at a vanishing point in the target image, and this vanishing point may lie at infinity or at a finite position. In this embodiment, a criterion for infinity is defined: for example, when the distance from the pixel coordinates of a vanishing point to the origin of the pixel coordinate system is larger than a set multiple (e.g., 10 times) of the diagonal length of the target image, the vanishing point is considered to lie at infinity. For convenience of distinction and description, vanishing points lying at infinity are referred to as infinite vanishing points, and vanishing points not lying at infinity are referred to as finite vanishing points. On this basis, the vanishing points in the at least two orthogonal principal directions in the embodiment of the present application may include both infinite vanishing points and finite vanishing points. Specifically, in the case where the vanishing points in the at least two orthogonal principal directions include at least two finite vanishing points, two vanishing points are selected from the at least two finite vanishing points as the target vanishing points. The manner of selecting the two target vanishing points is not limited: for example, two vanishing points may be selected randomly from the at least two finite vanishing points as the target vanishing points, or the two vanishing points closest to the camera optical center may be selected as the target vanishing points. Whichever two target vanishing points are selected, they may or may not include the vanishing point located in the gravity direction, which is not limited.
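As a small illustration of the infinity criterion above, the following sketch classifies a detected vanishing point as finite or infinite; the helper name is not part of the patent, and the factor of 10 is only the example value given in the text.

```python
import numpy as np

def is_finite_vanishing_point(vp_xy, image_w, image_h, multiple=10.0):
    """A vanishing point is treated as lying at infinity when its distance to the
    pixel-coordinate origin exceeds a set multiple of the image diagonal."""
    diag = np.hypot(image_w, image_h)
    return np.hypot(vp_xy[0], vp_xy[1]) <= multiple * diag
```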
After the two target vanishing points are selected, the camera intrinsic parameters of the target camera can be determined according to the constraint relation between the two target vanishing points and the camera intrinsic parameters. The constraint relation between the two target vanishing points and the camera intrinsic parameters is: (K⁻¹·VP₁)·(K⁻¹·VP₂) = 0. This constraint expresses that the dot product of the line directions corresponding to the two target vanishing points in three-dimensional space is 0, which means that those two directions are mutually perpendicular. Here, VP₁ is the three-dimensional vector of the first target vanishing point, formed by appending one dimension to the image coordinates of the first target vanishing point, for example appending a 1 behind the image coordinates; K represents the intrinsic parameters of the target camera and is a 3×3 matrix including the focal length and the principal point position, the principal point being the camera optical center position, for example the position with image coordinates (0.5, 0.5) can be taken as the camera optical center position, but this is not limited thereto, where the focal length is to be solved and the principal point position is known; K⁻¹ represents the inverse of the camera intrinsic parameter matrix; VP₂ is the three-dimensional vector of the second target vanishing point, formed analogously by appending a 1 behind its image coordinates.
Since VP₁ and VP₂ are known target vanishing points, the camera intrinsic parameters can be obtained by solving the constraint relation between the two target vanishing points and the camera intrinsic parameters. Then, the camera intrinsic parameters and the vanishing point in the gravity direction can be used to calculate the gravity direction in the camera coordinate system: dir_y = K⁻¹·VP_y, where the y direction denotes the gravity direction, VP_y is the three-dimensional vector of the vanishing point located in the gravity direction in image space, and dir_y is the gravity direction in the camera coordinate system.
Further, a vanishing point constrains the perspective relation of its corresponding principal direction. Assume the normal direction of a certain physical main body structure (such as a wall surface) is N_i, where N_i is a quantity to be calculated, and the vanishing point corresponding to that physical main body structure is VP_i; then the constraint that the normal direction of the physical main body structure is parallel to the principal direction of the parallel straight lines intersecting at that vanishing point is N_i·(K⁻¹·VP_i) = 1.
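Under the common additional assumptions of square pixels and zero skew (the text above only states that the focal length is unknown and the principal point is known), the constraint (K⁻¹·VP₁)·(K⁻¹·VP₂) = 0 reduces to a closed-form solution for the focal length. The sketch below illustrates this; it is an assumption-laden illustration rather than the patented procedure, and the function names are invented here.

```python
import numpy as np

def intrinsics_from_two_vps(vp1, vp2, principal_point):
    """Solve the focal length from (K^-1 VP1)·(K^-1 VP2) = 0, assuming square
    pixels, zero skew and a known principal point (cx, cy) in pixel coordinates."""
    cx, cy = principal_point
    x1, y1 = vp1
    x2, y2 = vp2
    f_sq = -((x1 - cx) * (x2 - cx) + (y1 - cy) * (y2 - cy))
    if f_sq <= 0:
        raise ValueError("vanishing points are not consistent with orthogonal directions")
    f = np.sqrt(f_sq)
    return np.array([[f, 0.0, cx],
                     [0.0, f, cy],
                     [0.0, 0.0, 1.0]])

def gravity_direction(K, vp_gravity):
    """dir_y = K^-1 · VP_y, normalized to a unit vector in the camera coordinate system."""
    vp_h = np.array([vp_gravity[0], vp_gravity[1], 1.0])   # homogeneous vanishing point
    d = np.linalg.inv(K) @ vp_h
    return d / np.linalg.norm(d)
```

Given K obtained this way, the reference normal of a wall surface can likewise be taken as the normalized K⁻¹·VP_i for the vanishing point perpendicular to that wall surface, which is how the first-class optimization term described later uses it.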
In an alternative embodiment, when the initial three-dimensional model corresponding to the first spatial object is reconstructed, a reference physical main body structure contained in the first spatial object may first be determined; the reference physical main body structure may be the ground or the ceiling, depending on which physical main body structures appear in the target image. After the reference physical main body structure is determined, a reference plane of the three-dimensional space in the camera coordinate system is constructed according to the gravity direction in the camera coordinate system, the reference plane corresponding to the reference physical main body structure. A plurality of reference boundary lines intersecting the reference physical main body structure are then identified from the plurality of boundary lines: if the reference physical main body structure is the ground, the reference boundary lines may be the ground lines obtained by the intersection of the wall surfaces with the ground; if the reference physical main body structure is the ceiling, the reference boundary lines may be the ceiling lines obtained by the intersection of the wall surfaces with the ceiling. A reference model main body structure corresponding to the reference physical main body structure is constructed on the reference plane according to the intersection point positions between the plurality of reference boundary lines. Finally, other model main body structures corresponding to the other physical main body structures are constructed on the reference model main body structure according to the preset height from the camera optical center to the reference plane and the intersection point positions between the plurality of reference boundary lines, thereby obtaining the initial three-dimensional model corresponding to the first space object; for example, if the reference physical main body structure is the ground, the other physical main body structures may be the wall surfaces, the ceiling, and the like. An exemplary illustration of the reference model main body structure and the other model main body structures is shown in fig. 3e: in fig. 3e, the reference model main body structure is the ground in the initial three-dimensional model, and the other model main body structures are the wall surfaces in the initial three-dimensional model.
Optionally, one embodiment of the building of the reference model body structure corresponding to the reference physical body structure on the reference plane according to the intersection positions between the reference boundary lines includes: selecting a plurality of effective reference boundary lines from the plurality of reference boundary lines, and sorting the effective reference boundary lines according to included angles between the effective reference boundary lines and an x-axis in an image coordinate system to obtain adjacent relations between the effective reference boundary lines; determining intersection positions between the effective reference boundary lines according to the adjacent relationship between the effective reference boundary lines, and dividing the intersection positions into a first intersection position intersecting the boundary of the target image and a second intersection position not intersecting the boundary of the target image, the first intersection position and the second intersection position being exemplarily illustrated in fig. 3c, but not limited thereto; and drawing a model boundary corresponding to the reference physical main structure on the reference plane according to the first intersection point position and the second intersection point position to obtain a reference model main structure, as shown in fig. 3 d. In fig. 3d, the grid area is a reference plane, the white area on the grid area is a reference model body structure formed by the model boundaries corresponding to the reference physical body structure, and the reference model body structure in fig. 3d is a ground formed by the white area, which is connected to the intersection point position shown in fig. 3 c.
Further alternatively, the embodiment of the present application is not limited to the manner of selecting the effective reference boundary line from the plurality of reference boundary lines, and the following examples are given.
Example B1: a first reference boundary line is selected from the plurality of reference boundary lines according to the included angles between the reference boundary lines and the x-axis in the image coordinate system. For example, a reference boundary line whose included angle with the x-axis in the image coordinate system is smaller than a set angle threshold is selected as the first reference boundary line, where the angle threshold may be 10 degrees, 15 degrees, 20 degrees, or the like; if several boundary lines have included angles smaller than the angle threshold, the boundary line with the smallest included angle may be selected as the first reference boundary line, or one of them may be selected randomly as the first reference boundary line. Then, some erroneously detected reference boundary lines are rejected based on the first reference boundary line: specifically, according to the included angles between the other reference boundary lines and the first reference boundary line, the reference boundary lines whose included angle is smaller than a set included-angle threshold are rejected, where this angle threshold may be 3 degrees, 5 degrees, 10 degrees, etc. The reference boundary lines that are not rejected, together with the first reference boundary line, are used as the effective reference boundary lines.
Example B2: a first reference boundary line is selected from the plurality of reference boundary lines according to the lengths of the reference boundary lines. For example, a reference boundary line whose length is larger than a set length threshold is selected from the plurality of reference boundary lines as the first reference boundary line; the length threshold is not limited and is determined according to the size of the target image. Then, some erroneously detected reference boundary lines are rejected based on the first reference boundary line: specifically, according to the included angles between the other reference boundary lines and the first reference boundary line, the reference boundary lines whose included angle is smaller than the set included-angle threshold are rejected, and the reference boundary lines that are not rejected, together with the first reference boundary line, are used as the effective reference boundary lines.
Example B3: a first reference boundary line is selected from the plurality of reference boundary lines according to both the included angles between the reference boundary lines and the x-axis in the image coordinate system and the lengths of the reference boundary lines. For example, candidate reference boundary lines whose included angle with the x-axis in the image coordinate system is smaller than the set angle threshold are first selected from the plurality of reference boundary lines, and a reference boundary line whose length is larger than the set length threshold is then selected from the candidates as the first reference boundary line. According to the included angles between the other reference boundary lines and the first reference boundary line, the reference boundary lines whose included angle is smaller than the set included-angle threshold are rejected, and the reference boundary lines that are not rejected, together with the first reference boundary line, are used as the effective reference boundary lines.
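The three examples above can be combined into one filtering routine. The sketch below is purely illustrative: the representation of a boundary line as two image-space endpoints, the thresholds and the helper names are assumptions made here, not specifics taken from the patent.

```python
import numpy as np

def select_valid_reference_lines(lines, angle_thresh_deg=15.0,
                                 reject_thresh_deg=5.0, min_length=None):
    """lines: list of ((x1, y1), (x2, y2)) reference boundary lines in image space."""
    def angle_to_x_axis(line):
        (x1, y1), (x2, y2) = line
        a = abs(np.degrees(np.arctan2(y2 - y1, x2 - x1))) % 180.0
        return min(a, 180.0 - a)                      # undirected angle to the x-axis

    def length(line):
        (x1, y1), (x2, y2) = line
        return np.hypot(x2 - x1, y2 - y1)

    # Example B1/B3 style: prefer a long line nearly parallel to the image x-axis
    candidates = [l for l in lines if angle_to_x_axis(l) < angle_thresh_deg]
    if min_length is not None:                        # Example B2/B3 style length filter
        candidates = [l for l in candidates if length(l) > min_length]
    if not candidates:
        return lines
    first = max(candidates, key=length)

    valid = [first]
    for l in lines:
        if l is first:
            continue
        diff = abs(angle_to_x_axis(l) - angle_to_x_axis(first))
        # reject near-parallel duplicates of the first reference boundary line
        if diff >= reject_thresh_deg:
            valid.append(l)
    return valid
```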
Optionally, in a case that the intersection positions between the effective reference boundary lines are divided into a first intersection position and a second intersection position, according to a preset height from the camera optical center to the reference plane and the intersection positions between the plurality of reference boundary lines, constructing other model body structures corresponding to other physical body structures on the reference model body structure to obtain an implementation manner of the initial three-dimensional model corresponding to the first space object, including: determining an adjacent model boundary which is intersected at the second intersection point position on the main structure of the reference model according to the second intersection point position; determining the initial heights of other model main body structures according to the preset height from the optical center of the camera to the reference plane and the preset scaling; and constructing other model main structures on the boundaries of the adjacent models intersected at the second intersection point position according to the initial heights of the other model main structures so as to obtain the initial three-dimensional model corresponding to the first space object. For example, if the reference physical body structure is taken as the ground, then another model body structure is constructed on the adjacent model boundary intersecting at the second intersection position, specifically, a wall surface is constructed on the adjacent model boundary intersecting at the second intersection position, and further, a ceiling is supplemented above the adjacent wall surface to obtain the initial three-dimensional model corresponding to the first space object.
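The patent text does not spell out the geometry of placing the intersection positions onto the reference plane; one plausible realization, given the gravity direction and the preset height from the camera optical center to the reference plane, is a simple ray-plane intersection. The following sketch is an assumption-level illustration of that idea (all names and the sign convention for the gravity direction are chosen here):

```python
import numpy as np

def backproject_to_ground(pixel, K, gravity_dir, camera_height):
    """Intersect the viewing ray of an image point with the reference (ground) plane.

    gravity_dir:   unit gravity direction in the camera coordinate system,
                   assumed to point from the camera towards the ground.
    camera_height: preset height from the camera optical center to the reference plane.
    """
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    denom = gravity_dir @ ray
    if denom <= 0:
        raise ValueError("point does not back-project onto the ground in front of the camera")
    t = camera_height / denom                    # ray-plane intersection parameter
    return t * ray                               # 3D point on the reference plane

def build_floor_polygon(intersection_pixels, K, gravity_dir, camera_height):
    """Reference model main body structure as an ordered list of 3D corner points."""
    return [backproject_to_ground(p, K, gravity_dir, camera_height)
            for p in intersection_pixels]
```

Walls can then be raised vertically (along the gravity direction) from the adjacent model boundaries meeting at the second intersection positions, with an initial height derived from the preset camera height and scaling, as described above.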
In an alternative embodiment, one embodiment of the optimizing the initial three-dimensional model according to the constraint relationship among the boundary lines, the constraint relationship among the vanishing points in at least two orthogonal main directions, and the camera intrinsic parameters to obtain the target three-dimensional model includes: constructing an optimization function taking the position parameter and/or the height parameter of each model main body structure as an optimization variable according to the constraint relation among a plurality of boundary lines, the constraint relation among vanishing points in at least two orthogonal main directions and the camera internal parameter; and solving the optimization function by adopting a least square algorithm to obtain optimized position parameters and/or height parameters of the main body structure of each model, and adjusting the position of the main body structure of each model according to the optimized position parameters and/or height parameters to obtain the target three-dimensional model.
In this embodiment, the reference model main body structure and the other model main body structures may have position parameters and/or height parameters, wherein the position parameters include a normal vector of the model main body structure and a distance from the model main body structure to an optical center of the camera, and the two pieces of information can uniquely determine the position of the model main body structure in the initial three-dimensional model and the relative position between the model main body structure and the other model main body structures; the height parameter is height information from the model body structure to a reference plane, for example, height information of a ceiling, height information of a wall surface, and height information of a ground surface, where the height information of the ground surface is 0 if the ground surface is used as the reference plane.
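As a purely illustrative data-structure sketch (the patent does not prescribe any particular representation), each model main body structure can be carried in code as a normal vector, a distance to the camera optical center and a height above the reference plane:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ModelPlane:
    normal: np.ndarray      # unit normal vector in the camera coordinate system
    distance: float         # distance from the plane to the camera optical center
    height: float = 0.0     # height above the reference plane (0 for the ground)
```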
Optionally, when an optimization function using the position parameter and/or the height parameter of each model body structure as an optimization variable is constructed, at least one of the first-type optimization term, the second-type optimization term and the third-type optimization term may be constructed according to a constraint relationship between a plurality of boundary lines, a constraint relationship between vanishing points in at least two orthogonal principal directions, and an intra-camera parameter, and the optimization function may be generated according to the at least one optimization term. Preferably, the optimization function can be generated according to the first-class optimization term, the second-class optimization term and the third-class optimization term at the same time.
For each model main body structure, a reference normal vector of the model main body structure in the camera coordinate system is generated according to the camera intrinsic parameters and the vanishing point perpendicular to the model main body structure. Theoretically, this reference normal vector is parallel to the normal vector of the model main body structure, so a first-class optimization term can be constructed with the optimization target that the dot product of the normal vector of the model main body structure and its reference normal vector is 1.
For any adjacent model main body structure, the normal vector of any adjacent model main body structure should be vertical, for example, the ground is vertical to the wall surface, the wall surface is vertical to the ceiling, and the adjacent wall surface is vertical to the wall surface, so that the second type of optimization item can be constructed by taking the dot product of the normal vectors between any adjacent model main body structures as 0 as an optimization target.
Aiming at any adjacent model main body structure, generating a boundary line of the any adjacent model main body structure under an image coordinate system according to camera internal parameters, a normal vector of the any adjacent model main body structure and the distance from the any adjacent model main body structure to an optical center of a camera; in addition, detecting the boundary lines between the corresponding adjacent physical main structures in the target image based on hough transform, theoretically, the boundary lines between any adjacent model main structures in the image coordinate system (the boundary lines can be regarded as the projection of the boundary lines between any adjacent model main structures in the initial three-dimensional model in the image space) should be the same as the boundary lines between corresponding adjacent physical main structures detected from the target image (the boundary lines can be regarded as the projection of the boundary lines between adjacent physical main structures in the first space object in the image space), therefore, a third type of optimization term can be constructed for the optimization target with the same boundary lines between any adjacent model main structures in the image coordinate system and the same boundary lines between corresponding adjacent physical main structures in the target image.
The construction process of the above optimization terms is exemplified by the plane equations corresponding to the model main body structures. The model main body structures to be optimized are the wall surfaces, the ceiling and the ground. The equation of the i-th wall surface to be optimized is assumed to be n_i·X = d_i, where X is a three-dimensional point in the camera coordinate system; the normal vector of the i-th wall surface estimated from the vanishing points is denoted n̂_i; the boundary line between the i-th wall surface and the ground detected from the image is denoted l_i^g; the boundary line between the i-th wall surface and the j-th wall surface detected from the image is denoted l_ij; and the boundary line between the i-th wall surface and the ceiling detected from the image is denoted l_i^c. Here ‖·‖ represents the norm; n_i is the normal vector of the i-th wall surface, which belongs to the quantities to be optimized and has an initial value (i.e., the value determined in the initial three-dimensional model); d_i is the distance from the i-th wall surface to the camera optical center; and the ceiling height h is an assumed value used when constructing the initial three-dimensional model and requires further optimization. Therefore, the optimization variables are the position parameters of the wall surfaces (i.e., the wall surface equations) and the height parameter of the ceiling. An exemplary optimization function may be represented as follows:

E = Σ_{i∈T} (n_i·n̂_i − 1)² + Σ_{adjacent (i,j)} (n_i·n_j)² + Σ_{i∈T} (n_i·dir_y)² + Σ_{i∈T} ‖proj(b_i^g) − l_i^g‖² + Σ_{i∈T} ‖proj(b_i^c) − l_i^c‖² + Σ_{adjacent (i,j)} ‖proj(b_ij) − l_ij‖²

where b_i^g, b_i^c and b_ij denote the boundary lines of the i-th wall surface with the ground, the ceiling and the j-th wall surface in the model, and proj(·) denotes their projection into the image coordinate system using the camera intrinsic parameters.
where T represents a set of walls appearing in the image. And solving the optimization function by adopting a least square algorithm, and obtaining the position parameter (such as a wall surface equation) of the wall surface and the height parameter of the ceiling by taking the minimum sum of all items in the optimization function as an optimization target. The optimization function comprises six items, wherein the first item belongs to the first type of optimization items, the second item and the third item belong to the second type of optimization items, and the fourth item, the fifth item and the sixth item belong to the third type of optimization items. The following is a detailed description:
In the first term, the normal vector n_i of the i-th wall surface and the normal vector n̂_i of that wall surface obtained from the vanishing points should have a consistent direction, so their dot product should be 1 and, after subtracting 1, should be 0; here n_i is a quantity to be optimized.
In the second term, n_i and n_j are the normal vectors of adjacent wall surfaces; the normal vectors of adjacent wall surfaces are perpendicular, so their dot product is 0; both n_i and n_j are quantities to be optimized.
In the third term, n_i and dir_y represent the normal vectors of a wall surface and of the ground, respectively, which are perpendicular to each other, so their dot product is 0; dir_y is known, namely the gravity direction in the camera coordinate system, and n_i is a quantity to be optimized.
In the fourth term, b_i^g denotes the boundary line between the i-th wall surface and the ground in the model, proj(b_i^g) denotes its projection into the image coordinate system, and the difference between this projected boundary line and the boundary line l_i^g detected from the image should be 0; l_i^g is known, being the boundary line between wall surface and ground detected in the target image based on the hough transform.
In the fifth term, the distance between the projected boundary line between the i-th wall surface and the ceiling and the ceiling line l_i^c detected from the image should be 0.
In the sixth term, the difference between the boundary line between the adjacent i-th and j-th wall surfaces in the model and the boundary line l_ij between the i-th and j-th wall surfaces detected from the image should be 0.
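The six terms above fit directly into a standard least-squares solver. The following sketch shows the overall shape of such a refinement; it is a simplified illustration under stated assumptions (the parameterization, the callback for the image-space line residuals and all names are invented here), not the patented implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def optimize_walls(n_init, d_init, h_init, n_ref, gravity_dir,
                   adjacency, line_residual_fns):
    """Least-squares refinement of wall normals n_i, distances d_i and ceiling height h.

    n_ref:             reference normals estimated from the vanishing points.
    adjacency:         list of (i, j) index pairs of adjacent wall surfaces.
    line_residual_fns: callables that, given (n, d, h), return flat lists of
                       image-space differences between projected and detected
                       boundary lines (the fourth to sixth terms).
    """
    num_walls = len(n_init)

    def unpack(x):
        n = x[:3 * num_walls].reshape(num_walls, 3)
        d = x[3 * num_walls:4 * num_walls]
        h = x[-1]
        return n, d, h

    def residuals(x):
        n, d, h = unpack(x)
        res = []
        res.extend(n[i] @ n_ref[i] - 1.0 for i in range(num_walls))   # first-type term
        res.extend(n[i] @ n[j] for i, j in adjacency)                 # adjacent walls perpendicular
        res.extend(n[i] @ gravity_dir for i in range(num_walls))      # wall perpendicular to ground
        for fn in line_residual_fns:                                  # third-type (boundary line) terms
            res.extend(fn(n, d, h))
        return np.array(res)

    x0 = np.concatenate([np.asarray(n_init, dtype=float).ravel(),
                         np.asarray(d_init, dtype=float), [float(h_init)]])
    sol = least_squares(residuals, x0)
    return unpack(sol.x)
```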
In this embodiment, after the optimization, a target three-dimensional model corresponding to the first space object may be obtained. In the embodiment of the application, the 3D structure information is not directly detected from the target image, but the 2D visual information such as the boundary line and the vanishing point is directly detected from the target image, and then the 2D visual information and the Manhattan constraint, the vanishing point constraint and the like of the indoor scene are utilized to generate the 3D structure information. In addition, in the embodiment of the application, boundary lines such as wall lines, ground lines and ceiling lines are directly detected from the target image, the boundary lines are used as constraints to optimize a three-dimensional equation of the wall surface, and the finally generated target three-dimensional model is higher in fitting degree with actual partition lines of the wall surface, the ground and the ceiling and higher in model quality. Optionally, other real scene information such as texture in the 2D image may also be embodied in the target three-dimensional model.
After obtaining the target three-dimensional model corresponding to the first space object, various applications may be developed based on the target three-dimensional model. For example, the target three-dimensional model may be presented to a user for the user to view or understand the three-dimensional structure of the first spatial object. For another example, the target three-dimensional model is applied to an online home decoration scene for viewing the matching effect of the home decoration object (such as a wall painting, a sofa, etc.) and the real environment of the first space object, and the home decoration scheme is selected according to the matching effect. For another example, commodity purchasing is performed based on the target three-dimensional model to check the matching effect of the commodity to be purchased and the real environment of the first space object, and commodity purchasing is performed according to the matching effect. In the following embodiments, an online home decoration scenario and an online shopping scenario will be described as examples.
Fig. 4a is a schematic flowchart of an online home decoration method according to an exemplary embodiment of the present application. As shown in fig. 4a, the method comprises:
401a, responding to an image uploading operation, and acquiring a target image corresponding to a first space object, wherein the first space object is at least a part of space objects in a target physical space;
402a, responding to the placing operation of the target home decoration object on the target image, fusing the target home decoration object into a target three-dimensional model corresponding to the first space object to obtain a target three-dimensional model fused with the target home decoration object;
and 403a, projecting the target three-dimensional model of the fusion target home decoration object onto the target image to obtain a home decoration effect graph containing the target home decoration object. In this embodiment, the online home decoration method may be implemented on a terminal device, or may be implemented by matching the terminal device with a server device, which is not limited herein. The following describes an example of a cooperative implementation of a terminal device and a server device.
The user can open a home decoration Application (App) on the terminal device and enter an online home decoration page; on the online home decoration page, a plurality of home decoration modes are included, such as a home decoration mode based on a sample plate, a home decoration mode based on a real background picture, and the like; and the user can select the home decoration mode based on the real background image, and the adding page of the real background image can be displayed in response to the operation of selecting the home decoration mode based on the real background image. The user can acquire a target image of the first space object through the terminal equipment, and can also directly select the target image of the first space object from the gallery; the home appliance App may acquire a target image corresponding to a first spatial object in response to an image capture operation or an image selection operation, where the first spatial object may be at least a partial spatial object in a target physical space. For example, the target physical space is a house to be decorated by a user, the house includes a bedroom, a living room, a kitchen or a bathroom, and the like, the first space object may be a bedroom and a living room, or may be a local space of a living room, which is not limited, and details can be referred to in the foregoing embodiments, which are not described herein for a while.
In this embodiment, after the home decoration App acquires the target image corresponding to the first space object, the target image may be uploaded to the server device, and the server device generates the target three-dimensional model corresponding to the first space object by using the three-dimensional scene reconstruction method described in the foregoing method embodiments; for the manner of generating the target three-dimensional model, reference may be made to the foregoing embodiments, which is not repeated here. It is noted that the process of generating the target three-dimensional model is not perceptible to the user. Fig. 4b is a schematic diagram of a target image and of the process from the target image to the generated target three-dimensional model according to an embodiment of the present application. In fig. 4b, a colored intermediate-state model and an intermediate-state model that is both colored and textured are illustrated between the target image and the generated target three-dimensional model in order to explain the model generation principle; in practice, no intermediate-state model is normally generated. The intermediate-state models and the target three-dimensional model are shown from different viewing angles, but the viewing angles are not limited to these. The colors in fig. 4b are not shown explicitly and are illustrated only in grayscale.
After the user selects the target image, the home decoration App may present a variety of home decoration objects in an associated area of the target image; for example, the home decoration objects may be appliances, furniture, decorative items, etc. In fig. 4c and 4d, a sofa, a tea table, a carpet, a single chair, and a wall painting are shown as examples in the area below the target image. As shown in fig. 4c and 4d, the user may select a target home decoration object from the plurality of home decoration objects and place it on the target image, for example by dragging the image of the selected target home decoration object to a corresponding position on the target image and then releasing it. The home decoration App can respond to the placement operation of the target home decoration object on the target image and provide identification information (such as name, commodity ID, or image) of the target home decoration object and position range information of the target home decoration object on the target image to the server device. The server device acquires the three-dimensional model corresponding to the target home decoration object according to its identification information, fuses the three-dimensional model of the target home decoration object into the target three-dimensional model according to the position range information of the target home decoration object on the target image, projects the target three-dimensional model fused with the target home decoration object onto the target image to obtain a home decoration effect diagram containing the target home decoration object, and provides the home decoration effect diagram to the terminal device; the terminal device then displays the home decoration effect diagram. In fig. 4c, for example, the user selects the wall painting and places it at the center of the main wall surface; the perspective deformation in the home decoration effect diagram shown in fig. 4c is correct, and the movable area of the wall painting can be limited to the wall surface, so the accuracy is high. In fig. 4d, the user selects the sofa and the carpet in sequence, places the sofa on the center line between the main wall surface and the ground, and lays the carpet on the ground; the home decoration effect diagram shown in fig. 4d has a correct perspective relation and reasonable placement positions.
In the above embodiment, the terminal device and the server device cooperate with each other to implement the online home decoration process, but the invention is not limited thereto. Of course, it may be implemented independently by the terminal device. Specifically, after the home appliance App acquires the target image corresponding to the first space object, the three-dimensional scene reconstruction method described in the foregoing method embodiment may be adopted to generate a target three-dimensional model corresponding to the first space object; meanwhile, the home decoration App can display various home decoration objects in the associated area of the target image, and the home decoration App can respond to the placement operation of the target home decoration object on the target image and acquire the identification information of the target home decoration object and the position range information of the target home decoration object on the target image; acquiring a three-dimensional model corresponding to the target home decoration according to the identification information of the target home decoration object, fusing the three-dimensional model of the target home decoration object into the target three-dimensional model according to the position range information of the target home decoration object on the target image, and projecting the target three-dimensional model fused with the target home decoration object onto the target image to obtain a home decoration effect picture containing the target home decoration object and display the home decoration effect picture.
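The final projection step, common to both the cooperative and the terminal-only variants, amounts to an ordinary pinhole projection with the camera intrinsic parameters recovered earlier. A minimal, assumption-level sketch (the function name and the use of a plain, distortion-free pinhole model are choices made here, not details from the patent):

```python
import numpy as np

def project_to_image(points_cam, K):
    """Project 3D points of the fused model, given in the camera coordinate system,
    into the target image to obtain the 2D footprint used for the effect picture."""
    pts = np.asarray(points_cam, dtype=float)
    uvw = (K @ pts.T).T
    return uvw[:, :2] / uvw[:, 2:3]             # (u, v) pixel coordinates
```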
In this description, in the above-described embodiment, the target three-dimensional model is constructed in real time, but the present invention is not limited to this, and for example, the target three-dimensional model may be constructed when the target image is first uploaded, and the target three-dimensional model constructed in advance may be directly used in the subsequent online home decoration.
Fig. 4e is a schematic flowchart of a commodity selection method according to an exemplary embodiment of the present application. As shown in fig. 4e, the method comprises:
401b, responding to selection operation on the commodity page, and determining a selected target commodity, wherein the target commodity is provided with a commodity three-dimensional model;
402b, responding to a collocation effect viewing operation, and selecting a target image corresponding to a first space object to be collocated with a target commodity;
403b, adding the three-dimensional model of the commodity into the target three-dimensional model corresponding to the first space object to obtain a target three-dimensional model of the fused target commodity;
404b, projecting the target three-dimensional model fused with the target commodity onto the target image to obtain a matching effect graph of the target commodity and the first space object.
In this embodiment, the commodity selection method may be implemented on the terminal device, or may be implemented by matching the terminal device with the server device, which is not limited herein. The following description is made by taking the product selection method implemented in the terminal device as an example, but is not limited thereto.
In this embodiment, in a shopping scene, when a user purchases furniture, home appliances, or decoration goods, in order to help the user select and purchase goods that better fit the actual environment, the user may add the selected goods to the target image corresponding to the first spatial object and view the matching effect after the addition. The first space object may be any area having a spatial concept, such as a home environment, a mall environment, or an office scene, and the commodity may be any article that needs to be placed in the first space object; for example, in a home environment the commodity may be a wardrobe, a wall painting, or a table and chair, and in a mall environment the commodity may be a locker, a shelf, or the like.
In this embodiment, a commodity page may be displayed on the terminal device, a plurality of commodities are displayed on the commodity page, a user may select a target commodity on the commodity page, the terminal device may respond to a selection operation on the commodity page, and determine the selected target commodity, and the target commodity corresponds to the commodity three-dimensional model.
In this embodiment, a collocation effect viewing control can be added on a shopping cart page, a listing page or a commodity detail page, a user can initiate an operation of viewing a commodity collocation effect through the control, and the e-commerce App can enable the user to select a target image corresponding to the first space object in response to the operation; the user can acquire a target image of the first space object through the terminal equipment, and can also directly select the target image of the first space object from the gallery; after the e-commerce App acquires the target image, a target three-dimensional model corresponding to the first space object can be constructed by adopting the three-dimensional scene reconstruction method provided by the embodiment, and the commodity three-dimensional model of the target commodity selected by the user is added to the target three-dimensional model corresponding to the first space object, so that a target three-dimensional model fusing the target commodity is obtained; and projecting the target three-dimensional model fused with the target commodity onto the target image to obtain a collocation effect picture of the target commodity and the first space object, and displaying the collocation effect picture. The user can determine whether to select the target product or not through the collocation effect graph. Further optionally, the matching effect graph can be automatically scored and the matching score can be given according to a preset matching rule or strategy so as to assist the user in determining whether to purchase the target commodity, so that the shopping experience of the user can be improved, the probability of the user for successfully purchasing the required commodity can be improved, and the return and exchange probability can be further reduced.
It should be noted that the execution subjects of the steps of the methods provided in the above embodiments may be the same device, or different devices may be used as the execution subjects of the methods. For example, the execution subjects of steps 101 to 103 may be device a; for another example, the execution subject of steps 101 and 102 may be device a, and the execution subject of step 103 may be device B; and so on.
In addition, in some of the flows described in the above embodiments and the drawings, a plurality of operations are included in a specific order, but it should be clearly understood that the operations may be executed out of the order presented herein or in parallel, and the sequence numbers of the operations, such as 101, 102, etc., are merely used for distinguishing different operations, and the sequence numbers do not represent any execution order per se. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.
Fig. 5 is a schematic structural diagram of a three-dimensional scene reconstruction apparatus according to an exemplary embodiment of the present application, and as shown in fig. 5, the apparatus includes: an acquisition module 51, a detection module 52, a determination module 53, a reconstruction module 54 and an optimization module 55.
An obtaining module 51, configured to obtain a target image corresponding to a first spatial object, where the first spatial object is at least a partial spatial object in a target physical space;
a detection module 52, configured to detect a plurality of boundary lines existing in the target image and vanishing points in at least two orthogonal main directions, where the boundary lines are the boundary lines between adjacent physical body structures in the first spatial object;
a determining module 53, configured to determine, according to at least two vanishing points in orthogonal principal directions, camera internal parameters of a target camera and a gravity direction in a camera coordinate system, where the target camera is a camera used for shooting a target image;
a reconstruction module 54, configured to reconstruct an initial three-dimensional model corresponding to the first space object according to the multiple boundary lines and a gravity direction in the camera coordinate system, where the initial three-dimensional model includes an adjacent model main body structure corresponding to an adjacent physical main body structure;
and the optimization module 55 is configured to optimize the initial three-dimensional model according to a constraint relationship among the multiple boundary lines, a constraint relationship among vanishing points in at least two orthogonal main directions, and camera internal parameters, so as to obtain a target three-dimensional model.
In an optional embodiment, the detecting module 52 is specifically configured to: inputting the target image into a hough transform-based boundary line detection model for boundary line detection so as to obtain a plurality of boundary lines in the target image; inputting the target image into a hough transform-based vanishing point detection model for vanishing point detection so as to obtain vanishing points in at least two orthogonal main directions in the target image.
In an optional embodiment, the detection module 52 is specifically configured to: inputting a target image into a hough transform-based boundary detection model, and performing feature extraction on the target image by using a first feature extraction network fusing jump connection and multi-scale features in the model to obtain a first target feature map with multiple scales; performing Hough transformation on the first target feature maps in multiple scales by utilizing Hough transformation based on polar coordinates to map straight lines existing in the target image into points in Hough space; and selecting a plurality of target points matched with the characteristics of the boundary lines in the Hough space, and remapping the target points into the image space to obtain a plurality of boundary lines existing in the target image.
In an optional embodiment, the detecting module 52 is specifically configured to: performing feature extraction on the target image to obtain a first intermediate feature map with the maximum scale, and performing downsampling processing on the first intermediate feature map with the maximum scale for N times to obtain a plurality of first intermediate feature maps with other scales; and taking the first intermediate feature map with the minimum scale as a first target feature map with the minimum scale, carrying out up-sampling processing on the first target feature map with the minimum scale for N times, and carrying out jump connection with the first intermediate feature map with the same scale in each up-sampling processing to obtain the first target feature maps with other scales.
In an optional embodiment, the detection module 52 is specifically configured to: carrying out Hough transformation on the first target feature maps in multiple scales to obtain second target feature maps in multiple scales in Hough space based on polar coordinates; carrying out scale transformation on the second target feature maps with multiple scales to obtain multiple feature maps with the same scale, and splicing the multiple feature maps with the same scale to obtain a third target feature map; and performing convolution dimensionality reduction on the third target feature map to obtain a two-dimensional image in the Hough space, wherein the two-dimensional image comprises a plurality of points, and each point corresponds to a straight line existing in the target image.
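For intuition about the polar-coordinate hough space used by the boundary line branch, the following classical, non-learned accumulator sketch shows how edge pixels vote for (ρ, θ) cells, each strong cell corresponding to one straight line in the image. The patented model applies the transform to learned feature maps rather than raw edges, so this is only an analogy; the 184×180 bin count mirrors the feature-map size mentioned for fig. 3b, while the function name and peak picking are illustrative only.

```python
import numpy as np

def hough_line_peaks(edge_mask, num_rho=184, num_theta=180, top_k=10):
    """Vote edge pixels into a (rho, theta) accumulator and return the strongest peaks."""
    h, w = edge_mask.shape
    rho_max = np.hypot(h, w)
    thetas = np.deg2rad(np.arange(num_theta))
    accumulator = np.zeros((num_rho, num_theta), dtype=np.int64)

    ys, xs = np.nonzero(edge_mask)
    for x, y in zip(xs, ys):
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        bins = np.round((rhos + rho_max) / (2 * rho_max) * (num_rho - 1)).astype(int)
        accumulator[bins, np.arange(num_theta)] += 1      # one vote per theta column

    flat = np.argsort(accumulator, axis=None)[::-1][:top_k]
    peaks = np.column_stack(np.unravel_index(flat, accumulator.shape))
    return accumulator, peaks                              # peaks: (rho_bin, theta_bin) per line
```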
In an optional embodiment, the detecting module 52 is specifically configured to: inputting a target image into a hough transform-based vanishing point detection model, and performing feature extraction on the target image by using a second feature extraction network in the model to obtain a fourth target feature map; performing Hough transform on the fourth target feature map by utilizing Hough transform based on a Gaussian sphere to map straight lines existing in the target image to points in a Gaussian sphere space; and selecting vanishing points on at least two orthogonal principal directions with probability values and angle values meeting requirements in the Gaussian spherical space, and remapping the vanishing points into an image space to obtain the vanishing points on at least two orthogonal principal directions in the target image.
In an optional embodiment, the detection module 52 is specifically configured to: carrying out Hough transformation on the fourth target feature map to obtain a fifth target feature map in Hough space based on polar coordinates; performing Hough convolution on the fifth target feature map in Hough space to obtain a sixth target feature map; performing spherical Gaussian spherical transformation on the sixth target feature map to obtain a seventh target feature map in a Gaussian spherical space; and performing spherical convolution on the seventh target feature map in a Gaussian spherical space to obtain an eighth target feature map, wherein the eighth target feature map comprises a plurality of points, and each point corresponds to a straight line in the target image.
In an optional embodiment, the detection module 52 is specifically configured to: selecting points with probability values meeting set requirements from the multiple points as vanishing points according to the probability values of the points in the eighth target feature map; at least two vanishing points with angles larger than a set angle are selected from the vanishing points to be used as vanishing points in at least two orthogonal main directions.
In an optional embodiment, the determining module 53 is specifically configured to: selecting, in the case where the vanishing points in the at least two orthogonal main directions include at least two finite vanishing points, the two target vanishing points closest to the camera optical center from the at least two finite vanishing points; determining camera intrinsic parameters of the target camera according to the constraint relation between the two target vanishing points and the camera intrinsic parameters; and converting the vanishing point in the gravity direction among the at least two orthogonal main directions into the camera coordinate system according to the camera intrinsic parameters, so as to obtain the gravity direction in the camera coordinate system.
In an alternative embodiment, the reconstruction module 54 is specifically configured to: according to the gravity direction under the camera coordinate system, constructing a reference plane of a three-dimensional space under the camera coordinate system, wherein the reference plane corresponds to a reference physical main body structure contained in the first space object; identifying a plurality of reference boundary lines intersecting the reference physical main body structure from the plurality of boundary lines, and constructing a reference model main body structure corresponding to the reference physical main body structure on a reference plane according to intersection points of the plurality of reference boundary lines; and constructing other model main body structures corresponding to other physical main body structures on the reference model main body structure according to the height from the preset camera optical center to the reference plane and the intersection point positions between the plurality of reference boundary lines so as to obtain the initial three-dimensional model corresponding to the first space object.
In an alternative embodiment, the reconstruction module 54 is specifically configured to: selecting effective reference boundary lines from the plurality of reference boundary lines, and sorting the effective reference boundary lines according to included angles between the effective reference boundary lines and an x-axis in an image coordinate system to obtain adjacent relations between the effective reference boundary lines; determining intersection point positions between the effective reference boundary lines according to the adjacent relation between the effective reference boundary lines, and dividing the intersection point positions into a first intersection point position intersecting with the boundary of the target image and a second intersection point position not intersecting with the boundary of the target image; and drawing a model boundary corresponding to the reference physical main structure on the reference plane according to the first intersection point position and the second intersection point position so as to obtain the reference model main structure.
In an alternative embodiment, the reconstruction module 54 is specifically configured to: selecting a first reference boundary line from the plurality of reference boundary lines according to the included angles between the plurality of reference boundary lines and the x-axis in the image coordinate system and/or the lengths of the plurality of reference boundary lines; and rejecting the reference boundary lines with the included angles smaller than the set included angle threshold value according to the included angles between the other reference boundary lines and the first reference boundary line, and taking the reference boundary lines without rejection and the first reference boundary line as effective reference boundary lines.
In an alternative embodiment, the reconstruction module 54 is specifically configured to: determining the boundary of the adjacent model intersected at the second intersection point position on the main structure of the reference model according to the second intersection point position; determining the initial heights of other model main body structures according to the preset height from the optical center of the camera to the reference plane and the preset scaling; and according to the initial height, constructing other model main body structures on the boundaries of the adjacent models intersected at the second intersection point position so as to obtain the initial three-dimensional model corresponding to the first space object.
In an optional embodiment, the optimization module 55 is specifically configured to: according to the constraint relation among a plurality of boundary lines, the constraint relation among vanishing points in at least two orthogonal main directions and camera internal parameters, constructing an optimization function with the position parameters and/or height parameters of each model main body structure as optimization variables, wherein the position parameters comprise normal vectors of the model main body structure and the distance from the normal vectors to the optical center of the camera; and solving the optimization function by adopting a least square algorithm to obtain optimized position parameters and/or height parameters of the main body structure of each model, and adjusting the position of the main body structure of each model according to the optimized position parameters and/or height parameters to obtain the target three-dimensional model.
In an optional embodiment, the optimization module 55 is specifically configured to: for each model main body structure, generating a reference normal vector of the model main body structure under the camera coordinate system according to the camera internal parameters and the vanishing point perpendicular to the model main body structure, and constructing a first type of optimization item by taking the dot product of the normal vector of the model main body structure and the reference normal vector as an optimization target; for any adjacent model main body structures, constructing a second type of optimization item by taking a dot product of 0 between the normal vectors of the adjacent model main body structures as an optimization target; for any adjacent model main body structures, generating a boundary line of the adjacent model main body structures in the image coordinate system according to the camera internal parameters, the normal vectors of the adjacent model main body structures and the distances from the normal vectors to the camera optical center, and constructing a third type of optimization item by taking, as an optimization target, that the generated boundary line is the same as the boundary line between the corresponding adjacent physical main body structures in the target image; and generating an optimization function according to the first type of optimization items, the second type of optimization items and the third type of optimization items. For a detailed implementation of the above operations, reference may be made to the foregoing embodiments, which are not described herein again.
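A compact way to picture the three types of optimization items and the least-squares solve is the residual sketch below; the plane parameterization, the use of scipy's solver, and the projection formula for the plane-intersection line are assumptions made for illustration, not the embodiment's actual implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(x, K, ref_normals, adjacency, detected_lines):
    """Sketch of the three residual families, assuming each wall i is
    parameterised by a normal n_i and a distance d_i packed into x as
    [n_i (3), d_i (1), ...]."""
    planes = x.reshape(-1, 4)
    n, d = planes[:, :3], planes[:, 3]
    res = []
    # 1) each normal should align with its vanishing-point-derived reference
    for i, n_ref in enumerate(ref_normals):
        res.append(1.0 - n[i] @ n_ref)
    # 2) adjacent walls should be orthogonal: dot product of normals -> 0
    for i, j in adjacency:
        res.append(n[i] @ n[j])
    # 3) the image line where adjacent planes meet should match the
    #    detected boundary line; the intersection of planes (n_i, d_i) and
    #    (n_j, d_j) projects to K^{-T}(d_j n_i - d_i n_j) up to scale
    #    (sign alignment omitted for brevity)
    for (i, j), l_det in detected_lines.items():
        l = np.linalg.inv(K).T @ (d[j] * n[i] - d[i] * n[j])
        l = l / (np.linalg.norm(l) + 1e-12)
        res.extend(l - l_det / np.linalg.norm(l_det))
    return np.array(res)

# Least-squares refinement of the packed plane parameters x0, e.g.:
# sol = least_squares(residuals, x0, args=(K, ref_normals, adjacency, lines))
```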
An embodiment of the present application further provides an online home decoration device, which includes: the device comprises an acquisition module, a fusion module and a projection module. The acquisition module is used for responding to the image uploading operation and acquiring a target image corresponding to a first space object, wherein the first space object is at least a part of space object in a target physical space. And the fusion module is used for responding to the placement operation of the target home decoration object on the target image, fusing the target home decoration object into the target three-dimensional model corresponding to the first space object, and obtaining the target three-dimensional model fusing the target home decoration object. The projection module is used for projecting the target three-dimensional model fused with the target home decoration object onto the target image so as to obtain a home decoration effect graph containing the target home decoration object; the target three-dimensional model is constructed according to the steps in the three-dimensional scene reconstruction method provided by the embodiment of the application. For details of the above operations, reference may be made to the foregoing embodiments, which are not described herein again.
An embodiment of the present application further provides a commodity selection device, and the device includes: a determining module, a selecting module, an adding module and a projecting module. The determining module is used for responding to the selection operation on the commodity page and determining the selected target commodity, and the target commodity is provided with a commodity three-dimensional model. The selecting module is used for responding to the collocation effect viewing operation and selecting the target image corresponding to the first space object to be collocated with the target commodity. The adding module is used for adding the commodity three-dimensional model into the target three-dimensional model corresponding to the first space object so as to obtain a target three-dimensional model fused with the target commodity. The projecting module is used for projecting the target three-dimensional model fused with the target commodity onto the target image so as to obtain a collocation effect graph of the target commodity and the first space object; the target three-dimensional model is constructed according to the steps in the three-dimensional scene reconstruction method provided by the embodiment of the application. For a detailed implementation of the above operations, reference may be made to the foregoing embodiments, which are not described herein again.
Fig. 6 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application. As shown in fig. 6, the apparatus includes: a memory 64 and a processor 65.
The memory 64 may be implemented by any type or combination of volatile or non-volatile memory devices, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disks.
A processor 65, coupled to the memory 64, for executing computer programs in the memory 64 for: acquiring a target image corresponding to a first space object, wherein the first space object is at least a part of space objects in a target physical space; detecting a plurality of boundary lines and vanishing points in at least two orthogonal main directions existing in the target image, wherein the boundary lines are the boundary lines between adjacent physical main body structures in the first space object; determining camera internal parameters of a target camera and a gravity direction under a camera coordinate system according to the vanishing points in the at least two orthogonal main directions, wherein the target camera is a camera used for shooting the target image; reconstructing an initial three-dimensional model corresponding to the first space object according to the multiple boundary lines and the gravity direction under the camera coordinate system, wherein the initial three-dimensional model comprises an adjacent model main body structure corresponding to an adjacent physical main body structure; and optimizing the initial three-dimensional model according to the constraint relation among the multiple boundary lines, the constraint relation among the vanishing points in the at least two orthogonal main directions and the camera internal parameters to obtain the target three-dimensional model.
In an alternative embodiment, the processor 65, when detecting the plurality of boundary lines and the vanishing points in at least two orthogonal principal directions present in the target image based on the Hough transform, is specifically configured to: inputting the target image into a Hough transform-based boundary line detection model for boundary line detection to obtain the plurality of boundary lines existing in the target image; and inputting the target image into a Hough transform-based vanishing point detection model for vanishing point detection so as to obtain the vanishing points in at least two orthogonal main directions in the target image.
In an alternative embodiment, when the target image is input into the Hough transform-based boundary line detection model for boundary line detection so as to obtain the plurality of boundary lines existing in the target image, the processor 65 is specifically configured to: inputting the target image into the Hough transform-based boundary line detection model, and performing feature extraction on the target image by using a first feature extraction network in the model that fuses skip connections and multi-scale features, so as to obtain first target feature maps of multiple scales; performing Hough transform on the first target feature maps of the multiple scales by using a Hough transform based on polar coordinates, so as to map straight lines existing in the target image into points in Hough space; and selecting a plurality of target points matching the characteristics of boundary lines in the Hough space, and remapping the target points into the image space to obtain the plurality of boundary lines existing in the target image.
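As a non-limiting numerical illustration of the polar-coordinate mapping underlying such Hough-based line detection, the sketch below maps one image segment to a single (theta, rho) bin; the bin counts and the assumed image diagonal are illustrative values, not the model's actual configuration.

```python
import numpy as np

def segment_to_hough_point(p, q, num_rho=128, num_theta=180, diag=800.0):
    """A straight line in the image becomes one point (theta index,
    rho index) in the polar-coordinate Hough space."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    d = q - p
    n = np.array([-d[1], d[0]])             # line normal
    n /= np.linalg.norm(n)
    theta = np.arctan2(n[1], n[0]) % np.pi  # orientation of the normal
    rho = n @ p                             # signed distance from the origin
    t_idx = int(theta / np.pi * num_theta) % num_theta
    r_idx = int((rho + diag) / (2 * diag) * num_rho)
    return t_idx, r_idx

print(segment_to_hough_point((10, 20), (200, 20)))  # a horizontal line
```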
In an optional embodiment, when the processor 65 performs feature extraction on the target image by using the first feature extraction network fusing skip connections and multi-scale features to obtain the first target feature maps of multiple scales, the processor is specifically configured to: performing feature extraction on the target image to obtain a first intermediate feature map of the maximum scale, and performing downsampling processing on the first intermediate feature map of the maximum scale N times to obtain first intermediate feature maps of the other scales; and taking the first intermediate feature map of the minimum scale as the first target feature map of the minimum scale, performing upsampling processing on the first target feature map of the minimum scale N times, and performing a skip connection with the first intermediate feature map of the same scale in each upsampling processing, so as to obtain the first target feature maps of the other scales.
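For illustration only, a toy stand-in for such an encoder-decoder with skip connections is sketched below in PyTorch; the framework choice, channel sizes, and depth are assumptions and do not reflect the actual network of the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleBackbone(nn.Module):
    """Encoder that downsamples N times, decoder that upsamples with skip
    connections, returning one feature map per scale (coarse to fine)."""
    def __init__(self, in_ch=3, ch=16, n_down=3):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.down = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, stride=2, padding=1) for _ in range(n_down)])
        self.fuse = nn.ModuleList(
            [nn.Conv2d(2 * ch, ch, 3, padding=1) for _ in range(n_down)])

    def forward(self, x):
        feats = [F.relu(self.stem(x))]             # largest-scale intermediate map
        for conv in self.down:
            feats.append(F.relu(conv(feats[-1])))  # downsampled intermediates
        outs = [feats[-1]]                         # smallest scale used as-is
        cur = feats[-1]
        for i, conv in enumerate(self.fuse):
            skip = feats[-2 - i]                   # same-scale encoder map
            cur = F.interpolate(cur, size=skip.shape[-2:], mode="bilinear",
                                align_corners=False)
            cur = F.relu(conv(torch.cat([cur, skip], dim=1)))  # skip connection
            outs.append(cur)
        return outs

maps = MultiScaleBackbone()(torch.randn(1, 3, 64, 64))
print([m.shape for m in maps])
```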
In an alternative embodiment, the processor 65 is specifically configured to, when performing hough transform on the first target feature maps of multiple scales by using hough transform based on polar coordinates to map straight lines existing in the target image to points in hough space: carrying out Hough transformation on the first target feature maps of multiple scales to obtain second target feature maps of multiple scales in Hough space based on polar coordinates; carrying out scale transformation on the second target feature maps with multiple scales to obtain multiple feature maps with the same scale, and splicing the multiple feature maps with the same scale to obtain a third target feature map; and performing convolution dimensionality reduction on the third target feature map to obtain a two-dimensional image in the Hough space, wherein the two-dimensional image comprises a plurality of points, and each point corresponds to a straight line existing in the target image.
In an alternative embodiment, when the target image is input into the Hough transform-based vanishing point detection model for vanishing point detection to obtain the vanishing points in at least two orthogonal principal directions in the target image, the processor 65 is specifically configured to: inputting the target image into the Hough transform-based vanishing point detection model, and performing feature extraction on the target image in the model by using a second feature extraction network to obtain a fourth target feature map; performing Hough transform on the fourth target feature map by using a Hough transform based on a Gaussian sphere, so as to map straight lines existing in the target image into points in the Gaussian spherical space; and selecting, in the Gaussian spherical space, vanishing points in at least two orthogonal principal directions whose probability values and angle values meet the requirements, and remapping the selected vanishing points into the image space to obtain the vanishing points in the at least two orthogonal principal directions in the target image.
In an alternative embodiment, the processor 65 is specifically configured to, when performing Hough transform on the fourth target feature map by using the Hough transform based on a Gaussian sphere to map straight lines existing in the target image into points in the Gaussian spherical space: performing Hough transform on the fourth target feature map to obtain a fifth target feature map in the Hough space based on polar coordinates; performing Hough convolution on the fifth target feature map in the Hough space to obtain a sixth target feature map; performing Gaussian spherical transformation on the sixth target feature map to obtain a seventh target feature map in the Gaussian spherical space; and performing spherical convolution on the seventh target feature map in the Gaussian spherical space to obtain an eighth target feature map, wherein the eighth target feature map comprises a plurality of points, and each point corresponds to a straight line in the target image.
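The geometric idea behind the Gaussian-sphere representation can be illustrated as follows, assuming known camera intrinsics for the sketch: each image line defines an interpretation plane through the camera center, and the unit normal of that plane is the line's point on the Gaussian sphere; lines sharing a vanishing point have normals lying on a common great circle.

```python
import numpy as np

def line_to_gaussian_sphere(l, K):
    """Map a homogeneous image line l to its point on the Gaussian sphere:
    the unit normal of the interpretation plane, proportional to K^T l."""
    n = K.T @ np.asarray(l, float)
    return n / np.linalg.norm(n)

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # assumed intrinsics
l = np.cross([100, 240, 1], [540, 240, 1])   # horizontal image line y = 240
print(line_to_gaussian_sphere(l, K))
```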
In an alternative embodiment, the processor 65 is specifically configured to, when selecting the vanishing points in the at least two orthogonal principal directions in the Gaussian spherical space whose probability values and angle values satisfy the requirements: selecting, according to the probability values of the points in the eighth target feature map, points whose probability values meet a set requirement from the plurality of points as candidate vanishing points; and selecting, from the candidate vanishing points, at least two vanishing points whose included angles with one another are larger than a set angle, as the vanishing points in the at least two orthogonal main directions.
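A greedy sketch of this selection rule is given below; the threshold values and the descending-probability ordering are assumptions of the sketch, not specifics of the embodiment.

```python
import numpy as np

def pick_orthogonal_vps(dirs, probs, prob_thresh=0.5, min_angle_deg=60.0):
    """Keep candidate directions above a probability threshold, accepting
    them in descending probability order, provided each new direction makes
    a large enough angle with every direction accepted so far."""
    order = np.argsort(-np.asarray(probs))
    chosen = []
    for i in order:
        if probs[i] < prob_thresh:
            break
        d = dirs[i] / np.linalg.norm(dirs[i])
        ok = all(np.degrees(np.arccos(np.clip(abs(d @ c), 0, 1))) >= min_angle_deg
                 for c in chosen)
        if ok:
            chosen.append(d)
        if len(chosen) == 3:
            break
    return chosen

print(pick_orthogonal_vps([np.array([1.0, 0, 0]), np.array([0.9, 0.1, 0]),
                           np.array([0, 1.0, 0])], [0.9, 0.8, 0.7]))
```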
In an alternative embodiment, the processor 65 is specifically configured to, when determining the camera intrinsic parameters of the target camera and the direction of gravity in the camera coordinate system from the vanishing points in the at least two orthogonal main directions: selecting two target vanishing points closest to the camera optical center from the at least two finite vanishing points under the condition that the vanishing points in the at least two orthogonal main directions comprise at least two finite vanishing points; determining the camera intrinsic parameters of the target camera according to the constraint relation between the two target vanishing points and the camera intrinsic parameters; and converting the vanishing point located in the gravity direction among the at least two orthogonal main directions into the camera coordinate system according to the camera intrinsic parameters to obtain the gravity direction in the camera coordinate system.
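For illustration only, the sketch below shows the standard single-image form of this constraint, assuming square pixels and a principal point fixed at the image center; these assumptions and the numeric example are not taken from the embodiment.

```python
import numpy as np

def intrinsics_from_two_vps(vp1, vp2, cx, cy):
    """Two finite vanishing points of orthogonal directions constrain the
    focal length via (vp1 - c) . (vp2 - c) + f^2 = 0."""
    d = (vp1[0] - cx) * (vp2[0] - cx) + (vp1[1] - cy) * (vp2[1] - cy)
    f = np.sqrt(max(-d, 1e-6))
    return np.array([[f, 0, cx], [0, f, cy], [0, 0, 1.0]])

def gravity_from_vp(K, vp_gravity):
    """Back-project the vertical vanishing point to get the gravity
    direction in camera coordinates."""
    g = np.linalg.inv(K) @ np.array([vp_gravity[0], vp_gravity[1], 1.0])
    return g / np.linalg.norm(g)

K = intrinsics_from_two_vps((1200.0, 260.0), (-410.0, 250.0), 320.0, 240.0)
print(K)
print(gravity_from_vp(K, (330.0, 4000.0)))   # vanishing point far below the image
```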
In an alternative embodiment, the processor 65 is specifically configured to, when reconstructing the initial three-dimensional model corresponding to the first spatial object according to the plurality of boundary lines and the direction of gravity in the camera coordinate system: according to the gravity direction under the camera coordinate system, constructing a reference plane of a three-dimensional space under the camera coordinate system, wherein the reference plane corresponds to a reference physical main body structure contained in the first space object; identifying a plurality of reference boundary lines intersecting the reference physical main body structure from the plurality of boundary lines, and constructing a reference model main body structure corresponding to the reference physical main body structure on the reference plane according to intersection point positions between the plurality of reference boundary lines; and constructing other model main body structures corresponding to other physical main body structures on the reference model main body structure according to the preset height from the camera optical center to the reference plane and the intersection point positions between the plurality of reference boundary lines, so as to obtain the initial three-dimensional model corresponding to the first space object.
In an alternative embodiment, the processor 65 is specifically configured to, when constructing the reference model body structure corresponding to the reference physical body structure on the reference plane according to the intersection positions between the plurality of reference boundary lines: selecting effective reference boundary lines from a plurality of reference boundary lines, and sorting the effective reference boundary lines according to included angles between the effective reference boundary lines and an x-axis in an image coordinate system to obtain adjacent relations between the effective reference boundary lines; determining intersection point positions between the effective reference boundary lines according to the adjacent relation between the effective reference boundary lines, and dividing the intersection point positions into a first intersection point position intersecting with the boundary of the target image and a second intersection point position not intersecting with the boundary of the target image; and drawing a model boundary corresponding to the reference physical main structure on the reference plane according to the first intersection point position and the second intersection point position so as to obtain the reference model main structure.
In an alternative embodiment, the processor 65, when selecting the valid reference boundary line from the plurality of reference boundary lines, is specifically configured to: selecting a first reference boundary line from the plurality of reference boundary lines according to the included angles between the plurality of reference boundary lines and the x-axis in the image coordinate system and/or the lengths of the plurality of reference boundary lines; and rejecting, according to the included angles between the other reference boundary lines and the first reference boundary line, the reference boundary lines whose included angles are smaller than a set included angle threshold value, and taking the reference boundary lines that are not rejected, together with the first reference boundary line, as the effective reference boundary lines.
In an alternative embodiment, the processor 65 is specifically configured to, when constructing other model main body structures corresponding to other physical main body structures on the reference model main body structure according to the preset height from the camera optical center to the reference plane and the intersection positions between the plurality of reference boundary lines to obtain the initial three-dimensional model corresponding to the first space object: determining an adjacent model boundary which is intersected at the second intersection point position on the main structure of the reference model according to the second intersection point position; determining the initial heights of other model main body structures according to the preset height from the optical center of the camera to the reference plane and the preset scaling; and according to the initial height, constructing other model main body structures on the boundaries of the adjacent models intersected at the second intersection point position so as to obtain the initial three-dimensional model corresponding to the first space object.
In an alternative embodiment, the processor 65 is specifically configured to, when optimizing the initial three-dimensional model according to the constraint relationship between the plurality of boundary lines, the constraint relationship between the vanishing points in the at least two orthogonal main directions, and the camera internal parameters to obtain the target three-dimensional model: according to the constraint relation among a plurality of boundary lines, the constraint relation among vanishing points in at least two orthogonal main directions and camera internal parameters, constructing an optimization function with the position parameters and/or height parameters of each model main body structure as optimization variables, wherein the position parameters comprise normal vectors of the model main body structure and the distance from the normal vectors to the optical center of the camera; and solving the optimization function by adopting a least square algorithm to obtain the optimized position parameter and/or height parameter of each model main body structure, and adjusting the position of each model main body structure according to the optimized position parameter and/or height parameter to obtain the target three-dimensional model.
In an alternative embodiment, the processor 65 is specifically configured to, when constructing the optimization function using the position parameter and/or the height parameter of each model main body structure as the optimization variable according to the constraint relationship between the plurality of boundary lines, the constraint relationship between the vanishing points in the at least two orthogonal principal directions, and the camera internal parameters: for each model main body structure, generating a reference normal vector of the model main body structure under the camera coordinate system according to the camera internal parameters and the vanishing point perpendicular to the model main body structure, and constructing a first type of optimization item by taking the dot product of the normal vector of the model main body structure and the reference normal vector as an optimization target; for any adjacent model main body structures, constructing a second type of optimization item by taking a dot product of 0 between the normal vectors of the adjacent model main body structures as an optimization target; for any adjacent model main body structures, generating a boundary line of the adjacent model main body structures in the image coordinate system according to the camera internal parameters, the normal vectors of the adjacent model main body structures and the distances from the normal vectors to the camera optical center, and constructing a third type of optimization item by taking, as an optimization target, that the generated boundary line is the same as the boundary line between the corresponding adjacent physical main body structures in the target image; and generating an optimization function according to the first type of optimization items, the second type of optimization items and the third type of optimization items.
Further, as shown in fig. 6, the electronic device further includes: communication components 66, display 67, power components 68, audio components 69, and the like. Only some of the components are schematically shown in fig. 6, and the electronic device is not meant to include only the components shown in fig. 6.
An embodiment of the present application further provides an electronic device, where a structure of the electronic device is the same as or similar to that of the electronic device shown in fig. 6, and may be specifically implemented by referring to the structure of the electronic device shown in fig. 6, where differences between the electronic device provided in this embodiment and the electronic device in the embodiment shown in fig. 6 mainly lie in: the functions performed by the processor to execute the computer programs stored in the memory are different. For the electronic device provided in this embodiment, the processor thereof executes the computer program stored in the memory, and is configured to: responding to image uploading operation, and acquiring a target image corresponding to a first space object, wherein the first space object is at least part of space objects in a target physical space; responding to the placement operation of the target home decoration object on the target image, and fusing the target home decoration object into a target three-dimensional model corresponding to the first space object to obtain a target three-dimensional model fused with the target home decoration object; projecting the target three-dimensional model fused with the target home decoration object onto a target image to obtain a home decoration effect graph containing the target home decoration object; the target three-dimensional model is constructed according to the steps in the three-dimensional scene reconstruction method provided by the embodiment of the application.
An embodiment of the present application further provides an electronic device, where a structure of the electronic device is the same as or similar to that of the electronic device shown in fig. 6, and may be specifically implemented by referring to the structure of the electronic device shown in fig. 6, where differences between the electronic device provided in this embodiment and the electronic device in the embodiment shown in fig. 6 are mainly that: the functions performed by the processor to execute the computer programs stored in the memory are different. For the electronic device provided in this embodiment, the processor thereof executes the computer program stored in the memory, and is configured to: responding to selection operation on a commodity page, and determining a selected target commodity, wherein the target commodity is provided with a commodity three-dimensional model; responding to the matching effect checking operation, and selecting a target image corresponding to a first space object to be matched with the target commodity; adding the commodity three-dimensional model to a target three-dimensional model corresponding to the first space object to obtain a target three-dimensional model fusing the target commodity; projecting the target three-dimensional model fused with the target commodity onto a target image to obtain a matching effect graph of the target commodity and the first space object; the target three-dimensional model is constructed according to the steps in the three-dimensional scene reconstruction method provided by the embodiment of the application.
Accordingly, the present application also provides a computer readable storage medium storing a computer program, and the computer program can implement the steps in the methods shown in fig. 1b, fig. 4a and fig. 4e when executed.
The communication component of fig. 6 described above is configured to facilitate wired or wireless communication between the device in which the communication component is located and other devices. The device in which the communication component is located can access a wireless network based on a communication standard, such as WiFi, a 2G, 3G, 4G/LTE or 5G mobile communication network, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The display of fig. 6 described above includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The power supply assembly of fig. 6 provides power to the various components of the device in which the power supply assembly is located. The power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device in which the power component is located.
The audio component of fig. 6 described above may be configured to output and/or input an audio signal. For example, the audio component includes a Microphone (MIC) configured to receive an external audio signal when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in a memory or transmitted via a communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus comprising the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.
Claims (16)
1. A method for reconstructing a three-dimensional scene, comprising:
acquiring a target image corresponding to a first space object, wherein the first space object is at least a part of space objects in a target physical space;
detecting a plurality of boundary lines and vanishing points in at least two orthogonal main directions existing in the target image, wherein the boundary lines are the boundary lines between adjacent physical main body structures in the first space object;
determining camera internal parameters of a target camera and a gravity direction under a camera coordinate system according to the vanishing points in the at least two orthogonal main directions, wherein the target camera is a camera used for shooting the target image;
reconstructing an initial three-dimensional model corresponding to the first space object according to the multiple boundary lines and the gravity direction under the camera coordinate system, wherein the initial three-dimensional model comprises an adjacent model main body structure corresponding to the adjacent physical main body structure;
and optimizing the initial three-dimensional model according to the constraint relation among the multiple boundary lines, the constraint relation among the vanishing points in the at least two orthogonal main directions and the camera internal parameters to obtain a target three-dimensional model.
2. The method of claim 1, wherein detecting the plurality of boundary lines and the vanishing points in at least two orthogonal main directions present in the target image comprises:
inputting the target image into a hough transform-based boundary line detection model for boundary line detection so as to obtain a plurality of boundary lines existing in the target image;
inputting the target image into a hough transform-based vanishing point detection model for vanishing point detection so as to obtain vanishing points in at least two orthogonal main directions in the target image.
3. The method according to claim 2, wherein inputting the target image into a hough transform-based boundary line detection model for boundary line detection to obtain a plurality of boundary lines existing in the target image comprises:
in a hough transform-based boundary detection model, performing feature extraction on the target image by using a first feature extraction network fusing jump connection and multi-scale features to obtain a first target feature map with multiple scales;
performing Hough transform on the first target feature maps in the multiple scales by utilizing Hough transform based on polar coordinates to map straight lines existing in the target image to points in Hough space;
selecting a plurality of target points matched with the characteristics of the boundary lines in the Hough space, and remapping the target points into an image space to obtain a plurality of boundary lines existing in the target image.
4. The method of claim 3, wherein performing a Hough transform on the first target feature maps at the plurality of scales using a polar Hough transform to map straight lines existing in the target image to points in Hough space comprises:
carrying out Hough transformation on the first target feature maps of the multiple scales to obtain second target feature maps of the multiple scales in Hough space based on polar coordinates;
carrying out scale transformation on the second target feature maps with multiple scales to obtain multiple feature maps with the same scale, and splicing the multiple feature maps with the same scale to obtain a third target feature map;
and performing convolution dimensionality reduction on the third target feature map to obtain a two-dimensional image in a Hough space, wherein the two-dimensional image comprises a plurality of points, and each point corresponds to a straight line existing in the target image.
5. The method according to claim 2, wherein inputting the target image into a hough transform-based vanishing point detection model for vanishing point detection to obtain vanishing points in at least two orthogonal principal directions in the target image comprises:
in a hough transform-based vanishing point detection model, performing feature extraction on the target image by using a second feature extraction network to obtain a fourth target feature map;
carrying out Hough transform on the fourth target feature map by utilizing Hough transform based on a Gaussian spherical surface so as to map straight lines existing in the target image into points in a Gaussian spherical surface space;
and selecting vanishing points on at least two orthogonal principal directions with probability values and angle values meeting requirements in the Gaussian spherical space, and remapping the vanishing points into an image space to obtain the vanishing points on the at least two orthogonal principal directions in the target image.
6. The method of claim 5, wherein performing a Hough transform on the fourth target feature map using a Hough transform based on Gaussian spheres to map straight lines existing in the target image to points in Gaussian sphere space comprises:
carrying out Hough transformation on the fourth target feature map to obtain a fifth target feature map in Hough space based on polar coordinates;
performing Hough convolution on the fifth target feature map in the Hough space to obtain a sixth target feature map;
performing Gaussian spherical transformation on the sixth target feature map to obtain a seventh target feature map in a Gaussian spherical space;
and performing spherical convolution on the seventh target feature map in the Gaussian spherical space to obtain an eighth target feature map, wherein the eighth target feature map comprises a plurality of points, and each point corresponds to a straight line in the target image.
7. The method of claim 1, wherein determining the camera intrinsic parameters of the target camera and the direction of gravity in the camera coordinate system from vanishing points in the at least two orthogonal principal directions comprises:
selecting two target vanishing points closest to the optical center of the camera from the at least two finite vanishing points if the vanishing points in the at least two orthogonal principal directions include the at least two finite vanishing points;
determining camera intrinsic parameters of the target camera according to the constraint relation between the two target vanishing points and the camera intrinsic parameters;
and converting vanishing points positioned in the gravity direction in the at least two orthogonal main directions into a camera coordinate system according to the camera intrinsic parameters to obtain the gravity direction in the camera coordinate system.
8. The method of claim 1, wherein reconstructing the initial three-dimensional model corresponding to the first space object according to the plurality of boundary lines and the direction of gravity in the camera coordinate system comprises:
according to the gravity direction in the camera coordinate system, constructing a reference plane of a three-dimensional space in the camera coordinate system, wherein the reference plane corresponds to a reference physical main body structure contained in the first space object;
identifying a plurality of reference boundary lines which intersect with the reference physical main body structure from the plurality of boundary lines, and constructing a reference model main body structure corresponding to the reference physical main body structure on the reference plane according to intersection point positions among the plurality of reference boundary lines;
and constructing other model main body structures corresponding to other physical main body structures on the reference model main body structure according to the preset height from the camera optical center to the reference plane and the intersection point positions between the plurality of reference boundary lines so as to obtain the initial three-dimensional model corresponding to the first space object.
9. The method of claim 8, wherein constructing a reference model body structure corresponding to the reference physical body structure on the reference plane according to intersection positions between the plurality of reference boundary lines comprises:
selecting effective reference boundary lines from the plurality of reference boundary lines, and sorting the effective reference boundary lines according to included angles between the effective reference boundary lines and an x axis in an image coordinate system to obtain adjacent relations between the effective reference boundary lines;
determining intersection point positions between the effective reference boundary lines according to the adjacent relation between the effective reference boundary lines, and dividing the intersection point positions into a first intersection point position intersecting with the boundary of the target image and a second intersection point position not intersecting with the boundary of the target image;
and drawing a model boundary corresponding to the reference physical main body structure on the reference plane according to the first intersection point position and the second intersection point position so as to obtain a reference model main body structure.
10. The method of claim 9, wherein constructing other model body structures corresponding to other physical body structures on the reference model body structure according to the preset height from the camera optical center to the reference plane and the intersection point positions between the plurality of reference boundary lines to obtain the initial three-dimensional model corresponding to the first spatial object comprises:
determining an adjacent model boundary on the main structure of the reference model, which intersects at the second intersection point position, according to the second intersection point position;
determining the initial height of the other model main body structures according to the preset height from the camera optical center to the reference plane and the preset scaling;
and according to the initial height, constructing other model main body structures on the boundaries of the adjacent models intersected at the second intersection point position so as to obtain the initial three-dimensional model corresponding to the first space object.
11. The method according to any one of claims 1-10, wherein optimizing the initial three-dimensional model to obtain the target three-dimensional model based on the constraint relationship between the plurality of boundary lines, the constraint relationship between the vanishing points in the at least two orthogonal principal directions, and the camera internal parameters comprises:
according to the constraint relation among the multiple boundary lines, the constraint relation among the vanishing points in the at least two orthogonal main directions and the camera internal parameters, constructing an optimization function with the position parameters and/or height parameters of each model main body structure as optimization variables, wherein the position parameters comprise the normal vector of the model main body structure and the distance from the normal vector to the optical center of the camera;
solving the optimization function by adopting a least square algorithm to obtain optimized position parameters and/or height parameters of each model main body structure;
and adjusting the position of each model main body structure according to the optimized position parameters and/or height parameters to obtain the target three-dimensional model.
12. The method according to claim 11, wherein constructing an optimization function with the position parameter and/or the height parameter of each model body structure as an optimization variable according to the constraint relationship between the plurality of boundary lines, the constraint relationship between the vanishing points in the at least two orthogonal principal directions, and the camera internal parameters comprises:
aiming at each model main body structure, generating a reference normal vector of the model main body structure under the camera coordinate system according to the camera internal parameters and the vanishing point perpendicular to the model main body structure, and constructing a first type of optimization item by taking the dot product of the normal vector of the model main body structure and the reference normal vector as an optimization target;
aiming at any adjacent model main body structure, constructing a second type of optimization item by taking the dot product of normal vectors between any adjacent model main body structures as 0 as an optimization target;
aiming at any adjacent model main body structures, generating a boundary line of the any adjacent model main body structures under the image coordinate system according to the camera internal parameters, the normal vectors of the any adjacent model main body structures and the distances from the normal vectors to the camera optical center; and constructing a third type of optimization item by taking, as an optimization target, that the boundary line of the any adjacent model main body structures in the image coordinate system is the same as the boundary line between the corresponding adjacent physical main body structures in the target image;
and generating the optimization function according to the first type of optimization items, the second type of optimization items and the third type of optimization items.
13. An online home decoration method, comprising:
responding to an image uploading operation, and acquiring a target image corresponding to a first space object, wherein the first space object is at least a part of space objects in a target physical space;
responding to the placement operation of a target home decoration object on the target image, and fusing the target home decoration object into a target three-dimensional model corresponding to the first space object to obtain a target three-dimensional model fused with the target home decoration object;
projecting the target three-dimensional model fused with the target home decoration object onto the target image to obtain a home decoration effect graph containing the target home decoration object;
wherein the target three-dimensional model is constructed according to the steps in the method of any one of claims 1-12.
14. A method of selecting an article, comprising:
responding to selection operation on a commodity page, and determining a selected target commodity, wherein the target commodity is provided with a commodity three-dimensional model;
responding to a collocation effect viewing operation, and selecting a target image corresponding to a first space object to be collocated with the target commodity;
adding the commodity three-dimensional model to a target three-dimensional model corresponding to the first space object to obtain a target three-dimensional model fused with the target commodity;
projecting the target three-dimensional model fused with the target commodity onto the target image to obtain a collocation effect diagram of the target commodity and the first space object;
wherein the target three-dimensional model is constructed according to the steps in the method of any one of claims 1-12.
15. An electronic device, comprising: a memory and a processor; the memory for storing a computer program; the processor is coupled with the memory for executing the computer program for performing the steps of the method of any one of claims 1-12, claim 13 and claim 14.
16. A computer-readable storage medium having a computer program stored thereon, which, when being executed by a processor, causes the processor to carry out the steps of the method of any one of claims 1-12, 13 and 14.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211599959.2A CN115937422A (en) | 2022-12-14 | 2022-12-14 | Three-dimensional scene reconstruction and online home decoration and commodity acquisition method, equipment and medium |
PCT/CN2023/072122 WO2024124653A1 (en) | 2022-12-14 | 2023-01-13 | Three-dimensional scene reconstruction method, online home decoration method, commodity acquisition method, device, and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211599959.2A CN115937422A (en) | 2022-12-14 | 2022-12-14 | Three-dimensional scene reconstruction and online home decoration and commodity acquisition method, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115937422A true CN115937422A (en) | 2023-04-07 |
Family
ID=86557097
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211599959.2A Pending CN115937422A (en) | 2022-12-14 | 2022-12-14 | Three-dimensional scene reconstruction and online home decoration and commodity acquisition method, equipment and medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115937422A (en) |
WO (1) | WO2024124653A1 (en) |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10977868B2 (en) * | 2018-06-29 | 2021-04-13 | Factualvr, Inc. | Remote collaboration methods and systems |
CN110782524B (en) * | 2019-10-25 | 2023-05-23 | 重庆邮电大学 | Indoor three-dimensional reconstruction method based on panoramic image |
CN114756919A (en) * | 2021-12-29 | 2022-07-15 | 每平每屋(上海)科技有限公司 | Data processing method, home decoration design method, device and storage medium |
CN114529566B (en) * | 2021-12-30 | 2022-11-22 | 北京城市网邻信息技术有限公司 | Image processing method, device, equipment and storage medium |
CN114549765A (en) * | 2022-02-28 | 2022-05-27 | 北京京东尚科信息技术有限公司 | Three-dimensional reconstruction method and device and computer-readable storage medium |
CN115439607A (en) * | 2022-09-01 | 2022-12-06 | 中国民用航空总局第二研究所 | A three-dimensional reconstruction method, device, electronic equipment and storage medium |
CN115359192B (en) * | 2022-10-14 | 2023-03-28 | 阿里巴巴(中国)有限公司 | Three-dimensional reconstruction and commodity information processing method, device, equipment and storage medium |
- 2022-12-14: CN application CN202211599959.2A filed; patent publication CN115937422A (en), status active, pending
- 2023-01-13: WO application PCT/CN2023/072122 filed; publication WO2024124653A1 (en), status unknown
Also Published As
Publication number | Publication date |
---|---|
WO2024124653A1 (en) | 2024-06-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11657419B2 (en) | Systems and methods for building a virtual representation of a location | |
CN114119839B (en) | Three-dimensional model reconstruction and image generation method, equipment and storage medium | |
US20210166300A1 (en) | Virtual reality platform for retail environment simulation | |
US20210133850A1 (en) | Machine learning predictions of recommended products in augmented reality environments | |
US10755485B2 (en) | Augmented reality product preview | |
CN110310175B (en) | System and method for mobile augmented reality | |
US10444021B2 (en) | Methods for simultaneous localization and mapping (SLAM) and related apparatus and systems | |
CN114119838B (en) | Voxel model and image generation method, equipment and storage medium | |
JP7569439B2 (en) | Scalable 3D Object Recognition in Cross-Reality Systems | |
US9129404B1 (en) | Measuring physical objects and presenting virtual articles | |
Guo et al. | Support surface prediction in indoor scenes | |
US9818224B1 (en) | Augmented reality images based on color and depth information | |
US11126845B1 (en) | Comparative information visualization in augmented reality | |
CN105122304A (en) | Real-time design of living spaces with augmented reality | |
KR20160033495A (en) | Apparatus and method for arranging furniture using augmented reality | |
US20220180423A1 (en) | Configuration atelier table for interactive customer experience | |
Tang et al. | AR interior designer: Automatic furniture arrangement using spatial and functional relationships | |
WO2022259253A1 (en) | System and method for providing interactive multi-user parallel real and virtual 3d environments | |
US12211150B2 (en) | System and method of object detection and interactive 3D models | |
WO2015195413A1 (en) | Systems and methods for presenting information associated with a three-dimensional location on a two-dimensional display | |
CN115937422A (en) | Three-dimensional scene reconstruction and online home decoration and commodity acquisition method, equipment and medium | |
US11893207B2 (en) | Generating a semantic construction of a physical setting | |
CN108920598A (en) | Panorama sketch browsing method, device, terminal device, server and storage medium | |
CN113256822B (en) | Spatial relationship prediction, data processing method, device and storage medium | |
Kumar et al. | Real-Time Product Visualization using Augmented Reality |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||