CN111583417A - Method and device for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry, electronic device, and medium

Method and device for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry, electronic device, and medium

Info

Publication number
CN111583417A
Authority
CN
China
Prior art keywords
indoor
information
image
scene
indoor image
Prior art date
Legal status
Granted
Application number
CN202010399289.4A
Other languages
Chinese (zh)
Other versions
CN111583417B (en)
Inventor
吴洪宇
于昊楠
陈小武
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202010399289.4A priority Critical patent/CN111583417B/en
Publication of CN111583417A publication Critical patent/CN111583417A/en
Application granted granted Critical
Publication of CN111583417B publication Critical patent/CN111583417B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

Embodiments of the present disclosure disclose a method, an apparatus, an electronic device, and a medium for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry. One embodiment of the method comprises: obtaining an image and the height of the camera above the ground; preprocessing the image; inputting the preprocessed image into a pre-trained deep residual network and outputting the feature information of the image; detecting the information of the straight lines in the feature information; inputting the line information into a convolutional neural network and outputting the information of the wall connections in the image; determining the indoor layout information; inputting the feature information into a pre-trained object detection network and outputting the indoor object information in the image; obtaining the position information of the indoor objects; and completing the construction of an indoor virtual reality interactive scene. Through the construction of the indoor VR interactive scene, this embodiment restores the real image scene accurately and quickly.

Description

Method and device for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry, electronic device, and medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a method, an apparatus, an electronic device, and a medium for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry.
Background
With the rapid development of virtual reality (VR) technology, interactive applications have become easier to build, and VR is increasingly used in daily life. In interior decoration, for example, VR allows designers to interact with customers about a design, addressing the diversity of customer requirements. However, because virtual reality scenes demand a high level of detail, building scenes by hand consumes considerable manpower and material resources and is costly, which is one of the factors restricting the development of virtual reality technology. Existing solutions include template-based scene generation and the use of various devices to capture real-world three-dimensional information and convert it into a three-dimensional model. Template-based methods rely on a set of pre-designed scene templates and then set parameters to generate three-dimensional scenes in bulk. These methods, however, are severely limited and have poor accuracy.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure provide a method, an apparatus, an electronic device, and a medium for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry, so as to solve the technical problems mentioned in the background above.
In a first aspect, some embodiments of the present disclosure provide a method for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry, the method including: obtaining an indoor image and the height of the camera above the ground when the indoor image was taken; preprocessing the indoor image; inputting the preprocessed indoor image into a pre-trained deep residual network and outputting the feature information of the indoor image; detecting the information of the straight lines in the feature information; inputting the line information into a convolutional neural network and outputting the information of the wall connections in the indoor image, where a wall connection is a straight line connecting walls in the indoor image; obtaining the indoor layout information based on the height of the camera above the ground and the information of the wall connections; inputting the feature information into a pre-trained object detection network and outputting the indoor object information in the indoor image; obtaining the position information of the indoor objects based on the indoor object information and the information of the wall connections; and completing the construction of an indoor virtual reality interactive scene according to the indoor layout information and the position information of the indoor objects.
In a second aspect, some embodiments of the present disclosure provide an apparatus for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry, the apparatus comprising: an acquisition unit configured to obtain an indoor image and the height of the camera above the ground when the indoor image was taken; a processing unit configured to preprocess the indoor image; a first input/output unit configured to input the preprocessed indoor image into a pre-trained deep residual network and output the feature information of the indoor image; a detection unit configured to detect the information of the straight lines in the feature information; a second input/output unit configured to input the line information into a convolutional neural network and output the information of the wall connections in the indoor image, where a wall connection is a straight line connecting walls in the indoor image; a first determination unit configured to obtain the indoor layout information based on the height of the camera above the ground and the information of the wall connections; a third input/output unit configured to input the feature information into a pre-trained object detection network and output the indoor object information in the indoor image; a second determination unit configured to obtain the position information of the indoor objects based on the indoor object information and the information of the wall connections; and a third determination unit configured to complete the construction of the indoor virtual reality interactive scene according to the indoor layout information and the position information of the indoor objects.
In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as in any one of the first and second aspects.
In a fourth aspect, some embodiments of the disclosure provide a computer readable medium having a computer program stored thereon, where the program when executed by a processor implements a method as in any of the first and second aspects.
One of the above embodiments of the present disclosure has the following beneficial effects. First, an indoor image is obtained together with the height of the camera above the ground when the image was taken. The indoor image is then preprocessed; the processed data make the deep learning network easier to train and indirectly improve its accuracy. The preprocessed indoor image is input into a pre-trained deep residual network to extract the feature information of the image, which lays the foundation for recognizing the wall connections and the objects. Next, the straight lines in the feature information are detected and stored, so that every line that may be a wall connection is found; a convolutional neural network can then score all of these candidate lines. The indoor layout information is essentially determined from the height of the camera above the ground and the information of the wall connections. The feature information is input into a pre-trained object detection network to determine the position, size, and other information of the indoor objects in the image. The position information of the indoor objects is then obtained from the indoor object information and the information of the wall connections. Finally, the indoor virtual reality interactive scene is constructed according to the indoor layout information and the position information of the indoor objects.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a flow diagram of some embodiments of a method for indoor VR scene construction according to the present disclosure.
Fig. 2 is a schematic structural diagram of some embodiments of an apparatus for indoor VR scene construction according to the present disclosure.
FIG. 3 is a schematic block diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. The embodiments and the features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that the modifiers "a", "an", and "the" mentioned in this disclosure are illustrative rather than limiting, and those skilled in the art should understand them as "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Referring to fig. 1, a flow 100 of some embodiments of a method for indoor VR scene construction according to the present disclosure is shown. The method may be performed by a server. The method for indoor VR scene construction comprises the following steps:
Step 101, obtaining an indoor image and the height of the camera above the ground when the indoor image is taken.
In some embodiments, the execution body of the method for indoor VR scene construction (which may be a server) may acquire the indoor image and the height of the camera above the ground when the image was captured in various ways. For example, the execution body may acquire the indoor image and the camera height through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, UWB (ultra wideband), and other wireless connections now known or developed in the future.
Step 102, preprocessing the indoor image.
In some embodiments, the execution body preprocesses the indoor image obtained in step 101. Preprocessing here refers to the processing performed on the input image before feature extraction, segmentation, and matching. Its main purposes are to eliminate irrelevant information from the image, recover useful real information, enhance the detectability of relevant information, and simplify the data as much as possible, thereby improving the reliability of feature extraction, image segmentation, matching, and recognition. A preprocessing pipeline typically includes digitization, geometric transformation, normalization, smoothing, restoration, and enhancement.
In some optional implementations of some embodiments, preprocessing the indoor image includes: adjusting the indoor image to a preset resolution.
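As an illustration of this optional implementation, the following sketch resizes an indoor image to a preset resolution; the 224 × 224 target used here is an assumption taken from the ResNet-50 input size described in step 103, and OpenCV is only one possible way to perform the resizing.

import cv2

def preprocess(indoor_image, size=(224, 224)):
    # Resize to the preset resolution and scale pixel values to [0, 1].
    resized = cv2.resize(indoor_image, size, interpolation=cv2.INTER_LINEAR)
    return resized.astype("float32") / 255.0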
Step 103, inputting the preprocessed indoor image into a pre-trained deep residual network and outputting the feature information of the indoor image.
In some embodiments, the execution body inputs the preprocessed indoor image into a pre-trained deep residual network (ResNet) to extract the feature information of the indoor image. A ResNet-50 neural network is used as the feature extraction network. The network takes an input of 224 × 224 × 3 (length × width × number of color channels) and outputs an image feature tensor of 1024 × 1 × 256. ResNet uses a structure called a residual block: each block stacks three convolutional layers (1 × 1 × 64, 3 × 3 × 64, and 1 × 1 × 256) with a ReLU activation function between the layers, which effectively reduces the number of parameters in the neural network.
A Rectified Linear Unit (ReLU) is a commonly used activation function in neural networks. The ReLU processes the output of each layer of the neural network using the following formula:
f(x) = max{0, x}
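The following sketch illustrates how such a feature extractor can be assembled from a standard ResNet-50, truncating the network at an intermediate stage with 1024 channels; the use of torchvision, the truncation point, and the exact output shape are assumptions made for illustration, not the patent's implementation.

import torch
import torchvision

backbone = torchvision.models.resnet50(weights=None)
# Keep the layers up to and including layer3, whose output has 1024 channels.
feature_extractor = torch.nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
    backbone.layer1, backbone.layer2, backbone.layer3,
)

image = torch.randn(1, 3, 224, 224)        # a preprocessed indoor image
with torch.no_grad():
    features = feature_extractor(image)    # feature tensor, e.g. (1, 1024, 14, 14)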
Step 104, detecting the information of the straight lines in the feature information.
In some embodiments, the execution body uses the feature information of the indoor image obtained in step 103, detects the straight lines in the input panoramic image with a line detection algorithm, and stores the positions of the detected lines as features for judging the wall positions. The line detection algorithm may include, but is not limited to, at least one of the following: the CannyLines algorithm and the Hough transform.
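A minimal sketch of this step using OpenCV's probabilistic Hough transform is given below; the CannyLines detector mentioned above would be a drop-in alternative, and the edge and line thresholds are illustrative assumptions.

import cv2
import numpy as np

def detect_lines(panorama_bgr):
    gray = cv2.cvtColor(panorama_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                            threshold=80, minLineLength=60, maxLineGap=10)
    # Each row is (x1, y1, x2, y2); the positions are stored and later scored
    # as candidate wall connections.
    return lines.reshape(-1, 4) if lines is not None else np.empty((0, 4), int)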
Step 105, inputting the information of the straight lines into a convolutional neural network and outputting the information of the wall connections in the indoor image.
In some embodiments, the execution body inputs the line information obtained in step 104 into a convolutional neural network and outputs the information of the wall connections in the indoor image. In the wall-position judgment stage, a convolutional neural network (CNN) is used as a discriminator to score the positions of the straight lines extracted during feature extraction, and the scores determine which lines are the connections between walls. The neural network is iterated using the following loss function:
f_loss = -(1/n) Σ_{i=1..n} [ α · y*_i · log(P_i) + β · (1 − y*_i) · log(1 − P_i) ]
where f_loss is the loss value; α and β are user-defined parameters that weight the contributions of the two groups of pixels to the loss and need to be determined according to the training configuration of the actual model; y*_i is the ground-truth label of pixel sample i; P_i is the predicted probability that pixel sample i belongs to a corner; and n is the total number of pixels in the image. Training the neural network is performed as the process of minimizing this loss function.
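A sketch of this loss in PyTorch is shown below. It implements a pixel-wise cross entropy in which the corner and non-corner pixels are weighted by α and β; since the patent gives the loss only in outline, this is an illustrative reconstruction rather than the original implementation.

import torch

def wall_connection_loss(pred_prob, target, alpha=1.0, beta=1.0):
    # pred_prob: predicted corner probability per pixel, in (0, 1).
    # target: ground-truth label per pixel, 1 for corner pixels and 0 otherwise.
    eps = 1e-7
    pred_prob = pred_prob.clamp(eps, 1.0 - eps)
    pos = alpha * target * torch.log(pred_prob)
    neg = beta * (1.0 - target) * torch.log(1.0 - pred_prob)
    return -(pos + neg).mean()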
In order to optimize the generated wall information and improve the reconstruction accuracy, the finally generated wall information is evaluated using the following scoring formula:
score(L) = ω_corner · Σ_{l_c ∈ C} log P_corner(l_c) + ω_floor · Σ_{l_f ∈ L_f} log P_floor(l_f)
where score(L) is the score; C is the set of detected coordinates (x, y) of the corners between walls, and l_c ∈ C is a corner pixel coordinate in the set C; L_f is the set of all detected floor pixel coordinates (x, y), and l_f ∈ L_f is a floor pixel coordinate in the set L_f; P_corner and P_floor are the probabilities that each detected wall-corner or floor pixel belongs to that class; and ω_corner and ω_floor are weighting parameters that need to be determined experimentally for different data sets, and can also be determined by grid search.
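The scoring can be computed directly from the probability maps produced by the discriminator, as in the following sketch; the probability-map layout and the default weights are assumptions for illustration.

import numpy as np

def score_layout(corner_pts, floor_pts, p_corner, p_floor,
                 w_corner=1.0, w_floor=1.0):
    # corner_pts / floor_pts: iterables of (x, y) pixel coordinates.
    # p_corner / p_floor: per-pixel probability maps indexed as [y, x].
    eps = 1e-7
    s_corner = sum(np.log(p_corner[y, x] + eps) for x, y in corner_pts)
    s_floor = sum(np.log(p_floor[y, x] + eps) for x, y in floor_pts)
    return w_corner * s_corner + w_floor * s_floor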
Step 106, obtaining the indoor layout information based on the height of the camera above the ground and the information of the wall connections.
In some embodiments, the xy image coordinates above and below each corner are first converted, by a projective transformation that uses the height of the camera above the ground, into xyz coordinates in the three-dimensional environment. The xyz coordinates of the floor and the ceiling are then computed once for each corner, and the average of all these values is taken as the position of the reconstructed ceiling and floor. All identified candidate wall positions are traversed and filtered, keeping only sets of mutually perpendicular walls as candidates. The filtered candidate walls are voted on, and walls shorter than 0.16 m or whose two ends subtend an angle of less than 5 degrees at the camera position are deleted. To compensate for the distortion in the panoramic image, image stretching is applied, which improves the reconstruction accuracy. Finally, the indoor layout information is obtained.
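The projection at the heart of this step can be sketched as follows for a floor corner in an equirectangular panorama; the equirectangular convention and the axis layout (camera at the origin, y pointing up, floor at y = -camera height) are assumptions, since the patent does not spell out the projection model.

import numpy as np

def floor_corner_to_xyz(u, v, pano_w, pano_h, camera_height):
    # (u, v): pixel coordinates of a corner on the floor line of the panorama.
    lon = (u / pano_w - 0.5) * 2.0 * np.pi    # azimuth in [-pi, pi]
    lat = (0.5 - v / pano_h) * np.pi          # elevation; negative below the horizon
    depth = camera_height / np.tan(-lat)      # horizontal distance to the corner
    x = depth * np.sin(lon)
    z = depth * np.cos(lon)
    return np.array([x, -camera_height, z])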
Step 107, inputting the feature information into a pre-trained object detection network and outputting the indoor object information in the indoor image.
In the indoor object recognition stage, an R-FCN neural network is used for object recognition and localization. The input of the R-FCN network is the feature tensor output in step 103; in addition to the feature layers, the network includes three parts: a region proposal network (RPN), a position-sensitive prediction layer, and an RoI pooling layer.
The original image is cut into a grid, splitting out a number of regions of interest (RoIs) that may contain the target to be recognized. In order to detect targets in each region and address the translation invariance of neural network features, so that the network keeps its original accuracy when transferred to a new task, R-FCN introduces a position-sensitive pooling operation. Specifically, each generated region candidate box is split into k × k position-sensitive bins, each of size approximately (w/k) × (h/k), where w and h are the width and height of the candidate box. For each object class, the convolutional layers finally output k² score maps. The score maps are then pooled, and the pooling operation is computed as follows:
r_c(i, j | Θ) = (1/n_bin) · Σ_{(x, y) ∈ bin(i, j)} z_{i,j,c}(x + x_0, y + y_0 | Θ)
where r_c(i, j, Θ) is the pooling result for class c at bin (i, j) of this RoI, Θ denotes the learnable parameters of the network, and (x, y) ranges over the pixels in the (i, j)-th bin, with n_bin the number of pixels in that bin; z_{i,j,c} is the score map of class c for bin (i, j) of the RoI, (x_0, y_0) is the center coordinate of the RoI, and z_{i,j,c}(x + x_0, y + y_0) is the score for class c after the relative coordinates (x, y) within the RoI are converted into absolute coordinates.
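A compact NumPy sketch of the position-sensitive pooling described above is given below; the layout of the score-map tensor and the averaging over bins follow the usual R-FCN convention and are assumptions made for illustration.

import numpy as np

def ps_roi_pool(score_maps, roi, k, num_classes):
    # score_maps: array of shape (k*k*num_classes, H, W), one map per class per bin.
    # roi: (x0, y0, w, h) in feature-map coordinates.
    x0, y0, w, h = roi
    pooled = np.zeros((num_classes, k, k))
    for i in range(k):
        y_lo = int(y0 + i * h / k)
        y_hi = max(int(y0 + (i + 1) * h / k), y_lo + 1)
        for j in range(k):
            x_lo = int(x0 + j * w / k)
            x_hi = max(int(x0 + (j + 1) * w / k), x_lo + 1)
            for c in range(num_classes):
                # Bin (i, j) reads only its own dedicated score map for class c.
                m = score_maps[(i * k + j) * num_classes + c]
                pooled[c, i, j] = m[y_lo:y_hi, x_lo:x_hi].mean()
    # Averaging over the k*k bins votes a single score per class for this RoI.
    return pooled.reshape(num_classes, -1).mean(axis=1)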
In training an R-FCN network, the performance of the neural network is evaluated using the following loss function:
L(s, t_{x,y,w,h}) = L_cls(s_{c*}) + λ · [c* > 0] · L_regression(t, t*)
where t_{x,y,w,h} is the current bounding box, with x, y, w, h denoting the center coordinates, length, and width of the box; s is the target to be recognized inside the box t_{x,y,w,h}; c* is the true class of the RoI region in the training set, with c* = 0 meaning the region is classified as background; the bracket [c* > 0] equals 1 when the expression inside is true and 0 otherwise; s_{c*} is the predicted probability that the target s to be recognized belongs to class c*; L_cls(s_{c*}) = −log(s_{c*}) is the cross-entropy classification loss; L_regression(t, t*) is the bounding-box regression loss with respect to the correct label (ground truth), where t* is the corresponding ground-truth box; and λ is the balance weight between the classification and regression terms. If the training targets have the same weight, λ may be set to 1.
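For a single RoI, this loss can be sketched as below; smooth-L1 is assumed for the regression term, as is usual for R-FCN-style detectors, and the function signature is illustrative rather than the patent's implementation.

import torch
import torch.nn.functional as F

def detection_loss(class_logits, box_pred, c_star, box_target, lam=1.0):
    # class_logits: (num_classes + 1,) scores including the background class.
    # box_pred / box_target: predicted and ground-truth boxes t = (x, y, w, h).
    # c_star: true class of the RoI, 0 meaning background.
    cls_loss = F.cross_entropy(class_logits.unsqueeze(0), torch.tensor([c_star]))
    if c_star > 0:
        reg_loss = F.smooth_l1_loss(box_pred, box_target)
    else:
        reg_loss = torch.zeros((), dtype=box_pred.dtype)
    return cls_loss + lam * reg_loss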
After object detection is performed on the image with the R-FCN, the center coordinate P_obj of each recognized target and the length and width of its region are output.
Step 108, obtaining the position information of the indoor objects based on the indoor object information and the information of the wall connections.
In some embodiments, the two-dimensional panoramic image is first cropped according to the position coordinates of each wall, the position of each wall is recognized with the deep residual network, and the position coordinate P_wall of each wall connection in the two-dimensional panoramic image is detected. The center coordinate P_obj of each detected target region box is then extracted, and the relation of P_obj to each P_wall is computed to obtain the position of the object in the reconstructed three-dimensional scene.
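One simple way to realize this step is sketched below: each detected object center P_obj is anchored to the nearest detected wall connection P_wall, whose reconstructed 3D position is then reused to place the object; the nearest-wall heuristic is an assumption made for this sketch.

import numpy as np

def place_object(p_obj, wall_px, wall_xyz):
    # p_obj: (u, v) object center in the panorama.
    # wall_px: (N, 2) wall-connection pixel coordinates in the panorama.
    # wall_xyz: (N, 3) reconstructed 3D positions of the same wall connections.
    d = np.linalg.norm(np.asarray(wall_px) - np.asarray(p_obj), axis=1)
    nearest = int(np.argmin(d))
    offset_px = np.asarray(p_obj) - np.asarray(wall_px)[nearest]
    # The pixel offset from the anchoring wall can then be converted into a
    # 3D offset along that wall to position the object in the scene.
    return np.asarray(wall_xyz)[nearest], offset_px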
Step 109, completing the construction of the indoor virtual reality interactive scene according to the indoor layout information and the position information of the indoor objects.
In some embodiments, the virtual reality interactive scene is constructed from the indoor layout information obtained in step 106 and the position information of the indoor objects obtained in step 108.
With continued reference to fig. 2, as an implementation of the method shown in the above figures, the present disclosure provides some embodiments of an apparatus for indoor VR scene construction under the joint constraint of image semantics and scene geometry. These apparatus embodiments correspond to the method embodiments described above with reference to fig. 1, and the apparatus may be applied in various electronic devices.
As shown in fig. 2, the apparatus 200 for indoor VR scene construction of some embodiments includes:
an acquisition unit 201, a processing unit 202, a first input/output unit 203, a detection unit 204, a second input/output unit 205, a first determination unit 206, a third input/output unit 207, a second determination unit 208, and a third determination unit 209. The acquisition unit is configured to obtain an indoor image and the height of the camera above the ground when the indoor image was taken; the processing unit is configured to preprocess the indoor image; the first input/output unit is configured to input the preprocessed indoor image into a pre-trained deep residual network and output the feature information of the indoor image; the detection unit is configured to detect the information of the straight lines in the feature information; the second input/output unit is configured to input the line information into a convolutional neural network and output the information of the wall connections in the indoor image, where a wall connection is a straight line connecting walls in the indoor image; the first determination unit is configured to obtain the indoor layout information based on the height of the camera above the ground and the information of the wall connections; the third input/output unit is configured to input the feature information into a pre-trained object detection network and output the indoor object information in the indoor image; the second determination unit is configured to obtain the position information of the indoor objects based on the indoor object information and the information of the wall connections; and the third determination unit is configured to complete the construction of the indoor virtual reality interactive scene according to the indoor layout information and the position information of the indoor objects.
It will be understood that the units described in the apparatus 200 correspond to the various steps in the method described with reference to fig. 1. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 200 and the units included therein, and are not described herein again.
Referring now to FIG. 3, a schematic structural diagram of an electronic device (e.g., a server) 300 suitable for implementing some embodiments of the present disclosure is shown. The server shown in fig. 3 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the electronic device 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 308 including, for example, magnetic tape, hard disk, etc.; and a communication device 309. The communication means 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. While fig. 3 illustrates an electronic device 300 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 3 may represent one device or may represent multiple devices, as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 309, or installed from the storage device 308, or installed from the ROM 302. The computer program, when executed by the processing apparatus 301, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described above in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be included in the electronic device described above, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain an indoor image and the height of the camera above the ground when the indoor image was taken; preprocess the indoor image; input the preprocessed indoor image into a pre-trained deep residual network and output the feature information of the indoor image; detect the information of the straight lines in the feature information; input the line information into a convolutional neural network and output the information of the wall connections in the indoor image, where a wall connection is a straight line connecting walls in the indoor image; obtain the indoor layout information based on the height of the camera above the ground and the information of the wall connections; input the feature information into a pre-trained object detection network and output the indoor object information in the indoor image; obtain the position information of the indoor objects based on the indoor object information and the information of the wall connections; and complete the construction of the indoor virtual reality interactive scene according to the indoor layout information and the position information of the indoor objects.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may be described as: a processor comprising an acquisition unit, a processing unit, a first input/output unit, a detection unit, a second input/output unit, a first determination unit, a third input/output unit, a second determination unit, and a third determination unit. The names of these units do not in some cases limit the units themselves; for example, the acquisition unit may also be described as "a unit that obtains an indoor image and the height of the camera above the ground when the indoor image was taken".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and an illustration of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by interchanging the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (9)

1. A method for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry, comprising the following steps:
obtaining an indoor image and the height of the camera above the ground when the indoor image was taken;
preprocessing the indoor image;
inputting the preprocessed indoor image into a pre-trained deep residual network and outputting the feature information of the indoor image;
detecting the information of the straight lines in the feature information;
inputting the information of the straight lines into a convolutional neural network and outputting the information of the wall connections in the indoor image, wherein a wall connection is a straight line connecting walls in the indoor image;
obtaining the indoor layout information based on the height of the camera above the ground and the information of the wall connections;
inputting the feature information into a pre-trained object detection network and outputting the indoor object information in the indoor image;
obtaining the position information of the indoor objects based on the indoor object information and the information of the wall connections;
and completing the construction of the indoor virtual reality interactive scene according to the indoor layout information and the position information of the indoor objects.
2. The method of claim 1, wherein the preprocessing of the indoor image comprises:
adjusting the indoor image to a predetermined resolution.
3. The method of claim 1, wherein the detecting of the information of the straight lines in the feature information comprises:
detecting the positions of the straight lines in the feature information with a line detection algorithm, and storing the position information of the straight lines.
4. The method of claim 3, wherein the inputting of the information of the straight lines into a convolutional neural network and outputting of the information of the wall connections in the indoor image comprises:
using the convolutional neural network as a discriminator to score the positions of the stored straight lines, and obtaining the straight lines of the wall connections from the scores.
5. The method of claim 1, wherein the inputting of the feature information of the indoor image into a pre-trained object detection network and outputting of the indoor object information in the indoor image comprises:
detecting the indoor objects in the indoor image with the object detection network, and outputting the center coordinates and the length and width information of the indoor objects in the indoor image.
6. The method of claim 5, wherein the completing of the construction of the indoor virtual reality interactive scene according to the indoor layout information and the position information of the indoor objects comprises:
cropping the indoor image;
detecting the position coordinates of the wall connections in the indoor image from the information of the wall connections output by the convolutional neural network;
extracting the center coordinates of the indoor objects in the indoor image;
obtaining the positions of the indoor objects from the relation between the position coordinates of the wall connections in the indoor image and the center coordinates of the indoor objects;
and completing the construction of the indoor virtual reality interactive scene based on the positions of the indoor objects and the information of the wall connections.
7. An apparatus for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry, comprising:
an acquisition unit configured to obtain an indoor image and the height of the camera above the ground when the indoor image was taken;
a processing unit configured to preprocess the indoor image;
a first input/output unit configured to input the preprocessed indoor image into a pre-trained deep residual network and output the feature information of the indoor image;
a detection unit configured to detect the information of the straight lines in the feature information;
a second input/output unit configured to input the information of the straight lines into a convolutional neural network and output the information of the wall connections in the indoor image, wherein a wall connection is a straight line connecting walls in the indoor image;
a first determination unit configured to obtain the indoor layout information based on the height of the camera above the ground and the information of the wall connections;
a third input/output unit configured to input the feature information into a pre-trained object detection network and output the indoor object information in the indoor image;
a second determination unit configured to obtain the position information of the indoor objects based on the indoor object information and the information of the wall connections;
and a third determination unit configured to complete the construction of the indoor virtual reality interactive scene according to the indoor layout information and the position information of the indoor objects.
8. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-6.
9. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
CN202010399289.4A 2020-05-12 2020-05-12 Method and device for constructing indoor VR scene based on image semantics and scene geometry joint constraint, electronic equipment and medium Active CN111583417B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010399289.4A CN111583417B (en) 2020-05-12 2020-05-12 Method and device for constructing indoor VR scene based on image semantics and scene geometry joint constraint, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN111583417A true CN111583417A (en) 2020-08-25
CN111583417B CN111583417B (en) 2022-05-03

Family

ID=72116980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010399289.4A Active CN111583417B (en) 2020-05-12 2020-05-12 Method and device for constructing indoor VR scene based on image semantics and scene geometry joint constraint, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN111583417B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050722A (en) * 2014-06-06 2014-09-17 北京航空航天大学 Indoor three-dimensional scene layout and color transfer generation method driven by image contents
CN106952338A (en) * 2017-03-14 2017-07-14 网易(杭州)网络有限公司 Method, system, and readable storage medium for three-dimensional reconstruction based on deep learning
CN108961395A (en) * 2018-07-03 2018-12-07 上海亦我信息技术有限公司 Method for reconstructing a three-dimensional spatial scene based on photographs

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHUHANG ZOU ET AL.: "LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
SHICHAO YANG ET AL.: "Real-time 3D Scene Layout from a Single Image Using Convolutional Neural Networks", 2016 IEEE International Conference on Robotics and Automation (ICRA) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884841A (en) * 2021-04-14 2021-06-01 哈尔滨工业大学 Binocular vision positioning method based on semantic target
CN112884841B (en) * 2021-04-14 2022-11-25 哈尔滨工业大学 Binocular vision positioning method based on semantic target
CN117373055A (en) * 2023-08-25 2024-01-09 国家粮食和物资储备局科学研究院 Method, system, electronic equipment and storage medium for detecting and identifying pests
CN117373055B (en) * 2023-08-25 2024-09-06 国家粮食和物资储备局科学研究院 Method, system, electronic equipment and storage medium for detecting and identifying pests

Also Published As

Publication number Publication date
CN111583417B (en) 2022-05-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant